Can ChatGPT disrupt medical AI?

Since the start of 2023, heated discussion around ChatGPT has rekindled the market's interest in medical AI.

In the past, most AI models in healthcare could process only a single data modality and solved relatively narrow medical problems, such as identifying dark spots on chest CT scans. ChatGPT, by contrast, can be trained on multiple data types, which lets it offer complete medical advice the way a renowned physician might.

However, the market is divided on ChatGPT's value. Some believe that large language models (LLMs) can overturn the reasoning logic of AI and improve the algorithms used to reason over medical images and medical texts; others consider ChatGPT's medical expertise insufficient. To clarify whether ChatGPT can reshape the global medical AI landscape, and to explore the industry's prospects, Arterial Network spoke with a number of industry experts and tries to answer these questions one by one.

Approved, but unable to enter the clinic

IBM Watson's exit served as a warning to the entire life science field: when a potentially breakthrough technology appears, we cannot judge how disruptive and usable it is solely from the "subjective impression" of people outside medicine. We must also consider practical questions, such as how it fits into the diagnosis and treatment workflow, how it will get through regulatory review and approval, and how it will be commercialized once deployed in medicine.

Regulatory review and approval is a key step that determines whether an AI product can reach the market, and it is a core hurdle ChatGPT cannot avoid on its way into the clinic. Suppose a ChatGPT-based AI were to provide diagnostic assistance as a medical device: which approval pathway would it rely on, and which medical device standards would apply?

MedTech Dive compiled comprehensive statistics on FDA-authorized AI products. As of October 5, 2022, the FDA had authorized a total of 521 AI/ML medical devices. Most went through the 510(k) pathway, a small number received PMA approval, and only 18 went through the De Novo process. The 510(k) pathway simplifies approval for medical AI, especially for imaging equipment manufacturers whose AI applications may work on only one specific module: as long as developers can show that their device is "substantially equivalent" to a device already on the market, they do not need to run new clinical trials.

The NMPA is more cautious about authorizing AI/ML medical devices and offers no fast track comparable to the 510(k). However, as the approval system has matured, a large number of Class II and Class III smart medical devices have emerged since 2018. In particular, after Keya Medical's "Deep Pulse Score" obtained a Class III certificate, with "deep learning" written into the basic information of a registration certificate for the first time, approvals of medical AI products grew explosively.

Number of AI medical devices approved by the NMPA and FDA by year (NMPA figures count Class III medical devices only)

On approval pathways alone, then, both the NMPA and the FDA are open to valuable AI technology. If a company embeds ChatGPT-based AI into its own device and that device is "substantially equivalent" to one already on the market, it could well reach the market via 510(k). In China, the "Guiding Principles for the Registration Review of Artificial Intelligence Medical Devices" released by the NMPA in March 2022 broaden the scope of approval for core AI algorithms; if LLMs can prove their value, they too could enter the approval process through the existing framework.

Turning to possible application scenarios for ChatGPT: the NMPA and FDA have approved a broadly similar mix of products. As of October 5, 2022, of the 521 AI/ML medical devices authorized by the FDA, more than 75% were diagnostic aids and 13% were treatment aids; of the 70 AI/ML medical devices authorized by the NMPA, more than 71% were diagnostic aids and 24% were treatment aids.

Diagnostic-aid and treatment-aid products rely strictly on clinical evidence: their algorithms must reproduce a given result and supply the corresponding evidence. ChatGPT, as currently used, returns a definite answer for a given keyword input, but entering the same keywords several times does not yield consistent answers. In other words, when the input is complex and precision is required, ChatGPT cannot reliably reproduce the answers it has given, which makes it hard to use in either of these two categories.
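The inconsistency described above largely stems from how LLMs generate text. The following is a minimal sketch, assuming the model samples its output from a temperature-scaled probability distribution (the common decoding setup for LLMs; ChatGPT's actual settings are not public). For brevity it chooses among whole hypothetical answers rather than individual tokens, and the labels, scores, and function names are invented for illustration, not any vendor's API. Sampling with the same input can return different answers, while greedy decoding (always taking the top-scoring option) returns the same one every time.

```python
import math
import random

def softmax(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def next_token(logits, vocab, temperature, rng):
    """Greedy (argmax) pick when temperature == 0, otherwise a random sample."""
    if temperature == 0:
        return vocab[max(range(len(logits)), key=lambda i: logits[i])]
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical scores a model might assign to candidate answers for one fixed prompt.
vocab = ["pneumonia", "bronchitis", "asthma"]
logits = [2.0, 1.6, 0.5]

rng = random.Random()  # unseeded: repeated runs can produce different answers
sampled = [next_token(logits, vocab, 1.0, rng) for _ in range(10)]
greedy = [next_token(logits, vocab, 0, rng) for _ in range(10)]

print("sampled:", sampled)  # typically a mix of labels
print("greedy: ", greedy)   # always the same label
```

Making decoding deterministic improves repeatability, but it does not by itself supply the clinical evidence these product categories require.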

The new generation of clinical decision support systems (CDSS) is one of the areas ChatGPT is most likely to disrupt. The new generation of CDSS relies on NLP and can process only text. The LLM behind ChatGPT, by contrast, goes beyond NLP, which in principle gives it the ability to integrate electronic medical records, images, test data, genomes, and even microbiome sequence information. After reviewing the AI products approved by the FDA from 2020 to 2022, Arterial Network found that although diagnostic-aid and treatment-aid AI still dominate, the number of CDSS products cleared has risen markedly compared with before 2020. (In China, CDSS products generally do not need NMPA review and approval; only Senyi Intelligent's VTE risk assessment software has obtained a Class II medical device certificate.)

AI medical devices approved by the FDA, 2020-2022 (partial list)

For the healthcare system as a whole, the monitoring that AI provides and the support it gives primary care can make disease prevention considerably more efficient by promoting earlier treatment. Seen from this long-term perspective, ChatGPT-based applications may well have real potential for deployment.

Who will vouch for ChatGPT's decisions?

Researchers from the American start-up Ansible Health published findings in the journal PLOS Digital Health showing that ChatGPT performed at or near the passing threshold of roughly 60% on the United States Medical Licensing Examination. Another study evaluated ChatGPT's diagnostic performance on 45 cases and found that it reached the correct diagnosis in 39 of them (87% accuracy), well above earlier symptom-checking tools and higher than an older version of ChatGPT (82%). Many experts therefore see CDSS as a viable path for deploying ChatGPT.

Backed by such data, ChatGPT clearly has the potential to serve as an effective clinical decision support tool, but to be genuinely deployed in the clinic, AI has to offer more than a good accuracy rate.

"Whether you use Baidu or Google, when you ask a question it returns a large number of web pages and leaves you to screen and filter them yourself. ChatGPT is different. It is like an evolved search engine that gives you a single answer," Wang Shi, CTO of Huimei Technology, told Arterial Network. "That is its advantage, but also its hidden danger."

The CDSS currently used in hospitals consists of three core components: a human-computer interaction layer, an inference engine, and a knowledge base. The machine uses NLP to understand the doctor's input; this step handles interaction only and does not involve AI actually making decisions in place of the doctor. That is not because AI cannot outperform doctors in specific scenarios, but because AI cannot take responsibility for mistakes in every possible situation.

Wang Shi said: "We are living through the development of smart healthcare. Especially from 2018 to 2020, the National Health Commission successively introduced policies such as electronic medical record rating, interoperability rating, and smart hospital rating. Medical institutions are undergoing an all-round digital transformation and upgrade, and many emerging technologies are being used in the process. Among them, CDSS is one of the core items in the higher-level assessments, and the rules strictly regulate how a CDSS is built: it must be grounded in evidence-based medicine.

"The prompts and recommendations a CDSS gives are therefore premised on conforming to diagnosis and treatment standards, drawing comprehensively on guidelines, and assisting the doctor's decision-making. ChatGPT may give better answers to some questions, but it cannot vouch for its own answers or cite its sources, it cannot take responsibility for its possible mistakes, and no doctor is willing to pay for an algorithm's errors."
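To make the contrast concrete, here is a minimal sketch of the three-component CDSS architecture described above, under heavy simplifying assumptions: keyword lookup stands in for NLP, the rules and their sources are hypothetical placeholders rather than real clinical guidance, and every name is invented for illustration. What it shows is traceability: each suggestion carries the source it came from, and when no rule applies the system returns nothing rather than improvising.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """One knowledge-base entry: required findings, a suggestion, and its evidence source."""
    required: frozenset
    suggestion: str
    source: str  # citation the suggestion can be traced back to

# Knowledge base (hypothetical entries; the sources are placeholders, not real citations).
KNOWLEDGE_BASE = [
    Rule(frozenset({"recent_surgery", "immobility"}),
         "Flag patient for VTE risk assessment",
         "placeholder: VTE prevention guideline, section X"),
    Rule(frozenset({"fever", "cough"}),
         "Suggest reviewing chest imaging",
         "placeholder: respiratory infection guideline, section Y"),
]

# Human-computer interaction layer: a crude keyword matcher standing in for NLP.
VOCAB = {"surgery": "recent_surgery", "bedridden": "immobility",
         "fever": "fever", "cough": "cough"}

def parse_note(note: str) -> set:
    """Map words in a free-text note to structured findings."""
    words = note.lower().replace(",", " ").split()
    return {VOCAB[w] for w in words if w in VOCAB}

def infer(findings: set) -> list:
    """Inference engine: return every rule whose required findings are all present."""
    return [r for r in KNOWLEDGE_BASE if r.required <= findings]

def advise(note: str) -> list:
    """Suggestions paired with their sources; an empty list means no evidence-backed advice."""
    return [(r.suggestion, r.source) for r in infer(parse_note(note))]

for suggestion, source in advise("Post surgery, bedridden for 5 days"):
    print(f"{suggestion}  [source: {source}]")
```

A generative model, by contrast, produces free text with no built-in link between an answer and a knowledge-base entry, which is the gap Wang Shi points to.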

This is a fatal obstacle to putting ChatGPT into clinical practice. As with IBM Watson, ChatGPT's disruptive promise lies in its ability to make decisions like a doctor, whereas doctors want AI to handle information processing well while they keep decision-making authority for themselves.

Cost is the key constraint on ChatGPT

Judging from how CNN and NLP technologies developed, developers can always adjust their choice of technology during application so that the final product meets market needs. Developing medical applications around LLM technology will therefore inevitably produce results. But not every start-up can invest as much in model training as OpenAI. According to public data, GPT-3, the LLM that OpenAI released earlier, has 175 billion parameters, and its training cost reached as much as US$12 million (about US$1.4 million per training run). Estimates of ChatGPT's training cost vary, but it can be roughly put at between US$2 million and US$12 million.

For a specialized vertical such as healthcare to build a similar model, it would first need a GPT-class foundation model, and would then have to spend considerable time, effort, and money on long-term, continuous computation and data training on that base model to produce a new one. In China, only companies on the scale of BAT (Baidu, Alibaba, and Tencent) have the capital to meet these conditions.

At the same time, given such high training costs, even large companies cannot freely adjust a model once it has been trained. If a model the size of ChatGPT goes astray in its exploration of medicine, researchers who want to keep tapping the potential of LLMs may have to wait for the next model to appear.

Taking all these factors together, the clinical value of ChatGPT, and of other LLMs, may be quite limited. In the current landscape, search-related health education and internet hospitals clearly hold more potential. Away from the clinic, ChatGPT's distinctive strengths may open up new room for growth in these scenarios.

Overall, the outlook for ChatGPT in clinical applications may be somewhat disappointing. ChatGPT was not built specifically for medicine, and ChatGPT-based AI will find it hard to penetrate clinical workflows as deeply as the diagnostic-aid and treatment-aid AI that has been refined over many years.

In the long run, however, LLMs still have the potential to disrupt existing AI. If they can deliver comprehensive analysis across multimodal medical data such as electronic medical records, images, and genomes, they could break the current impasse facing AI and redefine its value.


Arterial Network, a future medical service platform