In most advanced economies, healthcare accounts for a substantial share of gross domestic product, often exceeding 10%. Because the sector is so large, automating and optimizing its processes and systems can yield massive benefits, and that’s where NLP comes in.
Healthcare as an industry encompasses medicines, equipment, and services such as consultation or diagnostic testing for curative, preventive, palliative, and rehabilitative care.
Healthcare deals with large amounts of unstructured text, and NLP can be applied to that text to improve health outcomes. Broad areas where NLP can help include, but are not limited to, analyzing medical records, billing, and ensuring drug safety. A range of applications is built on health records and information. Medical information extraction (IE) helps to identify clinical syndromes, medical conditions, medication, dosage, strength, and common biomedical concepts from health records, radiology reports, and discharge summaries, as well as nursing documentation and medical education documents. Here are some applications:
Electronic Health Records
The growing practice of storing clinical and healthcare data electronically has led to an explosion of medical data and overwhelmingly large personal records. As adoption increases and documents and patient histories grow longer, it’s getting harder for doctors and clinical staff to find what they need in this data, leading to information overload. This, in turn, leads to more errors, omissions, and delays and affects patient safety.
Clinical Decision Support Systems
Decision support systems assist medical workers in making healthcare-related decisions. These include decisions about screening, diagnosis, treatment, and monitoring. Various kinds of text data can be used as input to these systems, including electronic health records, column-tabulated laboratory results, and operative notes. NLP is applied to all of these to improve the decision support systems.
Health assistants and chatbots can improve the patient and caregiver experience by using various aspects of expert systems and NLP. For instance, they can help lift the spirits of patients suffering from mental illness and depression. They do this by combining NLP with cognitive therapy, asking relevant questions and reinforcing positive thoughts. Similarly, assistants can assess patients’ symptoms to diagnose potential medical issues. Depending on the urgency and critical nature of the diagnoses, chatbots can book appointments with relevant doctors. These systems can also be tailored to the user’s specific needs by utilizing existing diagnostic frameworks.
Pharmacovigilance entails all activities that are needed to ensure that a drug is safe. This involves the collection, detection, and monitoring of adverse drug or medication reactions. A medical procedure or drug can have unintended or noxious effects, and monitoring and preventing these effects is essential to making sure the drug acts as intended. With the increasing use of social media, more of these side effects are being mentioned in social media messages; monitoring and identifying these mentions is part of the solution.
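As a rough illustration of what monitoring social media messages for side effects could look like at its simplest, the sketch below flags posts in which a drug name and a side-effect term appear together. The two lexicons, the function name, and the sample post are all hypothetical and exist only for this example. Co-occurrence alone does not show that the drug caused the effect, so a real system would treat these pairs as candidates for further review rather than as confirmed adverse reactions.

```python
import re

# Hypothetical, tiny lexicons for illustration only; a real system would
# draw on curated drug and adverse-event vocabularies.
DRUG_LEXICON = {"ibuprofen", "metformin"}
EFFECT_LEXICON = {"nausea", "headache", "rash", "dizziness"}


def find_candidate_mentions(post):
    """Return (drug, effect) pairs whose terms both occur in the post.

    The pairs are only candidates: co-occurrence does not establish that
    the drug caused the effect.
    """
    # Lowercase the post and split it into alphabetic tokens.
    tokens = set(re.findall(r"[a-z]+", post.lower()))
    drugs = sorted(tokens & DRUG_LEXICON)
    effects = sorted(tokens & EFFECT_LEXICON)
    return [(drug, effect) for drug in drugs for effect in effects]


# Made-up example post.
candidates = find_candidate_mentions(
    "Started metformin last week and the nausea is awful"
)
print(candidates)
```

Exact lexicon matching misses misspellings, brand names, and colloquial descriptions of symptoms, all of which are common in social media text.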
QUESTION ANSWERING FOR HEALTH
To take the user experience to the next level, we can consider building a question answering (QA) system on top of these records. A general framework for creating the question-and-answer dataset needed to build such a system consists of the following steps:
1. Collecting domain-specific questions and then normalizing them. For instance, a patient’s treatment can be asked about in multiple ways, like, “How was the problem managed?” or “What was done to correct the patient’s problem?” These all have to be normalized in the same logical form.
2. Mapping question templates with expert domain knowledge and assigning logical forms to them. A question template is an abstract question. For example, for a certain type of question, we expect a number or a medication type as a response. More concretely, a question template such as “What is the dosage of medication?” maps to an exact question like “What is the dosage of Nitroglycerin?” This question has a logical form that expects a dosage as the response.
3. Existing annotations and the information collected in (1) and (2) are used to create a range of question-and-answer pairs. Here, already available information, such as named entity (NE) tags and the answer types linked to the logical form, is used to bootstrap the data. This step is especially relevant, as it reduces the manual effort needed to create the QA dataset. For emrQA specifically, this process involved polling physicians at the Veterans Administration to gather prototypical questions, which led to over 2,000 noisy templates that were normalized to around 600. More broadly, this is an interesting example of how to build complex datasets using heuristics, mapping, and other simpler annotated datasets. These learnings can be applied to a range of other problems, beyond processing health records, that require generation of a QA-like dataset.
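The three steps above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the pipeline actually used to build emrQA: the `Template` class, the `|medication|` slot marker, the logical-form notation, and the annotation records (including the dosage values) are all made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    text: str          # abstract question containing a |medication| slot
    logical_form: str  # hypothetical logical-form notation
    answer_field: str  # annotation field that holds the answer


# Steps 1 and 2: differently worded questions are normalized to the same
# logical form, which expects a dosage as the answer.
DOSAGE_FORM = "MedicationEvent(|medication|)[dosage]"
TEMPLATES = [
    Template("What is the dosage of |medication|?", DOSAGE_FORM, "dosage"),
    Template("How much |medication| does the patient take?", DOSAGE_FORM, "dosage"),
]

# Existing annotations, e.g. entity tags from an earlier labeling effort.
# These records are invented for illustration.
ANNOTATIONS = [
    {"medication": "Nitroglycerin", "dosage": "0.4 mg"},
    {"medication": "Aspirin", "dosage": "81 mg"},
]


def generate_qa_pairs(templates, annotations):
    """Step 3: fill each template with annotated entities to get QA pairs."""
    pairs = []
    for annotation in annotations:
        for template in templates:
            answer = annotation.get(template.answer_field)
            if answer is None:
                continue  # no annotated answer of the expected type
            medication = annotation["medication"]
            pairs.append({
                "question": template.text.replace("|medication|", medication),
                "logical_form": template.logical_form.replace("|medication|", medication),
                "answer": answer,
            })
    return pairs


qa_pairs = generate_qa_pairs(TEMPLATES, ANNOTATIONS)
for pair in qa_pairs:
    print(pair["question"], "->", pair["answer"])
```

Because the answers come from annotations that already exist, every additional template multiplies the number of question-and-answer pairs without any new manual labeling, which is the bootstrapping effect described in step 3.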