Role of Natural Language Processing in Healthcare

Aug 18, 2022, by Melvin Jacob Joseph


Natural Language Processing (NLP) is the branch of Artificial Intelligence that helps machines comprehend the meaning of human language. The use of NLP technologies in our lives is increasing day by day. Almost everybody has used a virtual assistant on a smartphone at least once; Siri and Alexa are two of the leading virtual assistants we may have come across. The primary technologies used here are Natural Language Processing and speech recognition. Even a plagiarism checker uses NLP to detect hidden similarities between sentences. So, in this blog, I would like to discuss why Natural Language Processing is in such demand in health care and how it is used in the health realm.

In this era, more and more health care systems are migrating to EMR systems. What is an EMR, or Electronic Medical Record? EMRs are digital records of patients in a standardized format. They may contain details such as the patient's diagnoses, test results, and allergies. EMRs are replacing the traditional method of keeping large piles of paper documents.

As shown in the figure, the hospital may have a local system with clients that connect to the EMR system; using these clients, staff can upload and retrieve documents as required. These records may be shared within the organization or with other health care providers so that the patient receives the best care. EMRs also make patient data faster to look up, which saves time, and because the data is stored online, the risk of losing it is lower. Of course, this is sensitive data that must be kept secure and shared only with authorized personnel. However, these systems demand a lot of time for entering, extracting, and structuring data. Many reports have shown that physicians feel exhausted and face extra workload when using EMRs. If we can save this time, doctors can spend more of it with patients. This is where Natural Language Processing comes in.

NLP can greatly reduce this time and the redundant work done by health care providers. Together with speech recognition, it can turn human speech into text, extract data points from that text, and convert them into a structured format. EMRs also hold a large body of existing unstructured data, and it is hard for computers to extract the essential information from it without NLP. NLP can also be used along with machine learning algorithms to help detect diseases.
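To make the idea of turning free text into structured data concrete, here is a minimal sketch in Python. It assumes the dictation has already been transcribed to text, and it uses a few hand-written regular expressions instead of a real NLP model. The field names (blood_pressure, temperature_f, allergies), the patterns, and the sample note are all hypothetical and chosen only for illustration; real clinical notes vary far more than this.

```python
import re

# Hypothetical patterns for illustration only; not taken from any EMR standard.
BLOOD_PRESSURE = re.compile(
    r"blood pressure (?:is |of )?(\d{2,3})\s*(?:/|over)\s*(\d{2,3})", re.I
)
TEMPERATURE = re.compile(r"temperature (?:is |of )?(\d{2,3}(?:\.\d)?)", re.I)
ALLERGIES = re.compile(r"allergic to ([a-z ,]+?)(?:\.|$)", re.I)


def extract_fields(note: str) -> dict:
    """Pull a few data points out of a free-text note into a dictionary."""
    record = {}

    bp = BLOOD_PRESSURE.search(note)
    if bp:
        record["blood_pressure"] = {
            "systolic": int(bp.group(1)),
            "diastolic": int(bp.group(2)),
        }

    temp = TEMPERATURE.search(note)
    if temp:
        record["temperature_f"] = float(temp.group(1))

    allergy = ALLERGIES.search(note)
    if allergy:
        # Split "penicillin and aspirin" or "penicillin, aspirin" into items.
        parts = re.split(r",|\band\b", allergy.group(1))
        record["allergies"] = [p.strip().lower() for p in parts if p.strip()]

    return record


note = (
    "Patient reports headache. Blood pressure is 140 over 90. "
    "Temperature is 99.5. Allergic to penicillin and aspirin."
)
print(extract_fields(note))
```

Fields that do not appear in the note are simply left out of the result, so a downstream system can tell "not mentioned" apart from a recorded value. A production system would replace these brittle patterns with the NLP steps described in this post.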

Now let us discuss how NLP solves this problem. Natural language processing involves several steps, including tokenization, POS tagging, lemmatization, Named Entity Recognition, and coreference resolution.

Tokenization: It is the process of splitting a given sentence into individual tokens, such as words and punctuation marks.

POS (Part of Speech) tagging: It maps each token in a sentence to a category such as noun, adjective, qualifier, verb, or punctuation.

Lemmatization: It is the process of identifying the root word of each token in a sentence. For example, words like medicines, medication, and medicated can be mapped to the word medicine.

Named Entity Recognition: This is done to identify key entities in a text. Consider the sentence “George is infected with Hepatitis.” Here, George can be identified as a person and Hepatitis as a disease.

Coreference resolution: It is the process of identifying all the expressions that refer to the same entity. Consider the following two sentences: “George has contracted Covid. He also has a previous history of fits.” In the second sentence, the system needs to know that “he” again refers to the entity “George”.
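The steps above can be sketched with a few lines of plain Python. This is a toy, rule-based illustration and not how production libraries work: the lookup tables (LEMMAS, ENTITIES, PRONOUNS) are assumptions made up for this example, real systems use trained statistical models, and POS tagging is left out because it needs a trained model or a large lexicon.

```python
import re

# Toy lookup tables, assumed for illustration only.
LEMMAS = {"medicines": "medicine", "infected": "infect", "is": "be", "has": "have"}
ENTITIES = {"george": "PERSON", "hepatitis": "DISEASE", "covid": "DISEASE"}
PRONOUNS = {"he", "she", "him", "her"}


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def lemmatize(token):
    """Map a token to its root form via the toy table (lowercased fallback)."""
    return LEMMAS.get(token.lower(), token.lower())


def recognize_entities(tokens):
    """Dictionary-based entity lookup: returns (token, label) pairs."""
    return [(t, ENTITIES[t.lower()]) for t in tokens if t.lower() in ENTITIES]


def resolve_pronouns(tokens):
    """Naive coreference: link each pronoun to the most recent PERSON token."""
    last_person = None
    links = []
    for index, token in enumerate(tokens):
        if ENTITIES.get(token.lower()) == "PERSON":
            last_person = token
        elif token.lower() in PRONOUNS and last_person is not None:
            links.append((index, token, last_person))
    return links


first = tokenize("George is infected with Hepatitis.")
print(first)
print([lemmatize(t) for t in first])
print(recognize_entities(first))

second = tokenize("George has contracted Covid. He also has a previous history of fits.")
print(resolve_pronouns(second))
```

Each function mirrors one step: tokenize produces the tokens, lemmatize reduces them to root forms, recognize_entities labels George and Hepatitis, and resolve_pronouns links "He" back to George. Libraries such as those listed below replace every one of these lookups with far more robust models.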

Various libraries are available to integrate NLP with different programming languages, so that we can avoid building the core algorithms from scratch. NLTK (Natural Language Toolkit), spaCy, and Gensim are some of the widely used Python NLP libraries. OpenNLP, lsa, and koRpus are some of the libraries used with the R programming language. Apache OpenNLP and Stanford CoreNLP are examples of libraries used with Java.

NLP can also be used with a knowledge graph so that larger and more varied data sets can be analyzed and processed at scale. The graph contains nodes and edges. Nodes represent the entities, and edges represent the relationships between entities.

The figure below shows a graph with some nodes and edges:

A graph database can be used to create meaningful associations between entities, and these associations can be used to generate inferences from the existing nodes in the graph. As the graph grows, we can achieve more accurate inferences.
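As a rough illustration of nodes, edges, and inference, the sketch below builds a tiny in-memory graph in Python and ranks conditions by how many of a patient's symptoms point to them. It stands in for a real graph database, which would use its own query language. The class name KnowledgeGraph, the relation names HAS_SYMPTOM and SYMPTOM_OF, and the sample facts are all hypothetical and are not clinical knowledge.

```python
from collections import defaultdict


class KnowledgeGraph:
    """A minimal directed graph: nodes are entities, edges are labeled relations."""

    def __init__(self):
        # Maps a source node to a list of (relation, target node) pairs.
        self.edges = defaultdict(list)

    def add_edge(self, source, relation, target):
        self.edges[source].append((relation, target))

    def neighbors(self, node, relation):
        """Return the targets reachable from node over edges with this relation."""
        return [t for r, t in self.edges.get(node, []) if r == relation]

    def infer_possible_conditions(self, patient):
        """Two-hop traversal: patient -> symptoms -> conditions, scored by count."""
        scores = defaultdict(int)
        for symptom in self.neighbors(patient, "HAS_SYMPTOM"):
            for condition in self.neighbors(symptom, "SYMPTOM_OF"):
                scores[condition] += 1
        # Highest score first; ties broken alphabetically for stable output.
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


graph = KnowledgeGraph()
# Made-up facts for illustration only.
graph.add_edge("fever", "SYMPTOM_OF", "Hepatitis")
graph.add_edge("fever", "SYMPTOM_OF", "Influenza")
graph.add_edge("jaundice", "SYMPTOM_OF", "Hepatitis")
graph.add_edge("fatigue", "SYMPTOM_OF", "Hepatitis")
graph.add_edge("fatigue", "SYMPTOM_OF", "Influenza")
for symptom in ("fever", "jaundice", "fatigue"):
    graph.add_edge("George", "HAS_SYMPTOM", symptom)

print(graph.infer_possible_conditions("George"))
```

The inference here is nothing more than counting paths through the graph, but it shows why a larger graph helps: every additional symptom or condition edge gives the traversal more evidence to rank with.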

Here at Dexlock, we have built an application that records the doctor’s consultation, extracts relevant data, and writes the necessary data to the EMR system with the support of an AI engine. For speech recognition, it uses the Google speech-to-text engine tuned specially for medical recordings. The text is converted to meaningful templates and passed to our in-house AI engine, which extracts the data points and fills in the appropriate fields. The inferred text then goes through multiple reviews so that any faults introduced during conversion can be corrected. The inference engine is built in Java using the Stanford NLP parser with a graph database. The application saves doctors’ time by providing an easy-to-use interface for recording dictations about the patient. These dictations are uploaded to the transcription panel, where the inferred texts are confirmed and uploaded to EMRs. All data is handled with proper encryption methods so that it is protected at rest as well as in transit. The application is also HIPAA compliant.

Modern-day problems that arise with the progress of technology require out-of-the-box solutions. As a fast-moving, data-driven technology company, we help you build scalable and progressive solutions in the health domain. For more information, contact us here.

Disclaimer: The opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Dexlock.