The Effect of Urban Language on NLP
Sep 15, 2022, by Vaisali Sairam
How would a conversation with just ones and zeros look?
Something like this, right?
01001000 01100101 01111001 00100000 01100010 01110010 01101111 00100001
01010111 01100001 01110011 01110011 01110101 01110000 00100000 01101100
01101111 01101100 00100001 00001101 00001010.
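For the curious, the groups of bits above look like plain 8-bit ASCII codes, one character per group. A minimal Python sketch, assuming that encoding (the name `decode_binary` is only an illustrative choice), turns them back into text:

```python
# The message from above, one 8-bit group per character.
BITS = (
    "01001000 01100101 01111001 00100000 01100010 01110010 01101111 00100001 "
    "01010111 01100001 01110011 01110011 01110101 01110000 00100000 01101100 "
    "01101111 01101100 00100001 00001101 00001010"
)


def decode_binary(bits: str) -> str:
    """Decode space-separated 8-bit groups, assuming they are ASCII codes."""
    # int(group, 2) parses each group as a base-2 number (the character code).
    return bytes(int(group, 2) for group in bits.split()).decode("ascii")


print(repr(decode_binary(BITS)))
```

Under that assumption, the message reads "Hey bro!Wassup lol!" followed by a carriage return and a line feed.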
As we all know, the computer's central processing unit (CPU) works on binary code like this, and software translates between that code and human language. Combinations of these 1s and 0s correspond to electrical signals sent to the transistors in the CPU. But all of that is a story of the past! Computing has advanced to the point where the computer can interpret not just the language but also, to a degree, the sender's emotions.
Ever wondered how chatbots, smart assistants, and other auto-responding tech work?
Natural Language Processing (NLP), the little kid of Artificial Intelligence (AI), has made it possible. Natural language processing helps computers communicate with humans in their language. It equips the computer to read text, hear speech, interpret it, measure sentiment, and determine which parts are important.
As kids, we were made to read passages over and over again to learn the language. Humans also pick up words and slang from their surroundings, but it is not possible to teach a computer in the same way. We have to train the system to process and analyze large amounts of natural language data before it can produce human-like responses.
This involves many steps and phases, such as data preprocessing, syntactic and semantic analysis, word segmentation, lemmatization, part-of-speech (POS) tagging, and algorithm development. Each step matters for understanding the sentence and training the system to generate the right automated reply.
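To make a couple of these steps concrete, here is a rough sketch of word segmentation followed by a toy lemmatizer, using only the Python standard library. The exception table and suffix rules are hypothetical stand-ins: a real lemmatizer relies on a full dictionary and POS information, so treat this as an illustration of the idea rather than a production pipeline.

```python
import re

# Hypothetical irregular forms; real lemmatizers ship far larger dictionaries.
LEMMA_EXCEPTIONS = {"is": "be", "are": "be", "was": "be", "were": "be"}


def tokenize(text):
    """Word segmentation: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def toy_lemma(token):
    """Very rough lemmatization: a lookup table plus two plural rules."""
    if token in LEMMA_EXCEPTIONS:
        return LEMMA_EXCEPTIONS[token]
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"  # stories -> story
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]  # chatbots -> chatbot
    return token


def preprocess(text):
    """Run both steps and return the list of normalized tokens."""
    return [toy_lemma(token) for token in tokenize(text)]


print(preprocess("The chatbots were reading stories"))
```

Even this small sketch hints at the difficulty: the rules only cover patterns someone anticipated, so unfamiliar slang passes through unchanged.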
But a major drawback of NLP is its limited ability to understand semantics, urban language, and slang. Although the technology has solved this problem to an extent, fully cracking urban language and semantics in text remains tricky.
Internet slang like ROFL and LOL, or the colloquial use of certain words in totally unrelated situations, confuses the system and makes it difficult to understand the context. Sometimes the same word also has different meanings in different situations. All of this affects how well NLP works.
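One common workaround is to normalize slang before the text reaches the rest of the system. The sketch below assumes a small hand-written lookup table (the `SLANG` dictionary and the `expand_slang` name are hypothetical, and the expansions are the usual readings of these abbreviations); it is not a complete solution, since slang changes faster than any table can track.

```python
import re

# Hypothetical lookup table; a real one would be much larger and need upkeep.
SLANG = {
    "lol": "laughing out loud",
    "rofl": "rolling on the floor laughing",
    "gm": "good morning",
    "hlo": "hello",
    "ttyl": "talk to you later",
}


def expand_slang(text):
    """Replace known slang abbreviations with their expanded form."""

    def replace(match):
        word = match.group(0)
        # Unknown words are returned unchanged.
        return SLANG.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", replace, text)


print(expand_slang("GM! ttyl, lol"))
```

Anything missing from the table is left as it is, which is exactly the case where the system still sees a meaningless string of letters.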
Let us check some of the situations where urban language affects NLP.
- The input data may contain slang and urban phrases used in different situations. Sometimes it also contains words that are offensive or biased in their literal meaning, which can lead the system to produce a negative image or response. For example, racist, casteist, and other negative comments in the training data create a bias against particular communities in the NLP system's responses.
- A word can also belong to different parts of speech in different contexts. For example, “jam” is a food (noun), but it also means “pack tightly” (verb), depending on the situation. Word order differs from sentence to sentence, so the POS tagging system can get confused and may not produce the right response.
- Abbreviations like GM, HLO, and TTYL are not easily understood and decoded by the system. It reads them as random letters without any meaning, so it doesn’t give the user an appropriate response.
- Urban language is full of double meanings and sarcasm that the system may not understand. A user's sarcastic remark may be positive in its literal meaning but negative in intent, or vice versa.
- Urban language interferes with the semantic analysis of textual responses that businesses rely on to monitor brand and product sentiment in customer feedback and to understand customer needs.
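The “jam” case from the list can be sketched with a most-frequent-tag baseline, a simple tagger that always gives a word its single most common tag. The tiny lexicon and the one context rule below are hypothetical and chosen only to illustrate the confusion; real POS taggers learn such patterns statistically from large corpora.

```python
# Hypothetical lexicon mapping each word to its single most frequent tag.
LEXICON = {
    "i": "PRON",
    "like": "VERB",
    "jam": "NOUN",
    "fans": "NOUN",
    "the": "DET",
    "stadium": "NOUN",
}


def unigram_tag(sentence):
    """Tag every word with its most frequent tag, ignoring context."""
    return [(word, LEXICON.get(word, "X")) for word in sentence.lower().split()]


def retag_with_context(tagged):
    """Toy rule: a 'noun' between a noun/pronoun and a determiner is a verb."""
    result = list(tagged)
    for i in range(1, len(result) - 1):
        word, tag = result[i]
        prev_tag = result[i - 1][1]
        next_tag = result[i + 1][1]
        if tag == "NOUN" and prev_tag in {"NOUN", "PRON"} and next_tag == "DET":
            result[i] = (word, "VERB")
    return result


baseline = unigram_tag("Fans jam the stadium")
print(baseline)                      # "jam" is tagged as a noun here
print(retag_with_context(baseline))  # the context rule retags it as a verb
```

The baseline treats “jam” as a noun in both “I like jam” and “Fans jam the stadium”; only the context rule recovers the verb reading, and a single hand-written rule like this will not generalize to the varied word order of slang.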
All of this may result in ineffective training of the NLP system, which in turn leads to poor results or negative feedback.
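The effect on sentiment is easy to reproduce with a literal word-counting scorer. The sketch below assumes two small hand-picked word lists (`POSITIVE` and `NEGATIVE` are hypothetical, not taken from any real sentiment lexicon) and simply subtracts negative hits from positive ones.

```python
import re

# Hypothetical word lists, scored purely by literal meaning.
POSITIVE = {"great", "love", "awesome", "good"}
NEGATIVE = {"bad", "sick", "terrible", "hate"}


def literal_sentiment(text):
    """Return (# positive words) - (# negative words), ignoring context."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(1 for token in tokens if token in POSITIVE)
    negatives = sum(1 for token in tokens if token in NEGATIVE)
    return positives - negatives


# A sarcastic complaint scores as positive...
print(literal_sentiment("Oh great, the app crashed again. Love it."))
# ...and slang praise scores as negative.
print(literal_sentiment("That new track is sick"))
```

The sarcastic complaint gets a score of 2 and the slang compliment gets -1, the opposite of what the writers meant, which is the kind of mislabeled signal that ends up in training data and brand-monitoring reports.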
If you have a project in mind that includes advanced NLP techniques, contact us here.
Disclaimer: The opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Dexlock.