NLP Techniques For Information Extraction


Language is considered one of the greatest achievements of humankind. It lets us communicate happiness, sadness, fear, anger and dissatisfaction, as well as intricate, complex ideas and deep emotions, with great ease. Much work has been done to integrate language into the field of artificial intelligence with the help of Natural Language Processing (NLP).

Nowadays, huge amounts of data are collected in the form of text, video, audio and photos, most of it unstructured. NLP techniques help extract valuable information from such unstructured data.

Let’s explore 8 common NLP techniques used to extract information from text.

  • Named Entity Recognition (NER): One of the most basic NLP techniques is extracting entities, i.e. words or phrases in the data set that refer to a person, a location, an organization, a date, etc. NER can be based on hand-written grammar rules or on statistical models, and mostly involves entity identification, entity chunking and entity extraction.
  • For example, the spaCy library can be used to get the entities present in a sentence we pass to it.

  • Tokenization: Tokenization is the splitting of the sentences in the data set into a list of tokens, basically words, characters or numbers. It is known as the building block of NLP. It is important because it reduces the overhead on the algorithms applied afterwards: tokens can be used for indexing and ranking, which makes search more efficient, and they can reduce the space required to store the data. Tokenization separates tokens cleanly using segment boundaries, recognizing punctuation and special-case rules for the chosen language (in spaCy, these rules come from the language subclass we set). There are various tokenization techniques, and the right one depends on the purpose and on the language we are modelling. To name a few: rule-based, dictionary-based, whitespace, subword and Moses tokenization.
  • Bag of Words

  • Stemming and Lemmatization

  • Part of Speech Tagging 

  • Sentiment Analysis

  • Natural Language Generation
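spaCy's pretrained pipelines are statistical models that have to be downloaded separately, so as a self-contained illustration of named entity recognition, here is a minimal sketch of its rule-based flavour using only Python's standard library: a small dictionary (gazetteer) lookup plus one regular-expression rule for dates. The gazetteer entries, the date pattern and the `extract_entities` helper are hypothetical, chosen only for illustration; real systems use far larger resources or trained models.

```python
import re

# Hypothetical gazetteer (name -> label), for illustration only.
GAZETTEER = {
    "Alice Smith": "PERSON",
    "Acme Corp": "ORG",
    "Paris": "LOC",
}

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Hypothetical rule: "<day> <Month> <year>", e.g. "12 March 2021" (not exhaustive).
DATE_RE = re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")


def extract_entities(text):
    """Return (entity_text, label, start_offset) tuples ordered by position."""
    entities = []
    # Dictionary lookup: every whole-word occurrence of a known name.
    for name, label in GAZETTEER.items():
        for match in re.finditer(rf"\b{re.escape(name)}\b", text):
            entities.append((match.group(), label, match.start()))
    # Grammar-style rule: spans matching the date pattern become DATE entities.
    for match in DATE_RE.finditer(text):
        entities.append((match.group(), "DATE", match.start()))
    return sorted(entities, key=lambda ent: ent[2])


sentence = "Alice Smith joined Acme Corp in Paris on 12 March 2021."
for ent_text, label, _ in extract_entities(sentence):
    print(ent_text, label)
# Alice Smith PERSON
# Acme Corp ORG
# Paris LOC
# 12 March 2021 DATE
```

A lookup-plus-rules approach is easy to read and debug, but it only finds names that are already listed; statistical NER models such as spaCy's generalize to unseen names from context.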
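To make the tokenization techniques listed above more concrete, here is a minimal stdlib-only Python sketch of three of them: whitespace tokenization, a rule-based tokenizer built on a single regular expression, and a WordPiece-style greedy longest-match subword tokenizer. The regex, the function names and the toy vocabulary are assumptions for illustration; production tokenizers (spaCy, Moses, or the WordPiece/BPE tokenizers of transformer libraries) use much richer rules and learned vocabularies.

```python
import re


def whitespace_tokenize(text):
    # Split on runs of whitespace; punctuation stays attached to words.
    return text.split()


# Hypothetical rule: a word (optionally with one internal apostrophe part,
# as in "Let's") or any single character that is neither a word char nor space.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def rule_based_tokenize(text):
    return TOKEN_RE.findall(text)


def subword_tokenize(word, vocab):
    """Greedy longest-match-first subword split (WordPiece-style sketch).

    Non-initial pieces carry a '##' prefix. Returns ['[UNK]'] when the
    word cannot be fully covered by pieces from the vocabulary.
    """
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces


text = "Let's tokenize this sentence, shall we?"
print(whitespace_tokenize(text))
# ["Let's", 'tokenize', 'this', 'sentence,', 'shall', 'we?']
print(rule_based_tokenize(text))
# ["Let's", 'tokenize', 'this', 'sentence', ',', 'shall', 'we', '?']

# Hypothetical toy vocabulary, chosen only for illustration.
VOCAB = {"token", "##ize", "un", "##break", "##able"}
print(subword_tokenize("tokenize", VOCAB))     # ['token', '##ize']
print(subword_tokenize("unbreakable", VOCAB))  # ['un', '##break', '##able']
```

Whitespace splitting leaves punctuation attached ("sentence,"), while the rule-based version separates it. The subword tokenizer can handle words that are missing from its vocabulary as long as their pieces are present, which is why subword methods are popular for large language models.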
