
Language is considered one of the greatest
achievements of humankind. It lets us communicate happiness,
sadness, fear, anger, dissatisfaction, intricate and complex ideas, and deep
emotions with ease. Plenty of work has been done to
integrate language into the field of artificial intelligence with the
help of Natural Language Processing (NLP).
Nowadays a huge amount of data is collected in
the form of text, video, audio, and photos, all of which are unstructured. NLP
techniques help in extracting valuable information from such unstructured data.
Let’s explore 8 common NLP techniques used for extracting information from text.
Named Entity Recognition : The most basic technique of NLP is extracting
entities, i.e. words referenced in the data set that name a person, a
location, an organization, a date, etc. NER (Named Entity Recognition) can be based on
grammar rules and mostly covers entity
identification, entity chunking, and entity extraction. The spaCy library, for example, can return the entities present in a sentence we pass to it.
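To make the rule-based idea concrete without depending on a model download, here is a minimal sketch that uses only the Python standard library. It is not spaCy's pipeline: the `GAZETTEER` lookup table, the date pattern, the label names, and the sample sentence are all hypothetical and chosen purely for illustration. A real NER system would use far richer rules or a trained model.

```python
import re

# Hypothetical gazetteer: a tiny lookup table of known names and their labels.
GAZETTEER = {
    "Asha Rao": "PERSON",
    "Acme Corp": "ORG",
    "Mumbai": "LOCATION",
}

# Hypothetical date rule: day, English month name, four-digit year.
DATE_PATTERN = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4}\b"
)


def extract_entities(text):
    """Return (entity text, label) pairs in order of appearance."""
    found = []
    # Entity identification: look up every known name in the text.
    for name, label in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.start(), match.group(), label))
    # Dates are matched by pattern rather than by lookup.
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    found.sort()  # order by position in the text
    return [(entity, label) for _, entity, label in found]


sentence = "Asha Rao joined Acme Corp in Mumbai on 12 March 2021."
for entity, label in extract_entities(sentence):
    print(entity, "->", label)
```

Multi-word names such as "Acme Corp" come back as a single entity, which is the chunking step in miniature.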
Tokenization : Tokenization is the splitting of the
sentences in the data set into a list of words, characters, or numbers,
basically tokens. It is known as the building block of NLP. It is important because it
reduces the overhead on the algorithms that will be applied later. It also makes
search more efficient, since tokens can be used for
indexing and ranking, and it reduces the space required to store the data. It
gives a clear separation of the tokens with the help of segment boundaries,
which are found by applying punctuation and special-case rules for the language
we set through the language subclass. There are various tokenization techniques, which can be chosen based on the purpose and the language we are using for
the modelling. To name a few: rule-based, dictionary-based, whitespace
tokenization, subword tokenization, and Moses tokenization.
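As a rough illustration of how two of these techniques differ, the sketch below compares whitespace tokenization with a very simple rule-based tokenizer built on a regular expression. It uses only the Python standard library; the regular expression, the function names, and the sample sentence are assumptions made for this example and do not reproduce any particular library's rules.

```python
import re

# Rule (an assumption for this sketch): a token is either a word, optionally
# with one internal apostrophe part ("isn't"), or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def whitespace_tokenize(text):
    """Split on runs of whitespace; punctuation stays glued to the words."""
    return text.split()


def rule_based_tokenize(text):
    """Split words from punctuation using TOKEN_PATTERN."""
    return TOKEN_PATTERN.findall(text)


def char_tokenize(text):
    """Character-level tokens, skipping whitespace."""
    return [ch for ch in text if not ch.isspace()]


sentence = "Don't panic: NLP isn't magic!"
print(whitespace_tokenize(sentence))
print(rule_based_tokenize(sentence))
print(char_tokenize("NLP"))
```

Whitespace splitting leaves "panic:" and "magic!" as single tokens, while the rule-based version separates the punctuation. This is why the choice of technique depends on the purpose and the language.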
Bag of Words :
Stemming and Lemmatization
Part of Speech Tagging
Sentiment Analysis
Natural Language Generation