Named Entity Recognition with NLTK and SpaCy using Python
What is Named Entity Recognition?
Named Entity Recognition (NER) is a Natural Language Processing task that identifies the real-world objects a text refers to by name, such as a person, a company, a monument, or an amount of money.
It classifies each of these entities into a predefined category. It takes a string of text, usually a sentence or a paragraph, as input and identifies the relevant nouns mentioned in it, such as people, places, and organizations.
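To make the input and output concrete, here is a minimal sketch that tags entities by plain dictionary lookup. This is only an illustration of the idea: NLTK and SpaCy rely on trained models rather than a fixed list, and the `GAZETTEER` table and `find_entities` helper are hypothetical names invented for this example.

```python
import re

# Hypothetical lookup table; a real NER system learns these categories from data.
GAZETTEER = {
    "Narendra Modi": "PERSON",
    "India": "GPE",
    "Tuesday": "DATE",
}


def find_entities(text):
    """Return (entity, label, start_offset) for every gazetteer entry found in text."""
    found = []
    for name, label in GAZETTEER.items():
        # \b keeps "India" from matching inside a longer word such as "Indiana".
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.group(), label, match.start()))
    # Report entities in the order they appear in the text.
    return sorted(found, key=lambda item: item[2])


sentence = "Narendra Modi announced a package for India on Tuesday."
for entity, label, start in find_entities(sentence):
    print(entity, label, start)
```

A lookup like this misses any name that is not in the table, which is why the libraries discussed below use statistical models instead.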
Applications of Named Entity Recognition:
- Improve Search Algorithms
- Classifying content
- Content Recommendation
- Simplifying Customer Support
In this article we will discuss the process of Named Entity Recognition with NLTK and SpaCy.
NLTK
Natural Language Toolkit (NLTK) is a popular Python library for NLP. It is one of the leading platforms for working with human language data and for building applications and services that can understand it.
First let’s start by installing the NLTK library. You can do it by using the following command.
pip install nltk
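After installing the NLTK library, let’s import the library and the submodules we need. The tokenizer, tagger, and chunker also depend on NLTK data packages. If one is missing, NLTK raises a LookupError that names the resource to fetch with nltk.download().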
import nltk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag
Now, we tokenize the sentence by using the ‘word_tokenize()’ method. Tokenizing a sentence means breaking it into words.
Next, we tag each word with its part of speech by using the ‘pos_tag()’ method.
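# The sample sentence to analyze
sent = '''Prime Minister Narendra Modi on Tuesday announced the 20 Lakh Crore package for the India to fight against the coronavirus pandemic.'''
# Split the sentence into word tokens
words = word_tokenize(sent)
# Attach a part-of-speech tag to each token
postags = pos_tag(words)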
Then we use the ‘ne_chunk()’ method to recognize each named entity in the sentence.
# binary=False keeps the entity types (PERSON, GPE, ...) instead of a single NE label
ner = nltk.ne_chunk(postags, binary=False)
print(ner)
Output:
Below is the complete code:
import nltk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag
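sent = '''Prime Minister Narendra Modi on Tuesday announced the 20 Lakh Crore package for the India to fight against the coronavirus pandemic.'''
words = word_tokenize(sent)  # split the sentence into word tokens
postags = pos_tag(words)  # tag each token with its part of speech
ner = nltk.ne_chunk(postags, binary=False)  # group tagged tokens into named-entity chunks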
print(ner)
As you can see, Narendra Modi is chunked together and classified as a person. This is how NLTK performs named entity recognition.
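The printed tree is handy for reading, but a flat list of entities is often easier to work with. The sketch below walks a chunk tree and collects each chunk's text and label. The `extract_entities` helper is a hypothetical name, and `FakeChunk` is only a stand-in that mimics the `label()` and `leaves()` methods of `nltk.Tree` so the example runs without NLTK's data packages.

```python
def extract_entities(tree):
    """Collect (entity text, label) pairs from a chunk tree."""
    entities = []
    for node in tree:
        # Plain tokens are (word, tag) tuples; named-entity chunks are
        # subtrees that expose label() and leaves().
        if hasattr(node, "label"):
            words = [word for word, _tag in node.leaves()]
            entities.append((" ".join(words), node.label()))
    return entities


class FakeChunk(list):
    """Hypothetical stand-in for nltk.Tree (assumption: one level of chunks)."""

    def __init__(self, label, children):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label

    def leaves(self):
        return list(self)


sample = FakeChunk("S", [
    ("Prime", "NNP"),
    ("Minister", "NNP"),
    FakeChunk("PERSON", [("Narendra", "NNP"), ("Modi", "NNP")]),
    ("on", "IN"),
    ("Tuesday", "NNP"),
])
print(extract_entities(sample))  # [('Narendra Modi', 'PERSON')]
```

With NLTK set up, the same function should accept the tree returned by ne_chunk() directly, for example extract_entities(ner).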
Now let’s try to understand named entity recognition using SpaCy.
SpaCy
SpaCy is an open-source library for Natural Language Processing. It is widely regarded as one of the fastest NLP libraries in Python. Its default models can recognize a wide range of named and numerical entities, including persons, organizations, languages, and events. It is becoming popular for processing and analyzing text data in NLP.
Let’s start by installing SpaCy. You can do it by using the following command.
pip install spacy
SpaCy has different types of models. The default model for the English language is en_core_web_sm. We need to download this model and its data before we can use it, which can be done with the following command:
python -m spacy download en_core_web_sm
Now that all the required modules are installed, we are ready to run named entity recognition.
import spacy
# Load the small English model downloaded above
nlp = spacy.load('en_core_web_sm')
text = '''Prime Minister Narendra Modi on Tuesday announced the 266 billion dollars package for the India to fight against the coronavirus pandemic.'''
doc = nlp(text)
# Print each detected entity with its label
for ent in doc.ents:
    print(ent.text, ent.label_)
Output:
As you can see, SpaCy has labeled each entity with its type, and for this sentence its output looks better than NLTK's.
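When a text contains many entities, it can help to group them by label. The sketch below does this using only the `text` and `label_` attributes that the loop above prints. The `group_by_label` helper is a hypothetical name, and the `SimpleNamespace` objects are stand-ins for SpaCy's entity spans so the example runs without a downloaded model.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_label(ents):
    """Map each entity label to the list of entity texts that carry it."""
    grouped = defaultdict(list)
    for ent in ents:
        # Only .text and .label_ are read from each entity.
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Hypothetical stand-ins for the spans in doc.ents (not real model output).
sample_ents = [
    SimpleNamespace(text="Narendra Modi", label_="PERSON"),
    SimpleNamespace(text="Tuesday", label_="DATE"),
    SimpleNamespace(text="India", label_="GPE"),
]
print(group_by_label(sample_ents))
```

With the model loaded, you would call group_by_label(doc.ents) instead. If a label such as GPE is unfamiliar, spacy.explain('GPE') returns a short description of it.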
SpaCy also comes with a built-in visualizer, displaCy, which you can use to visualize the named entities in a document.
from spacy import displacy
displacy.render(doc, style='ent', jupyter=True)
Output:
Below is the complete code:
import spacy
from spacy import displacy

nlp = spacy.load('en_core_web_sm')  # load the small English model
text = '''Prime Minister Narendra Modi on Tuesday announced the 266 billion dollars package for the India to fight against the coronavirus pandemic.'''
doc = nlp(text)
# Print each detected entity with its label
for ent in doc.ents:
    print(ent.text, ent.label_)
# Highlight the entities inline (jupyter=True renders inside a notebook)
displacy.render(doc, style='ent', jupyter=True)
In this article we have discussed Named Entity Recognition with NLTK and SpaCy. I hope you found it helpful.