Parts of Speech (POS) Tagging with NLTK and SpaCy Using Python

May 25, 2020


What is Parts of Speech Tagging?

Part-of-speech tagging is the process of assigning each word in a sentence its corresponding part-of-speech tag, based on both its definition and its context. The tag shows how a word is being used. For example, "book" is a noun in "the book" but a verb in "book a flight". This helps with understanding the text.
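To make the idea concrete, here is a minimal sketch of the simplest possible tagger, a "most frequent tag" baseline. It learns from a tiny hand-tagged corpus, made up for illustration, which tag each word carries most often. It then tags every word independently. The function names are hypothetical and are not part of NLTK or spaCy. Because it ignores context, it tags "book" as a noun even in "I book a flight". That is exactly why real taggers, such as NLTK's default averaged perceptron tagger, also look at the neighboring words.

```python
from collections import Counter, defaultdict

# Tiny hand-tagged corpus, made up for illustration (Penn Treebank tags)
TRAINING = [
    [("I", "PRP"), ("book", "VBP"), ("a", "DT"), ("flight", "NN")],
    [("The", "DT"), ("book", "NN"), ("is", "VBZ"), ("small", "JJ")],
    [("My", "PRP$"), ("book", "NN"), ("is", "VBZ"), ("here", "RB")],
]

def train_most_frequent_tag(sentences):
    """Map each lowercased word to the tag it carries most often."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def tag_words(words, model, default="NN"):
    """Tag each word on its own, ignoring context; unseen words get `default`."""
    return [(w, model.get(w.lower(), default)) for w in words]

model = train_most_frequent_tag(TRAINING)
print(tag_words(["I", "book", "a", "flight"], model))
# → [('I', 'PRP'), ('book', 'NN'), ('a', 'DT'), ('flight', 'NN')]
```

"book" appears twice as NN and once as VBP in the training data, so the baseline always picks NN.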

It takes a string of text, usually a sentence or paragraph, as input and identifies the part of speech of each word in it, such as verb, adjective, or pronoun. Here are some of the tags from the Penn Treebank tag set, which NLTK's pos_tag() uses by default:

PRP: personal pronoun (I, he, she)
PRP$: possessive pronoun (my, his, her)
RB: adverb (very, silently)
VB: verb, base form (take)
WP$: possessive wh-pronoun (whose)
FW: foreign word
JJ: adjective (small)
NN: noun, singular (desk)
DT: determiner (the, a)
WDT: wh-determiner (which)
IN: preposition/subordinating conjunction (in, of, because)
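Fine-grained tags like these are often collapsed into broader word classes, for example to count all nouns regardless of number. The sketch below does this by tag prefix. COARSE_PREFIXES and coarse_class are hypothetical names. The mapping is a simplification that covers only a few tag families, and anything else falls into "other".

```python
# Hypothetical helper: collapse Penn Treebank tags into broad classes by prefix.
# Simplified on purpose; tags not matched below are reported as "other".
COARSE_PREFIXES = [
    ("PRP", "pronoun"),   # PRP, PRP$
    ("WP", "pronoun"),    # WP, WP$
    ("NN", "noun"),       # NN, NNS, NNP, NNPS
    ("VB", "verb"),       # VB, VBD, VBG, VBN, VBP, VBZ
    ("JJ", "adjective"),  # JJ, JJR, JJS
    ("RB", "adverb"),     # RB, RBR, RBS
]

def coarse_class(tag):
    """Return the broad word class for a fine-grained tag."""
    for prefix, name in COARSE_PREFIXES:
        if tag.startswith(prefix):
            return name
    return "other"

for t in ["PRP$", "NNS", "VBZ", "JJ", "WDT"]:
    print(t, coarse_class(t))
```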

Applications of POS tagging include:

Sentiment Analysis
Text to Speech (TTS) applications
Linguistic research on text corpora

In this article, we will walk through part-of-speech tagging with NLTK and spaCy.

spaCy

spaCy is an open-source Python library for Natural Language Processing, designed with speed and production use in mind. Its trained pipelines can classify words into their parts of speech, such as nouns, verbs, and adverbs. It is widely used for processing and analyzing text.

Let's start by installing spaCy with the following command:

pip install spacy

spaCy offers several trained pipelines. For English, a small and commonly used one is en_core_web_sm. Download it with the following command:

python -m spacy download en_core_web_sm

With the required modules installed, we are ready to tag parts of speech.

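import spacy

# Load the small English pipeline downloaded above
nlp = spacy.load('en_core_web_sm')

text = "My name is Tony Stark and I am Iron Man."
doc = nlp(text)

# Print each token with its coarse-grained part-of-speech tag
for token in doc:
    print(token.text, token.pos_)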

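Output:

As you can see, spaCy has assigned a part-of-speech tag to each word. token.pos_ holds the coarse-grained Universal POS tag, such as NOUN, VERB or PROPN. The fine-grained tag is available as token.tag_, and for English it follows the Penn Treebank tags listed above.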

spaCy also comes with a built-in visualizer, displaCy. Its dependency view (style="dep") draws the sentence with the part-of-speech tag under each word, plus arcs showing the syntactic relations between words.

from spacy import displacy
# jupyter=True renders inline in a notebook; outside Jupyter, use displacy.serve(doc, style="dep")
displacy.render(doc, style="dep", jupyter=True, options={"distance": 100})

Output:

Below is the complete code:

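import spacy
from spacy import displacy

# Load the small English pipeline
nlp = spacy.load('en_core_web_sm')

text = "My name is Tony Stark and I am Iron Man."
doc = nlp(text)

# Print each token with its coarse-grained part-of-speech tag
for token in doc:
    print(token.text, token.pos_)

# Draw the dependency parse with POS tags under each word (Jupyter only)
displacy.render(doc, style="dep", jupyter=True, options={"distance": 100})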

Now let's look at part-of-speech tagging with NLTK.

NLTK

The Natural Language Toolkit (NLTK) is a popular Python library for NLP. It is one of the leading platforms for building programs that work with human language data.

First, install the NLTK library with the following command:

pip install nltk

After installing NLTK, import the library and the submodules we need:

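import nltk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag
nltk.download('punkt_tab')  # tokenizer data ('punkt' on older NLTK versions)
nltk.download('averaged_perceptron_tagger_eng')  # tagger ('averaged_perceptron_tagger' on older versions)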

Now, we tokenize the sentence with the word_tokenize() method. Tokenizing a sentence means breaking it into individual tokens: words and punctuation marks.
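As a rough illustration of what tokenization produces, the sketch below splits text with a single regular expression. It produces runs of word characters and individual punctuation marks. This is a simplified stand-in, not the algorithm word_tokenize() uses. word_tokenize() follows Penn Treebank conventions, so it splits "don't" into "do" and "n't", whereas this regex yields "don", "'" and "t". simple_tokenize is a hypothetical name.

```python
import re

def simple_tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

print(simple_tokenize("My name is Tony Stark and I am Iron Man."))
# → ['My', 'name', 'is', 'Tony', 'Stark', 'and', 'I', 'am', 'Iron', 'Man', '.']
```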

Next, we tag each token with its part of speech using the pos_tag() method, which returns a list of (word, tag) tuples.

text = "My name is Tony Stark and I am Iron Man."
words = word_tokenize(text)  # split into word and punctuation tokens
postag = pos_tag(words)      # list of (word, tag) tuples
print(postag)

Output:

Below is the complete code:

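import nltk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag

# One-time download of the tokenizer and tagger data
# (older NLTK versions use 'punkt' and 'averaged_perceptron_tagger')
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')

text = "My name is Tony Stark and I am Iron Man."
words = word_tokenize(text)  # split into word and punctuation tokens
postag = pos_tag(words)      # list of (word, Penn Treebank tag) tuples
print(postag)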
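Once you have pos_tag() output, a common next step is to filter or group words by tag. For example, you might collect adjectives as features for sentiment analysis. The sketch below works on a hand-written list in the same (word, tag) shape. The tags are illustrative rather than real tagger output, and group_by_tag and words_with_prefix are hypothetical helper names.

```python
from collections import defaultdict

# Hand-written (word, tag) pairs in the same shape pos_tag() returns.
# Illustrative input only, not actual tagger output.
tagged = [("The", "DT"), ("small", "JJ"), ("desk", "NN"), ("is", "VBZ"),
          ("very", "RB"), ("sturdy", "JJ"), (".", ".")]

def group_by_tag(pairs):
    """Collect the words for each tag, preserving their original order."""
    groups = defaultdict(list)
    for word, tag in pairs:
        groups[tag].append(word)
    return dict(groups)

def words_with_prefix(pairs, prefix):
    """Return words whose tag starts with prefix (e.g. "JJ" also matches JJR, JJS)."""
    return [word for word, tag in pairs if tag.startswith(prefix)]

print(group_by_tag(tagged)["JJ"])
print(words_with_prefix(tagged, "JJ"))  # adjectives, e.g. as sentiment features
```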

In this article, we covered part-of-speech tagging with spaCy and NLTK. I hope you found it useful.
