Resume Filtering using NLP

Suppose you own a company and have just won a project that requires two data scientists. You post the jobs on LinkedIn and receive 400 resumes. In such a scenario, choosing the resumes that fit your needs is a tedious task, and you might ask: “Is there a way to select the best of them without manually checking them one by one?” This is where Natural Language Processing (NLP) comes in handy: it provides tools for analyzing and manipulating text.


Problem statement:

Your company needs one candidate with Deep Learning as her/his core competency who also knows how Machine Learning algorithms work. The other candidate is required to have experience in Scala, AWS, Docker, Kubernetes, etc.



Approach to solve this:

After identifying the problem statement, you have to devise an approach that correctly addresses it, as follows:

1. Maintain a table or dictionary of skill sets grouped by category. For example, words like CNN, RNN, TensorFlow, and Keras all go under one column titled ‘Deep Learning’.

2. Build an NLP algorithm that scans the entire resume and searches for the words listed in the table or dictionary.

3. Count the occurrences of the words belonging to each category, producing something like the table below for every candidate.


[Figure: example table of keyword counts per category for each candidate]
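The three steps above can be sketched in plain Python before any NLP library is involved. This is only a minimal illustration: the `SKILLS` table, its entries, and the `count_categories` helper are assumptions made for the example, not the article's actual keyword list or code.

```python
import re
from collections import Counter

# Hypothetical skill table: category -> keywords (illustrative entries only).
SKILLS = {
    "Deep Learning": ["cnn", "rnn", "tensorflow", "keras"],
    "Machine Learning": ["regression", "random forest", "svm"],
    "Data Engineering": ["scala", "aws", "docker", "kubernetes"],
}

def count_categories(text, skills=SKILLS):
    """Count whole-word keyword occurrences per category (case-insensitive)."""
    text = text.lower()
    counts = Counter()
    for category, keywords in skills.items():
        for keyword in keywords:
            # Lookarounds stop "aws" from matching inside a word like "laws".
            pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
            counts[category] += len(re.findall(pattern, text))
    return dict(counts)

resume = "Built CNN and RNN models in Keras; deployed on AWS with Docker."
print(count_categories(resume))
# {'Deep Learning': 3, 'Machine Learning': 0, 'Data Engineering': 2}
```

Running the same function over every resume gives one row of counts per candidate, which is the shape of the table described in step 3.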


Now for the real task of implementing the approach programmatically. We need a few pre-built Python libraries, such as spaCy (for NLP-related manipulations) and PyPDF (for reading the resumes).

Now that we have the whole code, I would like to emphasize two things.


1. The Keywords csv

The keywords CSV appears in the code as ‘template_new.csv’.

If you want, you can replace it with any database of your choice (and make the required changes in the code), but the CSV format is good for simplicity. Below is the table of words used for phrase matching against the resumes.

[Figure: table of keywords used for phrase matching]
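As a rough sketch of how such a keywords file might be loaded, the snippet below assumes one column per category with the category name in the header row and one keyword per cell. That layout, the `load_keywords` helper, and the sample rows are assumptions for illustration; the real ‘template_new.csv’ may be organized differently.

```python
import csv
import io

def load_keywords(csv_file):
    """Read a CSV whose header row holds categories and whose cells hold keywords."""
    reader = csv.DictReader(csv_file)
    table = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for category, cell in row.items():
            # Columns can have different lengths, so skip empty or missing cells.
            if category is not None and cell and cell.strip():
                table[category].append(cell.strip().lower())
    return table

# Hypothetical file contents standing in for the real keywords CSV.
sample = io.StringIO("Deep Learning,DE\nCNN,Scala\nKeras,AWS\nTensorFlow,\n")
keywords = load_keywords(sample)
print(keywords)
# {'Deep Learning': ['cnn', 'keras', 'tensorflow'], 'DE': ['scala', 'aws']}
```

With a file on disk, the same function could be called as `load_keywords(open("template_new.csv", newline=""))`, ideally inside a `with` block.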


2. The Candidate — Keywords table


The code snippet above produces a CSV file that shows each candidate’s keyword counts per category. Here is how it looks.

[Figure: sample of the candidate keyword-count CSV]
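A table of this kind could be written with Python's standard `csv` module, roughly as sketched below. The column names, candidate names, counts, and the `write_candidate_table` helper are all hypothetical; they only illustrate one row per candidate and one column per category.

```python
import csv
import io

def write_candidate_table(counts_by_candidate, categories, out_file):
    """Write one row per candidate with a count column for every category."""
    writer = csv.writer(out_file)
    writer.writerow(["Candidate", *categories])
    for name, counts in sorted(counts_by_candidate.items()):
        # A category with no matches for this candidate is written as 0.
        writer.writerow([name, *(counts.get(c, 0) for c in categories)])

# Made-up counts for two hypothetical candidates.
counts = {
    "candidate_b": {"DL": 1, "DE": 7},
    "candidate_a": {"DL": 9, "ML": 4},
}
buffer = io.StringIO()
write_candidate_table(counts, ["DL", "ML", "DE"], buffer)
print(buffer.getvalue())
```

Passing a file opened with `open("candidates.csv", "w", newline="")` instead of the in-memory buffer would save the table to disk; that file name is likewise just an example.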


Below is the data visualization produced with Matplotlib.

[Figure: Matplotlib chart of keyword counts per candidate and category]


‘DE’ stands for Data Engineering; the other labels are self-explanatory.
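A chart of per-candidate category counts can be drawn with Matplotlib along the following lines. The stacked horizontal bar layout, the `plot_counts` helper, and the sample numbers are assumptions chosen for illustration and may not match the chart in the original article.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for interactive use
import matplotlib.pyplot as plt

def plot_counts(counts_by_candidate, categories):
    """Draw one stacked horizontal bar per candidate, one segment per category."""
    names = sorted(counts_by_candidate)
    fig, ax = plt.subplots(figsize=(8, 4))
    left = [0] * len(names)  # running left edge of the next segment
    for category in categories:
        values = [counts_by_candidate[n].get(category, 0) for n in names]
        ax.barh(names, values, left=left, label=category)
        left = [edge + value for edge, value in zip(left, values)]
    ax.set_xlabel("Keyword occurrences")
    ax.set_ylabel("Candidate")
    ax.legend(title="Category")
    fig.tight_layout()
    return fig

# Made-up counts for two hypothetical candidates.
sample_counts = {
    "candidate_a": {"DL": 9, "ML": 4},
    "candidate_b": {"DL": 1, "DE": 7},
}
fig = plot_counts(sample_counts, ["DL", "ML", "DE"])
```

Calling `fig.savefig("candidate_keywords.png")` afterwards would write the chart to an image file; the file name is only an example.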

1. Automatic reading of resumes

Instead of scrutinizing the resumes manually one by one, the program automatically opens each resume and parses its content. Doing this by hand would take a lot of time.


2. Phrase matching and categorization

It would be very difficult to check all the resumes manually and say whether a person has expertise in Data Engineering or Machine Learning, because we do not keep a count of the phrases while reading. The code, on the other hand, simply hunts for the keywords, keeps a tally of their occurrences, and categorizes them.
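Phrase matching of this kind can be sketched with spaCy's `PhraseMatcher`. The example assumes spaCy v3 (where `matcher.add` takes a list of patterns) and uses a blank English pipeline, so no trained model is needed. The keyword table and the `categorize` helper are illustrative assumptions, not the article's actual code.

```python
from collections import Counter

import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")  # tokenizer only; no trained model download required
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")  # case-insensitive matching

# Hypothetical keyword table: category -> phrases.
keywords = {
    "DL": ["cnn", "tensorflow"],
    "ML": ["machine learning"],
    "DE": ["aws", "kubernetes"],
}
for category, phrases in keywords.items():
    # Each category name becomes the match label for its phrases.
    matcher.add(category, [nlp.make_doc(phrase) for phrase in phrases])

def categorize(text):
    """Count phrase matches per category in the given resume text."""
    doc = nlp.make_doc(text)
    counts = Counter()
    for match_id, start, end in matcher(doc):
        counts[nlp.vocab.strings[match_id]] += 1
    return dict(counts)

text = "Trained CNN models in TensorFlow and built machine learning pipelines on AWS."
print(categorize(text))
```

Unlike plain substring search, the matcher works on tokens, so a multi-word phrase such as "machine learning" is matched as a unit and short keywords do not fire inside longer words.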


3. Data Visualization

Data visualization is a key aspect, as it speeds up the decision-making process in the following ways:

a. We get to know which candidate has more keywords under a particular category, thereby letting us infer that she/he might have extensive experience in that category.

b. We can compare candidates relative to each other, thereby helping us filter out those who don’t meet our requirements.
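The same comparison can also be done numerically. The sketch below ranks candidates by their count in one category and drops those under a threshold; the `rank_candidates` helper, the threshold, and the sample counts are hypothetical.

```python
def rank_candidates(counts_by_candidate, category, minimum=1):
    """Return (candidate, count) pairs with count >= minimum, highest count first."""
    scored = [
        (name, counts.get(category, 0))
        for name, counts in counts_by_candidate.items()
    ]
    shortlisted = [pair for pair in scored if pair[1] >= minimum]
    # Sort by descending count, then by name so ties come out in a stable order.
    return sorted(shortlisted, key=lambda pair: (-pair[1], pair[0]))

# Made-up counts for three hypothetical candidates.
all_counts = {
    "candidate_a": {"DL": 9, "ML": 4},
    "candidate_b": {"DL": 1, "DE": 7},
    "candidate_c": {"ML": 2},
}
print(rank_candidates(all_counts, "DL"))
# [('candidate_a', 9), ('candidate_b', 1)]
```

Keyword counts are only a rough proxy for experience, so a ranking like this is best used to narrow the pile rather than to make the final decision.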