NLP Interview Questions, Part 4

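1. What is tokenization?

Answer: Tokenization is the process of splitting text into smaller units called tokens. Tokens are most commonly words, but they can also be sentences, subwords, or characters.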
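
As a rough illustration, word-level tokenization can be sketched with a regular expression. The function name tokenize is just a name chosen for this example, and real tokenizers (such as those in NLTK) handle contractions and other edge cases more carefully.

```python
import re

def tokenize(text):
    """Split text into word tokens, keeping each punctuation mark as its own token."""
    # \w+ matches a run of letters, digits, or underscores;
    # [^\w\s] matches a single character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']
```

Note that this simple pattern splits a contraction like "isn't" into three tokens, which is one reason library tokenizers are usually preferred in practice.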

 

2. What are stop words?

Answer: Stop words are very common words such as "a", "an", and "the" that occur frequently in text but add little to the meaning of a sentence. We can filter them out using the standard stop-word list provided by the NLTK library.
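
A minimal sketch of stop-word filtering is shown below. The STOP_WORDS set here is a tiny hand-written list assumed for illustration only. With NLTK installed and its "stopwords" corpus downloaded, nltk.corpus.stopwords.words("english") returns a much longer list that could be used in its place.

```python
# A tiny, hand-written stop-word list (an assumption for this sketch);
# NLTK's English list is considerably longer.
STOP_WORDS = {"a", "an", "the", "is", "of", "in", "on", "and", "to"}

def remove_stop_words(tokens):
    """Return the tokens that are not stop words (comparison is case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

filtered = remove_stop_words("The cat is on the mat".split())
print(filtered)  # ['cat', 'mat']
```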

 

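3. What is noise removal?

Answer: Noise removal means stripping unwanted data from the corpus. For example, when preparing text for sentiment analysis, we might remove punctuation such as question marks, quotation marks, and exclamation marks.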
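
One simple way to do this is with regular expressions. The sketch below assumes that "noise" means punctuation and extra whitespace. What counts as noise depends on the task, so the pattern would need adjusting for other cases (HTML tags, URLs, and so on).

```python
import re

def remove_noise(text):
    """Strip punctuation and collapse repeated whitespace."""
    # Drop every character that is neither a word character nor whitespace.
    no_punct = re.sub(r"[^\w\s]", "", text)
    # Collapse runs of whitespace left behind into single spaces.
    return re.sub(r"\s+", " ", no_punct).strip()

print(remove_noise('Wow!! Is this "great"?'))  # Wow Is this great
```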

 

4. What is WordNet?

Answer: WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short definitions and usage examples for them, and records a number of relations among these synsets or their members.
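
To make the idea of synsets and relations concrete, here is a toy sketch of the data structure. The entries below are hand-written for illustration and are not real WordNet data. The names SYNSETS and synonyms are assumptions of this example. The real database can be queried through NLTK, for instance with wordnet.synsets("car") after importing wordnet from nltk.corpus.

```python
# Toy, hand-written entries in the spirit of WordNet; NOT real WordNet data.
SYNSETS = {
    "car.n.01": {
        "lemmas": ["car", "auto", "automobile"],
        "definition": "a motor vehicle with four wheels",
        "hypernym": "vehicle.n.01",  # relation: a car is a kind of vehicle
    },
    "vehicle.n.01": {
        "lemmas": ["vehicle"],
        "definition": "a conveyance that transports people or objects",
        "hypernym": None,
    },
}

def synonyms(word):
    """Return the other lemmas that share a synset with the given word."""
    found = set()
    for synset in SYNSETS.values():
        if word in synset["lemmas"]:
            found.update(synset["lemmas"])
    found.discard(word)
    return sorted(found)

print(synonyms("car"))  # ['auto', 'automobile']
```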

 

5. What is NLG (Natural Language Generation)?

Answer: NLG is the task of generating new, human-readable text from existing data, based on an understanding of that data.
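
The simplest form of NLG is template-based generation from structured data. The sketch below is only an illustration of that idea. The function describe_weather and the record fields are hypothetical, and modern NLG systems typically rely on trained language models instead of fixed templates.

```python
def describe_weather(record):
    """Turn a structured weather record (a dict) into an English sentence."""
    diff = record["temp_c"] - record["yesterday_temp_c"]
    if diff > 0:
        trend = f"{diff} degrees warmer than"
    elif diff < 0:
        trend = f"{-diff} degrees cooler than"
    else:
        trend = "the same temperature as"
    return (
        f"{record['city']} is {record['condition']} today at "
        f"{record['temp_c']} C, {trend} yesterday."
    )

# Hypothetical input record, made up for this example.
record = {"city": "Pune", "condition": "sunny", "temp_c": 31, "yesterday_temp_c": 28}
print(describe_weather(record))
# Pune is sunny today at 31 C, 3 degrees warmer than yesterday.
```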