1. What is text mining?
Answer: Text mining, also called text data mining and roughly equivalent to text analytics, is the process of deriving high-quality information from text. That information is typically derived by identifying patterns and trends, for example through statistical pattern learning.
2. What is information extraction?
Answer: Information extraction is the task of automatically extracting structured information from unstructured or semi-structured machine-readable documents. In most cases, this means processing human-language text with natural language processing techniques.
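As a rough illustration of turning free text into structured records, here is a minimal sketch that uses a regular expression. The sentence pattern, the field names (name, company, year), and the sample text are all assumptions made up for this example; real systems usually rely on NLP models rather than a single hand-written pattern.

```python
import re

# Hypothetical pattern: "<First Last> joined <Company> in <year>".
HIRE_PATTERN = re.compile(
    r"(?P<name>[A-Z][a-z]+ [A-Z][a-z]+) joined "
    r"(?P<company>[A-Z][A-Za-z]+) in (?P<year>\d{4})"
)

def extract_hires(text):
    """Return one dict per matching sentence fragment found in the text."""
    return [match.groupdict() for match in HIRE_PATTERN.finditer(text)]

sample = "Alice Smith joined Acme in 2019. Later, Bob Jones joined Initech in 2021."
records = extract_hires(sample)
for record in records:
    print(record)
```

Each dictionary is a structured row that could be loaded into a table or database, which is the point of the extraction step.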
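3. What is object standardization? When is it used?
Answer: Text data often contains words or phrases that do not appear in any standard lexical dictionary, such as acronyms, hashtags, and slang. Search engines and models do not recognize these tokens. Object standardization maps them to standard forms, typically with a lookup dictionary or regular expressions. It is used during preprocessing of noisy text such as social media posts.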
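One common way to standardize such tokens is a hand-built lookup table. The sketch below assumes a tiny hypothetical dictionary (LOOKUP) and a helper named standardize; neither comes from a specific library, and a real table would be much larger and domain-specific.

```python
import re

# Hypothetical lookup table mapping nonstandard tokens to standard forms.
LOOKUP = {
    "rt": "retweet",
    "dm": "direct message",
    "awsm": "awesome",
    "luv": "love",
}

def standardize(text, lookup=LOOKUP):
    """Replace each alphabetic token found in the lookup table (case-insensitive)."""
    def replace(match):
        word = match.group(0)
        return lookup.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", replace, text)

print(standardize("RT this is a retweeted tweet by awsm user"))
```

Only whole tokens are replaced, so a word like "retweeted" is left unchanged even though it contains "rt".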
4. What is topic modeling? When is it used?
Answer: Topic modeling is a type of statistical modeling for discovering the abstract “topics” that occur in a collection of documents. One example is Latent Dirichlet Allocation (LDA), which represents each document as a mixture of topics and each topic as a distribution over words. It is used when you want to discover themes in a large, unlabeled collection of documents.
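To make the idea concrete, here is a toy sketch of LDA fitted with collapsed Gibbs sampling in plain Python. The function name lda_gibbs, the hyperparameter defaults (alpha, beta, iters), and the four sample documents are assumptions chosen for illustration; in practice a library implementation would be used, and the inferred topics depend on the random seed.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA.

    docs is a list of token lists. Returns, for each document, its
    estimated topic proportions (a list of num_topics values summing to 1).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]            # tokens per (doc, topic)
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                          # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        topics = []
        for w in doc:
            k = rng.randrange(num_topics)
            topics.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(topics)

    # Resample each token's topic given all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed topic proportions per document.
    return [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in counts]
        for counts, doc in zip(doc_topic, docs)
    ]

docs = [
    "cat dog pet cat".split(),
    "dog pet vet cat".split(),
    "stock market trade stock".split(),
    "market trade bank stock".split(),
]
for proportions in lda_gibbs(docs):
    print([round(p, 2) for p in proportions])
```

Each printed row is one document's mixture over the two topics. The topics themselves are unlabeled; interpreting them requires inspecting the most frequent words assigned to each.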
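5. What is a document-term matrix?
Answer: It is a mathematical matrix that describes the frequency of terms that occur in a collection of documents. Rows correspond to documents and columns to terms. Its transpose, with terms as rows and documents as columns, is called a term-document matrix.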
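A small sketch of building such a matrix from raw counts follows. The helper name document_term_matrix, the lowercase whitespace tokenization, and the two sample sentences are assumptions for illustration; libraries typically add proper tokenization and sparse storage.

```python
from collections import Counter

def document_term_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix

docs = ["the cat sat", "the dog sat on the mat"]
vocab, dtm = document_term_matrix(docs)
print(vocab)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(dtm)    # [[1, 0, 0, 0, 1, 1], [0, 1, 1, 1, 1, 2]]

# Transposing gives the term-document matrix (one row per term).
tdm = [list(column) for column in zip(*dtm)]
```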