1. What is text mining?
Answer: Text mining, also called text data mining and roughly equivalent to text analytics, is the process of deriving high-quality information from text. This information is typically obtained by identifying patterns and trends through means such as statistical pattern learning.
2. What is information extraction?
Answer: Information extraction is the task of automatically extracting structured information from unstructured or semi-structured machine-readable documents. In most cases this involves processing human-language text by means of natural language processing.
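As a rough illustration of turning free text into structured records, here is a minimal sketch using a regular expression. The sentence pattern ("<Name> joined <Company> in <year>"), the field names, and the sample text are all assumptions made for this example; real information extraction systems usually rely on NLP components such as named-entity recognition rather than a single hand-written pattern.

```python
import re

# Hypothetical pattern for this sketch: "<Name> joined <Company> in <year>".
PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*) joined "
    r"(?P<company>[A-Z][A-Za-z]+) in (?P<year>\d{4})"
)


def extract_records(text):
    """Return one dict per match, turning free text into structured rows."""
    return [
        {
            "person": m.group("person"),
            "company": m.group("company"),
            "year": int(m.group("year")),
        }
        for m in PATTERN.finditer(text)
    ]


# Made-up sample text for illustration only.
text = "Ada Lovelace joined Acme in 2019. Later, Alan Turing joined Initech in 2021."
records = extract_records(text)
for record in records:
    print(record)
```

Each dictionary plays the role of one row in a structured table, which is the kind of output information extraction aims for.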
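3. What is object standardization? When is it used?
Answer: Text data often contains words or phrases that are not present in any standard lexical dictionary, such as acronyms, hashtags with attached words, and colloquial slang. Search engines and models do not recognize these pieces. Object standardization replaces them with standard forms, typically using regular expressions and manually prepared lookup dictionaries. It is used as a text-cleaning step when the data is noisy, for example in social media posts.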
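A minimal sketch of this kind of cleanup, assuming a small hand-made lookup table and a simple rule for splitting camel-case hashtags. The `LOOKUP` entries and the `standardize` function name are hypothetical and chosen only for illustration; a real table would be curated for the domain.

```python
import re

# Hypothetical lookup table for this sketch; a real one would be curated per domain.
LOOKUP = {"rt": "retweet", "dm": "direct message", "awsm": "awesome", "luv": "love"}


def standardize(text, lookup=LOOKUP):
    """Replace known non-standard tokens and split hashtags with attached words."""
    # "#TextMining" -> "Text Mining": drop the '#' and insert a space at each
    # lowercase-to-uppercase boundary.
    text = re.sub(
        r"#(\w+)",
        lambda m: re.sub(r"(?<=[a-z])(?=[A-Z])", " ", m.group(1)),
        text,
    )
    # Replace each word found in the lookup table (case-insensitive match);
    # words that are not in the table are left unchanged.
    return re.sub(
        r"\b\w+\b",
        lambda m: lookup.get(m.group(0).lower(), m.group(0)),
        text,
    )


cleaned = standardize("RT this is awsm, luv #TextMining")
print(cleaned)
```

A dictionary lookup keeps the replacements explicit and easy to audit, at the cost of having to maintain the table by hand.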
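4. What is topic modeling? When is it used?
Answer: Topic modeling is a type of statistical modeling for discovering the abstract “topics” that occur in a collection of documents. One example of a topic model is Latent Dirichlet Allocation (LDA), which treats each document as a mixture of topics and can be used to assign the text in a document to its most likely topics. It is typically used when a large collection of unlabeled documents needs to be explored or grouped by theme.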
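5. What is a document-term matrix?
Answer: It is a mathematical matrix that describes the frequency of terms that occur in a collection of documents. Rows correspond to documents and columns correspond to terms. Its transpose, with terms as rows and documents as columns, is called a term-document matrix.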
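A minimal sketch of building such a matrix from raw counts, assuming whitespace tokenization and lowercasing. The function name `document_term_matrix` and the sample documents are made up for this example.

```python
from collections import Counter


def document_term_matrix(docs):
    """Return (vocab, matrix), where matrix[d][t] counts term vocab[t] in docs[d]."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One row per document, one column per vocabulary term.
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix


docs = ["the cat sat", "the dog sat", "the cat saw the dog"]
vocab, dtm = document_term_matrix(docs)

# Transposing gives the term-document matrix: one row per term.
tdm = [list(column) for column in zip(*dtm)]

print(vocab)
for row in dtm:
    print(row)
```

For the three sample documents the vocabulary is `['cat', 'dog', 'sat', 'saw', 'the']`, and the last row is `[1, 1, 0, 1, 2]` because "the" appears twice in the third document.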