Demo video: Google N-Grams and HathiTrust Bookworm
Computational Text Analysis, Computer-Aided Text Analysis, Text Mining, and Text and Data Mining (TDM) are broad terms for searching, organizing, and analyzing large amounts of text data.
TDM can reveal new patterns or information in a large body of work, leading to new knowledge or to a larger body of evidence-based practice. It enables researchers to analyze thousands of documents and terabytes of data, so that research questions can be examined comprehensively.
The methods used to process corpora vary widely between disciplines and draw on insights from machine learning, statistics, computational linguistics, sociology, and many other fields.
Examples of researchers using text analysis to answer their research questions:
Much of the content of this page comes from the University of Pennsylvania's Text Analysis Guide by Jajwalya Karajgikar.
Common methods of text analysis include:
Sentiment Analysis: Sentiment analysis uses natural language processing techniques to identify and extract subjective information, such as opinions and emotions, from text. It is commonly applied to social media posts, customer reviews, and other text data to determine their overall sentiment.
Text Classification: Text classification involves categorizing text data into predefined classes or categories based on the content of the text and is frequently used for tasks such as spam filtering, topic identification, and sentiment classification.
Topic Modeling: Topic modeling is a statistical method for identifying the topics or themes that occur in a collection of documents, revealing hidden patterns and relationships within text data. It is widely applied in the social sciences and the humanities.
Named Entity Recognition: Named Entity Recognition (NER) is the process of identifying and extracting named entities from text, such as names of people, places, and organizations. It is commonly used for information extraction, retrieval, and data analysis.
Text Clustering: Text clustering is the process of grouping similar documents based on their content. It is frequently used to identify patterns and similarities in large text datasets, particularly in fields such as marketing and customer service.
Text Summarization: Text summarization involves creating a concise summary of a longer text document and can be used to quickly understand the main points and themes of a large document or set of documents.
Text Mining: Text mining involves extracting useful information from unstructured text data using techniques such as natural language processing, machine learning, and information retrieval to discover patterns, relationships, and trends in large text datasets.
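Named Entity Disambiguation: Named Entity Disambiguation is the process of determining which real-world entity a name in the text refers to. It distinguishes between different entities that share a similar name (for example, Paris, France versus Paris, Texas) and recognizes when different names refer to the same real-world entity, thereby reducing ambiguity in text data.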
Word Frequencies: Word frequency analysis involves counting the number of times each word appears in a text document or corpus to identify common words or phrases, which can provide insights into the content of the text data.
Visualization: Text visualization involves creating visual representations of text data, such as word clouds, topic-model visualizations, and graphs, to identify patterns, trends, and relationships in the data and to communicate insights to stakeholders clearly and concisely.
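Word frequency analysis, and its extension to multi-word sequences (n-grams, the quantity charted over time by tools such as Google Ngram Viewer), can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The function names tokenize, word_frequencies, and ngram_frequencies are hypothetical, and the simple regular-expression tokenizer is an assumption that will mishandle hyphenated words and non-Latin scripts.

```python
from __future__ import annotations

import re
from collections import Counter

# Illustrative sketch; all function names here are hypothetical.


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_frequencies(text: str, stopwords: set[str] | None = None) -> Counter:
    """Count how often each token occurs, skipping any stopwords."""
    stopwords = stopwords or set()
    return Counter(tok for tok in tokenize(text) if tok not in stopwords)


def ngram_frequencies(text: str, n: int = 2) -> Counter:
    """Count contiguous sequences of n tokens (bigrams by default)."""
    tokens = tokenize(text)
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    sample = "The cat sat on the mat. The cat slept."
    print(word_frequencies(sample, stopwords={"the", "on"}).most_common(3))
    # [('cat', 2), ('sat', 1), ('mat', 1)]
    print(ngram_frequencies(sample).most_common(1))
    # [(('the', 'cat'), 2)]
```

Removing stopwords such as "the" keeps very common function words from dominating the counts; whether to remove them depends on the research question.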
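Sentiment analysis can be approached in several ways; one of the simplest is lexicon-based scoring, sketched below. Each word is looked up in a list of words with positive or negative weights and the weights are summed. The tiny LEXICON and NEGATORS sets and the function names are hypothetical placeholders for illustration; a real project would use a published, validated lexicon or a trained model.

```python
import re

# Hypothetical toy lexicon for illustration only; not a validated resource.
LEXICON = {
    "good": 1, "great": 2, "love": 2, "excellent": 2,
    "bad": -1, "poor": -1, "hate": -2, "terrible": -2,
}
# Hypothetical set of words that flip the polarity of the following word.
NEGATORS = {"not", "never", "no"}


def sentiment_score(text):
    """Sum lexicon weights for each token, flipping the sign after a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        value = LEXICON.get(tok, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value  # "not good" counts as negative
        score += value
    return score


def sentiment_label(text):
    """Map the numeric score to a coarse positive/negative/neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(sentiment_label("I love this library and the staff are great"))  # positive
    print(sentiment_label("The wifi was not good"))                        # negative
    print(sentiment_label("The book arrived on Tuesday"))                  # neutral
```

The one-word negation rule is deliberately crude; sarcasm, longer-range negation, and domain-specific vocabulary are common reasons lexicon scores disagree with human readers.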
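Text classification tasks such as spam filtering are often introduced with a Naive Bayes classifier, one common technique among many (others include logistic regression and neural models). The sketch below implements multinomial Naive Bayes with add-one smoothing from scratch. The class name, the tokenizer, and the four-document training set are hypothetical and far too small for real use.

```python
import math
import re
from collections import Counter, defaultdict

# Illustrative sketch; the class and helper names are hypothetical.


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes text classifier with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # number of training documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()

    def fit(self, docs, labels):
        """Tally document and token counts for each labeled training document."""
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        """Return the class with the highest log-posterior score for doc."""
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior P(class)
            total_tokens = sum(self.word_counts[label].values())
            for tok in tokenize(doc):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                # Smoothed log likelihood P(token | class).
                score += math.log((count + 1) / (total_tokens + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    train_docs = [
        "win a free prize now",
        "claim your free money",
        "meeting agenda for monday",
        "lunch with the project team",
    ]
    train_labels = ["spam", "spam", "ham", "ham"]
    clf = NaiveBayesClassifier().fit(train_docs, train_labels)
    print(clf.predict("free prize money"))             # spam
    print(clf.predict("agenda for the team meeting"))  # ham
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.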
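Text summarization can be extractive (selecting existing sentences) or abstractive (generating new text). A simple extractive approach, sketched below under the assumption that sentences end with ".", "!", or "?", scores each sentence by how frequent its words are across the whole document and keeps the top-scoring ones. The STOPWORDS list and function names are hypothetical and for illustration only.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "it", "for", "on", "with"}


def content_words(text):
    """Lowercase, tokenize, and drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, n_sentences=1):
    """Extractive summary: keep the n sentences whose words are most frequent overall."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        # Average rather than sum, so long sentences are not automatically favored.
        return sum(freq[w] for w in words) / (len(words) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    doc = ("Libraries lend books. Libraries also lend laptops and books to students. "
           "The weather was sunny.")
    print(summarize(doc, n_sentences=1))  # Libraries lend books.
```

Frequency-based scoring is only a baseline; it tends to favor sentences that repeat the document's most common vocabulary.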
Elihu Burritt Library
Central Connecticut State University, 1615 Stanley Street,
New Britain, CT 06050