Text Analytics with Jaseci | Jaseci Documentation

📄️ Preparation

1. Installing Jaseci

📄️ Map the movie data into a Graph.

The only data structure used in Jaseci is the graph. So first, let's map the movie script into a graph. The movie script is saved in a JSON file, and its basic structure is as follows:

📄️ Advanced information extraction from Scene Descriptions

We created a graph from the movie script data in the previous part. You may have noticed that some movie scene descriptions are rather lengthy and take a while to read. It would be much simpler if we could summarize those lengthy passages and extract keywords from each one. That is what we will accomplish in this section.
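To make the idea of keyword extraction concrete, here is a minimal Python sketch that ranks words by frequency after dropping common stop words. It is only an illustration: the `extract_keywords` function, the `STOP_WORDS` set, and the sample text are assumptions made for this example, and the tutorial itself may rely on a different, model-based technique.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "is", "in", "on", "of", "to", "it"}


def extract_keywords(text, top_n=3):
    """Return the top_n most frequent non-stop-words in text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    # most_common keeps first-seen order for words with equal counts.
    return [word for word, _ in counts.most_common(top_n)]


description = (
    "The detective enters the warehouse. "
    "The warehouse is dark, and the detective draws a flashlight."
)
print(extract_keywords(description, top_n=2))
```

Frequency counting is the simplest possible baseline; it ignores word meaning and context, which is why practical keyword extractors usually go further.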

📄️ Find Semantically Similar Sentences

Semantic similarity of two sentences is a measure of how closely related their meanings are. It involves comparing the underlying semantic representations of the sentences to determine the degree of overlap or similarity between them. This is typically done using techniques from natural language processing (NLP), such as word embeddings or semantic networks. The resulting similarity score can be used in various applications, such as text classification, question answering, or information retrieval, to identify relevant and related content. A high semantic similarity score suggests that the two sentences convey similar ideas, while a low score indicates that they are dissimilar.
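As a rough illustration of how a similarity score is computed, the sketch below compares two sentences with cosine similarity over word-count vectors. This is a deliberate simplification: word counts only capture lexical overlap, whereas the embeddings mentioned above capture meaning. The function names are hypothetical and chosen for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Map a sentence to a word -> count vector."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine_similarity(sentence_a, sentence_b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    vec_a, vec_b = bag_of_words(sentence_a), bag_of_words(sentence_b)
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence has no direction to compare
    return dot / (norm_a * norm_b)


print(cosine_similarity("The ship lands on Mars", "The ship lands on the Moon"))
print(cosine_similarity("The ship lands on Mars", "She bakes bread"))
```

Swapping `bag_of_words` for a function that returns a sentence embedding would keep the same cosine formula while making the score reflect meaning rather than shared words.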

📄️ Find Similar clusters in a set of documents

Text clustering, also known as text grouping or document clustering, is a technique used in natural language processing (NLP) and machine learning to categorize large sets of unstructured textual data into meaningful groups or clusters. The goal of text clustering is to identify patterns and relationships within the text that can be used to group similar documents together based on their content, topics, or other features. This can help researchers, businesses, and organizations to better understand the underlying structure of their textual data and to identify important insights or trends that may be hidden within it. Text clustering is often used in applications such as document organization, information retrieval, and text summarization.
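The grouping idea can be sketched with a very small greedy procedure: each document joins the first cluster whose representative it resembles closely enough, and otherwise starts a new cluster. This is an assumption-laden toy, not the method used in the tutorial; it uses Jaccard overlap of word sets as the similarity measure, and the `threshold` value of 0.3 is an arbitrary choice for the example.

```python
import re


def tokens(text):
    """Return the set of lowercase words in text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(set_a, set_b):
    """Size of the intersection divided by size of the union."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def cluster_documents(docs, threshold=0.3):
    """Greedily group document indices by similarity to each cluster's first member."""
    clusters = []
    for index, doc in enumerate(docs):
        doc_tokens = tokens(doc)
        for cluster in clusters:
            representative = tokens(docs[cluster[0]])
            if jaccard(doc_tokens, representative) >= threshold:
                cluster.append(index)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([index])
    return clusters


docs = [
    "the spaceship lands on mars",
    "a spaceship lands on the moon",
    "she bakes bread in the kitchen",
    "he bakes bread in a kitchen",
]
print(cluster_documents(docs))
```

The result depends on document order and on the threshold, which is one reason established algorithms such as k-means or hierarchical clustering over vector representations are generally preferred for real data.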

📄️ Analyze sentiments in Dialogues

Sentiment analysis is a process of analyzing text data to determine the emotional tone or attitude expressed in it. It involves using natural language processing and machine learning techniques to identify and extract subjective information from text, such as opinions, attitudes, and emotions. Sentiment analysis is commonly used in social media monitoring, customer feedback analysis, brand reputation management, and market research. It can help organizations gain insights into customer sentiment, identify emerging trends, and make data-driven decisions. Sentiment analysis can be performed at different levels, including document-level, sentence-level, and aspect-level analysis. The output of sentiment analysis is typically a numerical score or a categorical label that indicates the polarity of the text, such as positive, negative, or neutral.
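To show what a numerical score and a categorical label might look like in practice, here is a minimal lexicon-based sketch at the sentence level. The word lists are tiny, hypothetical stand-ins for a real sentiment lexicon, and the approach ignores negation and context, so treat it as an illustration rather than the technique the tutorial uses.

```python
import re

# Hypothetical miniature lexicons, for illustration only.
POSITIVE = {"good", "great", "love", "happy", "wonderful"}
NEGATIVE = {"bad", "terrible", "hate", "sad", "awful"}


def sentiment(text):
    """Return (score, label): +1 per positive word, -1 per negative word."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return score, label


for line in ["I love this wonderful plan", "This is a terrible idea", "We leave at dawn"]:
    print(line, "->", sentiment(line))
```

A model-based classifier would replace the word lookup, but its output would typically have the same shape: a score plus a polarity label.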

📄️ Creating a custom action to scrape movie data

This is a Python script that scrapes movie scripts from https://imsdb.com/ and extracts information about scenes and actors. It also stores the extracted information in a JSON file called movie_data.json.
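The extraction step can be sketched without any network access. The example below assumes the script text has already been downloaded and follows the common screenplay convention that scene headings start with `INT.` or `EXT.` and that character cues are written in capitals; the `parse_scenes` function, its output fields, and the sample text are all hypothetical and are not taken from the actual scraper.

```python
import json


def parse_scenes(script_text):
    """Split screenplay text into scenes with their actors and remaining text."""
    scenes = []
    for raw_line in script_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(("INT.", "EXT.")):
            # A scene heading opens a new scene.
            scenes.append({"heading": line, "actors": [], "text": []})
        elif scenes and line.isupper():
            # An all-caps line is treated as a character cue (an assumption).
            if line not in scenes[-1]["actors"]:
                scenes[-1]["actors"].append(line)
        elif scenes:
            scenes[-1]["text"].append(line)
    return scenes


sample_script = """
INT. WAREHOUSE - NIGHT
A single bulb swings overhead.
MARA
We are too late.
EXT. HARBOR - DAY
Gulls circle the empty pier.
"""

scenes = parse_scenes(sample_script)
# Serialize to a JSON string; writing it to movie_data.json would be one more step.
print(json.dumps(scenes, indent=2))
```

Real pages need HTML parsing first, and real scripts contain all-caps lines that are not character names, so a production scraper would need additional filtering.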