Installing Jaseci

Jaseci can be installed on a single machine or on a Kubernetes cluster.

The setup section is split into two parts:

  • a standalone local setup
  • a cloud and Kubernetes setup (Coming Soon)

Setup (Local)

We've built a command line tool to help you effectively work with Jaseci from your terminal. This tool gives you complete control over jaseci and makes working with instances even better. Let's get started!

Requirements

  1. Python 3
  2. pip3

Installation (for Users of Jaseci and Jac coders)

Generally, installing Jaseci requires the following commands:

  1. Install Jaseci by running: pip3 install jaseci
  2. Install Jaseci Server by running: pip3 install jaseci-serv
  3. (for AI) Install Jaseci Kit by running: pip3 install jaseci-ai-kit
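After installing, a quick way to confirm the tools are on your PATH is to ask jsctl for its help menu (this is the same check used in the Preparation section later in this document):

```shell
jsctl --help
```

If this prints the help menu, the installation succeeded.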

Here are step-by-step guides on getting Jaseci installed on different platforms:

Installing on Windows

To run commands for Jaseci we need a terminal that accepts bash commands. We recommend using the Ubuntu terminal that comes as the default with WSL.

  1. Check if WSL is installed by running the following in the Windows PowerShell terminal:

  wsl -l -v

  This will return the flavour of the distribution used for WSL. The version column shows the version of WSL.

  2. If no version is listed, open Windows PowerShell in administrator mode and install WSL by running:

  wsl --install

  3. Restart your computer.

  4. Open the Ubuntu terminal. For more information on installation, see here.

Install Python and the Pip Package Manager

  5. Check the versions of Python and Pip by running:
python3 --version
pip3 --version

If these packages are installed, each command will return a version number. Skip to step 7 if a version number is present.

  6. Install Python3 and pip3 by running the following:

sudo apt update
sudo apt install python3-dev python3-pip

  7. Once the Python and pip packages are installed, install Jaseci and Jaseci Kit:

pip install jaseci
pip install jaseci-ai-kit
  8. To ensure our installation is working, run:
jsctl

The jsctl shell will be activated. It will look like this:

>jsctl

Installing on Mac

Install Python and the Pip Package Manager

  1. Check the versions of Python and Pip by running:
python3 --version
pip3 --version

If these packages are installed, each command will return a version number. Skip to step 3 if a version number is present.

  2. Install Python3 and pip3 by running the following:
brew update
brew install python
  3. Once the Python and pip packages are installed, install Jaseci and Jaseci Kit:
pip install jaseci
pip install jaseci-ai-kit
  4. To ensure our installation is working, run:
jsctl

Once it shows a list of options and commands, your installation is complete.

Installing on Linux

Install Python and the Pip Package Manager

  1. Check the versions of Python and Pip by running:
python3 --version
pip3 --version

If these packages are installed, each command will return a version number. Skip to step 3 if a version number is present.

  2. Install Python3 and pip3 by running the following:
sudo apt update
sudo apt install python3-dev python3-pip
  3. Once the Python and pip packages are installed, install Jaseci and Jaseci Kit:
pip install jaseci
pip install jaseci-ai-kit
  4. To ensure our installation is working, run:
jsctl

Once it shows a list of options and commands, your installation is complete.

Installation (for Contributors of Jaseci)

  1. Install black: pip3 install black
  2. Install pre-commit: pip3 install pre-commit; pre-commit install
  3. Install Jaseci from main branch: cd jaseci_core; source install.sh; cd -
  4. Install Jaseci Server from main branch: cd jaseci_serv; source install.sh; cd -
  5. (for AI) Install Jaseci Kit from main branch: cd jaseci_ai_kit; source install.sh; cd -

Note: You'll have to add --max-line-length=88 --extend-ignore=E203 args to flake8 for linting. If you use VSCode, you should update it there too.

Quickstart (Hello, World!)

In this section, we'll take a look at how easy it is to get started with a simple Hello World program in Jac. Ensure that jaseci is installed before proceeding with the quickstart guide.

  1. Create a new directory for your project called hello_jac
  2. Create a file called hello.jac within the directory created in the previous step.
  3. Give hello.jac the following contents:
walker init {
    std.out("Hello, World!");
}
  4. Navigate to the hello_jac directory using your terminal.
  5. Run the program with the following command: jsctl jac run hello.jac
  6. This should print the following string to the console: Hello, World!

Let's get into the details. walker is the keyword used to define a walker. init is a reserved name; in this case, it provides the entry point to the Jac program. Once the compiler encounters walker init {}, it starts program execution. The second line uses std, the standard library, whose out() function handles printing. Simply put, std.out() displays the string Hello, World! as output in your terminal.
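As a small variation to tinker with, the greeting can be built from a variable before printing (a sketch; it assumes Jac's + operator for string concatenation, which this section has not demonstrated):

```jac
walker init {
    name = "World";
    # build the greeting from a variable, then print it
    std.out("Hello, " + name + "!");
}
```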

Workflow

In the quickstart section, the run command was used to prompt the execution of the hello.jac program. We only recommend doing this for very small programs. Using the run command on larger programs may increase run time.

How the run command works

When you use the run command on a .jac file, the program is sent to Jaseci to be compiled (larger programs take longer to compile). After compilation, Jaseci then runs the program.

Using the build command

To ensure that our Jac programs run fast, it's recommended that you build programs first using the jac build command and then run them.

To build the hello.jac program from the Quickstart section, run the following command:

jsctl jac build hello.jac

Running the above command will build the program and output a file called hello.jir. This is our compiled jac program.

You can then run the compiled hello.jac program with:

jsctl jac run hello.jir

You'll notice how fast it runs. Instantly!
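The full build-then-run workflow from this section, as one sequence:

```shell
jsctl jac build hello.jac   # compiles hello.jac into hello.jir
jsctl jac run hello.jir     # runs the precompiled program
```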

The Jaseci shell

Typing jsctl every time you want to do something gets tedious. There is a shell for this. Let's learn more.

  1. To access the shell, type jsctl in your terminal and hit Enter.

You'll get the following output:

Starting Jaseci Shell...
jaseci >

If you're still in the hello_jac directory, try building or running the hello.jac program, this time without typing jsctl in front of the commands:

run: jac run hello.jac

build: jac build hello.jac

Getting help

  1. To see a list of commands you can run with the jaseci shell, type help and press Enter. You'll see the following output:
jaseci > help

Documented commands (type help <topic>):
========================================
actions    clear   global  logger  master  sentinel  walker
alias      config  graph   login   object  stripe
architype  edit    jac     ls      reset   tool

Undocumented commands:
======================
exit  help  quit

To get help on a particular command, type: help NAME_OF_COMMAND

For example, to see all the commands for jac, type: help jac

You should see an output like:

Usage: jac [OPTIONS] COMMAND [ARGS]...

Group of `jac` commands

Options:
--help  Show this message and exit.

Commands:
build  Command line tooling for building executable jac ir
run    Command line tooling for running all test in both .jac code files...
test   Command line tooling for running all test in both .jac code files...

Visualizing a graph

As you get to know Jaseci and Jac, you'll want to try things and tinker a bit. In this section, we'll get to know how jsctl can be used as the main platform for this play. A typical flow will involve jumping into shell-mode, writing some code, running that code to observe its output, visualizing the state of the graph, and rendering that graph in dot to see its visualization.

Installing Graphviz

Graphviz is a software package that comes with a tool called dot. DOT is a standardized, open graph description language that is a key primitive of Graphviz. The dot tool takes in DOT code and renders it nicely.

Windows (WSL)

Run the following command to install Graphviz on WSL:

sudo apt install graphviz

MacOS

Run the following command to install Graphviz on MacOS:

brew install graphviz

That's it!

Using Graphviz

Now that we have Graphviz installed, let's use it.

  1. In the hello_jac directory that you created earlier, create a file called fam.jac and give it the following content:
node man;
node woman;

edge mom;
edge dad;
edge married;

walker create_fam {
    root {
        spawn here --> node::man;
        spawn here --> node::woman;
        --> node::man <-[married]-> --> node::woman;
        take -->;
    }
    woman {
        son = spawn here <-[mom]- node::man;
        son -[dad]-> <-[married]->;
    }
    man {
        std.out("I didn't do any of the hard work.");
    }
}

Don't worry if that looks confusing. As you learn the Jac language, this will become clearer.

  2. Let's "register" a sentinel based on our Jac program. A sentinel is the abstraction Jaseci uses to encapsulate compiled walkers and architype nodes and edges. You can think of registering a sentinel as compiling your jac program. The walkers of a given sentinel can then be invoked and run on arbitrary nodes of any graph. Let's register fam.jac.

  3. Open the jaseci shell by typing jsctl.

  4. Run the following command to register a sentinel: sentinel register -name fam -code fam.jac -set_active true

You should see the following output:

jaseci > sentinel register -name fam -code fam.jac -set_active true
2022-03-21 13:56:29,443 - INFO - compile_jac: fam: Processing Jac code...
2022-03-21 13:56:29,558 - INFO - register: fam: Successfully registered code
[
{
    "version": null,
    "name": "fam",
    "kind": "generic",
    "jid": "urn:uuid:04385141-7d65-4467-bf51-d251bb9e5a84",
    "j_timestamp": "2022-03-21T17:56:29.443318",
    "j_type": "sentinel"
},
{
    "context": {},
    "anchor": null,
    "name": "root",
    "kind": "generic",
    "jid": "urn:uuid:9df56101-f831-4791-8326-ca6657b4b23c",
    "j_timestamp": "2022-03-21T17:56:29.443427",
    "j_type": "graph"
}
]

This output shows that the sentinel was created. Note that we've also made this the "active" sentinel that will be used as the default setting for any calls to the Jaseci Core APIs that require a sentinel be specified.

At this point, Jaseci has registered our code and we are ready to run walkers!

  5. Now let's run our walker on the root node of the graph we created and see what happens!

Run the following command:

walker run -name create_fam

You should see the following output:

walker run -name create_fam
I didn't do any of the hard work.
[]

But how do we visualize that the graph produced by the program is right? If you've guessed it, we can use the Jaseci dot feature to take a look at our graph!!

Run the following command:

graph get -mode dot -o fam.dot

You should see the following output:

jaseci > graph get -mode dot -o fam.dot
strict digraph root {
    "n0" [ id="9df56101f83147918326ca6657b4b23c", label="n0:root"  ]
    "n1" [ id="011d88ae58744e5a87ca27fd6875ce3e", label="n1:man"  ]
    "n2" [ id="2099b359f4024a94bc167dead2b8d15d", label="n2:woman"  ]
    "n3" [ id="efa326feadc94b2fad2399e787907885", label="n3:man"  ]
    "n0" -> "n1" [ id="10b075a1b3714ff986f9cbb37160f601", label="e0" ]
    "n1" -> "n2" [ id="a7bae4f6c8ae4a3496cd8f942bb40aa8", label="e1:married", dir="both" ]
    "n3" -> "n1" [ id="35a76964f7144e9aba04200368cdab29", label="e2:dad" ]
    "n3" -> "n2" [ id="285d4f89f6144b2ca208807d8471fa54", label="e3:mom" ]
    "n0" -> "n2" [ id="4caffc3f14884965b48d64a005d57427", label="e4" ]
}
[saved to fam.dot]

To see a pretty visual of the graph itself, we can use the dot command from Graphviz. Exit the shell by typing exit and then run the following command:

dot -Tpdf fam.dot -o fam.pdf

A new file called fam.pdf will now appear in the hello_jac directory. Open this file to see your graph!
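dot can render to other formats as well; the -T flag selects the output type:

```shell
dot -Tpng fam.dot -o fam.png   # raster PNG image
dot -Tsvg fam.dot -o fam.svg   # scalable SVG
```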

References

  • Official Documentation: https://docs.jaseci.org/
  • Jaseci Bible: https://github.com/Jaseci-Labs/jaseci_bible

Setting up your Code Editor

Visual Studio Code is a popular IDE used by developers on all operating systems. It comes with a Jac extension to aid in your coding journey.

  1. If you already have VS Code installed, move to step 3. Otherwise, download Visual Studio Code from the website here.

  2. Once the download is complete, open it and follow the installation instructions.

  3. Once installation is complete, open VS Code to install the Jac extension.

  4. Go to View > Command Palette. Type Install and select Extensions: Install Extensions.

  5. Search for JAC, select it, then install.

Writing your first app

Let's create a simple conversational agent using Jaseci and Jaseci Kit. We're going to create a chatbot for students to sign up for Jaseci Dojo!

Before we begin, ensure you have Jaseci and Jaseci Kit installed. If not, see the installation instructions here.

Create a file called graph.jac. Here we are going to create the conversational flow for the chatbot.


# state is the name of the node
node state {
    has title;
    has message;
    has prompts;
}

Nodes can be thought of as representations of entities. Nodes are the fundamental unit of a graph; they can be considered the steps the walker can take.

  • Nodes are composed of context and executable actions.
  • Nodes execute a set of actions upon entry and exit.

Here we are creating a node named "state". The has keyword is used to declare a variable for the node.
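As a sketch of executable actions on entry, a node can carry an ability that runs automatically when a walker arrives (the can ... with entry syntax is an assumption based on Jaseci's Jac; this walkthrough does not use it):

```jac
node state {
    has title;

    # hypothetical ability: runs each time a walker enters this node
    can announce with entry {
        std.out("entered state: " + here.title);
    }
}
```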

# state is the name of this node
node state {
    has title;
    has message;
    has prompts;
}

# transition is the name of this edge
edge transition {
    has intent;
}

Edges are the links between nodes. The walker will use these edges to determine the next node to traverse to. The has keyword is used to declare the variable "intent". This "intent" is what the walker will use to determine which node to go to next.


# state is the name of this node
node state {
    has title;
    has message;
    has prompts;
}
# transition is the name of this edge
edge transition {
    has intent;
}

# main_graph is name of the graph
graph main_graph {

    has anchor main_root;

The graph is a collection of initialized nodes. The has anchor keyword is used to identify the root node, which is the node where the walker's traversal begins.

# state is the name of this node
node state {
    has title;
    has message;
    has prompts;
}

edge transition {
    has intent;
}
graph main_graph {

    has anchor main_root;

spawn {
     # this is the first node in the graph.
     main_root = spawn node::state(
        title = "Welcome",
        message = "Welcome to Jaseci Dojo, how can I help?",
        prompts = ["class","times","prices","quit"]
    );


    # this spawns the prices node with a transition edge from main_root.
    prices = spawn main_root -[transition(intent="prices")] -> node::state(
        title = "prices",
        message = "Prices Vary based on age",
        prompts = ["12 and younger", "18 and younger" ,"Older than 18", "quit"]
    );

    # this spawns a node with a transition edge from the prices node.
     prices_12 = spawn prices -[transition(intent="12 and younger")] -> node::state(
        title = "prices<12",
        message = "Children under 12 pay $100 per month",
        prompts = ["more prices", "quit"]
    );

    # this creates an edge from prices_12 back to prices.
     prices_12 -[transition(intent="more prices")] -> prices;


}

spawn is used to create child nodes, which are used to design the flow of the conversational experience. We are also able to create additional edges to connect nodes which do not share a parent-child relationship. This is shown in the last line.

node state {
    has title;
    has message;
    has prompts;
}


edge transition {
    has intent;
}

graph main_graph {
    has anchor main_root;

    spawn {

        main_root = spawn node::state(
        title = "Welcome",
        message = "Welcome to Jaseci Dojo, how can I help?",
        prompts = ["class","times","prices","quit"]
    );

    prices = spawn main_root -[transition(intent="prices")] -> node::state(
        title = "prices",
        message = "Prices Vary based on age",
        prompts = ["12 and younger", "18 and younger" ,"Older than 18", "quit"]
    );

    prices_12 = spawn prices -[transition(intent="12 and younger")] -> node::state(
        title = "prices<12",
        message = "Children under 12 pay $100 per month",
        prompts = ["more prices", "quit"]
    );
     prices_12 -[transition(intent="more prices")] -> prices;

     prices_18 = spawn prices -[transition(intent="18 and younger")] -> node::state(
        title = "prices<18",
        message = "Children under 18 pay $110 per month",
        prompts = ["more prices", "quit"]
    );

    prices_18 -[transition(intent="more prices")] -> prices;

     pricesabove18 = spawn prices -[transition(intent="Older than 18")] -> node::state(
        title = "pricesadults",
        message = "Adults over 18 pay $150 per month",
        prompts = ["more prices","quit"]
    );
     pricesabove18 -[transition(intent="more prices")] -> prices;


    class = spawn main_root -[transition(intent="class")]-> node::state(
        title = "class",
        message = "There are 3 classes per week and you are required to attend a minimum of 2.",
        prompts = ["time","days","prices","quit"]

    );



    time = spawn class -[transition(intent="time")]-> node::state(
        title = "time",
        message = "Classes are from 3 pm to 4 pm",
        prompts = ["other times","days","quit"]
    );

    main_root -[transition(intent="times")] -> time;

    other_time = spawn time -[transition(intent="other times")]-> node::state(
        title = "Other times",
        message = "The classes are from 4 pm to 5 pm but you need at least 4 other students to start",
        prompts = ['days',"quit"]
    );




     days = spawn time -[transition(intent="days")]-> node::state(
        title = "days",
        message = "The classes are on Monday, Wednesday, Friday",
        prompts = ['time',"quit"]
    );


     other_time -[transition(intent="days")]-> days;
     days -[transition(intent="time")]-> time;

    }
}

In this last code block we created several nodes and connected them together. To move from node to node we use the intent to specify which route to take.

Walker

  • Walkers traverse the nodes of the graph triggering execution at the node level.

Now let's create a file called walker.jac. Here is where we will create the method for traversal of the graph.


#here we initialize the walker which we named talker.
walker talker {

    has utterance;

    state {
        # print the message and prompts for the node the walker is currently on
        std.out(here.message, here.prompts);

        # take input from the terminal
        utterance = std.input("> ");

        # if the user enters "quit", the program ends
        if(utterance=="quit"): disengage;

        # check the utterance to determine which node to traverse to
        take -[transition(intent==utterance)]-> node::state else {
            take here;
        }
    }

}

The walker will start from the main root, and from the utterance entered it will determine which node to go to next. Note that the utterance must match one of the prompts exactly, or the walker will not move to another node. In a future code-through we will use an AI model from Jaseci Kit that can understand our intent by analyzing varied inputs.

Main

Create a file named main.jac.

# import the graph and walker made earlier.
import {*} with "./graph.jac";
import {*} with "./walker.jac";

# this walker is responsible for starting the program.
walker init {

    root {
        # creates an instance of the graph
        spawn here --> graph::main_graph;

        # creates and runs an instance of the walker, talker
        spawn here walker::talker;
    }


}

Once we run main.jac we can use the chatbot. Play around with the graph: add your own nodes and link nodes together to create an even better chatbot.

Understanding JAC Programs

JAC programs are authored in the JAC language, which is used to define structure and behaviour. Behaviour can be modeled in the form of actions and abilities within the node elements of a graph, as well as walker elements, which are specifically designed to traverse the nodes and edges of a graph. Structure can be modeled by arranging a number of nodes and edges in a particular manner, complete with state, to form a graph.

When a JAC program executes, the structural and behavioral definitions encoded in JAC are registered with the Jaseci Runtime Machine in the form of a Sentinel. Named Walkers may then be launched on a graph via an API call.

Walkers may be designed to report their output. Reports come back via the API once a walker has completed its walk; these reports are a JSON payload of objects.
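A minimal sketch of reporting, using the state node from the chatbot above (the report keyword is assumed here; the walkers in this document print with std.out instead):

```jac
walker collect {
    state {
        # append this node's message to the walker's JSON report payload
        report here.message;
    }
}
```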

Graphs comprise nodes and edges. Nodes have abilities that can be activated when a walker traverses onto them. Walkers traverse the graph and decide which paths to take by using the edges.

All traversal begins at the init or default node. This init node connects to the main root of our graph.

Pic of Main Root

Walkers are initialized and added on the root node, and from there they begin traversal. A walker decides which node to travel to based on which edge satisfies its intent, the intent being a criterion met by the edge.

Pic of Nodes and Edges

The walker can move from node to node along edges. It can also be spawned directly on any node without the need for a traversal.

Build a Conversational AI System with Jaseci

In this tutorial, you are going to learn how to build a state-of-the-art conversational AI system with Jaseci and the Jac language. You will learn the basics of Jaseci, training state-of-the-art AI models, and everything in between, in order to create an end-to-end fully-functional conversational AI system.

Excited? Hell yeah! Let's jump in.

Table of Contents

  1. Preparation and Background
  2. Automated FAQ Answering Chatbot
  3. Multi-turn Dialogue System
  4. Unify the Dialogue and FAQ Systems
  5. Bring Your Application to Production
  6. Improve Your AI Models

Preparation

To install jaseci, run this in your development environment:

pip install jaseci

To test the installation is successful, run:

jsctl --help

jsctl stands for the Jaseci Command Line Interface. If the command above displays the help menu for jsctl, then you have successfully installed jaseci.

Note

Take a look and get familiarized with these commands while you are at it. jsctl will be frequently used throughout this journey.

Key Concepts

A few essential concepts to get familiar with.

Graph, nodes, edges

Refer to relevant sections of the Jaseci Bible.

Walker

Refer to relevant sections of the Jaseci Bible.

Automated FAQ answering chatbot

Our conversational AI system will consist of multiple components. To start, we are going to build a chatbot that can answer FAQ questions without any custom training, using powerful zero-shot NLP models. At the end of this section, you will have a chatbot that, when given a question, searches in its knowledge base for the most relevant answer and returns that answer.

The chatbot we'll build here is a Tesla FAQ chatbot. We will be using the list of FAQs from https://www.tesla.com/en_SG/support/faq.

Note

This architecture works for any FAQ topics and use cases. Feel free to pick another product/website/company's FAQ if you'd like!

Define the Nodes

We have 3 different types of nodes:

  • root: This is the root node of the graph. It is a built-in node type and each graph has one root node only.
  • faq_root: This is the entry point of the FAQ handler. We will make the decision on the most relevant answer at this node.
  • faq_state: This node represents a FAQ entry. It contains a candidate answer from the knowledge base.

Now let's define the custom node types.

node faq_root;
node faq_state {
    has question;
    has answer;
}

The has keyword defines a node's variables. In this case, each faq_state has a question and answer.

Important

The root node does not need explicit definition. It is a built-in node type. Avoid using root as a custom node type.

Build the Graph

For this FAQ chatbot, we will build a graph as illustrated here:

Architecture of FAQ Bot

The idea here is that we will decide which FAQ entry is the most relevant to the incoming question at the faq_root node and then we will traverse to that node to fetch the corresponding answer.

To define this graph architecture:

// Static graph definition
graph faq {
    has anchor faq_root;
    spawn {
        // Spawning the nodes
        faq_root = spawn node::faq_root;
        faq_answer_1 = spawn node::faq_state(
            question="How do I configure my order?",
            answer="To configure your order, log into your Tesla account."
        );
        faq_answer_2 = spawn node::faq_state(
            question="How do I order a tesla",
            answer="Visit our design studio to place your order."
        );
        faq_answer_3 = spawn node::faq_state(
            question="Can I request a test drive",
            answer="Yes. You must be a minimum of 25 years of age."
        );

        // Connecting the nodes together
        faq_root --> faq_answer_1;
        faq_root --> faq_answer_2;
        faq_root --> faq_answer_3;
    }
}

Let's break down this piece of code.

We observe two uses of the spawn keyword. To spawn a node of a specific type, use the spawn keyword like this:

faq_answer_1 = spawn node::faq_state(
    question="How do I configure my order?",
    answer="To configure your order, log into your Tesla account.",
);

In the above example, we just spawned a faq_state node called faq_answer_1 and initialized its question and answer variables.

Note

The spawn keyword can be used in this style to spawn many different jaseci objects, such as nodes, graphs and walkers.

The second usage of spawn is with the graph:

graph faq {
    has anchor faq_root;
    spawn {
       ...
    }
}

In this context, the spawn designates a code block with programmatic functionality to spawn a subgraph for which the root node of that spawned graph will be the has anchor faq_root.

In this block:

  • We spawn 4 nodes, one of the type faq_root and three of the type faq_state.
  • We connect each of the faq answer states to the faq root with faq_root --> faq_answer_*.
  • We set the faq_root as the anchor node of the graph. As we will later see, spawning a graph will return its anchor node.

Warning

An anchor node is required for every graph block. It must be assigned inside the spawn block of the graph definition.
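Since spawning a graph returns its anchor node, the spawn expression's value can be captured directly (a sketch based on the statement above):

```jac
walker init {
    root {
        # fr is bound to the spawned graph's anchor node (faq_root)
        fr = spawn here --> graph::faq;
    }
}
```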

Initialize the Graph

Similar to nodes, in order to create the graph, we will use the spawn keyword.

walker init {
    root {
        spawn here --> graph::faq;
    }
}

This is the first walker we have introduced, so let's break it down.

  • The walker is called init.
  • It contains logic specifically for the root node, meaning that the code inside the root {} block will run only on the root node. This syntax applies for any node types, as you will see very soon. Every Jac program starts with a single root node, but as you will later learn, a walker can be executed on any node, though the root is used by default if none is specified.
  • spawn here --> graph::faq creates an instance of the faq graph and connects its anchor node to here, which is the node the walker is currently on.

Note

init can be viewed as similar to main in Python. It is the default walker to run when no specific walkers are specified for a jac run command.

here is a very powerful keyword. It always evaluates to the specific node the walker is currently on. You will be using here a lot throughout this tutorial.

Run the init Walker

Now, let's run the init walker to initialize the graph. First, put all of the above code snippets into a single jac file and name it faq.jac, including:

  • nodes definition
  • graph definition
  • init walker

Run jsctl to get into the jaseci shell environment:

jsctl

Inside the jsctl shell,

jaseci > jac dot faq.jac

This command runs the init walker of the faq.jac program and prints the state of its graph in DOT format after the walker has finished. The DOT language is a popular graph description language widely used for representing complex graphs.

The output should look something like this

Dot output for Faq graph

strict digraph root {
    "n0" [ id="0955c04e4ff945b4b836748ef2bbd98a", label="n0:root"  ]
    "n1" [ id="c1240d79110941c1bc2feb18581951bd", label="n1:faq_root"  ]
    "n2" [ id="55333be285c246db88181ac34d16cd20", label="n2:faq_state"  ]
    "n3" [ id="d4fa8f2c46ca463f9237ef818e086a29", label="n3:faq_state"  ]
    "n4" [ id="f7b1c8ae82af4063ad53646adc5544e9", label="n4:faq_state"  ]
    "n0" -> "n1" [ id="a718fd6c938149269d3ade2af2eb023c", label="e0" ]
    "n1" -> "n2" [ id="3757cb15851249b4b6083d7cb3c34f8e", label="e1" ]
    "n1" -> "n4" [ id="626ce784a8f5423cae5d5d5ca857fc5c", label="e2" ]
    "n1" -> "n3" [ id="a609e7b54bde4a6a9c9711afdb123241", label="e3" ]
}

Note

We are not going to cover the DOT syntax. There are many resources online if you are interested, e.g., https://graphviz.org/doc/info/lang.html

Note

There are tools available to render a graph in DOT format. For example, https://dreampuf.github.io/GraphvizOnline has a WYSIWYG editor to render DOT graphs in real time.

Congratulations! You have just created your first functional jac program!

Ask the Question

Alright, we have initialized the graph. Now it's time to create the code for the question-answering. We will start with a simple string matching for the answer selection algorithm. For this, we will create a new walker called ask.

walker ask {
    has question;
    root {
        question = std.input("AMA > ");
        take --> node::faq_root;
    }
    faq_root {
        take --> node::faq_state(question==question);
    }
    faq_state {
        std.out(here.answer);
    }
}

This walker is more complex than the init one and introduces a few new concepts so let's break it down!

  • Similar to nodes, walkers can also contain has variables. They define variables of the walker. They can also be passed as parameters when calling the walker.
  • std.input and std.out read and write to the command line respectively.
  • This walker has logic for three types of node: root, faq_root and faq_state.
    • root: It simply traverses to the faq_root node.
    • faq_root: This is where the answer selection algorithm is. We will find the most relevant faq_state and then traverse to that node via a take statement. In this code snippet, we are using a very simple (and limited) string matching approach to try to match the predefined FAQ question with the user question.
    • faq_state: Print the answer to the terminal.

Before we run this walker, we are going to update the init walker to speed up our development process

walker init {
    root {
        spawn here --> graph::faq;
        spawn here walker::ask;
    }
}

This serves as a shorthand so that we can initialize the graph and ask a question in one command.

Note

This demonstrates how one walker can spawn another walker using the spawn keyword.

Time to run the walker!

jaseci > jac run faq.jac

jac run functions very similarly to jac dot, with the only difference being that it doesn't return the graph in DOT format. Try giving it one of the three questions we have predefined and it should respond with the corresponding answer.

Introducing Universal Sentence Encoder

Now, obviously, what we have so far is not very "AI" and we need to fix that. We are going to use the Universal Sentence Encoder QA model as the answer selection algorithm. Universal Sentence Encoder is a language encoder model that is pre-trained on a large corpus of natural language data and has been shown to be effective in many NLP tasks. In our application, we are using it for zero-shot question-answering, i.e. no custom training required.

Jaseci has a set of built-in libraries or packages that are called Jaseci actions. These actions cover a wide range of state-of-the-art AI models across many different NLP tasks. These actions are packaged in a Python module called jaseci_ai_kit.

To install jaseci_ai_kit:

pip install jaseci_ai_kit

Now we load the action we need into our jaseci environment

jaseci > actions load module jaseci_ai_kit.use_qa

Let's update our walker logic to use the USE QA model:

walker ask {
    can use.qa_classify;
    has question;
    root {
        question = std.input(">");
        take --> node::faq_root;
    }
    faq_root {
        answers = -->.answer;
        best_answer = use.qa_classify(
            text = question,
            classes = answers
        );
        take --> node::faq_state(answer==best_answer["match"]);
    }
    faq_state {
        std.out(here.answer);
    }
}

Even though there are only 5 lines of new code, there are many interesting aspects, so let's break it down!

  • -->.answer collects the answer variable of all of the nodes that are connected to here/faq_root with a --> connection.
  • use.qa_classify is one of the actions supported by the USE QA action set. It takes in a question and a list of candidate answers and returns the most relevant one.

Now let's run this updated walker. You can now ask questions beyond just the predefined ones, as long as they are relevant to the answers.

Scale it Out

So far we have created a FAQ bot that is capable of providing answers on three topics. To make this useful beyond just a prototype, we are now going to expand its database of answers. Instead of manually spawning and connecting a node for each FAQ entry, we are going to write a walker that automatically expands our graph:

walker ingest_faq {
    has kb_file;
    root: take --> node::faq_root;
    faq_root {
        kb = file.load_json(kb_file);
        for faq in kb {
            answer = faq["answer"];
            spawn here --> node::faq_state(answer=answer);
        }
    }
}

An example knowledge base file looks like this:

[
  {
    "question": "I have a Model 3 reservation, how do I configure my order?",
    "answer": "To configure your order, log into your Tesla Account and select manage on your existing reservation to configure your Tesla. Your original USD deposit has now been converted to SGD."
  },
  {
    "question": "How do I order a Tesla?",
    "answer": "Visit our Design Studio to explore our latest options and place your order. The purchase price and estimated delivery date will change based on your configuration."
  },
  {
    "question": "Can I request a Test Drive?",
    "answer": "Yes, you can request for a test drive. Please note that drivers must be a minimum of 25 years of age and not exceeding 65 years of age, hold a full driving license with over 2 years of driving experience. Insurance conditions relating to your specific status must be reviewed and accepted prior to the test drive."
  }
]
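
The loop inside the ingest_faq walker effectively does the following, sketched here in Python with the JSON inlined so the snippet is self-contained (the walker reads the same structure from disk via file.load_json):

```python
import json

# Inline copy of one knowledge-base entry, shaped like tesla_faq.json.
kb_json = '''
[
  {"question": "How do I order a Tesla?",
   "answer": "Visit our Design Studio to explore our latest options."}
]
'''

faq_states = []  # stands in for the spawned faq_state nodes
for faq in json.loads(kb_json):
    # One faq_state per entry, carrying only the answer text.
    faq_states.append({"answer": faq["answer"]})

print(len(faq_states))  # one state per knowledge-base entry
```

This is why growing the bot's coverage now only requires adding entries to the JSON file, not editing the graph definition.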

Save the above json in a file named tesla_faq.json and make sure it is in the same location as faq.jac. Let's now update the init walker. Because we are going to use the ingest_faq walker to generate the graph, we won't need the static graph definition.

walker init {
    root {
        spawn here --> node::faq_root;
        spawn here walker::ingest_faq(kb_file="tesla_faq.json");
        spawn here walker::ask;
    }
}

What we are doing here is

  • Spawning a faq_root node
  • Running the ingest_faq walker to create the necessary faq_state nodes based on the question-answer entries in the tesla_faq.json file.
  • Launching the ask walker

Let's run the program one more time and test it out!

jaseci > jac run faq.jac

Note

Try more varied questions. Now that we have longer answers with richer information, the bot covers more ground and will be able to answer more questions.

Note

If you are feeling adventurous, try downloading the complete list of entries on the Tesla FAQ page and use it to create a production-level FAQ bot. See if you can push the model to its limit!

Next up!

Full architecture of Tesla AI

Here is a preview of what's to come in this journey!

On the right is the architecture diagram of the complete system we are going to build. Here are the major components:

  • Zero-shot FAQ (what we have built so far).
  • Action-oriented Multi-turn Dialogue System.
  • Training and inference with an intent classification model.
  • Training and inference with an entity extraction model.
  • Testing.
  • Deploying your Jac application to a production environment.
  • Training data collection and curation.

A Multi-turn Action-oriented Dialogue System

Introduction

In the previous section, we built a FAQ chatbot. It can search a knowledge base of answers and find the one most relevant to a user's question. While this covers many diverse topics, certain user requests cannot be satisfied by a single answer. For example, you might be looking to open a new bank account, which requires multiple different pieces of information about you. Or, you might be making a reservation at a restaurant, which requires information such as the date, time and size of your group. We refer to these as action-oriented conversational AI requests, as they often lead to a certain action or objective.

When interacting with a real human agent to accomplish these types of action-oriented requests, the interaction can get messy and unscripted, and it also varies from person to person. Using the restaurant reservation as an example again, one person might prefer to follow the guidance of the agent and provide one piece of information at a time, while others might prefer to provide all the necessary information in one sentence at the beginning of the interaction.

Therefore, in order to build a robust and flexible conversational AI that mimics a real human agent and supports these types of messy action-oriented requests, we are going to need an architecture that is different from the single-turn FAQ.

And that is what we are going to build in this section -- a multi-turn action-oriented dialogue system.

Warning

Create a new jac file (dialogue.jac) before moving forward. We will keep this program separate from the FAQ one we built. But KEEP the FAQ jac file around; we will integrate these two systems into one unified conversational AI system later.

State Graph

Let's first go over the graph architecture for the dialogue system. Put all the code in this section in the new dialogue.jac file. We will be building a state graph. In a state graph, each node is a conversational state, which represents a possible user state during a dialogue. The state nodes are connected with transition edges, which encode the condition required to hop from one state to another. The conditions are often based on the user's input.
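
Stripped of graph syntax, the idea can be sketched in Python as a transition table keyed on the condition (here, an intent string). State names below mirror the ones we are about to build, but the code itself is purely illustrative:

```python
# Each state maps a condition (an intent) to the next state.
transitions = {
    "dialogue_root": {
        "test drive": "test_drive",
        "order a tesla": "how_to_order",
    },
}

def next_state(state, intent):
    # Stay in place when no transition matches the input.
    return transitions.get(state, {}).get(intent, state)

print(next_state("dialogue_root", "test drive"))   # test_drive
print(next_state("dialogue_root", "small talk"))   # dialogue_root
```

In Jac, the same information lives directly on the graph: the states become nodes and each table entry becomes an edge carrying its condition.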

Define the State Nodes

We will start by defining the node types.

node dialogue_root;

node dialogue_state {
    has name;
    has response;
}

Here we have a dialogue_root as the entry point to the dialogue system and multiple dialogue_state nodes representing the conversational states. These nodes will be connected with a new type of edge intent_transition.

Custom Edges

edge intent_transition {
    has intent;
}

This is the first custom edge we have introduced. In jac, just like nodes, you can define custom edge types. Edges can also have has variables.

In this case, we created an edge for intent transition. This state transition will be triggered when its intent is detected in the user's input question.

Note

Custom edge type and variables enable us to encode information into edges in addition to nodes. This is crucial for building a robust and flexible graph.

Build the graph

Let's build the first graph for the dialogue system.

graph dialogue_system {
    has anchor dialogue_root;
    spawn {
        dialogue_root = spawn node::dialogue_root;
        test_drive_state = spawn node::dialogue_state(
            name = "test_drive",
            response = "Your test drive is scheduled for Jan 1st, 2023."
        );
        how_to_order_state = spawn node::dialogue_state (
            name = "how_to_order",
            response = "You can order a Tesla through our design studio."
        );

        dialogue_root -[intent_transition(intent="test drive")]-> test_drive_state;
        dialogue_root -[intent_transition(intent="order a tesla")]-> how_to_order_state;
    }
}

We have already covered the syntax for graph definition, such as the anchor node and the spawn block in the previous section. Refer to the FAQ graph definition step if you need a refresher.

We have a new language syntax here dialogue_root -[intent_transition(intent="test drive")]-> test_drive_state;. Let's break this down!

  • If you recall, we used a similar but simpler syntax to connect two nodes with an edge: faq_root --> faq_state;. This connects faq_root to faq_state with a generic edge.
  • In dialogue_root -[intent_transition(intent="test drive")]-> test_drive_state;, we are connecting the two states with a custom edge of the type intent_transition.
  • In addition, we are initializing the variable intent of the edge to be test drive.

To summarize, with this graph, a user will start at the dialogue root state when they first start the conversation. Then based on the user's question and its intent, we will move to the corresponding test_drive or how_to_order dialogue state node.

Initialize the graph

Let's create an init walker for this new jac program.

walker init {
    root {
        spawn here --> graph::dialogue_system;
    }
}

Let's initialize the graph and visualize it.

jaseci > jac dot dialogue.jac
strict digraph root {
    "n0" [ id="7b4ee7198c5b4dcd8acfcf739d6971fe", label="n0:root"  ]
    "n1" [ id="7caf939cfbce40d4968d904052368f30", label="n1:dialogue_root"  ]
    "n2" [ id="2e06be95aed449b59056e07f2077d854", label="n2:dialogue_state"  ]
    "n3" [ id="4aa3e21e13eb4fb99926a465528ae753", label="n3:dialogue_state"  ]
    "n1" -> "n3" [ id="6589c6d0dd67425ead843031c013d0fc", label="e0:intent_transition" ]
    "n1" -> "n2" [ id="f4c9981031a7446b855ec91b89aaa5ee", label="e1:intent_transition" ]
    "n0" -> "n1" [ id="bec764e7ee4048898799c2a4f01b9edb", label="e2" ]
}

DOT of the dialogue system

Build the Walker Logic

Let's now start building the walker to interact with this dialogue system.

walker talk {
    has question;
    root {
        question = std.input("> ");
        take --> node::dialogue_root;
    }
    dialogue_root {
        take -[intent_transition(intent==question)]-> node::dialogue_state;
    }
    dialogue_state {
        std.out(here.response);
    }
}

Similar to the first walker we built for the FAQ system, we are starting with a simple string matching algorithm. Let's update the init walker to include this walker.

walker init {
    root {
        spawn here --> graph::dialogue_system;
        spawn here walker::talk;
    }
}

Try out the following interactions

$ jsctl jac run dialogue.jac
> test drive
Your test drive is scheduled for Jan 1st, 2023.
{
  "success": true,
  "report": [],
  "final_node": "urn:uuid:9b8d9e1e-d7fb-4e6e-ae86-7ef7c7ad28a7",
  "yielded": false
}

and

$ jsctl jac run dialogue.jac
> order a tesla
You can order a Tesla through our design studio.
{
  "success": true,
  "report": [],
  "final_node": "urn:uuid:168590aa-d579-4f22-afe7-da75ab7eefa3",
  "yielded": false
}

What is happening here is that, based on the user's question, we traverse to the corresponding dialogue state and then return that state's response. For now, we are simply matching the incoming question against the intent label, which we will now replace with an AI model.

Note

Notice we are running jsctl commands directly from the terminal without first entering the jaseci shell? Any jsctl command can be launched directly from the terminal by just prepending it with jsctl. Try it with the other jsctl commands we have encountered so far, such as jac dot.

Intent classification with Bi-encoder

Let's introduce an intent classification AI model. Intent Classification is the task of detecting and assigning an intent to a given piece of text from a list of pre-defined intents, to summarize what the text is conveying or asking. It's one of the fundamental tasks in Natural Language Processing (NLP) with broad applications in many areas.

There are many models that have been proposed and applied to intent classification. For this tutorial, we are going to use a Bi-encoder model. A Bi-encoder has two transformer-based encoders: one encodes the input text and the other encodes the candidate intent labels into embedding vectors. The model then compares the similarity between the embedding vectors to find the most relevant/fitting intent label.
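
To make the shape of that computation concrete, here is a deliberately tiny Python stand-in: a trivial deterministic bag-of-words "encoder" replaces the two transformers, and a dot product ranks the labels. This only illustrates the two-encoder-plus-similarity structure, not the real model's quality:

```python
def encode(text, dim=8):
    # Trivial deterministic bag-of-words encoder; a stand-in for a
    # transformer producing an embedding vector.
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % dim] += 1.0
    return vec

def rank_intents(question, labels):
    # Question and labels are encoded independently (the "bi" in
    # bi-encoder); similarity is a simple dot product here.
    q = encode(question)
    scores = {l: sum(a * b for a, b in zip(q, encode(l))) for l in labels}
    return max(scores, key=scores.get)

print(rank_intents("i want to test drive",
                   ["test drive", "order a tesla"]))
```

Note that because the labels are encoded independently of the question, the candidate set can change at inference time, which is the property the tutorial relies on later.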

Note

If you don't fully understand the Bi-encoder model yet, do not worry! We will provide the necessary code and tooling for you to wield this model as a black box. But, if you are interested, here is a paper for you to read up on it https://arxiv.org/pdf/1908.10084.pdf!

Now let's train the model. We have created a jac program and sample training data for this. They are in the code directory next to this tutorial. Copy bi_enc.jac and clf_train_1.json to your working directory.

Let's first load the Bi-encoder action library into Jaseci.

$ jsctl
jaseci > actions load module jaseci_ai_kit.bi_enc

We have provided an example training file that contains some starting point training data for the two intents, test drive and order a tesla.

jaseci > jac run bi_enc.jac -walk train -ctx "{\"train_file\": \"clf_train_1.json\"}"

We are still using jac run but as you have noticed, this time we are using some new arguments. So let's break it down.

  • -walk specifies the name of the walker to run. By default, it runs the init walker.
  • -ctx stands for context. This lets us provide input parameters to the walker. The input parameters are defined as has variables in the walker.

Warning

-ctx expects a json string that contains a dictionary of parameters and their values. Since we are running this on the command line, you will need to escape the quotation marks " properly for it to be a valid json string. Pay close attention to the example here -ctx "{\"train_file\": \"clf_train_1.json\"}" and use this as a reference.
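
If hand-escaping the quotes gets error-prone, one option is to let Python's json.dumps produce a guaranteed-valid JSON string for you and paste the result into the command:

```python
import json

# Build the walker context as a dict, then serialize it to valid JSON.
ctx = {"train_file": "clf_train_1.json"}
print(json.dumps(ctx))  # {"train_file": "clf_train_1.json"}
```

Wrap the printed string in double quotes on the command line (escaping the inner quotes as shown in the warning above).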

You should see an output block that looks like the following repeating many times on your screen:

...
Epoch : 5
loss : 0.10562849541505177
LR : 0.0009854014598540146
...

At each training epoch, the output above prints with the training loss and learning rate for that epoch. By default, the model is trained for 50 epochs.

If the training successfully finishes, you should see "success": true at the end.

Now that the model has finished training, let's try it out! You can use the infer walker to play with the model and test it out! infer is short for inference, which means using a trained model to run prediction on a given input.

jaseci > jac run bi_enc.jac -walk infer -ctx "{\"labels\": [\"test drive\", \"order a tesla\"]}"

Similar to training, we are using jac run to specifically invoke the infer walker and provide it with custom parameters. The custom parameter is the list of candidate intent labels, which are test drive and order a tesla in this case, as these were the intents the model was trained on.

jaseci > jac run bi_enc.jac -walk infer -ctx "{\"labels\": [\"test drive\", \"order a tesla\"]}"
Enter input text (Ctrl-C to exit)> i want to order a tesla
{"label": "order a tesla", "score": 9.812651595405981}
Enter input text (Ctrl-C to exit)> i want to test drive
{"label": "test drive", "score": 6.931458692617463}
Enter input text (Ctrl-C to exit)>

In the output here, label is the predicted intent label and score is the score assigned by the model to that intent.

Note

One of the advantages of the bi-encoder model is that candidate intent labels can be dynamically defined at inference time, post training. This enables us to create custom contextual classifiers situationally from a single trained model. We will leverage this later as our dialogue system becomes more complex.

Congratulations! You just trained your first intent classifier, easy as that.

The trained model is kept in memory and stays active until it is explicitly saved with save_model. To save the trained model to a location of your choosing, run

jaseci > jac run bi_enc.jac -walk save_model -ctx "{\"model_path\": \"dialogue_intent_model\"}"

Similarly, you can load a saved model with load_model

jaseci > jac run bi_enc.jac -walk load_model -ctx "{\"model_path\": \"dialogue_intent_model\"}"

Always remember to save your trained models!

Warning

save_model works with relative paths. When a relative model path is specified, the model is saved at that location relative to where you run jsctl. Note that until the model is saved, the trained weights stay in memory, which means they will not persist between jsctl sessions. So once you have a trained model you like, make sure to save it so you can load it back in the next jsctl session.

Integrate the Intent Classifier

Now let's update our walker to use the trained intent classifier.

walker talk {
    has question;
    can bi_enc.infer;
    root {
        question = std.input("> ");
        take --> node::dialogue_root;
    }
    dialogue_root {
        intent_labels = -[intent_transition]->.edge.intent;
        predicted_intent = bi_enc.infer(
            contexts = [question],
            candidates = intent_labels,
            context_type = "text",
            candidate_type = "text"
        )[0]["predicted"]["label"];
        take -[intent_transition(intent==predicted_intent)]-> node::dialogue_state;
    }
    dialogue_state {
        std.out(here.response);
    }
}

intent_labels = -[intent_transition]->.edge.intent collects the intent variables of all the outgoing intent_transition edges. This represents the list of candidate intent labels for this state.

Try playing with different questions, such as

$ jsctl
jaseci > jac run dialogue.jac
> hey yo, I heard tesla cars are great, how do i get one?
You can order a Tesla through our design studio.
{
  "success": true,
  "report": [],
  "final_node": "urn:uuid:af667fdf-c2b0-4443-9ccd-7312bc4c66c4",
  "yielded": false
}

Making Our Dialogue System Multi-turn

Dialogues in real life have many turns of interaction. Our dialogue system should also support that, to provide a human-like conversational experience. In this section, we are going to take the dialogue system to the next level and create a multi-turn dialogue experience.

Before we do that we need to introduce two new concepts in Jac: node abilities and inheritance.

Node Abilities

Node abilities are code encoded as part of each node type. They often contain logic that reads, writes and generally manipulates the variables and state of the node. Node abilities are defined with the can keyword inside the definition of nodes. For example, in the code below, get_plate_number is an ability of the vehicle node.

node vehicle {
    has plate_number;
    can get_plate_number {
        report here.plate_number;
    }
}

To learn more about node abilities, refer to the relevant sections of the Jaseci Bible.

Note

Node abilities look and function similarly to member functions in object-oriented programming (OOP). However, there is a key conceptual difference: node abilities are central to data-spatial programming, where the logic stays, syntactically, close to the data it operates on.

Inheritance

Jac supports inheritance for nodes and edges. Node variables (defined with has) and node abilities (defined with can) are inherited and can be overwritten by children nodes.

Here is an example:

node vehicle {
    has plate_number;
    can get_plate_number {
        report here.plate_number;
    }
}

node car:vehicle {
    has plate_number = "RAC001";
}

node bus:vehicle {
    has plate_number = "SUB002";
}
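
As a rough OOP analogy (Python, with illustrative names), the children override the inherited variable while keeping the inherited ability:

```python
class Vehicle:
    plate_number = None

    def get_plate_number(self):
        # Inherited "ability": children get this method for free.
        return self.plate_number

class Car(Vehicle):
    plate_number = "RAC001"  # overrides the inherited variable

print(Car().get_plate_number())  # RAC001
```

The Jac version works the same way: car and bus inherit get_plate_number from vehicle but each supplies its own plate_number.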

To learn more about inheritance in Jac, refer to the relevant sections of the Jaseci Bible.

Build the Multi-turn Dialogue Graph

Now that we have learnt about node abilities and node inheritance, let's put these new concepts to use to build a new graph for the multi-turn dialogue system.

There are multiple parts to this, so let's break it down one by one.

Dialogue State Specific Logic

With node abilities and node inheritance, we will now introduce state-specific logic. Take a look at how the dialogue_root node definition has changed.

node dialogue_state {
    can bi_enc.infer;
    can tfm_ner.extract_entity;

    can classify_intent {
        intent_labels = -[intent_transition]->.edge.intent;
        visitor.wlk_ctx["intent"] = bi_enc.infer(
            contexts = [visitor.question],
            candidates = intent_labels,
            context_type = "text",
            candidate_type = "text"
        )[0]["predicted"]["label"];
    }

    can extract_entities {
        // Entity extraction logic will be added a bit later on.
    }

    can init_wlk_ctx {
        new_wlk_ctx = {
            "intent": null,
            "entities": {},
            "prev_state": null,
            "next_state": null,
            "respond": false
        };
        if ("entities" in visitor.wlk_ctx) {
            // Carry over extracted entities from previous interaction
            new_wlk_ctx["entities"] = visitor.wlk_ctx["entities"];
        }
        visitor.wlk_ctx = new_wlk_ctx;
    }
    can nlu {}
    can process {
        if (visitor.wlk_ctx["prev_state"]): visitor.wlk_ctx["respond"] = true;
        else {
            visitor.wlk_ctx["next_state"] = net.root();
            visitor.wlk_ctx["prev_state"] = here;
        }
    }
    can nlg {}
}

node dialogue_root:dialogue_state {
    has name = "dialogue_root";
    can nlu {
        ::classify_intent;
    }
    can process {
        visitor.wlk_ctx["next_state"] = (-[intent_transition(intent==visitor.wlk_ctx["intent"])]->)[0];
    }
    can nlg {
        visitor.response = "Sorry I can't handle that just yet. Anything else I can help you with?";
    }
}

There are many interesting things going on in these ~30 lines of code so let's break it down!

  • The dialogue_state node is the parent node and it is similar to an abstract class in OOP. It defines the variables and abilities of the nodes, but the details of the abilities will be specified in the inheriting children nodes.
  • In this case, dialogue_state defines the following node abilities:
    • can nlu: NLU stands for Natural Language Understanding. This ability analyzes the user's incoming request and applies AI models.
    • can process: This ability uses the NLU results and figures out the next dialogue state the walker should go to.
    • can nlg: NLG stands for Natural Language Generation. This ability composes a response to the user, often based on the results from nlu.
    • can classify_intent: an ability to handle intent classification. This is the same intent classification logic that has been copied over from the walker.
    • can extract_entities: a new ability with a new AI model -- entity extraction. We will cover that in just a little bit (read on!).
  • Among these node abilities, classify_intent and extract_entities have concrete logic defined, while nlu and nlg are "virtual node abilities" that will be specified in each of the inheriting children.
  • For example, dialogue_root inherits from dialogue_state and overwrites nlu and nlg:
    • for nlu, it invokes intent classification because it needs to decide what's the intent of the user (test drive vs order a tesla).
    • for nlg, it just has a general fall-back response in case the system can't handle user's ask.
  • New Syntax: visitor is the walker that is "visiting" the node. And through visitor.*, the node abilities can access and update the context of the walker. In this case, the node abilities are updating the response variable in the walker's context so that the walker can return the response to its caller, as well as the wlk_ctx variable that will contain various walker context as the walker traverses the graph.
    • the init_wlk_ctx ability initializes the wlk_ctx variable for each new question.
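
The visitor pattern described above can be mimicked in plain Python: node objects expose abilities that read and mutate the visiting walker's context. All class names and the exact-match "classifier" below are illustrative stand-ins:

```python
class Walker:
    # The walker carries mutable context that node abilities update.
    def __init__(self, question):
        self.question = question
        self.wlk_ctx = {"intent": None, "next_state": None, "respond": False}
        self.response = None

class DialogueRoot:
    transitions = {"test drive": "test_drive_state"}

    def nlu(self, visitor):
        # Stand-in for bi_enc.infer: trivial exact-match "classification".
        if visitor.question in self.transitions:
            visitor.wlk_ctx["intent"] = visitor.question

    def process(self, visitor):
        # Decide the next state from the detected intent.
        visitor.wlk_ctx["next_state"] = self.transitions.get(
            visitor.wlk_ctx["intent"])

walker = Walker("test drive")
node = DialogueRoot()
node.nlu(walker)       # the node ability updates the visitor's context
node.process(walker)
print(walker.wlk_ctx["next_state"])  # test_drive_state
```

The key point carried over from Jac: the logic lives on the node, while the walker merely carries state between nodes.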

In this new node architecture, each dialogue state will have its own node type, specifying their state-specific logic in nlu, nlg and process. Let's take a look!

node how_to_order_state:dialogue_state {
    has name = "how_to_order";
    can nlg {
        visitor.response = "You can order a Tesla through our design studio";
    }
}

node test_drive_state:dialogue_state {
    has name = "test_drive";
    can nlu {
        if (!visitor.wlk_ctx["intent"]): ::classify_intent;
        ::extract_entities;
    }
    can process {
        // Check entity transition
        required_entities = -[entity_transition]->.edge[0].context["entities"];
        if (vector.sort_by_key(visitor.wlk_ctx["entities"].d::keys) == vector.sort_by_key(required_entities)) {
            visitor.wlk_ctx["next_state"] = -[entity_transition]->[0];
            visitor.wlk_ctx["prev_state"] = here;
        } elif (visitor.wlk_ctx["prev_state"] and !visitor.wlk_ctx["prev_state"].context["name"] in ["test_drive", "td_confirmation"]){
            next_state = -[intent_transition(intent==visitor.wlk_ctx["intent"])]->;
            if (next_state.length > 0 and visitor.wlk_ctx["intent"] != "no") {
                visitor.wlk_ctx["next_state"] = next_state[0];
                visitor.wlk_ctx["prev_state"] = here;
            } else {
                visitor.wlk_ctx["respond"] = true;
            }
        } else {
            visitor.wlk_ctx["respond"] = true;
        }
    }
    can nlg {
        if ("name" in visitor.wlk_ctx["entities"] and "address" not in visitor.wlk_ctx["entities"]):
            visitor.response = "What is your address?";
        elif ("address" in visitor.wlk_ctx["entities"] and "name" not in visitor.wlk_ctx["entities"]):
            visitor.response = "What is your name?";
        else:
            visitor.response = "To set you up with a test drive, we will need your name and address.";
    }
}

node td_confirmation:dialogue_state {
    has name = "test_drive_confirmation";
    can nlu {
        if (!visitor.wlk_ctx["intent"]): ::classify_intent;
    }
    can process {
        if (visitor.wlk_ctx["prev_state"]): visitor.wlk_ctx["respond"] = true;
        else {
            visitor.wlk_ctx["next_state"] = -[intent_transition(intent==visitor.wlk_ctx["intent"])]->[0];
            visitor.wlk_ctx["prev_state"] = here;
        }
    }
    can nlg {
        visitor.response =
            "Can you confirm your name to be " + visitor.wlk_ctx["entities"]["name"][0] + " and your address as " + visitor.wlk_ctx["entities"]["address"][0] + "?";
    }
}

node td_confirmed:dialogue_state {
    has name = "test_drive_confirmed";
    can nlg {
        visitor.response = "You are all set for a Tesla test drive!";
    }
}

node td_canceled:dialogue_state {
    has name = "test_drive_canceled";
    can nlg {
        visitor.response = "No worries. We look forward to hearing from you in the future!";
    }
}

  • Each dialogue state now has its own node type, all inheriting from the same generic dialogue_state node type.

  • We have 4 dialogue states here for the test drive capability:

    • test_drive: This is the main state of the test drive intent. It is responsible for collecting the necessary information from the user.
    • test_drive_confirmation: This is the state where the user confirms that the information they have provided is correct and that they are ready to actually schedule the test drive.
    • test_drive_confirmed: This is the state after the user has confirmed.
    • test_drive_canceled: User has decided, in the middle of the dialogue, to cancel their request to schedule a test drive.
  • The process ability contains the logic that defines the conversational flow of the dialogue system. It uses the data in wlk_ctx and assigns a next_state, which will be used by the walker in a take statement, as you will see in just a little bit.

  • New Syntax: The code in test_drive_state's ability demonstrates jac's support for lists and dictionaries. To access the list- and dictionary-specific functions, first cast the variable with .l/.list for lists and .d/.dict for dictionaries, then proceed with :: to access the built-in functions for lists and dictionaries. For more on jac's built-in types, refer to the relevant sections of the Jaseci Bible.

    • Specifically in this case, we are comparing the list of entities of the entity_transition edge with the list of entities that have been extracted by the walker and the AI model (stored in wlk_ctx["entities"]). Since there can be multiple entities required and they can be extracted in arbitrary order, we are sorting and then comparing here.
  • New Syntax: -[entity_transition]->.edge shows how to access the edge variable. Consider -[entity_transition]-> as a filter. It returns all valid nodes that are connected to the implicit here via an entity_transition. On its own, it will return all the qualified nodes. When followed by .edge, it will return the set of edges that are connected to the qualified nodes.
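
The entity-fulfillment comparison in test_drive_state's process ability boils down to the following Python sketch (the sample values are invented):

```python
# The transition fires only when every required entity has been extracted,
# regardless of the order in which the user provided them.
required_entities = ["name", "address"]
extracted = {"address": ["123 Main St"], "name": ["Ava"]}

ready = sorted(extracted.keys()) == sorted(required_entities)
print(ready)  # True: both required entities are present
```

Sorting before comparing is what makes the check order-independent, matching the vector.sort_by_key calls in the Jac code.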

You might notice that some states do not have a process ability. These are states that do not have any outgoing transitions, which we refer to as leaf nodes. If these nodes are reached, they indicate that a dialogue has been completed end to end. The next state for these nodes will be returning to the root node so that the next dialogue can start fresh. To facilitate this, we will add the following logic to the process ability of the parent dialogue_state node so that by default, any nodes inheriting it will follow this rule.

node dialogue_state {
...
    can process {
        if (visitor.wlk_ctx["prev_state"]): visitor.wlk_ctx["respond"] = true;
        else {
            visitor.wlk_ctx["next_state"] = net.root();
            visitor.wlk_ctx["prev_state"] = here;
        }
    }
...
}

Note

Pay attention to the 4 dialogue states here. This pattern of main -> confirmation -> confirmed -> canceled is a very common conversational state graph design pattern and can apply to many topics, e.g., making a restaurant reservation or opening a new bank account. Essentially, almost any action-oriented request can leverage this conversational pattern. Keep this in mind!

Entity Extraction

Previously, we introduced intent classification and how it helps to build a dialogue system. We now introduce the second key AI model, one that is specifically important for a multi-turn dialogue system: entity/slot extraction.

Entity extraction is an NLP task that focuses on extracting words or phrases of interest, or entities, from a given piece of text. Entity extraction, sometimes also referred to as Named Entity Recognition (NER), is useful in many domains, including information retrieval and conversational AI. We are going to use a transformer-based entity extraction model for this exercise.

Let's first take a look at how we are going to use an entity model in our program. Then we will work on training an entity model.

First, we introduce a new type of transition:

edge entity_transition {
    has entities;
}

Recall that an intent_transition triggers when its intent matches the one predicted from the user's input. Similarly, the idea behind an entity_transition is that we traverse it if all the specified entities have been fulfilled, i.e., they have been extracted from the user's inputs.

With the entity_transition, let's update our graph

graph dialogue_system {
    has anchor dialogue_root;
    spawn {
        dialogue_root = spawn node::dialogue_root;
        test_drive_state = spawn node::test_drive_state;
        td_confirmation = spawn node::td_confirmation;
        td_confirmed = spawn node::td_confirmed;
        td_canceled = spawn node::td_canceled;

        how_to_order_state = spawn node::how_to_order_state;

        dialogue_root -[intent_transition(intent="test drive")]-> test_drive_state;
        test_drive_state -[intent_transition(intent="cancel")]-> td_canceled;
        test_drive_state -[entity_transition(entities=["name", "address"])]-> td_confirmation;
        test_drive_state -[intent_transition(intent="provide name or address")]-> test_drive_state;
        td_confirmation -[intent_transition(intent="yes")]-> td_confirmed;
        td_confirmation -[intent_transition(intent="no")]-> test_drive_state;
        td_confirmation -[intent_transition(intent="cancel")]-> td_canceled;

        dialogue_root -[intent_transition(intent="order a tesla")]-> how_to_order_state;
    }
}

Your graph should look something like this!

Multi-turn Dialogue Graph

Update the Walker for Multi-turn Dialogue

Let's now turn our focus to the walker logic

walker talk {
    has question;
    has wlk_ctx = {};
    has response;
    root {
        take --> node::dialogue_root;
    }
    dialogue_state {
        if (!question) {
            question = std.input("Question (Ctrl-C to exit)> ");
            here::init_wlk_ctx;
        }
        here::nlu;
        here::process;
        if (visitor.wlk_ctx["respond"]) {
            here::nlg;
            std.out(response);
            question = null;
            take here;
        } else {
            take visitor.wlk_ctx["next_state"] else: take here;
        }
    }
}

The walker logic looks very different now. Let's break it down!

  • First off, because the intent classification logic is now a node ability, the walker logic has become simpler and, more importantly, more focused on graph traversal, without the detailed (and occasionally convoluted) logic required to interact with an AI model.
  • New Syntax: here::nlu and here::nlg invoke the node abilities. here can be substituted with any node variable, not just the node the walker is currently on.

Now that we have explained some of the new language syntax here, let's go over the overall logic of this walker. For a new question from the user, the walker will

  1. analyze the question (here::nlu) to identify its intent (predicted_intent) and/or extract its entities (extracted_entities).
  2. based on the NLU results, traverse the dialogue state graph (the two take statements) to a new dialogue state.
  3. at this new dialogue state, perform NLU specific to that state (recall that nlu is a node ability that varies from node to node) and repeat step 2.
  4. if the walker cannot make any more state traversals (take ... else {}), construct a response (here::nlg) using the information it has gathered so far (the walker's context) and return that response to the user.
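The four steps above can be sketched as a toy state-machine loop in Python. All of the names below are hypothetical stand-ins for the Jac walker and node abilities, not Jaseci APIs:

```python
def run_turn(state, question, graph):
    """Toy version of the talk walker's inner loop: keep taking
    transitions until no further traversal is possible, then respond."""
    while True:
        ctx = graph[state]["nlu"](question)    # step 1/3: per-state NLU
        next_state = graph[state]["next"](ctx)  # step 2: pick a transition
        if next_state is None:                  # step 4: no traversal left
            return state, graph[state]["nlg"](ctx)
        state = next_state

# Minimal two-state graph for illustration.
graph = {
    "dialogue_root": {
        "nlu": lambda q: {"intent": "test drive"},
        "next": lambda ctx: "test_drive_state" if ctx["intent"] == "test drive" else None,
        "nlg": lambda ctx: "How can I help?",
    },
    "test_drive_state": {
        "nlu": lambda q: {"entities": {}},
        "next": lambda ctx: None,  # no entities extracted yet, so stop here
        "nlg": lambda ctx: "To set you up with a test drive, we will need your name and address.",
    },
}

state, response = run_turn("dialogue_root", "hey i want to schedule a test drive", graph)
# state == "test_drive_state"; response asks for name and address
```

This mirrors turn #1 of the example dialogue below: the walker hops from dialogue_root to test_drive_state, finds no further qualified path, and responds.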

If this still sounds fuzzy, don't worry! Let's use a real dialogue as an example to illustrate this.

Turn #1:
    User: hey i want to schedule a test drive
    Tesla AI: To set you up with a test drive, we will need your name and address.

Turn #2:
    User: my name is Elon and I live at 123 Main Street
    Tesla AI: Can you confirm your name to be Elon and your address as 123 Main Street?

Turn #3:
    User: Yup! that is correct
    Tesla AI: You are all set for a Tesla test drive!

At turn #1,

  • The walker starts at dialogue_root.
  • The nlu at dialogue_root is called and classifies the intent to be test drive.
  • There is an intent_transition(test drive) connecting dialogue_root to test_drive_state, so the walker takes itself to test_drive_state.
  • We are now at test_drive_state; its nlu requires entity_extraction, which will look for name and address entities. In this case, neither is provided by the user.
  • As a result, the walker can no longer traverse based on the take rules and thus constructs a response based on the nlg logic at test_drive_state.

At turn #2,

  • The walker starts at test_drive_state, picking up where it left off.
  • nlu at test_drive_state performs intent classification and entity extraction. This time it will pick up both name and address.
  • As a result, the first take statement finds a qualified path and takes that path to the td_confirmation node.
  • At td_confirmation, no valid take path exists so a response is returned.

Note

Turn #3 works similarly to turn #1. See if you can figure out how the walker reacts at turn #3 yourself!

Train an Entity Extraction Model

Let's now train an entity extraction model! We are using a transformer-based token classification model.

First, we need to load the actions. The action set is called tfm_ner (tfm stands for transformer).

jaseci > actions load module jaseci_ai_kit.tfm_ner

Warning

If you installed jaseci_ai_kit prior to September 5th, 2022, please upgrade via pip install --upgrade jaseci_ai_kit. There has been an update to the module that you will need for the remainder of this exercise. You can check your installed version via pip show jaseci_ai_kit. You need to be on version 1.3.4.6 or higher.

Similar to Bi-encoder, we have provided a jac program to train and inference with this model, as well as an example training dataset. Go into the code/ directory and copy tfm_ner.jac and ner_train.json to your working directory. We are training the model to detect two entities, name and address, for the test drive use case.

Let's quickly go over the training data format.

[
    "sure my name is [tony stark](name) and i live at [10880 malibu point california](address)",
    "my name is [jason](name)"
]

The training data is a JSON list of strings, each of which is a training example. The square brackets [] mark the entity text, while the parentheses () following them define the entity type. So in the example above, we have two entities: name: tony stark and address: 10880 malibu point california.
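To make the `[text](type)` annotation format concrete, here is a hedged Python sketch that parses one annotated example into its raw text plus entity spans. This is purely illustrative; tfm_ner does its own parsing internally:

```python
import re

def parse_example(annotated):
    """Split a '[text](type)'-annotated example into raw text and entities."""
    entities = []
    plain = []
    pos = 0
    for m in re.finditer(r"\[([^\]]+)\]\(([^)]+)\)", annotated):
        plain.append(annotated[pos:m.start()])        # text before the entity
        start = sum(len(p) for p in plain)            # character offset in raw text
        plain.append(m.group(1))                      # the entity text itself
        entities.append({"text": m.group(1), "type": m.group(2), "start": start})
        pos = m.end()
    plain.append(annotated[pos:])                     # trailing text
    return "".join(plain), entities

text, ents = parse_example("my name is [jason](name)")
# text == "my name is jason"
# ents  == [{"text": "jason", "type": "name", "start": 11}]
```

Note that the recovered start offset (11) lines up with the start_pos the model reports for the same input later in this section.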

To train the model, run

jaseci > jac run tfm_ner.jac -walk train -ctx "{\"train_file\": \"ner_train.json\"}"

After the model is finished training, you can play with the model using the infer walker

jaseci > jac run tfm_ner.jac -walk infer

For example,

jaseci > jac run tfm_ner.jac -walk infer
Enter input text (Ctrl-C to exit)> my name is jason
[{"entity_text": "jason", "entity_value": "name", "conf_score": 0.5514775514602661, "start_pos": 11, "end_pos": 16}]

The output of this model is a list of dictionaries, each of which is one detected entity. For each detected entity, entity_value is the type of entity (in this case either name or address) and entity_text is the detected text from the input for this entity (in this case the user's name or their address).
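A small Python sketch of how this output shape can be folded into a per-type dictionary of extracted values (a hypothetical helper, matching the walker-context shape used in this tutorial):

```python
def group_entities(results):
    """Group tfm_ner-style results into {entity_type: [entity_text, ...]}."""
    grouped = {}
    for ent in results:
        grouped.setdefault(ent["entity_value"], []).append(ent["entity_text"])
    return grouped

# Example output in the format returned by the infer walker above.
ner_output = [
    {"entity_text": "jason", "entity_value": "name", "conf_score": 0.55},
    {"entity_text": "10880 malibu point", "entity_value": "address", "conf_score": 0.61},
]
print(group_entities(ner_output))
# {'name': ['jason'], 'address': ['10880 malibu point']}
```

This is the same accumulation pattern the extract_entities node ability performs on visitor.wlk_ctx["entities"].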

Remember to save the trained model. To save it to a location of your choosing, run

jaseci > jac run tfm_ner.jac -walk save_model -ctx "{\"model_path\": \"tfm_ner_model\"}"

Warning

If you are uploading your code to GitHub, make sure to exclude the tfm_ner_model folder as it is too large and you will not be able to push your changes. If you still wish to push this file, you must use Git Large File Storage (Git LFS).

Let's now update the node ability to use the entity model.

node dialogue_state {
    ...
    can extract_entities {
        res = tfm_ner.extract_entity(visitor.question);
        for ent in res {
            ent_type = ent["entity_value"];
            ent_text = ent["entity_text"];
            if (!(ent_type in visitor.wlk_ctx["entities"])){
                visitor.wlk_ctx["entities"][ent_type] = [];
            }
            visitor.wlk_ctx["entities"][ent_type].l::append(ent_text);
        }
    }
    ...
}

There is one last update we need to do before this is fully functional. Because we have more dialogue states and a more complex graph, we need to update our classifier to include the new intents. We have provided an example training dataset at code/clf_train_2.json. Re-train the bi-encoder model with this dataset.

Note

Refer to previous code snippets if you need a reminder on how to train the bi-encoder classifier model.

Note

Remember to save your new entity extraction model!

Now try running the walker again with jac run dialogue.jac!

Congratulations! You now have a fully functional multi-turn dialogue system that can handle test drive requests!

Unify the Dialogue and FAQ Systems

So far, we have built two separate conversational AI systems, a FAQ system that automatically scales with the available question-answer pairs and a multi-turn action-oriented dialogue system that can handle complex requests. These two systems serve different use cases and can be combined into a single system to provide a flexible and robust conversational AI experience. In this section, we are going to unify these two systems into one coherent conversational AI system.

While these two systems rely on different AI models, they share much of the same logic flow: both follow the general steps of first analyzing the user's question with NLU AI models, then deciding on the next conversational state, and finally constructing and returning a response to the user. Leveraging this shared pattern, we will first unify the node architecture of the two systems with a single parent node type, cai_state (cai is short for conversational AI).

node cai_state {
    has name;
    can init_wlk_ctx {
        new_wlk_ctx = {
            "intent": null,
            "entities": {},
            "prev_state": null,
            "next_state": null,
            "respond": false
        };
        if ("entities" in visitor.wlk_ctx) {
            // Carry over extracted entities from previous interaction
            new_wlk_ctx["entities"] = visitor.wlk_ctx["entities"];
        }
        visitor.wlk_ctx = new_wlk_ctx;
    }
    can nlu {}
    can process {
        if (visitor.wlk_ctx["prev_state"]): visitor.wlk_ctx["respond"] = true;
        else {
            visitor.wlk_ctx["next_state"] = net.root();
            visitor.wlk_ctx["prev_state"] = here;
        }
    }
    can nlg {}
}

Note that the logic for init_wlk_ctx and the default process logic have been hoisted up into cai_state, as they are shared by the dialogue system and the FAQ system. You can remove these two abilities from the dialogue_state node, as it will inherit them from cai_state now.

We then update the definition of dialogue_state in dialogue.jac to inherit from cai_state:

node dialogue_state:cai_state{
    // Rest of dialogue_state code remain the same
}

Before we move on, we will take a quick detour to introduce multi-file jac program and how import works in jac.

Multi-file Jac Program and Import

Jac's support for multiple files is quite simple. You can import object definitions from one jac file to another with the import keyword. With import {*} with "./code.jac", everything from code.jac will be imported, which can include node, edge, graph and walker definitions. Alternatively, you can import specific objects with import {node::state} with "./code.jac".

To compile a multi-file Jac program, you will need one jac file that serves as the entry point of the program. This file needs to import all the necessary components of the program. Chained importing is supported.

Once you have the main jac file (let's call it main.jac), you will need to compile it and its imports into a single .jir file. jir here stands for Jac Intermediate Representation. To compile a jac file, use the jac build command

jaseci > jac build main.jac

If the compilation is successful, a .jir file with the same name will be generated (in this case, main.jir). The .jir file can be used with jac run or jac dot the same way as the jac source code file.

Note

The jir format is what you will use to deploy your jac program to a production jaseci instance.

Unify FAQ + Dialogue Code

For faq_state, we now need to define the nlu and nlg node abilities for FAQ. So let's update the following in faq.jac.

First, faq_root

node faq_root:cai_state {
    can use.qa_classify;
    can nlu {
        if (!visitor.wlk_ctx["prev_state"]) {
            answers = -->.answer;
            best_answer = use.qa_classify(
                text = visitor.question,
                classes = answers
            );
            visitor.wlk_ctx["intent"] = best_answer["match"];
        }
    }
    can process {
        if (visitor.wlk_ctx["prev_state"]): visitor.wlk_ctx["respond"] = true;
        else {
            for n in --> {
                if (n.context["answer"] == visitor.wlk_ctx["intent"]){
                    visitor.wlk_ctx["next_state"] = n;
                    break;
                }
            }
            visitor.wlk_ctx["prev_state"] = here;
        }
    }
    can nlg {
        visitor.response = "I can answer a variety of FAQs related to Tesla. What can I help you with?";
    }
}

At this point, if you have been following this journey along, this code should be relatively easy to understand. Let's quickly break it down.

  • For FAQ, the nlu logic uses the USE QA model to find the most relevant answer. Here we are re-using the intent field in the walker context to save the matched answer. You can also opt to create another field dedicated to FAQ NLU result.
  • For the traversal logic, this is very similar to the previous FAQ logic, i.e., find the faq_state node connected to here that contains the most relevant answer.
  • for n in --> iterates through all the nodes connected by an outgoing edge from the current node. You can use .context on any node variable to access its variables.
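The `for n in -->` selection above amounts to a simple linear search over the child nodes. In Python terms (toy node dicts standing in for Jac nodes and their .context):

```python
def pick_next_state(children, matched_answer):
    """Find the child faq_state whose stored answer matches the NLU result,
    mirroring the loop in faq_root's process ability."""
    for n in children:
        if n["context"]["answer"] == matched_answer:
            return n
    return None  # no match: stay on the current node and respond

children = [
    {"context": {"answer": "Tesla offers a 4 year warranty."}},
    {"context": {"answer": "You can order a Tesla online."}},
]
node = pick_next_state(children, "You can order a Tesla online.")
# node is the second child, which becomes wlk_ctx["next_state"]
```

The matched answer here plays the role of the "intent" stored in the walker context by the USE QA model.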

And the logic for the faq_state that contains the answer is relatively simple:

node faq_state:cai_state {
    has question;
    has answer;
    can nlg {
        visitor.response = here.answer;
    }
}

With these new nodes created, let's update our graph definition. We have renamed our graph to be tesla_ai and the dialogue.jac file to tesla_ai.jac.

graph tesla_ai {
    has anchor dialogue_root;
    spawn {
        dialogue_root = spawn node::dialogue_root;
        test_drive_state = spawn node::test_drive_state;
        td_confirmation = spawn node::td_confirmation;
        td_confirmed = spawn node::td_confirmed;
        td_canceled = spawn node::td_canceled;

        dialogue_root -[intent_transition(intent="test drive")]-> test_drive_state;
        test_drive_state -[intent_transition(intent="cancel")]-> td_canceled;
        test_drive_state -[entity_transition(entities=["name", "address"])]-> td_confirmation;
        test_drive_state -[intent_transition(intent="provide name or address")]-> test_drive_state;
        td_confirmation -[intent_transition(intent="yes")]-> td_confirmed;
        td_confirmation -[intent_transition(intent="no")]-> test_drive_state;
        td_confirmation -[intent_transition(intent="cancel")]-> td_canceled;

        faq_root = spawn graph::faq;
        dialogue_root -[intent_transition(intent="i have a question")]-> faq_root;
    }
}

One thing worth pointing out here is that we are spawning a graph inside a graph spawn block.

Our graph should now look like this!

Here comes the biggest benefit of our unified node architecture: the exact same walker logic can be shared to traverse both systems. The only change we need to make is to change dialogue_state to cai_state so that the walker logic applies to a more generalized set of nodes.

walker talk {
    ...
    root {
        take --> node::dialogue_root;
    }
    cai_state {
        if (!question) {
            question = std.input("Question (Ctrl-C to exit)> ");
            here::init_wlk_ctx;
        }
        ...
    }
}

Update the graph name in the init walker as well.

walker init {
    root {
        spawn here --> graph::tesla_ai;
        spawn here walker::talk;
    }
}

To compile the program,

jaseci > jac build tesla_ai.jac

As mentioned before, if the compilation succeeds, a tesla_ai.jir will be generated.

Note

Run into issues at this build step? First check if all the imports are set up correctly.

Running a jir is just like running a jac file

jaseci > jac run tesla_ai.jir

One last step: since we introduced a new intent, i have a question, we need to update our classifier model again. This time, use the clf_train_3.json example training data.

Note Make sure to save your model again so you can return to it in a new session!

The model is trained? Great! Now run the jir and try questions like "I have some tesla related questions", then follow up with FAQ questions!

Congratulations! You have created a single conversational AI system that is capable of answering FAQs and performing complex multi-step actions.

Bring Your Application to Production

Typing in questions and getting responses via jsctl in the terminal is a quick and easy way of interactively testing and using your program. But the ultimate goal of building any product is to eventually deploy it to production and have it serve real users via standard interfaces such as RESTful API endpoints. In this section, we will cover a number of items related to bringing your jac program to production.

Introducing yield

yield is a jac keyword that suspends the walker and returns a response; the walker can then be resumed at a later time with its context retained. Walker context includes its has variables and its node traversal plan (i.e., any nodes that have been queued by previously executed take statements). This context retention is done on a per-user basis. yield is a great way of maintaining user-specific context and history in between walker calls. To learn more about yield, refer to the relevant sections of the Jaseci Bible.
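Conceptually, a yielded walker behaves much like a suspended Python generator: local state survives between resumptions. A rough analogy (not the Jaseci implementation):

```python
def talk_session():
    """Analogy for a yielded walker: local state survives between calls."""
    history = []
    response = None
    while True:
        question = yield response   # suspend; resume with the next question
        history.append(question)    # context retained across turns
        response = f"Turn {len(history)}: you asked {question!r}"

session = talk_session()
next(session)  # prime the generator up to its first yield
print(session.send("i want a test drive"))  # Turn 1: ...
print(session.send("my name is Elon"))      # Turn 2: history retained
```

In Jaseci, this suspension and per-user bookkeeping is handled for you; each user gets their own retained walker context.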

In the case of our conversational AI system, it is essential for our walker to remember the context information gained from previous interactions with the same user. So let's update our walker with yield.

walker talk {
    has question, interactive = false;
    has wlk_ctx = {
        "intent": null,
        "entities": {},
        "prev_state": null,
        "next_state": null,
        "respond": false
    };
    has response;
    root {
        take --> node::dialogue_root;
    }
    cai_state {
        if (!question and interactive) {
            question = std.input("Question (Ctrl-C to exit)> ");
            here::init_wlk_ctx;
        } elif (!question and !interactive){
            std.err("ERROR: question is required for non-interactive mode");
            disengage;
        }
        here::nlu;
        here::process;
        if (visitor.wlk_ctx["respond"]) {
            here::nlg;
            if (interactive): std.out(response);
            else {
                yield report response;
                here::init_wlk_ctx;
            }
            question = null;
            take here;
        } else {
            take visitor.wlk_ctx["next_state"] else: take here;
        }
    }
}

Two new syntax here:

  • report returns a variable from the walker to its caller. When calling a walker via its REST API, the content of the API response payload will be what is reported.
  • yield report is a shorthand for yielding and reporting at the same time. It is equivalent to yield; report response;.

Introduce sentinel

sentinel is the overseer of walkers, nodes and edges. It is the abstraction Jaseci uses to encapsulate compiled walkers and architype nodes and edges. The key operation with respect to a sentinel is to "register" it. You can think of registering a sentinel as compiling your jac program. The walkers of a given sentinel can then be invoked and run on arbitrary nodes of any graph.

Let's register our jac program

jaseci > sentinel register tesla_ai.jir -set_active true -mode ir

Three things are happening here:

  • First, we registered the jir we compiled earlier to a new sentinel. This means this new sentinel now has access to all of our walkers, nodes and edges. The -mode ir option specifies that a jir program is being registered instead of a jac program.
  • Second, with -set_active true we set this new sentinel to be the active sentinel. In other words, this sentinel is the default one to be used when requests hit the Jac APIs, if no specific sentinel is specified.
  • Third, sentinel register automatically creates a new graph (if there is no currently active graph) and runs the init walker on that graph. This behavior can be customized with the options -auto_run and -auto_create_graph.

To check your graph

jaseci > graph get -mode dot

This will return the current active graph in DOT format. This is the same output we get from running jac dot earlier. Use this to check if your graph is successfully created.

Once a sentinel is registered, you can update its jac program with

jaseci > sentinel set -snt SENTINEL_ID -mode ir tesla_ai.jir

To get the sentinel ID, you can run one of the two following commands

jaseci > sentinel get

or

jaseci > sentinel list

sentinel get returns the information about the current active sentinel, while sentinel list returns all available sentinels for the user. The output will look something like this

{
  "version": null,
  "name": "main.jir",
  "kind": "generic",
  "jid": "urn:uuid:817b4ff4-e6b7-4296-b383-55515e1e8b4a",
  "j_timestamp": "2022-08-04T20:23:16.952641",
  "j_type": "sentinel"
}

The jid field is the ID for the sentinel (jid stands for jaseci ID).
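If you are scripting against these commands, the sentinel ID can be pulled out of the JSON output programmatically. A minimal Python sketch, using the example output above:

```python
import json

# Example output captured from `sentinel get` (as shown above).
sentinel_get_output = """
{
  "version": null,
  "name": "main.jir",
  "kind": "generic",
  "jid": "urn:uuid:817b4ff4-e6b7-4296-b383-55515e1e8b4a",
  "j_timestamp": "2022-08-04T20:23:16.952641",
  "j_type": "sentinel"
}
"""

sentinel = json.loads(sentinel_get_output)
sentinel_id = sentinel["jid"]
# sentinel_id == "urn:uuid:817b4ff4-e6b7-4296-b383-55515e1e8b4a"
```

This jid is what you would pass to the -snt flag of sentinel set.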

With a sentinel and graph, we can now run a walker with

jaseci > walker run talk -ctx "{\"question\": \"I want to schedule a test drive\"}"

And with yield, the next walker run will pick up where it left off, retaining its variable states and node traversal plan.

Tests

Just like any program, a set of automated test cases with robust coverage is essential to the success of the program through development to production. Jac has built-in test support; here is how you create a test case in jac.

import {*} with "tesla_ai.jac";

test "testing the Tesla conv AI system"
with graph::tesla_ai by walker::talk(question="Hey I would like to go on a test drive"){
    res = std.get_report();
    assert(res[-1] == "To set you up with a test drive, we will need your name and address.");
}

Let's break this down.

  • test "testing the Tesla conv AI system" names the test.
  • with graph::tesla_ai specifies the graph to be used as the test fixture.
  • by walker::talk specifies the walker to test. It will be spawned on the anchor node of the graph.
  • std.get_report() lets you access the report content of the walker so that you can set up any assertions necessary with assert.

To run jac tests, save the test case(s) in a file (say tests.jac) and import the necessary walkers and graphs. Then run

jaseci > jac test tests.jac

This will execute all the test cases in tests.jac sequentially and report success or any assertion failures.

Running Jaseci as a Service

So far, we have been interacting with jaseci through jsctl. jaseci can also be run as a service serving a set of RESTful API endpoints. This is useful in production settings. To run jaseci as a service, first we need to install the jaseci-serv python package.

pip install jaseci-serv

Important

As a best practice, it is recommended to always use the same jsctl version (installed as part of the jaseci package) and jsserv version (installed with the jaseci-serv python package). You can install a specific version of either package via pip install PACKAGE_NAME==PACKAGE_VERSION.

Since this is the first time we are running a jaseci server, a few commands are required to set up the database

jsserv makemigrations base
jsserv makemigrations
jsserv migrate

The above commands essentially initialize the database schemas.

Important

The above commands are only required the first time you are starting a jsserv instance. These commands will create a mydatabase file in the current directory as the storage for the database.

We will also need an admin user so we can log into the jaseci server. To create an admin user, run

jsserv createsuperuser

And follow the command line prompts to create a super user. For the purpose of this codelab, we are going to use the following credentials:

Email: admin@j.org
Password: JaseciAdmin

Then launch the jaseci server with

jsserv runserver

You should see an output that looks like the following

$ jsserv runserver
Watching for file changes with StatReloader
Performing system checks...
System check identified no issues (0 silenced).
October 24, 2022 - 18:27:14
Django version 3.2.15, using settings 'jaseci_serv.jaseci_serv.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.

Take note of the http://127.0.0.1:8000/. This is the URL of your jsserv instance. In this case, 127.0.0.1 means it is live on localhost.

To access the server via jsctl, we just need to login to the server first before running any jsctl commands

jsctl
jaseci > login http://localhost:8000/
Username:
Password:

Follow the prompts to provide the email and password you used to create the superuser earlier. In our case, it will be

jsctl
jaseci > login http://localhost:8000/
Username: admin@j.org
Password: JaseciAdmin

If logged in successfully, you should see a token being returned. It will look something like this

jaseci > login http://localhost:8000
Username: yiping@jaseci.org
Password:
Token: 45ef2ac9d07aa571769c7d5452e4553a8a74b061ea621e21222789aa9904e8c7
Login successful!
@jaseci >

Important

Notice the @ symbol in front of the @jaseci > command line prompt. This indicates that your jsctl session is now logged into a jsserv instance, while jaseci > indicates it is in a local session.

While logged into the jsserv instance, you can register a sentinel on it with sentinel register command, just like how you were running it before in the local jsctl session

@jaseci > sentinel register tesla_ai.jir -set_active true -mode ir

After registering, you can then run walker run, just like before in a local jsctl session

@jaseci > walker run talk -ctx "{\"question\": \"I want to schedule a test drive\"}"

Important

If this is the first time you are running your jac program on this jsserv instance, you will also need to repeat the actions load commands to load the actions. And for any AI models, use their respective load_model action to load the trained models.

And voilà! Now you are running your jac program in a jaseci server with jsserv. The Jaseci server supports a wide range of API endpoints. All the jsctl commands we have used throughout this tutorial have an equivalent API endpoint, such as walker_run and sentinel_register. As a matter of fact, the entire development journey in this tutorial can be done completely with a remote jaseci server instance. You can go to localhost:8000/docs to check out all the available APIs.

Note

jsserv is a wrapper around the django manage.py script and it supports the commands that manage.py supports.

Note

So far we have shown how to run the jaseci server natively on a machine. If you wish to stand up a jaseci server in a kubernetes cluster, you can find an example kubernetes manifest file at https://github.com/Jaseci-Labs/jaseci/blob/main/scripts/jaseci.yaml

Manage Sentinels on the Jaseci Server

Because we have been running our jac program in a local jsctl environment under a development setting, we only needed to worry about a single user. In a production setting, our application needs to serve many users. In addition, new accounts will be created when new users sign up for our application. While it is certainly possible to register a new sentinel for every new user, that is far from ideal from a scalability standpoint. To solve this, we introduce the concept of the global sentinel.

A global sentinel is a sentinel that is exposed globally and can be set as the active sentinel for any user. To create a global sentinel, you need to log in as a user with admin privileges, so we are going to use the superuser we created earlier. We are going to first register a sentinel and name it tesla_ai_global.

@jaseci > sentinel register -set_active true -mode ir tesla_ai.jir -name tesla_ai_global

To set a sentinel as the global sentinel for the jaseci server, we run

@jaseci > global sentinel set -snt sentinel:tesla_ai_global

Now that this sentinel is the global sentinel, to activate this sentinel as the active sentinel for any given user, run the following while logged in as the user

@jaseci > sentinel active global

This will set the active sentinel for the user to the global sentinel, which in our case is named tesla_ai_global. Once this is set, any future walker run requests will by default use the global sentinel.

To update the global sentinel, run as the admin user

@jaseci > sentinel set -snt sentinel:tesla_ai_global -mode ir tesla_ai.jir

This will update the global sentinel with the updated jir code, and because this is a global sentinel, any user that has it set as their active sentinel will also effectively be running the updated code.

Manage Graphs

Now that we have the global sentinel set up, it's time to discuss the management of graphs in a production environment. Let's first define two general categories of applications: un-authenticated and authenticated. Un-authenticated applications do not require any user authentication and can be used by anyone in the world as long as they can access the URL of the application. This category of applications usually serves functions that benefit a wide audience but lacks any personalized usage, due to the lack of access to personal data that is only possible with authentication. Examples of un-authenticated applications include a chatbot that provides public service announcements or information useful to the general public. In contrast, authenticated applications require the user to log in before they can use the application. The application can then fetch information and data specific to the user interacting with it. An example of an authenticated application is a virtual assistant within your bank app that you can ask about your account balance or your recent transactions.

Both un-authenticated and authenticated applications can utilize the global sentinel pattern: developers of the application register and update a global sentinel for all users. The management of graphs, on the other hand, can differ depending on the application. For un-authenticated applications, the front-end of the application is often integrated with the backend via a service account, and all users effectively use the graph of this service account. For authenticated applications, each user will often have a corresponding account in the jaseci backend and their own graph. Therefore, a graph needs to be created and set as active for every new user. To create a graph, run

@jaseci > graph create -set_active true

And depending on the application, you might also need to initialize the graph with the init walker

@jaseci > walker run init

Note

Our tesla bot falls into the category of un-authenticated applications. Anyone can ask about FAQs and schedule a test drive. So for this tutorial, we will use the admin account as our service account and its graph as the active graph.

The Jaseci RESTful APIs

With a jaseci web server (i.e., jsserv), we also get access to the full suite of Jaseci RESTful APIs. You can go to http://localhost:8000/docs/ to check out the list of available APIs, their documentation, and their request and response payload formats. The documentation looks like this

Every jsctl command has a corresponding API endpoint. Click on the triangle to the right of an endpoint to see details on its request and response format. The command line arguments to the jsctl command become the fields in the request payload of the API endpoint. jsctl is great for rapid development and testing, but for deploying a jaseci application to production, a set of RESTful API endpoints is the industry standard. Here are the most commonly used endpoints that you should get familiar with first.

  • /user/create to create a new user
  • /js/sentinel_register to register a sentinel
  • /js/walker_run to run a walker
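The endpoints above can be exercised from any HTTP client. Here is a hedged Python sketch using only the standard library; the server URL and token are placeholders you must replace with your own jsserv URL and the token returned by login:

```python
import json
import urllib.request

def walker_run_request(base_url, token, walker, ctx):
    """Build a POST request for the /js/walker_run endpoint.
    base_url and token are your own jsserv URL and auth token."""
    payload = {"name": walker, "ctx": ctx}
    return urllib.request.Request(
        url=f"{base_url}/js/walker_run",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"token {token}",
        },
        method="POST",
    )

req = walker_run_request(
    "http://localhost:8000", "YOUR_TOKEN_HERE",
    "talk", {"question": "I want to schedule a test drive"},
)
# To actually send it: urllib.request.urlopen(req).read()
```

The payload shape (name plus ctx) is exactly what the webapp example in the next section sends from JavaScript.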

Integration with a Webapp

We are now going to show an example of how to integrate a frontend with our tesla chatbot. We have provided a template web-based frontend at https://github.com/Jaseci-Labs/jaseci/blob/main/examples/CanoniCAI/code/html/. This is a simple chatbot frontend that supports both voice and text input. It also converts the response text to speech and speaks it back.

Here is a screenshot of the UI. You can click on the microphone button to talk to it or use the textbox below for a text input.

The web frontend communicates with the Jaseci backend via HTTP requests. Here is the relevant code, where the frontend makes a POST request to the /js/walker_run API to run the talk walker and ask a question:

const getAnswer = async (question) => {
      let data = {
        ctx: { "question": question },
        name: "talk"
      }

      try {
        // NOTE: Change this URL to your Jaseci server URL.
        // NOTE: Change the token to your authenticated token
        let result = await fetch('http://localhost:8000/js/walker_run', {
          method: 'POST',
          mode: 'cors',
          headers: {
            'Content-Type': 'application/json',
            'Authorization': 'token bf6c3138799af356cbec27da90de0f7476fd9e25059c83dc0dfdd339ff68dd5b'
          },
          body: JSON.stringify(data),
        })
        result = await result.json();

        const answer = result.report[0];

        document.querySelector('#answer').innerHTML = answer;
        speech.text = answer.replace(/https?.*?(?= |$)/g, "");
        var voices = window.speechSynthesis.getVoices();
        speech.voice = voices[7];
        speechSynthesis.getVoices().forEach(function (voice) {
          console.log(voice.name, voice.default ? voice.default : '');
        });

        // Start Speaking
        window.speechSynthesis.speak(speech);

      } catch (error) {
        console.log(error)
      }
    }

Let's quickly dissect this API call.

  • It is sending a POST request, as specified by the method field.
  • It is sending the request to the URL http://localhost:8000/js/walker_run. You should replace the localhost:8000 with your own jaseci server URL.
  • The request payload is a JSON data structure stored in data, as follows. The name field specifies the name of the walker to run, and ctx is a dictionary containing all necessary parameters to the walker, just like what we have been doing with walker run in jsctl. In this webapp, the question is pulled from either the speech-to-text engine or the text input.
{
    "ctx": {
        "question": "USER QUESTION",
    },
    "name": "talk"
}

Note

You need to update the webapp to point to your own jaseci server URL (line 360 of index.html) as well as an updated authentication token (line 365 of index.html), which you can obtain by logging in via jsctl.

Improve Your AI Models with Crowdsource

Coming soon!

Build a Custom Jaseci Module

In this tutorial, you are going to learn how to build a custom jaseci module with Python. We will create a basic calculator module for jaseci.

Excited? Hell yeah! Let's jump in.

Preparation

Let's start by creating a folder called calculator in the root directory of your application. After creating the folder, create a file named calculator.py inside of the calculator folder.

Note

We are using Python to create the custom jaseci module, so you will need .py files, not .jac files.

After creating the file, open the file in a code editor and let's start coding our module.

from jaseci.actions.live_actions import jaseci_action

First, we import the jaseci_action decorator into calculator.py. We will use it to load the module into jaseci.

@jaseci_action(act_group=["calculator"], allow_remote=True)

In this block:

  • act_group is the name of the jaseci action group used when loading the module; it becomes the prefix for calling the action (here, calculator.add).
  • allow_remote indicates whether this action can be run remotely.

We will be adding onto the file.

@jaseci_action(act_group=["calculator"], allow_remote=True)
def add(first_number: int, second_number: int):
    return first_number + second_number

This function adds the two numbers passed as parameters and returns their sum.

Note

Make a practice of adding type hints to the parameters, e.g. first_number: int, because jaseci_actions uses them for validation, both remotely and within the jaseci application.
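To see why the hints matter, here is a rough sketch (not Jaseci's actual implementation) of how a framework can validate arguments against a function's type annotations:

```python
import inspect

def validate_call(fn, **kwargs):
    """Check each keyword argument against the function's type annotations."""
    sig = inspect.signature(fn)
    for name, value in kwargs.items():
        annotation = sig.parameters[name].annotation
        # only enforce parameters that actually carry an annotation
        if annotation is not inspect.Parameter.empty and not isinstance(value, annotation):
            raise TypeError(f"{name} must be {annotation.__name__}")
    return fn(**kwargs)

def add(first_number: int, second_number: int):
    return first_number + second_number

validate_call(add, first_number=1, second_number=2)   # returns 3
# validate_call(add, first_number="1", second_number=2) would raise TypeError
```

A parameter without a type hint would silently skip validation in a scheme like this, which is why the hints are worth the habit.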

Loading the custom module (API)

In this section, I will run you through how to load the custom module through the API.

> uvicorn calculator:serv_actions 

We use uvicorn to run modules remotely.

Note

calculator is the module name (the path to calculator.py, without its extension), and serv_actions exposes all of the module's functions remotely at once.

WARNING:  ASGI app factory detected. Using it, but please consider setting the --factory flag explicitly.
INFO:     Started server process [15604]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)

If you see output like this, you are ready to test out the jaseci custom module.

Go to http://localhost:8000/docs and you can test out your module to see if it works remotely.

Loading the custom module (JAC)

In this section, I will run you through how to load the custom module through the Jac application.

> actions load local calculator/calculator.py

Since we are loading our own module from a local file, we use the term local instead of module or remote, followed by the path to where the module is located.

{
  "success": true
}

You should see this after running the command. If you do, you have successfully built a custom Jaseci module.

How to use the custom module (JAC)

In this section we will show you how to use the custom module in the jac application.

Create a file named main.jac and add the following code.

walker init {
    can calculator.add;
    report calculator.add(1,1);
}

This line loads the module's action in the form can <act_group>.<function>;

can calculator.add;

We will report the result from the calculation.

report calculator.add(1,1);

The following will be the result after running the init walker.

{
  "success": true,
  "report": [
    2
  ],
  "final_node": "urn:uuid:04e97f70-26b3-467e-a291-bd03b18e7a6d",
  "yielded": false
}

Once you see that output, everything is working perfectly. Simple, right? Hope you learned something new today.

Creating A Custom AI Jaseci Module

In this section, we will be creating a t5 based summarization module for jaseci. So let's get started.

Imports

import torch
from jaseci.actions.live_actions import jaseci_action
from transformers import T5Tokenizer, T5ForConditionalGeneration  # , T5Config

In this block:

  • We import torch, an open-source machine learning framework that accelerates the path from research prototyping to production deployment.
  • We also import jaseci_action so we can attach our function to the jac application.
  • The T5 model and its tokenizer are provided by the transformers library.

Bringing in the models

model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = T5Tokenizer.from_pretrained("t5-small")
device = torch.device("cpu")

In this block:

  • We use the t5-small pretrained model, because these models can be very big.

Generating summary based on text

def t5_generate_sum(text, min_length, max_length):

Here, we create a function that generates a summary. It takes text (the body of text you want to summarize), min_length (the minimum length of the summary the model should produce), and max_length (the maximum length of the summary). Let's go to the next line.

preprocess_text = text.strip().replace("\n", "")

This removes newlines from any text the user passes in; stray newlines can confuse the model and degrade the quality of the summary.

t5_prepared_Text = "summarize: " + preprocess_text

The T5 summarization model requires that you prepend summarize: to the body of text to be summarized.
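The two preprocessing steps can be sketched independently of the model; prepare_t5_input is a hypothetical helper name used here for illustration:

```python
def prepare_t5_input(text: str) -> str:
    # strip surrounding whitespace and remove embedded newlines
    preprocess_text = text.strip().replace("\n", "")
    # T5 expects a task prefix; for summarization it is "summarize: "
    return "summarize: " + preprocess_text

print(prepare_t5_input("A first line.\nA second line.\n"))
# summarize: A first line.A second line.
```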

tokenized_text = tokenizer.encode(t5_prepared_Text, return_tensors="pt").to(device)

This will encode the text so the AI model can understand and process it.

summary_ids = model.generate(
        tokenized_text,
        num_beams=4,
        no_repeat_ngram_size=2,
        min_length=min_length,
        max_length=max_length,
        early_stopping=True,
    )

Using the T5 model, this generates the summary based on the parameters we passed in: tokenized_text, min_length, max_length, and so on.

output = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
return output

Here, we decode the encoded summary generated by the model back into text and return it to the user.

Function to return summary to Jac or API

@jaseci_action(act_group=["t5_sum"], allow_remote=True)
def classify_text(text: str, min_length: int = 30, max_length: int = 100):
    output = t5_generate_sum(text, min_length, max_length)
    return output

In this block:

  • Having created the function that generates the summary, we need a jaseci action function that binds the summarization module to jac and to the API.
  • Here we name the action group t5_sum.

Full Code

import torch
from jaseci.actions.live_actions import jaseci_action
from transformers import T5Tokenizer, T5ForConditionalGeneration  # , T5Config

# from fastapi import HTTPException

model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = T5Tokenizer.from_pretrained("t5-small")
device = torch.device("cpu")


# generates summary based on text
def t5_generate_sum(text, min_length, max_length):
    preprocess_text = text.strip().replace("\n", "")
    t5_prepared_Text = "summarize: " + preprocess_text

    tokenized_text = tokenizer.encode(t5_prepared_Text, return_tensors="pt").to(device)

    summary_ids = model.generate(
        tokenized_text,
        num_beams=4,
        no_repeat_ngram_size=2,
        min_length=min_length,
        max_length=max_length,
        early_stopping=True,
    )

    output = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

    return output


# summarize a large body of text using t5 model (small model)
# which returns data at a fast rate.
@jaseci_action(act_group=["t5_sum"], allow_remote=True)
def classify_text(text: str, min_length: int = 30, max_length: int = 100):
    output = t5_generate_sum(text, min_length, max_length)
    return output

Once you have completed these steps, load the module using the actions load local command as shown below:

> actions load local path/to/t5_sum.py

Good luck! This is how you create a custom AI module using Python in Jaseci.

Running Action Library as Independent Server

In this tutorial, we will discuss how to use uvicorn to stand up a Jaseci action library as an independent server. We use jaseci_ai_kit for this example.

There are two ways to stand up a jaseci_ai_kit server; we will explore both in the following sections.

Through the installed jaseci_ai_kit pip package

First, install the jaseci_ai_kit package:

  • pip install jaseci_ai_kit

Run the following command to stand up the server.

  • uvicorn jaseci_ai_kit.[ai_model]:serv_actions

We will stand up cl_summer (the summarization AI model). The command is as follows:

  • uvicorn jaseci_ai_kit.cl_summer:serv_actions

Once run, successful output looks like this:

INFO:     Started server process [13024]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)

To use it in your jaseci app, you will have to load the action using the following command.

  • actions load remote http://127.0.0.1:8000

Let's walk you through the other way.

Through the repo

After you download the jaseci repo from git, first change directory to the location below.

  • cd jaseci_kit/jaseci_kit/modules/cl_summer

Note

You can use this process to stand up custom jaseci modules

To stand it up locally you will run the following command in terminal:

  • uvicorn [name_of_file]:serv_actions, in this case uvicorn cl_summer:serv_actions

If you don't want the default host or port, you can use this command:

  • uvicorn cl_summer:serv_actions --host 0.0.0.0 --port 9000

After you run the command, successful output looks like this:

WARNING:  ASGI app factory detected. Using it, but please consider setting the --factory flag explicitly.

INFO:     Started server process [42808]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:9000 (Press CTRL+C to quit)

After the server starts, open the following URL in your browser:

  • http://0.0.0.0:9000/docs

To use it in your jaseci app, you will have to load the action using the following command.

  • actions load remote http://0.0.0.0:9000

Congratulations, you have successfully used uvicorn to stand up a jaseci_ai_kit server locally.

Request Action Library

This section covers how to use the built-in requests action library to make outgoing API calls from Jac to third-party APIs.

Introduction

First of all, there are four types of request actions in jaseci, as follows:

  • POST
  • GET
  • PUT
  • DELETE

There are three standard parameters for each request:

  • url (string): the URL to which the request is made
  • data (dict): the payload to pass with the request
  • headers (dict): the header information for the request

GET

GET is used to request data from a specified resource.

request.get(url, data, headers);

POST

POST is used to send data to a specified resource.

request.post(url, data, headers);

PUT

PUT is used to update data at a specified resource.

request.put(url, data, headers);

DELETE

DELETE is used to delete data from a specified resource.

request.delete(url, data, headers);
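All four actions share the same (url, data, headers) shape. A plain-Python sketch of that interface, with a stub transport standing in for the real HTTP client (an assumption for illustration, not Jaseci's implementation):

```python
def make_request_action(method, transport):
    """Build a request action bound to one HTTP verb."""
    def action(url, data, headers):
        # every request action forwards the same three parameters
        return transport(method, url, data, headers)
    return action

# Stub transport that simply echoes the call it receives, so no network is needed.
def echo_transport(method, url, data, headers):
    return {"method": method, "url": url, "data": data, "headers": headers}

get = make_request_action("GET", echo_transport)
post = make_request_action("POST", echo_transport)
put = make_request_action("PUT", echo_transport)
delete = make_request_action("DELETE", echo_transport)

result = post("https://jsonplaceholder.typicode.com/todos/", {"title": "hi"}, {})
# result records the verb, URL, payload, and headers it was called with
```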

EXAMPLE

So let's create a quick, simple RESTful application example (a TODO app) using JSONPlaceholder; you can add this to CanoniCAI and use it.

walker get_todo {
    has uid;
    has title;
    has completed = false;
    
    has url = "https://jsonplaceholder.typicode.com/todos/1";
    has headers = {};

    report request.get(url, {}, headers);
}

walker post_todo {
    has uid;
    has title;
    has completed = false;
    
    has url = "https://jsonplaceholder.typicode.com/todos/";
    has headers = {};
    
    report request.post(url, {"userId": uid, "title": title, "completed": completed}, headers);
}

walker put_todo {
    has uid;
    has title;
    has completed = false;
    
    has url = "https://jsonplaceholder.typicode.com/todos/1";
    has headers = {};


    report request.put(url, {"userId": uid, "title": title, "completed": completed}, headers);
}

walker delete_todo {
    has uid;
    has title;
    has completed = false;
    
    has url = "https://jsonplaceholder.typicode.com/todos/1";
    has headers = {};

    report request.delete(url, {}, headers);
}

Let's test the application we built. Create a file named api.jac and copy all the code into it. Great, let's run each walker.

First let's run get_todo walker:

  • jac run api.jac -walk get_todo

Let's see the result:

jaseci > jac run api.jac -walk get_todo  
{
  "success": true,
  "report": [
    {
      "status_code": 200,
      "response": {
        "userId": 1,
        "id": 1,
        "title": "delectus aut autem",
        "completed": false
      }
    }
  ],
  "final_node": "urn:uuid:12a5affa-c0a2-4959-9e3d-54e3f4cd4ca1",
  "yielded": false
}

post_todo walker:

  • jac run api.jac -walk post_todo
jaseci > jac run api.jac -walk post_todo
{
  "success": true,
  "report": [
    {
      "status_code": 201,
      "response": {
        "userId": null,
        "title": null,
        "completed": false,
        "id": 201
      }
    }
  ],
  "final_node": "urn:uuid:eaca3dfa-3abc-4c03-9d55-d2e5cdf1b1e6",
  "yielded": false
}

put_todo walker:

  • jac run api.jac -walk put_todo -ctx "{\"id\": 201, \"title\":\"hi\"}"
jaseci > jac run api.jac -walk put_todo -ctx "{\"id\": 201, \"title\":\"hi\"}" 
{
  "success": true,
  "report": [
    {
      "status_code": 200,
      "response": {
        "userId": null,
        "title": "hi",
        "completed": false,
        "id": 1
      }
    }
  ],
  "final_node": "urn:uuid:393a0094-57d5-4745-944d-9fac007edc38",
  "yielded": false
}

delete_todo walker:

  • jac run api.jac -walk delete_todo
jaseci > jac run api.jac -walk delete_todo
{
  "success": true,
  "report": [
    {
      "status_code": 200,
      "response": {}
    }
  ],
  "final_node": "urn:uuid:6f90aac5-7284-48c7-b5df-1abc64dfdf10",
  "yielded": false
}

So now everything is working. You can implement this setup in your own code and have fun.

All you have to do now is build and set the sentinel, call these walkers, and you will be able to use the functionality.

Key Language Level Abstractions and Concepts

There are a number of abstractions and concepts in Jac that are distinct from most (all?) other languages. These are a good place to begin for a seasoned or semi-seasoned programmer.

Graphs

The graph is the only data structure used here. Jaseci believes that every computational problem can be mapped into a graph structure and can be solved by traversing and executing nodes in the graph.

Walkers

A walker is an execution unit that moves (traverses) through a graph while preserving its state (its local scope). No prior programming language has semantics quite like this. You can imagine a walker as a little, self-contained robot that maintains context while it moves spatially around a graph, interacting with the context of its nodes and edges.

Abilities

Nodes and edges in the graph, as well as walkers, can have abilities. Although they don't have the same semantics as a typical function, abilities are most nearly comparable to methods in traditional object-oriented programming. You can think of abilities as independent in-memory/in-data computing activities.

Actions

Actions serve as function calls with returns and enable bindings to functionality implemented outside of Jac/Jaseci. These are comparable to library calls in conventional programming languages. In practice, this external functionality takes the form of a Jaseci action library's direct connection to Python implementations.

Graphs

In Jaseci, we elect to assume the following semantics for graphs:

  1. Graphs are directed, with a special case of a doubly directed edge type that can be used practically as an undirected edge.
  2. Both nodes and edges have their own distinct identities (i.e., an edge isn't representable as a pairing of two nodes). This point is important, as both nodes and edges can have contexts.
  3. Multigraphs (i.e., parallel edges) are allowed, including self-loop edges.
  4. Graphs are not required to be acyclic.
  5. No hypergraphs, as I wouldn't want Jaseci programmers' heads to explode.
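Points 1 through 3 can be illustrated with a minimal sketch of such a graph store in plain Python (an illustration only, not Jaseci's data model):

```python
import itertools

_ids = itertools.count()

class Node:
    def __init__(self, **context):
        self.id = next(_ids)           # nodes carry their own identity...
        self.context = context         # ...and their own context

class Edge:
    def __init__(self, src, dst, **context):
        self.id = next(_ids)           # edges have distinct identity too,
        self.src, self.dst = src, dst  # not just a pairing of two nodes
        self.context = context         # edges can carry context as well

a, b = Node(name="a"), Node(name="b")
e1 = Edge(a, b, kind="friend")
e2 = Edge(a, b, kind="family")  # parallel edges between a and b: allowed
loop = Edge(a, a)               # self-loop: allowed
```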

Walkers

One of the most important abstractions introduced in Jaseci is that of the walker. The semantics of this abstraction is unlike any that has existed in any programming language before.

In a nutshell, a walker is a unit of execution that retains state (its local scope) as it travels over a graph. Walkers walk from node to node, executing their body on each. The walker's body is specified with opening and closing braces ({ }) and is executed to completion on each node it lands on. In this sense, a walker iterates while spooling through a sequence of nodes that it 'takes' using the take keyword. We call each of these iterations node-bound iterations.

Variables declared in a walker's body take two forms: its context variables, which retain state as the walker travels from node to node in a graph, and its local variables, which are reinitialized for each node-bound iteration.
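A loose Python analogy for the two kinds of variables (an illustration, not how Jaseci is implemented): context variables behave like instance attributes that persist across node visits, while local variables are like function locals, reinitialized on every visit:

```python
class WalkerAnalogy:
    def __init__(self):
        self.visited_count = 0    # like a context variable: persists across nodes

    def visit(self, node_value):
        doubled = node_value * 2  # like a local variable: fresh on every node
        self.visited_count += 1
        return doubled

w = WalkerAnalogy()
results = [w.visit(n) for n in [1, 2, 3]]
# results == [2, 4, 6]; w.visited_count == 3
```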

Walkers present a new way of thinking about programmatic execution, distinct from the near-ubiquitous function-based abstraction of other languages. Instead of a function's scope being temporally pushed onto an ever-growing stack as functions call other functions, scopes can be spatially laid out on a graph, and walkers hop around the graph taking their scope with them. A key difference in this model is its introduction of data-spatial problem solving: in the function-based model, a scope becomes inaccessible upon the sub-call of a function until that function returns; in contrast, walkers can access any scope at any time in a modular way.

When solving problems with walkers, a developer can think of the walker as a little self-contained robot or agent that retains context as it spatially moves about a graph, interacting with the context in the nodes and edges of that graph.

In addition to the take command, which supports new types of control flow for node-bound iterations, the keywords disengage, skip, and ignore are also introduced. These instruct walkers to stop walking the graph, skip over a node for execution, and ignore certain paths of the graph, respectively.

Understanding Walkers by Example

When we run a jac program, by default it executes the init walker. Basically, the init walker works like the main method in other programming languages. Save the following code as main.jac and run it in the jsctl shell with jac run main.jac.

Example 1:

walker init{
    std.out("This is from init walker \n");
}

Output 1:

    This is from init walker

As you can see, this code has executed the init walker. Now let's create another walker;

Example 2:

walker second_walker{
    std.out("This is from second walker \n");
}

walker init{
    std.out("This is from init walker");
    root{
        spawn here walker::second_walker;
    }
}

Output 2:

    This is from init walker
    This is from second walker

The statements from the second walker and init are printed in the jac shell, and we may run just second_walker directly by using the command jac run main.jac -walk second_walker. Here, the -walk parameter instructs jsctl to execute a particular walker.

Walkers Navigating Graphs Examples

As mentioned earlier, walkers can traverse (walk) through the nodes of the graph in either a breadth-first search (BFS) or depth-first search (DFS) manner.

Note

BFS is a traversal approach that begins at the root node and walks through all nodes on the same level before moving on to the next level. DFS is a traversal approach that begins at the root node and proceeds through the nodes as far as possible, until reaching a node with no unvisited neighboring nodes, before backtracking.

We are creating the following graph to demonstrate walker traversal in the coming sections:

Example Graph - Navigating

Jaseci introduces a handy command called take to instruct the walker to navigate through nodes. See how it works in the following example:

Example 3:

node plain: has number;

## defining the graph
graph example {
    has anchor head;
    spawn {
        n=[];
        for i=0 to i<7 by i+=1 {
            n.l::append(spawn node::plain(number=i+1));
        }

        n[0] --> n[1] --> n[2];
        n[1] --> n[3];
        n[0] --> n[4] --> n[5];
        n[4] --> n[6];
        head=n[0];
        }
    }

#init walker traversing
walker init {
    root {
        start = spawn here --> graph::example;
        take-->;
        }
    plain {
        std.out(here.number);
        take-->;
    }
}

Output 3:

1
2
5
3
4
6
7

The take command lets the walker traverse through graph nodes. You may notice that, by default, a walker traverses with the take command using the breadth-first search approach. But the take command is flexible: you can indicate whether it should use a depth-first or a breadth-first traversal. Look at the following example:

Example 4:

node plain: has name;

## defining the graph
graph example {
    has anchor head;
    spawn {
        n=[];
        for i=0 to i<7 by i+=1 {
        n.l::append(spawn node::plain(name=i+1));
        }
        n[0] --> n[1] --> n[2];
        n[1] --> n[3];
        n[0] --> n[4] --> n[5];
        n[4] --> n[6];
        head=n[0];
        }
    }

## walker for breadth first search
walker walk_with_breadth {
    has anchor node_order = [];
    node_order.l::append(here.name);
    take:bfs -->; #can be replaced with take:b -->
    }

walker walk_with_depth {
    has anchor node_order = [];
    node_order.l::append(here.name);
    take:dfs -->; #can be replaced with take:d -->
    }

walker init {
    start = spawn here --> graph::example;
    b_order = spawn start walker::walk_with_breadth;
    d_order = spawn start walker::walk_with_depth;
    std.out("Walk with Breadth:",b_order,"\nWalk with Depth:",d_order);
    }

Output 4:

Walk with Breadth: [1, 2, 5, 3, 4, 6, 7]
Walk with Depth: [1, 2, 3, 4, 5, 6, 7]

As you may see in the above example, take:bfs --> and take:dfs --> instruct the walker to traverse breadth-first or depth-first accordingly. Additionally, you can use the shorthand take:b --> or take:d -->.
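The two orders above can be reproduced with a plain-Python sketch of the same seven-node graph; this models only the traversal orders, not Jaseci's engine:

```python
from collections import deque

# adjacency list for the example graph: node value -> child values
graph = {1: [2, 5], 2: [3, 4], 3: [], 4: [], 5: [6, 7], 6: [], 7: []}

def bfs(start):
    order, queue = [], deque([start])
    while queue:
        node = queue.popleft()   # FIFO queue: finish a level before descending
        order.append(node)
        queue.extend(graph[node])
    return order

def dfs(start):
    order, stack = [], [start]
    while stack:
        node = stack.pop()       # LIFO stack: descend as deep as possible first
        order.append(node)
        stack.extend(reversed(graph[node]))  # reversed keeps left-to-right order
    return order

print(bfs(1))  # [1, 2, 5, 3, 4, 6, 7]
print(dfs(1))  # [1, 2, 3, 4, 5, 6, 7]
```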

Skipping and Disengaging

Jac offers a couple more useful control statements, skip and disengage, that are pretty convenient when walkers traverse graphs with take commands.

Skipping

The idea behind skip in the context of a walker's code block is that it tells the walker to halt and abandon any unfinished work on the current node, in favor of moving to the next node (or completing the walk if no nodes are queued up).

Note

Node/edge abilities also support the skip directive. In that context, skip merely skips the remaining steps of that ability itself.

Let's change the init walker of Example 3 to demonstrate how the skip command works:

Example 5:

.
.
.

#init walker traversing
walker init {
    root {
        start = spawn here --> graph::example;
        take-->;
        }
    plain {
        ## Skipping the nodes with even numbers
        if(here.number % 2==0): skip;
        std.out(here.number);
        take-->;
    }
}

Output 5:

1
5
7

Now it is evident that when the node number is even, the example above skips code execution for that node. The line if(here.number % 2 == 0): skip; tells the walker to skip nodes with an even number.

The skip command "breaks" out of a walker or ability rather than a loop, but otherwise has semantics that are nearly comparable to the standard break command in other programming languages.

Disengaging

The command disengage tells the walker to stop all execution and "disengage" from the graph (i.e., stop visiting nodes anymore from here) and can only be used inside the code body of a walker.

To demonstrate how the disengage command functions, let's once more utilize the init walker from example 3;

Example 6:

.
.
.

#init walker traversing
walker init {
    root {
        start = spawn here --> graph::example;
        take-->;
        }
    plain {
        ## Stopping execution at the node with number equal to 5
        if(here.number==5): disengage;
        std.out(here.number);
        take-->;
    }
}

Output 6

1
2

The init walker in this example is nearly identical to the code in Example 5, but we added the condition if(here.number == 5): disengage;. This causes the walker to halt execution and finish its walk, thus truncating the output array.

Note

In addition to a standard disengage, Jac also supports a disengage-report shorthand of the form disengage report "I'm disengaging";. This directive delivers a final report before the disengage takes place.

Technical Semantics of Skip and Disengage

It's important to remember a few key semantic differences between skip and disengage commands.

- The 'skip' statement can be used in the code bodies of walkers and abilities.
- The 'disengage' statement can only be used in the code body of walkers.
- 'skip' and 'disengage' statements have no effect on a walker's with exit block. Any code in a walker's with exit block will run as soon as the walker exits the graph.
- An easy way to think about these semantics is as similar to the behavior of a traditional return (skip) and a return and stop walking (disengage).
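As a rough model (plain Python, not Jaseci's engine), you can think of the node body as a function called once per queued node: skip returns early without taking any edges, while disengage abandons the queue entirely. This sketch reproduces the outputs of Examples 5 and 6:

```python
from collections import deque

# the example graph: node value -> child values
graph = {1: [2, 5], 2: [3, 4], 3: [], 4: [], 5: [6, 7], 6: [], 7: []}

def walk(body):
    """Breadth-first walk; `body` returns children to take, or "disengage"."""
    out, queue = [], deque([1])
    while queue:
        node = queue.popleft()
        result = body(node, out)
        if result == "disengage":
            break                # disengage: stop the whole walk
        queue.extend(result)
    return out

def skip_evens(node, out):
    if node % 2 == 0:
        return []                # skip: abandon the rest of the body, take nothing
    out.append(node)             # std.out(here.number);
    return graph[node]           # take -->;

def stop_at_five(node, out):
    if node == 5:
        return "disengage"
    out.append(node)
    return graph[node]

print(walk(skip_evens))    # [1, 5, 7]
print(walk(stop_at_five))  # [1, 2]
```

Note how skipping an even node also skips its take, which is why nodes 3 and 4 never appear in the first output.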

Ignoring and Deleting

The Jaseci walkers also have two more useful commands: ignore and destroy.

Ignoring

The quite handy ignore command from Jaseci allows you to skip (ignore) visiting certain nodes or edges while traversing.

Example 7:

node person: has name;
edge family;
edge friend;

walker build_example {
    spawn here -[friend]-> node::person(name="Joe");
    spawn here -[friend]-> node::person(name="Susan");
    spawn here -[family]-> node::person(name="Matt");
    spawn here -[family]-> node::person(name="Dan");
    }

walker init {
    root {
        spawn here walker::build_example;
    ignore -[family]->;
    ignore -[friend(name=="Dan")]->;
    take -->;
    }
person {
    std.out(here.name);
    take-->;
    }
}

Deleting

To remove nodes or edges from the graph, Jaseci also offers the very useful destroy command. Run the example that follows using the dot command in the Jac shell, i.e. jac dot main.jac.

Example 8:

node person: has name;
edge family;
edge friend;

walker build_example {
    spawn here -[friend]-> node::person(name="Joe");
    spawn here -[friend]-> node::person(name="Susan");
    spawn here -[family]-> node::person(name="Matt");
    spawn here -[family]-> node::person(name="Dan");
}

walker init {
    root {
        spawn here walker::build_example;
    for i in -[friend]->: destroy i;
    take -->;
    }

person {
    std.out(here.name);
    take-->;
}
}

The magic line in the above code is for i in -[friend]->: destroy i;. It instructs the walker to remove all nodes connected by friend edges. Try playing with the code by removing and adding the destroy command.

Graph before destroy command: Example Graph - Deleting 1
Graph after destroy command: Example Graph 2 - Deleting 2

Note

To visualize the dot output, you can use Graphviz. An online version is available here.

Reporting Back as you Travel

The report command in jac resembles a conventional logging function in certain ways. It records the state of each node the walker visits while traversing.

Example 9:

node person: has name;
edge family;
edge friend;

walker build_example {
spawn here -[friend]-> node::person(name="Joe");
spawn here -[friend]-> node::person(name="Susan");
spawn here -[family]-> node::person(name="Matt");
spawn here -[family]-> node::person(name="Dan");
}

walker init {
    root {
        spawn here walker::build_example;
        spawn -->[0] walker::build_example;
        take -->;
    }
person {
        report here; # report print back on disengage
        take -->;
    }
}

Output 9:

{
  "success": true,
  "report": [
    {
      "name": "person",
      "kind": "node",
      "jid": "urn:uuid:dcec06b4-4b7f-461d-bbe1-1fbe22a0ed0c",
      "j_timestamp": "2022-11-03T10:18:08.328560",
      "j_type": "node",
      "context": {
        "name": "Matt"
      }
    },
    {
      "name": "person",
      "kind": "node",
      "jid": "urn:uuid:1dde2125-f858-401e-b0e8-fc2bdb7b38fb",
      "j_timestamp": "2022-11-03T10:18:08.330218",
      "j_type": "node",
      "context": {
        "name": "Dan"
      }
    }

A portion of the final result is shown in the sample above. As the number of nodes in the graph grows, the output lengthens.

Yielding Walkers

We have so far examined walkers that carry variables and state as they move around a graph. Each time a walk completes, a walker's state is cleared by default, but node/edge state is preserved. Nevertheless, there are circumstances in which you would want a walker to maintain its state across runs, or even to pause in the middle of a walk and wait to be explicitly called again, updating a subset of its dynamic state. This is where the yield keyword comes in.

To see an example of yield in action we will modify the 'init' walker from example 9.

Example 10:

node person: has name;
edge family;
edge friend;

walker build_example {
    spawn here -[friend]-> node::person(name="Joe");
    spawn here -[friend]-> node::person(name="Susan");
    spawn here -[family]-> node::person(name="Matt");
    spawn here -[family]-> node::person(name="Dan");
}

walker init {
    root {
        spawn here walker::build_example;
        spawn -->[0] walker::build_example;
        take -->;
    }
    person {
        report here.context;
        take -->;
        yield;
    }
}

Output 10:

{
  "success": true,
  "report": [
    {
      "name": "Joe"
    }
  ],
  "final_node": "urn:uuid:b7ebf434-bd90-443a-b8e2-c29589c3da57",
  "yielded": true
}

The yield keyword in example 10 instructs the init walker to stop walking and wait to be called again, even though the walker is instructed to take --> edges. In this example, a single next node location is queued up, and the walker reports a single here.context each time it's called, taking only one edge per call.

Also note that yield can be followed by a number of operations as a shorthand. For example, take -->; and yield; could be combined into a single line with yield take -->;. We call this a yield-take. Shorthands include:

  • Yield-Take: yield take -->;
  • Yield-Report: yield report "hi";
  • Yield-Disengage: yield disengage; and yield disengage report "bye";

In each of these cases, the take, report, and disengage execute along with the yield.

Technical Semantics of Yield

There are several key semantics of yield to remember, including:

  1. Upon a yield, the report is returned back and cleared.
  2. Additional report items from further walking will be returned on subsequent yields or on walk completion.
  3. As with the take command, the entire body of the walker will execute on the current node, and the walker actually yields at the end of this execution. Note: yield can be combined with the disengage and skip commands.
  4. If a start node (also known as a "prime" node) is supplied while continuing a walker after a yield, and the walker still has other nodes scheduled to visit, it will disregard this prime node and continue from where it left off on its journey.
  5. If there are no nodes scheduled for the walker to go to next, a prime node must be specified (or the walker will continue from root by default).
  6. with entry and with exit code blocks in the walker are not executed upon continuing from a yield or upon executing a yield, respectively. Regardless of how many yields there are in between, they only execute once at the beginning and end of a walk.
  7. At the level of the master (user) abstraction, Jaseci maintains the distinction between walkers that have been yielded and need to be resumed and those that are currently being run. The semantics of walkers that are summoned as public are currently unclear. For customized yield behaviors, developers should use the more basic walker spawn and walker execute APIs.
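As a loose analogy only (Python, not Jac), a generator pauses at yield and keeps its local state until the caller resumes it, much like a yielded walker waits to be called again while retaining its context:

```python
# Loose analogy: a Python generator pauses at `yield` and preserves its
# local state until resumed, much like a yielded walker waits to be
# called again while retaining its context and queued-up nodes.
def walk(nodes):
    for node in nodes:
        yield node  # "report" one node per call, then pause

w = walk(["Joe", "Susan", "Matt", "Dan"])
print(next(w))  # first call -> 'Joe'
print(next(w))  # resumes where it left off -> 'Susan'
```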

Abilities

Nodes, edges, and walkers can have abilities. The body of an ability is specified with opening and closing braces ( { } ) within the specification of a node, edge, or walker, and specifies a unit of execution.

Abilities are most closely analogous to methods in a traditional object-oriented program; however, they do not have the same semantics as a traditional function. An ability can only interact within the scope of the context and local variables of the node/edge/walker to which it is affixed, and it has no return semantic. (It is important to note, though, that abilities can always access the scope of the executing walker using the visitor special variable, as described below.)

When using abilities, a developer can think of these as self-contained in-memory/in-data compute operations.

here and visitor

At every execution point in a Jac/Jaseci program there are two scopes visible: that of the walker, and that of the node it is executing on. These contexts can be referenced with the special variables here and visitor respectively. Walkers use here to refer to the context of the node they are currently executing on, and abilities can use visitor to refer to the context of the walker currently executing them.

Understanding Abilities by Example

You can think of abilities as methods in traditional programming; however, they are not similar in semantics:

  • Abilities can be in nodes, edges, or walkers
  • Abilities cannot interact outside of the context and local variables of the attached node, edge, or walker, and do not have a return semantic.

Basic example of node abilities

This is a very basic example of a node ability.

Example 1:

node city{
    has name;
    can set_name{ #also can use "with activity"
        name = "c1";
        std.out("Setting city name:", here.context);
    }
} 

walker build_example{
    node1 = spawn here --> node::city;
}

walker init{
    root{
        spawn here walker::build_example;
        take-->;
    }
    city{
        here::set_name;
        std.out(here.name);
    }
}

set_name is the ability defined inside the city node. This ability sets the name of the city node. here::set_name is the syntax for triggering the node ability from the walker init.

Output 1:

Setting city name: {"name": "c1"}
c1

To explore node abilities further, let's define the following graph, which represents cities and the connections between them.

Example Graph

Note

To generate random integer values we can use the rand.integer action from the rand action library; rand.integer(15,100) will output an integer value between 15 and 100.

The following example will set city names in each node;

Example 2:

node city{
    has name;
    has tourists;

    can set_tourists{ #also can use "with activity"
        tourists = rand.integer(15,100);
        std.out("Setting number of tourists in", here.context.name,"city", "to",tourists); 
    }
} 

walker build_example{
    node1 = spawn here --> node::city(name="c1");
    node2 = spawn node1 --> node::city(name="c2");
    node3 = spawn node2 --> node::city(name="c3");
    here --> node2;
    node1 --> node3;
}

walker init{

    root{
        spawn here walker::build_example;
        take-->;
    }

    city{
        here::set_tourists;
        take-->;
    }
}

set_tourists is the node ability in the city node. here::set_tourists triggers the node ability inside the init walker. To get a variable's value from the current context, here.context.{variable_name} has been used; look at the std.out statement inside the set_tourists node ability. The node ability can also be defined as can set_tourists with activity {}. Both definitions work the same way.

Run the example code to obtain the following output.

Output 2:

Setting number of tourists in c1 city to 47
Setting number of tourists in c2 city to 15
Setting number of tourists in c2 city to 69
Setting number of tourists in c3 city to 89
Setting number of tourists in c3 city to 51
Setting number of tourists in c3 city to 44

The init walker visits c2 and c3 multiple times; as you can observe in the graph visualization, c2 and c3 have multiple paths. To avoid resetting the number of tourists on each visit, let's replace the set_tourists ability with the following code snippet;

can set_tourists{ #also can use "with activity"
    if(here.tourists==null){
        tourists = rand.integer(15,100);
        std.out("Setting number of tourists in", here.context.name,"city", "to",tourists);
    }
}

Basic example of walker abilities

The following example adds another walker named traveller. To collect the value of a variable which is inside a walker, we use the visitor keyword. See how it has been used inside the code snippet;

Note: here refers to the node scope pertinent to the program's current execution point, and visitor refers to the walker scope pertinent to that same point of execution. All variables, built-in attributes, and actions of the linked object instance are fully accessible through these references.

Example 3:

node city{
    has name;
    has tourists;

    can set_tourists{ #also can use "with activity"
        if(here.tourists==null){
            tourists = rand.integer(15,100);
            std.out("Setting number of tourists in", here.context.name,"city", "to",tourists);
        }
    }

    can reset_tourists with traveller entry{
        here.tourists = here.tourists + visitor.tours;
        std.out("Total tourists in", here.context.name, "when traveller arrives:",here.tourists);
    }

} 

walker build_example{
    node1 = spawn here --> node::city(name="c1");
    node2 = spawn node1 --> node::city(name="c2");
    node3 = spawn node2 --> node::city(name="c3");
    here --> node2;
    here --> node3;
}

walker init{

    root{
        spawn here walker::build_example;
        take-->;
    }

    city{
        here::set_tourists;
        spawn here walker::traveller;
        take-->;
    }
}

walker traveller{
    has tours = 1;
}

Output 3:

Setting number of tourists in c1 city to 84
Total tourists in c1 when traveller arrives: 85
Setting number of tourists in c2 city to 74
Total tourists in c2 when traveller arrives: 75
Setting number of tourists in c3 city to 27
Total tourists in c3 when traveller arrives: 28
Total tourists in c2 when traveller arrives: 76
Total tourists in c3 when traveller arrives: 29
Total tourists in c3 when traveller arrives: 30

As you can see, the number of tourists has been increased by one in each city upon the walker traveller's entry to each node. The code phrase with traveller entry instructs the node ability reset_tourists to only execute when the traveller walker enters the city node.

We can try resetting variable values inside a walker using an ability of a node on a visit. Let's update the walker traveller and add a reset_walker_value ability inside the city node to see if this works.

can reset_walker_value with traveller entry{
    visitor.walker_value =  1;
    std.out("Total visit of traveller is",visitor.walker_value);
}

walker traveller{
    has tours = 1;
    has walker_value = 0;
    std.out(walker_value);
}

You might observe that, while using a node's ability, the walker's state remains unchanged.

Let's call a walker ability from a node in the following example;

Example 4:

node city{
    has name;
    has tourists;

    can set_tourists{ #also can use "with activity"
        if(here.tourists==null){
            tourists = rand.integer(15,100);
            std.out("Setting number of tourists in", here.context.name,"city", "to",tourists);
        }
    }
    can reset_tourists with traveler entry{
        here.tourists = here.tourists + visitor.tours;
        std.out("When traveler visits:",here.tourists, " tourists are in the city", here.context.name );
        visitor::print;
    }

} 

walker build_example{
    node1 = spawn here --> node::city(name="c1");
    node2 = spawn node1 --> node::city(name="c2");
    node3 = spawn node2 --> node::city(name="c3");
    here --> node2;
    here --> node3;
}

walker init{

    root{
        spawn here walker::build_example;
        take-->;
    }
    city{
        here::set_tourists;
        spawn here walker::traveler;
        take-->;
    }
}

walker traveler{
    has tours = 1;
    can print{
        std.out("Traveler enters the city");
    }
}

Output 4:

Setting number of tourists in c1 city to 33
When traveler visits: 34  tourists are in the city c1
Traveler enters the city
Setting number of tourists in c2 city to 99
When traveler visits: 100  tourists are in the city c2
Traveler enters the city
Setting number of tourists in c3 city to 16
When traveler visits: 17  tourists are in the city c3
Traveler enters the city
When traveler visits: 101  tourists are in the city c2
Traveler enters the city
When traveler visits: 18  tourists are in the city c3
Traveler enters the city
When traveler visits: 19  tourists are in the city c3
Traveler enters the city

Observe that the print statement "Traveler enters the city" comes from the walker traveler and is triggered to execute when the walker enters a city node.

Let's try adding the following node ability inside the city node;

Example 5

can reset_tourists_1 with traveler exit{
    here.tourists = here.tourists - visitor.tours;
    std.out("When traveler leaves:",here.tourists, "tourists are in the city", here.context.name);
}

Output 5

Setting number of tourists in c1 city to 76
When traveler visits: 77  tourists are in the city c1
When traveler leaves: 76 tourists are in the city c1
Setting number of tourists in c2 city to 84
When traveler visits: 85  tourists are in the city c2
When traveler leaves: 84 tourists are in the city c2
Setting number of tourists in c3 city to 60
When traveler visits: 61  tourists are in the city c3
When traveler leaves: 60 tourists are in the city c3
When traveler visits: 85  tourists are in the city c2
When traveler leaves: 84 tourists are in the city c2
When traveler visits: 61  tourists are in the city c3
When traveler leaves: 60 tourists are in the city c3
When traveler visits: 61  tourists are in the city c3
When traveler leaves: 60 tourists are in the city c3

reset_tourists_1 executes when the walker traveler leaves the city node.

Actions

Actions enable bindings to functionality specified outside of Jac/Jaseci and behave as function calls with returns. These are analogous to library calls in traditional languages. In practice, this external functionality takes the form of direct bindings to Python implementations that are packaged up as a Jaseci action library.

Note

This action interface is the abstraction that allows Jaseci to do its sophisticated serverless inter-machine optimizations, auto-scaling, auto-componentization, etc.

Understanding the actions by example

Basic action example

Jaseci has a set of inbuilt actions. You can also load and unload actions in the jsctl shell. To see the available actions in a jaseci session, try running actions list. Here are two basic examples of jaseci date actions.

Example 1:

node person {
    has name;
    has birthday;
}

walker init {
    can date.quantize_to_year;
    can date.quantize_to_month;
    can date.quantize_to_week;

    person1 = spawn here --> node::person(name="Josh", birthday="1995-05-20");

    birthyear = date.quantize_to_year(person1.birthday);
    birthmonth = date.quantize_to_month(person1.birthday);
    birthweek = date.quantize_to_week(person1.birthday);

    std.out("Date ", person1.birthday);
    std.out("Quantized date to year ", birthyear);
    std.out("Quantized date to month ", birthmonth);
    std.out("Quantized date to week ", birthweek);
}

Output 1:

Date  1995-05-20
Quantized date to year  1995-01-01T00:00:00
Quantized date to month  1995-05-01T00:00:00
Quantized date to week  1995-05-15T00:00:00

The following example executes the action on each person node of the graph.

Example 2:

node person {
    has name;
    has birthday;
}

walker init {
    can date.quantize_to_year;

    root {
        person1 = spawn here --> node::person(name="Josh", birthday="1995-05-20");
        person2 = spawn here --> node::person(name="Joe", birthday="1998-04-23");
        person3 = spawn here --> node::person(name="Jack", birthday="1997-03-12");
        take -->;
    }
    
    person {
        birthyear = date.quantize_to_year(here.birthday);
        std.out(here.name," Birthdate Quantized to year ",birthyear);
    }
}

Output 2:

Josh  Birthdate Quantized to year  1995-01-01T00:00:00
Joe  Birthdate Quantized to year  1998-01-01T00:00:00
Jack  Birthdate Quantized to year  1997-01-01T00:00:00

Basic actions with presets and event triggers

Note: here refers to the node scope pertinent to the program's current execution point, and visitor refers to the walker scope pertinent to that same point of execution. All variables, built-in attributes, and actions of the linked object instance are fully accessible through these references.

Example 3:

node person {
    has name;
    has byear;

    #this sets the birth year from the setter
    can date.quantize_to_year::visitor.year::>byear with setter entry;

    #this executes upon exit of the walker from node
    can std.out::byear," from ", visitor.info:: with exit;

}

walker init {

    #collect the current time
    has year=std.time_now();
    root {
        person1 = spawn here --> node::person(name="Josh", byear="1992-01-01");
        take --> ;
    }

    person {
        spawn here walker::setter;
    }
}

walker setter {
    has year="1995-01-01";
}

Output 3:

1995-01-01T00:00:00  from  {"name": "setter", "kind": "walker", "jid": "urn:uuid:a3e5f4b6-aeda-4cd0-9552-506cb3b7c693", "j_timestamp": "2022-11-09T09:10:05.134836", "j_type": "walker", "context": {"year": "1995-01-01"}}
1995-01-01T00:00:00  from  {"name": "init", "kind": "walker", "jid": "urn:uuid:47f1e467-a0e6-4772-a06a-204f6a1b69c3", "j_timestamp": "2022-11-09T09:10:05.129720", "j_type": "walker", "context": {"year": "2022-11-09T09:10:05.131397"}}

Language Features

Input and Output

To print to the terminal we use:

std.out("Hello World");

To take input from the terminal we use:

std.input();

Data types

JAC is a dynamically typed language so there is no need to declare the data type.

walker init {
    a=5;
    b=5.0;
    c=true;
    d='5';
    e=[a, b, c, d, 5];
    f={'num': 5};

    summary = {'int': a, 'float': b, 'bool': c, 'string': d, 'list': e, 'dict': f};

    std.out(summary);
}

Operators

Arithmetic

// addition
a = 4 + 4;
e = a + b + c + d;

// multiplication
b = 4 * -5;

// division
c = 4 / 4;  # Returns a floating point number

// subtraction
d = 4 - 6;

// exponent / power
a = 4 ^ 4;

// modulus
b = 9 % 5;

Equality

// equal
a == b

// not equal
a != b

// less than
a < b

// greater than
a > b

// less than and equal to
a <= b

// greater than and equal to
a >= b

Logical

// not
!a

// and
a && b
a and b

// or
a || b
a or b

// mixture
!a or b
!(a and b)

Assignments

a = 4 + 4;
a += 4 + 4;
a -= 4 * -5;
a *= 4 / 4;
a /= 4 - 6;

Control Flow

The If statement

# Simple IF statement
if(condition){
    #execute if condition is true
}

If else


if(condition){
    #execute code if condition is true
} else {
    # execute code if condition is not true.
}

elif

if(condition){
     #execute code if condition is true
} 
elif(condition 2){
     #execute code if condition 2 is true
}
elif(condition 3){
     #execute code if condition 3 is true
}
else {
     #execute code if none of the conditions are true
}

Functions and Actions

Nodes and Walkers can have unique abilities, which behave much like functions.



File Handling

 

 #load contents of a file to a string
 lines = file.load_str("test.txt");

 #load the contents of a json file
 lines = file.load_json("tests.json");


Multiple Inheritance

JAC allows nodes and edges to inherit the attributes and abilities of other nodes and edges of the same type.

Node Inheritance

node state {
    has title;
    has message;
    has prompts;
}

node input_state:state {
    has input;
}

node output_state:input_state {
    has output;
}

Edge Inheritance

edge transition {
    has transition_next ;
}

edge transition_back: transition {
    has prev_step ;
}



Functions

Nodes and Walkers can have unique abilities. These are much like functions. They can be executed in different scenarios.

Nodes and Functions

The basic syntax for a function is as follows. Walkers can execute this code by simply using here::[name of function].

node [name of node ] {
    can [name of function] {
        
        # enter code to be executed here
    }
}

Execute with entry and exit

Node functions can be executed when a walker traverses onto the node or when a walker leaves it.

node [name of node ]  {
    can [name of function] with entry {
        
        # enter code to be executed when walker traverses on this node 
    }

    can [name of function] with exit {
        
        # enter code to be executed when walker leaves this node 
    }
}

Execute with specific walker

Nodes can have functions that are only executed by a specific walker.

node [name of node] {
    can [name of function] with [name of walker] {
        
        # enter code to be executed here
    }
}

Execute with specific walker on entry and exit

node [name of node] {
    can [name of function] with [name of walker] entry {
        
        # enter code to be executed here
    }
    can [name of function] with [name of walker] exit {
        
        # enter code to be executed here
    }
}

Global Reference Syntax (to be improved)

This is for accessing current thread attributes.

global.context <Dict>

It will return global variables

global.info <Dict>

  • report
  • report_status
  • report_custom
  • request_context
  • runtime_errors

global.info["report"]

returns current report list

   [1, "any value from report", {}, true, []]

global.info["report_status"]

returns http status code for the report

   200

global.info["report_custom"]

returns current report:custom value

   {
       "yourCustomField": "customValue"
   }

global.info["request_context"]

returns current request payload

   {
       "method": "POST",
       "headers": {
           "Content-Length": "109",
           "Content-Type": "application/json",
           "Host": "localhost:8000",
           "User-Agent": "insomnia/2022.4.2",
           "Accept": "*/*"
       },
       "query": {},
       "body": {
           "name": "sample_walker",
           "ctx": {
               "fieldOne": "1"
           },
           "nd": "active:graph",
           "snt": "active:sentinel"
       }
   }

Usage:

Walkers can now accept a custom payload (a non-ctx structure). The request body can be accessed via global.info["request_context"]["body"], and developers now have control over different request constraints such as method type and header validation.


global.info["runtime_errors"]

returns current runtime error list

[
  "sentinel1:sample_walker - line 100, col 0 - rule walker - Internal Exception: 'NoneType' object has no attribute 'activity_action_ids'"
]



How to upload file

Old approach: (still supported)

Most requests use JSON as the request body. This has some limitations if you want to include a file through JSON.

Structure:

// Sample Request
{
    "name": "walker_name",
    "ctx": {
        "anyFieldNameForYourFile": [{
            "name": "sample.txt",
            "base64": "MAo=" // "MAo=" is equivalent to "0\n"
        }]
    },
    "nd": "active:graph",
    "snt": "active:sentinel"
}

Note:

The anyFieldNameForYourFile structure is based on the jaseci action request.multipart_base64. You can use a different structure; however, you will still need to send it as base64. You will also need to reconstruct it into request.multipart_base64's files structure if you want to pass it on to a different service (internal or 3rd party) using multipart/form-data.

// request.multipart_base64's files parameter structure
[
    {
        "field": "anyFieldName", // Optional: Default is "file"
        "name": "sample.txt",
        "base64": "MAo="
    }
]

Limitation

Each Base64 digit represents exactly 6 bits of data, so three 8-bit bytes of the input string/binary file (3×8 = 24 bits) are represented by four 6-bit Base64 digits (4×6 = 24 bits).

This means the Base64 version of a string or file will be roughly 133% the size of its source (a ~33% increase). The increase may be larger if the encoded data is small; for example, the string "a" with length 1 is encoded as "YQ==" with length 4, a 300% increase. Some servers/services also have request size limits: if your file is 4MB and you convert it to base64, some servers will return 413: Payload Too Large. The extra size may also affect upload time.
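The overhead is easy to verify with Python's standard base64 module (the 3 MB payload below is made up for illustration):

```python
import base64

# "MAo=" from the sample request decodes to the file content "0\n".
print(base64.b64decode("MAo="))          # b'0\n'

# Every 3 input bytes become 4 output characters: a ~33% size increase.
payload = b"x" * 3_000_000               # a pretend 3 MB file
encoded = base64.b64encode(payload)
print(len(encoded) / len(payload))       # 1.3333333333333333

# Small inputs are padded, so the relative overhead can be far larger:
print(base64.b64encode(b"a"))            # b'YQ==' -- 1 byte became 4 chars
```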

To improve the developer/user experience, we support the multipart/form-data approach.


How to upload file using multipart/form-data

What is multipart/form-data

The enctype attribute specifies how the form data should be encoded when submitting it to the server. multipart/form-data is one of the most used enctypes/content types. In multipart, each field to be sent has its content type, file name, and data, separated by a boundary from the other fields.

No encoding of the data is necessary because of the unique boundary. The binary data is sent as-is; the server reads until the next boundary string.
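To make the boundary mechanics concrete, here is a minimal hand-rolled Python sketch (the boundary string and field values are made up for illustration) that lays out a two-part multipart/form-data body:

```python
# A minimal, hand-rolled sketch of a multipart/form-data body.
# The boundary string (made up here) separates each part; the server
# reads each part's data until it sees the next boundary line.
boundary = "----JaseciExampleBoundary"

def part(name, data, filename=None, content_type=None):
    # Each part carries its own headers, then a blank line, then its data.
    headers = f'Content-Disposition: form-data; name="{name}"'
    if filename:
        headers += f'; filename="{filename}"'
    if content_type:
        headers += f"\r\nContent-Type: {content_type}"
    return f"--{boundary}\r\n{headers}\r\n\r\n{data}\r\n"

body = (
    part("name", "walker_name")
    + part("fileTypeField", "0\n", filename="test.txt", content_type="text/plain")
    + f"--{boundary}--\r\n"  # the closing boundary marks the end of the stream
)
print(body)
```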

Definition of multipart/form-data

The media-type multipart/form-data follows the rules of all multipart MIME data streams as outlined in [RFC 2046]. In forms, there are a series of fields to be supplied by the user who fills out the form. Each field has a name. Within a given form, the names are unique.

“multipart/form-data” contains a series of parts. Each part is expected to contain a content-disposition header [RFC 2183] where the disposition type is “form-data”, and where the disposition contains an (additional) parameter of “name”, where the value of that parameter is the original field name in the form. For example, a part might contain a header:

Content-Disposition: form-data; name=”user” with the value corresponding to the entry of the “user” field.

Field names originally in non-ASCII character sets may be encoded within the value of the “name” parameter using the standard method described in RFC 2047. As with all multipart MIME types, each part has an optional “Content-Type”, which defaults to text/plain.

If the contents of a file are returned via filling out a form, then the file input is identified as the appropriate media type, if known, or “application/octet-stream”.

If multiple files are to be returned as the result of a single form entry, they should be represented as a “multipart/mixed” part embedded within the “multipart/form-data”. Each part may be encoded and the “content-transfer-encoding” header supplied if the value of that part does not conform to the default encoding.

Implementation

Request using application/json

Request:
URL: http://localhost:8000/js/walker_run?testQueryField=1
Method: POST
{
	"name": "walker_name",
	"nd": "active:graph",
	"snt": "active:sentinel",
	"ctx": {
        "yourCtxField1": "value1",
        "yourCtxField2": "value2",
        "yourCtxField3": "value3",
        "fileTypeField": [
            {
				"name": "test.txt",
				"base64": "MAo="
			}
        ],
        "fileTypeSameField": [
            {
				"name": "test.txt",
				"base64": "MAo="
			}
        ]
    },
    "additionalFIeld": 1
}

Request using multipart/form-data

Insomnia: multipart/form-data

Curl equivalent of multipart/form-data

curl --request POST \
  --url http://localhost:8000/js/walker_run?testQueryField=1 \
  --header 'Authorization: token {yourToken}' \
  --header 'Content-Type: multipart/form-data' \
  --form name=walker_name \
  --form nd=active:graph \
  --form snt=active:sentinel \
  --form 'ctx=@path/to/file/context.json' \
  --form 'fileTypeField=@path/to/file/test.txt' \
  --form 'fileTypeSameField=@path/to/file/test.txt' \
  --form 'fileTypeSameField=@path/to/file/test.txt'

context.json

{
    "yourCtxField1": "value1",
    "yourCtxField2": "value2",
    "yourCtxField3": "value3"
}

test.txt

 0

Advantages

  • requests with multiple parts with different content types are supported
  • files are not subject to the additional size that base64 adds
  • it also avoids 413: Payload Too Large, since it uses multipart/form-data as the main request Content-Type instead of application/json
  • file handling can now be improved in terms of performance, since an application/json request is always held in memory while multipart/form-data can be held in memory or on disk (on-disk handling is a future improvement)
  • you can also access the files using has fieldName, excluding ctx

Report Custom

Supports custom structure as response body.

Syntax

    report:custom = `{{ any | {} | [] }}`

Usage

This can be combined with walker_callback when a 3rd-party service requires a different JSON structure in the response. It can also be used for other scenarios that don't require the ctx structure.

Walker Callback

Walker callback is used for running a walker on a specific node using a public key instead of an authorization token.

Use Case

Generating public URL that can be used as callback API for 3rd party Webhook API. You may also use this as a public endpoint just to run a specific walker to a specific node.

Structure

POST /js_public/walker_callback/{node uuid}/{spawned walker uuid}?key={public key}

Step to Generate

1. Jac Code

walker sample_walker: anyone {
    has fieldOne;
    with entry {
        report 1;
    }
}

2. Register Sentinel

curl --request POST \
  --url http://localhost:8000/js/sentinel_register \
  --header 'Authorization: token {yourToken}' \
  --header 'Content-Type: application/json' \
  --data '{ "name": "sentinel1", "code": "walker sample_walker: anyone {\r\n\thas fieldOne;\r\n\twith entry {\r\n\t\treport 1;\r\n\t}\r\n}" }'
// RESPONSE
[
	{
		"version": "3.5.7",
		"name": "zsb",
		"kind": "generic",
		"jid": "urn:uuid:b4786c7a-cf24-49a4-8c2c-755c75a35043",
		"j_timestamp": "2022-05-11T05:57:07.849673",
		"j_type": "sentinel"
	}
]

3. Spawn Public Walker (sample_walker)

curl --request POST \
  --url http://localhost:8000/js/walker_spawn_create \
  --header 'Authorization: token {yourToken}' \
  --header 'Content-Type: application/json' \
  --data '{ "name": "sample_walker", "snt":"active:sentinel" }'
// RESPONSE
{
	"context": {},
	"anchor": null,
	"name": "sample_walker",
	"kind": "walker",
	// this is the spawned walker uuid to be used
	"jid": "urn:uuid:2cf6d0dc-e7eb-4fc8-8564-1bbdb48baad3",
	"j_timestamp": "2022-06-07T09:45:22.101017",
	"j_type": "walker"
}

4. Getting Public Key

curl --request POST \
  --url http://localhost:8000/js/walker_get \
  --header 'Authorization: token {yourToken}' \
  --header 'Content-Type: application/json' \
  --data '{ "mode": "keys", "wlk": "spawned:walker:sample_walker", "detailed": false }'
// RESPONSE
{
	// this is the public key used for walker callback
	"anyone": "97ca941e6bf1f43c3a4e531e40b2ad5a"
}

5. Construct the URL

Assuming there's a node with a uuid of aa1bb26e-238b-40a0-8e39-333ec363ace7, this endpoint is now accessible to anyone:

POST /js_public/walker_callback/aa1bb26e-238b-40a0-8e39-333ec363ace7/2cf6d0dc-e7eb-4fc8-8564-1bbdb48baad3?key=97ca941e6bf1f43c3a4e531e40b2ad5a
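The pieces gathered in steps 1-4 can be assembled into the callback URL programmatically; here is a Python sketch using the sample uuids and key from this walkthrough:

```python
# Assembling the public callback URL from the pieces gathered in
# steps 1-4 (the uuids and key are the sample values from above).
base = "http://localhost:8000"
node_uuid = "aa1bb26e-238b-40a0-8e39-333ec363ace7"      # target node
walker_uuid = "2cf6d0dc-e7eb-4fc8-8564-1bbdb48baad3"    # spawned walker
public_key = "97ca941e6bf1f43c3a4e531e40b2ad5a"         # "anyone" key

url = f"{base}/js_public/walker_callback/{node_uuid}/{walker_uuid}?key={public_key}"
print(url)
```

Any HTTP client (e.g. a 3rd-party webhook) can then POST to this URL, supplying the walker's has fields in the JSON body.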

Alias Actions

Alias provides short names for long strings such as UUIDs.

Register Alias

#name (str): The name for the alias created by caller.
#value (str): The value for that name to map to (i.e., UUID)
response  = jaseci.alias_register(name,value);

List Aliases

# List all string to string alias that caller can use
jaseci.alias_list()

Delete Alias

# Delete an active string to string alias mapping
# name (str): The name for the alias to be removed from caller.
jaseci.alias_delete(name);

Clear Alias

#  Removes  all aliases.
jaseci.alias_clear()

Date Actions

Jaseci has its own set of built-in date functions.

Quantize to Year

#take a standard python datetime string and extract the year out of it accordingly

x = '2021-12-12';
z = date.quantize_to_year(x);
std.out(z);

Quantize to Month

#take a standard python datetime string and extract the month out of it accordingly
x = '2021-12-12';
z = date.quantize_to_month(x);
std.out(z);

Quantize to Week

# take a standard python datetime string and extract the week out of it accordingly
x = '2021-12-12';
z = date.quantize_to_week(x);
std.out(z);

Quantize to day

#take a standard python datetime string and extract the day out of it accordingly

x = '2021-12-12';
z = date.quantize_to_day(x);
std.out(z);

Date Difference

# takes two datetime strings and returns an integer: the number of days between the two given dates

z = date.date_day_diff('2021-12-12','2022-12-12');
std.out(z);
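For reference, date.date_day_diff behaves like plain date subtraction. A Python sketch of the same computation (an illustration, not the Jaseci implementation):

```python
from datetime import date

def day_diff(a: str, b: str) -> int:
    """Number of days between two ISO date strings."""
    d1 = date.fromisoformat(a)
    d2 = date.fromisoformat(b)
    return abs((d2 - d1).days)

print(day_diff('2021-12-12', '2022-12-12'))  # 365
```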

File Actions

Load file to string

# Converts file to string; max_chars is set to None by default
Testfile = file.load_str("test.txt", max_chars = 1000)

Load Json file to dictionary

# Loads json from file to dictionary format
TestJson = file.load_json("test.json")

String to file

# dumps string into a file
test = "This is a test of the dump_str method";
Testfile = file.dump_str("text.txt", test)

Append string to a file

# appends a string to a file
test = "This is another test, but with the append_str method";
Testfile = file.append_str("text.txt", test)

Create Json File

# dump dictionary in to json file
test = {
    "name": "test",
    "method" : "dump_json",
    "use" : "dumps dictionary to json file"
}
Testfile = file.dump_json("text.json",test)

Delete file

#delete any file 
file.delete("text.txt")

Global Actions

Jaseci Actions

Objects

Get global Variable

# name: name of global variable
value = jaseci.global_get(name);

Object Details

# return details of a jaseci object
# object : element - jaseci object
details = jaseci.object_get(object);

Object Access Mode

# Get the object access mode for any jaseci object.
# object : element - jaseci object
accessMode = jaseci.object_perms_get(object);

Set Object access mode

# valid perms = ["public", "private", "read_only"]
# object : element - jaseci object
# perm : string 
jaseci.object_perms_set(element,perm);

Object access grant

# grants one object access to another object
# object : element - object to access
# master : element - object to gain access

ret = jaseci.object_perms_grant(element, master);

Revoke object access

#Remove permissions for user to access a Jaseci object
# object : element - object that was  accessed
# master : element  - object that has access.
ret = jaseci.object_perms_revoke(element,master);

Graphs

Create Graph

# Create a graph instance and return root node graph object
jaseci.graph_create()

Get Graph Content

# Return the content of the graph with mode
# Valid modes: {default, dot}
# gph : graph - graph whose content you need
# mode : string - "default" or "dot"; "default" by default
contents = jaseci.graph_get(gph);

List Graph Objects

#  Provide complete list of all graph objects (list of root node objects)
# detailed : boolean - if each graph's details are wanted
graph_info = jaseci.graph_list(detailed);

Set Default Graph

# set the default graph master should use
# gph : graph - graph to be default.
message = jaseci.graph_active_set(gph);

Remove Default Graph

# Unsets the default sentinel master should use
jaseci.graph_active_unset();

Get Default Graph

# detailed : boolean - default false , true to return graph details (optional)
grph = jaseci.graph_active_get()

Delete Graph

# permanently delete a graph
# grph : graph - graph to be deleted

message = jaseci.graph_delete(grph);

Return Node Value

# returns value of a given node
# nd : node - node whose value will be returned
node_value =  jaseci.graph_node_get(nd);

Set Node Value

# Assigns values to member variables of a given node using ctx object
# nd : node - node to which a value will be assigned
# ctx : dictionary - values to assign 

node_details = jaseci.graph_node_set(nd,ctx);

Sentinels

Register Sentinel

# Create blank or code-loaded sentinel and return object
# name : str - "default" when not specified
# encoded : bool
# auto_run : the walker to execute on register (assumes active graph is selected), default is "init"
# ctx : dict - {} by default
# set_active : bool - True by default

sentinel = jaseci.sentinel_register(name,encoded,auto_run,ctx,set_active);

Global Sentinel

# Copies global sentinel to local master
# set_active : boolean - set sentinel to be active
# on_demand : boolean

snt = jaseci.sentinel_pull(set_active);

Get Sentinel

# Get a sentinel rendered with specific mode
#Valid modes: {default, code, ir, }
#snt : sentinel : sentinel to be rendered in specific mode.
#mode : str - mode sentinel will be in 
snt  = jaseci.sentinel_get(snt,mode);

Set Sentinel Code

# Set code/ir for a sentinel, only replaces walkers/archs in sentinel
# Needs more clarity
jaseci.sentinel_set();

Sentinel List

# Provides complete list of all sentinel objects

snt_list = jaseci.sentinel_list();

Sentinel Test

# Run battery of test cases within sentinel and provide result
#snt : sentinel - sentinel to be tested

snt_details = jaseci.sentinel_test(snt);

Default Sentinel

# Sets the default sentinel master should use
#snt :sentinel - sentinel to be made default

message = jaseci.sentinel_active_set(snt);

Remove Default Sentinel

# Unsets the default sentinel master should use
message = jaseci.sentinel_active_unset();

Set Global Sentinel

# Sets the default master sentinel to the global sentinel
response  = jaseci.sentinel_active_global();

Return default Sentinel

#  Returns the default sentinel master is using
response = jaseci.sentinel_active_get();

Delete Sentinel

# Permanently delete sentinel with given id
# snt : sentinel - sentinel to be deleted

message = jaseci.sentinel_delete(snt);

Run Walker

# clarity needed
# Run a walker on a specific node
# wlk : walker - walker to be run
# nd : node - node where walker will be placed
# ctx : dictionary - context for walker
response = jaseci.walker_summon(wlk, nd, ctx);

Register Walker

# clarity needed
#  Create blank or code loaded walker and return object
walker_serialized = jaseci.walker_register();

Get Walker

# Get a walker rendered with specific mode
# wlk : walker - walker to be rendered 
# mode : str - mode to return walker
#  Valid modes: {default, code, ir, keys, }
wlk_response = jaseci.walker_get(wlk,mode);

Set Walker code

#  Set code/ir for a walker
# Valid modes: {code, ir, }
# wlk :walker - walker code/ir to be set
# code : str - "code" or  "ir" 
message = jaseci.walker_set(wlk,code);

List Walkers

# List walkers known to sentinel
# snt : sentinel - active sentinel

walkers = jaseci.walker_list();

Delete Walker

# Permanently delete walker with given id
# wlk : walker - walker to be deleted
# snt : sentinel - sentinel where walker resides
message = jaseci.walker_delete(wlk,snt);

Spawn Walker

#  Creates new instance of walker and returns new walker object
# name : str - name of walker
# snt : sentinel - sentinel the walker will be under
spawn_wlk = jaseci.walker_spawn_create(name,snt);

Delete spawned Walker

#Delete instance of walker 
# name : string - name of walker to be deleted
jaseci.walker_spawn_delete(name);

List Spawned walker

# List walkers spawned by master
# detailed : boolean - return details of walkers
walkers = jaseci.walker_spawn_list(detailed);

Assign walker to node

# Assigns walker to a graph node and primes walker for execution
# wlk : walker - walker to be assigned
# nd : node - node the walker will be assigned to
# ctx : dictionary - context for node

message   = jaseci.walker_prime(wlk,nd,ctx);

Execute Walker

# execute walker assuming it is primed.
# wlk : walker -  walker to execute
# nd : node - node where execution will begin

response  = jaseci.walker_execute(wlk,nd);

Walker run

# Creates walker instance, primes walker on node, executes walker, reports results, and cleans up walker instance.
#name: str - name of the walker 
#nd: node = Node walker will be primed on 
#ctx: dict -  {} by default
#snt: sentinel - None  by default 
#profiling: bool - False by default

response =  jaseci.walker_run(name,nd,ctx,snt,profiling);

Walker Individual APIs

#name : string - name of walker
#nd :node - node walker will be primed on
# ctx : dictionary - dictionary for context information
# snt :  sentinel , none by default
# profiling : boolean , false by default
 response = jaseci.wapi(name,nd,ctx);

Architypes

Create Architype

# code : string - the text (or filename) for an architype's Jac code
# encoded : boolean - whether the code is encoded
# snt (uuid) : the uuid of the sentinel to be the owner of this architype

architype_response = jaseci.architype_register(code,encoded,snt);

Get Architype

# Get an architype rendered with specific mode
# arch : architype - the architype being accessed
# mode : string - valid modes {default, code, ir}
# detailed : boolean - return detailed info also

architype_serialized = jaseci.architype_get(arch,mode,detailed);

Set Architype code or ir

#arch (uuid): The architype being set
#code (str): The text (or filename) for an architypes Jac code/ir
#mode (str): Valid modes: {default, code, ir, }

response  = jaseci.architype_set(arch,code,mode);

List Architype

# List architypes known to sentinel
# snt (uuid): The sentinel for which to list its architypes
# detailed (bool): Flag to give summary or complete set of fields

archs = jaseci.architype_list(snt,detailed);

Delete Architype

# Permanently delete architype with given id
# arch (uuid): The architype being deleted
# snt (uuid): The sentinel the architype belongs to

response = jaseci.architype_delete(arch,snt);

Masters

Create Master

# create a master instance and return root node master object
# name : str - name of master
# active : boolean
# ctx : dictionary - additional fields for overloaded interfaces

master_object  = jaseci.master_create(name,active,ctx);

Get Master Content

# return the content of the master with mode
# name : string - name of master to be returned
# mode : string - modes{'default',}

master_object = jaseci.master_get(name,mode);

List Masters

#  Provide complete list of all master objects (list of root node objects)
# detailed : boolean - detailed info wanted. 

masters  = jaseci.master_list(detailed);

Set Default Master

# Sets the default master to use
# name : name of master to be set

response  = jaseci.master_active_set(name);

Unset Default Master

# unsets the default master
response  = jaseci.master_active_unset();

Get Default Master

# Returns the default master in use
# detailed : boolean  - return detailed information on the master

master_serialized = jaseci.master_active_get(detailed);

Get Master Object

# Returns the masters object

master_object = jaseci.master_self();

Delete Master

# name : str - master to be deleted

response   = jaseci.master_delete(name);

Logger

APIs for Jaseci Logging configuration

Connect to internal logger

#   Connects internal logging to http(s) (log msgs sent via POSTs)
#  Valid log params: {sys, app, all }
# host : string  - 
# port : string - 
# url : string - 
# log : string  - 
response = jaseci.logger_http_connect(host,port,url,log);

Remove HTTP Handler

# log  : string - default ,all

response  = jaseci.logger_http_clear(log);

Check Active logger

#  list active loggers

response = jaseci.logger_list();

Global API

Set Global

# Set a global variable 
# name  : string - name of global
# value : string -  value of global
 response = jaseci.global_set(name,value);

Delete Global

# delete a global
# name : string - name of global to delete
response = jaseci.global_delete(name);

Set Global Sentinel

# set sentinel as  globally accessible
# snt : sentinel -  sentinel to be set globally accessible
response = jaseci.global_sentinel_set(snt);

Unset Global Sentinel

# unset the globally accessible sentinel
# snt : sentinel - sentinel to be removed as globally accessible
response  = jaseci.global_sentinel_unset(snt);

Super Master

Super Instance of Master

# Create a super instance and return root node super object
# name : string - name of master
# set_active : boolean - set master to active
# other_fields : dictionary - used for additional fields for overloaded interfaces (i.e., Django interface)

master_object  = jaseci.master_createsuper(name,set_active,other_fields);

Masters info

#  Returns info on a set of users
# num : int - specifies the number of users to return
# start_idx : int - specifies where to start

# in development 

Set Default Master

# Sets the default master master should use
# mast : master - master to be used
response  = jaseci.master_become(mast);

Unset default Master

# Unsets the default master master should use
response = jaseci.master_become();

Stripe

Set of APIs to expose Stripe Management

Create Product

# name : string - default "VIP Plan"
# description : string - default " Plan description"

message = jaseci.stripe_product_create(name,description);

Modify Product Price

# productId : string - id of product to be modified 
# amount : float - amount for product ,default is 50
# interval : string - default  "month"

message = jaseci.stripe_product_price_set(productId,amount,interval);

List Products

# retrieve all products
# detailed : boolean - details of all products
product_list = jaseci.stripe_product_list(detailed);

Create Customer

# paymentId : string - id of payment method
# name : string - name of customer 
# email : string - email of customer
# description : string  - description of customer

message =  jaseci.stripe_customer_create(paymentId,name,email,description);

Get Customer Information

# retrieve customer information
#customerId : string - id to identify customer

message = jaseci.stripe_customer_get(customerId);

Add Customer Payment Method

# paymentMethodId : string - id of payment method
# customerId  : string - id to uniquely identify customer 
message = jaseci.stripe_customer_payment_add(paymentMethodId,customerId);

Remove Customer Payment method


# paymentMethodId : string - id of payment method

message = jaseci.stripe_customer_payment_delete(paymentMethodId);

Customer's List of payment Method

# get list of customer payment method
# customerId : string - id to uniquely identify customer

payment_methods = jaseci.stripe_customer_payment_get(customerId);

Update Customer default payment

# paymentMethodId : string - id of payment method
# customerId  : string - id to uniquely identify customer 

message = jaseci.stripe_customer_payment_default(customerId,paymentMethodId);

Create Customer Subscription

# create customer subscription
# paymentId : string - id of payment method
# priceId : string - id for price 
# customerId: string - id to uniquely identify customer 

message = jaseci.stripe_subscription_create(paymentId,priceId,customerId);

Cancel Customer Subscription

# subscriptionId : string - id to uniquely identify subscription
message  = jaseci.stripe_subscription_delete(subscriptionId);

Get Customer Subscription

# retrieve customer subscription 
# customerId : string - id to uniquely identify customer

customer_subscription = jaseci.stripe_subscription_get(customerId);

Invoice List

# retrieve customer's list of invoices
# customerId : string - id to uniquely identify customer
# subscriptionId : string - id to uniquely identify subscription
# limit : int - max amount of invoices to return
# lastitem : string - id of the item from where the return should start, default is " "

invoices = jaseci.stripe_invoice_list(customerId,subscriptionId,limit,lastitem);

Load actions

Load modules locally

# hot load a python module and assimilate any jaseci actions
# file : string - module to be loaded
success_message  = jaseci.actions_load_local(file);

Load modules remote

#  Hot load an actions set from live pod at URL
# url : string - link to module to be loaded
success_message = jaseci.actions_load_remote(url);

Load modules local

# mod : string - name of module to be loaded

success_message = jaseci.actions_load_module(mod);

List actions

actions = jaseci.actions_list();

Configurations APIs

Get config

# get a config
# name : string - name of configuration
# do_check : boolean - default is True

config_details = jaseci.config_get(name,do_check);

Set Config

# name :string - name of configuration
# value : string - value to set 
# do_check : boolean - default is True

config_details = jaseci.config_set(name,value,do_check);

List Config


configs = jaseci.config_list();

List Valid Config


valid_configs = jaseci.config_index();

Configuration Exists

# name : string - name of configuration
config_exist = jaseci.config_exists(name);

Delete Configurations

#name : string
# do_check : boolean - default is True

message = jaseci.config_delete(name,do_check);

Net Actions

Max Anchor Value

# returns the object (node, edge) with the highest anchor value
node year {
    has anchor year_num;
}

jac_set = [year1,year2,year3];
value = net.max(jac_set);

Minimum Anchor Value

# returns the object (node, edge) with the lowest anchor value
node year {
    has anchor year_num;
}

jac_set = [year1,year2,year3];
value = net.min(jac_set);

Get Node Root

# returns the root node of a given graph
node year {
    has anchor year_num;
}

jac_set = [year1,year2,year3];
value = net.root(jac_set);

Rand Actions

Seeds random number generator

# seeds random num generator
rand.seed(4);

Generate Random Integer

# Generates random integer between range (0-10)
num = rand.integer(0, 10);

Random Selection

a_list = ['apple','mango','orange']
# Randomly selects and return item from list
num = rand.choice(a_list);

Generate Random Word

# generate a random word
 wrd = rand.word();

Generate Random Sentence

# generates a random sentence
# min_length - optional, minimum amount of words, default is 4
# max_length - optional, maximum amount of words, default is 10
# sen - optional
sentence = rand.sentence();

Generate Random Paragraph

# generates a random paragraph
# min_length - optional, minimum amount of sentences, default is 4
# max_length - optional, maximum amount of sentences, default is 8
# sen - optional
paragraph = rand.paragraph();

Generate Random Text

# generates a random text
# min_length - optional, minimum amount of paragraphs, default is 3
# max_length - optional, maximum amount of paragraphs, default is 6
# sen - optional
text = rand.text();

Generate time

# Generate a random datetime between range.

returned_time = rand.time("2020-10-25", "2020-11-26");

Request Actions

Jaseci allows for in-code use of common request methods.

Get Request

# make get request
# url : string - url to where the request will be made
# data : dictionary - data being sent that will be converted to json
# headers : dictionary - header data
response = request.get(url, data, headers)

Post Request

# make post request
# url : string - url to where the request will be made
# data : dictionary - data being sent that will be converted to json
# headers : dictionary - header data
response = request.post(url, data, headers)

Put Request

# make put request
# url : string - url to where the request will be made
# data : dictionary - data being sent that will be converted to json
# headers : dictionary - header data
response = request.put(url, data, headers)

Delete Request

# make delete request
# url : string - url to where the request will be made
# data : dictionary - data being sent that will be converted to json
# headers : dictionary - header data
response = request.delete(url, data, headers)

Head Request

# make head request; returns only the headers of a get request
# url : string - url to where the request will be made
# data : dictionary - data being sent that will be converted to json
# headers : dictionary - header data
response = request.head(url, data, headers)

Option Request

# make options request; requests the permitted communication options for a given url or server
# url : string - url to where the request will be made
# data : dictionary - data being sent that will be converted to json
# headers : dictionary - header data
response = request.options(url, data, headers)

File upload

# used to upload a file or files
# url : string - url to where the request will be made
# file : single base64 encoded file
# files : list of base64 encoded files
# headers : dictionary - header data
response = request.multipart_base64(url, file, headers)

Download File

# url : string - url to where the request will be made
# header : dictionary - header data
# encoding : string - file format, default is utf-8
downloaded_file = request.file_download_base64(url, header, encoding)
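Both of these actions work with base64-encoded file contents. A Python sketch of the encode/decode step a client would perform around such requests (the file contents here are illustrative):

```python
import base64

# Encode raw file bytes to a base64 string suitable for a JSON payload.
raw = b"example file contents"  # in practice: open(path, "rb").read()
encoded = base64.b64encode(raw).decode("ascii")

# The receiving side reverses the transformation to recover the bytes.
decoded = base64.b64decode(encoded)
assert decoded == raw
```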

Standard Actions

Logging output


# printing output to log
data = {
    "type" : "String",
    "name" : "Jaseci"
}
result = std.log(data)

Output

data = {
    "type" : "String",
    "name" : "Jaseci"
}

# print on to the terminal
std.out(data)

Input

# takes input from the terminal 
# any string passed will be printed on to the screen
std.input("> ")

Standard Error

# printing to standard error

std.err()

Sort Columns

# Sorts in place list of lists by column
# Param 1 - list
# Param 2 - col number (optional)
# Param 3 - boolean as to whether things should be reversed (optional)
#Return - Sorted list
sorted_list = std.sort_by_col(param1,param2)
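std.sort_by_col corresponds to sorting a list of lists by one column, which in Python is a keyed sort. A sketch of the same behavior (an illustration, not the Jaseci source):

```python
rows = [["b", 1], ["a", 3], ["c", 2]]

# Sort by column 0 (ascending).
by_name = sorted(rows, key=lambda r: r[0])

# Sort by column 1, reversed (descending).
by_score = sorted(rows, key=lambda r: r[1], reverse=True)

print(by_name)   # [['a', 3], ['b', 1], ['c', 2]]
print(by_score)  # [['a', 3], ['c', 2], ['b', 1]]
```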

UTC time

# Get utc date time for now in iso format
time  = std.time_now()

Set Global Variable


# set global variable visible to all walkers
# name : string
# value : value (must be json serializable)

global_variable = std.set_global(name,value);

Get Global Variable

# get global variable
# name : name of variable
global_variable = std.get_global(name);

Load local actions to Jaseci

# load local actions date to jaseci
action = std.actload_local("date.py");

Load remote actions to Jaseci

action = std.actload_remote(url)

Load module actions to Jaseci

#load use_qa model
action = std.actload_module('use_qa');

Destroy Global

global = std.destroy_global(name)

Set object Permission

# element - target element
# mode - valid permission (public, private, read_only)
object = std.set_perms(element,mode)

Get object Permission

# Returns object access mode for any Jaseci object
# object - target element

obj = std.get_perms(object);

Grant object Permission

# grants another user permission to access a jaseci object
# obj : target element
# element : master to be granted permission
# readonly : Boolean read-only flag

object = std.grant_perms(obj,element,readonly)

Revoke Permission

# Remove permission for user to access a jaseci object
# obj : target element
# element : master whose permission is revoked
objects = std.revoke_perms(obj,element);

Get Report

# Get current report so far from walker run

report = std.get_report();

Vector Actions

Cosine Similarity

#Calculate the Cosine similarity score between 2 vectors.
# returns a float between 0 and 1
# vectora :list
# vectorb : list
similarity = vector.cosine_sim(vectora, vectorb);

Dot Product

# Calculate the dot product between 2 vectors.
# returns a float
# vectora :list
# vectorb : list

dot_product = vector.dot_product(vectora,vectorb);
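Both scores are standard vector arithmetic. A pure-Python sketch of what these two actions compute (an illustration, not the Jaseci implementation):

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_sim(a, b):
    """Cosine similarity: dot product normalized by both magnitudes."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

v1 = [1.0, 2.0, 3.0]
v2 = [2.0, 4.0, 6.0]
print(dot(v1, v2))         # 28.0
print(cosine_sim(v1, v2))  # 1.0 (parallel vectors)
```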

Centroid

# Calculate the centroid of the given list of vectors.
# list of vectors
# returns [centroid vectors , cluster tightness]

centroid  = vector.get_centroid(vectors);

Softmax

# calculate the softmax value
# returns list
# vectors : dictionary
values = vector.softmax(vectors);

Dimensionality Reduction

# fit the model with the given vectors; save this model (str) to a file for future usage

data = [[1,2,3],[4,5,6],[7,8,9]];
model = vector.dim_reduce_fit(data, 2);

# transform the given vectors with the given model

new_data = [[3,2,3],[4,9,6]];
reduced_data = vector.dim_reduce_apply(new_data, model);
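dim_reduce_fit/dim_reduce_apply follow the usual fit/transform pattern of dimensionality reduction. A NumPy sketch of the same idea using PCA (an illustration only; the actual action also serializes the fitted model to a string):

```python
import numpy as np

def pca_fit(data, n_components):
    """Fit a PCA-style model: returns (mean, principal components)."""
    X = np.asarray(data, dtype=float)
    mean = X.mean(axis=0)
    # Principal axes come from the SVD of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_apply(data, model):
    """Project new vectors onto the fitted components."""
    mean, components = model
    return (np.asarray(data, dtype=float) - mean) @ components.T

model = pca_fit([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 2)
reduced = pca_apply([[3, 2, 3], [4, 9, 6]], model)
print(reduced.shape)  # (2, 2)
```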

Walker Actions

Jaseci Kit

Jaseci Kit is a collection of state-of-the-art machine learning models that are readily available to load into jaseci.

Model Directory

Encoders

| Module | Model Name | Example | Type | Status | Description | Resources |
|---|---|---|---|---|---|---|
| use_enc | USE Encoder | Link | Zero-shot | Ready | Sentence-level embedding pre-trained on general text corpus | Paper |
| use_qa | USE QA | Link | Zero-shot | Ready | Sentence-level embedding pre-trained on Q&A data corpus | Paper |
| fast_enc | FastText | Link | Training req. | Ready | FastText Text Classifier | Paper |
| bi_enc | Bi-encoder | Link | Training req./Zero-shot | Ready | Dual sentence-level encoders | Paper |
| sbert_sim | SBert Similarity | Link | Training req./Zero-shot | Ready | SBert Encoders for Sentence Similarity | Paper |
| poly_enc | Poly-encoder | | Training req./Zero-shot | Experimental | Poly Encoder | Paper |
| cross_enc | Cross-encoder | | Training req./Zero-shot | Experimental | Cross Encoder | Paper |

Entity

| Module | Model Name | Example | Type | Status | Description | Resources |
|---|---|---|---|---|---|---|
| ent_ext / lstm_ner | Flair NER | Link | Training req. | Ready | Entity extraction using the FLAIR NER framework | |
| tfm_ner | Transformer NER | Link | Training req. | Ready | Token classification on Transformer models, can be used for NER | Huggingface |
| lstm_ner | LSTM NER | | Training req. | Experimental | Entity extraction/Slot filling via Long-short Term Memory Network | |

Summarization

| Module | Model Name | Example | Type | Status | Description | Resources |
|---|---|---|---|---|---|---|
| cl_summer | Summarizer | Link | No Training req. | Ready | Extractive Summarization using Sumy | Doc. |
| t5_sum | Summarizer | Link | No Training req. | Ready | Abstractive Summarization using the T5 Model | Doc., Paper |

Text Processing

| Module | Model Name | Example | Type | Status | Description | Resources |
|---|---|---|---|---|---|---|
| text_seg | Text Segmenter | Link | No Training req. | Experimental | Topical Change Detection in Documents | Huggingface |

Non-AI Tools

| Module | Model Name | Example | Status | Description | Resources |
|---|---|---|---|---|---|
| pdf_ext | PDF Extractor | Link | Ready | Extract content from a PDF file via PyPDF2 | Doc. |

Examples

Encoders

USE Encoder (use_enc)

use_enc module uses the universal sentence encoder to generate sentence level embeddings. The sentence level embeddings can then be used to calculate the similarity between two given text via cosine similarity and/or dot product.

  • encode: encodes the text and returns an embedding of length 512
    • Alternate name: get_embedding
    • Input:
      • text (string or list of strings): text to be encoded
    • Return: Encoded embeddings
  • cos_sim_score:
    • Input:
      • q_emb (string or list of strings): first text to be embeded
      • a_emb (string or list of strings): second text to be embedded
    • Return: cosine similarity score
  • text_similarity: calculate the similarity score between given texts
    • Input:
      • text1 (string): first text
      • text2 (string): second text
    • Return: cosine similarity score
  • text_classify: use USE encoder as a classifier
    • Input:
      • text (string): text to classify
      • classes (list of strings): candidate classification classes

Example Jac Usage:

# Use USE encoder for zero-shot intent classification
walker use_enc_example {
    can use.encode, use.cos_sim_score;
    has text = "What is the weather tomorrow?";
    has candidates = [
        "weather forecast",
        "ask for direction",
        "order food"
    ];
    text_emb = use.encode(text)[0];
    cand_embs = use.encode(candidates); # use.encode handles string/list

    max_score = 0;
    max_cand = 0;
    cand_idx = 0;
    for cand_emb in cand_embs {
        cos_score = use.cos_sim_score(cand_emb, text_emb);
        if (cos_score > max_score) {
            max_score = cos_score;
            max_cand = cand_idx;
        }
        cand_idx += 1;
    }

    predicted_cand = candidates[max_cand];
}

USE QA (use_qa)

use_qa module uses the multilingual-qa to generate sentence level embeddings. The sentence level embeddings can then be used to calculate best match between question and available answers via cosine similarity and/or dist_score.

  • question_encode: encodes the question and returns an embedding of length 512

    • Alternate name: enc_question
    • Input:
      • text (string or list of strings): question to be encoded
    • Return: Encoded embeddings
  • answer_encode: encodes answers and returns an embedding of length 512

    • Alternate name: enc_answer
    • Input:
      • text (string or list of strings): answer to be encoded
      • context (string or list of strings): usually the text around the answer text, for example it could be 2 sentences before plus 2 sentences after.
    • Return: Encoded embeddings
  • cos_sim_score:

    • Input:
      • q_emb (string or list of strings): first embeded text
      • a_emb (string or list of strings): second embeded text
    • Return: cosine similarity score
  • dist_score:

    • Input:
      • q_emb (string or list of strings): first embeded text
      • a_emb (string or list of strings): second embeded text
    • Return: inner product score
  • question_similarity: calculate the similarity score between given questions

    • Input:
      • text1 (string): first text
      • text2 (string): second text
    • Return: cosine similarity score
  • question_classify: use USE QA as question classifier

    • Input:
      • text (string): text to classify
      • classes (list of strings): candidate classification classes
  • answer_similarity: calculate the similarity score between given answers

    • Input:
      • text1 (string): first text
      • text2 (string): second text
    • Return: cosine similarity score
  • answer_classify: use USE encoder as answer classifier

    • Input:
      • text (string): text to classify
      • classes (list of strings): candidate classification classes
  • qa_similarity: calculate the similarity score between question and answer

    • Input:
      • text1 (string): first text
      • text2 (string): second text
    • Return: cosine similarity score
  • qa_classify: use USE encoder as a QA classifier

    • Input:
      • text (string): text to classify
      • classes (list of strings): candidate classification classes
    • Returns:

Example Jac Usage:

# Use USE_QA model for zero-shot text classification
walker use_qa_example {
    can use.qa_similarity;
    has questions = "What is your age?";
    has responses = ["I am 20 years old.", "good morning"];
    has response_contexts = ["I will be 21 next year.", "great day."];

    max_score = 0;
    max_cand = 0;
    cand_idx = 0;
    for response in responses {
        cos_score = use.qa_similarity(text1=questions,text2=response);
        std.out(cos_score);
        if (cos_score > max_score) {
            max_score = cos_score;
            max_cand = cand_idx;
        }
        cand_idx += 1;
    }

    predicted_cand = responses[max_cand];
}

BI-Encoder (bi_enc)

bi_enc module can be used for intent classification. It takes contexts and candidates and predicts the best-suited candidate for each context. You can train the module on custom data to behave accordingly.

  • dot_prod:
    • Input:
      • vec_a (list of float): first embeded text
      • vec_b (list of float): second embeded text
    • Return: dot product score
  • cos_sim_score:
    • Input:
      • vec_a (list of float): first embedded text
      • vec_b (list of float): second embedded text
    • Return: cosine similarity score
  • infer: predicts the most suitable candidate for a provided context; takes text or embeddings
    • Input:
      • contexts (string or list of strings): context which needs to be classified
      • candidates (string or list of strings): list of candidates for the context
      • context_type (string): either text or embedding
      • candidate_type (string): either text or embedding
    • Return: a dictionary of similarity score for each candidate and context
  • train: used to train the Bi-Encoder for custom input
    • Input:
      • dataset (Dict): dictionary of candidates and supporting contexts for each candidate
      • from_scratch (bool): if set to true, trains the model from scratch; otherwise trains incrementally
      • training_parameters (Dict): dictionary of training parameters
    • Returns: text when model training is completed
  • get_context_emb:
    • Alternate name: encode_context
    • Input:
      • contexts (string or list of strings): context which needs to be encoded
    • Returns: a list of embeddings, each of length 128 for the tiny BERT model
  • get_candidate_emb:
    • Alternate name: encode_candidate
    • Input:
      • candidates (string or list of strings): candidates that need to be encoded
    • Returns: a list of embeddings, each of length 128 for the tiny BERT model
  • get_train_config:
    • Input: None
    • Returns: json of all the current training configuration
    {
       "max_contexts_length": 128,
       "max_candidate_length": 64,
       "train_batch_size": 8,
       "eval_batch_size": 2,
       "max_history": 4,
       "learning_rate": 0.001,
       "weight_decay": 0.01,
       "warmup_steps": 100,
       "adam_epsilon": 1e-06,
       "max_grad_norm": 1,
       "num_train_epochs": 10,
       "gradient_accumulation_steps": 1,
       "fp16": false,
       "fp16_opt_level": "O1",
       "gpu": 0,
       "basepath": "logoutput",
       "seed": 12345,
       "device": "cuda"
    }
    
  • set_train_config:
    • Input
      • train_parameters (Dict): dictionary of training parameters. See the json example above under get_train_config for the list of available training parameters.
    • Returns: "Config setup is complete." if train configuration is completed successfully
  • get_model_config:
    • Input: None
    • Returns: json of all the current model configuration
    {
        "shared": false,
        "model_name": "prajjwal1/bert-tiny",
        "model_save_path": "modeloutput",
        "loss_function": "mse",
        "loss_type": "dot"
    }
    
  • set_model_config:
    • Input
      • model_parameters(Dict): dictionary of model parameters. See the json example above under get_model_config for the list of available training parameters.
    • Returns: "Config setup is complete." if model configuration is completed successfully
  • save_model:
    • Input
      • model_path (string): the path to save model
    • Returns: "[Saved model at] : <model_path>" if model successfully saved
  • load_model:
    • Input
      • model_path (string): the path to load the model from
    • Returns: "[loaded model from] : <model_path>" if model successfully loaded

Example Jac Usage:

# Train a bi-encoder model for intent classification
walker bi_enc_example{
    has train_file = "train_bi.json";
    has from_scratch = true;
    has num_train_epochs = 20;
    has contexts= ["Share my location with Hillary's sister"];
    has candidates=[
        "searchplace",
        "getplacedetails",
        "bookrestaurant",
        "gettrafficinformation",
        "compareplaces",
        "sharecurrentlocation",
        "requestride",
        "getdirections",
        "shareeta",
        "getweather"
    ];

    can bi_enc.train,bi_enc.infer;

    train_data = file.load_json(train_file);

    # Train the model
    bi_enc.train(
        dataset=train_data,
        from_scratch=from_scratch,
        training_parameters={
            "num_train_epochs": num_train_epochs
        }
    );

    # Use the model to perform inference
    # returns the list of context with the suitable candidates
    resp_data = bi_enc.infer(
        contexts=contexts,
        candidates=candidates,
        context_type="text",
        candidate_type="text"
    );

    # Iterate through the candidate labels and their predicted scores
    max_score = 0;
    max_intent = "";
    pred=resp_data[0];
    for j=0 to j<pred["candidate"].length by j+=1 {
        if (pred["score"][j] > max_score){
            max_intent = pred["candidate"][j];
            max_score = pred["score"][j];
        }
    }
    std.out("predicted intent : ",max_intent ," Conf_Score:", max_score);
}

Sbert Similarity (sbert_sim)

sbert_sim is an implementation of Sentence-BERT for scoring similarity between sentence pairs. It uses a bi-encoder in a Siamese setup to encode the sentences, then scores them with cosine similarity.

  • get_dot_score : Calculate the dot product of two given vectors
    • Input:
      • vec_a (list of float): first embedded text
      • vec_b (list of float): second embedded text
    • Return: dot product score
  • get_cos_score : Calculate the cosine similarity score of two given vectors
    • Input:
      • vec_a (list of float): first embedded text
      • vec_b (list of float): second embedded text
    • Return: cosine similarity score
  • get_text_sim: gets the similarity score between the query and all the sentences in the corpus, and returns the top_k most similar sentences with their sim_score
    • Input:
      • query (string or list of strings): query text to match against the corpus
      • corpus (string or list of strings): list of sentences to search
      • top_k (int): number of top-scoring sentences to return
    • Return: list of top_k similar sentences with sim_score
  • train: used to train the SBERT model on custom input
    • Input:
      • dataset (List): list of lists, where each inner list contains a pair of sentences and a similarity score
      • training_parameters (Dict): dictionary of training parameters
    • Returns: text when model training is completed
  • getembeddings:
    • Input:
      • texts (string or list of strings): text to encode
    • Returns: a list of embeddings
  • get_train_config:
    • Input: None
    • Returns: json of all the current training configuration
    {
       "device": "cpu",
       "num_epochs": 2,
       "model_save_path": "output/sent_model-2022-11-04_17-43-18",
       "model_name": "bert-base-uncased",
       "max_seq_length": 256
    }
    
  • load_model:
    • Input

      • model_type (string): can be default or tfm_model
        • default : loads model from the sbert model zoo
        • tfm_model : load tranformer model from the huggingface hub
      • model_name (string): this is name of the model to be loaded
      •  {
         "model_name": "all-MiniLM-L12-v2",
         "model_type": "default"
         }
        
    • Returns: "[loaded model from] : <model_type> <model_name>" if model successfully loaded

      • [loaded model from] SBERT Hub : all-MiniLM-L12-v2
        

Example Jac Usage:

## Train and evaluate an sbert model for sentence similarity
walker sbert_sim_example{
    has train_file = "train_sbert.json";
    has num_epochs = 2;
    has query= ["A girl is on a sandy beach."];
    has corpus=["A girl dancing on a sandy beach."];
    has top_k=1;

    can sbert_sim.train,sbert_sim.get_text_sim;

    train_data = file.load_json(train_file);
    
    # Train the model 
    sbert_sim.train(
        dataset=train_data['train_data'],
        training_parameters={
            "num_epochs": num_epochs
        }
    );

    # returns the top_k similar texts in the corpus 
    resp_data = sbert_sim.get_text_sim(query=query,corpus=corpus,top_k=top_k);
    std.out(resp_data);
}
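Under the hood, get_text_sim amounts to embedding the query and corpus, scoring with cosine similarity, and keeping the top_k hits. A minimal sketch of that ranking, with made-up two-dimensional embeddings standing in for the SBERT encoder:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical embeddings; the real module would produce these with getembeddings.
query_emb = [1.0, 0.0]
corpus = {
    "A girl dancing on a sandy beach.": [0.95, 0.05],
    "A chef cooking pasta.": [0.1, 0.9],
}

def get_text_sim(query_emb, corpus, top_k):
    # Rank corpus sentences by similarity to the query; keep the best top_k.
    ranked = sorted(corpus, key=lambda s: -cosine(query_emb, corpus[s]))
    return [(s, cosine(query_emb, corpus[s])) for s in ranked[:top_k]]

print(get_text_sim(query_emb, corpus, top_k=1))
```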

FastText Encoder (fast_enc)

fast_enc module uses Facebook's fastText for efficient learning of word representations and sentence classification.
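fastText averages word-level features into a sentence vector and scores it against each class with a linear model. The following toy sketch uses hand-built class weights (not a trained fastText model) purely to show that shape:

```python
from collections import Counter

def featurize(sentence):
    # Bag-of-words stand-in for fastText's averaged word vectors.
    return Counter(sentence.lower().split())

# Hypothetical "trained" per-class weights.
class_weights = {
    "greeting": Counter({"hello": 1, "good": 1, "morning": 1}),
    "farewell": Counter({"bye": 1, "goodbye": 1, "night": 1}),
}

def predict(sentence):
    # Linear score per class, then argmax, mirroring predict's output shape.
    feats = featurize(sentence)
    return max(
        class_weights,
        key=lambda c: sum(feats[w] * class_weights[c][w] for w in feats),
    )

print(predict("good morning"))  # greeting
```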

  • train: used to train the fastText classifier on custom input
    • Input:
      • traindata (Dict): dictionary of candidates and supporting contexts for each candidate
      • train_with_existing (bool): if set to true, trains on top of the existing model; otherwise trains from scratch
  • predict: predicts the most suitable candidate for each provided sentence
    • Input:
      • sentences (list of strings): list of sentences that need to be classified
    • Return: a dictionary of sentence, predicted intent and probability
  • save_model:
    • Input
      • model_path (string): the path to save model
    • Returns: "[Saved model at] : <model_path>" if model successfully saved
  • load_model:
    • Input
      • model_path (string): the path to load the model from
    • Returns: "[loaded model from] : <model_path>" if model successfully loaded

Example Jac Usage:

# Train and inference with a fasttext classifier
walker fast_enc_example {
    has train_file = "fast_enc_train.json";
    has train_with_existing = false;
    has test_sentence=  ["what's going on ?"];
    can fast_enc.train,fast_enc.predict;

    # Training the model
    train_data = file.load_json(train_file);
    fast_enc.train(traindata=train_data,train_with_existing=false);

    # Getting inference from the model
    resp_data=fast_enc.predict(sentences=test_sentence);
    std.out(resp_data);
}

Entity

Entity Extraction (ent_ext)

ent_ext module uses the Flair named entity recognition architecture. It can be used zero-shot or trained on custom data.

  • train: used to train the Flair-based NER model
    • Input:
      • train_data: (List(Dict)): a list of dictionaries containing contexts and list of entities in each context.
      [
          {
              "context": "EU rejects German call to boycott British lamb",
              "entities": [
                  {
                      "entity_value": "EU",
                      "entity_type": "ORG",
                      "start_index": 0,
                      "end_index": 2
                  },
                  {
                      "entity_value": "German",
                      "entity_type": "MISC",
                      "start_index": 11,
                      "end_index": 17
                  },
                  {
                      "entity_value": "British",
                      "entity_type": "MISC",
                      "start_index": 34,
                      "end_index": 41
                  }
              ]
          }
      ]
      
      • val_data: (List(Dict)): a list of dictionaries containing contexts and list of entities in each context
      [
          {
              "context": "CRICKET LEICESTERSHIRE TAKE OVER AT TOP AFTER INNINGS VICTORY",
              "entities": [
                  {
                      "entity_value": "LEICESTERSHIRE",
                      "entity_type": "ORG",
                      "start_index": 8,
                      "end_index": 22
                  }
              ]
          }
      ]
      
      • test_data: (List(Dict)): a list of dictionaries containing contexts and list of entities in each context
      [
          {
              "context": "The former Soviet republic was playing in an Asian Cup finals tie for the first time",
              "entities": [
                  {
                      "entity_value": "Soviet",
                      "entity_type": "MISC",
                      "start_index": 11,
                      "end_index": 17
                  },
                  {
                      "entity_value": "Asian",
                      "entity_type": "MISC",
                      "start_index": 45,
                      "end_index": 50
                  },
                  {
                      "entity_value": "Asian",
                      "entity_type": "MISC",
                      "start_index": 45,
                      "end_index": 50
                  }
              ]
          }
      ]
      
      • train_params: (Dict): dictionary of training parameters to modify the training behaviour
      {
          "num_epoch": 20,
          "batch_size": 16,
          "LR": 0.01
      }
      
  • entity_detection: detects all available entities in the provided context
    • Input:
      • text (string): context to detect entities.
      • ner_labels(list of strings): List of entities, e.g. ["LOC","PER"]
    • Return: a list of entity dictionaries, each containing entity_text, entity_value, conf_score and index
  • save_model:
    • Input
      • model_path (string): the path to save model
    • Returns: "[Saved model at] : <model_path>" if model successfully saved
  • load_model:
    • Input
      • model_path (string): the path to load the model from
    • Returns: "[loaded model from] : <model_path>" if model successfully loaded
  • set_config:
    • Input
      • ner_model: pretrained or basic model to be loaded; provide the exact name of the model. Available options are:
        • Pre-trained LSTM / GRU : ["ner", "ner-fast", "ner-large"]
        • Huggingface model : all available models that can be initialized with AutoModel
        • None : to load an RNN model from scratch
      • model_type: type of model to be loaded. Available options are:
        • TRFMODEL : for huggingface models
        • LSTM or GRU : for RNN models
    • Returns: "Config setup is complete." if model successfully loaded

Example Jac Usage:

# Train and inference with an entity extraction model
walker ent_ext_example {

    has train_file = "train_data.json";
    has val_file = "val_data.json";
    has test_file = "test_data.json";
    has from_scratch = true;
    has num_train_epochs = 20;
    has batch_size = 8;
    has learning_rate = 0.02;
    can ent_ext.entity_detection, ent_ext.train;
    train_data = file.load_json(train_file);
    val_data = file.load_json(val_file);
    test_data = file.load_json(test_file);

    # Training the model
    ent_ext.train(
        train_data = train_data,
        val_data = val_data,
        test_data = test_data,
        train_params = {
            "num_epoch": num_train_epochs,
            "batch_size": batch_size,
            "LR": learning_rate
            });

    # Getting inference from the model
    resp_data = ent_ext.entity_detection(text="book a flight from kolkata to delhi",ner_labels= ["LOC"]);
    std.out(resp_data);
}

Entity Extraction Using Transformers (tfm_ner)

tfm_ner module uses transformers to identify and extract entities, via the token classification method from the Hugging Face transformers library.
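Token classification assigns one label per token, so the character-offset annotations in train_data are typically converted to per-token BIO tags first. A simplified sketch over whitespace tokens (the actual module tokenizes with a subword tokenizer):

```python
def to_bio(context, entities):
    # Tag each whitespace token: B- at an entity start, I- inside an
    # entity span, O everywhere else.
    tags, pos = [], 0
    for tok in context.split():
        start = context.index(tok, pos)
        pos = start + len(tok)
        label = "O"
        for ent in entities:
            if start == ent["start_index"]:
                label = "B-" + ent["entity_type"]
            elif ent["start_index"] < start < ent["end_index"]:
                label = "I-" + ent["entity_type"]
        tags.append(label)
    return tags

ctx = "MINNETONKA , Minn ."
ents = [
    {"entity_value": "MINNETONKA", "entity_type": "LOC", "start_index": 0, "end_index": 10},
    {"entity_value": "Minn", "entity_type": "LOC", "start_index": 13, "end_index": 17},
]
print(to_bio(ctx, ents))  # ['B-LOC', 'O', 'B-LOC', 'O']
```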

  • train: used to train transformer NER model
    • Input:
      • train_data: (List(Dict)): a list of dictionaries containing contexts and the list of entities in each context
      [
          {
              "context": "MINNETONKA , Minn .",
              "entities": [
                          {
                              "entity_value": "MINNETONKA",
                              "entity_type": "LOC",
                              "start_index": 0,
                              "end_index": 10
                          },
                          {
                              "entity_value": "Minn",
                              "entity_type": "LOC",
                              "start_index": 13,
                              "end_index": 17
                          }
              ]
          }
      ]
      
      • mode: (String): mode for training the model; available options are:
        • default: train the model from scratch
        • incremental: provide more training data for the current set of entities
        • append: change the number of entities (the model is restarted and trained with all of traindata)
      • epochs : (int): number of epochs to train the model
  • extract_entity: detects all available entities in the provided context
    • Input:
      • text (string): context to detect entities.
    • Return: a list of entity dictionaries, each containing entity_text, entity_value, conf_score and index
  • save_model:
    • Input
      • model_path (string): the path to save model
    • Returns: "[Saved model at] : <model_path>" if model successfully saved
  • load_model:
    • Input
      • model_path (string): the path to load the model from
    • Returns: "[loaded model from] : <model_path>" if model successfully loaded
  • get_train_config:
    • Input: None
    • Returns: json of all the current training configuration
       {
           "MAX_LEN": 128,
           "TRAIN_BATCH_SIZE": 4,
           "VALID_BATCH_SIZE": 2,
           "EPOCHS": 50,
           "LEARNING_RATE": 2e-05,
           "MAX_GRAD_NORM": 10,
           "MODE": "default"
       }
    
  • set_train_config:
    • Input
      • train_parameters (Dict): dictionary of training parameters. See the json example above for available configuration parameters.
    • Returns: "Config setup is complete." if train configuration is completed successfully
  • get_model_config:
    • Input: None
    • Returns: json of all the current model configuration
        {
            "model_name": "prajjwal1/bert-tiny",
            "model_save_path": "modeloutput"
        }
    
  • set_model_config:
    • Input
      • model_parameters(Dict): dictionary of model parameters. See the json example above for available configuration parameters.
    • Returns: "Config setup is complete." if model configuration is completed successfully

Example Jac Usage:

# Train and inference with a transformer-based NER model
walker tfm_ner_example {

    has train_file = "train_ner.json";
    has num_train_epochs = 10;
    has mode = "default";
    can tfm_ner.extract_entity, tfm_ner.train;
    train_data = file.load_json(train_file);

    # Training the model
    tfm_ner.train(
        mode = mode,
        epochs = num_train_epochs,
        train_data=train_data
    );

    # Infer using the model
    resp_data = tfm_ner.extract_entity(
        text="book a flight from kolkata to delhi,Can you explain to me,please,what Homeowners Warranty Program means,what it applies to,what is its purpose? Thank you. The Humboldt University of Berlin is situated in Berlin, Germany"
    );
    std.out(resp_data);
}

Summarization

Summarizer (cl_summer)

cl_summer uses the sumy summarizer to create extractive summaries.
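Extractive summarizers select existing sentences rather than generating new text. As a rough illustration of that select-and-reorder shape, here is a naive word-frequency scorer (much simpler than sumy's LSA/LexRank/Luhn algorithms):

```python
from collections import Counter

def summarize(text, sent_count):
    # Score each sentence by the corpus frequency of its words, keep the
    # sent_count best, and report them in their original order.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    freqs = Counter(text.lower().split())
    ranked = sorted(sentences, key=lambda s: -sum(freqs[w] for w in s.lower().split()))
    keep = set(ranked[:sent_count])
    return [s for s in sentences if s in keep]

text = "The king hid in the woods. The spider tried again. Rain fell."
print(summarize(text, sent_count=1))
```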

  • summarize: gets the extractive summary in the provided sentence count.
    • Input
      • text(String): text that contains the entire context
      • url(String): the link to the webpage
      • sent_count(int): number of sentences you want in the summary
      • summarizer_type(String): name of the summarizer type, available options are:
        • LsaSummarizer
        • LexRankSummarizer
        • LuhnSummarizer
    • Returns: List of Sentences that best summarizes the context
    • Input text file summarize.json
      {
          "text": "There was once a king of Scotland whose name was Robert Bruce. He needed to be both brave and wise because the times in which he lived were wild and   rude. The King of England was at war with him and had led a great army into Scotland to drive him out of the land. Battle after battle had been fought. Six times Bruce had led his brave little army against his foes and six times his men had been beaten and driven into flight. At last his army was scattered, and he was forced to hide in the woods and in lonely places among the mountains. One rainy day, Bruce lay on the ground under a crude shed listening to the patter of the drops on the roof above him. He was tired and unhappy. He was ready to give up all hope. It seemed to him that there was no use for him to try to do anything more. As he lay thinking, he saw a spider over his head making ready to weave her web. He watched her as she toiled slowly and with great care. Six times she tried to throw her frail thread from one beam to another, and six times it fell short. 'Poor thing,' said Bruce: 'you, too, know what it is to fail.', But the spider did not lose hope with the sixth failure. With still more care, she made ready to try for the seventh time. Bruce almost forgot his own troubles as, he watched her swing herself out upon the slender line. Would she fail again? No! The thread was carried safely to the beam and fastened there."
      }
      

Example Jac Usage for given text blob:

# Use the summarizer to summarize a given text blob
walker cl_summer_example {
    has text_file = "summarize.json";
    has sent_count = 5;
    has summarizer_type = "LsaSummarizer";
    can cl_summer.summarize;

    # Getting Extractive summary from text
    train_data = file.load_json(text_file);
    resp_data = cl_summer.summarize(
        text=train_data.text,
        url="none",
        sent_count=sent_count,
        summarizer_type=summarizer_type
    );
    report resp_data;
}

Example Jac Usage for given URL

# Use the summarizer to summarize a given URL
walker cl_summer_example {
    has sent_count = 5;
    has summarizer_type = "LsaSummarizer";
    has url="https://in.mashable.com/";
    can cl_summer.summarize;

    # Getting Extractive summary from URL
    resp_data_url = cl_summer.summarize(
        text="none",
        url=url,
        sent_count=sent_count,
        summarizer_type=summarizer_type
    );
    report resp_data_url;
}

T5 Summarization (t5_sum)

t5_sum uses the T5 transformer model to perform abstractive summary on a body of text.

  • classify_text: use the T5 model to summarize a body of text
    • Input:
      • text (string): text to summarize
      • min_length (integer): the minimum number of words you want returned from the model
      • max_length (integer): the maximum number of words you want returned from the model
    • Input datafile **data.json**
      {
          "text": "The US has passed the peak on new coronavirus cases, President Donald Trump said and predicted that some states would reopen this month. The US has over 637,000 confirmed Covid-19 cases and over 30,826 deaths, the highest for any country in the world. At the daily White House coronavirus briefing on Wednesday, Trump said new guidelines to reopen the country would be announced on Thursday after he speaks to governors. We'll be the comeback kids, all of us, he said. We want to get our country back. The Trump administration has  previously fixed May 1 as a possible date to reopen the world's largest economy, but the president said some states may be able to return to normalcy earlier than that.",
          "min_length": 30,
          "max_length": 100
      }
      

Example Jac Usage:

# Use the T5 model to summarize a given piece of text
walker summarization {
    can t5_sum.classify_text;
    has data = "data.json";
    data = file.load_json(data);
    summarized_text = t5_sum.classify_text(
        text = data["text"],
        min_length = data["min_length"],
        max_length = data["max_length"]
        );
    report summarized_text;
}

Text Processing

Text Segmenter (text_seg)

Text segmentation splits a document into smaller parts, usually called segments; each segment carries its own meaning, and segments can be words, sentences, topics, or phrases. The text_seg module implements segmentation based on Topical Change Detection in Documents via Embeddings of Long Sequences.
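The thresholding idea can be sketched as follows: embed consecutive sentences and open a new segment whenever their similarity falls below the threshold. Toy two-dimensional vectors stand in for the module's transformer embeddings:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def get_segments(sentences, embeddings, threshold):
    # Consecutive sentences stay together while their embeddings agree;
    # a similarity below the threshold starts a new segment.
    segments = [[sentences[0]]]
    for prev, cur, sent in zip(embeddings, embeddings[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:
            segments.append([sent])
        else:
            segments[-1].append(sent)
    return segments

sents = ["Bruce hid in the woods.", "His army was scattered.", "A spider wove a web."]
embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(get_segments(sents, embs, threshold=0.8))  # two segments: topic shifts at the spider sentence
```

With threshold=1.0 every boundary fails the similarity test, so each sentence becomes its own segment, matching the parameter description.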

  • get_segments: gets the different topic segments in the provided context, given a threshold

    • Input
      • text(String): text that contains the entire context
      • threshold(Float): between 0 and 1; a threshold of 1 makes each sentence its own segment
    • Returns: list of segments detected in the text
  • load_model: to load the available model for text segmentation

    • Input
      • model_name(String): name of the transformer model to load, options are:
        • wiki: trained on wikipedia data
        • legal: trained on legal documents
    • Returns: "[Model Loaded] : <model_name>"
  • Input data file text_seg.json

    {
        "text": "There was once a king of Scotland whose name was Robert Bruce. He needed to be both brave and wise because the times in which he lived were wild and rude. The King of England was at war with him and had led a great army into Scotland to drive him out of the land. Battle after battle had been fought. Six times Bruce had led his brave little army against his foes and six times his men had been beaten and driven into flight. At last his army was scattered, and he was forced to hide in the woods and in lonely places among the mountains. One rainy day, Bruce lay on the ground under a crude shed listening to the patter of the drops on the roof above him. He was tired and unhappy. He was ready to give up all hope. It seemed to him that there was no use for him to try to do anything more. As he lay thinking, he saw a spider over his head making ready to weave her web. He watched her as she toiled slowly and with great care. Six times she tried to throw her frail thread from one beam to another, and six times it fell short. 'Poor thing,' said Bruce: 'you, too, know what it is to fail. But the spider did not lose hope with the sixth failure. With still more care, she made ready to try for the seventh time. Bruce almost forgot his own troubles as he watched her swing herself out upon the slender line. Would she fail again? No! The thread was carried safely to the beam and fastened there."
    }
    

Example Jac Usage:

walker text_seg_example {
    has data_file = "text_seg.json";
    has threshold = 0.85;
    can text_seg.get_segments, text_seg.load_model;

    # loading the desired model
    resp_data = text_seg.load_model(model_name='wiki');
    std.out(resp_data);

    # Getting Segments of different topic from text
    data = file.load_json(data_file);
    resp_data = text_seg.get_segments(
        text=data.text,
        threshold=threshold
        );
    std.out(resp_data);
}

Non-AI Tools

PDF Extractor (pdf_ext)

pdf_ext module extracts text and metadata from PDF files, given either a URL or a local path.

  • extract_pdf: extracts the content of a PDF from either a URL or a local path
    • Input
      • url(String): gets the pdf from a URL
      • path(String): gets the pdf from a local path
      • metadata(Bool): whether to include the PDF's available metadata
    • Returns: a json with the number of pages the pdf has and its content

Example Jac Usage:

walker pdf_ext_example {
    has url = "http://www.africau.edu/images/default/sample.pdf";
    has metadata = true;
    can pdf_ext.extract_pdf;

    # Getting the data from the PDF
    resp_data = pdf_ext.extract_pdf(url=url,
    metadata=metadata);
    std.out(resp_data);
}

Summarizer (cl_summer)

Module cl_summer uses the sumy summarizer to extract a summary from text.

  1. Import cl_summer module in jac
  2. Summarizer

Walk through

1. Import Summarizer (cl_summer) module in jac

  1. To run Jaseci, open a terminal and run the following command:
    jsctl -m
    
  2. Load the cl_summer module in jac with the following command:
    actions load module jaseci_ai_kit.cl_summer
    

2. Summarizer

For this tutorial, we are going to leverage the Summarizer (cl_summer) module to generate a summary of the provided text.

  • Creating Jac Program for summarizer (cl_summer)

    1. Create a file named summarizer.jac

    2. Create the nodes model_dir and summarizer in the summarizer.jac file.

      node model_dir;
      node summarizer{};
      
    3. Initialize the node summarizer and import the cl_summer.summarize ability inside it.

      # import ability
      can cl_summer.summarize;
      
    4. Initialize the module summarize inside the summarizer node.

      # summarizer
      can summarize with summarizer entry{
          data = file.load_json(visitor.data);
      
          report cl_summer.summarize(
              text = data["text"],
              url = data["url"],
              sent_count = data["sent_count"],
              summarizer_type = data["summarizer_type"]
              );
      }
      

      summarize: gets the extractive summary of the text in the given number of sentences.

      Parameter details

      • Input Data

        data.json file

        {
            "text": "There was once a king of Scotland whose name was Robert Bruce. He needed to be both brave and wise because the times in which he lived were wild and   rude. The King of England was at war with him and had led a great army into Scotland to drive him out of the land. Battle after battle had been fought. Six times Bruce had led his brave little army against his foes and six times his men had been beaten and driven into flight. At last his army was scattered, and he was forced to hide in the woods and in lonely places among the mountains. One rainy day, Bruce lay on the ground under a crude shed listening to the patter of the drops on the roof above him. He was tired and unhappy. He was ready to give up all hope. It seemed to him that there was no use for him to try to do anything more. As he lay thinking, he saw a spider over his head making ready to weave her web. He watched her as she toiled slowly and with great care. Six times she tried to throw her frail thread from one beam to another, and six times it fell short. 'Poor thing,' said Bruce: 'you, too, know what it is to fail.', But the spider did not lose hope with the sixth failure. With still more care, she made ready to try for the seventh time. Bruce almost forgot his own troubles as, he watched her swing herself out upon the slender line. Would she fail again? No! The thread was carried safely to the beam and fastened there.",
            "url": "none",
            "sent_count": 5,
            "summarizer_type": "LsaSummarizer"
        }
        
        • text(String): text that contains the entire context
        • url(String): the link to the webpage
        • sent_count(int): number of sentences you want in the summary
        • summarizer_type(String): name of the summarizer type, available options are:
          • LsaSummarizer
          • LexRankSummarizer
          • LuhnSummarizer
      • Output: list of sentences that best summarize the context

    5. Add an edge named summ_model in the summarizer.jac file to connect nodes inside the graph.

      # adding edge
      edge summ_model {
          has model_type;
      }
      
    6. Add a graph named summ_graph to initialize the nodes.

      # adding graph
      graph summ_graph {
          has anchor summ_model_dir;
          spawn {
              summ_model_dir = spawn node::model_dir;
              summarizer_node = spawn node::summarizer;
              summ_model_dir -[summ_model(model_type="summarizer")]-> summarizer_node;
          }
      }
      
    7. Initialize the walker init to spawn the graph.

      walker init {
          root {
          spawn here --> graph::summ_graph;
          }
      }
      
    8. Create a walker named summarizer that takes parameters from the context (or uses defaults) and calls the summarize ability.

      # declaring walker for summarizing text
      walker summarizer{
          has data="data.json";
      
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      }
      

      Final summarizer.jac program

      node model_dir;
      node summarizer{
          # import ability
          can cl_summer.summarize;
      
          # summarizer
          can summarize with summarizer entry{
              data = file.load_json(visitor.data);
      
              report cl_summer.summarize(
                  text = data["text"],
                  url = data["url"],
                  sent_count = data["sent_count"],
                  summarizer_type = data["summarizer_type"]
                  );
          }
      }
      
      # adding edge
      edge summ_model {
          has model_type;
      }
      
      # adding graph
      graph summ_graph {
          has anchor summ_model_dir;
          spawn {
              summ_model_dir = spawn node::model_dir;
              summarizer_node = spawn node::summarizer;
              summ_model_dir -[summ_model(model_type="summarizer")]-> summarizer_node;
          }
      }
      
      # declaring init walker
      walker init {
          root {
          spawn here --> graph::summ_graph;
          }
      }
      
      # declaring walker for summarizing text
      walker summarizer{
          has data="data.json";
      
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      }
      
    • Steps for running the summarizer.jac program

      • Execute the following command to build summarizer.jac

        jac build summarizer.jac
        
      • Execute the following command to activate the sentinel

        sentinel set -snt active:sentinel -mode ir summarizer.jir
        
      • Execute the walker summarizer with the default parameters for the summarizer (cl_summer) module with the following command

        walker run summarizer
        
      • After executing the walker summarizer, the result data will be shown on the console.

        Result

         "report": [
                    [
                    "The King of England was at war with him and had led a great army into Scotland to drive him out of the land.",
                    "At last his army was scattered, and he was forced to hide in the woods and in lonely places among the mountains.",
                    "One rainy day, Bruce lay on the ground under a crude shed listening to the patter of the drops on the roof above him.",
                    "As he lay thinking, he saw a spider over his head making ready to weave her web.",
                    "Bruce almost forgot his own troubles as, he watched her swing herself out upon the slender line."
                    ]
                ]
        

Bi-Encoder (bi_enc)

bi_enc is an arrangement of two encoder modules from BERT: it represents contexts and candidates separately using twin-structured encoders, and predicts the most suitable candidate for each context. You can train the module on custom data to behave accordingly. Let's take a deep dive into the training workflow.

This tutorial shows you how to train a Bi-Encoder with a custom training loop to categorize contexts by candidates, using Jaseci (jac) and Python.
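Conceptually, the bi-encoder scores each (context, candidate) pair by embedding the two sides separately and comparing the vectors. A toy sketch with made-up two-dimensional embeddings (not real BERT outputs):

```python
# Toy bi-encoder scoring: context and candidates are embedded
# independently; the best candidate maximizes the similarity score.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

context_emb = [0.9, 0.1]  # hypothetical embedding of a booking request
candidate_embs = {
    "BookRestaurant": [0.8, 0.2],
    "GetWeather": [0.1, 0.9],
}
best = max(candidate_embs, key=lambda c: dot(context_emb, candidate_embs[c]))
# best == "BookRestaurant"
```

Because the two sides are encoded independently, candidate embeddings can be precomputed once and reused across many contexts.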

  1. Preparing dataset
  2. Import Bi-Encoder(bi_enc) module in jac
  3. Train the model
  4. Evaluate the model's effectiveness
  5. Use the trained model to make predictions

Walk through

1. Preparing dataset

For this tutorial, we are going to leverage the bi-encoder for intent classification: categorizing incoming text into one of a set of predefined intents. For demonstration purposes, we will use the SNIPS dataset as an example.

SNIPS is a popular intent classification dataset that covers intents such as [ "BookRestaurant", "ComparePlaces", "GetDirections", "GetPlaceDetails", "GetTrafficInformation", "GetWeather", "RequestRide", "SearchPlace", "ShareCurrentLocation", "ShareETA" ]. We need to do a little data format conversion to create a version of SNIPS that works with our bi-encoder implementation. For this part, we are going to use Python.

  1. Import the dataset from the Hugging Face datasets library.

    # import library
    from datasets import load_dataset
    # load dataset
    dataset = load_dataset("snips_built_in_intents")
    print(dataset["train"][:2])
    

    If imported successfully, you should see output in a format like this:

    {"text": ["Share my location with Hillary's sister", "Send my current location to my father"], "label": [5, 5]}

  2. Convert the out-of-the-box SNIPS format into the format that can be ingested by the bi-encoder.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    import json
    
    # get labels names
    lab = dataset["train"].features["label"].names
    # create labels dictionary
    label_dict = {v: k for v, k in enumerate(lab)}
    # dataset
    dataset = dataset["train"]
    
    # create dataset function
    def CreateData(data):
        # Create dataframe
        df = pd.DataFrame(data)
        # Map labels dict on label column
        df["label"] = df["label"].apply(lambda x : label_dict[x])
        # grouping text on basis of label
        df = df.groupby("label").agg({"text": "\t".join, "label": "\t".join})
        df["label"] = df["label"].apply(lambda x: x.split("\t")[0])
    
        # Create data dictionary
        data_dict = {}
        for i in range(len(df)):
            data_dict[df["label"][i]] = df["text"][i].split("\t")
        return data_dict
    # Split the dataset into train and test sets; they are saved below to
    # `train_bi.json` and `test_bi.json`.
    train, test = train_test_split(dataset, test_size=0.2, random_state=42)
    
    # Create train dataset
    train_data = CreateData(train)
    # write data in json file 'train_bi.json'
    with open("train_bi.json", "w", encoding="utf8") as f:
        f.write(json.dumps(train_data, indent = 4))
    
    
    # Create test dataset
    test_data = CreateData(test)
    data = {
        "contexts": [],
        "candidates": [],
        "context_type": "text",
        "candidate_type": "text"
        }
    for itm in test_data:
        data["candidates"].append(itm)
        data["contexts"].extend(test_data[itm])
    # write data in json file 'test_bi.json'
    with open("test_bi.json", "w", encoding="utf8") as f:
        f.write(json.dumps(data, indent = 4))
    

    The resulting format should look something like this.

    • train_bi.json

      "BookRestaurant": [
          "Book me a table for 2 people at the sushi place next to the show tomorrow night","Find me a table for four for dinner tonight"
          ],
      "ComparePlaces": [
          "What's the cheapest between the two restaurants the closest to my hotel?"
          ]
      
    • test_bi.json

      {
              "contexts": [
                  "We are a party of 4 people and we want to book a table at Seven Hills for sunset",
                  "Book a table at Saddle Peak Lodge for my diner with friends tonight",
                  "How do I go to Montauk avoiding tolls?",
                  "What's happening this week at Smalls Jazz Club?",
                  "Will it rain tomorrow near my all day event?",
                  "Send my current location to Anna",
                  "Share my ETA with Jo",
                  "Share my ETA with the Snips team"
              ],
              "candidates": [
                  "BookRestaurant",
                  "ComparePlaces",
                  "GetDirections",
                  "GetPlaceDetails",
                  "GetTrafficInformation",
                  "GetWeather",
                  "RequestRide",
                  "SearchPlace",
                  "ShareCurrentLocation",
                  "ShareETA"
              ],
              "context_type": "text",
              "candidate_type": "text"
      }
      

2. Import Bi-Encoder(bi_enc) module in jac

  1. Open a terminal and run Jaseci:

    jsctl -m

  2. Load the bi_enc module in jac:

    actions load module jaseci_ai_kit.bi_enc

3. Train the model

For this tutorial, we are going to train the bi-encoder for intent classification, which categorizes incoming text into one of the predefined intents: we train on the SNIPS train dataset and run inference on its test dataset.

  • Creating Jac Program (train and infer bi_enc)

    1. Create a file named bi_encoder.jac

    2. Create the nodes model_dir and bi_encoder in the bi_encoder.jac file

      node model_dir;
      node bi_encoder {};
      
    3. Initialize the bi_encoder node and import the train and infer abilities inside the node.

      # import train and infer ability
      can bi_enc.train, bi_enc.infer;
      
    4. Initialize the train and infer abilities inside the bi_encoder node. bi_enc.train takes the training arguments and starts training the bi_enc module.

      can train_bi_enc with train_bi_enc entry{
      # Code snippet for training the model
      train_data = file.load_json(visitor.train_file);
      
      # Train the model
      report bi_enc.train(
          dataset=train_data,
          from_scratch=visitor.from_scratch,
          training_parameters={
              "num_train_epochs": visitor.num_train_epochs.int
              }
          );
      }
      # prediction on test dataset
      can infer with train_bi_enc exit{
          # Use the model to perform inference
          # returns the list of context with the suitable candidates
          test_data = file.load_json(visitor.test_file);
          resp_data = bi_enc.infer(
              contexts=test_data["contexts"],
              candidates=test_data["candidates"],
              context_type=test_data["context_type"],
              candidate_type=test_data["candidate_type"]
          );
          # the infer action returns all the candidate with the confidence scores
          # Iterate through the candidate labels and their predicted scores
          result = [];
          for pred in resp_data.list{
              text = pred["context"];
              max_score = 0;
              max_intent = "";
              for j=0 to j<pred["candidate"].length by j+=1 {
                  if (pred["score"][j] > max_score){
                      max_intent = pred["candidate"][j];
                      max_score = pred["score"][j];
                  }
              }
              result.list::append({
                  "context":text,
                  "predicted intent":max_intent,
                  "Conf_Score":max_score
                  });
          }
          report [result];
      }
      # predict intent on new text
      can predict with predict entry{
      # Use the model to perform inference
      # returns the list of context with the suitable candidates
      test_data = file.load_json(visitor.test_data_file);
      resp_data = bi_enc.infer(
          contexts=test_data["contexts"],
          candidates=test_data["candidates"],
          context_type=test_data["context_type"],
          candidate_type=test_data["candidate_type"]
          );
      # the infer action returns all the candidate with the confidence scores
      # Iterate through the candidate labels and their predicted scores
      pred = resp_data[0];
      context = pred["context"];
      max_score = 0;
      max_intent = "";
      for j=0 to j<pred["candidate"].length by j+=1 {
          if (pred["score"][j] > max_score){
              max_intent = pred["candidate"][j];
              max_score = pred["score"][j];
          }
      }
      report [{
              "context":context,
              "pred intent":max_intent,
              "Conf_Score":max_score
              }];
      }
      

      Parameter details

      • train: will be used to train the Bi-Encoder on custom data
        • Input:
          • dataset (Dict): dictionary of candidates and supporting contexts for each candidate
          • from_scratch (bool): if set to true, trains the model from scratch; otherwise trains incrementally
          • training_parameters (Dict): dictionary of training parameters
        • Returns: text when model training is completed
      • infer: will be used to predict the most suitable candidate for a provided context; takes text or embeddings
        • Input:
          • contexts (string or list of strings): context which needs to be classified
          • candidates (string or list of strings): list of candidates for the context
          • context_type (string): can be text or embedding type
          • candidate_type (string): can be text or embedding type
        • Returns: a dictionary of similarity scores for each candidate and context
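The walkers above reduce the infer response to one intent per context by taking the argmax over candidate scores. The same selection logic in plain Python (the record shape mirrors the jac code above; the values are made up):

```python
# Pick the highest-scoring candidate from one prediction record.
def best_candidate(pred):
    j = max(range(len(pred["candidate"])), key=lambda i: pred["score"][i])
    return pred["candidate"][j], pred["score"][j]

pred = {
    "context": "Get me a ride to the airport",
    "candidate": ["BookRestaurant", "RequestRide"],
    "score": [3.2, 27.1],
}
# best_candidate(pred) == ("RequestRide", 27.1)
```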
    5. Add an edge named bi_model in the bi_encoder.jac file to connect nodes inside the graph.

      # adding edge
      edge bi_model {
          has model_type;
      }
      
    6. Add a graph named bi_encoder_graph for initializing the nodes.

      graph bi_encoder_graph {
          has anchor bi_model_dir;
          spawn {
              bi_model_dir = spawn node::model_dir;
              bi_encoder_node = spawn node::bi_encoder;
              bi_model_dir -[bi_model(model_type="bi_encoder")]-> bi_encoder_node;
          }
      }
      
    7. Initialize the walker init to spawn the graph

      walker init {
          root {
          spawn here --> graph::bi_encoder_graph;
          }
      }
      
    8. Create a walker named train_bi_enc that reads its parameters from the context (or uses defaults) and calls the train and infer abilities.

      # Declaring the walker:
      walker train_bi_enc{
          # the parameters required for training
          has train_file = "train_bi.json";
          has from_scratch = true;
          has num_train_epochs = 20;
          has test_file = "test_bi.json";
      
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      }
      

      Default parameters for training and testing the bi-encoder:
      train_file : local path of train_bi.json file
      from_scratch : true
      num_train_epochs : 20
      test_file : local path of test_bi.json file

    9. Declare a walker for predicting intents on new text

      walker predict{
          has test_data_file = "test_dataset.json";
      
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      }
      

      Final bi_encoder.jac program: here we combine all steps from 2 to 9 inside bi_encoder.jac.

      node model_dir;
      node bi_encoder{
          # import train and infer ability
          can bi_enc.train, bi_enc.infer;
      
          can train_bi_enc with train_bi_enc entry{
              #Code snippet for training the model
              train_data = file.load_json(visitor.train_file);
      
              # Train the model
              report bi_enc.train(
                  dataset=train_data,
                  from_scratch=visitor.from_scratch,
                  training_parameters={
                      "num_train_epochs": visitor.num_train_epochs.int
                      }
                  );
              }
      
          can infer with train_bi_enc exit{
              # Use the model to perform inference
              # returns the list of context with the suitable candidates
              test_data = file.load_json(visitor.test_file);
              resp_data = bi_enc.infer(
                  contexts=test_data["contexts"],
                  candidates=test_data["candidates"],
                  context_type=test_data["context_type"],
                  candidate_type=test_data["candidate_type"]
              );
              # the infer action returns all the candidate with the confidence scores
              # Iterate through the candidate labels and their predicted scores
      
              result = [];
              for pred in resp_data.list{
                  text = pred["context"];
                  max_score = 0;
                  max_intent = "";
                  for j=0 to j<pred["candidate"].length by j+=1 {
                      if (pred["score"][j] > max_score){
                          max_intent = pred["candidate"][j];
                          max_score = pred["score"][j];
                      }
                  }
                  result.list::append({
                      "context":text,
                      "predicted intent":max_intent,
                      "Conf_Score":max_score
                      });
              }
              report [result];
          }
      
          # predict intent on new text
          can predict with predict entry{
          # Use the model to perform inference
          test_data = file.load_json(visitor.test_data_file);
          resp_data = bi_enc.infer(
              contexts=test_data["contexts"],
              candidates=test_data["candidates"],
              context_type=test_data["context_type"],
              candidate_type=test_data["candidate_type"]
              );
      
          # the infer action returns all the candidate with the confidence scores
          # Iterate through the candidate labels and their predicted scores
          pred = resp_data[0];
          max_score = 0;
          max_intent = "";
          for j=0 to j<pred["candidate"].length by j+=1 {
              if (pred["score"][j] > max_score){
                  max_intent = pred["candidate"][j];
                  max_score = pred["score"][j];
              }
          }
          report [{
              "context": pred["context"],
              "pred intent": max_intent,
              "Conf_Score": max_score
              }];
          }
      }
      
      
      # adding edge
      edge bi_model {
          has model_type;
      }
      
      graph bi_encoder_graph {
          has anchor bi_model_dir;
          spawn {
              bi_model_dir = spawn node::model_dir;
              bi_encoder_node = spawn node::bi_encoder;
              bi_model_dir -[bi_model(model_type="bi_encoder")]-> bi_encoder_node;
          }
      }
      
      walker init {
          root {
          spawn here --> graph::bi_encoder_graph;
          }
      }
      
      
      # Declaring the walker:
      walker train_bi_enc{
          # the parameters required for training
          has train_file = "train_bi.json";
          has from_scratch = true;
          has num_train_epochs = 20;
          has test_file = "test_bi.json";
      
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      }
      
      # declaring walker for predicting intents on new text
      walker predict{
          # passing input data for prediction
          has test_data_file = "test_dataset.json";
      
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      }
      

    Steps for running the bi_encoder.jac program

    1. Build bi_encoder.jac by running

      jac build bi_encoder.jac

    2. Activate the sentinel by running

      sentinel set -snt active:sentinel -mode ir bi_encoder.jir

      Note: If you get the error ValueError: badly formed hexadecimal UUID string, execute the following once:

      sentinel register -set_active true -mode ir bi_encoder.jir

    3. Call the walker train_bi_enc with the default parameters to train the bi_enc module:

      walker run train_bi_enc

    After running step 3, the training logs will be shown on the console:

    jaseci > walker run train_bi_enc
    Saving non-shared model to : modeloutput
    non shared model created
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:13<00:00,  2.42it/s]
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:13<00:00, 14.12it/s]
    
                Epoch : 1
                loss : 0.11524221436543898
                LR : 0.000891891891891892
    
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:01<00:00, 21.54it/s]
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:01<00:00, 21.42it/s]
    
                Epoch : 2
                loss : 0.030822114031197445
                LR : 0.0006689189189189189
    
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:01<00:00, 20.99it/s]
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:01<00:00, 20.67it/s]
    
                Epoch : 3
                loss : 0.016803985327538667
                LR : 0.000445945945945946
    
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:01<00:00, 19.98it/s]
    97%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊     | 32/33 [00:01<00:00, 19.59it/s]
    
                Epoch : 4
                loss : 0.011880970348350027
                LR : 0.000222972972972973
    
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:04<00:00,  8.16it/s]
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [00:04<00:00,  6.06it/s]
    
                Epoch : 5
                loss : 0.010109249611780273
                LR : 0.0
    
    Epoch: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:22<00:00,  4.49s/batch]
    
    

4. Evaluate the model's effectiveness

  • Evaluating model effectiveness on the test_bi.json dataset.

    Model testing Accuracy : 0.9090909090909091
    Model testing F1_Score : 0.886656034024455
    
    Model classification Report
    
                            precision    recall  f1-score   support
    
            BookRestaurant       1.00      1.00      1.00        14
             ComparePlaces       0.57      1.00      0.73         4
             GetDirections       1.00      1.00      1.00         7
           GetPlaceDetails       1.00      0.80      0.89        10
     GetTrafficInformation       1.00      0.50      0.67         4
                GetWeather       0.90      1.00      0.95         9
               RequestRide       0.83      1.00      0.91         5
               SearchPlace       0.80      0.67      0.73         6
      ShareCurrentLocation       1.00      1.00      1.00         3
                  ShareETA       1.00      1.00      1.00         4
    
                  accuracy                           0.91        66
                 macro avg       0.91      0.90      0.89        66
              weighted avg       0.93      0.91      0.91        66
    

    Sample Result Data

    [
        {
            "context": "I want a table for friday 8pm for 2 people at Katz's Delicatessen",
            "pred intent": "BookRestaurant",
            "true intent": "BookRestaurant",
            "Conf_Score": 16.789536811645576
        },
        {
            "context": "Is Vertigo Sky Lounge more expensive than the bar I usually go to in New York?",
            "pred intent": "ComparePlaces",
            "true intent": "ComparePlaces",
            "Conf_Score": 10.970629125448095
        },
        {
            "context": "Directions to Disneyworld avoiding traffic",
            "pred intent": "GetDirections",
            "true intent": "GetDirections",
            "Conf_Score": 18.088717110248858
        },
        {
            "context": "Show me the fastest itinerary to my Airbnb on a Friday night",
            "pred intent": "GetDirections",
            "true intent": "GetDirections",
            "Conf_Score": 12.83485819143171
        },
        {
            "context": "Is there any traffic on US 20?",
            "pred intent": "GetTrafficInformation",
            "true intent": "GetTrafficInformation",
            "Conf_Score": 12.378884709882225
        },
        {
            "context": "Can I take my bike to go to work today?",
            "pred intent": "GetTrafficInformation",
            "true intent": "GetWeather",
            "Conf_Score": 13.005006348394783
        },
        {
            "context": "Book a Lyft car to go to 33 greene street",
            "pred intent": "RequestRide",
            "true intent": "RequestRide",
            "Conf_Score": 18.330015462917917
        },
    
        {
            "context": "Get me a ride to the airport",
            "pred intent": "RequestRide",
            "true intent": "RequestRide",
            "Conf_Score": 27.063900934951988
        },
        {
            "context": "Find me the finest sushi restaurant in the area of my next meeting",
            "pred intent": "SearchPlace",
            "true intent": "SearchPlace",
            "Conf_Score": 13.798050719287755
        }
    ]
    
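Accuracy can be recomputed from prediction records shaped like the sample above. A minimal sketch (on the full test set you would typically use sklearn.metrics instead):

```python
def accuracy(records):
    """Fraction of records whose predicted intent matches the true intent."""
    correct = sum(r["pred intent"] == r["true intent"] for r in records)
    return correct / len(records)

records = [
    {"pred intent": "RequestRide", "true intent": "RequestRide"},
    {"pred intent": "GetTrafficInformation", "true intent": "GetWeather"},
]
# accuracy(records) == 0.5
```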

5. Use the trained model to make predictions

  • Create new input data for prediction, stored in a test_dataset.json file (the file can have any name)

    Input data

      {
          "contexts": [
              "We are a party of 4 people and we want to book a table at Seven Hills for sunset"
          ],
          "candidates": [
              "BookRestaurant",
              "ComparePlaces",
              "GetDirections",
              "GetPlaceDetails",
              "GetTrafficInformation",
              "GetWeather",
              "RequestRide",
              "SearchPlace",
              "ShareCurrentLocation",
              "ShareETA"
          ],
          "context_type": "text",
          "candidate_type": "text"
      }
    
  • Call the walker to predict intents:

    walker run predict -ctx "{\"test_data_file\":\"test_dataset.json\"}"
    

    Output Result

     [
      {
        "context": "We are a party of 4 people and we want to book a table at Seven Hills for sunset",
        "pred intent": "BookRestaurant",
        "Conf_Score": 19.72020419731474
      }
    ]
    

Entity Extraction Using FLAIR NER(ent_ext)

The FLAIR NER (ent_ext) module uses the Flair named entity recognition architecture. It can be used for either zero-shot or few-shot entity recognition.

For this tutorial we are going to leverage the Flair NER zero-shot classification and few-shot classification use cases.

Load the model with the set_config action for zero-shot

Different models can be loaded as required, based on size and evaluation time:

  1. Large model : Transformer-based model, good for zero-shot prediction; can be used for custom entity extraction

    1. Roberta
      1. Size - 1.43GB
      2. Eval_time - 1.20 sec
     walker set_config {
         report ent_ext.set_config(
             ner_model = "tars-ner",
             model_type = "tars"
         );
     }
    
  2. Medium Size model: LSTM-based model trained to predict PER, ORG, LOC, MISC entities only

    1. Size - 430MB
    2. Eval_time - 1.48 sec
     walker set_config {
         report ent_ext.set_config(
             ner_model = "ner",
             model_type = "lstm"
         );
     }
    
  3. Small Size model: LSTM-based model trained to predict PER, ORG, LOC, MISC entities only

    1. Size - 257MB
    2. Eval_time - 0.40 sec
     walker set_config {
         report ent_ext.set_config(
             ner_model = "ner-fast",
             model_type = "lstm"
         );
     }
    
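The size/latency numbers above suggest a simple selection rule. A hypothetical helper (not part of ent_ext; the figures are copied from the list above):

```python
# Model options from the list above: size in MB, eval time in seconds.
MODELS = {
    "tars-ner": {"model_type": "tars", "size_mb": 1430, "eval_s": 1.20},
    "ner":      {"model_type": "lstm", "size_mb": 430,  "eval_s": 1.48},
    "ner-fast": {"model_type": "lstm", "size_mb": 257,  "eval_s": 0.40},
}

def pick_model(max_size_mb):
    """Choose the largest model that fits within the size budget."""
    fitting = [(v["size_mb"], name) for name, v in MODELS.items()
               if v["size_mb"] <= max_size_mb]
    return max(fitting)[1] if fitting else None
```

Only the large (tars) model supports zero-shot prediction with custom entity labels; the two LSTM models are limited to their pretrained entity types.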

USE CASE I : Zero-Shot entity detection

  1. Import flair ner(ent_ext) module
  2. Classify Entity

USE CASE II : Few-shot classification

  1. Preparing dataset
  2. Import flair ner(ent_ext) module
  3. Few-shot classification

Experiment and methodology

Walk through

USE CASE I : Zero-Shot entity detection (classify entities without training on NER data):

1. Import Flair Ner Module in jac

  1. Open a terminal and run Jaseci:
    jsctl -m
    
  2. Load the ent_ext module in jac:
    actions load module jaseci_ai_kit.ent_ext
    

2. Classify Entity :

For this tutorial we are going to classify entities in text with the Flair NER (ent_ext) module, using the pretrained tars-ner model.

  • Creating the jac program for zero-shot (ent_ext)

    1. Create a file named zero_shot_ner.jac

    2. Create the nodes model_dir and flair_ner in the zero_shot_ner.jac file

      node model_dir;
      node flair_ner {};
      
    3. Initialize the flair_ner node and add the abilities set_config and entity_detection

      node flair_ner{
          # set model configuration and infer entity
          can ent_ext.set_config, ent_ext.entity_detection;
      }
      
    4. Initialize the set_config ability inside the flair_ner node

      set_config reads its parameters from the context and loads the model into the module. It takes two arguments: model_name (str) and model_type (str).

      can set_config with infer_zero_shot entry{
          report ent_ext.set_config(
              ner_model = visitor.model_name,
              model_type = visitor.model_type
          );
      }
      
    5. Initialize the infer_zero_shot ability for zero-shot token classification inside the flair_ner node

      The infer_zero_shot ability takes two arguments, a text and a list of labels, and infers entities.

      can infer_zero_shot with infer_zero_shot entry{
          text = visitor.text;
          labels = visitor.labels.list;
          result =  ent_ext.entity_detection(
              text=text,
              ner_labels= labels
              );
          fn = "result.json";
          result = {"text":text,"entities":result["entities"]};
          file.dump_json(fn, result);
      }
      
    6. Add an edge named ner_model in the zero_shot_ner.jac file to connect nodes inside the graph.

      # adding edge
      edge ner_model {
          has model_type;
      }
      
    7. Add a graph named ner_eval_graph for initializing the nodes.

      graph ner_eval_graph {
          has anchor ner_model_dir;
          spawn {
              ner_model_dir = spawn node::model_dir;
              flair_ner_node = spawn node::flair_ner;
              ner_model_dir -[ner_model(model_type="flair_ner")]-> flair_ner_node;
          }
      }
      
    8. Initialize the walker init to spawn the graph

      walker init {
          root {
          spawn here --> graph::ner_eval_graph;
          }
      }
      
    9. Create a walker named infer_zero_shot that reads its parameters from the context and calls the set_config and infer_zero_shot abilities.

      # creating walker for entity predictions
      walker infer_zero_shot{
          has model_name;
          has model_type;
          has text;
          has labels;
      
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      }
      

      It takes its arguments from the context, calls set_config to set the model configuration and infer_zero_shot to detect entities in the text, and stores the result in the result.json file.

    • Final jac program zero_shot_ner.jac

      node model_dir;
      node flair_ner {
          # load the model actions here
          can ent_ext.entity_detection, ent_ext.set_config;
      
          can set_config with infer_zero_shot entry{
              report ent_ext.set_config(
                  ner_model = visitor.model_name,
                  model_type = visitor.model_type
              );
          }
      
          can infer_zero_shot with infer_zero_shot entry{
              text = visitor.text;
              labels = visitor.labels.list;
              result =  ent_ext.entity_detection(
                  text=text,
                  ner_labels= labels
                  );
              fn = "result.json";
              result = {"text":text,"entities":result["entities"]};
              file.dump_json(fn, result);
          }
      }
      
      edge ner_model {
          has model_type;
      }
      
      graph ner_eval_graph {
          has anchor ner_model_dir;
          spawn {
              ner_model_dir = spawn node::model_dir;
              flair_ner_node = spawn node::flair_ner;
              ner_model_dir -[ner_model(model_type="flair_ner")]-> flair_ner_node;
          }
      }
      
      
      
      walker init {
          root {
          spawn here --> graph::ner_eval_graph;
          }
      }
      
      # creating walker for entity predictions
      walker infer_zero_shot{
          has model_name;
          has model_type;
          has text;
          has labels;
      
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      }
      
  • Steps for calling the jac program (use case 1) and detecting zero-shot entities in new text.

    1. Build zero_shot_ner.jac by running:

      jac build zero_shot_ner.jac
      
    2. Activate the sentinel by running:

      sentinel set -snt active:sentinel -mode ir zero_shot_ner.jir
      

      Note: If you get the error ValueError: badly formed hexadecimal UUID string, run the following command once:

      sentinel register -set_active true -mode ir zero_shot_ner.jir

    3. Module entity_detection: detects all available entities in the provided context

      • Input Data:

        • model_name: name of the model used for zero-shot entity detection, e.g. tars-ner
        • model_type: type of model used for entity detection, e.g. tars
        • text (string): context in which to detect entities, e.g. "They had a record of five wins and two losses in Opening Day games at Bennett Park 19 wins and 22 losses at Tiger Stadium and three wins and four losses at Comerica Park for a total home record in Opening Day games of 26 wins and 28 losses"
        • ner_labels (list of strings): list of entity labels, e.g. ["LOC","PER"]
      • Output

        • Result: a JSON file storing the input text and the predicted entities, written to result.json
    4. Run the following command to execute the walker infer_zero_shot, passing the input data in the context.

      walker run infer_zero_shot -ctx "{\"model_name\":\"tars-ner\",\"model_type\":\"tars\",\"text\":\"They had a record of five wins and two losses in Opening Day games at Bennett Park 19 wins and 22 losses at Tiger Stadium and three wins and four losses at Comerica Park for a total home record in Opening Day games of 26 wins and 28 losses\",\"labels\":[\"building\", \"organization\"]}"
      
    5. After executing step 4, the entity output will be stored in the result.json file:

      {
          "text": "They had a record of five wins and two losses in Opening Day games at Bennett Park 19 wins and 22 losses at Tiger Stadium and three wins and four losses at Comerica Park for a total home record in Opening Day games of 26 wins and 28 losses",
          "entities": [
              {
                  "entity_text": "Bennett Park",
                  "entity_value": "building",
                  "conf_score": 0.9999510645866394,
                  "start_pos": 70,
                  "end_pos": 82
              },
              {
                  "entity_text": "Tiger Stadium",
                  "entity_value": "building",
                  "conf_score": 0.9999762773513794,
                  "start_pos": 108,
                  "end_pos": 121
              },
              {
                  "entity_text": "Comerica Park",
                  "entity_value": "building",
                  "conf_score": 0.999976634979248,
                  "start_pos": 156,
                  "end_pos": 169
              }
          ]
      }
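
The saved predictions can then be post-processed with a few lines of Python (a minimal sketch; the result dict below mirrors the result.json structure shown above, with truncated text and rounded scores, and entities_above is a hypothetical helper):

```python
# Mirrors the structure of the result.json file produced by the walker above
result = {
    "text": "They had a record of five wins and two losses in Opening Day games ...",
    "entities": [
        {"entity_text": "Bennett Park", "entity_value": "building",
         "conf_score": 0.99995, "start_pos": 70, "end_pos": 82},
        {"entity_text": "Tiger Stadium", "entity_value": "building",
         "conf_score": 0.99998, "start_pos": 108, "end_pos": 121},
    ],
}

def entities_above(result, threshold=0.9):
    """Keep only (text, label) pairs whose confidence meets the threshold."""
    return [(e["entity_text"], e["entity_value"])
            for e in result["entities"] if e["conf_score"] >= threshold]

print(entities_above(result))
```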
      

Use Case II : Few-shot classification

In few-shot classification we are going to train, test, and validate the ent_ext module.

1. Creating Input Datasets

For training, testing, and validation we prepare datasets from the CoNLL2003 dataset: we create a list of dicts, store them in JSON files named train.json, val.json, and test.json, and keep all the required files in a directory named dataset.

  • train_file : dataset/train.json
    [
        {
            "context": "EU rejects German call to boycott British lamb",
            "entities": [
                {
                    "entity_value": "EU",
                    "entity_type": "ORG",
                    "start_index": 0,
                    "end_index": 2
                },
                {
                    "entity_value": "German",
                    "entity_type": "MISC",
                    "start_index": 11,
                    "end_index": 17
                },
                {
                    "entity_value": "British",
                    "entity_type": "MISC",
                    "start_index": 34,
                    "end_index": 41
                }
            ]
        }
    ]
    
  • val_file : dataset/val.json
    [
        {
            "context": "CRICKET LEICESTERSHIRE TAKE OVER AT TOP AFTER INNINGS VICTORY",
            "entities": [
                {
                    "entity_value": "LEICESTERSHIRE",
                    "entity_type": "ORG",
                    "start_index": 8,
                    "end_index": 22
                }
            ]
        }
    ]
    
  • test_file : dataset/test.json
    [
        {
            "context": "The former Soviet republic was playing in an Asian Cup finals tie for the first time",
            "entities": [
                {
                    "entity_value": "Soviet",
                    "entity_type": "MISC",
                    "start_index": 11,
                    "end_index": 17
                },
                {
                    "entity_value": "Asian",
                    "entity_type": "MISC",
                    "start_index": 45,
                    "end_index": 50
                }
            ]
        }
    ]
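
Before training, it is worth checking that every start_index/end_index pair really delimits its entity_value inside context. A minimal sketch (check_spans is a hypothetical helper, shown here on the train.json example above):

```python
def check_spans(records):
    """Return (context, entity) pairs whose character offsets do not match."""
    bad = []
    for rec in records:
        for ent in rec["entities"]:
            span = rec["context"][ent["start_index"]:ent["end_index"]]
            if span != ent["entity_value"]:
                bad.append((rec["context"], ent["entity_value"]))
    return bad

train = [{
    "context": "EU rejects German call to boycott British lamb",
    "entities": [
        {"entity_value": "EU", "entity_type": "ORG", "start_index": 0, "end_index": 2},
        {"entity_value": "German", "entity_type": "MISC", "start_index": 11, "end_index": 17},
        {"entity_value": "British", "entity_type": "MISC", "start_index": 34, "end_index": 41},
    ],
}]
print(check_spans(train))  # an empty list means all offsets line up
```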
    

2. Import the Flair NER (ent_ext) Module in jac

  1. Open a terminal and run Jaseci:
    jsctl -m
    
  2. Load the ent_ext module in jac:
    actions load module jaseci_ai_kit.ent_ext
    

3. Few-shot classification (Train, Test and Validate model)

For this tutorial we are going to train the model on the training dataset, validate it on the validation dataset, and finally test it on the test dataset.

  • Creating Jac Program

    1. Create a file named flair_ner.jac

    2. Create nodes model_dir and flair_ner in the flair_ner.jac file

      node model_dir;
      node flair_ner {};
      
    3. Initialize node flair_ner and add the abilities set_config, train, and entity_detection

      node flair_ner {
          # set ability model configuration and train model
          can ent_ext.set_config, ent_ext.train, ent_ext.entity_detection;
      }
      
    4. Initialize the set_config ability inside node flair_ner

      can set_config with train_and_val_flair entry{
          report ent_ext.set_config(
              ner_model = visitor.model_name,
              model_type = visitor.model_type
          );
      }
      

      set_config takes two arguments, model_name (str) and model_type (str), and loads the model for training and validation.

    5. Initialize the train and infer abilities inside node flair_ner

      can train with train_and_val_flair entry{
          # train the model with a given dataset
          train_data = file.load_json(visitor.train_file);
          val_data = file.load_json(visitor.val_file);
          test_data = file.load_json(visitor.test_file);
      
          # training model
          ent_ext.train(
              train_data = train_data,
              val_data = val_data,
              test_data = test_data,
              train_params = {
                  "num_epoch": visitor.num_train_epochs.int,
                  "batch_size": visitor.batch_size.int,
                  "LR": visitor.learning_rate.float
                  });
      }
      
      can infer with predict_flair entry{
          report ent_ext.entity_detection(
              text = visitor.text,
              ner_labels = visitor.ner_labels.list
          );
      }
      

      train takes its parameters from the walker context; they are described in the calling steps below.

    6. Add an edge named ner_model in the flair_ner.jac file to connect nodes inside the graph.

      # adding edge
      edge ner_model {
          has model_type;
      }
      
    7. Add a graph named ner_eval_graph to initialize and connect the nodes.

      graph ner_eval_graph {
          has anchor ner_model_dir;
          spawn {
              ner_model_dir = spawn node::model_dir;
              flair_ner_node = spawn node::flair_ner;
              ner_model_dir -[ner_model(model_type="flair_ner")]-> flair_ner_node;
          }
      }
      
    8. Initialize the init walker to spawn the graph.

      walker init {
          root {
              spawn here --> graph::ner_eval_graph;
          }
      }
      
    9. Create a walker named train_and_val_flair that takes parameters from the context and calls the set_config and train abilities to train the model on the new dataset, then validate and test it.

      ## creating walker
      walker train_and_val_flair {
          # Take in a training and eval dataset
          has train_file;
          has val_file;
          has test_file;
          has model_name="prajjwal1/bert-tiny";
          has model_type="trfmodel";
          has num_train_epochs=3;
          has batch_size=8;
          has learning_rate=0.02;
      
          # Train all NER models on the train set
          # and validate them on the val set
          # report accuracy performance on flair NER models
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      }
      

      Here we initialize some default arguments, all of which can be overridden from the context. The walker takes its arguments from the context and calls the abilities: set_config loads the model configuration, and train trains, validates, and tests the model on the new datasets.

    10. Create a walker for predicting entities with the trained Flair model.

      # infer
      walker predict_flair{
          has text;
          #declare default labels
          has ner_labels = ["PER","ORG", "LOC", "MISC"];
      
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      
      }
      
  • Final Jac Program

    • flair_ner.jac
      node model_dir;
      node flair_ner {
          # load the model actions here
          can ent_ext.set_config, ent_ext.train, ent_ext.entity_detection;
      
      
          can set_config with train_and_val_flair entry{
              ent_ext.set_config(
                  ner_model = visitor.model_name,
                  model_type = visitor.model_type
              );
          }
      
      
          can train with train_and_val_flair entry{
              # train the model with a given dataset
              train_data = file.load_json(visitor.train_file);
              val_data = file.load_json(visitor.val_file);
              test_data = file.load_json(visitor.test_file);
      
              # training model
              ent_ext.train(
                  train_data = train_data,
                  val_data = val_data,
                  test_data = test_data,
                  train_params = {
                      "num_epoch": visitor.num_train_epochs.int,
                      "batch_size": visitor.batch_size.int,
                      "LR": visitor.learning_rate.float
                      });
          }
      
          can infer with predict_flair entry{
              report ent_ext.entity_detection(
                  text = visitor.text,
                  ner_labels = visitor.ner_labels.list
              );
          }
      }
      
      
      edge ner_model {
          has model_type;
      }
      
      graph ner_eval_graph {
          has anchor ner_model_dir;
          spawn {
              ner_model_dir = spawn node::model_dir;
              flair_ner_node = spawn node::flair_ner;
              ner_model_dir -[ner_model(model_type="flair_ner")]-> flair_ner_node;
          }
      }
      
      
      walker init {
          root {
          spawn here --> graph::ner_eval_graph;
          }
      }
      
      ## creating walker for:
      # Train all NER models on the train set
      # and validate them on the val set
      # and test on test set
      # report accuracy performance on flair NER models
      
      walker train_and_val_flair {
          # Take in a training and eval dataset
          has train_file;
          has val_file;
          has test_file;
          has model_name="prajjwal1/bert-tiny";
          has model_type="trfmodel";
          has num_train_epochs=3;
          has batch_size=8;
          has learning_rate=0.02;
      
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      }
      
      # infer
      walker predict_flair{
          has text;
          #declare default labels
          has ner_labels = ["PER","ORG", "LOC", "MISC"];
      
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      
      }
      
  • Steps for calling the jac program (flair_ner.jac)

    1. Build flair_ner.jac by running:

      jac build flair_ner.jac
      
    2. Activate the sentinel by running:

      sentinel set -snt active:sentinel -mode ir flair_ner.jir
      

      Note: If you get the error ValueError: badly formed hexadecimal UUID string, run the following command once:

      sentinel register -set_active true -mode ir flair_ner.jir

    3. Create the train and validation context: train the model on the training dataset file, then validate and test it on the validation and test dataset files.

      • Input data for train and validation:

        • train_file (List(Dict)): training dataset file
        • val_file (List(Dict)): validation dataset file
        • test_file (List(Dict)): test dataset file
        • model_name (str): prajjwal1/bert-tiny or tars-ner
        • model_type (str): trfmodel or tars
        • num_train_epochs (int): 3 (default)
        • batch_size (int): 8
        • learning_rate (float): 0.02
    4. Run the following command to execute the walker for model training and validation, passing the input data in the context.

      walker run train_and_val_flair -ctx "{\"train_file\":\"dataset/train.json\",\"val_file\":\"dataset/val.json\",\"test_file\":\"dataset/test.json\",\"model_name\":\"prajjwal1/bert-tiny\",\"model_type\":\"trfmodel\",\"num_train_epochs\":\"10\",\"batch_size\":\"8\",\"learning_rate\":\"0.02\"}"
      
      
    5. You'll find logs like the following in the train folder, under the model name. Console logs:

      2022-06-14 10:58:47,583 ----------------------------------------------------------------------------------------------------
      2022-06-14 10:58:47,583 Corpus: "Corpus: 14041 train + 3250 dev + 3453 test sentences"
      2022-06-14 10:58:47,583 ----------------------------------------------------------------------------------------------------
      2022-06-14 10:58:47,583 Parameters:
      2022-06-14 10:58:47,584  - learning_rate: "0.02"
      2022-06-14 10:58:47,584  - mini_batch_size: "128"
      2022-06-14 10:58:47,584  - patience: "3"
      2022-06-14 10:58:47,584  - anneal_factor: "0.5"
      2022-06-14 10:58:47,584  - max_epochs: "10"
      2022-06-14 10:58:47,584  - shuffle: "True"
      2022-06-14 10:58:47,584  - train_with_dev: "False"
      2022-06-14 10:58:47,584  - batch_growth_annealing: "False"
      2022-06-14 10:58:47,584 ----------------------------------------------------------------------------------------------------
      2022-06-14 10:58:47,584 Model training base path: "train/prajjwal1/bert-tiny"
      2022-06-14 10:58:47,584 ----------------------------------------------------------------------------------------------------
      2022-06-14 10:58:47,584 Device: cuda:0
      2022-06-14 10:58:47,584 ----------------------------------------------------------------------------------------------------
      2022-06-14 10:58:47,584 Embeddings storage mode: cpu
      2022-06-14 10:58:47,585 ----------------------------------------------------------------------------------------------------
      2022-06-14 10:59:11,725 epoch 1 - iter 11/110 - loss 0.46690662 - samples/sec: 58.35 - lr: 0.020000
      2022-06-14 10:59:36,854 epoch 1 - iter 22/110 - loss 0.35627199 - samples/sec: 56.04 - lr: 0.020000
      2022-06-14 10:59:56,318 epoch 1 - iter 33/110 - loss 0.32018351 - samples/sec: 72.35 - lr: 0.020000
      2022-06-14 11:00:16,082 epoch 1 - iter 44/110 - loss 0.30274213 - samples/sec: 71.25 - lr: 0.020000
      2022-06-14 11:00:35,760 epoch 1 - iter 55/110 - loss 0.28451030 - samples/sec: 71.56 - lr: 0.020000
      2022-06-14 11:00:58,241 epoch 1 - iter 66/110 - loss 0.26581275 - samples/sec: 62.64 - lr: 0.020000
      2022-06-14 11:01:24,133 epoch 1 - iter 77/110 - loss 0.25145255 - samples/sec: 54.39 - lr: 0.020000
      2022-06-14 11:01:48,914 epoch 1 - iter 88/110 - loss 0.24174765 - samples/sec: 56.82 - lr: 0.020000
      2022-06-14 11:02:15,320 epoch 1 - iter 99/110 - loss 0.23233378 - samples/sec: 53.33 - lr: 0.020000
      2022-06-14 11:02:40,455 epoch 1 - iter 110/110 - loss 0.22324374 - samples/sec: 56.03 - lr: 0.020000
      2022-06-14 11:02:40,455 ----------------------------------------------------------------------------------------------------
      2022-06-14 11:02:40,456 EPOCH 1 done: loss 0.2232 - lr 0.0200000
      2022-06-14 11:04:15,844 DEV : loss 0.08416544854674485 - f1-score (micro avg)  0.1417
      2022-06-14 11:04:15,876 BAD EPOCHS (no improvement): 0
      2022-06-14 11:04:15,922 saving best model
      ..............
      ..............
      ..............
      
      2022-06-14 11:48:43,609 epoch 10 - iter 11/110 - loss 0.05611527 - samples/sec: 61.04 - lr: 0.020000
      2022-06-14 11:49:06,084 epoch 10 - iter 22/110 - loss 0.05563375 - samples/sec: 62.66 - lr: 0.020000
      2022-06-14 11:49:29,709 epoch 10 - iter 33/110 - loss 0.05567900 - samples/sec: 59.61 - lr: 0.020000
      2022-06-14 11:49:52,993 epoch 10 - iter 44/110 - loss 0.05584901 - samples/sec: 60.48 - lr: 0.020000
      2022-06-14 11:50:16,497 epoch 10 - iter 55/110 - loss 0.05558204 - samples/sec: 59.91 - lr: 0.020000
      2022-06-14 11:50:39,222 epoch 10 - iter 66/110 - loss 0.05536536 - samples/sec: 61.97 - lr: 0.020000
      2022-06-14 11:51:03,463 epoch 10 - iter 77/110 - loss 0.05519602 - samples/sec: 58.09 - lr: 0.020000
      2022-06-14 11:51:27,246 epoch 10 - iter 88/110 - loss 0.05550491 - samples/sec: 59.21 - lr: 0.020000
      2022-06-14 11:51:50,920 epoch 10 - iter 99/110 - loss 0.05559963 - samples/sec: 59.48 - lr: 0.020000
      2022-06-14 11:52:13,645 epoch 10 - iter 110/110 - loss 0.05556217 - samples/sec: 61.97 - lr: 0.020000
      2022-06-14 11:52:13,646 ----------------------------------------------------------------------------------------------------
      2022-06-14 11:52:13,646 EPOCH 10 done: loss 0.0556 - lr 0.0200000
      2022-06-14 11:53:51,555 DEV : loss 0.03640907264452083 - f1-score (micro avg)  0.7614
      2022-06-14 11:53:51,587 BAD EPOCHS (no improvement): 0
      2022-06-14 11:53:51,634 saving best model
      2022-06-14 11:53:51,725 ----------------------------------------------------------------------------------------------------
      2022-06-14 11:53:51,726 loading file train/prajjwal1/bert-tiny/best-model.pt
      2022-06-14 11:53:54,423 No model_max_length in Tokenizer's config.json - setting it to 512. Specify desired model_max_length by passing it as attribute to embedding instance.
      2022-06-14 11:55:30,534 0.7138	0.7305	0.7221	0.6166
      2022-06-14 11:55:30,534
      Results:
      - F-score (micro) 0.7221
      - F-score (macro) 0.5625
      - Accuracy 0.6166
      
      By class:
                  precision    recall  f1-score   support
      
              PER     0.7186    0.8751    0.7892      1617
              LOC     0.7645    0.8118    0.7874      1668
              ORG     0.6902    0.5527    0.6138      1661
              MISC    0.6192    0.6254    0.6223       702
           <STOP>     0.0000    0.0000    0.0000         0
      
        micro avg     0.7138    0.7305    0.7221      5648
        macro avg     0.5585    0.5730    0.5625      5648
      weighted avg    0.7115    0.7305    0.7164      5648
      samples avg     0.6166    0.6166    0.6166      5648
      
      
    6. Run the following command to execute the walker predict_flair for predicting entities.

      walker run predict_flair -ctx "{\"text\":\"Two goals from defensive errors in the last six minutes allowed Japan to come from behind and collect all three points from their opening meeting against Syria\"}"
      

      After executing the predict_flair walker you will get output like:

      [
          {
              "entities": [
                  {
                      "entity_text": "Japan",
                      "entity_value": "LOC",
                      "conf_score": 0.9944729208946228,
                      "start_pos": 64,
                      "end_pos": 69
                  },
                  {
                      "entity_text": "Syria",
                      "entity_value": "LOC",
                      "conf_score": 0.9952408075332642,
                      "start_pos": 154,
                      "end_pos": 159
                  }
              ]
          }
      ]
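
The micro and macro F-scores reported in the training logs above aggregate differently: micro pools true/false positives across all classes before computing one F1, while macro averages the per-class F1 values. A minimal sketch of the two aggregations (the per-class counts are illustrative, not the real evaluation numbers):

```python
# Illustrative per-class (true positive, false positive, false negative) counts
counts = {"PER": (120, 30, 20), "LOC": (80, 10, 40), "ORG": (10, 5, 90)}

def f1(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Micro: pool the counts over all classes, then compute one F1
tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
micro_f1 = f1(tp, fp, fn)

# Macro: compute F1 per class, then take the unweighted mean
macro_f1 = sum(f1(*c) for c in counts.values()) / len(counts)

print(round(micro_f1, 4), round(macro_f1, 4))
```

A rare class with poor F1 (ORG here) drags the macro average down far more than the micro average, which is why the two scores in the logs differ.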
      

Experiment and methodology

  • Zero-Shot entity detection

    Let us further evaluate our zero-shot entity detection on the Few-NERD dataset. Few-NERD is a large-scale, fine-grained, manually annotated named entity recognition dataset containing 8 coarse-grained (Major) types and 66 fine-grained (All) entity types. Three benchmark tasks are built on it: one supervised (Few-NERD (SUP)) and two few-shot (Few-NERD (INTRA) and Few-NERD (INTER)).

    Dataset details

    | Dataset Name   | train dataset | validation dataset | test dataset |
    | -------------- | ------------- | ------------------ | ------------ |
    | FEW-NERD (SUP) | 131767        | 18824              | 37648        |

    For zero-shot entity detection we are using the flair tars-ner model.

    Results

    Zero-shot performance on FEW-NERD(INTER) test Dataset.

    | Labels | Accuracy    | F1_Score    |
    | ------ | ----------- | ----------- |
    | All    | 0.32053081  | 0.305329825 |
    | Major  | 0.475163372 | 0.447818008 |

    Zero-shot performance on FEW-NERD(INTRA) test Dataset.

    | Labels | Accuracy    | F1_Score    |
    | ------ | ----------- | ----------- |
    | All    | 0.143001171 | 0.171142318 |
    | Major  | 0.540210805 | 0.462717092 |

    Zero-shot performance on FEW-NERD(SUP) test Dataset.

    | Labels | Accuracy    | F1_Score    |
    | ------ | ----------- | ----------- |
    | All    | 0.105859864 | 0.128166615 |
    | Major  | 0.475106014 | 0.431429256 |

    These results yield the following insight: the fewer the labels, the higher the accuracy and F1 score.
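
One intuition for this trend: even a uniform random guesser becomes less accurate as the label set grows, so any classifier has a lower baseline to build on. A toy illustration (not a property of the tars-ner model itself):

```python
import random

random.seed(0)

def random_guess_accuracy(num_labels, trials=10000):
    """Estimate the accuracy of uniform random guessing over num_labels classes."""
    hits = sum(random.randrange(num_labels) == 0 for _ in range(trials))
    return hits / trials

few = random_guess_accuracy(8)    # Few-NERD "Major": 8 coarse-grained types
many = random_guess_accuracy(66)  # Few-NERD "All": 66 fine-grained types
print(few, many)
```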

Fasttext Classifier

The fast_enc module uses Facebook's fastText for efficient learning of word representations and sentence classification.

This tutorial shows you how to train a fasttext Classifier with a custom training loop to categorize sentences.

  1. Preparing dataset
  2. Import fasttext(fast_enc) module in jac
  3. Train the model
  4. Evaluate the model's effectiveness
  5. Use the trained model to make predictions

Walk through

1. Preparing dataset

For this tutorial, we are going to leverage the fasttext classifier for sentence classification, which is categorizing incoming text into one of a set of predefined intents. For demonstration purposes, we are going to use the SNIPS dataset as an example.

SNIPS is a popular intent classification dataset covering intents such as [ "BookRestaurant", "ComparePlaces", "GetDirections", "GetPlaceDetails", "GetTrafficInformation", "GetWeather", "RequestRide", "SearchPlace", "ShareCurrentLocation", "ShareETA" ]. We need to do a little data format conversion to create a version of SNIPS that works with our fasttext classifier implementation. For this part, we are going to use Python. First,

  1. Import the dataset from huggingface dataset library.

    # import library
    from datasets import load_dataset
    # load dataset
    dataset = load_dataset("snips_built_in_intents")
    print(dataset["train"][:2])
    

    If imported successfully, you should see the data format to be something like this:

    {"text": ["Share my location with Hillary's sister", "Send my current location to my father"], "label": [5, 5]}

  2. Convert the out-of-the-box SNIPS format into the format that can be ingested by the fasttext classifier.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    import json
    
    # get labels names
    lab = dataset["train"].features["label"].names
    # create labels dictionary
    label_dict = {v: k for v, k in enumerate(lab)}
    # dataset
    dataset = dataset["train"]
    
    # create dataset function
    def CreateData(data):
        # Create dataframe
        df = pd.DataFrame(data)
        # Map labels dict on label column
        df["label"] = df["label"].apply(lambda x : label_dict[x])
        # grouping text on basis of label
        df = df.groupby("label").agg({"text": "\t".join, "label": "\t".join})
        df["label"] = df["label"].apply(lambda x: x.split("\t")[0])
    
        # Create data dictionary
        data_dict = {}
        for i in range(len(df)):
            data_dict[df["label"][i]] = df["text"][i].split("\t")
        return data_dict
    # Split dataset: Create train and test dataset and store in json file `train_bi.json` and `test_bi.json` and save to disk.
    # Split dataset in train and test set
    train, test = train_test_split(dataset, test_size=0.2, random_state=42)
    
    # Create train dataset
    train_data = CreateData(train)
    # write data in json file 'train.json'
    with open("train.json", "w", encoding="utf8") as f:
        f.write(json.dumps(train_data, indent = 4))
    
    
    # Create test dataset
    test_data = CreateData(test)
    data = {
        "contexts": [],
        "labels": []
        }
    for itm in test_data:
        data["labels"].append(itm)
        data["contexts"].extend(test_data[itm])
    # write data in json file 'test.json'
    with open("test.json", "w", encoding="utf8") as f:
            f.write(json.dumps(data, indent = 4))
    

    The example result format should look something like this.

    • train.json

      "BookRestaurant": [
          "Book me a table for 2 people at the sushi place next to the show tomorrow night","Find me a table for four for dinner tonight"
          ],
      "ComparePlaces": [
          "What's the cheapest between the two restaurants the closest to my hotel?"
          ]
      
    • test.json

      {
          "contexts": [
              "We are a party of 4 people and we want to book a table at Seven Hills for sunset",
              "Book a table at Saddle Peak Lodge for my diner with friends tonight",
              "How do I go to Montauk avoiding tolls?",
              "What's happening this week at Smalls Jazz Club?",
              "Will it rain tomorrow near my all day event?",
              "Send my current location to Anna",
              "Share my ETA with Jo",
              "Share my ETA with the Snips team"
          ],
          "labels": [
              "BookRestaurant",
              "ComparePlaces",
              "GetDirections",
              "GetPlaceDetails",
              "GetTrafficInformation",
              "GetWeather",
              "RequestRide",
              "SearchPlace",
              "ShareCurrentLocation",
              "ShareETA"
          ]
      }
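
As a sanity check, the grouping that CreateData performs can be reproduced in plain Python without pandas (a minimal sketch; the rows and label_dict below are toy stand-ins for the SNIPS data, and create_data is a hypothetical helper):

```python
# Toy stand-ins for the SNIPS rows and label names used above
rows = {
    "text": [
        "Share my location with Hillary's sister",
        "Send my current location to my father",
        "Book me a table for 2 people at the sushi place",
    ],
    "label": [5, 5, 0],
}
label_dict = {0: "BookRestaurant", 5: "ShareCurrentLocation"}

def create_data(rows):
    """Group texts by label name, mirroring what CreateData does with pandas."""
    grouped = {}
    for text, label in zip(rows["text"], rows["label"]):
        grouped.setdefault(label_dict[label], []).append(text)
    return grouped

train_data = create_data(rows)
print(train_data)
```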
      

2. Import Fasttext(fast_enc) module in jac

  1. Open a terminal and run Jaseci:

    jsctl -m

  2. Load the fast_enc module in jac:

    actions load module jaseci_ai_kit.fast_enc

3. Train the model

For this tutorial, we are going to train and test fast_enc for intent classification: it is trained on the SNIPS train dataset and tested on the test dataset, categorizing incoming text into one of the predefined intents.

  • Creating Jac Program (train and test fast_enc)
    1. Create a file named fasttext.jac

    2. Create nodes model_dir and fasttext in the fasttext.jac file

      node model_dir;
      node fasttext {};
      
    3. Initialize node fasttext and import the train and infer abilities inside the node.

      # import train and infer ability
      can fast_enc.train, fast_enc.predict;
      
    4. Initialize the train and test abilities inside the fasttext node. fast_enc.train takes the training arguments and starts training the fast_enc module.

      can train_fasttext with train_and_test_fasttext entry{
          #Code snippet for training the model
          train_data = file.load_json(visitor.train_file);
          std.out("fasttext training started...",train_data.type, visitor.train_with_existing.bool);
          report fast_enc.train(
              traindata = train_data,
              train_with_existing = visitor.train_with_existing
          );
      }
      
      can tests with train_and_test_fasttext exit{
          std.out("fasttext validation started...");
          # Use the model to perform inference
          # returns the list of context with the suitable intents
          test_data = file.load_json(visitor.test_file);
      
          resp_data = fast_enc.predict(
              sentences=test_data["contexts"]
          );
      
          fn = "fasttext_val_result.json";
          file.dump_json(fn, resp_data);
      }
      
    5. Initialize the ability to predict intents on new text

      can predict with predict_fasttext entry{
          # Use the model to perform inference
          resp_data = fast_enc.predict(
              sentences=file.load_json(visitor.test_file)["text"]
          );
          # the infer action returns all the labels with the probability scores
          report [resp_data];
      }
      

      Parameter details

      • train: will be used to train the fasttext module on custom dataset

        • Input:
          • traindata (Dict): dictionary of candidates and supporting contexts for each candidate
          • train_with_existing (bool): if set to false train the model from scratch otherwise trains incrementally
      • infer: will be used to predict the most suitable candidate for a provided context; takes text or embedding

        • Input:
          • contexts (list of strings): context which needs to be classified
        • Return: a dictionary of probability score for each candidate and context
    6. Add an edge named enc_model in the fasttext.jac file to connect nodes inside the graph.

      # adding edge
      edge enc_model {
          has model_type;
      }
      
    7. Add a graph named encoder_graph to initialize and connect the nodes.

      graph encoder_graph {
          has anchor enc_model_dir;
          spawn {
              enc_model_dir = spawn node::model_dir;
              fasttext_node = spawn node::fasttext;
              enc_model_dir -[enc_model(model_type="fasttext")]-> fasttext_node;
          }
      }
      
    8. Initialize the init walker to spawn the graph.

      walker init {
          root {
          spawn here --> graph::encoder_graph;
          }
      }
      
    9. Create a walker named train_and_test_fasttext that takes parameters from the context (or defaults) and calls the train and test abilities.

      # Declaring the walker:
      walker train_and_test_fasttext{
          # the parameters required for training
          has train_with_existing=false;
          has train_file="train.json";
          has test_file="test.json";
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      }
      

      Default parameters for train and test fasttext
      train_file : local path of train.json file
      train_with_existing : false
      test_file : local path of test.json file

    10. Declare a walker for predicting intents on new text

      # Declaring walker for predicting intents on new text
      walker predict_fasttext{
          has test_file = "test_dataset.json";
      
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      }
      

      Final fasttext.jac program

      node model_dir;
      node fasttext{
          # import train and infer ability
          can fast_enc.train, fast_enc.predict;
      
          can train_fasttext with train_and_test_fasttext entry{
              #Code snippet for training the model
              train_data = file.load_json(visitor.train_file);
              std.out("fasttext training started...",train_data.type, visitor.train_with_existing.bool);
              report fast_enc.train(
                  traindata = train_data,
                  train_with_existing = visitor.train_with_existing
              );
          }
      
      
          can tests with train_and_test_fasttext exit{
              std.out("fasttext validation started...");
              # Use the model to perform inference
              # returns the list of context with the suitable candidates
              test_data = file.load_json(visitor.test_file);
      
              resp_data = fast_enc.predict(
                  sentences=test_data["contexts"]
              );
              # the infer action returns all the candidate with the confidence scores
              # Iterate through the candidate labels and their predicted scores
      
              fn = "fasttext_val_result.json";
              file.dump_json(fn, resp_data);
          }
      
          can predict with predict_fasttext entry{
          # Use the model to perform inference
          resp_data = fast_enc.predict(
              sentences=file.load_json(visitor.test_file)["text"]
              );
          # the infer action returns all the labels with the probability scores
          report [resp_data];
          }
      }
      
      
      # adding edge
      edge enc_model {
          has model_type;
      }
      
      graph encoder_graph {
          has anchor enc_model_dir;
          spawn {
              enc_model_dir = spawn node::model_dir;
              fasttext_node = spawn node::fasttext;
              enc_model_dir -[enc_model(model_type="fasttext")]-> fasttext_node;
          }
      }
      
      walker init {
          root {
          spawn here --> graph::encoder_graph;
          }
      }
      
      
      # Declaring the walker:
      walker train_and_test_fasttext{
          # the parameters required for training
          has train_with_existing=false;
          has train_file="train.json";
          has test_file="test.json";
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      }
      
      # declaring walker for predicting intents on new text
      walker predict_fasttext{
          has test_file = "test_dataset.json";
      
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      }
      
  • Steps for running the fasttext.jac program
    1. Run the following command to build fasttext.jac

      jac build fasttext.jac

    2. Run the following command to activate the sentinel

      sentinel set -snt active:sentinel -mode ir fasttext.jir

      Note: If you get the error ValueError: badly formed hexadecimal UUID string, execute the following once:

      sentinel register -set_active true -mode ir fasttext.jir

    3. Run the following command to execute the walker train_and_test_fasttext with default parameters for training the fast_enc module.

      walker run train_and_test_fasttext

    4. You'll find the following logs on the console

      training logs

      jaseci > walker run train_and_test_fasttext -ctx "{\"train_file\":\"train.json\",\"test_file\":\"test.json\",\"train_with_existing\":\"false\"}"
      Training...
      Wrote 261 sentences to C:\Users\satyam.singh\anaconda3\envs\pytorch\lib\site-packages\jaseci_ai_kit\modules\fasttext\pretrained_model\train.txt
      Read 0M words
      Number of words:  577
      Number of labels: 10
      Progress: 100.0% words/sec/thread:  105638 lr:  0.000000 avg.loss:  1.230422 ETA:   0h 0m 0s
      Saving...
      
      Model saved to C:\Users\satyam.singh\anaconda3\envs\pytorch\lib\site-packages\jaseci_ai_kit\modules\fasttext\pretrained_model\model.ftz.
      
      LABELS (10):
      - BookRestaurant
      - GetPlaceDetails
      - GetWeather
      - GetDirections
      - SearchPlace
      - RequestRide
      - ShareETA
      - GetTrafficInformation
      - ComparePlaces
      - ShareCurrentLocation
      fasttext validation started...
      {
      "success": true,
      "report": [
          "Model training Completed"
      ]
      }
      

4. Evaluation of the model effectiveness

  • Evaluating model effectiveness on the test.json dataset

    Model testing Accuracy :  0.82
    Model testing F1_Score :  0.76
    
    Model classification Report
    
                            precision    recall  f1-score   support
    
            BookRestaurant       1.00      1.00      1.00        14
             ComparePlaces       1.00      0.25      0.40         4
             GetDirections       0.70      1.00      0.82         7
           GetPlaceDetails       0.56      0.90      0.69        10
     GetTrafficInformation       0.80      1.00      0.89         4
                GetWeather       0.88      0.78      0.82         9
               RequestRide       1.00      1.00      1.00         5
               SearchPlace       0.00      0.00      0.00         6
      ShareCurrentLocation       1.00      1.00      1.00         3
                  ShareETA       1.00      1.00      1.00         4
    
                  accuracy                           0.82        66
                 macro avg       0.79      0.79      0.76        66
              weighted avg       0.78      0.82      0.78        66
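
    The macro and weighted averages at the bottom of the report can be reproduced from the per-class rows. A quick sketch, with the precision and support values transcribed from the table above:

    ```python
    # Per-class precision and support, transcribed from the report above
    precision = [1.00, 1.00, 0.70, 0.56, 0.80, 0.88, 1.00, 0.00, 1.00, 1.00]
    support   = [14, 4, 7, 10, 4, 9, 5, 6, 3, 4]

    total = sum(support)  # 66 test sentences
    # Weighted average: each class's score weighted by its support
    weighted_precision = sum(p * s for p, s in zip(precision, support)) / total
    # Macro average: plain mean over classes, ignoring support
    macro_precision = sum(precision) / len(precision)

    print(round(weighted_precision, 2))  # 0.78
    print(round(macro_precision, 2))     # 0.79
    ```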
    

    Sample Result Data

    {
        "I want a table for friday 8pm for 2 people at Katz's Delicatessen": [
            {
                "sentence": "I want a table for friday 8pm for 2 people at Katz's Delicatessen",
                "intent": "BookRestaurant",
                "probability": 0.9759947061538696
            }
        ],
        "I want a table in a good japanese restaurant near Trump tower": [
            {
                "sentence": "I want a table in a good japanese restaurant near Trump tower",
                "intent": "BookRestaurant",
                "probability": 0.830787181854248
            }
        ],
        "Book a table at a restaurant near Times Square for 2 people tomorrow night": [
            {
                "sentence": "Book a table at a restaurant near Times Square for 2 people tomorrow night",
                "intent": "BookRestaurant",
                "probability": 0.9866142272949219
            }
        ],
        "Book a table for today's lunch at Eggy's Diner for 3 people": [
            {
                "sentence": "Book a table for today's lunch at Eggy's Diner for 3 people",
                "intent": "BookRestaurant",
                "probability": 0.9936538934707642
            }
        ]
    }
    

5. Use the trained model to make predictions

  • Create new input data for prediction, stored in a file, for example test_dataset.json
    Input data

    {
        "text": [
            "We are a party of 4 people and we want to book a table at Seven Hills for sunset",
            "Is Waldorf Astoria more luxurious than the Four Seasons?"
        ]
    }
    
  • Run the following command to execute the walker predict_fasttext

    walker run predict_fasttext
    

    Output Result

    {
        "We are a party of 4 people and we want to book a table at Seven Hills for sunset": [
            {
                "sentence": "We are a party of 4 people and we want to book a table at Seven Hills for sunset",
                "intent": "BookRestaurant",
                "probability": 0.9151427149772644
            }
        ],
        "Is Waldorf Astoria more luxurious than the Four Seasons?": [
            {
                "sentence": "Is Waldorf Astoria more luxurious than the Four Seasons?",
                "intent": "GetPlaceDetails",
                "probability": 0.34331175684928896
            }
        ]
    }
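
    The prediction output maps each input sentence to a list of intent candidates. Picking the top-scoring intent per sentence can be sketched as follows (the result dict below is abbreviated from the output above):

    ```python
    # Abbreviated prediction output: sentence -> list of {intent, probability} candidates
    result = {
        "We are a party of 4 people and we want to book a table at Seven Hills for sunset": [
            {"intent": "BookRestaurant", "probability": 0.9151427149772644}
        ],
        "Is Waldorf Astoria more luxurious than the Four Seasons?": [
            {"intent": "GetPlaceDetails", "probability": 0.34331175684928896}
        ],
    }

    # Keep the highest-probability candidate for each sentence
    top = {s: max(cands, key=lambda c: c["probability"])["intent"]
           for s, cands in result.items()}
    print(top)
    ```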
    

YoloV5

Yolo V5 (yolov5)

YOLOv5 (yolov5) is a family of object detection architectures and models pretrained on the COCO dataset. It represents Ultralytics' open-source research into future vision AI methods, incorporating lessons learned and best practices evolved over thousands of hours of research and development.

  • load_model: Allows you to load the yolov5 pytorch model.
    • Input:
      • name (string) (required): the name of the trained model.
      • confidence_threshold (float) (optional): a number between 0 and 1 representing the minimum confidence score a detection must have to be returned.
    • Return: Message stating whether the model was successfully loaded or not.
    • Note: Before loading, the model weights need to be in a specific location at object_detection/yolov5/runs/train/exp/weights, with a specific file suffix/type (*.pt)
  • detect: Based on the image(s) provided by the user, it predicts where each object is located in each attached image, along with its label.
    • Input:
      • file_list (files) (required): the image files in which you want the model to detect object locations.
      • image_size (integer) (optional): The inference size of the attached image(s).
      • download_image (boolean) (optional): Whether or not you want the image(s) returned to the current API.
    • Return: The class (labels) it picked up from the image, the bounding box coordinates (bbox) where each object was detected, the confidence (score), and the image in base64 format.
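
The confidence_threshold can be thought of as a filter over detect's results. A minimal sketch, assuming a hypothetical list of detection dicts in the shape described above (label, bbox, score):

```python
# Hypothetical shape of one detect() response: label, bbox, and confidence score
detections = [
    {"label": "person", "bbox": [12, 30, 140, 220], "score": 0.91},
    {"label": "dog",    "bbox": [200, 80, 330, 210], "score": 0.42},
]

def filter_by_confidence(dets, confidence_threshold=0.5):
    """Keep only detections whose score meets the threshold."""
    return [d for d in dets if d["score"] >= confidence_threshold]

print(filter_by_confidence(detections))  # only the "person" detection survives 0.5
```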

How to create an image from the base64 string

In api.py there is a function called base64EncodeImage; it converts the image into an image array and then into a base64 string.

def base64EncodeImage(img):
    """Takes an input image and returns a base64 encoded string representation of that image (jpg format)"""
    _, im_arr = cv2.imencode(".jpg", img)
    im_b64 = base64.b64encode(im_arr.tobytes()).decode("utf-8")

    return im_b64

How it supports multiple images

The /detect endpoint accepts an array of files, and the data has to be sent as multipart/form-data or it won't work; the files are then encoded, processed, and returned to the user.
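
To see what multipart/form-data means on the wire, here is a minimal stdlib sketch of building such a body for several files. The field name file_list is taken from the detect parameters above, while build_multipart itself is a hypothetical helper; in practice an HTTP client library does this for you:

```python
import uuid

def build_multipart(files, field="file_list"):
    """files: list of (filename, bytes) pairs.
    Returns (content_type_header, body) for a multipart/form-data POST."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, data in files:
        # One part per file: headers, blank line, raw bytes, CRLF
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{field}"; filename="{name}"\r\n'
             f"Content-Type: application/octet-stream\r\n\r\n").encode("utf-8")
            + data + b"\r\n"
        )
    # Closing boundary terminates the body
    body = b"".join(parts) + f"--{boundary}--\r\n".encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", body
```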

How to run the jaseci streaming feature

For the /ws endpoint I have set up a websocket connection between client.py and the server. To run object detection on a streamed image, you must start the server and upload the AI model. Once the model is uploaded, run the client.py file, which already has an image and will continuously send it to the server for object detection.

Summarization(t5_sum)

Module t5_sum uses the Hugging Face T5 transformer model to provide an abstractive summary of text.

  1. Import t5_sum module in jac
  2. Summarization

Walk through

1. Import Summarizer (t5_sum) module in jac

  1. To run Jaseci, open a terminal and run the following command.
    jsctl -m
    
  2. Load the t5_sum module in jac with the command
    actions load module jaseci_ai_kit.t5_sum
    

2. Summarization

For this tutorial, we are going to leverage the summarizer (t5_sum), which generates a summary from text.

  • Creating a Jac program for the summarizer (t5_sum)

    1. Create a file named summarizer.jac

    2. Create nodes model_dir and summarizer in the summarizer.jac file.

      node model_dir;
      node summarizer{};
      
    3. Import the t5_sum.classify_text ability inside the summarizer node.

      # import ability
      can t5_sum.classify_text;
      
    4. Initialize the summarize ability inside the summarizer node.

      # summarizer
      can summarize with summarizer entry{
          data = file.load_json(visitor.dataset);
          report t5_sum.classify_text(
              text = data["text"],
              min_length = data["min_length"],
              max_length = data["max_length"]
              );
      }
      

      classify_text: uses the T5 model to summarize a body of text

      Parameter details

      • Input Data dataset.json file

        {
            "text": "The US has passed the peak on new coronavirus cases, President Donald Trump said and predicted that some states would reopen this month. The US has over 637,000 confirmed Covid-19 cases and over 30,826 deaths, the highest for any country in the world. At the daily White House coronavirus briefing on Wednesday, Trump said new guidelines to reopen the country would be announced on Thursday after he speaks to governors. We'll be the comeback kids, all of us, he said. We want to get our country back. The Trump administration has  previously fixed May 1 as a possible date to reopen the world's largest economy, but the president said some states may be able to return to normalcy earlier than that.",
            "min_length": 30,
            "max_length": 100
        }
        
        • text (string): text to summarize

        • min_length (integer): the minimum number of words you want returned from the model

        • max_length (integer): the maximum number of words you want returned from the model

        • Output: List of sentences that best summarizes the context

    5. Add an edge named summ_model in the summarizer.jac file for connecting nodes inside the graph.

      # adding edge
      edge summ_model {
          has model_type;
      }
      
    6. Add a graph named summ_graph for initializing the nodes.

      # adding graph
      graph summ_graph {
          has anchor summ_model_dir;
          spawn {
              summ_model_dir = spawn node::model_dir;
              summarizer_node = spawn node::summarizer;
              summ_model_dir -[summ_model(model_type="summarizer")]-> summarizer_node;
          }
      }
      
    7. Initialize the walker init for calling the graph.

      walker init {
          root {
          spawn here --> graph::summ_graph;
          }
      }
      
    8. Create a walker named summarizer that gets parameters from the context (or defaults) and calls the summarize ability.

      # declaring walker for summarizing text
      walker summarizer{
          has dataset="dataset.json";
      
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      }
      

      Final summarizer.jac program

      node model_dir;
      node summarizer{
          # import ability
          can t5_sum.classify_text;
      
          # summarizer
          can summarize with summarizer entry{
              data = file.load_json(visitor.dataset);
      
              report t5_sum.classify_text(
                  text = data["text"],
                  min_length = data["min_length"],
                  max_length = data["max_length"]
                  );
          }
      }
      
      # adding edge
      edge summ_model {
          has model_type;
      }
      
      # adding graph
      graph summ_graph {
          has anchor summ_model_dir;
          spawn {
              summ_model_dir = spawn node::model_dir;
              summarizer_node = spawn node::summarizer;
              summ_model_dir -[summ_model(model_type="summarizer")]-> summarizer_node;
          }
      }
      
      walker init {
          root {
          spawn here --> graph::summ_graph;
          }
      }
      
      # declaring walker for summarizing text
      walker summarizer{
          has dataset="dataset.json";
      
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      }
      
    • Steps for running summarizer.jac program

      • Execute the following command to build summarizer.jac

        jac build summarizer.jac
        
      • Execute the following command to activate the sentinel

        sentinel set -snt active:sentinel -mode ir summarizer.jir
        
      • Execute the walker summarizer with default parameters for the summarizer (t5_sum) module with the following command

        walker run summarizer
        
      • After executing the walker summarizer, the result data will be shown on the console.

        Result

          "report": [
                        "the president predicts some states will reopen this month. the country has over 637,000 confirmed cases and over 30,826 deaths, the highest for any country in the world. we'll be the comeback kids, all of us."
                    ]
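
          As a rough sanity check, the reported summary falls inside the requested min_length/max_length window. We count whitespace-separated words here; the model itself counts tokens, so this is only approximate:

          ```python
          # The summary reported above, as a single string
          summary = ("the president predicts some states will reopen this month. "
                     "the country has over 637,000 confirmed cases and over 30,826 deaths, "
                     "the highest for any country in the world. "
                     "we'll be the comeback kids, all of us.")

          n_words = len(summary.split())
          print(n_words)               # 36
          assert 30 <= n_words <= 100  # within the requested min_length/max_length
          ```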
        

Text Segmenter (text_seg)

Module text_seg implements Topical Change Detection in Documents via Embeddings of Long Sequences.

  1. Import text_seg module in jac
  2. Get segments

Walk through

1. Import the text segmenter (text_seg) module in jac

  1. To run Jaseci, open a terminal and run the following command.
    jsctl -m
    
  2. Load the text_seg module in jac with the command
    actions load module jaseci_ai_kit.text_seg
    

2. Get segments

For this tutorial, we are going to leverage the text segmenter (text_seg) for the purpose of text segmentation.

  • Creating a Jac program for the text segmenter (text_seg)

    1. Create a file named segment.jac

    2. Create nodes model_dir and text_seg in the segment.jac file.

      node model_dir;
      node text_seg{};
      
    3. Import the text_seg.load_model and text_seg.get_segments abilities inside the text_seg node.

      # import ability
      can text_seg.load_model, text_seg.get_segments;
      
    4. Initialize the load_model and get_segments abilities inside the text_seg node.

      # loading model
      can load_model with text_segment entry{
          text_seg.load_model(
              model_name = visitor.model_name
          );
      }
      
      # text segmentation
      can get_segments with text_segment entry{
          data = file.load_json(visitor.data);
      
          report text_seg.get_segments(
              text = data["text"],
              threshold = data["threshold"]
          );
      }
      

      load_model: loads the available model for text segmentation. get_segments: gets the different topics in the provided context, given a threshold.

      Parameter details

      • Input
        • model_name(String): name of the transformer model to load, options are:

          • wiki : trained on wikipedia data
          • legal: trained on legal documents
        • Input Data

          {
              "text": "There was once a king of Scotland whose name was Robert Bruce. He needed to be both brave and wise because the times in which he lived were wild and rude. The King of England was at war with him and had led a great army into Scotland to drive him out of the land. Battle after battle had been fought. Six times Bruce had led his brave little army against his foes and six times his men had been beaten and driven into flight. At last his army was scattered, and he was forced to hide in the woods and in lonely places among the mountains. One rainy day, Bruce lay on the ground under a crude shed listening to the patter of the drops on the roof above him. He was tired and unhappy. He was ready to give up all hope. It seemed to him that there was no use for him to try to do anything more. As he lay thinking, he saw a spider over his head making ready to weave her web. He watched her as she toiled slowly and with great care. Six times she tried to throw her frail thread from one beam to another, and six times it fell short. 'Poor thing,' said Bruce: 'you, too, know what it is to fail. But the spider did not lose hope with the sixth failure. With still more care, she made ready to try for the seventh time. Bruce almost forgot his own troubles as he watched her swing herself out upon the slender line. Would she fail again? No! The thread was carried safely to the beam and fastened there.",
              "threshold": 0.65
          }
          
          • text (String): text that contains the entire context
          • threshold (Float): the range is between 0 and 1; a threshold of 1 makes each sentence its own segment.
        • Output: List of segments, each grouping topically related sentences from the context

    5. Add an edge named seg_model in the segment.jac file for connecting nodes inside the graph.

      # adding edge
      edge seg_model {
          has model_type;
      }
      
    6. Add a graph named text_seg_graph for initializing the nodes.

      graph text_seg_graph {
          has anchor seg_model_dir;
          spawn {
              seg_model_dir = spawn node::model_dir;
              text_seg_node = spawn node::text_seg;
              seg_model_dir -[seg_model(model_type="text_seg")]-> text_seg_node;
          }
      }
      
    7. Initialize the walker init for calling the graph.

      walker init {
          root {
          spawn here --> graph::text_seg_graph;
          }
      }
      
    8. Create a walker named text_segment that gets parameters from the context (or defaults) and calls the load_model and get_segments abilities.

      # declaring walker for text segmentation
      walker text_segment{
          has model_name="wiki";
          has data="text.json";
      
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      }
      

      Final segment.jac program

      node model_dir;
      node text_seg{
          # import all module ability
          can text_seg.load_model, text_seg.get_segments;
      
          # loading model
          can load_model with text_segment entry{
              text_seg.load_model(
                  model_name = visitor.model_name
              );
          }
      
          # text segmentation
          can segment with text_segment entry{
              data = file.load_json(visitor.data);
      
              report text_seg.get_segments(
                  text = data["text"],
                  threshold = data["threshold"]
              );
          }
      }
      
      
      
      # adding edge
      edge seg_model {
          has model_type;
      }
      
      # adding graph
      graph text_seg_graph {
          has anchor seg_model_dir;
          spawn {
              seg_model_dir = spawn node::model_dir;
              text_seg_node = spawn node::text_seg;
              seg_model_dir -[seg_model(model_type="text_seg")]-> text_seg_node;
          }
      }
      
      # declare init graph
      walker init {
          root {
          spawn here --> graph::text_seg_graph;
          }
      }
      
      
      # declaring walker for text segmentation
      walker text_segment{
          has model_name="wiki";
          has data="text.json";
      
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      }
      
    • Steps for running segment.jac program

      • Execute the following command to build segment.jac

        jac build segment.jac
        
      • Execute the following command to activate the sentinel

        sentinel set -snt active:sentinel -mode ir segment.jir
        

        Note: If you get the error ValueError: badly formed hexadecimal UUID string, execute the following once:

        sentinel register -set_active true -mode ir segment.jir

      • Execute the walker text_segment with default parameters for the text segmentation (text_seg) module with the following command

        walker run text_segment
        

        After executing the walker text_segment, the result data will be shown on the console.

        Result

        "report": [
                    [
                        "There was once a king of Scotland whose name was Robert Bruce. He needed to be both brave and wise because the times in which he lived were wild and rude. The King of England was at war with him and had led a great army into Scotland to drive him out of the land. Battle after battle had been fought. Six times Bruce had led his brave little army against his foes and six times his men had been beaten and driven into flight. At last his army was scattered, and he was forced to hide in the woods and in lonely places among the mountains.",
                        "One rainy day, Bruce lay on the ground under a crude shed listening to the patter of the drops on the roof above him. He was tired and unhappy. He was ready to give up all hope. It seemed to him that there was no use for him to try to do anything more. As he lay thinking, he saw a spider over his head making ready to weave her web. He watched her as she toiled slowly and with great care. Six times she tried to throw her frail thread from one beam to another, and six times it fell short. ' Poor thing,' said Bruce: 'you, too, know what it is to fail. But the spider did not lose hope with the sixth failure. With still more care, she made ready to try for the seventh time.",
                        "Bruce almost forgot his own troubles as he watched her swing herself out upon the slender line. Would she fail again? No! The thread was carried safely to the beam and fastened there."
                ]
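
        The threshold's effect can be illustrated with a toy version of threshold-based segmentation. This is only a sketch of the idea, using bag-of-words cosine similarity in place of the module's transformer embeddings:

        ```python
        from collections import Counter
        import math

        def cosine(a, b):
            """Cosine similarity between two bag-of-words Counters."""
            dot = sum(a[w] * b[w] for w in set(a) & set(b))
            na = math.sqrt(sum(v * v for v in a.values()))
            nb = math.sqrt(sum(v * v for v in b.values()))
            return dot / (na * nb) if na and nb else 0.0

        def segment(sentences, threshold=0.65):
            """Start a new segment whenever adjacent-sentence similarity drops
            below the threshold (threshold=1 puts every sentence in its own
            segment, matching the parameter description above)."""
            segments = [[sentences[0]]]
            for prev, cur in zip(sentences, sentences[1:]):
                sim = cosine(Counter(prev.lower().split()), Counter(cur.lower().split()))
                if sim < threshold:
                    segments.append([cur])       # topic change: open a new segment
                else:
                    segments[-1].append(cur)     # same topic: extend current segment
            return [" ".join(s) for s in segments]

        sents = ["the king fought a battle", "the king lost the battle",
                 "a spider wove a web", "the spider tried again"]
        print(segment(sents, threshold=0.15))   # two segments: king story, spider story
        ```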
        

Entity Extraction Using Transformers (tfm_ner)

What is NER: named entity recognition (NER) — sometimes referred to as entity chunking, extraction, or identification — is the task of identifying and categorizing key information (entities) in text. An entity can be any word or series of words that consistently refers to the same thing. Every detected entity is classified into a predetermined category.

tfm_ner: a module based on transformers to identify and extract entities. It uses the TokenClassification method from Hugging Face.

This tutorial shows you how to train, test, and validate the tfm_ner module.

Transformer NER

  1. Preparing Dataset
  2. Import tfm_ner module
  3. Model training and validation
  4. Predicting entities
  5. Experiments and methodology

1. Preparing Dataset

For the train, test, and validation datasets, we are going to create lists of dicts and store them in JSON files named train.json, validation.json, and test.json. For demonstration purposes we are using the CoNLL2003 dataset here; create a directory named dataset and put all the required files in it.

Example Dataset file data format

  • train_data (List(Dict)): a list of dictionaries, each containing a context and the list of entities in that context.

    [{
        "context": "However a loophole in the law allowed Buddharakkita and Jayewardene evade the death penalty as the Capital Punishment Repeal Act allowed for a sentence of death to a person convicted for murder committed prior to December 2 1959 and not for the offence of conspiracy to commit murder",
        "entities": [
            {
                "entity_value": "Buddharakkita",
                "entity_type": "person",
                "start_index": 38,
                "end_index": 51
            },
            {
                "entity_value": "Jayewardene",
                "entity_type": "person",
                "start_index": 56,
                "end_index": 67
            },
            {
                "entity_value": "Capital Punishment Repeal Act",
                "entity_type": "event",
                "start_index": 99,
                "end_index": 128
            }
        ]
        }
    ]
    
  • val_data (List(Dict)): a list of dictionaries, each containing a context and the list of entities in that context

    [
        {
            "context": "The Stavros Niarchos Foundation Cultural Center inaugurated in 2016 will house the National Library of Greece and the Greek National Opera",
            "entities": [
                {
                    "entity_value": "Stavros Niarchos Foundation Cultural Center",
                    "entity_type": "building",
                    "start_index": 4,
                    "end_index": 47
                },
                {
                    "entity_value": "National Library of Greece",
                    "entity_type": "building",
                    "start_index": 83,
                    "end_index": 109
                },
                {
                    "entity_value": "Greek National Opera",
                    "entity_type": "building",
                    "start_index": 118,
                    "end_index": 138
                }
            ]
        }
    ]
    
  • test_data (List(Dict)): a list of dictionaries, each containing a context and the list of entities in that context

    [
        {
            "context": "The project proponents told the Australian Financial Review in December that year that they had been able to demonstrate that the market for backpacker tourism was less affected by these events and that they intended to apply for an air operator 's certificate in January 2004",
            "entities": [
                {
                    "entity_value": "Australian Financial Review",
                    "entity_type": "organization",
                    "start_index": 32,
                    "end_index": 59
                }
            ]
        }
    ]
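
    Since tfm_ner consumes character offsets, it is worth checking that every start_index/end_index actually slices the context back to its entity_value before training. A small sketch over data in the format above (validate is a hypothetical helper, not part of the module):

    ```python
    def validate(records):
        """Assert that context[start_index:end_index] == entity_value for every entity."""
        for rec in records:
            for ent in rec["entities"]:
                span = rec["context"][ent["start_index"]:ent["end_index"]]
                assert span == ent["entity_value"], (span, ent["entity_value"])

    # e.g. records loaded with json.load(open("dataset/train.json"))
    example = [{
        "context": ("However a loophole in the law allowed Buddharakkita and "
                    "Jayewardene evade the death penalty"),
        "entities": [
            {"entity_value": "Buddharakkita", "entity_type": "person",
             "start_index": 38, "end_index": 51},
            {"entity_value": "Jayewardene", "entity_type": "person",
             "start_index": 56, "end_index": 67},
        ],
    }]
    validate(example)  # passes silently; raises AssertionError on a misaligned span
    ```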
    

2. Import tfm_ner module in jaseci

  1. Open a terminal and run Jaseci with the command
    jsctl -m
    
  2. Load the tfm_ner module in jac with the command
    actions load module jaseci_ai_kit.tfm_ner
    

3. Model training and validation

For this tutorial we are going to train the model on the train dataset, validate it on the validation dataset, and finally test it on the test dataset.

1. Creating Jac Program

  1. Create a file (main.jac)

  2. Create nodes model_dir and tfm_ner in main.jac

    node model_dir;
    node tfm_ner {}
    
  3. Initialize the node tfm_ner and add the abilities train and infer

    Here we are importing the abilities to train the model and run inference with it.

    node tfm_ner {
        # train, infer
        can tfm_ner.train, tfm_ner.extract_entity, tfm_ner.load_model, tfm_ner.save_model, tfm_ner.get_train_config, tfm_ner.set_train_config;
    }
    
    
  4. Initialize the module for training and validation inside the node tfm_ner. In this step we initialize training and validation of the tfm_ner model. It takes five parameters: the train, val, and test files (each containing a list of dicts), plus mode and epochs.

    # train and validation model
    can train_and_val with train_and_val_tfm entry {
        train_data = file.load_json(visitor.train_file);
        val_data = file.load_json(visitor.val_file);
        test_data = file.load_json(visitor.test_file);
        std.out("corpus : ",train_data.length," train + ",val_data.length," val +",test_data.length," test sentences");
        tfm_ner.train(
            mode = visitor.mode,
            epochs = visitor.num_train_epochs.int,
            train_data = train_data,
            dev_data = val_data,
            test_data = test_data
            );
        std.out("training and validation done ");
        }
    
  5. Add the infer ability inside the node tfm_ner

    The infer ability takes text input and returns a list of entities.

    # infer entity from text
    can infer with predict_entity_from_tfm entry {
        report tfm_ner.extract_entity(
            text = visitor.text
        );
    }
    
  6. Add an edge named ner_model in the main.jac file for connecting nodes inside the graph.

    # adding edge
    edge ner_model {
        has model_type;
    }
    
  7. Add a graph named ner_val_graph for initializing the nodes.

    # Adding Graph
    graph ner_val_graph {
        has anchor ner_model_dir;
        spawn {
            ner_model_dir = spawn node::model_dir;
            tfm_ner_node = spawn node::tfm_ner;
            ner_model_dir -[ner_model(model_type="tfm_ner")]-> tfm_ner_node;
        }
    }
    
  8. Initializing the init walker for spawning the graph

    walker init {
        root {
        spawn here --> graph::ner_val_graph;
        }
    }
    
  9. Creating a walker named train_and_val_tfm that reads the parameters from the context and calls the training-and-validation ability.

    # creating walker
    walker train_and_val_tfm {
        has train_file;
        has val_file;
        has test_file;
        has num_train_epochs;
        has mode;
    
        # Train NER models on the train set
        # and validate them on the val set (Optional)
        # and test them on the test set (Optional)
    
        # accuracy reports and training logs are written to "train/logs/"
        # relative to the directory containing the main.jac file
        root {
            take --> node::model_dir;
        }
        model_dir {
            take -->;
        }
    }
    
  10. Creating a walker named predict_entity_from_tfm that takes new text as input from the context and infers entities.

    walker predict_entity_from_tfm{
        has text;
    
        root {
            take --> node::model_dir;
        }
        model_dir {
            take -->;
        }
    }
    
  • Final Jac program (main.jac)

    node model_dir;
    node tfm_ner {
        # train,infer
        can tfm_ner.extract_entity, tfm_ner.train;
    
        # extracting entities
        can infer with predict_entity_from_tfm entry {
            report tfm_ner.extract_entity(
                text = visitor.text
            );
        }
    
        ## train and validate
        can train_and_val with train_and_val_tfm entry {
    
            train_data = file.load_json(visitor.train_file);
            val_data = file.load_json(visitor.val_file);
            test_data = file.load_json(visitor.test_file);
            std.out("corpus : ",train_data.length," train + ",val_data.length," dev +",test_data.length," test sentences");
            tfm_ner.train(
                mode = visitor.mode,
                epochs = visitor.num_train_epochs.int,
                train_data = train_data,
                dev_data = val_data,
                test_data = test_data
                );
            std.out("training and validation done ");
            }
    }
    
    
    edge ner_model {
        has model_type;
    }
    
    graph ner_val_graph {
        has anchor ner_model_dir;
        spawn {
            ner_model_dir = spawn node::model_dir;
            tfm_ner_node = spawn node::tfm_ner;
            ner_model_dir -[ner_model(model_type="tfm_ner")]-> tfm_ner_node;
        }
    }
    
    
    walker init {
        root {
        spawn here --> graph::ner_val_graph;
        }
    }
    
    ## creating walker
    walker train_and_val_tfm {
        has train_file;
        has val_file;
        has test_file;
        has num_train_epochs;
        has mode;
    
        # Train all NER models on the train set
        # and validate them on the val set
        # report accuracy performance across all NER models
        root {
            take --> node::model_dir;
        }
        model_dir {
            take -->;
        }
    }
    
    walker predict_entity_from_tfm{
        has text;
    
        root {
            take --> node::model_dir;
        }
        model_dir {
            take -->;
        }
    }
    
    

2. Steps for running the main.jac file and training and validating the tfm_ner model

  1. Build main.jac by running:

    jac build main.jac
    
  2. Run the following command to activate the sentinel:

    sentinel set -snt active:sentinel -mode ir main.jir
    
  3. Create Training Input

    • mode (string): mode for training the model; available options are:
      • default: train the model from scratch.
      • incremental: provide additional training data for the current set of entities.
      • append: change the number of entities (the model is restarted and trained with all of the train data).
    • epochs (int): number of epochs to train the model for.
    • train_file: file name of the train data (list[dict]).
    • val_file: file name of the validation data (list[dict]).
    • test_file: file name of the test data (list[dict]).
  4. Run the following command to execute the walker train_and_val_tfm:

    walker run train_and_val_tfm -ctx "{\"train_file\":\"dataset/train.json\",\"val_file\":\"dataset/dev.json\",\"test_file\":\"dataset/test.json\",\"num_train_epochs\":\"50\",\"mode\":\"default\"}"
    
  5. You'll find logs like the following in the train folder.

    2022-06-06 11:23:46.832007    Training epoch: 1/50
    2022-06-06 11:23:46.847969    Training loss per 100 training steps: 0.9243220090866089
    2022-06-06 11:23:47.186904    Training loss epoch: 0.8817697350795453
    2022-06-06 11:23:47.191905    Training accuracy epoch: 0.11538461538461539
    2022-06-06 11:23:47.330845    Validation loss epoch: 0.7973677378434402
    2022-06-06 11:23:47.336076    Validation accuracy epoch: 0.038461538461538464
    2022-06-06 11:23:47.442199    Epoch 1 total time taken : 0:00:00.610192
    2022-06-06 11:23:47.448885    ------------------------------------------------------------
    2022-06-06 11:23:47.456886    Training epoch: 2/50
    2022-06-06 11:23:47.474848    Training loss per 100 training steps: 0.8383979797363281
    2022-06-06 11:23:47.814906    Training loss epoch: 0.7714704366830679
    2022-06-06 11:23:47.820939    Training accuracy epoch: 0.023076923076923078
    2022-06-06 11:23:47.969910    Validation loss epoch: 0.70741940003175
    2022-06-06 11:23:47.976287    Validation accuracy epoch: 0.0
    2022-06-06 11:23:48.075297    Epoch 2 total time taken : 0:00:00.618411
    2022-06-06 11:23:48.081293    ------------------------------------------------------------
    ............
    ............
    ............
    2022-06-01 07:07:56.382809     Training epoch: 50/50
    2022-06-01 07:08:22.558677     Training loss epoch: 0.06641783774011399
    2022-06-01 07:08:22.558712     Training accuracy epoch: 0.9109369783381449
    2022-06-01 07:08:22.558778     evaluation loss epoch: 0.16095292149111629
    2022-06-01 07:08:22.558790     evaluation accuracy epoch: 0.8269142243363511
    2022-06-01 07:08:22.647852     Epoch 50 total time taken : 0:00:26.265050
    2022-06-01 07:08:22.647874     ------------------------------------------------------------
    2022-06-01 07:08:22.797730     Model Training is Completed
    2022-06-01 07:08:22.797769     ------------------------------------------------------------
    2022-06-01 07:08:22.797779     Total time taken to completed training :  0:22:06.770898
    2022-06-01 07:08:22.797795     ------------------------------------------------------------
    2022-06-01 07:08:22.797807     Model testing is started
    2022-06-01 07:08:22.797819     ------------------------------------------------------------
    2022-06-01 07:08:27.092534     f1_score(macro) : 0.6822521889259535
    2022-06-01 07:08:27.092573     Accuracy : 0.7892490767336889
    2022-06-01 07:08:27.092581     Classification Report
    2022-06-01 07:08:27.092584     ------------------------------------------------------------
                precision    recall  f1-score   support
    
            O       0.00      0.00      0.00         0
        I-PER       0.91      0.90      0.91      2496
        I-ORG       0.82      0.68      0.74      1018
        I-MISC      0.84      0.54      0.66       236
        I-LOC       0.66      0.64      0.65       252
        B-PER       0.84      0.85      0.85      2714
        B-ORG       0.83      0.71      0.77      2513
        B-MISC      0.82      0.64      0.72       991
        B-LOC       0.85      0.84      0.85      1965
    
     accuracy                           0.79     12185
    macro avg       0.73      0.65      0.68     12185
    weighted avg    0.85      0.79      0.82     12185
    
    2022-06-01 07:08:27.092694     ------------------------------------------------------------
    2022-06-01 07:08:27.092707     Total time taken to completed testing :  0:00:04.294899
    2022-06-01 07:08:27.092722     ------------------------------------------------------------
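The -ctx value passed to walker run in step 4 is plain JSON; a minimal Python sketch (standard library only) for building the escaped string, using the tutorial's dataset paths:

```python
import json

# Context for the train_and_val_tfm walker; paths mirror the tutorial's dataset layout.
ctx = {
    "train_file": "dataset/train.json",
    "val_file": "dataset/dev.json",
    "test_file": "dataset/test.json",
    "num_train_epochs": "50",
    "mode": "default",
}

ctx_str = json.dumps(ctx)                                             # the JSON object itself
command = "walker run train_and_val_tfm -ctx " + json.dumps(ctx_str)  # shell-escaped form
print(command)
```

Dumping the JSON string a second time produces the quoted, backslash-escaped form shown in the command above.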
    
    

4. Predicting Entities

  • To predict entities we execute the walker predict_entity_from_tfm, providing the text input in the context; the output is the list of entities found in the text. Run the following command to execute the walker:

    walker run predict_entity_from_tfm -ctx "{\"text\":\"It was the second costly blunder by Syria in four minute\"}"
    
  • After executing the walker predict_entity_from_tfm, you will get a list of predicted entities such as:

    [
        {
            "entity_value": "syria",
            "entity_type": "B-LOC",
            "score": 0.8613966703414917,
            "start_index": 36,
            "end_index": 41
        }
    ]
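  The start_index and end_index fields are character offsets into the input string; a quick Python check against the sentence used above:

```python
text = "It was the second costly blunder by Syria in four minute"

# The walker reported start_index 36 and end_index 41 for entity_value "syria".
span = text[36:41]
print(span)          # -> Syria
print(span.lower())  # -> syria, matching the reported entity_value
```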
    

5. Experiments and methodology

Let us take a further look at our model training and evaluation methodology.

Evaluation of tfm_ner with tiny-bert and bert

We evaluate the tiny-bert and bert-base-uncased models on the following datasets.

Dataset description

We are using two different datasets.

1. CONLL2003 is a named entity recognition dataset released as part of the CoNLL-2003 shared task. It has 4 labels (PER, ORG, LOC, MISC).

2. FEW-NERD is a large-scale, fine-grained, manually annotated named entity recognition dataset, which contains 8 coarse-grained (major) types and 66 fine-grained types.

Dataset details

| Dataset Name   | train dataset | validation dataset | test dataset |
| -------------- | ------------- | ------------------ | ------------ |
| CoNLL2003      | 14,041        | 3,250              | 3,453        |
| FEW-NERD (SUP) | 131,767       | 18,824             | 37,648       |

Training methodology

For training, we use a PyTorch token-classification model from Hugging Face.

  • The default training parameters are as follows:

    "MAX_LEN": 128,
    "TRAIN_BATCH_SIZE": 64,
    "VALID_BATCH_SIZE": 32,
    "EPOCHS": 50,
    "LEARNING_RATE": 2e-05,
    "MAX_GRAD_NORM": 10,
    "MODE": "default"
    

    Machine description for training model

    • RAM : 32GB
    • GPU : TESLA T4
    • Memory GPU : 16GB

    Results

    Training on sample data from FEW-NERD(SUP) Dataset on major labels

    | Model_Name          | Evaluation_Accuracy | Test_Accuracy | Test F1_Score | Time Taken(avg) |
    | ------------------- | ------------------- | ------------- | ------------- | --------------- |
    | bert-base-uncased   | 0.6910              | 0.6888        | 0.6353        | 2HR+20MIN       |
    | prajjwal1/bert-tiny | 0.2184              | 0.2182        | 0.2090        | 26MIN           |

    Training on sample data from FEW-NERD(SUP) Dataset

    | Model_Name          | Evaluation_Accuracy | Test_Accuracy | Test F1_Score | Time Taken(avg) |
    | ------------------- | ------------------- | ------------- | ------------- | --------------- |
    | bert-base-uncased   | 0.5841              | 0.5830        | 0.5702        | 3HR+36MIN       |
    | prajjwal1/bert-tiny | 0.1900              | 0.1894        | 0.0546        | 35MIN           |

    Training on CONLL2003 dataset

    | Model_Name          | Evaluation_Accuracy | Test_Accuracy | Test F1_Score | Time Taken(avg) |
    | ------------------- | ------------------- | ------------- | ------------- | --------------- |
    | bert-base-uncased   | 0.9585              | 0.9859        | 0.8100        | 3HR+45MIN       |
    | prajjwal1/bert-tiny | 0.8269              | 0.7892        | 0.6823        | 35MIN           |

    Comparing these results gives the following insights:

    • Number of labels up to 4

      • If training time matters most, go with a small model, e.g. tiny-bert.
      • If higher accuracy matters most, go with a bigger model, e.g. bert-base-uncased.
    • Number of labels greater than 4

      • Go with a bigger model to get higher accuracy and F1-score, e.g. bert-base-uncased.

USE Encoder (use_enc)

Module use_enc uses the universal sentence encoder to generate sentence-level embeddings. The sentence-level embeddings can then be used to calculate the similarity between two given texts via cosine similarity and/or dot product.
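As a refresher, cosine similarity between two embedding vectors can be computed with only the standard library (toy 4-dimensional vectors here; actual USE embeddings are higher-dimensional):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: emb_b is emb_a scaled by 2, so they point the same way.
emb_a = [0.1, 0.3, 0.5, 0.7]
emb_b = [0.2, 0.6, 1.0, 1.4]
print(cosine_similarity(emb_a, emb_b))  # -> 1.0 (up to float rounding)
```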

For this tutorial we are going to leverage the USE encoder for zero-shot text classification.

  1. Preparing dataset for evaluation
  2. Import Use-Encoder(use_enc) module in jac
  3. Evaluate the model's effectiveness

Walk through

1. Preparing the dataset

For this tutorial, we are going to leverage the use_enc module for text classification, which means categorizing an incoming text into one of a set of predefined classes. For demonstration purposes, we are going to use the SNIPS dataset as an example.

SNIPS is a popular intent classification dataset that covers intents such as [ "BookRestaurant", "ComparePlaces", "GetDirections", "GetPlaceDetails", "GetTrafficInformation", "GetWeather", "RequestRide", "SearchPlace", "ShareCurrentLocation", "ShareETA" ]. We need to do a little data-format conversion to create a version of SNIPS that works with our use_enc implementation. For this part, we are going to use Python.

  1. Import the dataset from huggingface dataset library.

    # import library
    from datasets import load_dataset
    # load dataset
    dataset = load_dataset("snips_built_in_intents")
    print(dataset["train"][:2])
    

    If imported successfully, you should see output in a format something like this:

    {"text": ["Share my location with Hillary's sister", "Send my current location to my father"], "label": [5, 5]}

  2. Converting SNIPS from its out-of-the-box format into one that can be ingested by use_enc.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    import json
    
    # get labels names
    lab = dataset["train"].features["label"].names
    # create labels dictionary
    label_dict = {v: k for v, k in enumerate(lab)}
    # dataset
    dataset = dataset["train"]
    
    # create dataset function
    def CreateData(data):
        # Create dataframe
        df = pd.DataFrame(data)
        # Map labels dict on label column
        df["label"] = df["label"].apply(lambda x : label_dict[x])
        # grouping text on basis of label
        df = df.groupby("label").agg({"text": "\t".join, "label": "\t".join})
        df["label"] = df["label"].apply(lambda x: x.split("\t")[0])
    
        # Create data dictionary
        data_dict = {}
        for i in range(len(df)):
            data_dict[df["label"][i]] = df["text"][i].split("\t")
        return data_dict
    # Split dataset: Create test dataset and store in json file `test.json` and save to disk.
    _, test = train_test_split(dataset, test_size=0.2, random_state=42)
    
    # Create test dataset
    test_data = CreateData(test)
    
    data = []
    classes = []
    for itms in test_data:
        if itms not in classes:
            classes.append(itms)
        for text in test_data[itms]:
            data.append({
                "text": text,
                "class":itms
                })
    test_dataset = {"text":data, "classes":classes}
    # write data in json file 'test.json'
    with open("test.json", "w", encoding="utf8") as f:
            f.write(json.dumps(test_dataset, indent = 4))
    

    The resulting format should look something like this.

    • test.json

      {
      	"text": [
      		{
      			"text": "Book a table at Galli for 6 people tonight",
      			"class": "BookRestaurant"
      		},
      		{
      			"text": "What's the best hotel between Soho Grand and Paramount Hotel?",
      			"class": "ComparePlaces"
      		},
      		{
      			"text": "Give me transit directions from Grand Central to Brooklyn bridge",
      			"class": "GetDirections"
      		},
      		{
      			"text": "What's today's menu at The Water Club?",
      			"class": "GetPlaceDetails"
      		},
      		{
      			"text": "Should I expect traffic from here to Kennedy international airport?",
      			"class": "GetTrafficInformation"
      		},
      		{
      			"text": "What will the weather be like tomorrow morning?",
      			"class": "GetWeather"
      		},
      		{
      			"text": "I need an Uber right now",
      			"class": "RequestRide"
      		},
      		{
      			"text": "Find me the closest theatre for tonight",
      			"class": "SearchPlace"
      		},
      		{
      			"text": "Share my location to mum until I get to school",
      			"class": "ShareCurrentLocation"
      		},
      		{
      			"text": "Send my time of arrival to Nina",
      			"class": "ShareETA"
      		}
      	],
      	"classes": [
      		"BookRestaurant",
      		"ComparePlaces",
      		"GetDirections",
      		"GetPlaceDetails",
      		"GetTrafficInformation",
      		"GetWeather",
      		"RequestRide",
      		"SearchPlace",
      		"ShareCurrentLocation",
      		"ShareETA"
      	]
      }
      

2. Import USE-Encoder(use_enc) module in jac

  1. Open a terminal and run Jaseci with the following command:

    jsctl -m
    
  2. Load the use_enc module with the command:

    actions load module jaseci_ai_kit.use_enc
    

3. Evaluate the model's effectiveness

For this tutorial, we evaluate text classification with use_enc on intent classification, tested on the SNIPS test dataset: categorizing an incoming text into one of a set of predefined classes.

  • Creating the Jac program (use_enc evaluation)

    1. Create a file named use_enc.jac

    2. Create nodes model_dir and use_encoder in the use_enc.jac file

      node model_dir;
      node use_encoder {};
      
    3. Initializing node use_encoder and importing the use.text_similarity and use.text_classify abilities inside the node.

      # import ability
      can use.text_similarity, use.text_classify;
      
    4. Initialize module eval_text_classification inside use_encoder node.

      # classify text and evaluate text classification
      can eval_text_classification with eval_text_classification entry{
          test_data = file.load_json(visitor.test_file);
          classes = test_data["classes"];
          result = [];
          for itm in test_data["text"]{
              text = itm["text"];
              class_true = itm["class"];
              resp = use.text_classify(
                  text = text,
                  classes = classes.list
                  );
              result.list::append({"text":text,"class_true":class_true,"class_pred":resp["match"]});
          }
          fn = "result_use_enc.json";
          file.dump_json(fn, result);
      }
      

      Parameter details

      • Input:
        • text (string): text to classify
        • classes (list of strings): candidate classification classes
      • Output:
        • dict containing the matching class, the match index, and the score

      For the evaluation we pass in the test data file, e.g. test.json.
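      Under the hood, this kind of zero-shot classification is an argmax over similarity scores between the text embedding and each class embedding. A pure-Python mock of that logic (the pre-computed embeddings and the exact return keys are illustrative assumptions, not the real use.text_classify API):

```python
def classify(text_emb, class_embs, classes):
    """Pick the class whose embedding is most similar to the text embedding.

    Illustrative only: the real use.text_classify embeds the strings itself,
    and its exact return keys may differ.
    """
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)

    scores = [cos(text_emb, c) for c in class_embs]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return {"match": classes[idx], "match_idx": idx, "scores": scores}

# Toy 2-D embeddings: the text points in roughly the same direction as "GetWeather".
result = classify([0.9, 0.1],
                  [[1.0, 0.0], [0.0, 1.0]],
                  ["GetWeather", "RequestRide"])
print(result["match"])  # -> GetWeather
```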

    5. Adding an edge named use_model in the use_enc.jac file for connecting nodes inside the graph.

      # adding edge
      edge use_model {
          has model_type;
      }
      
    6. Adding a graph named use_encoder_graph for initializing the nodes.

      graph use_encoder_graph {
          has anchor use_model_dir;
          spawn {
              use_model_dir = spawn node::model_dir;
              use_encoder_node = spawn node::use_encoder;
              use_model_dir -[use_model(model_type="use_encoder")]-> use_encoder_node;
          }
      }
      
    7. Initializing the init walker for spawning the graph

      walker init {
          root {
          spawn here --> graph::use_encoder_graph;
          }
      }
      
    8. Creating a walker named eval_text_classification that reads the parameter from the context (or uses the default) and calls the text_classify ability.

      # Declaring the walker:
      walker eval_text_classification{
          has test_file="test.json";
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      }
      

      Final use_enc.jac program

      node model_dir;
      node use_encoder{
          # import all module ability
          can use.text_classify;
      
          # evaluate text classification
          can eval_text_classification with eval_text_classification entry{
              test_data = file.load_json(visitor.test_file);
              classes = test_data["classes"];
              result = [];
              for itm in test_data["text"]{
                  text = itm["text"];
                  class_true = itm["class"];
                  resp = use.text_classify(
                      text = text,
                      classes = classes.list
                      );
                  result.list::append({"text":text,"class_true":class_true,"class_pred":resp["match"]});
              }
              fn = "result_use_enc.json";
              file.dump_json(fn, result);
          }
      }
      
      # adding edge
      edge use_model {
          has model_type;
      }
      
      # creating graph
      graph use_encoder_graph {
          has anchor use_model_dir;
          spawn {
              use_model_dir = spawn node::model_dir;
              use_encoder_node = spawn node::use_encoder;
              use_model_dir -[use_model(model_type="use_encoder")]-> use_encoder_node;
          }
      }
      
      # initialize init walker
      walker init {
          root {
          spawn here --> graph::use_encoder_graph;
          }
      }
      
      
      # Declaring the walker:
      walker eval_text_classification{
          has test_file="test.json";
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      }
      

    Steps for running the use_enc.jac program

    • Execute the following command to build use_enc.jac:

      jac build use_enc.jac
      
    • Execute the following command to activate the sentinel:

      sentinel set -snt active:sentinel -mode ir use_enc.jir
      

      Note: If you get the error ValueError: badly formed hexadecimal UUID string, execute the following command (only once):

      sentinel register -set_active true -mode ir use_enc.jir

    • Execute the walker eval_text_classification with its default parameter to evaluate the use_enc module:

      walker run eval_text_classification
      

    After executing the walker eval_text_classification, the result data is stored in the file result_use_enc.json in your current working directory.

    Evaluation Result

    Evaluation accuracy score        :  0.0303
    Evaluation F1_score              :  0.0495
    
    Evaluation classification_report :
    
                            precision    recall  f1-score   support
    
            BookRestaurant       0.00      0.00      0.00        17
             ComparePlaces       0.50      0.33      0.40         3
             GetDirections       0.06      0.33      0.10         3
           GetPlaceDetails       0.00      0.00      0.00        13
     GetTrafficInformation       0.00      0.00      0.00         1
                GetWeather       0.00      0.00      0.00        13
               RequestRide       0.00      0.00      0.00         2
               SearchPlace       0.00      0.00      0.00         5
      ShareCurrentLocation       0.00      0.00      0.00         4
                  ShareETA       0.00      0.00      0.00         5
    
                  accuracy                           0.03        66
                 macro avg       0.06      0.07      0.05        66
              weighted avg       0.03      0.03      0.02        66
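    The accuracy above can be recomputed from result_use_enc.json by comparing class_true against class_pred. A minimal sketch, shown on a tiny in-memory sample with the same record shape the walker writes (rather than the real result file):

```python
# In a real run you would load the walker's output instead:
#   result = json.load(open("result_use_enc.json"))
result = [
    {"text": "I need an Uber right now", "class_true": "RequestRide", "class_pred": "GetDirections"},
    {"text": "What will the weather be like tomorrow morning?", "class_true": "GetWeather", "class_pred": "GetWeather"},
    {"text": "Send my time of arrival to Nina", "class_true": "ShareETA", "class_pred": "BookRestaurant"},
]

# Accuracy = fraction of records where the predicted class matches the true class.
correct = sum(r["class_true"] == r["class_pred"] for r in result)
accuracy = correct / len(result)
print(f"Evaluation accuracy score : {accuracy:.4f}")  # -> 0.3333
```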
    

USE QA (use_qa)

Module use_qa uses the multilingual QA model to generate sentence-level embeddings. These embeddings can then be used to find the best match between a question and the available answers via cosine similarity and/or dist_score.
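Best-match selection over answer embeddings reduces to a dot product between normalized vectors; a small standard-library sketch (toy 3-dimensional embeddings; the real module computes embeddings with its own model):

```python
def normalize(v):
    n = sum(x * x for x in v) ** 0.5
    return [x / n for x in v]

def best_match(question_emb, answer_embs):
    """Return the index of the answer whose normalized embedding has the
    largest dot product with the question embedding, plus all scores."""
    q = normalize(question_emb)
    scores = [sum(x * y for x, y in zip(q, normalize(a))) for a in answer_embs]
    return max(range(len(scores)), key=scores.__getitem__), scores

# Toy embeddings: answer 1 points the same way as the question.
idx, scores = best_match([0.2, 0.8, 0.1],
                         [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.0, 0.1, 0.9]])
print(idx)  # -> 1
```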

For this tutorial we are going to leverage the USE QA for text classification.

  1. Preparing dataset for evaluation
  2. Import Use-QA(use_qa) module in jac
  3. Evaluate the model's effectiveness

Walk through

1. Preparing the dataset

For this tutorial, we are going to leverage the use_qa module for text classification, which means categorizing an incoming text into one of a set of predefined classes. For demonstration purposes, we are going to use the SNIPS dataset as an example.

SNIPS is a popular intent classification dataset that covers intents such as [ "BookRestaurant", "ComparePlaces", "GetDirections", "GetPlaceDetails", "GetTrafficInformation", "GetWeather", "RequestRide", "SearchPlace", "ShareCurrentLocation", "ShareETA" ]. We need to do a little data-format conversion to create a version of SNIPS that works with our use_qa implementation. For this part, we are going to use Python.

  1. Import the dataset from huggingface dataset library.

    # import library
    from datasets import load_dataset
    # load dataset
    dataset = load_dataset("snips_built_in_intents")
    print(dataset["train"][:2])
    

    If imported successfully, you should see output in a format something like this:

    {"text": ["Share my location with Hillary's sister", "Send my current location to my father"], "label": [5, 5]}

  2. Converting SNIPS from its out-of-the-box format into one that can be ingested by use_qa.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    import json
    
    # get labels names
    lab = dataset["train"].features["label"].names
    # create labels dictionary
    label_dict = {v: k for v, k in enumerate(lab)}
    # dataset
    dataset = dataset["train"]
    
    # create dataset function
    def CreateData(data):
        # Create dataframe
        df = pd.DataFrame(data)
        # Map labels dict on label column
        df["label"] = df["label"].apply(lambda x : label_dict[x])
        # grouping text on basis of label
        df = df.groupby("label").agg({"text": "\t".join, "label": "\t".join})
        df["label"] = df["label"].apply(lambda x: x.split("\t")[0])
    
        # Create data dictionary
        data_dict = {}
        for i in range(len(df)):
            data_dict[df["label"][i]] = df["text"][i].split("\t")
        return data_dict
    # Split dataset: Create test dataset and store in json file `test.json` and save to disk.
    _, test = train_test_split(dataset, test_size=0.2, random_state=42)
    
    # Create test dataset
    test_data = CreateData(test)
    
    data = []
    classes = []
    for itms in test_data:
        if itms not in classes:
            classes.append(itms)
        for text in test_data[itms]:
            data.append({
                "text": text,
                "class":itms
                })
    test_dataset = {"text":data, "classes":classes}
    # write data in json file 'test.json'
    with open("test.json", "w", encoding="utf8") as f:
            f.write(json.dumps(test_dataset, indent = 4))
    

    The resulting format should look something like this.

    • test.json

      {
      	"text": [
      		{
      			"text": "Book a table at Galli for 6 people tonight",
      			"class": "BookRestaurant"
      		},
      		{
      			"text": "What's the best hotel between Soho Grand and Paramount Hotel?",
      			"class": "ComparePlaces"
      		},
      		{
      			"text": "Give me transit directions from Grand Central to Brooklyn bridge",
      			"class": "GetDirections"
      		},
      		{
      			"text": "What's today's menu at The Water Club?",
      			"class": "GetPlaceDetails"
      		},
      		{
      			"text": "Should I expect traffic from here to Kennedy international airport?",
      			"class": "GetTrafficInformation"
      		},
      		{
      			"text": "What will the weather be like tomorrow morning?",
      			"class": "GetWeather"
      		},
      		{
      			"text": "I need an Uber right now",
      			"class": "RequestRide"
      		},
      		{
      			"text": "Find me the closest theatre for tonight",
      			"class": "SearchPlace"
      		},
      		{
      			"text": "Share my location to mum until I get to school",
      			"class": "ShareCurrentLocation"
      		},
      		{
      			"text": "Send my time of arrival to Nina",
      			"class": "ShareETA"
      		}
      	],
      	"classes": [
      		"BookRestaurant",
      		"ComparePlaces",
      		"GetDirections",
      		"GetPlaceDetails",
      		"GetTrafficInformation",
      		"GetWeather",
      		"RequestRide",
      		"SearchPlace",
      		"ShareCurrentLocation",
      		"ShareETA"
      	]
      }
      

2. Import USE-QA(use_qa) module in jac

  1. Open a terminal and run Jaseci with the following command:

    jsctl -m
    
  2. Load the use_qa module with the command:

    actions load module jaseci_ai_kit.use_qa
    

3. Evaluate the model's effectiveness

For this tutorial, we evaluate text classification with use_qa on intent classification, tested on the SNIPS test dataset: categorizing an incoming text into one of a set of predefined classes.

  • Creating the Jac program (use_qa evaluation)

    1. Create a file named use_qa.jac

    2. Create nodes model_dir and use_qa in the use_qa.jac file

      node model_dir;
      node use_qa {};
      
    3. Initializing node use_qa and importing the use.qa_classify ability inside the node.

      # import ability
      can  use.qa_classify;
      
    4. Initialize module eval_text_classification inside use_qa node.

      # evaluate text classification
      can eval_text_classification with eval_text_classification entry{
          test_data = file.load_json(visitor.test_file);
          # std.out(test_data);
          classes = test_data["classes"];
          result = [];
          for itm in test_data["text"]{
              text = itm["text"];
              class_true = itm["class"];
              resp = use.qa_classify(
                  text = text,
                  classes = classes.list
                  );
              result.list::append({"text":text,"class_true":class_true,"class_pred":resp["match"]});
          }
          fn = "result_use_qa.json";
          file.dump_json(fn, result);
      }
      

      Parameter details

      • Input:
        • text (string): text to classify
        • classes (list of strings): candidate classification classes
      • Output:
        • dict containing the matching class, the match index, and the score

      For the evaluation we pass in the test data file, e.g. test.json.

    5. Adding an edge named use_model in the use_qa.jac file for connecting nodes inside the graph.

      # adding edge
      edge use_model {
          has model_type;
      }
      
    6. Adding a graph named use_qa_graph for initializing the nodes.

      graph use_qa_graph {
          has anchor use_model_dir;
          spawn {
              use_model_dir = spawn node::model_dir;
              use_qa_node = spawn node::use_qa;
              use_model_dir -[use_model(model_type="use_qa")]-> use_qa_node;
          }
      }
      
    7. Initializing the init walker for spawning the graph

      walker init {
          root {
          spawn here --> graph::use_qa_graph;
          }
      }
      
    8. Creating a walker named eval_text_classification that reads the parameter from the context (or uses the default) and calls the qa_classify ability.

      # Declaring the walker:
      walker eval_text_classification{
          has test_file="test.json";
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      }
      

      Final use_enc.jac program

      node model_dir;
      node use_encoder{
          # import all module ability
          can use.text_classify;
      
          # evaluate text classification
          can eval_text_classification with eval_text_classification entry{
              test_data = file.load_json(visitor.test_file);
              classes = test_data["classes"];
              result = [];
              for itm in test_data["text"]{
                  text = itm["text"];
                  class_true = itm["class"];
                  resp = use.text_classify(
                      text = text,
                      classes = classes.list
                      );
                  result.list::append({"text":text,"class_true":class_true,"class_pred":resp["match"]});
              }
              fn = "result_use_qa.json";
              file.dump_json(fn, result);
          }
      }
      
      # adding edge
      edge use_model {
          has model_type;
      }
      
      # creating graph
      graph use_encoder_graph {
          has anchor use_model_dir;
          spawn {
              use_model_dir = spawn node::model_dir;
              use_encoder_node = spawn node::use_encoder;
              use_model_dir -[use_model(model_type="use_encoder")]-> use_encoder_node;
          }
      }
      
      # initialize init walker
      walker init {
          root {
          spawn here --> graph::use_encoder_graph;
          }
      }
      
      
      # Declaring the walker for calling:
      walker eval_text_classification{
          has test_file="test.json";
          root {
              take --> node::model_dir;
          }
          model_dir {
              take -->;
          }
      }
      

    Steps for running use_qa.jac program

    • Execute the following command to build use_qa.jac

      jac build use_qa.jac
      
    • Execute the following command to activate the sentinel

      sentinel set -snt active:sentinel -mode ir use_qa.jir
      

      Note: If you get the error ValueError: badly formed hexadecimal UUID string, execute the following command once:

      sentinel register -set_active true -mode ir use_qa.jir

    • Execute the eval_text_classification walker with default parameters to evaluate the use_qa module with the following command

      walker run eval_text_classification
      

    After executing the eval_text_classification walker, the result data will be stored in the file result_use_qa.json in your current working directory.
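Since each entry in result_use_qa.json pairs class_true with class_pred, a quick stdlib-only sanity check of the accuracy can be sketched as follows (the full report below uses standard classification metrics; the sample entries here are illustrative):

```python
# Inline sample mirroring the structure written by the eval walker;
# in practice: results = json.load(open("result_use_qa.json")).
results = [
    {"text": "will it rain", "class_true": "GetWeather", "class_pred": "GetWeather"},
    {"text": "find a cafe", "class_true": "SearchPlace", "class_pred": "GetPlaceDetails"},
]

# Accuracy is simply the fraction of entries where the prediction matches.
correct = sum(r["class_true"] == r["class_pred"] for r in results)
accuracy = correct / len(results)
print(f"accuracy: {accuracy:.4f}")  # accuracy: 0.5000
```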

    Evaluation Result

        Evaluation accuracy score        :  0.7424
        Evaluation F1_score              :  0.6503
    
        Evaluation classification_report :
    
                            precision    recall  f1-score   support
    
            BookRestaurant       0.74      1.00      0.85        17
             ComparePlaces       1.00      0.67      0.80         3
             GetDirections       1.00      0.67      0.80         3
           GetPlaceDetails       1.00      0.23      0.38        13
     GetTrafficInformation       0.00      0.00      0.00         1
                GetWeather       0.87      1.00      0.93        13
               RequestRide       0.67      1.00      0.80         2
               SearchPlace       0.29      0.40      0.33         5
      ShareCurrentLocation       0.57      1.00      0.73         4
                  ShareETA       1.00      0.80      0.89         5
    
                  accuracy                           0.74        66
                 macro avg       0.71      0.68      0.65        66
              weighted avg       0.80      0.74      0.71        66
    

Personalized Head (PH) Module

What is Personalized Head: Using the Personalized Head module, you can create a custom model head which you can train over time. You can use your own custom models and datasets to create a personalized head using just a configuration file and a Python file.

Personalized Head Architecture

PH Actions: PH Jaseci Actions

How Inferencing Works: How Inferencing Works

Recommended way of Using PH Head in your App: Recommended Usage of PH

How to Use

1. Using the 'Personalized Head' as a Transformer Head

The default model for the Personalized Head is a transformer head. You can run inference with the transformer head using just a single configuration file. For training the model, you need to create a Python file that contains a class inheriting from torch.utils.data.Dataset. In the following example, however, we will use the built-in SnipsDataLoader class.

1.1. Creating a Configuration File

# PATH: ./config.yaml

# Inference Configuration
Inference:
  postprocess:
    args:
      to_list: true # converts the output to a list, as Jaseci doesn't support tensors
    type: PersonalizedHeadPostProcessor
  preprocess:
    type: PersonalizedHeadPreProcessor
  weights: '' # if you want to use a pretrained weights, you can specify the path here

# Model Configuration
Model:
  args:
    batch_first: true
    embedding_length: 768
    n_classes: 10
    ph_ff_dim: 512
    ph_nhead: 8
    ph_nlayers: 1
  type: PersonalizedHead

# Training Configuration (Optional: If you want to train the model)
Trainer:
  dataloader:
    args:
      batch_size: 32
      num_workers: 1
      shuffle: true
      train_json: train.json
      validation_split: 0.2
    type: SnipsDataLoader
  loss: nll_loss
  lr_scheduler:
    args:
      gamma: 0.1
      step_size: 50
    type: StepLR
  metrics:
  - accuracy
  - top_k_acc
  n_gpu: 1
  name: PersonalizedHeadTrainer
  optimizer:
    args:
      amsgrad: true
      lr: 0.001
      weight_decay: 0
    type: Adam
  trainer:
    early_stop: 10
    epochs: 100
    monitor: min val_loss
    save_dir: saved_models
    save_period: 1
    tensorboard: true
    verbosity: 2

1.2. Create your JAC Program

# Path: ./main.jac

walker identify_intent{
  has input_text;
  can ph.create_head_list, ph.create_head, ph.predict, ph.train_head, ph.load_weights;

  root {
      #creating a head
      ph.create_head_list(config_file='config.yaml'); # creates a head list (only needs to be done once)
      uid = ph.create_head(uuid='ph');
      pred = ph.predict(uuid=uid, data=input_text);
      report pred;

      # training the head
      ph.train_head(config_file='config.yaml',uuid='ph');

      #loading the trained weights
      ph.load_weights(uuid=uid, weights_file='saved/models/PersonalizedHeadTrainer/ph/best_model.pt');
      pred = ph.predict(uuid=uid, data=input_text);
      report pred;

  }
}

walker init {
  has input_text;
  root {
      spawn here walker::identify_intent(input_text=input_text);
  }
}

1.3. Running your JAC Program

  • Open the terminal and run Jaseci Command Line Tool using the command below.
jsctl -m
  • Load the 'personalized_head' module using the command below.
actions load module jaseci_kit.ph
  • Run the JAC program using the command below.
jac run main.jac -ctx '{"input_text": "I want to order a pizza"}'

2. Using the 'Personalized Head' as a Custom Model

You can use the Personalized Head as a full custom model rather than as a transformer head of a model. For example, you can create your own YOLO model that you can train and run inference with. Follow the steps below to use the Personalized Head as a standalone module with a custom model. The example shows how to use the Personalized Head as an MNIST classification model.

2.1. Creating Custom Python Model

The Python file contains a torch.nn.Module class, which is the model (you can use any model you want); a torch.utils.data.Dataset class, which is the dataset (you can use any dataset you want); and Preprocessor and Postprocessor classes, which are used in inferencing. You can use any preprocessor and postprocessor you want, as long as they follow the same method format. Use the example below as a guide to create your custom Python model.

# Path: ./user_input.py
import torch
from torch.nn import functional as F
from torchvision import datasets, transforms
import os
import PIL

class MnistModel(torch.nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = torch.nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = torch.nn.Dropout2d()
        self.fc1 = torch.nn.Linear(320, 50)
        self.fc2 = torch.nn.Linear(50, num_classes)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


class MnistDataset(torch.utils.data.Dataset):
    def __init__(self, data_dir):
        trsfm = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))
        ])
        self.data_dir = data_dir
        os.makedirs(self.data_dir, exist_ok=True)
        self.dataset = datasets.MNIST(
            self.data_dir, train=True, download=True, transform=trsfm)

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        return self.dataset[idx]


class MnistPreProcessor:
    def __init__(self):
        self.trsfm = transforms.Compose([
            transforms.Grayscale(),
            transforms.Resize((28, 28)),
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))
        ])

    def process(self, x):
        img = PIL.Image.open(x)
        return self.trsfm(img)


class MnistPostProcessor:
    def __init__(self):
        pass

    def process(self, x):
        x = x.argmax(dim=1)
        x = x.detach().cpu().numpy()[0]
        return x.tolist()

2.2. Creating a Configuration File

The configuration file is a YAML file that contains information about the model and training parameters. You can use the example YAML file below as a reference to create your own configuration file.

#Path: ./config.yaml
# Inference Configuration
Inference:
  postprocess:
    type: CustomProcessor
    args:
      python_file: user_input.py
      module_name: MnistPostProcessor
  preprocess:
    type: CustomProcessor
    args:
      python_file: user_input.py
      module_name: MnistPreProcessor
  weights: ""

# Custom Model Configuration
Model:
  args:
    python_file: user_input.py
    module_name: MnistModel
    num_classes: 10
  type: CustomModel

# Training Configuration (Optional: If you want to train the model)
Trainer:
  name: MnistTrainer
  dataloader:
    args:
      python_file: user_input.py
      module_name: MnistDataset
      batch_size: 32
      data_dir: data/
      num_workers: 1
      shuffle: true
      validation_split: 0.2
    type: CustomDataLoader
  loss: nll_loss
  lr_scheduler:
    args:
      gamma: 0.1
      step_size: 50
    type: StepLR
  metrics:
    - accuracy
    - top_k_acc
  n_gpu: 1
  optimizer:
    args:
      amsgrad: true
      lr: 0.001
      weight_decay: 0
    type: Adam
  trainer:
    early_stop: 10
    epochs: 100
    monitor: min val_loss
    save_dir: saved/
    save_period: 1
    tensorboard: true
    verbosity: 2

2.3. Create your JAC program

# Path: ./main.jac
walker identify_number {
  has input_image;
  can ph.create_head, ph.predict, ph.train_head, ph.load_weights;

  root {
      #creating a head
      uid = ph.create_head(config_file='config.yaml', uuid='mnist');
      pred = ph.predict(uuid=uid, data=input_image);
      report pred;

      # training the head
      ph.train_head(config_file='config.yaml',uuid='mnist');

      #loading the trained weights
      ph.load_weights(uuid=uid, weights_file='saved/models/MnistTrainer/mnist/best_model.pt');
      pred = ph.predict(uuid=uid, data=input_image);
      report pred;

  }
}

walker init {
  has input_image;
  has output;

  root {
      spawn here walker::identify_number(input_image=input_image);
  }
}

2.4. Import 'personalized_head' module in Jaseci

  • Open the terminal and run Jaseci Command Line Tool using the command below.
jsctl -m
  • Load the 'personalized_head' module using the command below.
actions load module jaseci_kit.ph
  • Run the JAC program using the command below.
jac run main.jac -ctx '{"input_image": "test.jpg"}'

Todo

  • Need to work on concurrency. Currently, the service cannot be used while training is going on using a single script; as a workaround, 2 JAC scripts need to run for the separate tasks. And if we are running the service with multiple workers, the personalized heads won't be shared among all the workers.
  • Lots of Boilerplate coding (need to simplify)
  • Need to add the ability to change specific attributes without writing the whole Trainer configuration
  • Logging should be changed to a standard format of jaseci.
  • Current way of connecting 2 models is through loading 2 modules and combining them in the JAC code. But this can be made into a combined module (Compositor is Proposed), where you can pass a model config to the compositor to create the Model Inference & Training Pipeline.
  • Dynamic device selection for each personalized head (for multi GPU usage)

Guidance to work with Jaseci AI kit

In this page, you will find all the basic commands needed to work with Jaseci AI kit.

Starting a Jaseci shell session:

jsctl

At this point, a js.session file will be generated in the current working directory. This session file stores the status of memory, graphs, walkers, configurations, etc. Every time the state changes via the jsctl tool, the session file is updated. We can also have multiple session files as needed with the -f or --filename flag.

To start an in-memory session, the -m or --mem-only flag can be used. This won't create a session file, but will create a temporary session in memory.

Running a Jaseci Program

jsctl> jac run [file_name].jac

We can launch any jsctl command directly from the terminal without first entering the Jaseci shell. To run a Jaseci program directly, use the command below:

jsctl jac run [file_name].jac

To make the program run faster, we can first compile it using the build command prior to running it.

jsctl jac build [file_name].jac

This will create a [file_name].jir file in the working directory. To run the compiled program, run the .jir file:

jsctl jac run [file_name].jir

Using Jaseci AI kit

Jaseci AI Kit is a collection of state-of-the-art machine learning models that are already trained on large amounts of data, are available to load into Jaseci, and can be retrained with our own data.

To load a module from Jaseci AI Kit:

We can simply run pip install jaseci_ai_kit in the Python environment we are currently working in, then import the AI models from Jaseci AI Kit into Jaseci using the actions load module jaseci_ai_kit.[module_name] command.

Example module load:

$ jsctl
jaseci > actions load module jaseci_ai_kit.bi_enc
{
 "success": true
}

Load from remote

We can also load Jaseci AI models from a remote server using the actions load remote [url_to_model] command. For this, each AI model should be deployed as a separate service. The URL should be obtained from the remote server where the AI model was deployed.

Example remote load:

jaseci > actions load remote  http://192.168.49.2:32267
{
  "success": true
}

Load from local

Once we have cloned the Jaseci main repository to the local machine, we can load AI models from jaseci_ai_kit using actions load local [path_to_model]. Example local load:

jaseci > actions load local jaseci_ai_kit/jaseci_ai_kit/modules/use_enc/use_enc.py

{
  "success": true
}

The complete list of available module names and their details can be viewed here. Once any model is loaded, use the actions list command to view the available actions, as below:

jaseci > actions list
[
  "net.max",
  "net.min",
  "net.pack",
  "net.unpack",
  "net.root",
  "rand.seed",
  "rand.integer",
  "rand.choice",
  "rand.sentence",
  "rand.paragraph",
  .
  .
  .
  .
]

Retraining a Jaseci model with customized data.

Jaseci AI Kit provides models pretrained on large-scale data. We can retrain these models with custom data. Before training any model, the model should be loaded using one of the methods mentioned above. The training data should be in JSON format. An example of training jac code is below:

node bi_enc {
    can bi_enc.train, bi_enc.infer;

    can train {
        train_data = file.load_json(visitor.train_file);
        bi_enc.train(
            dataset=train_data,
            from_scratch=visitor.train_from_scratch,
            training_parameters={
                "num_train_epochs": visitor.num_train_epochs
            });
        if (visitor.model_name):
            bi_enc.save_model(model_path=visitor.model_name);
    }

    can infer {
        res = bi_enc.infer(
            contexts=[visitor.query],
            candidates=visitor.labels,
            context_type="text",
            candidate_type="text")[0];
        std.out(res);
        visitor.prediction = res["predicted"];
    }
}

walker train {
    has train_file;
    has num_train_epochs = 50, train_from_scratch = true;
    has model_name;
    root {
        spawn here --> node::bi_enc;
        take --> node::bi_enc;
    }
    bi_enc: here::train;
}

walker infer {
    has query, interactive = true;
    has labels, prediction;
    root {
        spawn here --> node::bi_enc;
        take --> node::bi_enc;
    }
    bi_enc {
        if (interactive) {
            while true {
                query = std.input("Enter input text (Ctrl-C to exit)>");
                here::infer;
                std.out(prediction);
            }
        } else {
            here::infer;
            report prediction;
        }
    }
}

walker save_model {
    has model_path;
    can bi_enc.save_model;
    bi_enc.save_model(model_path);
}

walker load_model {
    has model_path;
    can bi_enc.load_model;
    bi_enc.load_model(model_path);
}

As we can see, there are four walkers in the jac code above: train, infer, save_model, and load_model.
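Before invoking the train walker, the training file referenced by train_file must exist. Its exact schema depends on the module being retrained; for bi_enc, a mapping from candidate labels to example utterances is a plausible shape (treat the layout below as an assumption, not the documented format):

```python
import json

# Hypothetical bi_enc training file: each candidate label maps to a list
# of example utterances (schema assumed; verify against the module docs).
train_data = {
    "order a tesla": ["how can I order a tesla?", "I want to buy a tesla"],
    "test drive": ["can I book a test drive?", "schedule a test drive"],
}

with open("clf_train_1.json", "w") as f:
    json.dump(train_data, f, indent=2)
```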

jac run [file_name].jac -walk train -ctx "{\"train_file\": \"[file_name_of_training_data].json\"}"

  • -walk specifies the name of the walker to run. By default, it runs the init walker, but in this case we have set the walker to train.
  • -ctx stands for context. This lets us provide input parameters to the walker. It accepts parameters in JSON format.
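The quoting inside -ctx is easy to get wrong in a shell; one way to generate the argument safely (a stdlib-only sketch, not part of jsctl) is:

```python
import json
import shlex

# Serialize the walker context to JSON, then quote it for the shell.
ctx = {"train_file": "clf_train_1.json"}
arg = shlex.quote(json.dumps(ctx))
print(f"jsctl jac run bi_enc.jac -walk train -ctx {arg}")
```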

Example:

jaseci > jac run bi_enc.jac -walk train -ctx "{\"train_file\": \"clf_train_1.json\"}"
.
.
.

           Epoch : 50
           loss : 0.0435857983926932
           LR : 0.0

Epoch: 100%|████████████████████████████████████████████████████████████████| 50/50 [00:59<00:00,  1.19s/batch]{
 "success": true,
 "report": [],
 "final_node": "urn:uuid:529fae48-ea21-4f4c-9cac-02d2750b4ceb",
 "yielded": false
}

At each training epoch, the output above will print the training loss and learning rate for that epoch. By default, the model is trained for 50 epochs. If the training finishes successfully, you should see "success": true at the end.

Making inferences with the retrained model

After the training finishes, the infer walker can be used to make inferences:

jac run [file_name].jac -walk infer -ctx "{\"labels\": [\"[label_1]\", \"[label_2]\"]}"

An example of inferencing is shown below:

jaseci > jac run bi_enc.jac -walk infer -ctx "{\"labels\": [\"test drive\", \"order a tesla\"]}"
Enter input text (Ctrl-C to exit)> how can I order a tesla?
{"context": "how can I order a tesla?", "candidate": ["test drive", "order a tesla"], "score": [3.8914384187135695, 9.004763714012604], "predicted": {"label": "order a tesla", "score": 9.004763714012604}}
{"label": "order a tesla", "score": 9.004763714012604}

Saving the retrained model in Jaseci.

The retrained model is kept in memory; it can be saved to the local machine in a preferred location.

jac run [file_name].jac -walk save_model -ctx "{\"model_path\": \"[model_path]\"}"

We set the walker to save_model here. Inside the model file, we defined a walker with the name save_model.

Example:

jaseci > jac run bi_enc.jac -walk save_model -ctx "{\"model_path\": \"retrained_model\"}"
Saving non-shared model to : retrained_model
{
  "success": true,
  "report": [],
  "final_node": "urn:uuid:f43dc5e0-bd77-4c6b-b5eb-0e117dfc36d8",
  "yielded": false
}

If the model is saved successfully the success status will be shown as true.

Loading a saved model

Similarly, you can load a saved model with load_model,

jac run [file_name].jac -walk load_model -ctx "{\"model_path\": \"[model_path]\"}"

Example:

jaseci > jac run bi_enc.jac -walk load_model -ctx "{\"model_path\": \"retrained_model\"}"
Loading non-shared model from : retrained_model
{
  "success": true,
  "report": [],
  "final_node": "urn:uuid:b1dd56b0-865a-4150-9df0-50e97ffb8388",
  "yielded": false
}

If the model is loaded successfully, the success status will be shown as true.

Test creation guidelines for Jaseci-AI-Kit

This document defines the process of creating test cases for a Jaseci-AI-Kit module. For creating these tests we'll make use of the Jaseci CoreTest class and the pytest framework. Let's walk through test creation by following the steps below.

  1. Defining the folder structure
  2. Creating the jac file
  3. Creating the python test file
  4. Running the test

1. Defining the folder structure

In the main module, we'll need to create a tests folder; within the tests folder, we'll need a fixtures folder.

    mkdir tests
    mkdir tests/fixtures 

2. Creating the jac file

We'll need to create a jac file inside the fixtures folder.

    touch tests/fixtures/<file_name>.jac

The above jac file would contain the walkers to test all the functional flow of the AI module.

    walker test_bi_enc_get_model_config{
    can bi_enc.get_model_config;
    report bi_enc.get_model_config();
    }

3. Creating the python test file

We'll need to create a python file inside the tests folder for the testcases.

    touch tests/<file_name>.py
  1. Relevant imports

    We'll need to import the CoreTest class and the jac_testcase function from the test_core module.

     from jaseci.utils.test_core import CoreTest, jac_testcase
    

    We'll also need to import load_module_actions and unload_module to load and unload the actions module.

     from jaseci.actions.live_actions import load_module_actions, unload_module
    

    We'll need the pytest import for ordering the test cases.

     import pytest
    
  2. Creating the Test Class

    We'll need a main Test class that would contain all the test cases.

    class BiEncTest(CoreTest):
         fixture_src = __file__
    
    1. Defining the setup and teardown methods

      The setUpClass and tearDownClass methods are used to load and unload the module before and after executing the test cases.

          @classmethod
          def setUpClass(cls):
              super(BiEncTest, cls).setUpClass()
              ret = load_module_actions("jaseci_ai_kit.bi_enc")
              assert ret is True
          @classmethod
          def tearDownClass(cls):
              super(BiEncTest, cls).tearDownClass()
              ret = unload_module("jaseci_ai_kit.modules.encoders.bi_enc")
              assert ret is True
      
    2. Creating testcase

      Defining each testcase has 3 steps

      1. Marking the order of execution
          @pytest.mark.order(1)
      
      2. Passing the jac file and the walker name to the jac_testcase decorator to register the code and execute the walker
          @jac_testcase("bi_enc.jac", "test_bi_enc_get_model_config")
      
      3. Creating the testcase to manipulate and validate the return value
          def test_biencoder_get_model_config(self, ret):
              self.assertEqual(ret["success"], True)
      

      Each test case in the class would look something similar to

          @pytest.mark.order(1)        
          @jac_testcase("bi_enc.jac", "test_bi_enc_get_model_config")
          def test_biencoder_get_model_config(self, ret):
              self.assertEqual(ret["success"], True)
      

4. Running the test

To run the test file, we'll use the pytest command to execute all the test cases.

    pytest tests/<file_name>.py

Built With Stencil

Stencil Component Starter

This is a starter project for building a standalone Web Component using Stencil.

Stencil is also great for building entire apps. For that, use the stencil-app-starter instead.

Setup & Development

npm install
npm start # to start dev server

OR

yarn install
yarn start # to start dev server

Building

To build the component for production, run:

npm run build

OR

yarn build

Testing

To run the unit tests for the components, run:

npm test

OR

yarn test

Need help? Check out our docs here.

Naming Components

When creating new component tags, we recommend not using stencil in the component name (ex: <stencil-datepicker>). This is because the generated component has little to nothing to do with Stencil; it's just a web component!

Instead, use a prefix that fits your company or any name for a group of related components. For example, all of the Ionic generated web components use the prefix ion.

Using this component

There are three strategies we recommend for using web components built with Stencil.

The first step for all three of these strategies is to publish to NPM.

Script tag

  • Put a script tag similar to this <script type='module' src='https://unpkg.com/my-component@0.0.1/dist/my-component.esm.js'></script> in the head of your index.html
  • Then you can use the element anywhere in your template, JSX, HTML, etc.

Node Modules

  • Run npm install my-component --save
  • Put a script tag similar to this <script type='module' src='node_modules/my-component/dist/my-component.esm.js'></script> in the head of your index.html
  • Then you can use the element anywhere in your template, JSX, HTML, etc.

In a stencil-starter app

  • Run npm install my-component --save
  • Add an import to the npm packages import my-component;
  • Then you can use the element anywhere in your template, JSX, HTML, etc.

Jaseci UI is a library of web components built to integrate (and to make it easy to bootstrap web projects) with a Jaseci API. This library allows you to compose components and build experiences with Jaseci through the use of JSON.

To get started, please follow our [[Installation Guide]] or Click to Learn More About Jaseci

Creating a Component

Each component in the Jaseci UI Kit is rendered within a jsc-app. After setting up your code, and assuming your jsc-app component is placed somewhere in your HTML tree, the next step is to create the markup. We won't be using HTML to create the structure of our webpage; instead, we will use our jsc-app component to generate the markup from JSON.

Creating a component is simple: at the bare minimum we need to create an object with a component, props, and sections properties. Here's an example of how we can render a Navbar component.

	[
		{
			"component": "Navbar",
			"sections": {
				"links": [
					{
						"component": "NavLink",
						"props": {"label": "Home"}
					}
				]
			},
			"props": {
				"label": "Jaseci",
				"background": "red"
			}
		}
	]

In the code above we asked for a Navbar component with a single link element, we set the label of our navbar to 'Jaseci' and the background to 'red'. We also asked for a link within the navbar and for it to be rendered within the links section.
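The required shape can be made explicit with a small checker (purely illustrative Python, not part of the UI kit): a component object needs a component name, while props and sections are optional.

```python
def validate_component(obj: dict) -> list:
    """Collect problems with a component object (illustrative only)."""
    problems = []
    if "component" not in obj:
        problems.append("missing 'component'")
    # Recurse into every section, since sections hold child components.
    for children in obj.get("sections", {}).values():
        for child in children:
            problems.extend(validate_component(child))
    return problems

navbar = {
    "component": "Navbar",
    "sections": {"links": [{"component": "NavLink", "props": {"label": "Home"}}]},
    "props": {"label": "Jaseci", "background": "red"},
}
print(validate_component(navbar))  # []
```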

Names

Names are unique values we attach to a component that will allow us to reference it in the future to get the value of its properties.

Sections

Sections allow us to place components within another component. It also allows us to place components at specific locations within another component. In our example above, with the navbar component, links is a section specific to the navbar component that allows us to add Link components within the navbar. Some components have sections and some do not, so be sure to review the available sections for each component to know when and where you can place components within another.

Events

Events allow us to interact with our markup. Just like we can generate the structure of our webpage using props to control the styling and content of our components we can use events to add logic and functionality to make our site interactive. Let's have a look at how we can add an event to our navbar component.

Let's assume we want to display an alert message when a link is clicked: we can modify our link component to listen for an onClick event so we can perform an action when we click on the link.

Taking the code from our example above, let's modify it: we'll start by adding an events property to our link component, and within that events property, our action.

	[
		{
			"name": "nav",
			"component": "Navbar",
			"sections": {
				"links": [
					{
						"component": "NavLink",
						"props": { "label": "Home" },
						"events": {
							"onClick": [
								{
									"fn": "alert",
									"args": ["Jaseci"]
								}
							]
						}
					}
				]
			},
			"props": {
				"label": "Jaseci"
			}
		}
	]

After adding the code above, whenever we click on the Home link we are going to see the message "Jaseci" printed in an alert dialog.

Actions

Actions allow us to run arbitrary javascript functions and built-in functions provided by the ui kit in response to certain events. You can perform multiple actions, one after the other, or compose actions to perform an action after another completes, or fails.

In the example above we created an alert action in response to an onClick event. Let's take a closer look at what we did.

To create an action we need to create an object with the fn property and the args property.

  • fn - the name of the function we want to call
  • args - the values we pass to the function as arguments

Here's what our action looked like:

"onClick": [
	{
		"fn": "alert",
		"args": ["Jaseci"]
	}
]

What if we wanted to update a property instead? We can change our action's fn to update and provide two args: the property we want to update and the new value of the property.

Here's what our action looks like now:

"onClick": [
	{
		"fn": "update",
		"args": ["nav.label", "Jaseci 2.0"]
	}
]

Assuming this onClick event is still attached to our nav link component, whenever we click on this link the label of the navbar will change from Jaseci to Jaseci 2.0.
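The behavior of update can be sketched as a dotted-path write into the page state (illustrative Python semantics only; the real UI kit runs in JavaScript):

```python
def update(state: dict, path: str, value) -> None:
    """Set a dotted-path property, e.g. update(state, "nav.label", ...)."""
    *parents, leaf = path.split(".")
    target = state
    for key in parents:
        # Walk (or create) each intermediate component entry.
        target = target.setdefault(key, {})
    target[leaf] = value

state = {"nav": {"label": "Jaseci"}}
update(state, "nav.label", "Jaseci 2.0")
print(state["nav"]["label"])  # Jaseci 2.0
```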

Action conditions

Consider the following code.

"onClick": [
	{
		"fn": "update",
		"args": ["nav.label", "Jaseci 2.0"],
		"cond": ["var(nav.label)::#neq::Jaseci 2.0"]
	}
]

What have you noticed? It works the same as before; however, there's now a cond property. This property allows us to prevent the execution of an action unless a certain condition is satisfied. Each condition is a string. In this example, if we click and the label is not already Jaseci 2.0, the update action will run; if it is Jaseci 2.0, the label remains the same.

We can provide multiple conditions to an action and the action will only run if all conditions are satisfied.
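A cond string such as var(nav.label)::#neq::Jaseci 2.0 reads as a small variable::operator::value expression. A hedged Python sketch of how such a condition might be evaluated (the full operator set is not documented here; "#eq" is assumed by analogy with "#neq"):

```python
def eval_cond(cond: str, state: dict) -> bool:
    # Split "var(nav.label)::#neq::Jaseci 2.0" into its three parts.
    var_expr, op, expected = cond.split("::")
    path = var_expr[len("var("):-1]  # -> "nav.label"
    actual = state
    for key in path.split("."):
        actual = actual[key]
    if op == "#neq":
        return actual != expected
    if op == "#eq":  # assumed sibling operator, not confirmed by the docs
        return actual == expected
    raise ValueError(f"unknown operator: {op}")

state = {"nav": {"label": "Jaseci"}}
print(eval_cond("var(nav.label)::#neq::Jaseci 2.0", state))  # True
```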

Chaining Actions

We can run an action only after another one has executed. We can do so by providing an action as the value of the onCompleted property of another action. Here's an example using the code above:

"onClick": [
	{
		"fn": "update",
		"args": ["nav.label", "Jaseci 2.0"],
		"cond": ["var(nav.label)::#neq::Jaseci 2.0"],
		"onCompleted": {
			"fn": "alert",
			"args": ["Navbar title updated."]
		}
	}
]

Also consider the following code.

"onClick": [
	{
		"fn": "update",
		"args": ["nav.label", "Jaseci 2.0"],
		"cond": ["var(nav.label)::#neq::Jaseci 2.0"]
	},
	{
		"fn": "alert",
		"args": ["Navbar title updated."]
	}
]

In the first snippet above, we ask for an alert dialog to display after the update action has finished. But what about the second snippet, doesn't it work the same? In this case, yes, but also no. Taking a closer look at the update action of the first snippet, it has a cond; this condition will also prevent the onCompleted action from running if the update function did not run. The second snippet will give us an alert dialog even if the update function did not run.
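The difference between the two snippets can be made concrete with a toy action runner (illustrative Python semantics only, not the UI kit's implementation; cond evaluation is reduced to a boolean flag):

```python
def run_action(action: dict, log: list, conditions_pass: bool) -> None:
    # When a condition fails, skip the action AND its onCompleted chain.
    if action.get("cond") and not conditions_pass:
        return
    log.append(action["fn"])
    if "onCompleted" in action:
        run_action(action["onCompleted"], log, conditions_pass)

chained = {
    "fn": "update",
    "cond": ["var(nav.label)::#neq::Jaseci 2.0"],
    "onCompleted": {"fn": "alert"},
}

log = []
run_action(chained, log, conditions_pass=False)
print(log)  # []: the failed cond suppressed both update and the alert
```

Running the update and the alert as two independent list entries instead would still log the alert when the cond fails, which is exactly the difference described above.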

Operations

Operations are "custom actions" we can create that allow us to reuse a particular sequence of actions across components. Each operation has a name and can optionally accept a number of arguments.

Take a look at the following example:

[
    {
        "component": "Container",
        "name": "container1",
        "operations": {
            "sayHi": {
                "args": [
                    "message"
                ],
                "run": [
                    {
                        "fn": "alert",
                        "args": [
                            "arg(message) cool!"
                        ]
                    }
                ]
            }
        }
    },
    {
        "component": "Button",
        "events": {
            "onClick": [
                {
                    "fn": "runOperation",
                    "operation": "container1.sayHi",
                    "args": [
                        "Hello world!"
                    ]
                }
            ]
        },
        "props": {
            "name": "btn1",
            "label": "Say Hello"
        }
    }
]

In the example above, we defined an operation within the container1 component, then called this operation from the btn1 component. Each operation is called using the runOperation action. The runOperation action requires the operation property to be set with a valid operation as its value. An operation is referenced using the format [Component Name].[Operation Name], which translates to container1.sayHi in this case.

Operation Args

When defining an operation we can set the args property to a list of strings. This can be used to accept args in your operation. Each arg can be used in the actions within an operation and will be replaced by the values passed in the args of the runOperation action. In the example above, arg(message) is replaced with 'Hello world!' and the message alerted is 'Hello world! cool!'.
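The substitution behaves like simple string replacement. Here is an illustrative Python sketch of the idea (the render_args function is ours, not part of the webkit):

```python
def render_args(template_args, arg_names, values):
    # Replace each arg(<name>) placeholder with its passed-in value,
    # mimicking how operation args are resolved before actions run.
    mapping = dict(zip(arg_names, values))
    rendered = []
    for template in template_args:
        for name, value in mapping.items():
            template = template.replace(f"arg({name})", str(value))
        rendered.append(template)
    return rendered

print(render_args(["arg(message) cool!"], ["message"], ["Hello world!"]))
# prints ['Hello world! cool!']
```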

Property References

Property references allow us to get the value of the property of a component. This allows us to move across components. For example, let's say you have an Inputbox component and you want to alert the value of the input box whenever the user presses the Enter button. How would we do that? Let's explore.

Let's start by taking a look at the following code:

[
		{
			"name": "inputbox1",
			"component": "Inputbox",
			"sections": [],
			"events": {
				"onEnter": [
					{
						"fn": "alert",
						"args": ["Hello, var(inputbox1.value)!"]
					}
					}
				]
			},
			"props": {
				"placeholder": "Enter your name..."
			}
		}
]

The above code will render an Inputbox component with a placeholder of Enter your name.... If we enter our name in the box and press the Enter button on the keyboard an alert dialog is shown and var(inputbox1.value) is replaced with the value of the input box.

Update

The update action allows us to update a property of a component.

Args

  1. component property in the format [component name].[property]
  2. the new value

Example

"onClick": [
    {
        "fn": "update",
        "args": ["nav.label", "Jaseci 2.0"]
    }
]

Alert

Runs the browser alert function.

Args

  1. alert message

Example

"onClick": [
    {
        "fn": "alert",
        "args": ["Hello there!"]
    }
]

Call Endpoint

Calls an API endpoint.

Properties

  • endpoint - the api url that will be called

Args

  1. HTTP Verb
  2. Request body

Example

"onClick": [
        {
            "fn": "callEndpoint",
            "endpoint": "http://localhost:3334/message",
            "args": [
                "POST",
                {
                    "message": "Hello!"
                }
            ]
        }
]

Append

Adds a component as a child of another component.

Args

  1. component name
  2. component structure - the child component

Example

"onClick": [
    {
        "fn": "append",
        "args": ["msgs",
            {
                "component": "Text",
                "props": {
                    "value": "Hello"
                }
            }
        ]
    }
]

Creating an API Endpoint

The webkit is able to call api endpoints, as long as the api endpoint sends back a list of supported / built-in actions.

For example, let's say we want to alert a message on the frontend. To do so, we'll need to ensure that the response body for the api endpoint is an array that contains an alert action.

Example API Endpoint

Route.post('/alert-message', async ({ request }) => {
  return [
    {
      fn: 'alert',
      args: [`Hey, how are you?`],
    },
    {
      fn: 'alert',
      args: [`How is the weather?`],
    },
  ]
})

Once this response reaches the frontend, each action specified in the array will execute; you should see two alert messages.
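The same shape of response can be produced from any backend; here is a minimal Python sketch of the payload builder (the alert_actions helper is ours, not part of the webkit):

```python
import json

def alert_actions(*messages):
    # Each message becomes one built-in `alert` action
    # in the list the frontend expects.
    return [{"fn": "alert", "args": [m]} for m in messages]

body = json.dumps(alert_actions("Hey, how are you?", "How is the weather?"))
```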

Calling the Endpoint

Once the api endpoint is ready, we can call it on the frontend with the callEndpoint action.

For example:

{
    "component": "Button",
    "props": {
        "label": "Alert Messages"
    },
    "events": {
        "onClick": [
            {
                "fn": "callEndpoint",
                "endpoint": "http://localhost:3334/alert-message",
                "args": [
                    "POST",
                    {}
                ]
            }
        ]
    }
}

The callEndpoint action can receive optional args, the HTTP verb and the request body.

Install Jaseci using Helm

Prerequisite:

Install Helm on your computer.

Installing Jaseci

Update the values.yaml file as required, then run the command below:


helm install jaseci .

Installing Jaseci AI Models

It is easy to add any AI model that uses the Jaseci Kit docker image.

In the values.yaml file, aimodels is the list of AI models you want to install alongside Jaseci. Each model is added as a new object under aimodels, like so:


aimodels:

  - name: aimodel1
    ..........
    ..........
  - name: aimodel2
    ..........
    ..........

Each object's variables need to be defined. An example is shown below; you can copy it and change the values as needed:


- name: js-use-qa
  script:
    - pip install jaseci-kit==1.3.3.19
    - uvicorn jaseci_kit.use_qa:serv_actions --host 0.0.0.0 --port 80
  resources:
    requests:
      memory: 2Gi
    limits:
      memory: 2Gi

Adding Ingress for Production Workloads

For production use cases, it is better to enable Ingress for more secure and optimal access.

To activate it, update the ingress values in values.yaml as below:


ingress:
  enabled: true   # set to true to enable ingress
  className: nginx-ingress   # type of ingress controller, e.g. nginx-ingress
  # annotations: add cloud-specific annotations for SSL here
  hosts:
    - host: test.xyz.com   # host URL for your jaseci services
      paths:
        - path: /
          pathType: Prefix


To install the NGINX ingress controller, run the following Helm commands:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install nginx-ingress ingress-nginx/ingress-nginx

Create AWS EKS Infrastructure for Jaseci using Terraform

Preparation

  1. Make sure you have a valid AWS Account
  2. Create an access key and secret with EKS, EC2, VPC, and IAM permissions
  3. Connect your local computer to AWS using the AWS CLI (aws configure)
  4. Install the Terraform CLI from https://www.terraform.io/downloads
  5. Update the config.tf file with the respective key values as described
  6. Update the environment.auto.tfvars file with your AWS account ID and related cluster configuration as needed.

Initialize

Run the command below to initialize Terraform:


terraform init

Plan - to check that all configuration is properly set

terraform plan

Apply - create the Jaseci cluster

terraform apply

Destroy - to delete the infrastructure, use the command below

terraform destroy

NOTE

By default terraform runs on the default workspace, which adds the dev suffix to the cluster name. If you want to create multiple environments, for example prod, add the prod-related values in config.tf.

A workspace can be created and switched to using the command below:

terraform workspace new <environment-Name>

or you can select one when you want to run updates on an environment using

terraform workspace select <environment-Name>

Create Azure AKS Infrastructure for Jaseci using Terraform

Preparation

  1. Make sure you have a valid Azure account.
  2. Create a service principal following https://learn.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal
  3. Connect your local computer to Azure using your service principal credentials: az login --service-principal -u <appID> -p <PASSWORD> --tenant <tenantID>
  4. Install the Terraform CLI from https://www.terraform.io/downloads
  5. Update the config.tf file with the respective key values as described and related cluster configuration as needed
  6. Update the environment.auto.tfvars file with your Azure service principal details.

Initialize

Run the command below to initialize Terraform:


terraform init

Plan - to check that all configuration is properly set

terraform plan

Apply - create the Jaseci cluster

terraform apply

Destroy - to delete the infrastructure, use the command below

terraform destroy

NOTE

By default terraform runs on the default workspace, which adds the dev suffix to the cluster name. If you want to create multiple environments, for example prod, add the prod-related values in config.tf.

A workspace can be created and switched to using the command below:

terraform workspace new <environment-Name>

or you can select one when you want to run updates on an environment using

terraform workspace select <environment-Name>

Locust Load Test for JASECI

Locust is an easy-to-use, distributed, user load testing tool. It is intended for load-testing web sites (or other systems) and figuring out how many concurrent users a system can handle.

Run Locust natively

Install Locust

pip install locust

Configure the test

Create test users. The following script will prompt you for the Jaseci server URL and the number of test users you wish to create.

python create_users.py

Then create a folder in sample_code/. Set up a file config.json in the folder. Here is an example:

{
    "walkers" : ["init"], 
    "src" : "walker.jac",
    "remote_actions" : ["http://flair-ner:80/"],
    "local_actions" : []
}

walkers is a list of the walkers that you want to call (in sequence). src is the name of the file that contains your code. remote_actions should contain a list of URLs of your remote services. local_actions should contain a list of the names of your modules.

Run the test

The program reads the environment variable LOCUST_TEST_SRC for the location of the test configuration and LOCUST_HOST for the jaseci server URL.

LOCUST_HOST='JASECI_URL' LOCUST_TEST_SRC='sample_code/<YOUR TEST>' locust -f run_jac.py

Go to the link shown in the console, e.g. http://0.0.0.0:8089, specify the desired number of users for the load test, and initiate the test. Please make sure that the host URL is the same as LOCUST_HOST.

Run Locust with docker

Set up the environment

Install the Docker SDK for Python

pip install docker

Build the custom docker image

docker build -t locust-jac-test .

Note: if you are testing a localhost Jaseci instance, please make sure that your Jaseci service is exposed on 0.0.0.0, since we will access the service from inside Docker, not from the local host. To achieve that, run

kubectl port-forward <JASECI POD NAME> 8888:80 --address="0.0.0.0"

Configure the tests

Note: please make sure that you have configured the tests as described in the previous section.

Since we are not going to open a web UI this time, we need some more information. Provide all of it in test.json. Here is an example:

{
    "hostName": "http://172.17.0.1:8888",
    "testName": "simple",
    "testSRC": "sample_code/simple",
    "userNum": 5,
    "spawnRate": 1,
    "duration": "10s"
}

hostName gives the URL of the host. Note that localhost on the host computer is mapped to 172.17.0.1 inside Docker containers. testName is a simple name for the test; it will be included in the name of the container and the results that we retrieve. testSRC specifies the path to the specific test configuration. userNum specifies the number of users that we need to spawn in this test. spawnRate specifies the speed at which we spawn the users (how many users are created per second). duration sets the length of the test.

Run the test

To run the test

python start_docker.py

Each test runs inside a separate Docker container, named Locust_<TESTNAME>. All the tests run in parallel. When all the tests are done, the Python script automatically stops and removes the containers.

Retrieve the test data

All available data is retrieved after the script finishes. It should be available under results/<testName>/. logs.txt is the log of the test. The data.tar file contains four CSV files, produced directly by Locust.

Setting up Monitoring for JASECI

Prerequisite

Install Helm

We will use Helm Chart for installing our monitoring tools. Helm is widely known as "the package manager for Kubernetes".

https://helm.sh/docs/intro/install/

The Prometheus and Grafana stacks will be used as monitoring tools. We will start by installing the Prometheus pods in our cluster.

You can install the pods in the same namespace as your workloads, or create a separate namespace for the monitoring pods.

Prometheus

Step 1

Add the Prometheus Helm repo

First we will add the chart repository reference locally:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

Step 2

Then we will install the Prometheus service using the command below.

If you are installing on local Kubernetes, run:

 helm install prometheus prometheus-community/prometheus

If you are using a cloud provider, for example AWS:

 helm install prometheus prometheus-community/prometheus \
    --set alertmanager.persistentVolume.storageClass="gp2" \
    --set server.persistentVolume.storageClass="gp2"

After this runs, the Prometheus server can be accessed via port 80 at the following DNS name from within your cluster: prometheus-server.<namespace>.svc.cluster.local

Here, in place of <namespace>, put the name of the namespace where your service lives. Note this URL for the Grafana setup later.

Step 3

Run the command below to check that all the pods have been successfully created. This will create 5 pods.

kubectl get all

The 5 pods have different roles, as follows:

prometheus-alertmanager - The Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, PagerDuty, or OpsGenie. It also takes care of silencing and inhibition of alerts.

prometheus-kube-state-metrics - kube-state-metrics (KSM) is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects. It is not focused on the health of the individual Kubernetes components, but rather on the health of the various objects inside, such as deployments, nodes and pods.

prometheus-node-exporter - The Prometheus Node Exporter exposes a wide variety of hardware- and kernel-related metrics.

prometheus-pushgateway - The Pushgateway is an intermediary service which allows you to push metrics from jobs which cannot be scraped.

prometheus-server - The main Prometheus server pod, responsible for all queries.

Step 4

Use port-forward and open Prometheus in your local browser to check that the setup works and Prometheus is running:

kubectl port-forward deploy/prometheus-server 8080:9090

Grafana

Go to grafana.yaml under the grafana directory in the code and update the Prometheus URL value as required.

In the cloned repository folder, go to the grafana/grafana.yaml file and update the URL to the Prometheus service URL we noted down in step 2 above, i.e. prometheus-server.<namespace>.svc.cluster.local

Please note that in the previous example we did not create a specific namespace for Prometheus, so <namespace> here (and later) should be replaced with default if you follow the tutorial exactly.

This is required for Grafana to collect data from Prometheus.

Now from the monitoring folder of the repo, run below command:

Step 1

First we will add chart repository reference in our local

helm repo add grafana https://grafana.github.io/helm-charts

Step 2

If you are running on local Kubernetes, run the command below.

Please note: replace <YOUR PASSWORD> with the password you want for your Grafana portal.

helm install grafana grafana/grafana \
    --set adminPassword='<YOUR PASSWORD>' \
    --values grafana/grafana.yaml

If you are using a cloud provider, here AWS:

helm install grafana grafana/grafana \
    --set persistence.storageClassName="gp2" \
    --set persistence.enabled=true \
    --set adminPassword='<YOUR PASSWORD>' \
    --values helmcharts/grafana/grafana/grafana.yaml \
    --set service.type=LoadBalancer

Step 3

Run the following command to check that Grafana is deployed properly; you should see running Grafana pods:

kubectl get all

Step 4

Now, try to open Grafana in your browser.

If you are running on local Kubernetes, run:

kubectl port-forward deploy/grafana 80:3000

If you used AWS with a LoadBalancer as in step 2, you will get a public external URL.

Run the command below to get the External-IP for Grafana:

kubectl get svc grafana

Step 5

Visit the URL in your browser and log in with your credentials.

Username: admin. Password: as given while installing the Grafana Helm chart.

Step 6

For creating a dashboard to monitor the cluster:

Cluster Monitoring Dashboard

Click the '+' button on the left panel and select 'Import'.

Copy and paste the JSON from the grafana-dashboards folder.

Click 'Load'.

Select 'Prometheus' as the endpoint under the Prometheus data sources dropdown.

Click 'Import'.

This will show a monitoring dashboard for all cluster nodes.

Generally, this dashboard contains a set of necessary panels. It monitors CPU and memory utilization for the nodes and for each pod.

Credits https://www.eksworkshop.com/

HOW TO USE SERVICE

CommonService (jaseci.svc.common)

This class is the base for implementing a service. Developers should use this class if they want to use the service's common attributes and life cycle.


Common Attributes

  • app - attribute for the actual library used in the service. For example, in TaskService it is Celery
  • enabled - whether the service is enabled in config. The service can be available (upon building) but not enabled (from config)
  • state - the service life-cycle state
  • quiet - for log control, to avoid unnecessary logs

Service Settings

Config

  • the service uses MemoryHook's service_glob method. This will automatically add the default config if it does not yet exist
  • you need to use the ConfigApi (config_set and config_refresh) to update configs on every hook, including redis
  • if a config is updated through the admin portal while redis is running, redis needs to remove the old copy of the config, since redis is the first hook from which the service gets its configs
    // Structure
    {
        "enabled": True,
        "quiet": True,
        "field1": "val1",
        "field2": "val2",
        "field3": "val3",
        ...
    }
    
    

Kube

  • this is similar to config's behavior but it uses a different structure

        // Structure: grouped values from `yaml.safe_load_all(...yaml_file...)`
        // map each safe_load_all to $.kind
        {
            "ServiceAccount": [
                {
                    "apiVersion": "v1",
                    "kind": "ServiceAccount",
                    "metadata": {
                        "labels": {
                            "helm.sh/chart": "kube-state-metrics-4.13.0",
                            "app.kubernetes.io/managed-by": "Helm",
                            "app.kubernetes.io/component": "metrics",
                            "app.kubernetes.io/part-of": "kube-state-metrics",
                            "app.kubernetes.io/name": "kube-state-metrics",
                            "app.kubernetes.io/instance": "jaseci-prometheus",
                            "app.kubernetes.io/version": "2.5.0",
                        },
                        "name": "jaseci-prometheus-kube-state-metrics",
                        "namespace": "default",
                    },
                    "imagePullSecrets": [],
                }
            ]
        }
    

Common Methods

  • start(hook=nullable) - start the actual service based on settings (kube, config). Example: RedisService().start()
  • is_ready() - check if state is NOT_STARTED and app is not yet set
  • is_running() - check if state is RUNNING and app is set
  • has_failed() - check if state is FAILED
  • spawn_daemon(name_of_daemon=target_method) - spawn daemon threads for background processes. Example: self.spawn_daemon(jsorc=self.interval_check)
  • terminate_daemon(name_of_daemon_to_terminate, ...) - terminate daemon threads. Example: self.terminate_daemon("jsorc", "other_daemon_name")

Service Life Cycle (can be overridden)

  • __init__ (initial trigger for build_service)
    • optional to override if you have additional fields to use
    • initial state would be NOT_STARTED
    def __init__(self, hook=None):
        super().__init__(hook) # run CommonService init
        # ... your other code here ...
  • build_kube (must be overridden if you have kube settings)
    • will be called upon build, before build_config
    • sample kube configs are in jaseci_serv.jaseci_serv.kubes
    def build_kube(self, hook) -> dict:
        return hook.service_glob("REDIS_KUBE", REDIS_KUBE) # common implementation using global vars
  • build_config (must be overridden)
    • will be called upon build, after build_kube
    • sample configs are in jaseci_serv.jaseci_serv.configs
    def build_config(self, hook) -> dict:
        return hook.service_glob("REDIS_CONFIG", REDIS_CONFIG) # common implementation using global vars
  • run (must be overridden)
    • triggered upon service.start()
    • upon start it still checks whether the service is enabled and in the ready state (NOT_STARTED)
    • if the service is not enabled, this method is ignored
    • if the service is enabled but state is not NOT_STARTED, the run method is also ignored
    • if all requirements are met, state is updated to STARTED before the run method executes
    • if the run method raises an exception, state is updated to FAILED and the failed method is called
    • if the run method executes without error, state is updated to RUNNING
    def run(self, hook=None):
        self.__convert_config(hook)
        self.app = self.connect() # connect will return a Mailer class with smtplib.SMTP (mail.py example)
  • post_run (optional)
    • triggered after run method and if state is already set to RUNNING
    def post_run(self, hook=None):
        self.spawn_daemon(
            worker=self.app.Worker(quiet=self.quiet).start,
            scheduler=self.app.Beat(socket_timeout=None, quiet=self.quiet).run,
        ) # spawn some threads for Celery worker and scheduler
  • failed (optional)
    • this will be used if you have another process that needs to run upon start failure
    def failed(self):
        super().failed()
        self.terminate_daemon("worker", "scheduler") # close some thread when error occurs (task.py example)
  • reset (optional)
    • this will be used if you have another process that needs to run upon reset
    def reset(self, hook):
        if self.is_running():
            self.app.terminate() # app needs to be terminated before calling the actual reset

        super().reset(hook)
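The life cycle above can be sketched in plain Python. This is illustrative only, not the actual CommonService source; the names mirror the docs:

```python
# Minimal sketch of the service life-cycle semantics described above.
class SketchService:
    def __init__(self):
        self.app = None            # actual library instance once running
        self.enabled = True        # from config
        self.state = "NOT_STARTED"

    def is_ready(self):
        return self.state == "NOT_STARTED" and self.app is None

    def is_running(self):
        return self.state == "RUNNING" and self.app is not None

    def has_failed(self):
        return self.state == "FAILED"

    def start(self):
        # ignored when disabled or not in the ready state
        if not (self.enabled and self.is_ready()):
            return self
        self.state = "STARTED"
        try:
            self.run()
            self.state = "RUNNING"
        except Exception:
            self.state = "FAILED"
            self.failed()
        return self

    def run(self):
        self.app = object()  # stand-in for connecting the real library

    def failed(self):
        pass  # close daemons, clean up, etc.
```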

MetaService (base from CommonService)

This class is the handler for every service class. Its attributes differ from other services': they are static variables instead of instance variables. This gives a global handler for every service that is not reinitialized each time it is called.

Usage

  • add_context
    • for adding a class used for initialization
        from jaseci.hook import RedisHook
        from jaseci_serv.hook.orm import OrmHook

        ms1 = MetaService()
        ms1.add_context("hook", RedisHook, *args, **kwargs) # args/kwargs are optional

        ms2 = MetaService()
        ms2.get_context("hook")["class"] == RedisHook # True
        ms2.get_context("hook")["args"] == args # True
        ms2.get_context("hook")["kwargs"] == kwargs # True

        ms2.add_context("hook2", RedisHook, *args, **kwargs)
        # is equal to
        MetaService().add_context("hook2", RedisHook, *args, **kwargs)

        ms3 = MetaService()
        ms3.add_context("hook", OrmHook, *args, **kwargs)# will override hook From RedisHook to OrmHook
  • get_context
    • for getting the class without initializing
        from jaseci.hook import RedisHook

        ms1 = MetaService()
        ms1.add_context("hook", RedisHook, *args, **kwargs) # args/kwargs are optional

        ms1.get_context("hook")["class"] == RedisHook # True
        ms1.get_context("hook")["args"] == args # True
        ms1.get_context("hook")["kwargs"] == kwargs # True
  • build_context
    • initialize selected context
        ms1 = MetaService()
        ms1.add_context("hook", RedisHook, *args, **kwargs) # args/kwargs are optional

        hook = ms1.build_context("hook") # hook will be RedisHook instance
  • add_service_builder
    • for adding a service class used for initialization
        ms1 = MetaService()
        ms1.add_service_builder("redis", RedisService)
  • build_service
    • for getting service instance
    • has option to make it disposable or reusable
        ms1 = MetaService()
        background = False # False == disposable | True == reusable and will initialize only once
        redis = ms1.build_service("redis", background, *other_args, **other_kwargs)
  • get_service
    • for getting a service instance, assuming it runs in the background
    • if it is not yet initialized, it will automatically run build_service with background=True
        ms1 = MetaService()
        redis = ms1.get_service("redis", *other_args, **other_kwargs)

Common Builder (uses build_context)

  • build_hook - will call build_context("hook"), then run and attach some default services such as kube, jsorc, promon, redis, task, mail
        from jaseci.hook import RedisHook
        from jaseci.element.master import Master

        ms1 = MetaService()
        ms1.add_context("hook", RedisHook, *args, **kwargs) # args/kwargs are optional

        hook = ms1.build_hook() # hook will be RedisHook instance
        hook.kube # kube service
        hook.jsorc # jsorc service
        hook.promon # promon service
        hook.redis # redis service
        hook.task # task service
        hook.mail # mail service
        hook.meta # actual meta service
  • build_master
    • will call build_context("master") and add build_context("hook") for _h
        ###################################################
        #       No need to add this part unless you       #
        #        need to override populate_context        #
        ###################################################
        # from jaseci.hook import RedisHook
        # from jaseci.element.master import Master
        # ms1 = MetaService()
        # ms1.add_context("hook", RedisHook, *args, **kwargs)
        # ms1.add_context("master", Master, *args, **kwargs)
        ###################################################
        #  ---------------------------------------------- #
        ###################################################

        master = ms1.build_master() # master will be a Master instance with hook attached
        master._h # hook instance
        _h.kube # kube service
        _h.jsorc # jsorc service
        _h.promon # promon service
        _h.redis # redis service
        _h.task # task service
        _h.mail # mail service
        _h.meta # actual meta service
  • build_super_master
    • will call build_context("super_master") and add build_context("hook") for _h
        ###################################################
        #       No need to add this part unless you       #
        #        need to override populate_context        #
        ###################################################
        # from jaseci.hook import RedisHook
        # from jaseci.element.super_master import SuperMaster
        # ms1 = MetaService()
        # ms1.add_context("hook", RedisHook, *args, **kwargs)
        # ms1.add_context("super_master", SuperMaster, *args, **kwargs)
        ###################################################
        #  ---------------------------------------------- #
        ###################################################

        master = ms1.build_super_master() # master will be a SuperMaster instance with hook attached
        master._h # hook instance
        _h.kube # kube service
        _h.jsorc # jsorc service
        _h.promon # promon service
        _h.redis # redis service
        _h.task # task service
        _h.mail # mail service
        _h.meta # actual meta service

Example Usage (StripeService)


import stripe
from jaseci.svc import CommonService
from .config import STRIPE_CONFIG

class StripeService(CommonService):

    def run(self):
        self.app = stripe
        self.app.api_key = self.config.get("key") # ex: "sk_test_4eC39HqLyjWDarjtT1zdp7dc"

    def build_config(self, hook) -> dict:
        return hook.service_glob("STRIPE_CONFIG", STRIPE_CONFIG)


    def other_method_for_automation1():
        print("run_payment")

    def other_method_for_automation2():
        print("run_add_user")

    def other_method_for_automation3():
        print("run_remove_user")

# ----------------------------------------------- #

from path.to.stripe import StripeService
from jaseci_serv.svc import MetaService

    # ...

    meta = MetaService()
    meta.add_service_builder("stripe", StripeService)

    # ...

    # for disposable service
    stripe_service = meta.build_service("stripe", False, hook)
    stripe_service.app.call_any_stripe_methods()

    # for reusable service
    stripe_service1 = meta.get_service("stripe", hook)
    stripe_service2 = meta.get_service("stripe", hook)
    stripe_service3 = meta.get_service("stripe", hook)
    # stripe_service1 == stripe_service2 == stripe_service3

    stripe_service1.app.call_any_stripe_methods()
    stripe_service2.other_method_for_automation2()
    stripe_service3.other_method_for_automation3()

HOW TO USE JSORC SERVICE

  • JsOrcService will try to run all services that are not RUNNING but are tagged to keep alive
  • it will check whether each service has a kube config and will try to apply every setting to kubernetes
  • it will also check on the cluster that the service is properly running before it runs the actual jaseci service
  • if a service doesn't have a kube config it will just try to rerun the service

!! PREREQUISITE !!

  • Kubernetes should be enabled, running and connected
  • JsOrc should be enabled and running

USAGE

  • adding a service to keep_alive lets jsorc handle it
  • any kube and config settings are set on the actual service, not on jsorc

JSORC_CONFIG = {
    "enabled": False,
    "quiet": True,

    // interval checker for each service to keep alive
    "interval": 10,

    // kubernetes namespace
    "namespace": "default",

    // service to keep alive
    "keep_alive": ["promon", "redis", "task", "mail"],
}

EXAMPLE

with kube config

  • prometheus.py
  • kube.py
    • PROMON_KUBE == grouped values from yaml.safe_load_all(...yaml_file...)
      • ex:
          // map each safe_load_all to $.kind
          {
              "ServiceAccount": [
                  {
                      "apiVersion": "v1",
                      "kind": "ServiceAccount",
                      "metadata": {
                          "labels": {
                              "helm.sh/chart": "kube-state-metrics-4.13.0",
                              "app.kubernetes.io/managed-by": "Helm",
                              "app.kubernetes.io/component": "metrics",
                              "app.kubernetes.io/part-of": "kube-state-metrics",
                              "app.kubernetes.io/name": "kube-state-metrics",
                              "app.kubernetes.io/instance": "jaseci-prometheus",
                              "app.kubernetes.io/version": "2.5.0",
                          },
                          "name": "jaseci-prometheus-kube-state-metrics",
                          "namespace": "default",
                      },
                      "imagePullSecrets": [],
                  }
              ]
          }
      
# ... other imports

from .kube import PROMON_KUBE

class PromotheusService(CommonService):

    # ... all other codes ...

    def build_kube(self, hook) -> dict:
        return hook.service_glob("PROMON_KUBE", PROMON_KUBE)
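The "grouped values from yaml.safe_load_all" step can be sketched as follows. This is illustrative only: the documents below are stand-ins for output already produced by PyYAML's safe_load_all, and all names except the first are made up:

```python
from collections import defaultdict

# Stand-ins for documents parsed from a multi-document manifest by
# yaml.safe_load_all(...); every Kubernetes document carries a "kind".
docs = [
    {"kind": "ServiceAccount",
     "metadata": {"name": "jaseci-prometheus-kube-state-metrics"}},
    {"kind": "Deployment", "metadata": {"name": "example-deployment"}},
    {"kind": "ServiceAccount", "metadata": {"name": "example-account"}},
]

def group_by_kind(documents):
    """Map each parsed document to its $.kind, as in the PROMON_KUBE example."""
    grouped = defaultdict(list)
    for doc in documents:
        grouped[doc["kind"]].append(doc)
    return dict(grouped)
```

The resulting dict has one key per document kind (e.g. "ServiceAccount"), each holding the list of matching documents.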

  • Since promon is included in keep_alive, JsOrcService will include it in interval_check.
  • During interval_check, JsOrcService will try to apply every kube configuration from PROMON_KUBE, grouped by command.
  • On the first interval_check, the rerun of the promon service is expected to be skipped, because the pods that have been created are not yet fully initialized and running.
  • Subsequent interval_checks should be able to restart the promon service, since the pods for the Prometheus server should be available by then (this may vary depending on network or server).
  • Once promon is running, it will be ignored on subsequent interval_checks.

without kube config

  • task.py
  • Every interval_check will check whether TaskService is running.
  • If it is not, JsOrc will simply run task.reset(hook).

SUMMARY

  • Initialization of every service included in the keep_alive config is handled automatically by JsOrc. JsOrc restarting a service is identical to triggering it via the config_refresh API.

HOW TO TRIGGER TASK

Any walker can be triggered asynchronously by including the is_async field.

GET

  • add is_async=true as a query param

POST (just choose one)

  • add is_async=true as a query param
  • if the body is JSON, add "is_async": true
  • if the body is multipart, add a new field is_async with the value true

RESPONSE

{
    // False if celery is not running
    "is_queued": true,

    // the task id if queued; otherwise the actual report structure
    "result": "efd67095-a7a0-40db-8f89-6887ae56dbb3"
}
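As a minimal sketch of queuing a walker this way (assuming a local jaseci-serv instance at localhost:8000 and a valid API token; the walker name and token used below are placeholders):

```python
import json
import urllib.request

BASE = "http://localhost:8000"  # assumption: a local jaseci-serv instance

def build_async_walker_request(name, ctx, token, base=BASE):
    """Build the POST /js/walker_run request; "is_async": true in the
    JSON body asks the server to queue the walker instead of running
    it inline."""
    body = json.dumps({"name": name, "ctx": ctx, "is_async": True}).encode()
    return urllib.request.Request(
        f"{base}/js/walker_run",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"token {token}"},
    )

def queue_walker(name, ctx, token):
    """Send the request; the response follows the shape shown above,
    e.g. {"is_queued": true, "result": "<task id>"}."""
    with urllib.request.urlopen(build_async_walker_request(name, ctx, token)) as resp:
        return json.load(resp)
```

The "result" field of the response is the task id you pass to the task-state endpoints below.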

GET TASK STATE

UNCONSUMED/RUNNING TASK

/js/walker_queue_check

RESPONSE

{
    "scheduled": {
        "celery@BCSPH-LPA-0327": []
    },
    "active": {
        "celery@BCSPH-LPA-0327": []
    },
    "reserved": {
        "celery@BCSPH-LPA-0327": []
    }
}

SPECIFIC TASK

/js/walker_queue_check?task_id={{task_id}}

  • status check only

/js/walker_queue_wait?task_id={{task_id}}

  • will block and wait for the result

RESPONSE

{
    "state": "SUCCESS",

    // will show if result is available
    "result": {
        "success": true,
        "report": [
            ...
        ]
    }
}
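The two task-state endpoints above differ only in name and blocking behavior; building their URLs can be sketched as (assuming a local jaseci-serv instance at localhost:8000):

```python
from urllib.parse import urlencode

BASE = "http://localhost:8000"  # assumption: a local jaseci-serv instance

def task_status_url(task_id, wait=False, base=BASE):
    """walker_queue_check reports the task state only;
    walker_queue_wait blocks until the result is available."""
    endpoint = "walker_queue_wait" if wait else "walker_queue_check"
    return f"{base}/js/{endpoint}?" + urlencode({"task_id": task_id})
```

A client would typically poll the check URL until "state" reaches "SUCCESS", or call the wait URL once and accept the blocking.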

HOW TO SETUP SCHEDULE

SCHEDULED_WALKER

  • Add a periodic task
  • Select jaseci.svc.task.common.ScheduledWalker
  • Set your schedule (interval, crontab, solar, clocked, and start/end dates are supported)
  • Set the argument using the structure below

ARGUMENT STRUCTURE

{
    // Required
    "name": "run",

    // Required
    "ctx": {},

    // Optional, but may not have a default
    // accepted values: urn | alias
    "nd": "active:graph",

    // Optional, but may not have a default
    // accepted values: urn | alias | global
    "snt": "active:sentinel",

    // Required
    // used also for getting aliases
    "mst": "d6851f2a-e4a1-4fca-b582-9db5e146af59"
}

OPTIONAL FEATURES

SCHEDULED_SEQUENCE

  • Add a periodic task
  • Select jaseci.svc.task.common.ScheduledSequence
  • Set your schedule (interval, crontab, solar, clocked, and start/end dates are supported)
  • Set the argument using the structure below

ARGUMENT STRUCTURE

{
        // optional if you just want to add default values
        "persistence": {
            "additional_field": "can_be_call_via_#.additional_field",
        },

        // not recommended, but possible
        "container": {
            // acts as the previous response
            "current": {},

            // auto-generated for the loop :: all of these are optional
            "parent_current": {},
            "index": {},
        },
        },

        "requests": [{
            "method": "POST",
            "api": "http://localhost:8000/user/token/",
            "body": {
                "email": "dummy@dummy.com",
                "password": "Bcstech123!"
            },

            // save to persistence
            // accessible via #.login / #.login_req
            "save_to": "login",
            "save_req_to": "login_req"
        },

        {
            "method": "POST",
            "api": "http://localhost:8000/js_admin/master_allusers",
            "body": {
                "limit": 0,
                "offset": 0
            },
            "header": {

                // $ == previous response
                "Authorization": "token {{$.token}}"

            },

            // by default an exception will break the loop
            // ignore_error: true continues the loop/sequence even if an exception occurs
            "ignore_error": true,

            // initializes a loop after the current block is triggered
            "__def_loop__": {

                // $ == current response
                // $.path.to.your.array !! required to be an array
                "by": "$.data",

                // filter structure
                // supports the `or` and `and` operators.
                // filter = [] defaults to the `and` operator
                "filter": [{
                    "or": [{
                        "by": "$.user",
                        "condition": {

                            // optional constraints
                            // can be removed if not needed
                            "eq": "dummy+testing3@dummy.com",
                            "ne": null, "gt": null, "gte": null,
                            "lt": null, "lte": null, "regex": null

                        }
                    }, {
                        "and": [{
                            "by": "$.user",
                            "condition": {
                                "eq": "dummy+testing2@dummy.com"
                            }
                        }, {
                            "by": "$.jid",
                            "condition": {
                                "eq": "urn:uuid:29cba0c9-e24e-4d15-a2b6-4354c59a4c86"
                            }
                        }]
                    }]
                }],

                // nested request used for the loop
                // same mechanism from requests above
                "requests": [{
                        "method": "POST",
                        "api": "http://localhost:8000/js/object_get",
                        "body": {
                            // on the first request in the loop, $ comes from the current item of the looper
                            "obj": "{{$.jid}}",
                            "depth": 0,
                            "detailed": true
                        },
                        "header": {
                            // # == persistence
                            "Authorization": "token {{#.login.token}}"
                        },
                        "ignore_error": true,

                        // ! == current index :: default 0
                        "save_to": "object_get_{{!}}",

                        // nested loop is supported
                        "__def_loop__": ...
                    },
                    {
                        "method": "POST",
                        "api": "http://localhost:8000/js/walker_run",
                        "body": {
                            "name": "get_botset",
                            "ctx": {},
                            "nd": "{{$.active_gph_id}}",
                            "snt": "active:sentinel"
                        },
                        "header": {
                            "Authorization": "token {{#.login.token}}"
                        },

                        // @ == the `data` item at the current index of the loop
                        "save_to": "response_{{@.jid}}",
                        "save_req_to": "req_{{@.jid}}"
                    }
                ]
            }
        },
        {
            "method": "GET",
            "api": "https://jsonplaceholder.typicode.com/todos/100",
            "save_to": "testing_nested",
            "save_req_to": "req_testing_nested"
        },
        {
            // Trigger will use jaseci interface
            "method": "JAC",
            "api": "master_allusers",
            "body": {
                "limit": 0,
                "offset": 0
            },
            // Optional: if master is present, the trigger is
            // considered authenticated/general; otherwise public
            "master": "{{#.master}}", // will use persistence.master
            "save_to": "master_allusers_by_walker"
        }
    ]
}
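To illustrate the filter semantics above, here is a sketch of the intended behavior (not the actual ScheduledSequence implementation): each condition block applies its non-null constraints to the value selected by `by`, and `or`/`and` groups combine nested filters.

```python
import re

def check_condition(value, condition):
    """All non-null constraints in a condition block must hold."""
    ops = {
        "eq": lambda v, c: v == c,
        "ne": lambda v, c: v != c,
        "gt": lambda v, c: v > c,
        "gte": lambda v, c: v >= c,
        "lt": lambda v, c: v < c,
        "lte": lambda v, c: v <= c,
        "regex": lambda v, c: re.search(c, str(v)) is not None,
    }
    return all(ops[op](value, expected)
               for op, expected in condition.items()
               if expected is not None)

def check_filters(item, filters):
    """A bare list of filters defaults to the `and` operator."""
    result = True
    for f in filters:
        if "or" in f:
            result = result and any(check_filters(item, [g]) for g in f["or"])
        elif "and" in f:
            result = result and all(check_filters(item, [g]) for g in f["and"])
        else:
            # only single-level "$.field" paths are handled in this sketch
            field = f["by"].split(".", 1)[1]
            result = result and check_condition(item.get(field), f["condition"])
    return result
```

For example, an item `{"user": "dummy+testing3@dummy.com"}` passes a filter whose `or` group contains an `eq` condition on `$.user` for that address.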

MAIL SERVICE CONFIG

Version 1

Structure

EMAIL_ACTIVATION_BODY= "Thank you for creating an account!\n\nActivation Code: {{code}}\nPlease click below to activate:\n{{link}}"
EMAIL_ACTIVATION_HTML_BODY= "Thank you for creating an account!<br><br>Activation Code: {{code}}<br>Please click below to activate:<br>{{link}}"
EMAIL_ACTIVATION_SUBJ= "Thank you for creating an account!\n\n"
EMAIL_DEFAULT_FROM= "{{some values}}"
EMAIL_HOST= "{{some values}}"
EMAIL_HOST_PASSWORD= "{{some values}}"
EMAIL_HOST_USER= "{{some values}}"
EMAIL_PORT= "{{some values}}"
EMAIL_RESETPASS_BODY= "Your Jaseci password reset token is: {{token}}"
EMAIL_RESETPASS_HTML_BODY= "Your Jaseci password reset token is: {{token}}"
EMAIL_RESETPASS_SUBJ= "Password Reset for Jaseci Account"
EMAIL_USE_TLS= True
  • one key per attribute

Version 2 (Current Implementation)

Structure

MAIL_CONFIG = {
    "enabled": True,
    "quiet": True,
    "version": 2,
    "tls": True,
    "host": "",
    "port": 587,
    "sender": "",
    "user": "",
    "pass": "",
    "backend": "smtp",
    "templates": {
        "activation_subj": "Please activate your account!",
        "activation_body": "Thank you for creating an account!\n\nActivation Code: {{code}}\nPlease click below to activate:\n{{link}}",
        "activation_html_body": "Thank you for creating an account!<br><br>Activation Code: {{code}}<br>Please click below to activate:<br>{{link}}",
        "resetpass_subj": "Password Reset for Jaseci Account",
        "resetpass_body": "Your Jaseci password reset token is: {{token}}",
        "resetpass_html_body": "Your Jaseci password reset token is: {{token}}",
    },
    # optional
    "migrate": False,
}
  • Backward compatible with version 1
  • version attribute will determine which version should be used
    • version: 2
      • this will use the actual MAIL_CONFIG values including the templates
    • version: 1
      • this will override MAIL_CONFIG's attributes (but not save them!!)
      • Values from Version 1 will be copied to their equivalent Version 2 attributes.
  • migrate
    • if this field is true, Version 1 values will be copied to their equivalent Version 2 attributes and saved to the DB

Contributing to the Jaseci Open Source Project


How to start contributing

Welcome to Jaseci! To start contributing, we would like you to start with issues.

Working on a new feature or fixing a bug you found

If you would like to add a new feature or fix a bug you have found, we prefer that you open a new issue in the Github repo before creating a pull request.

It’s important to note that when opening an issue, you should first do a quick search of existing issues to make sure your suggestion hasn’t already been added as an issue. If your issue doesn’t already exist, and you’re ready to create a new one, make sure to state what you would like to implement, improve or bugfix.

Work on an existing issue

If you want to contribute code but don't know what to work on, check out the existing list of issues.

Certain issues are marked with the "good first issue" label. These are issues that we think are great for first-time contributors to work on while they are still getting familiarized with the Jaseci codebase.

To work on an existing issue, go to the issue in Github, add a comment stating you would like to work on it and include any solutions you may already have in mind. Assign the issue to yourself.

The Jaseci team will then work with you on the issue and the downstream pull request to guide you through merging your code into the Jaseci codebase.


How to contribute code

Code contribution will be in the form of Pull Request (PR) on Github.

What is a Pull Request (PR)?

This is how the GitHub team defines a PR:

“Pull requests let you tell others about changes you’ve pushed to a branch in a repository on GitHub. Once a pull request is opened, you can discuss and review the potential changes with collaborators and add follow-up commits before your changes are merged into the base branch.”

This process is used by both Jaseci team members and Jaseci contributors to make changes and improvements.

How to open a PR and contribute code to Jaseci Open Source

1. Forking the Jaseci Repository

Head to the Jaseci repository and click ‘Fork’. Forking a repository creates a copy of the project which you can edit and use to propose changes to the original project.

Once you fork it, a copy of the Jaseci repository will appear inside your GitHub repository list, under your username.

2. Cloning the Forked Repository Locally

To make changes to your copy of the Jaseci repository, clone the repository on your local machine. To do that, run the following command in your terminal:

git clone https://github.com/your_github_username/jaseci.git

Note: this assumes you have git installed on your local machine. If not, check out the following guide to learn how to install it.

3. Update your Forked Repository

Before you make any changes to your cloned repository, make sure you have the latest version of the original Jaseci repository. To do that, run the following commands in your terminal:

cd jaseci
git remote add upstream https://github.com/Jaseci-Labs/jaseci.git
git pull upstream main

This will update the local copy of the Jaseci repository to the latest version.

4. Implement your code contribution on a feature branch

We recommend adding your code contribution to a new branch (different from main). Then you can continuously run the previous step to keep the main branch in your forked repo up to date with the original repo. This way you have the flexibility to easily inspect your changes and resolve any potential merge conflicts, all within the forked repo.

git checkout -b name-of-your-new-branch

5. Push changes to your forked repository on GitHub

Once you are happy with the changes you made in the local files, push them to the forked repository on GitHub. To do that, run the following commands:

git add .
git commit -m "fixed a bug"
git push origin name-of-your-new-branch

This will create a new branch on your forked Jaseci repository, and now you’re ready to create a Pull Request with your proposed changes!

6. Opening the Pull Request on Jaseci Open Source

Head to the forked repository and click the Compare & pull request button.

This will open a window where you can choose the repository and branch you would like to propose your changes to, as well as specific details of your contribution. In the top panel menu choose the following details:

  • Base repository: Jaseci-Labs/jaseci
  • Base branch: main
  • Head repository: your-github-username/jaseci
  • Head branch: name-of-your-new-branch

Next, make sure to update the pull request card with as many details about your contribution as possible. The Proposed changes section should contain the details of what has been fixed/implemented, and Status should reflect the status of your contribution. Any reasonable change (i.e., not a typo fix) should include a changelog entry, a bug fix should have a test, a new feature should have documentation, etc.

Once you are happy with everything, click the Create pull request button. This will create a Pull Request with your proposed changes.

If you are ready to get feedback on your contribution from the Jaseci team, leave a comment on the PR.

7. Merging your PR and the final steps of your contribution

A member from the Jaseci team will review your PR and might ask you to make additional changes and update. To update your PR, head back to the local copy of your repo, implement the changes requested and repeat the same steps above. Your PR will automatically be updated with your latest changes. Once you've implemented all of the suggested changes, tag the person who first reviewed your PR in a comment of the PR to ask them to review again.

Finally, if your contribution is accepted, one of the Jaseci team members will merge it into the codebase!

Things to know about creating a PR

Opening issues before PRs

As mentioned above, we recommend opening an issue before a pull request if there isn’t already an issue for the problem you’d like to solve. This helps facilitate discussion and track progress.

Draft/Work-in-progress(WIP) PRs

If you're ready to get some quick initial feedback from the Jaseci team, you can create a draft pull request. You can prefix the PR title with [WIP] to indicate this is still work in progress.

Unit Tests

To test out your new functionality you can run unit tests. Here are the steps to create and run your unit test.

  1. Write your test for the new functionality in jaseci_core/jaseci/actions/standard/tests. Below is an example of how to write a test for a standard action.
# import the base class that provides the test harness
from jaseci.utils.test_core import CoreTest


class FileLibTest(CoreTest):

    fixture_src = __file__

    # function that tests the new feature
    def test_json_dump(self):
        # make the API call to register the Jac code that uses the new functionality
        ret = self.call(
            self.mast,
            ["sentinel_register", {"code": self.load_jac("file_stuff.jac")}],  # file_stuff.jac is the Jac code being run
        )
        # run the walker that was loaded from the Jac code
        ret = self.call(self.mast, ["walker_run", {"name": "pack_it"}])
        self.assertEqual(ret["report"][0], {"hello": 5})

  2. In the folder jaseci_core/jaseci/actions/standard/tests/fixtures, create a Jac file. This Jac file contains the walker that will run to exercise the new functionality you implemented. It can follow the pattern below.
walker test_new_functionality {
    report new.functionality();
}
  3. Before we can run our test we have to install Redis. If you don't have Redis installed, follow the instructions here.

Once installed run :

> redis-server

  4. Now that your test is written and the Redis server is running, you can run it using one of the following commands:
> cd jaseci_core
> python3 -m unittest discover jaseci/ --failfast -p "test_$1*.py" && flake8 --exclude=settings.py,*migrations*,jac_parse,ci_app,kube.py --max-line-length=88 --extend-ignore=E203
> pytest [python_file]
> python3 -m unittest [python_file]

Now all the tests will run, including the one you have written.

Code style & Linting

To standardize coding style, Jaseci code style is enforced by the flake8 linter and a set of linting rules. Please run the linting command to check your code style before creating a PR.

flake8 --exclude=settings.py,*migrations*,jac_parse --max-line-length=88 --extend-ignore=E203

How to Be a Jaseci-tributor

There are two classes of contributors to Jaseci, casual contributors and committed contributors. This section defines each group and outlines the general guidelines for both.

The Casual Jaseci-tributor

A casual contributor is one that is passionate about or otherwise interested in Jaseci or Jac and would like to improve the Jaseci ecosystem in some way. They make their contribution of time, energy, and most importantly, their brain juice for the betterment of Jaseci hackers everywhere. We welcome you! And thanks in advance!

General Guidelines

  • Create an Issue, spark a discussion, or simply lob a PR our way with a description/rationale, and we'll merge and/or improve upon then merge anything that makes sense for a better Jaseci.
  • We like to have a test and some documentation (even if brief) for any changes for which they may be relevant, though this is certainly not a deal breaker. If your contribution is important, another casual or committed contributor can help.
  • That's pretty much it; we like to make friends in the Jaseci community, so we hope to hear from you no matter what the idea/contribution is. Everything is more fun with friends!

The Committed Jaseci-tributor

A committed contributor is one that is on the TOTC (team of the committed). A TOTC member is one that in some way sustains themself by being a contributor to Jaseci. This comes about in a number of ways, whether it be funded from donations, funding from government grants, being dedicated to the TOTC by some employer, or a volunteer that would like to dedicate themselves to TOTC roadmap items (i.e., be relied upon by other TOTC contributors).

Note

The TOTC is an open group. We have and accept members from all across the world, from many walks of life, who can take the TOTC oath, which we fit into one sentence. Oath: I will strive my very best to be a good human, be reliable, share my ideas without reservation, be patient when they aren't accepted right away, follow the guidelines, suggest improvements instead of getting resentful, and take breaks when I'm not having fun.

Guidelines

  1. Come to the meetings, or share a note beforehand in the relevant Slack channel, or we will worry incessantly that something is terribly wrong
  2. For a given thing (e.g., task, todo, doc, note, comment, delegation, request for help, etc), it exists iff it is present in our system of Objective and Centralized Truth (OCT for short, aka Click-Up atm, links to other systems like github, and google docs are ok).
  3. For the days committed to work, leave a little comment on how things went iff nothing gets checked off that day.
  4. And most importantly!! Follow Below!
from work import fun, you, open_prs

if you.having_fun() in [False, None] and you.engaged() in [False, None]:
    you.take_break() if you.need_break() else you.work_on(
        open_prs.get(fun_factor=fun.HIGH)
    ) if open_prs.find(fun_factor=fun.HIGH) is not None else open_prs.create(
        fun.ANYTHING, start_hack=True
    )
else:
    you.hack_on(leet_code=True)

Validate your changes through test

Jaseci has a set of automated tests, and PRs are required to pass these tests before they can be merged into the main branch. We therefore recommend validating your changes via these tests before creating a PR. Check out scripts/script_sync_code_kube_test to see how to run the tests.

How to Update the Official Documentation

The source of the Jaseci Official Documentation comes from the collection of README.md files placed in specific folders throughout the codebase. Developers and maintainers must ensure that their contributions are properly documented according to the procedures outlined in this section.

Adding a new module or library

Ensure that you follow the prevailing directory structure convention when adding a new module or library to Jaseci.

  • All source files belonging to your module or library must be contained within a folder bearing the non-whitespace, lowercase name of your module or library.
  • You must author a README.md document to describe the purpose of your module or library, any features, configurations or uses, as well as code excerpts on how to implement your module's functionality.
  • The README.md must be included in the root folder of the module or library.
  • Ensure you update the related README.md in the subsection (if applicable) which contains your module, e.g. jaseci_ai_kit/README.md as well as the main README.md in the root directory of the codebase to include references to your new module or library.

Adding a new code lab example

All codelabs are organized within the /support/codelabs folder. You may add new codelabs to this folder by following the prescribed guidelines below:

  • Ensure that your codelab is organized within its own named folder. Ensure you use all lowercase, non-whitespace names.
  • Ensure your new codelab has its own README.md file placed in its root folder. This should be the main page of the documented codelab.
  • If any images are used, ensure they are stored in the [your_code_lab]/assets folder and referenced using relative paths.
  • Once the codelab is added, ensure that you update the main README.md in the root directory of the codebase to include references to your new codelab under the section "Samples and Tutorials".

Adding a new guide

All informational content that does not directly refer to modules/libraries or codelabs is typically stored under the /support/guide folder.

  • Ensure that your new guide is contained within its own named folder. Ensure you use all lowercase, non-whitespace names.
  • If any images are used, ensure they are stored in the [your_guide]/assets folder and referenced using relative paths.
  • The markdown pages of your guide must be named based on the title of the rendered page in lowercase, non-whitespace characters, e.g. this_is_my_guide.md.
  • Once the guide is added, ensure that you update the main README.md in the root directory of the codebase to include references to your new guide under the applicable section.

Contributors

Here is a list of the contributors who have helped create and improve Jaseci. Big shout-out to them!

If you feel you're missing from this list, feel free to add yourself in a PR.

Jaseci Change / Release Notes

Version 1.3.5

Updates

  • New Major Feature: jsctl can now run scripts of commands from file
  • New Feature: Introduction of std.round for rounding floating point values, and std.uniform for random float values
  • Improvement: jac test and sentinel test apis have new parameter profiling to enable internal stack profile outputs
  • Improvement: jac test and sentinel test apis have new parameter single to specify a single named test to run
  • New Feature: Tests now can be named (see docs)
  • New Feature: Added ability to flush report with standard library through std.clear_report()
  • Major Improvement: Multipass compilation framework implemented, new optimization pass introduced, code size down by more than 2x
  • New Lang Feature: introduced type structs with type::custom_data style notations
  • New Feature: Introduced graph node view API
  • Major Improvement: Walkers are now proper architypes in stack, all code in architypes
  • Improvement: Attributes like anchored and private are now fused with anchored objects.
  • Improvement: Incompatible/outdated IR now rejected by Jaseci stack

Notes

  • Imports updated: imports of style import {walker*} with ... are now import {walker::*} with ...
  • Walker register and set deprecated. Now architype register and set should be used.
  • Deprecated spawning graphs using special dot syntax (it overcomplicated the language grammar)
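The import migration noted above can be sketched as follows (a minimal illustration; the file path is hypothetical):

```
// old style, rejected as of 1.3.5
import {walker*} with "./lib.jac";

// new style, 1.3.5+
import {walker::*} with "./lib.jac";
```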

Version 1.3.4

Updates

  • Improvement: JSCTL now indicates whether you are logged in
  • New Feature: Email action set
  • Major Feature: Async walkers
  • Deprecation: Stripe API soft removed.
  • Improvement: Improved the deref operator * to be more nimble
  • New Feature: Can now pluck lists of values from collections of nodes and edges.
  • Major Language Feature: Introducing the yield feature. See bible for details
  • Improvement/Bug: here behavior is now specified for ability calls and inheritance in an intuitive way
  • Major Feature: Can now specify various forms of breadth first and depth first search on take commands (e.g., take:bfs, take:dfs, and take:b and take:d for short)
  • Improvement: Added deep copy for lists and dictionaries
  • Improvement: The connect operator between 2 nodes now returns the left-hand side. (e.g., n1 --> n2 --> n3 will create an intuitive chain of connections not n1 --> n3 <-- n2)
  • Bug Fix: Root nodes now return valid .type
  • Bug Fix: With exit within walker now executes after exit events in nodes
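The new take modifiers above might be used as in this sketch (walker name and edge pattern are illustrative, not from the source):

```
walker explore {
    // breadth-first traversal of outbound edges (take:b for short)
    take:bfs -->;
}
```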

Notes

  • Behavior change for jac programs utilizing chained connection operators. Connection orders are now intuitive (e.g., n1 --> n2 --> n3 will create an intuitive chain of connections not n1 --> n3 <-- n2)
  • API interface update: sentinel_register auto_run_ctx replaces ctx to be more specific; auto_gen_graph is now auto_create_graph for the same reason
  • API interface update: master_create API return format updated

Version 1.3.3

Updates

  • Improvement: Added reversed to the set of list built-in functions
  • Bug Fix: Mem leak on graph node setting fixed
  • Major Feature: Jsctl graph walking tooling
  • Improvement: Optimized the self generation of jaseci internal APIs
  • New Feature: Added jaseci standard library as patch through to all jaseci core APIs
  • New Features: Report payloads can be customized with report:custom
  • Improvement: The disengage statement can now be combined with a report action
  • Improvement: import now works recursively through chain of files
  • Improvement: JSCTL shows token on login
  • Major Feature: JSCTL has persistent log in sessions, and can logout
  • Improvement: Precedence of the * and & operators improved.
  • Bug Fix: Indirect node field updates tag elements to be written to db
  • Major Feature: Multiple inheritance support on nodes and edges!
  • Improvement: Fixed and much improved actions load local functionality
  • Bug Fix: Globals imports of imports working
  • Improvement: Sentinel registering improved to include an IR mode
  • Improvement: edge semantics improved
  • Major bug fix: Re-registering new code was breaking architype abilities
  • Improvement: Tests now only show stdout and stderr on a test-by-test basis in detailed mode (Much cleaner)
  • Improvement: JSKit package architecture established, normalized, and standardized
  • New Lang Feature: Added list built in call of .l::max, .l::min, .l::idx_of_max, and .l::idx_of_min
  • Improvement: API added so super masters can become any master id; jsctl can also issue master allusers
  • New Lang Feature: Can now have can statements in spawn graphs after has anchor rootname
  • Improvement: actions load module added as a capability; module strings are accepted
  • New Feature: Added global root finder net.root to std lib and net.min to go with existing net.max
  • New Feature: New global element type and global keyword

Notes

  • Special report actions now use : instead of . e.g., report.status = 200 is now report:status = 200
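Together with the 1.3.3 report:custom feature above, the colon notation looks roughly like this sketch (the payload contents are illustrative):

```
walker respond {
    // set the HTTP status code and replace the report payload
    report:status = 200;
    report:custom = {"ok": true};
}
```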

Version 1.3.2

Updates

  • New Feature: Introduction of new standard library option for loading actions in Jac with std.actload_local and std.actload_remote
  • Improvement: Disallowing spawning of unlinked edges, i.e., spawn --> node::generic not allowed without here
  • New Feature: Random library adds random text generation lorem style with rand.word(), rand.sentence(), rand.paragraph(), and rand.text().
  • New Feature: Standard input std.input(prompt) :-p
  • Improvement: Status codes auto plucked from return payload in jsserv
  • New Feature: Can now control status codes with report.status = 201 style statements
  • Improvement: No longer saves action data into graph and keeps it in architypes
  • New Feature: Walkers can be called directly using wapi/{walkername} api
  • New Feature: New master_allusers API available for super master users
  • Improvement: Superusers now have access to all data
  • Improvement: Jaseci's admin api route changed to /js_admin/... vs /admin/ to not conflict with Django's internals
  • Update: Django 3 upgraded to latest as well as all other dependencies.
  • Fix: Believe it or not, I never fully implemented continue. LIKE REALLY??? Anyway, fixed now. FTLOG!
  • New Feature: Added jac dot cli command much like jac run but prints dot graph
  • New Feature: Created shorthand for string, list, and dict functions i.e., .s::, .d::, and .l:: respectively
  • New Feature: Added suite of dict manipulation functions
  • New Feature: Added suite of list manipulation functions
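A few of the 1.3.2 additions in one illustrative sketch (walker name and prompt string are hypothetical):

```
walker demo {
    // lorem-style random text generators
    std.out(rand.word());
    std.out(rand.sentence());
    // standard input with a prompt
    name = std.input("name: ");
    // control the response status code
    report.status = 201;
}
```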

Notes

  • All api calls to Jaseci admin apis using the /admin/ route must be updated to /js_admin/

Version 1.3.1

Updates

  • New Feature: File I/O library with JSON support
  • New Lang Feature: .str::load_json added to string library
  • Fix: Error output when key not in object or dict
  • New Lang Feature: Can now spawn root nodes in addition to generic nodes
  • Improvement: Line numbers provided for all "Internal Errors"
  • Fix: Dot strings now handled as expected (stripping quotes etc)
  • Improvement: General improvements to error reporting
  • Improvement: Changed meta requirement for actions to be optional at hook points
  • Improvement: Now you can arbitrarily chain array indexes into function calls as per std.get_report()[0].
  • New Feature: std.get_report gives what is to be reported so far
  • Improvement: General polish items of JSCTL UI
  • Improvement: Raised the default core logging reporting level to warning
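The report-chaining improvement above might look like this sketch (walker name and reported value are illustrative):

```
walker peek {
    report {"a": 1};
    // array indexing chained directly onto a function call
    std.out(std.get_report()[0]);
}
```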

Version 1.3

Updates

  • Improvement: JSCTL now takes args without flags in sensible places for quality of life.
  • Improvement: Better Error reporting all around
  • New Feature: APIs for manipulating actions
  • New Feature: Hotloading jaseci action modules
  • Update: New action creation methodology and architecture
  • New Feature: Decorator interface for creating jaseci action modules
  • New Feature: New profiling flag added to run walker api for performance profiling
  • New Feature: Direct jac file building, test, and run from in JSCTL
  • New Language Feature: Tests and testing features as first order language semantics
  • New Lang Feature: Asserts!
  • Fix: Simplified and optimized global abilities
  • New Support Feature: Started first beta of the VS Code extension for JAC
  • New Lang Feature: Multifile codebase support and import keyword and semantic added
  • New Lang Feature: Try-else blocks introduced for exception handling
  • New Lang Feature: Added new & reference and * dereference semantic for getting pseudo-pointers to nodes, edges, etc
  • New Lang Feature: Massively expanded functionality with destroy and list slice management
  • New Lang Feature: can now explicitly reference and dereference graph elements (nodes, edges, etc)
  • New Lang Feature: Field filtering for dictionaries, particularly useful for context, info, details
  • New Lang Feature: Type checking primitives, and type casting primitives
  • New Lang Feature: String library finally present
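A minimal sketch of the new assert semantics (the walker name is hypothetical):

```
walker sanity {
    // asserts are first-order language semantics as of 1.3
    assert(1 + 1 == 2);
}
```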

Notes

  • Various flags are now args for jsctl, i.e., walker run -name init is now walker run init, as name is now the standard arg. To specify a node, the flag would be used as per walker run init -nd {uuid}
  • Reports back from a walker are now a dictionary of form {'report': list(report)} instead of the current list(report)
  • std.sort_by_col tweaked to make the last parameter a boolean for reverse (instead of a string)
  • Format of walker get -mode key api changed from {key:namespace} to {namespace:key}
  • test is now a keyword with added test capabilities in jaseci
  • type, int, float, str, list, dict, and bool are now keywords; if you used these as variable names in legacy code, you must make updates.
  • The destroy built-in is totally revised; lst.destroy(idx) on lists should be changed to destroy lst[idx].
  • The get_uuid standard library function is deprecated since we have string manipulation
  • Internal representation of an element is now in jac:uuid: format. This should not be visible to the coder; & references still produce urn:uuid: as strings. To dereference, use the new * dereference operator.
  • Standard output and logging now print proper values (e.g. json values for null, true, and false)

Version 1.2.2

Updates

  • New Language Feature: can now perform assignments arbitrarily (not just to named variables)
  • New Language Feature: can spawn assign on creation of nodes and edges
  • New Language Feature: can filter references to nodes and edges
  • Added new built-ins for nodes and edges (context, info, and details)
  • Fixed dot output
  • Added reset command to jsctl to clear complete state
  • Various language grammar tweaks
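A sketch of spawn-assign and the new built-ins from this release (the node type and field are hypothetical, and the exact syntax may differ from this version):

```
walker build {
    // spawn-assign on creation of a node (1.2.2)
    spawn here --> node::person(name = "Ada");
    // context is one of the new node/edge built-ins
    report here.context;
}
```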

Version 1.2.1

Updates

  • Both jaseci and jaseci_serv are architected to be standalone packages
  • Stripe API integrated
  • Emails can be templated with HTML content
  • Token expiry time can be set as config through live api
  • Added auto sync to global sentinel for spawned walkers
  • FIX: Global sentinels cascade to all users on change
  • FIX: Multi pod concurrency issue corrected

Version 1.2.0

Updates

  • New hierarchical user creation and management through core Jaseci
  • New version labels for Jac programs
  • New custom action for nodes and edges
  • New Jaseci server support for new API and Jaseci architecture
  • New namespaces for public walker permissions management with key access
  • New object sharing across users and access control APIs
  • New Jaseci object permissions architecture
  • New Jac library for outbound requests
  • New Globals Jac standard library and API interfaces
  • New support for server-side Jac deployments and relevant APIs
  • New Jac language updates
  • New access language features for edge manipulation and traversal
  • New code IR format and handling across Architypes and Walkers
  • New dot integration redesign
  • New added editor to JSCTL
  • New complete API redesign and deprecation of legacy APIs
  • New standard Jaseci Bible introduced (unfinished)
  • New redesigned graph nodes and edges to support multi-graph semantics.

jaseci

Concepts

Action

  • Actions are computational processing elements. [0]
  • Actions take in a list of context items as input and output one or many context items. [0]
  • All possible actions are provided by Jaseci. [0]
  • Actions attached to nodes, edges, etc. search the walker, then the element itself, then higher dimensional nodes for input contexts, in that order.

Context

  • Context is a list of one or more key-value pairs. [1]

Nodes and HD Graph Domains and Planes

  • Nodes are composed of context and executable actions. [1]
  • Nodes accumulate context via a push function, context can be read as well. [1]
  • Nodes execute a set of actions upon: [1]
    1. Entry [1]
    2. Exit [1]
  • Activity actions in a node denote actions that a walker can call at any time
  • Nodes must be aware of the set of HDGDs of which they are members. [1]
  • Nodes must trigger processing of the HDGDs of which they are members upon events. [0]
    • Separate actions at the HDGDs above can occur on entry and exit events [0]
  • At the first dimension, these are essentially subsets of nodes that share common traits. Traits can include context, both static and dynamic (built over time), or actions to be executed upon:
    1. entry from some edge into the domain from another domain at the given dimensionality,
    2. edge traversals within a domain, or
    3. exits out of the domain to another domain at the domain's given dimensionality.
  • At higher dimensions, HDGDs are composed of HDGDs of the next dimension down and function similarly to the base case, but instead of nodes they comprise HDGDs. (HDGD_2 is a graph comprised of HDGD_1 graphs)
  • Transitions at a given dimension can only occur between domains of that dimension or to non-domain nodes.
  • HDGDs can overlap at a given dimension.
  • Each domain at any dimensionality has a root node.
  • Transitions at the node level trigger computation in parallel at all dimensions of HDGDs for which that node is a member of.
  • Optional:
    • Upon creation of an edge between nodes that is the first to span HDGD boundaries at any dimensionality, an edge is created implicitly between the HDGDs at that dimensionality (these edges function exactly like edges between nodes). At this point the edges can carry context and actions.
    • Upon deletion of an edge between nodes that is the last to span HDGD boundaries at any dimensionality, the edge that was created connecting those HDGDs is deleted.
    • Nodes have an anchor context value that is used to represent the 'value' of the node, for example when deciding which outbound node a walker should take based on some evaluation. Each node can select only one element from context to be its anchor

Edges

  • Edges are composed of context and executable actions [1]
  • Edges accumulate context via a push function, context can be read as well [1]
  • Edges execute a set of actions when traversed. [0]
  • Edges must be aware of the set of HDGDs of which they are members. [0]
  • Edges crossing HDGD boundaries must trigger higher order HDGD plane edges. [0]

Walkers

  • Walkers walk nodes triggering execution at the node level and HDGD levels.
  • Walkers can pick up context as they traverse.
  • Walkers also decide which node to travel to next and record the path of travel (trail) within their own context.
  • Walkers can be spawned at any node.
  • Walkers can spawn other walkers.
  • Computation happens at the Node and HDGD levels. However walkers make decisions on where to walk next and which context to pick up.
  • Walker context must be preloaded by sentinel or applications as a form of input and configuration for the walk
  • Walkers can carry with them their own actions and contexts that are executed whenever specified in the walker's own code

Sentinels

  • Sentinels watch walkers, aggregate outcomes of walking, and enact policies.
  • Each walker must have a sentinel and the division of labor is
    1. Walkers are concerned primarily for walking the graph.
    2. Sentinels will take the results of one or more walkers to perform some higher order objective (i.e., resolution)
  • Keeps context that can be used to save derivative application behavior.
  • Sentinels harbor walkers, architypes, and jac programs that encode walkers and architypes
  • Sentinels can 'register' Jac programs, which is a sort of compilation that will generate architypes and walkers
  • If the program has syntax errors, registration fails; once registered, walkers fail at runtime

Architypes

  • Registers templatized versions of instances of any Jaseci abstractions or collections of instances (e.g., subgraphs, etc)

Graph

  • A graph is a root node (inherited from node) and manages sentinels and higher dimensional nodes

Masters

  • This is the center of management of a jaseci instance that orchestrates the manipulation of graphs

Jac Language

  • Jac is a language for expressing computations using Jaseci concepts.
  • Jac is a means to express the utilization of Jaseci as a machine.
  • Walkers can spawn other walkers and use the with keyword to specify input context
  • Jac encodes the description of architype nodes and edges (with bound contexts and actions)
  • Jac describes how walker and sentinel execution should be performed

Jac Grammar

program     : element*

element     : architype
            : walker

architype   : KW:NODE (COLON INT)? ID LBRACE attr_stmts RBRACE
            : KW:EDGE ID LBRACE attr_stmts RBRACE

walker      : KW:WALKER ID code_block

statements  : statement*

statement   : architype
            : walker
            : code_block
            : node_code
            : expression SEMI
            : if_stmt
            : for_stmt
            : while_stmt

code_block  : LBRACE statements RBRACE
            : COLON statement SEMI

node_code   : dotted_name code_block

expression  : dotted_name EQ expression
            : compare (KW:AND|KW:OR compare)*

if_stmt     : KW:IF expr code_block (elif_stmt)* (else_stmt)?

for_stmt    : KW:FOR expression KW:TO expression KW:BY expression code_block

while_stmt  : KW:WHILE expression code_block

attr_stmts  : attr_stmt*

dotted_name : ID (DOT ID)*

compare     : NOT compare
            : arithmetic ((EE|LT|GT|LTE|GTE) arithmetic)*

attr_stmt   : KW:HAS ID (, ID)* SEMI
            : KW:CAN dotted_name (, dotted_name)* SEMI
            : arch_set SEMI

arch_set    : KW:NAME EQ expression
            : KW:KIND EQ expression

arithmetic  : term ((PLUS|MINUS) term)*

term        : factor ((MUL|DIV) factor)*

factor      : (PLUS|MINUS) factor
            : power

power       : func_call (POW factor)*

func_call   : atom (LPAREN (expression (COMMA expression)*)? RPAREN)?

atom        : INT|FLOAT|STRING
            : dotted_name (LSQUARE expression RSQUARE)*
            : LPAREN expr RPAREN
            : list
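A tiny program that conforms to this draft grammar (names are illustrative, not from the source):

```
node person {
    has name, age;
    can infer.age_group;
}

walker visit {
    x = 1 + 2 * 3;
}
```

The node uses the has and can forms of attr_stmt, and the walker body is a single expression statement as the statements rule allows.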

Semantic Notes

  • The language essentially expresses
    1. the architypes of the graph to be constructed
    2. the walkers of the graph and how they behave on the graph
  • Execution means registering these definitions to the machine and execution occurs via api calls to launch walkers
  • Initially, reports will come back via API once a walker has completed its walk; these reports will be a json payload of objects
  • When architypes or walkers are defined within walkers/architypes, the internal name inherits the outer name in the form a.b.c on the creation of walker c nested in b nested in a
  • Cannot make arbitrary assignments to dotted names
  • with entry, exit, and activity are ignored if there is a can statement in a walker or edge
  • A graph's root node holds global context and actions that walkers can access by referring to root as a built-in node variable. Scopes inherit actions as part of the live variable call chain

Notes

  • Walkers can encode a language like basic WalkLang
  • Each jaseci object in the model is the same with the object instance pickled inside
  • Can create a separate model at the code level for each of the jaseci object types
  • Create a separate django app per api with naming jsci_*_api, one per object
  • Tables of all jsci objects are kept in mem as part of the user's object and persisted in models
  • Create a set of loader and saver functions in a separate file in element; first try to load from mem, then from DB if needed
  • Serializers can live in this file if not in models file
  • Nodes need an anchor value for take / takegen type statements in language, take will evaluate expression then travel to connected node with anchor value that matches output
  • All reports are concatenated and reported at the end of the walk
  • I need to have context (items) reference owners so I can delete from all owners upon destruction of an item

Important todos

  • Allow assignments to array elements

  • Add dereferencer to get address of node/edges (get uuid with & instead of .id)

  • On JAC compile create /jac/run API for walkers so you can use urls to call walkers

  • CONTEXTS ARE DICTS (not elements)

  • NODES REPORT OUTBOUND AND INBOUND AS WELL AS EDGE IDS

  • Auto gen get. and set. for all architypes

  • Hack LL Jac code to report what I need in client

  • PLAN - In LL check outbound is workette

  • Make owner_id in item type an id_list

  • Have ID list class auto delete items from store when owner id's go to zero

  • ID list class should add and remove owner ids as it is used to record membership of items to other jaseci objects

  • If we enforce the rule that all object membership is done through id_lists then this will stay coherent/consistent

  • have owner id's keep track of list name it belongs to so the standard mem_hook destroy function can delete from list upon destruction

  • Revise the way walkers and sentinels are saved to persistent store

  • Add type check in action protocol for receiving parameters (parameter format check)

  • Nix id_list and just do translations of object to ids in the json encoding of anything of type element (isinstance(obj, element) added to UUIDEncoder)

  • Make json blob auto translate ids even when in variable that is not id_list

  • Create separate logger out for stand output from jac library

  • Create a wrapper for each type in jac and infrastructure for checks on those types and operations on those types; already done-ish with "jac_set" but needs to apply to lists etc

  • If spawns support both jac_sets and nodes at the moment then we may need infrastructure features around that. Features may need a more scalable code architecture

  • Need to support here.id root.id my.id etc

Trickier bugs

  • memory db stores: if the db is changed then memory becomes inconsistent, problematic since memory is assumed to be up to date and write-through: jsci_engine_test.py
  • Weird issue where on a load from db, a new id is randomly generated in the object, then the new object is stored back to db: orm_hook.py

Example Use Cases

LifeLogify

  • Each user has one life node in their model that's a pointer in a jaseci node for the entire lifelogify account
  • Walkers start from this node
  • Graph Structure
    1. Each life is a basic HDGD
      1. Children of root are years
      2. Children of years are months
      3. Children of months are weeks
      4. Children of weeks are days
      5. Children of days are workettes
      6. Workettes link back to prior carried over workettes
    2. HDGD maintains last day touched as context
    3. Nodes are only created when days are touched
  • Walkers
    1. Walkers used to find days, and load workettes
    2. Walkers used for carry forward functionality

Gotta put semis in

Toy script

node life {
  can infer.year_from_date
}

node year {
  has year
  can infer.month_from_date
}

node month {
  has month
  can infer.week_from_date
}

node week {
  has week
  can infer.day_from_date
}

node day {
  has day
}

node workette {
  has name
  has date
  has owner
  has status
  has snooze_till
  has note
  has is_MIT
  has is_ritual
}

walker get_day {
  has date
  life: take infer.year_from_date(date)
  year: take infer.month_from_date(date)
  month: take infer.week_from_date(date)
  week: take infer.day_from_date(date)
  day: report day.all_outbound
}

walker get_gen_day {
  has date
  life: takegen infer.year_from_date(date)
  year: takegen infer.month_from_date(date)
  month: takegen infer.week_from_date(date)
  week: takegen infer.day_from_date(date)
  day: report day.outbound_nodes
}

walker get_sub_workettes {
  workette: report workette.outbound_nodes
}

walker get_latest_day {
  life: take year.max_outbound
  year: take month.max_outbound
  month: take week.max_outbound
  week: report day.max_outbound
}

walker carry_forward {
  has my_root
  day {
    spawn node.day new_day
    my_root = new_day
    take day.outbound_nodes
  }
  workette {
    if(workette.status == 'done' or workette.status == 'eliminated') { continue }
    spawn node.workette new_workette
    new_workette.connectfrom workette
    new_workette.connectfrom my.spawn.node.last(-1)
    takeall workette.outbound_nodes with {
      is not new_workette
      is untraveled
    }
  }
  report spawn.all
  report my_root
}