OpenAI (openai)

The openai module provides a set of actions for interacting with the OpenAI API. You can use these actions to generate text, transcribe audio, translate audio, generate images, and more.

1. Setup

You must set an OpenAI API key before using the module. You can obtain a key from your OpenAI account. Set it either with the setup action or through the OPENAI_API_KEY environment variable.

Setting the Environment Variable

export OPENAI_API_KEY=<your-api-key>

Then load the module as usual:

jaseci> actions load module jac_misc.openai

Using the setup Action

If you have already loaded the module, you can use the setup action to set the API key.

jaseci> actions call openai.setup -ctx '{"api_key":"<your-api-key>"}'

Otherwise, you can load the module and set the API key in a single step:

jaseci> actions load module jac_misc.openai -ctx '{"api_key":"<your-api-key>"}'
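Writing the -ctx JSON by hand is easy to get wrong once quoting is involved. The following is a minimal Python sketch, not part of the module, that builds the payload and the corresponding openai.setup command line. The helper names build_ctx and build_setup_command are hypothetical, and the quoting assumes the command is passed through a POSIX shell.

```python
import json
import shlex


def build_ctx(api_key: str) -> str:
    """Return the JSON payload expected by the -ctx flag (hypothetical helper)."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    # json.dumps takes care of escaping any special characters in the key.
    return json.dumps({"api_key": api_key})


def build_setup_command(api_key: str) -> str:
    """Return an 'actions call openai.setup' command line (hypothetical helper)."""
    # shlex.quote wraps the JSON so a POSIX shell passes it through unchanged.
    return f"actions call openai.setup -ctx {shlex.quote(build_ctx(api_key))}"


# Placeholder key for illustration only; substitute your own.
print(build_setup_command("<your-api-key>"))
```

In practice the key would usually be read from OPENAI_API_KEY rather than typed into a script.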

2. Text Generation

2.1. Generate Text (completion)

You can use the completion action to generate text completions based on a prompt using OpenAI's GPT-3 models.

Parameters:

  • prompt: str or list of str, optional (default="") The prompt(s) to generate text completions for.
  • model: str, optional (default="text-davinci-003") The name of the GPT-3 model to use for generating completions.
  • suffix: str, optional (default=None) A suffix to append to the completion(s).
  • max_tokens: int, optional (default=16) The maximum number of tokens to generate for each completion.
  • temperature: float, optional (default=1.0) Controls the randomness of the generated completions. Higher values will result in more varied completions.
  • top_p: float, optional (default=1.0) The nucleus-sampling threshold: only the most likely tokens whose cumulative probability reaches top_p are considered. Lower values will result in more conservative completions.
  • n: int, optional (default=1) The number of completions to generate.
  • logprobs: int or None, optional (default=None) Controls whether to include log probabilities with the generated completions. If set to an integer value, the top n log probabilities for each token will be returned.
  • echo: bool, optional (default=False) Controls whether the prompt should be included in the generated completions.
  • stop: str, list of str, or None, optional (default=None) The sequence at which the model should stop generating text. If a list is provided, the model will stop at any of the specified sequences.
  • presence_penalty: float, optional (default=0.0) Controls the model's tendency to generate new words or phrases. Higher values will result in more novel completions.
  • frequency_penalty: float, optional (default=0.0) Controls the model's tendency to repeat words or phrases. Higher values will result in less repetitive completions.
  • best_of: int, optional (default=1) The number of candidate completions to generate server-side. The n best candidates, ranked by log probability per token, are returned, so best_of must be greater than or equal to n.
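To make the temperature and top_p descriptions above more concrete, here is a small Python sketch of the general sampling technique. It is an illustration of temperature scaling and nucleus filtering in general, not OpenAI's or this module's actual implementation, and the logits are made-up numbers.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing them by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=1.0):
    """Keep the most likely tokens until their mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
sharp = softmax_with_temperature(logits, temperature=0.5)
flat = softmax_with_temperature(logits, temperature=2.0)
# A lower temperature concentrates probability on the top token.
print(max(sharp) > max(flat))
# A low top_p discards the unlikely tail before sampling.
print(sorted(top_p_filter(softmax_with_temperature(logits), top_p=0.5)))
```

Lower temperature and lower top_p both narrow the set of likely outputs, which is why changing one of them at a time is usually enough.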

Returns:

completions: list of str A list of completions generated by the GPT-3 model based on the provided prompt(s).

Example:

walker complete_sentence {
    can openai.completion;
    report openai.completion(prompt="Once upon a time", n=5); // return 5 completions
}

2.2. Chat (chat)

You can use the chat action to chat with an AI model. By default, it generates responses to a list of messages using OpenAI's GPT-3.5 model (gpt-3.5-turbo).

Parameters:

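  • messages : list of dict The conversation to prompt the GPT-3.5 model with. Each message is a dictionary with a role and a content field, for example {"role": "user", "content": "Hello!"}.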
  • model : str, optional (default='gpt-3.5-turbo') The name of the GPT-3.5 model to use for generating responses.
  • temperature : float, optional (default=1.0) Controls the randomness of the generated responses. Higher values will result in more varied responses.
  • top_p : float, optional (default=1.0) The nucleus-sampling threshold: only the most likely tokens whose cumulative probability reaches top_p are considered. Lower values will result in more conservative responses.
  • n : int, optional (default=1) The number of responses to generate for the given messages.
  • stop : str, list of str, or None, optional (default=None) The sequence at which the model should stop generating text. If a list is provided, the model will stop at any of the specified sequences.
  • presence_penalty : float, optional (default=0.0) Controls the model's tendency to generate new words or phrases. Higher values will result in more novel responses.
  • frequency_penalty : float, optional (default=0.0) Controls the model's tendency to repeat words or phrases. Higher values will result in less repetitive responses.

Returns:

  • responses : list of str A list of responses generated by the GPT-3.5 model based on the provided messages.

Example:

walker chat {
    can openai.chat;
    report openai.chat(messages=[{"role": "user", "content": "Hello!"}]);
}
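For multi-turn conversations, the messages list has to carry the earlier turns as well as the new one. The Python sketch below shows one way to assemble such a list before passing it to the chat action. The helper build_messages is hypothetical, and the "system" and "assistant" roles come from OpenAI's chat message format rather than from this page, which only shows the "user" role.

```python
def build_messages(system_prompt, turns, new_user_message):
    """Assemble a chat history in role/content form (hypothetical helper).

    turns is a list of (user_text, assistant_text) pairs from earlier exchanges.
    """
    messages = []
    if system_prompt:
        # An optional instruction that frames the whole conversation.
        messages.append({"role": "system", "content": system_prompt})
    for user_text, assistant_text in turns:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The newest user message always goes last.
    messages.append({"role": "user", "content": new_user_message})
    return messages


history = [("Hello!", "Hi, how can I help?")]
for message in build_messages("You are a concise assistant.", history, "What is Jaseci?"):
    print(message["role"], "->", message["content"])
```

Keeping the history as plain (user, assistant) pairs makes it easy to trim old turns when the conversation grows too long for the model.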

3. Image Generation

3.1. Generate Image (generate_image)

You can use the generate_image action to generate images using OpenAI's DALL-E 2 model.

Parameters:

  • prompt : str The textual prompt for generating the image(s).
  • n : int, optional The number of images to generate (default is 1).
  • size : str, optional The size of the image in pixels (default is "512x512").
  • response_format : str, optional The format of the response, either "url" or "json" (default is "url").

Returns:

A list of generated images, either as URLs or Base64-encoded JSON strings, depending on the value of response_format.

Example:

walker generate_image {
    can openai.generate_image;
    report openai.generate_image(prompt="A painting of a cat", n=2);
}

3.2. Edit Image (edit_image)

You can use the edit_image action to edit images using OpenAI's DALL-E 2 model.

Parameters:

  • prompt : str Prompt describing the editing task.
  • image_file : str, optional Path to the input image file. Default is None.
  • mask_file : str, optional Path to the mask image file. Default is None.
  • image_b64 : str, optional Base64 encoded input image. Default is None.
  • mask_b64 : str, optional Base64 encoded mask image. Default is None.
  • n : int, optional Number of images to generate. Default is 1.
  • size : str, optional Size of the generated image. Default is "512x512".
  • response_format : str, optional Format of the response. "url" or "base64". Default is "url".

Returns:

A list of URLs or base64 encoded images, depending on the value of response_format.

Example:

walker edit_image {
    can openai.edit_image;
    report openai.edit_image(prompt="A painting of a cat", image_file="cat.jpg", mask_file="cat_mask.jpg", n=2);
    // masks will be filled
}
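When the image is not available as a file on the machine running the action, the image_b64 and mask_b64 parameters can be used instead. The Python sketch below shows how such a value could be produced. The helper names are hypothetical, and it assumes the action expects plain Base64 text without a data: URI prefix, which this page does not specify.

```python
import base64
from pathlib import Path


def encode_bytes_b64(data: bytes) -> str:
    """Return the Base64 text for raw image bytes (hypothetical helper)."""
    return base64.b64encode(data).decode("ascii")


def encode_file_b64(path) -> str:
    """Read a file from disk and return its contents as Base64 text."""
    return encode_bytes_b64(Path(path).read_bytes())


# Round trip on a few placeholder bytes to show the encoding is lossless.
sample = b"\x89PNG placeholder bytes"
encoded = encode_bytes_b64(sample)
print(base64.b64decode(encoded) == sample)
```

The resulting string could then be supplied as image_b64 (and a second one as mask_b64) in place of the file path arguments.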

3.3. Image Variation (variations_image)

You can use the variations_image action to generate n variations of an image using the DALL-E 2 Image Variation API.

Parameters:

  • image_file: (Optional) Path to the image file to use. If not provided, image_b64 must be provided instead. Type: str
  • image_b64: (Optional) Base64-encoded image data. If not provided, image_file must be provided instead. Type: str
  • n: (Optional) Number of variations to generate. Default is 1. Type: int
  • size: (Optional) Size of the output images in the format "widthxheight". Default is "512x512". Type: str
  • response_format: (Optional) Format of the response data. Can be "url" or "b64_json". Default is "url". Type: str

Returns:

  • A list of URLs or base64-encoded JSON strings representing the generated images. Type: List[str]

Example:

walker variations_image {
    can openai.variations_image;
    report openai.variations_image(image_file="cat.jpg", n=2);
}

4. Audio Transcription and Translation

4.1. Transcribe Audio (transcribe)

You can use the transcribe action to transcribe audio using OpenAI's Whisper speech-to-text model.

Parameters:

  • audio_file: str, optional The path to the audio file to transcribe.
  • audio_url: str, optional The URL of the audio file to transcribe.
  • audio_array: list, optional A list containing the audio waveform as a sequence of floats between -1 and 1.
  • model: str, optional (default="whisper-1") The OpenAI Whisper model to use for transcription.
  • prompt: str, optional A prompt to use in addition to the audio input when transcribing with the OpenAI API.
  • temperature: float, optional (default=0) The temperature to use when generating text for the transcription.
  • language: str, optional The language of the audio being transcribed, specified as a BCP-47 language code.
  • translate: bool, optional (default=False) If True, the transcribed text will be translated to English.

Returns:

  • str: The transcribed text.

Example:

walker whisper_transcribe {
    can openai.transcribe;
    report openai.transcribe(audio_file="audio.wav");
}

walker whisper_translate {
    can openai.transcribe;
    report openai.transcribe(audio_file="audio.wav", translate=true);
}
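The audio_array parameter expects the waveform as floats between -1 and 1, whereas WAV files usually store 16-bit integers. The Python sketch below shows one way to do that conversion with the standard library. The helper names are hypothetical, it handles only 16-bit mono WAV files, and this page does not state which sample rate the action expects, so resampling is left out.

```python
import array
import sys
import wave


def pcm16_to_floats(pcm_bytes: bytes) -> list:
    """Convert little-endian signed 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    samples = array.array("h")  # "h" is a signed 16-bit integer
    samples.frombytes(pcm_bytes)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return [s / 32768.0 for s in samples]


def wav_to_audio_array(path: str) -> list:
    """Read a 16-bit mono WAV file and return its samples as floats."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch supports 16-bit mono WAV files only")
        frames = wav.readframes(wav.getnframes())
    return pcm16_to_floats(frames)


# Three samples: silence, the largest positive value, the largest negative value.
print(pcm16_to_floats(b"\x00\x00\xff\x7f\x00\x80"))
```

The list returned by wav_to_audio_array could then be passed as audio_array instead of audio_file or audio_url.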