Making use of speech to text and speech to text modules in Jaseci
Speech to Text
Introduction
The process of converting spoken words into written text is called speech to text. The jaseci stt
module can instantly transcribe the audio into a text script so that it can be shown and used as needed. Jaseci's stt
module currently supports 99 languages.
Functionality of Jaseci Speech to Text Module (stt
)
Before work with the Jaseci stt
module make sure to install fmmeg via sudo apt-get install ffmpeg
.
Then install pip install jac_speech[all]
which will also install the speech-to-text module which we are going to use later in this section.
And load the jaseci stt
module via actions load module jac_speech.stt
.
Transcribe
The specified audio's written transcript will be returned by stt.transcribe
action. The audio can be provided as a mp3 file, a list of amplitude values (array), the URL to the audio(mp3) file.
Example
Note
Here in all the below example's we are trying with the
URL
option. if you wanna try with an mp3 file or with list of amplitudes try giving the input parameters accordingly.
walker try_transribe {
can stt.transcribe;
report stt.transcribe("en", null,null, "https://ia800205.us.archive.org/27/items/fifty_famous_stories_lc_librivox/fiftyfamous_00_baldwin.mp3");
}
Try executing the code and the compare the real audio script and the transcript return by the jaseci stt
module. If the audio is clear and pronunciations are correct this will return 100% correct transcript.
Output:
{
"success": true,
"report": [
"Introduction to 50 Famous Stories Retold. This is a Librevox recording. All Librevox recordings are in the public domain. For more information or to volunteer, please visit Librevox.org. 50 Famous Stories Retold by James Baldwin. Concerning these stories. There are numerous time honored stories which have become so incorporated into the literature and thought of"
],
"final_node": "urn:uuid:a5eece10-d29f-485b-893f-304a7cd63a2b",
"yielded": false
}
Translate
In cases where the input audio is provided in any other supported language, the Jaseci's stt.translate
action will offer the transcript in English.
Example
walker try_translate {
can stt.translate;
report stt.translate("fr", null, null, "https://www.audio-lingua.eu/IMG/mp3/les_sports.mp3");
}
Output
{
"success": true,
"report": [
"Hello, I'm Papal Christian and I'm going to tell you all the things I did. I did two years of calculations. I said, we did a little ... it's the coursetle ... it's the course of the multipurpose. I did after a year of foot. And after that, at that time I was going to do the gymnasticconsists of several degs. The gymnastics that consists of several degrees. And it's there, but I'm going to do it."
],
"final_node": "urn:uuid:7a0fe76c-49c2-45fa-88e2-c788618949d0",
"yielded": false
}
Text to Speech
Introduction
The text to speech or speech synthesizing is converting written text into audio forms.
Functionality of Jaseci Text to Speech Module (vc_tts
)
Right before jump in to code lab of the Jaseci vc_tts
module make sure to install dependencies of sound files via sudo apt-get install -y libsndfile1 espeak-ng
and load the jaseci vc_tts
module via actions load module jac_speech.vc_tts
.
Synthesize
From the provided raw text of input, this action can produce the amplitudes of the speech. You can select your preffred voice as male
or female
. This list of amplitudes can either be used to make audio in your preferred formatĀ , or it can be saved into a local file system as a wav file by providing the path to the preferred location.
Example
walker init{
has text_input = "Hello world! This is a sample audio generated from the Text to speech model";
can vc_tts.synthesize;
has result = vc_tts.synthesize(text = text_input, speaker= "female", path = "./");
}
Note
If you don't wanna save the generated file into the current file system simply ignore giving the path variable.
Output
{
"success": true,
"report": [
{
"save_status": true,
"voice": "female",
"file_path": "./audio_file_1673841425.549086.wav"
}
],
"final_node": "urn:uuid:fc151279-3cd7-4e67-8122-80ec766fc2b2",
"yielded": false
}
If the code is properly executed, the created audio will be saved to the 'jac' program's current file directory. Change the input text and play around the program.
Clone Voice
The voice conversion, often known as voice cloning, facility is supported by the jaseci "vc tts" module. Aka you can enter a sample voice that is at least 2.5 minutes long, and the algorithm will create the audio files by imitating the voices from the reference audio files that you have provided. Please take aware that the audio generated here will simply mimic the voice; the accent will still be American English.
Example
walker clone_voice {
can vc_tts.clone_voice;
has input_text = "
"Hello, I'm Papal Christian and I'm going to tell you all the things I did. I did two years of calculations. I said, we did a little. it's the coursetle. it's the course of the multipurpose. I did after a year of foot. And after that, at that time I was going to do the gymnasticconsists of several degs. The gymnastics that consists of several degrees. And it's there, but I'm going to do it."
report vc_tts.clone_voice(input_text = input_text, reference_audio= "./test_ref_audio.wav", save_path="./");
}
Example
{
"success": true,
"report": [
{
"save_status": true,
"file_path": "./cloned_audio_file_1673841506.9412549.wav"
}
],