Enable Google Cloud Text-to-Speech Service
Google Cloud Text-to-Speech is a text-to-speech conversion service that Google Cloud launched a few days ago. It was one of the most important services missing from the Google Cloud AI portfolio. Now that it is available, it complements Google Cloud's speech-to-text service and completes the loop between text and speech. In the next few weeks, you will learn about different ways to use the Google Cloud Text-to-Speech service together with other Google Cloud services.
In this post, you will learn about some of the following:
The following are some of the key aspects of setting up the development environment using the Eclipse IDE:
Figure 1. Enable Google Cloud Text-to-Speech Service
Figure 2. Google Cloud Service – Create Service Account Key
Figure 3. Google Cloud Text-to-Speech – Setting Environment Variable
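The environment variable set in Figure 3 is assumed here to be GOOGLE_APPLICATION_CREDENTIALS, the variable Google's Application Default Credentials read to locate the service account JSON key created in Figure 2. If it is unset or points to the wrong file, the client cannot load credentials when it is created. The following is a minimal, JDK-only sketch of a hypothetical helper, CredentialsCheck (not part of any Google library), that you could call at startup to report the problem before the client is created.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper (not part of the Google Cloud client library): checks that the
 * service account key file referenced by GOOGLE_APPLICATION_CREDENTIALS exists and is
 * readable before TextToSpeechClient.create() tries to load it.
 */
final class CredentialsCheck {

    /** Environment variable read by Google's Application Default Credentials. */
    static final String ENV_VAR = "GOOGLE_APPLICATION_CREDENTIALS";

    private CredentialsCheck() {
    }

    /**
     * Returns null when the given key path looks usable, otherwise a message
     * describing what is wrong with it.
     */
    static String problemWith(String keyPath) {
        if (keyPath == null || keyPath.trim().isEmpty()) {
            return ENV_VAR + " is not set";
        }
        Path path = Paths.get(keyPath);
        if (!Files.isRegularFile(path)) {
            return ENV_VAR + " points to a missing file: " + path;
        }
        if (!Files.isReadable(path)) {
            return ENV_VAR + " points to an unreadable file: " + path;
        }
        return null;
    }

    /** Convenience overload that reads the environment variable itself. */
    static String problemWithEnvironment() {
        return problemWith(System.getenv(ENV_VAR));
    }
}
```

For example, at the start of run() you could throw an IllegalStateException carrying the message whenever CredentialsCheck.problemWithEnvironment() returns a non-null value. Returning a message instead of throwing keeps the check easy to reuse in tests and in logging.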
The following are the two key steps needed to create a sample program/app demonstrating the Google Cloud Text-to-Speech service:
The following are the artifacts that need to be included to work with the Google Cloud Text-to-Speech APIs:
<!-- https://mvnrepository.com/artifact/com.google.guava/guava -->
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>24.1-jre</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.threeten/threetenbp -->
<dependency>
    <groupId>org.threeten</groupId>
    <artifactId>threetenbp</artifactId>
    <version>1.3.6</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.google.http-client/google-http-client -->
<dependency>
    <groupId>com.google.http-client</groupId>
    <artifactId>google-http-client</artifactId>
    <version>1.22.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.google.cloud/google-cloud-texttospeech -->
<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-texttospeech</artifactId>
    <version>0.42.0-beta</version>
</dependency>
Pay attention to the following aspects, which are needed to achieve text-to-speech conversion:
The following is the code representing the above steps:
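import java.io.FileOutputStream;
import java.io.OutputStream;

import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

// In the 0.42.0-beta client listed above these classes live in the v1beta1 package;
// later releases of google-cloud-texttospeech also provide com.google.cloud.texttospeech.v1.
import com.google.cloud.texttospeech.v1beta1.AudioConfig;
import com.google.cloud.texttospeech.v1beta1.AudioEncoding;
import com.google.cloud.texttospeech.v1beta1.SsmlVoiceGender;
import com.google.cloud.texttospeech.v1beta1.SynthesisInput;
import com.google.cloud.texttospeech.v1beta1.SynthesizeSpeechResponse;
import com.google.cloud.texttospeech.v1beta1.TextToSpeechClient;
import com.google.cloud.texttospeech.v1beta1.VoiceSelectionParams;
import com.google.protobuf.ByteString;

@SpringBootApplication
public class GCloudText2SpeechApplication implements CommandLineRunner {

    public static void main(String[] args) {
        SpringApplication app = new SpringApplication(GCloudText2SpeechApplication.class);
        app.run(args);
    }

    @Override
    public void run(String... args) throws Exception {
        String text = "Hello World! How are you doing today? This is Google Cloud Text-to-Speech Demo!";
        String outputAudioFilePath = "/home/support/Documents/output.mp3";
        // create() uses Application Default Credentials (GOOGLE_APPLICATION_CREDENTIALS)
        try (TextToSpeechClient textToSpeechClient = TextToSpeechClient.create()) {
            // Set the text input to be synthesized
            SynthesisInput input = SynthesisInput.newBuilder().setText(text).build();
            // Build the voice request: US English (language code "en-US"), female voice
            VoiceSelectionParams voice = VoiceSelectionParams.newBuilder()
                    .setLanguageCode("en-US")
                    .setSsmlGender(SsmlVoiceGender.FEMALE)
                    .build();
            // Select the type of audio file you want returned (MP3)
            AudioConfig audioConfig = AudioConfig.newBuilder()
                    .setAudioEncoding(AudioEncoding.MP3)
                    .build();
            // Perform the text-to-speech request
            SynthesizeSpeechResponse response = textToSpeechClient.synthesizeSpeech(input, voice, audioConfig);
            // Get the audio contents from the response
            ByteString audioContents = response.getAudioContent();
            // Write the response to the output file
            try (OutputStream out = new FileOutputStream(outputAudioFilePath)) {
                out.write(audioContents.toByteArray());
                System.out.println("Audio content written to file \"" + outputAudioFilePath + "\"");
            }
        }
    }
}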
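The sample above hardcodes /home/support/Documents/output.mp3, a path that only exists on the author's machine, and the .mp3 extension has to be kept in sync by hand with the AudioEncoding chosen in AudioConfig. Below is a minimal, JDK-only sketch of a hypothetical AudioFiles helper (not part of the Google library) that derives the extension from the encoding's enum name and creates the target directory before writing. The mapping is an assumption: it saves LINEAR16 output as .wav, on the understanding that Google documents LINEAR16 audio content as including a WAV header, saves OGG_OPUS as .ogg, and rejects any other encoding.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

/**
 * Hypothetical helper: picks a file extension for an AudioEncoding enum name and
 * writes the synthesized bytes, creating parent directories if needed.
 */
final class AudioFiles {

    private AudioFiles() {
    }

    /** Maps an AudioEncoding enum name (e.g. "MP3") to a file extension. */
    static String extensionFor(String encodingName) {
        switch (encodingName.toUpperCase(Locale.ROOT)) {
            case "MP3":
                return ".mp3";
            case "LINEAR16":
                return ".wav"; // assumed: LINEAR16 content carries a WAV header
            case "OGG_OPUS":
                return ".ogg";
            default:
                throw new IllegalArgumentException("Unsupported encoding: " + encodingName);
        }
    }

    /** Writes audio bytes to dir/baseName + extension and returns the resulting path. */
    static Path write(byte[] audio, Path dir, String baseName, String encodingName) throws IOException {
        Files.createDirectories(dir); // no-op if the directory already exists
        Path target = dir.resolve(baseName + extensionFor(encodingName));
        Files.write(target, audio); // creates or truncates the file
        return target;
    }
}
```

Inside run(), the write step could then become AudioFiles.write(response.getAudioContent().toByteArray(), Paths.get(System.getProperty("user.home"), "Documents"), "output", AudioEncoding.MP3.name()), with java.nio.file.Paths imported; the returned path can be printed instead of a fixed file name.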
In this post, you learned how to get started with the Google Cloud Text-to-Speech service using a Java/Spring Boot app.
Did you find this article useful? Do you have any questions or suggestions about this article? Leave a comment and ask your questions and I shall do my best to address your queries.