Amazon Polly Text-to-speech with AWS S3, Twilio Java App

Amazon Polly can be combined with the Twilio phone service and AWS S3 to build an automated alert system that does the following:

  • Convert text to speech (using Amazon Polly)
  • Upload the audio (speech stream) created by Polly to an AWS S3 bucket
  • Use the Twilio Call service to play the audio to the destination phone number

The following application architecture diagram shows the communication flow between the Spring Boot app and Amazon Polly, Amazon S3 and the Twilio service, which together produce automated phone alerts based on text-to-speech conversion.

Figure 1. Amazon Polly – S3 – Twilio – Spring Boot – Java

This setup can be used to build an automated alert/notification system that makes a phone call to the concerned personnel and informs them about incidents such as the following:

  • Security alerts
  • Production downtime
  • Custom alerts related to one or more custom events

In this post, you will learn how to create a Spring Boot app that uses Twilio and Amazon services (S3 and Polly) to make automated phone calls to end users. The following topics are covered in this article:

  • Spring Boot App for invoking Amazon S3, Polly and Twilio Services
  • Custom Class for invoking Amazon Polly Service
  • Custom Class for invoking AWS S3 Storage APIs
  • Custom Class for invoking Twilio Phone Service
  • Application properties file
  • Configuration beans created using application properties files

Spring Boot App for invoking Amazon S3, Polly and Twilio Services

The application makes a phone call to the end user in three steps:

  • Create a speech stream using the Amazon Polly service
  • Upload the audio file to AWS S3 storage
  • Invoke the Twilio service to make the phone call and play the audio to the end user

import java.io.IOException;
import java.io.InputStream;
import java.util.UUID;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

import com.vflux.rbot.storage.CloudStorage;
import com.vflux.rbot.texttospeech.CustomPolly;
import com.vflux.rbot.voice.VoiceService;


@SpringBootApplication
public class RecruiterbotApplication implements CommandLineRunner {

    @Autowired VoiceService twilioVoiceService;
    @Autowired CustomPolly customPolly;
    @Autowired CloudStorage awsCloudStorage;

    public static void main(String[] args) {
        SpringApplication app = new SpringApplication(RecruiterbotApplication.class);
        app.run(args);
    }

    @Override
    public void run(String... arg0) throws IOException {
        String text = "Hello Ajitesh. This is a message from the Microsoft talent acquisition team. We are happy to inform you that you have been shortlisted for the next round of interviews. Goodbye!";
        //
        // Create speech stream using Amazon Polly service
        //
        InputStream speechStream = this.customPolly.synthesisSpeechMP3Format(text);
        //
        // Upload the audio file to AWS S3 storage
        //
        String s3Key = UUID.randomUUID().toString() + ".mp3";
        String s3URL = this.awsCloudStorage.uploadAudioStream(s3Key, speechStream);
        //
        // Invoke Twilio service to make the phone call and play the audio to the end user
        //
        this.twilioVoiceService.playVoice("+9198877761234", s3URL);
    }

}
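
The three steps above can also be sketched without any cloud SDK, which makes the ordering easy to test. This is only a minimal sketch: SpeechSynthesizer, AudioStore, PhoneCaller and AlertPipeline are hypothetical names standing in for CustomPolly, CloudStorage and VoiceService, and are not part of the original app.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.UUID;

// Hypothetical stand-ins for CustomPolly, CloudStorage and VoiceService,
// reduced to the single method each one contributes to the flow.
interface SpeechSynthesizer {
    InputStream synthesize(String text) throws IOException;
}

interface AudioStore {
    String upload(String key, InputStream audio) throws IOException;
}

interface PhoneCaller {
    void play(String toNumber, String audioUrl);
}

class AlertPipeline {
    private final SpeechSynthesizer synthesizer;
    private final AudioStore store;
    private final PhoneCaller caller;

    AlertPipeline(SpeechSynthesizer synthesizer, AudioStore store, PhoneCaller caller) {
        this.synthesizer = synthesizer;
        this.store = store;
        this.caller = caller;
    }

    // Runs the three steps in order and returns the URL that was played.
    String alert(String toNumber, String text) throws IOException {
        String key = UUID.randomUUID().toString() + ".mp3";
        // try-with-resources closes the speech stream once the upload is done
        try (InputStream speech = synthesizer.synthesize(text)) {
            String url = store.upload(key, speech);
            caller.play(toNumber, url);
            return url;
        }
    }
}
```

Because each step sits behind an interface, the real Polly, S3 and Twilio classes can be swapped for in-memory fakes in a unit test, and the call is only placed if the upload succeeded.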

Custom Class for invoking Amazon Polly Service

Pay attention to the following:

  • The language code and voice id are configurable and are used to create an audio stream in MP3 format (OutputFormat.Mp3)
  • A BasicSessionCredentials instance is injected when the CustomPolly instance is constructed. This allows the Amazon Polly client to be created with temporary security credentials obtained from the AWS STS service
  • The voice is looked up from the configured language code and voice id. The sample configuration uses the language code en-IN. Other language codes and voice ids can be taken from the Amazon Polly voice list
  • The Amazon Polly synthesizeSpeech API is used to create the speech stream
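
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicSessionCredentials;
import com.amazonaws.regions.Region;
import com.amazonaws.services.polly.AmazonPolly;
import com.amazonaws.services.polly.AmazonPollyClientBuilder;
import com.amazonaws.services.polly.model.DescribeVoicesRequest;
import com.amazonaws.services.polly.model.DescribeVoicesResult;
import com.amazonaws.services.polly.model.OutputFormat;
import com.amazonaws.services.polly.model.SynthesizeSpeechRequest;
import com.amazonaws.services.polly.model.SynthesizeSpeechResult;
import com.amazonaws.services.polly.model.Voice;

import javazoom.jl.decoder.JavaLayerException;

@Component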
public class CustomPolly {

    @Autowired
    String pollyLanguageCode;
    @Autowired
    String pollyVoiceId;

    private AmazonPolly amazonPolly;

    public CustomPolly(@Autowired Region awsRegion, @Autowired BasicSessionCredentials sessionCredentials) {
        this.amazonPolly = AmazonPollyClientBuilder.standard()
                .withCredentials(new AWSStaticCredentialsProvider(sessionCredentials)).withRegion(awsRegion.getName()).build();
    }

    public InputStream synthesisSpeechMP3Format(String text) throws IOException {
        return this.synthesize(text, OutputFormat.Mp3);
    }

    // AudioPlayer is a helper class that is not shown in this post;
    // JavaLayerException comes from the JLayer MP3 library.
    public void play(String text) throws IOException, JavaLayerException {
        //
        // Get the audio stream created using the text
        //
        InputStream speechStream = this.synthesize(text, OutputFormat.Mp3);
        //
        // Play the audio
        //
        AudioPlayer.play(speechStream);
    }

    public InputStream synthesize(String text, OutputFormat format) throws IOException {
        //
        // Get the default voice
        //
        Voice voice = this.getVoice();
        //
        // Create speech synthesis request comprising of information such as following:
        // Text
        // Voice
        // The detail will be used to create the speech
        //
        SynthesizeSpeechRequest synthReq = new SynthesizeSpeechRequest().withText(text).withVoiceId(voice.getId())
                .withOutputFormat(format);
        //
        // Create the speech
        //
        SynthesizeSpeechResult synthRes = this.amazonPolly.synthesizeSpeech(synthReq);
        //
        // Returns the audio stream
        //
        return synthRes.getAudioStream();
    }

    public Voice getVoice() {
        //
        // Create describe voices request.
        //
        DescribeVoicesRequest enInVoicesRequest = new DescribeVoicesRequest().withLanguageCode(this.pollyLanguageCode);
        //
        // Synchronously ask Amazon Polly to describe available TTS voices.
        //
        DescribeVoicesResult enInVoicesResult = this.amazonPolly.describeVoices(enInVoicesRequest);
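        Iterator<Voice> voiceIter = enInVoicesResult.getVoices().iterator();
        Voice voice = null;
        // Fall back to "raveena" when no voice id is configured; trim before comparing
        String configuredVoiceId = this.pollyVoiceId == null ? "" : this.pollyVoiceId.trim();
        String pollyVoiceIdLower = configuredVoiceId.isEmpty() ? "raveena" : configuredVoiceId.toLowerCase();
        while(voiceIter.hasNext()) {
            Voice tmpvoice = voiceIter.next();
            if(tmpvoice.getId().toLowerCase().equals(pollyVoiceIdLower)) {
                voice = tmpvoice;
                break;
            }
        }
        if(voice == null) {
            // No match: use the first voice returned for the language, if there is one
            if(enInVoicesResult.getVoices().isEmpty()) {
                throw new IllegalStateException("No Amazon Polly voices found for language code " + this.pollyLanguageCode);
            }
            voice = enInVoicesResult.getVoices().get(0);
        }
        return voice;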
    }
}
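
The voice lookup in getVoice() boils down to "match the configured id ignoring case, otherwise take the first voice for the language". A minimal sketch of that rule as a pure function is shown here so it can be checked without calling Amazon Polly. VoiceSelector is a hypothetical helper that works on plain id strings, not the SDK's Voice type.

```java
import java.util.List;
import java.util.Optional;

class VoiceSelector {
    // Returns the id matching the configured value (ignoring case and
    // surrounding whitespace), else the first available id, else empty.
    static Optional<String> select(List<String> availableIds, String configuredId, String defaultId) {
        String wanted = (configuredId == null || configuredId.trim().isEmpty())
                ? defaultId
                : configuredId.trim();
        for (String id : availableIds) {
            if (id.equalsIgnoreCase(wanted)) {
                return Optional.of(id);
            }
        }
        return availableIds.isEmpty() ? Optional.empty() : Optional.of(availableIds.get(0));
    }
}
```

For example, with the available ids "Aditi" and "Raveena", a configured value of " raveena " selects "Raveena", while an unknown id falls back to "Aditi". Returning an Optional makes the "no voices at all" case explicit for the caller.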

Custom Class for invoking AWS S3 Storage APIs

Pay attention to the following:

  • An instance of AccessControlList which grants public read permission on the file uploaded to S3
    AccessControlList acl = new AccessControlList();
    acl.grantPermission(GroupGrantee.AllUsers, Permission.Read);

  • An instance of ObjectMetadata which must be configured with the content type and content length of the content uploaded to the S3 bucket
    ObjectMetadata objectMetaData = new ObjectMetadata();
    byte[] bytes = IOUtils.toByteArray(inputStream);
    objectMetaData.setContentLength(bytes.length);
    ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
    objectMetaData.setContentType("audio/mpeg");

  • An instance of PutObjectRequest which is instantiated with the ObjectMetadata instance and then given the AccessControlList instance
    PutObjectRequest putObjectRequest = new PutObjectRequest(this.awsS3DataBucket, keyName, byteArrayInputStream, objectMetaData).withAccessControlList(acl);

  • The putObject API, which is invoked with the PutObjectRequest instance
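
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import com.amazonaws.AmazonServiceException;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicSessionCredentials;
import com.amazonaws.regions.Region;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.AccessControlList;
import com.amazonaws.services.s3.model.GroupGrantee;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.Permission;
import com.amazonaws.services.s3.model.PutObjectRequest;
// Assumption: IOUtils from the AWS SDK; org.apache.commons.io.IOUtils offers the same toByteArray method
import com.amazonaws.util.IOUtils;
import com.vflux.rbot.storage.CloudStorage;

@Component("awsCloudStorage")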
public class AWSCloudStorage implements CloudStorage {

    public static final String S3_BASE_URL = "http://s3.amazonaws.com/";

    @Autowired
    String awsS3DataBucket;

    private AmazonS3 amazonS3;


    public AWSCloudStorage(@Autowired Region awsRegion, @Autowired BasicSessionCredentials sessionCredentials) {
        // Pass the configured region explicitly, as CustomPolly does
        this.amazonS3 = AmazonS3ClientBuilder.standard()
                .withCredentials(new AWSStaticCredentialsProvider(sessionCredentials))
                .withRegion(awsRegion.getName()).build();
    }

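    public void uploadFile(String keyName, String filePath) {
        try {
            // putObject(String, String, String) would store the path text itself as the object body,
            // so wrap the path in a File to upload the file's contents
            this.amazonS3.putObject(this.awsS3DataBucket, keyName, new java.io.File(filePath));
        } catch (AmazonServiceException e) {
            System.err.println(e.getErrorMessage());
        }
    }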

    public String uploadAudioStream(String keyName, InputStream inputStream) throws IOException {
        try {
            //
            // Create Read permission for audio file
            //
            AccessControlList acl = new AccessControlList();
            acl.grantPermission(GroupGrantee.AllUsers, Permission.Read);
            //
            // Create ObjectMetaData for setting content length and content type
            //
            ObjectMetadata objectMetaData = new ObjectMetadata();
            byte[] bytes = IOUtils.toByteArray(inputStream);
            objectMetaData.setContentLength(bytes.length);
            ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
            objectMetaData.setContentType("audio/mpeg");
            //
            // Put the object in the AWS S3
            //
            PutObjectRequest putObjectRequest = new PutObjectRequest(this.awsS3DataBucket, keyName, byteArrayInputStream, objectMetaData).withAccessControlList(acl);
            this.amazonS3.putObject(putObjectRequest);
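        } catch (AmazonServiceException e) {
            // Surface the failure instead of returning a URL for an object that was never stored
            throw new IOException("Upload to S3 failed: " + e.getErrorMessage(), e);
        }
        return getS3URL(keyName);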
    }

    private String getS3URL(String key) {
        return S3_BASE_URL + this.awsS3DataBucket + "/" + key;
    }
}
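
The upload code reads the whole speech stream into a byte array because a stream on its own does not reveal its length, and the content length has to be set on the object metadata. The same idea can be sketched with only the JDK. BufferedAudio is a hypothetical helper for illustration and is not part of the original app.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferedAudio {
    final byte[] bytes;

    private BufferedAudio(byte[] bytes) {
        this.bytes = bytes;
    }

    // Drains the stream into memory so its exact length is known.
    static BufferedAudio from(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return new BufferedAudio(out.toByteArray());
    }

    // Value to pass to the metadata's content length.
    long contentLength() {
        return bytes.length;
    }

    // A fresh, replayable stream over the buffered bytes.
    InputStream newStream() {
        return new ByteArrayInputStream(bytes);
    }
}
```

Buffering in memory is reasonable for short alert messages. For long audio, writing to a temporary file and uploading the file would keep memory use flat.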

Custom Class for invoking Twilio Phone Service

Pay attention to the following:

  • The Twimlet URL, which is built from the voice file URL and is fetched by Twilio to play the voice file to the destination phone number
  • The TwilioRestClient, which is used to invoke the Twilio Call service
  • The From and To phone numbers, which are required for making phone calls

import java.net.URI;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import com.twilio.http.TwilioRestClient;
import com.twilio.rest.api.v2010.account.Call;
import com.twilio.type.PhoneNumber;

@Component
public class TwilioVoiceService implements VoiceService {

    private static final String APIVERSION = "2010-04-01";
    private static final String TWIMLET_BASE_URL = "http://twimlets.com/message?Message%5B0%5D=";

    private TwilioRestClient twilioRestClient;
    private String fromPhoneNumber;

    public TwilioVoiceService(@Autowired TwilioRestClient twilioRestClient, @Autowired String fromPhoneNumber) {
        this.twilioRestClient = twilioRestClient;
        this.fromPhoneNumber = fromPhoneNumber;
    }

    @Override
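    public void playVoice(String toNumber, String voicePath) {
        String url;
        try {
            // URL-encode the audio URL because it is passed as a query parameter to the Twimlet
            url = TWIMLET_BASE_URL + java.net.URLEncoder.encode(voicePath, "UTF-8");
        } catch (java.io.UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is not supported", e);
        }

        PhoneNumber to = new PhoneNumber(toNumber); // Number to call
        PhoneNumber from = new PhoneNumber(this.fromPhoneNumber); // Twilio number the call is made from
        URI uri = URI.create(url);

        // Make the call
        Call call = Call.creator(to, from, uri).create(this.twilioRestClient);
        System.out.println(call.getSid());
    }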

}
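
Since the audio URL travels inside the query string of the Twimlet URL, it is safest to percent-encode it. The following is a minimal sketch of that URL construction using only the JDK. TwimletUrl is a hypothetical helper name, and the base URL is the one used in the class above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

class TwimletUrl {
    private static final String BASE = "http://twimlets.com/message?Message%5B0%5D=";

    // Builds the Twimlet URL that tells Twilio which audio file to play.
    static URI forAudio(String audioUrl) {
        try {
            return URI.create(BASE + URLEncoder.encode(audioUrl, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every JVM
            throw new IllegalStateException(e);
        }
    }
}
```

For the audio URL http://s3.amazonaws.com/my-bucket/abc.mp3 (an illustrative bucket and key), this yields http://twimlets.com/message?Message%5B0%5D=http%3A%2F%2Fs3.amazonaws.com%2Fmy-bucket%2Fabc.mp3.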

Application properties file

Pay attention to the following in the properties given below:

  • AWS access key properties, which are left empty because the code uses AWS temporary security credentials to access AWS services
  • AWS S3 bucket
  • Amazon Polly language code and voice ID
  • Twilio account SID, auth token and the phone number from which the call is made. Note that this phone number needs to be purchased from the Twilio web console.

# AWS Properties
#
aws.access.key.id = 
aws.access.key.secret = 
aws.region = us-east-1
aws.s3.data.bucket = remainders-11032018
aws.temporary.credentials.validity.duration = 
#
# Amazon Polly Properties
#
aws.polly.language.code = en-IN
aws.polly.voice.id = Raveena
#
# Twilio properties
#
twilio.account.sid = BD1a1234f87beee26b9876c8e8513e432
twilio.auth.token = zzz12341df987654c7a71b9c1234xz87
twilio.from.phone.number = +13111116667
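
A property that is present but left empty, such as aws.access.key.id above, is read as an empty string rather than null, which is why the configuration code later checks for blank values. A minimal sketch with java.util.Properties illustrates this behaviour. AppProperties is a hypothetical helper and is not how Spring itself resolves @Value placeholders.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class AppProperties {
    private final Properties props = new Properties();

    // Parses properties-file content such as "aws.region = us-east-1".
    AppProperties(String content) throws IOException {
        props.load(new StringReader(content));
    }

    // Returns the trimmed value, or the fallback when the key is missing or blank.
    String get(String key, String fallback) {
        String value = props.getProperty(key);
        return (value == null || value.trim().isEmpty()) ? fallback : value.trim();
    }
}
```

With the content "aws.region = us-east-1" and "aws.access.key.id = ", get("aws.region", "x") returns "us-east-1", while get("aws.access.key.id", "none") returns the fallback "none".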

Configuration Beans created using Application Properties Files

Separate configuration classes are used to create the beans for AWS and Twilio services.

AWS Properties Configuration Beans

Pay attention to the following:

  • The BasicSessionCredentials bean, which is used to create Amazon service clients with AWS temporary security credentials
  • The AWS credentials provider bean, which is created from the AWS access key ID and secret access key

import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.PropertySource;

import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.auth.BasicSessionCredentials;
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient;
import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClientBuilder;
import com.amazonaws.services.securitytoken.model.Credentials;
import com.amazonaws.services.securitytoken.model.GetSessionTokenRequest;
import com.amazonaws.services.securitytoken.model.GetSessionTokenResult;

@Configuration
@PropertySource("classpath:application.properties")
public class AWSAppConfig {

    private static final Integer TEMPORARY_CREDENTIALS_DURATION_DEFAULT = 7200;

    @Value("${aws.access.key.id}") String awsKeyId;
    @Value("${aws.access.key.secret}") String awsKeySecret;
    @Value("${aws.region}") String awsRegion;
    @Value("${aws.s3.data.bucket}") String awsS3DataBucket; 
    @Value("${aws.temporary.credentials.validity.duration}") String credentialsValidityDuration;  
    @Value("${aws.polly.language.code}") String pollyLanguageCode;
    @Value("${aws.polly.voice.id}") String pollyVoiceId;

    @Bean(name = "awsKeyId") 
    public String getAWSKeyId() {
        return awsKeyId;
    }

    @Bean(name = "awsKeySecret") 
    public String getAWSKeySecret() {
        return awsKeySecret;
    }

    @Bean(name = "awsRegion") 
    public Region getAWSPollyRegion() {
        return Region.getRegion(Regions.fromName(awsRegion));
    }

    @Bean(name = "awsCredentialsProvider") 
    public AWSCredentialsProvider awsCredentialsProvider() {
        BasicAWSCredentials awsCredentials = new BasicAWSCredentials(this.awsKeyId, this.awsKeySecret);
        return new AWSStaticCredentialsProvider(awsCredentials);
    }

    @Bean(name = "sessionCredentials")
    public BasicSessionCredentials sessionCredentials() {
        AWSSecurityTokenServiceClient sts_client = (AWSSecurityTokenServiceClient) AWSSecurityTokenServiceClientBuilder.defaultClient();
        GetSessionTokenRequest session_token_request = new GetSessionTokenRequest();
        if(this.credentialsValidityDuration == null || this.credentialsValidityDuration.trim().equals("")) {
            session_token_request.setDurationSeconds(TEMPORARY_CREDENTIALS_DURATION_DEFAULT);
        } else {
            session_token_request.setDurationSeconds(Integer.parseInt(this.credentialsValidityDuration));
        }

        GetSessionTokenResult session_token_result =
                sts_client.getSessionToken(session_token_request);
        Credentials session_creds = session_token_result.getCredentials();
        BasicSessionCredentials sessionCredentials = new BasicSessionCredentials(
                   session_creds.getAccessKeyId(),
                   session_creds.getSecretAccessKey(),
                   session_creds.getSessionToken());
        return sessionCredentials;
    }

    @Bean(name = "awsS3DataBucket") 
    public String awsS3DataBucket() {
        return awsS3DataBucket;
    }

    @Bean(name = "pollyLanguageCode") 
    public String pollyLanguageCode() {
        return pollyLanguageCode;
    }

    @Bean(name = "pollyVoiceId") 
    public String pollyVoiceId() {
        return pollyVoiceId;
    }
}
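
The duration handling inside sessionCredentials() can be pulled out into a small function, which also makes it easy to reject bad values before calling STS. This is a minimal sketch: SessionDuration is a hypothetical helper, the 7200 second default mirrors the constant above, and the 900 to 129600 second range reflects the limits AWS documents for GetSessionToken with IAM user credentials (worth checking against the current STS documentation).

```java
class SessionDuration {
    static final int DEFAULT_SECONDS = 7200;
    static final int MIN_SECONDS = 900;     // 15 minutes
    static final int MAX_SECONDS = 129600;  // 36 hours

    // Returns the default for a blank value, otherwise the parsed and range-checked value.
    static int resolve(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return DEFAULT_SECONDS;
        }
        int seconds;
        try {
            seconds = Integer.parseInt(configured.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Duration is not a number: " + configured, e);
        }
        if (seconds < MIN_SECONDS || seconds > MAX_SECONDS) {
            throw new IllegalArgumentException("Duration out of range: " + seconds);
        }
        return seconds;
    }
}
```

Failing fast at startup with a clear message is usually easier to diagnose than an error returned by the STS call.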

Twilio Properties Configuration Beans

Pay attention to the creation of the TwilioRestClient bean, which is used to invoke Twilio services.

import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.PropertySource;

import com.twilio.http.TwilioRestClient;
import com.twilio.http.TwilioRestClient.Builder;


@Configuration
@PropertySource("classpath:application.properties")
public class TwilioAppConfig {

    @Value("${twilio.account.sid}") String accountSID;
    @Value("${twilio.auth.token}") String authToken;
    @Value("${twilio.from.phone.number}") String fromPhoneNumber;

    @Bean(name = "twilioRestClient")
    public TwilioRestClient twilioRestClient() {
        return (new Builder(this.accountSID, this.authToken)).build();
    }

    @Bean(name = "fromPhoneNumber")
    public String fromPhoneNumber() {
        return this.fromPhoneNumber;
    }

    @Bean(name = "twilioAccountSID")
    public String twilioAccountSID() {
        return this.accountSID;
    }

    @Bean(name = "twilioAuthToken")
    public String twilioAuthToken() {
        return this.authToken;
    }
}
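
Twilio expects phone numbers in E.164 format, so a quick format check on the configured From number and on each To number can catch typos before a call is attempted. The following is a minimal sketch. PhoneNumbers is a hypothetical helper, and the check covers the format only, not whether the number exists or is owned by the Twilio account.

```java
import java.util.regex.Pattern;

class PhoneNumbers {
    // "+" followed by 2 to 15 digits, the first of which is not zero
    private static final Pattern E164 = Pattern.compile("^\\+[1-9]\\d{1,14}$");

    static boolean isE164(String number) {
        return number != null && E164.matcher(number).matches();
    }
}
```

The sample value +13111116667 from the properties file passes this check, while the same number without the leading "+" or with spaces does not.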

Summary

In this post, you learned how to invoke the Twilio API from a Spring Boot Java app to play audio to a destination phone number, with the audio created using the Amazon Polly service and uploaded to an AWS S3 bucket.

Did you find this article useful? Do you have any questions or suggestions about this article in relation to integrating Amazon Polly with AWS S3 and Twilio API using Spring Boot Java app? Leave a comment and ask your questions and I shall do my best to address your queries.

Ajitesh Kumar
