Build IVR System using Amazon Polly, Lambda and Twilio

Figure 1. Build IVR with Amazon Polly, S3, Lambda and Twilio

Building an intelligent IVR system, in which a bot handles the interaction with your end users and brings in humans on predefined events, can automate much of the mundane manual work that takes up a lot of a person's time. This can be achieved using cloud services from providers such as Amazon, Google and Azure, together with communication service providers such as Twilio.

In this post, you will learn how to build an intelligent IVR system using the following:

Use Amazon Polly to create one or more custom text-to-speech audio files and store them at predefined locations in AWS S3
Program the IVR using Twilio TwiML
Create one or more APIs for handling responses from Twilio. These APIs can be deployed as AWS Lambda functions, which can take one or more actions such as the following:
- Invoke services such as Amazon Transcribe and Comprehend to identify actionable inputs
- Update a database such as Amazon DynamoDB with the actionable inputs
- Invoke other AWS Lambda functions to trigger further actions
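
The "identify actionable inputs" step can be as small as a lookup from an analysis result to a follow-up action. The following is a minimal sketch under stated assumptions: the sentiment labels are the four values Amazon Comprehend reports (POSITIVE, NEGATIVE, NEUTRAL, MIXED), while the class name, the action names and the mapping itself are hypothetical and only illustrate the idea.

```java
import java.util.Locale;

final class ActionRouter {

    /** Hypothetical follow-up actions; the names are illustrative only. */
    enum FollowUpAction { ESCALATE_TO_HUMAN, REVIEW_LATER, LOG_ONLY }

    private ActionRouter() {
    }

    /**
     * Maps a sentiment label to a follow-up action.
     * The mapping is an assumption made for this sketch, not a recommendation.
     */
    static FollowUpAction route(String sentimentLabel) {
        if (sentimentLabel == null) {
            return FollowUpAction.REVIEW_LATER;
        }
        switch (sentimentLabel.trim().toUpperCase(Locale.ROOT)) {
            case "NEGATIVE":
                return FollowUpAction.ESCALATE_TO_HUMAN; // bring in a human
            case "POSITIVE":
            case "NEUTRAL":
                return FollowUpAction.LOG_ONLY;          // just store the feedback
            default:
                return FollowUpAction.REVIEW_LATER;      // MIXED or unknown label
        }
    }

    public static void main(String[] args) {
        System.out.println(route("NEGATIVE"));
        System.out.println(route("neutral"));
    }
}
```

The returned action could then be stored in DynamoDB or used to decide which other Lambda function to invoke.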

The following is a sample/reference application architecture representing a custom IVR system built using different AWS services (DynamoDB, Lambda, Polly, and S3) and Twilio.

Figure 1. Build IVR system with Amazon Polly, S3, Lambda and Twilio

Creating Custom Text-to-speech audio using Amazon Polly

Use Amazon Polly to create custom text-to-speech audio. Store the generated audio on AWS S3 with appropriate read permission and content-type/content-length information so that Twilio can fetch and play it. For details, refer to this page: Amazon Polly Text-to-speech with AWS S3, Twilio Java App.

Programming the IVR using TwiML

The sample IVR instructions below do the following:

Play the query message created using Amazon Polly and stored in S3; ask the user to leave a feedback message at the beep and press the star key, “*”, when finished.
Record the user's message and call an API which invokes AWS Lambda; the Lambda function handles the recording by calling other services such as the following:
- Amazon Transcribe to convert speech to text and identify an action
- Amazon Comprehend to do the sentiment analysis and identify an action
- Amazon DynamoDB to store the information and action updates
Play the thank-you message stored in S3
Hang up after the thank-you message is played
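
When the recording finishes, Twilio calls the action URL with form-encoded parameters such as RecordingUrl, RecordingDuration and CallSid. The following is a minimal sketch of how the Lambda-backed API might pull those values out of the request body. It assumes the raw application/x-www-form-urlencoded body reaches your handler as a plain string; the class name and the sample values in main are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class RecordingCallbackParser {

    private RecordingCallbackParser() {
    }

    /** Parses a form-encoded body ("a=1&b=2") into a map of decoded values. */
    static Map<String, String> parseForm(String body) {
        Map<String, String> params = new LinkedHashMap<>();
        if (body == null || body.isEmpty()) {
            return params;
        }
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(decode(key), decode(value));
        }
        return params;
    }

    private static String decode(String text) {
        try {
            return URLDecoder.decode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample body; real requests carry more parameters.
        String body = "CallSid=CA123"
                + "&RecordingUrl=https%3A%2F%2Fexample.com%2Frecordings%2FRE123"
                + "&RecordingDuration=12";
        Map<String, String> params = parseForm(body);
        System.out.println(params.get("RecordingUrl"));
        System.out.println(params.get("RecordingDuration"));
    }
}
```

The extracted recording URL is what you would hand over to the transcription step.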

The above can be achieved using the following code:

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Twilio package names as in recent Java SDK versions; older releases may differ
import com.twilio.http.HttpMethod;
import com.twilio.rest.api.v2010.account.Call;
import com.twilio.twiml.VoiceResponse;
import com.twilio.twiml.voice.Hangup;
import com.twilio.twiml.voice.Play;
import com.twilio.twiml.voice.Record;
import com.twilio.type.PhoneNumber;

    // Method of a service class that holds fromPhoneNumber, awsCloudStorage and twilioRestClient
    @Override
    public void playVoice(String toNumber, String voicePath) throws IOException, URISyntaxException {
        String thankYouPath = "http://s3.amazonaws.com/reminders/thankyou.mp3";

        PhoneNumber to = new PhoneNumber(toNumber); // Replace with your phone number
        PhoneNumber from = new PhoneNumber(this.fromPhoneNumber); // Replace with a Twilio number
        //
        // Play custom feedback or query message created using Polly
        //
        Play playMessage = new Play.Builder(voicePath).loop(1).build();
        //
        // Record the user input; URL_for_invoking_AWS_Lambda is a placeholder
        // for the URL of the API that invokes AWS Lambda
        //
        Record recordMessage = new Record.Builder()
                .action(URL_for_invoking_AWS_Lambda).method(HttpMethod.POST)
                .timeout(15).playBeep(true).maxLength(30).finishOnKey("*").build();
        //
        // Play thank you message
        //
        Play playThankYouMessage = new Play.Builder(thankYouPath).build();
        //
        // Hang up
        //
        Hangup hangup = new Hangup.Builder().build();
        //
        // Create a voice response object and corresponding XML
        //
        VoiceResponse response = new VoiceResponse.Builder().play(playMessage).record(recordMessage)
                .play(playThankYouMessage).hangup(hangup).build();
        String xml = response.toXml();
        //
        // Store the voice XML on AWS S3
        //
        InputStream xmlStream = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        String s3Key = UUID.randomUUID().toString() + ".xml";
        String url = this.awsCloudStorage.uploadStream(s3Key, xmlStream, "text/xml");
        //
        // Make the call using programmed voice stored as XML on AWS S3
        //
        Call call = Call.creator(to, from, new URI(url)).setMethod(HttpMethod.GET).create(this.twilioRestClient);
        System.out.println(call.toString());
    }

The following is the sample TwiML file which gets created and stored on S3, with the audio URLs inside the Play elements omitted. Twilio later plays it as part of the outgoing call to the destination user.

<Response>
  <Play loop="1"></Play>
  <Record action="API_url_invoking_AWS_Lambda" method="post" finishOnKey="*" maxLength="30" playBeep="true" timeout="15" trim="trim-silence"/>
  <Play></Play>
  <Hangup/>
</Response>

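
One detail worth checking against Twilio's documentation for Record: as I understand it, when an action URL is set, Twilio requests that URL once the recording completes and continues the call with the TwiML that the URL returns, so the verbs placed after Record in the document above are reached only when the recording is empty. If that applies to your setup, the Lambda-backed API may need to return the thank-you message and hang-up itself. The following is a minimal sketch of building such a response with plain string handling; the class and method names are hypothetical, and the Twilio SDK builders shown earlier would work just as well.

```java
final class ThankYouTwiml {

    private ThankYouTwiml() {
    }

    /** Escapes the five XML special characters so a URL is safe inside an element. */
    static String escapeXml(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Builds TwiML that plays the given audio URL and then hangs up. */
    static String build(String thankYouUrl) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<Response><Play>" + escapeXml(thankYouUrl) + "</Play><Hangup/></Response>";
    }

    public static void main(String[] args) {
        // Same thank-you audio location as in the Java method above
        System.out.println(build("http://s3.amazonaws.com/reminders/thankyou.mp3"));
    }
}
```

The API would return this string as its response body, typically with a text/xml content type.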

Summary

In this post, you learned how to build an intelligent IVR system using AWS services such as Polly, Transcribe, Comprehend and Lambda, together with Twilio Programmable Voice APIs.

Did you find this article useful? Do you have any questions or suggestions about this article in relation to building IVR using Amazon and Twilio services? Leave a comment and ask your questions and I shall do my best to address your queries.

Author
Ajitesh Kumar
