An intelligent IVR system, in which a bot handles the interaction with your end users and brings in humans only on pre-defined events, automates mundane manual activities that would otherwise take up a lot of a person's time. Such a system can be built using services from cloud providers such as Amazon, Google, and Microsoft Azure, together with communication service providers such as Twilio.
In this post, you will learn how to build an intelligent IVR system using the following:
- Use Amazon Polly to create one or more custom text-to-speech audio files and store them at predefined locations in AWS S3
- Program the IVR using Twilio TwiML
- Create one or more APIs for handling responses from Twilio. These APIs can be deployed as AWS Lambda functions, which could take one or more actions such as the following:
  - Invoke services such as Amazon Transcribe and Amazon Comprehend to identify actionable inputs
  - Update a database such as Amazon DynamoDB with the actionable inputs
  - Invoke other AWS Lambda functions to take further actions
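As a rough illustration of the API that handles responses from Twilio, the sketch below parses the form-encoded body that Twilio posts to the action URL once a recording finishes. The parameter names (CallSid, RecordingUrl, RecordingDuration) are the ones Twilio sends for a recording callback. The class name RecordingCallback and its fields are hypothetical, and the sketch assumes the raw request body has already been extracted from the Lambda event as a string.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for the fields of Twilio's recording callback that the Lambda needs. */
final class RecordingCallback {
    final String callSid;
    final String recordingUrl;
    final int durationSeconds;

    private RecordingCallback(String callSid, String recordingUrl, int durationSeconds) {
        this.callSid = callSid;
        this.recordingUrl = recordingUrl;
        this.durationSeconds = durationSeconds;
    }

    /** Splits an application/x-www-form-urlencoded body such as "a=1&b=2" into decoded pairs. */
    static Map<String, String> parseForm(String body) {
        Map<String, String> params = new LinkedHashMap<>();
        if (body == null || body.isEmpty()) {
            return params;
        }
        for (String pair : body.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    /** Builds a callback object from the raw body; fails fast if no recording URL was sent. */
    static RecordingCallback fromBody(String body) {
        Map<String, String> params = parseForm(body);
        String url = params.get("RecordingUrl");
        if (url == null) {
            throw new IllegalArgumentException("RecordingUrl is missing from the callback");
        }
        int duration = Integer.parseInt(params.getOrDefault("RecordingDuration", "0"));
        return new RecordingCallback(params.get("CallSid"), url, duration);
    }
}
```

The recording URL obtained this way is what the Lambda function would pass on to Amazon Transcribe, and the CallSid is one possible key for the record stored in DynamoDB.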
The following is a sample/reference application architecture representing a custom IVR system built using different AWS services (DynamoDB, Lambda, Polly, and S3) and Twilio.
Creating Custom Text-to-speech audio using Amazon Polly
Use Amazon Polly to create custom text-to-speech audio. Store the generated audio on AWS S3 with appropriate read permission and with the content-type and content-length information set, so that Twilio can fetch and play it. For details, refer to the post "Amazon Polly Text-to-speech with AWS S3, Twilio Java App" listed under Further Reading.
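To make the content-type and content-length requirement concrete, here is a minimal sketch that computes the headers for an MP3 produced by Polly and the URL under which it would be publicly readable. It uses only the JDK and does not call the AWS SDK. The class name PollyAudioObject is hypothetical, the URL uses the virtual-hosted style (bucket name as host prefix) rather than the path style used elsewhere in this post, and the key is assumed to contain only URL-safe characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper describing how a Polly-generated MP3 should be stored on S3. */
final class PollyAudioObject {

    /** Headers the stored object should carry so that a client such as Twilio can play it. */
    static Map<String, String> headersFor(byte[] mp3Bytes) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", "audio/mpeg");                      // MIME type for MP3 audio
        headers.put("Content-Length", String.valueOf(mp3Bytes.length)); // exact size in bytes
        return headers;
    }

    /** Virtual-hosted-style URL of the object; assumes the object is publicly readable. */
    static String publicUrl(String bucket, String key) {
        return "https://" + bucket + ".s3.amazonaws.com/" + key;
    }
}
```

The values returned by headersFor correspond to the object metadata you would set when uploading the audio, and the result of publicUrl is the kind of URL that is later passed to the Play verb.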
Programming the IVR using TwiML
The following sample IVR instructions do the following:
- Play the query message created using Amazon Polly and stored in S3. The message asks the user to leave a feedback message at the beep and press the star key, “*”, when finished.
- Record the user's message and call an API which invokes AWS Lambda. The Lambda function handles the recording by calling other services such as the following:
  - Amazon Transcribe to convert speech to text and identify the action
  - Amazon Comprehend to perform sentiment analysis and identify the action
  - Amazon DynamoDB to store the information and action updates
- Play the thank-you message stored in S3
- Hang up after the thank-you message is played
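For the recording-handling step in the list above, the part that turns the Transcribe and Comprehend results into an action could look roughly like the sketch below. The sentiment labels (POSITIVE, NEGATIVE, NEUTRAL, MIXED) are the ones Amazon Comprehend returns for sentiment detection. Everything else, including the class FeedbackRouter, the Action values, the 0.8 threshold, and the keyword rule, is a hypothetical example of a pre-defined event and not a rule from any AWS or Twilio service.

```java
import java.util.Locale;

/** Hypothetical rule set that decides what to do with a transcribed feedback message. */
final class FeedbackRouter {

    enum Action { ESCALATE_TO_HUMAN, FOLLOW_UP_LATER, NO_ACTION }

    /**
     * @param sentiment     sentiment label for the transcript, e.g. "NEGATIVE"
     * @param negativeScore confidence of the negative label, between 0 and 1
     * @param transcript    text produced from the recording; may be null
     */
    static Action decide(String sentiment, double negativeScore, String transcript) {
        String text = transcript == null ? "" : transcript.toLowerCase(Locale.ROOT);

        // An explicit request for a person always brings in a human (illustrative keywords).
        if (text.contains("agent") || text.contains("call me back")) {
            return Action.ESCALATE_TO_HUMAN;
        }
        if (sentiment == null) {
            return Action.NO_ACTION;
        }
        switch (sentiment) {
            case "NEGATIVE":
                // Strongly negative feedback is escalated; milder cases are queued.
                return negativeScore >= 0.8 ? Action.ESCALATE_TO_HUMAN : Action.FOLLOW_UP_LATER;
            case "MIXED":
                return Action.FOLLOW_UP_LATER;
            default:
                return Action.NO_ACTION;
        }
    }
}
```

The chosen action is the kind of value that would be stored in DynamoDB alongside the transcript, where it can trigger further Lambda functions.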
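The IVR flow listed at the start of this section (play, record, thank, hang up) can be achieved using the following Java code, which uses the Twilio Java helper library: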
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

import com.twilio.http.HttpMethod;
import com.twilio.rest.api.v2010.account.Call;
import com.twilio.twiml.VoiceResponse;
import com.twilio.twiml.voice.Hangup;
import com.twilio.twiml.voice.Play;
import com.twilio.twiml.voice.Record;
import com.twilio.type.PhoneNumber;

@Override
public void playVoice(String toNumber, String voicePath) throws IOException, URISyntaxException {
    String thankYouPath = "http://s3.amazonaws.com/reminders/thankyou.mp3";
    PhoneNumber to = new PhoneNumber(toNumber);               // Replace with your phone number
    PhoneNumber from = new PhoneNumber(this.fromPhoneNumber); // Replace with a Twilio number

    // Play custom feedback or query message created using Polly
    Play playMessage = new Play.Builder(voicePath).loop(1).build();

    // Record the user input; the recording details are posted to the API backed by AWS Lambda
    // (URL_for_invoking_AWS_Lambda is a placeholder for that API's URL)
    Record recordMessage = new Record.Builder()
            .action(URL_for_invoking_AWS_Lambda)
            .method(HttpMethod.POST)
            .timeout(15)
            .playBeep(true)
            .maxLength(30)
            .finishOnKey("*")
            .build();

    // Play thank you message
    Play playThankYouMessage = new Play.Builder(thankYouPath).build();

    // Hang up
    Hangup hangup = new Hangup.Builder().build();

    // Create a voice response object and corresponding XML
    VoiceResponse response = new VoiceResponse.Builder()
            .play(playMessage)
            .record(recordMessage)
            .play(playThankYouMessage)
            .hangup(hangup)
            .build();
    String xml = response.toXml();

    // Store the voice XML on AWS S3
    InputStream xmlStream = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    String s3Key = UUID.randomUUID().toString() + ".xml";
    String url = this.awsCloudStorage.uploadStream(s3Key, xmlStream, "text/xml");

    // Make the call using programmed voice stored as XML on AWS S3
    Call call = Call.creator(to, from, new URI(url)).setMethod(HttpMethod.GET).create(this.twilioRestClient);
    System.out.println(call.toString());
}

The following is the sample TwiML file which gets created and stored on S3. Twilio fetches and executes it when it places the outgoing call to the target user.
<Response>
  <Play loop="1">URL_of_Polly_audio_on_S3</Play>
  <Record action="API_url_invoking_AWS_Lambda" method="POST" finishOnKey="*" maxLength="30" playBeep="true" timeout="15" trim="trim-silence"/>
  <Play>http://s3.amazonaws.com/reminders/thankyou.mp3</Play>
  <Hangup/>
</Response>

Further Reading / References
- Amazon Polly
- AWS Java SDK
- TwiML For Programmable Voice
- Twilio Java Helper library
- Twilio Java on Github
- Amazon Polly Text-to-speech with AWS S3, Twilio Java App
Summary
In this post, you learned about building an intelligent IVR system using AWS services such as Polly, Transcribe, Comprehend, Lambda, DynamoDB, and S3, together with Twilio Programmable Voice APIs.
Did you find this article useful? Do you have any questions or suggestions about this article in relation to building IVR using Amazon and Twilio services? Leave a comment and ask your questions and I shall do my best to address your queries.