This article explores the technologies that can be used to build platform and service offerings similar to those of Liv.ai.
First and foremost, congratulations to the Liv.ai team. They used existing cloud-based AI and speech recognition (speech-to-text conversion) technologies to build a set of business offerings that create great value for businesses. The founding team (IIT KGP alumni Subodh Kumar, Sanjeev Kumar and Kishore Mundra) nailed it: doing the right thing at the right time in the right place.
Liv.ai enables developers to convert speech to text using powerful neural network models with exceptional accuracy and minimal latency. At the time of writing, the platform supports nine languages: Hindi, English, Bengali, Gujarati, Telugu, Tamil, Marathi, Punjabi and Kannada.
Liv.ai appears to be focusing on the following areas:
- Speech analytics
- Voice keyboard
- Customer care automation
Business Use Cases for Speech-to-Text Conversion
The following are some business use cases for speech-to-text conversion technology:
- Voice input: Let consumers talk to your website to search for products and services. This could be very useful for e-commerce websites.
- Conversational speech-to-text: Audio files are converted into text. This can be very useful for extracting intelligence from customer care phone calls. Millions of customers call customer care centres to resolve issues, and these calls are recorded. Imagine a service that transcribes and tags each recorded call. The tags could then be used for purposes such as the following:
  - Segmentation, such as issue classification, customer types, etc.
  - Customer churn analysis
  - Rewards and recognition for customer care executives
  - Product feature identification
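The sketch below makes the call-tagging idea concrete. It is a minimal Python example that assumes the calls have already been transcribed. The tag names and keyword lists in `TAG_RULES` are hypothetical examples. Plain substring matching is deliberately naive; a production system would more likely use a trained text classifier.

```python
from collections import Counter

# Hypothetical tag rules: a tag fires if any of its keywords appears in a transcript.
TAG_RULES: dict[str, tuple[str, ...]] = {
    "billing_issue": ("refund", "charged", "invoice", "bill"),
    "delivery_issue": ("delivery", "not delivered", "tracking"),
    "churn_risk": ("cancel", "close my account", "switch to"),
}


def tag_transcript(transcript: str,
                   rules: dict[str, tuple[str, ...]] = TAG_RULES) -> set[str]:
    """Return every tag whose keywords occur in the transcript (naive substring match)."""
    text = transcript.lower()
    return {tag for tag, keywords in rules.items() if any(k in text for k in keywords)}


def summarize_tags(transcripts: list[str]) -> Counter:
    """Count how many calls received each tag, e.g. for segmentation reports."""
    counts: Counter = Counter()
    for transcript in transcripts:
        counts.update(tag_transcript(transcript))
    return counts
```

Counting tags across many calls gives a rough segmentation by issue type. A rising share of `churn_risk` calls could then feed a churn analysis.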
How to Build Liv.ai-like Platforms
The following are some of the key building blocks of a speech-to-text platform like Liv.ai:
- Voice capture
  - A way to capture audio data in real time
- Speech-to-text conversion
  - Convert streaming speech to text in real time. This is useful when consumers speak product names during a search, or speak commands to the software.
  - Convert batches of long audio files to text asynchronously. This is useful for transcribing recorded customer care calls.
- API-based integrations
  - APIs for converting audio to text by applying neural network models
  - Deep learning algorithms to recognize speech and convert it to text
- An app for the following:
  - Accessing the audio files
  - Viewing analytics reports
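For the real-time path, audio is typically sent to the recognizer as a stream of small chunks rather than as one file. The sketch below shows only that chunking step. It assumes 16 kHz, 16-bit mono PCM (LINEAR16) audio and 100 ms chunks; both are illustrative choices, not requirements stated here. Each chunk would then be passed to the speech service's streaming interface, which is not shown.

```python
from typing import Iterator

# Assumed audio format (not specified in the article): 16 kHz, 16-bit, mono LINEAR16 PCM.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2


def chunk_pcm(audio: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive chunks of chunk_ms milliseconds of raw PCM audio."""
    chunk_size = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    if chunk_size <= 0:
        raise ValueError("chunk_ms is too small for the assumed sample rate")
    for start in range(0, len(audio), chunk_size):
        # The final chunk is shorter if the audio length is not a multiple of chunk_size.
        yield audio[start:start + chunk_size]
```

With this format, one second of audio is 32,000 bytes, so `list(chunk_pcm(bytes(32_000)))` produces ten chunks of 3,200 bytes each.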
All of the above can be achieved using the following:
- A web or mobile app for accessing audio files and analytics reports
- Integration with the Google Cloud Speech API for speech-to-text conversion
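Here is a minimal sketch of the Google Cloud Speech API integration. The Python code builds a request for the v1 REST `speech:recognize` method and extracts the transcript from the JSON response. It assumes LINEAR16 audio at 16 kHz and the `en-IN` language code. It also assumes authentication with an API key passed as the `key` query parameter; production setups often use service-account OAuth credentials instead. The helper names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request

# v1 REST method for synchronous recognition of short audio clips.
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(audio: bytes, api_key: str, language_code: str = "en-IN",
                            sample_rate_hz: int = 16000) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying base64-encoded audio."""
    body = {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
        },
        "audio": {"content": base64.b64encode(audio).decode("ascii")},
    }
    return urllib.request.Request(
        f"{RECOGNIZE_URL}?{urllib.parse.urlencode({'key': api_key})}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_transcript(response: dict) -> str:
    """Join the top-ranked alternative of each result into one transcript."""
    parts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            parts.append(alternatives[0].get("transcript", "").strip())
    return " ".join(p for p in parts if p)
```

You would send the request with `urllib.request.urlopen(request)` and pass the decoded JSON body to `extract_transcript` to get the recognized text.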
Google Cloud Speech API
The following are some of the salient features of the Google Cloud Speech API:
- Real-time speech-to-text conversion (streaming recognition) based on deep learning models
- High accuracy, even in noisy environments
- Context-aware recognition: word and phrase hints help the recognizer favour expected terms, such as product names
- Easy-to-integrate APIs, with support for both REST and gRPC
- Asynchronous processing of long audio files
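Two of these features map directly onto request fields in the v1 REST API. Phrase hints go in `speechContexts`, and long audio goes to `speech:longrunningrecognize` instead of `speech:recognize`. The sketch below assumes a 60-second cut-off for synchronous requests; check Google's current limits before relying on that figure. The asynchronous method returns an operation whose result is fetched later from the `operations` endpoint. Long files can be referenced by a Cloud Storage `gs://` URI rather than sent inline. The helper names are hypothetical.

```python
BASE_URL = "https://speech.googleapis.com/v1"
# Assumed cut-off for synchronous requests; verify against Google's current limits.
SYNC_LIMIT_SECONDS = 60


def build_config(language_code: str, phrase_hints: list[str]) -> dict:
    """RecognitionConfig JSON with phrase hints (speechContexts) for expected words."""
    return {
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "languageCode": language_code,
        "speechContexts": [{"phrases": phrase_hints}],
    }


def choose_endpoint(duration_seconds: float) -> str:
    """Short clips use synchronous recognition; longer audio uses the long-running method."""
    if duration_seconds <= SYNC_LIMIT_SECONDS:
        return f"{BASE_URL}/speech:recognize"
    return f"{BASE_URL}/speech:longrunningrecognize"


def operation_url(operation_name: str) -> str:
    """URL to poll for the result of a long-running recognition operation."""
    return f"{BASE_URL}/operations/{operation_name}"
```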
All of the above comes at reasonable pricing from Google:
| Monthly usage | Price per 15 seconds |
|---|---|
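The pricing is quoted per 15 seconds of audio, so a rough monthly estimate comes down to counting 15-second units. The sketch below assumes each request is rounded up to the next 15-second increment. It uses a single flat price supplied by the caller, ignores usage tiers and any free allowance, and does not hard-code any actual Google price.

```python
import math
from typing import Iterable

# Assumption: each request is billed in 15-second increments, rounded up.
BILLING_INCREMENT_SECONDS = 15


def billable_units(durations_seconds: Iterable[float]) -> int:
    """Number of 15-second units billed for a set of requests."""
    return sum(math.ceil(d / BILLING_INCREMENT_SECONDS) for d in durations_seconds)


def estimate_cost(durations_seconds: Iterable[float], price_per_unit: float) -> float:
    """Flat-rate estimate; ignores usage tiers and any free allowance."""
    return billable_units(durations_seconds) * price_per_unit
```

For example, requests of 10, 15 and 16 seconds count as 1, 1 and 2 units, for 4 units in total.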
Other Cloud Speech APIs (Azure, AWS)
One can also try other cloud speech APIs, such as those offered by Microsoft Azure and Amazon Web Services (AWS).
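If you want to compare Google with Azure or AWS, it helps to keep the rest of the platform independent of any single provider. Below is a minimal sketch built around a hypothetical `SpeechToText` interface with a single `transcribe` method. Each provider would get a small adapter class. `FakeTranscriber` stands in for one here so the code runs without cloud credentials.

```python
from typing import Protocol


class SpeechToText(Protocol):
    """Hypothetical provider-neutral interface for a speech-to-text backend."""

    def transcribe(self, audio: bytes, language_code: str) -> str: ...


class FakeTranscriber:
    """Stand-in backend returning canned text; a real adapter would call Google, Azure or AWS."""

    def __init__(self, canned_text: str) -> None:
        self.canned_text = canned_text

    def transcribe(self, audio: bytes, language_code: str) -> str:
        return self.canned_text


def transcribe_calls(engine: SpeechToText, calls: dict[str, bytes],
                     language_code: str = "en-IN") -> dict[str, str]:
    """Transcribe every recorded call with whichever backend is plugged in."""
    return {call_id: engine.transcribe(audio, language_code) for call_id, audio in calls.items()}
```

Swapping providers then means writing one new adapter, while the tagging and reporting code stays unchanged.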
If you would like to share your thoughts on using Google or other cloud speech APIs to build speech-to-text platforms such as Liv.ai, please feel free to do so.