How to Build a Speech-to-text Conversion Platform

This article explores the technology landscape that can be used to build speech-to-text platforms and service offerings of this kind.

First and foremost, congratulations to the team for leveraging existing cloud-based AI and speech recognition (speech-to-text conversion) technologies to come up with a set of business offerings that create great value for businesses. The founding team (IIT KGP alumni Subodh Kumar, Sanjeev Kumar and Kishore Mundra) nailed it: doing the right thing at the right time in the right place. The platform enables developers to convert speech to text using powerful neural network models with exceptional accuracy and minimal latency. At this point, the platform supports 9 languages: Hindi, English, Bengali, Gujarati, Telugu, Tamil, Marathi, Punjabi and Kannada.

The platform's technologies appear to focus on the following areas:

  • Speech analytics
  • Voice keyboard
  • Assistant
  • Customer care automation

Business Use Cases for Speech-to-text Conversion

The following are some of the business use cases for speech-to-text conversion technology:

  • Voice input: Let consumers talk to your website to search for products and services. This could be very useful for ecommerce websites.
  • Conversational speech-to-text: Audio files are converted into text files. This can be very useful for extracting intelligence from voice captured during customer care phone calls. Millions of customers call customer care centres to get issues resolved, and the calls are recorded. Imagine this service tagging each recorded call. The tag information can then be used for purposes such as the following:
    • Segmentation, such as issue classification, customer types etc.
    • Customer churn
    • Rewards and recognition for customer care executives
    • Product feature identification
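
To make the call-tagging idea concrete, here is a minimal sketch in Python. It assumes the recorded call has already been converted to a text transcript, and it uses simple keyword matching. The category names and keyword lists are hypothetical and chosen only for illustration; a production system would more likely use a trained classifier.

```python
from collections import Counter

# Hypothetical keyword lists per issue category (illustrative only).
# A real system would likely learn these from labelled calls.
ISSUE_KEYWORDS = {
    "billing": {"bill", "refund", "charge", "invoice"},
    "delivery": {"delivery", "late", "courier", "shipment"},
    "churn_risk": {"cancel", "unsubscribe", "competitor"},
}


def tag_transcript(transcript: str) -> list[str]:
    """Return the sorted tags whose keywords appear in the transcript."""
    # Lower-case each word and strip common trailing punctuation.
    words = {w.strip(".,!?").lower() for w in transcript.split()}
    return sorted(tag for tag, kws in ISSUE_KEYWORDS.items() if words & kws)


def tag_counts(transcripts: list[str]) -> Counter:
    """Count how many transcripts carry each tag (useful for segmentation)."""
    counts = Counter()
    for transcript in transcripts:
        counts.update(tag_transcript(transcript))
    return counts
```

Counting tags across many calls with `tag_counts` gives a rough picture of which issue types dominate, which is the kind of segmentation described above.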

How to Build Similar Platforms

The following are some of the key building blocks of a platform that leverages speech-to-text conversion technology:

  • Voice capture
    • A way to capture audio data in real time
  • Speech-to-text conversion
    • Convert streaming speech in real time; this would be useful when consumers call out product names during search, or when customers call out commands to the software
    • Convert batches of long audio files to text asynchronously; this would be useful for recorded audio such as customer care calls
  • API-based integrations
    • APIs for converting audio to text by applying neural network models
  • Deep learning algorithms to recognize the speech and convert it to text
  • App for doing some of the following:
    • Access the audio files
    • Analytics reports

All of the above can be achieved using the following:

  • Web / mobile app for accessing audio files / analytics reports
  • Integration with Google Cloud Speech API to achieve speech-to-text conversion
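
As a rough illustration of the integration step, the sketch below builds a request for the Google Cloud Speech REST endpoint `speech:recognize` (synchronous recognition of short audio) using only the Python standard library. It assumes the audio is raw 16-bit PCM (`LINEAR16`) at 16 kHz and that authentication is done with an API key passed as a query parameter; both are assumptions to adapt to your own setup, and the function names are illustrative.

```python
import base64
import json
import urllib.request

RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_body(audio_bytes: bytes, language_code: str = "hi-IN",
                         sample_rate_hz: int = 16000) -> dict:
    """Build the JSON body for a synchronous recognize request."""
    return {
        "config": {
            "encoding": "LINEAR16",          # assumption: raw 16-bit PCM audio
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
        },
        # Inline audio must be sent base64-encoded.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


def recognize(audio_bytes: bytes, api_key: str, **config) -> list[str]:
    """Send the request and return the top transcript of each result."""
    body = json.dumps(build_recognize_body(audio_bytes, **config)).encode("utf-8")
    request = urllib.request.Request(
        f"{RECOGNIZE_URL}?key={api_key}",    # assumption: API-key authentication
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [r["alternatives"][0]["transcript"] for r in payload.get("results", [])]
```

A web or mobile app would call something like `recognize(audio, api_key, language_code="en-IN")` from its backend and store the returned text alongside the audio file for later analytics.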

Google Cloud Speech API

Among the languages spoken in India, Google Cloud Speech API supports speech-to-text conversion for the following, the same nine languages that the platform supports:

  • Hindi
  • English
  • Bengali
  • Gujarati
  • Telugu
  • Tamil
  • Marathi
  • Punjabi
  • Kannada
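
When calling the API, each language is selected with a BCP-47 language tag. The mapping below is a small sketch of how an application might look up the tag for each of the nine languages. The tags shown are the ones I believe Google uses for the India locales; check them against Google's supported-languages list before relying on them, the Punjabi tag in particular.

```python
# Language name -> BCP-47 tag passed as "languageCode" in the request.
# Assumption: tags follow Google's India locale naming; verify before use.
LANGUAGE_CODES = {
    "hindi": "hi-IN",
    "english": "en-IN",
    "bengali": "bn-IN",
    "gujarati": "gu-IN",
    "telugu": "te-IN",
    "tamil": "ta-IN",
    "marathi": "mr-IN",
    "punjabi": "pa-Guru-IN",
    "kannada": "kn-IN",
}


def language_code(name: str) -> str:
    """Look up the language tag for a language name, case-insensitively."""
    try:
        return LANGUAGE_CODES[name.strip().lower()]
    except KeyError:
        raise ValueError(f"Unsupported language: {name!r}") from None
```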

The following are some of the salient features of Google Cloud Speech API:

  • Speech-to-text conversion in real time (streaming recognition) based on deep learning models
  • Greater accuracy in noisy environments
  • Context-aware recognition; very useful for suggesting expected words to the recognizer (word hints)
  • Easy-to-integrate APIs with support for REST and gRPC based integrations
  • Asynchronous audio processing for long audio files
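
Two of these features, word hints and asynchronous processing, can be sketched together. The example below builds the JSON body for the REST endpoint `speech:longrunningrecognize`, which takes audio stored in Google Cloud Storage, and passes word hints through the `speechContexts` field. The function name, bucket name and phrases are hypothetical. The sketch assumes FLAC or WAV files, for which the encoding and sample rate can be read from the file header and so are left out of the config.

```python
LONGRUNNING_URL = "https://speech.googleapis.com/v1/speech:longrunningrecognize"


def build_longrunning_body(gcs_uri: str, language_code: str,
                           phrases: tuple[str, ...] = ()) -> dict:
    """Build the JSON body for an asynchronous (long-running) request."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError("Long audio must be referenced by a gs:// URI")
    # Assumption: FLAC/WAV input, so encoding and sample rate are omitted.
    config = {"languageCode": language_code}
    if phrases:
        # Word hints: phrases the recognizer should favour, such as product names.
        config["speechContexts"] = [{"phrases": list(phrases)}]
    return {"config": config, "audio": {"uri": gcs_uri}}


# Hypothetical bucket and product names, for illustration only.
example_body = build_longrunning_body(
    "gs://example-call-recordings/call-0001.flac",
    "en-IN",
    phrases=("prepaid plan", "data pack"),
)
```

The API responds to such a request with an operation name rather than a transcript; the caller then polls that operation until the result is ready.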

And all of the above comes at reasonable pricing from Google:

Monthly Usage            Price per 15 seconds
0-60 minutes             Free
61-1,000,000 minutes     $0.006
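
To see what these prices mean in practice, here is a small worked calculation in Python. It is a simplification: it treats the month's usage as a single total and rounds that total up to the next 15 seconds, whereas actual billing may round each request separately. The function name is illustrative.

```python
import math

FREE_MINUTES = 60               # first 60 minutes each month are free
INCREMENT_SECONDS = 15          # usage is priced per 15 seconds
PRICE_MILLS_PER_INCREMENT = 6   # $0.006, kept as an integer to avoid float drift


def monthly_cost_usd(total_seconds: int) -> float:
    """Estimate the monthly cost in USD for a given total audio duration."""
    billable_seconds = max(0, total_seconds - FREE_MINUTES * 60)
    increments = math.ceil(billable_seconds / INCREMENT_SECONDS)
    return increments * PRICE_MILLS_PER_INCREMENT / 1000


# 1,000 minutes in a month: 940 billable minutes = 3,760 increments of 15 seconds.
print(monthly_cost_usd(1000 * 60))  # 22.56
```

In other words, beyond the free tier the price works out to $0.024 per minute of audio.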

Other Cloud Speech APIs (Azure, AWS)

One can also try other cloud speech APIs, such as those offered by Microsoft Azure and AWS.

If you would like to share your thoughts on using Google or other cloud speech APIs to build speech-to-text conversion platforms, please feel free to do so.

Ajitesh Kumar