Combine Amazon Translate with Elasticsearch and Skedler to build a cost-efficient multi-lingual omnichannel customer care – Part 2 of 2

In the previous post, we presented a system architecture that translates text from multiple languages with AWS Translate, indexes this information in Elasticsearch 6.2.3 for fast search, visualizes the data with Kibana, and automates the delivery of customized intelligence with Skedler Reports and Alerts.

In this post, we are going to see how to implement the previously described architecture.

The main steps are:

  • Define API Gateway HTTP endpoints
  • Build an AWS Lambda function
  • Deploy to AWS with the Serverless Framework
  • Translate text with AWS Translate
  • Index to Elasticsearch 6.2.3
  • Search in Elasticsearch by language – full-text search
  • Visualize, report and monitor with Kibana dashboards
  • Use Skedler Reports and Alerts for reporting, monitoring and alerting

We are going to define two HTTP API methods, one to translate and index new inquiries and another one to search for them.  We will use AWS API Gateway, a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale.

The core of our solution will receive the inquiry (as a string) to be translated, translate it and index the text with the translations to Elasticsearch.

We will use AWS Lambda. It lets you run code without provisioning or managing servers. You pay only for the compute time you consume – there is no charge when your code is not running.  With Lambda, you can run code for virtually any type of application or backend service – all with zero administration.  Just upload your code and Lambda takes care of everything required to run and scale your code with high availability. You can set up your code to automatically trigger from other AWS services or call it directly from any web or mobile app.

To deploy our AWS Lambda function and the API Gateway HTTP endpoints, we will use the Serverless Framework.  Serverless is a toolkit for deploying and operating serverless architectures.

1. API Gateway

We are going to configure the following HTTP endpoints:

  • HTTP POST /createTicket
  • HTTP GET /searchTicket

The createTicket endpoint will be used to translate the text using AWS Translate and to index the document in Elasticsearch. The searchTicket endpoint will be used to search for documents in different languages.  The handler for each endpoint will be an AWS Lambda function.
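
For example, a createTicket request will carry a JSON body like the following (the field names match what our Lambda handler reads; the values are purely illustrative):

{
  "text": "My order arrived damaged",
  "customerCode": "C-1042",
  "country": "IT"
}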

Below is the serverless.yml section where we have defined the two endpoints.

functions:
  create:
    handler: handler.create_ticket
    events:
      - http:
          path: createTicket
          method: post
  search:
    handler: handler.search_ticket
    events:
      - http:
          path: searchTicket
          method: get

2. AWS Lambda

Once we have defined the two endpoints, we need to write the Lambda function. We will not focus on deploying the function to AWS; the Serverless Framework will take care of that. The function we are going to write will perform the following:

  • Get the input that needs to be translated (the customer inquiry)
  • Invoke AWS Translate and get the translations of the input
  • Create the Elasticsearch document
  • Index the document

def create_ticket(event, context):
    body = json.loads(event['body'])
    text = body['text']
    customer_code = body['customerCode']
    country = body['country']

First, we detect the dominant language of the input text with AWS Comprehend:

import json
import os

import boto3

translate_client = boto3.client('translate')
comprehend_client = boto3.client('comprehend')

def create_ticket(event, context):
    body = json.loads(event['body'])
    text = body['text']
    customer_code = body['customerCode']
    country = body['country']

    # Languages we translate into, e.g. 'ar zh fr de pt es'
    target_languages = os.environ['AWS_TRANSLATE_SUPPORTED_LANGUAGES'].split()

    # Keep the most likely language detected by AWS Comprehend
    dominant_language = comprehend_client.detect_dominant_language(
        Text=text
    )['Languages'][0]['LanguageCode']

Then we invoke AWS Translate:

def get_translation(text, source_dominant_language, target_dominant_language):
    return translate_client.translate_text(
        Text=text,
        SourceLanguageCode=source_dominant_language,
        TargetLanguageCode=target_dominant_language
    )['TranslatedText']

Create the JSON document and index it to Elasticsearch:

import uuid
from datetime import datetime

from elasticsearch import Elasticsearch

es = Elasticsearch(
    [os.environ['ELASTICSEARCH_HOST']],
    verify_certs=False
)

def index_new_document(english_text, translations, customer_code, country):
    # Build one nested object per target language
    result_nested_obj = []
    for key, value in translations.items():
        result_nested_obj.append({"language": key, "text": value})

    doc = {
        "text": english_text,
        "language": "en",
        "translations": result_nested_obj,
        "timestamp": datetime.now(),
        "customer_code": customer_code,
        "country": country,
        "ticket_number": str(uuid.uuid4())
    }

    es.index(
        index=os.environ['ELASTICSEARCH_INDEX_NAME'],
        doc_type=os.environ['ELASTICSEARCH_TYPE_NAME'],
        body=doc
    )
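
The fragments above cover parsing, language detection, translation, and indexing. Here is one way the rest of the create_ticket body could tie them together. This is a minimal sketch, assuming we normalize every inquiry to English before indexing (which matches the document above, whose language field is always 'en') and return a standard Lambda proxy response:

    # ...continuing inside create_ticket, after dominant_language is detected
    english_text = text if dominant_language == 'en' \
        else get_translation(text, dominant_language, 'en')

    # Translate the (now English) text into every supported target language
    translations = {
        lang: get_translation(english_text, 'en', lang)
        for lang in target_languages
    }

    index_new_document(english_text, translations, customer_code, country)

    return {'statusCode': 200, 'body': json.dumps({'status': 'created'})}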

As you may have noticed, we used environment variables. They are defined in the serverless.yml configuration file.

provider:
  name: aws
  runtime: python3.6
  region: eu-west-1
  memorySize: 1024
  timeout: 300
  environment:
    AWS_TRANSLATE_SUPPORTED_LANGUAGES: 'ar zh fr de pt es' # supported AWS Translate languages
    ELASTICSEARCH_HOST: 'https://yourElasticsearchHost'
    ELASTICSEARCH_INDEX_NAME: 'customercare'
    ELASTICSEARCH_TYPE_NAME: 'ticket'

3. Deploy

We are now ready to deploy our code to AWS.

This is how the complete serverless.yml looks:

service: awstranslate

provider:
  name: aws
  runtime: python3.6
  region: eu-west-1
  memorySize: 1024
  timeout: 300
  environment:
    AWS_TRANSLATE_SUPPORTED_LANGUAGES: 'ar zh fr de pt es' # supported AWS Translate languages
    ELASTICSEARCH_HOST: 'https://yourElasticsearchHost'
    ELASTICSEARCH_INDEX_NAME: 'customercare'
    ELASTICSEARCH_TYPE_NAME: 'ticket'

functions:
  create:
    handler: handler.create_ticket
    events:
      - http:
          path: createTicket
          method: post
          cors: true
  search:
    handler: handler.search_ticket
    events:
      - http:
          path: searchTicket
          method: get

plugins:
  - serverless-python-requirements

custom:
  pythonRequirements:
    dockerizePip: false

We specified the provider (AWS), the runtime (Python 3.6), the environment variables, our HTTP endpoints and the AWS Lambda function handlers.

Deploy to AWS:

serverless deploy --aws-s3-accelerate
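
Serverless prints the generated API Gateway URLs at the end of the deploy. We can then smoke-test the endpoint with a few lines of Python (the URL below is a placeholder; substitute the one from your deploy output):

import json
import urllib.request

url = 'https://your-api-id.execute-api.eu-west-1.amazonaws.com/dev/createTicket'

payload = json.dumps({
    'text': 'My order arrived damaged',
    'customerCode': 'C-1042',
    'country': 'IT'
}).encode('utf-8')

request = urllib.request.Request(url, data=payload, headers={'Content-Type': 'application/json'})
with urllib.request.urlopen(request) as response:
    print(response.status, response.read().decode())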

4. Index to Elasticsearch

Given an inquiry, we now have a list of translations that we want to index to Elasticsearch 6.2.3.
We create a new index called customercare and a new type called ticket.

The ticket type will have the following properties:

  • text: the English text
  • language: the language of the text
  • country: the country from where we received the inquiry
  • customer_code: the code of the customer who submitted the inquiry
  • ticket_number: an ID generated to uniquely identify an inquiry
  • timestamp: index time
  • translations: list of the translations (text and language)
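
For reference, an indexed ticket will look roughly like this (values are illustrative):

{
  "text": "My order arrived damaged",
  "language": "en",
  "country": "IT",
  "customer_code": "C-1042",
  "ticket_number": "0f8fad5b-d9cb-469f-a165-70867728950e",
  "timestamp": "2018-05-14T10:22:31",
  "translations": [
    { "language": "fr", "text": "Ma commande est arrivée endommagée" },
    { "language": "es", "text": "Mi pedido llegó dañado" }
  ]
}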

PUT /customercare
{
  "mappings": {
    "ticket": {
      "properties": {
        "text": { "type": "text" },
        "language": { "type": "keyword" },
        "country": { "type": "keyword" },
        "customer_code": { "type": "keyword" },
        "ticket_number": { "type": "keyword" },
        "timestamp": { "type": "date" },
        "translations": {
          "type": "nested",
          "properties": {
            "text": { "type": "text" },
            "language": { "type": "keyword" }
          }
        }
      }
    }
  }
}

5. Search in Elasticsearch

Now that we indexed the data in Elasticsearch, we can perform some queries to search in a multi-lingual way.

Examples:

Full-text search through translations:

GET customercare/_search
{
  "query": {
    "nested": {
      "path": "translations",
      "query": {
        "match": {
          "translations.text": "your text"
        }
      }
    }
  }
}

Full-text search through English text and translations:

GET customercare/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "translations",
            "query": {
              "match": {
                "translations.text": "tree"
              }
            }
          }
        },
        {
          "match": {
            "text": "tree"
          }
        }
      ]
    }
  }
}

Number of inquiries per customer (combined with a full-text search):

GET customercare/_search
{
  "aggs": {
    "inquiries_per_customer": {
      "terms": {
        "field": "customer_code"
      }
    }
  },
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "text": "tree"
          }
        }
      ]
    }
  }
}
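
The search_ticket handler wired up in serverless.yml is not shown above; it can simply wrap queries like these. A minimal sketch, reusing the es client and imports defined earlier in handler.py, and assuming hypothetical q and language query-string parameters:

def search_ticket(event, context):
    params = event.get('queryStringParameters') or {}
    text = params.get('q', '')
    language = params.get('language')

    # Full-text search across the nested translations
    inner_query = {'match': {'translations.text': text}}
    if language:
        # Optionally restrict the match to a single language
        inner_query = {
            'bool': {
                'must': [{'match': {'translations.text': text}}],
                'filter': [{'term': {'translations.language': language}}]
            }
        }

    result = es.search(
        index=os.environ['ELASTICSEARCH_INDEX_NAME'],
        body={'query': {'nested': {'path': 'translations', 'query': inner_query}}}
    )
    return {'statusCode': 200, 'body': json.dumps(result['hits']['hits'])}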

6. Visualize, Report, and Monitor with Kibana dashboards and search

With Kibana, you can create a set of visualizations/dashboards to search for inquiries by language and to monitor index metrics (like the number of translations, or the number of translations per customer).

Examples of Kibana dashboards:

Top languages, language of inquiries by customer, and geolocation of inquiries:

Inquiries count by language and customer, and top customers by language:

7. Use Skedler Reports and Alerts to easily monitor data

Using Skedler, an easy-to-use report scheduling and distribution application for Elasticsearch-Kibana-Grafana, you can centrally schedule and distribute custom reports from Kibana dashboards and saved searches as hourly/daily/weekly/monthly PDF, XLS or PNG reports to various stakeholders. If you want to read more about it, see the Skedler Overview.

We have created a custom report using Skedler Report Templates that provides an overview of the tickets based on languages and countries of origin.  The custom report generated by Skedler is shown below:


If you want to get notified when something happens in your index, for example, a certain entity is detected or the number of negative feedback by customers crosses a threshold value, you can use Skedler Alerts. It simplifies how you create and manage alert rules for Elasticsearch and it provides a flexible approach to notifications (it supports multiple notifications, from Email to Slack and Webhook).

We have seen how to schedule report generation. We are now going to see how to use Skedler Alerts to get notified when something happens in our index, for example, when the number of inquiries from a specific country hits a certain threshold.

Choose the alert condition. For example: “the number of tickets in English must be higher than zero”

or “the number of tickets in English coming from Italy and containing a given word must be higher than zero”.

This is how the Skedler Alert notification looks in Slack:

Conclusion

In this two-part blog series, we learnt how to build our own multi-lingual omnichannel customer care platform using AWS Translate, Elasticsearch, and Skedler. Let us know your thoughts about this approach. Send your comments to hello at skedler dot com.

How to Extract Business Insights from Audio Using AWS Transcribe, AWS Comprehend and Elasticsearch – Part 2 of 2

In the previous post, we presented a system architecture to convert audio and voice into written text with AWS Transcribe, extract useful information for quick understanding of content with AWS Comprehend, index this information in Elasticsearch 6.2 for fast search and visualize the data with Kibana 6.2.

In this post, we are going to see how to implement the previously described architecture.
The main steps performed in the process are:

  1. Configure S3 Event Notification
  2. Consume messages from Amazon SQS queue
  3. Convert the recording to text with AWS Transcribe
  4. Entities/key phrases/sentiment detection using AWS Comprehend
  5. Index to Elasticsearch 6.2
  6. Search in Elasticsearch by entities/sentiment/key phrases/customer
  7. Visualize, report and monitor with Kibana dashboards
  8. Use Skedler Reports and Alerts for reporting, monitoring and alerting

1. Configure S3 Event Notification

When a new recording has been uploaded to the S3 bucket, a message will be sent to an Amazon SQS queue.

You can read more information on how to configure the S3 Bucket and read the queue programmatically here: Configuring Amazon S3 Event Notifications.
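
If you prefer to configure the notification programmatically instead of through the console, a sketch like the following should work (the bucket name and queue ARN are placeholders; the queue's access policy must also allow S3 to send messages):

import boto3

s3_client = boto3.client('s3')

# Send an event to the SQS queue whenever an object is created in the bucket
s3_client.put_bucket_notification_configuration(
    Bucket='your_bucket',
    NotificationConfiguration={
        'QueueConfigurations': [
            {
                'QueueArn': 'arn:aws:sqs:eu-west-1:123456789012:your-queue',
                'Events': ['s3:ObjectCreated:*']
            }
        ]
    }
)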

This is how a notification message from S3 looks. The information we need is the object key and the bucket name.

{
  "Records": [
    {
      "eventVersion": "2.0",
      "eventSource": "aws:s3",
      "eventName": "ObjectCreated:Put",
      "requestParameters": { "sourceIPAddress": "xxx.xxx.xx.xx" },
      "s3": {
        "s3SchemaVersion": "1.0",
        "configurationId": "ev",
        "bucket": {
          "name": "your_bucket",
          "arn": "arn:aws:s3:::your_bucket"
        },
        "object": {
          "key": "my_new_recording.mp3",
          "size": 567
        }
      }
    }
  ]
}

2. Consume messages from Amazon SQS queue

Now that the S3 bucket has been configured, a notification will be sent to the SQS queue when a recording is uploaded to the bucket. We are going to build a consumer that will perform the following operations:

  • Start a new AWS Transcribe transcription job
  • Check the status of the job
  • When the job is done, perform text analysis with AWS Comprehend
  • Index the results to Elasticsearch

With this code, you can read the messages from an SQS queue, fetch the bucket and key (used in S3) of the uploaded document, and use them to invoke AWS Transcribe for the speech-to-text task:

import json
import time

import boto3

AWS_ACCESS_KEY = 'yourAWSAccessKey'
AWS_SECRET_ACCESS_KEY = 'yourAWSSecretAccessKey'
AWS_REGION = 'yourAWSRegion'
SQS_QUEUE_NAME = 'SQS_QUEUE_NAME'

sqs_resource_connection = boto3.resource(
    'sqs',
    aws_access_key_id=AWS_ACCESS_KEY,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    region_name=AWS_REGION
)

queue = sqs_resource_connection.get_queue_by_name(QueueName=SQS_QUEUE_NAME)

while True:
    messages = queue.receive_messages(MaxNumberOfMessages=1, WaitTimeSeconds=5)

    for message in messages:
        body = json.loads(message.body)

        # Bucket name and object key from the S3 event notification
        key_name = body['Records'][0]['s3']['object']['key']
        bucket_name = body['Records'][0]['s3']['bucket']['name']
        object_url = f'https://s3.amazonaws.com/{bucket_name}/{key_name}'

        # Start the AWS Transcribe transcription job
        # Check job status
        # Run text analysis
        # Index to Elasticsearch

        message.delete()

    time.sleep(10)

3. AWS Transcribe – Start Transcription Job

Once we have consumed an S3 message and have the URL of the newly uploaded recording, we can start a new (asynchronous) transcription job to perform the speech-to-text task.

We are going to use the start_transcription_job method. It takes a job name, the S3 URL, and the media format as parameters.

To use the AWS Transcribe API, make sure your AWS Python SDK (Boto3) is up to date:

pip install boto3 --upgrade

import boto3

client_transcribe = boto3.client(
    'transcribe',
    region_name='us-east-1' # service still in preview
)

def start_transcribe_job(job_name, media_file_uri):
    response = client_transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        LanguageCode='en-US', # TODO: use a parameter when more languages become available
        MediaFormat='mp3', # feel free to change it
        Media={
            'MediaFileUri': media_file_uri
        }
    )
    return response['TranscriptionJob']['TranscriptionJobName']

Read more details here: Python Boto3 AWS Transcribe.

3a. AWS Transcribe – Check Job Status

Due to the asynchronous nature of the transcription job (it could take a while depending on the length and complexity of your recordings), we need to poll the job status.

Once the status is “COMPLETED”, we can retrieve the result of the job (the text converted from the recording).

import json
import time
import urllib.request

def get_transcribe_job_response(job_name):
    job_status = 'IN_PROGRESS'

    while job_status == 'IN_PROGRESS':
        job = client_transcribe.get_transcription_job(
            TranscriptionJobName=job_name
        )
        job_status = job['TranscriptionJob']['TranscriptionJobStatus']
        time.sleep(5)

    if job_status == 'FAILED':
        raise Exception(f'Job {job_name} failed')
    elif job_status == 'COMPLETED':
        # The transcript is stored as a JSON file at a pre-signed URL
        job_result = job['TranscriptionJob']['Transcript']['TranscriptFileUri']
        with urllib.request.urlopen(job_result) as url:
            return json.loads(url.read().decode())['results']['transcripts'][0]

Here’s how the output looks:

{
    "jobName": "myFirstJob",
    "accountId": "1111111",
    "results": {
        "transcripts": [{
            "transcript": "welcome back"
        }],
        "items": [{
            "start_time": "0.990",
            "end_time": "1.300",
            "alternatives": [{
                "confidence": "0.9999",
                "content": "welcome"
            }],
            "type": "pronunciation"
        }, {
            "start_time": "1.300",
            "end_time": "1.440",
            "alternatives": [{
                "confidence": "1.0000",
                "content": "back"
            }],
            "type": "pronunciation"
        }]
    }
}

4. AWS Comprehend – Text Analysis

We have converted our recording to text. Now, we can run the text analysis using AWS Comprehend. The analysis will extract the following elements from the text:

  • Sentiment
  • Entities
  • Key phrases

import sys

import boto3

client_comprehend = boto3.client(
    'comprehend',
    region_name='yourRegion'
)

def comprehend_analysis(plain_text):
    # Truncate to the max size supported by AWS Comprehend:
    # https://boto3.readthedocs.io/en/latest/reference/services/comprehend.html#Comprehend.Client.detect_dominant_language
    # https://boto3.readthedocs.io/en/latest/reference/services/comprehend.html#Comprehend.Client.detect_entities
    while sys.getsizeof(plain_text) > 5000:
        plain_text = plain_text[:-1]

    # Keep the detected language with the highest confidence score
    dominant_language_response = client_comprehend.detect_dominant_language(
        Text=plain_text
    )
    dominant_language = max(
        dominant_language_response['Languages'],
        key=lambda k: k['Score']
    )['LanguageCode']

    # Entity/key phrase/sentiment detection currently supports English and Spanish
    if dominant_language not in ['en', 'es']:
        dominant_language = 'en'

    response_entities = client_comprehend.detect_entities(
        Text=plain_text,
        LanguageCode=dominant_language
    )

    response_key_phrases = client_comprehend.detect_key_phrases(
        Text=plain_text,
        LanguageCode=dominant_language
    )

    response_sentiment = client_comprehend.detect_sentiment(
        Text=plain_text,
        LanguageCode=dominant_language
    )

    entities = list(set([x['Type'] for x in response_entities['Entities']]))
    key_phrases = list(set([x['Text'] for x in response_key_phrases['KeyPhrases']]))
    sentiment = response_sentiment['Sentiment']

    return entities, key_phrases, sentiment

Read more details here: Python Boto3 AWS Comprehend.

5. Index to Elasticsearch

Given a recording, we now have a set of elements that characterize it. Now, we want to index this information to Elasticsearch 6.2. We create a new index called audioarchive and a new type called recording.

The recording type we are going to create will have the following properties:

  • customer id: the id of the customer who submitted the recording (a substring of the S3 key; see the sketch after this list)
  • entities: the list of entities detected by AWS Comprehend
  • key phrases: the list of key phrases detected by AWS Comprehend
  • sentiment: the sentiment of the document detected by AWS Comprehend
  • s3Location: link to the document in the S3 bucket
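
The post does not fix a key naming scheme, so as an assumption, with keys like '42/feedback-2018-05-01.mp3' the customer id can be extracted as follows:

def extract_customer_id(key_name):
    # Assumes keys of the form '<customerId>/<recording file name>'
    return key_name.split('/')[0]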

Create the new index:

curl -XPUT 'esHost:9200/audioarchive/' -H 'Content-Type: application/json' -d '{
    "settings" : {
        "index" : {
            "number_of_shards" : 1,
            "number_of_replicas" : 0
        }
    }
}'

Add the new mapping:

curl -X PUT "esHost:9200/audioarchive/recording/_mapping" -H 'Content-Type: application/json' -d '{
    "recording" : {
        "properties" : {
            "customerId" : { "type" : "keyword" },
            "entities" : { "type" : "keyword" },
            "keyPhrases" : { "type" : "keyword" },
            "sentiment" : { "type" : "keyword" },
            "s3Location" : { "type" : "text" }
        }
    }
}'

We can now index the new document:

from elasticsearch import Elasticsearch

INDEX_NAME = 'audioarchive'
TYPE_NAME = 'recording'

es_client = Elasticsearch('esHost')

def create_es_document(customer_id, entities, sentiment, key_phrases, s3_location):
    return {
        "customerId": customer_id,
        "entities": entities,
        "sentiment": sentiment,
        "keyPhrases": key_phrases,
        "s3Location": s3_location
    }

def index_to_es(document, index_name, type):
    es_client.index(index=index_name, doc_type=type, body=document)

doc = create_es_document(1, ['entity1', 'entity2'], 'positive', ['k1', 'k2'], 'https://your_bucket.s3.amazonaws.com/your_object_key')

index_to_es(doc, INDEX_NAME, TYPE_NAME)
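
Putting steps 3 to 5 together, the commented placeholders in the consumer loop from section 2 can be filled with a helper along these lines (a sketch; the key-to-customer-id mapping relies on the naming assumption described above):

import uuid

def process_recording(key_name, object_url):
    # 3. Speech to text with AWS Transcribe (blocks until the job finishes)
    job_name = str(uuid.uuid4())
    start_transcribe_job(job_name, object_url)
    transcript = get_transcribe_job_response(job_name)['transcript']

    # 4. Entities, key phrases and sentiment with AWS Comprehend
    entities, key_phrases, sentiment = comprehend_analysis(transcript)

    # 5. Index the enriched document to Elasticsearch
    customer_id = extract_customer_id(key_name)
    doc = create_es_document(customer_id, entities, sentiment, key_phrases, object_url)
    index_to_es(doc, INDEX_NAME, TYPE_NAME)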

6. Search in Elasticsearch by entities, sentiment, key phrases or customer

Now that we indexed the data in Elasticsearch, we can perform some queries to extract business insights from the recordings.

Examples:

Number of positive recordings that contain the “feedback” key phrase, by customer:

POST audioarchive/recording/_search?size=0
{
  "aggs": {
    "recordings_per_customer": {
      "terms": {
        "field": "customerId"
      }
    }
  },
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "sentiment": "Positive"
          }
        },
        {
          "match": {
            "keyPhrases": "feedback"
          }
        }
      ]
    }
  }
}

Number of recordings by sentiment:

POST audioarchive/recording/_search?size=0
{
  "aggs": {
    "recordings_by_sentiment": {
      "terms": {
        "field": "sentiment"
      }
    }
  }
}

What are the main key phrases in the negative recordings?

POST audioarchive/recording/_search?size=0
{
  "aggs": {
    "top_key_phrases": {
      "terms": {
        "field": "keyPhrases"
      }
    }
  },
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "sentiment": "Negative"
          }
        },
        {
          "match": {
            "sentiment": "Mixed"
          }
        }
      ]
    }
  }
}

7. Visualize, Report, and Monitor with Kibana dashboards and search

With Kibana, you can create a set of visualizations/dashboards to search for recordings by customer or entity and to monitor index metrics (like the number of positive recordings, the number of recordings by customer, or the most common entities/key phrases in the recordings).

Examples of Kibana dashboards:

Percentage of documents by sentiment, percentage of positive feedback and key phrases:


Number of recordings by customers, and sentiment by customers:


Most common entities and heat map sentiment-entities:


8. Use Skedler Reports and Alerts to easily monitor data

Using Skedler, an easy-to-use report scheduling and distribution application for Elasticsearch-Kibana-Grafana, you can centrally schedule and distribute custom reports from Kibana dashboards and saved searches as hourly/daily/weekly/monthly PDF, XLS or PNG reports to various stakeholders. If you want to read more about it, see the Skedler Overview.


If you want to get notified when something happens in your index, for example, a certain entity is detected or the number of negative recordings by customer reaches a certain value, you can use Skedler Alerts. It simplifies how you create and manage alert rules for Elasticsearch and provides a flexible approach to notifications (it supports multiple notification channels, from Email to Slack and Webhook).

Conclusion

In this post, we have seen how to use Elasticsearch as the search engine for customer recordings. We used the speech-to-text power of AWS Transcribe to convert our recordings to text, and then AWS Comprehend to extract semantic information from the text. We then used Kibana to aggregate the data and create useful visualizations and dashboards, and finally scheduled and distributed custom reports from the Kibana dashboards using Skedler Reports.

Environment configurations:

  • Elasticsearch and Kibana 6.2
  • Python 3.6.3 and AWS SDK Boto3 1.6.3
  • Ubuntu 16.04.3 LTS
  • Skedler Reports & Alerts

Extract business insights from audio using AWS Transcribe, AWS Comprehend and Elasticsearch – Part 1

Many businesses struggle to gain actionable insights from customer recordings because they are locked in voice and audio files that can’t be analyzed. They have a gold mine of potential information from product feedback, customer service recordings and more, but it’s seemingly locked in a black box.

Until recently, transcribing audio files to text has been time-consuming or inaccurate.
Speech to text is the process of converting speech input into digital text, based on speech recognition. The best solutions were either not accurate enough, too expensive to scale or didn’t play well with legacy analysis tools. With Amazon’s introduction of AWS Transcribe, that has changed.

In this two-part blog post, we are going to present a system architecture to convert audio and voice into written text with AWS Transcribe, extract useful information for quick understanding of content with AWS Comprehend, index this information in Elasticsearch 6.2 for fast search and visualize the data with Kibana 6.2.  In Part I, you can learn about the key components, architecture, and common use cases.  In Part II, you can learn how to implement this architecture.

We are going to analyze some customer recordings (complaints, product feedbacks, customer support) to extract useful information and answer the following questions:

  • How many positive recordings do I have?
  • How many customers are complaining (negative feedback) about my products?
  • What is the sentiment about my product?
  • Which entities/key phrases are the most common in my recordings?

The components that we are going to use are the following:

  • AWS S3 bucket
  • AWS Transcribe
  • AWS Comprehend
  • Elasticsearch 6.2
  • Kibana 6.2
  • Skedler Reports and Alerts

System architecture:

This architecture is useful when you want to get insights from a set of audio/voice recordings. You will be able to convert your recordings to text, extract semantic details from the text, perform fast searches/aggregations on the data, and visualize and report on it.

Examples of common applications are:

  • Transcription of customer service calls
  • Generation of subtitles on audio and video content
  • Conversion of audio files (for example podcasts) to text
  • Search for keywords or inappropriate words within an audio file

 

AWS Transcribe

At the re:Invent 2017 conference, Amazon Web Services presented Amazon Transcribe, a new machine learning service for natural language processing.

Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for developers to add speech to text capability to their applications. Using the Amazon Transcribe API, you can analyze audio files stored in Amazon S3 and have the service return a text file of the transcribed speech.

Instead of AWS Transcribe, you can use similar services to perform speech-to-text analysis, such as Azure Bing Speech API or Google Cloud Speech API.

> The service is still in preview; watch the launch video here: AWS re:Invent 2017: Introducing Amazon Transcribe.

> You can read more about it here: Amazon Transcribe – Accurate Speech To Text At Scale.

 

AWS Comprehend

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. Amazon Comprehend identifies the language of the text; extracts key phrases, places, people, brands, or events; understands how positive or negative the text is, and automatically organizes a collection of text files by topic. – AWS Service Page

AWS Comprehend and Elasticsearch

It analyzes text and tells you what it finds, starting with the language, from Afrikaans to Yoruba, with 98 more in between. It can identify different types of entities (people, places, brands, products, and so forth), key phrases, sentiment (positive, negative, mixed, or neutral), and extract key phrases, all from a text in English or Spanish. Finally, Comprehend’s topic modeling service extracts topics from large sets of documents for analysis or topic-based grouping. – Jeff Barr – Amazon Comprehend – Continuously Trained Natural Language Processing.

Instead of AWS Comprehend, you can use similar services to perform natural language processing, such as Google Cloud Platform – Natural Language API or Microsoft Azure – Text Analytics API.
I prefer to use AWS Comprehend because the service constantly learns and improves from a variety of information sources, including Amazon.com product descriptions and consumer reviews – one of the largest natural language data sets in the world. This means it will keep pace with the evolution of language and it is fully integrated with AWS S3 and AWS Glue (so you can load documents and texts from various AWS data stores such as Amazon Redshift, Amazon RDS, Amazon DynamoDB, etc.).

Once you have a text file of the audio recording, you enter it into Amazon Comprehend for analysis of the sentiment, tone and other insights.

> Here you can find an AWS Comprehend use case: How to Combine Text Analytics and Search using AWS Comprehend and Elasticsearch 6.0.

 

Conclusion

In this post we have seen a system architecture that performs the following:

  • Speech to text task – AWS Transcribe
  • Text analysis – AWS Comprehend
  • Index and fast search – Elasticsearch
  • Dashboard visualization – Kibana
  • Automatic Reporting and Alerting – Skedler Reports and Alerts

Amazon Transcribe and Comprehend can be powerful tools in helping you unlock the potential insights from voice and video recordings that were previously too costly to access. Having these insights makes it easier to understand trends in issues and consumer behavior, brand and product sentiment, Net Promoter Score, as well as product ideas and suggestions, and more.

In the next post (Part 2 of 2), you can see how to implement the described architecture.
