Every organization provides services to customers before, during, and after a purchase. For organizations whose customers are spread all over the world, the customer care team has to handle requests in different languages. Meeting the customer satisfaction SLA for a global, multi-lingual customer base without breaking the bank is a significant challenge. How can you enable your customer care team to respond to inquiries in different languages? Is it feasible for organizations to handle customer inquiries from across the globe efficiently without compromising on quality?
With Amazon’s introduction of AWS Translate, combined with ELK and Skedler, now you can!
In this two-part blog post, we are going to present a system architecture to translate customer inquiries in different languages with AWS Translate, index this information in Elasticsearch 6.2.3 for fast search, visualize the data with Kibana 6.2.3, and automate reporting and alerting using Skedler. In Part I, we will discuss the key components, architecture, and common use cases. In Part II, we will dive into the details on how to implement this architecture.
Let us begin by breaking down the business requirement into use cases:
Enable customer care teams (based in the US or other English-speaking countries) to respond to tickets/questions from customers all over the world, automatically translated, across multiple channels such as email and chat
Build a searchable index of tickets, questions, responses, translations, and customer satisfaction scores to measure key topics and customer satisfaction, and to identify topics for automation (auto-reply via chatbots or a knowledge base)
Use Skedler reporting and alerting to generate KPIs on the above and to alert when the customer satisfaction score falls below threshold levels
The components that we need are the following:
AWS API Gateway
AWS Lambda
AWS Translate
Elasticsearch 6.2.3
Kibana 6.2.3
Skedler Reports and Alerts
System architecture:
A Bit about AWS Translate
At the re:Invent 2017 conference, Amazon Web Services presented Amazon Translate, a new machine learning service for natural language processing.
Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation. Neural machine translation is a form of language translation automation that uses deep learning models to deliver more accurate and more natural sounding translation than traditional statistical and rule-based translation algorithms. Amazon Translate allows you to localize content – such as websites and applications – for international users, and to easily translate large volumes of text efficiently.
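To give a taste of the API, here is a minimal sketch of a translation call using the AWS SDK for Python (boto3); the input text and language codes are illustrative placeholders:

import boto3

# Translate an incoming customer inquiry to English (text and language codes are placeholders)
translate = boto3.client('translate')
response = translate.translate_text(
    Text='Hola, tengo un problema con mi pedido.',
    SourceLanguageCode='auto',  # let the service detect the source language
    TargetLanguageCode='en'
)
print(response['TranslatedText'])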
In this post we presented a system architecture that performs the following:
Text Translation with AWS Translate
Index and fast search – Elasticsearch
Dashboard visualization – Kibana
Automated Customizable Reporting and Alerting – Skedler Reports and Alerts
AWS Translate + ELK + Skedler is a robust solution for handling multi-lingual customer support inquiries in a high-quality, cost-efficient way.
Excited and ready to dive into the details? In the next post (Part 2 of 2), you can see how to implement the described architecture.
Many businesses struggle to gain actionable insights from customer recordings because they are locked in voice and audio files that can’t be analyzed. They have a gold mine of potential information from product feedback, customer service recordings and more, but it’s seemingly locked in a black box.
Until recently, transcribing audio files to text has been time-consuming or inaccurate. Speech to text is the process of converting speech input into digital text, based on speech recognition. The best solutions were either not accurate enough, too expensive to scale or didn’t play well with legacy analysis tools. With Amazon’s introduction of AWS Transcribe, that has changed.
In this two-part blog post, we are going to present a system architecture to convert audio and voice into written text with AWS Transcribe, extract useful information for quick understanding of content with AWS Comprehend, index this information in Elasticsearch 6.2 for fast search and visualize the data with Kibana 6.2. In Part I, you can learn about the key components, architecture, and common use cases. In Part II, you can learn how to implement this architecture.
We are going to analyze some customer recordings (complaints, product feedback, customer support) to extract useful information and answer the following questions:
How many positive recordings do I have?
How many customers are complaining (negative feedback) about my products?
What is the sentiment about my product?
Which entities/key phrases are the most common in my recordings?
The components that we are going to use are the following:
AWS S3 bucket
AWS Transcribe
AWS Comprehend
Elasticsearch 6.2
Kibana 6.2
Skedler Reports and Alerts
System architecture:
This architecture is useful when you want to get insights from a set of audio/voice recordings. You will be able to convert your recordings to text, extract semantic details from the text, perform fast searches/aggregations on the data, and visualize and report on it.
Examples of common applications are:
Transcription of customer service calls
Generation of subtitles for audio and video content
Conversion of audio files (for example, podcasts) to text
Search for keywords or inappropriate words within an audio file
AWS Transcribe
At the re:Invent 2017 conference, Amazon Web Services presented Amazon Transcribe, a new machine learning service for natural language processing.
Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for developers to add speech to text capability to their applications. Using the Amazon Transcribe API, you can analyze audio files stored in Amazon S3 and have the service return a text file of the transcribed speech.
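As a rough sketch, assuming boto3 and placeholder names for the job and the S3 file, a transcription job could be started and polled like this:

import boto3
import time

transcribe = boto3.client('transcribe')

# Start an asynchronous transcription job for an audio file stored in S3
transcribe.start_transcription_job(
    TranscriptionJobName='customer-call-001',
    Media={'MediaFileUri': 'https://s3.amazonaws.com/your_bucket/your_recording.mp3'},
    MediaFormat='mp3',
    LanguageCode='en-US'
)

# Poll until the job completes; the result points to a JSON file containing the transcript
while True:
    job = transcribe.get_transcription_job(TranscriptionJobName='customer-call-001')
    if job['TranscriptionJob']['TranscriptionJobStatus'] in ('COMPLETED', 'FAILED'):
        break
    time.sleep(10)
print(job['TranscriptionJob']['Transcript']['TranscriptFileUri'])

AWS Comprehend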
Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. Amazon Comprehend identifies the language of the text; extracts key phrases, places, people, brands, or events; understands how positive or negative the text is, and automatically organizes a collection of text files by topic. – AWS Service Page
It analyzes text and tells you what it finds, starting with the language, from Afrikaans to Yoruba, with 98 more in between. It can identify different types of entities (people, places, brands, products, and so forth), key phrases, sentiment (positive, negative, mixed, or neutral), and extract key phrases, all from a text in English or Spanish. Finally, Comprehend’s topic modeling service extracts topics from large sets of documents for analysis or topic-based grouping. – Jeff Barr – Amazon Comprehend – Continuously Trained Natural Language Processing.
Instead of AWS Comprehend, you can use similar services to perform Natural Language Processing, such as the Google Cloud Platform Natural Language API or the Microsoft Azure Text Analytics API.
I prefer to use AWS Comprehend because the service constantly learns and improves from a variety of information sources, including Amazon.com product descriptions and consumer reviews – one of the largest natural language data sets in the world. This means it will keep pace with the evolution of language and it is fully integrated with AWS S3 and AWS Glue (so you can load documents and texts from various AWS data stores such as Amazon Redshift, Amazon RDS, Amazon DynamoDB, etc.).
Once you have a text file of the audio recording, you enter it into Amazon Comprehend for analysis of the sentiment, tone, and other insights.
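For example, a minimal sketch of these Comprehend calls on a transcribed text (the text variable is assumed to hold the transcript):

import boto3

comprehend = boto3.client('comprehend')
text = 'I am very happy with the product, but the shipping was slow.'  # the transcribed text

# Detect sentiment, entities, and key phrases from the transcript
sentiment = comprehend.detect_sentiment(Text=text, LanguageCode='en')
entities = comprehend.detect_entities(Text=text, LanguageCode='en')
key_phrases = comprehend.detect_key_phrases(Text=text, LanguageCode='en')

print(sentiment['Sentiment'])                       # e.g. MIXED
print([e['Text'] for e in entities['Entities']])
print([k['Text'] for k in key_phrases['KeyPhrases']])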
In this post we have seen a system architecture that performs the following:
Speech to text task – AWS Transcribe
Text analysis – AWS Comprehend
Index and fast search – Elasticsearch
Dashboard visualization – Kibana
Automatic Reporting and Alerting – Skedler Reports and Alerts
Amazon Transcribe and Comprehend can be powerful tools in helping you unlock the potential insights from voice and video recordings that were previously too costly to access. Having these insights makes it easier to understand trends in issues and consumer behavior, brand and product sentiment, Net Promoter Score, as well as product ideas and suggestions, and more.
In the next post (Part 2 of 2), you can see how to implement the described architecture.
How do you automatically extract metadata from documents? How do you index it and perform fast searches? In this post, we are going to see how to automatically extract metadata from a document using Amazon AWS Comprehend, and how to index it in Elasticsearch 6.0 for fast search and analysis.
The architecture we present improves the search and automatic classification of documents (using the metadata) for your organization.
Using the automatically extracted metadata, you can search for documents and find what you need.
Voice of customer analytics: You can use Amazon Comprehend to analyze customer interactions in the form of documents, support emails, online comments, etc., and discover what factors drive the most positive and negative experiences. You can then use these insights to improve your products and services.
Semantic search: You can use Amazon Comprehend to provide a better search experience by enabling your search engine to index key phrases, entities, and sentiment. This enables you to focus the search on the intent and the context of the articles instead of basic keywords.
Knowledge management and discovery: You can use Amazon Comprehend to organize and categorize your documents by topic for easier discovery, and then personalize content recommendations for readers by recommending other articles related to the same topic.
When we talk about metadata, I like the following definition:
Metadata summarizes basic information about data, which can make finding and working with particular instances of data easier. For example, author, date created, date modified, and file size are examples of very basic document metadata. Having the ability to filter through that metadata makes it much easier for someone to locate a specific document.
We are going to focus on the following metadata:
Document content type (PDF, Plain Text, HTML, Docx)
Document dominant language
Document entities
Key phrases
Sentiment
Document length
Country of origin of the document (metadata derived from the user's IP address)
Amazon S3 will be the main document storage. Once a document has been uploaded to S3 (you can easily use the AWS SDK to upload a document to S3 from your application), a notification is sent to an SQS queue and then consumed by a consumer.
The consumer gets the uploaded document and detects the entities/key phrases/sentiment using AWS Comprehend. Then it indexes the document to Elasticsearch. We use the Elasticsearch pre-processor plugins, the Attachment Processor and the Geoip Processor, to perform the rest of the metadata extraction (more details below).
Here are the main steps performed in the process:
Upload a document to S3 bucket
Event notification from S3 to an SQS queue
Event consumed by a consumer
Entities/key phrases/sentiment detection using AWS Comprehend
Index to Elasticsearch
ES Ingestion pre-processing: extract document metadata using Attachment and Geoip Processor plugin
Search in Elasticsearch by entities/sentiment/key phrases/language/content type/source country and full-text search (see the example query after this list)
Use Kibana for dashboard and search
Use Skedler and Alerts for reporting, monitoring and alerting
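As an example of the search step, a query for positive English documents that mention a given entity might look like this. This is a sketch assuming an elasticsearch-py client named es_client; the entities and sentiment field names follow the document schema used later in this post, while attachment.language is produced by the Attachment processor:

query = {
    'query': {
        'bool': {
            'must': [
                {'match': {'entities': 'Amazon'}},
                {'match': {'sentiment': 'positive'}},
                {'match': {'attachment.language': 'en'}}
            ]
        }
    }
}
results = es_client.search(index='library', doc_type='document', body=query)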
In the example, we used AWS S3 as document storage. But you could extend the architecture and use the following:
SharePoint: create an event receiver; once a document has been uploaded, extract the metadata, index it to Elasticsearch, and then search for and retrieve the document on SharePoint
Box, Dropbox, and Google Drive: extract the metadata from documents stored in a folder and then easily search for them
Similar object storage (e.g. Azure Blob Storage)
Event notification
When a document has been uploaded to the S3 bucket a message will be sent to an Amazon SQS queue. You can read more information on how to configure the S3 Bucket and read the queue programmatically here: Configuring Amazon S3 Event Notifications.
This is how a message notified from S3 looks. The information we need is the sourceIPAddress and the object key.
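A trimmed example of such a notification (the values are placeholders):

{"Records": [{
  "eventName": "ObjectCreated:Put",
  "requestParameters": {"sourceIPAddress": "xxx.xxx.xx.xx"},
  "s3": {
    "bucket": {"name": "your_bucket"},
    "object": {"key": "your_document.pdf"}
  }
}]}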
Now that the S3 bucket has been configured, when a document is uploaded to the bucket a notification will be sent to the SQS queue. We are going to build a consumer that will read this message and perform the entities/key phrases/sentiment detection using AWS Comprehend. You can also read a set of messages at a time (by changing the MaxNumberOfMessages parameter) from the queue and run the task against a set of documents (batch processing).
With this code, you can read the messages from an SQS queue, fetch the bucket and key (used in S3) of the uploaded document, and use them to invoke AWS Comprehend for the metadata detection task:
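The sketch below assumes a queue named your_queue_name and leaves the Comprehend and indexing steps as comments:

import boto3
import json

sqs = boto3.resource('sqs')
queue = sqs.get_queue_by_name(QueueName='your_queue_name')

while True:
    for message in queue.receive_messages(MaxNumberOfMessages=10, WaitTimeSeconds=5):
        record = json.loads(message.body)['Records'][0]
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        source_ip = record['requestParameters']['sourceIPAddress']
        # Detect entities/key phrases/sentiment with AWS Comprehend,
        # then index the document and its metadata to Elasticsearch
        message.delete()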
Amazon Comprehend is a new AWS service presented at re:Invent 2017. Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. Amazon Comprehend identifies the language of the text; extracts key phrases, places, people, brands, or events; understands how positive or negative the text is, and automatically organizes a collection of text files by topic. – AWS Service Page
Given a document, we now have a set of metadata that identifies it. Next, we index this metadata to Elasticsearch and use a pipeline to extract the remaining metadata. To do so, I created a new index called library and a new type called document.
Since we are going to use Elasticsearch 6.0 and Kibana 6.0, I suggest you read the following resource:
To pre-process documents before indexing them, we define a pipeline that specifies a series of processors. Each processor transforms the document in some way. For example, you may have a pipeline that consists of one processor that removes a field from the document, followed by another processor that renames a field. Our pipeline will extract the document metadata (from the encoded base64) and the location information from the IP address.
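A sketch of such a pipeline definition, assuming the base64-encoded document is sent in a field named data and the uploader IP in a field named ip (using an elasticsearch-py client named es_client):

es_client.ingest.put_pipeline(id='documentpipeline', body={
    'description': 'Extract document metadata and geo-locate the uploader IP',
    'processors': [
        {'attachment': {'field': 'data'}},  # extracts language, content type, length, text content
        {'geoip': {'field': 'ip'}}          # resolves the IP address to location details
    ]
})

With the pipeline in place, we index a document through it: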
document = create_es_document('the title of the document', base64data, 'xxx.xxx.xx.xx', ['entity1', 'entity2'], ['k1', 'k2'], 'positive', 'https://your_bucket.s3.amazonaws.com/your_object_key')
es_client.index(index='library', doc_type='document', body=document, pipeline='documentpipeline')  # note the pipeline here
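The create_es_document helper simply builds the document body; a plausible sketch, assuming the field names consumed by the pipeline above:

def create_es_document(title, base64data, ip, entities, key_phrases, sentiment, s3_location):
    # The data and ip fields feed the attachment and geoip pipeline processors
    return {
        'title': title,
        'data': base64data,
        'ip': ip,
        'entities': entities,
        'key_phrases': key_phrases,
        'sentiment': sentiment,
        's3Location': s3_location
    }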
This is how an indexed document looks. Notice the attachment and geoip sections: we have the language, content type, length, and user location details.
{
  ...
  "s3Location": "https://your_bucket.s3.amazonaws.com/A Christmas Carol, by Charles Dickens.docx",
  "title": "A Christmas Carol, by Charles Dickens.docx"
}
Visualize, Report, and Monitor
With Kibana you can create a set of visualizations/dashboards to search for documents by entities and to monitor index metrics (like the number of documents by language, the countries that contribute the most documents, documents by content type, and so on).
Using Skedler, an easy to use report scheduling and distribution application for Elasticsearch-Kibana-Grafana, you can centrally schedule and distribute custom reports from Kibana Dashboards and Saved Searches as hourly/daily/weekly/monthly PDF, XLS or PNG reports to various stakeholders. If you want to read more about it: Skedler Overview.
Example of Kibana dashboard:
Number of documents by language and countries that upload more documents.
Countries by the number of uploaded documents.
If you want to get notified when something happens in your index (for example, a certain entity is detected, or the number of documents by country or by language reaches a certain value), you can use Skedler Alerts. It simplifies how you create and manage alert rules for Elasticsearch and provides a flexible approach to notifications (it supports multiple channels, from Email to Slack and Webhook).
Conclusion
In this post we have seen how to use Elasticsearch as the search engine for document metadata. You can extend your system by adding this pipeline to automatically extract document metadata and index it to Elasticsearch for fast (semantic) search.
By automatically extracting the metadata from your documents, you can easily classify and search for them (knowledge management and discovery) by content, entities, content type, dominant language, and source country (from where the document was uploaded).
I ran this demo using the following environment configurations:
In this post we are going to see how to build a machine learning system that performs the image recognition task and uses Elasticsearch as a search engine to search for the labels identified within the images. Image recognition is the process of identifying and detecting an object or a feature in a digital image or video.
The components that we will use are the following:
Elasticsearch
Kibana
Skedler Reports and Alerts
Amazon S3 bucket
Amazon Simple Queue Service (you can optionally replace this with AWS Lambda)
Amazon Rekognition
The idea is to build a system that will run the image recognition task against images stored in an S3 bucket and index the results (a set of labels and their % of confidence) to Elasticsearch. So we are going to use Elasticsearch as a search engine for the labels found in the images.
If you are not familiar with one or more of the items listed above, I suggest you read more about them here:
Amazon Rekognition is a service that makes it easy to add image analysis to your applications. With Rekognition, you can detect objects, scenes, faces; recognize celebrities; and identify inappropriate content in images.
These are the main steps performed in the process:
Upload an image to S3 bucket
Event notification from S3 to an SQS queue
Event consumed by a consumer
Image recognition on the image using AWS Rekognition
The result of the labels detection is indexed in Elasticsearch
Search in Elasticsearch by labels
Get the results from Elasticsearch and get the images from S3
Use Kibana for dashboard and search
Use Skedler and Alerts for reporting, monitoring and alerting
Architecture:
Use Case
This system architecture can be useful when you need to detect the labels in a picture and perform fast searches.
Example of applications:
Smart photo gallery: find things in your photo. Detect labels in an automatic way and use Elasticsearch to search them
Product Catalog: automatically classify the products of a catalog. Take photos of a product and get it classified
Content moderation: get notified when a NSFW content is uploaded
Accessibility camera app: help people with disability to see and take pictures
Event notification
When an image is uploaded to the S3 bucket, a message will be stored in an Amazon SQS queue. You can read more information on how to configure the S3 bucket and how to read the queue programmatically here: Configuring Amazon S3 Event Notifications.
This is how a message notified from S3 looks (we need the bucket and key information):
{"Records":
  [{
    "eventSource": "aws:s3",
    "awsRegion": "your_aws_region",
    "eventTime": "2017-11-16T12:47:48.435Z",
    "eventName": "ObjectCreated:Put",
    "s3": {
      "configurationId": "myS3EventCOnfiguration",
      "bucket": {
        "name": "yourBucket",
        "arn": "arn:aws:s3:::yourBucket"
      },
      "object": {
        "key": "myImage.jpg",
        "eTag": "790xxxxxe255kkkk011ayyyy908"
      }
    }
  }]
}
Consume messages from the Amazon SQS queue
Now that the S3 bucket is configured, when an image is uploaded to the bucket an event will be notified and a message saved to the SQS queue. We are going to build a consumer that will read this message and perform the image label detection using AWS Rekognition. You can also read a set of messages at a time (by changing the MaxNumberOfMessages parameter) from the queue and run the task against a set of images (batch processing), or use an AWS Lambda notification (instead of SQS).
With this code you can read the messages from an SQS queue, fetch the bucket and key (used in S3) of the uploaded file, and use them to invoke AWS Rekognition for the label detection task:
import boto3, json, time

queue = boto3.resource('sqs').get_queue_by_name(QueueName='your_queue_name')
while True:
    for message in queue.receive_messages(MaxNumberOfMessages=10):
        record = json.loads(message.body)['Records'][0]
        bucket_name, filename_key = record['s3']['bucket']['name'], record['s3']['object']['key']
        # Run the image labels detection with AWS Rekognition and index the result to Elasticsearch
        # invoke_aws_reko(bucket_name, filename_key)
        message.delete()
    time.sleep(10)
Image recognition task
Now that we have the key of the uploaded image we can use AWS Rekognition to run the image recognition task.
The following function invokes the detect_labels method to get the labels of the image and returns a dictionary with the identified labels and their confidence. The body below is a minimal sketch with boto3; the confidence is normalized from Rekognition's 0-100 scale to 0-1 so that it matches the range query shown later:
def invoke_aws_reko(bucket_name, filename_key):
    rekognition = boto3.client('rekognition')
    response = rekognition.detect_labels(Image={'S3Object': {'Bucket': bucket_name, 'Name': filename_key}})
    result_dictionary = {'labels': [{'label': l['Name'], 'confidence': l['Confidence'] / 100.0}
                                    for l in response['Labels']]}
    # We will index this information to Elasticsearch
    return result_dictionary
Index to Elasticsearch
So, given an image, we now have a set of labels that identify it. We now want to index these labels to Elasticsearch. To do so, I created a new index called imagerepository and a new type called image.
The image type we are going to create will have the following properties:
title: the title of the image
s3_location: the link to the S3 resource
labels: field that will contain the result of the detection task
For the labels property I used the Nested datatype. It allows arrays of objects to be indexed and queried independently of each other.
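A sketch of the index creation with this mapping (assuming an elasticsearch-py client named es_client and the field names listed above):

es_client.indices.create(index='imagerepository', body={
    'mappings': {
        'image': {
            'properties': {
                'title': {'type': 'text'},
                's3_location': {'type': 'keyword'},
                'labels': {
                    'type': 'nested',  # lets each label/confidence pair be queried independently
                    'properties': {
                        'label': {'type': 'text'},
                        'confidence': {'type': 'float'}
                    }
                }
            }
        }
    }
})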
Images that represent a pizza with at least 90% confidence:
POST imagerepository/_search
{
  "query": {
    "nested": {
      "path": "labels",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "labels.label": "pizza"
              }
            },
            {
              "range": {
                "labels.confidence": {
                  "gte": 0.90
                }
              }
            }
          ]
        }
      }
    }
  }
}
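The same query can be run from Python, for example (a sketch assuming the es_client used above):

query = {'query': {'nested': {'path': 'labels', 'query': {'bool': {'must': [
    {'match': {'labels.label': 'pizza'}},
    {'range': {'labels.confidence': {'gte': 0.90}}}
]}}}}}

results = es_client.search(index='imagerepository', doc_type='image', body=query)
for hit in results['hits']['hits']:
    print(hit['_source']['title'], hit['_source']['s3_location'])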
Visualize and monitor
With Kibana you can create a set of visualizations/dashboards to search for images by label and to monitor index metrics (like the number of pictures by label, the most common labels, and so on).
Using Skedler, an easy to use report scheduling and distribution application for Elasticsearch-Kibana-Grafana, you can centrally schedule and distribute custom reports from Kibana Dashboards and Saved Searches as hourly/daily/weekly/monthly PDF, XLS or PNG reports to various stakeholders. If you want to read more about it: Skedler Review.
With Kibana you can create a dashboard to visualize the number of labels. Here you can see an example with bar/pie charts and a tag cloud visualization (you can easily schedule and distribute the dashboard with Skedler).
If you want to get notified when something happens in your index (for example, a certain label is detected or the number of labels reaches a certain value), you can use Skedler Alerts. It simplifies how you create and manage alert rules for Elasticsearch and provides a flexible approach to notifications (it supports multiple channels, from Email to Slack and Webhook).
Conclusion
In this post we have seen how to combine the power of Elasticsearch's search with the powerful machine learning service AWS Rekognition. The process pipeline also includes an S3 bucket (where the images are stored) and an SQS queue used to receive event notifications when a new image is stored to S3 (and is ready for the image label detection task). This use case shows how to use Elasticsearch as a search engine, not only for logs.
If you are familiar with AWS Lambda, you can replace the SQS queue and the consumer with a function (S3 notifications support AWS Lambda as a destination) and call the AWS Rekognition service from your Lambda function. Keep in mind that with Lambda you have a 5-minute execution limit and you can't invoke the function in batch on a set of images (so you will pay for a Lambda execution for each image).
I ran this demo using the following environment configuration:
As the majority of IT-savvy businesses move to the cloud, capital infrastructure charges are rapidly being replaced with more flexible options that optimize and scale costs. Rather than planning IT capacity for weeks on end, users can provision thousands of servers within minutes for faster, more effective results. This starts with Amazon Elasticsearch Service: the cloud's top trending tool for Elasticsearch-Logstash-Kibana applications.
With Amazon Elasticsearch Service, you're able to effortlessly scale log analytics, text search, and monitoring within a few minutes. It also gives you Elasticsearch's APIs and real-time capabilities, along with the scalability and security required by production workloads, and it integrates with Kibana, Logstash, and other AWS services, enabling you to act on your data insights quickly and effectively. Simply run the Amazon Elasticsearch Service, or run your own cluster on AWS EC2.
Server or Cluster?
If you choose the managed service on AWS, there are a few clear advantages. Primarily, you won't have to manually replace failed nodes, and you can add nodes to the cluster as you go. You can add and remove nodes through an API and manage access rights via IAM, which is far easier than setting up a reverse proxy, and you get daily snapshots to S3 as well as CloudWatch monitoring for your Elasticsearch cluster.
On the other hand, if you choose to download and run Elasticsearch yourself, you'll have more instance types and sizes available. In this case, you'd be able to use bigger i2 instances than the managed service offers, allowing you to scale further and get more insight into logs and metrics. You're also able to alter index settings in more detail than just analysis and replicas, such as with delayed allocation, which helps when you carry a lot of data per node. Additionally, you can modify more cluster-wide settings than with the managed service, and you gain access to all the other APIs, which is particularly useful when debugging. And while CloudWatch collects a reasonable amount of metrics, on EC2 you can utilize a more comprehensive Elasticsearch monitoring solution and run clusters of more than 20 nodes.
Amazon Elasticsearch Service within Skedler
Regardless of which route you take, the next challenge is adding reporting to your ELK application within Amazon Elasticsearch Service. Skedler is a proven reporting solution for ELK applications, and many of our customers run ELK and Skedler together on AWS for log and IoT analytics, among other use cases. Skedler provides flexible deployment options for adding reporting to your existing AWS ELK application, including Skedler as a service within EC2 or the Skedler AWS Containers service, both of which support Amazon Simple Email Service (SES) for emailing reports reliably and economically. Ready to try Amazon Elasticsearch Service with and through Skedler? Try it free. After the free trial period, you can purchase a Skedler license and instantly convert your evaluation environment into a production environment.