Businesses today rely heavily on professional applications for daily tasks and critical operations, and those applications should be set up with secure user authentication. Solutions such as LDAP spare users the time and risk of managing critical account information manually. LDAP integration is therefore one of the features to look for when selecting an application for your daily work.
LDAP
LDAP, or Lightweight Directory Access Protocol, is a software protocol that allows users and applications to find and verify the information they need within their organization. It is commonly used as a directory database, primarily storing information such as:
Users
Attributes about those users
Group membership privileges
Organizations then used this information to enable authentication to IT resources such as applications or servers: the LDAP database validates whether users can access an application by verifying their credentials.
LDAP authentication
A user cannot access information stored within an LDAP database or directory without first authenticating (proving they are who they say they are). The database typically contains user, group, and permission information and delivers requested information to connected applications.
LDAP authentication involves verifying provided usernames and passwords by connecting to a directory service that uses the LDAP protocol. Examples of LDAP directory servers include OpenLDAP, Microsoft Active Directory, and OpenDJ.
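To make the bind-based check concrete, here is a minimal sketch using the Python ldap3 library; the server URL, DN layout, and attribute names are assumptions for illustration, not a reference to any specific directory.

```python
from ldap3 import Server, Connection, ALL
from ldap3.core.exceptions import LDAPBindError

def ldap_authenticate(username, password):
    """Return True if the username/password pair binds successfully against the directory."""
    # Hypothetical directory server and DN layout, for illustration only
    server = Server('ldap://ldap.example.com:389', get_info=ALL)
    user_dn = 'uid={},ou=people,dc=example,dc=com'.format(username)
    try:
        # A successful bind proves the credentials are valid
        conn = Connection(server, user=user_dn, password=password, auto_bind=True)
        conn.unbind()
        return True
    except LDAPBindError:
        return False

if __name__ == '__main__':
    print(ldap_authenticate('jdoe', 's3cr3t'))
```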
LDAP Integration with Skedler
Skedler is a report-automation tool created to reduce the time and money spent on cumbersome data-analytics tasks such as reporting. Generating and distributing reports from Security Onion, Kibana, and Grafana has never been easier. With Skedler, MSSPs can generate compliance reports (e.g., PCI and ASV reports) quickly and efficiently, saving countless man-hours, delivering reports 10x faster, and enabling their customers to mitigate vulnerabilities more quickly.
If you have not already, download Skedler to check out how easy it is to automate your reports. You will be shocked at the amount of time saved every day!
Mechanism
Like any other LDAP-integrated application, Skedler uses the integration to authenticate users based on their LDAP credentials. Once LDAP integration is completed in the Skedler Admin account, any user with valid credentials in the LDAP database can log in to Skedler without creating a separate Skedler account.
When a new user attempts to log in to Skedler, the integration checks to see if this user has an existing Skedler account. If not, it automatically queries the LDAP server for the entered username and password. If a matching LDAP account is found, Skedler creates a new account for the user and logs the user into the organization.
Steps for LDAP integration with Skedler
New User
Before going through the steps to integrate Skedler with LDAP authentication, please refer to this documentation to install Skedler and activate the license. Once Skedler is installed on your machine, follow these steps to integrate LDAP authentication.
Add LDAP configuration to the Skedler reporting.yml file.
Sign in with LDAP user credentials
Skedler then validates the entered credentials with the LDAP server. Based on the reporting.yml configuration, Skedler will map the user to the respective roles and organizations.
If you are an existing Skedler user, you can follow these steps to incorporate LDAP authentication:
Upgrade to the latest version of Skedler
Configure the reporting.yml file
Restart the server
Log in using LDAP credentials
Note that if you add new roles or organizations to your LDAP server, you have to add them to the Skedler reporting.yml file as well.
Future of Skedler with LDAP integration
This is an opportunity to dedicate your time to areas of innovation and remediation! Skedler is here to help you bring more value to your product, customers, and other stakeholders by automating your cumbersome daily task of reporting.
MSSPs commonly lack visibility into user accounts and activity. They manage resource access manually, resulting in a decentralized and disorganized Identity and Access Management model filled with redundancies, friction, and security risk. With the new LDAP integration, Skedler can identify new or existing users and log them in securely in no time, without them ever having to ask an admin for credentials or permissions.
With Skedler, you can save time, secure your business and provide a seamless employee and customer experience.
Connect Skedler with Kibana, Grafana, and Security Onion in seconds. Automate your reports on hourly, daily, weekly, monthly, and yearly schedules and put them on auto-pilot! Click on this button to get 250 reports free for 15 days!
In this webinar, we discuss how you can "Save Time & Money with Automated Reports and Alerts for Elastic Stack and Grafana." For any business to run smoothly, you have to pay close attention to all of its underlying processes and how they are being managed. How can you do this? Through business process automation (BPA), specifically automated reports and alerts. Organizations using Elastic Stack or Grafana for analytics and monitoring need automated reports and alerts so that users stay informed with actionable information even when they are away from the dashboards.
Access the Webinar Below
Given the challenges with X-Pack (aka Elastic Stack features) and custom solutions, what users need is a reporting and alerting solution for Elastic Stack and Grafana that is robust, cost-efficient, and easy to deploy so that you can focus on your core business. In this webinar, we explain how Skedler can help you reach those exact goals.
With Skedler, you can save tens of thousands of dollars, add reporting and alerting instantly, and achieve a return on investment within a month.
Generate Report from Grafana in Minutes with Skedler. Fully featured free trial.
Additional Benefits
Quick to install and configure
Intuitive, easy-to-use UI for scheduling reports and creating alerts
Flexible reporting and alerting framework to meet complex requirements
Faster troubleshooting with drill down to root cause data
No scripting required for alerts
Trusted by enterprises of all sizes
Schedule and Automate Your Grafana Reports Free with Skedler. Fully featured free trial.
How do you stay up to date on the critical events in your log analytics platform? Do you spend tens of thousands of dollars and countless hours to create reports and alerts from your Elastic Stack or Grafana application?
Whatever critical scenario arises, receiving the right information at the right time can be the difference between success and failure. Staying constantly aware of every situation, whether it involves business partners, operations, customers, or employees, is therefore crucial. The faster a possible issue is identified, the faster it can be solved.
Benefits of Automation
Join us in the upcoming webinar on Tuesday, December 18th, 2018 @10AM PST to learn how Skedler, which installs in minutes, can help you save time & money with automated reports and alerts for Elastic Stack & Grafana.
You’ll learn how to quickly add reporting and alerting for Elastic Stack and Grafana while seeing how Skedler can provide a flexible framework to meet your complex monitoring requirements. Be ready with your questions and we’ll be more than happy to discuss them in the webinar Q&A session.
In the previous post, we presented a system architecture to translate text from multiple languages with AWS Translate, index this information in Elasticsearch 6.2.3 for fast search, visualize the data with Kibana, and automate the delivery of customized intelligence with Skedler Reports and Alerts.
In this post, we are going to see how to implement the previously described architecture.
The main steps are:
Define API Gateway HTTP endpoints
Build an AWS Lambda function
Deploy to AWS with Serverless framework
Translate text with AWS Translate
Index to Elasticsearch 6.2.3
Search in Elasticsearch by language – full-text search
Visualize, report and monitor with Kibana dashboards
Use Skedler Reports and Alerts for reporting, monitoring and alerting
We are going to define two HTTP API methods, one to translate and index new inquiries and another one to search for them. We will use AWS API Gateway, a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale.
The core of our solution will receive the inquiry (as a string) to be translated, translate it and index the text with the translations to Elasticsearch.
We will use AWS Lambda. It lets you run code without provisioning or managing servers. You pay only for the compute time you consume – there is no charge when your code is not running. With Lambda, you can run code for virtually any type of application or backend service – all with zero administration. Just upload your code and Lambda takes care of everything required to run and scale your code with high availability. You can set up your code to automatically trigger from other AWS services or call it directly from any web or mobile app.
To deploy our AWS Lambda function and the API Gateway HTTP endpoints, we will use the Serverless Framework. Serverless is a toolkit for deploying and operating serverless architectures.
1. API Gateway
We are going to configure the following HTTP endpoints:
HTTP POST /createTicket
HTTP GET /searchTicket
The createTicket endpoint will be used to translate the text using AWS Translate and to index the document in Elasticsearch. The searchTicket endpoint will be used to search for documents in different languages. The handler for each endpoint will be an AWS Lambda function.
Below is the serverless.yml section where we have defined the two endpoints.
functions:
  create:
    handler: handler.create_ticket
    events:
      - http:
          path: createTicket
          method: post
  search:
    handler: handler.search_ticket
    events:
      - http:
          path: searchTicket
          method: get
2. AWS Lambda
Once we have defined the two endpoints, we need to write the Lambda function. We will not focus on deploying the function to AWS; the Serverless Framework will take care of that. The function we are going to write performs the following (a minimal sketch follows the list):
Get the input that needs to be translated (the customer inquiry)
Invoke AWS Translate and get the translations of the input
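As a rough illustration only, here is what such a handler could look like with Boto3; the event shape, target language list, and response format are assumptions rather than the exact code used in the project, while the handler name matches the serverless.yml above.

```python
import json
import boto3

translate = boto3.client('translate')

# Hypothetical list of target languages; adjust to your use case
TARGET_LANGUAGES = ['es', 'de', 'it', 'fr']

def create_ticket(event, context):
    """Lambda handler for POST /createTicket: translate the inquiry text."""
    body = json.loads(event['body'])
    text = body['inquiry']  # the customer inquiry to translate

    translations = []
    for lang in TARGET_LANGUAGES:
        result = translate.translate_text(
            Text=text,
            SourceLanguageCode='en',
            TargetLanguageCode=lang
        )
        translations.append({
            'language': lang,
            'text': result['TranslatedText']
        })

    # Indexing to Elasticsearch is covered in the next section
    return {
        'statusCode': 200,
        'body': json.dumps({'text': text, 'translations': translations})
    }
```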
In the serverless.yml file, we specified the provider (AWS), the runtime (Python 3.6), the environment variables, our HTTP endpoints, and the AWS Lambda function handlers.
3. Deploy to AWS with the Serverless Framework
serverless deploy --aws-s3-accelerate
4. Index to Elasticsearch
Given an inquiry, we now have a list of translations, and we want to index this information into Elasticsearch 6.2.3. We create a new index called customercare and a new type called ticket.
The ticket type will have the following properties:
text: the English text
language: the language of the text
country: the country from where we received the inquiry
ticket number: an ID generated to uniquely identify an inquiry
timestamp: index time
translations: list of the translations (text and language)
PUT /customercare
{
  "mappings": {
    "ticket": {
      "properties": {
        "text": { "type": "text" },
        "language": { "type": "keyword" },
        "country": { "type": "keyword" },
        "ticket_number": { "type": "keyword" },
        "timestamp": { "type": "date" },
        "translations": {
          "type": "nested",
          "properties": {
            "text": { "type": "text" },
            "language": { "type": "keyword" }
          }
        }
      }
    }
  }
}
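As a companion to the mapping above, here is a minimal sketch of indexing a ticket document with the official Python Elasticsearch client; the endpoint, sample values, and ID generation are assumptions for illustration.

```python
import uuid
from datetime import datetime
from elasticsearch import Elasticsearch

es = Elasticsearch(['http://localhost:9200'])  # assumed Elasticsearch endpoint

ticket = {
    'text': 'Where is my order?',            # English text
    'language': 'en',
    'country': 'US',
    'ticket_number': str(uuid.uuid4()),      # generated ID that uniquely identifies the inquiry
    'timestamp': datetime.utcnow().isoformat(),
    'translations': [
        {'text': '¿Dónde está mi pedido?', 'language': 'es'},
        {'text': "Dov'è il mio ordine?", 'language': 'it'},
    ],
}

es.index(index='customercare', doc_type='ticket', body=ticket)
```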
5. Search in Elasticsearch
Now that we have indexed the data in Elasticsearch, we can perform queries that search across languages.
Examples:
Full-text search through translations:
GET customercare/_search
{
  "query": {
    "nested": {
      "path": "translations",
      "query": {
        "match": {
          "translations.text": "your text"
        }
      }
    }
  }
}
Full-text search through English text and translations:
GET customercare/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "translations",
            "query": {
              "match": {
                "translations.text": "tree"
              }
            }
          }
        },
        {
          "term": {
            "text": "tree"
          }
        }
      ]
    }
  }
}
Number of inquiries by a customer (full-text search):
GET customercare/_search
{
  "aggs": {
    "genres": {
      "terms": {
        "field": "customerId"
      }
    }
  },
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "text": "tree"
          }
        }
      ]
    }
  }
}
6. Visualize, Report, and Monitor with Kibana dashboards and search
With Kibana you can create a set of visualizations/dashboards to search for inquiries by language and to monitor index metrics (such as the number of translations, or the number of translations per customer).
Examples of Kibana dashboards:
Top languages, language of inquiries by customer, and geolocation of inquiries:
Inquiry count by language and customer, and top customers by language:
7. Use Skedler Reports and Alerts to easily monitor data
Using Skedler, an easy to use report scheduling and distribution application for Elasticsearch-Kibana-Grafana, you can centrally schedule and distribute custom reports from Kibana Dashboards and Saved Searches as hourly/daily/weekly/monthly PDF, XLS or PNG reports to various stakeholders. If you want to read more about it: Skedler Overview.
We have created a custom report using Skedler Report Templates that provides an overview of the tickets based on languages and countries of origin. The custom report generated by Skedler is shown below:
If you want to get notified when something happens in your index, for example, when a certain entity is detected or the amount of negative feedback from customers crosses a threshold value, you can use Skedler Alerts. It simplifies how you create and manage alert rules for Elasticsearch and provides a flexible approach to notifications (it supports multiple channels, from email to Slack and webhooks).
We have seen how to schedule report generation. We are now going to see how to use Skedler Alerts to get notified when something happens in our index, for example, when the number of inquiries from a specific country hits a certain threshold.
Choose the Alert Condition. For example: "the number of tickets in English must be higher than zero"
or "the number of tickets in English coming from Italy and containing a given word must be higher than zero".
The Skedler Alert notification in Slack looks like this:
Conclusion
In this two-part blog series, we learned how to build our own multi-lingual, omni-channel customer care platform using AWS Translate, Elasticsearch, and Skedler. Let us know your thoughts about this approach. Send your comments to hello at skedler dot com.
In the previous post, we presented a system architecture to convert audio and voice into written text with AWS Transcribe, extract useful information for quick understanding of content with AWS Comprehend, index this information in Elasticsearch 6.2 for fast search and visualize the data with Kibana 6.2.
In this post, we are going to see how to implement the previously described architecture. The main steps performed in the process are:
Configure S3 Event Notification
Consume messages from Amazon SQS queue
Convert the recording to text with AWS Transcribe
Entities/key phrases/sentiment detection using AWS Comprehend
Index to Elasticsearch 6.2
Search in Elasticsearch by entities/sentiment/key phrases/customer
Visualize, report and monitor with Kibana dashboards
Use Skedler and Alerts for reporting, monitoring and alerting
1. Configure S3 Event Notification
When a new recording has been uploaded to the S3 bucket, a message will be sent to an Amazon SQS queue.
Now that the S3 bucket has been configured, a notification will be sent to the SQS queue when a recording is uploaded to the bucket. We are going to build a consumer that will perform the following operations:
Start a new AWS Transcribe transcription job
Check the status of the job
When the job is done, perform text analysis with AWS Comprehend
Index the results to Elasticsearch
With this code you can read the messages from an SQS queue, fetch the bucket and key (used in S3) of the uploaded document, and use them to invoke AWS Transcribe for the speech-to-text task:
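The consumer code itself is not reproduced here; the following is a minimal sketch of the idea, assuming a standard S3 event notification payload and a queue URL of your own.

```python
import json
import boto3

sqs = boto3.client('sqs')

# Assumed queue URL; replace with your own
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/recordings-queue'

def poll_recordings():
    """Read S3 event notifications from SQS and yield (bucket, key) pairs."""
    response = sqs.receive_message(QueueUrl=QUEUE_URL,
                                   MaxNumberOfMessages=10,
                                   WaitTimeSeconds=20)
    for message in response.get('Messages', []):
        body = json.loads(message['Body'])
        for record in body.get('Records', []):
            bucket = record['s3']['bucket']['name']
            key = record['s3']['object']['key']
            yield bucket, key
        # Delete the message once it has been processed
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message['ReceiptHandle'])
```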
Once we have consumed an S3 message and have the URL of the newly uploaded document, we can start a new (asynchronous) transcription job to perform the speech-to-text task.
We are going to use the start_transcription_job method.
It takes a job name, the S3 URL, and the media format as parameters.
To use the AWS Transcribe API, make sure your AWS Python SDK (Boto3) is up to date:
pip install boto3 --upgrade
import boto3

client_transcribe = boto3.client(
    'transcribe',
    region_name='us-east-1'  # the service was still in preview in this region
)
Due to the asynchronous nature of the transcription job (it could take a while depending on the length and complexity of your recordings), we need to check the job status.
Once the status is "COMPLETED", we can retrieve the result of the job (the text converted from the recording).
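Here is a minimal sketch of starting a job and polling for its result, reusing the client_transcribe client defined above; the job-name handling, media format, and S3 URL style are assumptions.

```python
import json
import time
import urllib.request

def transcribe_recording(bucket, key, job_name):
    """Start a transcription job for an S3 object and return the transcribed text."""
    media_uri = 'https://s3.amazonaws.com/{}/{}'.format(bucket, key)
    client_transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': media_uri},
        MediaFormat='mp3',        # assumed format of the recordings
        LanguageCode='en-US'
    )

    # Poll until the asynchronous job completes
    while True:
        job = client_transcribe.get_transcription_job(TranscriptionJobName=job_name)
        status = job['TranscriptionJob']['TranscriptionJobStatus']
        if status in ('COMPLETED', 'FAILED'):
            break
        time.sleep(10)

    if status == 'FAILED':
        raise RuntimeError('Transcription job {} failed'.format(job_name))

    # The result is a JSON file containing the transcript
    transcript_uri = job['TranscriptionJob']['Transcript']['TranscriptFileUri']
    with urllib.request.urlopen(transcript_uri) as response:
        result = json.loads(response.read())
    return result['results']['transcripts'][0]['transcript']
```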
We have converted our recording to text. Now we can run text analysis using AWS Comprehend. The analysis will extract entities, key phrases, and sentiment from the text.
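A minimal sketch of these three Comprehend calls could look like the following; the field names in the returned dictionary are chosen to match the queries later in the post and are otherwise assumptions.

```python
import boto3

comprehend = boto3.client('comprehend', region_name='us-east-1')

def analyze_text(text):
    """Detect entities, key phrases, and sentiment for a transcript."""
    entities = comprehend.detect_entities(Text=text, LanguageCode='en')
    key_phrases = comprehend.detect_key_phrases(Text=text, LanguageCode='en')
    sentiment = comprehend.detect_sentiment(Text=text, LanguageCode='en')

    return {
        'entities': [e['Text'] for e in entities['Entities']],
        'keyPhrases': [k['Text'] for k in key_phrases['KeyPhrases']],
        # Comprehend returns e.g. 'POSITIVE'; capitalize to match 'Positive' in the queries below
        'sentiment': sentiment['Sentiment'].capitalize(),
    }
```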
Given a recording, we now have a set of elements that characterize it, and we want to index this information into Elasticsearch 6.2. I created a new index called audioarchive and a new type called recording.
The recording type we are going to create has the following properties (an indexing sketch follows the list):
customer id: the id of the customer who submitted the recording (substring of the s3 key)
entities: the list of entities detected by AWS Comprehend
key phrases: the list of key phrases detected by AWS Comprehend
sentiment: the sentiment of the document detected by AWS Comprehend
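Putting the pieces together, here is a minimal indexing sketch; it reuses the analyze_text output from the Comprehend sketch above, and the endpoint and helper name are assumptions.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(['http://localhost:9200'])  # assumed Elasticsearch endpoint

def index_recording(customer_id, analysis):
    """Index the Comprehend analysis of a recording into the audioarchive index."""
    doc = {
        'customerId': customer_id,          # substring of the S3 key
        'entities': analysis['entities'],
        'keyPhrases': analysis['keyPhrases'],
        'sentiment': analysis['sentiment'],
    }
    es.index(index='audioarchive', doc_type='recording', body=doc)
```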
6. Search in Elasticsearch by entities, sentiment, key phrases or customer
Now that we indexed the data in Elasticsearch, we can perform some queries to extract business insights from the recordings.
Examples:
Number of positive recordings that contain the "feedback" key phrase, by customer.
POST audioarchive/recording/_search?size=0
{
  "aggs": {
    "genres": {
      "terms": {
        "field": "customerId"
      }
    }
  },
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "sentiment": "Positive"
          }
        },
        {
          "match": {
            "keyPhrases": "feedback"
          }
        }
      ]
    }
  }
}
Number of recordings by sentiment.
POST audioarchive/recording/_search?size=0
{
  "aggs": {
    "genres": {
      "terms": {
        "field": "sentiment"
      }
    }
  }
}
What are the main key phrases in the negative recordings?
POST audioarchive/recording/_search?size=0
{
  "aggs": {
    "genres": {
      "terms": {
        "field": "keyPhrases"
      }
    }
  },
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "sentiment": "Negative"
          }
        },
        {
          "match": {
            "sentiment": "Mixed"
          }
        }
      ]
    }
  }
}
7. Visualize, Report, and Monitor with Kibana dashboards and search
With Kibana you can create a set of visualizations/dashboards to search for recordings by customer or entity and to monitor index metrics (such as the number of positive recordings, the number of recordings by customer, and the most common entities/key phrases in the recordings).
Examples of Kibana dashboards:
Percentage of documents by sentiment, percentage of positive feedback and key phrases:
Number of recordings by customers, and sentiment by customers:
Most common entities, and a heat map of sentiment by entity:
8. Use Skedler Reports and Alerts to easily monitor data
Using Skedler, an easy to use report scheduling and distribution application for Elasticsearch-Kibana-Grafana, you can centrally schedule and distribute custom reports from Kibana Dashboards and Saved Searches as hourly/daily/weekly/monthly PDF, XLS or PNG reports to various stakeholders. If you want to read more about it: Skedler Overview.
If you want to get notified when something happens in your index, for example, when a certain entity is detected or the number of negative recordings by customer reaches a certain value, you can use Skedler Alerts. It simplifies how you create and manage alert rules for Elasticsearch and provides a flexible approach to notifications (it supports multiple channels, from email to Slack and webhooks).
Conclusion
In this post we have seen how to use Elasticsearch as the search engine for customer recordings. We used the speech-to-text power of AWS Transcribe to convert our recordings to text and then AWS Comprehend to extract semantic information from the text. We then used Kibana to aggregate the data and create useful visualizations and dashboards, and finally scheduled and distributed custom reports from the Kibana dashboards using Skedler Reports.
Have you ever wondered how to easily monitor the performance of your application and how to house your application metrics in Elasticsearch? The answer is Elastic APM. Elastic Application Performance Monitoring (APM) is a new feature available in Elasticsearch 6.1 (in beta, and alpha in 6.0). A few months ago, Opbeat (an application performance monitoring company) joined forces with Elastic, and its product is now Elastic APM.
Adding APM (Application Performance Monitoring) to the Elastic Stack is a natural next step in providing users with end-to-end monitoring, from logging to server-level metrics, to application-level metrics, all the way to the end-user experience in the browser or client.
In this post, we are going to see how to monitor the performance of a Python Flask application using the APM feature of Elasticsearch and how to get notified (webhook or email) when something happens in your application by Skedler Alerts.
Here you can read more about Opbeat acquisition and APM announcement:
First of all, let's see how APM works. What is written below is taken from here: APM Overview
APM is an application performance monitoring system built on the Elastic Stack. It uses Elasticsearch as its data store and allows you to monitor the performance of thousands of applications in real time.
With APM, you can automatically collect detailed performance information from inside your applications and it requires only minor changes to your application. APM will automatically instrument your application and measure the response time for incoming requests. It also automatically measures what your application was doing while it was preparing the response.
APM agents are open source libraries written in the same language as your application. You install them into your application as you would install any other library. The agents hook into your application and start collecting performance metrics and errors. All data collected by the agents is sent on to the APM Server.
APM Server
APM server is an open source application written in Go which runs on your servers. It listens on port 8200 by default and receives data from agents periodically. The API is a simple JSON based HTTP API. APM Server builds Elasticsearch documents from the data received from agents. These documents are stored in an Elasticsearch cluster. A single APM Server process can typically handle data from hundreds of agents.
In this post we are not going to cover how to install and configure the APM Server; the procedure is well documented here: Elastic – APM.
Use case
The new APM feature can be used when you need a free solution to monitor your Python/Node.js/Ruby/JS application and you want to use Elasticsearch's search power and Kibana's visualizations to look at your application metrics. If you add the alerting feature of Skedler Alerts (licensed), you can get notified in a flexible way, from webhook to email, when something happens in your application.
Examples of applications and notifications with Alerts:
A back-office application written in Python with the Flask or Django framework: get a Slack notification when the number of HTTP 4xx errors is higher than a given threshold in the last 30 minutes (access control alert)
An application server written in Node.js: get an email message when an unhandled exception is raised by the application in the last hour (error handling alert)
A batch processing script written in Ruby: get a daily Slack notification with the details of all the operations of the day (application summary alert)
Python Flask Application
Flask is a micro web framework written in Python, based on the Werkzeug toolkit and the Jinja2 template engine. You can read more about it here: Welcome to Flask. In this example, we will assume we have some web APIs written with Flask. We want to monitor our application (API calls) and get notified when the number of errors is particularly high or when some endpoint gets too many calls.
Given a set of Flask API endpoints:
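The original endpoints are not reproduced here; as a stand-in, assume a trivial Flask app along these lines (the routes below are hypothetical).

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/api/items', methods=['GET'])
def list_items():
    # Hypothetical endpoint; stands in for your real API
    return jsonify(items=['a', 'b', 'c'])

@app.route('/api/items/<item_id>', methods=['GET'])
def get_item(item_id):
    return jsonify(item=item_id)

if __name__ == '__main__':
    app.run()
```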
we are going to add a few lines of code to send the application metrics to the APM Server (which will index them to Elasticsearch). First of all, install the Elastic APM dependencies:
$ pip install elastic-apm[flask]
and import them:
from elasticapm.contrib.flask import ElasticAPM
Configure Elastic APM in your application by specifying the APM Server URL, optionally a secret token (the one you set in the APM Server config.yml), and your application name.
# configure ELASTIC_APM in your application's settings
app.config['ELASTIC_APM'] = {
    # allowed app_name chars: a-z, A-Z, 0-9, -, _, and space
    'APP_NAME': 'yourApplicationName',
    # 'SECRET_TOKEN': 'yourToken',  # if you set one in the APM server configuration
    'SERVER_URL': 'http://apmServer:8200'  # your APM server URL
}

apm = ElasticAPM(app)
We are now monitoring our application and housing our metrics in Elasticsearch! You can monitor additional events or send additional data to the APM Server.
Capture exceptions:
try:
    1 / 0
except ZeroDivisionError:
    apm.capture_exception()
Log generic message:
apm.capture_message('hello, world!')
Send extra information:
@app.route('/')
def bar():
    try:
        1 / 0
    except ZeroDivisionError:
        app.logger.error('Math is hard',
                         exc_info=True,
                         extra={
                             'good_at_math': False,
                         })
Elasticsearch and Kibana
All the collected metrics are stored in an Elasticsearch index (apm-6.1.1-*) as a doc type. Here is an extract of the doc type mapping (related to the HTTP request/response).
Once our application is configured, all the metrics will be stored in Elasticsearch and we can use the default Kibana APM UI to view them.
Response time and response by minutes (HTTP 2xx and HTTP 4xx):
Request details:
Elasticsearch Alerts with Skedler
We are now sending our application's metrics to Elasticsearch and we have a nice way to view them, but we will not be looking at the Kibana APM UI all the time to check that everything is OK. Wouldn't it be nice if we could receive a Slack notification or an email when something is wrong, so that we can then look at the dashboard?
This is where Skedler Alerts comes into the picture!
It simplifies how you monitor data in Elasticsearch for abnormal patterns, drill down to root cause and alert using webhooks and email. You can design your rules for detecting patterns, spikes, new events, and threshold violations using Skedler’s easy to use UI. You can correlate across indexes, filter events and compare against baseline conditions to detect abnormal patterns in data.
From the Alerts UI (to see how to install Alerts, take a look here: Install Alerts), let's define a new webhook (I took the webhook URL from my Slack team settings; read more here: Slack Incoming Webhook):
We want to get a notification when our application returns an HTTP 400 error. Define a new Alert rule (Threshold type):
Filter by the context.response.status_code == 400 field:
Choose your schedule and action.
In the picture below, the job will run every minute and the notification will be sent to the Slack webhook. You can define your Slack message template.
Once the event fires, we get notified in the Slack channel as expected.
You can now create as many new Alert rules as you need to get notified when something happens in your application.
The application metrics are written by the APM Server to a standard Elasticsearch index, so you can write your own Alert rules (there are no constraints on the APM index).
Here you can find some useful resources about Skedler Alerts:
In this post, we have seen how to monitor the performance of your application with Elastic APM, how the metrics are automatically sent to Elasticsearch, and how to use Skedler Alerts to get notified when something is wrong.
Monitoring the performance of your applications is something that you should always do to improve, fix, and manage your application.
You should use Elastic APM if you are looking for something free, easy to configure, and fully integrated with Elasticsearch (metrics are stored in a normal index) and Kibana (you have a dedicated APM UI and you can build your own dashboards).
You should use Skedler Alerts if you want to be notified about your applications’ metrics. It provides a nice dashboard where you can configure your alert rules and supports webhook and email notifications with a custom template.
How to automatically extract metadata from documents? How to index them and perform fast searches? In this post, we are going to see how to automatically extract metadata from a document using Amazon AWS Comprehend and Elasticsearch 6.0 for fast search and analysis.
The architecture we present improves the search and automatic classification of documents (using the metadata) for your organization.
Using the automatically extracted metadata you can search for documents and find what you need.
Voice of customer analytics: You can use Amazon Comprehend to analyze customer interactions in the form of documents, support emails, online comments, etc., and discover what factors drive the most positive and negative experiences. You can then use these insights to improve your products and services.
Semantic search: You can use Amazon Comprehend to provide a better search experience by enabling your search engine to index key phrases, entities, and sentiment. This enables you to focus the search on the intent and the context of the articles instead of basic keywords.
Knowledge management and discovery: You can use Amazon Comprehend to organize and categorize your documents by topic for easier discovery, and then personalize content recommendations for readers by recommending other articles related to the same topic.
When we talk about metadata, I like the following definition:
Metadata summarizes basic information about data, which can make finding and working with particular instances of data easier. For example, author, date created and date modified and file size are examples of very basic document metadata. Having the ability to filter through that metadata makes it much easier for someone to locate a specific document.
We are going to focus on the following metadata:
Document content type (PDF, Plain Text, HTML, Docx)
Document dominant language
Document entities
Key phrases
Sentiment
Document length
Country of origin of the document (metadata taken from the user details – ip address)
Amazon S3 will be the main documents storage. Once a document has been uploaded to S3 (you can easily use the AWS SDK to upload a document to S3 from your application) a notification is sent to an SQS queue and then consumed by a consumer.
The consumer gets the uploaded document and detects the entities/key phrases/sentiment using AWS Comprehend. Then it indexes the document to Elasticsearch. We use the Elasticsearch pre-processor plugins, Attachment Processor and Geoip Processor, to perform the other metadata extraction (more details below).
Here are the main steps performed in the process:
Upload a document to S3 bucket
Event notification from S3 to an SQS queue
Event consumed by a consumer
Entities/key phrases/sentiment detection using AWS Comprehend
Index to Elasticsearch
ES Ingestion pre-processing: extract document metadata using Attachment and Geoip Processor plugin
Search in Elasticsearch by entities/sentiment/key phrases/language/content type/source country and full-text search
Use Kibana for dashboard and search
Use Skedler and Alerts for reporting, monitoring and alerting
In the example, we used AWS S3 as document storage. But you could extend the architecture and use the following:
SharePoint: create an event receiver and once a document has been uploaded extract the metadata and index it to Elasticsearch. Then search and get the document on SharePoint
Box, Dropbox and Google Drive: extract the metadata from the document stored in a folder and then easily search for them
Similar Object storage (i.e. Azure Blob Storage)
Event notification
When a document has been uploaded to the S3 bucket a message will be sent to an Amazon SQS queue. You can read more information on how to configure the S3 Bucket and read the queue programmatically here: Configuring Amazon S3 Event Notifications.
This is what a notification message from S3 looks like. The information we need is the sourceIPAddress and the object key.
Now that the S3 bucket has been configured, when a document is uploaded to the bucket a notification will be sent to the SQS queue. We are going to build a consumer that will read this message and perform the entities/key phrases/sentiment detection using AWS Comprehend. You can also read a batch of messages (change the MaxNumberOfMessages parameter) from the queue and run the task against a set of documents (batch processing).
With this code you can read the messages from an SQS queue, fetch the bucket and key (used in S3) of the uploaded document, and use them to invoke AWS Comprehend for the metadata detection task:
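The consumer code is not reproduced here either; below is a minimal sketch of reading the S3 event notification from SQS and pulling out the pieces this post relies on (bucket, key, and source IP address). The queue URL is an assumption.

```python
import json
import boto3

sqs = boto3.client('sqs')

QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/documents-queue'  # assumed

def consume_document_events(max_messages=10):
    """Read S3 upload notifications and return (bucket, key, source_ip) tuples."""
    uploads = []
    response = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=max_messages)
    for message in response.get('Messages', []):
        body = json.loads(message['Body'])
        for record in body.get('Records', []):
            uploads.append((
                record['s3']['bucket']['name'],
                record['s3']['object']['key'],
                record['requestParameters']['sourceIPAddress'],
            ))
        # Remove the message from the queue once processed
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message['ReceiptHandle'])
    return uploads
```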
Amazon Comprehend is a new AWS service presented at the re:invent 2017. Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. Amazon Comprehend identifies the language of the text; extracts key phrases, places, people, brands, or events; understands how positive or negative the text is, and automatically organizes a collection of text files by topic. – AWS Service Page
It analyzes text and tells you what it finds, starting with the language, from Afrikaans to Yoruba, with 98 more in between. It can identify different types of entities (people, places, brands, products, and so forth), key phrases, sentiment (positive, negative, mixed, or neutral), and extract key phrases, all from a text in English or Spanish. Finally, Comprehend‘s topic modeling service extracts topics from large sets of documents for analysis or topic-based grouping. – Jeff Barr – Amazon Comprehend – Continuously Trained Natural Language Processing.
Given a document, we now have a set of metadata that identifies it. Next, we index this metadata into Elasticsearch and use a pipeline to extract the remaining metadata. To do so, I created a new index called library and a new type called document.
Since we are going to use Elasticsearch 6.0 and Kibana 6.0, I suggest you read the following resource:
To pre-process documents before indexing them, we define a pipeline that specifies a series of processors. Each processor transforms the document in some way. For example, you may have a pipeline with one processor that removes a field from the document, followed by another processor that renames a field. Our pipeline will extract the document metadata (from the encoded base64) and the location information from the IP address.
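Here is a minimal sketch of such a pipeline created with the Python client; it assumes the ingest-attachment and ingest-geoip plugins are installed and that the indexed document carries the base64 content in a data field and the uploader's IP in an ip field (both field names are assumptions).

```python
from elasticsearch import Elasticsearch

es_client = Elasticsearch(['http://localhost:9200'])  # assumed Elasticsearch endpoint

pipeline = {
    'description': 'Extract document metadata and uploader location',
    'processors': [
        {
            'attachment': {
                'field': 'data'   # base64-encoded document body
            }
        },
        {
            'geoip': {
                'field': 'ip'     # uploader IP address
            }
        }
    ]
}

es_client.ingest.put_pipeline(id='documentpipeline', body=pipeline)
```

With the pipeline in place, the indexing call from the post (shown next) only needs to reference it by name.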
document = create_es_document('the title of the document', base64data, 'xxx.xxx.xx.xx',
                              ['entity1', 'entity2'], ['k1', 'k2'], 'positive',
                              'https://your_bucket.s3.amazonaws.com/your_object_key')
es_client.index(index='library', doc_type='document', body=document, pipeline='documentpipeline')  # note the pipeline here
This is what an indexed document looks like. Notice the attachment and geoip sections: we have the language, content type, length, and user location details.
    "s3Location": "https://your_bucket.s3.amazonaws.com/A Christmas Carol, by Charles Dickens.docx",
    "title": "A Christmas Carol, by Charles Dickens.docx"
  }
}
Visualize, Report, and Monitor
With Kibana you can create a set of visualizations/dashboards to search for documents by entity and to monitor index metrics (such as the number of documents by language, the most contributing countries, documents by content type, and so on).
Using Skedler, an easy to use report scheduling and distribution application for Elasticsearch-Kibana-Grafana, you can centrally schedule and distribute custom reports from Kibana Dashboards and Saved Searches as hourly/daily/weekly/monthly PDF, XLS or PNG reports to various stakeholders. If you want to read more about it: Skedler Overview.
Example of Kibana dashboard:
Number of documents by language, and the countries that upload the most documents.
Countries by the number of uploaded documents.
If you want to get notified when something happens in your index, for example, a certain entity is detected or the number of documents by country or documents by language reaches a certain value, you can use Alerts. It simplifies how you create and manage alert rules for Elasticsearch and it provides a flexible approach to notification (it supports multiple notifications, from Email to Slack and Webhook).
Conclusion
In this post we have seen how to use Elasticsearch as the search engine for document metadata. You can extend your system by adding this pipeline to automatically extract document metadata and index it into Elasticsearch for fast (semantic) search.
By automatically extracting the metadata from your documents, you can easily classify and search for them (knowledge management and discovery) by content, entities, content type, dominant content language, and source country (from where the document was uploaded).
I ran this demo using the following environment configurations:
Skedler Alerts provides an easy to use alerting solution for Elasticsearch data. It is designed for users who find:
YAML-based rules difficult to manage and time-consuming
alternative pack-based plugins cost-prohibitive
Powerful, Yet Easy to Use Alerting for Elasticsearch
Skedler Alerts simplifies how you create and manage alert rules for Elasticsearch. In addition, it provides a flexible approach to notification: you can send notifications via email or Slack, and you can also integrate alert notifications into your application using webhooks. Last but not least, with just a few button clicks you can drill down from notifications to root cause documents.
Alerts Installation
Installation of Skedler Alerts is equally simple. Of course, the easiest way is to install Skedler Alerts with Docker. Please note that Skedler Alerts is available for Linux 7 only.
If you would like to install Skedler on a VM, reach out to us via the free trial page. You will hear back from us within 24 hours to help you get started.
Questions/Comments?
Reach out to us via Skedler Forum with your questions and comments. Happy, Easy Alerting!
As the majority of IT-savvy businesses move to the cloud, capital infrastructure charges are rapidly being replaced with more flexible options that optimize and scale company costs. Rather than planning IT for weeks on end, users can provision thousands of servers within minutes for more rapid, effective results. This starts with Amazon Elasticsearch Service: the cloud's top trending tool for Elasticsearch-Logstash-Kibana applications.
With Amazon Elasticsearch Service, you can effortlessly scale log analytics, text search, and monitoring within a few minutes. It also lets you use Elasticsearch's APIs and real-time functionality, along with the scalability and security required by production workloads, together with Kibana and Logstash on AWS, enabling you to act on your data insights quickly and effectively. Simply run the AWS Elasticsearch service, or run your own cluster on AWS EC2.
Server or Cluster?
If you choose to run the managed service on AWS, there are a few clear advantages. Primarily, you won't have to manually replace failed nodes, since replacements can be added to the cluster as you go. You can add and remove nodes through an API and manage access rights via IAM, which is far easier than setting up a reverse proxy, and you get daily snapshots to S3 as well as CloudWatch monitoring for your Elasticsearch cluster.
On the other hand, if you choose to download and run Elasticsearch yourself, you'll have more instance types and sizes available. In this case, you can use bigger i2 instances than the managed service offers, allowing you to scale further and get more insight into logs and metrics. You are also able to tune the index settings in more detail than just analysis and replicas, for example with delayed allocation, which helps when you have a lot of data per node. Additionally, you can modify more cluster-wide settings than in the managed service, and you gain access to all the other APIs, which is particularly useful when debugging. And while CloudWatch collects a reasonable number of metrics, with EC2 you can use a more comprehensive Elasticsearch monitoring solution and run clusters of more than 20 nodes.
Amazon Elasticsearch Service within Skedler
Regardless of which route you take, the ultimate challenge is adding reporting to your ELK application within the Amazon Elasticsearch Service. Skedler is a proven reporting solution for ELK applications, and many of our customers run ELK and Skedler together on AWS for log and IoT analytics, among other use cases. Skedler provides helpful deployment options for adding reporting to your existing AWS ELK application, including Skedler as a service within EC2 or the Skedler AWS Containers service; both support Amazon Simple Email Service (SES) for emailing reports reliably and economically. Ready to try Amazon Elasticsearch Service with and through Skedler? Try it free. After the free trial period, you can purchase a Skedler license and instantly convert your evaluation environment into a production environment.
What if you could generate a PDF or XLS report of a Kibana dashboard on the fly using a REST API, without having to go to the Kibana dashboard or the Skedler dashboard? What services would you be able to offer to your users and customers? Would you embed the report generation option in your application? Would you offer a mobile app for analysis and report generation? Would you attach a report on the fly to your alert notification?
Introducing Skedler REST API and iFrame option
Skedler, the PDF/XLS report solution for Kibana, now offers two features that open up a multitude of possibilities:
REST API – allows you to schedule, generate, customize PDF/XLS Reports of Kibana dashboards and Searches from any application.
iFrame – allows you to embed/white label Skedler as an iFrame in your application. Users can seamlessly schedule reports from within your application.
Skedler REST API for Kibana Report Generation
With Skedler REST API for Kibana PDF-XLS report generation you can:
Generate and mail quick reports.
Create reports and filters
Set and get email configuration details and timezone settings.
Get all the scheduled reports and filter list.
Your application can use the standard HTTP methods GET, PUT, POST, and DELETE. Because the REST API is based on open standards, you can use any web development language to access the API. You can also use curl to test these methods.
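To illustrate the idea, here is a rough sketch of how an application might call such an API with Python's requests library. The base URL, endpoint path, and payload fields are purely hypothetical placeholders, not the documented Skedler API; consult the Skedler REST API documentation for the actual routes and parameters.

```python
import requests

SKEDLER_BASE_URL = 'http://localhost:3000'   # hypothetical Skedler host
API_PATH = '/api/reports'                    # hypothetical endpoint path

# Hypothetical payload: trigger generation of an existing scheduled report
payload = {
    'reportId': 'weekly-ops-dashboard',
    'format': 'pdf',
}

response = requests.post(SKEDLER_BASE_URL + API_PATH, json=payload)
response.raise_for_status()
print(response.json())
```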
You can seamlessly integrate Skedler into your customized Kibana application or another application with the Skedler iFrame feature. This removes the header and footer from the Skedler application. In addition, you can customize the Skedler CSS to match the look and feel of your application.
With the iFrame option you can:
Include the Skedler Reports, Filters, and Email configuration pages individually in your application
Pass query filters to the Skedler pages to filter the display by user ID, etc.
Pass the userId from your authentication tool, such as Search Guard, to the Skedler pages
Feature Availability
The REST API and iFrame features are currently in beta. They are available with the Skedler v2.4 release.
The REST API is available in the Premier Edition.
The iFrame option is available in the Advanced Edition.