Breaking down Elasticsearch in Real-time
Plenty of articles on the internet explain what Elasticsearch is and why you need it. However, when it comes to simplifying the whole Elasticsearch/ELK concept from a business standpoint, we haven’t come across an article or guide that makes the buyer’s process easier. This article aims to help buyers who are considering Elasticsearch understand what it is and why they need it, in simple, jargon-free language.
If you’re new to Elasticsearch, you’re one article away from discovering the main drivers for using it. Keep reading until the end to learn them all.
What’s with Elasticsearch? Why are people adopting it widely?
Think of Elasticsearch as Google for semi-structured data such as logs. You can type in a query to fetch what you want from unstructured or semi-structured data. According to Elastic, Elasticsearch is a distributed, open-source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured.
You may want to collect data from a variety of sources: servers, web applications, and network devices such as switches and routers, typically in the form of logs and system metrics. After collection and before it is indexed in Elasticsearch, the raw data needs to be cleaned and enriched, which means parsing and normalizing it before ingestion. Indexing is the process of storing data in Elasticsearch in the form of an index, where an index is a collection of JSON documents, each of which maps keys (field names) to values. Once the data is indexed, users can run complex queries to retrieve large sets of data at very high speed. Elasticsearch is fast because, instead of scanning the text directly, it searches an index: an inverted index that maps each term to the documents containing it. The data can then be visualized in Kibana, a data visualization tool for building, sharing, and managing dashboards.
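To make the "searches an index instead of the text" idea concrete, here is a minimal sketch of an inverted index in plain Python. It is a toy illustration, not how Elasticsearch is implemented internally; the sample documents, field names, and helper functions are all assumptions made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical, already-parsed log documents (field names are illustrative).
DOCS = {
    1: {"host": "web-01", "message": "Connection timeout while calling payments"},
    2: {"host": "web-02", "message": "User login succeeded"},
    3: {"host": "db-01", "message": "Slow query: timeout after 30s"},
}


def tokenize(text):
    """Normalize text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs, field):
    """Map each term to the set of document IDs whose field contains it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in tokenize(doc[field]):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


INDEX = build_inverted_index(DOCS, "message")
print(sorted(search(INDEX, "timeout")))  # [1, 3]
```

The lookup cost depends on the number of query terms rather than on the total amount of stored text, which is the intuition behind the speed claim above.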
How does Elasticsearch help in real-time?
Elasticsearch can be used to index and store all your log data from a variety of sources. For instance, when a user types a query into the search box of an application or applies different filters, Elasticsearch immediately fetches the relevant data, which can then be displayed as a visualization in a front-end tool such as Kibana. In short, Elasticsearch makes data retrieval faster.
Now let’s take a look at what makes Elasticsearch a favorite tool for so many businesses these days.
7 Reasons Why Elasticsearch is Popular among DevOps and SecOps
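As a rough illustration of "a query plus filters", the sketch below builds a search request body in Elasticsearch's JSON query DSL, combining a full-text `match` clause with `term` and `range` filters inside a `bool` query. The field names (`message`, `status`, `@timestamp`) and the helper function are assumptions for this example; the body would be sent to an index's `_search` endpoint by whatever client you use.

```python
import json


def build_search_body(text, status=None, since="now-1h"):
    """Build a bool query: full-text match on `message` plus exact filters.

    The field names used here are hypothetical and depend on your own data.
    """
    filters = [{"range": {"@timestamp": {"gte": since}}}]
    if status is not None:
        filters.append({"term": {"status": status}})
    return {
        "query": {
            "bool": {
                "must": [{"match": {"message": text}}],
                "filter": filters,
            }
        }
    }


body = build_search_body("timeout", status=500)
print(json.dumps(body, indent=2))
```

Clauses under `filter` only include or exclude documents, while the `match` clause under `must` also contributes to relevance scoring.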
#1. Schema-less. Document Oriented.
Elasticsearch is document-oriented: it does not require you to define schemas and tables up front in order to store data. All data is stored as documents, which are represented in JSON format. Because the output is also JSON, it is easy to integrate with several other solutions.
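A small sketch of what "document-oriented" means in practice: two events with different fields can sit side by side as JSON documents, with no shared table layout. The events and field names below are invented for illustration.

```python
import json

# Two hypothetical log events with different shapes; neither needs a
# predefined table layout to be represented as a JSON document.
events = [
    {"@timestamp": "2024-01-01T10:00:00Z", "host": "web-01", "status": 200, "path": "/login"},
    {"@timestamp": "2024-01-01T10:00:05Z", "host": "fw-01", "action": "deny", "src_ip": "10.0.0.7"},
]


def to_json_lines(docs):
    """Serialize each document as one JSON object per line."""
    return "\n".join(json.dumps(doc, sort_keys=True) for doc in docs)


def field_names(docs):
    """Collect every field name seen across all documents."""
    return sorted({key for doc in docs for key in doc})


print(to_json_lines(events))
print(field_names(events))
```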
It is scalable across multiple nodes. You can start with a single node, or with two or three nodes (for replication), and scale to tens of nodes as your data grows.
It executes search operations on data very fast. For search workloads, it is typically much faster than relational databases (RDBMS) such as MySQL.
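The idea of spreading data across nodes can be sketched as follows: each document is routed to a shard by hashing its ID, and shards are then placed on whatever nodes are available. This is a simplified illustration under stated assumptions: CRC32 is a stand-in hash (Elasticsearch uses a different one), the round-robin placement is made up, and the node names are hypothetical.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard by hashing the document ID.

    CRC32 is only a stand-in here; it is not the hash Elasticsearch uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def assign_shards_to_nodes(num_shards, nodes):
    """Spread shards round-robin across the available nodes (toy placement)."""
    return {shard: nodes[shard % len(nodes)] for shard in range(num_shards)}


NUM_SHARDS = 6
one_node = assign_shards_to_nodes(NUM_SHARDS, ["node-a"])
three_nodes = assign_shards_to_nodes(NUM_SHARDS, ["node-a", "node-b", "node-c"])

# A document's shard does not change when nodes are added; only the
# shard-to-node placement does.
shard = shard_for("order-1001", NUM_SHARDS)
print(shard, one_node[shard], three_nodes[shard])
```

Because the shard count stays fixed while the node list grows, adding nodes means moving whole shards rather than re-hashing every document.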
Elasticsearch supports multiple languages.
#5. Autocompletion and Instant Search
Like Google, Elasticsearch supports autocompletion and instant search. As you start typing your query, it shows a dropdown of suggestions from which you can select and complete your query.
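The autocompletion behavior described above can be approximated with a prefix lookup over a sorted list of known phrases. This is only a simplified sketch of the concept; Elasticsearch's own suggesters use more specialized data structures, and the class name and sample phrases here are hypothetical.

```python
import bisect


class PrefixSuggester:
    """Toy autocomplete: binary-search a sorted phrase list for a prefix."""

    def __init__(self, phrases):
        # Lowercase and de-duplicate so lookups are case-insensitive.
        self._phrases = sorted({p.lower() for p in phrases})

    def suggest(self, prefix, limit=5):
        """Return up to `limit` phrases that start with `prefix`."""
        prefix = prefix.lower()
        start = bisect.bisect_left(self._phrases, prefix)
        results = []
        for phrase in self._phrases[start:]:
            if len(results) == limit or not phrase.startswith(prefix):
                break
            results.append(phrase)
        return results


suggester = PrefixSuggester(["error rate", "error log", "login failed", "Elastic"])
print(suggester.suggest("err"))  # ['error log', 'error rate']
```

Since the phrases are sorted, all matches for a prefix sit next to each other, so the lookup only needs one binary search followed by a short scan.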
#6. It’s Open Source
Of course, it comes for free. Elasticsearch is an open-source search engine, so anyone can download it without having to pay for a license.
Relational databases (RDBMS) work well for structured data, such as tabular data in rows and columns, but are not built for semi-structured or unstructured data such as logs and text. This is especially true when managing huge data sets, where fetching results becomes slower. Optimizing an RDBMS to overcome these limitations brings limitations of its own, such as:
- Not every field can be indexed.
- Updating rows in heavily indexed tables is a lengthy and exhausting process.
Businesses nowadays are looking for alternative ways to make data retrieval super-fast. This is achievable by adopting NoSQL rather than an RDBMS. Elasticsearch is one such distributed NoSQL data store. Its flexible data model helps users meet demanding workloads and supports the low latency required for real-time engagement.
Now that we know why a business needs Elasticsearch, let’s look at one of the most successful companies to have used Elasticsearch to improve its business, and whose story should resonate with us all.
How Netflix uses Elasticsearch to improve its messaging and customer operations
Netflix relies heavily on the ELK stack for various use cases, including monitoring and analyzing customer service operations and security logs. The company chose Elasticsearch for its automatic sharding and replication, flexible schema, nice extension model, and ecosystem of plugins.
In 2007, Netflix expanded its business with the introduction of streaming media. By January 2016, Netflix was available in 190 countries, and in January 2017 it reported around 93 million subscribers worldwide, including more than 49 million in the United States alone. With such a huge user base, Netflix messages millions of customers a day over many channels, including text messages, emails, push notifications, and voice calls. These messages are sent via its messaging platform, which is made up of a series of separate applications.
Now let’s uncover how Netflix used the Elastic Stack for higher message deliverability and operational excellence.
Netflix used the Elastic Stack to monitor and ensure that messages are delivered to customers promptly. Each message delivery involved multiple stages, and the message was tracked at each stage.
In one use case, they had to determine all the countries from which customers used phone number verification to verify their Netflix accounts. The events triggered at each stage were stored in Elasticsearch, and Kibana was used to visualize the data. It turned out that the UK, Brazil, and the USA had the highest numbers of customers using the phone number verification method. But in terms of verification success percentage, Brazil had only a 70% success rate, which left the team wondering why it was so low. They drilled down into the issue to determine the underlying reason. The analysis showed that the failures were attributed to invalid phone numbers. At first glance, this suggested that nearly 30% of the customers were entering an invalid phone number, which is quite unrealistic. So they dug further to find the root cause.
Theories for the low success percentage in phone number verification:
- Customers used landline numbers. However, this theory was later dismissed.
- The second theory was that, in certain regions of Brazil, a ninth digit prefixed to mobile numbers had been mandated, and users weren’t yet accustomed to it. Since the SMS provider in Brazil expected the ninth digit in the phone number, Netflix users were facing issues validating their Netflix accounts with their phone numbers.
The Elastic Stack made it easy to pinpoint the issue.
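The kind of per-country comparison described in this story can be sketched as a simple aggregation over verification events. The event counts below are invented purely for illustration (only the 70% figure for Brazil comes from the story above), and the field names, country codes, and the 80% threshold are assumptions, not Netflix's actual data or tooling.

```python
from collections import defaultdict


def make_events(country, successes, failures):
    """Generate hypothetical verification events for one country."""
    return (
        [{"country": country, "status": "success"}] * successes
        + [{"country": country, "status": "failure"}] * failures
    )


# Made-up sample data; the counts are chosen only to mirror the narrative.
EVENTS = make_events("BR", 7, 3) + make_events("GB", 9, 1) + make_events("US", 19, 1)


def success_rate_by_country(events):
    """Return the verification success percentage for each country."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for event in events:
        totals[event["country"]] += 1
        if event["status"] == "success":
            successes[event["country"]] += 1
    return {c: 100.0 * successes[c] / totals[c] for c in totals}


def below_threshold(rates, threshold=80.0):
    """List countries whose success rate falls below the threshold."""
    return sorted(c for c, rate in rates.items() if rate < threshold)


rates = success_rate_by_country(EVENTS)
print(rates)
print(below_threshold(rates))
```

Grouping by country first and then drilling into the outlier is the same pattern a Kibana dashboard would surface visually.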
One thing to be wary of when considering the ELK stack for your business
Though Elasticsearch can take care of many of your customer operations and security log woes, remember that great things come with downsides too. While it provides many impressive features for free, Elastic doesn’t meet customers’ needs when it comes to enterprise features such as machine learning and reporting. To use the Kibana reporting option that is built into the ELK stack, users need to upgrade to one of the paid plans, such as Gold or Platinum, which are priced very high.
For this reason, a lot of ELK users end up creating manual reports (that is, copy-pasting dashboard screenshots), spending around 10-15 hours of manual labor every week (if they require weekly reports).
If you can relate to this, you should also check how you can step up from your manual Elasticsearch reporting chaos.
On the other hand, some folks are constantly on the lookout for a good Kibana reporting alternative.
PS: We realized that there’s no single go-to article for a potential Elasticsearch buyer to learn about Elasticsearch reporting during the early stages of the buying process. Hence, we put together "Everything you need to know when considering Elasticsearch reporting for your business", which covers the ins and outs of Elasticsearch reporting so that you can take action immediately.