Elasticsearch Interview Questions & Answers – Set 1

In this post, you will learn Elasticsearch fundamentals and best practices through the following:

  • Revision notes on Elasticsearch fundamentals
  • A set of questions to test your knowledge and, in turn, help you learn Elasticsearch concepts related to indices and shards. These questions can also help you prepare for Elasticsearch interviews.
  • A set of interview questions

Elasticsearch Fundamentals – Revision Notes

  • Each Elasticsearch shard is a Lucene index
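  • The number of primary shards and replicas is defined per index when the index is created. The number of replicas can be changed later; the number of primary shards cannot be changed in place (that requires shrinking, splitting, or reindexing).
  • Each Lucene index is made up of one or more Lucene segments, which store the document data in the form of an inverted index.
  • Lucene segments are immutable. An update or delete only marks the old document as deleted; the space is reclaimed when segments are merged.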
  • Average shard size typically varies from 10 GB to 40 GB, depending on the nature of the data stored in the index. Time-based data is commonly stored in shards of 20-40 GB.
  • It is recommended to run the force-merge operation, which merges multiple smaller segments into larger ones, during off-peak hours and only on indices that no longer receive writes.
  • It is recommended to have roughly 20-25 shards per GB of heap space. Thus, a node with a 20 GB heap can hold up to 400-500 shards.
  • Each shard carries shard-level and segment-level metadata that must be kept in memory and therefore consumes heap space.
  • Shard size can be managed using one of the following techniques:
    • Creating time-based indices
    • Rolling over to a new index with the rollover API once a document-count threshold is reached
    • Shrinking an existing index into a new index with fewer primary shards using the shrink API
  • It is recommended to determine the maximum shard size from a query performance perspective by benchmarking with realistic data and queries. There is no rule of thumb or one-size-fits-all solution for this.
  • It is recommended to use time-based indices for managing data retention whenever possible. Data can be grouped into indices based on the retention period, which makes it easier to create indices and to delete them once they expire.
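
The heap-based shard budget and the time-based retention idea from the notes above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names, the `logs` prefix, and the `prefix-YYYY.MM.DD` naming scheme are assumptions made for the example, and the code only computes index names without contacting a cluster.

```python
from datetime import date, timedelta

# Rule of thumb quoted in the notes above: 20-25 shards per GB of heap.
SHARDS_PER_GB_HEAP = (20, 25)


def shard_budget(heap_gb):
    """Return the (low, high) shard count suggested for a node's heap size."""
    low, high = SHARDS_PER_GB_HEAP
    return heap_gb * low, heap_gb * high


def daily_index_name(prefix, day):
    """Build a time-based index name; the naming scheme is an assumption."""
    return f"{prefix}-{day:%Y.%m.%d}"


def expired_indices(prefix, today, retention_days, lookback_days):
    """List daily index names that are older than the retention period.

    Only names are computed; deleting the indices is left to the caller.
    """
    names = []
    for age in range(retention_days + 1, lookback_days + 1):
        names.append(daily_index_name(prefix, today - timedelta(days=age)))
    return names


if __name__ == "__main__":
    print(shard_budget(20))
    print(daily_index_name("logs", date(2024, 1, 15)))
    print(expired_indices("logs", date(2024, 1, 15), retention_days=7, lookback_days=9))
```

Because each day's data lives in its own index, retention becomes a matter of deleting whole indices by name rather than deleting individual documents.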

Sample Interview Questions

  • Explain the concepts of cluster, node, index, shard, and replica.
  • How do you determine the shard size? What shard size is recommended for time-based data?
  • How do updates and deletes of documents in an index work?
  • How many shards can be allocated to a node with a heap of about 20 GB?
  • Explain Lucene segments and the merging of segments.
  • What are the rollover and shrink APIs used for?
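
As a starting point for the questions on replicas, rollover, shrink, and segment merging, the corresponding REST requests can be sketched as plain Python data. The helpers below only build the HTTP method, path, and JSON body; they do not send anything. The helper names, index names, and threshold values are hypothetical, and the exact set of supported options depends on the Elasticsearch version in use.

```python
import json


def create_index_request(index, shards, replicas):
    """Shard and replica counts are set when the index is created."""
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return "PUT", f"/{index}", body


def update_replicas_request(index, replicas):
    """The replica count can be changed later on a live index."""
    return "PUT", f"/{index}/_settings", {"index": {"number_of_replicas": replicas}}


def rollover_request(alias, max_docs, max_age):
    """Roll the write alias over to a new index once a condition is met."""
    body = {"conditions": {"max_docs": max_docs, "max_age": max_age}}
    return "POST", f"/{alias}/_rollover", body


def shrink_request(source, target, shards):
    """Shrink an index into a new one with fewer primary shards.

    The target shard count must be a factor of the source shard count.
    """
    body = {"settings": {"index.number_of_shards": shards}}
    return "POST", f"/{source}/_shrink/{target}", body


def force_merge_request(index, max_segments):
    """Merge segments of an index that no longer receives writes."""
    return "POST", f"/{index}/_forcemerge?max_num_segments={max_segments}", None


if __name__ == "__main__":
    for request in (
        create_index_request("logs-000001", shards=4, replicas=1),
        update_replicas_request("logs-000001", replicas=2),
        rollover_request("logs-write", max_docs=50_000_000, max_age="7d"),
        shrink_request("logs-000001", "logs-000001-shrunk", shards=1),
        force_merge_request("logs-000001-shrunk", max_segments=1),
    ):
        method, path, body = request
        print(method, path, json.dumps(body))
```

Keeping the requests as data makes it easy to review them or replay them with any HTTP client.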

Summary

In this post, you reviewed key Elasticsearch concepts and sample interview questions. Did you find this article useful? Do you have any questions or suggestions about it? Leave a comment, and I will do my best to address your queries.

Ajitesh Kumar

I have recently been working in the area of data analytics, including data science and machine learning / deep learning. I am also passionate about different technologies, including programming languages such as Java/JEE, JavaScript, Python, R, and Julia, and technologies such as blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, and big data. I would love to connect with you on LinkedIn. Check out my latest book, First Principles Thinking: Building winning products using first principles thinking.