Elasticsearch Interview Questions & Answers – Set 1


In this post, you will learn about the fundamentals of Elasticsearch and related best practices through the following:

  • Revision notes on Elasticsearch fundamentals
  • A quiz to test your knowledge and, in turn, help you learn Elasticsearch concepts related to indices and shards; these questions can also help you prepare for Elasticsearch interviews
  • A set of interview questions

Elasticsearch Fundamentals – Revision Notes

  • Each Elasticsearch shard is a Lucene index
  • The number of shards and replicas can be defined per index when the index is created. The number of replicas per shard can be changed later.
  • Each shard is made up of one or more Lucene segments, which store the document data in the form of an inverted index.
  • Lucene segments are immutable
  • Average shard size can vary from 10 GB to 40 GB depending on the nature of the data stored in the index. Time-based data is commonly stored in shards of 20-40 GB.
  • It is recommended to run the force-merge operation, which merges multiple smaller segments into a larger one, during off-peak hours and on indices that are no longer being written to.
  • It is recommended to keep the number of shards on a node below 20-25 per GB of heap space. Thus, a node with a 20 GB heap should hold at most 400-500 shards.
  • Each shard has shard-level and segment-level metadata that needs to be kept in memory, and thus uses heap space.
  • Shard size can be managed using one of the following techniques:
    • Using time-based indices, so that each period's data gets its own shards
    • Using the rollover API to start a new index once the current one reaches a given document count
    • Using the shrink API to shrink an existing index into a new index with fewer shards
  • It is recommended to determine the maximum shard size from a query performance perspective by benchmarking with realistic data and queries. There is no rule of thumb or one-size-fits-all solution for this.
  • It is recommended to use time-based indices for managing data retention whenever possible. Data can be grouped into indices based on the retention period. This makes the indices easy to manage, since retention becomes a matter of creating and deleting whole indices.
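The heap-based shard ceiling and the rollover and shrink calls mentioned in the notes above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive client: it only builds the HTTP method, path, and JSON body for each request and sends nothing to a cluster. The function names, the alias `logs-write`, and the index names `logs-000001` and `logs-000001-shrunk` are hypothetical and used only for illustration. The 20-25 shards per GB figure is the one quoted in the notes.

```python
import json

# Ceiling quoted in the notes above: 20-25 shards per GB of heap.
SHARDS_PER_GB_HEAP = (20, 25)


def max_shards_for_heap(heap_gb):
    """Return the (low, high) shard ceiling for a node with heap_gb GB of heap."""
    low, high = SHARDS_PER_GB_HEAP
    return heap_gb * low, heap_gb * high


def rollover_request(alias, max_docs):
    """Build a rollover call that starts a new index once max_docs is reached."""
    body = {"conditions": {"max_docs": max_docs}}
    return "POST", f"/{alias}/_rollover", json.dumps(body)


def shrink_request(source, target, source_shards, target_shards):
    """Build a shrink call; the target count must be a factor of the source count."""
    if target_shards < 1 or source_shards % target_shards != 0:
        raise ValueError("target_shards must be a factor of source_shards")
    body = {"settings": {"index.number_of_shards": target_shards}}
    return "POST", f"/{source}/_shrink/{target}", json.dumps(body)


if __name__ == "__main__":
    # Hypothetical alias and index names, for illustration only.
    print(max_shards_for_heap(20))  # (400, 500), matching the note above
    print(rollover_request("logs-write", 1_000_000))
    print(shrink_request("logs-000001", "logs-000001-shrunk", 8, 2))
```

The returned tuples can be passed to any HTTP client. Note that a real shrink also has prerequisites that this sketch does not check, such as the source index being blocked for writes.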

Sample Interview Questions

  • Explain the concepts of cluster, node, index, shard, and replica.
  • How do you determine the shard size? What is the recommended shard size for time-based data?
  • How do updating and deleting documents in an index work?
  • How many shards can be allocated to a node with about 20 GB of heap memory?
  • Explain Lucene segments and the merging of segments.
  • What are the rollover and shrink APIs used for?

Sample Quiz (Objective Questions) on Elasticsearch

How many primary shards are created by default for a new index?

How many replicas are created by default for each shard?

How many shards, including primary and replica shards, are created in total by default?

Shards can be further split into multiple shards

The number of shards of an index can be changed at any point in time

Data is available for querying as soon as _______

Lucene segments are immutable

Updating a document results in which of the following?

Deleting a document results in which of the following?

The more heap space a node has, the more data and shards it can handle.

The number of shards on a node depends on the available heap space

The smaller the shard size, the smaller the segments and the greater the overhead

Each query is executed in a single thread per shard

Which of the following APIs is used to create a new index once a predefined count of documents stored in an index is reached?

Which of the following APIs is used to reduce the number of shards when too many shards were configured initially?

Creating multiple shards of an index and partitioning the data into different indices are one and the same thing


In this post, you reviewed key concepts, sample interview questions, and a quiz related to Elasticsearch. Did you find this article useful? Do you have any questions or suggestions about this article? Leave a comment with your questions and I will do my best to address them.

Ajitesh Kumar

Ajitesh has recently been working in the area of AI and machine learning. His current research area is Safe & Quality AI. He is also passionate about a range of technologies, including programming languages such as Java/JEE and JavaScript, and technologies such as blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, and big data.

He has also authored the book, Building Web Apps with Spring 5 and Angular.