Elasticsearch Interview Questions & Answers – Set 1

In this post, you will learn about Elasticsearch fundamentals and best practices through the following:

  • Revision notes on Elasticsearch fundamentals
  • A set of quiz questions to test your knowledge and help you learn Elasticsearch concepts related to indices and shards; these questions can also help you prepare for Elasticsearch interviews
  • A set of interview questions

Elasticsearch Fundamentals – Revision Notes

  • Each Elasticsearch shard is a Lucene index.
  • The number of primary shards and replicas is defined per index when the index is created. The number of replicas per shard can be changed later; the number of primary shards cannot be changed in place.
  • Each shard, being a Lucene index, is made up of one or more Lucene segments, which store the document data in the form of an inverted index.
  • Lucene segments are immutable
  • Average shard size typically ranges from 10 GB to 40 GB, depending on the nature of the data stored in the index. Time-based data is commonly stored in shards of 20-40 GB.
  • It is recommended to run the force-merge operation, which merges multiple smaller segments into larger ones, during off-peak hours and only on indices that are no longer being written to.
  • A common guideline is to have at most 20-25 shards per GB of heap space. Thus, a node with 20 GB of heap can hold 400-500 shards.
  • Each shard carries shard-level and segment-level metadata that must be kept in memory and therefore consumes heap space.
  • The size of a shard can be managed using one of the following techniques:
    • Creating time-based indices, so that each index (and its shards) covers a fixed period
    • Rolling over to a new index once a document-count threshold is reached, using the rollover API
    • Shrinking an existing index into a new index with fewer shards, using the shrink API
  • It is recommended to determine the maximum shard size from a query performance perspective by benchmarking with realistic data and queries. There is no rule of thumb or one-size-fits-all solution for this.
  • It is recommended to use time-based indices for managing data retention whenever possible. Data can be grouped into indices based on the retention period. This makes the indices easier to manage, because retention is handled by creating and deleting whole indices.
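
The heap guideline and the shard-size range in the notes above reduce to simple arithmetic. The following is a minimal Python sketch, not an official sizing tool. The function names `shard_budget` and `primary_shards_for` are hypothetical, and the 30 GB target shard size is an assumption picked from within the 20-40 GB range mentioned above; replace it with a figure from your own benchmarks.

```python
import math


def shard_budget(heap_gb, shards_per_gb=(20, 25)):
    """Return the (low, high) number of shards a node can hold.

    Applies the 20-25 shards per GB of heap guideline from the notes.
    """
    low, high = shards_per_gb
    return heap_gb * low, heap_gb * high


def primary_shards_for(data_gb, target_shard_gb=30):
    """Return how many primary shards keep each shard near the target size.

    target_shard_gb is an assumption; choose it from your own benchmarks.
    """
    return max(1, math.ceil(data_gb / target_shard_gb))


if __name__ == "__main__":
    print(shard_budget(20))         # (400, 500)
    print(primary_shards_for(600))  # 20
```

A node with 20 GB of heap gives the 400-500 shard range quoted in the notes, and 600 GB of data at roughly 30 GB per shard works out to 20 primary shards. Treat both numbers as starting points for benchmarking rather than fixed limits.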

Sample Interview Questions

  • Explain the concepts of cluster, node, index, shard, and replica.
  • How do you determine the shard size? What shard size is recommended for time-based data?
  • How do updating and deleting documents in an index work?
  • How many shards can be allocated to a node with around 20 GB of heap memory?
  • Explain Lucene segments and the merging of segments.
  • What are the rollover and shrink APIs used for?
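
When preparing for the rollover and shrink question, it can help to see roughly what the two requests look like. The sketch below only builds the request paths and JSON bodies in plain Python; it does not talk to a cluster. The helper names and the index and alias names (`logs-write`, `logs-000001`) are hypothetical. It relies on two documented behaviours: rollover accepts a `max_docs` condition, and a shrink target must have a shard count that is a factor of the source index's shard count.

```python
import json


def rollover_request(alias, max_docs):
    """Build the path and body for a rollover on a document-count condition."""
    path = f"/{alias}/_rollover"
    body = {"conditions": {"max_docs": max_docs}}
    return path, body


def valid_shrink_targets(source_shards):
    """List the shard counts a shrink may target.

    The target count must be a factor of the source count and smaller than it.
    """
    return [n for n in range(1, source_shards) if source_shards % n == 0]


def shrink_request(source_index, target_index, source_shards, target_shards):
    """Build the path and body for a shrink, rejecting invalid target counts."""
    if target_shards not in valid_shrink_targets(source_shards):
        raise ValueError(
            f"{target_shards} is not a smaller factor of {source_shards}"
        )
    path = f"/{source_index}/_shrink/{target_index}"
    body = {"settings": {"index.number_of_shards": target_shards}}
    return path, body


if __name__ == "__main__":
    requests_to_send = (
        rollover_request("logs-write", 1_000_000),
        shrink_request("logs-000001", "logs-000001-shrunk", 8, 2),
    )
    for path, body in requests_to_send:
        print("POST", path, json.dumps(body))
```

Before an actual shrink, the source index also has to be made read-only and a copy of every shard has to sit on a single node; those preparation steps are left out of this sketch.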

Sample Quiz (Objective Questions) on Elasticsearch

How many primary shards are created by default when an index is created?

How many replicas are created by default for each shard?

How many shards in total, including primary and replica shards, are created by default for an index?

Shards can be further split into multiple shards

The number of shards of an index can be changed at any point in time

Data is available for querying as soon as _______

Lucene segments are immutable

Updating a document results in which of the following?

Deleting a document results in which of the following?

The more heap space a node has, the more data and shards it can handle.

The number of shards on a node depends upon the available heap space

The smaller the shard size, the smaller the segments and the greater the overhead

Each query is executed in a single thread per shard

Which of the following APIs is used to create a new index once a predefined count of documents stored in an index is reached?

Which of the following APIs is used to reduce the number of shards when too many shards were configured initially?

Creating multiple shards for an index and partitioning the data into different indices are one and the same thing

Summary

In this post, you reviewed key Elasticsearch concepts related to indices and shards, along with sample interview questions and a quiz. Did you find this article useful? If you have any questions or suggestions, leave a comment and I will do my best to address them.

Ajitesh Kumar