Elasticsearch Interview Questions & Answers - Set 1

In this post, you will learn about Elasticsearch fundamentals and best practices through the following:

Revision notes on Elasticsearch fundamentals
A quiz to test your knowledge of Elasticsearch concepts related to indices and shards, which can also help you prepare for Elasticsearch interviews
A set of interview questions

Elasticsearch Fundamentals – Revision Notes

Each Elasticsearch shard is a Lucene index
The number of primary shards and replicas can be defined per index when the index is created. The number of replicas per shard can be changed later; the number of primary shards cannot be changed in place on an existing index.
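As a minimal sketch of the note above, the following builds the request bodies for creating an index with explicit shard settings and for changing the replica count afterwards. It only constructs the requests and sends nothing; the helper names and the index name `my-index` are made up for illustration, while `number_of_shards` and `number_of_replicas` are the index settings involved.

```python
import json

def create_index_request(index, shards, replicas):
    """Build (method, path, body) for creating an index with explicit shard settings."""
    body = {"settings": {"index": {"number_of_shards": shards,
                                   "number_of_replicas": replicas}}}
    return "PUT", f"/{index}", body

def update_replicas_request(index, replicas):
    """Build (method, path, body) for changing the replica count of an existing index."""
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", f"/{index}/_settings", body

def total_shard_copies(shards, replicas):
    """Each primary shard has `replicas` copies, so total = primaries * (1 + replicas)."""
    return shards * (1 + replicas)

if __name__ == "__main__":
    # "my-index" is a hypothetical index name.
    method, path, body = create_index_request("my-index", shards=3, replicas=1)
    print(method, path, json.dumps(body))
    print(total_shard_copies(3, 1))
```

Only the replica count appears in the settings update, which mirrors the point that the primary shard count is fixed once the index exists.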
A shard in Elasticsearch is a Lucene index made up of one or more Lucene segments, which store the document data in the form of an inverted index.
Lucene segments are immutable
Average shard size typically varies from 10 GB to 40 GB, depending on the nature of the data stored in the index. Time-based data is commonly stored in shards of 20-40 GB.
It is recommended to run a force-merge operation, which merges multiple smaller segments into larger ones, during off-peak hours and only on indices that are no longer being written to.
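The notes on immutable segments and merging can be illustrated with a toy model. This is not how Lucene is implemented; `Segment` and `ToyShard` are hypothetical classes that only mimic the behaviour described here: documents become searchable once written to a segment, an update marks the old copy as deleted and adds a new version, and a merge rewrites the live documents into a single segment.

```python
class Segment:
    """Toy stand-in for an immutable Lucene segment."""
    def __init__(self, docs):
        self._docs = dict(docs)   # doc_id -> source; never modified after creation
        self.deleted = set()      # tombstones kept beside the segment data

    def live_docs(self):
        return {i: d for i, d in self._docs.items() if i not in self.deleted}

class ToyShard:
    def __init__(self):
        self.segments = []
        self.buffer = {}          # indexed but not yet searchable

    def delete(self, doc_id):
        # A delete only marks the document; the segment data is untouched.
        self.buffer.pop(doc_id, None)
        for seg in self.segments:
            if doc_id in seg._docs:
                seg.deleted.add(doc_id)

    def index(self, doc_id, source):
        # An update is "mark the old copy deleted, add the new version".
        self.delete(doc_id)
        self.buffer[doc_id] = source

    def refresh(self):
        # Publish buffered documents as a new segment, making them searchable.
        if self.buffer:
            self.segments.append(Segment(self.buffer))
            self.buffer = {}

    def search_all(self):
        result = {}
        for seg in self.segments:
            result.update(seg.live_docs())
        return result

    def force_merge(self):
        # Rewrite all live documents into one segment, dropping tombstoned copies.
        merged = self.search_all()
        self.segments = [Segment(merged)] if merged else []
```

In this model, deleted documents keep occupying space until a merge runs, which is one reason merging matters for indices that see many updates.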
A commonly cited guideline is to keep the number of shards on a node at or below 20-25 per GB of heap space. Thus, a node with a 20 GB heap should hold at most 400-500 shards.
Each shard has shard-level and segment-level metadata that must be kept in memory and therefore uses heap space.
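The heap-based guideline is simple arithmetic, sketched below. The function names are made up for illustration, and the 20 and 25 defaults are just the figures quoted in the notes, not values read from a cluster.

```python
def shard_budget(heap_gb, low_per_gb=20, high_per_gb=25):
    """Return the (lower, upper) ends of the suggested shard ceiling for a node."""
    return heap_gb * low_per_gb, heap_gb * high_per_gb

def within_budget(shard_count, heap_gb):
    """True if shard_count does not exceed the upper end of the guideline."""
    return shard_count <= shard_budget(heap_gb)[1]

if __name__ == "__main__":
    print(shard_budget(20))        # node with a 20 GB heap
    print(within_budget(450, 20))
```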
Shard size can be managed using one of the following techniques:
- Creating time-based indices, so that each index (and its shards) covers a fixed period
- Rolling over to a new index with the rollover API once a document-count threshold is reached
- Shrinking an existing index into a new index with fewer shards using the shrink API
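The rollover and shrink techniques listed above can be sketched as request builders. Nothing is sent to a cluster; the helper names are hypothetical, and the alias and index names passed in are placeholders. The factor check reflects the shrink API's requirement that the target primary shard count be a factor of the source count.

```python
def rollover_request(alias, max_docs=None, max_age=None):
    """Build (method, path, body) for a rollover with optional conditions."""
    conditions = {}
    if max_docs is not None:
        conditions["max_docs"] = max_docs
    if max_age is not None:
        conditions["max_age"] = max_age
    return "POST", f"/{alias}/_rollover", {"conditions": conditions}

def valid_shrink_targets(source_shards):
    """Shard counts an index with `source_shards` primaries can be shrunk to."""
    return [n for n in range(1, source_shards) if source_shards % n == 0]

def shrink_request(source, target, target_shards, source_shards):
    """Build (method, path, body) for shrinking `source` into `target`."""
    if target_shards not in valid_shrink_targets(source_shards):
        raise ValueError("target shard count must be a smaller factor of the source count")
    body = {"settings": {"index.number_of_shards": target_shards}}
    return "POST", f"/{source}/_shrink/{target}", body
```

Before a real shrink, the source index also has to be made read-only and have a copy of every shard on one node; those preparation steps are left out of this sketch.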
It is recommended to determine the maximum shard size from a query-performance perspective by benchmarking with realistic data and queries. There is no rule of thumb or one-size-fits-all solution for this.
It is recommended to use time-based indices for managing data retention whenever possible. Data can be grouped into indices based on the retention period. This makes the indices easier to manage, because expired data is removed by deleting whole indices.
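A minimal sketch of retention with time-based indices follows, assuming daily indices named `<prefix>-YYYY.MM.DD`. The naming pattern, the `logs` prefix, and the function names are assumptions for illustration only. The function only selects index names that have aged out; deleting them would be a separate step.

```python
from datetime import date, datetime, timedelta

def daily_index_name(prefix, day):
    """Name of the daily index for `day`, e.g. logs-2024.03.10."""
    return f"{prefix}-{day:%Y.%m.%d}"

def expired_indices(index_names, prefix, today, retention_days):
    """Return the daily indices for `prefix` that are older than the retention period."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix + "-"):
            continue  # belongs to a different index family
        day = datetime.strptime(name[len(prefix) + 1:], "%Y.%m.%d").date()
        if day < cutoff:
            expired.append(name)
    return expired

if __name__ == "__main__":
    names = [daily_index_name("logs", date(2024, 3, d)) for d in (1, 5, 10)]
    print(expired_indices(names, "logs", date(2024, 3, 10), retention_days=7))
```

Dropping a whole index this way avoids deleting documents one by one, which would otherwise leave tombstones behind until segments are merged.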

Sample Interview Questions

Explain the concepts of cluster, node, index, shard, and replica.
How do you determine the shard size? What shard size is recommended for time-based data?
How do updating and deleting documents in an index work?
How many shards can be allocated to a node with around 20 GB of heap memory?
Explain Lucene segments and the merging of segments.
What are the rollover and shrink APIs used for?

Sample Quiz (Objective Questions) on Elasticsearch

How many primary shards does Elasticsearch create by default for a new index?

How many replicas are created by default for each shard?

How many shards, including primary and replica shards, are created in total by default?

Shards can be further split into multiple shards

True

False

The number of shards of an index can be changed at any point in time

True

False

Data is available for querying as soon as _______

It is written to a shard

It is published to a Lucene segment on disk

Lucene segments are immutable

True

False

Updating a document results in which of the following?

Updating the original document in real time

Finding the matching document, marking the document as deleted and adding the new version

Deleting a document results in which of the following?

Deleting the document in the index in real time

Finding the matching document, marking it as deleted.

The more heap space a node has, the more data and shards it can handle.

True

False

The number of shards on a node depends upon the available heap space

True

False

The smaller the shard size, the smaller the segments and the greater the overhead

True

False

Each query is executed in a single thread per shard

True

False

Which of the following APIs is used to create a new index once a predefined number of documents in the current index is reached?

Shrink

Rollover

Which of the following APIs is used to reduce the number of shards when too many shards were configured initially?

Shrink

Rollover

Creating multiple shards for an index and partitioning the data into different indices are one and the same thing

True

False

Summary

In this post, you reviewed key Elasticsearch concepts along with sample interview questions and a quiz on indices and shards. Did you find this article useful? Do you have any questions or suggestions about it? Leave a comment and I shall do my best to address your queries.

Author
Ajitesh Kumar

I have recently been working in the area of data analytics, including data science and machine learning / deep learning. I am also passionate about different technologies, including programming languages such as Java/JEE, JavaScript, Python, R, and Julia, and technologies such as blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, and big data. I would love to connect with you on LinkedIn.
Check out my latest book, First Principles Thinking: Building winning products using first principles thinking.
