Elasticsearch is a search engine built on top of Apache Lucene.
It is:
- A NoSQL document store (stores JSON documents)
- Designed for full-text search, filtering, and analytics
- Scalable and distributed
- Accessible through a RESTful web service interface
You store documents, not rows as in SQL.
Key Concepts
Node
- A Node is a single running instance of Elasticsearch.
- A single physical or virtual server can host multiple nodes, depending on the system's resources such as RAM, storage, and CPU.
Cluster
A Cluster is a collection of one or more nodes that together hold your data and provide distributed search and indexing capabilities.
Why Clusters?
- Scalability: Add more nodes to increase capacity.
- Fault Tolerance: If one node fails, others continue operating.
Core Terms in Elasticsearch
Concept | Description |
Index | Like a table in SQL. A collection of documents |
Document | A single JSON object, like a row in SQL. Every document has a unique ID (_id) within its index |
Field | A key-value pair in a document |
Mapping | Like a schema: defines field types |
Query | How you search documents |
Shard
An Index can grow large, so Elasticsearch splits it into smaller pieces called shards.
- Each shard is a self-contained Lucene index and can reside on any node.
- Shards enable distributed storage and parallel processing.
Types of Shards:
- Primary Shard: The original piece of the index.
- Replica Shard: A copy of a primary shard used for redundancy and load balancing.
Why Shards?
- Scalability: Distribute data across nodes.
- Performance: Indexing and search operations run in parallel.
Example:
If an index has 5 primary shards and your cluster has 5 nodes, each node can host one shard, balancing the load evenly.
Replica: The Backup Copies
To protect against data loss and improve search performance, Elasticsearch uses replica shards, which are copies of primary shards.
- Ensures high availability: if a node or primary shard fails, a replica is promoted to take over.
- Boosts search performance: queries can be served by either primary or replica shards.
Key Points:
- You can configure the number of replicas per index.
- A replica is never stored on the same node as its corresponding primary shard, to avoid a single point of failure.
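The shard and replica ideas above can be sketched in a few lines of Python. This is a toy model, not Elasticsearch's allocator. The routing hash is an assumption (zlib.crc32 stands in for the Murmur3 hash Elasticsearch uses), the node names are hypothetical, and each replica is simply placed on the next node in the list. It illustrates two points: a document ID maps deterministically to one primary shard, and a replica never shares a node with its primary.

```python
import zlib

def pick_shard(doc_id: str, number_of_shards: int) -> int:
    """Route a document ID to a primary shard (simplified: hash modulo shard count)."""
    # Assumption: crc32 is a stand-in; Elasticsearch hashes the routing value with Murmur3.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards

def allocate(number_of_shards: int, nodes: list) -> dict:
    """Place each primary on a node and its single replica on a different node."""
    placement = {}
    for shard in range(number_of_shards):
        primary = nodes[shard % len(nodes)]
        # With only one node there is no safe place for the replica, so it stays unassigned.
        replica = nodes[(shard + 1) % len(nodes)] if len(nodes) > 1 else None
        placement[shard] = {"primary": primary, "replica": replica}
    return placement

nodes = ["node-1", "node-2", "node-3"]  # hypothetical node names
placement = allocate(3, nodes)
shard = pick_shard("book-42", 3)
print(f"book-42 -> shard {shard}, primary on {placement[shard]['primary']}")
for number, copies in placement.items():
    print(number, copies)
```

With a single node, every replica in this sketch stays unassigned, which mirrors why a one-node cluster with replicas configured reports yellow health.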
Getting Started
Before you begin, install and run Elasticsearch. The steps below then cover creating an index, defining a mapping, adding documents, and searching.
Step 1: Creating an Index
An index in Elasticsearch is like a table in SQL: it stores a collection of JSON documents.
You can create an index with default settings like this:
PUT /library
Or, to include custom settings (like number of shards and replicas):
PUT /library
{ "settings": { "number_of_shards": 3, "number_of_replicas": 1 } }
Step 2: Define a Mapping
Mappings in Elasticsearch define the structure of your documents, similar to a schema in a relational database. You define field types such as text, keyword, date, integer, etc.
Here's a sample mapping for a book document:
PUT /library/_mapping
{ "properties": { "title": { "type": "text" }, "author": { "type": "keyword" }, "published_date": { "type": "date" }, "pages": { "type": "integer" }, "available": { "type": "boolean" } } }
Alternatively, create the index and mapping in one go:
PUT /library
{ "mappings": { "properties": { "title": { "type": "text" }, "author": { "type": "keyword" }, "published_date": { "type": "date" }, "pages": { "type": "integer" }, "available": { "type": "boolean" } } } }
Step 3: Adding and Updating Documents
Adding a Document
Once an index is ready, you can start inserting data. The example below indexes into a separate products index; if that index does not exist yet, Elasticsearch (with default settings) creates it and infers the field types dynamically.
POST /products/_doc/1
{ "name": "Wireless Mouse", "price": 25.99, "stock": 50, "category": "electronics" }
Auto-ID Example:
POST /products/_doc
{ "name": "Gaming Keyboard", "price": 59.99, "stock": 30, "category": "electronics" }
Updating Documents in Elasticsearch
1. Full Replacement (Same ID)
If you index a document with an ID that already exists, it replaces the existing document.
PUT /library/_doc/1
{ "title": "Elasticsearch Essentials - Updated", "author": "Abhishek Tiwari", "published_date": "2023-08-01", "pages": 340, "available": false }
Note: This replaces the entire document. If you omit a field, it is removed from the document!
2. Partial Update (Only Specific Fields)
Use the _update API to modify only certain fields. This preserves the rest of the document, updating only what's inside the doc object.
POST /library/_update/1
{ "doc": { "available": true, "pages": 350 } }
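The difference between full replacement and partial update can be sketched with a plain dictionary. This is only an illustration: the store dictionary and both helper functions are hypothetical stand-ins, not part of any Elasticsearch client, and the merge here is shallow, whereas the real _update API also merges nested objects.

```python
import copy

store = {}  # hypothetical in-memory stand-in for the library index

def index_document(doc_id, body):
    """Full replacement: whatever was stored under doc_id is discarded."""
    store[doc_id] = copy.deepcopy(body)

def partial_update(doc_id, doc):
    """Partial update: merge the given fields into the stored document."""
    merged = copy.deepcopy(store[doc_id])
    merged.update(doc)  # shallow merge only (simplification)
    store[doc_id] = merged

index_document("1", {"title": "Elasticsearch Essentials", "pages": 320, "available": True})
# Re-indexing with the same ID and no "pages" field drops "pages" entirely.
index_document("1", {"title": "Elasticsearch Essentials - Updated", "available": False})
print(store["1"])
# A partial update keeps "title" and only touches the listed fields.
partial_update("1", {"available": True, "pages": 350})
print(store["1"])
```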
Step 4: Search Documents (Query DSL)
What Is Analyzed Search in Elasticsearch?
Analyzed search refers to processing both the document content and the search query through an analyzer before they are stored or compared. This is the default behavior for the text data type in Elasticsearch.
What Is an Analyzer?
An analyzer is a combination of:
- Tokenizer: Breaks text into individual terms (tokens).
- Filters: Modify the tokens (e.g., lowercase, remove stop words, stemming).
Input: "The Quick Brown Foxes"
Default (standard) analyzer tokens: ["the", "quick", "brown", "foxes"]
The standard analyzer lowercases tokens but does not stem them or remove stop words by default, so "Foxes" becomes "foxes", not "fox".
When Does Analyzing Happen?
- At index time (when the document is stored): the value in a text field is analyzed into tokens.
- At search time (when you query with match, multi_match, etc.): the query string is also analyzed.
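The tokenizer-plus-filters pipeline described above can be approximated in Python. This is a rough sketch under stated assumptions: the regular expression is only a stand-in for the standard tokenizer (which follows Unicode text segmentation rules), and the function names are illustrative, not Elasticsearch APIs.

```python
import re

def tokenize(text):
    """Split text into word tokens (rough stand-in for the standard tokenizer)."""
    return re.findall(r"\w+", text)

def lowercase_filter(tokens):
    """Token filter: lowercase every token."""
    return [token.lower() for token in tokens]

def make_stop_filter(stop_words):
    """Build a token filter that drops the given stop words."""
    def stop_filter(tokens):
        return [token for token in tokens if token not in stop_words]
    return stop_filter

def analyze(text, token_filters=(lowercase_filter,)):
    """Run the tokenizer, then each token filter in order."""
    tokens = tokenize(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens

print(analyze("Quick BROWN foxes jump!"))
# ['quick', 'brown', 'foxes', 'jump']
print(analyze("The quick fox", (lowercase_filter, make_stop_filter({"the"}))))
# ['quick', 'fox']
```

The same analyze function would be applied both to field values at index time and to the query string at search time, which is what lets differently cased input line up.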
Benefits of Analyzed Search
Feature | Description |
Case-insensitive | "Quick" matches "quick" |
Flexible | Can match individual words of a phrase such as "brown fox" |
Supports stemming | "running" can match "run" if a stemmer is enabled |
Language-aware | Can handle language-specific rules |
Analyzed vs Non-Analyzed
Feature | Analyzed (text) | Not Analyzed (keyword) |
Tokenization | Yes | No |
Query type used | match, match_phrase, etc. | term, terms |
Case sensitivity | Case-insensitive (usually) | Case-sensitive (unless lowercased manually) |
Sorting, aggregations | Not supported directly | Supported |
Basic structure:
GET /products/_search
{ "query": { "match": { "name": "keyboard" } } }
This uses the Query DSL (Domain-Specific Language), a JSON-based syntax for querying.
What is Query DSL?
Schematically, a request body has the following form, with one query type under "query":
{ "query": { "match" | "term" | "bool" | "range" | ... } }
Query Examples
Example: Match query
Document:
{ "description": "The quick brown fox jumps" }
Match query (Analyzed Search)
{ "match": { "description": "Quick Fox" } }
- Both query and document are analyzed.
- Tokens compared: "quick", "fox"
- Result: match
Term query (No Analysis)
{ "term": { "description": "Quick Fox" } }
- The query is not analyzed.
- The literal term "Quick Fox" is compared against the indexed tokens of "The quick brown fox jumps" ("the", "quick", "brown", "fox", "jumps"). None of them equals it, so there is no match.
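The contrast between match and term on a text field can be reproduced with a small Python sketch. It assumes a simplified analyzer (regex tokenization plus lowercasing) and models the indexed field as a set of tokens; the helper names are illustrative only.

```python
import re

def analyze(text):
    """Simplified analyzer: split on word characters, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]

# What the text field effectively holds after index-time analysis.
indexed_terms = set(analyze("The quick brown fox jumps"))

def match_query(query_text):
    """match: analyze the query, then look for any of its tokens (default OR)."""
    return any(token in indexed_terms for token in analyze(query_text))

def term_query(value):
    """term: compare the value as-is against the indexed tokens."""
    return value in indexed_terms

print(match_query("Quick Fox"))  # True
print(term_query("Quick Fox"))   # False
print(term_query("quick"))       # True
```

The last call shows that a term query can still hit a text field, but only when the value happens to equal one of the analyzed tokens exactly.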
Example: Range Query
{ "query": { "range": { "price": { "gte": 20, "lte": 60 } } } }
Example: Bool Query (AND, OR, NOT)
{ "query": { "bool": { "must": [ { "match": { "name": "mouse" } }, { "term": { "category": "electronics" } } ], "filter": [ { "range": { "price": { "lte": 50 } } } ] } } }
must clause: Ensures that documents match both the name and category conditions. These matches contribute to the relevance score, allowing Elasticsearch to rank the results based on how well they match these criteria.
filter clause: Applies a price constraint, filtering out documents where the price is greater than 50. This condition does not affect the relevance score, so scoring is based solely on the must conditions.
should: like OR. Matching clauses raise the score; when there is no must or filter clause, at least one should clause has to match.
must_not: like NOT. Matching documents are excluded, and the clause does not affect the score.
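How the four bool clauses interact can be sketched in Python with predicates over plain dictionaries. The sample products and the scoring rule (one point per must clause and per matching should clause) are assumptions made for illustration; real relevance scoring is far more involved, and the sketch assumes at least one must or filter clause is given.

```python
products = [  # hypothetical sample documents
    {"name": "Wireless Mouse", "category": "electronics", "price": 25.99},
    {"name": "Gaming Mouse", "category": "electronics", "price": 79.00},
    {"name": "Mouse Pad", "category": "accessories", "price": 9.99},
    {"name": "Gaming Keyboard", "category": "electronics", "price": 59.99},
]

def has_mouse(doc):
    return "mouse" in doc["name"].lower().split()

def is_electronics(doc):
    return doc["category"] == "electronics"

def run_bool(docs, must=(), filters=(), should=(), must_not=()):
    """Return (score, doc) pairs for docs passing the clauses, best score first."""
    hits = []
    for doc in docs:
        if not all(clause(doc) for clause in must):
            continue
        if not all(clause(doc) for clause in filters):
            continue
        if any(clause(doc) for clause in must_not):
            continue
        # Toy scoring: filter and must_not never add to the score.
        score = len(must) + sum(1 for clause in should if clause(doc))
        hits.append((score, doc))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)

# Same shape as the bool example: two must clauses plus a price filter.
strict = run_bool(products, must=[has_mouse, is_electronics],
                  filters=[lambda d: d["price"] <= 50])
print([doc["name"] for _, doc in strict])

# should only influences ranking here: electronics items score higher.
ranked = run_bool(products, must=[has_mouse], should=[is_electronics])
print([(score, doc["name"]) for score, doc in ranked])
```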
Mapping
1. Defining Mappings at Index Creation
When creating a new index, wrap your mapping under the mappings key:
PUT /my_index
{ "mappings": { "properties": { "<field1>": { <field1_definition> }, "<field2>": { <field2_definition> }, ... } } }
- properties: container for field definitions
- <fieldN>: each field's name; each must specify at least a type (e.g., text, keyword, date, integer)
Example
PUT /products
{ "mappings": { "properties": { "name": { "type": "text" }, "description": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "price": { "type": "float" }, "brand": { "type": "keyword" } } } }
This creates a products index where description is a full-text text field, with a sub-field description.keyword for exact matches.
2. Updating Mappings on an Existing Index
Use the Put Mapping API to add new fields or multi-fields:
PUT /products/_mapping
{ "properties": { "<new_field>": { <definition> } } }
You cannot change the type of an existing field; you can only add new properties or multi-fields.
To change a field's type, create a new index with the correct mapping and reindex the data into it, or delete and recreate the index if the data can be reloaded.
3. Key Components of a Mapping Definition
3.1. Field Types (type)
Each field must declare a type. Common types include:
- text: analyzed string for full-text search
- keyword: not analyzed; good for aggregations, sorting, exact match
- Numeric types: integer, long, float, double
- date: date values, with configurable formats
- boolean, geo_point, nested, etc.
3.2. Analyzers and index Settings
- analyzer: specify a custom analysis chain (tokenizer, filters) for text fields
- index: set to false if you want to keep the field in the stored document but not index it for search
3.3. Multi-Fields (fields)
Allow indexing the same data in different ways. Example: full-text and exact match:
"description": { "type": "text", "fields": { "raw": { "type": "keyword" } } }
3.4. _source and Metadata Fields
- _source: controls how the original JSON document is stored/retrieved; can be disabled or restricted with include/exclude filters
- Meta-fields: _id, _type (deprecated in newer versions), _all (removed), etc.
3.5. Dynamic Mappings (dynamic, dynamic_templates)
- dynamic: true (default) automatically adds new fields; false ignores new fields (they stay in _source but are not indexed); strict rejects documents with unmapped fields
- dynamic_templates: pattern-based rules that apply custom mappings to dynamically added fields.
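The three dynamic settings can be illustrated with a small Python sketch. The function below is a hypothetical model that tracks only which field names are mapped; it does not infer types or apply dynamic templates.

```python
def index_with_dynamic(mapped_fields, document, dynamic="true"):
    """Return the set of mapped field names after indexing `document`."""
    mapped = set(mapped_fields)
    unmapped = [name for name in document if name not in mapped]
    if dynamic == "strict" and unmapped:
        # strict: the whole document is rejected.
        raise ValueError(f"unmapped fields rejected: {unmapped}")
    if dynamic == "true":
        # true: new fields are added to the mapping automatically.
        mapped.update(unmapped)
    # false: new fields are ignored by the mapping (the document is still accepted).
    return mapped

doc = {"title": "Intro to Search", "rating": 5}
print(sorted(index_with_dynamic({"title"}, doc, "true")))   # ['rating', 'title']
print(sorted(index_with_dynamic({"title"}, doc, "false")))  # ['title']
try:
    index_with_dynamic({"title"}, doc, "strict")
except ValueError as error:
    print(error)
```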
4. Full Example
PUT /articles
{ "mappings": { "dynamic": "strict", "properties": { "title": { "type": "text", "analyzer": "standard" }, "author": { "type": "keyword" }, "publish_date": { "type": "date", "format": "yyyy-MM-dd" }, "comments": { "type": "nested", "properties": { "user": { "type": "keyword" }, "message": { "type": "text" }, "date": { "type": "date" } } } }, "dynamic_templates": [ { "strings_as_keywords": { "match_mapping_type": "string", "mapping": { "type": "keyword" } } } ] } }
Here:
- new unmapped fields cause errors (dynamic: "strict")
- the dynamic_templates rule maps unmapped string fields as keyword, but it only takes effect where dynamic mapping is enabled (dynamic: true); under "strict", documents with unmapped fields are rejected before the template can apply
Elasticsearch Search Queries
1. Match Query: Analyzed Full-Text Search
Use this when you want to search text fields that are analyzed (tokenized and normalized).
Example:
{ "query": { "match": { "description": "quick fox" } } }
- Query terms and document fields are both analyzed.
- Matches documents containing the token "quick" or "fox" in the description field (the default operator is OR; documents containing both score higher).
2. Term Query: Exact Match (No Analysis)
Use this for exact matches on keyword or non-analyzed fields.
Example:
{ "query": { "term": { "category.keyword": "electronics" } } }
- Searches for the exact term "electronics" in the category.keyword field.
- No text analysis; case-sensitive and exact match.
3. Range Query: Numeric or Date Ranges
Filter documents with values between specified boundaries.
Example:
{ "query": { "range": { "price": { "gte": 20, "lte": 60 } } } }
- Finds documents where price is between 20 and 60 (inclusive).
4. Bool Query: Combine Queries with AND, OR, NOT
Compose complex queries using must (AND), should (OR), and must_not (NOT).
Example:
{ "query": { "bool": { "must": [ { "match": { "name": "mouse" } }, { "term": { "category.keyword": "electronics" } } ], "filter": [ { "range": { "price": { "lte": 50 } } } ], "must_not": [ { "term": { "brand.keyword": "brandX" } } ] } } }
- Documents must have "mouse" in the name AND category "electronics".
- Price must be less than or equal to 50.
- Excludes documents from "brandX".
5. Fuzzy Query: Search with Typos or Approximate Matches
Great for user input with spelling mistakes or typos.
Example:
{ "query": { "fuzzy": { "name": { "value": "wirless", "fuzziness": "AUTO" } } } }
- Matches indexed terms within the allowed edit distance, such as "wireless" or "wiresless".
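The idea behind fuzzy matching, edit distance with a length-dependent limit, can be sketched in Python. Two simplifications are assumed: the sketch uses plain Levenshtein distance (Elasticsearch also counts a transposition of adjacent characters as one edit by default), and auto_fuzziness mirrors the documented AUTO defaults of 0, 1, or 2 edits for terms of length 0-2, 3-5, and above 5.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def auto_fuzziness(term):
    """Allowed edits for a term under the AUTO setting (default thresholds)."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2

def fuzzy_match(query, indexed_term):
    return edit_distance(query, indexed_term) <= auto_fuzziness(query)

for candidate in ["wireless", "wiresless", "wired"]:
    print(candidate, edit_distance("wirless", candidate), fuzzy_match("wirless", candidate))
```

Here "wireless" is one edit away and "wiresless" two, so both fall within the limit for a seven-character query, while "wired" is three edits away and does not match.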
6. Prefix Query: Autocomplete with Prefix Matching
Search documents where a field contains a term that starts with a given prefix.
Example:
{ "query": { "prefix": { "name": "blu" } } }
- Matches documents with terms like "bluetooth" or "blue" (as in "blue light").
7. Completion Suggester: Efficient Autocomplete Suggestions
Designed for fast autocomplete and suggestion features.
Setup:
PUT /products
{ "mappings": { "properties": { "suggest": { "type": "completion" } } } }
Indexing Document:
POST /products/_doc/1
{ "name": "Wireless Mouse", "suggest": { "input": ["wireless mouse", "mouse", "computer accessory"] } }
Query:
POST /products/_search
{ "suggest": { "product-suggest": { "prefix": "wire", "completion": { "field": "suggest" } } } }
- Returns autocomplete suggestions as the user types "wire".
8. Nested Query: Querying Arrays of Objects
If your documents contain arrays of JSON objects, map the field as nested and use nested queries so that conditions are matched within the same object.
Document Example:
{ "name": "Gaming Laptop", "features": [ { "name": "RAM", "value": "16GB" }, { "name": "GPU", "value": "NVIDIA" } ] }
Mapping:
PUT /electronics
{ "mappings": { "properties": { "features": { "type": "nested" } } } }
Query:
POST /electronics/_search
{ "query": { "nested": { "path": "features", "query": { "bool": { "must": [ { "match": { "features.name": "GPU" } }, { "match": { "features.value": "NVIDIA" } } ] } } } } }
- Matches documents where the same nested object has "GPU" as name and "NVIDIA" as value.
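Why the nested type matters can be shown with a short Python sketch. Without nested, an array of objects is effectively flattened into per-field value lists, so the pairing between name and value is lost. The two functions below are illustrative models of that behavior, not Elasticsearch code.

```python
laptop = {
    "name": "Gaming Laptop",
    "features": [
        {"name": "RAM", "value": "16GB"},
        {"name": "GPU", "value": "NVIDIA"},
    ],
}

def flattened_match(doc, name, value):
    """Model of a non-nested object field: names and values are indexed separately."""
    names = {feature["name"] for feature in doc["features"]}
    values = {feature["value"] for feature in doc["features"]}
    return name in names and value in values

def nested_match(doc, name, value):
    """Model of a nested field: both conditions must hold within one object."""
    return any(feature["name"] == name and feature["value"] == value
               for feature in doc["features"])

print(flattened_match(laptop, "RAM", "NVIDIA"))  # True: a false positive
print(nested_match(laptop, "RAM", "NVIDIA"))     # False
print(nested_match(laptop, "GPU", "NVIDIA"))     # True
```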
Elasticsearch Aggregations
Aggregations are Elasticsearch's way to summarize and analyze your data. They work like GROUP BY, COUNT, SUM, AVG, and other analytics in SQL.
Basic Request Structure:
GET /index_name/_search
{
  "query": {
    // your query here (e.g. match, bool, range)
  },
  "aggs": {
    // your aggregations here
  }
}
Common Aggregation Types (Clauses)
1. terms: Group by field values (like GROUP BY in SQL)
Groups documents by unique values of a field and returns counts per group.
"aggs": { "by_category": { "terms": { "field": "category.keyword", "size": 10 } } }
- Returns the top 10 categories and how many documents each has.
- Use the .keyword sub-field for exact term aggregation on text fields.
2. avg: Average of numeric field
Calculates the average of a numeric field.
"aggs": { "average_price": { "avg": { "field": "price" } } }
3. sum: Sum of numeric field
Calculates the total sum.
"aggs": { "total_sales": { "sum": { "field": "sales" } } }
4. min and max: Minimum and maximum values
"aggs": { "min_price": { "min": { "field": "price" } }, "max_price": { "max": { "field": "price" } } }
5. stats: Summary stats (count, min, max, avg, sum)
"aggs": { "price_stats": { "stats": { "field": "price" } } }
6. date_histogram: Group by date intervals
Great for time-series data; groups documents into time buckets (calendar months in this example).
"aggs": { "sales_over_time": { "date_histogram": { "field": "order_date", "calendar_interval": "month" } } }
7. filter: Apply a filter inside aggregations
Restricts the documents that the sub-aggregations see.
"aggs": { "electronics_sales": { "filter": { "term": { "category.keyword": "electronics" } }, "aggs": { "avg_price": { "avg": { "field": "price" } } } } }
Nested Aggregations
You can nest aggregations inside others for deeper insights.
Example: Top categories, then average price per category
"aggs": { "by_category": { "terms": { "field": "category.keyword" }, "aggs": { "average_price": { "avg": { "field": "price" } } } } }
Full Example: Search with Aggregations
GET /products/_search
{ "query": { "range": { "price": { "gte": 20 } } }, "aggs": { "by_category": { "terms": { "field": "category.keyword", "size": 5 }, "aggs": { "average_price": { "avg": { "field": "price" } } } }, "price_stats": { "stats": { "field": "price" } } } }
- Filters products priced 20 or above.
- Groups by top 5 categories and calculates average price per category.
- Returns overall price statistics.
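What this request computes can be mirrored in plain Python over a handful of sample documents. The products list is hypothetical, the function is a sketch rather than a client call, and it assumes at least one document passes the price filter.

```python
from collections import defaultdict

products = [  # hypothetical sample documents
    {"name": "Wireless Mouse", "category": "electronics", "price": 25},
    {"name": "Gaming Keyboard", "category": "electronics", "price": 60},
    {"name": "Desk Lamp", "category": "home", "price": 40},
    {"name": "Monitor Arm", "category": "home", "price": 80},
    {"name": "Mouse Pad", "category": "accessories", "price": 10},
]

def search_with_aggs(docs, min_price, size):
    # Query part: range filter on price.
    hits = [doc for doc in docs if doc["price"] >= min_price]
    # terms aggregation: bucket prices by category.
    prices_by_category = defaultdict(list)
    for doc in hits:
        prices_by_category[doc["category"]].append(doc["price"])
    # Largest buckets first, ties broken by key, then keep the top `size`.
    ranked = sorted(prices_by_category.items(), key=lambda item: (-len(item[1]), item[0]))
    by_category = [
        {"key": key, "doc_count": len(prices), "average_price": sum(prices) / len(prices)}
        for key, prices in ranked[:size]
    ]
    # stats aggregation over all hits.
    all_prices = [doc["price"] for doc in hits]
    price_stats = {
        "count": len(all_prices),
        "min": min(all_prices),
        "max": max(all_prices),
        "avg": sum(all_prices) / len(all_prices),
        "sum": sum(all_prices),
    }
    return {"by_category": by_category, "price_stats": price_stats}

result = search_with_aggs(products, min_price=20, size=5)
for bucket in result["by_category"]:
    print(bucket)
print(result["price_stats"])
```

The sub-aggregation (average_price) runs once per bucket, while price_stats sits at the top level and therefore covers every document that passed the query.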
Final Thoughts
Aggregation Type | Purpose | Basic Syntax Example |
terms | Group by unique values | "terms": { "field": "category.keyword" } |
avg | Average of numeric field | "avg": { "field": "price" } |
sum | Sum of numeric field | "sum": { "field": "sales" } |
min / max | Minimum and maximum values | "min": { "field": "price" }, "max": {...} |
stats | Count, min, max, avg, sum summary | "stats": { "field": "price" } |
date_histogram | Group by date intervals | "date_histogram": { "field": "order_date", "calendar_interval": "month" } |
filter | Filter aggregation scope | "filter": { "term": { "category.keyword": "x" } } |