Elasticsearch / OpenSearch

Elasticsearch is a distributed search engine built on top of Apache Lucene.
It is:
  • A NoSQL document store (it stores JSON documents)
  • Designed for full-text search, filtering, and analytics
  • Scalable and distributed
  • Accessible through a RESTful HTTP interface
You store documents, not rows as in SQL.

🧱 Key Concepts

🖥️ Node

  • A Node is a single running instance of Elasticsearch.
  • A single physical or virtual server can host multiple nodes, depending on the system's resources such as RAM, storage, and CPU.

🧮 Cluster

A Cluster is a collection of one or more nodes that together hold your data and provide distributed search and indexing capabilities.
Why Clusters?
  • Scalability: add more nodes to increase capacity.
  • Fault Tolerance: if one node fails, the others continue operating.

📦 Core Terms in Elasticsearch

| Concept | Description |
| --- | --- |
| Index | A collection of documents, roughly like a table in SQL |
| Document | A single JSON object, like a row in SQL. Every document has a unique ID (the _id field) |
| Field | A key-value pair in a document |
| Mapping | Like a schema: defines field types |
| Query | How you search documents |

🧩 Shard

An Index can grow large, so Elasticsearch splits it into smaller pieces called shards.
  • Each shard is a self-contained index (a Lucene index under the hood) and can reside on any node.
  • Shards enable distributed storage and parallel processing.
Types of Shards:
  • 🔹 Primary Shard: the original piece of the index.
  • 🔸 Replica Shard: a copy of a primary shard, used for redundancy and load balancing.
Why Shards?
  • ⚔ Scalability: distribute data across nodes.
  • ⚙️ Performance: indexing and search operations run in parallel.
Example:
If an index has 5 primary shards and your cluster has 5 nodes, each node can host one shard, balancing the load evenly.
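To make the idea of spreading documents over primary shards concrete, here is a minimal sketch that routes document IDs to one of 5 shards by hashing the ID and taking the remainder. It is a simplification under stated assumptions: zlib.crc32 only stands in for the hash Elasticsearch uses internally, and pick_shard is a hypothetical helper name, not part of any client library.

```python
import zlib
from collections import Counter

NUM_PRIMARY_SHARDS = 5  # same shard count as the example above


def pick_shard(doc_id, num_shards=NUM_PRIMARY_SHARDS):
    """Map a document ID to a shard number in the range [0, num_shards)."""
    # crc32 is a stand-in hash chosen for illustration only.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


# Route 1,000 made-up document IDs and count how many land on each shard.
shard_counts = Counter(pick_shard(f"doc-{i}") for i in range(1000))
for shard, count in sorted(shard_counts.items()):
    print(f"shard {shard}: {count} documents")
```

Because the target shard depends on the shard count, the number of primary shards is chosen when the index is created rather than adjusted freely afterwards.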

♻️ Replica: The Backup Copies

To protect against data loss and improve search performance, Elasticsearch uses replica shards, which are copies of primary shards.
  • ✅ Ensures high availability: if the node holding a primary shard fails, a replica is promoted to primary.
  • 🚀 Boosts search performance: queries can be served by either primary or replica shards.
Key Points:
  • You can configure the number of replicas per index.
  • A replica is never stored on the same node as its corresponding primary shard, to avoid a single point of failure.
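The placement rule can be illustrated with a small round-robin sketch. This is not how the real shard allocator works (it also weighs disk usage and other factors); place_shards is a hypothetical helper that only demonstrates the constraint that a primary and its replica sit on different nodes.

```python
def place_shards(num_primaries, num_nodes):
    """Assign each primary and its single replica to different nodes."""
    if num_nodes < 2:
        # With one node there is nowhere safe to put a replica.
        raise ValueError("a replica needs at least two nodes")
    placement = {}
    for shard in range(num_primaries):
        primary_node = shard % num_nodes
        replica_node = (shard + 1) % num_nodes  # always differs from primary_node
        placement[shard] = {"primary": primary_node, "replica": replica_node}
    return placement


layout = place_shards(num_primaries=3, num_nodes=3)
for shard, nodes in layout.items():
    print(f"shard {shard}: primary on node {nodes['primary']}, "
          f"replica on node {nodes['replica']}")
```

On a single-node cluster the replicas simply stay unassigned, which is why such a cluster typically reports yellow health.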

šŸ› ļø Getting Started

Step 1: Install and Run Elasticsearch

🐳 With Docker (Easiest)

docker run -d --name elasticsearch \ -p 9200:9200 -e "discovery.type=single-node" \ docker.elastic.co/elasticsearch/elasticsearch:8.13.0
Test it:
curl http://localhost:9200
Ā 

Step 2: : Creating an Index, Mapping

šŸ—ļøĀ Step 1: Creating an Index

AnĀ indexĀ in Elasticsearch is like aĀ tableĀ in SQL — it stores a collection ofĀ JSON documents.
You can create an index with default settings like this:
PUT /library
Or, to include custom settings (like number of shards and replicas):
PUT /library { "settings": { "number_of_shards": 3, "number_of_replicas": 1 } }
Ā 

🧬 Step 2: Define a Mapping

Mappings in Elasticsearch define the structure of your documents, similar to a schema in a relational database. You define field types such as text, keyword, date, integer, etc.
Here's a sample mapping for a book document, added to the existing library index:
PUT /library/_mapping
{
  "properties": {
    "title": { "type": "text" },
    "author": { "type": "keyword" },
    "published_date": { "type": "date" },
    "pages": { "type": "integer" },
    "available": { "type": "boolean" }
  }
}
✅ Alternatively, create the index and mapping in one request (this only works if the library index does not exist yet):
PUT /library
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "author": { "type": "keyword" },
      "published_date": { "type": "date" },
      "pages": { "type": "integer" },
      "available": { "type": "boolean" }
    }
  }
}
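Settings and mappings can also travel in the same create-index request. The sketch below assembles such a body in Python and prints it as JSON. build_index_body is a hypothetical helper introduced for illustration; it only builds the payload and sends nothing to a cluster.

```python
import json


def build_index_body(properties, shards=1, replicas=1):
    """Return a request body for PUT /<index> with settings and mappings."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": {"properties": properties},
    }


# Field definitions taken from the book mapping above.
book_fields = {
    "title": {"type": "text"},
    "author": {"type": "keyword"},
    "published_date": {"type": "date"},
    "pages": {"type": "integer"},
    "available": {"type": "boolean"},
}

body = build_index_body(book_fields, shards=3, replicas=1)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a PUT /library request in a console or passed to an HTTP client of your choice.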

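Step 3: Adding and Updating Documents

📥 Adding a Document

Once the index and mapping are ready, you can start inserting data. The examples below write to a separate products index; if it does not exist yet, Elasticsearch creates it on the first write and infers the field types (dynamic mapping).
POST /products/_doc/1
{
  "name": "Wireless Mouse",
  "price": 25.99,
  "stock": 50,
  "category": "electronics"
}
📌 Auto-ID Example (omit the ID and Elasticsearch generates one):
POST /products/_doc
{
  "name": "Gaming Keyboard",
  "price": 59.99,
  "stock": 30,
  "category": "electronics"
}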

āœļø Updating Documents in Elasticsearch

If you reindex a document with the same ID, itĀ replaces the existing document.
PUT /library/_doc/1 { "title": "Elasticsearch Essentials - Updated", "author": "Abhishek Tiwari", "published_date": "2023-08-01", "pages": 340, "available": false } Note: This replaces the entire document. If you omit a field, it gets deleted!

šŸ› ļøĀ 2. Partial Update (Only Specific Fields)

Use theĀ _updateĀ API to modify only certain fields:
  • ThisĀ preserves the restĀ of the document, updating only what’s inside theĀ docĀ object.
POST /library/_update/1 { "doc": { "available": true, "pages": 350 } }
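The difference between the two update styles can be mimicked with plain dictionaries. This is only a sketch of the semantics, assuming that nested objects in a partial update are merged recursively while other values are replaced; full_replace and partial_update are hypothetical names, and no cluster is involved.

```python
import copy


def full_replace(existing, new_doc):
    """Like PUT /index/_doc/<id>: the new body replaces the old document."""
    return copy.deepcopy(new_doc)


def partial_update(existing, doc):
    """Like POST /index/_update/<id> with "doc": merge fields into the document."""
    merged = copy.deepcopy(existing)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field.
            merged[key] = partial_update(merged[key], value)
        else:
            merged[key] = value
    return merged


book = {"title": "Elasticsearch Essentials", "pages": 340, "available": False}

# Full replacement drops every field that is not sent again.
print(full_replace(book, {"title": "Elasticsearch Essentials - Updated"}))

# Partial update keeps "title" and changes only the listed fields.
print(partial_update(book, {"available": True, "pages": 350}))
```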

šŸ” Step 4: Search Documents (Query DSL)

šŸ’”

šŸ” What IsĀ Analyzed SearchĀ in Elasticsearch?

Analyzed searchĀ refers toĀ processing both the document content and the search queryĀ through anĀ analyzerĀ before they are stored or compared. This is the default behavior when using theĀ textĀ data type in Elasticsearch.
šŸ”§ What Is anĀ Analyzer?
AnĀ analyzerĀ is a combination of:
  1. Tokenizer: Breaks text into individual terms (tokens).
  1. Filters: Modify the tokens (e.g., lowercase, remove stop words, stemming).
Input: "The Quick Brown Foxes" Default (standard) analyzer → Tokens: ["the", "quick", "brown", "fox"]
šŸ”„ When Does Analyzing Happen?
  1. At index timeĀ (when the document is stored): the value in aĀ textĀ field is analyzed into tokens.
  1. At search timeĀ (when you query withĀ match,Ā multi_match, etc.): the query string is also analyzed.
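The two analyzer stages can be imitated in a few lines of Python. This is a rough approximation, not the real standard analyzer (which follows Unicode text segmentation rules): simple_analyze is a hypothetical function that splits on anything other than ASCII letters and digits and then lowercases.

```python
import re


def simple_analyze(text):
    """Tokenize on non-alphanumeric characters, then lowercase each token."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)   # tokenizer stage
    return [token.lower() for token in tokens]   # lowercase filter stage


# No stemming happens here, so the plural form is kept as-is.
print(simple_analyze("The Quick Brown Foxes"))
```

Running the same function over both the stored text and the query string is what makes "Quick" and "quick" meet as the same token.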
✅ Benefits of Analyzed Search

| Feature | Description |
| --- | --- |
| Case-insensitive | "Quick" matches "quick" |
| Flexible | Can match on part of a phrase, e.g. "brown fox" |
| Supports stemming | "running" can match "run" if a stemmer is enabled |
| Language-aware | Can handle language-specific rules |

🆚 Analyzed vs Non-Analyzed

| Feature | Analyzed (text) | Not Analyzed (keyword) |
| --- | --- | --- |
| Tokenization | ✅ Yes | ❌ No |
| Query type used | match, match_phrase, etc. | term, terms |
| Case sensitivity | ❌ Case-insensitive (usually) | ✅ Case-sensitive (unless lowercased manually) |
| Sorting, aggregations | ❌ Not supported directly | ✅ Supported |

⚔ Basic structure:

GET /products/_search
{
  "query": {
    "match": { "name": "keyboard" }
  }
}
This is using the Query DSL (Domain-Specific Language), a JSON-based syntax for querying.

What is Query DSL?

Schematically, every request has the format:
{ "query": { "match" | "term" | "bool" | "range" | ... } }

📌 Query Examples

🔸 Example: Match Query vs Term Query

Document:
{ "description": "The quick brown fox jumps" }
✅ Match query (Analyzed Search)
{ "match": { "description": "Quick Fox" } }
  • Both query and document are analyzed.
  • Tokens compared: "quick", "fox"
  • Match ✅
❌ Term query (No Analysis)
{ "term": { "description": "Quick Fox" } }
  • Query is not analyzed
  • The literal term "Quick Fox" is compared against the indexed tokens ("the", "quick", "brown", "fox", "jumps"), and none of them equals it → No match ❌
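The contrast between the two query types can be reproduced with a toy token set. The sketch below assumes a simplified analyzer (split on non-alphanumerics, lowercase) and the default OR behavior of match; analyze, match_query, and term_query are hypothetical helpers, not Elasticsearch APIs.

```python
import re


def analyze(text):
    """Simplified analyzer: split on non-alphanumerics and lowercase."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


# Tokens stored for the text field at index time.
indexed_tokens = set(analyze("The quick brown fox jumps"))


def match_query(query_text, tokens):
    """Analyze the query, then succeed if any query token is indexed."""
    return any(token in tokens for token in analyze(query_text))


def term_query(value, tokens):
    """Compare the value as-is against the indexed tokens (no analysis)."""
    return value in tokens


print(match_query("Quick Fox", indexed_tokens))  # analyzed: "quick", "fox"
print(term_query("Quick Fox", indexed_tokens))   # literal "Quick Fox" is not a token
print(term_query("quick", indexed_tokens))       # a single lowercase token is found
```

This is why term queries are normally pointed at keyword fields, where the stored value is the whole original string.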

🔸 Example: Range Query

{
  "query": {
    "range": { "price": { "gte": 20, "lte": 60 } }
  }
}

🔸 Example: Bool Query (AND, OR, NOT)

{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "mouse" } },
        { "term": { "category": "electronics" } }
      ],
      "filter": [
        { "range": { "price": { "lte": 50 } } }
      ]
    }
  }
}
  • must clause: Ensures that documents match both the name and category conditions. These matches contribute to the relevance score, allowing Elasticsearch to rank the results based on how well they match these criteria.
  • filter clause: Applies a price constraint, filtering out documents where the price is greater than 50. This condition does not affect the relevance score, so scoring is based solely on the must conditions.
  • should: like OR
  • must_not: like NOT
🎯 Summary of What You Learned

| Operation | Example |
| --- | --- |
| Create Index | PUT /products |
| Add Document | POST /products/_doc/1 |
| Get Document | GET /products/_doc/1 |
| Search | GET /products/_search |
| Match Query | match: { name: "mouse" } |
| Exact Match | term: { category: "electronics" } |
| Range | range: { price: { gte: 20 }} |

Mapping

1. Defining Mappings at Index Creation

When creating a new index, wrap your mapping under the mappings key:
PUT /my_index
{
  "mappings": {
    "properties": {
      "<field1>": { <field1_definition> },
      "<field2>": { <field2_definition> },
      ...
    }
  }
}
  • properties: container for field definitions.
  • <fieldN>: each field's name; each must specify at least a type (e.g., text, keyword, date, integer)
Example
PUT /products
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "description": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      },
      "price": { "type": "float" },
      "brand": { "type": "keyword" }
    }
  }
}
This creates a products index where:
  • description is a full-text text field, with a sub-field description.keyword for exact matches.
  • price is a float rather than an integer, so decimal prices such as 25.99 are not truncated.
2. Updating Mappings on an Existing Index

Use the Put Mapping API to add new fields or multi-fields:
PUT /products/_mapping
{
  "properties": {
    "<new_field>": { <definition> }
  }
}
💡 You cannot change the type of an existing field; you can only add new properties or multi-fields.
To change a field type, create a new index with the correct mapping and copy the data across with the _reindex API (or delete and recreate the index if the data can be reloaded from its source).

3. Key Components of a Mapping Definition

3.1. Field Types (type)
Each field must declare a type. Common types include:
  • text: analyzed string for full-text search
  • keyword: not analyzed; good for aggregations, sorting, exact match
  • Numeric types: integer, long, float, double
  • date: date formats
  • boolean, geo_point, nested, etc.
3.2. Analyzers and index Settings
  • analyzer: specify a custom analysis chain (tokenizer, filters) for text fields
  • index: set to false if you want to keep the field in the document but not make it searchable
3.3. Multi-Fields (fields)
Allow indexing the same data in different ways. Example: full-text and exact match:
"description": { "type": "text", "fields": { "raw": { "type": "keyword" } } }
3.4. _source and Metadata Fields
  • _source: controls how the original JSON document is stored/retrieved; can be disabled or given include/exclude filters
  • Meta-fields: _id, _type (deprecated in newer versions), _all (removed), etc.
3.5. Dynamic Mappings (dynamic, dynamic_templates)
  • dynamic: true (default) automatically adds new fields; false ignores new fields; strict rejects documents with unmapped fields
  • dynamic_templates: pattern-based rules to apply custom mappings to matching field names.
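The three dynamic settings can be compared with a small simulation. This is a sketch under simplifying assumptions: the mapping is reduced to a set of field names, and index_document is a hypothetical function, not a client call.

```python
def index_document(mapped_fields, doc, dynamic="true"):
    """Return the set of mapped fields after indexing doc under a dynamic setting."""
    new_fields = set(doc) - set(mapped_fields)
    if not new_fields or dynamic == "false":
        # "false": unknown fields are not added to the mapping or indexed.
        return set(mapped_fields)
    if dynamic == "strict":
        raise ValueError(f"unmapped fields rejected: {sorted(new_fields)}")
    # "true": the mapping grows to include the new fields.
    return set(mapped_fields) | new_fields


mapped = {"title", "author"}
doc = {"title": "Intro to Search", "rating": 5}

print(sorted(index_document(mapped, doc, "true")))
print(sorted(index_document(mapped, doc, "false")))
try:
    index_document(mapped, doc, "strict")
except ValueError as error:
    print(error)
```

With false, the unknown field still travels with the stored document; it just cannot be searched.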

4. Full Example

PUT /articles
{
  "mappings": {
    "dynamic": true,
    "properties": {
      "title": { "type": "text", "analyzer": "standard" },
      "author": { "type": "keyword" },
      "publish_date": { "type": "date", "format": "yyyy-MM-dd" },
      "comments": {
        "type": "nested",
        "properties": {
          "user": { "type": "keyword" },
          "message": { "type": "text" },
          "date": { "type": "date" }
        }
      }
    },
    "dynamic_templates": [
      {
        "strings_as_keywords": {
          "match_mapping_type": "string",
          "mapping": { "type": "keyword" }
        }
      }
    ]
  }
}
Here:
  • new unmapped fields are accepted (dynamic: true), and every new string field becomes a keyword via dynamic_templates
  • with dynamic: "strict" instead, documents containing unmapped fields would be rejected, so the dynamic template would never apply

🎯 Elasticsearch Search Queries

1. 🔸 Match Query: Analyzed Full-Text Search

Use this when you want to search text fields that are analyzed (tokenized and normalized).

Example:

{
  "query": {
    "match": { "description": "quick fox" }
  }
}
  • Query terms and document fields are both analyzed.
  • Matches documents containing tokens like "quick" or "fox" in the description field (match uses OR between terms by default).

2. 🔸 Term Query: Exact Match (No Analysis)

Use this for exact matches on keyword or non-analyzed fields.

Example:

{
  "query": {
    "term": { "category.keyword": "electronics" }
  }
}
  • Searches for the exact term "electronics" in the category.keyword field.
  • No text analysis; the match is case-sensitive and exact.

3. 🔸 Range Query: Numeric or Date Ranges

Filter documents with values between specified boundaries.

Example:

{
  "query": {
    "range": { "price": { "gte": 20, "lte": 60 } }
  }
}
  • Finds documents where price is between 20 and 60 (inclusive).

4. 🔸 Bool Query: Combine Queries with AND, OR, NOT

Compose complex queries using must (AND), should (OR), and must_not (NOT).

Example:

{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "mouse" } },
        { "term": { "category.keyword": "electronics" } }
      ],
      "filter": [
        { "range": { "price": { "lte": 50 } } }
      ],
      "must_not": [
        { "term": { "brand.keyword": "brandX" } }
      ]
    }
  }
}
  • Documents must have "mouse" in the name AND category "electronics".
  • Price must be less than or equal to 50.
  • Excludes documents from "brandX". If brand is mapped directly as keyword (as in the products mapping example), query brand instead of brand.keyword.
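The way the clauses combine can be sketched with ordinary predicates. This toy evaluator ignores relevance scoring entirely and only answers "does the document match?"; bool_matches is a hypothetical helper, and the product data is made up for the example.

```python
def bool_matches(doc, must=(), filters=(), should=(), must_not=()):
    """Toy bool query: each clause is a function taking a document."""
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filters):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # Without must/filter clauses, at least one should clause has to match.
    if should and not must and not filters:
        return any(clause(doc) for clause in should)
    return True


products = [
    {"name": "wireless mouse", "category": "electronics", "price": 25.99, "brand": "brandA"},
    {"name": "gaming mouse", "category": "electronics", "price": 79.00, "brand": "brandA"},
    {"name": "wireless mouse", "category": "electronics", "price": 19.50, "brand": "brandX"},
]

hits = [
    product for product in products
    if bool_matches(
        product,
        must=[lambda d: "mouse" in d["name"].split(),
              lambda d: d["category"] == "electronics"],
        filters=[lambda d: d["price"] <= 50],
        must_not=[lambda d: d["brand"] == "brandX"],
    )
]
print(hits)
```

Only the first product survives: the second fails the price filter and the third is excluded by must_not.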

5. 🔸 Fuzzy Query: Search with Typos or Approximate Matches

Great for user input with spelling mistakes or typos.

Example:

{
  "query": {
    "fuzzy": {
      "name": { "value": "wirless", "fuzziness": "AUTO" }
    }
  }
}
  • Matches similar terms like "wireless" or "wiresless".
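Fuzzy matching is based on edit distance. The sketch below uses plain Levenshtein distance together with the length-based rule behind "AUTO" (no edits for terms of 1 to 2 characters, one edit for 3 to 5, two edits for longer terms). It is an approximation: Elasticsearch by default also counts a swap of two adjacent characters as a single edit, which this version does not. Both function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum inserts, deletes, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or keep)
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term):
    """Edits allowed for a term under the AUTO setting."""
    if len(term) <= 2:
        return 0
    return 1 if len(term) <= 5 else 2


query = "wirless"
for candidate in ["wireless", "wiresless", "wired"]:
    distance = edit_distance(query, candidate)
    print(candidate, distance, distance <= auto_fuzziness(query))
```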

6. 🔸 Prefix Query: Autocomplete with Prefix Matching

Search documents where a field contains a term that starts with a given prefix.

Example:

{
  "query": {
    "prefix": { "name": "blu" }
  }
}
  • Matches documents with terms like "bluetooth", "blue light".
  • The prefix itself is not analyzed, so use lowercase when querying a text field.

7. 🔸 Completion Suggester: Efficient Autocomplete Suggestions

Designed for fast autocomplete and suggestion features.

Setup (adds a completion field to the existing products index; PUT /products would fail because the index already exists):

PUT /products/_mapping
{
  "properties": {
    "suggest": { "type": "completion" }
  }
}

Indexing Document:

POST /products/_doc/1
{
  "name": "Wireless Mouse",
  "suggest": {
    "input": ["wireless mouse", "mouse", "computer accessory"]
  }
}

Query:

POST /products/_search
{
  "suggest": {
    "product-suggest": {
      "prefix": "wire",
      "completion": { "field": "suggest" }
    }
  }
}
  • Returns autocomplete suggestions as the user types "wire".

8. 🔸 Nested Query: Querying Arrays of Objects

If your documents contain nested objects (arrays of JSON objects), use nested queries to match several fields inside the same nested object.

Document Example:

{
  "name": "Gaming Laptop",
  "features": [
    { "name": "RAM", "value": "16GB" },
    { "name": "GPU", "value": "NVIDIA" }
  ]
}

Mapping:

PUT /electronics
{
  "mappings": {
    "properties": {
      "features": { "type": "nested" }
    }
  }
}

Query:

POST /electronics/_search
{
  "query": {
    "nested": {
      "path": "features",
      "query": {
        "bool": {
          "must": [
            { "match": { "features.name": "GPU" } },
            { "match": { "features.value": "NVIDIA" } }
          ]
        }
      }
    }
  }
}
  • Matches documents where the same nested object has "GPU" as name and "NVIDIA" as value.
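Why the nested type matters becomes clearer when it is compared with a plain array of objects, whose field values are effectively pooled per field. The sketch below models both behaviors on the laptop document; flattened_match and nested_match are hypothetical helpers for illustration.

```python
laptop = {
    "name": "Gaming Laptop",
    "features": [
        {"name": "RAM", "value": "16GB"},
        {"name": "GPU", "value": "NVIDIA"},
    ],
}


def flattened_match(doc, name, value):
    """Plain object array: names and values are pooled across all objects."""
    names = {feature["name"] for feature in doc["features"]}
    values = {feature["value"] for feature in doc["features"]}
    return name in names and value in values


def nested_match(doc, name, value):
    """Nested type: both conditions must hold inside one object."""
    return any(
        feature["name"] == name and feature["value"] == value
        for feature in doc["features"]
    )


# "RAM" and "NVIDIA" exist, but never in the same object.
print(flattened_match(laptop, "RAM", "NVIDIA"))  # pooled values: false positive
print(nested_match(laptop, "RAM", "NVIDIA"))     # per-object check: no match
print(nested_match(laptop, "GPU", "NVIDIA"))     # same object: match
```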

📊 Elasticsearch Aggregations

Aggregations are Elasticsearch's way to summarize and analyze your data. They work like GROUP BY, COUNT, SUM, AVG, and other analytics in SQL.

Basic Request Structure:

GET /index_name/_search
{
  "query": {
    // your query here (e.g. match, bool, range)
  },
  "aggs": {
    // your aggregations here
  }
}

Common Aggregation Types (Clauses)

1. terms: Group by field values (like GROUP BY in SQL)

Groups documents by unique values of a field and returns counts per group.
"aggs": {
  "by_category": {
    "terms": { "field": "category.keyword", "size": 10 }
  }
}
  • Returns the top 10 categories and how many documents each has.
  • Use the .keyword sub-field for exact term aggregation when the field itself is mapped as text.
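In plain Python, a terms aggregation corresponds roughly to counting values and keeping the most frequent ones. The sketch below mirrors the bucket shape (key and doc_count) on made-up data; terms_aggregation is a hypothetical helper.

```python
from collections import Counter

products = [
    {"name": "Wireless Mouse", "category": "electronics"},
    {"name": "Gaming Keyboard", "category": "electronics"},
    {"name": "USB Hub", "category": "electronics"},
    {"name": "Search Basics", "category": "books"},
    {"name": "Lucene Notes", "category": "books"},
    {"name": "Puzzle Cube", "category": "toys"},
]


def terms_aggregation(docs, field, size=10):
    """Count documents per distinct field value, most frequent first."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [
        {"key": key, "doc_count": count}
        for key, count in counts.most_common(size)
    ]


# Keep only the two largest buckets, like "size": 2.
print(terms_aggregation(products, "category", size=2))
```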

2. avg: Average of numeric field

Calculates the average of a numeric field.
"aggs": { "average_price": { "avg": { "field": "price" } } }

3. sum: Sum of numeric field

Calculates the total sum.
"aggs": { "total_sales": { "sum": { "field": "sales" } } }

4. min and max: Minimum and maximum values

"aggs": {
  "min_price": { "min": { "field": "price" } },
  "max_price": { "max": { "field": "price" } }
}

5. stats: Summary stats (count, min, max, avg, sum)

"aggs": { "price_stats": { "stats": { "field": "price" } } }

6. date_histogram: Group by date intervals

Great for time-series data: groups documents into time buckets. Use calendar_interval for calendar-aware units such as month, or fixed_interval for fixed lengths such as 30d.
"aggs": {
  "sales_over_time": {
    "date_histogram": { "field": "order_date", "calendar_interval": "month" }
  }
}

7. filter: Apply a filter inside aggregations

Restricts the documents that an aggregation sees.
"aggs": {
  "electronics_sales": {
    "filter": { "term": { "category.keyword": "electronics" } },
    "aggs": {
      "avg_price": { "avg": { "field": "price" } }
    }
  }
}

Nested Aggregations

You can nest aggregations inside others for deeper insights.
Example: Top categories → average price per category
"aggs": {
  "by_category": {
    "terms": { "field": "category.keyword" },
    "aggs": {
      "average_price": { "avg": { "field": "price" } }
    }
  }
}

Full Example: Search with Aggregations

GET /products/_search
{
  "query": {
    "range": { "price": { "gte": 20 } }
  },
  "aggs": {
    "by_category": {
      "terms": { "field": "category.keyword", "size": 5 },
      "aggs": {
        "average_price": { "avg": { "field": "price" } }
      }
    },
    "price_stats": {
      "stats": { "field": "price" }
    }
  }
}
  • Matches only products priced 20 or above; the aggregations run over these matches.
  • Groups by top 5 categories and calculates average price per category.
  • Returns overall price statistics.
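The full request can be traced by hand on a small in-memory data set. This sketch mirrors the three steps (range query, terms buckets with a nested average, overall stats) in plain Python; the product list and search_with_aggs are made up for illustration and do not talk to a cluster.

```python
from collections import defaultdict

products = [
    {"name": "Wireless Mouse", "category": "electronics", "price": 25.99},
    {"name": "Gaming Keyboard", "category": "electronics", "price": 59.99},
    {"name": "USB Cable", "category": "accessories", "price": 9.99},
    {"name": "Laptop Stand", "category": "accessories", "price": 35.00},
]


def search_with_aggs(docs, min_price, size=5):
    """Apply a price >= min_price query, then aggregate over the hits only."""
    hits = [doc for doc in docs if doc["price"] >= min_price]

    # terms aggregation with a nested avg aggregation per bucket
    prices_by_category = defaultdict(list)
    for doc in hits:
        prices_by_category[doc["category"]].append(doc["price"])
    largest = sorted(prices_by_category.items(),
                     key=lambda item: len(item[1]), reverse=True)[:size]
    buckets = [
        {"key": category, "doc_count": len(prices),
         "average_price": sum(prices) / len(prices)}
        for category, prices in largest
    ]

    # stats aggregation over all hits
    all_prices = [doc["price"] for doc in hits]
    stats = None
    if all_prices:
        stats = {
            "count": len(all_prices),
            "min": min(all_prices),
            "max": max(all_prices),
            "avg": sum(all_prices) / len(all_prices),
            "sum": sum(all_prices),
        }
    return {"by_category": buckets, "price_stats": stats}


result = search_with_aggs(products, min_price=20)
print(result["by_category"])
print(result["price_stats"])
```

The USB cable never reaches the aggregations because the query removes it first, which is the key point of combining query and aggs in one request.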
🧠 Final Thoughts

| Aggregation Type | Purpose | Basic Syntax Example |
| --- | --- | --- |
| terms | Group by unique values | "terms": { "field": "category.keyword" } |
| avg | Average of numeric field | "avg": { "field": "price" } |
| sum | Sum of numeric field | "sum": { "field": "sales" } |
| min / max | Minimum and maximum values | "min": { "field": "price" }, "max": {...} |
| stats | Count, min, max, avg, sum summary | "stats": { "field": "price" } |
| date_histogram | Group by date intervals | "date_histogram": { "field": "order_date", "calendar_interval": "month" } |
| filter | Filter aggregation scope | "filter": { "term": { "category.keyword": "x" } } |