Note: All content on this website is derived directly from my own expertise and experiences. No AI-generated text or automated content creation tools are used.
Hi guys 👋, I'm a developer specializing in Elastic Stack and Next.js. My blog shares practical tutorials and insights based on 3+ years of hands-on experience. Open to freelance opportunities — let's get in touch!
Comments
josh
Feb 22, 2024
cool tutorial, thank you
In this tutorial, we will learn how to find similar documents in Elasticsearch using the more_like_this query.
The more_like_this query finds documents that are similar to a given set of documents.
This is useful when you want to surface similar blog posts, products, or any other kind of document.
For example, on this site (dminhvu.com), I use the more_like_this query to find blog posts similar to the one you are reading. They are shown in the "You Might Also Like" section on the right side of the page.
There are two ways to use the more_like_this query:
Find similar documents to a given document or set of documents.
Find similar documents to a mixed set of documents and text.
Let's look at the syntax and usage of each.
I will use the following documents for the examples throughout this tutorial:
documents
{ "id": 1, "title": "How to Find Similar Documents in Elasticsearch", "description": "In this tutorial, we will learn how to find similar documents in Elasticsearch using the more_like_this query." }
{ "id": 2, "title": "Partial Update in Elasticsearch Guide (with Examples)", "description": "Learn how to perform partial updates in Elasticsearch 8.x to update only specific fields in a document using the Update API." }
{ "id": 3, "title": "Logstash Input from JSON File", "description": "Learn how to parse logs from a JSON file in Logstash using the multiline codec plugin." }
{ "id": 4, "title": "Python Extract Year from Date", "description": "Learn how to extract year from date in Python, including string date, date object, timestamp, and current year." }
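A query for the first case, finding posts similar to the document with id = 1, can be sketched as follows. This is an assumption modeled on the mixed query shown later in this tutorial, not a copy of the original request. It builds the body in Python with only the standard library and prints JSON that could be sent as the body of GET posts/_search. The min_doc_freq setting is an addition here: it defaults to 5 in Elasticsearch, which would likely filter out every term on an index holding only the four sample documents.

```python
import json

# Request body for: GET posts/_search
# Assumption: same fields and tuning values as the mixed query later in this tutorial.
single_doc_query = {
    "query": {
        "more_like_this": {
            "fields": ["title", "description"],
            # Reference an already indexed document by index name and id.
            "like": [{"_index": "posts", "_id": "1"}],
            "min_term_freq": 1,
            "max_query_terms": 10,
            # Assumption: lowered from the default of 5 for a four-document index.
            "min_doc_freq": 1,
        }
    }
}

print(json.dumps(single_doc_query, indent=2))
```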
The above query will return the following results:
response
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 3, "relation": "eq" },
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "posts",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "id": 2,
          "title": "Partial Update in Elasticsearch Guide (with Examples)",
          "description": "Learn how to perform partial updates in Elasticsearch 8.x to update only specific fields in a document using the Update API."
        }
      }
    ]
  }
}
As you can see, the query returns the document with id = 2 because it shares the term "elasticsearch" with the document with id = 1.
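The reasoning above can be made concrete with a rough term-overlap check. This is only an illustration in plain Python, not how Elasticsearch scores documents: the real query analyzes the fields and selects terms by tf-idf, and the stop word list here is a hand-picked assumption.

```python
import re

# Title and description of each sample post, joined into one string.
posts = {
    1: "How to Find Similar Documents in Elasticsearch "
       "In this tutorial, we will learn how to find similar documents "
       "in Elasticsearch using the more_like_this query.",
    2: "Partial Update in Elasticsearch Guide (with Examples) "
       "Learn how to perform partial updates in Elasticsearch 8.x to update "
       "only specific fields in a document using the Update API.",
    3: "Logstash Input from JSON File "
       "Learn how to parse logs from a JSON file in Logstash using the "
       "multiline codec plugin.",
    4: "Python Extract Year from Date "
       "Learn how to extract year from date in Python, including string date, "
       "date object, timestamp, and current year.",
}

# Hypothetical stop word list, chosen by hand for this illustration.
STOP_WORDS = {"a", "and", "from", "how", "in", "learn", "the", "this",
              "to", "using", "we", "will", "with"}


def terms(text):
    """Lowercase the text and return its distinct non-stop-word terms."""
    return {t for t in re.findall(r"[a-z0-9_]+", text.lower())
            if t not in STOP_WORDS}


source_terms = terms(posts[1])
# Terms each other post shares with the post with id = 1.
shared = {pid: sorted(source_terms & terms(posts[pid])) for pid in (2, 3, 4)}

for pid, common in shared.items():
    print(pid, common)
```

Only the post with id = 2 has a distinctive term in common with the post with id = 1, namely "elasticsearch". The other two overlap only on common words, which carry little weight.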
Each item in the like array can specify:
_index: the name of the index that contains the document to find similar documents to.
_id: the id of the document to find similar documents to. The referenced documents themselves are excluded from the search results.
doc: the field values of an artificial document (one that is not in the index) to find similar documents to.
plain string: free text to find similar documents to, given directly in the like array instead of an object.
For example, to find similar documents to the document with id = 1 in the posts index and the text "python extract year from date", we can use the following query:
query
GET posts/_search
{
  "query": {
    "more_like_this": {
      "fields": ["title", "description"],
      "like": [
        { "_index": "posts", "_id": "1" },
        {
          "_index": "posts",
          "doc": {
            "title": "python extract year from date",
            "description": "python extract year from date"
          }
        }
      ],
      "min_term_freq": 1,
      "max_query_terms": 10
    }
  }
}
The above query will return the following results:
response
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 2, "relation": "eq" },
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "posts",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "id": 2,
          "title": "Partial Update in Elasticsearch Guide (with Examples)",
          "description": "Learn how to perform partial updates in Elasticsearch 8.x to update only specific fields in a document using the Update API."
        }
      },
      {
        "_index": "posts",
        "_type": "_doc",
        "_id": "4",
        "_score": 0.2876821,
        "_source": {
          "id": 4,
          "title": "Python Extract Year from Date",
          "description": "Learn how to extract year from date in Python, including string date, date object, timestamp, and current year."
        }
      }
    ]
  }
}
As you can see, the query returns the documents with id = 2 and id = 4: the first shares the term "elasticsearch" with the document with id = 1, and the second matches the text "python extract year from date".
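If you build such mixed queries from application code, small helpers can keep the like array readable. The sketch below uses only the Python standard library and reproduces the body of the mixed query above. The helper names (existing_doc, artificial_doc, more_like_this) are hypothetical and not part of any Elasticsearch client. Sending the body to the cluster is left out.

```python
import json


def existing_doc(index, doc_id):
    """Hypothetical helper: reference a document that is already indexed."""
    return {"_index": index, "_id": str(doc_id)}


def artificial_doc(index, **field_values):
    """Hypothetical helper: describe a document that is not in the index."""
    return {"_index": index, "doc": dict(field_values)}


def more_like_this(fields, like, **options):
    """Hypothetical helper: build a search body around a more_like_this query.

    `like` may mix dicts from the helpers above with plain strings of free text.
    """
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": list(like),
                **options,
            }
        }
    }


text = "python extract year from date"
body = more_like_this(
    ["title", "description"],
    [
        existing_doc("posts", 1),
        artificial_doc("posts", title=text, description=text),
    ],
    min_term_freq=1,
    max_query_terms=10,
)

# Prints the JSON body to send with GET posts/_search.
print(json.dumps(body, indent=2))
```

Extra tuning options such as min_doc_freq can be passed as keyword arguments in the same way as min_term_freq and max_query_terms.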