In this tutorial, we will learn how to find similar documents in Elasticsearch using the more_like_this query.
The more_like_this query finds documents that are similar to a given set of documents.
This can be useful when you want to query similar blog posts, products, or any other documents.
For example, on this site (dminhvu.com), I use the more_like_this query to find blog posts similar to the one you are reading. They are shown in the "You Might Also Like" section on the right side of the page.
How to Use the more_like_this Query in Elasticsearch to Find Similar Documents
The more_like_this query can be used in 3 ways:
- Find similar documents to a given text.
- Find similar documents to a given document or set of documents.
- Find similar documents to a mixed set of documents and text.
Let's discover the syntax and usage of each of these ways.
I will use the following documents for the examples throughout this tutorial:
1. Find Similar Documents to a Given Text
The syntax of the more_like_this query to find similar documents to a given text is as follows:
where:
- <index>: the name of the index to search.
- fields: the fields to search for similar documents.
- like: the text to find similar documents to.
- min_term_freq: the minimum term frequency; terms that occur fewer times than this in the input are ignored.
- max_query_terms: the maximum number of query terms that will be selected. A higher number gives more accurate results but a slower query.
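As a rough sketch of how these parameters fit together, the search body can be built as a plain Python dictionary. The helper name build_text_mlt_query and its default values are assumptions made for illustration; only the field names inside the dictionary come from the more_like_this query itself. The resulting body would be sent to the <index>/_search endpoint.

```python
import json


def build_text_mlt_query(fields, text, min_term_freq=1, max_query_terms=12):
    """Build a search body that looks for documents similar to free text.

    The function name and the default values are illustrative assumptions.
    """
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),              # fields to compare on
                "like": text,                        # the input text
                "min_term_freq": min_term_freq,      # ignore rarer input terms
                "max_query_terms": max_query_terms,  # cap on selected terms
            }
        }
    }


# Example: search the title field for documents similar to a short text.
body = build_text_mlt_query(["title"], "python extract year from date")
print(json.dumps(body, indent=2))
```

In a short input text most terms occur only once, so a min_term_freq of 1 is usually a sensible starting point here.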
For example, to find similar documents to the text "python extract year from date", we can use the following query:
The above query will return the following results:
As you can see, the query returns the document with id = 4 because it contains the text "python extract year from date" in the title field.
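To try such a query from a script, one option is to post the body to the index's _search endpoint. The sketch below uses only the Python standard library and makes several assumptions: an unsecured cluster reachable at http://localhost:9200, an index named posts, and the parameter values 1 and 12. The helper names build_search_request and run_search are hypothetical.

```python
import json
import urllib.request

# Assumed address of a local, unsecured Elasticsearch node.
ES_URL = "http://localhost:9200"


def build_search_request(index, body):
    """Prepare (but do not send) a POST request to <index>/_search."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(request):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


body = {
    "query": {
        "more_like_this": {
            "fields": ["title"],
            "like": "python extract year from date",
            "min_term_freq": 1,     # assumed value
            "max_query_terms": 12,  # assumed value
        }
    }
}
request = build_search_request("posts", body)
print(request.get_method(), request.full_url)
```

Calling run_search(request) against a running cluster would return the response as a dictionary, with the matching documents under hits.hits.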
2. Find Similar Documents to a Given Document or Set of Documents
You can also use the more_like_this query to find similar documents to a given document or set of documents.
The syntax for this case is as follows:
where:
- <index>: the name of the index to search.
- _index: the name of the index that contains the document to find similar documents to.
- _id: the id of the document to find similar documents to. The input documents themselves are excluded from the search results.
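A minimal sketch of this form, again as a Python dictionary: each input document becomes an object with _index and _id inside the like list. The helper name build_doc_mlt_query, the choice of the title field, and the values for min_term_freq and max_query_terms are assumptions made for illustration.

```python
import json


def build_doc_mlt_query(fields, doc_refs, min_term_freq=1, max_query_terms=12):
    """Build a search body that looks for documents similar to stored ones.

    doc_refs is an iterable of (index_name, document_id) pairs.
    The function name and default values are illustrative assumptions.
    """
    like = [{"_index": index, "_id": doc_id} for index, doc_id in doc_refs]
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": like,
                "min_term_freq": min_term_freq,
                "max_query_terms": max_query_terms,
            }
        }
    }


# Example: documents similar to the document with id 1 in the posts index.
body = build_doc_mlt_query(["title"], [("posts", "1")])
print(json.dumps(body, indent=2))
```

Passing several pairs in doc_refs covers the "set of documents" case with the same body shape.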
For example, to find similar documents to the document with id = 1 in the posts index, we can use the following query:
The above query will return the following results:
As you can see, the query returns the document with id = 2 because it is related to "elasticsearch", as is the document with id = 1.
3. Find Similar Documents to a Mixed Set of Documents and Text
To find similar documents to a mixed set of documents and text, you can use the more_like_this query as follows:
where:
- <index>: the name of the index to search.
- _index: the name of the index that contains the document to find similar documents to.
- _id: the id of the document to find similar documents to. The input documents themselves are excluded from the search results.
- doc: the values of artificial fields to find similar documents to.
- _doc: the text to find similar documents to.
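One way to picture the mixed form is a like list that holds stored-document references, artificial documents, and plain text strings side by side. The sketch below builds such a body in Python. The helper name build_mixed_mlt_query, the title field, and the values 1 and 12 are assumptions made for illustration, and it covers only the _index, _id and doc entries plus plain text.

```python
import json


def build_mixed_mlt_query(fields, doc_refs=(), artificial_docs=(), texts=()):
    """Build a search body whose like list mixes documents and text.

    doc_refs:        iterable of (index_name, document_id) pairs
    artificial_docs: iterable of (index_name, field_values_dict) pairs
    texts:           iterable of plain text strings
    The function name and parameter values are illustrative assumptions.
    """
    like = []
    for index, doc_id in doc_refs:
        like.append({"_index": index, "_id": doc_id})
    for index, field_values in artificial_docs:
        like.append({"_index": index, "doc": field_values})
    like.extend(texts)
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": like,
                "min_term_freq": 1,     # assumed value
                "max_query_terms": 12,  # assumed value
            }
        }
    }


# Example: the document with id 1 in posts, combined with a free text.
body = build_mixed_mlt_query(
    ["title"],
    doc_refs=[("posts", "1")],
    texts=["python extract year from date"],
)
print(json.dumps(body, indent=2))
```

Note that texts must be a list or tuple of strings; a bare string would be split into single characters by like.extend.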
For example, to find similar documents to the document with id = 1 in the posts index and the text "python extract year from date", we can use the following query:
The above query will return the following results:
As you can see, the query returns the documents with id = 2 and id = 4 because they are related to "elasticsearch" and "python extract year from date".
Conclusion
In this tutorial, we have learned how to find similar documents in Elasticsearch using the more_like_this query.
- The more_like_this query finds documents that are similar to a given set of documents.
- The more_like_this query can be used in 3 ways:
  - Find similar documents to a given text.
  - Find similar documents to a given document or set of documents.
  - Find similar documents to a mixed set of documents and text.
- The more_like_this query is useful when you want to query similar blog posts, products, or any other documents.