
Using the “More Like This” Query in Elasticsearch to Implement Related Content



I run a forum for programmers.

At first, I used a “title search” to find the related content for a thread. That is, to find out which threads are related to a given thread, you simply search with that thread's title. This approach is very easy, but it has some disadvantages:

  1. Sometimes the title is too simple to represent the content.
  2. Sometimes there is no similar content, but the author has written other content on this site, and we want that content to be shown.
  3. The title is too short to represent the thread, but we cannot simply use the thread's entire content as the search query.

So I was happy to find that Elasticsearch, which I already use for search, has a “More Like This” query that can do this.

Before writing any code, we can use curl to test the idea.

Note:

  1. Put -d at the end, so we can use a multi-line shell string for the request body.
  2. [{"_id" : 3721}] is the ID of the content.
  3. Use "_source": ["title","id"] to limit the returned fields. We only want to generate a related content list, so we don’t need any other information.
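The request described in these notes can be sketched in Python as a plain request body. This is a minimal sketch, not the original command: the `fields` list, the index name `threads`, and the host `http://localhost:9200` are assumptions, and the exact shape of the `more_like_this` clause (here the `like` parameter) depends on your Elasticsearch version.

```python
import json
import urllib.request


def build_mlt_query(doc_id, fields=("title", "content"), size=10):
    """Build a More Like This request body for one indexed document.

    `fields` and `size` are illustrative assumptions, not values from the post.
    """
    return {
        # Return only the fields needed to render a related-content list.
        "_source": ["title", "id"],
        "size": size,
        "query": {
            "more_like_this": {
                "fields": list(fields),
                # Reference the source document by its id.
                "like": [{"_id": doc_id}],
            }
        },
    }


def search(body, url="http://localhost:9200/threads/_search"):
    """POST the body to a (hypothetical) search endpoint and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `search(build_mlt_query(3721))` would send the query for the thread used in the notes. Printing `json.dumps(build_mlt_query(3721), indent=2)` gives a body that can be pasted after `-d` in a curl command.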

The result is a list of hits, each carrying only the title and id fields of a related thread.

However, for some content that really is in the search engine (you can find it with a normal search), this method cannot return any related content, and the response comes back with no hits.
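Whether a query found anything can be checked from the response itself. Here is a small sketch, assuming the standard search response layout where matches sit under `hits.hits` and each match carries the requested fields in `_source`. The sample dictionaries are hand-written for illustration and are not real output from the forum.

```python
def extract_related(response):
    """Return (id, title) pairs from a search response, or [] when nothing matched."""
    hits = response.get("hits", {}).get("hits", [])
    related = []
    for hit in hits:
        source = hit.get("_source", {})
        related.append((source.get("id"), source.get("title")))
    return related


# Hand-written stand-ins: one response with a single match, one with none.
with_match = {
    "hits": {"hits": [{"_id": "42", "_source": {"id": 42, "title": "Example thread"}}]}
}
no_match = {"hits": {"hits": []}}

print(extract_related(with_match))  # [(42, 'Example thread')]
print(extract_related(no_match))    # []
```

An empty list is the signal that the query needs to be broadened.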

So we can provide more conditions (title, author, etc.) to get more results.
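One way to read “more conditions” is to wrap the More Like This clause in a `bool` query and add the thread's title and author as further optional clauses. The sketch below takes that reading. The field names `title`, `content`, and `author`, and the choice of `match` and `term` clauses, are assumptions and may differ from the query the forum actually uses.

```python
def build_related_query(doc_id, title, author, size=10):
    """Build a broader related-content query (illustrative field names)."""
    return {
        "_source": ["title", "id"],
        "size": size,
        "query": {
            "bool": {
                # Any one of these optional clauses is enough for a hit.
                "should": [
                    {
                        "more_like_this": {
                            "fields": ["title", "content"],
                            "like": [{"_id": doc_id}],
                        }
                    },
                    # Threads whose title shares words with this thread's title.
                    {"match": {"title": title}},
                    # Other threads written by the same author.
                    {"term": {"author": author}},
                ],
                "minimum_should_match": 1,
                # The title and author clauses would match the thread itself,
                # so exclude it explicitly.
                "must_not": [{"ids": {"values": [str(doc_id)]}}],
            }
        },
    }
```

With this shape, a thread that has no textually similar neighbours can still pick up hits through its title or through the author's other threads.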

Now it is done.

But there is one more small problem: the “More Like This” query is much slower than a normal search, so you will want to cache the result. 🙂
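A minimal sketch of such a cache, assuming an in-process dictionary with a fixed time-to-live is good enough. The class name `RelatedCache`, the one-hour default, and the `slow_lookup` stand-in are all hypothetical; a real forum might use its existing cache layer instead.

```python
import time


class RelatedCache:
    """Cache related-content lists per thread id for a fixed number of seconds."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch      # function that runs the slow query
        self._ttl = ttl_seconds  # how long a cached list stays valid
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # doc_id -> (stored_at, result)

    def get(self, doc_id):
        now = self._clock()
        entry = self._entries.get(doc_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: skip the slow query
        result = self._fetch(doc_id)
        self._entries[doc_id] = (now, result)
        return result


calls = []


def slow_lookup(doc_id):
    """Stand-in for the real query; records each call it receives."""
    calls.append(doc_id)
    return [("related to", doc_id)]


cache = RelatedCache(slow_lookup, ttl_seconds=600)
cache.get(3721)
cache.get(3721)
print(len(calls))  # 1
```

The second `get` is served from memory, so the slow query runs only once until the entry expires.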


