tinyfool

Using the “More like this” function of Elasticsearch to implement related content



I run a forum for programmers.

At first, I used “title search” to find the content related to a thread: to see which threads are related to a given thread, you simply search with that thread's title. This approach is very easy, but it has some disadvantages:

  1. Sometimes the title is too simple to represent the content.
  2. Sometimes there is no similar content, but the author has written other posts on the site, and we want those to be shown.
  3. The title is too short to be representative, but we cannot simply search with the whole content of the thread.

So I was happy to find that Elasticsearch, which I already use for search, has a “More like this” function that can do this.

Before writing any code, we can use curl to test the idea.

Note:

  1. Put -d at the end, so we can use a multi-line shell string.
  2. [{"_id" : 3721}] is the ID of the content.
  3. Use "_source": ["title","id"] to limit the returned fields; we only want to generate a related-content list, so we don't need the other information.
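
The request described in these notes can be sketched in Python. This is only a minimal sketch that builds the request body, assuming the `more_like_this` query of the Elasticsearch query DSL; the field names `title` and `content` and the values of `min_term_freq` and `max_query_terms` are my assumptions, and the exact body accepted depends on the Elasticsearch version.

```python
import json


def build_mlt_body(doc_id, fields=("title", "content")):
    """Build a more_like_this search body for one already-indexed document.

    The field names are hypothetical placeholders, not the forum's real mapping.
    """
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                # Refer to the indexed document by its id, as in the note above.
                "like": [{"_id": doc_id}],
                "min_term_freq": 1,
                "max_query_terms": 12,
            }
        },
        # Only return the fields needed to render a related-content list.
        "_source": ["title", "id"],
    }


body = build_mlt_body(3721)
print(json.dumps(body, indent=2))
```

The printed JSON is what would be passed to curl with -d against the index's `_search` endpoint.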

Result is:

But some content really is in the search engine (you can find it by searching), and yet this method returns no related content for it. It returns something like this:

So we can provide more conditions (title, author, etc.) to get more results, like this:
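
One way to add such conditions, as a hedged sketch rather than the exact query used on the forum: wrap `more_like_this` in a `bool` query, pass the title as extra free text in `like`, and add a `match` clause on the author. The field names `title`, `content`, and `author` are hypothetical.

```python
def build_mlt_body_with_conditions(doc_id, title, author):
    """Combine more_like_this with extra title and author conditions.

    Field names are hypothetical; adjust them to the real index mapping.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        "more_like_this": {
                            "fields": ["title", "content"],
                            # Mix a document reference with free text.
                            "like": [{"_id": doc_id}, title],
                            "min_term_freq": 1,
                            "min_doc_freq": 1,
                        }
                    },
                    # Other posts by the same author also count as related.
                    {"match": {"author": author}},
                ],
                "minimum_should_match": 1,
                # Keep the thread itself out of its own related list.
                "must_not": [{"ids": {"values": [str(doc_id)]}}],
            }
        },
        "_source": ["title", "id"],
    }


body2 = build_mlt_body_with_conditions(3721, "example thread title", "tinyfool")
```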

Now it is done.

But there is another small problem: the “More like this” function is much slower than a normal search, so you will want to cache the results. 🙂
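
A minimal sketch of such a cache, assuming an in-process dictionary with a time-to-live is good enough; `fake_related` is a hypothetical stand-in for the real Elasticsearch call.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # Still fresh: skip the slow query.
        value = compute()
        self._store[key] = (now, value)
        return value


calls = []


def fake_related(thread_id):
    """Hypothetical stand-in for the slow "More like this" request."""
    calls.append(thread_id)
    return [thread_id + 1, thread_id + 2]


cache = TTLCache(ttl_seconds=3600)
first = cache.get_or_compute(3721, lambda: fake_related(3721))
second = cache.get_or_compute(3721, lambda: fake_related(3721))
```

The second lookup is served from the cache, so the slow function runs only once per thread until the entry expires.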



My website configuration log



Web part ( http://ourcoders.com/ ) :

Elasticsearch part:

Automatically transcribe

Reportex

The idea behind Reportex is very similar to an idea I had for the best audio/video editor: when you select a passage of text, it automatically selects the corresponding audio clips.

Reportex beta editing

But right now it is just a demo.

[TIL] 3d scanner hardware, software, app and course

Hardware

DIY 3D Scanner by Alex

In this video, Alex uses stepper motors, a threaded rod, an IR sensor, and homemade 3D-printed components to build a DIY 3D scanner. Three kinds of sensors can be used to measure the distance: IR sensors, ultrasonic sensors, and laser sensors.

FabScan

FabScan is an open-source 3D laser scanner; you can get all the information you need to build one from here. FabScan uses a laser sensor to get the distance information for 3D reconstruction and a camera to capture the texture of the object.

Here is a video that shows you how to assemble it:

Here is a video that shows you how to use it:

Software

VisualSFM

VisualSFM is 3D scanning software made by Changchang Wu; it supports Windows, Linux, and Mac. You give it photos shot from different angles, and VisualSFM uses the SIFT (scale-invariant feature transform) method to find feature points in the photos, matches the same feature points across different photos to work out how the photos relate to each other, and from that builds the 3D model.

VisualSFM on MacOS

If you are interested in VisualSFM, there is a great video on YouTube that shows you how to use it.

See also:

PhotoScan

Using PhotoScan to process photo files:

Trick

You need to use a camera or your phone to take a lot of photos of the object from all different angles, which is very painful. I developed a trick using my iPhone: first, I shoot a short video that covers every angle of the object, send it to my Mac, and then use FFmpeg to convert the video into photos.

The number 6 is how many frames to extract from each second of video.
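
A hedged sketch of this conversion step, written as a Python helper that builds the FFmpeg command line. It assumes FFmpeg's `fps` video filter; the file names `object.mov` and `frame_%04d.jpg` are hypothetical.

```python
import shlex


def ffmpeg_frames_command(video_path, out_pattern="frame_%04d.jpg", fps=6):
    """Return an ffmpeg command that extracts `fps` frames per second of video.

    The input and output names are placeholders chosen for illustration.
    """
    return ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}", out_pattern]


cmd = ffmpeg_frames_command("object.mov")
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would run the extraction, provided FFmpeg is installed and on the PATH.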

I have done a lot of 3D scans using this method.

App

MobileFusion

Microsoft’s MobileFusion is the best app for making 3D scans, but it never came to market.

Course

If you want to write your own code for 3D scanning, you may want to take this course: Photogrammetry by Cyrill Stachniss.

Cyrill Stachniss is a full professor of photogrammetry at the University of Bonn and recently became the deputy managing director of the Institute of Geodesy and Geoinformation.

Photogrammetry:

And finally, you can always go to Reddit to find anything, including, of course, 3D scanning.

Google SEO and Structured Data

If you search for certain specific words like “Apple pie”, Google shows some different-looking results in the first several lines, like this:

google results of apple pie Recipe

You can see that these results contain some extra content, like rating, votes, time, and calories, so Google must know these results are recipes. But how can it know? By using some deep learning algorithm? No, this is all about structured data.

At Google I/O 2017, there was a talk about structured data.

Google supports a lot of structured data.

  1. The format is Microdata (Google also accepts JSON-LD and RDFa).
  2. Google supports many types of structured data: enhancements such as Breadcrumbs, Corporate Contacts, Galleries and Lists, Logos, Sitelinks Searchbox, Site Name, and Social Profile Links, and many content types, such as Articles, Books, Courses, Datasets, Events, Fact Check, Local Businesses, Music, Podcasts, Products, Recipes, Reviews, TV and Movies, and Videos. You can find details in the Search Gallery.
  3. To learn how to use it, see Introduction to Structured Data.
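
As an illustration, here is a minimal Python sketch that generates recipe markup. Note that it uses JSON-LD, another format Google accepts, rather than Microdata, because it is easier to generate; the types and properties come from schema.org, and every value (name, time, calories, rating) is a made-up placeholder.

```python
import json

# All values below are invented placeholders for illustration only.
recipe = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Apple Pie",
    "cookTime": "PT1H",  # ISO 8601 duration: one hour
    "nutrition": {
        "@type": "NutritionInformation",
        "calories": "240 calories",
    },
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.5",
        "ratingCount": "120",
    },
}

# Embed the JSON-LD in the page inside a script tag.
snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(recipe, indent=2)
    + "\n</script>"
)
print(snippet)
```

The rating, vote count, time, and calories correspond to the extra information shown in the recipe results above.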

[TIL] Integration by parts

Integration by parts, or partial integration, is a theorem that helps us solve complex integration problems involving a product of functions. If the integrand can be written as the product of a function \(u(x)\) and the derivative \(v'(x)\) of another function, we can use integration by parts.

If \(u = u(x)\) and \(du = u'(x)\,dx\), while \(v = v(x)\) and \(dv = v'(x)\,dx\), then integration by parts states that:

\(\int_a^b u(x) v'(x) \, dx = [u(x) v(x)]_a^b-\int_a^b v(x) u'(x) \, dx\).

or more compactly:

\(\int u\,dv=uv-\int v\,du\).

This method is very useful when the integral of \(v(x)u'(x)\) is easier to find than the integral of \(u(x)v'(x)\).
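
As a small worked example (my own choice of integrand, not one from a textbook), take \(\int x e^{x}\,dx\). Choose \(u = x\) and \(dv = e^{x}\,dx\), so that \(du = dx\) and \(v = e^{x}\). Then:

\(\int x e^{x}\,dx = x e^{x} - \int e^{x}\,dx = x e^{x} - e^{x} + C\).

Here \(\int v\,du = \int e^{x}\,dx\) is much easier than the original integral, which is exactly the situation where the method helps.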

[TIL] Rationalizing Substitution

Rationalizing substitution is a special type of U-Substitution. Radicals in a function often cause problems when integrating; with a rationalizing substitution we can transform the problem into one that is easier to solve.

For example, sometimes our function contains something like \(\sqrt{x}\). To eliminate the square root, we can make the substitution \(x = z^2\), \(dx = 2z\,dz\). The integral may then be easier to solve.
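
As a worked example (an integrand I picked for illustration), consider \(\int \frac{1}{1+\sqrt{x}}\,dx\). With \(x = z^2\) and \(dx = 2z\,dz\), where \(z = \sqrt{x} \ge 0\):

\(\int \frac{1}{1+\sqrt{x}}\,dx = \int \frac{2z}{1+z}\,dz = 2\int \left(1 - \frac{1}{1+z}\right) dz = 2z - 2\ln(1+z) + C\).

Substituting \(z = \sqrt{x}\) back gives \(2\sqrt{x} - 2\ln(1+\sqrt{x}) + C\).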

U-Substitution

Today I learned U-Substitution, also known as integration by substitution. It is a method for finding integrals in calculus. Using the fundamental theorem of calculus often requires finding an antiderivative. For this and other reasons, U-Substitution is an important tool in mathematics. It is the counterpart to the chain rule of differentiation.

If you find it difficult to find an antiderivative directly, you can sometimes use a substitution \(u = ϕ(x)\) to rewrite \(f(x)\,dx\) in the form \(g(u)\,du\), find the antiderivative of \(g(u)\) first, and then substitute \(u = ϕ(x)\) back to get the antiderivative in terms of \(x\).

For example, consider the integral

\( \int _{0}^{2}x\cos(x^{2}+1)\,dx \)

    
If we make the substitution \(u = ϕ(x) = x^2 + 1\), we obtain \(du = 2x\,dx\) and hence \(x\,dx = \frac{1}{2}\,du\). The limits change as well: \(x = 0\) gives \(u = 1\) and \(x = 2\) gives \(u = 5\), so

\(\int _{x=0}^{x=2}x\cos(x^{2}+1)\,dx = \frac{1}{2}\int _{u=1}^{u=5}\cos(u)\,du = \frac{1}{2}(\sin(5)-\sin(1))\).
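
As a sanity check on this result, the original integral can be approximated numerically and compared with the closed form. This is a minimal sketch using composite Simpson's rule; the choice of rule and the number of subintervals are my assumptions.

```python
import math


def simpson(f, a, b, n=1000):
    """Approximate the integral of f on [a, b] with composite Simpson's rule."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate between weight 4 (odd i) and 2 (even i).
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# Left side: the original integral in x.
numeric = simpson(lambda x: x * math.cos(x * x + 1), 0.0, 2.0)
# Right side: the value obtained after the substitution u = x^2 + 1.
closed_form = 0.5 * (math.sin(5) - math.sin(1))
print(numeric, closed_form)
```

The two printed numbers should agree to many decimal places, which confirms the substitution and the new limits.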
       
See also: Integration by substitution

Why did Google build the TPU instead of inventing some superpowered GPU?

Deep learning researchers always think training is the core problem, because they always lack the funds to buy the fastest machines. But Google doesn’t worry about this: it has tons of powerful machines, so finding the resources to train a good model isn’t very hard for Google.

Winning deep learning contests isn’t Google’s goal; that is just a PR trick. Google wants to provide AI cloud services, so it keeps releasing its well-trained models: Inception-v3, Word2vec, etc. Most customers will use APIs backed by Google’s well-trained models, like the Cloud Natural Language API, Cloud Speech API, Cloud Translation API, Cloud Vision API, and Cloud Video Intelligence API. Some of them will want to use models provided by Google or other companies, or just do some fine-tuning. And only a few of them will want to train their own model from scratch.

So Google cares about service more than training, and it built the TPU to speed up its services and reduce their latency.

Programming languages I have learned

I have learned a lot of them:

  1. Basic – high school
  2. Pascal – high school
  3. C – high school
  4. C++ – college
  5. Visual Basic – college
  6. ASP – college
  7. Delphi – college
  8. BCB – college
  9. VC – college
  10. PHP – college
  11. Flash – college
  12. VHDL – at work
  13. Python – at work
  14. Java – at work
  15. Matlab – at work
  16. Objective-C – at work
  17. Swift – at work

Maybe I missed something; I am an old man now. 🙂