My ElasticSearch service was broken, and how I fixed it

My website (http://ourcoders.com/) has an ElasticSearch service, which I use to provide search and the related content function.

Several days ago, I found out that this service was broken, so I set out to fix it.

At first, for reasons I no longer remember, I thought the cause might be that my Java VM was 32-bit. I spent a lot of time installing OpenJDK-9, OpenJDK-10 and default-JDK and trying to enable 64-bit mode, but I failed.

Today I went into the ElasticSearch install directory and tried to start it manually, and I found an error message:

So I edited my elasticsearch.yml to add xpack.ml.enabled: false, and that fixed it.
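If you prefer to script this change, here is a minimal sketch in Python. The helper name ensure_setting is hypothetical, and it assumes the setting is written as a flat dotted key on its own line, as in the fix above. It works on the text of the file, so you can check the result before writing it back.

```python
def ensure_setting(yaml_text: str, key: str, value: str) -> str:
    """Return yaml_text with an active `key: value` line, replacing an existing one.

    Assumption: the key is written flat (e.g. `xpack.ml.enabled: true`),
    not nested under parent keys.
    """
    wanted = f"{key}: {value}"
    lines = yaml_text.splitlines()
    for i, line in enumerate(lines):
        # Commented-out lines start with '#', so their name part never equals key.
        if line.split(":", 1)[0].strip() == key:
            lines[i] = wanted
            break
    else:
        # The key was not present, so append it at the end.
        lines.append(wanted)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "cluster.name: demo\n"  # hypothetical file content
    print(ensure_setting(sample, "xpack.ml.enabled", "false"))
```

Running it twice gives the same text, so it is safe to apply more than once.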

Best open-source or open-API OCR engines and services

Microsoft Azure Computer Vision API

Read text in images

Detect text in an image using optical character recognition (OCR) and extract the recognized words into a machine-readable character stream. Analyze images to detect embedded text, generate character streams, and enable searching. Save time and effort by taking photos of text instead of copying it.

Preview: Read handwritten text from images

Detect and extract handwritten text from notes, letters, essays, whiteboards, forms, and other sources. Reduce paper clutter and be more productive by taking photos of handwritten notes instead of transcribing them, and make the digital notes easy to find by implementing search. Handwritten OCR works with different surfaces and backgrounds, such as white paper, yellow sticky notes, and whiteboards.

ABBYY FineReader Engine

The software development kit ABBYY FineReader Engine allows software developers to create applications that extract textual information from paper documents, images or displays. This AI-powered OCR SDK provides your application with excellent text recognition, PDF conversion, and data capture functionalities, enabling it to convert scans into searchable PDF, Word or Excel documents, and access data on photos or screenshots.

Available for Windows, Linux, Mac OS and embedded platforms. On premises or in the Cloud.

Highest OCR accuracy

Provide your customers with the outstanding OCR quality available in ABBYY FineReader. Leading providers of ECM systems, document imaging and capture solutions, RPA solutions, as well as scanner and MFP manufacturers trust ABBYY OCR technology.

Increased value

Expand your solutions. ABBYY FineReader Engine enables your software to convert TIFF libraries into PDF, PDF/A, Word or other formats, and accurately extract field values. Develop on Windows, Linux or Mac and offer your software in the Cloud or on VM platforms.

Faster time to market

Outperform your competition and get premium OCR solutions to the market quickly – with OCR toolkit’s powerful APIs. Easily integrate world-class OCR features with the help of pre-configured tools, parameters, code samples and other components.

The comprehensive set of recognition technologies

With OCR toolkit integration, applications can extract machine printed text in over 200 languages as well as hand-printed text, optical marks, and barcode values.

Powerful PDF processing tools

Versatile APIs allow processing many PDF types and converting scanned documents, TIFFs, JPEGs or image-only PDFs into different searchable PDF and PDF/A files.

Artificial Intelligence and Machine Learning

AI, ML, and other advanced technologies provide outstanding recognition accuracy for multi-language documents and deliver searchable and editable documents that reflect their originals.

Multi-core CPUs, Cloud and Virtual Machines support

Support for document processing in parallel threads on multi-core CPUs, deployment in the cloud and virtual environments guarantees fast, flexible and scalable processing.

OCR.space

The OCR.space Online OCR service converts scans or (smartphone) images of text documents into editable files by using Optical Character Recognition (OCR) technologies. It uses state-of-the-art modern OCR software. The recognition quality is comparable to commercial OCR SDK software (e.g. ABBYY).

Our Online OCR service is free to use, no registration necessary. We do not need your email address. Just upload your image files. The OCR software takes a JPG, PNG or PDF (PDF OCR with full support for multi-page documents and multi-column text). The only restriction is that the images/PDF must not be larger than 5MB. Email us if you need to process larger documents. If you need to automate your OCR and process many documents, do not web-scrape this page. It is made for humans, not computers. Instead, please use the provided free OCR API.
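To give an idea of what using the OCR API instead of the web page could look like, here is a minimal sketch in Python using only the standard library. The endpoint URL and the field names (apikey, url, language, isOverlayRequired, ParsedResults, ParsedText, IsErroredOnProcessing) are my recollection of the OCR.space API and should be treated as assumptions to check against its documentation. The API key and image URL are placeholders.

```python
import json
import urllib.parse
import urllib.request

# Assumption: this is the OCR.space parse endpoint; check the official docs.
OCR_SPACE_ENDPOINT = "https://api.ocr.space/parse/image"


def build_form(api_key, image_url, language="eng"):
    """Form fields for an OCR request on a publicly reachable image URL."""
    return {
        "apikey": api_key,
        "url": image_url,
        "language": language,
        "isOverlayRequired": "false",
    }


def extract_text(response):
    """Join the recognized text of all pages in a decoded JSON response."""
    if response.get("IsErroredOnProcessing"):
        raise RuntimeError(str(response.get("ErrorMessage")))
    results = response.get("ParsedResults") or []
    return "\n".join(result.get("ParsedText", "") for result in results)


def ocr_image_url(api_key, image_url, language="eng"):
    """Send the request and return the recognized text (needs network access)."""
    data = urllib.parse.urlencode(build_form(api_key, image_url, language)).encode("ascii")
    request = urllib.request.Request(OCR_SPACE_ENDPOINT, data=data)
    with urllib.request.urlopen(request, timeout=60) as resp:
        return extract_text(json.load(resp))
```

A call would look like ocr_image_url("YOUR_API_KEY", "https://example.com/scan.png"), where both arguments are placeholders.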

Freelancing websites and how to start freelancing

Websites:

How to start:

  • Most of these websites have international teams bidding at a fraction of the cost, and unless your skills are highly specialized you will have a tough time competing. You may want to start with local businesses instead.
  • Finding a good, well-paid job on freelancing sites can be difficult; you just need to be patient. If your clients like your work, they will often reach out to you again with additional work.
  • You’ll have to do a few projects on the cheap initially; do the best job you can to make sure you get 5-star feedback. After a few months, people will start coming to you.
  • Reading the brief carefully, asking meaningful questions and responding with some context and ideas will really make you stand out.
  • Some clients like long meetings and meaningless phone calls, so you need to bill for the time spent on meetings and calls.
  • Having a well thought out and prepared portfolio is key to landing great clients.
  • One of the most important things to know about freelancing is that not every client is a good client; some of them are not worth contacting.
  • Look at clients’ reviews, don’t just accept your first offer, and don’t take clients that are brand new to the site.

I bought a gaming laptop and Google provides a service called Colab, so now I can learn how to use deep learning

Several days ago, I bought a gaming laptop to do some deep learning work. My girlfriend is very happy because now she can use the new gaming laptop to play PlayerUnknown’s Battlegrounds. And I’m happy because this laptop is several times faster than my Mac Pro at deep learning tasks.

I updated all the code from TensorFlow and ran the code in the tutorials one by one. I also found out that Google now provides a service called Colab, where our code can run on Google’s VM with a GPU that Google provides.

But a lot of people find that their code can’t run on Colab because the runtime often fails with an out-of-memory error. It turns out that Google may share memory and GPUs between your different sessions. Some code can help you see how much memory is free for you.

Normally Google provides a Tesla K80 with 11GB of memory, so when you get lucky you may see this output:

And sometimes almost all of your memory is already used, like this:

When that happens, you can use the kill command to free memory first:

You may need to wait several minutes. This kills your current runtime, and after you connect to a new runtime, you may be lucky enough to have all the memory unused.
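As one way to check how much GPU memory a session really has, here is a minimal sketch in Python that asks nvidia-smi for used and total memory. It is not the code from the Stack Overflow thread linked at the end of this post. It assumes nvidia-smi is on the PATH of the runtime, and the numbers in the docstring are made up for illustration.

```python
import subprocess

# Ask nvidia-smi for used and total memory in MiB, one CSV line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(output):
    """Parse lines like '564, 11441' into (used, total, free) tuples in MiB."""
    rows = []
    for line in output.strip().splitlines():
        used, total = (int(part) for part in line.split(","))
        rows.append((used, total, total - used))
    return rows


def gpu_memory():
    """Run nvidia-smi and return one (used, total, free) tuple per GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    for used, total, free in gpu_memory():
        print(f"used {used} MiB of {total} MiB, {free} MiB free")
```

If the free value is only a small fraction of the total right after connecting, the session is probably sharing its GPU, and restarting the runtime as described above may help.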

When I tried the DCGAN example from the TensorFlow tutorials, my Mac Pro (3.7 GHz Quad-Core Intel Xeon E5, Two AMD FirePro D300 2048 MB) needed 255 seconds to finish one epoch.

Google Colab (Nvidia Tesla K80 11GB) needed 30 seconds to finish one epoch.

My gaming laptop (Nvidia GeForce 1050 Ti 4GB) needed 34 seconds to finish one epoch.
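To put those three timings side by side, here is a small Python sketch that turns them into speedups relative to the Mac Pro. The dictionary keys are just labels I chose; the numbers are the per-epoch times above.

```python
# Seconds per DCGAN epoch, taken from the measurements above.
EPOCH_SECONDS = {
    "mac_pro": 255,
    "colab_k80": 30,
    "gaming_laptop": 34,
}


def speedups(times, baseline):
    """Return how many times faster each machine is than the baseline."""
    base = times[baseline]
    return {name: base / seconds for name, seconds in times.items()}


if __name__ == "__main__":
    for name, factor in speedups(EPOCH_SECONDS, "mac_pro").items():
        print(f"{name}: {factor:.1f}x")
```

Colab comes out at 8.5 times the speed of the Mac Pro and the laptop at 7.5 times, so the laptop is only a little slower than the K80 on this task.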

So Google Colab is very useful, but when you run a task that takes too long, Colab may disconnect and you may never be able to reconnect to the original session, so you may need to run it again and again. That is why I am happy that I bought my own gaming laptop.

See also: https://stackoverflow.com/questions/48750199/google-colaboratory-misleading-information-about-its-gpu-only-5-ram-available

Use the More like this function of Elastic Search to implement the related content function

I run a forum for programmers.

At first, I used “title search” to implement the related content of threads. That is, to find which threads are related to a given thread, you just search using that thread’s title. This is a very easy approach, but it has some disadvantages:

  1. Sometimes the title is too simple to represent the content.
  2. Sometimes there is no similar content, but the author has written other content on this site, and we want that content to be shown.
  3. The title is too short to represent the content, but we cannot simply search with the whole content of the thread.

So I was happy to find that Elastic Search, which I already use for search, has a “More like this” function that can do this.

Before we start writing code, we can use curl to test the idea.

Note:

  1. Put -d at the end, so we can use a multi-line shell string.
  2. [{"_id" : 3721}] is the id of the content.
  3. Use "_source": ["title","id"] to limit the returned fields; we just want to generate a related content list and don’t need other information.

The result is:

But for some content that really is in the search engine (you can find it with search), this method does not return any related content. It returns something like this:

So we can provide more conditions (title, author, etc.) to get more results, like this:

Now it is done.
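For reference, here is a minimal sketch in Python of the kind of request body such a query uses. It follows the Elastic Search more_like_this query, with the document id and the _source filter from the notes above. The field names "title" and "content" and the two frequency thresholds are assumptions for illustration, not the exact values from my setup.

```python
import json


def more_like_this_body(doc_id, fields=("title", "content"), size=10):
    """Build a search body that finds documents similar to an indexed one.

    Assumptions: `fields` names text fields that exist in the index, and
    the document with id `doc_id` lives in the index being searched.
    """
    return {
        "size": size,
        # Only return what a related content list needs.
        "_source": ["title", "id"],
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": [{"_id": doc_id}],
                # Lower thresholds than the defaults so short threads still match.
                "min_term_freq": 1,
                "min_doc_freq": 1,
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON you would send as the body of a _search request.
    print(json.dumps(more_like_this_body(3721), indent=2))
```

The printed JSON can be pasted after -d in a curl call to the index’s _search endpoint to try it out.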

But there is another small problem: the “More like this” function is much slower than search, so you will want to cache the result. 🙂
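One simple way to do that caching is an in-memory cache with an expiry time. Here is a minimal sketch in Python. The class name TTLCache and the one-hour lifetime are my own choices for illustration, and a real site might use Memcached or Redis instead.

```python
import time


class TTLCache:
    """Keep computed values for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, calling compute() on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=3600)

    def slow_related_lookup():
        # Placeholder for the real "More like this" request.
        return ["related thread A", "related thread B"]

    print(cache.get_or_compute(3721, slow_related_lookup))  # computed
    print(cache.get_or_compute(3721, slow_related_lookup))  # served from cache
```

Related content changes slowly, so even a long lifetime keeps the list fresh enough while skipping most of the slow queries.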

My website configuration log

Web part ( http://ourcoders.com/ ) :

Elastic Search part :

Automatically transcribe

Reportex

The idea of Reportex is very similar to my idea for the best audio/video editor: when you select a paragraph of words, it automatically selects the corresponding audio clips.

Reportex beta editing

But right now it is just a demo.

[TIL] 3D scanner hardware, software, app and course

Hardware

DIY 3D Scanner by Alex

In this video, Alex used stepper motors, a threaded rod, an IR sensor and homemade 3D-printed components to build DIY 3D scanner hardware. Three kinds of sensors can be used to measure the distance: IR sensors, ultrasonic sensors and laser sensors.

FabScan

FabScan is an open-source 3D laser scanner; you can get all the information you need to build one from here. FabScan uses a laser sensor to get the distance information for 3D reconstruction and uses a camera to get the texture of the object.

Here is a video showing how to assemble it:

Here is a video showing how to use it:

Software

VisualSFM

VisualSFM is 3D scanning software made by Changchang Wu that supports Windows/Linux/Mac. You give it photos shot from different angles; VisualSFM uses the SIFT (scale-invariant feature transform) method to find feature points in the photos, matches the same feature points across different photos to work out how the photos relate to each other, and from that builds the 3D model.

VisualSFM on MacOS

If you are interested in VisualSFM, there is a great video on YouTube that shows how to use it.

See also:

PhotoScan

Use PhotoScan to scan photo files:

Trick

You need to use a camera or your phone to take a lot of photos of the object from all different angles, which is very painful. I developed a trick using my iPhone: first, I shoot a short video covering every angle of the object with my iPhone, send it to my Mac, and use FFmpeg to convert the video to photos.

The number 6 is how many frames to extract from each second of video.

I did a lot of 3D scans using this method.
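As an illustration of this step, here is a minimal sketch in Python that builds and runs an FFmpeg command for pulling 6 frames per second out of a video. It uses FFmpeg’s fps video filter, which is one common way to do this and may differ from the exact flags I used. The file names are placeholders, and ffmpeg is assumed to be installed and on the PATH.

```python
import subprocess


def ffmpeg_extract_frames_cmd(video_path, output_pattern, frames_per_second=6):
    """Build an ffmpeg command that saves frames_per_second images per second of video."""
    return [
        "ffmpeg",
        "-i", video_path,
        "-vf", f"fps={frames_per_second}",  # keep this many frames per second
        output_pattern,  # e.g. frame_%04d.jpg -> frame_0001.jpg, frame_0002.jpg, ...
    ]


def extract_frames(video_path, output_pattern, frames_per_second=6):
    """Run ffmpeg; raises CalledProcessError if ffmpeg reports a failure."""
    cmd = ffmpeg_extract_frames_cmd(video_path, output_pattern, frames_per_second)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder file names; a 30-second video at 6 fps gives about 180 photos.
    extract_frames("object.mov", "frame_%04d.jpg")
```

A higher number gives more photos and more overlap between them, at the cost of longer processing in the scanning software.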

App

MobileFusion

Microsoft’s MobileFusion app is the best app for making 3D scans, but it never came to market.

Course

If you want to write your own code to do 3D scanning, you may want to take this course, Photogrammetry by Cyrill Stachniss.

Cyrill Stachniss is a full professor for photogrammetry at the University of Bonn and recently became the deputy managing director of the Institute of Geodesy and Geoinformation.

Photogrammetry:

And finally, you can always go to Reddit to find anything, including, of course, 3D scanning.