An Optimization Method of Approximate Nearest Neighbor Classifiers

Many of the online services we use every day, such as web search and news recommendation, depend on processing huge amounts of data in real time. In image search, for example, the system must quickly find images similar to a given query image in a database that may contain millions or even hundreds of millions of images. In news recommendation, the system must likewise pick the most relevant articles from a large pool of news according to the user's habits.

Quickly finding relevant items in massive data depends on approximate nearest neighbor search. The approximate nearest neighbor search problem is important in many fields, including machine learning, data compression, bioinformatics, document retrieval, and data analysis. To apply it in practice, however, the search time must be shortened. In this project we consider the following setting: given a set of known positive samples, we want to select the positive items from an unlabeled data set. This is a kind of semi-supervised clustering problem. We modify the index used for fast nearest neighbor retrieval to speed up the search and to classify the data effectively.
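To make the setting concrete, the following is a minimal sketch of selecting positives from unlabeled data with an approximate index. It is not the index modification developed in this project: it assumes a simple random-hyperplane hashing scheme as a stand-in, and the names `signature`, `build_index`, `select_positives`, and the parameters `radius` and `n_planes` are hypothetical, introduced only for illustration.

```python
import math
import random
from collections import defaultdict


def signature(point, planes):
    """Hash a point to the sign pattern of its dot products with the planes."""
    return tuple(sum(a * b for a, b in zip(point, w)) >= 0 for w in planes)


def build_index(points, planes):
    """Group point indices into buckets keyed by their signature."""
    index = defaultdict(list)
    for i, p in enumerate(points):
        index[signature(p, planes)].append(i)
    return index


def select_positives(positives, unlabeled, radius, n_planes=4, seed=0):
    """Return indices of unlabeled points that lie within `radius` of a
    positive sample found in the same hash bucket (approximate search)."""
    rng = random.Random(seed)
    dim = len(positives[0])
    # Assumption: random hyperplanes through the origin define the buckets.
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
    index = build_index(positives, planes)
    selected = []
    for i, q in enumerate(unlabeled):
        # Only positives in the query's bucket are compared, not all of them.
        candidates = index.get(signature(q, planes), [])
        if any(math.dist(q, positives[j]) <= radius for j in candidates):
            selected.append(i)
    return selected


def select_positives_exact(positives, unlabeled, radius):
    """Brute-force baseline: compare every unlabeled point to every positive."""
    return [i for i, q in enumerate(unlabeled)
            if any(math.dist(q, p) <= radius for p in positives)]


positives = [(1.0, 1.0), (1.2, 0.9)]
unlabeled = [(1.1, 1.0), (-5.0, -5.0), (1.0, 1.0)]
approx = select_positives(positives, unlabeled, radius=0.5)
exact = select_positives_exact(positives, unlabeled, radius=0.5)
```

In this sketch the approximate result is always a subset of the brute-force result: bucketing can miss a true neighbor that falls in a different bucket, but it never admits a point farther than `radius`. That trade of some recall for fewer distance computations is the general idea behind approximate nearest neighbor indexing.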