Commit 395d2ce6 authored by huchen

init the faiss for rocm

parent 5ded39f5
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project DOES NOT adhere to [Semantic Versioning](https://semver.org/spec/v2.0.0.html)
at the moment.
We try to credit contributors who are not part of the Facebook Faiss team by
name. Feel free to add an entry here if you submit a PR.
## [Unreleased]
## [1.7.2] - 2021-12-15
### Added
- Support LSQ on GPU (by @KinglittleQ)
- Support for exact 1D kmeans (by @KinglittleQ)
## [1.7.1] - 2021-05-27
### Added
- Support for building C bindings through the `FAISS_ENABLE_C_API` CMake option.
- Serializing the indexes with the python pickle module
- Support for the NNDescent k-NN graph building method (by @KinglittleQ)
- Support for the NSG graph indexing method (by @KinglittleQ)
- Residual quantizers: support as codec and unoptimized search
- Support for 4-bit PQ implementation for ARM (by @vorj, @n-miyamoto-fixstars, @LWisteria, and @matsui528)
- Implementation of Local Search Quantization (by @KinglittleQ)
### Changed
- The order of `xb` and `xq` was different between `faiss.knn` and `faiss.knn_gpu`.
Also, the metric argument was called `distance_type`.
- The typed vectors (LongVector, LongLongVector, etc.) of the SWIG interface have
been deprecated. They have been replaced with Int32Vector, Int64Vector, etc. (by h-vetinari)
### Fixed
- Fixed a bug causing kNN search functions for IndexBinaryHash and
IndexBinaryMultiHash to return results in a random order.
- Copy constructor of AlignedTable had a bug leading to crashes when cloning
IVFPQ indices.
## [1.7.0] - 2021-01-27
## [1.6.5] - 2020-11-22
## [1.6.4] - 2020-10-12
### Added
- Arbitrary dimensions per sub-quantizer now allowed for `GpuIndexIVFPQ`.
- Brute-force kNN on GPU (`bfKnn`) now accepts `int32` indices.
- Nightly conda builds now available (for CPU).
- Faiss is now supported on Windows.
## [1.6.3] - 2020-03-24
### Added
- Support alternative distances on GPU for GpuIndexFlat, including L1, Linf and
Lp metrics.
- Support METRIC_INNER_PRODUCT for GpuIndexIVFPQ.
- Support float16 coarse quantizer for GpuIndexIVFFlat and GpuIndexIVFPQ. GPU
Tensor Core operations (mixed-precision arithmetic) are enabled on supported
hardware when operating with float16 data.
- Support k-means clustering with encoded vectors. This makes it possible to
train on larger datasets without decompressing them in RAM, and is especially
useful for binary datasets (see https://github.com/facebookresearch/faiss/blob/main/tests/test_build_blocks.py#L92).
- Support weighted k-means. Weights can be associated to each training point
(see https://github.com/facebookresearch/faiss/blob/main/tests/test_build_blocks.py).
- Serialize callback in python, to write to pipes or sockets (see
https://github.com/facebookresearch/faiss/wiki/Index-IO,-cloning-and-hyper-parameter-tuning).
- Reconstruct arbitrary ids from IndexIVF + efficient remove of a small number
of ids. This avoids 2 inefficiencies: O(ntotal) removal of vectors and
IndexIDMap2 on top of indexIVF. Documentation here:
https://github.com/facebookresearch/faiss/wiki/Special-operations-on-indexes.
- Support inner product as a metric in IndexHNSW (see
https://github.com/facebookresearch/faiss/blob/main/tests/test_index.py#L490).
- Support PQ of sizes other than 8 bit in IndexIVFPQ.
- Demo on how to perform searches sequentially on an IVF index. This is useful
for an OnDisk index with a very large batch of queries. In that case, it is
worthwhile to scan the index sequentially (see
https://github.com/facebookresearch/faiss/blob/main/tests/test_ivflib.py#L62).
- Range search support for most binary indexes.
- Support for hashing-based binary indexes (see
https://github.com/facebookresearch/faiss/wiki/Binary-indexes).
### Changed
- Replaced obj table in Clustering object: now it is a ClusteringIterationStats
structure that contains additional statistics.
### Removed
- Removed support for useFloat16Accumulator for accumulators on GPU (all
accumulations are now done in float32, regardless of whether float16 or float32
input data is used).
### Fixed
- Some python3 fixes in benchmarks.
- Fixed GpuCloner (some fields were not copied, default to no precomputed tables
with IndexIVFPQ).
- Fixed support for new pytorch versions.
- Serialization bug with alternative distances.
- Removed test on multiple-of-4 dimensions when switching between blas and AVX
implementations.
## [1.6.2] - 2020-03-10
## [1.6.1] - 2019-12-04
## [1.6.0] - 2019-09-24
### Added
- Faiss as a codec: We introduce a new API within Faiss to encode fixed-size
vectors into fixed-size codes. The encoding is lossy and the tradeoff between
compression and reconstruction accuracy can be adjusted.
- ScalarQuantizer support for GPU, see gpu/GpuIndexIVFScalarQuantizer.h. This is
particularly useful as GPU memory is often less abundant than CPU memory.
- Added easy-to-use serialization functions for indexes to byte arrays in Python
(faiss.serialize_index, faiss.deserialize_index).
- The Python KMeans object can now use the GPU directly: just add `gpu=True` to the
constructor (see gpu/test/test_gpu_index.py, test TestGPUKmeans).
### Changed
- Change in the code layout: many C++ sources are now in subdirectories impl/
and utils/.
## [1.5.3] - 2019-06-24
### Added
- Basic support for 6 new metrics in CPU IndexFlat and IndexHNSW (https://github.com/facebookresearch/faiss/issues/848).
- Support for IndexIDMap/IndexIDMap2 with binary indexes (https://github.com/facebookresearch/faiss/issues/780).
### Changed
- Throw python exception for OOM (https://github.com/facebookresearch/faiss/issues/758).
- Make DistanceComputer available for all random access indexes.
- Gradually moving from long to uint64_t for portability.
### Fixed
- Slow scanning of inverted lists (https://github.com/facebookresearch/faiss/issues/836).
## [1.5.2] - 2019-05-28
### Added
- Support for searching several inverted lists in parallel (parallel_mode != 0).
- Better support for PQ codes where nbit != 8 or 16.
- IVFSpectralHash implementation: spectral hash codes inside an IVF.
- 6-bit per component scalar quantizer (4 and 8 bit were already supported).
- Combinations of inverted lists: HStackInvertedLists and VStackInvertedLists.
- Configurable number of threads for OnDiskInvertedLists prefetching (including
0=no prefetch).
- More test and demo code compatible with Python 3 (print with parentheses).
### Changed
- License was changed from BSD+Patents to MIT.
- Exceptions raised in sub-indexes of IndexShards and IndexReplicas are now
propagated.
- Refactored benchmark code: data loading is now in a single file.
## [1.5.1] - 2019-04-05
### Added
- MatrixStats object, which reports useful statistics about a dataset.
- Option to round coordinates during k-means optimization.
- An alternative option for search in HNSW.
- Support for range search in IVFScalarQuantizer.
- Support for direct uint_8 codec in ScalarQuantizer.
- Better support for PQ code assignment with external index.
- Support for IMI2x16 (4B virtual centroids).
- Support for k = 2048 search on GPU (instead of 1024).
- Support for renaming an ondisk invertedlists.
- Support for interrupting computations with an interrupt signal (ctrl-C) in python.
- Simplified build system (with --with-cuda/--with-cuda-arch options).
### Changed
- Moved stats() and imbalance_factor() from IndexIVF to InvertedLists object.
- Renamed IndexProxy to IndexReplicas.
- Most CUDA mem alloc failures now throw exceptions instead of terminating on an
assertion.
- Updated example Dockerfile.
- Conda packages now depend on the cudatoolkit packages, which fixes some
interference with pytorch. Consequently, faiss-gpu should now be installed
with `conda install -c pytorch faiss-gpu cudatoolkit=10.0`.
## [1.5.0] - 2018-12-19
### Added
- New GpuIndexBinaryFlat index.
- New IndexBinaryHNSW index.
## [1.4.0] - 2018-08-30
### Added
- Automatic tracking of C++ references in Python.
- Support for non-intel platforms, some functions optimized for ARM.
- Support for overriding nprobe for concurrent searches.
- Support for floating-point quantizers in binary indices.
### Fixed
- No more segfaults due to Python's GC.
- GpuIndexIVFFlat issues for float32 with 64 / 128 dims.
- Sharding of flat indexes on GPU with index_cpu_to_gpu_multiple.
## [1.3.0] - 2018-07-10
### Added
- Support for binary indexes (IndexBinaryFlat, IndexBinaryIVF).
- Support fp16 encoding in scalar quantizer.
- Support for deduplication in IndexIVFFlat.
- Support for index serialization.
### Fixed
- MMAP bug for normal indices.
- Propagation of io_flags in read func.
- k-selection for CUDA 9.
- Race condition in OnDiskInvertedLists.
## [1.2.1] - 2018-02-28
### Added
- Support for on-disk storage of IndexIVF data.
- C bindings.
- Extended tutorial to GPU indices.
[Unreleased]: https://github.com/facebookresearch/faiss/compare/v1.7.2...HEAD
[1.7.2]: https://github.com/facebookresearch/faiss/compare/v1.7.1...v1.7.2
[1.7.1]: https://github.com/facebookresearch/faiss/compare/v1.7.0...v1.7.1
[1.7.0]: https://github.com/facebookresearch/faiss/compare/v1.6.5...v1.7.0
[1.6.5]: https://github.com/facebookresearch/faiss/compare/v1.6.4...v1.6.5
[1.6.4]: https://github.com/facebookresearch/faiss/compare/v1.6.3...v1.6.4
[1.6.3]: https://github.com/facebookresearch/faiss/compare/v1.6.2...v1.6.3
[1.6.2]: https://github.com/facebookresearch/faiss/compare/v1.6.1...v1.6.2
[1.6.1]: https://github.com/facebookresearch/faiss/compare/v1.6.0...v1.6.1
[1.6.0]: https://github.com/facebookresearch/faiss/compare/v1.5.3...v1.6.0
[1.5.3]: https://github.com/facebookresearch/faiss/compare/v1.5.2...v1.5.3
[1.5.2]: https://github.com/facebookresearch/faiss/compare/v1.5.1...v1.5.2
[1.5.1]: https://github.com/facebookresearch/faiss/compare/v1.5.0...v1.5.1
[1.5.0]: https://github.com/facebookresearch/faiss/compare/v1.4.0...v1.5.0
[1.4.0]: https://github.com/facebookresearch/faiss/compare/v1.3.0...v1.4.0
[1.3.0]: https://github.com/facebookresearch/faiss/compare/v1.2.1...v1.3.0
[1.2.1]: https://github.com/facebookresearch/faiss/releases/tag/v1.2.1
# Copyright (c) Facebook, Inc. and its affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.
cmake_minimum_required(VERSION 3.17 FATAL_ERROR)
project(faiss
VERSION 1.6.4
DESCRIPTION "A library for efficient similarity search and clustering of dense vectors."
HOMEPAGE_URL "https://github.com/facebookresearch/faiss"
LANGUAGES CXX)
include(GNUInstallDirs)
set(CMAKE_CXX_STANDARD 11)
list(APPEND CMAKE_MODULE_PATH "${PROJECT_SOURCE_DIR}/cmake")
# Valid values are "generic", "avx2".
option(FAISS_OPT_LEVEL "" "generic")
option(FAISS_ENABLE_GPU "Enable support for GPU indexes." ON)
option(FAISS_ENABLE_PYTHON "Build Python extension." ON)
option(FAISS_ENABLE_C_API "Build C API." OFF)
# HC
if(FAISS_ENABLE_GPU)
set(CMAKE_CUDA_HOST_COMPILER ${CMAKE_CXX_COMPILER})
find_package(HIP)
#enable_language(CUDA)
endif()
add_subdirectory(faiss)
if(FAISS_ENABLE_GPU)
add_subdirectory(faiss/gpu)
endif()
if(FAISS_ENABLE_PYTHON)
add_subdirectory(faiss/python)
endif()
if(FAISS_ENABLE_C_API)
add_subdirectory(c_api)
endif()
add_subdirectory(demos)
add_subdirectory(tutorial/cpp)
# CTest must be included in the top level to enable `make test` target.
include(CTest)
if(BUILD_TESTING)
add_subdirectory(tests)
if(FAISS_ENABLE_GPU)
add_subdirectory(faiss/gpu/test)
endif()
endif()
# Code of Conduct
Facebook has adopted a Code of Conduct that we expect project participants to adhere to. Please [read the full text](https://code.fb.com/codeofconduct) so that you can understand what actions will and will not be tolerated.
# Contributing to Faiss
We want to make contributing to this project as easy and transparent as
possible.
## Our Development Process
We mainly develop Faiss within Facebook. Sometimes, we will sync the
github version of Faiss with the internal state.
## Pull Requests
We welcome pull requests that add significant value to Faiss. If you plan to do
a major development and contribute it back to Faiss, please contact us first before
putting too much effort into it.
1. Fork the repo and create your branch from `main`.
2. If you've added code that should be tested, add tests.
3. If you've changed APIs, update the documentation.
4. Ensure the test suite passes.
5. Make sure your code lints.
6. If you haven't already, complete the Contributor License Agreement ("CLA").
There is a Facebook internal test suite for Faiss, and we need to run
all changes to Faiss through it.
## Contributor License Agreement ("CLA")
In order to accept your pull request, we need you to submit a CLA. You only need
to do this once to work on any of Facebook's open source projects.
Complete your CLA here: <https://code.facebook.com/cla>
## Issues
We use GitHub issues to track public bugs. Please ensure your description is
clear and has sufficient instructions to be able to reproduce the issue.
Facebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe
disclosure of security bugs. In those cases, please go through the process
outlined on that page and do not file a public issue.
## Coding Style
* 4 or 2 spaces for indentation in C++ (no tabs)
* 80 character line length (both for C++ and Python)
* C++ language level: C++11
## License
By contributing to Faiss, you agree that your contributions will be licensed
under the LICENSE file in the root directory of this source tree.
FROM nvidia/cuda:8.0-devel-centos7
# Install MKL
RUN yum-config-manager --add-repo https://yum.repos.intel.com/mkl/setup/intel-mkl.repo
RUN rpm --import https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
RUN yum install -y intel-mkl-2019.3-062
ENV LD_LIBRARY_PATH /opt/intel/mkl/lib/intel64:$LD_LIBRARY_PATH
ENV LIBRARY_PATH /opt/intel/mkl/lib/intel64:$LIBRARY_PATH
ENV LD_PRELOAD /usr/lib64/libgomp.so.1:/opt/intel/mkl/lib/intel64/libmkl_def.so:\
/opt/intel/mkl/lib/intel64/libmkl_avx2.so:/opt/intel/mkl/lib/intel64/libmkl_core.so:\
/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so:/opt/intel/mkl/lib/intel64/libmkl_gnu_thread.so
# Install necessary build tools
RUN yum install -y gcc-c++ make swig3
# Install necessary headers/libs
RUN yum install -y python-devel numpy
COPY . /opt/faiss
WORKDIR /opt/faiss
# --with-cuda=/usr/local/cuda-8.0
RUN ./configure --prefix=/usr --libdir=/usr/lib64 --without-cuda
RUN make -j $(nproc)
RUN make -C python
RUN make test
RUN make install
RUN make -C demos demo_ivfpq_indexing && ./demos/demo_ivfpq_indexing
# Installing Faiss via conda
The recommended way to install Faiss is through [conda](https://docs.conda.io).
Stable releases are pushed regularly to the pytorch conda channel, as well as
pre-release nightly builds.
The CPU-only `faiss-cpu` conda package is currently available on Linux, OSX, and
Windows. The `faiss-gpu` package, containing both CPU and GPU indices, is
available on Linux systems, for various versions of CUDA.
To install the latest stable release:
``` shell
# CPU-only version
$ conda install -c pytorch faiss-cpu
# GPU(+CPU) version
$ conda install -c pytorch faiss-gpu
# or for a specific CUDA version
$ conda install -c pytorch faiss-gpu cudatoolkit=10.2 # for CUDA 10.2
```
Nightly pre-release packages can be installed as follows:
``` shell
# CPU-only version
$ conda install -c pytorch/label/nightly faiss-cpu
# GPU(+CPU) version
$ conda install -c pytorch/label/nightly faiss-gpu
```
## Installing from conda-forge
Faiss is also being packaged by [conda-forge](https://conda-forge.org/), the
community-driven packaging ecosystem for conda. The packaging effort is
collaborating with the Faiss team to ensure high-quality package builds.
Thanks to conda-forge's comprehensive build infrastructure, certain build
combinations may even be available there that are not provided through the
pytorch channel. To install, use
``` shell
# CPU version
$ conda install -c conda-forge faiss-cpu
# GPU version
$ conda install -c conda-forge faiss-gpu
```
You can tell which channel your conda packages come from by using `conda list`.
If you are having problems using a package built by conda-forge, please raise
an [issue](https://github.com/conda-forge/faiss-split-feedstock/issues) on the
conda-forge package "feedstock".
# Building from source
Faiss can be built from source using CMake.
Faiss is supported on x86_64 machines on Linux, OSX, and Windows. It has been
found to run on other platforms as well, see
[other platforms](https://github.com/facebookresearch/faiss/wiki/Related-projects#bindings-to-other-languages-and-porting-to-other-platforms).
The basic requirements are:
- a C++11 compiler (with support for OpenMP version 2 or higher),
- a BLAS implementation (we strongly recommend using Intel MKL for best
performance).
The optional requirements are:
- for GPU indices:
- nvcc,
- the CUDA toolkit,
- for the python bindings:
- python 3,
- numpy,
- and swig.
Indications for specific configurations are available in the [troubleshooting
section of the wiki](https://github.com/facebookresearch/faiss/wiki/Troubleshooting).
## Step 1: invoking CMake
``` shell
$ cmake -B build .
```
This generates the system-dependent configuration/build files in the `build/`
subdirectory.
Several options can be passed to CMake, among which:
- general options:
- `-DFAISS_ENABLE_GPU=OFF` in order to disable building GPU indices (possible
values are `ON` and `OFF`),
- `-DFAISS_ENABLE_PYTHON=OFF` in order to disable building python bindings
(possible values are `ON` and `OFF`),
- `-DBUILD_TESTING=OFF` in order to disable building C++ tests,
- `-DBUILD_SHARED_LIBS=ON` in order to build a shared library (possible values
are `ON` and `OFF`),
- optimization-related options:
- `-DCMAKE_BUILD_TYPE=Release` in order to enable generic compiler
optimization options (enables `-O3` on gcc for instance),
- `-DFAISS_OPT_LEVEL=avx2` in order to enable the required compiler flags to
generate code using optimized SIMD instructions (possible values are `generic`,
`sse4`, and `avx2`, by increasing order of optimization),
- BLAS-related options:
- `-DBLA_VENDOR=Intel10_64_dyn -DMKL_LIBRARIES=/path/to/mkl/libs` to use the
Intel MKL BLAS implementation, which is significantly faster than OpenBLAS
(more information about the values for the `BLA_VENDOR` option can be found in
the [CMake docs](https://cmake.org/cmake/help/latest/module/FindBLAS.html)),
- GPU-related options:
- `-DCUDAToolkit_ROOT=/path/to/cuda-10.1` in order to hint to the path of
the CUDA toolkit (for more information, see
[CMake docs](https://cmake.org/cmake/help/latest/module/FindCUDAToolkit.html)),
- `-DCMAKE_CUDA_ARCHITECTURES="75;72"` for specifying which GPU architectures
to build against (see [CUDA docs](https://developer.nvidia.com/cuda-gpus) to
determine which architecture(s) you should pick),
- python-related options:
- `-DPython_EXECUTABLE=/path/to/python3.7` in order to build a python
interface for a different python than the default one (see
[CMake docs](https://cmake.org/cmake/help/latest/module/FindPython.html)).
## Step 2: Invoking Make
``` shell
$ make -C build -j faiss
```
This builds the C++ library (`libfaiss.a` by default, and `libfaiss.so` if
`-DBUILD_SHARED_LIBS=ON` was passed to CMake).
The `-j` option enables parallel compilation of multiple units, leading to a
faster build, but increasing the chances of running out of memory, in which case
it is recommended to set the `-j` option to a fixed value (such as `-j4`).
## Step 3: Building the python bindings (optional)
``` shell
$ make -C build -j swigfaiss
$ (cd build/faiss/python && python setup.py install)
```
The first command builds the python bindings for Faiss, while the second one
generates and installs the python package.
## Step 4: Installing the C++ library and headers (optional)
``` shell
$ make -C build install
```
This will make the compiled library (either `libfaiss.a` or `libfaiss.so` on
Linux) available system-wide, as well as the C++ headers. This step is not
needed to install the python package only.
## Step 5: Testing (optional)
### Running the C++ test suite
To run the whole test suite, make sure that `cmake` was invoked with
`-DBUILD_TESTING=ON`, and run:
``` shell
$ make -C build test
```
### Running the python test suite
``` shell
$ (cd build/faiss/python && python setup.py build)
$ PYTHONPATH="$(ls -d ./build/faiss/python/build/lib*/)" pytest tests/test_*.py
```
### Basic example
A basic usage example is available in
[`demos/demo_ivfpq_indexing.cpp`](https://github.com/facebookresearch/faiss/blob/main/demos/demo_ivfpq_indexing.cpp).
It creates a small index, stores it and performs some searches. A normal runtime
is around 20s. With a fast machine and Intel MKL's BLAS it runs in 2.5s.
It can be built with
``` shell
$ make -C build demo_ivfpq_indexing
```
and subsequently run with
``` shell
$ ./build/demos/demo_ivfpq_indexing
```
### Basic GPU example
``` shell
$ make -C build demo_ivfpq_indexing_gpu
$ ./build/demos/demo_ivfpq_indexing_gpu
```
This produces the GPU equivalent of the CPU `demo_ivfpq_indexing`. It also
shows how to translate indexes from/to a GPU.
### A real-life benchmark
A longer example runs and evaluates Faiss on the SIFT1M dataset. To run it,
please download the ANN_SIFT1M dataset from http://corpus-texmex.irisa.fr/
and unzip it to the subdirectory `sift1M` at the root of the source
directory for this repository.
Then compile and run the following (after ensuring you have installed faiss):
``` shell
$ make -C build demo_sift1M
$ ./build/demos/demo_sift1M
```
This is a demonstration of the high-level auto-tuning API. You can try
setting a different index_key to find the indexing structure that
gives the best performance.
### Real-life test
The following script extends the demo_sift1M test to several types of
indexes. This must be run from the root of the source directory for this
repository:
``` shell
$ mkdir tmp # graphs of the output will be written here
$ python demos/demo_auto_tune.py
```
It will cycle through a few types of indexes and find optimal
operating points. You can play around with the types of indexes.
### Real-life test on GPU
The example above also runs on GPU. Edit `demos/demo_auto_tune.py` at line 100
with the values
``` python
keys_to_test = keys_gpu
use_gpu = True
```
and you can run
``` shell
$ python demos/demo_auto_tune.py
```
to test the GPU code.
MIT License
Copyright (c) Facebook, Inc. and its affiliates.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
# Faiss
Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python/numpy. Some of the most useful algorithms are implemented on the GPU. It is developed by [Facebook AI Research](https://research.fb.com/category/facebook-ai-research-fair/).
## News
See [CHANGELOG.md](CHANGELOG.md) for detailed information about latest features.
## Introduction
Faiss contains several methods for similarity search. It assumes that the instances are represented as vectors and are identified by an integer, and that the vectors can be compared with L2 (Euclidean) distances or dot products. Vectors that are similar to a query vector are those that have the lowest L2 distance or the highest dot product with the query vector. It also supports cosine similarity, since this is a dot product on normalized vectors.
Most of the methods, like those based on binary vectors and compact quantization codes, solely use a compressed representation of the vectors and do not require to keep the original vectors. This generally comes at the cost of a less precise search but these methods can scale to billions of vectors in main memory on a single server.
The GPU implementation can accept input from either CPU or GPU memory. On a server with GPUs, the GPU indexes can be used as drop-in replacements for the CPU indexes (e.g., replace `IndexFlatL2` with `GpuIndexFlatL2`) and copies to/from GPU memory are handled automatically. Results will, however, be faster if both input and output remain resident on the GPU. Both single- and multi-GPU usage are supported.
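To make the search problem above concrete, here is a minimal NumPy sketch of what the simplest index, `IndexFlatL2`, computes: exact k-nearest-neighbor search under the L2 distance. With Faiss installed, the equivalent calls would be `faiss.IndexFlatL2(d)`, `index.add(xb)` and `index.search(xq, k)`; this sketch is only an illustration of the computation, not of Faiss's optimized implementation.

``` python
import numpy as np

def knn_l2(xb, xq, k):
    """Exact k-NN under squared L2 distance (what IndexFlatL2 computes)."""
    # Pairwise squared L2 distances between each query and each database vector.
    d2 = ((xq[:, None, :] - xb[None, :, :]) ** 2).sum(axis=-1)
    idx = np.argsort(d2, axis=1)[:, :k]             # k nearest ids per query
    return np.take_along_axis(d2, idx, axis=1), idx

rng = np.random.default_rng(42)
xb = rng.standard_normal((1000, 64)).astype("float32")  # database vectors
xq = rng.standard_normal((5, 64)).astype("float32")     # query vectors
D, I = knn_l2(xb, xq, k=4)   # distances and ids, each of shape (5, 4)

# For cosine similarity, L2-normalize both sets and rank by inner product instead.
```

The compressed-domain indexes described below approximate this same computation while using far less memory per vector.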
## Building
The library is mostly implemented in C++, with optional GPU support provided via CUDA, and an optional Python interface. The CPU version requires a BLAS library. It compiles with a Makefile and can be packaged in a docker image. See [INSTALL.md](INSTALL.md) for details.
## How Faiss works
Faiss is built around an index type that stores a set of vectors, and provides a function to search in them with L2 and/or dot product vector comparison. Some index types are simple baselines, such as exact search. Most of the available indexing structures correspond to various trade-offs with respect to
- search time
- search quality
- memory used per index vector
- training time
- need for external data for unsupervised training
The optional GPU implementation provides what is likely (as of March 2017) the fastest exact and approximate (compressed-domain) nearest neighbor search implementation for high-dimensional vectors, fastest Lloyd's k-means, and fastest small k-selection algorithm known. [The implementation is detailed here](https://arxiv.org/abs/1702.08734).
## Full documentation of Faiss
The following are entry points for documentation:
- the full documentation, including a [tutorial](https://github.com/facebookresearch/faiss/wiki/Getting-started), a [FAQ](https://github.com/facebookresearch/faiss/wiki/FAQ) and a [troubleshooting section](https://github.com/facebookresearch/faiss/wiki/Troubleshooting) can be found on the [wiki page](http://github.com/facebookresearch/faiss/wiki)
- the [doxygen documentation](https://facebookresearch.github.io/faiss) gives per-class information
- to reproduce results from our research papers, [Polysemous codes](https://arxiv.org/abs/1609.01882) and [Billion-scale similarity search with GPUs](https://arxiv.org/abs/1702.08734), refer to the [benchmarks README](benchs/README.md). For [
Link and code: Fast indexing with graphs and compact regression codes](https://arxiv.org/abs/1804.09996), see the [link_and_code README](benchs/link_and_code)
## Authors
The main authors of Faiss are:
- [Hervé Jégou](https://github.com/jegou) initiated the Faiss project and wrote its first implementation
- [Matthijs Douze](https://github.com/mdouze) implemented most of the CPU Faiss
- [Jeff Johnson](https://github.com/wickedfoo) implemented all of the GPU Faiss
- [Lucas Hosseini](https://github.com/beauby) implemented the binary indexes
## Reference
Reference to cite when you use Faiss in a research paper:
```
@article{JDH17,
title={Billion-scale similarity search with GPUs},
author={Johnson, Jeff and Douze, Matthijs and J{\'e}gou, Herv{\'e}},
journal={arXiv preprint arXiv:1702.08734},
year={2017}
}
```
## Join the Faiss community
For public discussion of Faiss or for questions, there is a Facebook group at https://www.facebook.com/groups/faissusers/
We monitor the [issues page](http://github.com/facebookresearch/faiss/issues) of the repository.
You can report bugs, ask questions, etc.
## License
Faiss is MIT-licensed.
# Benchmarking scripts
This directory contains benchmarking scripts that can reproduce the
numbers reported in the two papers
```
@inproceedings{DJP16,
Author = {Douze, Matthijs and J{\'e}gou, Herv{\'e} and Perronnin, Florent},
Booktitle = "ECCV",
Organization = {Springer},
Title = {Polysemous codes},
Year = {2016}
}
```
and
```
@inproceedings{JDJ17,
Author = {Jeff Johnson and Matthijs Douze and Herv{\'e} J{\'e}gou},
Journal = {arXiv:1702.08734},
Title = {Billion-scale similarity search with GPUs},
Year = {2017},
}
```
Note that the numbers (especially timings) change slightly due to changes in the implementation, different machines, etc.
The scripts are self-contained. They depend only on Faiss and external training data that should be stored in sub-directories.
## SIFT1M experiments
The script [`bench_polysemous_sift1m.py`](bench_polysemous_sift1m.py) reproduces the numbers in
Figure 3 from the "Polysemous" paper.
### Getting SIFT1M
To run it, please download the ANN_SIFT1M dataset from
http://corpus-texmex.irisa.fr/
and unzip it to the subdirectory sift1M.
### Result
The output looks like:
```
PQ training on 100000 points, remains 0 points: training polysemous on centroids
add vectors to index
PQ baseline 7.517 ms per query, R@1 0.4474
Polysemous 64 9.875 ms per query, R@1 0.4474
Polysemous 62 8.358 ms per query, R@1 0.4474
Polysemous 58 5.531 ms per query, R@1 0.4474
Polysemous 54 3.420 ms per query, R@1 0.4478
Polysemous 50 2.182 ms per query, R@1 0.4475
Polysemous 46 1.621 ms per query, R@1 0.4408
Polysemous 42 1.448 ms per query, R@1 0.4174
Polysemous 38 1.331 ms per query, R@1 0.3563
Polysemous 34 1.334 ms per query, R@1 0.2661
Polysemous 30 1.272 ms per query, R@1 0.1794
```
## Experiments on 1B elements dataset
The script [`bench_polysemous_1bn.py`](bench_polysemous_1bn.py) reproduces a few experiments on
two datasets of size 1B from the "Polysemous codes" paper.
### Getting BIGANN
Download the four files of ANN_SIFT1B from
http://corpus-texmex.irisa.fr/ to subdirectory bigann/
### Getting Deep1B
The ground-truth and queries are available here
https://yadi.sk/d/11eDCm7Dsn9GA
For the learning and database vectors, use the script
https://github.com/arbabenko/GNOIMI/blob/master/downloadDeep1B.py
to download the data to subdirectory deep1b/, then concatenate the
database files to base.fvecs and the training files to learn.fvecs
### Running the experiments
These experiments are quite long. To support resuming, the script
stores the result of training to a temporary directory, `/tmp/bench_polysemous`.
The script `bench_polysemous_1bn.py` takes at least two arguments:
- the dataset name: SIFT1000M (aka SIFT1B, aka BIGANN) or Deep1B. SIFT1M, SIFT2M,... are also supported, to make subsets for small experiments (note that SIFT1M as a subset of SIFT1B is not the same as the SIFT1M above)
- the type of index to build, which should be a valid [index_factory key](https://github.com/facebookresearch/faiss/wiki/High-level-interface-and-auto-tuning#index-factory) (see below for examples)
- the remaining arguments are parsed as search-time parameters.
### Experiments of Table 2
The `IMI*+PolyD+ADC` results in Table 2 can be reproduced with (for 16 bytes):
```
python bench_polysemous_1bn.py SIFT1000M IMI2x12,PQ16 nprobe=16,max_codes={10000,30000},ht={44..54}
```
Training takes about 2 minutes and adding vectors to the dataset
takes 3.1 h. These operations are multithreaded. Note that in the command
above, we use bash's [brace expansion](https://www.gnu.org/software/bash/manual/html_node/Brace-Expansion.html) to set a grid of parameters.
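For readers not using bash, the same parameter grid can be generated programmatically; here is a hypothetical Python equivalent of the brace expansion above (the variable names are illustrative, not part of the benchmark script):

``` python
from itertools import product

# Enumerate the grid of search-time parameter strings that the benchmark
# sweeps over, mirroring nprobe=16,max_codes={10000,30000},ht={44..54}.
max_codes_values = [10000, 30000]   # max_codes={10000,30000}
ht_values = range(44, 55)           # ht={44..54}

param_strings = [
    f"nprobe=16,max_codes={mc},ht={ht}"
    for mc, ht in product(max_codes_values, ht_values)
]
# 2 * 11 = 22 parameter combinations, one per row of the result table.
```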
The search is *not* multithreaded, and the output looks like:
```
R@1 R@10 R@100 time %pass
nprobe=16,max_codes=10000,ht=44 0.1779 0.2994 0.3139 0.194 12.45
nprobe=16,max_codes=10000,ht=45 0.1859 0.3183 0.3339 0.197 14.24
nprobe=16,max_codes=10000,ht=46 0.1930 0.3366 0.3543 0.202 16.22
nprobe=16,max_codes=10000,ht=47 0.1993 0.3550 0.3745 0.209 18.39
nprobe=16,max_codes=10000,ht=48 0.2033 0.3694 0.3917 0.640 20.77
nprobe=16,max_codes=10000,ht=49 0.2070 0.3839 0.4077 0.229 23.36
nprobe=16,max_codes=10000,ht=50 0.2101 0.3949 0.4205 0.232 26.17
nprobe=16,max_codes=10000,ht=51 0.2120 0.4042 0.4310 0.239 29.21
nprobe=16,max_codes=10000,ht=52 0.2134 0.4113 0.4402 0.245 32.47
nprobe=16,max_codes=10000,ht=53 0.2157 0.4184 0.4482 0.250 35.96
nprobe=16,max_codes=10000,ht=54 0.2170 0.4240 0.4546 0.256 39.66
nprobe=16,max_codes=30000,ht=44 0.1882 0.3327 0.3555 0.226 11.29
nprobe=16,max_codes=30000,ht=45 0.1964 0.3525 0.3771 0.231 13.05
nprobe=16,max_codes=30000,ht=46 0.2039 0.3713 0.3987 0.236 15.01
nprobe=16,max_codes=30000,ht=47 0.2103 0.3907 0.4202 0.245 17.19
nprobe=16,max_codes=30000,ht=48 0.2145 0.4055 0.4384 0.251 19.60
nprobe=16,max_codes=30000,ht=49 0.2179 0.4198 0.4550 0.257 22.25
nprobe=16,max_codes=30000,ht=50 0.2208 0.4305 0.4681 0.268 25.15
nprobe=16,max_codes=30000,ht=51 0.2227 0.4402 0.4791 0.275 28.30
nprobe=16,max_codes=30000,ht=52 0.2241 0.4473 0.4884 0.284 31.70
nprobe=16,max_codes=30000,ht=53 0.2265 0.4544 0.4965 0.294 35.34
nprobe=16,max_codes=30000,ht=54 0.2278 0.4601 0.5031 0.303 39.20
```
The result reported in Table 2 is the one for which the %pass (percentage of code comparisons that pass the Hamming check) is around 20%, which occurs for Hamming threshold `ht=48`.
The 8-byte results can be reproduced with the factory key `IMI2x12,PQ8`.
### Experiments of the appendix
The experiments in the appendix appear only in the arXiv version of the paper (Table 3).
```
python bench_polysemous_1bn.py SIFT1000M OPQ8_64,IMI2x13,PQ8 nprobe={1,2,4,8,16,32,64,128},ht={20,24,26,28,30}
R@1 R@10 R@100 time %pass
nprobe=1,ht=20 0.0351 0.0616 0.0751 0.158 19.01
...
nprobe=32,ht=28 0.1256 0.3563 0.5026 0.561 52.61
...
```
Here again the runs are not exactly reproducible, but the original result was obtained with nprobe=32,ht=28.
For Deep1B, we used a simple version of [auto-tuning](https://github.com/facebookresearch/faiss/wiki/High-level-interface-and-auto-tuning#auto-tuning-the-runtime-parameters) to sweep through the set of operating points:
```
python bench_polysemous_1bn.py Deep1B OPQ20_80,IMI2x14,PQ20 autotune
...
Done in 4067.555 s, available OPs:
Parameters 1-R@1 time
0.0000 0.000
nprobe=1,ht=22,max_codes=256 0.0215 3.115
nprobe=1,ht=30,max_codes=256 0.0381 3.120
...
nprobe=512,ht=68,max_codes=524288 0.4478 36.903
nprobe=1024,ht=80,max_codes=131072 0.4557 46.363
nprobe=1024,ht=78,max_codes=262144 0.4616 61.939
...
```
The original results were obtained with `nprobe=1024,ht=66,max_codes=262144`.
## GPU experiments
The benchmarks below run on 1 or 4 Titan X GPUs and reproduce the results of the "GPU paper". They are also a good starting point for how to use GPU Faiss.
### Search on SIFT1M
See above on how to get SIFT1M into subdirectory sift1M/. The script [`bench_gpu_sift1m.py`](bench_gpu_sift1m.py) reproduces the "exact k-NN time" plot in the ArXiv paper, and the SIFT1M numbers.
The output is:
```
============ Exact search
add vectors to index
warmup
benchmark
k=1 0.715 s, R@1 0.9914
k=2 0.729 s, R@1 0.9935
k=4 0.731 s, R@1 0.9935
k=8 0.732 s, R@1 0.9935
k=16 0.742 s, R@1 0.9935
k=32 0.737 s, R@1 0.9935
k=64 0.753 s, R@1 0.9935
k=128 0.761 s, R@1 0.9935
k=256 0.799 s, R@1 0.9935
k=512 0.975 s, R@1 0.9935
k=1024 1.424 s, R@1 0.9935
============ Approximate search
train
WARNING clustering 100000 points to 4096 centroids: please provide at least 159744 training points
add vectors to index
WARN: increase temp memory to avoid cudaMalloc, or decrease query/add size (alloc 256000000 B, highwater 256000000 B)
warmup
benchmark
nprobe= 1 0.043 s recalls= 0.3909 0.4312 0.4312
nprobe= 2 0.040 s recalls= 0.5041 0.5636 0.5636
nprobe= 4 0.048 s recalls= 0.6048 0.6897 0.6897
nprobe= 8 0.064 s recalls= 0.6879 0.8028 0.8028
nprobe= 16 0.088 s recalls= 0.7534 0.8940 0.8940
nprobe= 32 0.134 s recalls= 0.7957 0.9549 0.9550
nprobe= 64 0.224 s recalls= 0.8125 0.9833 0.9834
nprobe= 128 0.395 s recalls= 0.8205 0.9953 0.9954
nprobe= 256 0.717 s recalls= 0.8227 0.9993 0.9994
nprobe= 512 1.348 s recalls= 0.8228 0.9999 1.0000
```
The run produces two warnings:
- the clustering complains that it does not have enough training data; there is not much we can do about this.
- the add() function complains about an inefficient memory allocation; this is only a concern when it happens often, and we are not benchmarking the add time anyway.
To index small datasets, it is more efficient to use a `GpuIndexIVFFlat`, which just stores the full vectors in the inverted lists. We did not mention this in the paper because it is not as scalable. To experiment with this setting, change the `index_factory` string from "IVF4096,PQ64" to "IVF16384,Flat". This gives:
```
nprobe= 1 0.025 s recalls= 0.4084 0.4105 0.4105
nprobe= 2 0.033 s recalls= 0.5235 0.5264 0.5264
nprobe= 4 0.033 s recalls= 0.6332 0.6367 0.6367
nprobe= 8 0.040 s recalls= 0.7358 0.7403 0.7403
nprobe= 16 0.049 s recalls= 0.8273 0.8324 0.8324
nprobe= 32 0.068 s recalls= 0.8957 0.9024 0.9024
nprobe= 64 0.104 s recalls= 0.9477 0.9549 0.9549
nprobe= 128 0.174 s recalls= 0.9760 0.9837 0.9837
nprobe= 256 0.299 s recalls= 0.9866 0.9944 0.9944
nprobe= 512 0.527 s recalls= 0.9907 0.9987 0.9987
```
### Clustering on MNIST8m
To get the "infinite MNIST dataset", follow the instructions on [Léon Bottou's website](http://leon.bottou.org/projects/infimnist). The script assumes the file `mnist8m-patterns-idx3-ubyte` is in subdirectory `mnist8m`.
The script [`kmeans_mnist.py`](kmeans_mnist.py) produces the following output:
```
python kmeans_mnist.py 1 256
...
Clustering 8100000 points in 784D to 256 clusters, redo 1 times, 20 iterations
Preprocessing in 7.94526 s
Iteration 19 (131.697 s, search 114.78 s): objective=1.44881e+13 imbalance=1.05963 nsplit=0
final objective: 1.449e+13
total runtime: 140.615 s
```
### Search on SIFT1B
The script [`bench_gpu_1bn.py`](bench_gpu_1bn.py) runs multi-gpu searches on the two 1-billion vector datasets we considered. It is more complex than the previous scripts, because it supports many search options and decomposes the dataset build process in Python to exploit the best possible CPU/GPU parallelism and GPU distribution.
Even on multiple GPUs, building the 1B datasets can last several hours. It is often a good idea to validate that everything is working fine on smaller datasets like SIFT1M, SIFT2M, etc.
The search results on SIFT1B in the "GPU paper" can be obtained with
<!-- see P57124181 -->
```
python bench_gpu_1bn.py SIFT1000M OPQ8_32,IVF262144,PQ8 -nnn 10 -ngpu 1 -tempmem $[1536*1024*1024]
...
0/10000 (0.024 s) probe=1 : 0.161 s 1-R@1: 0.0752 1-R@10: 0.1924
0/10000 (0.005 s) probe=2 : 0.150 s 1-R@1: 0.0964 1-R@10: 0.2693
0/10000 (0.005 s) probe=4 : 0.153 s 1-R@1: 0.1102 1-R@10: 0.3328
0/10000 (0.005 s) probe=8 : 0.170 s 1-R@1: 0.1220 1-R@10: 0.3827
0/10000 (0.005 s) probe=16 : 0.196 s 1-R@1: 0.1290 1-R@10: 0.4151
0/10000 (0.006 s) probe=32 : 0.244 s 1-R@1: 0.1314 1-R@10: 0.4345
0/10000 (0.006 s) probe=64 : 0.353 s 1-R@1: 0.1332 1-R@10: 0.4461
0/10000 (0.005 s) probe=128: 0.587 s 1-R@1: 0.1341 1-R@10: 0.4502
0/10000 (0.006 s) probe=256: 1.160 s 1-R@1: 0.1342 1-R@10: 0.4511
```
We use the `-tempmem` option to reduce the temporary memory allocation to 1.5 GiB, otherwise the dataset does not fit in GPU memory.
### Search on Deep1B
The same script generates the GPU search results on Deep1B.
```
python bench_gpu_1bn.py Deep1B OPQ20_80,IVF262144,PQ20 -nnn 10 -R 2 -ngpu 4 -altadd -noptables -tempmem $[1024*1024*1024]
...
0/10000 (0.115 s) probe=1 : 0.239 s 1-R@1: 0.2387 1-R@10: 0.3420
0/10000 (0.006 s) probe=2 : 0.103 s 1-R@1: 0.3110 1-R@10: 0.4623
0/10000 (0.005 s) probe=4 : 0.105 s 1-R@1: 0.3772 1-R@10: 0.5862
0/10000 (0.005 s) probe=8 : 0.116 s 1-R@1: 0.4235 1-R@10: 0.6889
0/10000 (0.005 s) probe=16 : 0.133 s 1-R@1: 0.4517 1-R@10: 0.7693
0/10000 (0.005 s) probe=32 : 0.168 s 1-R@1: 0.4713 1-R@10: 0.8281
0/10000 (0.005 s) probe=64 : 0.238 s 1-R@1: 0.4841 1-R@10: 0.8649
0/10000 (0.007 s) probe=128: 0.384 s 1-R@1: 0.4900 1-R@10: 0.8816
0/10000 (0.005 s) probe=256: 0.736 s 1-R@1: 0.4933 1-R@10: 0.8912
```
Here we are a bit tight on memory, so we disable precomputed tables (`-noptables`) and restrict the amount of temporary memory. The `-altadd` option avoids GPU memory overflows during add.
### k-NN graph on Deep1B
The same script generates the k-NN graph on Deep1B. Note that the inverted file from above is not re-used because the training sets are different. For the k-NN graph, the script first does a pass over the whole dataset to compute the exact ground-truth k-NN for a subset of 10k nodes, used for evaluation.
```
python bench_gpu_1bn.py Deep1B OPQ20_80,IVF262144,PQ20 -nnn 10 -altadd -knngraph -R 2 -noptables -tempmem $[1<<30] -ngpu 4
...
CPU index contains 1000000000 vectors, move to GPU
Copy CPU index to 2 sharded GPU indexes
dispatch to GPUs 0:2
IndexShards shard 0 indices 0:500000000
IndexIVFPQ size 500000000 -> GpuIndexIVFPQ indicesOptions=0 usePrecomputed=0 useFloat16=0 reserveVecs=0
IndexShards shard 1 indices 500000000:1000000000
IndexIVFPQ size 500000000 -> GpuIndexIVFPQ indicesOptions=0 usePrecomputed=0 useFloat16=0 reserveVecs=0
dispatch to GPUs 2:4
IndexShards shard 0 indices 0:500000000
IndexIVFPQ size 500000000 -> GpuIndexIVFPQ indicesOptions=0 usePrecomputed=0 useFloat16=0 reserveVecs=0
IndexShards shard 1 indices 500000000:1000000000
IndexIVFPQ size 500000000 -> GpuIndexIVFPQ indicesOptions=0 usePrecomputed=0 useFloat16=0 reserveVecs=0
move to GPU done in 151.535 s
search...
999997440/1000000000 (8389.961 s, 0.3379) probe=1 : 8389.990 s rank-10 intersection results: 0.3379
999997440/1000000000 (9205.934 s, 0.4079) probe=2 : 9205.966 s rank-10 intersection results: 0.4079
999997440/1000000000 (9741.095 s, 0.4722) probe=4 : 9741.128 s rank-10 intersection results: 0.4722
999997440/1000000000 (10830.420 s, 0.5256) probe=8 : 10830.455 s rank-10 intersection results: 0.5256
999997440/1000000000 (12531.716 s, 0.5603) probe=16 : 12531.758 s rank-10 intersection results: 0.5603
999997440/1000000000 (15922.519 s, 0.5825) probe=32 : 15922.571 s rank-10 intersection results: 0.5825
999997440/1000000000 (22774.153 s, 0.5950) probe=64 : 22774.220 s rank-10 intersection results: 0.5950
999997440/1000000000 (36717.207 s, 0.6015) probe=128: 36717.309 s rank-10 intersection results: 0.6015
999997440/1000000000 (70616.392 s, 0.6047) probe=256: 70616.581 s rank-10 intersection results: 0.6047
```
/**
 * Copyright (c) Facebook, Inc. and its affiliates.
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 */

#include <omp.h>

#include <cstdint>
#include <cstdio>
#include <memory>
#include <vector>

#include <faiss/impl/ScalarQuantizer.h>
#include <faiss/utils/distances.h>
#include <faiss/utils/random.h>
#include <faiss/utils/utils.h>

using namespace faiss;

int main() {
    int d = 128;
    int n = 2000;

    std::vector<float> x(d * n);
    float_rand(x.data(), d * n, 12345);

    // train a 6-bit scalar quantizer; the test checks that
    // encode -> decode -> encode is idempotent
    ScalarQuantizer sq(d, ScalarQuantizer::QT_6bit);
    omp_set_num_threads(1);
    sq.train(n, x.data());

    size_t code_size = sq.code_size;
    printf("code size: %zd\n", sq.code_size);

    // encode
    std::vector<uint8_t> codes(code_size * n);
    sq.compute_codes(x.data(), codes.data(), n);

    // decode
    std::vector<float> x2(d * n);
    sq.decode(codes.data(), x2.data(), n);

    printf("sqL2 recons error: %g\n",
           fvec_L2sqr(x.data(), x2.data(), n * d) / n);

    // re-encode the decoded vectors: the codes should not change
    std::vector<uint8_t> codes2(code_size * n);
    sq.compute_codes(x2.data(), codes2.data(), n);

    size_t ndiff = 0;
    for (size_t i = 0; i < codes.size(); i++) {
        if (codes[i] != codes2[i])
            ndiff++;
    }

    printf("ndiff for idempotence: %zd / %zd\n", ndiff, codes.size());

    // time the asymmetric distance computations over the codes
    std::unique_ptr<ScalarQuantizer::SQDistanceComputer> dc(
            sq.get_distance_computer());
    dc->codes = codes.data();
    dc->code_size = sq.code_size;
    printf("code size: %zd\n", dc->code_size);

    double sum_dis = 0;
    double t0 = getmillisecs();
    for (int i = 0; i < n; i++) {
        dc->set_query(&x[i * d]);
        for (int j = 0; j < n; j++) {
            sum_dis += (*dc)(j);
        }
    }
    printf("distances computed in %.3f ms, checksum=%g\n",
           getmillisecs() - t0,
           sum_dis);

    return 0;
}
# Benchmark of IVF variants
This is a benchmark of IVF index variants, looking at compression vs. speed vs. accuracy.
The results are in [this wiki chapter](https://github.com/facebookresearch/faiss/wiki/Indexing-1G-vectors)
The code is organized as:
- `datasets.py`: code to access the datafiles, compute the ground-truth and report accuracies
- `bench_all_ivf.py`: evaluate one type of inverted file
- `run_on_cluster_generic.bash`: call `bench_all_ivf.py` for all tested types of indices.
Since the number of experiments is quite large, the script is structured so that the benchmark can be run on a cluster.
- `parse_bench_all_ivf.py`: make nice tradeoff plots from all the results.
The code depends on Faiss and can use 1 to 8 GPUs to do the k-means clustering for large vocabularies.
It was run in October 2018 for the results in the wiki.
# Copyright (c) Facebook, Inc. and its affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
import os
import sys
import time
import pdb
import numpy as np
import faiss
import argparse
import datasets
from datasets import sanitize
######################################################
# Command-line parsing
######################################################
parser = argparse.ArgumentParser()


def aa(*args, **kwargs):
    group.add_argument(*args, **kwargs)


group = parser.add_argument_group('dataset options')

aa('--db', default='deep1M', help='dataset')
aa('--compute_gt', default=False, action='store_true',
   help='compute and store the groundtruth')
aa('--force_IP', default=False, action="store_true",
   help='force IP search instead of L2')

group = parser.add_argument_group('index construction')

aa('--indexkey', default='HNSW32', help='index_factory type')
aa('--maxtrain', default=256 * 256, type=int,
   help='maximum number of training points (0 to set automatically)')
aa('--indexfile', default='', help='file to read or write index from')
aa('--add_bs', default=-1, type=int,
   help='add elements index by batches of this size')

group = parser.add_argument_group('IVF options')

aa('--by_residual', default=-1, type=int,
   help="set if index should use residuals (default=unchanged)")
aa('--no_precomputed_tables', action='store_true', default=False,
   help='disable precomputed tables (uses less memory)')
aa('--get_centroids_from', default='',
   help='get the centroids from this index (to speed up training)')
aa('--clustering_niter', default=-1, type=int,
   help='number of clustering iterations (-1 = leave default)')
aa('--train_on_gpu', default=False, action='store_true',
   help='do training on GPU')

group = parser.add_argument_group('index-specific options')

aa('--M0', default=-1, type=int, help='size of base level for HNSW')
aa('--RQ_train_default', default=False, action="store_true",
   help='disable progressive dim training for RQ')
aa('--RQ_beam_size', default=-1, type=int,
   help='set beam size at add time')
aa('--LSQ_encode_ils_iters', default=-1, type=int,
   help='ILS iterations for LSQ')
aa('--RQ_use_beam_LUT', default=-1, type=int,
   help='use beam LUT at add time')

group = parser.add_argument_group('searching')

aa('--k', default=100, type=int, help='nb of nearest neighbors')
aa('--inter', default=False, action='store_true',
   help='use intersection measure instead of 1-recall as metric')
aa('--searchthreads', default=-1, type=int,
   help='nb of threads to use at search time')
aa('--searchparams', nargs='+', default=['autotune'],
   help="search parameters to use (can be autotune or a list of params)")
aa('--n_autotune', default=500, type=int,
   help="max nb of autotune experiments")
aa('--autotune_max', default=[], nargs='*',
   help='set max value for autotune variables format "var:val" (exclusive)')
aa('--autotune_range', default=[], nargs='*',
   help='set complete autotune range, format "var:val1,val2,..."')
aa('--min_test_duration', default=3.0, type=float,
   help='run test at least for so long to avoid jitter')

args = parser.parse_args()

print("args:", args)

os.system('echo -n "nb processors "; '
          'cat /proc/cpuinfo | grep ^processor | wc -l; '
          'cat /proc/cpuinfo | grep ^"model name" | tail -1')
######################################################
# Load dataset
######################################################
ds = datasets.load_dataset(
    dataset=args.db, compute_gt=args.compute_gt)
if args.force_IP:
    ds.metric = "IP"

print(ds)

nq, d = ds.nq, ds.d
nb, d = ds.nb, ds.d
######################################################
# Make index
######################################################
def unwind_index_ivf(index):
    if isinstance(index, faiss.IndexPreTransform):
        assert index.chain.size() == 1
        vt = index.chain.at(0)
        index_ivf, vt2 = unwind_index_ivf(faiss.downcast_index(index.index))
        assert vt2 is None
        return index_ivf, vt
    if hasattr(faiss, "IndexRefine") and isinstance(index, faiss.IndexRefine):
        return unwind_index_ivf(faiss.downcast_index(index.base_index))
    if isinstance(index, faiss.IndexIVF):
        return index, None
    else:
        return None, None
def apply_AQ_options(index, args):
    # if not(
    #    isinstance(index, faiss.IndexAdditiveQuantize) or
    #    isinstance(index, faiss.IndexIVFAdditiveQuantizer)):
    #    return
    if args.RQ_train_default:
        print("set default training for RQ")
        index.rq.train_type   # make sure the field exists
        index.rq.train_type = faiss.ResidualQuantizer.Train_default
    if args.RQ_beam_size != -1:
        print("set RQ beam size to", args.RQ_beam_size)
        index.rq.max_beam_size   # make sure the field exists
        index.rq.max_beam_size = args.RQ_beam_size
    if args.LSQ_encode_ils_iters != -1:
        print("set LSQ ils iterations to", args.LSQ_encode_ils_iters)
        index.lsq.encode_ils_iters   # make sure the field exists
        index.lsq.encode_ils_iters = args.LSQ_encode_ils_iters
    if args.RQ_use_beam_LUT != -1:
        print("set RQ beam LUT to", args.RQ_use_beam_LUT)
        index.rq.use_beam_LUT   # make sure the field exists
        index.rq.use_beam_LUT = args.RQ_use_beam_LUT
if args.indexfile and os.path.exists(args.indexfile):
    print("reading", args.indexfile)
    index = faiss.read_index(args.indexfile)

    index_ivf, vec_transform = unwind_index_ivf(index)
    if vec_transform is None:
        vec_transform = lambda x: x
else:
    print("build index, key=", args.indexkey)

    index = faiss.index_factory(
        d, args.indexkey, faiss.METRIC_L2 if ds.metric == "L2" else
        faiss.METRIC_INNER_PRODUCT
    )

    index_ivf, vec_transform = unwind_index_ivf(index)
    if vec_transform is None:
        vec_transform = lambda x: x
    else:
        vec_transform = faiss.downcast_VectorTransform(vec_transform)

    if args.by_residual != -1:
        by_residual = args.by_residual == 1
        print("setting by_residual = ", by_residual)
        index_ivf.by_residual   # check if field exists
        index_ivf.by_residual = by_residual

    if index_ivf:
        print("Update add-time parameters")
        # adjust default parameters used at add time for quantizers
        # because otherwise the assignment is inaccurate
        quantizer = faiss.downcast_index(index_ivf.quantizer)
        if isinstance(quantizer, faiss.IndexRefine):
            print(" update quantizer k_factor=", quantizer.k_factor, end=" -> ")
            quantizer.k_factor = 32 if index_ivf.nlist < 1e6 else 64
            print(quantizer.k_factor)
            base_index = faiss.downcast_index(quantizer.base_index)
            if isinstance(base_index, faiss.IndexIVF):
                print(" update quantizer nprobe=", base_index.nprobe, end=" -> ")
                base_index.nprobe = (
                    16 if base_index.nlist < 1e5 else
                    32 if base_index.nlist < 4e6 else
                    64)
                print(base_index.nprobe)
        elif isinstance(quantizer, faiss.IndexHNSW):
            print(" update quantizer efSearch=", quantizer.hnsw.efSearch, end=" -> ")
            quantizer.hnsw.efSearch = 40 if index_ivf.nlist < 4e6 else 64
            print(quantizer.hnsw.efSearch)

    apply_AQ_options(index_ivf or index, args)

    if index_ivf:
        index_ivf.verbose = True
        index_ivf.quantizer.verbose = True
        index_ivf.cp.verbose = True
    else:
        index.verbose = True

    maxtrain = args.maxtrain
    if maxtrain == 0:
        if 'IMI' in args.indexkey:
            maxtrain = int(256 * 2 ** (np.log2(index_ivf.nlist) / 2))
        elif index_ivf:
            maxtrain = 50 * index_ivf.nlist
        else:
            # just guess...
            maxtrain = 256 * 100
        maxtrain = max(maxtrain, 256 * 100)
        print("setting maxtrain to %d" % maxtrain)

    try:
        xt2 = ds.get_train(maxtrain=maxtrain)
    except NotImplementedError:
        print("No training set: training on database")
        xt2 = ds.get_database()[:maxtrain]

    print("train, size", xt2.shape)
    assert np.all(np.isfinite(xt2))

    if (isinstance(vec_transform, faiss.OPQMatrix) and
            isinstance(index_ivf, faiss.IndexIVFPQFastScan)):
        print(" Forcing OPQ training PQ to PQ4")
        ref_pq = index_ivf.pq
        training_pq = faiss.ProductQuantizer(
            ref_pq.d, ref_pq.M, ref_pq.nbits
        )
        vec_transform.pq   # make sure the field exists
        vec_transform.pq = training_pq

    if args.get_centroids_from == '':
        if args.clustering_niter >= 0:
            print(("setting nb of clustering iterations to %d" %
                   args.clustering_niter))
            index_ivf.cp.niter = args.clustering_niter

        if args.train_on_gpu:
            print("add a training index on GPU")
            train_index = faiss.index_cpu_to_all_gpus(
                faiss.IndexFlatL2(index_ivf.d))
            index_ivf.clustering_index = train_index
    else:
        print("Getting centroids from", args.get_centroids_from)
        src_index = faiss.read_index(args.get_centroids_from)
        src_quant = faiss.downcast_index(src_index.quantizer)
        centroids = faiss.vector_to_array(src_quant.xb)
        centroids = centroids.reshape(-1, d)
        print(" centroid table shape", centroids.shape)

        if isinstance(vec_transform, faiss.VectorTransform):
            print(" training vector transform")
            vec_transform.train(xt2)
            print(" transform centroids")
            centroids = vec_transform.apply_py(centroids)

        if not index_ivf.quantizer.is_trained:
            print(" training quantizer")
            index_ivf.quantizer.train(centroids)

        print(" add centroids to quantizer")
        index_ivf.quantizer.add(centroids)
        del src_index

    t0 = time.time()
    index.train(xt2)
    print(" train in %.3f s" % (time.time() - t0))

    print("adding")
    t0 = time.time()
    if args.add_bs == -1:
        index.add(sanitize(ds.get_database()))
    else:
        i0 = 0
        for xblock in ds.database_iterator(bs=args.add_bs):
            i1 = i0 + len(xblock)
            print(" adding %d:%d / %d [%.3f s, RSS %d kiB] " % (
                i0, i1, ds.nb, time.time() - t0,
                faiss.get_mem_usage_kb()))
            index.add(xblock)
            i0 = i1

    print(" add in %.3f s" % (time.time() - t0))

    if args.indexfile:
        print("storing", args.indexfile)
        faiss.write_index(index, args.indexfile)
if args.no_precomputed_tables:
    if isinstance(index_ivf, faiss.IndexIVFPQ):
        print("disabling precomputed table")
        index_ivf.use_precomputed_table = -1
        index_ivf.precomputed_table.clear()

if args.indexfile:
    print("index size on disk: ", os.stat(args.indexfile).st_size)

if hasattr(index, "code_size"):
    print("vector code_size", index.code_size)
if hasattr(index_ivf, "code_size"):
    print("vector code_size (IVF)", index_ivf.code_size)

print("current RSS:", faiss.get_mem_usage_kb() * 1024)

precomputed_table_size = 0
if hasattr(index_ivf, 'precomputed_table'):
    precomputed_table_size = index_ivf.precomputed_table.size() * 4

print("precomputed tables size:", precomputed_table_size)
#############################################################
# Index is ready
#############################################################
xq = sanitize(ds.get_queries())
gt = ds.get_groundtruth(k=args.k)
assert gt.shape[1] == args.k, pdb.set_trace()

if args.searchthreads != -1:
    print("Setting nb of threads to", args.searchthreads)
    faiss.omp_set_num_threads(args.searchthreads)
else:
    print("nb search threads: ", faiss.omp_get_max_threads())

ps = faiss.ParameterSpace()
ps.initialize(index)

parametersets = args.searchparams

if args.inter:
    header = (
        '%-40s inter@%3d time(ms/q) nb distances #runs' %
        ("parameters", args.k)
    )
else:
    header = (
        '%-40s R@1 R@10 R@100 time(ms/q) nb distances #runs' %
        "parameters"
    )


def compute_inter(a, b):
    nq, rank = a.shape
    ninter = sum(
        np.intersect1d(a[i, :rank], b[i, :rank]).size
        for i in range(nq)
    )
    return ninter / a.size
def eval_setting(index, xq, gt, k, inter, min_time):
    nq = xq.shape[0]
    ivf_stats = faiss.cvar.indexIVF_stats
    ivf_stats.reset()
    nrun = 0
    t0 = time.time()
    while True:
        D, I = index.search(xq, k)
        nrun += 1
        t1 = time.time()
        if t1 - t0 > min_time:
            break
    ms_per_query = ((t1 - t0) * 1000.0 / nq / nrun)
    if inter:
        rank = k
        inter_measure = compute_inter(gt[:, :rank], I[:, :rank])
        print("%.4f" % inter_measure, end=' ')
    else:
        for rank in 1, 10, 100:
            n_ok = (I[:, :rank] == gt[:, :1]).sum()
            print("%.4f" % (n_ok / float(nq)), end=' ')
    print(" %9.5f " % ms_per_query, end=' ')
    print("%12d " % (ivf_stats.ndis / nrun), end=' ')
    print(nrun)
if parametersets == ['autotune']:

    ps.n_experiments = args.n_autotune
    ps.min_test_duration = args.min_test_duration

    for kv in args.autotune_max:
        k, vmax = kv.split(':')
        vmax = float(vmax)
        print("limiting %s to %g" % (k, vmax))
        pr = ps.add_range(k)
        values = faiss.vector_to_array(pr.values)
        values = np.array([v for v in values if v < vmax])
        faiss.copy_array_to_vector(values, pr.values)

    for kv in args.autotune_range:
        k, vals = kv.split(':')
        vals = np.fromstring(vals, sep=',')
        print("setting %s to %s" % (k, vals))
        pr = ps.add_range(k)
        faiss.copy_array_to_vector(vals, pr.values)

    # setup the Criterion object
    if args.inter:
        print("Optimize for intersection @ ", args.k)
        crit = faiss.IntersectionCriterion(nq, args.k)
    else:
        print("Optimize for 1-recall @ 1")
        crit = faiss.OneRecallAtRCriterion(nq, 1)

    # by default, the criterion will request only 1 NN
    crit.nnn = args.k
    crit.set_groundtruth(None, gt.astype('int64'))

    # then we let Faiss find the optimal parameters by itself
    print("exploring operating points, %d threads" % faiss.omp_get_max_threads())
    ps.display()

    t0 = time.time()
    op = ps.explore(index, xq, crit)
    print("Done in %.3f s, available OPs:" % (time.time() - t0))

    op.display()

    print("Re-running evaluation on selected OPs")
    print(header)
    opv = op.optimal_pts
    maxw = max(max(len(opv.at(i).key) for i in range(opv.size())), 40)
    for i in range(opv.size()):
        opt = opv.at(i)
        ps.set_index_parameters(index, opt.key)
        print(opt.key.ljust(maxw), end=' ')
        sys.stdout.flush()
        eval_setting(index, xq, gt, args.k, args.inter, args.min_test_duration)
else:
    print(header)
    for param in parametersets:
        print("%-40s " % param, end=' ')
        sys.stdout.flush()
        ps.set_index_parameters(index, param)
        eval_setting(index, xq, gt, args.k, args.inter, args.min_test_duration)
# Copyright (c) Facebook, Inc. and its affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
import os
import numpy as np
import faiss
import argparse
import datasets
from datasets import sanitize
######################################################
# Command-line parsing
######################################################
parser = argparse.ArgumentParser()


def aa(*args, **kwargs):
    group.add_argument(*args, **kwargs)


group = parser.add_argument_group('dataset options')

aa('--db', default='deep1M', help='dataset')
aa('--nt', default=65536, type=int)
aa('--nb', default=100000, type=int)
aa('--nt_sample', default=0, type=int)

group = parser.add_argument_group('kmeans options')

aa('--k', default=256, type=int)
aa('--seed', default=12345, type=int)
aa('--pcadim', default=-1, type=int, help='PCA to this dimension')
aa('--niter', default=25, type=int)
aa('--eval_freq', default=100, type=int)

args = parser.parse_args()

print("args:", args)

os.system('echo -n "nb processors "; '
          'cat /proc/cpuinfo | grep ^processor | wc -l; '
          'cat /proc/cpuinfo | grep ^"model name" | tail -1')

ngpu = faiss.get_num_gpus()
print("nb GPUs:", ngpu)
######################################################
# Load dataset
######################################################
xt, xb, xq, gt = datasets.load_data(dataset=args.db)

if args.nt_sample == 0:
    xt_pca = xt[args.nt:args.nt + 10000]
    xt = xt[:args.nt]
else:
    xt_pca = xt[args.nt_sample:args.nt_sample + 10000]
    rs = np.random.RandomState(args.seed)
    idx = rs.choice(args.nt_sample, size=args.nt, replace=False)
    xt = xt[idx]

xb = xb[:args.nb]
d = xb.shape[1]

if args.pcadim != -1:
    print("training PCA: %d -> %d" % (d, args.pcadim))
    pca = faiss.PCAMatrix(d, args.pcadim)
    pca.train(sanitize(xt_pca))
    xt = pca.apply_py(sanitize(xt))
    xb = pca.apply_py(sanitize(xb))
    d = xb.shape[1]
######################################################
# Run clustering
######################################################
index = faiss.IndexFlatL2(d)

if ngpu > 0:
    print("moving index to GPU")
    index = faiss.index_cpu_to_all_gpus(index)

clustering = faiss.Clustering(d, args.k)
clustering.verbose = True
clustering.seed = args.seed
clustering.max_points_per_centroid = 10**6
clustering.min_points_per_centroid = 1

centroids = None

for iter0 in range(0, args.niter, args.eval_freq):
    iter1 = min(args.niter, iter0 + args.eval_freq)
    clustering.niter = iter1 - iter0

    # warm-start from the previous chunk of iterations
    if iter0 > 0:
        faiss.copy_array_to_vector(centroids.ravel(), clustering.centroids)

    clustering.train(sanitize(xt), index)

    index.reset()
    centroids = faiss.vector_to_array(clustering.centroids).reshape(args.k, d)
    index.add(centroids)

    _, I = index.search(sanitize(xb), 1)
    error = ((xb - centroids[I.ravel()]) ** 2).sum()
    print("iter1=%d quantization error on test: %.4f" % (iter1, error))
# Copyright (c) Facebook, Inc. and its affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
import time
import sys
import os
import argparse
import numpy as np
def eval_recalls(name, I, gt, times):
    k = I.shape[1]
    s = "%-40s recall" % name
    nq = len(gt)
    for rank in 1, 10, 100, 1000:
        if rank > k:
            break
        recall = (I[:, :rank] == gt[:, :1]).sum() / nq
        s += "@%d: %.4f " % (rank, recall)
    s += "time: %.4f s (± %.4f)" % (np.mean(times), np.std(times))
    print(s)


def eval_inters(name, I, gt, times):
    k = I.shape[1]
    s = "%-40s inter" % name
    nq = len(gt)
    for rank in 1, 10, 100, 1000:
        if rank > k:
            break
        ninter = 0
        for i in range(nq):
            ninter += np.intersect1d(I[i, :rank], gt[i, :rank]).size
        inter = ninter / (nq * rank)
        s += "@%d: %.4f " % (rank, inter)
    s += "time: %.4f s (± %.4f)" % (np.mean(times), np.std(times))
    print(s)
def main():
    parser = argparse.ArgumentParser()

    def aa(*args, **kwargs):
        group.add_argument(*args, **kwargs)

    group = parser.add_argument_group('dataset options')
    aa('--db', default='deep1M', help='dataset')
    aa('--measure', default="1-recall",
       help="perf measure to use: 1-recall or inter")
    aa('--download', default=False, action="store_true")
    aa('--lib', default='faiss', help='library to use (faiss or scann)')
    aa('--thenscann', default=False, action="store_true")
    aa('--base_dir', default='/checkpoint/matthijs/faiss_improvements/cmp_ivf_scan_2')

    group = parser.add_argument_group('searching')
    aa('--k', default=10, type=int, help='nb of nearest neighbors')
    aa('--pre_reorder_k', default="0,10,100,1000", help='values for reorder_k')
    aa('--nprobe', default="1,2,5,10,20,50,100,200", help='values for nprobe')
    aa('--nrun', default=5, type=int, help='nb of runs to perform')

    args = parser.parse_args()
    print("args:", args)

    pre_reorder_k_tab = [int(x) for x in args.pre_reorder_k.split(',')]
    nprobe_tab = [int(x) for x in args.nprobe.split(',')]

    os.system('echo -n "nb processors "; '
              'cat /proc/cpuinfo | grep ^processor | wc -l; '
              'cat /proc/cpuinfo | grep ^"model name" | tail -1')

    cache_dir = args.base_dir + "/" + args.db + "/"
    k = args.k
    nrun = args.nrun

    if args.lib == "faiss":
        # prepare cache
        import faiss
        from datasets import load_dataset
        ds = load_dataset(args.db, download=args.download)
        print(ds)

        if not os.path.exists(cache_dir + "xb.npy"):
            # store for SCANN
            os.system(f"rm -rf {cache_dir}; mkdir -p {cache_dir}")
            tosave = dict(
                # xt = ds.get_train(10),
                xb = ds.get_database(),
                xq = ds.get_queries(),
                gt = ds.get_groundtruth()
            )
            for name, v in tosave.items():
                fname = cache_dir + "/" + name + ".npy"
                print("save", fname)
                np.save(fname, v)
            open(cache_dir + "metric", "w").write(ds.metric)

        name1_to_metric = {
            "IP": faiss.METRIC_INNER_PRODUCT,
            "L2": faiss.METRIC_L2
        }

        index_fname = cache_dir + "index.faiss"
        if not os.path.exists(index_fname):
            index = faiss_make_index(
                ds.get_database(), name1_to_metric[ds.metric], index_fname)
        else:
            index = faiss.read_index(index_fname)

        xb = ds.get_database()
        xq = ds.get_queries()
        gt = ds.get_groundtruth()

        faiss_eval_search(
            index, xq, xb, nprobe_tab, pre_reorder_k_tab, k, gt,
            nrun, args.measure
        )

    if args.lib == "scann":
        from scann.scann_ops.py import scann_ops_pybind
        dataset = {}
        for kn in "xb xq gt".split():
            fname = cache_dir + "/" + kn + ".npy"
            print("load", fname)
            dataset[kn] = np.load(fname)

        name1_to_name2 = {
            "IP": "dot_product",
            "L2": "squared_l2"
        }
        distance_measure = name1_to_name2[open(cache_dir + "metric").read()]

        xb = dataset["xb"]
        xq = dataset["xq"]
        gt = dataset["gt"]

        scann_dir = cache_dir + "/scann1.1.1_serialized"
        if os.path.exists(scann_dir + "/scann_config.pb"):
            searcher = scann_ops_pybind.load_searcher(scann_dir)
        else:
            searcher = scann_make_index(xb, distance_measure, scann_dir, 0)

        scann_dir = cache_dir + "/scann1.1.1_serialized_reorder"
        if os.path.exists(scann_dir + "/scann_config.pb"):
            searcher_reo = scann_ops_pybind.load_searcher(scann_dir)
        else:
            searcher_reo = scann_make_index(xb, distance_measure, scann_dir, 100)

        scann_eval_search(
            searcher, searcher_reo,
            xq, xb, nprobe_tab, pre_reorder_k_tab, k, gt,
            nrun, args.measure
        )

    if args.lib != "scann" and args.thenscann:
        # just append --lib scann, that will override the previous cmdline
        # options
        cmdline = " ".join(sys.argv) + " --lib scann"
        cmdline = (
            ". ~/anaconda3/etc/profile.d/conda.sh ; " +
            "conda activate scann_1.1.1; "
            "python -u " + cmdline)
        print("running", cmdline)
        os.system(cmdline)
###############################################################
# SCANN
###############################################################
def scann_make_index(xb, distance_measure, scann_dir, reorder_k):
import scann
print("build index")
if distance_measure == "dot_product":
thr = 0.2
else:
thr = 0
k = 10
sb = scann.scann_ops_pybind.builder(xb, k, distance_measure)
sb = sb.tree(num_leaves=2000, num_leaves_to_search=100, training_sample_size=250000)
sb = sb.score_ah(2, anisotropic_quantization_threshold=thr)
if reorder_k > 0:
sb = sb.reorder(reorder_k)
searcher = sb.build()
print("done")
print("write index to", scann_dir)
os.system(f"rm -rf {scann_dir}; mkdir -p {scann_dir}")
# os.mkdir(scann_dir)
searcher.serialize(scann_dir)
return searcher
def scann_eval_search(
searcher, searcher_reo,
xq, xb, nprobe_tab, pre_reorder_k_tab, k, gt,
nrun, measure):
# warmup
for _run in range(5):
searcher.search_batched(xq)
for nprobe in nprobe_tab:
for pre_reorder_k in pre_reorder_k_tab:
times = []
for _run in range(nrun):
if pre_reorder_k == 0:
t0 = time.time()
I, D = searcher.search_batched(
xq, leaves_to_search=nprobe, final_num_neighbors=k
)
t1 = time.time()
else:
t0 = time.time()
I, D = searcher_reo.search_batched(
xq, leaves_to_search=nprobe, final_num_neighbors=k,
pre_reorder_num_neighbors=pre_reorder_k
)
t1 = time.time()
times.append(t1 - t0)
header = "SCANN nprobe=%4d reo=%4d" % (nprobe, pre_reorder_k)
if measure == "1-recall":
eval_recalls(header, I, gt, times)
else:
eval_inters(header, I, gt, times)
###############################################################
# Faiss
###############################################################
def faiss_make_index(xb, metric_type, fname):
import faiss
d = xb.shape[1]
M = d // 2
index = faiss.index_factory(d, f"IVF2000,PQ{M}x4fs", metric_type)
# if not by_residual:
# print("setting no residual")
# index.by_residual = False
print("train")
# index.train(ds.get_train())
index.train(xb[:250000])
print("add")
index.add(xb)
print("write index", fname)
faiss.write_index(index, fname)
return index
def faiss_eval_search(
index, xq, xb, nprobe_tab, pre_reorder_k_tab,
k, gt, nrun, measure
):
import faiss
print("use precomputed table=", index.use_precomputed_table,
"by residual=", index.by_residual)
print("adding a refine index")
index_refine = faiss.IndexRefineFlat(index, faiss.swig_ptr(xb))
print("set single thread")
faiss.omp_set_num_threads(1)
print("warmup")
for _run in range(5):
index.search(xq, k)
print("run timing")
for nprobe in nprobe_tab:
for pre_reorder_k in pre_reorder_k_tab:
index.nprobe = nprobe
times = []
for _run in range(nrun):
if pre_reorder_k == 0:
t0 = time.time()
D, I = index.search(xq, k)
t1 = time.time()
else:
index_refine.k_factor = pre_reorder_k / k
t0 = time.time()
D, I = index_refine.search(xq, k)
t1 = time.time()
times.append(t1 - t0)
header = "Faiss nprobe=%4d reo=%4d" % (nprobe, pre_reorder_k)
if measure == "1-recall":
eval_recalls(header, I, gt, times)
else:
eval_inters(header, I, gt, times)
if __name__ == "__main__":
main()
# Copyright (c) Facebook, Inc. and its affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
"""
Common functions to load datasets and compute their ground-truth
"""
import time
import numpy as np
import faiss
from faiss.contrib import datasets as faiss_datasets
print("path:", faiss_datasets.__file__)
faiss_datasets.dataset_basedir = '/checkpoint/matthijs/simsearch/'
def sanitize(x):
return np.ascontiguousarray(x, dtype='float32')
#################################################################
# Dataset
#################################################################
class DatasetCentroids(faiss_datasets.Dataset):
def __init__(self, ds, indexfile):
self.d = ds.d
self.metric = ds.metric
self.nq = ds.nq
self.xq = ds.get_queries()
# get the xb set
src_index = faiss.read_index(indexfile)
src_quant = faiss.downcast_index(src_index.quantizer)
centroids = faiss.vector_to_array(src_quant.xb)
self.xb = centroids.reshape(-1, self.d)
self.nb = self.nt = len(self.xb)
def get_queries(self):
return self.xq
def get_database(self):
return self.xb
def get_train(self, maxtrain=None):
return self.xb
def get_groundtruth(self, k=100):
return faiss.knn(
self.xq, self.xb, k,
faiss.METRIC_L2 if self.metric == 'L2' else faiss.METRIC_INNER_PRODUCT
)[1]
def load_dataset(dataset='deep1M', compute_gt=False, download=False):
print("load data", dataset)
if dataset == 'sift1M':
return faiss_datasets.DatasetSIFT1M()
elif dataset.startswith('bigann'):
dbsize = 1000 if dataset == "bigann1B" else int(dataset[6:-1])
return faiss_datasets.DatasetBigANN(nb_M=dbsize)
elif dataset.startswith("deep_centroids_"):
ncent = int(dataset[len("deep_centroids_"):])
centdir = "/checkpoint/matthijs/bench_all_ivf/precomputed_clusters"
return DatasetCentroids(
faiss_datasets.DatasetDeep1B(nb=1000000),
f"{centdir}/clustering.dbdeep1M.IVF{ncent}.faissindex"
)
elif dataset.startswith("deep"):
szsuf = dataset[4:]
if szsuf[-1] == 'M':
dbsize = 10 ** 6 * int(szsuf[:-1])
elif szsuf == '1B':
dbsize = 10 ** 9
elif szsuf[-1] == 'k':
dbsize = 1000 * int(szsuf[:-1])
else:
assert False, "did not recognize suffix " + szsuf
return faiss_datasets.DatasetDeep1B(nb=dbsize)
elif dataset == "music-100":
return faiss_datasets.DatasetMusic100()
elif dataset == "glove":
return faiss_datasets.DatasetGlove(download=download)
else:
assert False
#################################################################
# Evaluation
#################################################################
def evaluate_DI(D, I, gt):
nq = gt.shape[0]
k = I.shape[1]
rank = 1
while rank <= k:
recall = (I[:, :rank] == gt[:, :1]).sum() / float(nq)
print("R@%d: %.4f" % (rank, recall), end=' ')
rank *= 10
def evaluate(xq, gt, index, k=100, endl=True):
t0 = time.time()
D, I = index.search(xq, k)
t1 = time.time()
nq = xq.shape[0]
print("\t %8.4f ms per query, " % (
(t1 - t0) * 1000.0 / nq), end=' ')
rank = 1
while rank <= k:
recall = (I[:, :rank] == gt[:, :1]).sum() / float(nq)
print("R@%d: %.4f" % (rank, recall), end=' ')
rank *= 10
if endl:
print()
return D, I
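The recall@rank computation used by `evaluate()` and `evaluate_DI()` above can be exercised standalone on synthetic data; this is a minimal sketch (the helper name `recall_at` and the toy arrays are illustrative, not part of the benchmark code):

```python
import numpy as np

def recall_at(I, gt, rank):
    # fraction of queries whose true nearest neighbor (gt[:, 0]) appears
    # among the first `rank` returned ids, as in evaluate() above
    nq = gt.shape[0]
    return (I[:, :rank] == gt[:, :1]).sum() / float(nq)

I = np.array([[3, 1, 2], [5, 4, 0]])   # returned ids per query
gt = np.array([[1], [5]])              # true nearest neighbor per query
# query 0 finds its ground truth at rank 2, query 1 at rank 1
assert recall_at(I, gt, 1) == 0.5
assert recall_at(I, gt, 3) == 1.0
```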
# Copyright (c) Facebook, Inc. and its affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
import logging
# https://stackoverflow.com/questions/7016056/python-logging-not-outputting-anything
logging.basicConfig()
logger = logging.getLogger('faiss.contrib.exhaustive_search')
logger.setLevel(logging.INFO)
from faiss.contrib import datasets
from faiss.contrib.exhaustive_search import knn_ground_truth
from faiss.contrib import vecs_io
ds = datasets.DatasetDeep1B(nb=int(1e9))
print("computing GT matches for", ds)
D, I = knn_ground_truth(
ds.get_queries(),
ds.database_iterator(bs=65536),
k=100
)
vecs_io.ivecs_write("/tmp/tt.ivecs", I)
# Copyright (c) Facebook, Inc. and its affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
import os
import numpy as np
from collections import defaultdict
from matplotlib import pyplot
import re
from argparse import Namespace
from faiss.contrib.factory_tools import get_code_size as unitsize
def dbsize_from_name(dbname):
sufs = {
'1B': 10**9,
'100M': 10**8,
'10M': 10**7,
'1M': 10**6,
}
for s in sufs:
if dbname.endswith(s):
return sufs[s]
else:
assert False
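For reference, the suffix parsing above behaves as follows (the function is re-stated here so the snippet runs on its own; behavior matches the definition above under the assumption that no database name ends in an ambiguous suffix):

```python
def dbsize_from_name(dbname):
    # map a trailing size suffix of the database name to a vector count
    sufs = {'1B': 10**9, '100M': 10**8, '10M': 10**7, '1M': 10**6}
    for s in sufs:
        if dbname.endswith(s):
            return sufs[s]
    assert False, "unknown size suffix in " + dbname

assert dbsize_from_name('deep10M') == 10**7
assert dbsize_from_name('bigann1B') == 10**9
```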
def keep_latest_stdout(fnames):
fnames = [fname for fname in fnames if fname.endswith('.stdout')]
fnames.sort()
n = len(fnames)
fnames2 = []
for i, fname in enumerate(fnames):
if i + 1 < n and fnames[i + 1][:-8] == fname[:-8]:
continue
fnames2.append(fname)
return fnames2
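Since log files are named `<key>.<version>.stdout` with a one-letter version, sorting groups versions of the same key together and the dedup above keeps only the last (latest) one per key. A compact standalone equivalent, re-stated so it can be checked in isolation:

```python
def keep_latest_stdout(fnames):
    # keep only .stdout files, and among files that share the same
    # <key> prefix (everything before the version letter + ".stdout",
    # i.e. fname[:-8]), keep the lexicographically last version
    fnames = sorted(f for f in fnames if f.endswith('.stdout'))
    return [f for i, f in enumerate(fnames)
            if i + 1 == len(fnames) or fnames[i + 1][:-8] != f[:-8]]

assert keep_latest_stdout(
    ['run.a.stdout', 'run.b.stdout', 'other.a.stdout', 'notes.txt']
) == ['other.a.stdout', 'run.b.stdout']
```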
def parse_result_file(fname):
# print fname
st = 0
res = []
keys = []
stats = {}
stats['run_version'] = fname[-8]
indexkey = None
for l in open(fname):
if l.startswith("srun:"):
# looks like a crash...
if indexkey is None:
raise RuntimeError("instant crash")
break
elif st == 0:
if l.startswith("dataset in dimension"):
fi = l.split()
stats["d"] = int(fi[3][:-1])
stats["nq"] = int(fi[9])
stats["nb"] = int(fi[11])
stats["nt"] = int(fi[13])
if l.startswith('index size on disk:'):
stats['index_size'] = int(l.split()[-1])
if l.startswith('current RSS:'):
stats['RSS'] = int(l.split()[-1])
if l.startswith('precomputed tables size:'):
stats['tables_size'] = int(l.split()[-1])
if l.startswith('Setting nb of threads to'):
stats['n_threads'] = int(l.split()[-1])
if l.startswith(' add in'):
stats['add_time'] = float(l.split()[-2])
if l.startswith("vector code_size"):
stats['code_size'] = float(l.split()[-1])
if l.startswith('args:'):
args = eval(l[l.find(' '):])
indexkey = args.indexkey
elif "time(ms/q)" in l:
# result header
if 'R@1 R@10 R@100' in l:
stats["measure"] = "recall"
stats["ranks"] = [1, 10, 100]
elif 'I@1 I@10 I@100' in l:
stats["measure"] = "inter"
stats["ranks"] = [1, 10, 100]
elif 'inter@' in l:
stats["measure"] = "inter"
fi = l.split()
if fi[1] == "inter@":
rank = int(fi[2])
else:
rank = int(fi[1][len("inter@"):])
stats["ranks"] = [rank]
else:
assert False
st = 1
elif 'index size on disk:' in l:
stats["index_size"] = int(l.split()[-1])
elif st == 1:
st = 2
elif st == 2:
fi = l.split()
if l[0] == " ":
# means there are 0 parameters
fi = [""] + fi
keys.append(fi[0])
res.append([float(x) for x in fi[1:]])
return indexkey, np.array(res), keys, stats
# the directory used in run_on_cluster.bash
basedir = "/checkpoint/matthijs/bench_all_ivf/"
logdir = basedir + 'logs/'
def collect_results_for(db='deep1M', prefix="autotune."):
# run parsing
allres = {}
allstats = {}
missing = []
fnames = keep_latest_stdout(os.listdir(logdir))
# print fnames
# filenames are in the form <key>.x.stdout
# where x is a version number (from a to z)
# keep only latest version of each name
for fname in fnames:
if not (
'db' + db in fname and
fname.startswith(prefix) and
fname.endswith('.stdout')
):
continue
print("parse", fname, end=" ", flush=True)
try:
indexkey, res, _, stats = parse_result_file(logdir + fname)
except RuntimeError as e:
print("FAIL %s" % e)
res = np.zeros((2, 0))
except Exception as e:
print("PARSE ERROR", e)
res = np.zeros((2, 0))
else:
print(len(res), "results")
if res.size == 0:
missing.append(fname)
else:
if indexkey in allres:
if allstats[indexkey]['run_version'] > stats['run_version']:
# don't use this run
continue
allres[indexkey] = res
allstats[indexkey] = stats
return allres, allstats
def extract_pareto_optimal(allres, keys, recall_idx=0, times_idx=3):
bigtab = []
for i, k in enumerate(keys):
v = allres[k]
perf = v[:, recall_idx]
times = v[:, times_idx]
bigtab.append(
np.vstack((
np.ones(times.size) * i,
perf, times
))
)
if bigtab == []:
return [], np.zeros((3, 0))
bigtab = np.hstack(bigtab)
# sort by perf
perm = np.argsort(bigtab[1, :])
bigtab_sorted = bigtab[:, perm]
best_times = np.minimum.accumulate(bigtab_sorted[2, ::-1])[::-1]
selection, = np.where(bigtab_sorted[2, :] == best_times)
selected_keys = [
keys[i] for i in
np.unique(bigtab_sorted[0, selection].astype(int))
]
ops = bigtab_sorted[:, selection]
return selected_keys, ops
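The Pareto selection above works by sorting operating points by increasing accuracy, then taking a reverse running minimum of the times: a point survives iff no point with accuracy at least as high is faster. A toy illustration of that core step (the arrays here are made up for the example):

```python
import numpy as np

perf  = np.array([0.5, 0.6, 0.7, 0.9])   # accuracy of each operating point
times = np.array([1.0, 5.0, 2.0, 3.0])   # time of each operating point
order = np.argsort(perf)
# best achievable time among points with accuracy >= perf[i]
best_times = np.minimum.accumulate(times[order][::-1])[::-1]
selection, = np.where(times[order] == best_times)
# (0.6, 5.0) is dominated by (0.7, 2.0): higher accuracy AND faster
assert list(selection) == [0, 2, 3]
```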
def plot_subset(
allres, allstats, selected_methods, recall_idx, times_idx=3,
report=["overhead", "build time"]):
# important methods
for k in selected_methods:
v = allres[k]
stats = allstats[k]
d = stats["d"]
dbsize = stats["nb"]
if "index_size" in stats and "tables_size" in stats:
tot_size = stats['index_size'] + stats['tables_size']
else:
tot_size = -1
id_size = 8 # 64 bit
addt = ''
if 'add_time' in stats:
add_time = stats['add_time']
if add_time > 7200:
add_min = add_time / 60
addt = ', %dh%02d' % (add_min / 60, add_min % 60)
else:
add_sec = int(add_time)
addt = ', %dm%02d' % (add_sec / 60, add_sec % 60)
code_size = unitsize(d, k)
label = k
if "code_size" in report:
label += " %d bytes" % code_size
tight_size = (code_size + id_size) * dbsize
if tot_size < 0 or "overhead" not in report:
pass # don't know what the index size is
elif tot_size > 10 * tight_size:
label += " overhead x%.1f" % (tot_size / tight_size)
else:
label += " overhead+%.1f%%" % (
tot_size / tight_size * 100 - 100)
if "build time" in report:
label += " " + addt
linestyle = (':' if 'Refine' in k or 'RFlat' in k else
'-.' if 'SQ' in k else
'-' if '4fs' in k else
'-')
print(k, linestyle)
pyplot.semilogy(v[:, recall_idx], 1000 / v[:, times_idx], label=label,
linestyle=linestyle,
marker='o' if '4fs' in k else '+')
recall_rank = stats["ranks"][recall_idx]
if stats["measure"] == "recall":
pyplot.xlabel('1-recall at %d' % recall_rank)
elif stats["measure"] == "inter":
pyplot.xlabel('inter @ %d' % recall_rank)
else:
assert False
pyplot.ylabel('QPS (%d threads)' % stats["n_threads"])
def plot_tradeoffs(db, allres, allstats, code_size, recall_rank):
stat0 = next(iter(allstats.values()))
d = stat0["d"]
n_threads = stat0["n_threads"]
recall_idx = stat0["ranks"].index(recall_rank)
# times come after the perf measure
times_idx = len(stat0["ranks"])
if type(code_size) == int:
if code_size == 0:
code_size = [0, 1e50]
code_size_name = "any code size"
else:
code_size_name = "code_size=%d" % code_size
code_size = [code_size, code_size]
elif type(code_size) == tuple:
code_size_name = "code_size in [%d, %d]" % code_size
else:
assert False
names_maxperf = []
for k in sorted(allres):
v = allres[k]
if v.ndim != 2: continue
us = unitsize(d, k)
if not code_size[0] <= us <= code_size[1]: continue
names_maxperf.append((v[-1, recall_idx], k))
# sort from lowest to highest topline accuracy
names_maxperf.sort()
names = [name for mp, name in names_maxperf]
selected_methods, optimal_points = \
extract_pareto_optimal(allres, names, recall_idx, times_idx)
not_selected = list(set(names) - set(selected_methods))
print("methods without an optimal OP: ", not_selected)
pyplot.title('database ' + db + ' ' + code_size_name)
# grayed out lines
for k in not_selected:
v = allres[k]
if v.ndim != 2: continue
us = unitsize(d, k)
if not code_size[0] <= us <= code_size[1]: continue
linestyle = (':' if 'PQ' in k else
'-.' if 'SQ4' in k else
'--' if 'SQ8' in k else '-')
pyplot.semilogy(v[:, recall_idx], 1000 / v[:, times_idx], label=None,
linestyle=linestyle,
marker='o' if 'HNSW' in k else '+',
color='#cccccc', linewidth=0.2)
plot_subset(allres, allstats, selected_methods, recall_idx, times_idx)
if len(not_selected) == 0:
om = ''
else:
om = '\nomitted:'
nc = len(om)
for m in not_selected:
if nc > 80:
om += '\n'
nc = 0
om += ' ' + m
nc += len(m) + 1
# pyplot.semilogy(optimal_points[1, :], optimal_points[2, :], marker="s")
# print(optimal_points[0, :])
pyplot.xlabel('1-recall at %d %s' % (recall_rank, om) )
pyplot.ylabel('QPS (%d threads)' % n_threads)
pyplot.legend()
pyplot.grid()
return selected_methods, not_selected
if __name__ == "__main__xx":
# tests on centroids indexing (v1)
for k in 1, 32, 128:
pyplot.gcf().set_size_inches(15, 10)
i = 1
for ncent in 65536, 262144, 1048576, 4194304:
db = f'deep_centroids_{ncent}.k{k}.'
allres, allstats = collect_results_for(
db=db, prefix="cent_index.")
pyplot.subplot(2, 2, i)
plot_subset(
allres, allstats, list(allres.keys()),
recall_idx=0,
times_idx=1,
report=["code_size"]
)
i += 1
pyplot.title(f"{ncent} centroids")
pyplot.legend()
pyplot.xlim([0.95, 1])
pyplot.grid()
pyplot.savefig('figs/deep1B_centroids_k%d.png' % k)
if __name__ == "__main__xx":
# centroids plot per k
pyplot.gcf().set_size_inches(15, 10)
i = 1
for ncent in 65536, 262144, 1048576, 4194304:
xyd = defaultdict(list)
for k in 1, 4, 8, 16, 32, 64, 128, 256:
db = f'deep_centroids_{ncent}.k{k}.'
allres, allstats = collect_results_for(db=db, prefix="cent_index.")
for indexkey, res in allres.items():
idx, = np.where(res[:, 0] >= 0.99)
if idx.size > 0:
xyd[indexkey].append((k, 1000 / res[idx[0], 1]))
pyplot.subplot(2, 2, i)
i += 1
for indexkey, xy in xyd.items():
xy = np.array(xy)
pyplot.loglog(xy[:, 0], xy[:, 1], 'o-', label=indexkey)
pyplot.title(f"{ncent} centroids")
pyplot.xlabel("k")
xt = 2**np.arange(9)
pyplot.xticks(xt, ["%d" % x for x in xt])
pyplot.ylabel("QPS (32 threads)")
pyplot.legend()
pyplot.grid()
pyplot.savefig('../plots/deep1B_centroids_min99.png')
if __name__ == "__main__xx":
# main indexing plots
i = 0
for db in 'bigann10M', 'deep10M', 'bigann100M', 'deep100M', 'deep1B', 'bigann1B':
allres, allstats = collect_results_for(
db=db, prefix="autotune.")
for cs in 8, 16, 32, 64:
pyplot.figure(i)
i += 1
pyplot.gcf().set_size_inches(15, 10)
cs_range = (
(0, 8) if cs == 8 else (cs // 2 + 1, cs)
)
plot_tradeoffs(
db, allres, allstats, code_size=cs_range, recall_rank=1)
pyplot.savefig('../plots/tradeoffs_%s_cs%d_r1.png' % (
db, cs))
if __name__ == "__main__":
# 1M indexes
i = 0
for db in "glove", "music-100":
pyplot.figure(i)
pyplot.gcf().set_size_inches(15, 10)
i += 1
allres, allstats = collect_results_for(db=db, prefix="autotune.")
plot_tradeoffs(db, allres, allstats, code_size=0, recall_rank=1)
pyplot.savefig('../plots/1M_tradeoffs_' + db + ".png")
for db in "sift1M", "deep1M":
allres, allstats = collect_results_for(db=db, prefix="autotune.")
pyplot.figure(i)
pyplot.gcf().set_size_inches(15, 10)
i += 1
plot_tradeoffs(db, allres, allstats, code_size=(0, 64), recall_rank=1)
pyplot.savefig('../plots/1M_tradeoffs_' + db + "_small.png")
pyplot.figure(i)
pyplot.gcf().set_size_inches(15, 10)
i += 1
plot_tradeoffs(db, allres, allstats, code_size=(65, 10000), recall_rank=1)
pyplot.savefig('../plots/1M_tradeoffs_' + db + "_large.png")
if __name__ == "__main__xx":
db = 'sift1M'
allres, allstats = collect_results_for(db=db, prefix="autotune.")
pyplot.gcf().set_size_inches(15, 10)
keys = [
"IVF1024,PQ32x8",
"IVF1024,PQ64x4",
"IVF1024,PQ64x4fs",
"IVF1024,PQ64x4fsr",
"IVF1024,SQ4",
"IVF1024,SQ8"
]
plot_subset(allres, allstats, keys, recall_idx=0, report=["code_size"])
pyplot.legend()
pyplot.title(db)
pyplot.xlabel("1-recall@1")
pyplot.ylabel("QPS (32 threads)")
pyplot.grid()
pyplot.savefig('../plots/ivf1024_variants.png')
pyplot.figure(2)
pyplot.gcf().set_size_inches(15, 10)
keys = [
"HNSW32",
"IVF1024,PQ64x4fs",
"IVF1024,PQ64x4fsr",
"IVF1024,PQ64x4fs,RFlat",
"IVF1024,PQ64x4fs,Refine(SQfp16)",
"IVF1024,PQ64x4fs,Refine(SQ8)",
]
plot_subset(allres, allstats, keys, recall_idx=0, report=["code_size"])
pyplot.legend()
pyplot.title(db)
pyplot.xlabel("1-recall@1")
pyplot.ylabel("QPS (32 threads)")
pyplot.grid()
pyplot.savefig('../plots/ivf1024_rerank.png')