
Python library: ScaNN

import numpy as np
import scann
Github repository about-python, path: /libraries/ScaNN/example/import-libraries.py
We create 3 million vectors: 1 million are centered around (1, 0, 0), 1 million around (0, 1, 0), and 1 million around (0, 0, 1). Each coordinate is drawn from a normal distribution with a standard deviation of 0.01:
v1 = np.array([
        np.array([
           np.random.normal(1, 0.01),
           np.random.normal(0, 0.01),
           np.random.normal(0, 0.01)
        ])
        for _ in range(1000000)
     ]).astype(np.float32)

v2 = np.array([
        np.array([
           np.random.normal(0, 0.01),
           np.random.normal(1, 0.01),
           np.random.normal(0, 0.01)
        ])
        for _ in range(1000000)
     ]).astype(np.float32)

v3 = np.array([
        np.array([
           np.random.normal(0, 0.01),
           np.random.normal(0, 0.01),
           np.random.normal(1, 0.01)
        ])
        for _ in range(1000000)
     ]).astype(np.float32)
Github repository about-python, path: /libraries/ScaNN/example/vectors.py
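The list comprehensions above call np.random.normal three million times per cluster, which is slow. A minimal sketch of an equivalent vectorized construction follows; the helper name make_cluster and the fixed seed are assumptions made for illustration and are not part of the original example:

```python
import numpy as np

# Seeded generator; the seed is an assumption, chosen only for reproducibility.
rng = np.random.default_rng(0)

def make_cluster(center, n, sigma=0.01):
    """Hypothetical helper: draw n points normally distributed around center."""
    center = np.asarray(center, dtype=np.float64)
    # loc broadcasts across rows, so every row is sampled around center.
    samples = rng.normal(loc=center, scale=sigma, size=(n, center.size))
    return samples.astype(np.float32)

v1 = make_cluster((1, 0, 0), 1_000_000)
v2 = make_cluster((0, 1, 0), 1_000_000)
v3 = make_cluster((0, 0, 1), 1_000_000)
```

The resulting arrays have the same shape, dtype and distribution as the ones built with the nested comprehensions, although the individual random values differ.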
We additionally create 10000 vectors that are centered around (0.5, 0.5, 0.5). The goal of this example is to find these vectors, which is why they are called needles here:
needles = np.array([
       np.array([
          np.random.normal(0.5, 0.01),
          np.random.normal(0.5, 0.01),
          np.random.normal(0.5, 0.01)
     ])
     for _ in range(10000)
   ]).astype(np.float32)
Github repository about-python, path: /libraries/ScaNN/example/needles.py
The vectors are combined and randomly shuffled:
data = np.concatenate( (v1, v2, v3, needles) )
np.random.shuffle(data)
Github repository about-python, path: /libraries/ScaNN/example/data.py
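After np.random.shuffle, the row indices no longer reveal which vectors are needles. If the search result is to be verified later, one option is to shuffle a label array with the same permutation. The following is a sketch with reduced array sizes; the names haystack, is_needle and perm are assumptions introduced here:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption, for reproducibility only

# Small stand-ins for the clusters and needles of the example (sizes reduced).
haystack = rng.normal(loc=(1, 0, 0), scale=0.01, size=(3000, 3)).astype(np.float32)
needles = rng.normal(loc=0.5, scale=0.01, size=(10, 3)).astype(np.float32)

data = np.concatenate((haystack, needles))

# Boolean label per row: True for needles, False otherwise.
is_needle = np.zeros(len(data), dtype=bool)
is_needle[len(haystack):] = True

# Apply one permutation to both arrays so that labels stay aligned with rows.
perm = rng.permutation(len(data))
data = data[perm]
is_needle = is_needle[perm]
```

With this layout, is_needle[i] states whether row i of the shuffled data is a needle, so indices returned by a search can be checked directly.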
Creating a builder:
builder = scann.scann_ops_pybind.builder(
    data,
    num_neighbors    =  10,
    distance_measure = 'squared_l2'  # or, alternatively: 'dot_product'
)
Github repository about-python, path: /libraries/ScaNN/example/builder.py
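The two distance measures named in the builder call can be illustrated with plain NumPy. The sketch below only shows the arithmetic for the cluster centers of this example; it makes no claim about how ScaNN computes or reports these values internally:

```python
import numpy as np

query = np.array([0.5, 0.5, 0.5], dtype=np.float32)

# The three cluster centers and the needle center used in this example.
centers = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.5],
], dtype=np.float32)

# Squared Euclidean distance: smaller means closer.
squared_l2 = np.sum((centers - query) ** 2, axis=1)

# Dot product: larger means more similar.
dot_product = centers @ query

print(squared_l2)   # 0.75 for each cluster center, 0.0 for the needle center
print(dot_product)  # 0.5 for each cluster center, 0.75 for the needle center
```

For this particular query both measures rank the needle center first, but in general they can produce different orderings because the dot product also depends on vector length.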
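# Partition the data into num_leaves leaves; a query searches num_leaves_to_search of them.
# Both values are equal here, so every leaf is searched.
builder = builder.tree(
    num_leaves           =   10000,
    num_leaves_to_search =   10000,
    training_sample_size = 1000000
)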
Github repository about-python, path: /libraries/ScaNN/example/tree.py
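The idea behind num_leaves and num_leaves_to_search can be sketched in a few lines of NumPy. This is a deliberately crude illustration that uses randomly chosen points as leaf centers; it is not ScaNN's partitioning algorithm, and the names search and leaf_of_point are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption, for reproducibility only
points = rng.normal(size=(2000, 3)).astype(np.float32)

num_leaves = 20
num_leaves_to_search = 5

# Crude partitioning: pick random points as leaf centers (no training step).
centers = points[rng.choice(len(points), num_leaves, replace=False)]

# Assign every point to the leaf with the nearest center.
dist_to_centers = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
leaf_of_point = dist_to_centers.argmin(axis=1)

def search(query, k, leaves_to_search):
    """Return indices of the k nearest points found in the closest leaves."""
    leaf_dist = ((centers - query) ** 2).sum(axis=1)
    leaves = np.argsort(leaf_dist)[:leaves_to_search]
    # Only points that live in the selected leaves are scored.
    candidates = np.flatnonzero(np.isin(leaf_of_point, leaves))
    cand_dist = ((points[candidates] - query) ** 2).sum(axis=1)
    return candidates[np.argsort(cand_dist)[:k]]

query = np.zeros(3, dtype=np.float32)
approx = search(query, 10, num_leaves_to_search)  # scores a subset of the points
exact = search(query, 10, num_leaves)             # scores every point
```

Searching fewer leaves reduces the number of scored points at the risk of missing true neighbors. Searching all leaves, as the example's settings do, gives up that speed advantage in exchange for considering every point.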
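# Configure asymmetric hashing (AH) scoring; the first positional argument is dimensions_per_block.
builder = builder.score_ah(
    10000,
    anisotropic_quantization_threshold = 0.001
)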
Github repository about-python, path: /libraries/ScaNN/example/score_ah.py
# Rescore the 1000 best candidates exactly before the final neighbors are returned.
builder = builder.reorder(
    1000
)
Github repository about-python, path: /libraries/ScaNN/example/reorder.py
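The combination of approximate scoring followed by reordering can be pictured as a two-stage search. In the sketch below, rounding to one decimal place stands in for quantization; this is an assumption made purely for illustration and is unrelated to ScaNN's actual asymmetric hashing:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption, for reproducibility only
points = rng.normal(size=(5000, 3)).astype(np.float32)
query = np.array([0.5, 0.5, 0.5], dtype=np.float32)

# Stage 1: cheap, approximate scoring on a lossy copy of the data.
coarse = np.round(points, 1)
approx_dist = ((coarse - query) ** 2).sum(axis=1)

# Keep the best candidates according to the approximate score.
reordering_num_neighbors = 100
candidates = np.argpartition(approx_dist, reordering_num_neighbors)[:reordering_num_neighbors]

# Stage 2: exact rescoring of the candidates only, then keep the final top 10.
exact_dist = ((points[candidates] - query) ** 2).sum(axis=1)
order = np.argsort(exact_dist)[:10]
neighbors = candidates[order]
distances = exact_dist[order]
```

A larger candidate set makes it more likely that the true nearest neighbors survive stage 1, at the cost of more exact distance computations in stage 2.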
Creating a searcher:
searcher = builder.build()
Github repository about-python, path: /libraries/ScaNN/example/searcher.py
Executing the query:
query = np.array([ 0.5, 0.5, 0.5 ]).astype(np.float32)
neighbors, distances = searcher.search(query, final_num_neighbors=10)
Github repository about-python, path: /libraries/ScaNN/example/query.py
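To judge the result of an approximate search, it helps to have an exact baseline. The following brute-force search uses only NumPy; the function name brute_force_search and the reduced data set are assumptions made for this sketch:

```python
import numpy as np

def brute_force_search(data, query, k=10):
    """Exact k-nearest-neighbor search by squared Euclidean distance."""
    sq_dist = np.sum((data - query) ** 2, axis=1)
    # argpartition finds the k smallest distances without sorting everything.
    idx = np.argpartition(sq_dist, k)[:k]
    # Sort only those k entries by distance.
    idx = idx[np.argsort(sq_dist[idx])]
    return idx, sq_dist[idx]

rng = np.random.default_rng(0)  # seed is an assumption, for reproducibility only

# Reduced stand-in for the example's data: one cluster plus a few needles.
haystack = rng.normal(loc=(1, 0, 0), scale=0.01, size=(3000, 3)).astype(np.float32)
needles = rng.normal(loc=0.5, scale=0.01, size=(50, 3)).astype(np.float32)
data = np.concatenate((haystack, needles))
rng.shuffle(data)

query = np.array([0.5, 0.5, 0.5], dtype=np.float32)
exact_neighbors, exact_distances = brute_force_search(data, query, k=10)
```

Because every vector is scored, this scales linearly with the size of the data set, which is the cost an approximate index is meant to avoid.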
Printing the result:
for x in zip(neighbors,distances):
    print(x)
Github repository about-python, path: /libraries/ScaNN/example/result.py
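Printed index and distance pairs are hard to assess by eye. A common way to summarize the quality of an approximate result is recall, that is, the fraction of the exact nearest neighbors that the approximate search also returned. The helper below is a sketch; the name recall and the sample index lists are assumptions:

```python
import numpy as np

def recall(approx_neighbors, exact_neighbors):
    """Fraction of the exact neighbors that also appear in the approximate result."""
    approx = set(np.asarray(approx_neighbors).tolist())
    exact = set(np.asarray(exact_neighbors).tolist())
    return len(approx & exact) / len(exact)

# Made-up index lists: three of the four exact neighbors were found.
print(recall([3, 7, 9, 12], [3, 7, 9, 11]))  # prints 0.75
```

In this example, the neighbors array returned by searcher.search could be passed as the first argument and the indices from an exact search as the second.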
