54 points | by Pringled 2 days ago
I think the next step might be to add some sugar that lets you run a random or fixed grid of hyperparameters and get a report of accuracy and speed on your specific dataset.
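A minimal sketch of what such a grid report could look like, assuming a toy inverted-file index. `ToyIVF`, `grid_report`, and the `n_lists`/`n_probe` parameters are hypothetical names for illustration, not the library's API. Recall@k is measured against brute-force ground truth, and speed is reported as queries per second:

```python
import itertools
import math
import random
import time


def dist(a, b):
    # Euclidean distance (math.dist requires Python 3.8+)
    return math.dist(a, b)


def exact_knn(data, q, k):
    # Brute-force ground truth: indices of the k closest points
    return sorted(range(len(data)), key=lambda i: dist(data[i], q))[:k]


class ToyIVF:
    """Hypothetical inverted-file index: random data points serve as centroids (no k-means)."""

    def __init__(self, data, n_lists, seed=0):
        rng = random.Random(seed)
        self.data = data
        self.centroids = [data[i] for i in rng.sample(range(len(data)), n_lists)]
        self.lists = [[] for _ in range(n_lists)]
        # Assign every point to the list of its nearest centroid
        for i, x in enumerate(data):
            c = min(range(n_lists), key=lambda j: dist(self.centroids[j], x))
            self.lists[c].append(i)

    def query(self, q, k, n_probe):
        # Scan only the n_probe lists whose centroids are closest to the query
        order = sorted(range(len(self.centroids)), key=lambda j: dist(self.centroids[j], q))
        candidates = [i for j in order[:n_probe] for i in self.lists[j]]
        return sorted(candidates, key=lambda i: dist(self.data[i], q))[:k]


def grid_report(data, queries, k, grid):
    """Run every (n_lists, n_probe) combination and report recall@k and QPS."""
    truth = [set(exact_knn(data, q, k)) for q in queries]
    rows = []
    for n_lists, n_probe in itertools.product(grid["n_lists"], grid["n_probe"]):
        if n_probe > n_lists:
            continue  # cannot probe more lists than exist
        index = ToyIVF(data, n_lists)
        start = time.perf_counter()
        results = [index.query(q, k, n_probe) for q in queries]
        elapsed = time.perf_counter() - start
        hits = sum(len(set(r) & t) for r, t in zip(results, truth))
        recall = hits / (k * len(queries))
        qps = len(queries) / elapsed if elapsed > 0 else float("inf")
        rows.append({"n_lists": n_lists, "n_probe": n_probe, "recall": recall, "qps": qps})
    return rows


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random() for _ in range(8)] for _ in range(500)]
    queries = [[rng.random() for _ in range(8)] for _ in range(20)]
    for row in grid_report(data, queries, k=10, grid={"n_lists": [4, 16], "n_probe": [1, 4]}):
        print(row)
```

A random grid would just replace `itertools.product` with sampled combinations. Probing every list (`n_probe == n_lists`) degenerates into an exhaustive search, so recall is 1.0 there, which is a handy sanity check for the report.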
Hybrid search is a really cool idea, though. It's not something we support at the moment, but it's definitely something we could investigate and add as an upcoming feature. Thanks for the suggestion!
1. When you say backends, do you plan to integrate a client with some "vector" stores? 2. Also, are there any benchmarks? 3. Lastly, why Python?
2: we adopted the same methodology as ann-benchmarks for our evaluation, so technically the benchmarks there are valid for the backends we support. However, it's a good suggestion to add those explicitly to the repo; I'll add a todo for that.
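For context, ann-benchmarks presents results as a recall vs. queries-per-second trade-off and plots the Pareto frontier over each algorithm's parameter settings. A minimal sketch of extracting that frontier from a list of runs; `pareto_frontier` and the `(label, recall, qps)` tuple layout are hypothetical, not taken from ann-benchmarks' code:

```python
from typing import List, Tuple

# (parameter label, recall, queries per second): a hypothetical run format
Run = Tuple[str, float, float]


def pareto_frontier(runs: List[Run]) -> List[Run]:
    """Keep only runs that no other run beats on both recall and QPS."""
    # Highest recall first; among equal recall, highest QPS first
    ordered = sorted(runs, key=lambda r: (-r[1], -r[2]))
    frontier: List[Run] = []
    best_qps = float("-inf")
    for label, recall, qps in ordered:
        # Every earlier run has recall >= this one, so this run is only
        # worth keeping if it is strictly faster than all of them
        if qps > best_qps:
            frontier.append((label, recall, qps))
            best_qps = qps
    return frontier


if __name__ == "__main__":
    runs = [("a", 0.90, 100.0), ("b", 0.95, 50.0), ("c", 0.80, 80.0),
            ("d", 0.99, 10.0), ("e", 0.90, 60.0)]
    for run in pareto_frontier(runs):
        print(run)
```

The frontier comes back ordered from highest recall to highest QPS, which is the order you would draw it in on a recall/QPS plot.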
3: mainly because a: it's the language we are most comfortable developing in, b: it's the most widely used and adopted language for ML, and c: (almost) all the algorithms we support are already written in C/C++/Cython.
So these are nearest neighbor search implementations, not database backends.