Currently we compute similarity in the API with the following process:
- given the mbid/submission, get the lowlevel id
- given the lowlevel id, look up the vector of this submission in the annoy index
- use the vector to look up similar items
This means that the lookup fails at the 2nd step if the submission isn't available in the annoy index. However, we can compute the vector directly from the lowlevel data, which skips this step and lets us look up any lowlevel submission directly.
There are three options here for making sure that the API always works:
- First try to look up the vector in the annoy index, and fall back to computing it from lowlevel if it's not found
- Keep a high-water mark in redis indicating the max lowlevel id in the index, and use that to decide whether we use annoy or lowlevel
- Always compute the vector from the lowlevel data (this adds 1 more database lookup to every API lookup, so it may be additional load depending on API usage)
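A rough sketch of how the first two options could fit together, in Python. Everything here is hypothetical: `VectorIndex`, `get_vector`, `nearest`, `compute_vector` and `high_water_mark` are placeholder names for illustration, not existing functions in the codebase or in annoy, and the high-water mark is assumed to have been read from redis by the caller.

```python
from typing import Callable, List, Optional, Protocol, Sequence


class VectorIndex(Protocol):
    """Hypothetical thin wrapper around the annoy index (assumed interface)."""

    def get_vector(self, lowlevel_id: int) -> Optional[Sequence[float]]:
        """Return the stored vector, or None if the id is not in the index."""
        ...

    def nearest(self, vector: Sequence[float], n: int) -> List[int]:
        """Return the ids of the n items closest to the given vector."""
        ...


def similar_items(
    lowlevel_id: int,
    index: VectorIndex,
    compute_vector: Callable[[int], Sequence[float]],
    n: int = 10,
    high_water_mark: Optional[int] = None,
) -> List[int]:
    """Look up similar items, falling back to the lowlevel data if needed."""
    vector = None

    # Option 2: if a high-water mark is known and this id is above it,
    # the id cannot be in the index, so skip the index lookup entirely.
    if high_water_mark is None or lowlevel_id <= high_water_mark:
        vector = index.get_vector(lowlevel_id)

    # Option 1: the id was not found in the index (or was skipped),
    # so compute the vector from the lowlevel data instead.
    if vector is None:
        vector = compute_vector(lowlevel_id)

    return index.nearest(vector, n)
```

Passing `high_water_mark=None` gives plain option 1. The third option corresponds to calling `compute_vector` unconditionally and never asking the index for a stored vector.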
In addition to the lookup change, if any error occurs when accessing this API endpoint, the frontend should show a useful error message.