Improvement
Resolution: Unresolved
Normal
When upgrading to Solr 9, we found that negative boosts don't work. The instrument core uses this to move search results further down the list when the instrument is a "Family", e.g. "trumpet family" vs "trumpet".
We tried to apply a boost to all non-matching items, e.g. -type:Family^5 but this has the result of setting the score of this item to 0 and omitting it from the search results, so we can't use this technique.
We tried to use a bq boost between 0 and 1 (e.g. 0.1 or 0.01), but the Solr documentation indicates that bq is additive, so the final score is always (query score + boost score). To apply a multiplicative boost (where the score is multiplied by a factor), you must use the {!boost} query parser or the boost parameter to edismax. We don't want to switch directly to edismax at the moment.
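To make the difference concrete, here is a small Python sketch of the arithmetic only. It is a simplified model, not Solr's actual scoring: the base scores, the IS_FAMILY flags and both helper functions are made up for illustration.

```python
# Hypothetical relevance scores; real Solr scores depend on the index.
BASE_SCORES = {"trumpet": 4.0, "trumpet family": 4.0}
IS_FAMILY = {"trumpet": False, "trumpet family": True}


def additive_boost(name, weight):
    """Simplified bq model: the weighted boost is added when the doc matches."""
    bonus = weight if IS_FAMILY[name] else 0.0
    return BASE_SCORES[name] + bonus


def multiplicative_boost(name, factor):
    """Simplified multiplicative model: the score is scaled when the doc matches."""
    multiplier = factor if IS_FAMILY[name] else 1.0
    return BASE_SCORES[name] * multiplier


additive = {name: additive_boost(name, 0.1) for name in BASE_SCORES}
multiplicative = {name: multiplicative_boost(name, 0.1) for name in BASE_SCORES}

for name in BASE_SCORES:
    print(name, additive[name], multiplicative[name])
```

In this model a small additive weight on type:Family still pushes "trumpet family" slightly above "trumpet", while the same value used as a multiplier pushes it below, which is the behaviour we actually want.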
We were able to simulate a low boost by sorting first by abs(sub(1,termfreq(type,'Family'))) (this is 0 if 'Family' is in the type field, 1 otherwise) and then by score. But this still makes the results look weird. E.g. "baritone" has the word "trumpet" in its description, so it has some score (we weight on the description field), but the result "trumpet family" ends up sorted below it.
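The ordering problem can be reproduced outside Solr with a short Python sketch. The three documents, their scores and the non-Family type value are hypothetical; family_key only mimics the function query above for a single-valued type field.

```python
# Hypothetical search results for the query "trumpet".
results = [
    {"name": "trumpet", "type": "Wind instrument", "score": 5.0},
    {"name": "trumpet family", "type": "Family", "score": 4.5},
    {"name": "baritone", "type": "Wind instrument", "score": 0.3},
]


def family_key(doc):
    """Mimic abs(sub(1,termfreq(type,'Family'))): 0 for Family, 1 otherwise."""
    termfreq = 1 if doc["type"] == "Family" else 0
    return abs(1 - termfreq)


# Sort by the Family flag first and by score second, both descending.
ranked = sorted(results, key=lambda d: (family_key(d), d["score"]), reverse=True)
names = [d["name"] for d in ranked]
print(names)
```

Because the Family flag is the primary sort key, every non-Family instrument with any score at all ends up ahead of "trumpet family", however weak the match.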
At the moment, we have just removed this boost flag and the results still look OK, but we should see if there is a way of moving the search results for Family types a bit further down.
We discussed this in IRC. The area core has an artist_count field that is computed in sir: https://github.com/metabrainz/sir/blob/47c50f422b5681c1169a60b78bf3ebbd1d4138db/sir/schema/modelext.py#L42
We would compute something similar for instruments. reosarevok suggests:
SELECT i.name, count(*), i.gid
FROM instrument i
JOIN link_attribute_type lat ON i.gid = lat.gid
JOIN link_attribute la ON la.attribute_type = lat.id
GROUP BY i.gid, i.name
ORDER BY count(*) DESC;
3:39 PM <reosarevok> That's not exactly the same as how many rels use the instrument (link can be reused) but I expect it's close enough and a much simpler query. bitmap: does that seem legit for instrument popularity?
3:40 PM <reosarevok> The first family is guitar family | 1470 | f68936f2-194c-4bcd-94a9-81e1dd947b8d - much lower than most important guitars
3:42 PM <reosarevok> I think so
3:42 PM <reosarevok> I see we already use rels elsewhere: recording_count = column_property(select([func.count(LinkRecordingWork.id)]).where(LinkRecordingWork.work_id == Work.id))
3:43 PM <reosarevok> Although that's for actual relationships rather than attributes (which is less relevant for instrument)
And this would allow us to continue using additive boosts, as all instruments would have a count which would allow us to break ties when the score based on instrument name is the same.
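As a rough illustration of the tie-breaking idea, the Python sketch below adds a small, log-scaled bonus derived from the count to an otherwise equal name score. The boosted_score function, the weight, the name scores and the count for "guitar" are all assumptions; only the count of 1470 for "guitar family" comes from the IRC log above.

```python
import math


def boosted_score(name_score, rel_count, weight=0.1):
    """Add a small log-scaled popularity bonus to the name-based score."""
    return name_score + weight * math.log10(1 + rel_count)


# (name, hypothetical name score, link attribute count)
candidates = [
    ("guitar family", 3.0, 1470),  # count quoted in the IRC log
    ("guitar", 3.0, 200000),       # hypothetical count
]

ranked = sorted(candidates, key=lambda c: boosted_score(c[1], c[2]), reverse=True)
print([name for name, _, _ in ranked])
```

With equal name scores the more widely used instrument sorts first, and the boost stays additive, so no negative or multiplicative boost is needed.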