
Rank instrument families lower in search results

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Normal
    • None
    • None
    • Schema
    • None

      When upgrading to Solr 9, we found that negative boosts don't work. The instrument core uses a negative boost to move a search result further down the list when the instrument is a "Family", e.g. "trumpet family" vs "trumpet".

      We tried instead to apply a boost to all non-matching items, e.g. -type:Family^5, but this sets the score of the Family item to 0 and omits it from the search results, so we can't use this technique.

      We tried to use a boost query (bq) with a weight between 0 and 1 (e.g. 0.1 or 0.01), but the Solr documentation indicates that bq is additive, so the final score is always (query score + boost score). To apply a multiplicative boost (where the score is multiplied by a factor), you must use the {!boost} query parser or the boost parameter to edismax. We don't want to switch directly to edismax at the moment.
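To make the additive vs multiplicative difference concrete, here is a minimal sketch of the two scoring rules in plain Python rather than Solr. The scores, the 0.1 weight and the function names are made up for illustration, and the additive case assumes a matching boost clause contributes exactly its weight, which simplifies how Solr actually scores a bq clause.

```python
# Toy model of additive (bq-style) vs multiplicative ({!boost}-style) boosts.
# All numbers are hypothetical; real Solr scores depend on the similarity model.


def additive_boost(query_score, is_family, weight):
    """Add the boost clause's contribution when the clause matches."""
    # Simplifying assumption: a matching clause contributes exactly `weight`.
    return query_score + (weight if is_family else 0.0)


def multiplicative_boost(query_score, is_family, factor):
    """Multiply the query score by `factor` when the document is a family."""
    return query_score * (factor if is_family else 1.0)


# Same hypothetical name-match score for both documents.
docs = {"trumpet": (2.0, False), "trumpet family": (2.0, True)}

additive = {n: additive_boost(s, fam, 0.1) for n, (s, fam) in docs.items()}
multiplied = {n: multiplicative_boost(s, fam, 0.1) for n, (s, fam) in docs.items()}

print(additive)
print(multiplied)
```

In this toy model a weight of 0.1 on type:Family still nudges the family upward, because anything added is non-negative, while a factor of 0.1 pushes it down.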

      We were able to simulate a low boost by sorting first by abs(sub(1,termfreq(type,'Family'))) (this is 0 if 'Family' is in the type field, 1 otherwise) and then by score. But the results still look weird. E.g. "baritone" has the word "trumpet" in its description, so it has some score (we weight on the description field), but the result "trumpet family" is then sorted below it.
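The effect of that two-level sort can be reproduced with a small Python sketch. The documents, type names and scores below are invented for illustration, and non_family_flag is a hypothetical stand-in for abs(sub(1,termfreq(type,'Family'))), assuming the term occurs at most once in the type field.

```python
# Simulate "sort by family flag, then by score" on made-up search results.
results = [
    {"name": "trumpet", "type": "Wind instrument", "score": 5.0},
    {"name": "trumpet family", "type": "Family", "score": 4.5},
    # Weak match: "trumpet" appears only in the description (hypothetical score).
    {"name": "baritone", "type": "Wind instrument", "score": 0.8},
]


def non_family_flag(doc):
    """Return 0 for families and 1 otherwise, like the sort function."""
    return 0 if doc["type"] == "Family" else 1


# Sort descending on the flag first, then descending on score.
ranked = sorted(
    results,
    key=lambda doc: (non_family_flag(doc), doc["score"]),
    reverse=True,
)
names = [doc["name"] for doc in ranked]
print(names)
```

Because the flag dominates the sort key, every family lands below every non-family result, no matter how weak the non-family match is.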

      At the moment we have simply removed this boost flag and the results still look OK, but we should see if there is a way of moving the search results for Family types a bit further down.

          [SEARCH-686] Rank instrument families lower in search results

          We discussed this in IRC. The area core has an artist_count field that is computed in sir: https://github.com/metabrainz/sir/blob/47c50f422b5681c1169a60b78bf3ebbd1d4138db/sir/schema/modelext.py#L42

          We would compute something similar for instruments. reosarevok suggests:

          SELECT i.name, count(*) AS count, i.gid
          FROM instrument i
          JOIN link_attribute_type lat ON i.gid = lat.gid
          JOIN link_attribute la ON la.attribute_type = lat.id
          GROUP BY i.gid, i.name
          ORDER BY count DESC;

           
          3:39 PM <reosarevok> That's not exactly the same as how many rels use the instrument (link can be reused) but I expect it's close enough and a much simpler query. bitmap: does that seem legit for instrument popularity?
          3:40 PM <reosarevok> The first family is guitar family | 1470 | f68936f2-194c-4bcd-94a9-81e1dd947b8d - much lower than most important guitars
          3:42 PM <reosarevok> I think so
          3:42 PM <reosarevok> I see we already use rels elsewhere: recording_count = column_property(select([func.count(LinkRecordingWork.id)]).where(LinkRecordingWork.work_id == Work.id))
          3:43 PM <reosarevok> Although that's for actual relationships rather than attributes (which is less relevant for instrument)
           
          And this would allow us to continue using additive boosts, as all instruments would have a count which would allow us to break ties when the score based on instrument name is the same.

          (Comment above added by Alastair Porter.)
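The tie-breaking idea from the IRC discussion could look roughly like the following Python sketch. Only the guitar family count (1470) comes from the log above. The other count, the name scores, the 0.01 weight and the log damping are assumptions chosen for illustration, not how sir or the search server currently scores instruments.

```python
import math


def combined_score(name_score, attribute_count, weight=0.01):
    """Add a small, log-damped popularity term to the name-match score."""
    # Hypothetical formula: the weight is kept small so popularity only
    # separates documents whose name scores are (nearly) equal.
    return name_score + weight * math.log10(1 + attribute_count)


candidates = [
    # Count quoted in the IRC log above.
    {"name": "guitar family", "name_score": 3.0, "count": 1470},
    # Hypothetical count for a frequently used instrument.
    {"name": "guitar", "name_score": 3.0, "count": 50000},
]

ranked = sorted(
    candidates,
    key=lambda doc: combined_score(doc["name_score"], doc["count"]),
    reverse=True,
)
ranked_names = [doc["name"] for doc in ranked]
print(ranked_names)
```

With equal name scores the more frequently used instrument sorts first, and since the boost stays additive no document is pushed to a score of 0.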

          Do we use popularity for instruments? If we could use popularity boosts based on how often they're used as attributes (rather than in relationships) then we could probably skip negative boosts for family anyway, since specific family members probably get used a lot more (or should be used more at least).

          (Comment above added by Nicolás Tamargo.)

            Unassigned Unassigned
            alastairp Alastair Porter