MusicBrainz Search Server: SEARCH-167

Artist search should deal better with misspelt artist names entered into basic artist search

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Normal
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2012-03-23
    • Component/s: None
    • Labels:
      None

      Description

      Several search queries seem to return results that are not nearly as good as they could be. It seems to me that some of the techniques described below for improving search results should be applied by default.

      In particular:
      The results of a search for "rudy wiedoeft" do not include "rudy wiedoeft's Californians" or "rudy wiedoeft's palace trio" on the first page; instead, they appear on page five. I would expect a single-character difference to appear much higher in the search results.

      Similarly, a search for "rudy wied" should return all of the above much closer to the top than page five.

      And, "rudy green" does not return "rudy greene" anywhere near the top of the results, despite the one-character difference.

      These results can be improved with the following advanced-search queries:
      (rudy green) OR (rudy* green) OR (rudy green*)

      (rudy wied) OR (rudy* wied) OR (rudy wied*)
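
      The expansion used in the two queries above can be generated mechanically. The following is a minimal Python sketch, not code from the search server; the helper name `wildcard_variants` is hypothetical, and it assumes the query is split into words on whitespace only. It keeps the query as typed and ORs it with one variant per word, where that word gets a trailing `*` (prefix match).

```python
def wildcard_variants(query: str) -> str:
    """Hypothetical helper: OR the plain query with one-wildcard variants.

    The first clause is the query unchanged; each further clause appends a
    trailing '*' (prefix match) to exactly one of the words.
    """
    words = query.split()  # whitespace split only
    if not words:
        return ""
    clauses = [" ".join(words)]
    for i in range(len(words)):
        clauses.append(" ".join(w + "*" if j == i else w for j, w in enumerate(words)))
    return " OR ".join(f"({c})" for c in clauses)


print(wildcard_variants("rudy green"))  # (rudy green) OR (rudy* green) OR (rudy green*)
print(wildcard_variants("rudy wied"))   # (rudy wied) OR (rudy* wied) OR (rudy wied*)
```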

      "rudy* green*" gives great results compared to "rudy green"

      A simple spelling mistake can turn a great search result into a terrible one:
      when searching for "go-cart mozart", one would expect to find go-kart mozart, but the results contain nothing useful.

      Changing the query to
      (go cart mozart) OR (go* cart mozart) OR (go cart* mozart) OR (go cart mozart*)
      gives great results, suggesting that hyphens need to be treated as word breaks.
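
      The effect of treating hyphens as word breaks can be illustrated with a small tokenizer. This is a minimal Python sketch under the assumption that queries and names are compared token by token; `tokenize` is a hypothetical helper, not the analyzer the search server actually uses. With whitespace-only splitting, the query and the artist name share only "mozart"; splitting on hyphens as well makes "go" a shared token too.

```python
import re


def tokenize(text: str) -> list[str]:
    """Hypothetical helper: lower-case, then split on whitespace *and* hyphens.

    Illustrates the proposed word breaking only; it is not the search
    server's real analyzer.
    """
    return [w for w in re.split(r"[\s\-]+", text.lower()) if w]


query, artist = "go-cart mozart", "go-kart mozart"

# Whitespace-only splitting: only "mozart" is shared.
print(set(query.split()) & set(artist.split()))              # {'mozart'}
# Hyphen-aware splitting: "go" and "mozart" are both shared.
print(sorted(set(tokenize(query)) & set(tokenize(artist))))  # ['go', 'mozart']
```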

      Even simply switching to advanced search on a total misspelling can improve things:

      With simple search, "aaron lebedeef" does not return the desired artist[1] anywhere near the top of the results (it is not even on the first page).

      An advanced search for the same query returns it as the ninth result.

      Using the previously described technique improves it even further: (aaron lebedeef) OR (aaron* lebedeef) OR (aaron lebedeef*) returns it as the seventh result.

      Applying fuzzy search to all of the above queries improves many of the results even further:
      "rudy~ wiedoeft~" is great.
      "rudy~ wied~" is not so great, but no worse.
      "rudy~ green~" is great.
      "go-cart~ mozart~" is not so great, but no worse.
      "aaron~ lebedeef~" is great.

      Combining all techniques works out the best, though:
      (rudy~ wiedoeft~) OR (rudy wiedoeft*) OR (rudy* wiedoeft): great
      (rudy~ wied~) OR (rudy wied*) OR (rudy* wied): great
      (rudy~ green~) OR (rudy green*) OR (rudy* green): great.
      (go~ cart~ mozart~) OR (go*~ cart mozart) OR (go cart*~ mozart) OR (go cart mozart*~): great
      (aaron~ lebedeef~) OR (aaron lebedeef*) OR (aaron* lebedeef): great
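
      The `~` suffix in these queries is Lucene's fuzzy operator: it matches indexed terms within a small edit distance of the query term (the exact threshold, and whether a transposition counts as one edit, depend on the Lucene version). The minimal sketch below uses plain Levenshtein distance as a stand-in, with a hypothetical `max_edits=2` cut-off, to show why "green"/"greene" and "cart"/"kart" are close (one edit each) while "wied" is four edits away from "wiedoeft". That is consistent with the report that "rudy~ wied~" alone is not so great and needs the wildcard clause.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def is_fuzzy_match(term: str, candidate: str, max_edits: int = 2) -> bool:
    # Hypothetical threshold; Lucene's actual rule depends on its version.
    return levenshtein(term, candidate) <= max_edits


print(levenshtein("green", "greene"))      # 1
print(levenshtein("cart", "kart"))         # 1
print(is_fuzzy_match("wied", "wiedoeft"))  # False: 4 insertions needed
```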

      So it seems to me that:
      1. Advanced search should be on by default.
      2. Fuzzy matching of search terms should be on by default.
      3. Hyphens should break words.
      4. A combination of fuzzy and non-fuzzy matching should be performed by default: append a wildcard (and fuzziness) to each word separately, and OR those queries with a fuzzy search on all words.
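
      Proposals 2 to 4 can be read as a rewrite of the basic search string before it is handed to the advanced (Lucene-syntax) parser, which is proposal 1. The following is a hypothetical Python sketch of that rewrite, not the fix that was actually applied; the function names are made up. It follows the form of the rudy and aaron examples (a fuzzy clause on all words plus plain clauses with one wildcard each) rather than the `*~` variants used for go-cart mozart.

```python
import re


def tokenize(query: str) -> list[str]:
    # Proposal 3 (assumed behaviour): whitespace and hyphens both break words.
    return [w for w in re.split(r"[\s\-]+", query) if w]


def combined_query(query: str) -> str:
    """Hypothetical rewrite of a basic search string (proposals 2 to 4).

    Clause 1 makes every word fuzzy; each further clause keeps the words
    as typed but appends a trailing '*' to exactly one of them.
    """
    words = tokenize(query)
    if not words:
        return ""
    clauses = [" ".join(w + "~" for w in words)]
    for i in range(len(words)):
        clauses.append(" ".join(w + "*" if j == i else w for j, w in enumerate(words)))
    return " OR ".join(f"({c})" for c in clauses)


print(combined_query("rudy green"))
# (rudy~ green~) OR (rudy* green) OR (rudy green*)
print(combined_query("go-cart mozart"))
# (go~ cart~ mozart~) OR (go* cart mozart) OR (go cart* mozart) OR (go cart mozart*)
```

      A real implementation would also have to escape Lucene special characters in the user's input (such as `(`, `)`, `:` and `~`) before building the query string.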

      See http://chatlogs.musicbrainz.org/musicbrainz-devel/2011/2011-03/2011-03-09.html#T20-15-07-468551 for the conversation in which the problems were found and discussed.

      [1] http://test.musicbrainz.org/artist/911d1b3b-e93a-4896-9fda-42013b2c8a7e

              People

              • Assignee: Paul Taylor (ijabz)
              • Reporter: Alex Mauer (hawke)
              • Votes: 1
              • Watchers: 1
