MusicBrainz Search Server / SEARCH-314

Combining diacritics are not handled correctly

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Normal
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2014-10-08
    • Labels: None

      Description

      Compare https://beta.musicbrainz.org/search?query=b%C3%A4r&type=artist&method=indexed and https://beta.musicbrainz.org/search?query=ba%CC%88r&type=artist&limit=25&method=indexed
      The first one is using U+00E4 LATIN SMALL LETTER A WITH DIAERESIS, the second one is using U+0061 LATIN SMALL LETTER A followed by U+0308 COMBINING DIAERESIS. Unicode considers those as canonically equivalent, i.e. they should look and behave the same, so searching for either should find exactly the same results.
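
A minimal sketch of the canonical-equivalence point, using Python's standard `unicodedata` module. The helper name `canonical` is made up for illustration, and this is not the search server's actual code; it only shows that the two query strings differ as code point sequences but compare equal after normalization.

```python
import unicodedata

precomposed = "b\u00e4r"   # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "ba\u0308r"   # U+0061 followed by U+0308 COMBINING DIAERESIS


def canonical(text: str) -> str:
    """Return the NFC form, so canonically equivalent strings compare equal."""
    return unicodedata.normalize("NFC", text)


# The raw strings are different code point sequences...
assert precomposed != decomposed
# ...but they are identical once both are normalized.
assert canonical(precomposed) == canonical(decomposed)
```

Normalizing both the indexed text and the incoming query to the same form is one way to make the two searches behave identically.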

      Also compare https://beta.musicbrainz.org/search?query=%D0%A8%D0%BE%D1%81%D1%82%D0%B0%D0%BA%D0%BE%CC%81%D0%B2%D0%B8%D1%87&type=artist&method=indexed and https://beta.musicbrainz.org/search?query=%D0%A8%D0%BE%D1%81%D1%82%D0%B0%D0%BA%D0%BE%D0%B2%D0%B8%D1%87&type=artist&limit=25&method=indexed
      The first one has a combining diacritic (U+0301 COMBINING ACUTE ACCENT) which is not present at all in the second, but we ignore accents so those should also give the same results.
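
As an illustration of "we ignore accents", here is a rough sketch of accent folding: decompose to NFD, drop nonspacing marks, and recompose. The function name `fold_accents` is hypothetical and the approach is an assumption about what ignoring accents could mean, not the server's real filter. Note that this naive version also folds letters such as "й" to "и", which a real analyzer might want to avoid.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip nonspacing combining marks (category Mn) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return unicodedata.normalize("NFC", stripped)


with_stress = "Шостако\u0301вич"   # contains U+0301 COMBINING ACUTE ACCENT
without_stress = "Шостакович"

# Both spellings fold to the same search key.
assert fold_accents(with_stress) == fold_accents(without_stress)
```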

      What actually happens is that the combining diacritic is treated like a space, so the search finds "ba" and "r" in the first example and "Шостако" and "вич" in the second. A combining diacritic is not a word separator, though, and shouldn't be treated as one.
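
The splitting behaviour described above can be reproduced with a toy tokenizer. This is a sketch under assumptions: `tokenize` and its `keep_marks` flag are invented for illustration and do not reflect the search server's real tokenizer. With `keep_marks=False`, any character that is not a letter or digit ends the current token, which includes combining marks; with `keep_marks=True`, characters in the Unicode mark categories stay attached to the word.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def tokenize(text: str, keep_marks: bool) -> list[str]:
    """Split text into tokens of letters/digits, optionally keeping marks."""
    tokens: list[str] = []
    current: list[str] = []
    for ch in text:
        if ch.isalnum() or (keep_marks and is_mark(ch)):
            current.append(ch)
        elif current:
            # Any other character ends the current token.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


# The reported behaviour: the mark acts like a space.
assert tokenize("ba\u0308r", keep_marks=False) == ["ba", "r"]
# The expected behaviour: the mark stays inside the word.
assert tokenize("ba\u0308r", keep_marks=True) == ["ba\u0308r"]
```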

              People

              • Assignee: ijabz (Paul Taylor)
              • Reporter: nikki

                  Packages

                  Version Package
                  2014-10-08