-
Bug
-
Resolution: Fixed
-
Normal
Compare https://beta.musicbrainz.org/search?query=b%C3%A4r&type=artist&method=indexed and https://beta.musicbrainz.org/search?query=ba%CC%88r&type=artist&limit=25&method=indexed
The first one uses U+00E4 LATIN SMALL LETTER A WITH DIAERESIS; the second uses U+0061 LATIN SMALL LETTER A followed by U+0308 COMBINING DIAERESIS. Unicode considers these two sequences canonically equivalent, i.e. they should look and behave the same, so searching for either should return exactly the same results.
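To illustrate the equivalence, here is a minimal Python sketch using the standard `unicodedata` module. It is not the search server's code; the variable names are made up for this example. It only shows that the two spellings differ as raw code point sequences but compare equal once both are normalised to NFC.

```python
import unicodedata

# The two spellings of "bär" from the URLs above.
precomposed = "b\u00e4r"   # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "ba\u0308r"   # U+0061 followed by U+0308 COMBINING DIAERESIS

# Compared code point by code point, the strings are different.
raw_equal = precomposed == decomposed

# After normalising both to NFC they are identical.
nfc_equal = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

print(raw_equal, nfc_equal)
```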
Also compare https://beta.musicbrainz.org/search?query=%D0%A8%D0%BE%D1%81%D1%82%D0%B0%D0%BA%D0%BE%CC%81%D0%B2%D0%B8%D1%87&type=artist&method=indexed and https://beta.musicbrainz.org/search?query=%D0%A8%D0%BE%D1%81%D1%82%D0%B0%D0%BA%D0%BE%D0%B2%D0%B8%D1%87&type=artist&limit=25&method=indexed
The first one contains a combining diacritic (U+0301 COMBINING ACUTE ACCENT) that is not present at all in the second, but since we ignore accents, the two should also give the same results.
What actually happens, judging by the search results, is that the combining diacritic is treated like a space: the search matches "ba" and "r" in the first example and "Шостако" and "вич" in the second. A combining diacritic is not a word separator, though, and shouldn't be treated as one.
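The expected behaviour can be sketched in Python as follows. This is only an illustration under stated assumptions, not the actual search implementation: `fold_accents` and `tokenize` are hypothetical helpers, and the `\w+` tokenizer is a stand-in for whatever analyzer the search server uses. Python's `\w` happens not to match combining marks, so tokenizing the raw string reproduces the same kind of split described above, while folding accents first yields a single token.

```python
import re
import unicodedata


def fold_accents(text):
    """Hypothetical helper: drop combining marks so accents are ignored."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a non-zero class for marks such as
    # U+0308 COMBINING DIAERESIS and U+0301 COMBINING ACUTE ACCENT.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def tokenize(text):
    """Hypothetical tokenizer: fold accents first, then split into words."""
    return re.findall(r"\w+", fold_accents(text).lower())


# Tokenizing without folding splits at the combining mark.
naive_tokens = re.findall(r"\w+", "ba\u0308r")

# With folding, both spellings of each query produce the same single token.
bar_tokens = (tokenize("b\u00e4r"), tokenize("ba\u0308r"))
name_tokens = (tokenize("Шостако\u0301вич"), tokenize("Шостакович"))
```

Note that stripping every combining mark after NFD is a blunt form of accent folding: it also changes letters such as "й" (which decomposes to "и" plus a breve), so a real analyzer may want a more selective rule.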
- is related to
-
MBS-6010 Normalise text to NFC
- Closed