Type: Bug
Resolution: Fixed
Priority: Normal
Fix Version/s: 2014-10-08
Affects Version/s: None
Component/s: None
Labels: None

Compare https://beta.musicbrainz.org/search?query=b%C3%A4r&type=artist&method=indexed and https://beta.musicbrainz.org/search?query=ba%CC%88r&type=artist&limit=25&method=indexed
The first one uses U+00E4 LATIN SMALL LETTER A WITH DIAERESIS; the second one uses U+0061 LATIN SMALL LETTER A followed by U+0308 COMBINING DIAERESIS. Unicode considers these canonically equivalent, i.e. they should look and behave the same, so searching for either should return exactly the same results.
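
For illustration, the equivalence can be checked outside the search server. This is a minimal sketch using Python's standard `unicodedata` module; the variable names and the `nfc` helper are made up for the example and say nothing about how the search server handles text.

```python
import unicodedata

# The two query strings from the URLs above.
precomposed = "b\u00e4r"   # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "ba\u0308r"   # U+0061 followed by U+0308 COMBINING DIAERESIS


def nfc(text):
    """Return text in Unicode Normalization Form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", text)


# Code point by code point, the strings differ...
print(precomposed == decomposed)            # False
# ...but after normalisation they are identical.
print(nfc(precomposed) == nfc(decomposed))  # True
```

Normalising both the indexed text and the query to the same form would make the two queries indistinguishable, which is what canonical equivalence asks for.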

Also compare https://beta.musicbrainz.org/search?query=%D0%A8%D0%BE%D1%81%D1%82%D0%B0%D0%BA%D0%BE%CC%81%D0%B2%D0%B8%D1%87&type=artist&method=indexed and https://beta.musicbrainz.org/search?query=%D0%A8%D0%BE%D1%81%D1%82%D0%B0%D0%BA%D0%BE%D0%B2%D0%B8%D1%87&type=artist&limit=25&method=indexed
The first one has a combining diacritic (U+0301 COMBINING ACUTE ACCENT) which is not present at all in the second, but we ignore accents, so those should also give the same results.
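
As a rough sketch of what "ignoring accents" could mean here: decompose the text, drop every combining mark, and recompose. The `fold_accents` function is hypothetical and is not taken from the search server. It is also deliberately blunt: it removes any mark with a non-zero combining class, so it would turn "й" into "и" as well, which may or may not be wanted.

```python
import unicodedata


def fold_accents(text):
    """Strip combining marks from text (hypothetical helper, not server code)."""
    # NFD splits precomposed letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for characters that are not
    # combining marks, so this keeps only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


with_stress_mark = "Шостако\u0301вич"  # contains U+0301 COMBINING ACUTE ACCENT
without_mark = "Шостакович"

print(fold_accents(with_stress_mark) == without_mark)  # True
```

With folding like this applied to both the index and the query, the two Шостакович searches would be looking for the same term.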

What actually happens for the queries that contain a combining diacritic is that the diacritic is treated like a space, so the search matches "ba" and "r" in the first example and "Шостако" and "вич" in the second. A combining diacritic is not a word separator, though, and shouldn't be treated as one.
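
The same kind of split can be reproduced with any tokenizer that only accepts letters and digits as word characters. The sketch below is an analogy in Python, not the search server's actual tokenizer: `naive_tokens` relies on the regex class `\w`, which does not match combining marks, and `mark_aware_tokens` is a hypothetical alternative that keeps a combining mark attached to the word it appears in.

```python
import re
import unicodedata


def naive_tokens(text):
    """Split on anything that is not a regex word character."""
    # Combining marks are not matched by \w, so they end up acting
    # as separators between tokens.
    return re.findall(r"\w+", text)


def mark_aware_tokens(text):
    """Like naive_tokens, but combining marks stay inside the token."""
    tokens, current = [], []
    for ch in text:
        # A non-zero combining class means ch is a combining mark.
        if ch.isalnum() or unicodedata.combining(ch):
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


print(naive_tokens("ba\u0308r"))       # ['ba', 'r']
print(len(mark_aware_tokens("ba\u0308r")))        # 1
print(len(mark_aware_tokens("Шостако\u0301вич")))  # 1
```

Keeping the mark inside the token only fixes the splitting; the token would still need normalising or accent folding afterwards so that it matches the indexed form.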

is related to

MBS-6010 Normalise text to NFC

Closed

Assignee: Paul Taylor

Reporter: nikki

Votes: 1

Watchers: 1

Created: 2013-08-01 07:48

Updated: 2014-10-08 12:55

Resolved: 2014-10-01 09:39

Version	Package
2014-10-08
