
Guess Case incorrectly capitalizes Georgian script

    • Type: Improvement
    • Resolution: Won't Do
    • Priority: Normal
    • Component: Guess case

      Modern Georgian is usually written in a unicase script called Mkhedruli. Capital forms called Mtavruli exist, but they are not used to begin sentences or for proper nouns. Instead, Mtavruli is used like ALL CAPS, for headings, emphasis, etc.

      Unicode originally only encoded Mkhedruli, but with version 11 in 2018, Mtavruli were added as casing pairs with Mkhedruli.
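      The casing pair can be inspected directly. The following is a minimal Python sketch, assuming an interpreter whose Unicode data is version 11 or later (Python 3.7+); the variable names are only illustrative.

```python
import unicodedata

# Mkhedruli letter "an" (U+10D0), the ordinary form used in running text.
mkhedruli_an = "\u10d0"

# Since Unicode 11, uppercasing maps it to its Mtavruli counterpart.
mtavruli_an = mkhedruli_an.upper()

print(unicodedata.name(mkhedruli_an), "->", unicodedata.name(mtavruli_an))
print(f"U+{ord(mkhedruli_an):04X} -> U+{ord(mtavruli_an):04X}")
```

      On older Unicode data the letter has no uppercase mapping, so `upper()` would return it unchanged.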

      Currently the Guess Case feature changes the first letter of a word to Mtavruli when the word is written in Mkhedruli. Since Mtavruli is only used in special circumstances, excluding the Georgian script from Guess Case would work.
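      As a rough illustration of the proposed exclusion, here is a Python sketch. It is not the MusicBrainz Guess Case code; the function names are hypothetical, and the check simply treats U+10D0 to U+10FF in the Georgian block as Mkhedruli.

```python
def naive_capitalize(word: str) -> str:
    """Uppercase the first character, as a simple guess-case step might."""
    return word[:1].upper() + word[1:]


def is_mkhedruli(ch: str) -> bool:
    """Assumption: U+10D0..U+10FF covers the Mkhedruli letters."""
    return 0x10D0 <= ord(ch) <= 0x10FF


def capitalize_skipping_mkhedruli(word: str) -> str:
    """Leave words that start with a Mkhedruli letter untouched."""
    if word and is_mkhedruli(word[0]):
        return word
    return naive_capitalize(word)


word = "\u10d7\u10d1\u10d8\u10da\u10d8\u10e1\u10d8"  # "Tbilisi" in Mkhedruli
print(naive_capitalize(word))               # first letter becomes Mtavruli
print(capitalize_skipping_mkhedruli(word))  # unchanged
```

      Words in other scripts still go through the normal capitalization path.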

      See:
      https://en.wikipedia.org/wiki/Georgian_scripts#Mkhedruli
      https://en.wikipedia.org/wiki/Georgian_(Unicode_block)
      https://medium.com/@akakirazmadze/mtavruli-letters-5665d0aec09c

          [MBS-12290] Guess Case incorrectly capitalizes Georgian script

          pmepepnoute added a comment -

          Closing the issue I reported, since it's such a niche case.


          pmepepnoute added a comment -

          OK, that makes sense. The only reason I noticed is that I've been adding releases from a Georgian artist and had to research how capitalization works in Georgian. I think that almost everyone adding Georgian script releases would know the correct capitalization, and Georgian script releases are uncommon enough that there's probably no point in adding a special mode for them. (I assume that the Turkish mode is for the dotted and undotted "I"s?) I keep an eye on the languages and scripts page, so I'll fix any mistakes eventually.

          Playing around with the guess case function, it seems that it affects any character with Unicode case folding. (Tested using Safari and Chromium on the latest macOS.)


          Nicolás Tamargo added a comment -

          The main issue here is that we have no guess case for Georgian, so probably nobody should be using guess case for Georgian to begin with. Obviously, someone using the existing guess case modes "English", "French" or "Turkish" and expecting them to work for Georgian would be making a mistake. "Sentence" would be closer, but would still require a bit of manual amendment. We could, of course, consider having a Georgian guess case mode (and it might very well make sense, if it's as simple as "all lowercase"). But honestly, the easiest option, at least for now, might be to just have a Georgian guideline (we don't seem to have any at the moment). The guideline could additionally remind users that guess case will not give 100% correct results for Georgian, but that sentence mode will give the closest results and require manually lowercasing the first letter of each title.
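          If a Georgian mode really were as simple as "all lowercase", it could look roughly like the Python sketch below. This is only an assumption about what such a mode might do: the function name is hypothetical, and it lowercases only Mtavruli capitals (U+1C90 to U+1CBF), leaving every other script as entered.

```python
def georgian_mode(title: str) -> str:
    """Map Mtavruli capitals back to Mkhedruli; leave other characters alone."""
    return "".join(
        ch.lower() if 0x1C90 <= ord(ch) <= 0x1CBF else ch
        for ch in title
    )


# A title whose first letter was wrongly capitalized to Mtavruli.
wrong = "\u1c97\u10d1\u10d8\u10da\u10d8\u10e1\u10d8"
print(georgian_mode(wrong))
```

          Restricting the mapping to the Mtavruli range keeps Latin words in mixed-script titles untouched.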

            Assignee: Unassigned
            Reporter: pmepepnoute
            Votes: 0
            Watchers: 2
