[PICARD-167] Picard should handle non-UTF-8 locales better

      If you use Picard with a non-UTF-8 locale, saving files whose filenames would contain unsupported characters gives an error and fails to rename the file, e.g.

      UnicodeEncodeError: 'ascii' codec can't encode character u'\xeb' in position 30: ordinal not in range(128)

      UnicodeEncodeError: 'latin-1' codec can't encode characters in position 20-22: ordinal not in range(256)

      UnicodeEncodeError: 'euc_jp' codec can't encode character u'\uff5e' in position 35: illegal multibyte sequence

      I did see a message when using an ASCII locale (not for the others though), but I think it would be better if Picard could handle it more intelligently and replace any unsupported characters with something else instead of just having errors.

      Reported on behalf of someone who needed help on IRC: http://chatlogs.musicbrainz.org/musicbrainz/2012/2012-03/2012-03-10.html#T21-54-34-143034
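
The replacement behaviour suggested in the description could look roughly like the sketch below. It is written for Python 3 (the tracebacks in this ticket come from Python 2), and `make_encodable` and its `replacement` parameter are hypothetical names used for illustration, not part of Picard.

```python
def make_encodable(name, encoding, replacement="_"):
    """Return name with every character the target encoding cannot
    represent swapped for `replacement` (hypothetical helper)."""
    out = []
    for ch in name:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # The locale's charset has no representation for this character.
            out.append(replacement)
        else:
            out.append(ch)
    return "".join(out)
```

Checking one character at a time keeps everything the locale can represent and only substitutes the rest, so a Latin-1 locale would keep "ë" while an ASCII locale would get an underscore in its place.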


          Enjeck Cleopatra added a comment - - edited

          I attempted to reproduce this issue with Picard 2.0.4 on Windows 7. I renamed a track to "çéñ ", which consists entirely of non-ASCII characters. The track was renamed successfully, and clicking the save icon worked with no errors.


          jacobbrett added a comment - - edited

          I have a similar error in Picard 1.4.2:

          E: 09:26:29 Traceback (most recent call last):
            File "/usr/lib/picard/picard/album.py", line 184, in _release_request_finished
              parsed = self._parse_release(document)
            File "/usr/lib/picard/picard/album.py", line 149, in _parse_release
              add_release_to_user_collections(release_node)
            File "/usr/lib/picard/picard/collection.py", line 149, in add_release_to_user_collections
              (release_node.id, user_collections[node.id]))
          UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 22: ordinal not in range(128)
          

          The error only seems to occur when the album being loaded is linked to one or more user collections whose title contains at least one non-ASCII character; in my case, a Unicode 'right single quotation mark' (U+2019) triggers the error.
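
The message in this traceback is what Python produces whenever text containing U+2019 is pushed through the ASCII codec. Below is a minimal Python 3 sketch of the failure and of one possible way to make such text safe for an ASCII-only destination. `ascii_safe` and the sample title are hypothetical, and the actual statement in collection.py is only partly visible in the traceback, so this is not a patch for it.

```python
def ascii_safe(text):
    """Return text with non-ASCII characters shown as backslash escapes."""
    return text.encode("ascii", errors="backslashreplace").decode("ascii")


# Sample collection title containing a right single quotation mark.
title = "Bob\u2019s collection"

failure = None
try:
    title.encode("ascii")  # strict encoding fails on U+2019
except UnicodeEncodeError as exc:
    failure = str(exc)

# The escaped form contains only ASCII characters.
escaped = ascii_safe(title)
```

Escaping rather than dropping the character keeps the output unambiguous, which matters if the text ends up in a log.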


          Philipp Wolfer added a comment -

          At least related to PICARD-233, maybe even a duplicate

          Max Grender-Jones added a comment - - edited

          It seems this issue is sufficiently well understood that it is detected at startup - running from the console gives me:

          System locale charset is ANSI_X3.4-1968
          Your system's locale charset (i.e. the charset used to encode filenames)
          is set to ANSI_X3.4-1968. It is highly unlikely that this has been done
          intentionally. Most likely the locale is not set at all. An invalid setting
          will result in problems when creating data projects.
          To properly set the locale charset make sure the LC_* environment variables
          are set. Normally the distribution setup tools take care of this.

          Translation: Picard will have problems with non-english characters
          in filenames until you change your charset.

          Obviously fixing the locale is my problem, not Picard's...

          However, when I try to import files with accents in them, nothing happens in the UI, but I get:

            File "/usr/lib/python2.7/site-packages/picard/util/thread.py", line 58, in run_item
              result = func()
            File "/usr/lib/python2.7/site-packages/picard/tagger.py", line 377, in get_files
              root, dirs, files = walk.next()
            File "/usr/lib/python2.7/os.py", line 294, in walk
              for x in walk(new_path, topdown, onerror, followlinks):
            File "/usr/lib/python2.7/os.py", line 284, in walk
              if isdir(join(top, name)):
            File "/usr/lib/python2.7/posixpath.py", line 80, in join
              path += '/' + b
          UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 21: ordinal not in range(128)

          Is there any way the call to walk could be wrapped in a try/except block that reports back to the GUI layer, so that the error is at least easy to detect? Otherwise it looks like a Picard bug when it's really my bug...
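
The try/except idea could be sketched as follows. This is Python 3 and purely illustrative: `collect_files` and the `report_error` callback are hypothetical names, and the real `get_files` in tagger.py is not shown in this ticket. The callback stands in for whatever would surface the problem in the UI.

```python
import os


def collect_files(root, report_error):
    """Walk root and return the file paths found, passing any failure to
    report_error instead of letting it vanish in a worker thread."""
    found = []
    try:
        # os.walk hands directory-listing errors (OSError) to onerror.
        for dirpath, _dirnames, filenames in os.walk(root, onerror=report_error):
            found.extend(os.path.join(dirpath, name) for name in filenames)
    except UnicodeError as exc:
        # Encoding problems raised while joining paths end up here.
        report_error(exc)
    return found
```

Passing the callback in, rather than importing GUI code into the walker, keeps the directory scan usable from a background thread.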


          David Rodríguez added a comment -

          Hi, I'm having a similar (but, I think, stranger) issue. Picard can't read one specific album. It throws the same kind of error:

          davidr@pantani:~/as/Bongo Botrako - Revoltosa (2012)$ picard 1.mp3 
          E: 3076032192 12:13:27 Traceback (most recent call last):
            File "/usr/lib/picard/picard/util/thread.py", line 59, in run_item
              result = func()
            File "/usr/lib/picard/picard/formats/id3.py", line 180, in _load
              name = str(frame.desc.lower())
          UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 3: ordinal not in range(128)

          but my locale seems to be fine...

          davidr@pantani:~/as/Bongo Botrako - Revoltosa (2012)$ locale
          LANG=es_ES.UTF-8
          LANGUAGE=
          LC_CTYPE="es_ES.UTF-8"
          LC_NUMERIC="es_ES.UTF-8"
          LC_TIME="es_ES.UTF-8"
          LC_COLLATE="es_ES.UTF-8"
          LC_MONETARY="es_ES.UTF-8"
          LC_MESSAGES="es_ES.UTF-8"
          LC_PAPER="es_ES.UTF-8"
          LC_NAME="es_ES.UTF-8"
          LC_ADDRESS="es_ES.UTF-8"
          LC_TELEPHONE="es_ES.UTF-8"
          LC_MEASUREMENT="es_ES.UTF-8"
          LC_IDENTIFICATION="es_ES.UTF-8"
          LC_ALL=

          and there don't seem to be any strange characters in the filename or in the ID3 tags, which I deleted just in case.

          davidr@pantani:~/as/Bongo Botrako - Revoltosa (2012)$ id3 -l 1.mp3 
          1.mp3: No ID3 tag.

          The problematic files play perfectly fine and can be downloaded from here: http://archive.org/download/revoltosa/Bongo_Botrako_Revoltosa.zip

          Thanks a lot.
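
One possible explanation for seeing this error under a UTF-8 locale: in Python 2, calling `str()` on a unicode object encodes it with the interpreter's default encoding, which is ASCII regardless of the locale. The sketch below (Python 3, with the hypothetical helper `encoding_report`) lists the separate encoding settings involved, to show that the locale charset is only one of them.

```python
import locale
import sys


def encoding_report():
    """Collect the encoding settings Python consults in different places."""
    return {
        # Used for implicit str/bytes conversions; not taken from the locale.
        "default": sys.getdefaultencoding(),
        # Used to convert filenames to and from bytes.
        "filesystem": sys.getfilesystemencoding(),
        # Charset reported by the locale (the LC_* variables).
        "locale": locale.getpreferredencoding(False),
    }


report = encoding_report()
```

On Python 3 the `default` entry is always `utf-8`; on Python 2 it was `ascii`, which matches the codec named in the traceback above.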

          nikki added a comment -

          Two tickets from Trac: http://bugs.musicbrainz.org/ticket/5962 and http://bugs.musicbrainz.org/ticket/4353

          There seem to be two issues: reading files whose filenames are in the wrong encoding, and writing files with the wrong encoding. The latter could be fixed by replacing any unsupported characters (as I suggested in the ticket description), but that doesn't fix the former.
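
For the reading side, one technique available in Python 3 (not in the Python 2 code the tracebacks in this ticket come from) is the `surrogateescape` error handler, which lets undecodable filename bytes survive a round trip through text. The sketch below assumes an ASCII locale and a UTF-8 encoded name on disk; `decode_filename` and `encode_filename` are hypothetical helper names.

```python
def decode_filename(raw, encoding):
    """Decode filename bytes, smuggling undecodable bytes through as
    lone surrogate code points instead of raising."""
    return raw.decode(encoding, errors="surrogateescape")


def encode_filename(name, encoding):
    """Reverse decode_filename, restoring the original bytes exactly."""
    return name.encode(encoding, errors="surrogateescape")


raw = b"caf\xc3\xa9.mp3"                   # UTF-8 bytes as stored on disk
name = decode_filename(raw, "ascii")       # does not raise under ASCII
restored = encode_filename(name, "ascii")  # identical to raw
```

On POSIX systems, Python 3's `os.fsdecode` and `os.fsencode` apply the same handler with the filesystem encoding, so a file can still be opened even when its name cannot be displayed correctly.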


            Assignee: Unassigned
            Reporter: nikki
            Votes: 3
            Watchers: 4