[PICARD-167] Picard should handle non-UTF-8 locales better

Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Component/s: File Move & Rename
Labels:
- test-on-picard-2

If you use Picard with a non-UTF-8 locale, saving files whose filenames would contain unsupported characters gives an error and fails to rename the file, e.g.

UnicodeEncodeError: 'ascii' codec can't encode character u'\xeb' in position 30: ordinal not in range(128)

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 20-22: ordinal not in range(256)

UnicodeEncodeError: 'euc_jp' codec can't encode character u'\uff5e' in position 35: illegal multibyte sequence

I did see a message when using an ASCII locale (not for the others though), but I think it would be better if Picard could handle it more intelligently and replace any unsupported characters with something else instead of just having errors.
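The replacement idea above could be sketched roughly as follows. This is only an illustration, not Picard's code: `make_encodable` and its `replacement` parameter are hypothetical names, and the target encoding is assumed to be the locale's charset.

```python
def make_encodable(filename, encoding, replacement="_"):
    """Return filename with every character that `encoding` cannot
    represent swapped for `replacement` (hypothetical helper)."""
    out = []
    for ch in filename:
        try:
            ch.encode(encoding)
            out.append(ch)
        except UnicodeEncodeError:
            # The target charset has no representation for this character.
            out.append(replacement)
    return "".join(out)


print(make_encodable("Zo\u00eb.mp3", "ascii"))    # Zo_.mp3
print(make_encodable("Zo\u00eb.mp3", "latin-1"))  # unchanged, latin-1 has U+00EB
```

An underscore is used rather than the `?` that `errors="replace"` would produce, since `?` is not allowed in filenames on some platforms.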

This was reported by someone who needed help on IRC: http://chatlogs.musicbrainz.org/musicbrainz/2012/2012-03/2012-03-10.html#T21-54-34-143034

utf-test.PNG
54 kB
2018-11-02 18:09

has related issue

PICARD-1287 Picard crashes "Replace non-ASCII characters"

Open

is related to

PICARD-233 Can't deal with charset of file being different than charset of current application

Open

PICARD-3041 Open Containing Folder doesn't work if the path has grave (é or á) in it, over sshfs

Open

Enjeck Cleopatra added a comment - 2018-11-02 18:09 - edited

I attempted to recreate this issue on Picard 2.0.4 running on Windows 7. I renamed a track to "çéñ ", consisting entirely of non-ASCII characters. The track was successfully renamed, and clicking the save icon worked smoothly, with no errors.

jacobbrett added a comment - 2018-01-12 22:33 - edited

I have a similar error in Picard 1.4.2:

E: 09:26:29 Traceback (most recent call last):
  File "/usr/lib/picard/picard/album.py", line 184, in _release_request_finished
    parsed = self._parse_release(document)
  File "/usr/lib/picard/picard/album.py", line 149, in _parse_release
    add_release_to_user_collections(release_node)
  File "/usr/lib/picard/picard/collection.py", line 149, in add_release_to_user_collections
    (release_node.id, user_collections[node.id]))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 22: ordinal not in range(128)

The error seems to occur only when the album being loaded is linked to one or more user collections whose titles contain non-ASCII characters; in my case, a Unicode right single quotation mark (U+2019) triggers the error.
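For illustration, here is a small sketch of how a string containing U+2019 can be made safe for an ASCII-only output instead of raising. It is written for Python 3 and is not taken from Picard; `ascii_safe` is a hypothetical helper name.

```python
def ascii_safe(text):
    """Return text with non-ASCII characters written as backslash
    escapes, so it can always be encoded as ASCII (hypothetical helper)."""
    return text.encode("ascii", errors="backslashreplace").decode("ascii")


title = "Bob\u2019s favourites"

# A strict ASCII encode fails in the same way as in the traceback above.
try:
    title.encode("ascii")
except UnicodeEncodeError as exc:
    print("strict encode failed:", exc.reason)

print(ascii_safe(title))  # Bob\u2019s favourites
```

The escaped form is lossy for display purposes, but it keeps the message readable and avoids aborting the operation over a character in a collection title.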

Philipp Wolfer added a comment - 2014-12-09 14:27

At least related to PICARD-233, maybe even a duplicate

Max Grender-Jones added a comment - 2013-05-17 09:54 - edited

It seems this issue is sufficiently well understood that it is detected at startup - running from the console gives me:

System locale charset is ANSI_X3.4-1968
Your system's locale charset (i.e. the charset used to encode filenames)
is set to ANSI_X3.4-1968. It is highly unlikely that this has been done
intentionally. Most likely the locale is not set at all. An invalid setting
will result in problems when creating data projects.
To properly set the locale charset make sure the LC_* environment variables
are set. Normally the distribution setup tools take care of this.

Translation: Picard will have problems with non-English characters
in filenames until you change your charset.

Obviously fixing the locale is my problem, not Picard's...

However, when I try to import files with accents in them, nothing happens in the UI, but I get:

  File "/usr/lib/python2.7/site-packages/picard/util/thread.py", line 58, in run_item
    result = func()
  File "/usr/lib/python2.7/site-packages/picard/tagger.py", line 377, in get_files
    root, dirs, files = walk.next()
  File "/usr/lib/python2.7/os.py", line 294, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/usr/lib/python2.7/os.py", line 284, in walk
    if isdir(join(top, name)):
  File "/usr/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 21: ordinal not in range(128)

Could the call to walk be wrapped in a try/except block that reports back to the GUI layer, so that the error is at least easy to detect? Otherwise it looks like a Picard bug when it is really my bug...
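A minimal sketch of that try/except idea, written for Python 3. It does not reflect Picard's actual `get_files`; `collect_files` and the `report_error` callback are hypothetical names, with the callback standing in for whatever would notify the GUI layer.

```python
import os


def collect_files(root, report_error):
    """Walk `root` and return the file paths found. Errors are handed to
    `report_error` instead of being lost in a worker thread."""
    found = []
    # onerror receives OSError instances raised while listing directories.
    walker = os.walk(root, onerror=report_error)
    while True:
        try:
            dirpath, dirnames, filenames = next(walker)
        except StopIteration:
            break
        except UnicodeError as exc:
            # Encoding problems raised inside the walk itself end up here.
            report_error(exc)
            break
        found.extend(os.path.join(dirpath, name) for name in filenames)
    return found


errors = []
files = collect_files("/no/such/directory", errors.append)
print(files, len(errors))
```

Passing the callback in keeps the directory scan independent of the user interface, so the same function can report to a dialog, a status bar, or a log.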

David Rodríguez added a comment - 2013-05-01 10:33

Hi, I'm having a similar (but, I think, stranger) issue: Picard can't read one specific album. It throws the same kind of error:

davidr@pantani:~/as/Bongo Botrako - Revoltosa (2012)$ picard 1.mp3 
E: 3076032192 12:13:27 Traceback (most recent call last):
  File "/usr/lib/picard/picard/util/thread.py", line 59, in run_item
    result = func()
  File "/usr/lib/picard/picard/formats/id3.py", line 180, in _load
    name = str(frame.desc.lower())
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 3: ordinal not in range(128)

but my locale seems to be fine...

davidr@pantani:~/as/Bongo Botrako - Revoltosa (2012)$ locale
LANG=es_ES.UTF-8
LANGUAGE=
LC_CTYPE="es_ES.UTF-8"
LC_NUMERIC="es_ES.UTF-8"
LC_TIME="es_ES.UTF-8"
LC_COLLATE="es_ES.UTF-8"
LC_MONETARY="es_ES.UTF-8"
LC_MESSAGES="es_ES.UTF-8"
LC_PAPER="es_ES.UTF-8"
LC_NAME="es_ES.UTF-8"
LC_ADDRESS="es_ES.UTF-8"
LC_TELEPHONE="es_ES.UTF-8"
LC_MEASUREMENT="es_ES.UTF-8"
LC_IDENTIFICATION="es_ES.UTF-8"
LC_ALL=

and there don't seem to be any strange characters in the filename or in the ID3 tags, which I deleted just in case.

davidr@pantani:~/as/Bongo Botrako - Revoltosa (2012)$ id3 -l 1.mp3 
1.mp3: No ID3 tag.

The problematic files play perfectly fine and can be downloaded from here: http://archive.org/download/revoltosa/Bongo_Botrako_Revoltosa.zip

Thanks a lot.

nikki added a comment - 2012-06-05 17:45

Two tickets from Trac: http://bugs.musicbrainz.org/ticket/5962 and http://bugs.musicbrainz.org/ticket/4353

There seem to be two issues: reading files whose filenames are in the wrong encoding, and writing files whose names cannot be represented in the locale's encoding. The latter could be fixed by replacing any unsupported characters (as I suggested in the ticket description), but that doesn't fix the former.
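For the reading side, one possible approach in Python 3 is the `surrogateescape` error handler, which is also what `os.fsdecode` and `os.fsencode` use on POSIX systems. The sketch below is not Picard's implementation, and `decode_filename` and `encode_filename` are hypothetical names; it only shows that bytes the locale charset cannot decode survive a round trip.

```python
def decode_filename(raw, encoding):
    """Decode raw filename bytes, keeping undecodable bytes as lone
    surrogates rather than raising UnicodeDecodeError."""
    return raw.decode(encoding, errors="surrogateescape")


def encode_filename(name, encoding):
    """Reverse decode_filename, restoring the original bytes."""
    return name.encode(encoding, errors="surrogateescape")


raw = b"caf\xc3\xa9.mp3"              # UTF-8 bytes seen under an ASCII locale
name = decode_filename(raw, "ascii")  # no exception is raised
assert encode_filename(name, "ascii") == raw
```

The decoded name is not suitable for display as-is, but it lets the file be opened and renamed without losing track of its on-disk name.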
