Uploaded image for project: 'Picard'
  1. Picard
  2. PICARD-1375

Metadata sanitation before move-script execution fails

      When sanitation of metadata occurs before executing the move script (due to slash/backslash in any metadata tag), it fails with the following traceback:

       

      E: 11:47:25,245 /usr/lib/python3.7/site-packages/picard/util/thread.run:51: Traceback (most recent call last):
        File "/usr/lib/python3.7/site-packages/picard/util/thread.py", line 47, in run
          result = self.func()
        File "/usr/lib/python3.7/site-packages/picard/file.py", line 218, in _save_and_rename
          new_filename = self._rename(old_filename, metadata)
        File "/usr/lib/python3.7/site-packages/picard/file.py", line 380, in _rename
          self._make_filename(old_filename, metadata))
        File "/usr/lib/python3.7/site-packages/picard/file.py", line 349, in _make_filename
          new_filename = self._script_to_filename(naming_format, metadata, settings)
        File "/usr/lib/python3.7/site-packages/picard/file.py", line 305, in _script_to_filename
          for name in metadata.keys():RuntimeError: dictionary changed size during iteration
      
      

      This is an issue with different behavior of dict.keys() in Python 2 (creates copy of keys) and Python 3 (iterates over original keys), because the dictionary is being altered while iterating over it.

      It can be fixed (somewhat dirty) by converting keys to a list in file.py, _script_to_filename():

       

      # make sure every metadata can safely be used in a path name
              for name in list(metadata.keys()):
                  if isinstance(metadata[name], str):
      ...
      

       

      I don't have the time to look into this, but the issue could also affect further parts of Picard >2.0 where a dictionary is changed while iterating over it.

       

       

          [PICARD-1375] Metadata sanitation before move-script execution fails

          Fixed, will be released in 2.1.0

          Philipp Wolfer added a comment - Fixed, will be released in 2.1.0

          So I finally found at least a theoretical way to cause this issue:

          If the file metadata contains a tag with empty value then sanitation will set this tag again and remove it in this process.

          Now it is actually difficult to set an empty tag, because usually when you do something like metadata["sometag"] = None it will not be set, empty tags are not part of metadata. But if you do metadata.set("sometag", []) this will create an empty tag. We have other inconsistencies between these two functions, that are not obvious to a caller.

          So far I could not figure out a way to actually create an empty tag with Picard, but you somehow manage that Some questions:

          Some question to you:

          1. What plugins do you have active?
          2. Do you use the sort_multivalue_tags plugin (this is the only plugin I found uses metadata.set, but in a way that IMHO should be save)?
          3. What do you have set in options for preserve tags?

          So I see two potential fixes:

          1. For sanitation use metadata.set, e.g.:

          for name in metadata.keys():
          if isinstance(metadata[name], str):
          metadata.set(name, [sanitize_filename(metadata[name])])

          2. Remove the difference between metadata[] and metadata.set. Not sure why this is needed, eventual side effects have to be taken into account

          Philipp Wolfer added a comment - So I finally found at least a theoretical way to cause this issue: If the file metadata contains a tag with empty value then sanitation will set this tag again and remove it in this process. Now it is actually difficult to set an empty tag, because usually when you do something like metadata ["sometag"] = None it will not be set, empty tags are not part of metadata. But if you do metadata.set("sometag", []) this will create an empty tag. We have other inconsistencies between these two functions, that are not obvious to a caller. So far I could not figure out a way to actually create an empty tag with Picard, but you somehow manage that Some questions: Some question to you: 1. What plugins do you have active? 2. Do you use the sort_multivalue_tags plugin (this is the only plugin I found uses metadata.set, but in a way that IMHO should be save)? 3. What do you have set in options for preserve tags? So I see two potential fixes: 1. For sanitation use metadata.set, e.g.: for name in metadata.keys(): if isinstance(metadata [name] , str): metadata.set(name, [sanitize_filename(metadata [name] )]) 2. Remove the difference between metadata[] and metadata.set. Not sure why this is needed, eventual side effects have to be taken into account

          Thanks a lot for the feedback, interesting issue. I will take a closer look, but it might take a few days until I can get to this

          Philipp Wolfer added a comment - Thanks a lot for the feedback, interesting issue. I will take a closer look, but it might take a few days until I can get to this

          carstenl added a comment -

          Sorry for the delay. Work kept me busy for the last two weeks. I just had the chance to have a look at this issue again.

          The problem has indeed nothing to do with the initial title of the issue. I finally identified the tagger script that is to blame here:

          $if($eq(%_isAudiobook%,True), $set(_author, $trim($rsearch($if2(%_albumartists_sort%,%_artists_sort%),^[^;]*)) ) $if(%_author%, $set(_reader, $if($gt($sub($len(%albumartistsort%),$len($strip(%_author%))),0), $right(%albumartistsort%, $sub($len(%albumartistsort%),$len($strip(%_author%)))), ) ) ) )

          The script sets the author of audiobooks and extracts the reader from the albumartistsort tag (e.g. "Tolkien, J.R.R. read by Williamson, Nicol" gives _author="Tolkien, J.R.R." and _reader="read by Williamson, Nicol"). If there is no distinct reader the _reader tag is set to an empty string. Initially this works, despite the fact that the set-method apparently can't set empty strings but calls unset instead (see func_set in script.py).

          The part I don't understand happens during sanitation: Before sanitation the metadata dict contains the key "~reader", after sanitation it is missing. So it seems like _reader is unset during sanitation, but I don't get where that might happen.

          carstenl added a comment - Sorry for the delay. Work kept me busy for the last two weeks. I just had the chance to have a look at this issue again. The problem has indeed nothing to do with the initial title of the issue. I finally identified the tagger script that is to blame here: $ if ($eq(%_isAudiobook%,True), $set(_author, $trim($rsearch($if2(%_albumartists_sort%,%_artists_sort%),^[^;]*)) ) $ if (%_author%, $set(_reader, $ if ($gt($sub($len(%albumartistsort%),$len($strip(%_author%))),0), $right(%albumartistsort%, $sub($len(%albumartistsort%),$len($strip(%_author%)))), ) ) ) ) The script sets the author of audiobooks and extracts the reader from the albumartistsort tag (e.g. "Tolkien, J.R.R. read by Williamson, Nicol" gives _author="Tolkien, J.R.R." and _reader="read by Williamson, Nicol"). If there is no distinct reader the _reader tag is set to an empty string. Initially this works, despite the fact that the set-method apparently can't set empty strings but calls unset instead (see func_set in script.py). The part I don't understand happens during sanitation: Before sanitation the metadata dict contains the key "~reader", after sanitation it is missing. So it seems like _reader is unset during sanitation, but I don't get where that might happen.

          All right. Feel free to just share the scripts you are using and I can test locally. I will also try to test a bit again on my side.

          Philipp Wolfer added a comment - All right. Feel free to just share the scripts you are using and I can test locally. I will also try to test a bit again on my side.

          carstenl added a comment -

          It seems you were right, the issue could indeed go deeper.

          I did some more checks and now I think it has really nothing to do with slashes. But now I'm also more puzzled what does happen here.

          Like you already tested, renaming a single file with / in one of the visible tags works just fine, the same applies to multiple files.

          For me the error occurs for all audiobooks. In this case I create a hidden variable that contains slashes (To be honest, the variable is used nowhere. The script never worked the way I expected. Now that I know sanitation occurs I understand why... ). And because only for those values with slashes sanitation changes the metadata at all, the most natural thought was that the error must be connected to this hidden variable.

          But when removing this part of the script the error persists though there are no fields with slashes left. The exception is thrown both for single and multiple files.

          By now I suspect one of the other audiobook scripts is the culprit here. To investigate this further I have to rewrite it. I will look into this tomorrow, the scripting language is just too cumbersome for me tonight.

          carstenl added a comment - It seems you were right, the issue could indeed go deeper. I did some more checks and now I think it has really nothing to do with slashes. But now I'm also more puzzled what does happen here. Like you already tested, renaming a single file with / in one of the visible tags works just fine, the same applies to multiple files. For me the error occurs for all audiobooks. In this case I create a hidden variable that contains slashes (To be honest, the variable is used nowhere. The script never worked the way I expected. Now that I know sanitation occurs I understand why... ). And because only for those values with slashes sanitation changes the metadata at all, the most natural thought was that the error must be connected to this hidden variable. But when removing this part of the script the error persists though there are no fields with slashes left. The exception is thrown both for single and multiple files. By now I suspect one of the other audiobook scripts is the culprit here. To investigate this further I have to rewrite it. I will look into this tomorrow, the scripting language is just too cumbersome for me tonight.

          Philipp Wolfer added a comment - - edited

          Looking into this right now, but I am a bit puzzled because I don't see how the size of the list should change in this loop. It's just changing the data, and for that the code is fine. I cannot reproduce it when saving a single file with renaming enabled and tags containing \ or /. The sanitation works, no exception thrown. Maybe something threading related? Can you reproduce this with a single file or only when saving multiple?

          Apart from that you fix would be fine for cases where the actual size of the dict changes, nothing dirty about it. We had a couple of such issues after the Python 3 migration

          I might just apply your fix anyway, but I actually prefer to be able to reproduce an issue. I have the suspicion here that the actual issue goes somewhat deeper.

          Philipp Wolfer added a comment - - edited Looking into this right now, but I am a bit puzzled because I don't see how the size of the list should change in this loop. It's just changing the data, and for that the code is fine. I cannot reproduce it when saving a single file with renaming enabled and tags containing \ or /. The sanitation works, no exception thrown. Maybe something threading related? Can you reproduce this with a single file or only when saving multiple? Apart from that you fix would be fine for cases where the actual size of the dict changes, nothing dirty about it. We had a couple of such issues after the Python 3 migration I might just apply your fix anyway, but I actually prefer to be able to reproduce an issue. I have the suspicion here that the actual issue goes somewhat deeper.

            outsidecontext Philipp Wolfer
            carstenl carstenl
            Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              Resolved:

                Version Package
                2.1