Picard / PICARD-905

Metadata summarisation of many albums makes Picard sluggish

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Normal
    • Fix Version: None
    • Affects Version: 1.3.2
    • Component: User Interface

      When a large number of albums/files is loaded, the UI becomes sluggish if you select many of them, e.g. in order to save them.

      The reason is the metadata analysis done for common tags when multiple items are selected, which has to loop through every metadata item (both original and new) for every selected track.

      Performance could be improved by an order of magnitude if, under the covers, Picard maintained such a summary at the album level, so that it looped only over albums (except where an album is expanded in the UI and not all of its tracks are selected). Assuming 20 tracks per album, we should expect this to be 15-20 times faster than at present.
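      The idea above can be sketched as follows. This is a minimal illustration, not Picard's actual code: the class names and the flat dict-of-tags track representation are hypothetical, and Picard's real Metadata objects are richer.

```python
from collections import Counter

def album_tag_summary(album_tracks):
    """Count (tag, value) pairs once per album, so a selection-wide
    summary can merge a handful of per-album Counters instead of
    re-reading every track's metadata."""
    summary = Counter()
    for track in album_tracks:
        for tag, value in track.items():
            summary[(tag, value)] += 1
    return summary

def common_tags(album_summaries, total_tracks):
    """Tags whose single value appears on every selected track,
    computed from cached per-album summaries."""
    merged = Counter()
    for s in album_summaries:
        merged.update(s)
    return {tag: value for (tag, value), n in merged.items()
            if n == total_tracks}
```

      With summaries cached per album, selecting 50 fully-selected albums merges 50 Counters rather than iterating over ~1000 tracks.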

      Of course, this requires maintaining the per-album summary whenever the metadata of any track changes, and you wouldn't want to do that for every individual metadata change (e.g. as each webservice call completes), so it probably needs a flag saying that the album summary is stale, plus a timer triggered, say, 100ms after the first metadata change; the technicalities are therefore non-trivial.
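      A minimal sketch of this dirty-flag-plus-deferred-rebuild pattern, with all names hypothetical (in real Picard the tick would be driven by something like a ~100ms single-shot UI timer):

```python
class AlbumSummaryCache:
    """Metadata changes only mark an album stale (cheap); the
    expensive rebuild runs later from a deferred timer tick,
    coalescing bursts of changes such as those produced when
    webservice calls complete."""

    def __init__(self, rebuild):
        self._rebuild = rebuild      # expensive per-album summariser
        self._dirty = set()
        self._summaries = {}

    def mark_dirty(self, album_id):
        # safe to call once per individual tag change
        self._dirty.add(album_id)

    def on_timer_tick(self, albums):
        # called from the deferred timer: rebuild only stale albums
        for album_id in self._dirty:
            self._summaries[album_id] = self._rebuild(albums[album_id])
        self._dirty.clear()

    def summary(self, album_id):
        return self._summaries.get(album_id)
```

      The key property is that N metadata changes between ticks cost N set inserts but only one rebuild.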

      However, this would be a useful improvement to include in Picard 1.4.


          Sophist added a comment -

          As I noted on PICARD-2811, the metadata summarisation code should do a microsleep (of 1ms) after processing each selected file (or each X selected files) in order to allow I/O related threads to get a better chance of grabbing the GIL.

          This is a better solution than trying to return control to the main thread, and should be pretty darned easy to implement.
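          A sketch of the microsleep suggestion, under assumptions: the function name, the toy tag-merging, and the batch_size/pause values are illustrative, not Picard's actual code.

```python
import time

def summarize_with_yield(files, batch_size=50, pause=0.001):
    """After each batch of selected files, sleep ~1ms so that
    I/O-bound threads get a better chance to acquire the GIL
    while the summarisation loop runs."""
    summary = {}
    for i, metadata in enumerate(files, 1):
        for tag, value in metadata.items():
            summary.setdefault(tag, set()).add(value)
        if i % batch_size == 0:
            time.sleep(pause)   # briefly release the GIL
    return summary
```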


          Sophist added a comment - edited

          Philipp - Thanks!!!!!!

          Gabriel - if you get a chance to measure the performance change from https://github.com/metabrainz/picard/pull/1780 that would be useful feedback.


          Philipp Wolfer added a comment -

          Thanks, that solution looks good. I played the coding monkey and added your code to https://github.com/metabrainz/picard/pull/1780 . Let's track that in a separate issue, PICARD-2174, so we can keep this one open.

          Changes to ID3 writing are actually not needed, as that is already handled by the calls to update_to_v23. The new_metadata implementation was always about having these differences from the internal representation properly displayed in the metadata view.

          This needs some more testing to make sure we don't regress on something we haven't considered yet. But with this more efficient solution we could actually make wider use of it: we might have a couple of other cases where stored data differs from the internal representation due to format limitations.

          Sophist added a comment -

          Note: Removing the metadata object copy should improve performance, but this ticket should remain open, as the algorithmic change proposed here is still worth pursuing.


          Sophist added a comment - edited

          @outsidecontext Actually - yes - I agree. And I can immediately see the performance issue with making copies of the metadata object for every file in the metadata summarisation. 

          I don't have a development environment on my computer at present and don't have time to create one, but I would suggest the following code as a functionally equivalent replacement to new_metadata which avoids copying the metadata object:

          In file.py, replace all references to self.new_metadata with self.metadata, and replace lines 159-161 with:

          def format_specific_metadata(self, metadata, tag):
              return metadata.getall(tag)
          

          In id3.py, replace lines 653-682 with:

          def format_specific_metadata(self, metadata, tag):
              config = get_config()
              if not config.setting["write_id3v23"]:
                  return metadata.getall(tag)
          
              join_with = config.setting["id3v23_join_with"]
          
              values = metadata.getraw(tag)
              if values is None:
                  return None
          
              if tag == "originaldate":
                  values = [v[:4] for v in values]
              elif tag == "date":
                  values = [(v[:4] if len(v) < 10 else v) for v in values]
          
              # If this is a multi-valued field, then it needs to be flattened,
              # unless it's TIPL or TMCL which can still be multi-valued.
              if (len(values) > 1 and tag not in ID3File._rtipl_roles
                      and not tag.startswith("performer:")):
                  values = [join_with.join(values)]
          
              return values
          

          Then in metadatabox.py _update_tags, replace line 589:

          new_metadata = file.new_metadata

          with

          new_metadata = file.metadata

          and lines 594/595:

          new_values = new_metadata.getall(name)
          orig_values = orig_metadata.getall(name)
          

          with:

          new_values = file.format_specific_metadata(new_metadata, name)
          orig_values = file.format_specific_metadata(orig_metadata, name)
          

          Hey presto - no object copying for metadatabox. 

          There will also need to be a change somewhere in file.py or id3.py to transform the data when saving files, and I think it is this:

          In id3.py, replace lines 426-427:

          for name, values in metadata.rawitems():
              values = [id3text(v, encoding) for v in values]
          

          with:

          for name in metadata.keys():
              values = [id3text(v, encoding) for v in self.format_specific_metadata(metadata, name)]
          


          Philipp Wolfer added a comment -

          @Sophist

          > because IMO it should not affect how the data is stored internally or displayed to the user - these should be format independent IMO.

          Actually, it should affect how the data is displayed if it makes a difference: the user expects the data to be stored as they see it. We have a couple of open issues that are directly related to this discrepancy for some ID3 v2.3 differences.

          > I am unclear how ID3 V2.3 vs. V2.4 should make a material difference.

          The code Gabriel has identified as the culprit is this one:

          https://github.com/metabrainz/picard/blob/master/picard/formats/id3.py#L654

          Actually I only stumbled over this myself last year when working on related code.

          It tries to solve exactly the issue of presenting the data for 2.3 in the proper way (it doesn't handle all cases, though).

          I am not surprised this causes performance issues, as it copies and modifies the metadata every time file.new_metadata is accessed. It has other issues too, such as new_metadata returning self.metadata for all formats but a copy for ID3v2.3. That's an easy source of bugs.

          That functionality clearly needs some refactoring.

          Sophist added a comment -

          Gabriel - there are some differences between ID3v2.3 and v2.4 support outside the ID3 file format code, to do with how multi-value fields are handled. We should probably look at localising that functionality to the ID3 format code, because IMO it should not affect how the data is stored internally or displayed to the user - these should be format independent.

          But aside from this minor point, I am unclear how ID3 v2.3 vs. v2.4 should make a material difference.

          The time taken for metadata updates when many tracks are selected comes simply from the need to iterate over all of them to summarise. I think the primary way to speed this up is to pre-calculate partial summaries per album, so that you iterate across albums plus potentially a few odd tracks, which should be at least an order of magnitude faster.

          But my knowledge of the code is years out of date and partially faded, and you may well be right about using a better / faster way of copying an object. 


          Sophist added a comment - edited

          Gabriel - I am not sure this is the right place to comment on this. This ticket is really focused on how metadata is summarised when you have a lot of albums loaded and selected (and I just changed the title to reflect this).

          As Philipp alluded, there have been a mass of performance improvements in recent months which have improved overall performance, but there are still lots of areas for improvement.

          So please feel free to submit a PR for what you have found. My advice from my own experience is to be careful and keep your changes small and self-contained. When I tried to make performance fixes, I found it was very easy to make Picard unreliable or to actually make performance worse.


          Gabriel Ferreira added a comment - edited

          The metadatabox should be way better with the coverart changes in https://tickets.metabrainz.org/browse/PICARD-2162 , but it still is pretty slow.

          I was looking into it and a major source of slowdown of metadatabox updates is in id3.new_metadata. 

          Replacing copy = Metadata(); copy.copy(self.metadata) with copy = python_copy(self.metadata) (where from copy import copy as python_copy) and commenting out the id3v23 handling halves the metadatabox processing time. I think caching the copy with the id3v23 adjustments applied is a good idea.
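          The caching idea can be illustrated with a small, hypothetical sketch (FileStub, set_tag and the plain-dict metadata are stand-ins; Picard's real Metadata class and ID3 handling are richer):

```python
from copy import copy

class FileStub:
    """Caches the new_metadata copy and invalidates it on change,
    so the shallow copy happens once per modification rather than
    on every access to new_metadata."""

    def __init__(self, metadata):
        self.metadata = metadata
        self._new_metadata_cache = None

    def set_tag(self, tag, value):
        self.metadata[tag] = value
        self._new_metadata_cache = None   # invalidate the cache

    @property
    def new_metadata(self):
        if self._new_metadata_cache is None:
            self._new_metadata_cache = copy(self.metadata)
            # ... ID3 v2.3 adjustments would be applied here ...
        return self._new_metadata_cache
```

          Repeated metadatabox reads then hit the cache; only an actual tag change pays for a new copy.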

          A different way to check the difference is to switch from ID3v2.3 to ID3v2.4 tags in the options window.


          Sophist added a comment - edited

          Give me a few days to try 2.4(.1) and I will let you know.

          However, this ticket is specific to summarising metadata when you select a large number of albums in the UI, so I suspect not.


            Assignee: Unassigned
            Reporter: Sophist
            Votes: 0
            Watchers: 4