Zapped: AcousticBrainz / AB-314

Don't allow more than X recordings in a data dump file

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Normal

      This will lead to more, smaller files in a single dump, which is a good thing.

      https://chatlogs.metabrainz.org/brainzbot/metabrainz/2017-10-24/?msg=4026918&page=2
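
      For illustration only, here is a minimal Python sketch of the chunking idea, assuming recordings arrive as (MBID, serialized JSON) pairs; all names below are hypothetical and not taken from the server code:

          import io
          import itertools
          import os
          import tarfile

          DUMP_CHUNK_SIZE = 100000  # hypothetical cap on recordings per archive

          def dump_in_chunks(recordings, dump_dir, chunk_size=DUMP_CHUNK_SIZE):
              """Write (mbid, json_bytes) pairs into numbered .tar.bz2 archives,
              starting a new archive whenever chunk_size recordings are reached."""
              recordings = iter(recordings)
              for part in itertools.count(1):
                  chunk = list(itertools.islice(recordings, chunk_size))
                  if not chunk:
                      break
                  path = os.path.join(dump_dir, "acousticbrainz-lowlevel-json-%d.tar.bz2" % part)
                  with tarfile.open(path, "w:bz2") as tar:
                      for mbid, doc in chunk:
                          info = tarfile.TarInfo(name="lowlevel/%s.json" % mbid)
                          info.size = len(doc)
                          tar.addfile(info, io.BytesIO(doc))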


          Param Singh added a comment -

          First cut for lowlevel json dumps: https://github.com/metabrainz/acousticbrainz-server/pull/241

          Alastair Porter added a comment -

          I'd like to run some tests to see how many files we should put in each archive. We currently have 6 million submissions, so a first dump will have 6m / x parts - does it matter if we put 100k files per archive and end up with 120 dump files (60 ll, 60 hl), or should we try and minimise this?
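
          As a quick back-of-the-envelope check of those numbers (not project code):

              import math

              TOTAL_SUBMISSIONS = 6000000

              for chunk_size in (100000, 200000, 500000, 1000000):
                  parts = int(math.ceil(TOTAL_SUBMISSIONS / float(chunk_size)))
                  # lowlevel + highlevel dumps double the file count
                  print("%7d per archive -> %2d parts each, %3d files total"
                        % (chunk_size, parts, 2 * parts))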

          Also test the file sizes of 200k, 500k, and 1m archives. Our original dump was 30GB. For archival purposes it probably doesn't matter if we're around this size, but smaller files are less unwieldy for people to download.

          Check how many submissions we have on average per month. If, for example, we have ~110k, I'd prefer to set the limit higher so that we don't always create a 100k archive plus a 10k archive each month.
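
          One way to check that, assuming the lowlevel table has a submitted timestamp column (sketch only, psycopg2-style connection):

              def average_monthly_submissions(connection):
                  """Average number of submissions per month."""
                  with connection.cursor() as cur:
                      cur.execute("""
                          SELECT avg(monthly.submissions) FROM (
                              SELECT count(*) AS submissions
                                FROM lowlevel
                            GROUP BY date_trunc('month', submitted)
                          ) AS monthly
                      """)
                      return cur.fetchone()[0]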

          Some variation of this code could probably also be used for AB-97: if we have the ability to dump X files without writing a record to the incremental dumps table, then we can use this dump function to make a sample archive.
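
          One possible shape for that reuse, with hypothetical names throughout (dump_in_chunks is the sketch from the description above):

              def dump_lowlevel_json(location, max_recordings=None, record_dump=True):
                  """Dump up to max_recordings submissions into archives at `location`.

                  With record_dump=False, nothing is written to the incremental
                  dumps table, so the output can double as a standalone sample
                  archive (the AB-97 use case)."""
                  recordings = fetch_recordings(limit=max_recordings)  # hypothetical helper
                  dump_in_chunks(recordings, location)
                  if record_dump:
                      add_dump_entry()  # hypothetical: register dump for incrementals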


            Assignee: Unassigned
            Reporter: iliekcomputers (Param Singh)
            Votes: 0
            Watchers: 2
