ListenBrainz / LB-1357

Support importing local dumps into Spark to improve development experience


    • Type: Improvement
    • Resolution: Fixed
    • Priority: Normal
    • Component/s: data dumps

      Motivation

      Currently, there are two Spark management commands to download either a full or an incremental listen dump from the MetaBrainz FTP server and import it into Spark.

      At the same time, the dump manager within the ListenBrainz server supports creating dumps through the create_full or create_incremental commands. These dumps are stored inside a listenbrainz-export folder that is also mounted into the Spark container(s) at /rec/listenbrainz-export.
      If you want to import those dumps into Spark, e.g. for working on stats computation with your own, known data, you currently have to make them available through an FTP server and have the importer "download" them (i.e. copy them locally) first.

      Solution

      Additional request_import_full_local and request_import_incremental_local Spark commands could search the export directory for dumps, pick the latest one (or one specified by ID), and import it without copying the archive first.
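
      A minimal sketch of how the local commands might select an archive; the archive naming scheme and the find_local_dump helper are assumptions for illustration, not existing ListenBrainz code:

      import os
      import re
      from typing import Optional

      # Mount point of the export folder inside the Spark container, per the ticket.
      EXPORT_DIR = "/rec/listenbrainz-export"

      # Assumed archive naming scheme carrying the dump ID and type, e.g.
      # listenbrainz-dump-717-20200602-180408-full.tar.xz
      DUMP_NAME_RE = re.compile(r"listenbrainz-dump-(\d+)-.*-(full|incremental)\.tar\.xz")

      def find_local_dump(dump_type: str, dump_id: Optional[int] = None) -> str:
          """Return the path of the latest (or the explicitly requested) local dump archive."""
          candidates = {}
          for name in os.listdir(EXPORT_DIR):
              match = DUMP_NAME_RE.fullmatch(name)
              if match and match.group(2) == dump_type:
                  candidates[int(match.group(1))] = os.path.join(EXPORT_DIR, name)
          if not candidates:
              raise FileNotFoundError(f"no {dump_type} dumps found in {EXPORT_DIR}")
          chosen_id = dump_id if dump_id is not None else max(candidates)
          if chosen_id not in candidates:
              raise FileNotFoundError(f"dump with ID {chosen_id} not found in {EXPORT_DIR}")
          return candidates[chosen_id]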

      This could be implemented by further generalizing the ListenBrainzFTPDownloader to provide functions for listing files and picking a specific dump archive. The FTP version would then download the archive to a temporary folder and return its path (as it does now), while the local version would return the archive path directly.
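
      One shape this generalization could take, as a rough sketch; the class and method names below are illustrative and do not reflect the actual ListenBrainzFTPDownloader API:

      import os
      import tempfile
      from abc import ABC, abstractmethod

      EXPORT_DIR = "/rec/listenbrainz-export"  # mounted export folder, per the ticket

      class DumpArchiveSource(ABC):
          """Shared interface for locating and fetching dump archives (sketch)."""

          @abstractmethod
          def list_archives(self) -> list[str]:
              """Return the names of all available dump archives."""

          @abstractmethod
          def fetch_archive(self, name: str) -> str:
              """Make the named archive available on the local filesystem and return its path."""

      class FTPArchiveSource(DumpArchiveSource):
          """FTP variant: downloads the archive into a temporary folder, as the importer does today."""

          def __init__(self, ftp_connection):
              self.ftp = ftp_connection  # an ftplib.FTP instance, assumed connected

          def list_archives(self) -> list[str]:
              return self.ftp.nlst()

          def fetch_archive(self, name: str) -> str:
              dest = os.path.join(tempfile.mkdtemp(), name)
              with open(dest, "wb") as f:
                  self.ftp.retrbinary(f"RETR {name}", f.write)
              return dest

      class LocalArchiveSource(DumpArchiveSource):
          """Local variant: archives already live in the mounted export folder, so no copy is needed."""

          def list_archives(self) -> list[str]:
              return os.listdir(EXPORT_DIR)

          def fetch_archive(self, name: str) -> str:
              return os.path.join(EXPORT_DIR, name)

      With a shared interface like this, the FTP and local import commands could run the same import logic and differ only in which archive source they instantiate.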

            Assignee: Maxr1998
            Reporter: Maxr1998
            Votes: 0
            Watchers: 1
