Type: Improvement
Resolution: Fixed
Priority: Normal
Motivation
Currently, there are two Spark management commands that download either a full or an incremental listen dump from the MetaBrainz FTP server and import it into Spark.
At the same time, the dump manager within the ListenBrainz server supports creating dumps through the create_full and create_incremental commands. These dumps are stored inside a listenbrainz-export folder that is also mounted into the Spark container(s) at /rec/listenbrainz-export.
If you want to import those dumps into Spark, e.g. for working on stats computation with your own, known data, you currently have to make them available through an FTP server first and then have the importer "download" them (i.e. copy them locally), even though they are already present on disk.
Solution
Additional request_import_full_local and request_import_incremental_local Spark commands could search for dumps in the export directory, pick the latest dump (or one specified by ID), and import it without having to copy the archive first.
This could be implemented by further generalizing the ListenBrainzFTPDownloader so that it exposes functions to list available files and to pick a specific dump archive. The FTP version would then download the archive to a temporary folder and return its path (as it does now), while the local version would directly return the archive's path inside the export directory.