ListenBrainz / LB-1357

Support importing local dumps into Spark to improve development experience


    • Type: Improvement
    • Resolution: Fixed
    • Priority: Normal
    • Component/s: data dumps

      Motivation

      Currently, there are two Spark management commands to download either a full or an incremental listen dump from the MetaBrainz FTP server and import it into Spark.

      At the same time, the dump manager within the ListenBrainz server supports creating dumps through the create_full or create_incremental commands. These dumps are stored inside a listenbrainz-export folder that is also mounted into the Spark container(s) at /rec/listenbrainz-export.
      If you want to import those dumps into Spark, e.g. for working on stats computation with your own, known data, you currently have to make them available through an FTP server and have the importer "download" them (i.e. copy them locally) first.

      Solution

      Additional request_import_full_local and request_import_incremental_local Spark commands could search the export directory for dumps, pick the latest one (or one specified by ID), and import it without copying the archive first.
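
      A minimal sketch of how the local commands might select an archive; the archive naming scheme and the find_local_dump helper are assumptions for illustration, not existing ListenBrainz code:

      import os
      import re
      from typing import Optional

      # Mount point of the export folder inside the Spark container, per the ticket.
      EXPORT_DIR = "/rec/listenbrainz-export"

      # Assumed archive naming scheme carrying the dump ID and type, e.g.
      # listenbrainz-dump-717-20200602-180408-full.tar.xz
      DUMP_NAME_RE = re.compile(r"listenbrainz-dump-(\d+)-.*-(full|incremental)\.tar\.xz")

      def find_local_dump(dump_type: str, dump_id: Optional[int] = None) -> str:
          """Return the path of the latest (or the explicitly requested) local dump archive."""
          candidates = {}
          for name in os.listdir(EXPORT_DIR):
              match = DUMP_NAME_RE.fullmatch(name)
              if match and match.group(2) == dump_type:
                  candidates[int(match.group(1))] = os.path.join(EXPORT_DIR, name)
          if not candidates:
              raise FileNotFoundError(f"no {dump_type} dumps found in {EXPORT_DIR}")
          chosen_id = dump_id if dump_id is not None else max(candidates)
          if chosen_id not in candidates:
              raise FileNotFoundError(f"dump with ID {chosen_id} not found in {EXPORT_DIR}")
          return candidates[chosen_id]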

      This could be implemented by further generalizing the ListenBrainzFTPDownloader to provide functions for listing files and picking a specific dump archive. The FTP version would then download the archive to a temporary folder and return its path (as it does now), while the local version would return the archive path directly.
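
      One shape this generalization could take, as a rough sketch; the class and method names below are illustrative and do not reflect the actual ListenBrainzFTPDownloader API:

      import os
      import tempfile
      from abc import ABC, abstractmethod

      EXPORT_DIR = "/rec/listenbrainz-export"  # mounted export folder, per the ticket

      class DumpArchiveSource(ABC):
          """Shared interface for locating and fetching dump archives (sketch)."""

          @abstractmethod
          def list_archives(self) -> list[str]:
              """Return the names of all available dump archives."""

          @abstractmethod
          def fetch_archive(self, name: str) -> str:
              """Make the named archive available on the local filesystem and return its path."""

      class FTPArchiveSource(DumpArchiveSource):
          """FTP variant: downloads the archive into a temporary folder, as the importer does today."""

          def __init__(self, ftp_connection):
              self.ftp = ftp_connection  # an ftplib.FTP instance, assumed connected

          def list_archives(self) -> list[str]:
              return self.ftp.nlst()

          def fetch_archive(self, name: str) -> str:
              dest = os.path.join(tempfile.mkdtemp(), name)
              with open(dest, "wb") as f:
                  self.ftp.retrbinary(f"RETR {name}", f.write)
              return dest

      class LocalArchiveSource(DumpArchiveSource):
          """Local variant: archives already live in the mounted export folder, so no copy is needed."""

          def list_archives(self) -> list[str]:
              return os.listdir(EXPORT_DIR)

          def fetch_archive(self, name: str) -> str:
              return os.path.join(EXPORT_DIR, name)

      With a shared interface like this, the FTP and local import commands could run the same import logic and differ only in which archive source they instantiate.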

            Assignee: Maxr1998
            Reporter: Maxr1998
            Votes: 0
            Watchers: 1
