[LB-682] Improve the process for importing listens into HDFS - MetaBrainz Tickets

Type: Task
Resolution: Fixed
Priority: Normal
Fix Version/s: None
Component/s: listenbrainz-labs
Labels: None

Epic Link: dev-env and docs

We should update the way we import listens into Spark. Some places for improvement:

Use a context manager to create temporary directories locally, so that we don't have to worry about deleting them if an error occurs or after the import is done.
We don't create the '/temp' directory in HDFS explicitly, which might confuse readers who are new to the code.
We should also use different variable names for directories in HDFS and local directories to reduce confusion.
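
The three points above could be sketched roughly as follows. This is only an illustration, not the actual listenbrainz-labs code: `FakeHDFSClient`, `import_listens`, and the `HDFS_TEMP_DIR` constant are hypothetical names, and the in-memory client merely stands in for whatever HDFS client the importer really uses.

```python
import os
import posixpath
import tempfile

# Assumed from the ticket text; the real path may differ.
HDFS_TEMP_DIR = "/temp"


class FakeHDFSClient:
    """Hypothetical in-memory stand-in for an HDFS client."""

    def __init__(self):
        self.dirs = set()
        self.files = {}

    def makedirs(self, hdfs_path):
        self.dirs.add(hdfs_path)

    def upload(self, hdfs_path, local_path):
        with open(local_path, "rb") as f:
            self.files[hdfs_path] = f.read()


def import_listens(hdfs_client, listens_by_filename):
    """Stage each payload in a local temp dir, then upload it to HDFS.

    Returns the local temp dir path that was used (already deleted on return).
    """
    # Create the HDFS temp directory explicitly so readers can see it exists.
    hdfs_client.makedirs(HDFS_TEMP_DIR)

    # The context manager removes the local directory on success and on error.
    with tempfile.TemporaryDirectory() as local_temp_dir:
        for filename, payload in listens_by_filename.items():
            # "local_" and "hdfs_" prefixes keep the two filesystems apart.
            local_path = os.path.join(local_temp_dir, filename)
            with open(local_path, "wb") as f:
                f.write(payload)
            hdfs_path = posixpath.join(HDFS_TEMP_DIR, filename)
            hdfs_client.upload(hdfs_path, local_path)
    return local_temp_dir
```

With this shape, no explicit cleanup call is needed: the local directory is gone once the `with` block exits, whether the upload succeeded or raised.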

amCap1712 added a comment - 2022-02-26 19:00

This was fixed at the time of reworking spark dumps to use parquet format.

Assignee: amCap1712

Reporter: Ishaan Shah

Votes: 1

Watchers: 2

Created: 2020-08-02 14:32

Updated: 2022-02-26 19:00

Resolved: 2022-02-26 19:00
