-
Task
-
Resolution: Fixed
-
Normal
We should update the way we import listens into Spark. Some areas for improvement:
- Use a context manager to create temporary directories locally. That way we don't have to worry about deleting the directories after the import is done or if an error occurs.
- We don't create the '/temp' directory in HDFS explicitly, which might confuse readers who are new to the code.
- Use different variable names for HDFS directories and local directories to reduce confusion.
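A minimal sketch of how the three points above could fit together, not the actual importer code. The function name `import_listens` and the `hdfs_makedirs` and `hdfs_upload` callables are hypothetical stand-ins for whatever HDFS client the project uses; only the '/temp' path comes from this ticket.

```python
import os
import posixpath
import tempfile

# HDFS-side directory named in the ticket; kept separate from any local path.
HDFS_TEMP_DIR = "/temp"


def import_listens(archive_name, hdfs_makedirs, hdfs_upload):
    """Stage a file locally, then copy it into HDFS.

    hdfs_makedirs and hdfs_upload are hypothetical callables (assumptions
    for this sketch), not functions of a real HDFS library.
    """
    # Create the HDFS temp directory explicitly so its origin is obvious.
    hdfs_makedirs(HDFS_TEMP_DIR)

    # The context manager removes the local directory on exit,
    # whether the block finishes normally or raises.
    with tempfile.TemporaryDirectory() as local_temp_dir:
        local_path = os.path.join(local_temp_dir, archive_name)
        with open(local_path, "w") as f:
            f.write("placeholder listens data\n")

        # Distinct names: local_* for the local filesystem, hdfs_* for HDFS.
        hdfs_path = posixpath.join(HDFS_TEMP_DIR, archive_name)
        hdfs_upload(local_path, hdfs_path)

    return local_temp_dir, hdfs_path
```

Prefixing variables with `local_` and `hdfs_` makes it clear at each call site which filesystem a path belongs to.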
This was fixed when the Spark dumps were reworked to use the Parquet format.