ListenBrainz / LB-722

Restructure data in HDFS to allow easier updates


    • Type: Task
    • Resolution: Fixed
    • Priority: Normal

      Right now the data in HDFS is grouped by listened_at timestamps, which means that adding new data involves updating multiple parquet files. If we grouped it by inserted_at instead, each incremental dump would just create one new parquet file, and the update would be done.
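
      A minimal sketch of the proposed import flow, assuming PySpark; the paths, column names, function name, and dump format below are hypothetical illustrations, not the actual ListenBrainz code:

          from datetime import datetime, timezone
          from pyspark.sql import SparkSession, functions as F

          spark = SparkSession.builder.appName("incremental-listen-import").getOrCreate()

          def import_incremental_dump(dump_path, base_path="/data/listenbrainz/listens"):
              """Append one incremental dump as a single new parquet partition."""
              # Hypothetical: read the incremental dump (assumed JSON here).
              listens = spark.read.json(dump_path)
              # Tag every row in this dump with the time of the import run.
              inserted_at = datetime.now(timezone.utc).strftime("%Y-%m-%d-%H%M%S")
              listens = listens.withColumn("inserted_at", F.lit(inserted_at))
              # Because the data is grouped by inserted_at, each import writes only
              # new files under its own partition directory; existing parquet files
              # are never rewritten.
              listens.write.mode("append").partitionBy("inserted_at").parquet(base_path)

      Under the old listened_at grouping, the same import would have to rewrite every existing partition that any listen in the dump falls into.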

            Assignee: kartik1712 (amCap1712)
            Reporter: iliekcomputers (Param Singh)
            Votes: 0
            Watchers: 2

              Created:
              Updated:
              Resolved:
