ListenBrainz / LB-722

Restructure data in hdfs to allow easier updates


    Details

    • Type: Task
    • Status: Closed
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Right now, the data in HDFS is grouped by listened_at timestamps, which means that adding new data involves updating multiple parquet files. If we instead grouped it by inserted_at, then each incremental dump would just create a single new parquet file and the update would be done.
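      The tradeoff can be illustrated with a minimal sketch (all names here are hypothetical; the real pipeline writes parquet files to HDFS via Spark, which is not shown). Partitioning by listened_at forces an incremental dump to touch every partition its listens fall into, while partitioning by inserted_at touches exactly one:

      ```python
      from datetime import datetime

      def files_touched(listens, key):
          # Return the set of monthly partition "files" a dump would write to,
          # keyed by the chosen timestamp column.
          return {listen[key].strftime("%Y-%m") for listen in listens}

      # An incremental dump inserted in June, containing listens from many months.
      dump = [
          {"listened_at": datetime(2019, 1, 5), "inserted_at": datetime(2019, 6, 1)},
          {"listened_at": datetime(2019, 3, 9), "inserted_at": datetime(2019, 6, 1)},
          {"listened_at": datetime(2019, 5, 2), "inserted_at": datetime(2019, 6, 1)},
      ]

      # Grouping by listened_at rewrites three existing partitions...
      assert files_touched(dump, "listened_at") == {"2019-01", "2019-03", "2019-05"}
      # ...while grouping by inserted_at appends one new partition and is done.
      assert files_touched(dump, "inserted_at") == {"2019-06"}
      ```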


            People

            Assignee:
            kartik1712 amCap1712
            Reporter:
            iliekcomputers Param Singh
            Votes:
            0
            Watchers:
            2

