  1. Zapped: AcousticBrainz
  2. AB-440

Make HL extractor reduce load on database

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Normal

      To be done after AB-439

      The HL extractor fetches up to 100 unprocessed submissions at a time; if there are none, it sleeps for 30 seconds.

      Because this initial query joins three tables (ll, ll_json, hl) over all rows, we suspected that it is expensive for PostgreSQL to compute. We started working on an improvement that selects only rows with an id higher than the most recently processed row id, but we were concerned about choosing this id reliably while also threading. With the changes in AB-439 it should be easier to keep track of the highest row id processed.
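
      A minimal sketch of the "only rows above the last processed id" idea, under loud assumptions: the table names ll, ll_json and hl come from this ticket, but the column names (id, data), the function name fetch_unprocessed and the use of sqlite3 as a stand-in for PostgreSQL are hypothetical and only serve to make the example self-contained.

```python
import sqlite3


def fetch_unprocessed(conn, last_id, limit=100):
    """Return up to `limit` (id, data) rows with id > last_id and no hl row.

    Column names are assumptions made for this sketch, not the real schema.
    """
    query = """
        SELECT ll.id, ll_json.data
          FROM ll
          JOIN ll_json ON ll_json.id = ll.id
          LEFT JOIN hl ON hl.id = ll.id
         WHERE hl.id IS NULL   -- no high-level result yet
           AND ll.id > ?       -- skip everything at or below the watermark
         ORDER BY ll.id
         LIMIT ?
    """
    return conn.execute(query, (last_id, limit)).fetchall()
```

      The caller would remember the largest id of each returned batch and pass it as last_id on the next call. Rows at or below that id are never revisited, which is why the id must be chosen carefully when several threads process rows concurrently.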

      I also noticed that we sleep for 30 seconds only if we pick up 0 rows. For example, if we pick up 100 rows and then 7, we try to pick up another batch as soon as those 7 have finished, and that batch might contain another 15 rows. This means that with a slow but constant stream of submissions we actually run this query more often than when we have a backlog. If we pick up fewer than 100 rows, we should sleep again after processing that batch.
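
      The proposed behaviour could look roughly like the loop below. This is a sketch, not the extractor's actual code: run_extractor, fetch_batch, process and max_polls are hypothetical names, and only the batch size of 100 and the 30 second sleep come from this ticket.

```python
import time

BATCH_SIZE = 100     # rows requested per query, as described in this ticket
SLEEP_SECONDS = 30   # pause length, as described in this ticket


def run_extractor(fetch_batch, process, sleep=time.sleep, max_polls=None):
    """Poll for work, sleeping after any batch smaller than BATCH_SIZE.

    fetch_batch(n) returns a list of at most n rows; process(row) handles one.
    max_polls exists only so the sketch can be exercised without looping forever.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        rows = fetch_batch(BATCH_SIZE)
        polls += 1
        for row in rows:
            process(row)
        # A short (or empty) batch means the backlog is drained, so back off
        # instead of querying the database again immediately.
        if len(rows) < BATCH_SIZE:
            sleep(SLEEP_SECONDS)
```

      With this change a full batch of 100 is followed by an immediate re-query, while a batch of 7 is followed by a 30 second pause, so a slow trickle of submissions triggers at most one query per sleep interval.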

            Assignee: Unassigned
            Reporter: alastairp (Alastair Porter)