Make HL extractor reduce load on database
Type: Improvement
Resolution: Unresolved
Priority: Normal
To be done after AB-439
The HL extractor runs a query to fetch up to 100 unprocessed submissions; if it gets none, it sleeps for 30 seconds before trying again.
Because this initial query joins three tables (ll, ll_json, hl) across all of their rows, it is likely expensive for Postgres to execute. We started on an improvement that selects only rows with an id higher than the most recently processed row_id, but we had concerns about tracking that id reliably across threads. With the changes in AB-439 it should be easier to keep track of the highest row id processed, as sketched below.
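
A minimal sketch of the proposed query change, assuming a Python extractor using psycopg2. Only the three table names come from this ticket; the column names (ll.id, ll_json.ll_id, ll_json.payload, hl.ll_id) and the join conditions are illustrative assumptions, not the actual schema:

    import psycopg2

    BATCH_SIZE = 100

    def fetch_unprocessed(conn, last_row_id):
        """Fetch the next batch of unprocessed submissions.

        Filtering on ll.id > last_row_id lets Postgres start from the
        primary-key index rather than evaluating the three-way join over
        every row. Column names here are assumed for illustration.
        """
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT ll.id, ll_json.payload
                FROM ll
                JOIN ll_json ON ll_json.ll_id = ll.id
                LEFT JOIN hl ON hl.ll_id = ll.id
                WHERE ll.id > %s
                  AND hl.ll_id IS NULL  -- no hl row yet: unprocessed
                ORDER BY ll.id
                LIMIT %s
                """,
                (last_row_id, BATCH_SIZE),
            )
            return cur.fetchall()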
I also noticed that we only sleep for 30 seconds when we pick up 0 rows. That is, if we pick up 100 rows and then 7, then once those 7 are finished we immediately try to pick up another batch, which might contain another 15 rows. So under a slow but constant stream of submissions we actually run this query more often than when we have a backlog. If a batch comes back with fewer than 100 rows, we should sleep after processing it as well.
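
A sketch of the adjusted polling loop, building on fetch_unprocessed above. It sleeps whenever a batch comes back short rather than only when it is empty; process_batch is a hypothetical stand-in for the existing worker entry point, and the 30-second interval is taken from the current behaviour:

    import time

    POLL_INTERVAL_SECONDS = 30

    def run_extractor(conn):
        last_row_id = 0
        while True:
            batch = fetch_unprocessed(conn, last_row_id)
            if batch:
                process_batch(batch)        # hypothetical worker entry point
                last_row_id = batch[-1][0]  # highest row id in this batch
            # Sleep on any short batch, not just an empty one, so a slow
            # trickle of submissions does not trigger back-to-back queries.
            if len(batch) < BATCH_SIZE:
                time.sleep(POLL_INTERVAL_SECONDS)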