Make HL extractor reduce load on database
Type: Improvement
Resolution: Unresolved
Priority: Normal
To be done after AB-439
The HL extractor runs a query to fetch up to 100 unprocessed submissions; if it gets none, it sleeps for 30 seconds before trying again.
Because this initial query joins three tables (ll, ll_json, hl) across all of their rows, it is likely expensive for Postgres to execute. We started on an improvement that selects only rows with an id higher than the most recently processed row_id, but we had concerns about tracking that id reliably across threads. With the changes in AB-439 it should be easier to keep track of the highest row id processed, as sketched below.
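
A minimal sketch of the proposed query change, assuming a Python extractor using psycopg2. Only the three table names come from this ticket; the column names (ll.id, ll_json.ll_id, ll_json.payload, hl.ll_id) and the join conditions are illustrative assumptions, not the actual schema:

    import psycopg2

    BATCH_SIZE = 100

    def fetch_unprocessed(conn, last_row_id):
        """Fetch the next batch of unprocessed submissions.

        Filtering on ll.id > last_row_id lets Postgres start from the
        primary-key index rather than evaluating the three-way join over
        every row. Column names here are assumed for illustration.
        """
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT ll.id, ll_json.payload
                FROM ll
                JOIN ll_json ON ll_json.ll_id = ll.id
                LEFT JOIN hl ON hl.ll_id = ll.id
                WHERE ll.id > %s
                  AND hl.ll_id IS NULL  -- no hl row yet: unprocessed
                ORDER BY ll.id
                LIMIT %s
                """,
                (last_row_id, BATCH_SIZE),
            )
            return cur.fetchall()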
I also noticed that we only sleep for 30 seconds when we pick up 0 rows. That is, if we pick up 100 rows and then 7, then once those 7 are finished we immediately try to pick up another batch, which might contain another 15 rows. So under a slow but constant stream of submissions we actually run this query more often than when we have a backlog. If a batch comes back with fewer than 100 rows, we should sleep after processing it as well.
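
A sketch of the adjusted polling loop, building on fetch_unprocessed above. It sleeps whenever a batch comes back short rather than only when it is empty; process_batch is a hypothetical stand-in for the existing worker entry point, and the 30-second interval is taken from the current behaviour:

    import time

    POLL_INTERVAL_SECONDS = 30

    def run_extractor(conn):
        last_row_id = 0
        while True:
            batch = fetch_unprocessed(conn, last_row_id)
            if batch:
                process_batch(batch)        # hypothetical worker entry point
                last_row_id = batch[-1][0]  # highest row id in this batch
            # Sleep on any short batch, not just an empty one, so a slow
            # trickle of submissions does not trigger back-to-back queries.
            if len(batch) < BATCH_SIZE:
                time.sleep(POLL_INTERVAL_SECONDS)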