• Type: Improvement
    • Resolution: Unresolved
    • Priority: Normal

      To be done after AB-439

      The HL extractor runs a query to fetch 100 unprocessed submissions, and if there are none, it sleeps for 30 seconds before trying again.

      Because this initial query joins three tables (ll, ll_json, hl) over all of their rows, we suspected it could take quite a bit of computation in postgres. We started on an improvement that selects only rows with an id higher than the most recently processed row_id, but we had concerns about choosing this id reliably while also running multiple threads. With the changes in AB-439 it should be easier to keep track of the highest row id processed.
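
      A minimal sketch of the improved query, assuming a psycopg2 connection and reusing the abbreviated table names mentioned above (ll, ll_json, hl); the real schema, column names, and driver may differ:

          import psycopg2  # assumed driver

          BATCH_SIZE = 100

          def get_unprocessed_submissions(conn, last_row_id):
              # Only consider rows newer than the last processed id, so
              # postgres can use the primary key index instead of scanning
              # every row in the join.
              with conn.cursor() as cur:
                  cur.execute(
                      """SELECT ll.id, ll_json.data
                           FROM ll
                           JOIN ll_json ON ll_json.id = ll.id
                           LEFT JOIN hl ON hl.id = ll.id
                          WHERE ll.id > %s
                            AND hl.id IS NULL
                          ORDER BY ll.id
                          LIMIT %s""",
                      (last_row_id, BATCH_SIZE),
                  )
                  return cur.fetchall()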

      I also noticed that we only sleep for 30 seconds if we pick up 0 rows. That is, if we pick up 100, and then pick up 7, once those 7 have finished we will immediately try to pick up another batch, which could contain another 15 rows. This means that with a slow but constant stream of submissions we actually run this query more often than when there is a backlog. If we pick up a batch of fewer than 100 rows, we should sleep again once that batch is done.
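
      A sketch of the adjusted polling loop: sleep not only when a batch is empty, but whenever it comes back short, since fewer than BATCH_SIZE rows means we have caught up with the incoming stream. Here process() and the last_row_id bookkeeping are hypothetical placeholders:

          import time

          SLEEP_SECONDS = 30

          def run_extractor(conn):
              last_row_id = 0
              while True:
                  batch = get_unprocessed_submissions(conn, last_row_id)
                  if batch:
                      process(batch)  # hypothetical: hand the batch to the workers
                      last_row_id = max(row_id for row_id, _ in batch)
                  # A short batch means the backlog is drained; querying
                  # again immediately would just repeat the expensive join,
                  # so sleep before the next poll.
                  if len(batch) < BATCH_SIZE:
                      time.sleep(SLEEP_SECONDS)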


      Assignee: Unassigned
      Reporter: Alastair Porter (alastairp)
      Votes: 0
      Watchers: 1
