[AB-439] Improve multithreaded code and speed up HL extractor

Type: Improvement
Resolution: Fixed
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None

We're finding that the HL extractor is struggling to keep up with the number of submissions that we're receiving.

The HL extractor works well, but the amount of custom threading code it needs to do its job makes it quite difficult to modify.

We can make the code easier to maintain by replacing the custom threading code with concurrent.futures.
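As a rough illustration of what this replacement could look like, here is a minimal sketch using ThreadPoolExecutor from concurrent.futures. The names extract_highlevel and process_all are hypothetical placeholders, not functions from the actual extractor code, and the worker body is a stand-in for the real work.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def extract_highlevel(lowlevel_id):
    """Hypothetical stand-in for one unit of extractor work."""
    return {"id": lowlevel_id, "status": "done"}


def process_all(lowlevel_ids, max_workers=4, worker=extract_highlevel):
    """Run `worker` over every id in a thread pool and collect the outcomes."""
    results = {}
    failures = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the id it is processing.
        futures = {pool.submit(worker, item): item for item in lowlevel_ids}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:
                # One failed item should not stop the rest of the batch.
                failures[item] = exc
    return results, failures
```

The pool handles thread start-up, work distribution, and shutdown, so the code that remains is mostly the per-item work and the error handling.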

An additional improvement to speed up the extractor: the extractor binary can take multiple arguments, so it can process many files in one invocation. As the expensive part of the HL process is actually loading the models into place, we can speed things up by loading them once and processing a large number of files in a single process.
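The batching idea could be sketched roughly as follows. The helper names (batches, run_extractor) are hypothetical, and the command shape (binary followed by a list of file paths) is an assumption; the real binary's argument format is not described here and may differ.

```python
import subprocess


def batches(paths, size):
    """Yield successive lists of at most `size` paths."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield paths[start:start + size]


def run_extractor(binary, paths, batch_size=10, runner=subprocess.run):
    """Invoke the extractor once per batch instead of once per file.

    Returns the list of commands that were run. `runner` is injectable
    so the batching logic can be exercised without the real binary.
    """
    commands = []
    for batch in batches(paths, batch_size):
        # Assumed CLI shape: the binary followed by the files to process.
        command = [binary, *batch]
        runner(command, check=True)
        commands.append(command)
    return commands
```

With a batch size of 10, one hundred files would mean ten process launches (and ten model loads) rather than one hundred.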

Is duplicated by: AB-401 Process more than one lowlevel file with the highlevel extractor at a time (Closed)

GitHub Bot added a comment - 2020-05-11 06:50

See code changes in pull request #378 submitted by alastair.


Assignee: Unassigned

Reporter: Alastair Porter

Votes: 0

Watchers: 1

Created: 2020-04-27 07:26

Updated: 2022-12-29 20:41
