- Improvement
- Resolution: Fixed
- Normal
The HL (high-level) extractor is struggling to keep up with the number of submissions that we're receiving.
The extractor works well, but the amount of custom threading code it needs to do its job makes it quite difficult to modify.
We can make the code easier to maintain by replacing the custom threading code with concurrent.futures.
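As a rough illustration of the concurrent.futures idea, here is a minimal sketch in which a thread pool replaces hand-written worker threads. The names `process_submission` and `process_all` are hypothetical stand-ins, not functions from the existing extractor code, and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_submission(path):
    # Hypothetical stand-in for running the high-level extractor on one
    # low-level file; the real work would happen here.
    return path, "ok"


def process_all(paths, max_workers=4):
    """Run process_submission over paths using a thread pool."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the path it was submitted for.
        futures = {pool.submit(process_submission, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                _, status = future.result()
            except Exception as exc:
                # A failure in one task does not stop the others.
                status = "failed: %s" % exc
            results[path] = status
    return results
```

The pool handles thread start-up, work distribution, and shutdown, so the remaining code only has to describe what a single task does and how to record its result.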
An additional improvement to speed up the extractor: the extractor binary can take multiple arguments, so it can process many files in one invocation. Because the expensive part of high-level extraction is loading the models, we can speed things up by loading them once and processing a large number of files in a single process.
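The batching idea could look something like the sketch below, which groups pending files and launches the binary once per group. It assumes the binary accepts a plain list of input paths as positional arguments; the real command line may differ (for example, it may also need output paths or a profile file). The helper names, the default batch size, and the `runner` parameter are all hypothetical.

```python
import subprocess


def chunked(items, size):
    """Yield consecutive slices of items, each at most size long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_command(binary, batch):
    # ASSUMPTION: the binary takes input files as positional arguments.
    # Adjust this to match the extractor's actual command line.
    return [binary] + list(batch)


def run_batches(binary, paths, batch_size=50, runner=subprocess.check_call):
    """Invoke the binary once per batch so models load once per batch."""
    for batch in chunked(paths, batch_size):
        runner(build_command(binary, batch))
```

With N files and a batch size of B, the models are loaded roughly N/B times instead of N times. Passing a different `runner` makes it possible to inspect the generated commands without launching any process.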
- is duplicated by AB-401 Process more than one lowlevel file with the highlevel extractor at a time (Closed)
See code changes in pull request #378 submitted by alastair.