  1. Picard
  2. PICARD-2339

Improve clustering performance


    • Type: Improvement
    • Resolution: Fixed
    • Priority: Normal
    • 2.7.0b3

      The existing clustering code uses the Levenshtein distance to calculate similarity, which requires comparing entries pairwise and results in O(n^2) performance. But since only exact matches are used (similarity threshold 1.0), this is not strictly necessary.

      If we don't use a threshold below 1.0 (and Picard never did), then we can simplify the code.
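
      To illustrate the idea, here is a minimal sketch of exact-match clustering, assuming entries are grouped by a normalized key in a dictionary. The names normalize and cluster_exact are hypothetical and not taken from the Picard code base, and the normalization rule is only an assumption for the example. With a threshold of 1.0, a single pass over the entries is enough and no pairwise distance calculation is needed.

```python
from collections import defaultdict


def normalize(name):
    # Hypothetical normalization (an assumption for this sketch):
    # case-fold and keep only alphanumeric characters.
    return "".join(ch for ch in name.casefold() if ch.isalnum())


def cluster_exact(names):
    """Group names whose normalized form is identical, in one pass."""
    groups = defaultdict(list)
    for name in names:
        # Entries with the same key land in the same cluster; no
        # pairwise comparison between entries is required.
        groups[normalize(name)].append(name)
    return list(groups.values())


albums = ["Abbey Road", "abbey road", "Abbey  Road!", "Let It Be"]
print(cluster_exact(albums))
```

      Each entry costs one dictionary lookup, so the work grows roughly linearly with the number of entries instead of quadratically.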

            Assignee: outsidecontext Philipp Wolfer
            Reporter: outsidecontext Philipp Wolfer
            Votes: 0
            Watchers: 2

              Created:
              Updated:

                Version Package
                2.7.0b3