Improve clustering performance

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Normal
    • Fix Version/s: 2.7.0b3
    • Affects Version/s: None
    • Component/s: None
    • Labels: None

      The existing clustering code uses the Levenshtein distance to calculate similarity, which causes O(n^2) performance. But since only exact matches are used (similarity threshold 1.0), this is not strictly necessary.

      If we don't use a threshold (and Picard never did) then we can simplify the code.
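As a rough illustration of the idea (not the actual Picard code), exact-match clustering can be sketched by grouping items under a normalized key in a dictionary instead of comparing every pair. The names `normalize` and `cluster_by_exact_match` and the normalization rule are assumptions made for this example only.

```python
from collections import defaultdict


def normalize(text):
    # Hypothetical normalization: lowercase and keep only letters and digits.
    # The real normalization rules may differ.
    return "".join(ch for ch in text.lower() if ch.isalnum())


def cluster_by_exact_match(names):
    # Group the indices of all names that share the same normalized key.
    # Each name is hashed once, so no pairwise distance is computed.
    groups = defaultdict(list)
    for index, name in enumerate(names):
        groups[normalize(name)].append(index)
    return list(groups.values())


clusters = cluster_by_exact_match(
    ["Abbey Road", "abbey road!", "Revolver", "ABBEY ROAD"]
)
```

With a threshold of 1.0 only identical keys end up in the same cluster, so a single pass over the items should be enough and the pairwise Levenshtein comparison can be dropped.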

            Assignee:
            Philipp Wolfer
            Reporter:
            Philipp Wolfer
            Votes:
            0
            Watchers:
            2

              Created:
              Updated:

                Version Package
                2.7.0b3