Uploaded image for project: 'MusicBrainz Server'
  1. MusicBrainz Server
  2. MBS-8695

Partition edit table by year (or id)

XMLWordPrintable

    • Icon: Improvement Improvement
    • Resolution: Won't Fix
    • Icon: Normal Normal
    • None
    • None
    • Edit system
    • None

      http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html

      Quoting Ian from MBS-8294:

      One option here as far as the temporal splitting would be to actually partition the edit table - that is, have 'edit' be an empty table itself with descendant 'edit_2015', 'edit_2014', etc. (or some other sort of split) tables. Then splitting by year would actually be separate tables. This could also potentially have a positive impact on query times, since statistics would then be kept separately for each of these tables and it will query them in parallel with different statistics to lend to the task. For example, if an editor was very active in 2015 but wasn't around much before that, the system could know to use a sequential scan or such for the edit_2015 table where the editor appears a lot, but an index scan or even not needing to look at all at others where the editor is less prevalent. Perhaps the most obvious example of this is that an editor obviously won't appear in any edits before they joined. So if you're searching for, say, edits by me, this system would hopefully know it doesn't need to look at all at anything before edit_2007!

      I think this would help a lot, since most of the edit listings are sorted by date. The query planner would presumably be smart enough to look at the partitions in order and stop when it finds a given number of rows. Or we'd be able to change the way we query data at the application level in order to get faster results.

      Partitioning by ID would make the distribution of edits more uniform, but it's unclear to me if the planner would use knowledge of the min/max open_time in each partition to decide which to query. If it didn't, perhaps we could add four constraints to each partition: min/max id, and min/max open_time. It should be fine if one of those sets of constraints overlaps with another partition, as long as the sum of them is mutually exclusive.

      A downside is that we'd have to partition every table that has an FK to edit.id, since FKs aren't propagated at all. But that may be a good idea at this point, and if these tables are created/generated automatically, it may not be a huge deal. With the master tables, the application won't have to care about these sub-tables except where it wants to exploit knowledge of them using WHERE conditions.

            Unassigned Unassigned
            bitmap Michael Wiencek
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

              Created:
              Updated:
              Resolved:

                Version Package