MetaBrainz Hosting / MBH-565

Recovering prod databases after a disaster


    • Type: Task
    • Resolution: Unresolved
    • Priority: Normal

      If our prod databases die dramatically (say all of Hetzner explodes), how do we recover?

      LB: Some databases are very large and are not dumped regularly enough (for example the LB timescale database). Do we want a complete DB-level backup of all of LB, and does that require a new, separate failover server?
      In MusicBrainz we use barman for incremental DB backups; could it also be used to manage clusters and back up the ListenBrainz database?
      If there's enough space on aretha then we should maybe just go for it.
      (However, this is not certain: LB databases are huge, and it's not necessarily easy to back up only certain schemas.)
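
      On the schema question: barman takes physical backups of a whole PostgreSQL instance, so backing up only some schemas would most likely need a logical dump with pg_dump instead. A minimal sketch of building such a command, assuming hypothetical database, schema, and file names (none of these come from our actual setup):

```python
def pg_dump_command(dbname, schemas, outfile):
    """Build a pg_dump invocation that dumps only the given schemas.

    All names passed in are placeholders for illustration.
    """
    # Custom format so the dump can be restored selectively with pg_restore.
    cmd = ["pg_dump", "--format=custom", f"--file={outfile}"]
    # --schema may be repeated to select several schemas.
    for schema in schemas:
        cmd.append(f"--schema={schema}")
    cmd.append(dbname)
    return cmd


if __name__ == "__main__":
    # Hypothetical example values.
    print(" ".join(pg_dump_command("listenbrainz_ts", ["public"], "lb_ts.dump")))
```

      The command is returned as a list so it can be passed to subprocess.run without shell quoting issues.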

      Should we just do that for all the PostgreSQL databases and give ourselves the resources to achieve it?
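
      If barman ends up covering every PostgreSQL database, the per-server routine could look roughly like the sketch below. The server names are hypothetical; the real ones would be whatever is defined in barman's configuration.

```python
import shlex

# Hypothetical barman server names, for illustration only.
SERVERS = ["musicbrainz", "listenbrainz-timescale", "acousticbrainz"]


def barman_commands(server):
    """Return the barman invocations for one server: health check, then base backup."""
    return [
        ["barman", "check", server],
        ["barman", "backup", server],
    ]


def plan(servers):
    """Flatten the per-server commands into printable shell lines."""
    return [shlex.join(cmd) for server in servers for cmd in barman_commands(server)]


if __name__ == "__main__":
    for line in plan(SERVERS):
        print(line)
```

      This only prints the plan; actually running it would mean passing each command to subprocess.run on the barman host.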

      • @zas and @bitmap should start investigating how to set that up and configure it, and contact each project's lead to get the necessary information.
      • Additionally, @zas should check whether we have enough disk space (on aretha and the OfficeBrainz backups) to store the entire LB listen table (~400 GB?).
        We have multiple schemas in the ListenBrainz timescale database: can barman back up only some of them?
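
      The disk space check can be sketched as below. The ~400 GB figure is the rough estimate from this ticket, while the mount point and the 20% headroom are assumptions; barman also keeps WAL files and possibly several base backups, so the real requirement is likely higher than one copy of the data.

```python
import shutil

GB = 10**9


def has_room(free_bytes, needed_bytes, headroom_percent=20):
    """True if free space covers the needed size plus a safety margin."""
    required = needed_bytes * (100 + headroom_percent) // 100
    return free_bytes >= required


def path_has_room(path, needed_bytes, headroom_percent=20):
    """Same check against the filesystem that holds `path`."""
    return has_room(shutil.disk_usage(path).free, needed_bytes, headroom_percent)


if __name__ == "__main__":
    # "/" is a placeholder; the real backup mount point is not known here.
    print(path_has_room("/", 400 * GB))
```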

      AB: When @alastairp finishes the migration of AB to files, set up the AB database in barman, and make sure that we have a separate backup for the data files.

      This should be a great start for a general disaster recovery policy.
      With all our code in git, and DB backups kept at the office, separate from where our servers are, if a meteorite hit Hetzner in Germany we would have to rebuild the servers from scratch but should have all the pieces to do so.

      Projects that have files should make sure that they are being backed up with borg.
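
      For projects with data files, a borg run could be assembled roughly as follows. The repository location, archive prefix, and data path are hypothetical placeholders; `{now}` is borg's own placeholder for the current timestamp.

```python
def borg_create_command(repo, paths, prefix):
    """Build a `borg create` invocation for one project.

    repo, paths and prefix are illustrative values, not our real layout.
    """
    # Doubled braces keep a literal {now} for borg to expand at run time.
    archive = f"{repo}::{prefix}-{{now}}"
    return ["borg", "create", "--stats", "--compression", "lz4", archive, *paths]


if __name__ == "__main__":
    print(" ".join(borg_create_command("/backup/repo", ["/data/ab"], "ab")))
```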

      We should also have some form of document describing the procedure to follow in case of disaster and how to go about the recovery. Perhaps we could work on that during our documentation sprint in January (@zaphodbeeblebrox will remind you).

      BB: needs a contingency plan.

      CB: needs a contingency plan.

            Assignee: Unassigned
            Reporter: ApekattQuest, MonkeyPython