Evaluate both MB database access methods to decide which one is more efficient

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Normal

      Right now, AB implements two methods of accessing the MB database:

      1. Connecting the AB db directly to MB.

      2. Importing the MB data into the AB db and keeping it updated.

      We now need to evaluate the two methods by testing their speed and performance, and then decide which one to use for each function in AB that uses MB data.
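
One way to make the speed comparison concrete is to run the same logical query through each access method several times and compare the median wall-clock times. The sketch below is only an illustration: `time_query`, `compare_methods`, and the two placeholder workloads are hypothetical names, not part of the AB codebase, and a real run would call the actual database functions instead.

```python
import statistics
import time


def time_query(run_query, repeats=5):
    """Run a zero-argument callable `repeats` times; return per-run seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings


def compare_methods(methods, repeats=5):
    """Map each method name to the median wall-clock time of its query."""
    return {
        name: statistics.median(time_query(run_query, repeats))
        for name, run_query in methods.items()
    }


# Placeholder workloads standing in for real database calls (assumption).
def query_via_direct_connection():
    return sum(range(1000))


def query_via_imported_schema():
    return sum(range(1000))


if __name__ == "__main__":
    results = compare_methods({
        "direct connection": query_via_direct_connection,
        "imported schema": query_via_imported_schema,
    })
    for name, seconds in results.items():
        print(f"{name}: {seconds:.6f} s (median)")
```

The median is used rather than the mean so that a single slow run (for example, a cold cache) does not dominate the result.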

          [AB-357] Evaluate both MB database access methods to decide which one is more efficient

          GitHub Bot added a comment -

          See code changes in pull request #292 submitted by rsh7.

          Paul Taylor added a comment -

          Hi, no, I do understand; I was suggesting your discarded option (option 1).

          It was only an idea, but I don't see that the storage considerations are such an issue, since disk space is cheap. Load may well be an issue if both the main MusicBrainz site and the AcousticBrainz site are talking to the same database, but technically wouldn't it be quite easy to just have another copy of the database (a mirror that AB uses)? That would be a lot easier than proposal 3.

          Alastair Porter added a comment -

          I'm not sure you understood Rashi's comment - it sounds like you're both talking about the same thing.

          Your comment is a little unclear - I'm not sure if you mean one database instance which includes both the production MB and AB databases, or having the AB database include a copy of the MB database. For replication, I'm not sure if you're referring to having a replica of the MB database on the AB server, performing replication of the AB database in the same way that MB does it, or combining MB+AB and sending replica packets with data from both products to all MB customers.

          To summarise, we are considering 3 ways of combining the AB/MB data in order to allow us to use up-to-date MB data on the AB website (any other combination of the data is out of the scope of this project):

          1. Host both MB and AB on the same physical database server - discarded because of load/storage considerations
          2. Connect directly to an MB database from the AB app (either a replica database on the AB server, or a master MB mirror)
          3. Replicate parts of the MB database (which relate to recordings in AB) directly to a separate schema inside the AB database.

          Rashi's Summer of Code project has been to implement 3). This ticket is to evaluate the common queries that we need to run in AB using both 2) and 3) to see which one is faster. (We expect 3) to be faster, but it requires much more infrastructure than 2), so there is a tradeoff that we want to compare.)
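
To illustrate the difference between options 2) and 3) above, here is a minimal sketch that uses in-memory SQLite databases as a stand-in for PostgreSQL. Everything in it is an assumption made for illustration: the simplified `recording` and `lowlevel` tables, the sample rows, and the function names are hypothetical and are not taken from the AB or MB code.

```python
import sqlite3

# Stand-in for the full MB database (option 2 queries this directly).
mb = sqlite3.connect(":memory:")
mb.execute("CREATE TABLE recording (gid TEXT PRIMARY KEY, name TEXT)")
mb.executemany(
    "INSERT INTO recording VALUES (?, ?)",
    [("gid-1", "Track one"), ("gid-2", "Track two"), ("gid-3", "Track three")],
)

# Stand-in for the AB database, with a separate attached schema for MB data.
ab = sqlite3.connect(":memory:")
ab.execute("ATTACH DATABASE ':memory:' AS musicbrainz")
ab.execute("CREATE TABLE lowlevel (gid TEXT)")
ab.execute("CREATE TABLE musicbrainz.recording (gid TEXT PRIMARY KEY, name TEXT)")
ab.executemany("INSERT INTO lowlevel VALUES (?)", [("gid-1",), ("gid-3",)])


def copy_relevant_recordings(mb_conn, ab_conn):
    """Option 3: copy only the MB recordings that AB has data for."""
    gids = [row[0] for row in ab_conn.execute("SELECT DISTINCT gid FROM lowlevel")]
    if not gids:
        return 0
    placeholders = ",".join("?" for _ in gids)
    rows = mb_conn.execute(
        f"SELECT gid, name FROM recording WHERE gid IN ({placeholders})", gids
    ).fetchall()
    ab_conn.executemany(
        "INSERT OR REPLACE INTO musicbrainz.recording VALUES (?, ?)", rows
    )
    ab_conn.commit()
    return len(rows)


def name_via_direct_connection(mb_conn, gid):
    """Option 2: look the recording up in the MB database itself."""
    row = mb_conn.execute(
        "SELECT name FROM recording WHERE gid = ?", (gid,)
    ).fetchone()
    return row[0] if row else None


def name_via_local_schema(ab_conn, gid):
    """Option 3: look it up in the copy held inside the AB database."""
    row = ab_conn.execute(
        "SELECT name FROM musicbrainz.recording WHERE gid = ?", (gid,)
    ).fetchone()
    return row[0] if row else None


copied = copy_relevant_recordings(mb, ab)
```

With this sample data the local schema holds only the two recordings that AB knows about, so a lookup for `gid-2` succeeds through the direct connection but returns `None` through the local copy. That is the tradeoff in miniature: option 3) keeps the data local and small, but needs extra machinery to decide what to copy and to keep it current.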

          Paul Taylor added a comment -

          I meant just have one database that contains MusicBrainz and AB, and possibly replicate it. But you have two databases, a MusicBrainz one and a MusicBrainz + AB one.

          Rashi Sah added a comment - edited

          iiuc, that's what I have implemented for the 2nd method discussed above. We are importing the MB data from the actual MB database and inserting it into the AcousticBrainz database (so, a single database). See https://github.com/metabrainz/acousticbrainz-server/pull/278 for the implementation.
          We also update the musicbrainz schema in AB periodically.
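
As a rough illustration of what a periodic update of the imported data could look like, the sketch below polls for rows whose `last_updated` value is newer than the previous run and refreshes the local copies. This is an assumed, simplified technique (a timestamp poll against in-memory SQLite stand-ins), not the approach taken in the pull request; the `mb_recording` table, the function name, and the sample rows are all hypothetical.

```python
import sqlite3


def refresh_changed_recordings(mb_conn, ab_conn, since):
    """Update local copies of recordings that changed in MB after `since`.

    Returns the newest `last_updated` value seen, to use as the next `since`.
    """
    changed = mb_conn.execute(
        "SELECT name, last_updated, gid FROM recording WHERE last_updated > ?",
        (since,),
    ).fetchall()
    # Only rows already present in the local copy are touched.
    ab_conn.executemany(
        "UPDATE mb_recording SET name = ?, last_updated = ? WHERE gid = ?",
        changed,
    )
    ab_conn.commit()
    return max((row[1] for row in changed), default=since)


# Tiny in-memory demonstration with made-up rows and timestamps.
mb = sqlite3.connect(":memory:")
mb.execute(
    "CREATE TABLE recording (gid TEXT PRIMARY KEY, name TEXT, last_updated TEXT)"
)
mb.executemany("INSERT INTO recording VALUES (?, ?, ?)", [
    ("gid-1", "Track one (remaster)", "2000-01-02T10:00:00"),
    ("gid-2", "Track two", "2000-01-01T09:00:00"),
])

ab = sqlite3.connect(":memory:")
ab.execute(
    "CREATE TABLE mb_recording (gid TEXT PRIMARY KEY, name TEXT, last_updated TEXT)"
)
ab.execute(
    "INSERT INTO mb_recording VALUES ('gid-1', 'Track one', '2000-01-01T09:00:00')"
)

next_since = refresh_changed_recordings(mb, ab, "2000-01-01T12:00:00")
```

Timestamps are stored as ISO 8601 strings here so that plain string comparison orders them correctly. Returning the newest timestamp lets the caller pass it back in on the next scheduled run.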

          Paul Taylor added a comment -

          What about just using a single database that has both MB and AB data? Or can you use database replication so that AB is just added to a replicated copy of MB?

            rsh Rashi Sah