++ed by:
COSIMO

1 PAUSE user(s)

Alfie John

NAME

MySQL::Replication - Decentralised, peer-to-peer, multi-master MySQL replication

DESCRIPTION

What is MySQL::Replication

MySQL::Replication is a replacement for MySQL's built-in replication. The reason for this module is that there are a number of issues with MySQL's built-in replication:

  • Multi-Master Replication

    By design, MySQL can only replicate from a single master. There are some approaches to achieving multi-master replication, but they have problems:

    • Ring Topologies Are Fragile

      Multi-master replication is possible by using a ring topology:

        +-----------------+       +-----------------+
        | db1.example.com | ----> | db2.example.com |
        +-----------------+       +-----------------+
                 ^                         |
                 |                         |
                 |                         v
        +-----------------+       +-----------------+
        | db4.example.com | <---- | db3.example.com |
        +-----------------+       +-----------------+

      Each master replicates its queries to the connected slave. The way this achieves multi-master replication is that along with locally generated queries, incoming replicated queries are also replicated out to the connected slave e.g:

      • db1 replicates its queries to db2

      • db2 replicates its queries to db3 (including queries from db1)

      • db3 replicates its queries to db4 (including queries from db1 and db2)

      • db4 replicates its queries to db1 (including queries from db1, db2 and db3)

      Queries replicate all the way around the ring until a master sees an incoming query that was generated locally. These are discarded to prevent queries infinitely replicating around the ring.

      The problem with ring topologies is that if a master dies, queries don't progress around the ring and so replication lag builds:

        +-----------------+       +-----------------+
        | db1.example.com | ----> | db2.example.com |
        +-----------------+       +-----------------+
                                           |
                                           |
                                           v
                                  +-----------------+
                                  | db3.example.com |
                                  +-----------------+

      We can fix a broken ring by bypassing the dead master:

        +-----------------+       +-----------------+
        | db1.example.com | ----> | db2.example.com |
        +-----------------+       +-----------------+
                 ^                         |
                 |                         |
                 |                         v
                 |                +-----------------+
                 +--------------- | db3.example.com |
                                  +-----------------+

      The problem here is that if there are queries still replicating around the ring from the dead master, the queries will not be filtered out and so an infinite replication loop occurs.

      Since MySQL::Replication clients do not binlog incoming queries, there is no risk of serving non-locally generated queries and thus no risk of infinite replication loops.

    • Time Slicing Wastes Time

      Multi-master replication is possible by time slicing:

        @t1:
        +-----------------+       +-----------------+       +-----------------+
        | db1.example.com | ----> | db2.example.com |       | db3.example.com |
        +-----------------+       +-----------------+       +-----------------+
      
        @t2:
        +-----------------+       +-----------------+       +-----------------+
        | db1.example.com |       | db2.example.com | <---- | db3.example.com |
        +-----------------+       +-----------------+       +-----------------+
      
        @t3:
        +-----------------+       +-----------------+       +-----------------+
        | db1.example.com | ----> | db2.example.com |       | db3.example.com |
        +-----------------+       +-----------------+       +-----------------+

      A timer is used to stop replicating from the currently connected master and to switch to another e.g.:

      • the timer is started

      • db2 replicates from db1 for a timeslice

      • the timer fires

      • db2 stops replicating from db1

      • db2 is reconfigured to replicate from db3

      • the timer is started

      • db2 replicates from db3 for a timeslice

      The problem with time slicing is that time is wasted:

      • The slave does no work when the currently connected master doesn't have anything to replicate

      • Replication latency builds up on the blocked masters while they wait for their share of the timer i.e. given N masters with a timeslice of t seconds each, the slave will be at least (N-1) * t seconds behind each master.

      Since MySQL::Replication achieves multi-master replication by running multiple instances of the client in parallel, there is no time slicing via a timer and thus no time wasted by time slicing.

  • Queries May Get Replayed After A Server Crash

    A slave's master position is recorded in the relay-log.info file however writes to the InnoDB tablespace and relay-log.info are not atomically synced to disk. If a slave dies and comes back online, files may be in an inconsistent state. If the InnoDB tablespace was flushed to disk before the crash but relay-log.info wasn't, the slave will restart replication from a stale position and so will replay queries.

    A workaround can be found at http://bugs.mysql.com/bug.php?id=40337 but note:

      "Although such solution has been proposed to reduce the probability of
      corrupted files due to a slave-crash, the performance penalty introduced by
      it has made the approach impractical for highly intensive workloads."

    MySQL::Replication clients store their server positions inside the InnoDB tablespace (i.e. the Replication.SourcePosition table by default). Since updates are done within the same transaction as replicated queries are executed in, writes are atomic. If a slave dies and comes back online, we will still be in a consistent state since either the transaction was committed or it will be rolled back.

  • Moving Slaves To Different Masters Is Hard

    A slave's master position is relative to the directly connected master's binlogs. Given a multi-layer replication topology e.g. a tree topology, a slave's master position is still relative to the directly connected master's binlogs and not relative to the root master's binlogs. If a master in a middle layer dies, moving its slaves to a different master is non-trivial since they will all need their master positions translated to the new master's binlogs.

    MySQL::Replication always deals with canonical binlog positions. In a multi-layer replication topology e.g. a tree topology, positions are always relative to the root server's binlogs. If a relay in a middle layer dies, moving its clients to a different relay is a simple configuration item change since no translation is needed.

How Does MySQL::Replication Work

A MySQL::Replication replication topology is made up of:

  • MySQL::Replication servers

    Servers provide an API to retrieve queries from its local binlogs. A client can request queries from arbitrary binlogs and file positions.

  • MySQL::Replication clients

    Clients retrieve queries from a server and applies them to the local database.

  • MySQL::Replication relays

    Relays provide a transparent caching query proxy for clients to connect to. Relays are useful for reducing data transfers between multiple data centers and reducing load on the servers.

MySQL::Replication Servers

A MySQL master runs a MySQL::Replication server, which serves queries from its local binlogs e.g.:

  db1.example.com:~$ MySQLReplicationServer.pl --binlog db1:/var/lib/mysql/binlogs/mysql-bin.index

The server running on db1.example.com will serve queries from the binlogs listed in mysql-bin.index.

See MySQLReplicationServer.pl for more information on servers.

MySQL::Replication Clients

A MySQL slave runs the MySQL::Replication client e.g.:

  db2.example.com:~$ MySQLReplicationClient.pl --srchost db1.example.com --srcbinlog db1

The client running on db2.example.com will:

  • Get the server position for db1.example.com from the local database

  • Connect to the server running on db1.example.com

  • Request queries, starting from its server position

  • Read the query response from the server

  • Execute the query on the local database

  • Update the server position in the local database

  • Wait for the next query response

  +-----------------+       +-----------------+
  | db1.example.com | ----> | db2.example.com |
  +-----------------+       +-----------------+

To replicate from multiple masters, run multiple instances of the client e.g.:

  db2.example.com:~$ MySQLReplicationClient.pl --srchost db1.example.com --srcbinlog db1
  db2.example.com:~$ MySQLReplicationClient.pl --srchost db3.example.com --srcbinlog db3

  +-----------------+       +-----------------+       +-----------------+
  | db1.example.com | ----> | db2.example.com | <---- | db3.example.com |
  +-----------------+       +-----------------+       +-----------------+

Note that there is no restriction on where the client and server run. e.g. having all databases replication to and from each other is possible:

  +-----------------+       +-----------------+       +-----------------+
  | db1.example.com | <---> | db2.example.com | <---> | db3.example.com |
  +-----------------+       +-----------------+       +-----------------+
           ^                                                   ^
           |                                                   |
           +---------------------------------------------------+

See MySQLReplicationClient.pl for more information on clients.

MySQL::Replication Relays

A MySQL::Replication relay acts as a proxy cache. In a multi-layer replication topology, middle layers run a MySQL::Replication relay e.g.:

  relay.example.com:~$ MySQLReplicationRelay.pl

The relay running on relay.example.com will:

  • Accept requests from connecting clients

  • If the relay can fulfill the request from its cache, it will serve them to the client

  • If the relay cannot fulfill the request from its cache, it will:

    • Connect directly to the server, or if specified, the next relay

    • Relay the request to the next layer

    • Read the query response

    • Cache the query response for future requests

    • Send the query response to the client

    • Wait for the next query response

  +-----------------+       +-------------------+       +-----------------+
  | db1.example.com | ----> | relay.example.com | ----> | db2.example.com |
  +-----------------+       +-------------------+       +-----------------+

By using relays:

  • Bandwidth is saved since multiple clients in one data center need only connect to the local relay, while the relay goes over the WAN to fulfill requests

  • Load is saved on the server since the number of connecting clients is reduced

Note that there is no restriction on the number of layers of relays e.g. a tree of relays is possible:

  +-----------------+
  | db2.example.com | <---------------+
  +-----------------+                 |
                                      |
  +-----------------+       +--------------------+       
  | db3.example.com | <---- | relay2.example.com | <----------------+ 
  +-----------------+       +--------------------+                  | 
                                                                    |
                                                         +--------------------+       +-----------------+
                                                         | relay1.example.com | <---- | db1.example.com |
                                                         +--------------------+       +-----------------+
                                                                    |
  +-----------------+       +--------------------+                  |
  | db4.example.com | <---- | relay3.example.com | <----------------+ 
  +-----------------+       +--------------------+       
                                      |
  +-----------------+                 |
  | db5.example.com | <---------------+
  +-----------------+ 

See MySQLReplicationRelay.pl for more information on relays.

FAQs

What Happens If There Is A Race Condition On A Record

e.g. An insert to the users table occurs on two seperate databases at the same time for the username 'alfie'. The problem here is that username is the primary key. Since the inserts happened at the same time, both inserts succeeded. It is only when they replicate will a primary key constraint fail. If this happens, replication will stop and manual intervention is necessary.

The only way to prevent this is to avoid the race in the first place:

  • Use an external arbiter to protect access to shared resources

    e.g. before inserting into the users table, each program performing the insert contacts the arbiter and request the inserting of 'alfie'. The first request is granted the insert while the others fail.

  • Shard your data so that race conditions cannot occur

    e.g. the users table is sharded based on the first letter of the usename. Inserts for 'alfie' only happen on the database with write access to the 'a' records.

  • Don't use AUTO_INCREMENT keys, use UUIDs instead

    Not useful in the users example, but for tables where AUTO_INCREMENT ids are used, switch to UUIDs to avoid clashes.

BUGS

  • The relay is still in development and have not been released yet

  • Communication over the wire is in plain text. Only use MySQL::Replication over a secure channel (e.g. stunnel, IPSec etc)

  • Row-based replication is not supported yet

  • LOAD DATA events are not supported yet

  • Filtering on queries, tables and schemas are not supported yet

SEE ALSO

AUTHOR

Alfie John, alfiej@opera.com

LICENSE AND COPYRIGHT

Copyright (c) 2011, Opera Software Australia Pty. Ltd. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  * Redistributions of source code must retain the above copyright notice,
    this list of conditions and the following disclaimer.
  * Redistributions in binary form must reproduce the above copyright notice,
    this list of conditions and the following disclaimer in the documentation
    and/or other materials provided with the distribution.
  * Neither the name of the copyright holder nor the names of its contributors
    may be used to endorse or promote products derived from this software
    without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.