One of the wonderful things about the Cloud is all the tools and architectural options you have available. One of the most difficult things is choosing which tool or architecture to use. Azure provides out-of-the-box alternatives for replicating data across data centers for SQL Azure, Storage Tables/Blobs, and more recently DocumentDB; however, these come at a premium price and may not provide the efficiency you require.
All of the out-of-the-box Azure replication solutions consist of a single read/write data store (the source of truth) with geographically redundant read-only data stores. This is a prudent approach from a data consistency standpoint, but it is not the most efficient if you are attempting to post data from halfway around the world. An alternative is to roll your own replication using Service Bus as the communication vehicle, which yields near-real-time bidirectional replication with active read/write data stores in each data center.
Service Bus Topics and Subscriptions provide a rich and efficient mechanism for moving data. They also make bringing a new data center online simple: deploy your solution to the new data center, bulk copy your data, and then add a Subscription to the Replication Topics in the other data centers.
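To make that wiring concrete, here is a minimal in-memory sketch of the fan-out: every data center owns a Replication Topic, and bringing a new center online means adding reciprocal subscriptions in both directions. The `ReplicationTopic` and `bring_data_center_online` names are illustrative only; a real deployment would create the subscriptions through the Service Bus management API.

```python
# In-memory model of the topic/subscription fan-out described above.
# These types are illustrative stand-ins, not Azure SDK classes.

class ReplicationTopic:
    def __init__(self, home_region):
        self.home_region = home_region
        self.subscriptions = {}          # subscriber region -> pending messages

    def add_subscription(self, region):
        self.subscriptions.setdefault(region, [])

    def publish(self, message):
        # A Topic copies each message to every Subscription.
        for queue in self.subscriptions.values():
            queue.append(message)


def bring_data_center_online(new_region, topics):
    """Wire a new region into the mesh: it subscribes to every existing
    Replication Topic, and every existing region subscribes to its Topic."""
    new_topic = ReplicationTopic(new_region)
    for topic in topics.values():
        topic.add_subscription(new_region)              # new DC receives others' writes
        new_topic.add_subscription(topic.home_region)   # others receive new DC's writes
    topics[new_region] = new_topic
    return new_topic
```

With this shape, adding a third or fourth data center touches no existing code; it only adds subscriptions.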
In a recent project I worked on there were two primary applications: a configuration tool and an end user application. The configuration tool was critical across all data centers but was managed by only a handful of individuals in a central location, each responsible for a specific area, so there was very little potential for data conflicts. The configuration component was therefore a good fit for our replication solution. The end user application was more complex. While part of its functionality required strong data consistency, the portion that collected new user responses posed little risk of conflict. The decision was made to separate the contention-prone data from the well-behaved, conflict-free data and to replicate the latter using the same approach.
Like many Cloud applications, this one used a variety of data stores, including SQL Azure, DocumentDB, and Table and Blob Storage. Base components were created for each of these data sources, through which all data flowed. For SQL Azure we created a layer on top of Entity Framework; for DocumentDB and Storage Tables and Blobs we wrapped the .NET Azure SDK components. Upon each successful write, these base components published a JSON serialization of the write transaction to a Service Bus Topic hosted within their own region, with no further processing.
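That base-component pattern can be sketched as a thin repository wrapper that publishes only after the underlying write succeeds. The class, method, and envelope field names below are illustrative assumptions, not the project's actual code:

```python
import json

# Sketch of the base persistence pattern described above: every write funnels
# through one method that, on success, hands a JSON serialization of the
# transaction to a publisher bound to the local Replication Topic.

class ReplicatingRepository:
    def __init__(self, store, publish):
        self._store = store        # underlying store (EF / DocumentDB / Storage wrapper)
        self._publish = publish    # callable that sends to the local Replication Topic

    def save(self, key, entity, destination):
        self._store[key] = entity              # the actual data write
        self._publish(json.dumps({             # runs only after a successful write
            "Destination": destination,        # "SQL" | "DocumentDB" | "Storage"
            "Key": key,
            "Payload": entity,                 # the JSON-serializable written object
        }))
```

Because the publish step lives in the base component, repository authors get replication for free simply by building on it.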
Developers implemented data repositories to meet their requirements, using the base data persistence components to handle CRUD operations. Those base components then wrote the required replication information to the Service Bus Topic without any additional custom coding on the developers' part. The diagram above follows the flow of data from its arrival at the data persistence component through to the replicating data centers' data stores.
For each data write, the data persistence components create a Service Bus BrokeredMessage containing a JSON-serialized representation of the written data objects, along with metadata describing their source object type and data store destination (SQL, DocumentDB, Storage). The component then sends the message to its local data center's Service Bus "Replication" Topic.
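As a sketch, building such a message might look like the following. `BrokeredMessage` here is a Python stand-in for the .NET Service Bus type of the same name, and the property names are assumptions; carrying the routing metadata in message properties rather than in the body lets subscribers filter messages without deserializing the payload:

```python
import json
from dataclasses import dataclass, field

# Stand-in for the .NET BrokeredMessage type: a body plus a property bag.
@dataclass
class BrokeredMessage:
    body: str
    properties: dict = field(default_factory=dict)


def to_brokered_message(entity, source_type, destination, source_region):
    """Package one written object for the local "Replication" Topic."""
    msg = BrokeredMessage(body=json.dumps(entity))
    msg.properties["SourceType"] = source_type      # e.g. "UserProfile" (illustrative)
    msg.properties["Destination"] = destination     # "SQL" | "DocumentDB" | "Storage"
    msg.properties["SourceRegion"] = source_region  # lets a subscriber skip its own writes
    return msg
```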
We chose an Azure App Service WebJob to host our Replication Processors in each data center. This could just as easily have run within an Azure Cloud Service; our choice was based on how quickly a WebJob starts after a fresh deployment and on the "cloudy" (pun intended) future of Azure Cloud Services.
Upon startup, each data center's Replication Processor reads its configuration data to determine the data center in which it is running and pulls metadata on the other data centers with the potential to write data. It then subscribes to the Replication Topic of each of those data centers and, voilà, begins to receive and process data-write messages from them.
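The dispatch step of such a processor can be sketched as routing each received message to a handler keyed by its destination store. The envelope fields and handler signature below are assumptions carried over from the earlier description:

```python
import json

# Sketch of the Replication Processor's dispatch loop body: replay each
# incoming write into the local copy of the store it originally targeted.

def process_replication_message(raw, handlers):
    """`raw` is the JSON envelope from a Replication Topic subscription.
    `handlers` maps a destination ("SQL", "DocumentDB", "Storage") to a
    callable that writes the payload into the local data store."""
    envelope = json.loads(raw)
    handler = handlers.get(envelope["Destination"])
    if handler is None:
        raise ValueError(f"No handler for destination {envelope['Destination']!r}")
    handler(envelope["Payload"])
```

In the real solution this would run inside the WebJob's Service Bus message callback, completing the message only after the local write succeeds so Service Bus redelivers on failure.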
WARNING: The solution outlined in this post is not a panacea for all of your data replication needs. It works best with data that is written once without contention, or written by a single source in sequence. If data consistency is paramount (e.g. nuclear reactor monitoring or drug dispensing) you should use a tried and extensively tested replication solution. This approach shines where data contention within your application is non-existent or minimal. If speed is your #1 goal regardless of potential consistency problems, be prepared to handle data contention yourself.
Further reading:
- How to use Service Bus topics and subscriptions
- How to use Azure Service Bus with the WebJobs SDK
- Message ordering on Windows Azure Service Bus Queues