Doc Revision 1.1

Real Time Replication in the Real World

Richard E. Baum and C. Thomas Tyler
Perforce Software, Inc.

Copyright 2010 Perforce Software

Table of Contents

1 Overview
2 Definitions
3 Solutions Involving Replication
  3.1 High Availability Solutions
    3.1.1 High Availability Thinking
  3.2 Disaster Recovery Solutions
    3.2.1 Disaster Recovery Thinking
  3.3 Read-only Replicas
    3.3.1 Read-only Replica Thinking
4 Perforce Tools that Support Metadata Replication
  4.1 Journal Truncation
  4.2 p4jrep (deprecated)
  4.3 p4 replicate
    4.3.1 Replication to Journal File
    4.3.2 Replication to Live DB Files
    4.3.3 Filtering Example
  4.4 p4 export
    4.4.1 Report Generation from Scripts
    4.4.2 Report Generation from SQL Databases
    4.4.3 Full Replication (Metadata and Archive Files)
    4.4.4 Daily Security Reports
  4.5 Built-in Replication Tools - Summary
5 Tools that Support Archive File Replication
  5.1 Rsync / Robocopy
  5.2 Filesystem or Block-Level Replication
  5.3 Perfect Consistency vs. Minimum Data Loss
6 Putting it All Together
  6.1 Classic DR: Truncation + Commercial WAN Replicator
  6.2 The SDP Solution Using p4 replicate
    6.2.1 Failover
  6.3 A Read-only Replica for Continuous Integration
    6.3.1 Define How Users Will Connect to the Replica
    6.3.2 Use Filtered Replication
    6.3.3 Make Archive Files Accessible (Read-only) to the Replica
7 Operational Issues with Replication
  7.1 Obliterates: Replication vs. Classic Journal Truncation
8 Summary
9 References & Resources

1 Overview

There are myriad ways to configure a Perforce environment to allow for multiple, replicated servers. Configurations are chosen for a wide variety of reasons. Some provide high availability; some provide disaster recovery; some provide read-only replicas that take workload off of a main server. There are also combinations of these. What you should deploy depends on your business goals, the availability of hardware and network infrastructure, and a number of other factors.

The 2009.2 release of Perforce Server provides built-in tools that allow near-real-time replication of metadata. These tools make it much easier to implement both read-only replica servers and high-availability/disaster-recovery solutions. This paper discusses some of the most common replication configurations, the ways they are supported by the Perforce Server, and the characteristics of each.
2 Definitions

A number of terms are used throughout this document. They are defined as follows:

- HA (High Availability): A system design protocol and associated implementation that ensures a certain degree of operational continuity during a given measurement period, even in the event of certain failures of hardware or software components.
- DR (Disaster Recovery): The process, policies, and procedures related to preparing for recovery or continuation of technology infrastructure critical to an organization after a natural or human-induced disaster.
- RPO (Recovery Point Objective): The acceptable amount of data loss, measured in time.
- RTO (Recovery Time Objective): The duration of time, and a service level, within which service must be restored after a disaster.
- Metadata: Data contained in the Perforce database files (the db.* files in P4ROOT).
- Archive Files: All revisions of all files submitted to the Perforce Server, as well as currently shelved files.
- Read-only Replica: A Perforce Server instance that operates using a copy of the Perforce metadata.
- DRBD (Distributed Replicated Block Device): A distributed storage system included in the Linux kernel as of version 2.6.33. DRBD runs over a network and works much like RAID 1.

3 Solutions Involving Replication

3.1 High Availability Solutions

High Availability solutions keep Perforce servers available to users despite failures of hardware components. HA solutions are typically deployed in environments where there is little tolerance for unplanned downtime. Large sites with 24x7 uptime requirements due to globally distributed development environments strive for HA.

Perforce has excellent built-in journaling capabilities.
It is fairly easy to implement a solution that is fault-tolerant, prevents data loss in any single-point-of-failure situation, and limits data loss in more significant failures. With HA, prevention of data loss for any single point of failure is assumed to be accounted for, and the focus is on limiting downtime.

HA solutions generally offer a short RTO and a low RPO, and they offer the fastest recovery from a strict hardware failure scenario. They also cost more, as they involve additional standby hardware at the same location as the main server. Full metadata replication is used, and the backup server is typically located on the same LAN as the main server, which results in good performance of the replication processes. Due to the proximity of the primary and backup servers, however, these solutions do not offer much in the way of disaster recovery. A site-wide disaster or regional problem (earthquake, hurricane, etc.) can result in a total outage.

3.1.1 High Availability Thinking

The following are sample thoughts that lead to the deployment of HA solutions:

- We're willing to invest in a more sophisticated deployment architecture to reduce unplanned downtime.
- We will not accept data loss for any Single Point of Failure (SPOF).
- Downtime is extremely expensive for us. We are willing to spend a lot to reduce the likelihood of downtime, and to minimize it when it is unavoidable.

3.2 Disaster Recovery Solutions

To offer a true disaster recovery solution, a secondary server needs to be located at a site that is physically separate from the main server. Full metadata replication provides a reliable failover server in a geographically separate area. Thus, if one site becomes unavailable due to a natural disaster, another can take its place.
Because WAN connections are often considerably slower than local area network connections, these solutions tend to have a longer RTO and a larger RPO than HA solutions.

Near-real-time replication over the WAN is possible in some environments. Solutions that achieve this sometimes rely on commercial WAN replication software to handle archive files, and on p4 replicate to keep metadata up to date.

3.2.1 Disaster Recovery Thinking

The following are sample thoughts that lead to the deployment of DR solutions:

- We're willing to invest in a more sophisticated deployment architecture to ensure business continuity in the event of a disaster.
- We need to ensure access to our intellectual property, even in the event of a sudden and total loss of one of our data centers.

3.3 Read-only Replicas

Read-only replicas are generally used to offload processing from live production servers. Tasks run against a read-only replica do not block read/write users from accessing the live production Perforce instance. One common use is automated build farms, where large sync operations could otherwise cause users to wait to submit.

Read-only servers are often created from a subset of depot metadata. They usually do not need db.have data, for example; that database records which client workspaces contain which files, information that is not needed for automated builds.

Building a read-only replica typically involves using shared storage (typically a SAN) for archive files, so that the archive files written by the primary server are mounted read-only at the same location on the replica. In some cases, read-only replicas are run on the same server machine as the primary server.
In that configuration, the replicas run under a different login that does not have write access to the archive files, ensuring they remain read-only.

To hide the details of the deployment architecture from users and keep things simple for them, p4broker can be used. When p4broker is in place, humans and automated build processes set their P4PORT value to that of the broker rather than that of the real server.

The broker implements heuristics to determine which requests need to be handled by the primary server and which can be sent to the replica. For example, the broker might forward all sync commands from the builder user to the replica, while the submit at the end of a successful build would go to the primary server.

3.3.1 Read-only Replica Thinking

The following are sample thoughts that lead to the deployment of read-only replica solutions:

- We have automation that interacts with Perforce, such as continuous integration build systems or reports, that impacts performance on our primary server.
- We are willing to invest in a more sophisticated deployment architecture to improve performance and increase our scalability.

4 Perforce Tools that Support Metadata Replication

In any replication scheme, both the depot metadata and the archive files must be addressed. Perforce supports a number of different ways to replicate depot metadata. Each has different uses, and some are better suited to certain types of operations.

4.1 Journal Truncation

The classic way to replicate Perforce depot metadata is to truncate the running journal file, which maintains a log of all metadata transactions, and ship it to the remote server, where it is replayed. This has several advantages. It brings the remote metadata up to date to a known point in time.
It also allows the primary server to continue to run without interruption. The truncated journal file can be copied via a variety of methods, including rsync/Robocopy, FTP, block-level replication software, and so on, and automating such tasks is generally not difficult. For systems that need a low RPO, however, the number of journal file fragments shipped over may make this approach impractical. Truncating the journal every few minutes can produce a huge number of journal files, and confusion in the event of a system problem. Extending the time between journal truncations, on the other hand, leaves the servers out of sync for longer periods. This solution, therefore, tends to be more of a DR solution than an HA one.

4.2 p4jrep (deprecated)

An interim solution to the problem of repeated journal truncation was p4jrep. This utility, available from the Perforce public depot, provided a way to move journal data between servers in real time. Journal records were read and sent through a pipe from one server to another. This required some amount of synchronization, and the servers needed to be connected via a stable network connection. It was not available for Windows.

4.3 p4 replicate

While p4jrep demonstrated that near-real-time replication was possible, it also showed the need for a built-in solution. Starting with the 2009.2 version of Perforce Server, the p4 replicate command provides the same basic functionality. It works on all server platforms and is a fully supported component of Perforce.

p4 replicate is also much more robust than its predecessor. Transaction markers within the journal file are used to ensure the transactional integrity of the replayed data, and the data is passed along by the Perforce Server itself rather than by an outside process.
Additionally, as the command runs...
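Although the transcript is truncated here, the basic shape of a p4 replicate deployment described above can be sketched. The following is a minimal illustration, not a supported recipe: the master host name, replica paths, and state file location are assumptions invented for this example, and the exact replay flags shown follow 2009.2-era documentation examples, so verify them against the release notes for your server version.

```shell
#!/bin/sh
# Sketch of near-real-time metadata replication with "p4 replicate"
# (Perforce Server 2009.2+).  Host name, paths, and state file below
# are illustrative assumptions; confirm flag details against the
# documentation for your server release.

MASTER=perforce-master:1666       # P4PORT of the master server (assumed)
REPLICA_ROOT=/p4/replica/root     # P4ROOT of the replica (assumed)
STATEFILE=/p4/replica/state       # records the last journal position applied

# "p4 replicate" polls the master for new journal records (-i is the
# polling interval in seconds; -s persists the replication position
# across restarts) and pipes each complete transaction to the command
# given at the end of the line.  "p4d ... -jrc -" replays those records
# from standard input, honoring the transaction markers so the replica
# only ever advances to a transactionally consistent point.
p4 -p "$MASTER" replicate -s "$STATEFILE" -i 2 \
    p4d -r "$REPLICA_ROOT" -f -b 1 -jrc -
```

Run on the replica machine, this loops indefinitely, keeping the replica's db.* files a few seconds behind the master; archive files must still be replicated separately, as discussed in section 5.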
