Migrating to Git: Rethinking the Commit

Upload: kim-moir

Post on 16-Apr-2017

Migrating to Git: Rethinking the Commit

@Kim_Moir, IBM Ottawa

Hi, I'm Kim Moir and I'm a release engineer for the Eclipse and RT Equinox projects. Since last June, our team has been working on migrating our ten-year-old CVS repository to Git. I'm going to talk about the process that we used to migrate, how our development processes changed to accommodate it, the challenges we faced and advice for other teams that are migrating. Along the way, I'm going to include some quotes from other committers with their thoughts on our Git migration.

Git Happens

In honour of the fact that our Git migration is almost complete, a more appropriate name for my talk might be Git happens.

Questions for the audience: Show of hands, how many of you use Git on a daily basis? How many use CVS or SVN?

Why Git?

The Eclipse Foundation is in the process of phasing out support for CVS (December 2012) to reduce support costs. In theory, the summer months are our lowest activity period in terms of development because of the release we ship every June. However, the summer of 2011 was really busy for us as we began to plan our migration to Git. We wanted to minimize the disruption to the team, so we wanted to migrate as many projects as possible before fall, when people returned from vacation. As well, we wanted to be able to migrate the bulk of the components before Indigo SR1 shipped at the end of September.

About Us

10 year old code repository

Around 40 active committers

Produce approximately 600 bundles a build

Limited Git experience before migration

300 bundles, 62 features, 84 fragments

JDT, PDE, Platform and Equinox projects

- Some committers had exposure to Git through Orion and OSGi Alliance experience

- 16 GB in Eclipse repo, 8 GB in Equinox repo

Active development streams: 4.2, 4.1.x, 3.8, 3.7.x

One of the first discussions as we planned for our migration: what would the granularity of our Git repositories be? The Eclipse project has several subprojects: Platform, JDT, PDE and Equinox. Our commit rights are quite specific. If you are a committer on jdt.core, this doesn't mean that you have rights on jdt.ui. With CVS, you can just check out the bundles you want into your workspace; you don't have to clone the entire repository to your machine. Thus we wanted to ensure that our repositories weren't too big, so that a contributor wasn't synchronizing a large repo to their machine with a lot of content that they would never use. How should our Git repositories be organized? We had a discussion with the PMC and decided that repositories should be organized by Unix group id.

Unix group (gid) -> Git repo

2 CVS repos -> ~25 Git repos

With CVS, we had two repositories, /cvsroot/eclipse and /cvsroot/rt/.

There is also currently a limitation in Git where you cannot assign multiple ACLs to the same repo. In order to preserve our project structure, we needed to have a repo for each Unix group. We couldn't have built larger repos without reorganizing our project structure and commit rights.

That being said, we would recommend minimizing the number of repos you create as working with multiple Git repositories can be painful.

Due to our CVS repository size and our desire to preserve our history, we decided that this would be a gradual migration over several months instead of a migration over a few days. We ran test migrations on a component basis and let the owners look at them and determine if there were issues. Many teams took the opportunity to reorganize their repos in a more structured fashion, for instance separating features, bundles and test bundles into separate directories.

The Platform UI team were the first team to migrate to Git (July). Paul Webster spent about a month testing the Git migration of the Platform UI bundles and writing scripts to assist with the process.

One of the issues that we ran into is that, when you tag or branch a repo in Git, the entire repo is tagged or branched. You can't tag or branch a single project. In an effort to be good Eclipse citizens, during a release cycle, we only tag bundles that have changed. Thus only new bundles get downloaded as needed. For this reason, when we first ran the migration tool on our CVS repos, a maintenance branch would only include bundles that had been branched for that release, and all the bundles that were not branched would be missing. Not good! To fix this, Paul wrote some scripts to precondition the repositories so that maintenance branches would include all the bundles in that release.
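The missing-bundle problem can be pictured as a set difference. The following is only a minimal sketch, not Paul's actual scripts, which the talk does not show. It assumes a hypothetical `release_map` (every bundle in the release, mapped to the tag it shipped with) and a hypothetical set of bundles that were explicitly branched in CVS, and it lists the bundles that would still need the maintenance branch before conversion. The tag values are invented for illustration.

```python
def plan_preconditioning(release_map, already_branched):
    """Return {bundle: release_tag} for bundles missing from the maintenance branch.

    release_map      -- hypothetical mapping of every bundle in the release
                        to the tag it shipped with
    already_branched -- bundles that were explicitly branched in CVS
    """
    return {
        bundle: tag
        for bundle, tag in release_map.items()
        if bundle not in already_branched
    }


# Illustrative data only; the tag names are made up.
release_map = {
    "org.eclipse.ui": "v20110601",
    "org.eclipse.jface": "v20110520",
    "org.eclipse.swt": "v20110525",
}
already_branched = {"org.eclipse.ui"}

plan = plan_preconditioning(release_map, already_branched)
for bundle, tag in sorted(plan.items()):
    # Each entry is a bundle that would need the branch created from its release tag.
    print(f"branch {bundle} from {tag}")
```

Computing the plan separately from applying it keeps the step easy to review before anything is written to the CVS repository copy.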

I know that other projects didn't have this problem; for instance, CDT tags their bundles every time.

Another issue that we looked at during testing was that we had some rather large test repositories due to our binary files. Some background: our build just compiles Java code. The SWT and Equinox Launcher teams have C code that must be compiled on native hardware for the 13 platforms we support and stored in the repository in binary form. Thus our initial test Git repositories were bloated with binaries, many of which had tags associated with old builds that we weren't ever going to build again. So we decided to:
1) Have binary-only repositories for these projects
2) Clean the binary repositories of non-release binaries to reduce their size (run a git filter-branch operation to remove binaries)
3) Update build scripts to fetch artifacts from the binary repos

CVS:

Condition

Copy to temporary location

Run cvs2git

-Conditioned the repos back to 3.0 release

git fast-import CVS content into temporary Git repository

git-move-refs

git prune, repack, gc and repack

Run a script to remove delete-only tags

git-move-refs = removes unneeded fix up branches after the conversion

Challenges during migration:
- Massaging tags that didn't meet Git standards. For instance, some JDT committers had tags with * in them. Applied regexp foo to modify them.
- Long-running git filter-branch operations, from 20 minutes to 16 hours. Eclipse webmasters created a local partition for me on the filesystem to avoid NFS timeout issues on the shared Eclipse filesystem. Otherwise git filter-branch operations would time out after a few hours due to stale NFS file handles.
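The tag clean-up can be illustrated with a small sketch. The talk does not show the regular expressions that were actually used, so everything here is an assumption: the function name `sanitize_tag`, the choice of `_` as the replacement character, and the sample tag names are all hypothetical. The character set comes from the rules documented for git check-ref-format, which rejects, among others, spaces, control characters, `~`, `^`, `:`, `?`, `*`, `[`, `\`, the sequences `..` and `@{`, and names ending in `.` or `.lock`. The sketch does not cover every rule (slash-separated components are not handled).

```python
import re

# Characters git check-ref-format rejects anywhere in a ref name:
# ASCII control characters, space, DEL, and ~ ^ : ? * [ \
_FORBIDDEN_CHARS = re.compile(r"[\x00-\x20\x7f~^:?*\[\\]")


def sanitize_tag(name: str, replacement: str = "_") -> str:
    """Rewrite a CVS tag name so it avoids common Git ref-name violations."""
    cleaned = _FORBIDDEN_CHARS.sub(replacement, name)
    cleaned = cleaned.replace("@{", replacement)   # "@{" is not allowed
    cleaned = re.sub(r"\.\.+", ".", cleaned)       # ".." is not allowed
    cleaned = cleaned.strip(".")                   # no leading or trailing "."
    if cleaned.endswith(".lock"):                  # names may not end in ".lock"
        cleaned = cleaned[: -len(".lock")] + replacement + "lock"
    return cleaned


# Hypothetical tag names, for illustration only.
for tag in ["v2010*0615", "R3_7 maintenance", "build..1", "R3_7"]:
    print(f"{tag!r} -> {sanitize_tag(tag)!r}")
```

Running `git check-ref-format refs/tags/<name>` on each rewritten name would be a sensible final check, since Git itself is the authority on what it accepts.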

Add gitignore file via git filter-branch

Clone into bare repo

git prune, repack, gc and repack

copy repo into final location

How long did the migration take? It depends on the size of the repo and the history associated with it. JDT Core: 24 hours. 8 hours for filter-branch. Also, since we ran the migrations twice, (1) test and (2) real, the migrations took a long time in both machine and people time.

"It's one thing to migrate to Git, it's another thing to use it." - Olivier Thomann

Our committers had a number of problems when first using Git. If you delete a project from your workspace, it's easy to push that change to the master repository as a delete by mistake. In addition, since we work in multiple branches, we have had cases where people switched to one branch for one bundle and inadvertently committed code for another bundle to the wrong stream. While switching streams, committers also inadvertently deleted changes in their local workspace.

"Git: The command line is where it's at" - Bogdan Gheorge

Our developers experienced quite a learning curve when switching to Git. For many it was a surprise that they couldn't do everything in EGit like they had done in the CVS tooling. Several people reverted to using the command line or gitk, which they found ironic because we are in the tooling business: reverting to command line operations to manage your code contributions seemed like a step backward.

Another challenge was the switch in focus to branches as opposed to patches. Traditionally, many teams created a patch for every change and attached it to the Bugzilla entry that documents the change. With Git, however, instead of creating patches, you commit and then add a link to the change in Bugzilla. So we had to adjust our mindset to committing to a branch instead of making a patch.

"Patches are like dumping a database into a text file. You need to think in terms of releasing fixes to branches instead of passing around patches. Patches also lose some Git provenance information such as author and parent." - John Arthorne

Branches > Patches

Letting go of the patch mentality was a hurdle for many people. Several teams submit every change as a patch to bugzilla, and have done so for years. New committers were traditionally taught to write and refine patches as part of the process to become a committer. So it felt unnatural to commit changes in local branches.

"That operation is four pay-grade levels above my current git-foo." - Paul Webster

I missed a bundle during one of the migrations and spent a day trying to integrate CVS content into a Git repo while preserving history. I tried git-stitch and git-merge but to no avail; the history didn't look right. In the end, I reran the CVS migration because it was too much work to fix all the tags to look right.

"I have broken the platform-ui git repository." - Bug 361707

"Better policy to guard against deleting all branches and tags from our public repos" - Bug 362076

Friday afternoon, the 21st of October, before milestone week. Brian de Alwis was using the bzr-git client. He pushed some changes to the master branch and it wiped all but two of the active branches in the repo. It also triggered a gc which cleaned up the recently deleted branches. Initially, other committers tried to push back the changes but were not allowed to because of server-side commit hooks.

There was then a mad scramble to find a committer with the latest copy of the repo that could be restored to eclipse.org. Paul found one on his home machine.

comment 37

I'll just add the final fix.

We took a cloned repo that was up to date from Thursday and pulled Friday's 7 commits into R4_development and R3_6_maintenance only. Denis disabled the commit hooks. Then we pushed all tags and pushed refs/remotes/origin/*:refs/heads/*

Pushing the refs also pushed back the GCed commits.

We should get that restored repo from the ISP and compare it with the public repo now, to confirm we've completely restored the repo.

PW
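The refspec in that comment is what made the restore possible: a clone keeps the server's branches as remote-tracking refs under refs/remotes/origin/, and pushing them with refs/remotes/origin/*:refs/heads/* recreates each one as a branch of the same name on the server. The following sketch models only that name mapping, not everything git push does. The function name `expand_refspec` and the abbreviated object ids are placeholders.

```python
def expand_refspec(refspec, refs):
    """Map source refs to destination refs for a 'src/*:dst/*' refspec.

    Only the trailing-wildcard form is handled. `refs` maps ref names in the
    local clone to object ids; the result maps the ref names that would be
    written on the remote to those same ids.
    """
    src, dst = refspec.split(":", 1)
    if not (src.endswith("/*") and dst.endswith("/*")):
        raise ValueError("expected a refspec of the form 'src/*:dst/*'")
    src_prefix, dst_prefix = src[:-1], dst[:-1]  # drop the '*', keep the '/'
    return {
        dst_prefix + name[len(src_prefix):]: sha
        for name, sha in refs.items()
        if name.startswith(src_prefix)
    }


# Placeholder object ids; the branch names are the ones named in the comment.
clone_refs = {
    "refs/remotes/origin/R4_development": "1111111",
    "refs/remotes/origin/R3_6_maintenance": "2222222",
    "refs/heads/master": "3333333",
}

restored = expand_refspec("refs/remotes/origin/*:refs/heads/*", clone_refs)
for name, sha in sorted(restored.items()):
    print(name, sha)
```

Because every pushed ref carries its full history with it, recreating the branches also sends back any commits the server had garbage-collected, which matches the observation in the comment.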

We recently ran into a problem where a push inadvertently removed most of the branches and tags from our public repo, eclipse.platform.ui.git, and GCed the orphaned commits, leaving us in a bad state. This was done through normal git operations, and can be easily replicated from the command line or a little script. We'd like to discuss ways of preventing or limiting the damage to our public repos from this kind of situation in the future. Please add your comments or insights to https://bugs.eclipse.org/bugs/show_bug.cgi?id=362076
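One possible guard is sketched below. It is an assumption for illustration, not the policy Eclipse adopted; the talk does not say what was decided. A server-side update hook is invoked once per ref with the ref name, the old object id and the new object id, and Git signals a deletion by passing a new id made entirely of zeros. The function name `check_update` and the protected prefixes are hypothetical. Git also has a built-in `receive.denyDeletes` setting that refuses ref deletions on the receiving side, which may be enough on its own.

```python
ZERO_SHA = "0" * 40  # Git passes an all-zero id as the new value when a ref is deleted


def check_update(refname, old_sha, new_sha,
                 protected_prefixes=("refs/heads/", "refs/tags/")):
    """Decide whether one ref update should be accepted.

    Returns (allowed, message). Deletions of protected refs are refused;
    every other update is allowed.
    """
    if new_sha == ZERO_SHA and refname.startswith(protected_prefixes):
        return False, f"refusing to delete {refname}"
    return True, "ok"


# Placeholder object ids for illustration.
print(check_update("refs/heads/R4_development", "a" * 40, ZERO_SHA))
print(check_update("refs/heads/master", "a" * 40, "b" * 40))
```

In a real hook the three values would arrive as command-line arguments and a refusal would be reported with a non-zero exit status; that wiring is left out so the rule itself stays easy to test.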

Easier to branch

Rolling back a commit is easier

Seeing the Eclipse project move to Git made Wayne Beaton happy. If Wayne is happy, everyone is happy.

Cool graphs on GitHub.

The EGit team received lots of feedback. Bugzilla feedback is love. For instance, Dani and Markus opened over 110 EGit bugs.

"Fork you" is now a valid Bugzilla resolution.

- We build with a mixture of PDE, p2 and Ant, as well as the Eclipse compiler.
- In order to build against Git repositories, we added the EGit fetch factory bundle to the subset of bundles that we use to build Eclipse.
- Modified our map files to point to Git repositories.
- Builder changes: fetch maps from Git repos, compare tags, create a tag for the build id.
- Changes to build scripts so binaries are fetched from the appropriate repos.
- Ran several test builds. Surprisingly low on the release engineering pain point scale.
- Backported changes to all four active development streams.

The migration has also made us rethink our development and build processes. Today, we usually build from tags. Everyone releases to a branch and tags their contribution to the build. But with Git, you should be thinking in terms of branches.

For instance, we will be moving to a git flow model where our usual development occurs in a single develop branch and we merge changes into the master branch for the build. We will also change the builder to tag the branch automatically.

"There are three categories of costs that we incurred during the Git migration: the migration process itself, the developer learning curve and dealing with EGit issues." - Mike Wilson

The Git migration consumed a lot of time for us. However, if you look at it from an accounting perspective, it's a sunk cost. Every year, we make a plan of major items for the release. Migrating to Git was a major item for us, which meant that other items had to be deferred.

Read

Test migration

Communicate

Advice for other projects contemplating their Git migration

Relax: You don't have as many bundles or as much history as we do. It won't be so painful or costly for you. And it won't take months. Unless you're WTP. Then it might take a while.

Run test migrations and builds before the actual migration date and get feedback from your community to see if you need to modify your strategy for the actual migration.

[email protected] is helpful for questions related to git migration. Other projects have been very helpful.

Paul Webster wrote a document, "Git workflows for CVS users", which has been very useful. Inevitably, when people have their repositories migrated to Git they have similar questions, so it's good to have the answers in a document you can point them to.

- Minimize the number of repos you create. We have too many repos, and cloning so many repos is not the most efficient way to work.

"The complexity of our code is the barrier to contribution, not the SCM." - Paul Webster

The benefits from the Git migration are not yet realized. Proponents of distributed version control systems suggest that it makes it easier to fork and contribute.

I recently watched a talk by David Eaves, who has been helping open source and open data communities prepare metrics on bug fix rate, how long patches wait and so on. Anyways, one of his points during his talk was that people think that open source is all about collaboration and working together. But really, if we empower people to go off and work on a problem by themselves, without having to interact with someone, because of the reduced transaction costs, this is a huge benefit.

It would be an interesting academic study to analyze contributions to Eclipse projects using SVN and CVS, and contributions after the same projects convert to Git and see if there is a statistically significant increase in contributions. The most important thing is that you want to reduce the barriers to contribution in your community.

Questions?

"Git happens" - Kim Moir

Useful links

Managing large binary files with Git http://stackoverflow.com/questions/540535/managing-large-binary-files-with-git

Git workflows for CVS users http://wiki.eclipse.org/Platform-releng/Git_Workflows

[email protected] mailing list

EGit Documentation http://eclipse.org/egit/documentation/

Git parable http://tom.preston-werner.com/2009/05/19/the-git-parable.html

Pro Git Book http://progit.org/

Think like a git http://think-like-a-git.net/

Migration scripts we used git://git.eclipse.org/e4/org.eclipse.migration.git

Git flow discussion http://dev.eclipse.org/mhonarc/lists/eclipse-dev/msg09229.html

Image credits

Beach http://www.flickr.com/photos/archer10/2218592521/sizes/

Reading room in British Museum http://www.flickr.com/photos/cliveabrown/2258478542/

Blackbirds at dusk http://www.flickr.com/photos/moonjazz/1216783552/

Binary bridge at Georgia Tech http://www.flickr.com/photos/mcclanahoochie/5068845349/sizes/l/in/photostream/

Branches http://www.flickr.com/photos/e_phots/3012896283/

Git stash http://www.flickr.com/photos/dealingwith/4295488113/

Fork you http://www.flickr.com/photos/sunfox/4365495446/sizes/l/in/photostream/

Rethink http://www.flickr.com/photos/venegas/5549123/

Gears http://www.flickr.com/photos/wwarby/4782904694

Git gas station http://www.flickr.com/photos/soo/6047596987/

Advice http://www.flickr.com/photos/wurzle/659315/

Fraser river delta http://www.flickr.com/photos/ecstaticist/3055718118/

Women with laptops at Pop Life http://www.flickr.com/photos/yourdon/5157431891/in/photostream/

Legal notice

Copyright IBM Corp., 2007-2011. All rights reserved. This presentation and the source code in it are made available under the Creative Commons Att. Nc Nd 3.0 license.

Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Eclipse and the Eclipse logo are trademarks of Eclipse Foundation, Inc.

IBM and the IBM logo are trademarks or registered trademarks of IBM Corporation, in the United States, other countries or both.

Other company, product, or service names may be trademarks or service marks of others.

THE INFORMATION DISCUSSED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION, IT IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, AND IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, SUCH INFORMATION. ANY INFORMATION CONCERNING IBM'S PRODUCT PLANS OR STRATEGY IS SUBJECT TO CHANGE BY IBM WITHOUT NOTICE