git basics with notes
TRANSCRIPT
Client-Server vs Distributed models
[Figure: in the client-server model, the VCS server stores Versions 1, 2 and 3, while each client has only one version (Version 1) checked out. In the distributed model, every machine holds Versions 1, 2 and 3.]
To see what a distributed source control system looks like, let us contrast it with a client-server model. In the client-server model, you check out one snapshot: the state of a file or files at a particular point in time. In a distributed model, you check out everything locally, including the full history.
Advantages of Git over P4
Perforce (Client-Server) | Git (Distributed)
Version management system | Source control system
Slow due to network latency and increased dependency on server calls | Fast! Work locally, offline
Intermediate work cannot be easily saved to P4 | Various checkpoints for saving intermediate work
Difficult to experiment | Facilitates experimentation
A merger is typically responsible for merging between branches | The developer is responsible for merging their branch into master
The Perforce model is centered around being able to MANAGE branches: one can restrict branches, set up check-in policies, etc. Since changing the history of a branch in P4 is an admin-only privilege that is virtually never used, Perforce is good at keeping an audit trail of your commits. Git, on the other hand, allows you to rewrite the history of a branch completely, as we will see later on.
Why do people love Git? Almost all the work is done locally, which gives you a lot of freedom while you work.
Server for Git
❖ GitHub, Stash, CloudForge, etc. are code management and collaboration tools for Git repos.
❖ They provide fine-grained control over permissions and an audit of the commit history.
❖ The distributed model of Git facilitates open-source projects, since individuals can easily fork repos and merge their changes back in.
You may ask: why do we need a server in a distributed model? The central server is just another Git repo that everyone has access to and that the team uses to synchronize their work. It is mainly used for collaboration and is designated as the 'source of truth'. It can easily be swapped out for another repo. The distributed model is also an advantage for open-source projects: if the owner stops maintaining a project's repo but the community is interested in keeping it alive, someone can fork it. Over time, changes will be contributed to the fork, and it will become the de facto new home of the project.
Scope of the talk
❖ Various roles require different levels of expertise in Git:
❖ Manager
❖ Software Engineer/QA Engineer
❖ Merger/Release Engineer: consumer of Git scripts
❖ Developer of scripts that extend Git functionality: deep dive into Git internals
❖ We will cover concepts and commands that will come in handy in your day-to-day work as a developer.
❖ This talk is a road map of the Git world. Hopefully, it will whet your appetite for exploring the trails.
Roles: a manager's use of Git will most likely be limited to checking out branches. Developers require a working knowledge of Git. A merger is a consumer of Git scripts, such as those for bulk merging across releases. Those who develop tools that extend Git functionality need a deep dive into Git internals.
This talk is primarily designed for developers.
Roadmap
❖ Content hashing
❖ Blobs to Branches
❖ Staging and committing
❖ Remotes and pull requests
❖ Merge conflicts
❖ Git resources
Roadmap for the presentation.
Content Hashing
❖ Contents are referenced using their hashes:
sha1("blob " + fileSize + "\0" + fileContent)
❖ Example:
$ echo "foobar" > foo.txt
$ git hash-object foo.txt
323fae03f4606ea9991df8befbb2fca795e648fa
which is sha1("blob 7\0foobar\n")
❖ Fun fact: Renames are not stored in the repo. They're computed by commands such as git diff, git merge, etc.
SHA-1 (Secure Hash Algorithm 1) is a cryptographic hash function; it is also commonly used on downloaded files to verify that their content has not been altered.
$ echo "foobar" > foo.txt
$ git hash-object foo.txt
323fae03f4606ea9991df8befbb2fca795e648fa
This matches sha1("blob 7\0foobar\n"): "foobar" plus the newline added by echo is 7 bytes.
This is a low-level concept, but it introduces you to the fundamental representations used by Git. It also helps you build intuition for the graph structures we will cover in the following slides.
Renames are computed based on the similarity between the contents of a 'deleted' and an 'added' file:
$ mv force.txt fourth.txt
$ git add -A .
$ git status
renamed: force.txt -> fourth.txt
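The hashing rule can be checked without Git, using only Python's standard hashlib module. This is a minimal sketch assuming a repository that uses Git's default SHA-1 object format; the function name git_blob_sha1 is a hypothetical helper, not part of any Git API.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Return the object ID Git would assign to a blob with this content.

    Git hashes a header, "blob <size in bytes>" plus a NUL byte, followed by
    the raw file content (SHA-1 object format assumed).
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # `echo "foobar"` writes six letters plus a newline: 7 bytes in total.
    print(git_blob_sha1(b"foobar\n"))
```

It should print 323fae03f4606ea9991df8befbb2fca795e648fa, the same ID git hash-object reports, because both hash the bytes "blob 7\0foobar\n".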
Blobs to trees
❖ A tree is an object that stores:
a) blobs
b) subtrees
❖ Each entry contains metadata about its mode, type and name
❖ A tree object can contain objects of type "blob" or "tree"
❖ Example modes: 100755 means it's an executable file, 120000 specifies a symbolic link
Trees are analogous to directories on a file system. Let us build upon the notion of blobs and see how they come together to form trees.
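The same hashing scheme extends to trees. The sketch below is illustrative, not Git's own code: it assumes the modes 100644 (regular file) and 40000 (subdirectory) that Git writes into tree objects, and the helper names git_object_sha1 and git_tree_sha1 are hypothetical. Each entry is serialized as "<mode> <name>", a NUL byte, and the entry's 20-byte binary object ID, and entries are sorted by name.

```python
import hashlib


def git_object_sha1(obj_type: str, payload: bytes) -> str:
    """Hash a payload the way Git does: "<type> <size>" + NUL + payload."""
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def git_tree_sha1(entries) -> str:
    """Return the ID of a tree built from (mode, name, hex_object_id) tuples.

    Each entry becomes "<mode> <name>", a NUL byte, then the 20 raw bytes of
    the object ID. Entries are sorted by name; a subtree (mode "40000") is
    compared as if its name ended in "/".
    """
    def sort_key(entry):
        mode, name, _ = entry
        return (name + ("/" if mode == "40000" else "")).encode("utf-8")

    payload = b"".join(
        f"{mode} {name}".encode("utf-8") + b"\0" + bytes.fromhex(oid)
        for mode, name, oid in sorted(entries, key=sort_key)
    )
    return git_object_sha1("tree", payload)


if __name__ == "__main__":
    foo = git_object_sha1("blob", b"foobar\n")
    sub = git_tree_sha1([("100644", "foo.txt", foo)])  # a directory holding foo.txt
    root = git_tree_sha1([("100644", "foo.txt", foo), ("40000", "docs", sub)])
    print(sub, root)
```

Because the entries are sorted before hashing, the same set of files always produces the same tree ID, whatever order they are listed in. An empty list yields 4b825dc642cb6eb9a060e54bf8d69288fbee4904, Git's well-known empty-tree ID.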
Commit from trees
❖ A commit is a pointer to a tree
❖ It points to its parent commit(s): none for the very first commit, two or more for a merge
❖ It also contains metadata about its:
1) Author
2) Committer
Example description of a commit object:
tree 9acd01e7390a64900bde0b9749f462c53ccb3c65
parent 770479ca34ffd3450d406228f32aa1cb1d8564a0
author Joan Doe <[email protected]> 1421112508 -0800
committer John Doe <[email protected]> 1421112508 -0800
The author is the person who originally wrote the commit. The committer is whoever last applied it, for example someone who applied it as a patch, amended it, or rebased or cherry-picked it.
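Commits are hashed the same way, with the type "commit" and a text payload laid out like the example above. This sketch assumes only the headers shown on the slide (real commits may carry extra headers, such as a GPG signature, which it ignores); the function name git_commit_sha1 and the example.com identities are hypothetical.

```python
import hashlib


def git_commit_sha1(tree, parents, author, committer, message):
    """Build a commit object's text and return (object_id, text).

    `author` and `committer` are preformatted identity lines such as
    "Joan Doe <joan@example.com> 1421112508 -0800" (name, email, Unix
    timestamp, UTC offset). Extra headers like GPG signatures are ignored.
    """
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # none for a root commit, 2+ for a merge
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    text = "\n".join(lines) + "\n\n" + message  # a blank line separates headers from message
    payload = text.encode("utf-8")
    header = f"commit {len(payload)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest(), text


if __name__ == "__main__":
    oid, text = git_commit_sha1(
        tree="9acd01e7390a64900bde0b9749f462c53ccb3c65",
        parents=["770479ca34ffd3450d406228f32aa1cb1d8564a0"],
        author="Joan Doe <joan@example.com> 1421112508 -0800",
        committer="John Doe <john@example.com> 1421112508 -0800",
        message="May the force be with you\n",
    )
    print(text)
    print(oid)
```

Changing any input, including the parent list or a timestamp, changes the commit ID; this is why rewriting history always produces new commits.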
Reuse of objects
[Diagram: a parent commit and a new commit. The new commit points to tree’, in which only blob’ is new; the unchanged trees and blobs are reused from the parent commit (under-the-hood object sharing).]
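Since only blob changed (to blob’) in this commit, the other Git objects can be reused: only blob’ and the tree that contains it (tree’) are new, while every unchanged tree and blob is shared with the parent commit.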
Reuse of objects within a tree
[Diagram: a tree whose files have the contents "A", "B", "C" and a second, grayed-out "A".]
❖ Blobs can be shared within a single tree.
The grayed-out blob has the same contents as another blob, so the two share a single underlying object.
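Both kinds of sharing fall out of content addressing: an object's key is the hash of its content, so identical content can only ever be stored once. The class below is a toy, in-memory store written for illustration; the name ObjectStore is hypothetical, and Git's real object database lives on disk in compressed form.

```python
import hashlib


class ObjectStore:
    """Toy in-memory content-addressed store (not Git's on-disk format)."""

    def __init__(self):
        self.objects = {}

    def put(self, obj_type: str, payload: bytes) -> str:
        """Store an object under the SHA-1 of "<type> <size>\\0<payload>"."""
        header = f"{obj_type} {len(payload)}".encode("ascii") + b"\0"
        oid = hashlib.sha1(header + payload).hexdigest()
        # setdefault keeps the existing entry if this content is already stored.
        self.objects.setdefault(oid, (obj_type, payload))
        return oid


store = ObjectStore()
a1 = store.put("blob", b"A\n")    # first file containing "A"
b_id = store.put("blob", b"B\n")
c_id = store.put("blob", b"C\n")
a2 = store.put("blob", b"A\n")    # second file with identical content
print(a1 == a2, len(store.objects))  # True 3
```

Four files were stored, but only three objects exist, because the two "A" files resolve to the same key.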
Multiple parents
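[Diagram: merge commit C has two parents, P1 and P2; each of the three commits points to exactly one tree (T1, T2, T3), each containing a blob (B1, B2, B3).]
❖ Commits with multiple parents have a one-to-one relationship with trees, similar to commits with single parents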
Gain familiarity with the idea of a commit having two parents.
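Because a merge commit records both parents, everything on either line of history is reachable from it. The toy traversal below illustrates this with the slide's P1, P2 and C plus a hypothetical root commit R; the function name ancestors is also hypothetical. It lists commits breadth-first, whereas git log shows commits in reverse chronological order by default.

```python
from collections import deque

# Toy history: hypothetical root R, the slide's parents P1 and P2, merge commit C.
PARENTS = {
    "R": [],
    "P1": ["R"],
    "P2": ["R"],
    "C": ["P1", "P2"],  # a merge commit lists both parents
}


def ancestors(commit, parents=PARENTS):
    """Return every commit reachable from `commit` (inclusive), breadth-first."""
    seen, order = {commit}, [commit]
    queue = deque([commit])
    while queue:
        for parent in parents[queue.popleft()]:
            if parent not in seen:  # shared history (R) is visited only once
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


print(ancestors("C"))   # ['C', 'P1', 'P2', 'R']
print(ancestors("P1"))  # ['P1', 'R']
```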
Branch - pointer to a commit
[Diagram: the master branch pointing to a commit.]
❖ git branch
As you make additional commits, the branch that HEAD points to moves forward with it. The git branch command lists all the local branches.
HEAD - pointer to the current commit
[Diagram: HEAD points to master, which points to commit C; after git checkout C, HEAD points directly to commit C.]
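The checkout command accepts any revision: a commit SHA, a branch name, or even a relative reference such as HEAD~1 (the first parent of HEAD). Checking out something other than a branch name, as in git checkout C, leaves HEAD pointing directly at a commit; this is called a 'detached HEAD'.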
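To make the relationship between HEAD, branches and relative references concrete, here is a toy resolver. It is a sketch under simplifying assumptions: the commits A, B, C and the helper resolve are hypothetical, HEAD is modeled as either a branch name (attached) or a commit ID (detached), and "~n" follows first parents n times, as Git's ~ suffix does.

```python
# Toy repository state with hypothetical commits A <- B <- C.
PARENTS = {"A": [], "B": ["A"], "C": ["B"]}
BRANCHES = {"master": "C"}


def resolve(rev: str, head: str) -> str:
    """Resolve a branch name, commit ID, "HEAD", or "<rev>~<n>" to a commit ID.

    `head` is what HEAD holds: a branch name when attached, or a commit ID
    when detached (for example after `git checkout C`).
    """
    if "~" in rev:
        base, _, count = rev.partition("~")
        commit = resolve(base, head)
        for _ in range(int(count or 1)):  # a bare "~" means "~1"
            commit = PARENTS[commit][0]   # follow the first parent each step
        return commit
    if rev == "HEAD":
        return BRANCHES.get(head, head)   # attached: via the branch; detached: directly
    if rev in BRANCHES:
        return BRANCHES[rev]
    if rev in PARENTS:
        return rev
    raise KeyError(f"unknown revision: {rev}")


if __name__ == "__main__":
    head = "master"                   # HEAD -> master -> C
    print(resolve("HEAD~1", head))    # B
    head = resolve("HEAD~1", head)    # like `git checkout HEAD~1`: HEAD now holds B
    print(resolve("HEAD", head), BRANCHES["master"])  # B C
```

Note that checking out HEAD~1 moved only HEAD; master still points at C.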
All your codebase are belong to me
❖ git clone
❖ git log
[Diagram: the server/remote, you and a peer each hold a full copy of the repository, with Versions 1, 2 and 3.]
Download a repo to your local machine using `git clone`. Use `git branch -a` to see both local and remote branches. When a remote branch is checked out for the first time, a local branch that tracks it is created. There is nothing special about the repo hosted on the server from Git's perspective; in fact, you could set up a remote that is another Git repo on your local machine and pull from or push to it just as you would here.
Our first commit
❖ echo "May the 4th" >> force.txt
❖ git status
❖ git add force.txt
❖ git diff --cached
❖ git commit -m "May the force be with you"
After creating a new file, we need to add it to the Git index (staging area) before it shows up in a diff. Use git diff --cached to see the differences between HEAD and the staging area. Use git diff to see the differences between the staging area and your working directory, i.e. the unstaged changes.
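[Diagram: you and the remote both have C1 to C3; after your new commit C4, your master points to C4, while remotes/master and the remote's master still point to C3.]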
git branch -a shows all the local and the remote branches. master is tracking remotes/master. Because master is a branch, its pointer moves forward as we make new commits on it. A tag, by contrast, is a pointer to a commit that does not move, while a branch does.
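The difference between a branch and a tag can be modeled with plain dictionaries. In this toy sketch (the class Repo, the tag name v1.0 and the push method are hypothetical, and push only models the successful fast-forward case), committing moves the checked-out branch, while the tag and the remote-tracking origin/master stay put until a push updates the remote.

```python
class Repo:
    """Toy model: branches, remote-tracking branches and tags all map names to commits."""

    def __init__(self):
        self.parents = {"C1": [], "C2": ["C1"], "C3": ["C2"]}
        self.branches = {"master": "C3"}
        self.remote_branches = {"origin/master": "C3"}  # last known state of the remote
        self.tags = {"v1.0": "C3"}                      # hypothetical tag name
        self.head = "master"                            # HEAD is attached to master

    def commit(self) -> str:
        """Record a new commit on top of HEAD; only the checked-out branch moves."""
        new_id = f"C{len(self.parents) + 1}"
        self.parents[new_id] = [self.branches[self.head]]
        self.branches[self.head] = new_id
        return new_id

    def push(self) -> None:
        """Model a successful fast-forward push of the current branch."""
        self.remote_branches["origin/" + self.head] = self.branches[self.head]


if __name__ == "__main__":
    repo = Repo()
    repo.commit()  # creates C4
    print(repo.branches["master"], repo.remote_branches["origin/master"], repo.tags["v1.0"])  # C4 C3 C3
    repo.push()
    print(repo.remote_branches["origin/master"])  # C4
```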
[Diagram: git push uploads C4 to the remote; afterwards the remote's master, your master and origin/master all point to C4.]
You may ask: what if I made a mistake?
Undo unstaged changes
[Diagram: three areas, Committed, Staging Area and Unstaged changes. echo "new" >> force.txt creates an unstaged change to force.txt; git checkout -- force.txt discards it.]
Unstage changes
[Diagram: the same three areas. git add force.txt moves the change to force.txt into the Staging Area; git reset HEAD force.txt moves it back to Unstaged changes.]
git add actually adds changes to the index. The add command should be read as "add any new updates" rather than "add a new file". force.txt is already tracked in the Git index; `git add` stages the new addition to the file, namely the word "new".
Note: As mentioned previously, you can use `git diff --cached` to see the differences between HEAD and the staging area. It will output '+new' for the diagram on the left and nothing for the diagram on the right. Use git diff to see the differences between the working directory and the staging area (which matches the last commit if nothing is staged). It will output '+new' for the diagram on the right and nothing for the diagram on the left.
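The three areas on these slides (committed, staged, unstaged) can be modeled as three copies of one file's content. This toy class is a sketch for building intuition, not how Git stores anything: the names ThreeAreas and _added_lines are hypothetical, and the line-based comparison is a crude stand-in for a real diff.

```python
def _added_lines(old: str, new: str):
    """Crude stand-in for a diff: lines in `new` missing from `old`, prefixed '+'."""
    old_lines = old.splitlines()
    return ["+" + line for line in new.splitlines() if line not in old_lines]


class ThreeAreas:
    """Toy model of one file's content in HEAD, the index, and the working tree."""

    def __init__(self, committed: str):
        self.head = committed     # last committed content
        self.index = committed    # staging area
        self.working = committed  # what is on disk

    def diff(self):
        """Like `git diff`: index vs. working tree (unstaged changes)."""
        return _added_lines(self.index, self.working)

    def diff_cached(self):
        """Like `git diff --cached`: HEAD vs. index (staged changes)."""
        return _added_lines(self.head, self.index)

    def add(self):
        """Like `git add force.txt`: stage the working-tree content."""
        self.index = self.working

    def reset_path(self):
        """Like `git reset HEAD force.txt`: unstage, keeping the working tree."""
        self.index = self.head

    def checkout_path(self):
        """Like `git checkout -- force.txt`: overwrite the working tree from the index."""
        self.working = self.index


if __name__ == "__main__":
    f = ThreeAreas("May the 4th\n")
    f.working += "new\n"               # echo "new" >> force.txt
    print(f.diff(), f.diff_cached())   # ['+new'] []
    f.add()
    print(f.diff(), f.diff_cached())   # [] ['+new']
    f.reset_path()
    f.checkout_path()
    print(f.working == f.head)         # True
```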
Uncommit changes
[Diagram: the same three areas. git commit -m "Second commit" moves the staged change to force.txt into Committed; git reset --soft HEAD^ moves it back to the Staging Area.]
Note: git reset --soft HEAD^ will not change your local working directory. It merely moves the changes from a committed state back to a staged state. By contrast, git reset --hard HEAD^ completely blows away all changes between your current HEAD and the reference you specify, including any uncommitted changes in the index and working directory. As we saw, there are a number of checkpoints in your Git workflow. If used wisely, you will never have to wonder what the last "working" state of your codebase was before you made some breaking changes.
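The reset modes differ only in how many of those areas they overwrite. The toy function below models git reset --soft, --mixed (the default mode) and --hard for a single file; the name reset, the dictionary layout and the commit IDs C1/C2 are hypothetical.

```python
def reset(state: dict, target: tuple, mode: str = "mixed") -> dict:
    """Toy model of `git reset [--soft | --mixed | --hard] <target>` for one file.

    `state` has keys "branch" (commit the current branch points to), "index"
    and "working" (file contents). `target` is a (commit_id, content) pair.
    Returns a new dict; `state` itself is not modified.
    """
    if mode not in ("soft", "mixed", "hard"):
        raise ValueError(f"unknown mode: {mode}")
    commit_id, content = target
    new = dict(state, branch=commit_id)  # every mode moves the branch pointer
    if mode in ("mixed", "hard"):
        new["index"] = content           # --mixed (the default) also resets the index
    if mode == "hard":
        new["working"] = content         # --hard also overwrites the working tree
    return new


if __name__ == "__main__":
    # Just after `git commit -m "Second commit"`; HEAD^ is C1.
    after = {"branch": "C2", "index": "v2\n", "working": "v2\n"}
    for mode in ("soft", "mixed", "hard"):
        print(mode, reset(after, ("C1", "v1\n"), mode))
```

With --soft the change is back in the staging area, with --mixed it is an unstaged change, and with --hard it is gone from all three areas.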
Typical workflow
Typically, if your team has more than one person, you wouldn’t commit to master directly. Recommended workflow:
1) Check out a private branch
2) Commit to the branch, and regularly push to remote
3) When the work is complete, get a code review (likely via a pull request) and merge the branch into master
Also, regularly rebase onto master, assuming you are working in a private branch.
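Check out said branch
git checkout bugFix
[Diagram: before, HEAD points to master, and bugFix points to the same commit; after git checkout bugFix, HEAD points to bugFix, the current branch.]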
Now HEAD points to bugFix. Creating the branch (git branch bugFix) and checking it out (git checkout bugFix) can be combined into one command: git checkout -b bugFix. It is helpful to decompose a command when first learning Git, as it gives you a glimpse into the atomic actions being performed.
Step 2: Feature development
[Diagram: Local: master at A, with bugFix (HEAD) at C on top of B. Remote: the same commits B and C on bugFix, plus a new commit D on master.]
If you want to experiment with an alternate codeline, you can easily do this in a new branch off of master:
git checkout master
git checkout -b newDirection
Let us assume that while you've been working on bugFix, someone else has committed their changes to the master branch, causing it to move forward. bugFix is therefore no longer based on the tip of master: their common ancestor is still A, while master now points to D (diagram on the right).
Step 3: Merge into master
[Diagram: Remote before the merge: bugFix (B, C) and master (D) both descend from A. Remote after the merge: a new merge commit E joins C and D.]
gitk - show git graph
As we mentioned in the introduction, within the Git model it is the responsibility of the developer to merge their changes into the mainline, so it would be remiss not to mention merge conflicts. If there are no conflicts, you will be able to merge in your changes via a pull request, as shown in the right diagram. However, it is recommended that you rebase on top of master, especially if there are merge conflicts. In that case, you will need to resolve the conflicts, stage the resolved files with git add, and then run git rebase --continue. We will explore the graphical underpinnings of rebase in a couple of slides.
Can we do better?
[Diagram: bugFix (commits B and C) and master (commit D) both descend from commit A.]
We would like to modify the commit history to make it appear as if bugFix was based on commit D all along!
Rebase to the rescue
❖ Rebase allows you to replay a series of commits on top of a new base commit.
❖ Helps keep the commit history clean
Your changes were based off of commit A. Commit D was introduced in parallel. Rebase allows you to modify commit history to make it appear as if you were working on top of D all along!
Rebase in action
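[Diagram: before, bugFix (B, C) and master (D) both descend from A. After git rebase master bugFix, bugFix points to new commits B* and C*, replayed on top of D; the original B and C are no longer on the branch.]
git rebase master bugFix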
Note that commits B and C have been supplanted by B* and C* in the right diagram. If bugFix were a shared branch, you would not want to rebase it on top of master, since anyone who was working off of B or C would have the rug pulled out from under them. It is possible to recover from this by cherry-picking any commits made on top of B or C onto C*. However, it is best to avoid such situations altogether.
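Why do B and C become new commits B* and C*? A commit's ID covers its parent, so replaying the same change on a different parent necessarily produces a different ID. The toy sketch below shows only that bookkeeping: make_commit and rebase are hypothetical helpers, IDs are shortened hashes of the parent plus a change description, and unlike a real rebase it never applies patches, so it cannot hit conflicts.

```python
import hashlib


def make_commit(parent, change: str) -> str:
    """Toy commit ID: a shortened hash of the parent ID plus the change."""
    return hashlib.sha1(f"{parent}:{change}".encode("utf-8")).hexdigest()[:7]


def rebase(commits, new_base):
    """Replay (commit_id, parent, change) tuples, oldest first, onto new_base.

    Returns new (commit_id, parent, change) tuples carrying the same changes;
    each gets a fresh ID because its parent is different.
    """
    replayed, parent = [], new_base
    for _old_id, _old_parent, change in commits:
        new_id = make_commit(parent, change)
        replayed.append((new_id, parent, change))
        parent = new_id  # the next commit sits on the one just replayed
    return replayed


if __name__ == "__main__":
    A = make_commit(None, "initial")
    B = make_commit(A, "fix part 1")
    C = make_commit(B, "fix part 2")
    D = make_commit(A, "someone else's change")  # master moved on to D
    b_star, c_star = rebase([(B, A, "fix part 1"), (C, B, "fix part 2")], D)
    print(b_star[1] == D, c_star[1] == b_star[0])  # True True: now based on D
    print(b_star[0] != B, c_star[0] != C)          # True True: new commit IDs
```

Anyone still holding the old IDs B and C now has commits that are no longer on bugFix, which is exactly the shared-branch problem described above.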
Merge bugFix with master
[Diagram: the rebased bugFix branch (B*, C*, on top of D) and master, before and after merging bugFix into master with a merge commit E.]
Merging the rebased branch bugFix into master. This merge is typically triggered in the code management tool (GitHub, Stash, etc.) after a pull request is approved.
Note: the merge from a feature branch into the mainline (master) is usually done with an explicit --no-ff flag, which creates a merge commit even when a fast-forward is possible. The diagram on the right shows how this policy keeps commits on the mainline in one-to-one correspondence with features.
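Whether a fast-forward is possible comes down to ancestry: master can simply be moved to bugFix's tip if master's tip is an ancestor of it, which is exactly what rebasing bugFix onto master guarantees. The sketch below is a toy model of that decision; is_ancestor, merge_into and the made-up merge commit IDs are hypothetical, with no_ff=True mirroring git merge --no-ff.

```python
def is_ancestor(candidate, commit, parents) -> bool:
    """True if `candidate` is `commit` itself or reachable from it via parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == candidate:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return False


def merge_into(target_tip, source_tip, parents, no_ff=False):
    """Return where the target branch points after merging source_tip into it.

    Adds a merge commit (with a made-up ID) to `parents` when one is needed.
    """
    if is_ancestor(source_tip, target_tip, parents):
        return target_tip                         # already up to date
    if not no_ff and is_ancestor(target_tip, source_tip, parents):
        return source_tip                         # fast-forward: just move the pointer
    merge_id = f"merge({target_tip},{source_tip})"
    parents[merge_id] = [target_tip, source_tip]  # first parent is the mainline
    return merge_id


if __name__ == "__main__":
    # master at D; bugFix (B*, C*) was rebased onto D.
    history = {"A": [], "D": ["A"], "B*": ["D"], "C*": ["B*"]}
    print(merge_into("D", "C*", dict(history)))              # C*
    print(merge_into("D", "C*", dict(history), no_ff=True))  # merge(D,C*)
```

With the default, master just jumps to C* and the feature leaves no trace on the mainline; with no_ff=True, each feature arrives as exactly one merge commit on master.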
Merge conflicts
❖ Situation: Conflicting modifications to a file that has changed since we checked it out
❖ Two options: merge, rebase
❖ On a private branch, it is recommended that you rebase.
❖ On a shared branch, merge is the way to go.
Let us take a moment to appreciate that a merge conflict cannot be automated away. There is no way for the source control system to know our intention.
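A three-way merge compares each side against the common ancestor (the base). A change made on only one side can be taken automatically; when both sides changed the same spot differently, only a human can decide. This toy merge3 function (a hypothetical name) assumes all three versions have the same number of lines, which Git's real merge does not require, and writes conflict markers in the style Git uses; the sample strings are hypothetical.

```python
def merge3(base, ours, theirs, ours_label="ours", theirs_label="theirs"):
    """Toy line-by-line three-way merge; returns (merged_lines, has_conflict).

    Simplifying assumption: all three versions have the same number of lines
    (no insertions or deletions), which Git's real merge does not require.
    """
    if not len(base) == len(ours) == len(theirs):
        raise ValueError("this sketch only handles versions of equal length")
    merged, conflict = [], False
    for b, o, t in zip(base, ours, theirs):
        if o == t or t == b:
            merged.append(o)   # both sides agree, or only we changed it
        elif o == b:
            merged.append(t)   # only they changed it
        else:                  # both changed it differently: needs a human
            conflict = True
            merged += [f"<<<<<<< {ours_label}", o, "=======", t, f">>>>>>> {theirs_label}"]
    return merged, conflict


if __name__ == "__main__":
    print(merge3(["x", "y"], ["X", "y"], ["x", "Y"]))  # (['X', 'Y'], False)
    lines, conflict = merge3(["foo bar"], ["foo bar bas"], ["food bazaar"])
    print(conflict)                                    # True
    print("\n".join(lines))
```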
Changing the commit history
❖ "git commit --amend" rewrites your last commit with the currently staged changes instead of creating a new commit
❖ Interactive rebase: git rebase -i
❖ Swiss army knife of modifying history
❖ Allows you to amend, squash, split, or drop commits as they're applied
Many roads, one destination
❖ There are often multiple ways to accomplish a task in Git, for example:
git branch <branchName> followed by git checkout <branchName> is the same as
git checkout -b <branchName>
git checkout -b <branchName> <remoteName>/<remoteBranch> is the same as
git branch --track <branchName> <remoteName>/<remoteBranch> followed by git checkout <branchName>
git fetch followed by git merge is (by default) the same as
git pull
There are lots of facades: an action that can be performed with a flag (or a combination of flags) on one command is often also available as its own command. If you get into a bind, there is most probably a way to recover from the situation. Do not hesitate to seek help, e.g. on the git-users mailing list.
Give It a Try
Explore the topics discussed so far by creating a new Git repository. Let us assume it has one file, foo.txt, with the contents "foo bar". Person A changes it to "foo bar bas" in the user/personA branch and creates a pull request to merge this change in. Meanwhile, person B changes the contents of foo.txt to "food bazaar", and this commit gets merged into master first. For the purposes of this exercise, person B can commit directly to master (keep in mind that in a real-life scenario, the conflicting change would typically be introduced by person B's pull request getting merged into master before person A's). Person A's pull request now has merge conflicts, which need to be resolved using rebase.
Git Resources
❖ Learn by playing: http://pcottle.github.io/learnGitBranching/
❖ Atlassian tutorial: https://www.atlassian.com/git/tutorials/setting-up-a-repository/
❖ Free CodeSchool course on Git: https://www.codeschool.com/courses/git-real
❖ StackOverflow is a great resource: http://stackoverflow.com/questions/2706797/finding-what-branch-a-commit-came-from
❖ Pro Git by Scott Chacon and Ben Straub: http://git-scm.com/book/en/v2