hg version control bioinformaticians
DESCRIPTION
A short talk I gave to my group to explain the basics of hg and version control
TRANSCRIPT
![Page 1: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/1.jpg)
Giovanni Dall'Olio, IBE (UPF-CEXS)
Introduction to version control and hg for our bioinformatics group
![Page 2: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/2.jpg)
What is hg?
● Programmers use software to keep track of all the versions of the code they write. These are called Version Control Systems (VCS)
● Many VCS programs exist; the best known are cvs, subversion, git, hg and bazaar
● Git, hg and bazaar are newer and based on an improved paradigm called Distributed Version Control System (DVCS)
![Page 3: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/3.jpg)
How will hg be useful for us?
● Keep versions of the scripts we create
  ● also of the datasets, results, etc.
● Have a common and official version of the pipeline and the scripts, on bitbucket.org
● Everybody works on their own computer, on their own copy of the scripts; every once in a while, they merge it with the official version
![Page 4: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/4.jpg)
Installing hg
● Hg can run on any operating system
● On Linux, install it through your software center
  ● sudo apt-get install mercurial
● On other operating systems, go to http://mercurial.selenic.com/ and download the installer
![Page 5: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/5.jpg)
Initial hg configuration
● Hg stores its configuration in a file called:
  ● ~/.hgrc on Unix
  ● C:\Documents and Settings\your_name\.hgrc on Windows
● Open it and write your username:
[ui]
username = Giovanni Dall'Olio <[email protected]>
![Page 6: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/6.jpg)
![Page 7: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/7.jpg)
The basic operations of a VCS
● Creating a repository
  ● Equivalent to saying 'start keeping track of the versions of the files in this project'
● Adding files to the repository
  ● Files are not tracked unless you say so
● Committing changes
  ● Saving a version of the current state of the files
● Pushing the changes and merging them with the standard version
![Page 8: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/8.jpg)
Creating a repository
● Create a new directory and create the repo with:
  ● hg init
![Page 9: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/9.jpg)
Effect of creating a new repo
● A hidden directory (.hg) will be created
● From now on, it will be possible to run other hg commands in this directory
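Creating a repository and checking its effect can be sketched as follows. The directory name my_project is only an example, and a temporary location is used so that the sketch does not touch real data.

```shell
# Hypothetical project directory inside a scratch location
workdir=$(mktemp -d)
mkdir "$workdir/my_project"
cd "$workdir/my_project"

# Create the repository: start keeping track of this directory
hg init

# The listing now includes the hidden .hg directory
ls -a

# Other hg commands now work here; status prints nothing because there are no files yet
hg status
```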
![Page 10: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/10.jpg)
Adding files to the repo
● By default, no files are added to the repository
  ● It means that if you create a new file in the directory, hg will ignore it
![Page 11: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/11.jpg)
Creating a file
![Page 12: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/12.jpg)
Files are not added automatically to the repo
● The command:
  ● hg log file.txt
● should return the history of changes to the file file.txt. Since it is not in the repo yet, nothing is shown
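A small sketch of this situation, run in a scratch repository (the temporary directory and the file content are placeholders):

```shell
# Scratch repository, so nothing real is touched
repo=$(mktemp -d)
cd "$repo"
hg init

# Create a new file; hg does not track it yet
echo "hello" > file.txt

# Untracked files are marked with '?'
hg status

# No history is shown, because file.txt is not in the repo yet
hg log file.txt
```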
![Page 13: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/13.jpg)
hg add
● To add a file to the repository, use hg add
● This tells hg to record all the changes made to that file
![Page 14: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/14.jpg)
Committing changes
● The most important operation in VCS is the commit
● This operation saves the state of the tracked files and associates it with a version
● One commit → one version
![Page 15: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/15.jpg)
Committing a change
● We have added the file file.txt to the repo
● This is a change compared to the previous version (where this file was not present)
● So we have to record it with a commit
![Page 16: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/16.jpg)
Our first commit
![Page 17: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/17.jpg)
Effects of adding a file and committing
● From now on, all the changes made to the file will be tracked
![Page 18: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/18.jpg)
What is being 'committed'?
● Every time you commit a new version, hg stores the set of changes since the previous version
● Some older VCS stored a copy of all the files for each version
  ● => very large disk usage
● By storing only the changes, hg occupies less space and makes it easier to compare versions
![Page 19: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/19.jpg)
Hg diff
● The hg diff command will show the differences between the current file and its last committed version
![Page 20: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/20.jpg)
Hg log
● Hg log will show the history of the changes in the repository
![Page 21: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/21.jpg)
Hg log
![Page 22: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/22.jpg)
The story continues..
● The basic operations in a VCS are adding files to be tracked and committing changes
● Next week we will see how to keep a copy of our repository on a remote server, and how to collaborate with other people
● Now I will show you some examples of using a version control system
![Page 23: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/23.jpg)
Example: backup
● Imagine that, by mistake, you remove a file or a directory from your project
● With a VCS, you can revert to the previous version and get the files back
![Page 24: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/24.jpg)
Example: tracking code
● VCS have been developed to track changes in the code
  ● Return to the point where you made a mistake or a typo
  ● Implement a parallel version of the code, for example to try a different library or approach (branching)
  ● Remember what you were doing when you have to change code written months ago
![Page 25: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/25.jpg)
Example: releasing software
● Mr. Werewolf publishes a program to predict when the moon will be full
● The code gets adopted by the werewolf community. Papers get published using it
● At a certain point, another werewolf discovers a bug in the code. It will be possible to find the version where the error was introduced and identify all the versions affected
![Page 26: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/26.jpg)
Example: tracking data
● Version control can be applied to a dataset
● Example: Mr Dracula wants to write a paper on the quality of the blood in his neighborhood. Every time he gets new data, he commits a change
![Page 27: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/27.jpg)
Tracking everything else
● VCS can be applied to many kinds of file
● Usually they do not handle binary files well (no meaningful diffs or merges)
● OpenOffice documents can be tracked (they are XML)
![Page 28: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/28.jpg)
Tracking huge files
● Hg stores the differences between two versions
● Storing all the 1000g data will take:
  ● Some gigabytes to store a compressed version of the files
  ● Less space to store the following commits (but these commits will take time)
● Maybe it is not worth putting gigabytes of data under version control
  ● No solution to date
  ● Some hg extensions for big files
![Page 29: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/29.jpg)
How frequently should I commit?
● Everybody has his/her own philosophy
  ● Some people prefer to commit every small change
  ● Others prefer to make only one big commit every day
● As a general rule:
  ● The bigger the commit, the harder it is to integrate if there are conflicts
  ● It's up to you to decide
![Page 30: Hg version control bioinformaticians](https://reader034.vdocuments.site/reader034/viewer/2022051323/547d3eee5906b552378b4601/html5/thumbnails/30.jpg)
How to write the perfect commit messages
● One or two sentences
● Avoid generic messages
  ● “new changes”, “fixed bugs”
● Use tags like 'Fix', 'Add', 'Config', etc.:
  ● “Fix: error when reading file”
  ● “Add: new function for plotting results”
● Cite the files changed if you think it may be useful:
  ● Implemented new sorting algorithm for sorting.py