Linux command line basics II: downloading data and controlling files
Yanbin Yin
Things you should know about programming
- Learning programming requires hands-on practice, a lot of practice.
- Hearing me describe a command or a program helps, but you will not be able to do it yourself unless you type in the code and run it to see what happens.
- Reading others' code helps, but it is often harder than writing code yourself from scratch.
- Although painful and frustrating, troubleshooting is normal and part of the learning experience (ask experienced people or Google).
- To avoid errors, you have to follow rules; most errors in programming occur because of not knowing the rules or forgetting them.
- Use comments in case you forget what your code means.
- write -> run -> errors -> edit -> errors -> ... -> run -> success
- Good news: finished scripts can be reused or edited for later use.
Homework #7
1. Create a folder under your home called hw7
2. Change directory to hw7
3. Go to the NCBI ftp site, navigate to the genome > bacteria > E. coli MG1655 folder, and download the ptt file and the faa file in there
4. Create a copy of the ptt file: if the original file is called A.ptt, name the copied file A.ptt.bak. Do the same thing for the faa file.
5. Read chapter 5 of http://edu.isb-sib.ch/pluginfile.php/2878/mod_resource/content/3/couselab-html/content.html and finish all the quizzes in there
6. Use what you learned in chapter 5 to count how many protein sequences are in the faa file of step 4.
Write a report (in Word or PPT) that includes all the operations/commands and screenshots.
Due on Nov 10 (send by email). Office hours: Tue, Thu and Fri 2-4 pm, MO325A, or email: [email protected]
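The homework steps that do not need the network can be sketched as a short shell script. This is only an illustration under stated assumptions: BASE stands in for your home folder (a throwaway temp folder here), A.ptt and A.faa are the placeholder names from step 4, and a tiny mock A.faa replaces the real NCBI download of step 3. Counting lines that start with '>' is one common way to count FASTA records; chapter 5 may teach a different command.

```shell
#!/usr/bin/env bash
# Sketch of homework steps 1, 2, 4 and 6 (step 3, the NCBI download, is skipped).
# Assumption: BASE stands in for your home folder; in the real homework use "$HOME".
BASE=$(mktemp -d)

mkdir "$BASE/hw7"              # step 1: create the hw7 folder
cd "$BASE/hw7" || exit 1       # step 2: change directory to hw7

# Hypothetical stand-ins for the downloaded files (A.* are placeholder names)
printf '>prot1\nMKV\n>prot2\nMAL\nLLK\n>prot3\nMTT\n' > A.faa
printf 'placeholder ptt content\n' > A.ptt

cp A.ptt A.ptt.bak             # step 4: back up the ptt file
cp A.faa A.faa.bak             #         and the faa file

# step 6 (one possible approach): every FASTA record starts with a '>' header line
nseq=$(grep -c '^>' A.faa)
echo "$nseq protein sequences"
```

On the real faa file the same grep line works unchanged; only the file name differs.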
What we learned last class:
file system, relative/absolute paths, working folder, home folder
ssh, pwd, ls, cd, mkdir, rmdir, rm, man, cp, mv
If things go wrong, try:
Ctrl+c (sometimes multiple times)
q to exit from a man page
http://korflab.ucdavis.edu/Unix_and_Perl/unix_and_perl_v3.1.1.pdf
View files: more, less, head, tail
How to use the Tab key to autocomplete:
less /home/ then hit Tab twice: you will see all files/folders under /home/
less /home/yyin/ then hit Tab twice: you will see…
less /home/yyin/U then hit Tab once: Unix_and_Perl_course will be autocompleted
less /home/yyin/Unix_and_Perl_course/ and keep doing this until you get
less /home/yyin/Unix_and_Perl_course/Data/Arabidopsis/At_proteins.fasta
See the next page for screenshots
Keys inside less:
q: quit viewing
↑ or ↓: move up or down a line
space: next page
/>: search for the text '>'
b or PgUp: back a page
f or PgDn: forward a page
n: find the next occurrence of the searched text (e.g. 'abc')
?abc: search backward, i.e. find the previous occurrence of 'abc'
G: go to the end
(Screenshots: how to use the Tab key to autocomplete)
more /home/yyin/Unix_and_Perl_course/Data/Arabidopsis/At_genes.gff
more is similar to less, but can do less than less
head /home/yyin/Unix_and_Perl_course/Data/Arabidopsis/chr1.fasta
head -20 /home/yyin/Unix_and_Perl_course/Data/Arabidopsis/chr1.fasta
head dumps the top few lines (10 by default) to the screen
tail /home/yyin/Unix_and_Perl_course/Data/Arabidopsis/intron_IME_data.fasta
tail -20 /home/yyin/Unix_and_Perl_course/Data/Arabidopsis/intron_IME_data.fasta
tail dumps the last few lines (10 by default) to the screen
more, less, head and tail do not load the whole file content into memory. You can't edit the file content with them either; they are just viewers.
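A quick way to see what head and tail print, without the course data files, is to generate a numbered file with seq. The file numbers.txt below is a hypothetical stand-in for chr1.fasta or intron_IME_data.fasta:

```shell
#!/usr/bin/env bash
# numbers.txt is a hypothetical 100-line file (lines "1" to "100")
tmp=$(mktemp -d)
seq 1 100 > "$tmp/numbers.txt"

head "$tmp/numbers.txt"          # prints lines 1-10 (10 lines by default)
head -20 "$tmp/numbers.txt"      # prints lines 1-20
tail -20 "$tmp/numbers.txt"      # prints lines 81-100

# Capture a few results so they can be checked
first=$(head -1 "$tmp/numbers.txt")            # "1"
last=$(tail -1 "$tmp/numbers.txt")             # "100"
n_default=$(head "$tmp/numbers.txt" | wc -l)   # 10
```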
Create or edit files
Text editors: nano, pico, vi
Suppose you are in your home folder:
Write the top part of the At_genes.gff file to a new file called head:
head -20 /home/yyin/Unix_and_Perl_course/Data/Arabidopsis/At_genes.gff > head
Try nano (intuitive user interface):
nano head
Try vi (command-driven interface, but much more powerful):
vi head
Create a file from scratch using vi:
1) type vi filename and hit Enter
2) once you are in vi, type i to get into edit mode, then type or copy & paste the content
3) hit Esc to exit edit mode, then type :x to save the file and exit vi.
Input and output redirection: the greater-than sign
Unix has a special way to direct input and output from commands or programs.
By default, the input comes from the keyboard (called standard input, stdin): you type in a command and the shell takes the command and executes it.
By default, the standard output goes to the terminal screen (stdout);
if the command or program fails, you will also get error messages dumped to the terminal screen (standard error, stderr).
However, if you do not want the output dumped to the screen, you can use ">" to redirect (write) the output into a file. For example, try:
ls /home/yyin
ls /home/yyin > list
ls /home/yyim
ls /home/yyim 2> err
(/home/yyim does not exist, so ls reports an error.) Use "2>" to write the error message into a file. No space between 2 and >!
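The same redirections can be tried on a throwaway folder. In this sketch mydir and nosuchdir are hypothetical names; nosuchdir deliberately does not exist, so ls writes its complaint to stderr, which 2> captures:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir mydir                       # hypothetical folder with two files
touch mydir/a.txt mydir/b.txt

ls mydir                          # stdout: printed on the screen
ls mydir > list                   # stdout written into the file "list"; nothing printed
ls nosuchdir 2> err || echo "ls failed; its error message is in err"

cat list                          # a.txt and b.txt, one per line
cat err                           # the message ls would otherwise print on screen
```

Note that > overwrites "list" if it already exists, which is also why rerunning the head -20 ... > head command replaces the old head file.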
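vi basics
vi has two modes, command mode and edit mode: type i to go from command mode to edit mode, and hit Esc to go back to command mode.
The following commands operate in command mode (hit Esc before using them):
x: delete one character at the cursor position
u: undo
dd: delete the current line
G: go to the end of the file
1G: go to the beginning of the file
10G: go to line 10
$: go to the end of the line
0: go to the beginning of the line
:q!: exit without saving
:w: save (but do not exit)
:wq or :x: save and exit
Arrow keys: move the cursor around (in both modes)
http://cbsu.tc.cornell.edu/ww/1/Default.aspx?wid=36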
Search and substitution in vi
In command mode, you can do a number of fancy things. The most useful are:
- Search: hit slash ("/") to move the cursor to the bottom-left corner; type any word or letter to search for it; type n to go to the next instance.
- Replace: hit Esc (at any time, hitting Esc to get back to the default state is the safest thing to do), then type ":1,$s/+/pos/g" and hit Enter to replace all "+" with "pos".
Try this in vi head:
:1,$s/+/pos/g
: ready to type in a command
1,$ from the first line to the last
s substitution
+ (the first field): the text to be replaced
pos (the second field): the text to replace it with
g all instances in a line (not only the first one)
Hit Esc to exit edit mode and then :q! to exit vi WITHOUT saving the file.
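vi applies :1,$s/+/pos/g interactively; sed can apply the same substitution command non-interactively to a whole file. A minimal sketch, assuming a hypothetical two-line GFF-like file named head with the strand (+ or -) in column 7:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
# Hypothetical GFF-like lines; column 7 holds the strand (+ or -)
printf 'chr1\tdemo\tgene\t100\t900\t.\t+\t.\tID=g1\nchr1\tdemo\tgene\t1200\t2000\t.\t-\t.\tID=g2\n' > "$tmp/head"

# Same command as in vi: lines 1 to $ (last), substitute "+" with "pos", g = every match in a line.
# In sed's basic regular expressions (as in vi), a bare + is a literal plus sign.
sed '1,$s/+/pos/g' "$tmp/head" > "$tmp/head.pos"
cat "$tmp/head.pos"
```

Unlike vi, sed leaves the input file untouched and writes the edited text to stdout, which is redirected into head.pos here.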
Wildcards and regular expressions
Regular expressions (regex or regexp) are a very powerful tool for text processing, widely used in text editors (e.g. vi), shell commands (sed, awk, grep) and programming languages (Perl, Python, PHP) to automatically edit texts (match and replace strings).
Finding and replacing exact words or characters is simple, e.g. the vi example shown above.
However, if you want to match multiple words or characters, you will need wildcards or patterns.
A list of commonly used wildcards and patterns:
* the previous item zero or more times, e.g. " *" matches a run of spaces and ".*" matches anything (as a shell wildcard in file names, * on its own matches any string of characters)
. any single character (letter, digit, or special character)
^ (caret) start of a line
$ end of a line
^$ an empty line, i.e. nothing between ^ and $
[] create your own pattern, e.g. [ATGC] matches one of the four letters only, [ATGC]{2} matches two such letters; [0-9] matches any digit
\w any word character (letters a-z and A-Z, digits, underscore)
\d any digit (0-9)
+ the previous item one or more times, e.g. \w+ matches words of any size
{n} (curly brackets) the previous item exactly n times, e.g. \w{5} matches five word characters in a row
\s a whitespace character (space, tab)
\t a tab
\n a newline
http://www.bsd.org/regexintro.html
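Several of these patterns can be tried with grep -E (extended regular expressions) on a small hypothetical FASTA file. Note that grep -E understands ^, $, [], + and {n}, but not the Perl-style \d, so [0-9] is used instead:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
f="$tmp/demo.fasta"
# Hypothetical FASTA text: two records separated by an empty line
printf '>seq1 gene A\nATGCGT\n\n>seq2 gene B\nTTGA\n' > "$f"

n_headers=$(grep -c '^>' "$f")             # ^ : lines starting with '>'         -> 2
n_empty=$(grep -c '^$' "$f")               # ^$ : empty lines                     -> 1
n_dna=$(grep -cE '^[ATGC]+$' "$f")         # [ATGC]+ : only A/T/G/C, any length   -> 2
n_six=$(grep -cE '^[ATGC]{6}$' "$f")       # {6} : exactly six such letters       -> 1
ids=$(grep -oE 'seq[0-9]+' "$f" | tr '\n' ' ')   # [0-9]+ : one or more digits   -> "seq1 seq2 "
echo "$n_headers headers, $n_empty empty line(s), $n_dna DNA lines, $n_six of length 6, ids: $ids"
```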
Use regex inside vi
This overwrites the head file:
head -20 /home/yyin/Unix_and_Perl_course/Data/Arabidopsis/At_proteins.fasta > head
vi head
Inside vi, try :1,$s/ *//g
Hit u to undo
What about :1,$s/ .*//g ?
Hit Esc to exit edit mode and then :x to save the file and exit vi.
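To see the difference between ' *' and ' .*' without opening vi, the same substitutions can be run with sed on a hypothetical one-record protein FASTA file (the header text below is made up):

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
# Hypothetical protein FASTA record; the header text is made up
printf '>prot1 | hypothetical protein description\nMKVLAAGIV\n' > "$tmp/head"

# ' *' : a space repeated zero or more times, so every space is deleted
sed '1,$s/ *//g' "$tmp/head" > "$tmp/nospace"
# ' .*' : a space followed by anything, so each line is cut at its first space
sed '1,$s/ .*//g' "$tmp/head" > "$tmp/idonly"

no_spaces=$(head -1 "$tmp/nospace")   # ">prot1|hypotheticalproteindescription"
id_only=$(head -1 "$tmp/idonly")      # ">prot1"
echo "$no_spaces"
echo "$id_only"
```

The second form is a handy way to trim FASTA headers down to just the sequence ID; the sequence lines contain no spaces, so both commands leave them unchanged.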
Get data from a remote ftp/http website
ftp, lftp, sftp, ncftp
lftp commands:
lftp addr: connect to a remote ftp server
cd dir: change to the directory dir
cd ..: change to the upper folder (..)
ls: list files and folders in the current directory at once
ls dir: list files and folders in dir at once
ls | less: list page by page (good if the list is too long)
get file: get a file
mirror dir: get a folder
zless file: view the file content
by or bye: exit lftp
NCBI ftp site:
Connect to the NCBI ftp site: lftp ftp.ncbi.nih.gov
The prompt will change to: lftp ftp.ncbi.nih.gov:/>
After '>' you can type in a command and hit Enter: lftp ftp.ncbi.nih.gov:/> ls
The ftp site can also be accessed through a web browser
ls command: list files and folders
Where are the bacterial genomes on the ftp site?
The end of the page after ls
cd ne
Then press the Tab key to auto-complete or list the matches
How to transfer files between a Linux and a Windows machine? Use SSH Secure File Transfer Client
Open the software and hit Enter
Put in the IP address (10.157.217.4), put in your username, and hit Connect
Choose Yes
Put in your password and hit OK
To transfer from local to remote: locate your file and drag it to the right
To transfer from remote to local: locate your file and drag it to the left
Transfer files between two Linux machines (or a Mac and a Linux machine)
scp: securely copy files/folders between hosts on a network
You are at a Linux or Mac machine, e.g. your laptop with Ubuntu installed, and you want to copy some file from ser
Open a terminal on your machine
scp [email protected]:/home/yyin/Unix_and_Perl_course/Data/Arabidopsis/At_genes.gff .
scp username@IP:/path .
(The final . means the current folder on your machine.)
You will be asked for your password on ser
wget is a program for downloading files from both FTP and HTTP sites.
wget is non-interactive: you simply enter the necessary options and arguments on the command line and the file is downloaded for you.
You must identify the links first: browse an http web page or an ftp site, locate the remote files/folders you want to download, then go to the terminal and type:
wget -q ftp.ncbi.nih.gov/blast/db/FASTA/yeast.aa.gz
wget -r -q ftp://ftp.ncbi.nih.gov/genomes/archive/old_refseq/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid57779
wget -q ftp.ncbi.nih.gov/blast/executables/LATEST/ncbi-blast-2.2.27+-x64-linux.tar.gz
wget ftp://emboss.open-bio.org/pub/EMBOSS/emboss-latest.tar.gz
wget options:
-q quiet
-r recursive (for folders)
Downloading takes time; put & at the end of the command line to send the job to the background.
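Since a real download needs network access, this sketch uses sleep as a hypothetical stand-in for a slow wget command, to show what & does: the shell starts the job in the background and gives you the prompt back immediately. $! holds the background job's process ID, and wait (mainly useful inside scripts) blocks until the job finishes:

```shell
#!/usr/bin/env bash
# 'sleep 1' is a hypothetical stand-in for a slow download such as
#   wget -q ftp.ncbi.nih.gov/blast/db/FASTA/yeast.aa.gz &
sleep 1 &
pid=$!                      # process ID of the job just sent to the background
echo "job $pid is running in the background; the prompt is free again"

wait "$pid"                 # in a script, wait until the background job finishes
status=$?                   # its exit status (0 = success)
echo "job $pid finished with status $status"
```

At an interactive prompt you would normally skip wait and just keep working while the download runs.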
Archive and compress files/folders
To save disk space, we can compress large files if we do not intend to use them for a while. A lot of files downloaded from the web are compressed and need to be uncompressed before any processing can take place.
Common compressed formats:
• gzip (gz)
gzip my_file (compresses the file my_file, replacing it with its compressed version, my_file.gz)
gzip -d my_file.gz (decompresses my_file.gz, restoring its original version, my_file)
• bzip2
bzip2 my_file (compresses the file my_file, producing its compressed version, my_file.bz2)
bunzip2 my_file.bz2 (decompresses my_file.bz2, producing its original version, my_file)
zless to view gzipped files (e.g. my_file.gz) without decompressing them
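A gzip round trip on a hypothetical file shows that gzip replaces the original with a smaller .gz file and gzip -d brings it back. bzip2/bunzip2 follow the same pattern; only gzip is used here because it is available almost everywhere:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1
seq 1 10000 > my_file                    # hypothetical file to compress
orig_size=$(wc -c < my_file)

gzip my_file                             # my_file is replaced by my_file.gz
gz_size=$(wc -c < my_file.gz)
gzip -l my_file.gz                       # compressed vs. uncompressed size
echo "original: $orig_size bytes, gzipped: $gz_size bytes"

gzip -d my_file.gz                       # my_file.gz is replaced by my_file again
```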
Common compressed formats (continued):
• zip
zip my_files.zip my_file1 my_file2 my_file3 (create a compressed archive called my_files.zip containing three files: my_file1, my_file2, my_file3)
zip -r my_files.zip my_file1 my_dir (if my_dir is a directory, create an archive my_files.zip containing the file my_file1 and the directory my_dir with all its content)
unzip -l my_files.zip (list the contents of the zip archive my_files.zip)
unzip my_files.zip (decompress the archive into its constituent files and directories)
• tar
tar -cvf my_files.tar my_file1 my_file2 my_dir (create an archive called my_files.tar containing the files my_file1, my_file2 and the directory my_dir with all its content; a plain .tar archive is not compressed)
tar -tvf my_files.tar (list the contents of the tar archive my_files.tar)
tar -xvf my_files.tar (extract the archive into its constituent files and directories)
Use man tar to learn more
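The three tar commands can be tried end to end on hypothetical files: create an archive, list it, then extract it into a fresh folder to check that everything comes back:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1
# Hypothetical inputs: two files and a folder
echo one > my_file1
echo two > my_file2
mkdir my_dir
echo three > my_dir/inside.txt

tar -cvf my_files.tar my_file1 my_file2 my_dir   # c = create, v = verbose, f = archive file name
tar -tvf my_files.tar                            # t = list the contents

mkdir unpacked
cd unpacked || exit 1
tar -xvf ../my_files.tar                         # x = extract into the current folder
```

Extracting into an empty folder is a simple way to avoid overwriting files that already exist with the same names.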
Common compressed formats (continued):
• tgz (also tar.gz, essentially a combination of "tar" and "gzip")
tar -czvf my_files.tgz my_file1 my_file2 my_dir (create a compressed archive called my_files.tgz containing the files my_file1, my_file2 and the directory my_dir with all its content)
tar -tzvf my_files.tgz (list the contents of the archive my_files.tgz)
tar -xzvf my_files.tgz (decompress the archive into its constituent files and directories)
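Adding z to the same tar letters gives a gzip-compressed archive. This sketch packs a hypothetical folder, lists the archive, extracts it into a separate folder with tar's -C option, and compares the copy with cmp:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir my_dir
seq 1 5000 > my_dir/data.txt                 # hypothetical content

tar -czvf my_files.tgz my_dir                # z = filter the archive through gzip
listing=$(tar -tzf my_files.tgz)             # list the contents without extracting
echo "$listing"

mkdir restore
tar -xzf my_files.tgz -C restore             # -C = extract into the folder "restore"
cmp my_dir/data.txt restore/my_dir/data.txt && echo "round trip OK"
```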
Wget the book materials of Unix and Perl Primer for Biologists: http://korflab.ucdavis.edu/Unix_and_Perl/
mkdir book
cd book
wget http://korflab.ucdavis.edu/Unix_and_Perl/current.zip
unzip current.zip
Unpack the emboss package (assuming emboss-latest.tar.gz was downloaded into your home folder):
cd
mkdir tools
cd tools
mv ../emboss-latest.tar.gz .
tar -zxf emboss-latest.tar.gz &
Check disk usage
Disk space is a limited resource, and you want to monitor frequently how much disk space you have used. To check the disk space usage of a folder, use the du (disk usage) command (-h: human-readable sizes, -s: a single summary total):
yyin@ser:~$ du -hs .
318M .
yyin@ser:~$ du -hs Unix_and_Perl_course/
131M Unix_and_Perl_course/
To check how much space is left on the entire storage file system, use the df command
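Both commands can be tried on a throwaway folder. In this sketch the folder holds 1 MiB of random data (random so that a compressing file system cannot shrink it), and du -sk is used to get a plain number that a script can compare:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
# Hypothetical folder holding 1 MiB of random (incompressible) data
head -c 1048576 /dev/urandom > "$tmp/random.bin"

du -hs "$tmp"                       # h = human-readable units, s = one summary line
df -h "$tmp"                        # size, used and available space of the file system holding $tmp

kb=$(du -sk "$tmp" | cut -f1)       # the same total in kilobytes, easier to use in scripts
echo "folder uses about $kb KB"
```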
- Save the history of your commands: history > hist1, then less hist1
- Send a message to other online users: write username (Ctrl+c to exit)
- Change your password: passwd
Ctrl+c tells the shell to stop the current process
Ctrl+z suspends the current process; bg then sends it to the background
Ctrl+d exits the terminal (logs out)