Last updated: May, 2017
BCHM 6280 2017 Excel Tutorial Page 1 of 5
Tutorial 1: Using Excel to find unique values in a list
It is not uncommon to have a list of data that contains redundant values. Genes with multiple transcript isoforms are one example. If you are only interested in the genes and not the different transcripts, then you will probably want to filter the list to remove the redundant values. I did a search of the UCSC human genome browser with the query “colon cancer” and got back >500 matches. I created a text file listing the first 500 matches. You can download this data from the Exercise 1 homepage by clicking on the link List of Genes from UCSC.txt. The file has 2 columns: Gene Name and Chromosome Location. You will filter on Gene Name. Once you’ve downloaded the text file, do the following:
• Open Excel and from within Excel open the text document. If the file you want to open is greyed out, change the drop-down menu to Enable: All Readable Documents.
• Double-click the file you want to open and this should bring up the Text Import Wizard.
• It should recognize it as delimited. Click the Next button to define the delimiters.
• By default, Excel assumes a .txt file is tab-delimited.
• Click Next and then Finish to finish the import.
Advanced filter:
• Select the column of gene names.
• Click on the Data menu and select Advanced filter (if you get a warning about being unable to determine which row contains column labels and you have a column header in row 1, just click OK).
• Check the radio button “Copy to another location”. This should move your cursor to the “Copy to” text box.
• Select a column (not Columns A-C).
• Check the box “Unique records only”.
• Click the OK button.
This should produce a list of 208 genes from the original 500 genes.
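For readers who want to check the result outside Excel, the same "unique records only" idea can be sketched in Python. This is a minimal sketch, not part of the original exercise: the function names are hypothetical, the header text "Gene Name" is assumed to match the file's first row, and the sample rows are made up for illustration. It compares strings exactly, so Excel's own matching rules (for example around letter case) may give a slightly different count.

```python
import csv
import io


def unique_in_order(values):
    """Return the values with duplicates dropped, keeping first occurrences."""
    seen = set()
    unique = []
    for value in values:
        if value not in seen:
            seen.add(value)
            unique.append(value)
    return unique


def unique_gene_names(tab_text, column="Gene Name"):
    """Read tab-delimited text with a header row and de-duplicate one column.

    The default column header is an assumption about the downloaded file.
    """
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    return unique_in_order(row[column] for row in reader)


# Made-up sample rows; the location values are placeholders, not real data.
sample = (
    "Gene Name\tChromosome Location\n"
    "APC\tlocA\n"
    "APC\tlocA\n"
    "MLH1\tlocB\n"
    "APC\tlocA\n"
    "MSH2\tlocC\n"
)

print(unique_gene_names(sample))
```

To run it on the downloaded file, read the file's contents into a string (for example with `open(path, newline="").read()`) and pass that string to `unique_gene_names`.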
Tutorial 2: Using Excel to manage text data
An issue common to gene names or gene identifiers is slight variations that can prevent their identification via a database lookup. An example is that as gene or transcript records are reviewed by curators, they are often given an appended number such as NM_0012345.1 or NM_0012345.3 indicating which version they are. The base identifier of NM_0012345 is the same between them, but if your list has the appended version number, the database lookup or Excel lookup won’t recognize the two as being the same record.
In this example, there are two Excel files available from the Exercise 2 homepage: ExpressionData.xlsx and GeneInfo.xlsx. The ExpressionData file has two columns. The first has Ensembl Gene IDs with the version number. The second column contains gene expression information in the form of Log2 ratio of treatment/control. The GeneInfo file has four columns. The first has Ensembl Gene IDs, but as the stable identifier rather than as a version. The remaining columns have the gene symbol, NCBI Gene ID and gene description.
You want to be able to bring information from the GeneInfo file into the ExpressionData file, but at the moment they do not share the same identifiers. To correct this, you will use a text-related function called LEFT to change the Gene IDs in the ExpressionData file to match those in the GeneInfo file.
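The effect of LEFT on a versioned identifier can be sketched in Python. This is only an illustration: `left` and `strip_version` are hypothetical helper names, and the Ensembl-style ID is built from a made-up number. The fixed width of 15 works because an Ensembl gene ID is "ENSG" followed by 11 digits; splitting on the "." is shown as an alternative that does not depend on the ID length.

```python
def left(text, num_chars):
    """Mimic Excel's LEFT: return the first num_chars characters of text."""
    return text[:num_chars]


def strip_version(identifier):
    """Drop everything from the first '.' onward (no-op if there is no '.')."""
    base, _, _ = identifier.partition(".")
    return base


# Made-up Ensembl-style gene ID: "ENSG" + 11 digits + ".version".
versioned = "ENSG" + "1".zfill(11) + ".5"

print(left(versioned, 15))       # fixed-width approach, as in the tutorial
print(strip_version(versioned))  # delimiter-based approach

# The RefSeq-style example from the text is shorter than 15 characters,
# so a fixed width of 15 would leave its version suffix in place.
print(left("NM_0012345.1", 15))
print(strip_version("NM_0012345.1"))
```

The fixed-width approach matches what the tutorial asks for in Excel; the delimiter-based one is simply a variant to consider when a list mixes identifier families of different lengths.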
1. Insert a column to the left of the Gene ID column in the ExpressionData file.
2. In cell A2, type = and select the LEFT function.
3. Select cell B2 for the text box in the Formula Builder dialog box.
4. Tab to the num_chars box and type in 15.
5. This should return the ENSG## identifier up to, but not including, the “.” and the version number.
6. Select the newly generated ID in A2, then extend the selection down to the end of the column. Type Ctrl-D to copy the function down the rest of the column.
7. Then Edit->Copy the newly generated IDs and use Edit->Paste->Special->Values to replace the formula with values.
8. Now you can use the two files in the next section to bring the data from GeneInfo into the ExpressionData file.

Tutorial 3: Using Excel to compare lists of data
A very common problem in bioinformatics or information processing of any kind is having multiple lists of data that you want to compare to each other. Excel has a function called VLOOKUP that makes this easy to do. It is also useful for transferring data from one worksheet to another. For this part of the tutorial, you will use the GeneInfo file and your modified ExpressionData file from the previous section. You can delete the column from the ExpressionData file that had the Gene IDs with version numbers in them. In this part of the tutorial, you will bring the Gene Name and NCBI Gene ID into the ExpressionData file.
Open both worksheets in Excel.
o In the ExpressionData file, insert a column between columns 1 and 2.
o In the second row of column 2 (cell B2), type an “=” sign. Then go to the drop-down menu in the upper left of the worksheet, find the function “VLOOKUP” and select it. If you do not see VLOOKUP on the main menu, scroll down to “more functions”, which opens a dialog box with all of the available Excel functions. Under “lookup and reference” you will find VLOOKUP.
o Once you’ve inserted the function, you must fill out the arguments for the function using the dialog box that opens up. Select cell A2 as the lookup value.
o Then click into the box “Table_array”. Go up to the Window menu and select GeneInfor_ExcelTutorial.xlsx as shown in Figure 2.
o This will activate GeneInfo.xlsx.
Figure 1: Inserting a VLOOKUP function into column 2 of ExpressionData worksheet.
Figure 2: Selecting the second worksheet as the table_array in the VLOOKUP function.
o Select the first 2 columns of GeneInfo.xlsx.
o Tab or click on the box “Col_index_num.” This tells the function which column of data to bring over to the first worksheet. Type in a 2.
o In the final box, “Range_lookup,” type “false”. If the value in A2 of the ExpressionData worksheet is found in the first column of the selected GeneInfo range, then the value from column 2 of the matching row will be entered into cell B2 of ExpressionData. If no match is found, it will fill in “#N/A”.
o To fill in the rest of the column, select from cell B2 through the end of the data and under the Edit menu, select Fill Down or use the keyboard shortcut “Ctrl+D”.
Figure 3: Filling in the rest of the column with the same function.
When you are done, your ExpressionData worksheet should look like that shown in Figure 4:
Figure 4: GeneExpression worksheet after completing VLOOKUP
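The exact-match lookup configured in the steps above can be sketched in Python to make the roles of the three arguments concrete. This is a minimal sketch under stated assumptions: the `vlookup` helper is hypothetical, all identifiers and values in the two small tables are made-up placeholders, and the comparison is an exact string match (Excel's own matching rules, for example around letter case, may differ).

```python
NA = "#N/A"  # what Excel displays when an exact-match lookup finds nothing


def vlookup(lookup_value, table_array, col_index_num):
    """Sketch of VLOOKUP with Range_lookup set to FALSE.

    Searches the first column of table_array for lookup_value and returns
    the entry from column col_index_num (1-based) of the first matching row.
    """
    for row in table_array:
        if row[0] == lookup_value:
            return row[col_index_num - 1]
    return NA


# Placeholder stand-in for GeneInfo: ID, symbol, NCBI Gene ID, description.
gene_info = [
    ("ENSG_A", "SYMBOL_A", "1001", "description A"),
    ("ENSG_B", "SYMBOL_B", "1002", "description B"),
]

# Placeholder stand-in for ExpressionData: ID (version removed), Log2 ratio.
expression_data = [
    ("ENSG_B", 1.5),
    ("ENSG_X", -0.7),  # not present in gene_info, so lookups give #N/A
]

# Bring over column 2 (symbol) and column 3 (NCBI Gene ID) for every row.
annotated = [
    (gene_id, vlookup(gene_id, gene_info, 2), vlookup(gene_id, gene_info, 3), ratio)
    for gene_id, ratio in expression_data
]

for row in annotated:
    print(row)
```

Note that the lookup scans the whole first column of the table rather than comparing row 2 with row 2, which is why the two lists do not need to be in the same order.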
At this point, the data in column 2 is still linked to the GeneInfo worksheet. You can see this if you click on one of the gene names and look at what is displayed in the text box at the top of the sheet. You do not want to leave your file like that, otherwise every time you open it, Excel will go through the data lookup function again. To avoid this, select the entire column, copy it and then do an Edit->Paste Special and select “values” in the “Paste special” dialog box. This will replace the function with the value of the function. After you complete that, click on a gene name. You should see just the gene name displayed in the text box at the top.
To bring in the NCBI gene ID, just insert another column in the ExpressionData worksheet and repeat the VLOOKUP process, bringing in column 3 data from GeneInfo rather than column 2. For this, select the first 3 columns of GeneInfo as the table array so that column 3 falls inside the selected range.
Figure 5: GeneExpression worksheet after copying and paste special with values