data submission services of embl australia bioinformatics resource (embl-abr) · 2019-04-30 ·...
Activity Report - June 2018
Data submission services of EMBL Australia Bioinformatics Resource (EMBL-ABR)
Overview
Since January 2016, the QFAB@QCIF team has provided data submission services on behalf of EMBL-ABR. These services cover the guidance and support provided to help Bioplatforms Australia (BPA) and Australian researchers with the process of curating, formatting and managing research data for transfer to existing international data repositories, where it will be publicly accessible for reuse.
EMBL-ABR’s Data Submission Service
The QFAB team from the EMBL-ABR: QCIF Node uses a range of scripts and standard operating procedures to support the submission of Australian sequence-based data to the EBI European Nucleotide Archive (ENA) and the NCBI Sequence Read Archive (SRA). The service includes:
• management of ENA and SRA data submission accounts accessible by researchers
• automation of upload processes, saving researchers' time
• optimisation of data transfer processes to ensure data integrity and reduce transfer failure
• provision of staging infrastructure to facilitate submissions from the researcher’s perspective
• verification and collation of required metadata prior to submission
• submission of researcher data to ENA and SRA
• submission of selected Bioplatforms Australia data
• maintaining boutique data submission tools such as Tox|Note (for venom-gland transcriptome data submission and toxin card creation on ArachnoServer) and the system developed for the Beatson Group and collaborators
• a helpdesk for support.
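As an illustration of the metadata verification step listed above, the following is a minimal sketch only, not the team's actual scripts. The required column names (`sample_alias`, `scientific_name`, `tax_id`, `collection_date`), the tab-separated layout and the sample rows are assumptions made for the example; real checklists differ by repository and sample type.

```python
import csv
import io

# Hypothetical required columns; real checklists vary by repository and sample type.
REQUIRED_FIELDS = ("sample_alias", "scientific_name", "tax_id", "collection_date")


def find_metadata_problems(rows, required=REQUIRED_FIELDS):
    """Return (row_number, field) pairs for every missing or blank required value."""
    problems = []
    for row_number, row in enumerate(rows, start=1):
        for field in required:
            value = row.get(field)
            if value is None or not value.strip():
                problems.append((row_number, field))
    return problems


# Illustrative sample sheet (made-up rows); the second row lacks a tax_id.
sample_sheet = io.StringIO(
    "sample_alias\tscientific_name\ttax_id\tcollection_date\n"
    "S1\tVitis vinifera\t29760\t2016-03-01\n"
    "S2\tVitis vinifera\t\t2016-03-02\n"
)
rows = list(csv.DictReader(sample_sheet, delimiter="\t"))
problems = find_metadata_problems(rows)
print(problems)
```

A check of this kind would let incomplete rows be returned to the researcher before any upload is attempted.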
Testimonials – 2018
EMBL-ABR sequence submission service was an immense help for the submission of data for our manuscript. Given the number of samples that we needed to submit, having some help on getting it up to EBI saved us valuable time. Given the shared access to NeCTAR computing resources we were able to transfer data between institutes easily as well. Both myself and other members of the University of Adelaide Bioinformatics Hub have already recommended students and PIs to make use of the service when submitting data for publication.
Jimmy Breen, Robinson Research Institute, Core Facility Leader (Bioinformatics) at University of Adelaide
Dear Dominique and team, Thank you SO much for your work on this to date – it has been brilliant.
Rebecca Johnson, Australian Museum Research Institute (AMRI)
I had a very good experience with using the EMBL-ABR sequence submission service. Nick and Gareth made it very easy to collate and submit our datasets to the ENA database ahead of our publication in Genetics. I would highly recommend them to any researcher that deals with archiving and submitting large datasets.
David Schlipalius, School of Biological Sciences, The University of Queensland
Recent Representative Publications
Global DNA Methylation Patterns Can Play a Role in Defining Terroir in Grapevine (Vitis vinifera cv. Shiraz). H Xie, M Konate, N Sai, KG Tesfamicael, T Cavagnaro… - Frontiers in Plant Science, 2017
Variant linkage analysis using de novo transcriptome sequencing identifies a conserved phosphine resistance gene in insects. Schlipalius, David I., Tuck, Andrew G., Jagadeesan, Rajeswaran, Nguyen, Tam, Kaur, Ramandeep, Subramanian, Sabtharishi, Barrero, Roberto, Nayak, Manoj and Ebert, Paul R. (2018). Genetics 209(1) 281-290.
ArachnoServer 3.0: an online resource for automated discovery, analysis and annotation of spider toxins. Pineda SS, Chaumeil PA, Kunert A, Kaas Q, Thang MWC, Le L, Nuhn M, Herzig V, Saez NJ, Cristofori-Armstrong B, Anangi R, Senff S, Gorse D, King GF. Bioinformatics. 2018 Mar 15;34(6):1074-1076
Submission statistics
QFAB@QCIF team
Various members of the QFAB team are involved in the provision of the data submission services:
Nick Rhodes
• Contact person for all users and stakeholders, including ENA
• Process design and improvement
• Management of user accounts
• Quality control of metadata prior to submission
• Submission of data
• Technical support
Mike Thang and Thom Cuddihy
• Manipulation of BAM files using SAMtools
• Process implementation
• Submission of data
Jeff Christiansen
• Broadening of the range of supported submissions
• Identification of metadata requirements
• Investigation of metadata management systems
Development and improvement of processes for data submission
The QFAB@QCIF team has improved the efficiency and ease of use of the data submission process by
• Automating some manual steps
• Deploying and maintaining a dedicated VM on QRIScloud for data submission
o Linux accounts for users to upload data
o User “handholding”, as required
o Volume storage allocated as required
o NFS access to the BPA collections on QRIScloud RDS storage
o Aspera Secure Copy client
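One manual step that lends itself to automation on a staging VM is checksum bookkeeping, which supports the data-integrity aim of the service. The sketch below is an assumption about how such a step could look, not a description of the scripts in use: it records an MD5 digest per staged file and reports any file whose digest later differs. The function names and file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def md5sum(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each regular file name in a directory to its MD5 digest."""
    return {
        p.name: md5sum(p)
        for p in sorted(Path(directory).iterdir())
        if p.is_file()
    }


def changed_files(before, after):
    """List files whose digest differs from, or is missing in, the later manifest."""
    return sorted(name for name in before if after.get(name) != before[name])


# Demonstration with temporary, made-up files standing in for staged data.
with tempfile.TemporaryDirectory() as staging:
    (Path(staging) / "reads_1.fastq").write_bytes(b"@r1\nACGT\n+\nIIII\n")
    (Path(staging) / "reads_2.fastq").write_bytes(b"@r1\nTGCA\n+\nIIII\n")
    before = build_manifest(staging)
    # Simulate a truncated transfer of the second file.
    (Path(staging) / "reads_2.fastq").write_bytes(b"@r1\nTG")
    after = build_manifest(staging)
    damaged = changed_files(before, after)

print(damaged)
```

Comparing a manifest taken before transfer with one taken afterwards gives a simple way to flag files that need to be resent.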
Supporting BPA with the submission of data to ENA
The QFAB@QCIF team is supporting submissions of data for the following BPA projects:
• BASE project
• Great Barrier Reef project
• Marine Microbes project (not yet started)
Supporting Australian researchers with the submission of data to ENA and SRA
The QFAB@QCIF team is supporting data submission activities for the Australian community:
• Bacteria genomes – Scott Beatson, UQ
• Spider and other toxins – Glenn King, UQ
• Tasmanian Devil – Belinda Wright, University of Sydney
• Sponge – Degnan Lab, UQ
• Porphyromonas gingivalis – Helen Mitchell, UoM
• Streptococcus pneumoniae – Bioinformatics Hub, University of Adelaide
• Indian Myna genome assembly assessment (Australian Museum Research Institute, AMRI)
• MSGBS samples from Barossa grapes – Bioinformatics Hub, University of Adelaide
• MSGBS samples, salt-induced alterations of DNA methylation in barley – Stephen Pederson, Bioinformatics Hub, University of Adelaide
We have recently adopted a more proactive approach to promoting the service, including high-profile links on the EMBL-ABR webpage. We anticipate that more researchers across Australia will be interested in the data submission services as visibility increases.
Maintaining developed boutique data submission tools
Tox|Note
In 2014/2015 Tox|Note, a toxin analysis workflow, was developed in collaboration with Glenn King’s Group (UQ), EMBL-ABR (formerly BRAEMBL) and QFAB Bioinformatics to significantly fast track the analysis of venom-gland transcriptomes generated by large Next Generation (NG) sequencing projects and allow an easy and simple submission of the findings via EMBL-ABR as data broker to ENA/UniProt.
For this purpose, EMBL-ABR and QFAB Bioinformatics worked closely together to integrate a data submission module into Tox|Note, allowing researchers to submit their sequences with the required metadata, obtain accession numbers and automatically create toxin cards on ArachnoServer, a global and public repository for spider toxin and structure research available at http://www.arachnoserver.org.
SRA upload workflow
An SRA upload workflow was created for submissions of bacterial genomes to the SRA from the Beatson Group. The tool integrates authentication, proxy handling, messaging protocols (Slack) and recursive file handling, built on the Linux-standard vsftpd (Very Secure File Transfer Protocol Daemon).
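Two of the components named above, recursive file handling and Slack messaging, can be sketched as follows. This is an illustrative outline only and does not reproduce the Beatson Group tool: the file suffixes, the function names and the message wording are hypothetical, and the payload shape assumes a Slack incoming webhook, which accepts a JSON body with a `text` field. Nothing is sent over the network here.

```python
import json
import tempfile
from pathlib import Path


def collect_files(root, suffixes=(".fastq.gz", ".bam")):
    """Walk a directory tree and return matching files as sorted relative paths."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.name.endswith(suffixes)
    )


def slack_payload(files, project):
    """Build the JSON body for a notification (assumed incoming-webhook format)."""
    text = f"{project}: {len(files)} file(s) queued for upload"
    return json.dumps({"text": text})


# Demonstration on a temporary tree with made-up file names.
with tempfile.TemporaryDirectory() as top:
    base = Path(top)
    (base / "run1").mkdir()
    (base / "run2" / "sub").mkdir(parents=True)
    (base / "run1" / "a.fastq.gz").write_bytes(b"")
    (base / "run1" / "notes.txt").write_bytes(b"")
    (base / "run2" / "sub" / "b.bam").write_bytes(b"")
    files = collect_files(base)

payload = slack_payload(files, "demo")
print(files)
print(payload)
```

Keeping file discovery and message construction as separate functions means each can be checked without a live FTP server or Slack workspace.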
The QFAB team has continued to support these boutique data submission services:
• Maintenance of the Tox|Note workflow
• Submission of newly identified toxins by Tox|Note to ENA and UniProt
• Submission of bacterial genomes from the Beatson Group and maintenance of the bespoke upload tool deployed for this purpose
Submission request tracking spreadsheet
Data submission requests are tracked and shared with the EMBL-ABR Hub through a Google spreadsheet available at: https://docs.google.com/spreadsheets/d/1WtGL7IQf-a09kEVH79yqvC09HnTGT4KuC74_hZEiF3w/edit?usp=sharing
Each submission request is different: some requests are for one sample only, whilst others could be for hundreds or even thousands of samples. As such, the amount of support required for each request in the tracking spreadsheet varies vastly. We believe that our interactive, personal approach to client requirements is fundamental to the service's appeal to users. Experienced wet-lab scientists may lack the time or skills to negotiate the submission process. Indeed, the delays observed between the dates of sequencing runs and when we are first approached indicate that there is an accumulated backlog of submissions.
Nick Rhodes & Dominique Gorse
QFAB Bioinformatics, QCIF
BIOINFORMATICS | BIOSTATISTICS | BIODATA