-
This work is licensed under a Creative Commons Attribution 3.0 United States License.
The Fourth Paradigm: Data-Intensive Scientific Discovery
Tony Hey, Corporate Vice President
Microsoft External Research
-
Tony Hey: An Introduction
Commander of the British Empire
-
The Fourth Paradigm
-
Data collection: sensor networks, satellite surveys, high-throughput laboratory instruments, observation devices, supercomputers, LHC
Data processing, analysis, visualization: legacy codes, workflows, data mining, indexing, searching, graphics
Archiving: digital repositories, libraries, preservation
SensorMap. Functionality: map navigation. Data: sensor-generated temperature, video camera feed, traffic feeds, etc.
Scientific visualizations (NSF Cyberinfrastructure report, March 2007)
A Digital Data Deluge in Research
-
1. A thousand years ago: Experimental Science. Description of natural phenomena
2. Last few hundred years: Theoretical Science. Newton's Laws, Maxwell's Equations
3. Last few decades: Computational Science. Simulation of complex phenomena
4. Today: Data-Intensive Science. Scientists overwhelmed with data sets from many different sources: data captured by instruments, data generated by simulations, data generated by sensor networks
eScience is the set of tools and technologies to support data federation and collaboration: for analysis and data mining, for data visualization and exploration, for scholarly communication and dissemination
Emergence of a Fourth Research Paradigm
With thanks to Jim Gray
Astronomy has been one of the first disciplines to embrace data-intensive science with the Virtual Observatory (VO), enabling highly efficient access to data and analysis tools at a centralized site. The image shows the Pleiades star cluster from the Digitized Sky Survey combined with an image of the moon, synthesized within the WorldWide Telescope service.
Science must move from data to information to knowledge
-
"The impact of Jim Gray's thinking is continuing to get people to think in a new way about how data and software are redefining what it means to do science."
Bill Gates, Chairman, Microsoft Corporation
"One of the greatest challenges for 21st-century science is how we respond to this new era of data-intensive science. This is recognized as a new paradigm beyond experimental and theoretical research and computer simulations of natural phenomena, one that requires new tools, techniques, and ways of working."
Douglas Kell, University of Manchester
"The contributing authors in this volume have done an extraordinary job of helping to refine an understanding of this new paradigm from a variety of disciplinary perspectives."
Gordon Bell, Microsoft Research
http://research.microsoft.com/fourthparadigm/
-
Listed 7 key areas for action by funding agencies:
1. Fund both development and support of software tools
2. Invest at all levels of the funding pyramid
3. Fund development of generic Laboratory Information Management Systems
4. Fund research into scientific data management, data analysis, data visualization, new algorithms and tools
-
Remaining three key areas for action relate to the future of scholarly communication and libraries:
5. Establish digital libraries that support the other sciences like the NLM does for medicine
6. Fund development of new authoring tools and publication models
7. Explore development of digital data libraries that contain scientific data (not just the metadata) and support integration with published literature
-
Developing a Sustainable e-Infrastructure
-
Accelerating time to insight with Advanced Research Tools and Services
Our goal is to accelerate research by collaborating with academic communities to use advanced computer science research technologies
Aim to help scientists spend less time on IT issues and more time on science by creating open tools and services based on Microsoft platforms and productivity software
-
Data Acquisition and Modeling
Life Under Your Feet
Researchers at The Johns Hopkins University are deploying large arrays of wireless soil sensors in a variety of environmental settings, including a park, an urban forest and a wetland. The networks enable scientists to monitor ecological changes on an unprecedented scale and offer insights into hydrology, greenhouse gases and the activity of organisms in the soil.
The Swiss Experiment: Powerful Software Improves Environmental Forecasting
Environmental scientists face many challenges in monitoring and understanding our planet's changing climate. Through an international collaboration called the Swiss Experiment, environmental scientists and computer science experts are deploying advanced sensor networks and data management tools to improve environmental monitoring and forecasting.
-
Collaboration and Visualization
SciScope Speeds Data Retrieval from Multiple Repositories
For environmental scientists and engineers, finding and retrieving relevant data can be a daunting and tedious task. Microsoft Research is developing an online search engine called SciScope that enables researchers to search multiple data repositories simultaneously and retrieve information in a consistent format.
Research Information Centre
Collaboration and information sharing among researchers are among the most important but challenging aspects of scientific research. In recent years, scientists have begun using virtual research environments to exchange information with colleagues in specific areas of study. Microsoft Research and The British Library are teaming up to build the Research Information Centre.
-
Analysis and Data Mining
PhyloD
Statistical tool used to analyze DNA of HIV from large studies of infected patients
Typical job: 10-20 CPU hours, with extreme jobs requiring 1K-2K CPU hours
Very CPU efficient
Requires a large number of test runs for a given job (1-10M tests)
Highly compressed data per job (~100 KB per job)
Trident: A Scientific Workflow Workbench Brings Clarity to Data
Scientists at the University of Washington are working with Microsoft External Research to demonstrate how marrying visualization and workflow technologies can allow researchers to better manage, evaluate and interact with even the most complex scientific data sets.
-
Disseminate and Share
Chem4Word: Chemistry Drawing in Word
Created in collaboration with University of Cambridge; Peter Murray-Rust, et al.
Relationships: Navigate and link referenced chemistry
Data: Semantics stored in Chemical Markup Language (CML)
Intent: Recognizes chemical dictionary and ontology terms
Author/edit 1D and 2D chemistry. Change chemical layout styles.
Intelligence: Verifies validity of authored chemistry
-
Disseminate and Share
Ontology Plug-In for Word
Phil Bourne, Lynn Fink
Relationships: Ontology browser
Intent: Term recognition & disambiguation
John Wilbanks
Services: Ontology download web service
-
Archiving and Preservation
A semantic computing platform to store and expose relationships between digital assets
Flexible data model enables many scenarios and can be easily extended over time
Native support for RSS, OAI-PMH, OAI-ORE, AtomPub and SWORD
Default web UI with CSS support and custom ASP.NET controls
Zentity
-
Archiving and Preservation
Diagram labels: molecules, text, experiments, measurements, documents, data, scientists
Mashup (reuse) data
Semantic storage
Compound document authoring
oreChem: the Chemical Semantic Web
-
Network analysis is of growing importance in academic, commercial, and Internet social media contexts
Existing social network tools are challenging for many novice users
Tools like Excel are widely used
Leveraging a spreadsheet as a host for social network analysis lowers barriers to network data analysis and display
Leverage spreadsheet for storage of edge and vertex data
Apply dynamic filters to the data
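The idea of keeping edge and vertex data in tabular rows and filtering them can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the edge rows, the weight column, and the helper names `filter_edges` and `vertex_degrees` are hypothetical and are not taken from the tool described on the slide.

```python
from collections import Counter

# Hypothetical edge table: one row per edge (source, target, weight),
# laid out the way it might sit in a worksheet.
edges = [
    ("alice", "bob", 3),
    ("alice", "carol", 1),
    ("bob", "carol", 5),
    ("carol", "dave", 2),
]

def filter_edges(rows, min_weight):
    """Keep only the edges whose weight is at least min_weight."""
    return [row for row in rows if row[2] >= min_weight]

def vertex_degrees(rows):
    """Derive a vertex table (vertex -> degree) from the edge rows."""
    degrees = Counter()
    for source, target, _weight in rows:
        degrees[source] += 1
        degrees[target] += 1
    return dict(degrees)

strong = filter_edges(edges, min_weight=2)
degrees = vertex_degrees(strong)
print(strong)
print(degrees)
```

Changing `min_weight` plays the role of a dynamic filter: the vertex table is recomputed from whichever edge rows survive.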
-
Intent: Insert Creative Commons licenses from within Office 2007
Relationships: License information stored as RDF/XML within the document OOXML
http://ccaddin2007.codeplex.com
Services: Integrates with Creative Commons Web API to create new licenses
-
The Future Research e-Infrastructure: Client + Cloud
-
Statistical tool used to analyze DNA of HIV from large studies of infected patients
PhyloD was developed by Microsoft Research and has been highly impactful
Small but important group of researchers: 100s of HIV and HepC researchers actively use it
1000s of research communities rely on these results
Typical job: 10-20 CPU hours, with extreme jobs requiring 1K-2K CPU hours
Very CPU efficient
Requires a large number of test runs for a given job (1-10M tests)
Highly compressed data per job (~100 KB per job)
PhyloD now ported as a Windows Azure Cloud Service
Cloud enables agile deployment of scalable scientific services
Cover of PLoS Biology, November 2008
Courtesy of Roger Barga
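To make the job sizes above concrete, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that the tests in a job are independent and divide evenly across workers with no overhead; the slide does not state this, and `split_tests` and `wall_clock_hours` are hypothetical helpers, not part of PhyloD.

```python
import math

def split_tests(total_tests, workers):
    """Split total_tests into contiguous (start, stop) ranges, one per worker."""
    size = math.ceil(total_tests / workers)
    return [(start, min(start + size, total_tests))
            for start in range(0, total_tests, size)]

def wall_clock_hours(cpu_hours, workers):
    """Ideal wall-clock time if the work divides evenly with no overhead."""
    return cpu_hours / workers

# 1M tests (the low end of the range on the slide) across 8 assumed workers.
ranges = split_tests(1_000_000, 8)
print(len(ranges), ranges[0], ranges[-1])

# Upper bounds from the slide: 20 CPU hours (typical), 2000 CPU hours (extreme).
typical = wall_clock_hours(20, 8)
extreme = wall_clock_hours(2000, 100)
print(typical, extreme)
```

Under these idealized assumptions, a typical job finishes in a few hours on a handful of workers, while an extreme job needs on the order of a hundred workers to finish in under a day.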
-
Azure MODIS Service
Diagram labels: Scientist, Source Metadata, Scientific Results, Web Role Portal, Request Queue, Source Imagery Download Sites, Reprojection Queue, Reduction Queue, Data Collection Stage, Reprojection Stage, Analysis/Reduction Stage
Science pipeline for download, initial processing and reduction of satellite imagery. Developed by MSR, UVa, UCB.
Dramatically lowers resource and complexity barriers to use satellite imagery for terrestrial hydrology and geoscience.
Common imagery location determination and upload from diverse sources
Common reprojection and harmonization to produce science-ready imagery with the same length, time and quality attributes
Optional scientist-provided reduction algorithm (.NET, Java, or MATLAB)
On-demand scalability beyond local desktop or cluster
In use now to compute 10-year continental-scale water balance for North America. Per year:
500 GB (~60K files) upload of 9 different source imagery products from 15 different locations
400 GB reprojected harmonized imagery consuming ~3500 CPU hours
5 GB reduced science result leveraging reported field data aggregates consuming ~60 CPU hours
Additional science requests pending:
Expanding above to Europe
Additional source imagery products and formats
Catharine van Ingen (MSR), Jie Li, Marty Humphreys (UVA), Youngryel Ryu (UCB), Deb Agarwal (BWC/LBL)
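The queue-and-stage layout named in the diagram (request queue, reprojection queue, reduction queue) can be sketched as a small in-process pipeline. This is a minimal illustration, not the service's actual code: it uses Python's standard `queue.Queue` in place of cloud queues, and the tile names, pixel values, and stage functions are all hypothetical stand-ins.

```python
from queue import Queue

def collect(request):
    """Data collection stage: stand-in for downloading source imagery."""
    return {"tile": request, "pixels": [1.0, 2.0, 3.0, 4.0]}

def reproject(item):
    """Reprojection stage: stand-in that rescales pixels to a common grid."""
    return {"tile": item["tile"], "pixels": [p * 0.5 for p in item["pixels"]]}

def reduce_tile(item):
    """Analysis/reduction stage: collapse a tile to a single mean value."""
    pixels = item["pixels"]
    return {"tile": item["tile"], "mean": sum(pixels) / len(pixels)}

def drain(source, stage, sink):
    """Move every item from one queue through a stage into the next queue."""
    while not source.empty():
        sink.put(stage(source.get()))

request_queue, reprojection_queue, reduction_queue, results = (
    Queue() for _ in range(4)
)

# Hypothetical tile identifiers standing in for a scientist's request.
for tile in ["h08v05", "h09v05"]:
    request_queue.put(tile)

drain(request_queue, collect, reprojection_queue)
drain(reprojection_queue, reproject, reduction_queue)
drain(reduction_queue, reduce_tile, results)

summary = []
while not results.empty():
    summary.append(results.get())
print(summary)
```

Because each stage only reads from one queue and writes to the next, any stage could in principle be scaled out independently, which is the property the slide attributes to the cloud deployment.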
-
Led by Newcastle University, UK (Paul Watson), project supported by ER
Investigating applicability of commercial clouds for scientific research
Build a working prototype for use cases in chemoinformatics
Uses Microsoft technologies to build science-related services (Windows Azure, Silverlight)
Built initial proof of concept
Silverlight UI for basic Quantitative Structure-Activity Relationship (QSAR) modeling
Demonstrated ability to scale QSAR computations in Windows Azure
-
A knowledge ecosystem:
A richer authoring experience
An ecosystem of services
Semantic storage
Open, Collaborative, Interoperable, and Automatic
Data/information is interconnected through machine-interpretable information (e.g. paper X is about star Y)
Social networks are a special case of data meshes
Attribution: Chris Bizer
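A statement such as "paper X is about star Y" is commonly modeled as a subject-predicate-object triple. The sketch below shows that idea with plain Python tuples; the identifiers, predicate names, and the `match` helper are hypothetical and chosen only for illustration, not drawn from any system named in the slides.

```python
# Each statement is a (subject, predicate, object) triple.
triples = [
    ("paper:X", "isAbout", "star:Y"),
    ("paper:X", "hasAuthor", "person:A"),
    ("paper:Z", "isAbout", "star:Y"),
    ("person:A", "collaboratesWith", "person:B"),
]

def match(store, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in store
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# Which papers are about star Y?
papers_about_y = [s for (s, _p, _o) in match(triples, predicate="isAbout", obj="star:Y")]
print(papers_about_y)

# A social link is just another triple in the same store.
social_links = match(triples, predicate="collaboratesWith")
print(social_links)
```

The last query illustrates the slide's point that social networks are a special case: a relationship between people is stored and queried exactly like a relationship between a paper and a star.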
-
Vision of Future Research e-Infrastructure using Client + Cloud resources
Diagram labels: scholarly communications; domain-specific services; instant messaging; identity; document store; blogs & social networking; mail; notification; search; books; citations; visualization and analysis services; storage/data services; compute services; virtualization; project management; reference management; knowledge management; knowledge discovery
-
The site contains access and downloads of relevant open tools and resources for the worldwide academic research community. Examples of our open tools and services:
Plug-ins for Office: Ontology Add-in for Word, Article Authoring Add-in for Word, Chem4Word (Chemistry Drawing in Word)
Microsoft Biology Foundation (MBF): Enables and accelerates fundamental advances in biology
F#: Collaboration with the academic and research community on F#'s typed functional and object-oriented programming on the .NET platform
Software Engineering Tools: Spec# (program verifier for C# extended with design by contract), VCC (program verifier for Concurrent C), PEX (automatic unit testing tool for .NET), CHESS (unit testing tools for concurrent Win32 executables and .NET)
-
Microsoft Research: http://research.microsoft.com
Microsoft Research downloads: http://research.microsoft.com/research/downloads
Microsoft External Research: http://research.microsoft.com/externalresearch
Science at Microsoft: http://www.microsoft.com/science
CodePlex: http://www.codeplex.com
The Faculty Connection: http://www.microsoft.com/education/facultyconnection
MSDN Academic Alliance: http://msdn.microsoft.com/enus/academic