TRANSCRIPT
Copyright © 2016 KNIME.com AG
KNIME for the life sciences Cambridge Meetup
Greg Landrum, Ph.D.
KNIME.com AG
12 July 2016

• What is KNIME?
• A bit of motivation: tool blending, data blending, documentation, automation, reproducibility
• More about the company and the community
• Some highlights from recent releases

The KNIME® Analytics Platform

Visual KNIME Workflows
• NODES perform tasks on data
• Nodes are combined to create WORKFLOWS
Node diagram labels: Inputs, Outputs, Status
Node status values: Not Configured, Idle, Executed, Error

Over 1000 native and embedded nodes included:
• Data Access: MySQL, Oracle, ...; SAS, SPSS, ...; Excel, Flat, ...; Hive, Impala, ...; XML, JSON, PMML; Text, Doc, Image, ...; Web Crawlers; Industry Specific; Community / 3rd
• Transformation: Row, Column, Matrix; Text, Image; Time Series; Java; Python; Community / 3rd
• Analysis & Mining: Statistics; Data Mining; Machine Learning; Web Analytics; Text Mining; Network Analysis; Social Media Analysis; R, Weka, Python; Community / 3rd
• Visualization: R; JFreeChart; JavaScript; Community / 3rd
• Deployment: via BIRT; PMML; XML, JSON; Databases; Excel, Flat, etc.; Text, Doc, Image; Industry Specific; Community / 3rd

Why is this important?
Real world data analysis:
• Lots of heterogeneous data from multiple sources (data blending)
• Complex questions to ask of the data
• Need to apply multiple tools (tool blending)

The problem is that we don't have one of these for working with our data
It's amazing how simple you can make complex things if you control the entire process
If all of your problems look like this: then this is the perfect tool:

We tend to need a broader assortment of tools for our data
https://www.flickr.com/photos/mtneer_man/5247813293
If we're lucky they are this well organized…

…but this is a lot more common
https://www.flickr.com/photos/tilde-lifestyle-photography/6906117081/

Why is this important?
Real world data analysis:
• Lots of heterogeneous data from multiple sources (data blending)
• Complex questions to ask of the data
• Need to apply multiple tools (tool blending)
• Things that would be great:
– We didn't have to spend half our time converting file formats
– We could figure out later what we did, repeat it, and share that with others
This is where KNIME comes in

KNIME: the company

KNIME
• KNIME.com AG founded in 2008
• Offices in Zurich (HQ), Konstanz, Berlin, and San Francisco
• 20+ employees
• Maintainer of the Open Source KNIME Analytics Platform
– comprehensive data loading, processing, analysis, modeling platform
– visual frontend
– open: to all sorts of data, other tools (R and Python, among others), various user personas
– 20 open source releases since 2006
– open source.
• KNIME Commercial Extensions for Collaboration, Productivity, Performance
– 14 commercial product releases since 2008

Broad Range of KNIME Application Areas & Customers
• Advanced Analytics
• Pharma
• Health Care
• Finance
• Retail
• Customer Intelligence
• Manufacturing

Happy users!
Source: http://r4stats.com/2016/06/06/rexer-data-science-survey-satisfaction-results/

KNIME Analytics Platform: Try it Now!
1. Download from www.knime.com
2. Browse the KNIME Learning Hub at www.knime.com/learning-hub
3. Download your free copy of the “KNIME Beginner’s Guide” from: www.knime.com/knimepress (use code: KNIME_Boston2016)
4. Visit us here or at our Forum: www.knime.com/forum

The KNIME Ecosystem

KNIME Software
KNIME commercial extensions to the platform for collaboration, productivity, performance

KNIME Server

KNIME Big Data Extensions (commercial license required!)
• KNIME Big Data Connectors
– Package required drivers/libraries for specific HDFS, Hive, Impala access
– Hive (Big Data Extension)
– Cloudera Impala (Big Data Extension)
– Extends the open source database integration
• KNIME Spark Executor
– Package required drivers to submit Spark jobs
– Wraps Spark DB manipulations and MLlib modules

Big Data Connectors
• Same mode of operation as the standard KNIME database connectors
• Operations are performed within the database
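A minimal Python sketch of what "operations are performed within the database" means in practice. This is not KNIME connector code: the standard-library sqlite3 module stands in for Hive or Impala, and the table and column names are invented for illustration. The aggregation is expressed as SQL and runs inside the database, so only the small summary travels back to the client.

```python
import sqlite3

# In-memory SQLite stands in for a remote Hive/Impala database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assays (compound TEXT, activity REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO assays VALUES (?, ?)",
    [("A", 1.0), ("A", 3.0), ("B", 10.0), ("B", 20.0), ("B", 30.0)],
)

# The GROUP BY is evaluated by the database engine; the client never sees
# the individual rows, only one summary row per compound.
query = """
    SELECT compound, COUNT(*) AS n, AVG(activity) AS mean_activity
    FROM assays
    GROUP BY compound
    ORDER BY compound
"""
summary = conn.execute(query).fetchall()
conn.close()

print(summary)
```

The alternative, fetching every row and aggregating on the client, gives the same answer but moves the whole table over the wire, which is what in-database execution avoids.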

KNIME Spark Executor
• Based on Spark MLlib
• Scalable machine learning library
• Runs on Hadoop
• Algorithms for
– Classification (decision tree, naïve Bayes, …)
– Regression (logistic regression, linear regression, …)
– Clustering (k-means)
– Collaborative filtering (ALS)
– Dimensionality reduction (SVD, PCA)

Familiar Usage Model
• Usage model and dialogs similar to existing nodes
• No coding required

The KNIME community

Openness and the community
• Very active user community (check the forums)
• >250 people at the 2016 KNIME Summit in Berlin
• The KNIME Analytics Platform is both open source and an open platform.
• Technology partners: provide and support nodes for their (usually commercial) software. Some examples: Schrödinger, ChemAxon/InfoCom, CCG, Cresset
• We encourage the community to produce nodes (or sets of nodes) and share them with each other.
• “Trusted Community Extensions” for community contributions that meet a certain quality level.

Some of the community contributions:
This is the subset more relevant to drug discovery

Highlights of recent additions in KNIME 3.1 and 3.2
Complete lists:
https://tech.knime.org/whats-new-in-knime-31
https://tech.knime.org/whats-new-in-knime-32

Streaming
• Default Execution
• Streaming Execution
– Row-wise
– Process, pass & forget → Faster with less I/O overhead
– Concurrent execution

Streaming: Pros and Cons
Advantages
• Less I/O overhead (process, pass & forget)
• Parallelization
Disadvantages
• No intermediate results, no interactive execution
• Not all nodes can be streamed
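The trade-off can be sketched in plain Python, with generators standing in for streamable nodes. This is an analogy, not KNIME's implementation; the three "nodes" and their names are made up, and the sketch shows only the row-wise, pass-and-forget aspect, not concurrent execution.

```python
from typing import Iterable, Iterator, List


def parse(rows: Iterable[str]) -> Iterator[int]:
    """Node 1: convert each text row to an integer."""
    for row in rows:
        yield int(row)


def keep_even(values: Iterable[int]) -> Iterator[int]:
    """Node 2: row filter."""
    for value in values:
        if value % 2 == 0:
            yield value


def square(values: Iterable[int]) -> Iterator[int]:
    """Node 3: per-row transformation."""
    for value in values:
        yield value * value


def run_default(rows: List[str]) -> List[int]:
    """Default execution: every node finishes and stores its whole output table."""
    parsed = list(parse(rows))        # intermediate table, can be inspected
    evens = list(keep_even(parsed))   # another intermediate table
    return list(square(evens))


def run_streaming(rows: Iterable[str]) -> Iterator[int]:
    """Streaming: each row flows through all nodes; nothing is stored in between."""
    return square(keep_even(parse(rows)))


rows = ["1", "2", "3", "4"]
default_result = run_default(rows)
streamed_result = list(run_streaming(rows))
print(default_result, streamed_result)
```

Both runs produce the same final table, but only the default run leaves `parsed` and `evens` behind to look at, which mirrors the "no intermediate results" disadvantage listed above.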

Trees / Forest / Ensembles
• Random forest node (simplification of the tree-ensemble node)
• Support of binary splits for nominal attributes
• Missing value handling
• Support of byte vector data (high-dimension count fingerprints)
• Code optimization
– Runtime
– Memory

Trees and Tree Ensembles: New nodes
• Gradient Boosting
– Also based on tree ensembles
– Boosting: Improving an existing model by adding a new model
– Shallow trees
• Random Forest Distance
– Distance measure induced by a random forest
– Based on proximity
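To make "improving an existing model by adding a new model" concrete, here is a small standard-library sketch of gradient boosting for squared-error regression on one input variable. It is an illustration of the general technique, not the KNIME node's code; the data, the learning rate, and the use of depth-1 trees (stumps) as the "shallow trees" are all assumptions.

```python
from typing import Callable, List

Model = Callable[[float], float]


def fit_stump(xs: List[float], residuals: List[float]) -> Model:
    """Fit a depth-1 regression tree (a single split) by least squares."""
    best = None  # (squared error, threshold, left mean, right mean)
    for threshold in sorted(set(xs))[1:]:  # skipping the minimum keeps both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, split, left_mean, right_mean = best
    return lambda x: left_mean if x < split else right_mean


def boost(xs: List[float], ys: List[float], n_rounds: int, learning_rate: float = 0.5) -> Model:
    """Start from the mean, then repeatedly add a stump fitted to the current residuals."""
    base = sum(ys) / len(ys)
    stumps: List[Model] = []
    predictions = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for x, p in zip(xs, predictions)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model: Model, xs: List[float], ys: List[float]) -> float:
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up training data with several plateaus that one stump cannot capture.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0.0, 0.0, 1.0, 1.0, 4.0, 4.0, 9.0, 9.0]

mse_one = mse(boost(xs, ys, n_rounds=1), xs, ys)
mse_many = mse(boost(xs, ys, n_rounds=50), xs, ys)
print(mse_one, mse_many)
```

Each round fits only what the current ensemble still gets wrong, so the training error after many rounds is lower than after one, even though every individual tree is very shallow.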

Feature Selection
• Automated help for narrowing down the best set of features for a model
• Supports forward and backward selection
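A minimal sketch of forward selection, assuming a scoring function that rates a feature subset (in practice this would be something like cross-validated model accuracy). The feature names and the toy score below are hypothetical and exist only to make the example runnable. Backward selection is the mirror image: start with all features and greedily remove the one whose removal helps most.

```python
from typing import Callable, FrozenSet, List, Sequence


def forward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
) -> List[str]:
    """Greedily add the feature that improves the score most; stop when none helps."""
    selected: List[str] = []
    remaining = list(features)
    best_score = score(frozenset())
    while remaining:
        candidates = [(score(frozenset(selected + [f])), f) for f in remaining]
        candidate_score, candidate = max(candidates)
        if candidate_score <= best_score:
            break  # no single addition improves the model
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected


def toy_score(subset: FrozenSet[str]) -> float:
    """Hypothetical stand-in for model quality: two useful features, a small cost per feature."""
    gains = {"logP": 0.30, "mol_weight": 0.20, "num_rings": 0.0, "noise": 0.0}
    return sum(gains[f] for f in subset) - 0.01 * len(subset)


selected = forward_selection(["logP", "mol_weight", "num_rings", "noise"], toy_score)
print(selected)
```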

Deeplearning4j – KNIME Integration
• Easy network architecture design
• Modular
– Layer-wise design of networks
• Model Import/Export
– Caffe Import
• Beginner friendly
– Import pretrained networks
• Highly configurable
• Supports word2vec and doc2vec

Active Learning
• Labs Extension
• Involve the user in constructing the training data set
• Workflow loop to query and label ‘interesting’ data points
• Use the user-labeled data set on the remaining data
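The loop can be sketched in a few lines of Python. This is a generic uncertainty-sampling illustration, not the KNIME Labs nodes: the one-dimensional threshold "model", the integer data pool, and the `oracle` function that plays the role of the user are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple


def fit_threshold(labeled: Dict[int, int]) -> float:
    """Toy model: boundary halfway between the largest 0-labeled and smallest 1-labeled point."""
    zeros = [x for x, y in labeled.items() if y == 0]
    ones = [x for x, y in labeled.items() if y == 1]
    return (max(zeros) + min(ones)) / 2


def active_learning(
    pool: List[int],
    oracle: Callable[[int], int],
    seed_labels: Dict[int, int],
    n_queries: int,
) -> Tuple[float, Dict[int, int]]:
    labeled = dict(seed_labels)
    unlabeled = [x for x in pool if x not in labeled]
    for _ in range(n_queries):
        if not unlabeled:
            break
        threshold = fit_threshold(labeled)
        # The 'interesting' point is the one the current model is least sure about:
        # the unlabeled point closest to the decision boundary.
        query = min(unlabeled, key=lambda x: abs(x - threshold))
        labeled[query] = oracle(query)  # ask the user for a label
        unlabeled.remove(query)
    # Apply the model trained on user-labeled points to the remaining data.
    threshold = fit_threshold(labeled)
    predictions = {x: int(x >= threshold) for x in unlabeled}
    return threshold, predictions


def oracle(x: int) -> int:
    """Stands in for the user; the 'true' boundary at 13 is made up."""
    return int(x >= 13)


pool = list(range(21))
threshold, predictions = active_learning(pool, oracle, {0: 0, 20: 1}, n_queries=4)
print(threshold, predictions)
```

With only four questions to the user (plus two seed labels), the remaining points in this toy pool are all classified the way the user would have labeled them, which is the point of querying the most informative examples first.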

R Integration
• Rewrite of infrastructure
– Significantly faster
– Concurrent execution
• No change of usage model

MongoDB and JSON (I)
• MongoDB is a NoSQL database based on JSON
• Special set of nodes
– due to lack of a standard SQL interface

MongoDB and JSON (II)
• JSON nodes for working with JSON data
– Similar to the XML nodes
• Use combination of MongoDB and JSON nodes
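As a rough picture of what the combination does, the sketch below takes JSON documents of the kind a document database returns and pulls selected fields into table rows by path. It uses only Python's standard json module; the documents, field names, and the dotted-path helper are invented for illustration and are not the KNIME nodes' API.

```python
import json
from typing import Any, List, Optional


def get_path(document: Any, path: str) -> Optional[Any]:
    """Follow a dotted path such as 'target.name'; return None if a key is missing."""
    current = document
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Hypothetical documents; note that they do not share a fixed schema.
raw_documents = [
    '{"compound": "cmpd-1", "target": {"name": "T1"}, "value": 12.5}',
    '{"compound": "cmpd-2", "target": {"name": "T2"}}',
]

columns = ["compound", "target.name", "value"]
table: List[List[Any]] = [
    [get_path(json.loads(raw), column) for column in columns]
    for raw in raw_documents
]
print(table)
```

Fields that a document lacks come out as missing values (`None` here), which is how schema-free documents end up fitting into a fixed-column table.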

Semantic Web / Linked Data Integration
• Access and manipulate semantic web resources, e.g. DBpedia
• Execute semantic queries via SPARQL
• Usage model similar to database integration
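For readers who have not seen SPARQL, here is a small example query against DBpedia, wrapped in Python that only builds the request URL (nothing is sent, so the sketch runs offline). The choice of query, the `build_request_url` helper, and the `format` parameter value are assumptions for illustration; this is not how the KNIME nodes are implemented.

```python
from urllib.parse import urlencode

ENDPOINT = "https://dbpedia.org/sparql"  # DBpedia's public SPARQL endpoint

# Ask for five resources typed as drugs, with their English labels.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?drug ?label WHERE {
  ?drug a dbo:Drug ;
        rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 5
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as an HTTP GET request in the style of the SPARQL protocol."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


url = build_request_url(ENDPOINT, QUERY)
print(url)
```

The pattern is the same as with a relational database: connect to an endpoint, send a declarative query, and get a table of bindings back.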

Other cool stuff
• Workflow coach: suggests next nodes to use

Take homes
• Open platform based on open-source software backed by a commercial entity providing enterprise extensions and support
• Strong focus on data blending and tool blending
• Active and engaged community
• Great support for life sciences/chemistry from the community

Thanks!
Enjoy the other talks.

14-16 September, 2016, San Francisco
https://www.knime.org/fall-summit2016