Copyright © 2016 KNIME.com AG KNIME for the life sciences Cambridge Meetup Greg Landrum, Ph.D. KNIME.com AG 12 July 2016

Page 1

Copyright © 2016 KNIME.com AG

KNIME for the life sciences Cambridge Meetup

Greg Landrum, Ph.D.

KNIME.com AG

12 July 2016

Page 2

• What is KNIME?

• A bit of motivation: tool blending, data blending, documentation, automation, reproducibility

• More about the company and the community

• Some highlights from recent releases

Page 3

The KNIME® Analytics Platform

Page 4

Visual KNIME Workflows

• NODES perform tasks on data

• Nodes are combined to create WORKFLOWS

(Figure: a node with its inputs and outputs, and the node status indicator: Not Configured, Idle, Executed, Error)
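A toy Python model of the node/workflow idea may help readers who think in code. It is a sketch under stated assumptions, not KNIME's API: `Node`, `Status`, `connect` and `run_workflow` are hypothetical names, a node's task is a plain function from a list of rows to a list of rows, and only the four states named on the slide are modeled.

```python
from enum import Enum


class Status(Enum):
    # The four node states from the slide
    NOT_CONFIGURED = "Not Configured"
    IDLE = "Idle"            # configured, but not executed yet
    EXECUTED = "Executed"
    ERROR = "Error"


class Node:
    """Hypothetical node: performs one task on the rows it receives."""

    def __init__(self, name, task=None):
        self.name = name
        self.task = task      # callable: list of rows -> list of rows
        self.inputs = []      # upstream nodes feeding this node
        self.output = None
        self.status = Status.IDLE if task is not None else Status.NOT_CONFIGURED

    def execute(self):
        # Only a configured node whose inputs have all executed can run
        if self.status is Status.NOT_CONFIGURED:
            return
        if any(up.status is not Status.EXECUTED for up in self.inputs):
            return
        rows = [row for up in self.inputs for row in up.output]
        try:
            self.output = self.task(rows)
            self.status = Status.EXECUTED
        except Exception:
            self.status = Status.ERROR


def connect(upstream, downstream):
    downstream.inputs.append(upstream)


def run_workflow(nodes):
    # Assumes the nodes are listed upstream-first (topological order)
    for node in nodes:
        node.execute()


# Usage: reader -> filter -> writer, where the writer is not configured yet
reader = Node("Reader", lambda _: [{"x": 1}, {"x": -2}, {"x": 3}])
row_filter = Node("Row Filter", lambda rows: [r for r in rows if r["x"] > 0])
writer = Node("Writer")
connect(reader, row_filter)
connect(row_filter, writer)
run_workflow([reader, row_filter, writer])
print([(n.name, n.status.value) for n in (reader, row_filter, writer)])
```

A node whose task raises an exception ends up in the Error state, and its downstream nodes stay Idle because their inputs never executed.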

Page 5

Over 1000 native and embedded nodes included:

• Data Access: MySQL, Oracle, ...; SAS, SPSS, ...; Excel, Flat, ...; Hive, Impala, ...; XML, JSON, PMML; Text, Doc, Image, ...; Web Crawlers; Industry Specific; Community / 3rd

• Transformation: Row, Column, Matrix; Text, Image; Time Series; Java; Python; Community / 3rd

• Analysis & Mining: Statistics; Data Mining; Machine Learning; Web Analytics; Text Mining; Network Analysis; Social Media Analysis; R, Weka, Python; Community / 3rd

• Visualization: R; JFreeChart; JavaScript; Community / 3rd

• Deployment: via BIRT; PMML; XML, JSON; Databases; Excel, Flat, etc.; Text, Doc, Image; Industry Specific; Community / 3rd

Page 6

Why is this important?

Real-world data analysis:

• Lots of heterogeneous data from multiple sources (data blending)

• Complex questions to ask of the data

• Need to apply multiple tools (tool blending)

Page 7

The problem is that we don’t have one of these for working with our data

Page 8

The problem is that we don’t have one of these for working with our data

It’s amazing how simple you can make complex things if you control the entire process

If all of your problems look like this (image), then this is the perfect tool (image).

Page 9

We tend to need a broader assortment of tools for our data

https://www.flickr.com/photos/mtneer_man/5247813293

If we’re lucky they are this well organized…

Page 10

…but this is a lot more common

https://www.flickr.com/photos/tilde-lifestyle-photography/6906117081/

Page 11

Why is this important?

Real-world data analysis:

• Lots of heterogeneous data from multiple sources (data blending)

• Complex questions to ask of the data

• Need to apply multiple tools (tool blending)

• Things that would be great:

– We didn’t have to spend half our time converting file formats

– We could figure out later what we did, repeat it, and share that with others

This is where KNIME comes in

Page 12

KNIME: the company

Page 13

KNIME

• KNIME.com AG founded in 2008

• Offices in Zurich (HQ), Konstanz, Berlin, and San Francisco

• 20+ employees

• Maintainer of the open source KNIME Analytics Platform

– comprehensive data loading, processing, analysis, and modeling platform

– visual front end

– open: to all sorts of data, other tools (R and Python, among others), various user personas

– 20 open source releases since 2006

– open source

• KNIME Commercial Extensions for Collaboration, Productivity, Performance

– 14 commercial product releases since 2008

Page 14

Broad Range of KNIME Application Areas & Customers

• Advanced Analytics

• Pharma

• Health Care

• Finance

• Retail

• Customer Intelligence

• Manufacturing

Page 15

Happy users!

Source: http://r4stats.com/2016/06/06/rexer-data-science-survey-satisfaction-results/

Page 16

KNIME Analytics Platform: Try it Now!

1. Download from www.knime.com

2. Browse the KNIME Learning Hub at www.knime.com/learning-hub

3. Download your free copy of the “KNIME Beginner’s Guide” from www.knime.com/knimepress (use code: KNIME_Boston2016)

4. Visit us here or at our Forum: www.knime.com/forum

Page 17

The KNIME Ecosystem

Page 18

KNIME Software

KNIME commercial extensions to the platform for collaboration, productivity, performance

Page 19

KNIME Server

Page 20

KNIME Big Data Extensions (commercial license required!)

• KNIME Big Data Connectors

– Package required drivers/libraries for specific HDFS, Hive, Impala access

– Hive (Big Data Extension)

– Cloudera Impala (Big Data Extension)

– Extends the open source database integration

• KNIME Spark Executor

– Package required drivers to submit Spark jobs

– Wraps Spark DB manipulations and MLlib modules

Page 21

Big Data Connectors

• Same mode of operation as the standard KNIME database connectors

• Operations are performed within the database
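“Operations are performed within the database” means that chaining database nodes only builds up a query, and the database itself does the filtering and aggregation. The sketch below uses Python's built-in `sqlite3` as a stand-in for Hive or Impala; the `assays` table, its columns, the values and `build_aggregate_query` are all hypothetical.

```python
import sqlite3


def build_aggregate_query(table, group_col, value_col):
    # Chained "nodes" only compose SQL text; nothing runs until the final query
    # is sent, so filtering and grouping happen inside the database.
    # Identifiers are assumed to be trusted; the threshold is a bound parameter.
    filtered = f"SELECT * FROM {table} WHERE {value_col} >= ?"
    return (
        f"SELECT {group_col}, COUNT(*) AS n, AVG({value_col}) AS mean_value "
        f"FROM ({filtered}) AS t GROUP BY {group_col} ORDER BY {group_col}"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assays (target TEXT, pic50 REAL)")
conn.executemany(
    "INSERT INTO assays VALUES (?, ?)",
    # made-up values for illustration
    [("EGFR", 7.5), ("EGFR", 5.0), ("EGFR", 8.5), ("JAK2", 6.0), ("JAK2", 9.0)],
)

sql = build_aggregate_query("assays", "target", "pic50")
# Only the two aggregated rows travel back to the client, not the raw table
result = conn.execute(sql, (6.0,)).fetchall()
print(result)  # [('EGFR', 2, 8.0), ('JAK2', 2, 7.5)]
```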

Page 22

KNIME Spark Executor

• Based on Spark MLlib

• Scalable machine learning library

• Runs on Hadoop

• Algorithms for

– Classification (decision tree, naïve Bayes, …)

– Regression (logistic regression, linear regression, …)

– Clustering (k-means)

– Collaborative filtering (ALS)

– Dimensionality reduction (SVD, PCA)

Page 23

Familiar Usage Model

• Usage model and dialogs similar to existing nodes

• No coding required

Page 24

The KNIME community

Page 25

Openness and the community

• Very active user community (check the forums)

• >250 people at the 2016 KNIME Summit in Berlin

• The KNIME Analytics Platform is both open source and an open platform.

• Technology partners: provide and support nodes for their (usually commercial) software. Some examples: Schrödinger, ChemAxon/InfoCom, CCG, Cresset

• We encourage the community to produce nodes (or sets of nodes) and share them with each other.

• “Trusted Community Extensions” for community contributions that meet a certain quality level.

Page 26

Some of the community contributions:

This is the subset most relevant to drug discovery

Page 27

Highlights of recent additions in KNIME 3.1 and 3.2

Complete lists: https://tech.knime.org/whats-new-in-knime-31 and https://tech.knime.org/whats-new-in-knime-32

Page 28

Streaming

• Default Execution

• Streaming Execution

– Row-wise

– Process, pass & forget → Faster with less I/O overhead

– Concurrent execution

Page 29

Streaming – Pros and Cons

Advantages

• Less I/O overhead (process, pass & forget)

• Parallelization

Disadvantages

• No intermediate results, no interactive execution

• Not all nodes can be streamed
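The difference between default and streaming execution can be sketched with Python generators, as an analogy rather than KNIME's streaming executor. Each hypothetical node function logs when it touches a row; the default run materializes each node's full output before the next node starts, while the streamed run pulls one row at a time through the whole chain. Generators run sequentially, so the concurrent execution of nodes is not shown.

```python
def read_rows(n, log):
    # Source node: emits rows one at a time
    for i in range(n):
        log.append(f"read {i}")
        yield {"id": i, "value": i * i}


def keep_even(rows, log):
    # Row filter node
    for row in rows:
        log.append(f"filter {row['id']}")
        if row["value"] % 2 == 0:
            yield row


def add_half(rows, log):
    # Column-appending node
    for row in rows:
        log.append(f"compute {row['id']}")
        yield {**row, "half": row["value"] / 2}


def run_default(n):
    # Default execution: each node produces its complete output table
    # before the next node starts, so every intermediate table is kept
    log = []
    table = list(read_rows(n, log))
    table = list(keep_even(table, log))
    table = list(add_half(table, log))
    return table, log


def run_streaming(n):
    # Streaming: each row is pulled through the whole chain and then dropped
    # ("process, pass & forget"); no intermediate table is materialized
    log = []
    return list(add_half(keep_even(read_rows(n, log), log), log)), log


def sort_rows(rows, key):
    # A sorter cannot emit its first row before it has seen the last one,
    # so a node like this breaks the stream
    return sorted(rows, key=key)


default_table, default_log = run_default(3)
streamed_table, streamed_log = run_streaming(3)
print(default_log)
print(streamed_log)
```

Both runs return the same table, but only the default run had complete intermediate tables available for inspection, which is the trade-off listed under Disadvantages.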

Page 30

Trees/Forest/Ensembles

• Random forest node (simplification of the tree ensemble node)

• Support of binary splits for nominal attributes

• Missing value handling

• Support of byte vector data (high-dimensional count fingerprints)

• Code optimization

– Runtime

– Memory
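A binary split on a nominal attribute sends one subset of the categories down the left branch and the rest down the right. The sketch below finds the best such split by exhaustive search, scored with Gini impurity. It is an illustration only: KNIME's tree learners may use other quality measures and a smarter search, and the function names and example data are hypothetical.

```python
from itertools import combinations


def gini(labels):
    # Gini impurity of a list of class labels
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_binary_nominal_split(values, labels):
    """Try every two-way partition of the categories of one nominal attribute.
    Returns (categories sent left, weighted Gini impurity), or None if the
    attribute has fewer than two categories."""
    categories = sorted(set(values))
    first, rest = categories[0], categories[1:]
    n = len(labels)
    best = None
    # Keeping `first` on the left counts each partition once; k < len(rest)
    # keeps at least one category on the right: 2**(K-1) - 1 candidates in total
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            left = {first, *extra}
            left_y = [y for v, y in zip(values, labels) if v in left]
            right_y = [y for v, y in zip(values, labels) if v not in left]
            score = (len(left_y) * gini(left_y) + len(right_y) * gini(right_y)) / n
            if best is None or score < best[1]:
                best = (left, score)
    return best


colors = ["red", "green", "blue", "red", "green", "blue"]
active = [1, 0, 1, 1, 0, 1]
split = best_binary_nominal_split(colors, active)
print(split)
```

Because the number of candidate partitions grows as 2^(K-1) - 1 for K categories, this brute-force version is only practical for attributes with few categories.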

Page 31

Trees and Tree Ensembles: New nodes

• Gradient Boosting

– Also based on tree ensembles

– Boosting: improving an existing model by adding a new model

– Shallow trees

• Random Forest Distance

– Distance measure induced by a random forest

– Based on proximity
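Both new nodes rest on simple ideas. The sketch below assumes squared-error loss, a single numeric feature and depth-1 trees (stumps) for boosting; for the distance it assumes the leaf index of each sample in each tree of an already trained forest is available. All function names are hypothetical, and the KNIME nodes are more general than this.

```python
def fit_stump(x, residuals):
    # A "shallow tree" with one split on a single numeric feature, chosen to
    # minimize the squared error of the residuals (needs >= 2 distinct x values)
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        l_mean, r_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - l_mean) ** 2 for r in left) + sum((r - r_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, l_mean, r_mean)
    _, t, l_mean, r_mean = best
    return lambda xi: l_mean if xi <= t else r_mean


def gradient_boost(x, y, n_rounds=50, learning_rate=0.3):
    # Start from a constant model, then repeatedly improve it by adding a stump
    # fitted to the current residuals (the negative gradient of squared error)
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)


def proximity_distance(leaves_a, leaves_b):
    # leaves_a[t] = index of the leaf that sample a reaches in tree t.
    # Proximity = fraction of trees in which both samples share a leaf;
    # the distance is 1 - proximity
    same = sum(1 for la, lb in zip(leaves_a, leaves_b) if la == lb)
    return 1.0 - same / len(leaves_a)


model = gradient_boost([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])
print(round(model(2), 4), round(model(5), 4))
print(proximity_distance([3, 7, 1, 4], [3, 2, 1, 4]))  # 0.25
```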

Page 32

Feature Selection

• Automated help for narrowing down the best set of features for a model

• Supports forward and backward selection
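Forward selection starts from an empty feature set and greedily adds the feature that most improves a model's score; backward selection is the mirror image, starting from all features and removing one at a time. The sketch below shows the forward direction only. The 1-nearest-neighbour classifier, its leave-one-out accuracy as the score, and the tiny data set are assumptions made for illustration.

```python
def loo_1nn_accuracy(X, y, features):
    # Leave-one-out accuracy of a 1-nearest-neighbour classifier that
    # only looks at the given feature columns
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, None
        for j in range(len(X)):
            if j == i:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if best_d is None or d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def forward_selection(X, y, score):
    # Greedy forward selection: add the feature that gives the best score;
    # stop as soon as no remaining feature strictly improves it
    remaining = list(range(len(X[0])))
    selected, best_score = [], float("-inf")
    while remaining:
        cand_score, cand = max((score(X, y, selected + [f]), f) for f in remaining)
        if cand_score <= best_score:
            break
        selected.append(cand)
        remaining.remove(cand)
        best_score = cand_score
    return selected, best_score


# Feature 0 separates the two classes; feature 1 is noise
X = [[0.0, 5], [0.1, 1], [0.2, 9], [1.0, 4], [1.1, 8], [0.9, 2]]
y = [0, 0, 0, 1, 1, 1]
print(forward_selection(X, y, loo_1nn_accuracy))  # ([0], 1.0)
```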

Page 33

Deeplearning4j – KNIME Integration

• Easy network architecture design

• Modular

– Layer-wise design of networks

• Model import/export

– Caffe import

• Beginner-friendly

– Import pretrained networks

• Highly configurable

• Supports word2vec and doc2vec

Page 34

Deeplearning4j – KNIME Integration

Page 35

Active Learning

• Labs Extension

• Involve the user in constructing the training data set

• Workflow loop to query and label ‘interesting’ data points

• Apply a model trained on the user-labeled data set to the remaining data
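The query-and-label loop can be sketched with uncertainty sampling, one common way of choosing ‘interesting’ points; the slide does not say which strategy the KNIME Labs nodes use. The model here is a hypothetical nearest-centroid classifier on 1-D data, and `oracle` stands in for the user who labels each queried point.

```python
def train_centroids(labeled):
    # Nearest-centroid model on 1-D data: the mean x of each class
    sums, counts = {}, {}
    for x, label in labeled:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def uncertainty_margin(centroids, x):
    # Gap between the distances to the two nearest centroids;
    # a small gap means the model can hardly tell the classes apart
    d = sorted(abs(x - c) for c in centroids.values())
    return d[1] - d[0]


def active_learning(pool, seed_labeled, ask_user, n_queries):
    # seed_labeled must contain at least two classes
    labeled, unlabeled = list(seed_labeled), list(pool)
    for _ in range(n_queries):
        model = train_centroids(labeled)
        # Query the most 'interesting' point: the one the model is least sure about
        x = min(unlabeled, key=lambda u: uncertainty_margin(model, u))
        unlabeled.remove(x)
        labeled.append((x, ask_user(x)))
    # Apply the model trained on the user-labeled set to the remaining data
    model = train_centroids(labeled)
    return labeled, {x: predict(model, x) for x in unlabeled}


def oracle(x):
    # Stand-in for the user in the labeling loop
    return int(x > 4.5)


pool = [0.5, 1.0, 2.0, 4.0, 4.6, 5.0, 7.0, 9.0, 9.5]
labeled, predictions = active_learning(pool, [(0.0, 0), (10.0, 1)], oracle, n_queries=3)
print([x for x, _ in labeled[2:]])  # [5.0, 4.0, 4.6]
```

Each query lands near the current decision boundary, so three user labels are enough here for the model to label the six remaining points the same way the user would.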

Page 36

R Integration

• Rewrite of infrastructure

– Significantly faster

– Concurrent execution

• No change of usage model

Page 37

MongoDB and JSON (I)

• MongoDB is a NoSQL database based on JSON

• Special set of nodes

– due to lack of a standard SQL interface

Page 38

MongoDB and JSON (II)

• JSON nodes for working with JSON data

– Similar to the XML nodes

• Use a combination of MongoDB and JSON nodes
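To illustrate what the JSON nodes do with documents coming out of MongoDB, the sketch below turns a list of JSON documents into table rows by extracting one value per path. No database is involved and the documents are made up; `get_path` and `json_to_table` are hypothetical helpers using a simplified dotted-path syntax rather than full JSONPath.

```python
import json


def get_path(doc, path):
    """Follow a dotted path such as 'compound.name' through nested JSON;
    list positions are written as integers, e.g. 'results.0.value'.
    Returns None (a missing value) when the path does not exist."""
    current = doc
    for key in path.split("."):
        if isinstance(current, list) and key.isdigit() and int(key) < len(current):
            current = current[int(key)]
        elif isinstance(current, dict) and key in current:
            current = current[key]
        else:
            return None
    return current


def json_to_table(json_lines, columns):
    # One output row per JSON document, one column per extracted path
    rows = []
    for line in json_lines:
        doc = json.loads(line)
        rows.append({name: get_path(doc, path) for name, path in columns.items()})
    return rows


docs = [
    '{"_id": 1, "compound": {"name": "aspirin"}, "results": [{"value": 5.2}]}',
    '{"_id": 2, "compound": {"name": "caffeine"}, "results": []}',
]
table = json_to_table(docs, {"id": "_id", "name": "compound.name", "first_value": "results.0.value"})
print(table)
```

Documents in a NoSQL store need not share a schema, which is why a missing path becomes a missing value in the table instead of an error.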

Page 39

Semantic Web/Linked Data Integration

• Access and manipulate semantic web resources, e.g. DBpedia

• Execute semantic queries via SPARQL

• Usage model similar to database integration

Page 40

Other cool stuff

• Workflow Coach: suggests next nodes to use

Page 41

Take-homes

• Open platform based on open-source software, backed by a commercial entity providing enterprise extensions and support

• Strong focus on data blending and tool blending

• Active and engaged community

• Great support for life sciences/chemistry from the community

Page 42

Thanks!

Enjoy the other talks.

Page 43

14-16 September 2016, San Francisco

https://www.knime.org/fall-summit2016