There’s No Avoiding It: Programming Skills You’ll Need

Post on 18-Jan-2015


DESCRIPTION

Why bioresearchers need to learn SOME programming, and how to go about it

TRANSCRIPT

Yannick Pouliot, PhD 10/14/2011

There’s No Avoiding It: Programming Skills You’ll Need

Three Things I Want To Impress

•Why software programming is essential for bioresearch
▫… as essential as knowing how to use a pipette
•Why you should partially dump Excel and use a relational database
•Why the Cloud is your friend

The Good News

•Free software!
•Free algorithms!
•Pre-coded algorithms (i.e., packages)!
•Very cheap computing power!

The Bad News

•Dunno how to use
•“Not talented”
•“Not enough time”
•(can’t be bothered)
▫e.g., reading the paper describing the software tool one is relying on

More Good News

•Not that hard
•Lots and lots of good resources
•Read a book, dammit
•Find a buddy
•Use Cloud instances (preconfigured machines)
▫Can even be free!

The Quest For Situation-Appropriate Storage & Computation

Or, when Excel fails you

Some Questions…

1. Do you use MS Excel?
2. How much time do you spend using it?
3. Are you good at it? Be honest…
4. Have you ever read a book or tutorial on Excel?
5. So how are you going to improve your ability?

Are You an Excelaholic?

•Do you have an unhealthy dependence on Excel?
▫Do you use Excel to store data?
▫Do you feel like you’re making Excel jump through hoops to perform your calculations?
▫Do you have a vague feeling of shame as a result?

The Worst Case (More Frequent Than You’d Wish)

•Postdoc uses Excel to keep track of complex experiment involving two external groups

•Eventually realizes that data stored in Excel were corrupted (“paste failure”)
▫Result: it took her six months to recover

•She now uses FileMaker (relational database)

The Next Level Up: Relational Databases

Take Your Pick

A Real Example From Yours Truly

But You Also Need Programming…

Why Programming?

•Address small problems that can nail you
•Address bigger problems by standing on the shoulders of giants
•Flexibility: If you’re doing “real” science, off-the-shelf software will fail you every time
▫80% rule…

Don’t Try This With Excel

•Millions of reads compared against mouse transcriptome
•Determining number of distinct species and frequency of members in each
•Summarize using plots for each codon
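To make the second bullet concrete, here is a minimal Python sketch of counting distinct species and the frequency of members in each. It is an illustration only, not the pipeline used in the talk: the `read_assignments` list and its labels are made-up placeholders, and a real run would stream millions of assignments from an alignment output file rather than hold a short list in memory.

```python
from collections import Counter

# Hypothetical input: one species label per read, as an aligner might report.
# These labels are placeholders, not data from the talk.
read_assignments = [
    "species_A", "species_B", "species_A",
    "species_C", "species_A", "species_B",
]

def species_frequencies(assignments):
    """Return (number of distinct species, {species: fraction of all reads})."""
    counts = Counter(assignments)
    total = sum(counts.values())
    freqs = {species: n / total for species, n in counts.items()}
    return len(counts), freqs

n_species, freqs = species_frequencies(read_assignments)
print(n_species)
for species, fraction in sorted(freqs.items()):
    print(f"{species}\t{fraction:.3f}")
```

The same dozen lines work unchanged whether the input has six labels or six million, which is the point of the slide: the task outgrows a spreadsheet long before it outgrows a short script.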

Remember SQL?
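As a reminder of what SQL looks like in practice, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table names (`samples`, `measurements`), the column names, and the values are all hypothetical, chosen only to show how a relational layout keeps each fact in one place and lets a query do the summarizing that would otherwise be copy-and-paste in a spreadsheet.

```python
import sqlite3

# In-memory database so the example leaves no file behind.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: each measurement points at exactly one sample.
conn.executescript("""
CREATE TABLE samples (
    sample_id  INTEGER PRIMARY KEY,
    source_lab TEXT NOT NULL
);
CREATE TABLE measurements (
    measurement_id INTEGER PRIMARY KEY,
    sample_id      INTEGER NOT NULL REFERENCES samples(sample_id),
    value          REAL NOT NULL
);
""")

# Made-up rows for illustration.
conn.executemany("INSERT INTO samples VALUES (?, ?)",
                 [(1, "lab_a"), (2, "lab_b")])
conn.executemany("INSERT INTO measurements (sample_id, value) VALUES (?, ?)",
                 [(1, 2.0), (1, 4.0), (2, 10.0)])

# One query joins the two tables and summarizes per lab.
rows = conn.execute("""
    SELECT s.source_lab, COUNT(*) AS n, AVG(m.value) AS mean_value
    FROM measurements AS m
    JOIN samples AS s ON s.sample_id = m.sample_id
    GROUP BY s.source_lab
    ORDER BY s.source_lab
""").fetchall()
conn.close()

for lab, n, mean_value in rows:
    print(lab, n, mean_value)
```

Because the summary is computed by the query rather than stored, there is no pasted block of cells that can silently drift out of sync with the raw data.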

The Quest For Power

Heard at lab meeting:

“I would have shown you this graph

but Excel crashed while computing a big file”

→You can’t do this (censored) on your laptop anymore

Welcome To The Cloud

Why Own When You Can Rent?

An Example: PathSeq

•Compare millions of short-read sequences against all genomic + transcriptomic sequences for all microbes (!)

Amazon Cloud “Management Console”

Why The Cloud Matters For Biologists

• You can purchase as much computing power as you need
▫ You don’t have to run/manage what you don’t use
• You’re purchasing computing power, not machines
▫ Never outdated
• Can easily migrate from one machine type to another (minutes)
• Can add storage in seconds
• Accessible from anywhere
• Easy to share, e.g., (large) datasets with others


WEKA: the software

•Machine learning/data mining software written in Java (distributed under the GNU General Public License)
•Used for research, education, and applications
•Complements “Data Mining” by Witten & Frank
•Main features:
▫Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
▫Graphical user interfaces (incl. data visualization)
▫Environment for comparing learning algorithms

04/10/2023 University of Waikato


Explorer: building “classifiers”

•Classifiers in WEKA are models for predicting nominal or numeric quantities

•Implemented learning schemes include:
▫Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
•“Meta”-classifiers include:
▫Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
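For readers who have never seen a classifier, the following is a minimal Python sketch of the simplest instance-based scheme, a 1-nearest-neighbor classifier that predicts a nominal label. It is not WEKA code and does not use WEKA's API (WEKA itself is written in Java); the function name, the two-feature training instances, and the "low"/"high" labels are all invented for illustration.

```python
import math

def nearest_neighbor_predict(training, query):
    """Return the label of the training instance closest to `query`.

    `training` is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    _, closest_label = min(
        training, key=lambda instance: math.dist(instance[0], query)
    )
    return closest_label

# Made-up training instances: two numeric features and a nominal class label.
training = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((5.0, 5.1), "high"),
    ((4.8, 5.3), "high"),
]

prediction = nearest_neighbor_predict(training, (4.5, 5.0))
print(prediction)
```

A package such as WEKA adds what this sketch lacks: many more learning schemes, data pre-processing, and proper evaluation, which is why pre-coded algorithms are worth standing on.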
