medchemica bigdata what is that all about?

MedChemica BigData ‘What is that ALL about?’ Al Dossetter [email protected] MedChemica Limited Macclesfield Sci Bar 25th April 2016

Upload: al-dossetter

Post on 16-Feb-2017

TRANSCRIPT

Page 1: MedChemica BigData What Is That All About?

MedChemica BigData

‘What is that ALL about?’

Al Dossetter [email protected] MedChemica Limited Macclesfield Sci Bar 25th April 2016

Page 2: MedChemica BigData What Is That All About?

Big Data – ‘What is that all about?’

•  Introduction to Big Data

•  Examples from History

•  Big Data and science

•  MedChemica – advancing drug design through actionable knowledge

Page 3: MedChemica BigData What Is That All About?

About Us
Passionate about generating better decisions from data

Dr Andrew G. Leach Technical Director Liverpool John Moores 12 years experience Applied computational and medicinal chemistry

Dr Ed Griffen Technical Director 21 years experience Medicinal chemistry and large scale statistical analysis methods

Dr Al Dossetter Managing Director 17 years Medicinal chemistry and extensive cloud computing experience

Dr Ali Griffen Business Analyst PhD Fungal Vascular wilt disease 21 years experience Team leader bioscientist and biological data curation

Dr Shane Montague Lead Data Scientist PhD Computer Science 13 years experience Data science and information security

Dr Jia Wu Consultant Data Scientist PhD Machine Learning 12 years experience in data mining and machine learning. Projects in finance, energy and criminology.

Page 4: MedChemica BigData What Is That All About?

Best Definition of Big Data
•  Any analysis of a data set that is too large to do by hand
   –  Requires computational techniques
   –  Requires statistical techniques
•  Yields
   –  Knowledge
   –  Knowledge that can be counterintuitive

It got ‘Big’ because:
   -  the internet made a lot of data available very quickly (often for free)

It got interesting because:
   -  Knowledge yields real benefits to the bottom line
   -  Reduced costs or increased sales

You, the consumer, benefit…
   -  Cheaper goods, available on-line
   -  Flights on time, trains on time, deliveries on time

Page 5: MedChemica BigData What Is That All About?

Big Data “The Revolution that will change the world we live in”
•  Principles of Big Data
   –  Use ALL of the Data
      •  however noisy
   –  Analyse in an unbiased way
   –  “DO WHAT” it tells you
      •  Do Not Worry About “WHY”
   –  KEEP everything
      •  ‘you never know what question you want to ask’

Page 6: MedChemica BigData What Is That All About?

The 4 Vs
•  Picture from Google or someone
•  What does it mean?
•  Mostly it is about using lots of computers

Most issues are sorted out by more CPUs, more drive space, and better stats

Page 7: MedChemica BigData What Is That All About?

It's actually been around quite a while…
•  It was genius to break the codes
•  Further genius in collating the data and reducing it so that analysts could use it in a timely manner (volume / velocity / veracity)
•  ….saved many, many lives on both sides

Page 8: MedChemica BigData What Is That All About?

….and banking, finance and trading

Page 9: MedChemica BigData What Is That All About?

What do Nappies and Beer have in common?
•  Analysis of shopping habits found these two things were bought together
•  Put them close together in the store and sell more
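The kind of shopping-habit analysis described above can be sketched in a few lines. This is a minimal illustration, not the retailer's actual method: the baskets are invented, and "lift" (how much more often two items appear together than chance would predict) is just one common way of scoring a pairing.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"nappies", "beer", "bread"},
    {"nappies", "beer"},
    {"bread", "milk"},
    {"nappies", "beer", "milk"},
    {"milk", "bread"},
    {"crisps", "milk"},
]

def pair_lift(baskets):
    """Return {(item_a, item_b): lift} for every pair bought together.

    lift = P(a and b) / (P(a) * P(b)); values above 1 mean the pair
    turns up together more often than independent purchases would.
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    return {
        (a, b): (count / n) / ((item_counts[a] / n) * (item_counts[b] / n))
        for (a, b), count in pair_counts.items()
    }

lifts = pair_lift(baskets)
best_pair = max(lifts, key=lifts.get)
print(best_pair, lifts[best_pair])
```

With real till data the same counting would run over millions of baskets, and rare items would need a minimum-count filter so that a single chance pairing does not score highly.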

+

=

Page 10: MedChemica BigData What Is That All About?

UPS delivery service
•  Fitted sensors to all delivery trucks and gathered data
•  Analysed data to detect early engine issues BEFORE breakdown
•  Therefore FIX early and keep the van on the road
•  The customer benefits because:
   •  Deliveries on-time
•  Even larger dataset – high degree of prediction on delivery times
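Detecting an engine issue before breakdown amounts to spotting sensor readings that drift away from a vehicle's normal behaviour. The sketch below assumes a single temperature sensor, invented readings, and a simple three-standard-deviation rule; it illustrates the idea and is not a description of the UPS system.

```python
from statistics import mean, stdev

def flag_for_service(readings, baseline_size=10, threshold=3.0):
    """Return the indices of readings that drift far from the early baseline.

    The first `baseline_size` readings define "normal"; any later reading
    more than `threshold` standard deviations from that mean is flagged.
    """
    baseline = readings[:baseline_size]
    mu, sigma = mean(baseline), stdev(baseline)
    return [
        i
        for i, value in enumerate(readings[baseline_size:], start=baseline_size)
        if abs(value - mu) > threshold * sigma
    ]

# Hypothetical engine temperatures in degrees C; the last three drift upwards.
temps = [90, 91, 89, 90, 92, 90, 91, 89, 90, 91, 90, 92, 97, 101, 106]
flagged = flag_for_service(temps)
print(flagged)
```

The same pattern applies to any monitored machine: the value lies in flagging the drift early enough to schedule a repair rather than suffer a breakdown.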

Page 11: MedChemica BigData What Is That All About?

Jet Engines – reliable service
•  Sensors on jet engines – monitored in flight
•  Similar to UPS
•  Therefore FIX early and keep the planes in the air
•  The customer benefits because:
   •  Flights on time and reliable

Page 12: MedChemica BigData What Is That All About?

Google translate
The Unreasonable Effectiveness of Data

“Because of a huge shared cognitive and cultural context, linguistic expression can be highly ambiguous and still often be understood correctly.”

•  https://en.wikipedia.org/wiki/File:Google_Translate_Icon.png
•  https://en.wikipedia.org/wiki/Google_Translate
•  https://www.youtube.com/watch?v=yvDCzhbjYWs
•  University of British Columbia Distinguished Lecture Series - Sept 23rd 2011

•  Groups or pairs of words associated together on websites around the internet
•  Statistical analysis of the frequency of pairing
•  Therefore this word (or group) probably translates into this word
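The pairing idea above can be shown with a toy example. The sketch assumes a tiny, invented set of aligned English and French phrases and a simple score (co-occurrence count divided by how common the target word is); production translation systems are far more sophisticated, so treat this only as an illustration of "count the pairings and pick the most likely".

```python
from collections import Counter, defaultdict

# Hypothetical aligned phrase pairs (English, French), invented for illustration.
pairs = [
    ("the cat", "le chat"),
    ("the dog", "le chien"),
    ("a cat", "un chat"),
    ("a dog", "un chien"),
]

def guess_translations(pairs):
    """Map each source word to the target word it is most associated with."""
    cooccurrence = defaultdict(Counter)  # source word -> counts of target words seen alongside it
    target_totals = Counter()            # how often each target word appears overall
    for source, target in pairs:
        target_words = target.split()
        target_totals.update(target_words)
        for source_word in source.split():
            cooccurrence[source_word].update(target_words)
    # Score by the share of the target word's appearances that occur next to
    # the source word, breaking ties on the raw count.
    return {
        source_word: max(
            counts, key=lambda t: (counts[t] / target_totals[t], counts[t])
        )
        for source_word, counts in cooccurrence.items()
    }

translations = guess_translations(pairs)
print(translations)
```

No grammar or dictionary is supplied anywhere: the guesses come entirely from the frequency of pairing, which is why more text gives better translations.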

Page 13: MedChemica BigData What Is That All About?

What about science? We need to be accurate (don’t we?)

•  Large Hadron Collider shows how we can gather a lot of data very accurately

•  A large amount of data is needed to reduce the errors – very, very big data
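A standard way to see why more data reduces the error is the standard error of a mean. The relation below assumes n independent measurements of the same quantity, each with the same standard deviation σ; that is an idealisation, not a description of the actual collider analysis.

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
\qquad\Rightarrow\qquad
n = 10^{2}:\ \sigma_{\bar{x}} = \frac{\sigma}{10},
\qquad
n = 10^{6}:\ \sigma_{\bar{x}} = \frac{\sigma}{1000}
```

Because the error shrinks only with the square root of n, each extra factor of ten in precision costs a hundred times more data.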

Page 14: MedChemica BigData What Is That All About?

The Life Science industry has woken up to Big Data

•  Human Genome
•  Biological systems
•  Kinome
•  Metabolomics
•  Proteomics
•  3D structural information (CDC / Protein Data Bank)
•  Literature and Patents (GVK Bio, ChEMBL, Pubmed, PubChem)
•  Reaction informatics – what works, what doesn’t
•  Document management
•  Regulatory submissions

Huge Opportunity in this area

Page 15: MedChemica BigData What Is That All About?

What about life sciences?

•  It is getting harder and harder to discover drugs.
•  They have to work
•  They have to be safe
•  People want them cheaply

•  A description of the drug research and development process

Page 16: MedChemica BigData What Is That All About?

Company                    Ticker   Number of drugs approved   R&D Spending Per Drug ($Mil)   Total R&D Spending 1997-2011 ($Mil)
AstraZeneca                AZN      5                          11,790.93                      58,955
GlaxoSmithKline            GSK      10                         8,170.81                       81,708
Sanofi                     SNY      8                          7,909.26                       63,274
Pfizer Inc.                PFE      14                         7,727.03                       108,178
Roche Holding AG           RHHBY    11                         7,803.77                       85,841
Johnson & Johnson          JNJ      15                         5,885.65                       88,285
Eli Lilly & Co.            LLY      11                         4,577.04                       50,347
Abbott Laboratories        ABT      8                          4,496.21                       35,970
Merck & Co Inc             MRK      16                         4,209.99                       67,360
Bristol-Myers Squibb Co.   BMY      11                         4,152.26                       45,675
Novartis AG                NVS      21                         3,983.13                       83,646
Amgen Inc.                 AMGN     9                          3,692.14                       33,229

Sources: InnoThink Center For Research In Biomedical Innovation; Thomson Reuters Fundamentals via FactSet Research Systems

The Truly Staggering Cost Of Inventing New Drugs, Matthew Herper - Forbes
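The "R&D Spending Per Drug" column is simply total spend divided by the number of drugs approved. The short sketch below recomputes it from the figures in the table. The results differ from the published column by a few hundredths of a million dollars, presumably because the published totals are rounded (an assumption; the source does not say).

```python
# (company, drugs approved, total R&D spend 1997-2011 in $ million), copied from the table.
rows = [
    ("AstraZeneca", 5, 58955),
    ("GlaxoSmithKline", 10, 81708),
    ("Sanofi", 8, 63274),
    ("Pfizer Inc.", 14, 108178),
    ("Roche Holding AG", 11, 85841),
    ("Johnson & Johnson", 15, 88285),
    ("Eli Lilly & Co.", 11, 50347),
    ("Abbott Laboratories", 8, 35970),
    ("Merck & Co Inc", 16, 67360),
    ("Bristol-Myers Squibb Co.", 11, 45675),
    ("Novartis AG", 21, 83646),
    ("Amgen Inc.", 9, 33229),
]

def spend_per_drug(rows):
    """Return {company: total R&D spend divided by drugs approved}, in $ million."""
    return {name: total / drugs for name, drugs, total in rows}

per_drug = spend_per_drug(rows)
most_expensive = max(per_drug, key=per_drug.get)
least_expensive = min(per_drug, key=per_drug.get)

# List companies from highest to lowest spend per approved drug.
for name, value in sorted(per_drug.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:26s} {value:10,.2f}")
```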

Drug failures later in development are mainly due to EFFICACY and SAFETY

Page 17: MedChemica BigData What Is That All About?
Page 18: MedChemica BigData What Is That All About?

Actual spending – taken together, LO (lead optimisation) projects are the biggest spend

Paul, S. M. et al. How to improve R&D productivity: the pharmaceutical industry’s grand challenge. Nat. Rev. Drug Discovery 2010, 9, 203

Snapshot of a medium-sized company's R&D spend in one year - $1.7 billion

For a period, large pharma set targets at each stage of the process (an attrition model); this was unsuccessful and very wasteful

Better chemistry
Reduce the number of projects

Chemistry influences success and speed
Methods that really work, new formulations

Page 19: MedChemica BigData What Is That All About?

What Causes Attrition in Development?

•  Lack of efficacy in man: 46%
•  Adverse effects in man: 17%
•  Animal toxicity: 16%
•  PK: 7%
•  Commercial reasons: 7%
•  Miscellaneous: 7%

Many compounds fail in development through inadequate pharmacokinetics / bioavailability and unacceptable toxicological profiles, in addition to lack of efficacy in man

Page 20: MedChemica BigData What Is That All About?

Oral Dosing of Drugs

[Diagram of the body; organ labels: liver, kidneys, bladder]

•  Dissolve
•  Survive pH range 1.5-8
•  Cross Membranes
•  Metabolism
•  Avoid Excretion
•  BBB (Blood Brain Barrier)
•  Target (maybe in the brain)

Absorption Distribution Metabolism Excretion Toxicity

Page 21: MedChemica BigData What Is That All About?

Grand Rule database
Better medicinal chemistry by sharing knowledge not data & structures

[Diagram: Roche, Genentech, AZ, Pharma 4 and Pharma 5 each run a rule finder on their own data. MedChemica combines the results into the Grand Rule Database, and each company receives the Grand Rule Database for its own exploitation.]

>500 million pairs from companies + 12 million from public data
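The principle of sharing knowledge rather than data and structures can be illustrated with a toy example. Everything here is hypothetical: the transform names, the numbers, and the assumption that a "rule" is a summary (number of pairs, mean change in a measured property) that can be pooled by weighted average. It is a sketch of the idea, not MedChemica's rule finder or database.

```python
from collections import defaultdict

# Hypothetical per-company rule summaries:
#   transform -> (number of pairs observed, mean change in a measured property)
# Only these summaries leave each company; no compounds or raw data are shared.
company_rules = {
    "company_a": {"H>>F": (40, -0.30), "CH3>>CF3": (10, 0.50)},
    "company_b": {"H>>F": (60, -0.20)},
}

def pool_rules(company_rules):
    """Combine per-company summaries into one pair-weighted rule per transform."""
    totals = defaultdict(lambda: [0, 0.0])  # transform -> [total pairs, sum of n * mean]
    for rules in company_rules.values():
        for transform, (n, mean_change) in rules.items():
            totals[transform][0] += n
            totals[transform][1] += n * mean_change
    return {t: (n, weighted / n) for t, (n, weighted) in totals.items()}

grand_rules = pool_rules(company_rules)
print(grand_rules)
```

A pooled rule rests on more evidence than any one contributor holds, which is the benefit each company gets back in exchange for its own summaries.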

Page 22: MedChemica BigData What Is That All About?

…so what are you going to make next…?

Page 23: MedChemica BigData What Is That All About?

Who is GOOD at Big Data? The people making the money!

Chemical transform to improve metabolism

Chemists who wanted to fix metabolism also made these…

R =

SaltTraX© - [email protected] [email protected]

Page 24: MedChemica BigData What Is That All About?

What about clinical safety?

SAFE DRUGS

•  ‘Potency’: do not sacrifice; the better it is, the lower the dose
•  Improved testing in-vivo with fewer animals
•  Clinical linkage to protein target
•  Can test In-Vivo Anti SAR, e.g. hERG, Nav1.5, 5-HT2a…
•  Analysis of In-Vivo data: Pfizer – rat data
•  <0.2mg/Kg Dose
•  Metabolism & Pharmacokinetics: better design so dose is lower

Grand Rule Database

Hughes et al, Bioorg Med Chem Lett. 2008, 18(17), 4872

Page 25: MedChemica BigData What Is That All About?

Collaborators and Users

Page 26: MedChemica BigData What Is That All About?

The ‘Internet of Things (IoT)’
A higher diversity of devices connected to the internet, with data flowing to and from them
For example, Smart Watches

Lifestyle device – marketed on fitness / wellness
Like UPS vans and RR jet engines, can we detect illness pre-symptomatically?

Page 27: MedChemica BigData What Is That All About?

Big Data – ‘What is that all about?’

•  Introduction to Big Data
   –  Big enough to need a computer / advanced stats
•  Examples from History
   –  Bletchley Park, UPS, Beer and Nappies….
•  Big Data and science
   –  Hadron collider….
•  MedChemica – Advancing drug design through actionable knowledge
   –  Allows sharing of knowledge to accelerate and reduce costs of finding new, safe medicines