Welcome! Mass Spectrometry meets ChemInformatics, WCMC Metabolomics Course 2013, Tobias Kind
1
Welcome!
Mass Spectrometry meets ChemInformatics
WCMC Metabolomics Course 2013
Tobias Kind
Course 1: General Introduction
http://fiehnlab.ucdavis.edu/staff/kind
CC-BY License
2
What is ChemInformatics?
Chemistry
Statistics
Informatics
Mathematics
Chemometrics est. 1975
Cheminformatics est. 1998
3
Who uses Cheminformatics?
All parts of chemistry heavily depend on cheminformatics.
Life sciences, biochemistry, drug industries use cheminformatics.
20 years ago: 80% in lab – 20% in front of computer
Now: 20% in lab – 70% in front of computer (*)
Examples:
• Organic chemistry – automated reaction planning, Beilstein search
• Physical chemistry – modeling of structure properties (boiling points)
• Inorganic chemistry – ligand bond interactions
• Analytical chemistry – structure elucidation of small compounds
• Biochemistry – protein/small molecule interaction networks
PhD
(*) 10% fixing and installing new programs
4
Motivation for Mass Spectrometry meets ChemInformatics
To be a master of spectra you need to be a master of structures in the first place.
[Figure: NIST MS/MS spectrum (nist_msms) of Vincristine, m/z 260-810, relative intensity 0-100, shown with its chemical structure]
• Complex MS data interpretations are only possible with software
• MS data are obtained by hyphenated techniques (GC-MS, LC-MS)
• Mass spectral database search and structure search are routinely used
• Mass spectrometers deliver multidimensional data
5
Computer Illiteracy – a threat to your research
Your computer is your friend.
You don’t have a computer? You don’t have a friend (just kidding).
• Assume you have a computer: please step forward and name: CPU, speed, memory, hard disk, OS
• You are a chemist, biochemist, biologist: please step forward and name: computer language or DB you know
OS = operating system; DB = database, CPU = central processing unit
PDP-11 www.bell-labs.com
6
Fighting Computer Illiteracy - name your PC
CPU: INTEL, AMD, IBM, HP | Pentium, Opteron, Xeon | 12-20 Core
Memory: DDR, DDR2 | GEIL, KINGSTON | 16-128 GByte
Hard disk: SEAGATE, WD | Raptor, Barracuda, Cheetah | 100-1000 GByte
OS: MICROSOFT, LINUX | Windows, Linux, OSX, Virtual OS
Language: C, Basic, Perl, JAVA
Bit < Byte < kByte < MByte < GByte < TByte
Single Core < Dual Core < QuadCore < MultiCore
MFLOP/s < GFLOP/s < TFLOP/s < PFLOP/s
1 Thread < Dual Thread < MultiThreaded
Picture: Cray-2 in red, Nixdorf Museum, 2004
7
The free lunch is over – multithreading needed
Herb Sutter (MS): http://www.gotw.ca/publications/concurrency-ddj.htm
NO YES
Can your metabolomics software use multiple CPUs?
8
The free lunch is over – multithreading needed
Herb Sutter (MS): http://www.gotw.ca/publications/concurrency-ddj.htm
Course example: MZMINE alignment (7 files, 18 min LC-MS)
Single core vs. multi-core
50 seconds
3:29 minutes
Mors certa, hora incerta!
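The gain on this slide can be put into a number. A small sketch of the arithmetic, assuming that 3:29 minutes is the single-core run and 50 seconds the multi-core run (the transcript does not label the two timings, so this assignment is an assumption):

```python
def to_seconds(clock: str) -> int:
    """Convert a 'm:ss' string into seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


# Assumption: 3:29 = single core, 0:50 = multi-core (not labeled in the transcript).
single_core = to_seconds("3:29")
multi_core = to_seconds("0:50")

speedup = single_core / multi_core
print(f"speed-up: {speedup:.1f}x")
```

Under that assumption the same alignment job finishes roughly four times faster when the software can use several cores.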
9
Best recommendation ever for slow computers
Install an SSD!
Single hard disk: Seagate 750 GB
SSD RAID10: Samsung 830 (2 TB)
RAMDISK: OSFMount
Compared with the single hard disk, SSDs and RAM disks are 200- to 1000-fold(!) faster at 4k (small-block) access. 4k speed matters.
10
Computer Illiteracy – learn a programming language
Why should you?
20% lab time – 80% computer time
Mass spectrometers deliver data – not results
Why shouldn't you? (fake reasons)
You are too old to learn…
You are not good with computers…
You have more important research to do…
You are so rich you have programmers who work for you…
Picture Source: WIKI James Manners from Genova, Italia
11
Computer Illiteracy – learn a programming language
• Learn any language which has a large code and user base (JAVA, Perl, Visual Basic)
• Use IDEs with automatic code completion like MS Visual Studio Express or Eclipse
• Don’t re-invent code - use (and document) code search engines like
http://code.ohloh.net/ (formerly koders, now ohloh); google.com/codesearch; http://krugle.org
Do *not* learn these working but esoteric languages. There are 1123 programming languages: http://99-bottles-of-beer.net/
Language “cow”:
moOMoOMoOMoOMoOmoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMMMmoOMMMMoOMoOMoOMoOMoOMoO MoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMMMmoOMMMommMoOMoOMoOMoOMoO MoOMoOMoOMoOMoOMoOMoOMMMmoOMMMMoOMoOMMMmoOMMMMoOMoOMoOMoOMoOMoOMoOMoOMoOMoO MoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoO
Language “brainfuck”:
>>++++++++[<++++>-] >++++++++++++++[<+++++++>-] +>+++++++++++[<++++++++++>-] ++>+++++++++++++++++++[<++++++>-] ++>+++++++++++++++++++[<++++++>-] >++++++++++++[<+++++++++>-]
12
Program development – Eclipse for JAVA example
Projects
JAVA or C code
Text output
13
Your Computer Illiteracy – your emergency helpers
Regular expressions, SQL database requests, EXCEL VBA scripts or Perl scripts are special tools for data handling (Swiss army knives).
Regular expressions (RegEx) are used for finding and replacing text.
[0-9] – represents any single digit
[a-z] – represents any single small letter
\n – represents new line (CR/LF)
\t – represents TAB
Examples:
\n\n – find two consecutive line breaks (an empty line)
find \t, replace with spaces “ ”
find two digits in brackets: ([0-9][0-9])
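The patterns above can be tried outside a text editor as well. A minimal sketch using Python's built-in `re` module; the sample string is made up for illustration, and note that Python (unlike some editors) needs the literal brackets escaped as `\(` and `\)`:

```python
import re

# Made-up sample text containing TABs, empty lines and bracketed numbers.
text = "m/z 189\tint 100\n\n\nsample (42) and (7) and (123)\n"

# [0-9] matches one digit; [0-9]+ matches a run of digits.
digits = re.findall(r"[0-9]+", text)

# [a-z]+ matches runs of small letters only.
small_words = re.findall(r"[a-z]+", "GC-MS and LC-MS")

# \t is TAB: replace every TAB with a single space.
no_tabs = re.sub(r"\t", " ", text)

# \n\n is two consecutive line breaks; collapse such runs into one line break.
no_empty_lines = re.sub(r"\n\n+", "\n", no_tabs)

# Exactly two digits inside literal brackets.
two_in_brackets = re.findall(r"\([0-9][0-9]\)", text)

print(digits)
print(small_words)
print(two_in_brackets)
print(no_empty_lines)
```

Text editors differ in how they treat brackets and line endings, so the same pattern may need small changes in Textpad or Smultron.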
Large database table:
yr	subject	winner
1901	Chemistry	Jacobus H. van 't Hoff
1902	Chemistry	Emil Fischer
1903	Chemistry	Svante Arrhenius
1904	Chemistry	Sir William Ramsay
1905	Chemistry	Adolf von Baeyer
1906	Chemistry	Henri Moissan
1907	Chemistry	Eduard Buchner
1908	Chemistry	Ernest Rutherford
1909	Chemistry	Wilhelm Ostwald
1910	Chemistry	Otto Wallach
1913…

SQL query:
SELECT yr, subject, winner FROM nobel WHERE yr = 1909 AND subject = 'Chemistry'

Result:
yr	subject	winner
1909	Chemistry	Wilhelm Ostwald
Visit the SQL Zoo
SQL is used for programming databases
Learn about RegEx
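The Nobel query from this slide can be reproduced without installing a database server. A minimal sketch using Python's built-in `sqlite3` module with an in-memory table; only a few rows from the slide are loaded, and the table layout (`yr`, `subject`, `winner`) is taken from the query itself:

```python
import sqlite3

# In-memory database, nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nobel (yr INTEGER, subject TEXT, winner TEXT)")

# A few rows copied from the table on the slide.
rows = [
    (1907, "Chemistry", "Eduard Buchner"),
    (1908, "Chemistry", "Ernest Rutherford"),
    (1909, "Chemistry", "Wilhelm Ostwald"),
    (1910, "Chemistry", "Otto Wallach"),
]
conn.executemany("INSERT INTO nobel VALUES (?, ?, ?)", rows)

# Same query as on the slide, with the values passed as parameters.
result = conn.execute(
    "SELECT yr, subject, winner FROM nobel WHERE yr = ? AND subject = ?",
    (1909, "Chemistry"),
).fetchall()

print(result)
conn.close()
```

SQLite compares text case-sensitively by default, so the subject has to be spelled exactly as stored; other database systems may behave differently.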
14
Regular Expressions – example MS data
Task: create a list of 4 columns with names, formulas, CAS numbers and peaks
Problem: 24,000 lines of mass spectral data (*.msp)
Program: Textpad (WIN), Smultron (Mac)
[Figure: screenshot of the *.msp text file, annotated with the number of lines in the text, the m/z - intensity pairs, and Enter (CR/LF) shown in gray; inset: NIST mainlib spectrum of 2,5-Pyrrolidinedione, 1-methyl-3-phenyl-]
15
Regular Expressions – example MS data
Solution: replace Enter (\n) with TAB (\t) and use Replace ALL
Result: Metadata in one line
16
Regular Expressions – example MS data
Solution: copy only lines of interest (Mark ALL – Copy Bookmarked Lines)
17
Regular Expressions – Result for MS data
Solution: replace redundant code with nothing, copy the tab-separated file to EXCEL
Result: 1:30 min for the RegEx job (1 hour manually?)
Average spectrum size: 70 peaks
Minimum size: 5 peaks
Maximum size: 439 peaks
Most spectra have between 35 and 45 peaks
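The same job can also be scripted instead of done by hand in a text editor. A rough sketch in Python, assuming a NIST-style *.msp layout with `Name:`, `Formula:`, `CAS#:` and `Num Peaks:` fields; the two records below are invented placeholders, not real library entries, and real files may use different field names:

```python
import re

# Invented two-record excerpt; the field names are an assumption about the *.msp layout.
MSP_TEXT = """Name: Compound A
Formula: C11H8O3
MW: 188
CAS#: 000-00-0;  NIST#: 1
Num Peaks: 3
39 120; 51 250;
188 999;

Name: Compound B
Formula: C6H6O
MW: 94
CAS#: 000-00-1;  NIST#: 2
Num Peaks: 2
66 400; 94 999;
"""

# One pattern pulls the four wanted fields out of a single record.
RECORD = re.compile(
    r"Name: (?P<name>[^\n]*)\n"
    r".*?Formula: (?P<formula>[^\n]*)\n"
    r".*?CAS#: (?P<cas>[0-9-]+)"
    r".*?Num Peaks: (?P<peaks>[0-9]+)",
    re.DOTALL,
)


def msp_to_table(text):
    """Return one tab-separated line (name, formula, CAS, peaks) per record."""
    lines = []
    # Records are assumed to be separated by an empty line.
    for block in re.split(r"\n\s*\n", text.strip()):
        match = RECORD.search(block)
        if match:
            lines.append("\t".join(match.group("name", "formula", "cas", "peaks")))
    return lines


table = msp_to_table(MSP_TEXT)
print("\n".join(table))

# Simple peak statistics, as reported on the slide for the full file.
sizes = [int(line.split("\t")[-1]) for line in table]
print(min(sizes), max(sizes), sum(sizes) / len(sizes))
```

The tab-separated output can be pasted into EXCEL directly, which mirrors the last step of the manual RegEx workflow.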
18
Be prepared – visualize your structures
Try Marvin Space via Webstart
19
Calculation of tetrahedral and double bond stereoisomers
How many stereoisomers can you expect from glucose (KEGG)?
Example: separation of species with ion mobility MS (FAIMS)
Example calculated with MarvinView (via JAVA Webstart)
[Structure: Glucose]
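A quick upper bound for the question on this slide is 2^n, where n is the number of tetrahedral stereocenters plus stereo double bonds; symmetry (meso forms) can lower the real count. A small sketch, assuming the usual stereocenter counts for glucose (4 in the open-chain form, 5 in the pyranose ring form); it does not reproduce the MarvinView calculation itself:

```python
def max_stereoisomers(tetrahedral_centers: int, stereo_double_bonds: int = 0) -> int:
    """Upper bound 2^n for n independent stereo elements."""
    return 2 ** (tetrahedral_centers + stereo_double_bonds)


# Assumed stereocenter counts: open-chain glucose has 4 (C2-C5),
# the pyranose ring form adds the anomeric carbon for a total of 5.
open_chain = max_stereoisomers(4)
ring_form = max_stereoisomers(5)

print(open_chain, ring_form)
```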
20
Computation of resonance forms (electron shifts)
What are possible resonance structures?
Important for mass spectral interpretation (electron impact, electrospray)
[Structure: Phenol]
Example calculated with MarvinView (start via WebStart)
21
Generation of tautomers using MSketch
How many tautomers can you expect?
Important for mass spectral interpretations and LC-MS.
[Structure: Methyl acetate]
Example calculated with MarvinView (start via WebStart)
Derivatization in GC-MS and LC-MS solves the tautomer problem
Common tautomerisms: Enol/Keto, Lactams, Amines/Imines, Amides/Imides
22
Property calculations on chemicalize.org
23
Mass spectral database search – know what exists
How many mass spectra with formula C11H8O3 in NIST DB?
Result: 19 for C11H8O3 in NIST05 DB
Download NIST-MS-Search
24
Mass spectral interpretation
Assign structural elements to mass spectral peaks
Download Mass Spectrum Interpreter Version 2
25
Mass Spec Scissors (ACDLabs Free)
Q: What is peak m/z 281 in negative mode?
http://www.hmdb.ca/metabolites/HMDB09837
26
Molecular Weight Calculator
[Figure: isotopic pattern plot, mass axis 522.00-532.00, intensity axis 0.0-100.0]
• Calculate isotopic masses
• Find formulas from masses
• Calculate isotopic patterns
Download MWTWIN
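The first of these tasks, calculating the monoisotopic mass of a formula, is simple enough to script. A minimal sketch, not the MWTWIN algorithm: the element table is limited to C, H, N, O, P and S with masses rounded to six decimals, and the parser only handles plain formulas without brackets or charges:

```python
import re

# Monoisotopic masses (u) of the most abundant isotope, rounded to 6 decimals.
MONOISOTOPIC = {
    "C": 12.000000,
    "H": 1.007825,
    "N": 14.003074,
    "O": 15.994915,
    "P": 30.973762,
    "S": 31.972071,
}


def monoisotopic_mass(formula: str) -> float:
    """Sum the monoisotopic masses for a plain formula such as 'C11H8O3'."""
    mass = 0.0
    # Each match is (element symbol, optional count); a missing count means 1.
    for element, count in re.findall(r"([A-Z][a-z]?)([0-9]*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


mass = monoisotopic_mass("C11H8O3")
print(f"C11H8O3: {mass:.4f}")
```

An element missing from the table raises a `KeyError`, which is a deliberate reminder to extend the table rather than get a silently wrong mass.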
27
Structure search – know what could be possible
How many compounds (isomer structures) are found in public databases?
Result: 272 for C11H8O3
http://www.chemspider.com/
28
Stay tuned – new mass spectrometry publications
via Yahoo Pipes
29
Be open minded – NMR can do some things better
ChenomX Profiler – with 312 pH and frequency tuned reference spectra
2D-NMR is needed for de-novo structure elucidation
NMR metabolic profiling is highly reproducible with low variance
30
NMR prediction with ChemAxon MSketch
31
The Last Page - What is important to remember:
Learn about CPU type, memory, hard disks, bits and bytes; shock your colleagues with random questions about their computer
Think about automation and things you would like to do (even if you can’t); shock your colleagues with a small computer script
Use regular expressions for stupid or boring jobs; if you delete/replace data more than 3x, remember RegEx, RegEx, RegEx
Use scripting languages for small problems (EXCEL VBA, PERL); steal some small examples and color your EXCEL data in rainbow colors
Generate yourself a collection of programs and databases for MS; try such programs in a Virtual Machine without messing up your system