hdmr methodology
DESCRIPTION
GLOBAL SENSITIVITY ANALYSIS BY RANDOM SAMPLING - HIGH DIMENSIONAL MODEL REPRESENTATION (RS-HDMR). Herschel Rabitz, Department of Chemistry, Princeton University, Princeton, New Jersey 08544. HDMR Methodology: HDMR expresses a system output as a hierarchical correlated function expansion of inputs.
![Page 1: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/1.jpg)
GLOBAL SENSITIVITY ANALYSIS BY RANDOM SAMPLING - HIGH DIMENSIONAL MODEL REPRESENTATION (RS-HDMR)
Herschel Rabitz
Department of Chemistry, Princeton University, Princeton, New Jersey 08544
![Page 2: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/2.jpg)
HDMR Methodology
• HDMR expresses a system output as a hierarchical correlated function expansion of inputs:
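For reference, the expansion is usually written in the form below. This is the standard HDMR form from the literature, written here for generic inputs x = (x_1, ..., x_n); the symbols are assumptions, chosen to agree with the expansion in the rate constants k that appears later in the deck.

```latex
f(\mathbf{x}) = f_0 + \sum_{i=1}^{n} f_i(x_i)
  + \sum_{1 \le i < j \le n} f_{ij}(x_i, x_j) + \cdots
  + f_{12 \ldots n}(x_1, x_2, \ldots, x_n)
```

Here f_0 is a constant, f_i(x_i) describes the independent contribution of input x_i, and f_ij(x_i, x_j) describes the cooperative contribution of the pair (x_i, x_j).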
![Page 3: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/3.jpg)
HDMR Methodology (Contd.)
• HDMR component functions are optimally defined as:
- where the weight functions in these definitions are the unconditional and conditional probability density functions of the inputs:
![Page 4: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/4.jpg)
RS (Random Sampling) – HDMR (Contd.)
• RS-HDMR component functions are approximated by expansions of orthonormal polynomials
- Inputs can be sampled independently and/or in a correlated fashion
- Only one set of data is needed to determine all of the component functions
- Statistical analysis (F-test) is used to determine the proper truncation of the RS-HDMR expansion
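A minimal sketch of the idea, not the author's implementation: it assumes independent inputs uniform on [0, 1], uses orthonormal shifted Legendre polynomials as the basis, and estimates the first-order expansion coefficients by Monte Carlo averages over one random sample. The toy model, the function names (`phi`, `fit_first_order`, `predict_first_order`), and the truncation at degree 3 are all illustrative assumptions; the F-test truncation is not shown.

```python
import numpy as np

def phi(x, r):
    """Orthonormal shifted Legendre polynomial of degree r (1..3) on [0, 1]."""
    if r == 1:
        return np.sqrt(3.0) * (2.0 * x - 1.0)
    if r == 2:
        return np.sqrt(5.0) * (6.0 * x**2 - 6.0 * x + 1.0)
    if r == 3:
        return np.sqrt(7.0) * (20.0 * x**3 - 30.0 * x**2 + 12.0 * x - 1.0)
    raise ValueError("degree must be 1, 2, or 3")

def fit_first_order(x, y, max_degree=3):
    """Estimate f0 and alpha[i, r-1] ~ E[(f - f0) * phi_r(x_i)] from one sample."""
    f0 = y.mean()
    n_inputs = x.shape[1]
    alpha = np.empty((n_inputs, max_degree))
    for i in range(n_inputs):
        for r in range(1, max_degree + 1):
            alpha[i, r - 1] = np.mean((y - f0) * phi(x[:, i], r))
    return f0, alpha

def predict_first_order(x, f0, alpha):
    """Evaluate f0 + sum_i sum_r alpha[i, r] * phi_r(x_i)."""
    y_hat = np.full(x.shape[0], f0)
    for i in range(alpha.shape[0]):
        for r in range(1, alpha.shape[1] + 1):
            y_hat += alpha[i, r - 1] * phi(x[:, i], r)
    return y_hat

# Hypothetical toy model: additive in x1 and x2, with x3 inactive.
rng = np.random.default_rng(0)
x = rng.random((10_000, 3))
y = 1.0 + 2.0 * x[:, 0] + x[:, 1] ** 2

f0, alpha = fit_first_order(x, y)
rmse = float(np.sqrt(np.mean((predict_first_order(x, f0, alpha) - y) ** 2)))
```

The same 10,000 samples determine every coefficient, which is the practical meaning of "only one set of data is needed". Coefficients for the inactive input x3 come out close to zero, up to sampling error.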
![Page 5: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/5.jpg)
Global Sensitivity Analysis by RS-HDMR
• Individual RS-HDMR component functions have a direct statistical correlation interpretation, which permits the model output variance to be decomposed into its input contributions
- where the variance contributions σ_i², σ_ij², ... are defined as the covariances of the component functions f_i(x_i), f_ij(x_i, x_j), ... with f(x), respectively
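The covariance interpretation can be illustrated with a short sketch. It assumes a hypothetical additive toy model with independent uniform inputs, for which the component functions are known analytically; `covariance_indices` is an illustrative helper, not part of any RS-HDMR package.

```python
import numpy as np

def covariance_indices(components, y):
    """Return S_p = Cov(f_p, y) / Var(y) for each sampled component function."""
    var_y = np.var(y)
    y_centered = y - y.mean()
    return {
        name: float(np.mean((f_p - f_p.mean()) * y_centered) / var_y)
        for name, f_p in components.items()
    }

# Hypothetical toy model: y = 2*x1 + x2**2 with x1, x2 uniform on [0, 1].
rng = np.random.default_rng(1)
x = rng.random((20_000, 2))
y = 2.0 * x[:, 0] + x[:, 1] ** 2

# Analytic component functions of this toy model (f0 = 1 + 1/3).
components = {
    "x1": 2.0 * x[:, 0] - 1.0,
    "x2": x[:, 1] ** 2 - 1.0 / 3.0,
}

indices = covariance_indices(components, y)
total = sum(indices.values())
```

Because the components sum to y minus a constant, the covariances add up to the total variance and the indices sum to one. For this toy model the analytic values are S_x1 = (1/3)/(19/45) and S_x2 = (4/45)/(19/45).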
![Page 6: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/6.jpg)
A Propellant Ignition Model
Calculated profiles of temperature and major mole fractions for the ignition and combustion of the M10 solid propellant
![Page 7: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/7.jpg)
A Propellant Ignition Model
• 10 independent and 44 cooperative contributions of inputs were identified as significant
![Page 8: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/8.jpg)
A Propellant Ignition Model
• Nonlinear global sensitivity indexes efficiently identified all significant contributions of inputs
![Page 9: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/9.jpg)
Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
Microenvironmental/exposure/dose modeling system
Structure of TCE-PBPK model (adapted from Fisher et al., 1998)
![Page 10: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/10.jpg)
Example: Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
• The coupled microenvironmental/pharmacokinetic model:
- Three exposure routes (inhalation, ingestion, and dermal absorption)
- Release of TCE from water into the air within the residence
- Activities of individuals and physiological uptake processes
• Seven input variables [age (x1), tap water concentration (x2), shower stall volume (x3), drinking water consumption rate (x4), shower flow rate (x5), shower time (x6), time in bathroom after shower (x7)] are used to construct the RS-HDMR orthonormal polynomials
• Target outputs: the total internal doses from intake (inhalation and ingestion) and uptake (dermal absorption)
- The amount inhaled or ingested:
- The amount absorbed:
- C(t): exposure concentration, IR(t): inhalation or ingestion rate, Kp: permeability coefficient, SA(t): surface area exposed
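The two dose expressions did not survive in this transcript. The sketch below assumes the common forms suggested by the symbol definitions: intake as the time integral of C(t)·IR(t), and uptake as the time integral of Kp·C(t)·SA(t). Both forms, the function names, and all numbers and units are assumptions for illustration only.

```python
import numpy as np

def trapezoid(values, t):
    """Trapezoidal rule for samples `values` taken at times `t`."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(t)))

def intake_dose(c, ir, t):
    """Assumed form: amount inhaled or ingested = integral of C(t) * IR(t) dt."""
    return trapezoid(c * ir, t)

def uptake_dose(c, sa, kp, t):
    """Assumed form: amount absorbed = integral of Kp * C(t) * SA(t) dt."""
    return trapezoid(kp * c * sa, t)

# Hypothetical constant profiles over a 10-unit exposure window.
t = np.linspace(0.0, 10.0, 101)
c = np.full_like(t, 2.0)      # exposure concentration
ir = np.full_like(t, 0.5)     # inhalation or ingestion rate
sa = np.full_like(t, 100.0)   # exposed surface area
kp = 0.01                     # permeability coefficient

inhaled = intake_dose(c, ir, t)        # 2.0 * 0.5 * 10
absorbed = uptake_dose(c, sa, kp, t)   # 0.01 * 2.0 * 100 * 10
```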
![Page 11: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/11.jpg)
Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
• Inputs (x1, x2, x3, x4) have a uniform distribution, and inputs (x5, x6, x7) have a triangular distribution; 10,000 input-output data points were generated
The data distributions for the uniformly distributed variable x1 and the triangularly distributed variable x5
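A sketch of how such an input sample could be drawn with NumPy. The slides do not give the ranges or modes, so the inputs are normalized to [0, 1] here and the triangular mode of 0.5 is a placeholder assumption.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 10_000

# x1..x4: uniform on the (assumed, normalized) range [0, 1].
uniform_part = rng.uniform(0.0, 1.0, size=(n_samples, 4))

# x5..x7: triangular on [0, 1] with a hypothetical mode of 0.5.
triangular_part = rng.triangular(0.0, 0.5, 1.0, size=(n_samples, 3))

# One row per sample, one column per input x1..x7.
inputs = np.hstack([uniform_part, triangular_part])
```

Each row of `inputs` would then be passed through the coupled model to produce the corresponding output.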
![Page 12: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/12.jpg)
Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
• Seven independent, fifteen 2nd order and one 3rd order cooperative contributions of inputs were identified as significant
First order sensitivity indexes
![Page 13: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/13.jpg)
Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
• Nonlinear global sensitivity indexes (2nd order and above) efficiently identified all significant contributions of inputs
The ten largest 2nd and 3rd order sensitivity indexes
![Page 14: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/14.jpg)
Identification of bionetwork model parameters
• Characteristics of the problem:
- System nonlinearity
- Limited number & type of experiments
- Considerable biological and measurement noise
Multiple solutions exist!
• Problems with traditional identification methods:
- Provide only one or a few solutions for each parameter
- Assume linear propagation from data noise to parameter uncertainties
• The closed-loop identification protocol (CLIP):
- Extract the full parameter distribution by global identification
- Iteratively look for the most informative experiments for minimizing parameter uncertainty
![Page 15: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/15.jpg)
General operation of CLIP
[Figure: closed-loop diagram linking three modules around the proposed mechanism. Labels recovered from the slide:]
- ANALYSIS MODULE: pre-lab analysis and design of the most informative experiments
- CONTROL MODULE: iterative experiment optimization and data acquisition; controlled system perturbations uc(t) and property measurements Xr(t), under laboratory constraints R(u,X)
- INVERSION MODULE: global parameter identification; previous knowledge and trial solutions (k0) in, best solution distribution (k*) out
- Learning algorithm guiding the experiments: Qinv → Jctrl → uc(t)
- Species labels: xc, xr
![Page 16: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/16.jpg)
Isoleucyl-tRNA synthetase proofreading valyl-tRNAIle
Okamoto and Savageau, Biochemistry, 23:1701-1709 (1984)
[Figure: reaction scheme; asterisks (*) mark the rate constants to be identified]
![Page 17: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/17.jpg)
The inversion module: identifying the rate constant distribution
The Genetic Algorithm (GA)
- Mutation: 1101 1111 + 1100 0010 → 1101 1101 + 1100 0110
- Crossover: 1101 1100 + 1111 0010 → 1101 0010 + 1111 1100
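The two GA operators can be sketched on bit strings as follows. The helper names and the choice of mutation positions and crossover point are illustrative, picked so that the calls reproduce the bit patterns shown on the slide; a full GA would choose them at random and add selection.

```python
def mutate(bits, position):
    """Flip the bit at `position` (0-indexed from the left)."""
    flipped = "1" if bits[position] == "0" else "0"
    return bits[:position] + flipped + bits[position + 1:]

def crossover(parent_a, parent_b, point):
    """Single-point crossover: exchange everything after index `point`."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

# Mutation example from the slide: one bit flips in each string.
mutant_1 = mutate("11011111", 6)
mutant_2 = mutate("11000010", 5)

# Crossover example from the slide: the last four bits are exchanged.
child_1, child_2 = crossover("11011100", "11110010", 4)
```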
The inversion cost function
$$J^{i,p}_{inv} = \frac{1}{N T} \sum_{n=1}^{N} \sum_{t=t_1}^{t_T} \frac{\lVert X^{lab}_{n,t,i} - X^{cal}_{n,t,i,p} \rVert}{\varepsilon_{n,i}}$$
[Figure: histogram "distribution of k2 (random control)"; x-axis: k2 (s^-1, in log scale); y-axis: no. of solutions. Caption: typical rate constant distribution after random perturbation/control. Q: inversion quality index.]
![Page 18: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/18.jpg)
The analysis module: estimating the most informative experiments
- Estimate the best species for monitoring system behavior
- Determine the best species for perturbing the system
Nonlinear sensitivity analysis by Random-Sampling High Dimensional Model Representation (RS-HDMR)
$$\sigma^2_{total} = \sum_{i=1}^{n} \sigma_i^2 + \sum_{1 \le i < j \le n} \sigma_{ij}^2 + \cdots$$
![Page 19: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/19.jpg)
Optimally controlled identification: squeezing on the rate constant distribution
The control cost function
$$J^{i}_{ctrl} = Q^{i}_{inv} - \omega R[u^{i}_{c}(t), X^{i}_{r}(t)]$$
Inversion quality
$$Q^{i}_{inv} = 1 \Big/ \left[ \frac{1}{M} \sum_{m=1}^{M} \frac{k^{i}_{m,max} - k^{i}_{m,min}}{k^{i}_{m,max} + k^{i}_{m,min}} \right]$$
Feng and Rabitz, Biophys. J., 86:1270-1281 (2004)
Feng, Rabitz, Turinici, and LeBris, J. Phys. Chem. A, 110:7755-7762 (2006)
![Page 20: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/20.jpg)
Network property optimization:
A. Identifying the best targeted network locations for intervention
B. Identifying the optimal network control
[Figure: closed-loop control diagram shown for both cases. Labels: Initial Guess/Random Control, Control Design, Biological System, Observed Response, Control Objective, Learning Algorithm, Optimal Controls, Optimal Network Performance]
![Page 21: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/21.jpg)
A. Molecular target identification for network engineering
[Figure: genetic circuit. Inducer: IPTG; genes: lacI, cI, eyfp; proteins: LacI, CI, EYFP; promoters: p(lacIq), P(lac), P(R-O12); "?" marks the targets to be identified]
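Random-sampling high dimensional model representation (RS-HDMR)
$$y = f(\mathbf{k}) = f_0 + \sum_{i=1}^{N} f_i(k_i) + \sum_{1 \le i < j \le N} f_{ij}(k_i, k_j) + \cdots$$
Randomly sample k
$$\sigma^2_{total} = \sum_{i=1}^{N} \sigma_i^2 + \sum_{1 \le i < j \le N} \sigma_{ij}^2 + \cdots$$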
Advantages of RS-HDMR:
- Global sensitivity analysis
- Nonlinear component functions
- Physically meaningful representation
- Favorable scalability
Li, Rosenthal, and Rabitz, J. Phys. Chem. A, 105:7765-7777 (2001)
![Page 22: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/22.jpg)
[Figure: the genetic circuit (IPTG; genes lacI, cI, eyfp; proteins LacI, CI, EYFP; promoters p(lacIq), P(lac), P(R-O12)) annotated with rate constants k6 and k10 to k13]
[Bar chart 1: x-axis categories p110, pR1, pR2, pR3; series IPTG = 1 mM and IPTG = 0; k10 to k13 fixed]
[Bar chart 2: x-axis categories p107, pM4, pM5, pM6; series IPTG = 1 mM and IPTG = 0; k6 fixed]
Feng, Hooshangi, Chen, Li, Weiss, and Rabitz, Biophys. J., 87:2195-2202 (2004)
Laboratory data on the mutants
![Page 23: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/23.jpg)
Example: Biochemical multi-component formulation mapping
• Allosteric regulation of aspartate transcarbamoylase (ATCase) in vitro by all four ribonucleotide triphosphates (NTPs)
• ATCase activity (output) was measured in the laboratory for 300 random combinations of NTP concentrations (inputs)
• A second order RS-HDMR was constructed as an input → output map. Its accuracy is comparable with the laboratory error
[Figure: the absolute error of repeated measurements]
![Page 24: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/24.jpg)
Biochemical multi-component formulation mapping
The comparison of the laboratory data and the 2nd order RS-HDMR approximation for “used” and “test” data
Note: The two parallel lines are absolute error ±0.2
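A sketch of a "used" versus "test" comparison for a second order map of four inputs. It swaps in ordinary least squares on an orthonormal polynomial basis for the coefficient estimation, and uses a hypothetical synthetic response with noise in place of the ATCase measurements; the 200/100 split and all names are assumptions.

```python
import itertools
import numpy as np

def phi1(x):
    """Degree-1 orthonormal shifted Legendre polynomial on [0, 1]."""
    return np.sqrt(3.0) * (2.0 * x - 1.0)

def phi2(x):
    """Degree-2 orthonormal shifted Legendre polynomial on [0, 1]."""
    return np.sqrt(5.0) * (6.0 * x**2 - 6.0 * x + 1.0)

def design_matrix(x):
    """Columns: constant, first-order terms, pairwise second-order products."""
    n_inputs = x.shape[1]
    columns = [np.ones(x.shape[0])]
    for i in range(n_inputs):
        columns += [phi1(x[:, i]), phi2(x[:, i])]
    for i, j in itertools.combinations(range(n_inputs), 2):
        columns.append(phi1(x[:, i]) * phi1(x[:, j]))
    return np.column_stack(columns)

# Hypothetical stand-in for 300 measured activities at random concentrations.
rng = np.random.default_rng(3)
x = rng.random((300, 4))
y = 1.0 + x[:, 0] - 0.5 * x[:, 1] ** 2 + 0.8 * x[:, 2] * x[:, 3]
y = y + rng.normal(0.0, 0.05, size=300)   # synthetic measurement noise

used, test = slice(0, 200), slice(200, 300)
coeffs, *_ = np.linalg.lstsq(design_matrix(x[used]), y[used], rcond=None)

# Fraction of held-out points inside the +/-0.2 absolute error band.
abs_err_test = np.abs(design_matrix(x[test]) @ coeffs - y[test])
fraction_within_band = float(np.mean(abs_err_test <= 0.2))
```

Checking the error on points that were not used in the fit is what separates a genuine input → output map from one that merely reproduces its training data.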
![Page 25: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/25.jpg)
The s-space network identification procedure (SNIP)
aTc: x1 IPTG: x2 EYFP: y(x1,x2)
Encode: x1→x1m1(s) x2→x2m2(s)
Response measurement: y→y(s)
Decode: Fourier transform
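The encode/decode idea can be illustrated with a toy example. The slides do not specify the modulation functions m1(s), m2(s) or the decoding details, so the sketch assumes cosine modulations at two distinct frequencies and a hypothetical bilinear response; it is not the SNIP implementation.

```python
import numpy as np

n_points = 256
s = np.arange(n_points)
f1, f2 = 5, 13   # hypothetical modulation frequencies (cycles per record)

# Encode: each input is multiplied by its own modulation function.
m1 = np.cos(2.0 * np.pi * f1 * s / n_points)
m2 = np.cos(2.0 * np.pi * f2 * s / n_points)

def response(u1, u2):
    """Toy stand-in for the measured output, with a cooperative u1*u2 term."""
    return 1.0 * u1 + 0.5 * u2 + 0.8 * u1 * u2

x1, x2 = 1.0, 1.0
y_s = response(x1 * m1, x2 * m2)

# Decode: the Fourier transform separates the contributions by frequency.
amplitude = 2.0 * np.abs(np.fft.rfft(y_s)) / n_points
independent_x1 = float(amplitude[f1])          # linear term in x1
independent_x2 = float(amplitude[f2])          # linear term in x2
cooperative_sum = float(amplitude[f1 + f2])    # from the x1*x2 term
cooperative_diff = float(amplitude[f2 - f1])   # from the x1*x2 term
```

The linear terms appear at f1 and f2, while the product term shows up at the sum and difference frequencies, each with half the product coefficient. A peak at a combination frequency is therefore a signature of cooperative behavior between the two inputs.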
[Figure: transcriptional cascade. Inducers: aTc, IPTG; genes: tetR, lacI, eyfp; proteins: TetR, LacI, EYFP; promoters: p(lacIq), pL(tet), P(lac)]
Laboratory data on the transcriptional cascade
![Page 26: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/26.jpg)
Nonlinear property prediction by SNIP
- Unmeasured region correctly predicted
- Nonlinear, cooperative behavior revealed
Feng, Nichols, Mitra, Hooshangi, Weiss, and Rabitz, In preparation
![Page 27: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/27.jpg)
SNIP application to an intracellular signaling network
Sachs, et al., Science, 308:523-529 (2005)
Laboratory single cell measurement data
![Page 28: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/28.jpg)
[Figure: three network diagrams over the nodes PKC, PKA, P38, Jnk, Plc, PIP3, PIP2, Raf, Mek, Erk, and Akt; "?" marks an unknown connection]
Network connections identified by SNIP and Bayesian analysis
Reliable SNIP prediction of Akt levels
Identified network with predictive capability
![Page 29: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/29.jpg)
Example: Ionospheric measured data
• The ionospheric critical frequencies determined from ground-based ionosonde measurements at Huancayo, Peru from years 1957 - 1987 (8694 points)
• Input: year, day, solar flux (f10.7), magnetic activity index (kp), geomagnetic field index (dst), previous day's value of foE
• Output: ionospheric critical frequencies foE
• The inputs are not controllable and not independent; the pdf of the inputs is not separable and was not explicitly known
![Page 30: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/30.jpg)
Ionospheric measured data
The dependence of foE on the input “day”
Ionosonde data distribution: the dependences between normalized input variables (year and f10.7, kp and dst) for the data at 12 UT
![Page 31: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/31.jpg)
Ionospheric measured data
The accuracy of the 2nd order RS-HDMR expansion for the output, foE
![Page 32: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/32.jpg)
Quantitative molecular property prediction
[Figure labels: X1, X2]
Standard QSAR
• General strategy: molecular activity is a function of its chemical/physical/structural descriptors
• Problems:
- Overfitting (choice of descriptors)
- Underlying physics
• A simple solution: y = f(x1, x2), x1 = 1, 2, …, N1, x2 = 1, 2, …, N2
Descriptor-free quantitative molecular property interpolation
![Page 33: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/33.jpg)
Descriptor-free property prediction from an arbitrary substituent order
$$y = \sum_{j} c_j \phi_j$$
![Page 34: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/34.jpg)
Property prediction from the optimal substituent order
Shenvi, Geremia, and Rabitz, J. Phys. Chem. A, 107:2066 (2003)
Complexity of the search: N1!·N2! = 14!·8! ≈ 10^15
Cost function: $J = \lVert \nabla y \rVert$
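A sketch of reordering as a smoothness search. It assumes a discrete finite-difference stand-in for the gradient-based cost and uses a simple accept-if-better pairwise swap search; the search algorithm in the cited work is not given in these slides, and the property table here is synthetic.

```python
import math
import numpy as np

def roughness(table):
    """Discrete stand-in for the gradient cost: norm of neighbor differences."""
    d_rows = np.diff(table, axis=0)
    d_cols = np.diff(table, axis=1)
    return float(np.sqrt(np.sum(d_rows**2) + np.sum(d_cols**2)))

def reorder(table, n_steps=2000, seed=0):
    """Swap two substituents at a time; keep a swap only if the cost drops."""
    rng = np.random.default_rng(seed)
    rows = np.arange(table.shape[0])
    cols = np.arange(table.shape[1])
    best = roughness(table)
    for _ in range(n_steps):
        perm = rows if rng.random() < 0.5 else cols
        a, b = rng.choice(len(perm), size=2, replace=False)
        perm[[a, b]] = perm[[b, a]]
        cost = roughness(table[np.ix_(rows, cols)])
        if cost < best:
            best = cost
        else:
            perm[[a, b]] = perm[[b, a]]   # undo the rejected swap
    return rows, cols, best

# Size of the full search space for N1 = 14 and N2 = 8 (about 3.5e15).
search_space = math.factorial(14) * math.factorial(8)

# Hypothetical property table: smooth in a hidden order, then shuffled.
smooth = np.add.outer(np.arange(14.0), np.arange(8.0))
shuffle_rng = np.random.default_rng(1)
shuffled = smooth[np.ix_(shuffle_rng.permutation(14), shuffle_rng.permutation(8))]

initial_cost = roughness(shuffled)
rows, cols, best = reorder(shuffled)
```

Exhaustive enumeration of the orderings is out of reach at this size, which is why a guided search is needed. A greedy swap search like this one can stall in a local minimum, so it should be read only as an illustration of the cost being minimized.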
![Page 35: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/35.jpg)
Application to a chromophore transition metal complex library
[Figure panels: before reordering; after reordering]
Outliers captured by the reordering algorithm
Liang, Feng, Lowry, and Rabitz J. Phys. Chem. B, 109:5842-5854 (2005)
Cost function: $J = \sum_{k} (y_k^{cal} - y_k^{lab})^2$
![Page 36: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/36.jpg)
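Application to a drug compound library
Cost function:
$$J = \sum_{1 \le i < i' \le N_1} \sum_{1 \le j \le N_2} [c_{ii'} (y_{ij} - y_{i'j})]^2 + \sum_{1 \le j < j' \le N_2} \sum_{1 \le i \le N_1} [c_{jj'} (y_{ij} - y_{ij'})]^2$$
[Figure: property map of the compound library, with axis ticks from 10 to 90 and from 20 to 140. Workflow labels: 15% of data, Reorder, Prediction, >14,000 compounds]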
![Page 37: HDMR Methodology](https://reader036.vdocuments.site/reader036/viewer/2022081603/56814686550346895db3aa32/html5/thumbnails/37.jpg)
THE MODERN WAY TO DO SCIENCE*
* Adaptively under high duty cycle and automated
“You should understand the physics, write down the correct equations, and let nature do the calculations.”
Peter Debye