Towards Better Pipeline Data Governance
Outline
• The limitations of data
  – “There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know.” – Donald Rumsfeld
• Lessons from manufacturing process management
  – “If you can't describe what you are doing as a process, you don't know what you're doing.” – W. Edwards Deming
• Pipelines and Black Swans
  – “It’s tough to make predictions, especially about the future.” – Yogi Berra
The $64 Question
• “Unfortunately in the San Bruno accident, we found that
the company’s underlying records were not accurate... My
question is that if your many efforts to improve safety are
predicated on identifying risk, and if your baseline
understanding of your infrastructure is not accurate, how
confident are you that your risks are being assessed
appropriately?”
– Deborah Hersman, NTSB Chairman, at the National Pipeline Safety Forum, April 18, 2011
The $Billion Answer
• Is this a pipe?
• Is this a pipeline?
• “The map is not the territory.”
– Alfred Korzybski, 1931
The Process of Data Abstraction
• Your pipeline database isn’t the real pipeline
  – The pipeline database is a representation of the pipeline
Common Pitfalls in Digital Pipeline Data Abstraction
• Source documents that summarize information
  – Alignment sheets summarize pipe data
    • Individual joints of pipe are typically not represented
• Insufficient detail in source documents
  – Records for older pipelines may simply not contain information we now require
• Source documents that do not accurately reflect change over time
  – Missing repair records
  – Missing assessment records
• Insufficient documentation of data provenance
  – Lack of metadata regarding the source of the data
The Goal
• “As PHMSA and NTSB recommended, operators relying on the review of design, construction, inspection, testing and other related data to calculate MAOP or MOP must assure that the records used are reliable. An operator must diligently search, review and scrutinize documents and records, including but not limited to, all as-built drawings, alignment sheets, and specifications, and all design, construction, inspection, testing, maintenance, manufacturer, and other related records. These records shall be traceable, verifiable, and complete.” – PHMSA Advisory Bulletin ADB-11-01
Information Manufacture
• The process of converting raw data to refined information is fundamentally a manufacturing process
  – Too often, we approach information creation like skilled artisans
    • Information is crafted, not manufactured
    • Process uniformity is lacking
    • Data validation, verification, and cleanup are performed as a custom, “one off” event
    • Reproducibility is dependent on the skill of the practitioner (i.e. the Subject Matter Expert)
  – While results may be acceptable, it’s a grossly inefficient way to run a business
Tools for Success Borrowed from Manufacturing Process Management
• Six Sigma
  – Process improvement through defect reduction and process uniformity
• Lean Manufacturing
  – Process improvement through elimination of waste
• Theory of Constraints (TOC)
  – Process improvement through maximization of throughput
• All concentrate on DEFECT PREVENTION
Lessons from Six Sigma
• 6σ - if there are six standard deviations between the process mean and the nearest specification limit, the process yield is 99.99966%
  – 3.4 defects per million opportunities
• Define and document your processes
• Establish process metrics
  – Data cycle time
  – Data defect incidence
• Analyze results; improve the process
• Institute process controls to prevent defects
  – Fail safe data checks to prevent bad data from entering the system
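The 99.99966% figure can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the usual Six Sigma convention that the process mean may drift by 1.5σ over the long term, so a "six sigma" process is scored on a one-sided 4.5σ tail. The function names are illustrative, not taken from any Six Sigma toolkit.

```python
import math


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def dpmo(sigma_level: float, shift: float = 1.5) -> float:
    """Defects per million opportunities (one-sided) after the assumed mean shift."""
    return upper_tail(sigma_level - shift) * 1_000_000


def process_yield(sigma_level: float, shift: float = 1.5) -> float:
    """Fraction of defect-free opportunities under the same assumption."""
    return 1.0 - upper_tail(sigma_level - shift)


for level in (3, 4, 5, 6):
    print(f"{level} sigma: {dpmo(level):10.1f} DPMO, yield {process_yield(level):.5%}")
```

With the default shift, the six sigma row comes out at about 3.4 DPMO. Setting `shift=0` gives a far smaller defect rate, which is why the convention matters when comparing published sigma levels.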
Lessons from Lean Manufacturing
• Identify and relentlessly eliminate wastes
  – Long data cycle times
  – Bad data
• Incorporate “autonomation” (smart automation) in your fail safe checks
  – Computers are lousy at correcting problems, but great at identifying them
  – Utilize the power of GIS
    • Incorporate spatial context into your automated fail safes
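A fail safe of this kind might look like the sketch below. It only identifies problems and never corrects them, and it uses a simple bounding box as a stand-in for real spatial context. The field names, the `LineExtent` reference data, and the sample values are all hypothetical; a production check would use actual GIS geometry rather than a box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineExtent:
    """Hypothetical reference data for one pipeline: stationing range and bounding box."""
    min_station: float
    max_station: float
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


# Hypothetical required fields for an incoming record.
REQUIRED = ("line_id", "station", "lon", "lat", "wall_thickness_in", "source_document")


def check_record(record: dict, extents: dict) -> list:
    """Return a list of problems; an empty list means the record may be loaded."""
    issues = [f"missing {name}" for name in REQUIRED if record.get(name) in (None, "")]
    if issues:
        return issues
    extent = extents.get(record["line_id"])
    if extent is None:
        return [f"unknown line_id {record['line_id']!r}"]
    if not extent.min_station <= record["station"] <= extent.max_station:
        issues.append("station outside the line's stationing range")
    inside_box = (extent.min_lon <= record["lon"] <= extent.max_lon
                  and extent.min_lat <= record["lat"] <= extent.max_lat)
    if not inside_box:
        issues.append("coordinates outside the line's bounding box")
    if record["wall_thickness_in"] <= 0:
        issues.append("wall thickness must be positive")
    return issues


extents = {"L-100": LineExtent(0.0, 52800.0, -122.5, 37.5, -122.3, 37.7)}
good = {"line_id": "L-100", "station": 1200.0, "lon": -122.41, "lat": 37.62,
        "wall_thickness_in": 0.375, "source_document": "as-built sheet 12"}
misplaced = dict(good, lon=-112.41)       # a transposed digit in the longitude
unsourced = dict(good, source_document="")

for candidate in (good, misplaced, unsourced):
    print(check_record(candidate, extents) or "accepted")
```

The check rejects and reports; deciding what the correct value should be is left to a person with access to the source documents.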
Lessons from Theory of Constraints
• Identify process constraints, address them in priority order
  – Complicated processes are like rate-limited chemical reactions
    • The overall reaction rate is constrained by the slowest reaction step
    • Speed up the slowest reaction step, and the overall reaction rate increases
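The rate-limiting idea can be made concrete with a small sketch. It assumes a strictly serial data process in which each step has a fixed capacity; the step names and the rates (records per day) are invented for illustration.

```python
def bottleneck(step_rates: dict) -> tuple:
    """Return the slowest step and its rate for a serial process."""
    name = min(step_rates, key=step_rates.get)
    return name, step_rates[name]


def throughput(step_rates: dict) -> float:
    """A serial process can run no faster than its slowest step."""
    return min(step_rates.values())


# Hypothetical steps and capacities, in records per day.
baseline = {"scan": 400.0, "transcribe": 120.0, "validate": 250.0, "load": 900.0}
faster_load = dict(baseline, load=1800.0)              # improve a non-constraint
faster_transcribe = dict(baseline, transcribe=300.0)   # improve the constraint

for label, rates in (("baseline", baseline),
                     ("faster load", faster_load),
                     ("faster transcribe", faster_transcribe)):
    step, rate = bottleneck(rates)
    print(f"{label:18s} throughput {throughput(rates):6.1f}/day, constrained by {step} ({rate:.0f}/day)")
```

Doubling the capacity of a step that is not the constraint leaves throughput unchanged. Relieving the constraint raises throughput and moves the constraint to the next slowest step, which is why constraints are addressed in priority order.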
Document Your Data!
• The most accurate information is worthless if you don’t know where it comes from
  – Make data provenance a priority
  – Treat data like courtroom evidence
    • Document the chain of custody
• Popular pipeline data models like PODS and the APDM facilitate only record-level history tracking
  – This is necessary, but insufficient
  – Data edits should be tracked at the attribute level
  – The outcome of every decision branch in the data manufacturing process should be recorded
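One way to picture attribute-level tracking is the sketch below. It is an in-memory illustration only, not part of PODS or the APDM: the `AttributeEdit` and `TrackedRecord` classes, their fields, and the sample values are all hypothetical. Each edit records the attribute changed, the old and new values, the source document, the editor, and the decision that led to the change.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class AttributeEdit:
    """One link in the chain of custody for a single attribute (hypothetical schema)."""
    attribute: str
    old_value: Any
    new_value: Any
    source: str      # document or survey the new value came from
    editor: str
    decision: str    # outcome of the decision branch that led to the edit
    timestamp: datetime


@dataclass
class TrackedRecord:
    values: dict
    history: list = field(default_factory=list)

    def set_value(self, attribute: str, new_value: Any, *,
                  source: str, editor: str, decision: str) -> None:
        """Apply an edit and append it to the history; nothing is overwritten silently."""
        self.history.append(AttributeEdit(
            attribute, self.values.get(attribute), new_value,
            source, editor, decision, datetime.now(timezone.utc)))
        self.values[attribute] = new_value

    def custody(self, attribute: str) -> list:
        """All edits to one attribute, oldest first."""
        return [edit for edit in self.history if edit.attribute == attribute]


segment = TrackedRecord({"wall_thickness_in": None, "grade": "X42"})
segment.set_value("wall_thickness_in", 0.312, source="alignment sheet 7",
                  editor="jdoe", decision="accepted: only available source")
segment.set_value("wall_thickness_in", 0.375, source="mill test report",
                  editor="asmith", decision="superseded: MTR outranks alignment sheet")

for edit in segment.custody("wall_thickness_in"):
    print(edit.old_value, "->", edit.new_value, "|", edit.source, "|", edit.decision)
```

A record-level history would show only that the segment changed twice. The attribute-level history shows which value changed, what it replaced, and why.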
The Problem of Induction, Black Swans, and Thermodynamics
• The problem of induction (as explored by Scottish philosopher David Hume)
  – During much of the 17th century, an Englishman could seemingly state with confidence, “all swans we have seen are white; therefore all swans are white.”
  – Black swans were discovered in Australia in 1697
• A Black Swan is:
  – Any event, positive or negative, that is highly improbable, and results in nonlinear consequences
    • Black Swans do not conform to Gaussian distributions, but rather obey Pareto (power law) distributions
  – An outlier event; nothing in our past experience convincingly points to its possibility
• “It’s the Second Law of Thermodynamics: Sooner or later everything turns to $#!+.” – Woody Allen
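The difference between Gaussian and Pareto behavior shows up in how quickly the tail probability decays. The sketch below compares the two. The Pareto shape `alpha = 2.0` and scale `x_min = 1.0` are arbitrary illustrative choices, and the two columns are not in the same units (standard deviations versus multiples of the minimum), so only the rate of decay should be compared.

```python
import math


def gaussian_tail(k: float) -> float:
    """P(Z > k) for a standard normal variable."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


def pareto_tail(x: float, x_min: float = 1.0, alpha: float = 2.0) -> float:
    """P(X > x) for a Pareto distribution with scale x_min and shape alpha."""
    return 1.0 if x < x_min else (x_min / x) ** alpha


for k in (2, 5, 10):
    print(f"threshold {k:>2}: Gaussian {gaussian_tail(k):.2e}   Pareto {pareto_tail(k):.2e}")
```

The Gaussian tail collapses faster than exponentially, while the power-law tail shrinks only polynomially. A model that assumes the former will treat as practically impossible events that the latter produces routinely.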
Black Swans and Narrative Fallacy
• Human beings are incredibly adept at explaining things
  – This leads to an unwarranted confidence in our ability to predict outcomes resulting from complexly interacting phenomena
  – Explanation ≠ Prediction
• “Things always become obvious after the fact.” – Nassim Nicholas Taleb
• Question: How good are our risk models, really?
Black Swans and Diagnostic Testing
• Nuclear Cardiac Stress Testing is used to diagnose Coronary Artery Disease (CAD)
  – Sensitivity = 91%
    • Failure to detect disease = 9%
    • In other words, it’s about the same as playing Russian Roulette with a revolver that has ten cartridge chambers
  – Specificity = 72%
    • False positives = 28%
  – Utility as a predictor of Acute Coronary Syndrome (ACS)
    • “The current myocardial perfusion imaging toolset has limited sensitivity for screening patients who are at risk for ACS.”
• Question: Is hydrostatic testing a panacea for incomplete pipeline records?
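Sensitivity and specificity alone do not say how much to trust a given test result; that also depends on how common the condition is. The sketch below applies Bayes' rule to the 91% and 72% figures quoted above. The prevalence values are assumptions chosen for illustration, not figures from the source.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple:
    """Return (PPV, NPV): the chance a positive or negative result is correct."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity from the slide; prevalence values are assumed.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.91, 0.72, prevalence)
    print(f"prevalence {prevalence:4.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

At an assumed 10% prevalence, only about one positive result in four reflects real disease. The same reasoning applies to any pass/fail assessment used as a substitute for knowing the underlying condition.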
Mitigation vs. Common Sense
• State-of-the-art shark bite risk mitigation:
  – The Neptunic shark suit
    • Designed to mitigate the effects of unsolicited social interactions with hungry sharks of the “bitey” variety
    • Chainmail-style protection provides the diver with full body coverage
• Common sense:
  – Avoid risk
  – DON’T SWIM WITH SHARKS!!!
Conclusion
• Data can never represent the physical world with complete fidelity
  – We don’t really know much of what we think we know
• Information creation is a manufacturing process
  1. We don't know what we don't know.
  2. If we can't express what we do know numerically, we don't really know much about it.
  3. If we don't know much about it, we can't control it.
  4. If we can't control it, we are at the mercy of chance.
  – Dr. Mikel J. Harry
• Black Swans are unpredictable and unavoidable
  – The best you can accomplish is Black Swan robustness
  – Hubris is fatal