NASA Ames Data Sciences Group


Page 1: NASA Ames Data Sciences Group - Amazon Web Services · • NASA Engineering and Safety Center • Exploration Systems Mission Directorate, Exploration Technology Development Program

NASA Ames Data Sciences Group

www.nasa.gov

Nikunj C. Oza, Ph.D.
Leader, Data Sciences Group

[email protected]

Page 2

The Data Sciences Group at NASA Ames

Group Members
• Ilya Avrekh
• Kamalika Das, Ph.D.
• Dave Iverson
• Vijay Janakiraman, Ph.D.
• Rodney Martin, Ph.D.
• Bryan Matthews
• David Nielsen
• Nikunj Oza, Ph.D.
• Veronica Phillips
• John Stutz
• Hamed Valizadegan, Ph.D.
• + summer students

Team Members are NASA Employees, Contractors, and Students.

Funding Sources
• Science Mission Directorate: AIST and CMAC programs
• NASA Aeronautics Research Mission Directorate: ATD, SMART-NAS, SASO Project
• NASA Engineering and Safety Center
• Exploration Systems Mission Directorate, Exploration Technology Development Program
• Non-NASA: DARPA, DoD

Data Mining Research and Development (R&D) for application to NASA problems (Aeronautics, Earth Science, Space Exploration, Space Science)

Page 3

Example Data Mining Problems

• Aeronautics: anomaly detection, precursor identification, text mining (classification, topic identification)
• Earth Science: filling in missing measurements, anomaly detection, teleconnections, climate understanding
• Space Science: Kepler planet candidates
• Space Exploration: system health management, vascular structure identification

Page 4

Four V’s of Big Tough, Sleep Depriving Data

• Volume
  – Radar tracks: 47 facilities (1 year), ~423 GB (compressed), ~3.2 TB (CSV)
  – Weather and forecast (entire NAS): CIWS, ~2.8 TB
• Velocity
  – Radar tracks: 47 facilities, ~35 GB/month (compressed), ~268 GB/month (uncompressed)
  – Weather and forecast (entire NAS): CIWS, ~233 GB/month
• Veracity
  – Data dropouts
  – Duplicate tracks
  – Track ending in midair
  – Reused flight identifiers
• Variety
  – Numerical (continuous/binary)
  – Weather (forecast/actual)
  – Radar/airport metadata
  – ATC voice
  – ASRS text reports (pilot/controller)

[Diagram labels: Amazing Algorithm, Intuitive Reports]
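As a quick consistency check on the figures above, the monthly rates can be scaled to a year and compared with the stated volumes. This is only a sketch: the variable names are illustrative, and 1 TB is taken as 1000 GB, which the slide does not specify.

```python
# Monthly rates quoted on the slide, in GB per month.
monthly_gb = {
    "radar_compressed": 35,
    "radar_uncompressed": 268,
    "ciws_weather": 233,
}


def annual_gb(rate_gb_per_month, months=12):
    """Scale a monthly data rate to a yearly volume in GB."""
    return rate_gb_per_month * months


annual = {name: annual_gb(rate) for name, rate in monthly_gb.items()}

for name, gb in annual.items():
    # Assumption: 1 TB = 1000 GB (the slide does not say which convention it uses).
    print(f"{name}: {gb} GB/year (~{gb / 1000:.1f} TB)")
```

The yearly totals (420 GB, 3216 GB, 2796 GB) roughly agree with the ~423 GB, ~3.2 TB, and ~2.8 TB volume figures.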

Page 5

Aeronautics Data Mining Problems

• Anomaly Detection
  – Anomaly discovery over a large set of variables
  – Particular variable of interest, for example, fuel burn
    • Determine expected instantaneous fuel burn given current state of aircraft
    • Compare with actual instantaneous fuel burn
    • Where difference is high, problem may be occurring
• Precursor Identification
  – Given undesirable effect (e.g., go-around), identify precursors (e.g., overtake situation, high speed approach)
• Text mining
  – Text classification, topic identification
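The fuel burn idea above (predict the expected value from aircraft state, then compare with the actual value) can be sketched as a residual check. This is a minimal illustration, not the group's method: it assumes a single hypothetical state variable (airspeed), a straight-line model, made-up numbers, and an arbitrary threshold.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x for one state variable."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def flag_high_residuals(xs, ys, model, threshold):
    """Return indices where |actual - expected| exceeds the threshold."""
    a, b = model
    flagged = []
    for idx, (x, y) in enumerate(zip(xs, ys)):
        expected = a + b * x
        if abs(y - expected) > threshold:
            flagged.append(idx)
    return flagged


# Synthetic "nominal" data (illustrative only): airspeed vs. fuel burn.
train_speed = [200, 220, 240, 260, 280, 300]
train_fuel = [2000, 2200, 2400, 2600, 2800, 3000]
model = fit_line(train_speed, train_fuel)

# New samples; the second one burns far more fuel than the model expects.
test_speed = [210, 250, 290]
test_fuel = [2105, 2900, 2895]
flagged = flag_high_residuals(test_speed, test_fuel, model, threshold=100)
print(flagged)
```

In practice the aircraft state has many variables, so a multivariate or nonlinear regressor would replace `fit_line`; the compare-and-threshold step stays the same.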

Page 6

Topic Extraction Example

Topic 1: autoplt, acft, spd, capture, mode, rate, level, engaged, leveloff, vert, ctl, disconnected, selected, fpm, light, clb, pitch, manually, warning, pwr

Topic 2: time, day, leg, contributing, factors, hrs, crew, factor, fatigue, night, trip, rest, duty, flying, long, late, previous, incident, lack, alerter

Topic 3: apch, rwy, visual, ils, twr, lndg, loc, arpt, final, missed, clred, msl, intercept, vectored, sight, gar, terrain, field, uneventful, ctl

Other examples of ‘fatigue’
• Altitude Deviation
• Spatial Deviation
• Ramp Excursion
• Landing without clearance
• Runway Incursion
• Unstable Approach

Page 7

Aeronautics Anomaly Detection: Current Methods

Exceedance-Based Methods
• Known anomalies
• Conditions over 2-3 variables (e.g., speed > 250 knots, altitude = 1000 ft, landing)
• Cannot identify unknown anomalies
• Low false positive rate, high false negative (missed detection) rate.
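An exceedance check of the kind described above is a fixed rule over a few variables. The sketch below encodes the slide's example condition; the `FlightSample` type, its field names, and the tolerance parameter are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FlightSample:
    """One recorded point of a flight (hypothetical fields)."""
    speed_kts: float
    altitude_ft: float
    phase: str


def exceeds_landing_speed_rule(sample, max_speed_kts=250.0,
                               altitude_ft=1000.0, altitude_tol_ft=0.0):
    """True when speed > 250 knots at altitude = 1000 ft during landing.

    The slide writes the altitude condition as an equality; altitude_tol_ft
    is an added assumption so the rule can be loosened to a band.
    """
    return (
        sample.phase == "landing"
        and abs(sample.altitude_ft - altitude_ft) <= altitude_tol_ft
        and sample.speed_kts > max_speed_kts
    )


samples = [
    FlightSample(260.0, 1000.0, "landing"),  # violates the rule
    FlightSample(240.0, 1000.0, "landing"),  # within limits
    FlightSample(260.0, 1000.0, "cruise"),   # fast, but not landing
]
hits = [exceeds_landing_speed_rule(s) for s in samples]
print(hits)
```

A rule like this only fires on the condition it was written for, which is why such methods miss anomalies nobody anticipated.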

Page 8

Data-Driven Methods

• DISCOVER anomalies by
  – learning statistical properties of the data
  – finding which data points do not fit (e.g., far away, low probability)
  – no background knowledge on anomalies needed
• Complementary to existing methods
  – Low false negative (missed detection) rate
  – Higher false positive rate (identified points/flights unusual, but not always operationally significant)
• Data-driven methods -> insights -> modification of exceedance detection
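The "learn statistical properties, then find points that do not fit" idea can be sketched with a simple distance score. This is not the algorithm used by the group; it assumes per-feature means and standard deviations are an adequate description of the data, and the feature values and threshold are made up.

```python
from math import sqrt
from statistics import mean, pstdev


def fit_stats(rows):
    """Learn the per-feature mean and standard deviation from the data."""
    columns = list(zip(*rows))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def anomaly_score(row, means, stds):
    """Distance from the mean after scaling each feature by its std."""
    return sqrt(sum(((v - m) / s) ** 2 for v, m, s in zip(row, means, stds)))


def discover_anomalies(rows, threshold):
    """Return indices of rows that lie far from the bulk of the data."""
    means, stds = fit_stats(rows)
    return [i for i, row in enumerate(rows)
            if anomaly_score(row, means, stds) > threshold]


# Hypothetical (approach speed in knots, descent rate in ft/min) per flight.
flights = [
    (140, 700), (142, 720), (138, 690), (141, 710),
    (139, 705), (143, 715), (140, 695), (250, 1800),
]
unusual = discover_anomalies(flights, threshold=2.0)
print(unusual)
```

No anomaly definition is supplied; the last flight is flagged only because it is far from the rest. Whether a flagged flight matters operationally still needs expert review.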

Page 9

Example: High Speed Go-Around

• Overshoots Extended Runway Centerline (ERC) by over 1 SM
• Over 250 kts at 2500 ft
• Angle of intercept > 40°
• Overshoots 2nd approach

Bryan Matthews, David Nielsen, John Schade, Kennis Chan, and Mike Kiniry, Automated Discovery of Flight Track Anomalies, 33rd Digital Avionics Systems Conference, 2014

Page 10

Providing Domain Expert Feedback

[Diagram: active learning strategy. Labels: Input Features, MKAD, Anomalies, Nominals, SME, Operationally significant anomalies, Uninteresting anomalies]

[Diagram: active learning with rationales framework. Labels: Input Features, MKAD, Training, Testing, Anomalies, Nominals, Rationale features, 2-class classification/ranking algorithm]

Manali Sharma, Kamalika Das, Mustafa Bilgic, Bryan Matthews, David Nielsen, and Nikunj Oza, Active Learning with Rationales for Identifying Operationally Significant Anomalies in Aviation, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2016

Page 11

Earth Science Example

• Understand relationships between ecosystem dynamics and climatic factors
• Model as a regression analysis problem
• 3 science questions
  – Magnitude and extent of ecosystem exposure, sensitivity and resilience to the 2005 and 2010 Amazon droughts
  – Understand human-induced and other attribution as causes of vegetation anomalies
  – How learned dependency model varies across eco-climatic zones and geographical regions on a global scale

NASA ESTO AIST-14 project, Uncovering Effects of Climate Variables on Global Vegetation (PI: Kamalika Das, Ph.D.)

Page 12

Problem Formulation

• Point-to-point regression analysis (Genetic Programming based Symbolic Regression)
• Estimate spatio-temporal dependency of forest ecosystems on climate variables

$$V_{ij}^{t} = f\left(LC_{ij},\ CV_{ij}^{t},\ CV_{nb}^{t},\ CV_{ij}^{t-1},\ CV_{nb}^{t-1},\ \ldots,\ CV_{ij}^{t-k},\ CV_{nb}^{t-k}\right)$$

V: vegetation
i, j: pixel location indices
LC: land cover type
t: time index
CV: climate variable(s)
nb: spatial neighborhood of index i, j
k: temporal dependency

Open challenges:
1. Estimating function f
2. Estimating best choices for k, nb
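To make the inputs of f concrete, the sketch below assembles the argument list for one pixel (i, j) at time t. It rests on several assumptions the slide does not state: a single climate variable, a square neighborhood of a given radius, and the neighborhood summarized by its mean. The function names are hypothetical.

```python
def neighborhood_mean(grid, i, j, radius):
    """Mean of the cells within `radius` of (i, j), excluding (i, j) itself."""
    rows, cols = len(grid), len(grid[0])
    vals = [
        grid[r][c]
        for r in range(max(0, i - radius), min(rows, i + radius + 1))
        for c in range(max(0, j - radius), min(cols, j + radius + 1))
        if (r, c) != (i, j)
    ]
    return sum(vals) / len(vals)


def build_features(land_cover, climate, i, j, t, k, radius=1):
    """Return [LC_ij, CV_ij^t, CV_nb^t, ..., CV_ij^(t-k), CV_nb^(t-k)].

    `climate` is a list of 2-D grids indexed by time step.
    """
    if t - k < 0:
        raise ValueError("need at least k earlier time steps")
    features = [land_cover[i][j]]
    for lag in range(k + 1):
        grid = climate[t - lag]
        features.append(grid[i][j])                             # CV_ij at t - lag
        features.append(neighborhood_mean(grid, i, j, radius))  # CV_nb at t - lag
    return features


# Tiny illustrative grids: three time steps of a 3x3 climate variable.
climate = [
    [[1, 1, 1], [1, 5, 1], [1, 1, 1]],
    [[2, 2, 2], [2, 6, 2], [2, 2, 2]],
    [[3, 3, 3], [3, 7, 3], [3, 3, 3]],
]
land_cover = [[0, 0, 0], [0, 4, 0], [0, 0, 0]]
x = build_features(land_cover, climate, i=1, j=1, t=2, k=1)
print(x)
```

The vector has 1 + 2(k + 1) entries, so the choices of k and the neighborhood directly set the size of the regression problem, which is the second open challenge listed above.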

Page 13

Data Pipeline

Inputs
• NDVI: resolution 250 m, projection Sinusoidal
• LST: resolution 1 km, projection Sinusoidal
• TRMM (Ver 6): resolution 25 km, projection WGS84

Processing
• Reproject and resample data: NDVI, TRMM, LST at resolution 1 km, projection WGS84
• Filter data based on land cover: 2000 – 2010 monthly data
• Time series: change to seasonal (monthly -> seasonal)
• Windowing: smoothing over 25x25 size window

Output
• 4 seasons/year, 2000 – 2010 seasonal data

Seasons
• Season 1: March – May
• Season 2: June – Sep
• Season 3: Oct
• Season 4: Nov – Feb
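The monthly-to-seasonal step can be sketched with the season definitions above. Two things here are assumptions, since the slide does not state them: months are combined by averaging, and January and February are assigned to the Season 4 that began the previous November.

```python
from collections import defaultdict

# Season definitions as listed on the slide (month numbers 1-12).
SEASON_OF_MONTH = {
    3: 1, 4: 1, 5: 1,
    6: 2, 7: 2, 8: 2, 9: 2,
    10: 3,
    11: 4, 12: 4, 1: 4, 2: 4,
}


def season_key(year, month):
    """Return (season_year, season) for a calendar month.

    Assumption: Jan/Feb belong to the Season 4 that started the previous Nov.
    """
    season = SEASON_OF_MONTH[month]
    season_year = year - 1 if month in (1, 2) else year
    return season_year, season


def monthly_to_seasonal(monthly):
    """Average monthly values {(year, month): value} into seasonal values."""
    buckets = defaultdict(list)
    for (year, month), value in monthly.items():
        buckets[season_key(year, month)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


# Illustrative values for one pixel.
monthly = {
    (2005, 3): 1.0, (2005, 4): 2.0, (2005, 5): 3.0,
    (2005, 10): 4.0,
    (2005, 11): 1.0, (2005, 12): 2.0, (2006, 1): 3.0, (2006, 2): 6.0,
}
seasonal = monthly_to_seasonal(monthly)
print(seasonal)
```

Note that the seasons are uneven in length (three, four, one, and four months), so a seasonal mean weights each month differently than an annual mean would.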

Page 14

Results for 2004-2010

Mean Squared Error

Year   Ridge Regression   LASSO   SVR     Symbolic Regression
2004   0.284              0.284   0.280   0.262
2005   0.289              0.289   0.288   0.278
2006   0.426              0.426   0.430   0.321
2007   0.374              0.374   0.370   0.318
2008   0.308              0.308   0.310   0.336
2009   0.353              0.353   0.360   0.328
2010   0.546              0.547   0.540   0.479

Marcin Szubert, Anuradha Kodali, Sangram Ganguly, Kamalika Das, and Josh C. Bongard, Reducing Antagonism between Behavioral Diversity and Fitness in Semantic Genetic Programming, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 797-804, 2016.
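The table can be summarized with a few lines of arithmetic. The sketch below uses only the values reported on the slide and computes the mean error per method and the best method in each year; ties go to the method listed first.

```python
# Mean squared error per year and method, copied from the slide.
mse = {
    2004: {"Ridge Regression": 0.284, "LASSO": 0.284, "SVR": 0.280, "Symbolic Regression": 0.262},
    2005: {"Ridge Regression": 0.289, "LASSO": 0.289, "SVR": 0.288, "Symbolic Regression": 0.278},
    2006: {"Ridge Regression": 0.426, "LASSO": 0.426, "SVR": 0.430, "Symbolic Regression": 0.321},
    2007: {"Ridge Regression": 0.374, "LASSO": 0.374, "SVR": 0.370, "Symbolic Regression": 0.318},
    2008: {"Ridge Regression": 0.308, "LASSO": 0.308, "SVR": 0.310, "Symbolic Regression": 0.336},
    2009: {"Ridge Regression": 0.353, "LASSO": 0.353, "SVR": 0.360, "Symbolic Regression": 0.328},
    2010: {"Ridge Regression": 0.546, "LASSO": 0.547, "SVR": 0.540, "Symbolic Regression": 0.479},
}

methods = list(next(iter(mse.values())))

# Average error of each method over the seven years.
mean_mse = {m: sum(row[m] for row in mse.values()) / len(mse) for m in methods}

# Lowest-error method per year (min returns the first method on a tie).
best_by_year = {year: min(row, key=row.get) for year, row in mse.items()}

sr_wins = sum(1 for m in best_by_year.values() if m == "Symbolic Regression")

for method, value in mean_mse.items():
    print(f"{method}: {value:.3f}")
print(best_by_year)
```

Symbolic regression has the lowest error in six of the seven years; 2008 is the exception, where the linear models do slightly better.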

Page 15

Ongoing and Future Work

• Experiment with different combinations of temporal lookback and/or spatial effects
• Introduce additional regressors (radiation, forest fire maps, deforestation maps)
• Study the effect of different regressors on different Amazon tiles
• Derive nonlinear GP models on Amazon tiles
• Given appropriate historical data, have the ability to predict: “Under what conditions does vegetation not recover within a certain time frame.”
• Do global scale analysis in parallel

Page 16

VESsel GENeration (VESGEN) Analysis
Patricia Parsons-Wingerter, PhD, NASA Chief Innovator/POC
NASA Ames 2016 Innovation Fund Award, Chief Technologist’s Office

• VESGEN 2D maps and quantifies vascular remodeling for a wide variety of quasi-2D vascularized biomedical tissue applications.
• Working on transforming to VESGEN 3D, in line with most vascularized organs and tissues in humans and vertebrates.
• Vascular-dependent diseases include cancer, diabetes, coronary vessel disease, and major astronaut health challenges in the space microgravity and radiation environments, especially for long-duration missions.
• One key component is binarization: conversion of grayscale images to black/white vascular branching patterns.
  – Takes 10-25 hours of human effort.
  – Exploring pattern recognition, matched filtering, vessel tracking/tracing, mathematical morphology, multiscale approaches, and model-based algorithms.

Page 17

Otsu Thresholding
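For reference, Otsu's method picks the single global threshold that maximizes the between-class variance of the two resulting pixel groups. The sketch below is a plain-Python version for 8-bit grayscale values; it assumes vessels are brighter than the background, which the slide does not state, and the pixel values are made up.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Pixels <= t form one class and pixels > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(levels):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue  # no pixels at or below t yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is at or below t
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 1 (above threshold) or 0 (at or below it)."""
    return [1 if p > threshold else 0 for p in pixels]


# Illustrative flattened image: dark background, bright vessel pixels.
pixels = [10, 12, 11, 13, 200, 205, 210, 198]
t = otsu_threshold(pixels)
binary = binarize(pixels, t)
print(t, binary)
```

Because one threshold is applied to the whole image, the result degrades when illumination varies across the image.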

Page 18

Otsu vs. Adaptive Thresholding
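Adaptive thresholding compares each pixel with a threshold computed from its local neighborhood rather than from the whole image. The sketch below uses a local-mean rule; the window radius and the `margin` parameter are assumptions chosen for illustration, and the image is synthetic.

```python
def adaptive_mean_threshold(image, radius=1, margin=0.0):
    """Mark a pixel as foreground (1) if it exceeds its local mean by `margin`.

    The local mean is taken over a (2 * radius + 1)-wide square window,
    clipped at the image border.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            out[r][c] = 1 if image[r][c] > local_mean + margin else 0
    return out


# One image row with a brightness gradient (10 .. 100) and two bright
# "vessel" pixels (60 and 110) sitting 30 gray levels above the background.
image = [[10, 20, 60, 40, 50, 60, 70, 110, 90, 100]]
binary = adaptive_mean_threshold(image, radius=1, margin=10.0)
print(binary)
```

In this row no single global cutoff can select exactly the two vessel pixels, because the background on the right (70 to 100) is brighter than the vessel on the left (60); the local rule handles both.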

Page 19

Future Work

• Work in progress: exploring more preprocessing and post-processing techniques
• Each step of preprocessing and post-processing has some input parameters
  – The result is sensitive to these parameters
  – We aim to make the parameter selection either automated (machine learning) or semi-automated (user can choose the right parameter)
• Machine Learning to learn the binarization
  – Given the manual labels, perform supervised or semi-supervised learning
  – Each pixel and its class label (foreground or background) is the training example
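The per-pixel training set described above can be sketched as follows. The slide only says that each pixel with its label is one training example; describing a pixel by the intensities in a small window around it is an assumption made here, as are the function name and the toy data.

```python
def pixel_training_examples(image, mask, radius=1):
    """Build one (features, label) pair per pixel.

    features: intensities in the (2 * radius + 1)^2 window around the pixel,
              with border pixels repeated at the image edge.
    label:    1 for foreground, 0 for background, from the manual mask.
    """
    rows, cols = len(image), len(image[0])
    examples = []
    for r in range(rows):
        for c in range(cols):
            features = [
                image[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
                for dr in range(-radius, radius + 1)
                for dc in range(-radius, radius + 1)
            ]
            examples.append((features, mask[r][c]))
    return examples


# Toy grayscale image and its manually labeled binarization.
image = [[1, 2], [3, 4]]
mask = [[0, 1], [1, 0]]
examples = pixel_training_examples(image, mask)
print(len(examples), examples[0])
```

The resulting pairs can be passed to any supervised classifier; a semi-supervised setup would add feature vectors from unlabeled images alongside them.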

Page 20

How do we get the Word Out?

DASHlink: disseminate. collaborate. innovate.
https://dashlink.ndc.nasa.gov/

DASHlink is a collaborative website designed to promote:
• Sustainability
• Reproducibility
• Dissemination
• Community building

Users can create profiles
• Share papers, upload and download open source algorithms
• Find NASA data sets.
