
Post on 09-Aug-2020


NASA Ames Data Sciences Group

www.nasa.gov

Nikunj C. Oza, Ph.D.
Leader, Data Sciences Group

nikunj.c.oza@nasa.gov

The Data Sciences Group at NASA Ames

Group Members
• Ilya Avrekh
• Kamalika Das, Ph.D.
• Dave Iverson
• Vijay Janakiraman, Ph.D.
• Rodney Martin, Ph.D.
• Bryan Matthews
• David Nielsen
• Nikunj Oza, Ph.D.
• Veronica Phillips
• John Stutz
• Hamed Valizadegan, Ph.D.
• + summer students

Team Members are NASA Employees, Contractors, and Students.

Funding Sources

• Science Mission Directorate: AIST and CMAC programs
• NASA Aeronautics Research Mission Directorate: ATD, SMART-NAS, SASO Project
• NASA Engineering and Safety Center
• Exploration Systems Mission Directorate, Exploration Technology Development Program
• Non-NASA: DARPA, DoD

Data Mining Research and Development (R&D) for application to NASA problems (Aeronautics, Earth Science, Space Exploration, Space Science)

Example Data Mining Problems

• Aeronautics: anomaly detection, precursor identification, text mining (classification, topic identification)
• Earth Science: filling in missing measurements, anomaly detection, teleconnections, climate understanding
• Space Science: Kepler planet candidates
• Space Exploration: system health management, vascular structure identification

Four V’s of Big, Tough, Sleep-Depriving Data

• Volume
  – Radar tracks: 47 facilities (1 year), ~423 GB (compressed), ~3.2 TB (CSV)
  – Weather and forecast (entire NAS): CIWS, ~2.8 TB
• Velocity
  – Radar tracks: 47 facilities, ~35 GB/month (compressed), ~268 GB/month (uncompressed)
  – Weather and forecast (entire NAS): CIWS, ~233 GB/month
• Veracity
  – Data dropouts
  – Duplicate tracks
  – Tracks ending in midair
  – Reused flight identifiers
• Variety
  – Numerical (continuous/binary)
  – Weather (forecast/actual)
  – Radar/airport metadata
  – ATC voice
  – ASRS text reports (pilot/controller)

[Figure labels: Amazing Algorithm, Intuitive Reports]

Aeronautics Data Mining Problems

• Anomaly detection
  – Anomaly discovery over a large set of variables
  – Particular variable of interest, for example, fuel burn
    • Determine the expected instantaneous fuel burn given the current state of the aircraft
    • Compare with the actual instantaneous fuel burn
    • Where the difference is high, a problem may be occurring
• Precursor identification
  – Given an undesirable effect (e.g., go-around), identify precursors (e.g., overtake situation, high-speed approach)
• Text mining
  – Text classification, topic identification
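The fuel-burn idea (predict the expected value from the aircraft state, then look for large differences from the actual value) can be sketched as follows. This is a minimal illustration, not the group's method: the linear model, the three state variables, the robust 3-sigma threshold, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def fit_expected_fuel_burn(states, fuel_burn):
    """Least-squares fit of fuel burn as a linear function of aircraft state."""
    X = np.column_stack([np.ones(len(states)), states])  # intercept column
    coef, *_ = np.linalg.lstsq(X, fuel_burn, rcond=None)
    return coef

def flag_fuel_burn_anomalies(states, fuel_burn, coef, n_sigma=3.0):
    """Boolean mask of samples whose actual-minus-expected difference is large."""
    X = np.column_stack([np.ones(len(states)), states])
    residual = fuel_burn - X @ coef
    centered = residual - np.median(residual)
    # Median absolute deviation gives a scale estimate that the
    # anomalies themselves do not inflate.
    scale = 1.4826 * np.median(np.abs(centered))
    return np.abs(centered) > n_sigma * scale

# Synthetic data (hypothetical): three scaled state variables per sample.
rng = np.random.default_rng(0)
states = rng.uniform(size=(500, 3))
true_coef = np.array([2.0, 1.5, -0.5, 3.0])
fuel = true_coef[0] + states @ true_coef[1:] + rng.normal(scale=0.05, size=500)
fuel[[10, 200]] += 1.0  # inject two synthetic anomalies

coef = fit_expected_fuel_burn(states, fuel)
mask = flag_fuel_burn_anomalies(states, fuel, coef)
```

Any regression model could stand in for the linear fit; the point is only that the anomaly score is the gap between expected and actual fuel burn.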

Topic Extraction Example

[Figure: terms for three extracted topics. The terms recoverable from the figure are listed below.]

• Topic 1: autoplt, acft, spd, capture, moderate, level, engaged, leveloff, vert, ctl, disconnected, selected, fpm, light, clb, pitch, manually, warning, pwr
• Topic 2: time, day, leg, contributing, factors, hrs, crew, factor, fatigue, night, trip, rest, duty, flying, long, late, previous, incident, lack, alerter
• Topic 3: apch, rwy, visual, ils, twr, lndg, loc, arpt, final, missed, clred, msl, intercept, vectored, sight, gar, terrain, field, uneventful, ctl

Other examples of ‘fatigue’

• Altitude deviation
• Spatial deviation
• Ramp excursion
• Landing without clearance
• Runway incursion
• Unstable approach

Aeronautics Anomaly Detection: Current Methods

Exceedance-Based Methods
• Known anomalies
• Conditions over 2-3 variables (e.g., speed > 250 knots, altitude = 1000 ft, landing)
• Cannot identify unknown anomalies
• Low false positive rate, high false negative (missed detection) rate

Data-Driven Methods
• DISCOVER anomalies by
  – learning statistical properties of the data
  – finding which data points do not fit (e.g., far away, low probability)
  – no background knowledge on anomalies needed
• Complementary to existing methods
  – Low false negative (missed detection) rate
  – Higher false positive rate (identified points/flights are unusual, but not always operationally significant)
• Data-driven methods -> insights -> modification of exceedance detection
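The contrast between the two approaches can be illustrated with a small sketch. The exceedance rule below is a hypothetical reading of the slide's example (fast while at or below a low altitude), and the Mahalanobis distance is one simple "far away" score chosen for illustration; it is not claimed to be the detector used by the group.

```python
import numpy as np

def exceedance_flags(speed_kts, altitude_ft, speed_limit=250.0, altitude_ceiling=1000.0):
    """Rule-based check: flag samples that are fast at or below a low altitude."""
    return (speed_kts > speed_limit) & (altitude_ft <= altitude_ceiling)

def mahalanobis_scores(X):
    """Data-driven score: distance of each row from the sample mean,
    scaled by the sample covariance. Larger means more unusual."""
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))
    sq = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(np.maximum(sq, 0.0))

# Synthetic data (hypothetical): speed normally rises with altitude.
rng = np.random.default_rng(1)
altitude = rng.uniform(1500.0, 10000.0, size=300)
speed = 150.0 + 0.01 * altitude + rng.normal(scale=5.0, size=300)
# One sample breaks the usual speed/altitude relationship
# without crossing any fixed threshold.
altitude[0], speed[0] = 9000.0, 160.0

rule_hits = exceedance_flags(speed, altitude)
scores = mahalanobis_scores(np.column_stack([speed, altitude]))
most_unusual = int(np.argmax(scores))
```

In this toy data the fixed rule flags nothing, while the learned score ranks the inconsistent sample first. Whether such a sample is operationally significant is a separate question that the score alone cannot answer.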

Example: High-Speed Go-Around

• Overshoots Extended Runway Centerline (ERC) by over 1 SM
• Over 250 kts @ 2500 ft
• Angle of intercept > 40°
• Overshoots 2nd approach

Bryan Matthews, David Nielsen, John Schade, Kennis Chan, and Mike Kiniry, Automated Discovery of Flight Track Anomalies, 33rd Digital Avionics Systems Conference, 2014

Providing Domain Expert Feedback

[Figure: active learning with rationales framework. Diagram labels: Input Features, MKAD, Anomalies, Nominals, SME, Operationally significant anomalies, Uninteresting anomalies, Active learning strategy, Rationale features, Training, Testing, 2-class classification/ranking algorithm.]

Manali Sharma, Kamalika Das, Mustafa Bilgic, Bryan Matthews, David Nielsen, and Nikunj Oza, Active Learning with Rationales for Identifying Operationally Significant Anomalies in Aviation, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2016

Earth Science Example

• Understand relationships between ecosystem dynamics and climatic factors
• Model as a regression analysis problem
• 3 science questions
  – Magnitude and extent of ecosystem exposure, sensitivity, and resilience to the 2005 and 2010 Amazon droughts
  – Understand human-induced and other attribution as causes of vegetation anomalies
  – How the learned dependency model varies across eco-climatic zones and geographical regions on a global scale

NASA ESTO AIST-14 project, Uncovering Effects of Climate Variables on Global Vegetation (PI: Kamalika Das, Ph.D.)

Problem Formulation

• Point-to-point regression analysis (Genetic Programming based Symbolic Regression)
• Estimate spatio-temporal dependency of forest ecosystems on climate variables

\[
V_{ij}^{t} = f\left(LC_{ij},\ CV_{ij}^{t},\ CV_{nb}^{t},\ CV_{ij}^{t-1},\ CV_{nb}^{t-1},\ \ldots,\ CV_{ij}^{t-k},\ CV_{nb}^{t-k}\right)
\]

• V: vegetation
• i, j: pixel location indices
• LC: land cover type
• t: time index
• CV: climate variable(s)
• nb: spatial neighborhood of index i, j
• k: temporal dependency

Open challenges:
1. Estimating function f
2. Estimating best choices for k, nb
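To make the inputs of f concrete, the sketch below assembles one feature vector for a pixel (i, j) at time t from a single climate variable. The array layout, the square neighborhood of a given radius, and the use of the neighborhood mean as the summary of CV_nb are illustrative assumptions; the slides leave the choice of k and nb open.

```python
import numpy as np

def lagged_features(cv, land_cover, i, j, t, k=1, radius=1):
    """Assemble the arguments of f for pixel (i, j) at time t.

    cv:         array of shape (T, H, W), one climate variable over time
    land_cover: array of shape (H, W), land cover class per pixel
    k:          number of past time steps to include (temporal dependency)
    radius:     half-width (>= 1) of the square neighborhood nb around (i, j)
    """
    if radius < 1:
        raise ValueError("radius must be at least 1")
    if t - k < 0:
        raise ValueError("not enough history for the requested lag k")
    features = [float(land_cover[i, j])]
    for lag in range(k + 1):
        frame = cv[t - lag]
        window = frame[max(i - radius, 0): i + radius + 1,
                       max(j - radius, 0): j + radius + 1]
        # Neighborhood summary: mean over the window, excluding the centre pixel.
        nb_mean = (window.sum() - frame[i, j]) / (window.size - 1)
        features.extend([float(frame[i, j]), float(nb_mean)])
    return np.array(features)

# Tiny synthetic cube: 3 time steps on a 4x4 grid.
cv = np.arange(3 * 4 * 4, dtype=float).reshape(3, 4, 4)
land_cover = np.ones((4, 4))
features = lagged_features(cv, land_cover, i=1, j=1, t=2, k=1)
```

The resulting vector has 1 + 2(k + 1) entries: the land cover class, followed by the pixel value and neighborhood summary for each of the k + 1 time steps. Stacking these vectors over pixels and times gives a design matrix that any regressor, including symbolic regression, could be trained on.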

Data Pipeline

Inputs
• NDVI: resolution 250 m, projection Sinusoidal
• LST: resolution 1 km, projection Sinusoidal
• TRMM (Ver 6): resolution 25 km, projection WGS84

Processing steps
1. Reproject and resample data. Output: NDVI, TRMM, LST at resolution 1 km, projection WGS84.
2. Filter data based on land cover. Output: 2000 – 2010 monthly data.
3. Time series: change to seasonal. Output: 2000 – 2010 seasonal data.
  – Monthly -> seasonal, 4 seasons/year
  – Windowing: smoothing over a 25x25 window

Seasons
• Season 1: March – May
• Season 2: June – Sep
• Season 3: Oct
• Season 4: Nov – Feb
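The monthly-to-seasonal step can be sketched as below, using the four season definitions given above. Two details are assumptions made for the example and are not stated in the slides: months are combined by averaging, and January and February are grouped with the season that began the previous November.

```python
import numpy as np

# Season definitions from the slide; months are numbered 1-12.
SEASON_OF_MONTH = {3: 1, 4: 1, 5: 1,
                   6: 2, 7: 2, 8: 2, 9: 2,
                   10: 3,
                   11: 4, 12: 4, 1: 4, 2: 4}

def season_key(year, month):
    """Return (season_year, season). Jan/Feb are assigned to the season
    that started in November of the previous year (an assumption)."""
    season_year = year - 1 if month in (1, 2) else year
    return season_year, SEASON_OF_MONTH[month]

def monthly_to_seasonal(records):
    """Average monthly values into seasons.

    records: iterable of (year, month, value)
    returns: dict mapping (season_year, season) -> mean value
    """
    buckets = {}
    for year, month, value in records:
        buckets.setdefault(season_key(year, month), []).append(value)
    return {key: float(np.mean(vals)) for key, vals in buckets.items()}

monthly = [(2004, 11, 1.0), (2004, 12, 2.0), (2005, 1, 3.0),
           (2005, 2, 4.0), (2005, 3, 10.0)]
seasonal = monthly_to_seasonal(monthly)
```

The same grouping applies per pixel when the values are raster arrays rather than scalars.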

Results for 2004-2010

Mean squared error by year and method:

Year   Ridge Regression   LASSO   SVR     Symbolic Regression
2004   0.284              0.284   0.280   0.262
2005   0.289              0.289   0.288   0.278
2006   0.426              0.426   0.430   0.321
2007   0.374              0.374   0.370   0.318
2008   0.308              0.308   0.310   0.336
2009   0.353              0.353   0.360   0.328
2010   0.546              0.547   0.540   0.479

Marcin Szubert, Anuradha Kodali, Sangram Ganguly, Kamalika Das, and Josh C. Bongard, Reducing Antagonism between Behavioral Diversity and Fitness in Semantic Genetic Programming, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), pp. 797-804, 2016.
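As a reading aid for the results, the short script below takes the reported mean squared errors and computes which method has the lowest error in each year, plus each method's average over the seven years. The numbers are copied from the table; the helper names are illustrative only.

```python
# Mean squared error per year, copied from the results table.
MSE = {
    2004: {"Ridge Regression": 0.284, "LASSO": 0.284, "SVR": 0.280, "Symbolic Regression": 0.262},
    2005: {"Ridge Regression": 0.289, "LASSO": 0.289, "SVR": 0.288, "Symbolic Regression": 0.278},
    2006: {"Ridge Regression": 0.426, "LASSO": 0.426, "SVR": 0.430, "Symbolic Regression": 0.321},
    2007: {"Ridge Regression": 0.374, "LASSO": 0.374, "SVR": 0.370, "Symbolic Regression": 0.318},
    2008: {"Ridge Regression": 0.308, "LASSO": 0.308, "SVR": 0.310, "Symbolic Regression": 0.336},
    2009: {"Ridge Regression": 0.353, "LASSO": 0.353, "SVR": 0.360, "Symbolic Regression": 0.328},
    2010: {"Ridge Regression": 0.546, "LASSO": 0.547, "SVR": 0.540, "Symbolic Regression": 0.479},
}

def best_method_by_year(table):
    """Method with the lowest MSE for each year (ties go to the first listed)."""
    return {year: min(row, key=row.get) for year, row in table.items()}

def mean_mse_by_method(table):
    """Average MSE of each method across all years in the table."""
    methods = list(next(iter(table.values())))
    return {m: sum(row[m] for row in table.values()) / len(table) for m in methods}

best_by_year = best_method_by_year(MSE)
mean_mse = mean_mse_by_method(MSE)
symbolic_wins = sum(1 for m in best_by_year.values() if m == "Symbolic Regression")
```

On these numbers, symbolic regression has the lowest error in six of the seven years; 2008 is the exception, where ridge regression and LASSO are lower.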

Ongoing and Future Work

• Experiment with different combinations of temporal lookback and/or spatial effects
• Introduce additional regressors (radiation, forest fire maps, deforestation maps)
• Study the effect of different regressors on different Amazon tiles
• Derive nonlinear GP models on Amazon tiles
• Given appropriate historical data, have the ability to predict: “Under what conditions does vegetation not recover within a certain time frame?”
• Do global-scale analysis in parallel

VESsel GENeration (VESGEN) Analysis

Patricia Parsons-Wingerter, PhD, NASA Chief Innovator/POC
NASA Ames 2016 Innovation Fund Award, Chief Technologist’s Office

• VESGEN 2D maps and quantifies vascular remodeling for a wide variety of quasi-2D vascularized biomedical tissue applications.
• Working on transforming to VESGEN 3D, in line with most vascularized organs and tissues in humans and vertebrates.
• Vascular-dependent diseases include cancer, diabetes, coronary vessel disease, and major astronaut health challenges in the space microgravity and radiation environments, especially for long-duration missions.
• One key component is binarization: conversion of grayscale images to black/white vascular branching patterns.
  – Takes 10-25 hours of human effort.
  – Exploring pattern recognition, matched filtering, vessel tracking/tracing, mathematical morphology, multiscale approaches, and model-based algorithms.

Otsu Thresholding

Otsu vs. Adaptive Thresholding
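For reference, Otsu's method picks the single global gray level that maximizes the between-class variance of the resulting foreground and background. The sketch below is a plain NumPy version for 8-bit images, written for illustration; it is not the VESGEN implementation, and the two-level test image is synthetic.

```python
import numpy as np

def otsu_threshold(image):
    """Return the 8-bit threshold t that maximizes between-class variance,
    where the two classes are pixels <= t and pixels > t."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)               # weight of the class "<= t"
    w1 = 1.0 - w0                      # weight of the class "> t"
    cum_mean = np.cumsum(prob * levels)
    total_mean = cum_mean[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (total_mean * w0 - cum_mean) ** 2 / (w0 * w1)
    # Thresholds that leave one class empty give 0/0; treat them as zero.
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(between))

def binarize(image):
    """Black/white mask: True where the pixel is brighter than the Otsu threshold."""
    return image > otsu_threshold(image)

# Synthetic 8-bit image: dark background (50) with a bright stripe (200).
image = np.full((32, 32), 50, dtype=np.uint8)
image[:, 12:20] = 200
threshold = otsu_threshold(image)
mask = binarize(image)
```

A single global threshold like this struggles when illumination varies across the image, which is the usual motivation for comparing it against adaptive (locally computed) thresholds.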

Future Work

• Work in progress: exploring more preprocessing and post-processing techniques
• Each step of preprocessing and post-processing has some input parameters
  – The result is sensitive to these parameters
  – We aim to make the parameter selection either automated (machine learning) or semi-automated (user can choose the right parameter)
• Machine learning to learn the binarization
  – Given the manual labels, perform supervised or semi-supervised learning
  – Each pixel and its class label (foreground or background) is a training example
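One way to turn "each pixel and its class label is a training example" into data a classifier can consume is sketched below. Describing each pixel by the small patch of gray values around it is an assumption made for this example; the slides do not say which per-pixel features are used.

```python
import numpy as np

def pixel_training_set(image, labels, radius=1):
    """Turn a grayscale image and its manual binary mask into (X, y).

    Each row of X is the flattened (2*radius+1)^2 patch centred on one pixel;
    y holds that pixel's label (1 = foreground, 0 = background).
    """
    padded = np.pad(image.astype(float), radius, mode="reflect")
    size = 2 * radius + 1
    # One (size x size) window per original pixel, in row-major order.
    patches = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    X = patches.reshape(image.size, size * size)
    y = labels.reshape(-1).astype(int)
    return X, y

# Tiny synthetic example: a 4x5 image with one labelled foreground pixel.
image = np.arange(20, dtype=np.uint8).reshape(4, 5)
labels = np.zeros((4, 5), dtype=bool)
labels[2, 3] = True
X, y = pixel_training_set(image, labels)
```

The pair (X, y) can then be passed to any supervised classifier; for the semi-supervised case, only the rows with manual labels would carry a value in y.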

How Do We Get the Word Out?

DASHlink
disseminate. collaborate. innovate.
https://dashlink.ndc.nasa.gov/

DASHlink is a collaborative website designed to promote:
• Sustainability
• Reproducibility
• Dissemination
• Community building

Users can create profiles
• Share papers, upload and download open source algorithms
• Find NASA datasets

