webinar@aims: inra's big data perspectives and implementation challenges
TRANSCRIPT
![Page 1: Webinar@AIMS: INRA's Big Data Perspectives and Implementation Challenges](https://reader031.vdocuments.site/reader031/viewer/2022021502/58ebc9191a28ab6c4f8b4665/html5/thumbnails/1.jpg)
INRA's Big Data perspectives and implementation challenges
Pascal NeveuUMR MISTEA
INRA - Montpellier
![Page 2: Webinar@AIMS: INRA's Big Data Perspectives and Implementation Challenges](https://reader031.vdocuments.site/reader031/viewer/2022021502/58ebc9191a28ab6c4f8b4665/html5/thumbnails/2.jpg)
11 avril 2013
Pascal Neveu 2
What is INRA?
French Institute for Agricultural Research
• Specialties:
– Agriculture– Environment– Food
● 10000 Persons (4000 researchers)
● 18 Centres (> 40 sites)
![Page 3: Webinar@AIMS: INRA's Big Data Perspectives and Implementation Challenges](https://reader031.vdocuments.site/reader031/viewer/2022021502/58ebc9191a28ab6c4f8b4665/html5/thumbnails/3.jpg)
11 avril 2013
Pascal Neveu 3
What is INRA ?
Technical staff (4000) --> data producer
INRA Data characteristics:
- Spread over territory
- Many disciplines
- National and international collaborations
![Page 4: Webinar@AIMS: INRA's Big Data Perspectives and Implementation Challenges](https://reader031.vdocuments.site/reader031/viewer/2022021502/58ebc9191a28ab6c4f8b4665/html5/thumbnails/4.jpg)
Pascal Neveu / AG MIA 2014 4
Raise integrated issues and challenges:
– How to adapt agriculture to climate change?
– How agriculture impacts environment?
– Agroecology «producing and supplying food in a different way »
– Global food security and needs of adaptation
– Plant treatment and food safety
– ...
Agronomic Sciences
![Page 5: Webinar@AIMS: INRA's Big Data Perspectives and Implementation Challenges](https://reader031.vdocuments.site/reader031/viewer/2022021502/58ebc9191a28ab6c4f8b4665/html5/thumbnails/5.jpg)
11 avril 2013
Pascal Neveu 5
Data challenges in ScienceModern science must deal with:
● More experimental data
● A lot of datasets available on the Web
● More collaborative and integrative approaches
→ Management, sharing and data analysis play an increasing role in research
Discover, combine and analysethese data
→ Big Data Challenges
![Page 6: Webinar@AIMS: INRA's Big Data Perspectives and Implementation Challenges](https://reader031.vdocuments.site/reader031/viewer/2022021502/58ebc9191a28ab6c4f8b4665/html5/thumbnails/6.jpg)
11 avril 2013
Pascal Neveu 6
Big Data
A definition: data sets grow so large and complex that it becomes impossible to process using traditional data processing methods (Management and Analysis)
Make data valuable (information, knowledge, decision support)
![Page 7: Webinar@AIMS: INRA's Big Data Perspectives and Implementation Challenges](https://reader031.vdocuments.site/reader031/viewer/2022021502/58ebc9191a28ab6c4f8b4665/html5/thumbnails/7.jpg)
11 avril 2013
Pascal Neveu 7
Agronomic Big Data V characteristics
● Volume: massive data and growing size → hard to store, manage and analyze
● Variety and Complexity: different sources, scales, disciplines different semantics, schemas and formats etc. → hard to understand, combine, integrate
● Velocity: speed of data generation → have to be processed on line
● Validity, Veracity, Vulnerability, Volatility, Visibility, Visualisation, etc.
![Page 8: Webinar@AIMS: INRA's Big Data Perspectives and Implementation Challenges](https://reader031.vdocuments.site/reader031/viewer/2022021502/58ebc9191a28ab6c4f8b4665/html5/thumbnails/8.jpg)
11 avril 2013
Pascal Neveu 8
Why Big Data is important in Agronomic Sciences?
Production of a lot of heterogeneous data for understanding
● Open new insights
● Allow to know:
– Which theories are consistent and which ones are not!
– When data did not quite match what we expect…
→ Decision support needs (integrative and predictive approaches)
![Page 9: Webinar@AIMS: INRA's Big Data Perspectives and Implementation Challenges](https://reader031.vdocuments.site/reader031/viewer/2022021502/58ebc9191a28ab6c4f8b4665/html5/thumbnails/9.jpg)
11 avril 2013
Pascal Neveu 9
Illustration: Plant Phenotyping
Environment
Plant Genotype
Phenotype
Interactions
![Page 10: Webinar@AIMS: INRA's Big Data Perspectives and Implementation Challenges](https://reader031.vdocuments.site/reader031/viewer/2022021502/58ebc9191a28ab6c4f8b4665/html5/thumbnails/10.jpg)
11 avril 2013
Pascal Neveu 10
Searching for the most adapted genotypes
High throughput
Various Environments
Many Plant Genotypes
High frequency anddynamic trait observations of a lot of Phenotypes
Interactions
![Page 11: Webinar@AIMS: INRA's Big Data Perspectives and Implementation Challenges](https://reader031.vdocuments.site/reader031/viewer/2022021502/58ebc9191a28ab6c4f8b4665/html5/thumbnails/11.jpg)
11 avril 2013
Pascal Neveu 11
Why high throughput phenotyping is important for agriculture?
● Adaptation to climate change
● More efficient use of natural resources (including water and soil) in our farming practices
● Sustainable management and equity
● Food securityCrop performance (yields are globally decreasing)
● …
Genotyping and Phenotyping
Plant phenotyping has become a bottleneck for progress in plant science and plant breeding
![Page 12: Webinar@AIMS: INRA's Big Data Perspectives and Implementation Challenges](https://reader031.vdocuments.site/reader031/viewer/2022021502/58ebc9191a28ab6c4f8b4665/html5/thumbnails/12.jpg)
11 avril 2013
Pascal Neveu 12
Phenome High throughput plant phenotyping
French Infrastructure 9 multi-species platforms
● 2 green house platforms
● 5 field platforms
● 2 omics platforms
![Page 13: Webinar@AIMS: INRA's Big Data Perspectives and Implementation Challenges](https://reader031.vdocuments.site/reader031/viewer/2022021502/58ebc9191a28ab6c4f8b4665/html5/thumbnails/13.jpg)
11 avril 2013
Pascal Neveu 13
5 Field Platforms
Various scales and data types● Cell, organ, plant, population● Images, hyperspectral, spectral, sensors, human readings...
Thousands of micro-plots
time
![Page 14: Webinar@AIMS: INRA's Big Data Perspectives and Implementation Challenges](https://reader031.vdocuments.site/reader031/viewer/2022021502/58ebc9191a28ab6c4f8b4665/html5/thumbnails/14.jpg)
11 avril 2013
Pascal Neveu 14
2 Green house Platforms
-1
Time (d)
0 20 40 60 80 1000
10
20
30
40
50
60
Pla
nt b
iom
ass
g
Various scales and data types
time
![Page 15: Webinar@AIMS: INRA's Big Data Perspectives and Implementation Challenges](https://reader031.vdocuments.site/reader031/viewer/2022021502/58ebc9191a28ab6c4f8b4665/html5/thumbnails/15.jpg)
11 avril 2013
Pascal Neveu 15
2 « Omics » platforms
●
Grinding weighting (-80°C)
ExtractionFractionation
PipettingIncubation Reading
Various data complex types
Composition and structure
Quantification of metabolites and enzyme activities
![Page 16: Webinar@AIMS: INRA's Big Data Perspectives and Implementation Challenges](https://reader031.vdocuments.site/reader031/viewer/2022021502/58ebc9191a28ab6c4f8b4665/html5/thumbnails/16.jpg)
11 avril 2013
Pascal Neveu 16
Data management challengesin Phenome: Volume growth
40 Tbytes in 2013, 100 Tbytes in 2014, …
● Volume is a relative concept– Exponential growth makes hard
● Storage● Management● Analysis
Phenome HPC and Storage→ Grid computing (FranceGrille, EGI)– On-demand infrastructure and Elasticity– Virtualization technologies – Data-Based parallelism
(same operation on different data)
![Page 17: Webinar@AIMS: INRA's Big Data Perspectives and Implementation Challenges](https://reader031.vdocuments.site/reader031/viewer/2022021502/58ebc9191a28ab6c4f8b4665/html5/thumbnails/17.jpg)
11 avril 2013
Pascal Neveu 17
Data management challengesin Phenome: Variety
● Produced by different communities (geneticians, ecophysiologists, farmers, breeders, etc)
● Data integration needs extensive connections and associations to other types of data
● (environments, individuals, populations, etc.)● Different semantics, data schemas, …
Extremely diverse data
→ Web API, Ontology sets, NoSQL and Semantic Web methods
![Page 18: Webinar@AIMS: INRA's Big Data Perspectives and Implementation Challenges](https://reader031.vdocuments.site/reader031/viewer/2022021502/58ebc9191a28ab6c4f8b4665/html5/thumbnails/18.jpg)
11 avril 2013
Pascal Neveu 18
Data management in Phenome: Velocity
– Green House platforms produce tens of thousands images/day (200 days/year)
– Field platforms produce tens of thousands images/day(100 days/year)
– Omic platforms produce tens of Gbytes/day(300 days/year)
Scientific Workflow – OpenAlea /provenance module (Virtual Plant INRIA team) – Scifloware (Zenith INRIA team)
![Page 19: Webinar@AIMS: INRA's Big Data Perspectives and Implementation Challenges](https://reader031.vdocuments.site/reader031/viewer/2022021502/58ebc9191a28ab6c4f8b4665/html5/thumbnails/19.jpg)
11 avril 2013
Pascal Neveu 19
Data management challengesin Phenome: Validity
Data cleaning
● Automatically diagnose and manage:– Consistency? Duplicates? Wrong?– Annotation consistency?– Outliers? – Disguised missing data?– ...
Some approaches – Unsupervised Curve clustering (Zenith INRIA team)– Curve fitting over dynamic constraints– Image Clustering
![Page 20: Webinar@AIMS: INRA's Big Data Perspectives and Implementation Challenges](https://reader031.vdocuments.site/reader031/viewer/2022021502/58ebc9191a28ab6c4f8b4665/html5/thumbnails/20.jpg)
11 avril 2013
Pascal Neveu 20
Conclusion
High throughput phenotyping data:
– Hard to produce
– Hard to manage
– Hard to analyse
→ We need new approaches
Thank you for your attention