Integrated Data Management for Agricultural Research
• Diganta Nath, Intern
• Dr. Rosemary Renaut, Committee Chair
  • Director, Computational Biosciences PSM
• Dr. Jeffrey W. White, Internship Advisor
  • ALARC, USDA-ARS, Maricopa, AZ
• Dr. Hasan Davulcu, Committee Member
  • Dept. of Computer Science & Engineering
Goals of Project
• Improve data management, analysis and distribution
  • GIS analysis of Lesquerella fendleri
  • GMS database for Lesquerella (LesquIS)
  • Web interface for LesquIS
Goals contd.
  • GMS database for Vernonia (VernIS)
  • Web interface for VernIS
  • Excel workbook as per ICASA standards
The novel crop - Lesquerella
  • Contains oil rich in hydroxy fatty acid (HFA)
  • Used in making resins, waxes, motor oils etc.
Climate Analysis of L. fendleri Distribution using DIVA-GIS
• Data obtained from ALARC collections, ASU Herbarium
• Integrated into one database – 248 collections
• Collection locations linked to climate variables
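
One way to picture "collection locations linked to climate variables" is a lookup of each site's coordinates in a gridded climate layer. The sketch below is an illustration only, not the DIVA-GIS implementation: the `ClimateGrid` class, the `link_sites_to_climate` helper, the field names, and the toy values are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ClimateGrid:
    """Hypothetical regular lon/lat grid; values[row][col], row 0 at the north edge."""
    west: float        # longitude of the western edge
    north: float       # latitude of the northern edge
    cell_size: float   # cell width and height in degrees
    values: list       # list of rows, each a list of cell values

    def value_at(self, lon, lat):
        """Return the value of the cell containing (lon, lat), or None if outside."""
        col = int((lon - self.west) // self.cell_size)
        row = int((self.north - lat) // self.cell_size)
        if 0 <= row < len(self.values) and 0 <= col < len(self.values[0]):
            return self.values[row][col]
        return None


def link_sites_to_climate(sites, grids):
    """Attach one value per climate variable to every collection site."""
    linked = []
    for site in sites:
        record = dict(site)
        for name, grid in grids.items():
            record[name] = grid.value_at(site["lon"], site["lat"])
        linked.append(record)
    return linked


# Toy 2 x 2 grid and two made-up collection sites (not real data).
grid = ClimateGrid(west=-112.0, north=34.0, cell_size=1.0,
                   values=[[10.0, 12.0], [14.0, 16.0]])
sites = [
    {"id": "A1", "lon": -111.5, "lat": 33.5},
    {"id": "A2", "lon": -110.2, "lat": 32.1},
]
linked = link_sites_to_climate(sites, {"mean_temp": grid})
```

Each linked record keeps the site's own fields and gains one entry per climate layer, which is the shape the envelope and distance methods on the following slides work from.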
DIVA contd.
• Frequency of mean temperature during the wettest quarter for collection sites of L. fendleri
DIVA contd.
• Frequency of precipitation of the driest quarter
Climate Analysis algorithms
• BIOCLIM
  • Extracts climate data set for collection points
  • Computes the mean and standard deviation for each climatic variable
  • Builds an envelope identifying similar areas based on percentile.
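
The BIOCLIM steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the DIVA-GIS code: the function names, the choice of percentile cut-offs, the linear-interpolation percentile, and the toy values are all hypothetical.

```python
import statistics


def percentile(sorted_values, q):
    """Linear-interpolation percentile of an ascending list, q in [0, 100]."""
    if len(sorted_values) == 1:
        return sorted_values[0]
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def summarize(site_climate):
    """Mean and sample standard deviation of each climate variable."""
    summary = {}
    for var in site_climate[0]:
        values = [site[var] for site in site_climate]
        summary[var] = (statistics.mean(values), statistics.stdev(values))
    return summary


def build_envelope(site_climate, lower_q=5.0, upper_q=95.0):
    """Per-variable (low, high) bounds taken from percentiles of the site values."""
    envelope = {}
    for var in site_climate[0]:
        values = sorted(site[var] for site in site_climate)
        envelope[var] = (percentile(values, lower_q), percentile(values, upper_q))
    return envelope


def in_envelope(cell, envelope):
    """A grid cell is 'similar' when every variable falls inside its bounds."""
    return all(lo <= cell[var] <= hi for var, (lo, hi) in envelope.items())


# Made-up climate values at five collection sites.
site_climate = [
    {"temp": 10, "precip": 100},
    {"temp": 12, "precip": 150},
    {"temp": 14, "precip": 200},
    {"temp": 16, "precip": 250},
    {"temp": 18, "precip": 300},
]
summary = summarize(site_climate)
envelope = build_envelope(site_climate, lower_q=25.0, upper_q=75.0)
inside = in_envelope({"temp": 13, "precip": 200}, envelope)
outside = in_envelope({"temp": 17, "precip": 200}, envelope)
```

Tightening or widening the percentile cut-offs shrinks or grows the envelope, which is what changes the mapped area of similar climate.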
Algorithms contd.
• DOMAIN
  • Based on Gower distance – relative measure of similarity (absolute distance / maximum distance)
  • Calculates Gower distance between collection points and each cell
  • Generates a map based on similarity
  • D = (1 – d_AB) × 100, where d_AB is the Gower distance between points A and B
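
The DOMAIN score can be illustrated as follows. The sketch assumes the usual form of the Gower distance, the mean over climate variables of |A_k - B_k| / range_k, and assumes a cell is scored against its closest collection point. Neither detail is stated on the slide, and the function names and toy values are hypothetical.

```python
def variable_ranges(sites):
    """Range (max - min) of each climate variable over the collection sites."""
    ranges = {}
    for var in sites[0]:
        values = [site[var] for site in sites]
        rng = max(values) - min(values)
        if rng == 0:
            raise ValueError(f"variable {var!r} has zero range")
        ranges[var] = rng
    return ranges


def gower_distance(a, b, ranges):
    """Mean of the range-standardised absolute differences between two points."""
    total = sum(abs(a[var] - b[var]) / rng for var, rng in ranges.items())
    return total / len(ranges)


def domain_score(cell, sites, ranges):
    """D = (1 - d) * 100, with d the distance to the closest collection site."""
    nearest = min(gower_distance(cell, site, ranges) for site in sites)
    return (1.0 - nearest) * 100.0


# Made-up climate values for two collection sites and two grid cells.
sites = [{"temp": 10, "precip": 100}, {"temp": 20, "precip": 300}]
ranges = variable_ranges(sites)
score_at_site = domain_score({"temp": 10, "precip": 100}, sites, ranges)
score_midway = domain_score({"temp": 15, "precip": 200}, sites, ranges)
```

A cell that matches a collection site exactly scores 100, and the score falls as the cell's climate moves away from every known site.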
DIVA - BIOCLIM
• BIOCLIM Analysis – 3 variables
BIOCLIM contd.
• BIOCLIM Analysis – 4 variables
DIVA - DOMAIN
• DOMAIN Analysis – 3 variables
DOMAIN contd.
• DOMAIN Analysis – 4 variables
DIVA – Next step
• Collect more distribution data
• Involve soil chemistry
• Assess other niche modeling methods
LesquIS GMS
• ICIS database
  • GMS
• Import data from Excel into Excel spreadsheet
• Implement standardized process
GMS database ER diagram
[Figure: entity-relationship diagram of the GMS database. Central entity: Germplasm (generative or derivative), with recursive links to its progenitors, group and source. Related tables: METHODS, NAMES, LOCATIONS, USERS, USER-DEFINED FIELDS, ATTRIBUTES. Relationship labels: developed by, called, developed at, named by, named at, with value of, defined property, assigned at, assigned by.]
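
The recursive progenitor links in the diagram are what make pedigree tracing possible. The sketch below shows the idea with an in-memory SQLite database and a recursive query. The table and column names (`germplasm`, `names`, `progenitors`, `gid`, `progenitor_gid`) are a deliberately simplified, hypothetical schema, not the actual ICIS GMS tables, and the rows are made up.

```python
import sqlite3

# Hypothetical, simplified schema (not the real ICIS GMS table layout).
SCHEMA = """
CREATE TABLE germplasm (gid INTEGER PRIMARY KEY, method TEXT, location TEXT);
CREATE TABLE names (gid INTEGER REFERENCES germplasm(gid), name TEXT);
CREATE TABLE progenitors (
    gid INTEGER REFERENCES germplasm(gid),
    progenitor_gid INTEGER REFERENCES germplasm(gid)
);
"""

# Walk the recursive links: direct progenitors first, then their progenitors.
ANCESTOR_QUERY = """
WITH RECURSIVE ancestors(gid) AS (
    SELECT progenitor_gid FROM progenitors WHERE gid = :gid
    UNION
    SELECT p.progenitor_gid
    FROM progenitors AS p JOIN ancestors AS a ON p.gid = a.gid
)
SELECT n.name FROM ancestors AS a JOIN names AS n ON n.gid = a.gid
ORDER BY n.name
"""


def ancestor_names(conn, gid):
    """Names of every germplasm record in the pedigree of `gid`."""
    rows = conn.execute(ANCESTOR_QUERY, {"gid": gid}).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO germplasm VALUES (?, ?, ?)", [
    (1, "collection", "site A"),
    (2, "collection", "site B"),
    (3, "cross", "station"),
    (4, "selection", "station"),
])
conn.executemany("INSERT INTO names VALUES (?, ?)", [
    (1, "Acc-1"), (2, "Acc-2"), (3, "Cross-3"), (4, "Sel-4"),
])
# Record 3 is a cross of 1 and 2; record 4 is a selection from 3.
conn.executemany("INSERT INTO progenitors VALUES (?, ?)", [(3, 1), (3, 2), (4, 3)])

pedigree = ancestor_names(conn, 4)
```

Here the cross plays the role of generative germplasm (two progenitors) and the selection the role of derivative germplasm (one source); both are traced by the same query.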
LesquIS GMS contd.
• Form for loading new accessions
LesquIS GMS contd.
• Form for loading attributes
LesquIS web
• Custom web interface to search Lesquerella germplasm records
• Technology used –
  • Microsoft Active Server Pages
  • MySQL server database
• Hosted on IIS server
LesquIS web contd.
• MS-Access to MySQL conversion
• Custom tool – Navicat
• MySQL Enterprise Manager
LesquIS web demo
• Lesquerella Information System
VernIS GMS
• Vernonia (ironweed)
  • Contains oil rich in epoxy fatty acids.
  • Potential use as plasticizers and additives in PVC, and as a drying agent in paints.
  • Research initiated at ALARC in 1990.
VernIS contd.
• Collection information in Excel.
• Same methodology and tools used as in LesquIS.
• No changes to tools were necessary.
• VernIS GMS database implemented
VernIS Web
• Similar to LesquIS web
• Used the same code-base and process
• Hosted on the same web server and database server
VernIS web demo
• Vernonia Information System
Excel workbook for data collection and data interchange
• Preformatted workbook for collecting data of field experiments
  • Not all workers need or want a complex system like ICIS
• Functionalities to import/export data
• XML output
• ICASA standards
Workbook demo
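
As a rough picture of the workbook's import/export and XML output, the sketch below round-trips worksheet rows through XML using Python's standard library. It is an illustration only: the element names (`experiment_data`, `record`, `field`) and the column names are invented for this example and are not the ICASA variable names or the workbook's actual output format.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(sheet_name, rows):
    """Serialise worksheet rows (a list of column -> value dicts) to an XML string."""
    root = ET.Element("experiment_data", {"sheet": sheet_name})
    for row in rows:
        record = ET.SubElement(root, "record")
        for column, value in row.items():
            field = ET.SubElement(record, "field", {"name": column})
            field.text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(xml_text):
    """Parse the XML produced by rows_to_xml back into a list of dicts."""
    root = ET.fromstring(xml_text)
    return [
        {field.get("name"): field.text for field in record.findall("field")}
        for record in root.findall("record")
    ]


# Made-up rows standing in for one worksheet of field-experiment data.
rows = [
    {"treatment": "1", "planting_date": "2006-03-15", "yield_kg_ha": "1850"},
    {"treatment": "2", "planting_date": "2006-03-15", "yield_kg_ha": "2100"},
]
xml_text = rows_to_xml("field_trial", rows)
restored = xml_to_rows(xml_text)
```

Values come back as text, so a real exchange format would also need to carry units and data types, which is the kind of detail a shared standard such as ICASA is meant to pin down.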
Conclusion and Future direction
• DIVA
• LesquIS and VernIS
• ICASA
Special Thanks
• Dr. Rosemary Renaut
• Dr. Jeffrey W. White
• Dr. Hasan Davulcu
• Dr. Dave Dierig
• Pernell Tomasi
• Dr. Andrew Salywon
Questions?