[ieee 2014 third international conference on agro-geoinformatics - beijing, china...

GCDViewer: an Online Data Query, Visualization and Analysis System for Global Climatic Data

Hao Xu, Ministry of Education Key Laboratory for Earth System Modeling, Center for Earth System Science, Tsinghua University, Beijing, China. E-mail: [email protected]

Yuqi Bai, Ministry of Education Key Laboratory for Earth System Modeling, Center for Earth System Science, Tsinghua University, Beijing, China. E-mail: [email protected]

Abstract—Climatic data are critical in supporting agricultural and ecological studies. Traditionally, these data have been managed and accessed locally. To demonstrate the value of cyberinfrastructure and to ease access to the data, this paper presents a case study of managing, visualizing and analyzing global multi-station climatic data in a Web-based environment.

This study took the 1961-1990 global standard climate normals data published by the World Meteorological Organization (WMO) as an exemplar. It proposed a relational data model to store the climate normals data. It used SQL Server to store and manage the data as Web-accessible data services and ArcGIS Server as the GIS server to enable on-demand access and user map operations, and it presented an ASP.NET-based Web portal for scientists to carry out data search, data visualization and data analysis.

The system design and the implementation details of this prototype are introduced. This case study shows that unstructured climatic text data can be transformed into a structured database, which in turn enables on-demand data sharing, search, retrieval and online analysis. GCDViewer's design principle and system architecture could apply to many other types of scientific data. It shows the advantage of building cyberinfrastructure to improve overall research efficiency.

Keywords—Climatic Data; Online Analysis; Cyberinfrastructure

I. INTRODUCTION

Turing Award winner Jim Gray proposed that the paradigm of scientific research has shifted from experimental observation, theoretical analysis and simulation into a new phase of data-intensive analysis and integration [1]. Meteorology and climate are typical data-intensive research fields in which large amounts of different types of observation data and model output data need to be effectively archived, shared, analyzed, visualized and integrated [2].

Many Web-based data sharing platforms for climatic and meteorological research are in operation. The National Weather Service (NWS) of the United States has applied GIS and Web-based technologies to disseminate near-real-time weather monitoring information [3]. The National Oceanic and Atmospheric Administration (NOAA) provides Web-based data analysis and GIS spatial analysis functions through the "Climate Data Online" platform (http://www.ncdc.noaa.gov/cdo-web/). It gives users access to NOAA's archive of historical weather and climate data as well as station history information. The meteorological departments of China and Australia have also introduced their own climatic and meteorological data sharing solutions (http://www.bom.gov.au, http://cdc.cma.gov.cn). Users can customize input parameters such as time, stations and methods. Analytic results can be visualized as charts and graphs [2].

These existing climatic data systems are comprehensive and useful, but they are inflexible: the query, visualization, and analysis functions they provide can only be applied to the climatic data archived on their own servers. Researchers, however, usually store climatic data in their local working environments. It is therefore desirable to open these data systems to end users, so that they can upload the climatic data of interest and apply the data manipulation functions that the systems already offer.

Focusing on the WMO global standard climate normals data, this case study demonstrated a feasible technical solution for such an open system, in which both the locally archived climatic data and remote datasets, once ingested into the database, can be easily queried, visualized, and analyzed.

II. SYSTEM DESIGN

A. Data Source

This study used the 1961-1990 global standard climate normals data published by WMO. This data product covers climatological standard normals computed from the data of more than 4000 stations in 135 countries and territories worldwide.

Data files include ASCII data files (file extension .dat), documentation files (file extension .txt), eye-readable ASCII table files (file extension .txt in subdirectories of the TABLES directory), graphics files (file extension .pcx), and limited extraction software (file extension .exe). The data files (file extension .dat) are large unstructured files in a fixed length record format. The normals data were computed by the Member countries and territories of WMO.

This study mainly used the station metadata, the data files and some of the documentation files. The station metadata identify each station and include the WMO region number, the WMO international index number, location information (latitude, longitude, elevation) and the station name as provided by the WMO Member. Each station is uniquely identified by the combination [region number-country code-WMO station number-national identification number]. The WMO global standard normals data are archived sorted by station (region number, country code, WMO station number, national identification number) and then by parameter (climatic element, statistic, qualifier code). Each data record consists of station identification information, data period, parameter information, normal data values, and quality control (QC) codes. In addition, some documentation files describe and explain the country codes, region codes, climatic element codes and statistic codes used in the data files. All the original data and station files are stored as .dat files.
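The composite station identifier and the archive sort order described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the class names, field names and the helper archive_order are hypothetical, and all codes are treated as plain strings, so ordering is lexicographic.

```python
from typing import NamedTuple


class StationKey(NamedTuple):
    """Composite station identifier (field names are hypothetical)."""
    region: str
    country: str
    wmo_number: str
    national_id: str


class RecordKey(NamedTuple):
    """Sort key of one data record: station first, then parameter."""
    station: StationKey
    element: str
    statistic: str
    qualifier: str


def archive_order(keys):
    """Sort record keys by station, then by parameter.

    Named tuples compare field by field, so the nested key gives the
    station-then-parameter order without a custom comparison function.
    """
    return sorted(keys)
```

Because the key is a tuple, it can also serve directly as a dictionary key when grouping records by station.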

The original files we had to process were the data files, the station metadata files, the region codes file, the country codes file, the climatic elements codes file and the statistic codes file. To make the data import more convenient, we developed several text processing tools that split the records into columns in batch and write them out with separators. We then chose SQL Server 2005 to store and manage these attribute data.
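A text processing step of this kind could look like the Python sketch below, which slices fixed-length records into columns and writes them out as delimiter-separated text. The column layout (names and offsets) is purely hypothetical; the real record layout is defined in the WMO documentation files and is not reproduced in this paper. The function names are also illustrative.

```python
import csv
import io

# HYPOTHETICAL column layout: (field name, start offset, end offset).
# The actual WMO fixed-length layout must be taken from the
# documentation files shipped with the data product.
LAYOUT = [
    ("region", 0, 1),
    ("country", 1, 3),
    ("station_num", 3, 8),
    ("element", 8, 10),
    ("value", 10, 16),
]


def split_record(line, layout=LAYOUT):
    """Slice one fixed-length record into a dict of stripped strings."""
    return {name: line[start:end].strip() for name, start, end in layout}


def to_delimited(lines, layout=LAYOUT, delimiter=","):
    """Convert fixed-length records to delimiter-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow([name for name, _, _ in layout])  # header row
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = split_record(line, layout)
        writer.writerow([record[name] for name, _, _ in layout])
    return out.getvalue()
```

The delimited output can then be loaded with the bulk import facility of whichever database is used.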

B. Data Management

Figure 1 the Global Weather Station Map

The system data mainly consist of spatial data and attribute data. We used the ESRI shapefile format to store the spatial data, and ArcGIS Server was used to distribute and visualize them. The spatial data were built from the station location information and a global base map (see Figure 1). The base map layers and analytic layers are displayed in clients' browsers through ASP.NET pages.
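As an illustration of how point features can be derived from the station location information, the Python sketch below builds a GeoJSON feature collection from station records. GeoJSON is used here only as a stand-in that needs no third-party library; the system itself stores spatial data as shapefiles. The dictionary keys and function names are assumptions.

```python
import json


def station_to_feature(station):
    """Build one GeoJSON point feature from a station record (a dict)."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            # GeoJSON orders coordinates as [longitude, latitude].
            "coordinates": [station["longitude"], station["latitude"]],
        },
        "properties": {
            "station_num": station["station_num"],
            "station_name": station["station_name"],
        },
    }


def stations_to_geojson(stations):
    """Serialize all stations that have a usable location."""
    features = [
        station_to_feature(s)
        for s in stations
        if s.get("latitude") is not None and s.get("longitude") is not None
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})
```

Stations without coordinates are skipped rather than plotted at a default position.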

We used SQL Server 2005 to store and manage the attribute data. The tables we created were:

1) Country table: country code and country name

2) Region table: region code and region name

3) Climatic elements table: climatic elements code, unit and description.

4) Statistic elements table: statistic elements code and description.

5) Station table: station ID, region code, country code, station number, station identifier, latitude, longitude, station name and country name.

6) Data table: data ID, region code, country code, station number, station identifier, start time of record, end time of record, climatic element code, statistic element code, statistic element identifier, climate normal for every month (Jan, Feb, ..., Dec), annual climate normal and corresponding QC codes.

Based on the tables above, we designed a relational model to describe and manage the relations between the tables (see Figure 2).

allnorms (Data Table)
  [Data ID]                    bigint          Not Null  PK
  [Region Code]                nvarchar(255)   Not Null  FK
  [Country Code]               nvarchar(255)   Not Null  FK
  [Station Num.]               nvarchar(255)   Not Null
  [Sta. Identifier]            nvarchar(255)
  [Begin Year]                 nvarchar(255)
  [End Year]                   nvarchar(255)
  [elements code]              nvarchar(255)             FK
  [Statistic Code]             nvarchar(255)             FK
  [Statistic Identifier]       nvarchar(255)
  Jan/Feb.../Dec               float
  Annual                       float
  QC                           nvarchar(255)
  [QC Test]                    nvarchar(255)

Countries (Country Table)
  code                         nvarchar(255)   Not Null  PK
  [Country or Territory Name]  nvarchar(255)   Not Null

Regions (Region Table)
  code                         nvarchar(255)   Not Null  PK
  Region                       nvarchar(255)   Not Null

Stations (Station Table)
  Station_ID                   int             Not Null  PK
  [Region Code]                nvarchar(255)   Not Null  FK
  [Country Code]               nvarchar(255)   Not Null  FK
  [Station Num.]               nvarchar(255)   Not Null
  [Sta. Identifier]            nvarchar(255)
  [station name]               nvarchar(255)   Not Null
  latitude                     float
  longitude                    float
  elevation                    float
  [Country or Territory Name]  nvarchar(255)

Elements (Climatic Elements Table)
  code                         nvarchar(255)   Not Null  PK
  Units                        nvarchar(255)
  Description                  nvarchar(255)

Statistics (Statistic Elements Table)
  code                         nvarchar(255)   Not Null  PK
  Description                  nvarchar(255)

Figure 2 Relational Model of WMO Climatic Data
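The relational model of Figure 2 can be approximated with the Python sketch below. It is an illustration under several assumptions: SQLite (through Python's built-in sqlite3 module) stands in for SQL Server 2005, the column names are simplified, nvarchar and float are mapped to TEXT and REAL, and the foreign-key targets are inferred from the table descriptions rather than stated in the paper.

```python
import sqlite3

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
# One REAL column per monthly normal.
MONTH_COLUMNS = ", ".join(f'"{m}" REAL' for m in MONTHS)

# Simplified schema; FK targets are an assumption, not from the paper.
SCHEMA = f"""
CREATE TABLE Regions    (code TEXT NOT NULL PRIMARY KEY, region TEXT NOT NULL);
CREATE TABLE Countries  (code TEXT NOT NULL PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Elements   (code TEXT NOT NULL PRIMARY KEY, units TEXT, description TEXT);
CREATE TABLE Statistics (code TEXT NOT NULL PRIMARY KEY, description TEXT);
CREATE TABLE Stations (
    station_id     INTEGER PRIMARY KEY,
    region_code    TEXT NOT NULL REFERENCES Regions(code),
    country_code   TEXT NOT NULL REFERENCES Countries(code),
    station_num    TEXT NOT NULL,
    sta_identifier TEXT,
    station_name   TEXT NOT NULL,
    latitude REAL, longitude REAL, elevation REAL
);
CREATE TABLE allnorms (
    data_id        INTEGER PRIMARY KEY,
    region_code    TEXT NOT NULL REFERENCES Regions(code),
    country_code   TEXT NOT NULL REFERENCES Countries(code),
    station_num    TEXT NOT NULL,
    sta_identifier TEXT,
    begin_year TEXT, end_year TEXT,
    element_code   TEXT REFERENCES Elements(code),
    statistic_code TEXT REFERENCES Statistics(code),
    statistic_identifier TEXT,
    {MONTH_COLUMNS},
    annual REAL, qc TEXT, qc_test TEXT
);
"""


def create_schema(conn):
    """Create all tables; SQLite enforces FKs only when the pragma is on."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
```

With foreign keys enabled, a data record that refers to an unknown region or country code is rejected at insert time, which keeps the code tables and the data table consistent.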

C. System Architecture

A multi-tier Browser/Server (B/S) architecture was developed in this case study. Figure 3 shows the architecture of the system.

[Figure 3: diagram with four layers: Client Layer (User), Web Layer (Web Server, IIS 7.0), Middle Layer (ArcGIS Application Server), Data Layer. Annotations in the diagram: "Querying requested data by SQL command and visualizing the analytic results. Receiving the processing results from ArcGIS Server."; "Map display, zooming, pan, querying etc."]

Figure 3 System Architecture

All the analysis operations are executed on the server side. The system functions consist of map operations and parameter selections. Users can search for stations on the map or select them from drop-down lists.

ArcGIS Server serves as the application server in the middle layer and is responsible for processing and generating the map visualization.

The data layer consists of spatial data and attribute data. We used ArcGIS to manage the spatial data and SQL Server 2005 to store and manage the attribute data.

III. FUNCTION AND IMPLEMENTATION

The whole system was programmed in the ASP.NET environment. The programming language used on the server side was C#. ArcGIS Server and its Application Program Interface (API) were also used for displaying spatial data. The Asynchronous JavaScript and XML (AJAX) technique was used in the system: when a user operates the map or submits an analysis request, only part of the page is refreshed, which provides a better user experience.

The system mainly has three core functions: data upload, data analysis and map operation (see Figure 4).

Online Analysis System for Climatic Data

Data Analysis: Select Region & Country; Select Stations (add/delete); Select Analysis Elements/Method; Select Visualization Method; Generate charts/table

Map Operations: Zoom in/Zoom out; Pan; Full Extent; Previous/Next View; Select Stations/Highlight

Data Upload: Upload WMO formatted Data; Data Verification

Figure 4 Overall functions of the system

Users can select stations from the drop-down list, or pick them directly on the map. They can select parameters of interest, e.g. Dry Bulb Temperature, to visualize and analyze. They can further select the statistic method to apply to the matched weather station data, and define how the analysis results should be displayed, e.g. as a clustered column chart.
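The step from user selections to chart data can be sketched as a parameterized SQL query, shown below in Python. This is an assumption-laden illustration: sqlite3 stands in for SQL Server, the table and column names follow a simplified version of the data table, and the functions build_query and monthly_series are hypothetical rather than part of GCDViewer.

```python
import sqlite3

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def build_query(station_nums, element_code, statistic_code):
    """Build a parameterized query for the selected stations and parameter.

    User input is passed only through placeholders, never pasted into
    the SQL text, which avoids SQL injection.
    """
    placeholders = ", ".join("?" for _ in station_nums)
    columns = ", ".join(f'"{m}"' for m in MONTHS)
    sql = (
        f"SELECT station_num, {columns} FROM allnorms "
        "WHERE element_code = ? AND statistic_code = ? "
        f"AND station_num IN ({placeholders}) "
        "ORDER BY station_num"
    )
    return sql, [element_code, statistic_code, *station_nums]


def monthly_series(conn, station_nums, element_code, statistic_code):
    """Return {station_num: [12 monthly normals]}, ready for charting."""
    if not station_nums:
        return {}  # nothing selected, nothing to query
    sql, params = build_query(station_nums, element_code, statistic_code)
    return {row[0]: list(row[1:]) for row in conn.execute(sql, params)}
```

Each dictionary entry corresponds to one series in a clustered column chart, with one column per month.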

As shown in the lower part of Figure 5, the analysis results are visualized as a graph and also presented as a table. Users can also upload formatted climatic data that are archived in their local working environment. As long as the data comply with the format defined by WMO, they can be remotely ingested into the database and then become available for search, visualization and analysis as well.
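The data verification step that precedes ingestion could be sketched as follows. The paper does not list the checks GCDViewer performs, so the rules here (known region, country and element codes, numeric or empty monthly values) are assumptions, as are the field names and the function verify_record.

```python
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def verify_record(record, regions, countries, elements):
    """Check one parsed upload record against the code tables.

    Returns a list of problem descriptions; an empty list means the
    record passed all (hypothetical) checks and may be ingested.
    """
    problems = []
    checks = (("region", regions), ("country", countries), ("element", elements))
    for field, allowed in checks:
        if record.get(field) not in allowed:
            problems.append(f"unknown {field} code: {record.get(field)!r}")
    for month in MONTHS:
        raw = record.get(month, "")
        if raw == "":
            continue  # an empty field is treated as a missing value
        try:
            float(raw)
        except (TypeError, ValueError):
            problems.append(f"non-numeric value for {month}: {raw!r}")
    return problems
```

Collecting all problems instead of stopping at the first one lets the portal report every issue in an uploaded file in a single pass.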

Figure 5 Overall system interface

IV. OUTLOOK AND CONCLUSION

GCDViewer demonstrates the feasibility of maintaining climatic data in a relational database, providing Web-based data sharing, search, visualization and analysis functions, and supporting remote data ingestion so that these functions can also be applied to data archived in researchers' local working environments. The main advantage of this system is that it eases access to the global climatic data. The design principle and system architecture of GCDViewer could apply to many other types of scientific data.

Furthermore, by providing Web-accessible data manipulation and analysis functions, GCDViewer demonstrates the advantage of research cyberinfrastructure, a concept proposed by the U.S. National Science Foundation. Cyberinfrastructure is defined as an advanced environment that supports scientific research through data acquisition, data storage, data management, data integration, data analysis and data visualization, and GCDViewer shows how the efficiency of climatic data analysis can be improved in such an environment [4-7]. Users do not have to archive and manipulate the climatic data locally. Instead, much of the analysis work can be done with a mouse click.

REFERENCES

[1] Hey, A.J., S. Tansley and K.M. Tolle, The fourth paradigm: data-intensive scientific discovery. 2009.

[2] Gao, M., L. Jie and W. Zhang, Data Sharing System for Meteorological Researches. Journal of Applied Animal Meteorological Science, 2004. 15(z1): p. 17-25.

[3] Xianju, Z., Research and Application of Meteorological Service. 2009, Wuhan University of Technology.

[4] Wang, S. and M.P. Armstrong, A theoretical approach to the use of cyberinfrastructure in geographical analysis. International Journal of Geographical Information Science, 2009. 23(2): p. 169-193.

[5] Hey, T. and A.E. Trefethen, Cyberinfrastructure for e-Science. Science, 2005. 308(5723): p. 817-821.

[6] Council, C., Cyberinfrastructure vision for 21st century discovery. 2007: National Science Foundation, Cyberinfrastructure Council.

[7] Yang, C., et al., Geospatial cyberinfrastructure: past, present and future. Computers, Environment and Urban Systems, 2010. 34(4): p. 264-277.