
Page 1: [IEEE 2014 Third International Conference on Agro-Geoinformatics - Beijing, China (2014.8.11-2014.8.14)] 2014 The Third International Conference on Agro-Geoinformatics - GIScript:

GIScript: Towards an Interoperable Geospatial Scripting Language for GIS Programming

Mingda Zhang, Peng Yue, Xia Guo

State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing

Wuhan University, Wuhan, China

[email protected]

Abstract—A scripting language is a programming language that is interpreted (translated on the fly) rather than compiled ahead of time. Programming languages used for scripting range from domain-specific languages (e.g., Bash for Unix, JavaScript for Web browsers, VBA for Microsoft Office applications) to general-purpose programming languages (e.g., Python, Ruby, Perl, PHP). Scripting languages are widely used because they allow rapid development, easy communication with other programs, and convenient job control. In the GIS domain, geo-enabled scripting languages such as ArcPy and GeoScript play an increasingly important role in automating workflow-based geoprocessing and map generation. Such scripting languages could be further extended into a parallel computing environment to enable the automation of high performance geocomputation. This calls for an interoperable geospatial scripting language for GIS programming, here named GIScript. This paper proposes a conceptual framework for GIScript, discusses the key considerations for its design, and suggests possible solutions towards an implementation.

Keywords—GIScript, Geospatial scripting language, Interoperability, Parallel execution

I. INTRODUCTION

Scripting languages such as Python and Perl are becoming more and more popular among programmers because of their benefits over conventional system programming languages such as C and Java. These benefits include simple syntax, ease of use, convenient job control, and a “glue” role. The most significant feature of scripting languages is that they are often used as “glue” languages to connect software components, thus facilitating system integration. These “glue” languages often have simple syntax and good support for inter-process communication, and they do not need a separate compilation step. System programming languages are mostly used to create applications from scratch, whereas scripting languages are intended to glue together components written in different system programming languages, which allows significant software reuse. Scripting languages and system programming languages therefore complement each other.

Numerous commercial software packages such as ArcGIS and open source tools such as GRASS (Geographic Resources Analysis Support System) [1], Quantum GIS (QGIS) [2] and GeoTools [3,4] are available for processing geospatial data. Each package has its own algorithm components. In addition, each tool tends to have its own data formats, which creates a large gap between different packages. Moreover, no single tool is comprehensive enough to provide all necessary geoprocessing functions or handle all spatial data formats. Sometimes we need to take the best of each and integrate components from different software packages. Scripting languages can help glue them together.

Some of the earliest work on geospatial scripting languages is the Arc Macro Language (AML) [5]. It is coupled with the workstation Arc/Info architecture, an earlier generation of ESRI GIS software. In a later product, ArcView, the Avenue scripting language was also released. For the current ESRI GIS software, ArcGIS, a Python site package named ArcPy has been developed. It includes tools, functions, classes, and modules, which serve as a software development kit (SDK) and allow users to create simple or complex workflows quickly and easily. GeoScript [6] is another collection of modules for geometry handling, spatial data access, and vector feature rendering. It adds geospatial capabilities to dynamic scripting languages (e.g., Groovy, JavaScript, Python, Scala).

Although a few geospatial scripting languages are available, some of them are limited to specific software such as ArcGIS or offer limited functions. For example, they do not support parallel computing. Parallel computing is the simultaneous use of multiple compute resources to solve a problem. The problem is broken into discrete parts that are executed concurrently [7]. There are two typical kinds of compute resources for parallel computing: a single computer with multiple processors/cores, and clusters (computers connected by a network). With the continuous development of high performance computing (HPC), parallel computing has been widely investigated and employed. In the geospatial domain, a large amount of remotely sensed image data and other spatial data is generated every day and needs timely processing. Parallel computing has become an essential characteristic of next-generation GIS software and needs to be supported by scripting languages.

All of the above calls for a geospatially oriented and more general scripting language that supports parallel computing. This paper presents a conceptual framework for an interoperable GIS scripting language, named GIScript. Key considerations for GIScript are discussed, and possible solutions towards an implementation are suggested.

The remainder of the paper is organized as follows. Section II describes the concept of GIScript. Key considerations in designing GIScript are presented in Section III. Section IV provides possible solutions for the technological implementation of GIScript. Conclusions are given in Section V.

II. GISCRIPT

GIScript is proposed as a conceptual framework for a geospatial scripting language that facilitates geospatial data access and parallel processing. Unlike general-purpose scripting languages, GIScript is meant to make it easier to express geographic objects and to process geospatial information with HPC.

GIScript can be implemented in different ways. A new language and a new runtime environment could be built, but this is time-consuming and not recommended. Extending an existing scripting language could achieve the same goal, and is easier and more acceptable to users. Many scripting languages (e.g., Python, Ruby, Perl, PHP) have a long development history and are widely used. They are stable and have excellent extensions that could be reused to implement GIScript. When selecting an existing scripting language as the starting point, its features, such as support for object orientation, should be an important consideration. The following sections analyze key considerations for GIScript and ground them in existing scripting languages to provide possible solutions.

III. KEY CONSIDERATIONS FOR GISCRIPT

Geospatial data are data that are connected to a place on the Earth. Most human activities are linked directly or indirectly to location, and huge volumes of geospatial data are generated every day. In conventional GIS, there are two types of geospatial data: vector and raster. GIScript should be able to access, process, and visualize both raster and vector data. To support efficient and wide use of GIScript, parallel execution and interoperability are necessary. Although an exhaustive list of the features that a geospatial scripting language must have is still an open problem, the following primary considerations can be identified:

A. Spatial objects

The object-oriented approach is widely used in programming. The object-oriented model is based on a collection of objects, each of which contains values stored in variables together with associated procedures (methods) [8]. In an object-oriented language, spatial objects can be represented by geometry classes. The geometry classes (Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, etc.), together with their attributes and methods, are based on the Geometry Model specified in the OpenGIS Abstract Specification [9]. Each geometric object is associated with a spatial reference system. The spatial predicates of the Dimensionally Extended nine-Intersection Model (DE-9IM) [10] are used to test relationships between geometries, and several spatial operators are used to generate new geometries from existing ones. Other objects include maps, rasters, projections and styles. These objects and predicates can be represented in scripting languages to make it easier to write code for spatial operations.
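As an illustration of how DE-9IM predicates could be exposed in a scripting language, the following minimal Python sketch matches a nine-character intersection matrix against predicate masks. The function names `matches` and `relate` and the `PATTERNS` table are hypothetical and are not part of any library named in this paper; the masks shown are the commonly used DE-9IM patterns for these predicates. Computing the matrix itself from two geometries is assumed to be done elsewhere (for example by an existing geometry library).

```python
"""Match a DE-9IM intersection matrix against a predicate mask (illustrative)."""

# Masks for a few named predicates, written as nine-character DE-9IM patterns.
# 'T' = intersection is non-empty (dimension 0, 1 or 2),
# 'F' = intersection is empty, '*' = any value, a digit = that exact dimension.
PATTERNS = {
    "within": "T*F**F***",
    "contains": "T*****FF*",
    "disjoint": "FF*FF****",
    "equals": "T*F**FFF*",
}


def matches(matrix: str, pattern: str) -> bool:
    """Return True if a nine-character DE-9IM matrix satisfies the pattern."""
    if len(matrix) != 9 or len(pattern) != 9:
        raise ValueError("matrix and pattern must both have nine characters")
    for cell, mask in zip(matrix, pattern):
        if mask == "*":
            continue  # wildcard: any cell value is accepted
        if mask == "T":
            if cell not in "012":
                return False  # 'T' requires a non-empty intersection
        elif cell != mask:
            return False  # 'F' or a digit must match exactly
    return True


def relate(matrix: str, predicate: str) -> bool:
    """Evaluate a named predicate (hypothetical helper) on a DE-9IM matrix."""
    return matches(matrix, PATTERNS[predicate])
```

For example, a point lying in the interior of a polygon has the matrix `0FFFFF212`, so `relate("0FFFFF212", "within")` is true while `relate("0FFFFF212", "disjoint")` is false.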

B. Parallel execution

Parallel computing is a form of computation in which many computations are carried out simultaneously. High performance computing, including parallel computing, has become a significant paradigm in geocomputation. The scripting framework can provide a mechanism for parallel execution either on systems with multiple processors/cores or on clusters (computers connected via a network). Since geocomputation is often data-intensive, computing-intensive, or both, GIScript should provide capabilities to support parallel execution.

C. Interoperability

The variety of software packages, operating platforms, and data formats makes geospatial data processing complicated. Interoperability is a key issue in GIS software development. A bridge is needed to cross the gap between different tools. As a scripting language, GIScript is well suited to glue together existing components and make data conversion easier.

IV. POSSIBLE SOLUTION

To fulfill the goal of GIScript, the capabilities of existing scripting languages need to be investigated and appropriately extended. The Python programming language is an interpreted, portable, object-oriented, high-level language known for its readability, extensibility, and ease of programming [11, 12]. Python is free and open source, so it can be distributed and modified freely. Python uses a platform-independent bytecode that can be executed on almost any computer. A variety of standard extensions for Python are available for reuse. Considering all these features, Python is taken as the example language for implementing GIScript.

A. Adding Spatial classes to Python

Because Python is object-oriented and easily extensible, geometry classes representing spatial objects can be added smoothly. A new module containing spatial objects that follow the OpenGIS Geometry Model will be developed. Every spatial object has its own attributes and methods and is associated with a spatial reference system. Attributes are expressed with the basic data types of Python, and the spatial reference can be treated as a special attribute of a spatial object. Spatial objects should also provide properties for measurements such as length, and methods for determining spatial relationships. Such properties and methods have already been implemented many times in existing software libraries. As Python is a powerful glue language, libraries written in other languages such as C and Java can be integrated to achieve the goal. The Java Topology Suite (JTS) [13] is an API of 2D spatial predicates and functions written in Java, which can be used to determine spatial relationships between spatial objects. GeoTools (a Java toolkit) provides a rich set of classes and functions for spatial data import/export and manipulation, which can be used to handle more complex processing. The remaining task is therefore how to invoke Java libraries from Python.
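To make the intended interface more concrete, the following is a minimal pure-Python sketch of what such geometry classes might look like. All class, attribute, and method names (`Point`, `LineString`, `Polygon`, `srid`, `length`, `area`, `contains`) are assumptions chosen for illustration; the paper proposes delegating the actual geometry algorithms to existing libraries such as JTS or GeoTools, whereas this sketch substitutes simple planar formulas (segment lengths, the shoelace formula, and ray casting) so that it runs on its own.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Coord = Tuple[float, float]


@dataclass(frozen=True)
class Point:
    x: float
    y: float
    srid: int  # spatial reference kept as a plain attribute (e.g. an EPSG code)


@dataclass(frozen=True)
class LineString:
    coords: Tuple[Coord, ...]
    srid: int

    @property
    def length(self) -> float:
        """Planar length: sum of segment lengths, in coordinate units."""
        return sum(math.dist(a, b) for a, b in zip(self.coords, self.coords[1:]))


@dataclass(frozen=True)
class Polygon:
    shell: Tuple[Coord, ...]  # exterior ring; the first vertex is not repeated
    srid: int

    def _edges(self):
        """Yield consecutive vertex pairs, closing the ring at the end."""
        return zip(self.shell, self.shell[1:] + self.shell[:1])

    @property
    def area(self) -> float:
        """Planar area computed with the shoelace formula."""
        twice = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in self._edges())
        return abs(twice) / 2.0

    def contains(self, point: Point) -> bool:
        """Ray-casting test; points exactly on the boundary get no special handling."""
        if point.srid != self.srid:
            raise ValueError("geometries use different spatial reference systems")
        inside = False
        for (x1, y1), (x2, y2) in self._edges():
            if (y1 > point.y) != (y2 > point.y):  # edge crosses the horizontal ray
                x_cross = x1 + (point.y - y1) * (x2 - x1) / (y2 - y1)
                if point.x < x_cross:
                    inside = not inside
        return inside
```

Storing the spatial reference as an ordinary attribute lets a method such as `contains` refuse to compare geometries that are expressed in different reference systems, rather than silently returning a meaningless result.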

Jython [14] is a Java implementation of Python, which runs on the Java platform. It is especially suited for embedded scripting and for interacting with Java packages. However, Jython is not fully compatible with some standard extension modules of CPython, and work to fix this issue is still ongoing. CPython is the default and most widely used implementation of the Python programming language, and “Python” often refers to CPython.

JPype [15] is another effort to give Python programs access to Java packages, bridging the worlds of Java and Python. It uses the Java Native Interface (JNI) to communicate with the JVM (Java Virtual Machine). When JPype starts to run, it first starts a JVM. However, JPype's community is not very active.

Py4J [16] enables Python programs to access Java objects in a JVM. It also supports Remote Procedure Call (RPC). Unlike JPype, Py4J does not start a JVM, so the Java program must be started before the Python code is executed. Py4J uses plain sockets to communicate with the JVM, which is more portable in practice.

Taking compatibility into account, CPython (Python) is a better choice than Jython. JPype and Py4J both work as extensions to CPython. Therefore, the development will leverage these existing implementations to add spatial objects to Python.

B. Adding Parallel Execution Capability to Python

The CPython interpreter uses a Global Interpreter Lock (GIL) for internal bookkeeping, which prevents threads from running Python bytecode in parallel. Extension modules are therefore needed to overcome this limitation, and some modules already address the issue. For example, Parallel Python (PP) [17] is a Python module that provides a mechanism for parallel execution of Python code on SMP (systems with multiple processors or cores) and clusters. It is a powerful, open source module written in pure Python. This module makes it simple to execute Python code in parallel on SMP and clusters and improves computational efficiency significantly. Experiments in both SMP and cluster environments demonstrate the improvement. Fig. 1 shows the time taken to execute eight tasks with 1 core, 2 cores, and 4 cores, respectively. The total time (“Time elapsed since server creation”) decreases significantly as the number of cores increases. Three computers (one with 4 cores, two with 2 cores) are then used to form a cluster environment. Fig. 2 shows that using the 3 computers with 8 cores in total takes almost half the time of one computer with 4 cores.

Fig. 1. Parallel execution of Python code on SMP with PP: panels (a), (b), and (c)

Fig. 2. Parallel execution of Python code on clusters with PP: panels (a), (b), and (c)

With PP, computational resources are auto-discovered and parallel execution becomes easy. This module can be used to add parallel execution capability for geocomputation to the Python language.
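The general pattern of splitting a geocomputation into independent parts and running them in separate processes can be sketched as follows. This is not the PP API: the sketch deliberately substitutes `concurrent.futures.ProcessPoolExecutor` from the Python standard library, which likewise sidesteps the GIL by using multiple processes on a single machine (it does not cover clusters). The function names `split_rows`, `block_sum`, and `parallel_sum`, and the toy "sum of cell values" workload, are assumptions made for illustration.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import List, Sequence


def split_rows(n_rows: int, n_parts: int) -> List[range]:
    """Split row indices 0..n_rows-1 into at most n_parts contiguous blocks."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    size, extra = divmod(n_rows, n_parts)
    blocks, start = [], 0
    for i in range(n_parts):
        # The first `extra` blocks receive one additional row each.
        stop = start + size + (1 if i < extra else 0)
        if stop > start:
            blocks.append(range(start, stop))
        start = stop
    return blocks


def block_sum(block: Sequence[Sequence[float]]) -> float:
    """Toy per-block computation: sum of all cell values in a raster block."""
    return sum(sum(row) for row in block)


def parallel_sum(raster: Sequence[Sequence[float]], workers: int = 4) -> float:
    """Sum a raster by processing row blocks in separate worker processes."""
    blocks = [[raster[r] for r in rows] for rows in split_rows(len(raster), workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Each block is pickled, sent to a worker, and reduced independently.
        return sum(pool.map(block_sum, blocks))
```

When used from a script, the call to `parallel_sum` should be placed under an `if __name__ == "__main__":` guard, since worker processes may re-import the main module. The speedup that can be expected depends on how the per-block work compares with the cost of transferring blocks between processes.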

C. Interoperability

Scripting languages are by nature well suited to integrating existing applications. Their simple syntax and low coding effort make a geospatial scripting language acceptable to programmers. With Python, existing geospatial software tools can be glued together to work collaboratively.

Based on this investigation, the Python version of GIScript will be implemented first. JPype, Parallel Python (PP), and some other Python modules will be used to add the basic features to GIScript.

V. CONCLUSION

This paper proposes the concept of GIScript and suggests key considerations and possible implementation solutions. The primary features of GIScript are spatial objects, parallel execution, and strong interoperability support. With GIScript, developing geospatial applications by integrating existing components or tools is expected to be simpler, faster, and more efficient. The solution is also designed to be extensible. After GIScript is implemented, it will be used to extend the functionality of GeoJModelBuilder [18], a geospatially oriented, open source model builder written in pure Java. Currently, a workflow generated by GeoJModelBuilder is executed by invoking OGC standard services such as the Web Processing Service (WPS) one by one. With GIScript, the workflow can be exported as a GIScript script, and the script can then be executed with various open source packages, such as GRASS GIS, which provides a Python scripting library.

In the suggested solution, Parallel Python (PP) provides the mechanism for parallel execution, which gives GIScript the ability to compute in parallel. However, other paradigms are available for parallel computing: Open Multi-Processing (OpenMP) [19], Message Passing Interface (MPI) [20], MapReduce [21] and GPU-based approaches [22], which can take advantage of multi-core hardware, graphics cards, and clusters. Future work will also investigate how to support these paradigms in GIScript.

Python and most of its extension modules are open source, which makes customization easy. As suggested in the possible solution, JPype, Parallel Python, and some other extension modules will be integrated to build a GIScript prototype. When they are combined, some modules may need to be modified or rewritten. The implementation will be open source and is expected to be maintained and used by the user community.

ACKNOWLEDGMENT

This research is supported by the National Basic Research Program of China (2011CB707105), the National Natural Science Foundation of China (41271397), the Program for New Century Excellent Talents in University (NCET-13-0435), and the Fundamental Research Funds for the Central Universities.

REFERENCES

[1] M. Neteler, M.H. Bowman, M. Landa, M. Metz, “GRASS GIS: A multi-purpose open source GIS,” Environmental Modelling & Software, pp. 124–130, 2012.

[2] QGIS home page, http://www.qgis.org/en/site/ (accessed July 3, 2014).

[3] G.B. Hall, M.G. Leahy (Eds.), Open source approaches in spatial data handling, Springer, Berlin, pp. 153–169, 2008.

[4] GeoTools home page, http://www.geotools.org/ (accessed July 3, 2014).

[5] ESRI, ARC Macro Language, ESRI Press, pp. 828, 1995.

[6] GeoScript home page, http://geoscript.org/ (accessed July 3, 2014).

[7] A. Grama, G. Karypis, V. Kumar and A. Gupta, Introduction to Parallel Computing, second ed., Addison Wesley, Jan. 2003.

[8] J. Rumbaugh, M. Blaha, W. Premerlani, F. Eddy, W. Lorensen, Object-Oriented Modelling and Design, 1991.

[9] Open GIS Consortium, Inc., The OpenGIS abstract specification, topic 1: Feature geometry, OGC, 1999.

[10] C. Strobl, Dimensionally extended nine-intersection model (DE-9IM), In: Encyclopedia of GIS, Springer, Berlin, pp. 240–245, 2008.

[11] M.F. Sanner, “Python: a programming language for software integration and development,” J. Mol. Graph. Mod. 17: 57–61, 1999.

[12] Python home page, https://www.python.org/ (accessed July 3, 2014).

[13] Vivid Solutions, Java Topology Suite, http://www.vividsolutions.com/jts/JTSHome.htm (accessed July 3, 2014).

[14] Jython home page, the Jython Project, http://www.jython.org/ (accessed July 3, 2014).

[15] S. Menard, 2009, JPype home page, http://jpype.sourceforge.net/ (accessed July 3, 2014).

[16] Py4J home page, http://py4j.sourceforge.net/ (accessed July 3, 2014).

[17] Parallel Python home page, http://www.parallelpython.com/ (accessed July 3, 2014).

[18] M. Zhang, P. Yue, “GeoJModelBuilder: A Java implementation of model-driven approach for geoprocessing workflows,” In Proceedings of Agro-Geoinformatics 2013, IEEE, pp. 393–397.

[19] R. Chandra, R. Menon, L. Dagum, D. Kohr, D. Maydan, and J. McDonald, “Parallel Programming in OpenMP,” Morgan Kaufmann, 2000.

[20] P. Pacheco, “Parallel Programming with MPI,” Morgan Kaufmann, 1996.

[21] J. Dean, S. Ghemawat, “MapReduce: simplified data processing on large clusters,” Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation, 2004, San Francisco, CA.

[22] A. Sheppard, “Programming GPUs: Unleash your Inner Supercomputer,” O’Reilly Media, 2013.