we have data, now what?
We have data, now what? Carol Song, Senior Research Scientist, Rosen Center for Advanced Computing, Purdue University. WGISS-26, September 23, 2008. Understanding and Utilizing Data.
Slide 1/19
We have data, now what?
Carol Song
Senior Research Scientist
Rosen Center for Advanced Computing
Purdue University
WGISS-26, September 23, 2008
Slide 2/19
Understanding and Utilizing Data
• An integrated system for real-time NEXRAD II radar data delivery and 3D visualization, with multi-layer user interfaces to reach a wide audience.
– Collaboration among computer scientists and earth/atmospheric scientists
– Team: V. Sundaram, L. Zhao, C.X. Song, B. Benes, P. Kristof, R. Veeramacheneni, M. Huber.
• Demand-driven subscription system for real-time satellite data delivery
– Purdue Terrestrial Observatory
– Team: R. Kalyanam, L. Zhao, L. Biehl, C.X. Song
• Providing data through services!
Work supported by: National Science Foundation
Slide 3/19
Next Generation Radar (NEXRAD) Level II Data
Weather Surveillance Radar (WSR-88D)
• The data capture three attributes at very fine temporal and spatial resolution: reflectivity, Doppler radial velocity, and spectrum width
• These attributes are vital to understanding, monitoring and predicting severe weather conditions
• There are 135 Radar Stations in the US
• Data are received continuously as a near real-time stream
Figure: Doppler radar tower in Connecticut and the pulsed Doppler radar inside
Acknowledgment: Figures are downloaded from the websites www.CCSU.edu and www.answers.com.
Slide 4/19
NEXRAD II Data Generation
• 3D structure in radar data
– Continuous rotation over 360° in azimuth
– Simultaneous increase in elevation by 1° to 3° per complete sweep
• Continuous NEXRAD Level II radar data stream
– Data files vary in size: a few MB to tens of MB each, depending on the weather conditions.
– Data compressed with a modified bzip2
– The temporal resolution is 4-5 minutes in severe weather vs. 9-10 minutes in calm weather
Structure of Doppler Radar Data (Reflectivity)
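The scan geometry on this slide (rotation in azimuth, stepped elevation, samples along the beam) is what gives the data its 3D structure. A minimal sketch of placing one sample in local Cartesian coordinates follows. It assumes a flat earth and a straight beam, which the slides do not state, and the function name `gate_to_cartesian` is hypothetical.

```python
import math


def gate_to_cartesian(azimuth_deg, elevation_deg, range_km):
    """Convert one radar sample from (azimuth, elevation, range) to x, y, z in km.

    Simplifying assumptions (not from the slides): flat earth, straight beam,
    azimuth measured clockwise from north, origin at the radar antenna.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    ground = range_km * math.cos(el)  # horizontal distance from the radar
    x = ground * math.sin(az)         # east
    y = ground * math.cos(az)         # north
    z = range_km * math.sin(el)       # height above the antenna
    return x, y, z
```

Applying this to every sample of every sweep yields the point set from which a regular 3D volume can be resampled. A production conversion would also account for earth curvature and beam refraction.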
Slide 5/19
NEXRAD II Data Distribution
• The National Climatic Data Center (NCDC) houses the data and provides a central clearinghouse of archived Level II data as a resource to the research, teaching, and technology development communities.
• Distributed through four top tier distributors
• Purdue makes it available on the NSF TeraGrid
• Opportunity!
– The near real-time availability of high-resolution radar data provides an exciting opportunity for meteorologists if the data can be accessed and visualized in 3D in a timely manner.
– Super-resolution data becoming available as we speak
Slide 6/19
Technical Challenges
• Large volume and real-time streaming (50 MB/s) present major computational and data management challenges.
• Super-resolution data: even larger data
– Super-resolution data increase the azimuth resolution from 1 degree to 0.5 degree.
– The reflectivity data range resolution from 1 km to 0.25 km...and Doppler data range from 230 km to 300 km for split cuts...generally scans at 1.5 degrees or lower elevation.
– The amount of data collected and transmitted during a volume scan will increase by a factor of approximately 2.3.
• Lack of scale: Analyzing data over a long period or large geographical region requires heavy computation
• Lack of interactive 3D visualizations
– Despite the availability of 3D information in the new generation, the data is most commonly visualized as 2D images, simple 3D point clouds or iso-surfaces.
• Access method: download using FTP/HTTP, with no programmatic access
• Data format: compressed with a modified bzip2 that popular libraries (e.g., RSL) do not support
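To illustrate why a stock bzip2 reader is not enough, here is a minimal sketch of unpacking a record-framed bzip2 file. The slides do not describe the modification, so the framing is an assumption made for illustration: a 24-byte uncompressed volume header, followed by records that each start with a 4-byte big-endian signed length (negative on the last record) and contain one standard bzip2 stream. The names `VOLUME_HEADER_SIZE` and `decompress_level2` are hypothetical.

```python
import bz2
import struct

VOLUME_HEADER_SIZE = 24  # assumed size of the uncompressed volume header


def decompress_level2(raw: bytes) -> bytes:
    """Return the volume header followed by all decompressed records."""
    out = bytearray(raw[:VOLUME_HEADER_SIZE])
    pos = VOLUME_HEADER_SIZE
    while pos + 4 <= len(raw):
        # Assumed framing: 4-byte big-endian signed length before each record.
        (size,) = struct.unpack(">i", raw[pos:pos + 4])
        pos += 4
        length = abs(size)
        if length == 0:
            break
        out += bz2.decompress(raw[pos:pos + length])
        pos += length
        if size < 0:  # a negative length is assumed to mark the final record
            break
    return bytes(out)
```

Once unpacked this way, the output could be handed to a reader that expects uncompressed data, which is the role the data services play for users.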
Slide 7/19
NEXRAD data products
• Online data
– Original streamed data from NWS (compressed), searchable from map and downloadable, most recent months.
– Special event data (severe weather events)
• Data services
– Uncompressed data (through data services)
– Variable values (e.g., reflectivity, radial velocity)
– Pre-generated 3D volumes
• Access methods
– Data portal
– THREDDS, OPeNDAP
– Third party viewers (e.g., IDV, Java NEXRAD viewer)
– Programming interfaces (C++ library API)
– New: near real-time, interactive 3D visualization
Slide 8/19
An End-to-End Integrated System
• Three important components:
– Data Management
  • Download required files from SRB and uncompress using modified bzip2
– Data Processing
  • Read the radar files using RSL
  • Process the data from multiple sites
  • Convert them into render-able 3D volumes
– Visualization/Data Rendering
  • Import the volumetric data from the disk.
  • Create 3D textures and slices and apply the texture-based volume-rendering techniques.
  • Utilize transfer functions to render the data on GPU.
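A transfer function maps each data value to a color and opacity before the volume is composited. The sketch below shows the idea on the CPU with piecewise-linear interpolation. The control points and the name `transfer` are hypothetical, not the values used in the system, and a GPU renderer would typically bake such a table into a 1D lookup texture.

```python
from bisect import bisect_right

# Hypothetical control points: (reflectivity in dBZ, (R, G, B, A)), components in 0..1.
CONTROL_POINTS = [
    (0.0, (0.0, 0.0, 1.0, 0.0)),   # weak echo: blue, fully transparent
    (30.0, (0.0, 1.0, 0.0, 0.3)),
    (50.0, (1.0, 1.0, 0.0, 0.6)),
    (70.0, (1.0, 0.0, 0.0, 1.0)),  # strong echo: red, opaque
]


def transfer(dbz):
    """Map a reflectivity value to RGBA by linear interpolation between control points."""
    xs = [point[0] for point in CONTROL_POINTS]
    if dbz <= xs[0]:
        return CONTROL_POINTS[0][1]
    if dbz >= xs[-1]:
        return CONTROL_POINTS[-1][1]
    i = bisect_right(xs, dbz)  # index of the first control point above dbz
    (x0, c0), (x1, c1) = CONTROL_POINTS[i - 1], CONTROL_POINTS[i]
    t = (dbz - x0) / (x1 - x0)
    return tuple(a + t * (b - a) for a, b in zip(c0, c1))
```

Letting users edit the control points is one way to provide the color-mapping manipulation that an expert interface needs.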
Slide 10/19
Scaling using TeraGrid
• How to scale? Key observations:
– Spatial parallelism: between stations
– Temporal parallelism: volumes generated for different intervals are independent
– Data access can be parallel as well
• Two types of computation tasks
– Processing per station per interval
– Merging: combines 3D volumes from all sites and creates the full 3D volume for each interval
• Granularity of parallelization
– Depends on the processing power available
– Either fine grained (per site per interval) or coarse grained (per site)
– Using Condor DAGMan to orchestrate jobs
Figure: On the TeraGrid, a main job fans out to Processing Site 1, Processing Site 2, ..., Processing Site N, whose outputs feed a Merge job.
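The fan-out and merge pattern on this slide can be written as a DAGMan input file. The sketch below generates one for the coarse-grained (per site) case. The submit file names `process.sub` and `merge.sub`, the `site` variable, and the function `build_dag` are assumptions for illustration; only the `JOB`, `VARS`, and `PARENT ... CHILD` keywords come from DAGMan itself.

```python
def build_dag(sites, submit_file="process.sub", merge_submit="merge.sub"):
    """Return DAGMan input text: one processing job per site, then a merge job."""
    lines = []
    for site in sites:
        lines.append(f"JOB process_{site} {submit_file}")
        # Pass the station identifier to the (hypothetical) submit description.
        lines.append(f'VARS process_{site} site="{site}"')
    lines.append(f"JOB merge {merge_submit}")
    parents = " ".join(f"process_{site}" for site in sites)
    # The merge job starts only after every per-site job has finished.
    lines.append(f"PARENT {parents} CHILD merge")
    return "\n".join(lines) + "\n"
```

For example, `build_dag(["KIND", "KLOT"])` produces two processing jobs and a merge job that depends on both. The fine-grained variant would add one job per site per interval, with one merge job per interval.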
Slide 11/19
Example
Images rendered at different timestamps from a dataset covering a 24-hour supercell storm on March 12, 2006, in the Midwest region of the United States.
Slide 12/19
Hurricane Ike remnant
• Hurricane Ike, data from 4 stations (3 in IL and 1 in IN) between 10 a.m. and noon on Sept. 14, 2008
Slide 13/19
A Service Architecture
Slide 14/19
Services through multiple interfaces
• Expert use mode
– Need to see details (large data, lots of processing), highly interactive, ability to manipulate color mapping and other settings.
– With accelerated graphics hardware
• Learning/casual use mode
– Simple interface, no learning curve
– Does not require a high degree of detail
• Remote access mode
– Through web browser
– No special hardware
– Need interactivity
• Application developers
– Need API or web service interfaces to integrate with their applications
Slide 15/19
Workload distribution & Scalability
• Web 2.0 gadget for the masses
– Data preprocessed, rendered, composed into animation on server; animation (or sequence of images) sent over web
• Desktop client for maximum interactivity and performance
– Data preprocessed offline and 3D data volumes cached on server
– 3D graphics rendering on user's computer (GPU enabled)
• Web browser access for interactivity but slower display
– Data preprocessed offline, 3D volumes cached and rendered into 3D graphics
– Images sent over the network
– User accesses the interactive application through a VNC-based Java applet
Slide 16/19
Reach out to the masses
A LiveRadar3D Google gadget displaying 3D visualization of radar data, continuously updated with streaming data
Slide 17/19
The Fully Interactive 3D Visualization Client
Slide 18/19
3D Visualization of all stations
Slide 19/19
Summary
• Remote 3D visualization services delivered through multiple interfaces
• Application interface of data services for third party integration
• An architecture that scales to different use scenarios
• Parallel data pre-processing using the TeraGrid Condor resources and partial volume caching, which improve the response time and scalability of the system.
Continuing effort
• User feedback
• Scale: support multiple users simultaneously
• Hierarchical 3D volume structure to support multi-scale investigation
Slide 20/19
Thank you!
Publications and URLs are available. Feel free to contact Carol.
Slide 21/19
PRESTIGE: Purdue Real-Time Satellite Information Gateway
• User Requirement
– Receive continuous data updates
– Real-time or near-real-time access
– Custom-tailored data configurations
• Current Systems
– Impossible to generate complete range of data products
– Have to route through the support staff
– Manual process which is time consuming and error-prone
Slide 22/19
Range of MODIS Data Products
• Level 1A (MOD01)
• Vegetation Index (MOD09)
• Geolocation (MOD03)
• Aerosol (MOD04)
• Water Vapor (MOD05)
• Clouds (MOD06)
• Atmospheric Profiles (MOD07)
• Reflectance (MOD09)
• Snow (MOD10)
• Fire Detection (MOD14)
• Ocean Color (MOD18)
• Sea Surface Temperature (MOD28)
• Sea Ice (MOD29)
• Cloud Mask (MOD35)
• Also multiday composites of the above
Note that each data product may contain a few to many variables.
Slide 23/19
System Design
• User-driven publish/subscribe model
– Dynamic data generation
– User specifies, controls, and receives custom-tailored data
– Continuous data updates in near-real-time
– Multiple ways to access the data
Slide 24/19
Slide 25/19
Satellite Data Subscription
Slide 26/19
Data Subscription
• Web portal based user interface
– Choice list based option selection
– Options include: Satellite, Coverage area, Data product, Projection type and Data format
– Ability to select date range for subscription validity
– User-driven product choice expansion
– Individual user-based subscriptions
• User-initiated data production
– Data products generated only when some user is subscribed to the product
– Data production automatically turned off when no active subscription exists
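The rule that production runs only while someone is subscribed can be sketched with a small in-memory registry. This is an illustration under assumptions, not the PRESTIGE implementation: the class name `SubscriptionRegistry` and its methods are hypothetical, and date-range validity and persistent storage are left out.

```python
from collections import defaultdict


class SubscriptionRegistry:
    """Track which users subscribe to which data products."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # product -> set of user ids

    def subscribe(self, user, product):
        self._subscribers[product].add(user)

    def unsubscribe(self, user, product):
        users = self._subscribers.get(product)
        if users is None:
            return
        users.discard(user)
        if not users:
            # Last subscriber left: the product drops out of the active set.
            del self._subscribers[product]

    def subscribers(self, product):
        return set(self._subscribers.get(product, ()))

    def active_products(self):
        """Products that should currently be generated."""
        return set(self._subscribers)
```

A processing cluster could poll `active_products()` before each run and skip any product that is not in the set.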
Slide 27/19
Data Notification
• Push-based notifications
– Near real-time delivery of new data notification through email
– Implemented by automatically invoking a web-service from the processing cluster when new data is available
– Subscription database used to query active subscriptions
• Data delivery mechanism
– Data scp'ed from processing cluster to webserver-accessible storage space
– Thumbnail generated for images to provide a quick look feature
– Link to the webserver data location provided in the notification email
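The notification step, composing an email that carries the link to the new data, might look like the sketch below. The sender address, the URL in the usage note, and the function name `build_notification` are placeholders; the slides do not give the message format, and actually sending the message (for example with `smtplib`) is omitted.

```python
from email.message import EmailMessage


def build_notification(recipient, product, data_url, sender="noreply@example.org"):
    """Compose a plain-text email announcing newly available data for one product."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"New {product} data available"
    msg.set_content(f"New {product} data is ready.\nDownload: {data_url}\n")
    return msg
```

The web service invoked by the processing cluster would call this once per active subscriber returned by the subscription database, for example `build_notification("user@example.org", "MOD14", "https://example.org/data/granule.hdf")`.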