Searching All The Web’s Spatial Data
![Page 2: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/2.jpg)
Thanks
• CGA
• Ben Lewis
• Dave Strohschein
![Page 3: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/3.jpg)
Layer Level Search Of Spatial Resources
![Page 4: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/4.jpg)
A Spatial Resource
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Harvard</name>
      <description>You Are Here</description>
      <Point>
        <coordinates>-71.1169,42.3774,0</coordinates>
      </Point>
    </Placemark>
  </Document>
</kml>
![Page 5: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/5.jpg)
Not A Spatial Resource
![Page 6: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/6.jpg)
Web Services
• Individual Layer Level Search
• OGC - GetCapabilities, WMS
• ESRI Rest
![Page 7: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/7.jpg)
Anchor Link Signatures
• Anchor Links To Spatial Resources
• ?request=GetCapabilities
• /ArcGIS/rest/service
• *.kml and *.kmz
• */shape/*.zip
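The signatures above can be sketched as a small link classifier. This is only an illustration in Python, not the code used in the talk: the regular expressions are one possible reading of the slide's patterns, and the category labels (`ogc`, `esri_rest`, `kml`, `shapefile_zip`) are made up for the example.

```python
import re
from urllib.parse import urlsplit

# One possible translation of the slide's signatures into regular expressions.
# The labels on the left are hypothetical names chosen for this sketch.
SIGNATURES = [
    ("ogc", re.compile(r"[?&]request=GetCapabilities", re.IGNORECASE)),
    ("esri_rest", re.compile(r"/ArcGIS/rest/service", re.IGNORECASE)),
    ("kml", re.compile(r"\.km[lz]$", re.IGNORECASE)),
    ("shapefile_zip", re.compile(r"/shape/.*\.zip$", re.IGNORECASE)),
]


def classify_link(href):
    """Return the label of the first signature the link matches, or None."""
    path = urlsplit(href).path
    for label, pattern in SIGNATURES:
        # The GetCapabilities signature lives in the query string,
        # the other signatures are matched against the URL path only.
        target = href if label == "ogc" else path
        if pattern.search(target):
            return label
    return None


print(classify_link("http://basemap.nationalmap.gov/ArcGIS/rest/services/USGSTopo/MapServer"))
```

Matching case-insensitively is a choice made here because the sample URLs later in the deck use both `ArcGIS` and `arcgis`.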
![Page 8: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/8.jpg)
Not JavaScript Code
<script>
L.esri.tiledMapLayer(
"http://basemap.nationalmap.gov/ArcGIS/rest/services/USGSTopo/MapServer",
{opacity: 0.50, zIndex:2}).addTo(map);
</script>
![Page 9: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/9.jpg)
Not HTML Tags
<body>
Please use my base layer:
<blink>
http://basemap.nationalmap.gov/ArcGIS/rest/services/USGSTopo/MapServer
</blink>
</body>
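The last three slides say that only anchor links count: a URL inside a script or in plain page text is not picked up. A minimal sketch of that rule follows, using Python's standard `html.parser` as a stand-in for whatever HTML parser the real job used. The page content and the `example.org` link are invented for the illustration.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect href values from <a> tags and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def anchor_links(html):
    collector = AnchorCollector()
    collector.feed(html)
    return collector.hrefs


# Hypothetical page mixing the three cases from the slides.
PAGE = """
<body>
<script>
L.esri.tiledMapLayer("http://basemap.nationalmap.gov/ArcGIS/rest/services/USGSTopo/MapServer").addTo(map);
</script>
Please use my base layer:
<blink>http://basemap.nationalmap.gov/ArcGIS/rest/services/USGSTopo/MapServer</blink>
<a href="http://example.org/wms?request=GetCapabilities">WMS endpoint</a>
</body>
"""

print(anchor_links(PAGE))  # ['http://example.org/wms?request=GetCapabilities']
```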
![Page 10: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/10.jpg)
Google Advanced Search
• What Will A Crawl Discover?
• allinanchor:, allinurl:, filetype:
• Follow Terms Of Service
![Page 11: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/11.jpg)
Limited Crawl
• Crawl A Couple Sites
• JCrawler: Provide Two Functions
• Follow This Link?
• Process Page
• Run On Localhost, Obey robots.txt
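The "two functions" idea can be sketched as a crawl loop that takes a follow-this-link predicate and a process-page callback. This is a Python illustration of the pattern, not JCrawler's actual API. The in-memory `SITE`, the `a.example` host, and the robots rules are all hypothetical so the sketch runs without network access.

```python
from collections import deque
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def crawl(start_url, fetch_links, follow_link, process_page, max_pages=50):
    """Breadth-first crawl driven by two caller-supplied functions."""
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        links = fetch_links(url)   # links found on the page at `url`
        visited.append(url)
        process_page(url, links)   # caller decides what to record
        for link in links:
            if link not in seen and follow_link(link):
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical site held in memory: page URL -> links on that page.
SITE = {
    "http://a.example/": [
        "http://a.example/data",
        "http://a.example/private/x",
        "http://b.example/",
    ],
    "http://a.example/data": ["http://a.example/roads.kml"],
}

# Hypothetical robots.txt rules for the site being crawled.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])


def follow_link(url):
    # Stay on one host and obey robots.txt.
    return urlsplit(url).hostname == "a.example" and robots.can_fetch("*", url)


found = []


def process_page(url, links):
    # Record links that look like KML resources.
    found.extend(link for link in links if link.endswith((".kml", ".kmz")))


visited = crawl("http://a.example/", lambda url: SITE.get(url, []), follow_link, process_page)
print(visited)
print(found)
```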
![Page 12: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/12.jpg)
Find ALL Spatial Resources
• Not With A Cluster Running Nutch
• Too Hard!
![Page 13: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/13.jpg)
![Page 14: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/14.jpg)
CommonCrawl.org
• Monthly Crawl, 2-3 Billion Web Pages
• 55,000 WARC Files On Amazon East
• Hadoop Sample Code
• Add jsoup And Several Hundred Lines Of Code
![Page 15: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/15.jpg)
CommonCrawl Blog
![Page 16: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/16.jpg)
Easier Hadoop
![Page 17: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/17.jpg)
One Complete “Crawl”
• 25 Slaves, 3 Full Days
• $1400
• R3.XLarge - Lots Of Memory
![Page 18: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/18.jpg)
Sample Crawl Output
http://www.ga.gov.au/gis/services/earth_science/GA_Surface_Geology_of_Australia/MapServer/WMSServer?request=GetCapabilities&service=WMS 306
http://maps.ngdc.noaa.gov/soap/web_mercator/dem_extents/MapServer/WMSServer?request=GetCapabilities%26service=WMS 169
http://www.ga.gov.au/gis/services/earth_science/Geoscience_Australia_Seismic_Surveys/MapServer/WMSServer?request=GetCapabilities&service=WMS 144
http://gis.ngdc.noaa.gov/arcgis/services/dem_hillshades/ImageServer/WMSServer?request=GetCapabilities%26service=WMS 132
http://www.ga.gov.au/gis/services/topography/Australian_Topography/MapServer/WMSServer?request=GetCapabilities&service=WMS 108
http://www.ga.gov.au/gis/services/earth_science/Crustal_Elements_of_Australia/MapServer/WMSServer?request=GetCapabilities&service=WMS 108
![Page 19: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/19.jpg)
Sample Crawl Output
http://www.ga.gov.au/data-pubs/web-services/replacement-services-for-the-national-geoscience-datasets-wms|||http://www.ga.gov.au/gis/services/earth_science/GA_Surface_Geology_of_Australia/MapServer/WMSServer?request=GetCapabilities&service=WMS -306
http://www.ga.gov.au/data-pubs/web-services/replacement-services-for-the-national-geoscience-datasets-wms|||http://www.ga.gov.au/gis/services/earth_science/Geoscience_Australia_Seismic_Surveys/MapServer/WMSServer?request=GetCapabilities&service=WMS -144
http://www.ga.gov.au/data-pubs/web-services/replacement-services-for-the-national-geoscience-datasets-wms|||http://www.ga.gov.au/gis/services/earth_science/Crustal_Elements_of_Australia/MapServer/WMSServer?request=GetCapabilities&service=WMS -108
http://www.ga.gov.au/data-pubs/web-services/replacement-services-for-the-national-geoscience-datasets-wms|||http://www.ga.gov.au/gis/services/topography/Australian_Topography/MapServer/WMSServer?request=GetCapabilities&service=WMS -108
http://www.ga.gov.au/data-pubs/web-services/replacement-services-for-the-national-geoscience-datasets-wms|||http://www.ga.gov.au/gis/services/earth_science/Geoscience_Australia_Airborne_Geophysics/MapServer/WMSServer?request=GetCapabilities&service=WMS -90
![Page 20: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/20.jpg)
Key / Value Pairs
• String / Integer Pairs
• Value > 0
• URL To Resource / Frequency Count
• Value < 0
• URL To Resource + “|||” + Page Found On
• Use Unix Commands To Split, Sort File
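The talk splits and sorts the output file with Unix commands. The same split can be sketched in Python, assuming each line is a key, whitespace, and an integer value. In the sample output slides the page URL appears before the `|||` separator and the resource URL after it, so the sketch follows that order; treat it as an assumption about the file layout.

```python
def parse_crawl_output(lines):
    """Split crawl output into frequency counts and page-found-on records."""
    frequency = {}   # resource URL -> how often it was linked
    found_on = {}    # resource URL -> pages that link to it
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, value = line.rsplit(None, 1)  # the value is the last whitespace-separated token
        count = int(value)
        if count > 0:
            frequency[key] = count
        elif count < 0:
            page, _, resource = key.partition("|||")
            found_on.setdefault(resource, []).append(page)
    return frequency, found_on


# Three lines taken from the sample output slides.
SAMPLE = [
    "http://www.ga.gov.au/gis/services/topography/Australian_Topography/MapServer/WMSServer?request=GetCapabilities&service=WMS 108",
    "http://www.ga.gov.au/gis/services/earth_science/GA_Surface_Geology_of_Australia/MapServer/WMSServer?request=GetCapabilities&service=WMS 306",
    "http://www.ga.gov.au/data-pubs/web-services/replacement-services-for-the-national-geoscience-datasets-wms|||http://www.ga.gov.au/gis/services/earth_science/GA_Surface_Geology_of_Australia/MapServer/WMSServer?request=GetCapabilities&service=WMS -306",
]

frequency, found_on = parse_crawl_output(SAMPLE)
# Most frequently linked resources first.
ranked = sorted(frequency.items(), key=lambda item: item[1], reverse=True)
for url, count in ranked:
    print(count, url)
```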
![Page 21: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/21.jpg)
Harvester
• Input: List Of Spatial Resources
• Processing:
• Obtain Metadata On Each Layer
• Periodically Re-visit
• Output: Solr Records, Report
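For a WMS endpoint, "obtain metadata on each layer" could look roughly like the sketch below, which reads layer names, titles, and geographic bounding boxes from a GetCapabilities document. It assumes a WMS 1.3.0 response. The inline XML is a made-up miniature example, and the output field names are borrowed from the Solr query shown later in the deck. It is not the harvester's actual code, and it ignores bounding boxes inherited from parent layers.

```python
import xml.etree.ElementTree as ET

WMS_NS = {"wms": "http://www.opengis.net/wms"}

# Hypothetical, heavily trimmed WMS 1.3.0 capabilities document.
SAMPLE_CAPABILITIES = """<WMS_Capabilities version="1.3.0" xmlns="http://www.opengis.net/wms">
  <Capability>
    <Layer>
      <Title>Example Service</Title>
      <Layer>
        <Name>rivers</Name>
        <Title>Rivers</Title>
        <EX_GeographicBoundingBox>
          <westBoundLongitude>-74.093</westBoundLongitude>
          <eastBoundLongitude>-69.347</eastBoundLongitude>
          <southBoundLatitude>41.042</southBoundLatitude>
          <northBoundLatitude>44.558</northBoundLatitude>
        </EX_GeographicBoundingBox>
      </Layer>
    </Layer>
  </Capability>
</WMS_Capabilities>"""


def layer_records(xml_text):
    """Return one dict per named layer that carries its own bounding box."""
    root = ET.fromstring(xml_text)
    records = []
    for layer in root.iter("{http://www.opengis.net/wms}Layer"):
        name = layer.findtext("wms:Name", default=None, namespaces=WMS_NS)
        box = layer.find("wms:EX_GeographicBoundingBox", WMS_NS)
        if name is None or box is None:
            continue  # container layer, or bounding box only on a parent
        records.append({
            "Name": name,
            "LayerDisplayName": layer.findtext("wms:Title", default="", namespaces=WMS_NS),
            "MinX": float(box.findtext("wms:westBoundLongitude", namespaces=WMS_NS)),
            "MaxX": float(box.findtext("wms:eastBoundLongitude", namespaces=WMS_NS)),
            "MinY": float(box.findtext("wms:southBoundLatitude", namespaces=WMS_NS)),
            "MaxY": float(box.findtext("wms:northBoundLatitude", namespaces=WMS_NS)),
        })
    return records


print(layer_records(SAMPLE_CAPABILITIES))
```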
![Page 22: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/22.jpg)
Layer Level Search Of Spatial Resources
![Page 23: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/23.jpg)
Search
• People Expect Good Results
• Always Too Many Results For Human Review
• Ranking / Scoring Results Is Key
![Page 24: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/24.jpg)
Some Layers Not Relevant
![Page 25: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/25.jpg)
Layer Within Map
![Page 26: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/26.jpg)
Similar Center
![Page 27: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/27.jpg)
Similar Area
![Page 28: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/28.jpg)
Spatial Solr
• Old Style: Floats
• New Style: Rectangle, Polygons
![Page 29: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/29.jpg)
Solr Schema
• Define Fields To Support Search
• Pre-compute Intermediate Result
• Data Type = Search Options
• Or Schema-less
![Page 30: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/30.jpg)
Solr Schema
MinX, MaxX, CenterX
MinY, MaxY, CenterY
HalfWidth
HalfHeight
Area
tdouble Field Types
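The derived fields can be pre-computed from a layer's bounding box at index time. A minimal sketch, assuming `Area` is simply width times height in square degrees; that assumption is consistent with the numbers in the old-style query slide, where a box about 0.0471 by 0.0431 degrees carries an area of about 0.00203.

```python
def bbox_fields(min_x, min_y, max_x, max_y):
    """Compute the schema's derived fields from a bounding box in degrees."""
    width = max_x - min_x
    height = max_y - min_y
    return {
        "MinX": min_x,
        "MaxX": max_x,
        "CenterX": (min_x + max_x) / 2.0,
        "MinY": min_y,
        "MaxY": max_y,
        "CenterY": (min_y + max_y) / 2.0,
        "HalfWidth": width / 2.0,
        "HalfHeight": height / 2.0,
        "Area": width * height,  # square degrees, not a true ground area
    }


# Map extent taken from the old-style query slide.
fields = bbox_fields(-71.143160023987, 42.385170824958, -71.096038976013, 42.428266055761)
print(fields["CenterX"], fields["HalfWidth"], fields["Area"])
```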
![Page 31: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/31.jpg)
Old Style Solr Query
http://geodata.tufts.edu/solr/select?q=_val_:%22product(10.0,map(sum(map(MinX,-71.143160023987,-71.096038976013,1,0),map(MaxX,-71.143160023987,-71.096038976013,1,0),map(MinY,42.385170824958,42.428266055761,1,0),map(MaxY,42.385170824958,42.428266055761,1,0)),4,4,1,0)))%22_val_:%22product(15.0,recip(sum(abs(sub(Area,0.002030692438118123)),.01),1,1000,1000))%22_val_:%22product(3.0,recip(abs(sub(product(sum(MaxX,MinX),.5),-71.11959949999999)),1,1000,1000))%22_val_:%22product(3.0,recip(abs(sub(product(sum(MaxY,MinY),.5),42.4067184403595)),1,1000,1000))%22+AND+%28LayerDisplayName:water^3+OR+ThemeKeywords:water^2+OR+PlaceKeywords:water^2%29+AND+%28ThemeKeywords:geoscientificinformation^4%29&&fq={!frange+l%3D1+u%3D10}product(2.0,map(sum(map(sub(abs(sub(-71.11959949999999,CenterX)),sum(0.023560523986994042,HalfWidth)),0,400000,1,0),map(sub(abs(sub(42.4067184403595,CenterY)),sum(0.021547615401498632,HalfHeight)),0,400000,1,0)),0,0,1,0))&wt=json&fl=Name,CollectionId,Institution,Access,DataType,Availability,LayerDisplayName,Publisher,GeoReferenced,Originator,Location,MinX,MaxX,MinY,MaxY,ContentDate,LayerId,score,WorkspaceName,SrsProjectionCode&rows=27&start=0&sort=score+desc&fq=ContentDate:[1950-01-01T01:01:01Z+TO+2012-01-01T01:01:01Z]&fq=DataType%3APoint&fq=Institution%3ATufts+OR+Institution%3AHarvard&fq=Institution:Tufts+OR+Access:Public&json.wrf=jQuery16408675794449108286_1331937717696&_=1331941365233
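The four `_val_` clauses in this query correspond to the ranking ideas from the earlier slides: layer within map (weight 10), similar area (weight 15), and similar center in X and Y (weight 3 each). The sketch below mirrors them in Python using Solr's documented `map(x,min,max,target,default)` and `recip(x,m,a,b) = a/(m*x+b)` functions. It simply adds the four terms, which is an assumption: it leaves out the keyword clauses and any normalization Lucene applies when combining clause scores.

```python
def solr_map(x, lo, hi, target, default):
    """Solr map(): target if lo <= x <= hi, otherwise default."""
    return target if lo <= x <= hi else default


def solr_recip(x, m, a, b):
    """Solr recip(): a / (m*x + b)."""
    return a / (m * x + b)


def spatial_score(layer, view):
    """Approximate the spatial part of the old-style ranking for one layer."""
    # Layer within map: all four layer edges fall inside the map extent.
    edges_inside = (
        solr_map(layer["MinX"], view["MinX"], view["MaxX"], 1, 0)
        + solr_map(layer["MaxX"], view["MinX"], view["MaxX"], 1, 0)
        + solr_map(layer["MinY"], view["MinY"], view["MaxY"], 1, 0)
        + solr_map(layer["MaxY"], view["MinY"], view["MaxY"], 1, 0)
    )
    within = 10.0 * solr_map(edges_inside, 4, 4, 1, 0)
    # Similar area: largest when the layer and map areas match.
    area = 15.0 * solr_recip(abs(layer["Area"] - view["Area"]) + 0.01, 1, 1000, 1000)
    # Similar center: largest when the centers coincide.
    center_x = 3.0 * solr_recip(
        abs((layer["MaxX"] + layer["MinX"]) * 0.5 - view["CenterX"]), 1, 1000, 1000)
    center_y = 3.0 * solr_recip(
        abs((layer["MaxY"] + layer["MinY"]) * 0.5 - view["CenterY"]), 1, 1000, 1000)
    return within + area + center_x + center_y


# Map extent from the query above; the two layers are made-up test cases.
view = {
    "MinX": -71.143160023987, "MaxX": -71.096038976013,
    "MinY": 42.385170824958, "MaxY": 42.428266055761,
    "CenterX": -71.11959949999999, "CenterY": 42.4067184403595,
    "Area": 0.002030692438118123,
}
same_extent = dict(view)
far_away = {"MinX": 10.0, "MaxX": 20.0, "MinY": 40.0, "MaxY": 50.0, "Area": 100.0}

print(spatial_score(same_extent, view), spatial_score(far_away, view))
```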
![Page 32: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/32.jpg)
New Style Spatial
• A Lat-Lon rectangle: minX minY maxX maxY
• <field name="geo">-74.093 41.042 -69.347 44.558</field>
• Units: Degrees
• Distance Calc: Haversine or Euclidean, etc.
![Page 33: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/33.jpg)
Spatial Functions
• fq=geo:"Intersects(-74.093 41.042 -69.347 44.558)"
• fq=geo:"IsWithin(POLYGON((-10 30, -40 40, -10 -20, 40 20, 0 0, -10 30))) distErrPct=0"
• HeatMaps: Coming Soon
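A request using the `Intersects` filter from the slide can be assembled with a few lines of Python. This is a sketch that only builds the query string. It reuses the slide's field name `geo` and rectangle syntax as-is; the keyword `water` and the `rows` value are placeholders.

```python
from urllib.parse import urlencode, parse_qs


def intersects_filter(min_x, min_y, max_x, max_y, field="geo"):
    """Build the fq value in the slide's form: minX minY maxX maxY."""
    return f'{field}:"Intersects({min_x} {min_y} {max_x} {max_y})"'


def build_query(keywords, bbox, rows=10):
    """Return a URL-encoded Solr query string with one spatial filter."""
    params = [
        ("q", keywords),
        ("fq", intersects_filter(*bbox)),
        ("wt", "json"),
        ("rows", rows),
    ]
    return urlencode(params)


query_string = build_query("water", (-74.093, 41.042, -69.347, 44.558))
print(query_string)
# Decoding the string again shows the filter survives URL encoding intact.
print(parse_qs(query_string)["fq"][0])
```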
![Page 34: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/34.jpg)
Also Search By
Date
Keywords
DataType
Institution
Solr Filter Clause
![Page 35: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/35.jpg)
Future Is Browser Centric
• Client-Side Rendering
• Canvas, GPU, Actual Data
• Client-Side Analysis
• GPU, BYOD
• Apps With PhoneGap
![Page 37: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/37.jpg)
Web Mapping Terms
• Map Servers
• ESRI Rest, OGC / GetCapabilities
• Convert Spatial Data To Map Tiles
![Page 38: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/38.jpg)
Separating Axis
![Page 39: Searching All The Web’s Spatial Data · Searching All The Web’s Spatial Data StephenMcDonald@cga.harvard.edu January 21, 2015](https://reader033.vdocuments.site/reader033/viewer/2022042802/5f3b26cfba2a9f395929d593/html5/thumbnails/39.jpg)
Diff CenterXs > Sum Half Widths
Half Width
Center X
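The separating axis test on this slide can be written out directly: two axis-aligned boxes are disjoint if, on either axis, the distance between their centers exceeds the sum of their half extents. A minimal sketch using the schema's `CenterX`, `CenterY`, `HalfWidth`, and `HalfHeight` fields; the three boxes are made-up examples.

```python
def separated_on_axis(center_a, half_a, center_b, half_b):
    """True if the gap between centers exceeds the sum of the half extents."""
    return abs(center_a - center_b) > half_a + half_b


def boxes_intersect(a, b):
    """Axis-aligned boxes intersect when no axis separates them."""
    return not (
        separated_on_axis(a["CenterX"], a["HalfWidth"], b["CenterX"], b["HalfWidth"])
        or separated_on_axis(a["CenterY"], a["HalfHeight"], b["CenterY"], b["HalfHeight"])
    )


map_view = {"CenterX": 0.0, "CenterY": 0.0, "HalfWidth": 1.0, "HalfHeight": 1.0}
overlapping = {"CenterX": 1.5, "CenterY": 0.0, "HalfWidth": 1.0, "HalfHeight": 1.0}
disjoint = {"CenterX": 3.0, "CenterY": 0.0, "HalfWidth": 0.5, "HalfHeight": 0.5}

print(boxes_intersect(map_view, overlapping))  # True
print(boxes_intersect(map_view, disjoint))     # False
```

Because the test needs only four pre-computed numbers per layer, it can be expressed with plain arithmetic functions, which is how the `fq={!frange ...}` clause in the old-style query applies it.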