electric transmission and distribution infrastructure



Post on 30-Dec-2021


Page 1: Electric Transmission and Distribution Infrastructure

Electric Transmission and Distribution Infrastructure Imagery Dataset

Varun Nair ● Tamasha Pathirathna ● Xiaolan You ● Qiwei Han

Dr. Kyle Bradbury

This dataset contains fully annotated electric transmission and distribution infrastructure for approximately 321 km² of high resolution satellite and aerial imagery, spanning 14 cities and 6 countries across 5 continents.

This dataset was designed for training machine learning algorithms to automatically identify electricity infrastructure in satellite imagery; for those working on identifying the best pathways to electrification in low and middle income countries; and for researchers investigating domain adaptation for computer vision.

Page 2: Electric Transmission and Distribution Infrastructure

Dataset Overview

[Figure: example imagery. Panel labels: Tauranga, New Zealand; Groundview; Tucson, USA; Shanghai, China; Shanghai, China (labeled). Column labels: Substation, Transmission, Distribution.]

This dataset covers 321 km² of high resolution satellite and aerial imagery with 5 types of labelled electricity infrastructure:

(1) Substation
(2) Transmission tower
(3) Transmission line
(4) Distribution tower
(5) Distribution line

See figure to the right for examples. Each image comes with two label files and two mask files.

Labels are available in the following formats:

(1) GeoJSON: for visualization of annotations in tools such as ArcGIS and Google Earth (see GeoJSON File Structure slide)
(2) CSV: for easy conversion into multiclass mask input to machine learning models. The CSV to mask conversion code is included in the dataset, and default masks have been generated.

Masks are available in the following formats (see Mask File Structure slide for more details):

(1) .tif
(2) .npz

Page 3: Electric Transmission and Distribution Infrastructure

File Structure

Zipped Folders

Brazil_Rio.zip
China_Shanghai.zip
Mexico_Matamoros.zip
New_Zealand_Dunedin.zip
New_Zealand_Tauranga.zip
New_Zealand_Rotorua.zip
New_Zealand_Gisborne.zip
New_Zealand_Palmerston.zip
Sudan_Khartoum.zip
USA_AZ_Tuscon.zip
USA_CT_Hartford.zip
USA_KS_Colwich_Maize.zip
USA_NC_Clyde.zip
USA_NC_Wilmington.zip
Sample Data.zip: Two sample images, annotations, and label files from USA_CT_Hartford and Sudan_Khartoum
Documentation.pdf: This document, which explains how annotations were generated and the files available for download

Contents

Each folder listed above contains a number of satellite or aerial images and associated annotations (polygons and lines) and labels (image masks) for the transmission and distribution lines within the images. To make the dataset as easy to use as possible, the labels are provided in several formats so that they can be integrated into your existing research pipeline:

Imagery: Satellite or aerial image (the number of images varies by city)
(1) Country_City_#.tif

Annotations: Electric transmission/distribution annotations. Geo-coordinates of each annotation and metadata (see GeoJSON File Structure slide) and pixel coordinates for mask generation. These are provided in multiple formats for ease of use.
(2) Country_City_#.geojson
(3) Country_City_#.csv

Labels: Multiclass image mask. These are the same size as the image, with a label value at each pixel indicating the type of infrastructure present. An .npz version is provided for Python NumPy users, and a .tif version for all others.
(4) Country_City_#.npz
(5) Country_City_#_multiclass.tif
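As a rough illustration of the file naming described on this slide, the sketch below maps an image path to its companion annotation and mask files and loads an .npz mask. The helper names `companion_files` and `load_npz_mask` are hypothetical, and the array key inside the .npz archive is not documented here, so the sketch simply takes the first array it finds.

```python
from pathlib import Path

import numpy as np


def companion_files(image_path):
    """Map Country_City_#.tif to the companion files named on this slide."""
    image_path = Path(image_path)
    stem = image_path.stem  # e.g. "Country_City_1"
    folder = image_path.parent
    return {
        "image": image_path,
        "geojson": folder / f"{stem}.geojson",
        "csv": folder / f"{stem}.csv",
        "mask_npz": folder / f"{stem}.npz",
        "mask_tif": folder / f"{stem}_multiclass.tif",
    }


def load_npz_mask(npz_path):
    """Return the first array stored in an .npz archive.

    Assumption: the archive holds a single mask array; its key name is
    unknown, so no particular name is assumed.
    """
    with np.load(npz_path) as archive:
        first_key = archive.files[0]
        return archive[first_key]
```

Reading the .tif imagery and masks would need an additional image library, which is left out of this sketch.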

Page 4: Electric Transmission and Distribution Infrastructure

[Figure: Raw Imagery (left) and Annotated Imagery (right). Legend: Substation, Transmission Network, Distribution Network.]

This is sample imagery taken from Clyde, NC, USA. The raw imagery is on the left; on the right, the annotations are superimposed on the raw imagery.

Page 5: Electric Transmission and Distribution Infrastructure

[Figure: Annotated Imagery (left) and Multi-Class Mask (right).]

Given the annotated imagery taken from Clyde, NC, USA on the left, our dataset provides the code necessary to generate a multi-class mask (seen on the right) for use in machine learning algorithms.

Page 6: Electric Transmission and Distribution Infrastructure

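Source Dataset | Country | City | Area (sq. km) | # Images | Width | Height | Resolution (meter)
UCONN | USA | Hartford, CT | 56.3 | 25 | 10000 | 10000 | 0.15†
USGS (Orthoimagery) | USA | Clyde, NC | 18.6 | 8 | 10000 | 10000 | 0.15†
USGS (Orthoimagery) | USA | Wilmington, NC | 27.9 | 12 | 10000 | 10000 | 0.15†
USGS (Orthoimagery) | USA | Tucson, AZ | 39.4 | 12 | 11856* | 11936* | 0.15†
USGS (Orthoimagery) | USA | Colwich, KS & Maize, KS | 34.8 | 15 | 10000 | 10000 | 0.15†
USGS (Orthoimagery) | Mexico | Matamoros, Mexico | 34.0 | 10 | 11571* | 12647* | 0.15†
LINZ (Orthoimagery) | New Zealand | Tauranga | 2.8 | 8 | 3840 | 5760 | 0.125
LINZ (Orthoimagery) | New Zealand | Rotorua | 2.8 | 8 | 3840 | 5760 | 0.125
LINZ (Orthoimagery) | New Zealand | Gisborne | 6.6 | 6 | 6000 | 11800 | 0.125
LINZ (Orthoimagery) | New Zealand | Palmerston North | 6.2 | 18 | 3840 | 5760 | 0.125
LINZ (Orthoimagery) | New Zealand | Dunedin | 2.1 | 6 | 3840 | 5760 | 0.125
SpaceNet (WorldView 3 satellite) | Sudan | Khartoum | 26.0 | 171 | 1300 | 1300 | 0.30
SpaceNet (WorldView 3 satellite) | China | Shanghai | 29.8 | 196 | 1300 | 1300 | 0.30
SpaceNet (WorldView 3 satellite) | Brazil | Rio de Janeiro | 33.8 | 15 | 5000 | 5000 | 0.30
Total | | | 321.0 | 511 | | |

* width and height may vary for images in some cities
† 0.5 feet = 0.1524 m ~ 0.15 m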

Key Data Characteristics

This table details some key characteristics of the imagery in our dataset, including the original locations of the imagery. The images vary in width, height, and resolution, but are generally square* and have a resolution of 0.30 meters or finer.
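The Area column can be related to the other columns with a short calculation. The sketch below assumes that the images in a city do not overlap and that area is simply the number of images times the ground footprint of one image (pixel width and height times the resolution); the function name `imaged_area_sq_km` is hypothetical. Under that assumption, the three rows used here come out close to the 26.0, 2.8, and 33.8 sq. km listed in the table.

```python
def imaged_area_sq_km(n_images, width_px, height_px, resolution_m):
    """Ground area covered by n_images non-overlapping images, in sq. km."""
    footprint_m2 = (width_px * resolution_m) * (height_px * resolution_m)
    return n_images * footprint_m2 / 1_000_000  # m^2 -> km^2


# (number of images, width, height, resolution in meters) from the table
rows = {
    "Khartoum": (171, 1300, 1300, 0.30),
    "Tauranga": (8, 3840, 5760, 0.125),
    "Rio de Janeiro": (15, 5000, 5000, 0.30),
}

for city, args in rows.items():
    print(f"{city}: {imaged_area_sq_km(*args):.2f} sq. km")
```

For cities marked with *, image sizes vary, so the same calculation would only give an approximation.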

Page 7: Electric Transmission and Distribution Infrastructure

GeoJSON File Structure

Country_City_#.geojson

Geometry | Description
type | Geometry of annotation, e.g. “Polygon”
coordinates | Geo-coordinates of annotation (WGS84 projection)

Properties | Description
label | Energy infrastructure label for given annotation, e.g. “DT”
filename | Filename of the .tif image associated with all annotations in the GeoJSON
country | Country of origin of the image
city | City of origin of the image
image_geocoordinates_upper_left | Upper left geo-coordinates of associated .tif image
image_geocoordinates_lower_left* | Lower left geo-coordinates of associated .tif image
image_geocoordinates_upper_right* | Upper right geo-coordinates of associated .tif image
image_geocoordinates_lower_right* | Lower right geo-coordinates of associated .tif image
pixel_coordinates | Pixel coordinates of annotation relative to .tif image
projection | Image geo-coordinate projection system

* coordinates are interpolated using scipy LinearNDInterpolator (see pixToGeo.py on GitHub)

GeoJSON files can be used to visualize annotations in tools such as ArcGIS or Google Earth. The properties in each file can be used to sort and filter annotations to the needed specifications.
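As an example of sorting and refining annotations by their properties, the sketch below reads a GeoJSON file with Python's standard json module and filters by the label property. It assumes each file is a standard GeoJSON FeatureCollection whose features carry the properties listed on this slide; the function names are hypothetical.

```python
import json
from collections import Counter


def load_features(geojson_path):
    """Return the list of features in a GeoJSON FeatureCollection."""
    with open(geojson_path, encoding="utf-8") as f:
        collection = json.load(f)
    return collection.get("features", [])


def features_with_label(features, label):
    """Keep only annotations whose 'label' property matches, e.g. "DT"."""
    return [
        feature
        for feature in features
        if (feature.get("properties") or {}).get("label") == label
    ]


def label_counts(features):
    """Count annotations per infrastructure label."""
    return Counter(
        (feature.get("properties") or {}).get("label") for feature in features
    )
```

For instance, `features_with_label(load_features("Country_City_1.geojson"), "TL")` would return only the transmission line annotations of that image, if the file follows the assumed layout.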

Page 8: Electric Transmission and Distribution Infrastructure

CSV File Structure

Country_City_#.csv

Field | Description
Label | Energy infrastructure label, e.g. “DL”. This label identifies what type of infrastructure the object represents. Annotation types include: Distribution tower (DT), Distribution line (DL), Transmission tower (TT), Transmission line (TL), Other tower (OT), Other line (OL), and Substation (SS).
Object | Unique ID for each annotation
Type | Geometry category for each annotation. This has 3 possible values: point, line, or polygon.
X | X pixel coordinate of vertex
Y | Y pixel coordinate of vertex
Height | Pixel height of associated image
Width | Pixel width of associated image

CSV files are used as input to generate multiclass masks for each annotated image (see Mask File Structure for more details). Each vertex of each annotation is an individual row within the CSV file. The “Object” value is the same for every vertex belonging to the same annotation.
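Because each row is a single vertex, a first step in most pipelines is to group rows back into annotations by their Object ID. The sketch below does this with Python's standard csv module. It assumes the CSV has a header row with the field names listed above (Label, Object, Type, X, Y, Height, Width); the function name `read_annotations` is hypothetical and this is not the dataset's own conversion code.

```python
import csv


def read_annotations(csv_file):
    """Group vertex rows into annotations keyed by the Object ID.

    csv_file is any open text file (or file-like object) in the
    Country_City_#.csv layout, assumed to start with a header row.
    """
    annotations = {}
    for row in csv.DictReader(csv_file):
        annotation = annotations.setdefault(
            row["Object"],
            {"label": row["Label"], "type": row["Type"], "vertices": []},
        )
        # Vertices are kept in file order, one (x, y) pixel pair per row.
        annotation["vertices"].append((float(row["X"]), float(row["Y"])))
    return annotations
```

Each entry then holds the label, the geometry type, and the ordered list of pixel vertices for one annotation.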

Page 9: Electric Transmission and Distribution Infrastructure

Mask File Structure

Country_City_#.tif
Country_City_#.npz

This dataset includes multiclass image masks in .tif and .npz formats. The complete set of class labels is shown to the right. Since overlap is possible (and common), we provide a default ordering priority, also shown to the right, with lines appearing on top and substations below. If a transmission line is co-located on a pixel with a substation, the pixel will be labeled as a transmission line, overriding the co-occurring substation.

Masks were generated using the Python script multiclassmask.py, available on our GitHub (link on main page), using the CSV file labels as input. The class labels and ordering can be modified in multiclassmask.py to, for example, turn this into a multitask learning problem by generating binary masks for each class. One can also modify multiclassmask.py to generate other file formats.
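A lighter-weight alternative to editing multiclassmask.py is to split an already generated multiclass mask into one binary mask per class. The sketch below is not the dataset's script; it assumes the mask has been loaded as a NumPy integer array and uses the mask values from the "Mask labels and ordering" table. The function name `to_binary_masks` is hypothetical.

```python
import numpy as np

# Mask values as listed in the "Mask labels and ordering" table.
MASK_VALUES = {"DT": 1, "DL": 2, "TT": 3, "TL": 4, "OT": 5, "OL": 6, "SS": 7}


def to_binary_masks(multiclass_mask):
    """Return {label: 0/1 array} with one binary mask per class."""
    return {
        label: (multiclass_mask == value).astype(np.uint8)
        for label, value in MASK_VALUES.items()
    }
```

Note that overlaps have already been resolved in the multiclass mask, so pixels where a line covers a substation appear only in the line's binary mask. Recovering the hidden class would require regenerating the masks from the CSV annotations.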

Mask labels and ordering

Infrastructure type | Mask value
Distribution tower (DT) | 1
Distribution line (DL) | 2
Transmission tower (TT) | 3
Transmission line (TL) | 4
Other tower (OT) | 5
Other line (OL) | 6
Substation (SS) | 7

Ordering priority: TL > DL > OL > TT > DT > OT > SS
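The effect of the ordering priority can be sketched as follows: paint the classes from lowest to highest priority so that higher-priority classes overwrite lower ones. This is an illustration only, not the logic of multiclassmask.py. It assumes each class is available as a boolean array, that substations (SS) sit at the bottom of the ordering as the text describes, and that 0 marks pixels with no infrastructure, which the slides do not state. The function name `composite` is hypothetical.

```python
import numpy as np

MASK_VALUES = {"DT": 1, "DL": 2, "TT": 3, "TL": 4, "OT": 5, "OL": 6, "SS": 7}

# Highest priority first; substations are at the bottom of the ordering.
PRIORITY = ["TL", "DL", "OL", "TT", "DT", "OT", "SS"]


def composite(class_masks, shape):
    """Combine per-class boolean masks into a single multiclass mask.

    class_masks maps a label (e.g. "TL") to a boolean array of `shape`.
    """
    out = np.zeros(shape, dtype=np.uint8)  # 0 = background (assumption)
    # Paint lowest priority first so later classes overwrite earlier ones.
    for label in reversed(PRIORITY):
        layer = class_masks.get(label)
        if layer is not None:
            out[np.asarray(layer, dtype=bool)] = MASK_VALUES[label]
    return out
```

With a substation covering a whole tile and a transmission line crossing it, the line pixels end up with value 4 and the remaining substation pixels with value 7. Changing the order of `PRIORITY` changes which class wins on overlapping pixels.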

Page 10: Electric Transmission and Distribution Infrastructure

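Source Dataset | Country | City | Resolution (meter) | Rural/Urban | Terrain
CT ECO (Orthoimagery) | USA | Hartford, CT | 0.15 | Suburban | Built environment
USGS (Orthoimagery) | USA | Clyde, NC | 0.15 | Rural | Mountainous
USGS (Orthoimagery) | USA | Wilmington, NC | 0.15 | Suburban | Coastal
USGS (Orthoimagery) | USA | Tucson, AZ | 0.30 | Urban | Desert
USGS (Orthoimagery) | USA | Colwich, KS & Maize, KS | 0.15 | Rural | Plains
USGS (Orthoimagery) | Mexico | Matamoros, Mexico | 0.15 | Urban | Desert
LINZ (Orthoimagery) | New Zealand | Tauranga | 0.125 | Urban | Forested Plains
LINZ (Orthoimagery) | New Zealand | Rotorua | 0.125 | Suburban | Forested Plains
LINZ (Orthoimagery) | New Zealand | Gisborne | 0.125 | Urban | Coastal
LINZ (Orthoimagery) | New Zealand | Palmerston North | 0.125 | Suburban | Plains
LINZ (Orthoimagery) | New Zealand | Dunedin | 0.125 | Rural | Plains
SpaceNet (WorldView 3 satellite) | Sudan | Khartoum | 0.30 | Urban | Desert
SpaceNet (WorldView 3 satellite) | China | Shanghai | 0.30 | Urban | Built environment
SpaceNet (WorldView 3 satellite) | Brazil | Rio de Janeiro | 0.30 | Urban | Plains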

Diversity of Imagery

Our dataset contains a diverse set of imagery from 14 cities around the world. The dataset is designed to be diverse in three respects: (1) human settlement density (i.e. rural vs. urban), (2) terrain type, and (3) development index. This table illustrates the specific diversity in imagery present in our dataset, while these charts show the overall distribution of the data in these categories.

[Charts: Human Settlement Density; Terrain Type; Development Index]

Page 11: Electric Transmission and Distribution Infrastructure

Annotation Statistics

This dataset was designed to be geographically diverse. The table below summarizes the annotations by country. The figures to the right display the density per sq km of each feature by country. Note: these figures are not representative of each country’s true energy infrastructure density; they represent only our annotations in each country’s respective images.

Country | Distribution Towers | Transmission Towers | Distribution Lines (km) | Transmission Lines (km) | Substations
USA | 5365 | 1995 | 188.21 | 182.16 | 17
Sudan | 5300 | 35 | 201.80 | 7.91 | 1
New Zealand | 2049 | 199 | 104.17 | 37.13 | 18
Mexico | 3338 | 0 | 82.92 | 0 | 2
China | 1034 | 307 | 62.09 | 59.22 | 5
Brazil | 63 | 55 | 6.83 | 15.16 | 2

[Figures: Transmission density; Distribution density]
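The slides do not give the exact normalization used for the density figures. As a rough sketch, the calculation below assumes density is the annotation total for a country divided by the imaged area for that country, with areas summed from the key data characteristics table. The names `AREA_SQ_KM`, `ANNOTATIONS`, and `density_per_sq_km` are hypothetical.

```python
# Imaged area per country (sq. km), summed from the per-city Area column.
AREA_SQ_KM = {
    "USA": 56.3 + 18.6 + 27.9 + 39.4 + 34.8,
    "Sudan": 26.0,
    "New Zealand": 2.8 + 2.8 + 6.6 + 6.2 + 2.1,
    "Mexico": 34.0,
    "China": 29.8,
    "Brazil": 33.8,
}

# (distribution towers, transmission towers,
#  distribution lines in km, transmission lines in km)
ANNOTATIONS = {
    "USA": (5365, 1995, 188.21, 182.16),
    "Sudan": (5300, 35, 201.80, 7.91),
    "New Zealand": (2049, 199, 104.17, 37.13),
    "Mexico": (3338, 0, 82.92, 0.0),
    "China": (1034, 307, 62.09, 59.22),
    "Brazil": (63, 55, 6.83, 15.16),
}

FEATURES = (
    "distribution_towers",
    "transmission_towers",
    "distribution_line_km",
    "transmission_line_km",
)


def density_per_sq_km(country):
    """Annotation totals divided by imaged area (assumed definition)."""
    area = AREA_SQ_KM[country]
    return {
        name: value / area
        for name, value in zip(FEATURES, ANNOTATIONS[country])
    }
```

Under this assumption, `density_per_sq_km("Sudan")` gives roughly 204 distribution towers per sq. km of imagery, which reflects the annotated images only, not the country as a whole.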

Page 12: Electric Transmission and Distribution Infrastructure

Annotation Methodology

Transmission towers, substations, and distribution towers are labeled with polygons, while transmission and distribution lines are labeled with lines. Consistency was ensured throughout the dataset by adopting consistent labeling conventions wherever possible, including using standardized labels (listed below) within the annotation tool and enclosing objects in polygons without including their shadows. All team members consulted each other regarding any ambiguous labels to ensure consistency among the labels. The labeling process depends on human judgement; to minimize human error, all data curators involved in this process were trained on identifying transmission and distribution lines in overhead imagery. Transmission and distribution infrastructure within images was labeled using pyimannotate, a Python-based image labeling tool developed by Artem Streltsov of the Duke Energy Initiative (https://github.com/astr93/pyimannotate). The GeoJSON files given in our dataset can be converted to JSON files that are viewable using pyimannotate (script geoToPix.py on GitHub).

Infrastructure | Abbreviation | Annotation Type
Transmission line | TL | line
Distribution line | DL | line
Transmission tower | TT | polygon
Distribution tower | DT | polygon
Substation | SS | polygon
Other tower | OT | polygon
Other line | OL | line

[Figures: Use standardized labels; Example annotation not including shadows]

Page 13: Electric Transmission and Distribution Infrastructure

Dataset Sources and Licenses

LINZ (Land Information New Zealand)
https://data.linz.govt.nz/set/4702-nz-aerial-imagery/
License: Creative Commons Attribution 4.0 International License

CT-ECO (Connecticut Department of Energy and Environmental Protection)
Capitol Region Council of Governments. (2016). 2016 Aerial imagery. http://cteco.uconn.edu/data/flight2016/index.htm
License: Public Domain

USGS (United States Geological Survey)
Source of imagery tagged as from USGS: U.S. Geological Survey. https://earthexplorer.usgs.gov/
License: Public Domain

SpaceNet
SpaceNet on Amazon Web Services (AWS). “Datasets.” The SpaceNet Catalog. Last modified April 30, 2018. https://spacenetchallenge.github.io/
License: Creative Commons Attribution-ShareAlike 4.0 International License

Data Tools
GitHub repository and documentation for supplementary scripts that can be used to further manipulate the data in this dataset: https://github.com/varunnair18/DataPlus2018