Electric Transmission and Distribution Infrastructure Imagery Dataset
Varun Nair ● Tamasha Pathirathna ● Xiaolan You ● Qiwei Han
Dr. Kyle Bradbury
This dataset contains fully annotated electric transmission and distribution infrastructure for approximately 321 km² of high-resolution satellite and aerial imagery, spanning 14 cities and 6 countries across 5 continents.
This dataset was designed for training machine learning algorithms to automatically identify electricity infrastructure in satellite imagery, for those working on identifying the best pathways to electrification in low- and middle-income countries, and for researchers investigating domain adaptation for computer vision.
Dataset Overview
[Figure: example imagery from Tauranga, New Zealand; Tucson, USA; and Shanghai, China (raw and labeled), plus a ground view, showing substation, transmission, and distribution infrastructure.]
This dataset covers 321 km² of high-resolution satellite and aerial imagery with 5 types of labelled electricity infrastructure:
(1) Substation
(2) Transmission tower
(3) Transmission line
(4) Distribution tower
(5) Distribution line
See the figure above for examples. Each image comes with two label files and two mask files.
Labels are available in the following formats:
(1) GeoJSON: for visualization of annotations in tools such as ArcGIS and Google Earth (see GeoJSON File Structure slide)
(2) CSV: for easy conversion into multiclass mask input to machine learning models. The CSV-to-mask conversion code is included in the dataset, and default masks have been generated.
Masks are available in the following formats (see Mask File Structure slide for more details):
(1) Tif
(2) Npz
Zipped Folders Contents
Brazil_Rio.zip
China_Shanghai.zip
Mexico_Matamoros.zip
New_Zealand_Dunedin.zip
New_Zealand_Tauranga.zip
New_Zealand_Rotorua.zip
New_Zealand_Gisborne.zip
New_Zealand_Palmerston.zip
Sudan_Khartoum.zip
USA_AZ_Tuscon.zip
USA_CT_Hartford.zip
USA_KS_Colwich_Maize.zip
USA_NC_Clyde.zip
USA_NC_Wilmington.zip
Each folder listed above contains a number of satellite or aerial images, together with the associated annotations (polygons and lines) and labels (image masks) for the transmission and distribution lines within the images. To make the dataset as easy to use as possible, the labels are provided in several formats so they can be integrated into your existing research pipeline:
Imagery: Satellite or aerial image (the number of images varies by city)
(1) Country_City_#.tif
Annotations: Electric transmission/distribution annotations. Geo-coordinates of each annotation and metadata (see GeoJSON File Structure slide) and pixel coordinates for mask generation. These are provided in multiple formats for ease of use.
(2) Country_City_#.geojson
(3) Country_City_#.csv
Labels: Multiclass image mask. These are the same size as the image, with a label value at each pixel indicating the type of infrastructure present. An .npz version is provided for Python NumPy users, and a .tif version for all others.
(4) Country_City_#.npz
(5) Country_City_#_multiclass.tif
Sample Data.zip: Two sample images, annotations, and label files from USA_CT_Hartford and Sudan_Khartoum
Documentation.pdf: This document, which explains how annotations were generated and the files available for download
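The naming pattern above can be turned into a small lookup helper. The following is a minimal sketch, not part of the dataset's own tooling: the function names, the folder name, and the image stem `USA_NC_Clyde_1` are hypothetical, and it assumes each unzipped folder holds the five files side by side under the pattern listed above.

```python
from pathlib import Path


def dataset_files(folder, stem):
    """Return the five files expected for one image, keyed by role.

    `stem` stands for the Country_City_# part of the file names.
    """
    folder = Path(folder)
    return {
        "image": folder / f"{stem}.tif",
        "geojson": folder / f"{stem}.geojson",
        "csv": folder / f"{stem}.csv",
        "mask_npz": folder / f"{stem}.npz",
        "mask_tif": folder / f"{stem}_multiclass.tif",
    }


def missing_files(files):
    """List the roles whose file is not present on disk."""
    return [role for role, path in files.items() if not path.exists()]


# Hypothetical folder and stem, used only for illustration.
files = dataset_files("USA_NC_Clyde", "USA_NC_Clyde_1")
for role, path in files.items():
    print(role, path.name)
```

Calling `missing_files(files)` after unzipping a folder is a quick way to check that every image has its annotations and masks before training.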
File Structure
[Figure: raw imagery (left) and annotated imagery (right). Legend: Substation, Transmission Network, Distribution Network.]
This is sample imagery taken from Clyde, NC, USA. On the left is the raw imagery; on the right is the annotation superimposed on the raw imagery.
[Figure: annotated imagery (left) and multi-class mask (right).]
Given the annotated imagery taken from Clyde, NC, USA on the left, our dataset provides the code necessary to generate a multi-class mask (seen on the right) for use in machine learning algorithms.
Source Dataset                     Country      City                     Area (sq. km)  # Images  Width   Height  Resolution (meter)
UCONN                              USA          Hartford, CT             56.3           25        10000   10000   0.15†
USGS (Orthoimagery)                USA          Clyde, NC                18.6           8         10000   10000   0.15†
USGS (Orthoimagery)                USA          Wilmington, NC           27.9           12        10000   10000   0.15†
USGS (Orthoimagery)                USA          Tucson, AZ               39.4           12        11856*  11936*  0.15†
USGS (Orthoimagery)                USA          Colwich KS & Maize, KS   34.8           15        10000   10000   0.15†
USGS (Orthoimagery)                Mexico       Matamoros, Mexico        34.0           10        11571*  12647*  0.15†
LINZ (Orthoimagery)                New Zealand  Tauranga                 2.8            8         3840    5760    0.125
LINZ (Orthoimagery)                New Zealand  Rotorua                  2.8            8         3840    5760    0.125
LINZ (Orthoimagery)                New Zealand  Gisborne                 6.6            6         6000    11800   0.125
LINZ (Orthoimagery)                New Zealand  Palmerston North         6.2            18        3840    5760    0.125
LINZ (Orthoimagery)                New Zealand  Dunedin                  2.1            6         3840    5760    0.125
SpaceNet (WorldView 3 satellite)   Sudan        Khartoum                 26.0           171       1300    1300    0.30
SpaceNet (WorldView 3 satellite)   China        Shanghai                 29.8           196       1300    1300    0.30
SpaceNet (WorldView 3 satellite)   Brazil       Rio de Janeiro           33.8           15        5000    5000    0.30
Total                                                                    321.0          511
* width and height may vary for images in some cities
† 0.5 feet = 0.1524 m ~ 0.15 m
Key Data Characteristics
The table above details some key characteristics of the imagery present in our dataset, including the original locations of the imagery. The images vary in width, height, and resolution, but are generally square* and have a ground resolution of 0.30 meters per pixel or finer.
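The area column can be roughly reproduced from the image count, pixel dimensions, and resolution. The sketch below is illustrative only: the function name is hypothetical, and it assumes that all images in a city share the listed size and do not overlap, which the footnote on varying sizes shows is not always the case.

```python
FEET_TO_M = 0.3048  # international foot, exact by definition


def footprint_sq_km(width_px, height_px, resolution_m, n_images=1):
    """Ground area covered by n_images images of identical pixel size."""
    width_m = width_px * resolution_m
    height_m = height_px * resolution_m
    return n_images * width_m * height_m / 1e6  # m^2 -> km^2


# Rows taken from the table: Khartoum, Tauranga, and Clyde (0.5 ft pixels).
khartoum = footprint_sq_km(1300, 1300, 0.30, n_images=171)
tauranga = footprint_sq_km(3840, 5760, 0.125, n_images=8)
clyde = footprint_sq_km(10000, 10000, 0.5 * FEET_TO_M, n_images=8)

print(round(khartoum, 1), round(tauranga, 1), round(clyde, 1))
```

Rounded to one decimal place, these three values agree with the 26.0, 2.8, and 18.6 sq. km listed for those cities.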
GeoJSON File Structure
Country_City_#.geojson
Geometry                            Description
type                                Geometry of annotation - ex. “Polygon”
coordinates                         Geo-coordinates of annotation (WGS84 projection)
Properties                          Description
label                               Energy infrastructure label for given annotation - ex. “DT”
filename                            Filename of the .tif image associated with all annotations in GeoJSON
country                             Image country origin
city                                Image city origin
image_geocoordinates_upper_left     Upper left geo-coordinates of associated .tif image
image_geocoordinates_lower_left*    Lower left geo-coordinates of associated .tif image
image_geocoordinates_upper_right*   Upper right geo-coordinates of associated .tif image
image_geocoordinates_lower_right*   Lower right geo-coordinates of associated .tif image
pixel_coordinates                   Pixel coordinate of annotation relative to .tif image
projection                          Image geo-coordinate projection system
* coordinates are interpolated using scipy LinearNDInterpolator (see pixToGeo.py on GitHub)
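To illustrate how the four corner coordinates relate pixel positions to geo-coordinates, here is a minimal sketch that swaps in plain bilinear interpolation between the corners. It is not the scipy LinearNDInterpolator approach used in pixToGeo.py. The function name, the (x, y) tuple layout of the corners, and the choice to normalise by the full image width and height are all assumptions.

```python
def pixel_to_geo(px, py, width, height, ul, ur, ll, lr):
    """Bilinearly interpolate a geo-coordinate for pixel (px, py).

    ul, ur, ll, lr are the (x, y) geo-coordinates of the upper-left,
    upper-right, lower-left, and lower-right image corners.
    """
    u = px / width   # 0 at the left edge, 1 at the right edge
    v = py / height  # 0 at the top edge, 1 at the bottom edge
    # Interpolate along the top and bottom edges, then between them.
    top = (ul[0] + u * (ur[0] - ul[0]), ul[1] + u * (ur[1] - ul[1]))
    bottom = (ll[0] + u * (lr[0] - ll[0]), ll[1] + u * (lr[1] - ll[1]))
    return (top[0] + v * (bottom[0] - top[0]),
            top[1] + v * (bottom[1] - top[1]))


# Hypothetical corner coordinates for a 100 x 100 pixel image.
corners = dict(ul=(10.0, 50.0), ur=(11.0, 50.0), ll=(10.0, 49.0), lr=(11.0, 49.0))
center = pixel_to_geo(50, 50, 100, 100, **corners)
print(center)
```

For an axis-aligned image such as this one, the result at the image center is simply the midpoint of the corners; for rotated or skewed footprints the two methods can differ slightly.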
GeoJSON files can be used to visualize annotations using sources such as ArcGIS or Google Earth. The properties of the file can be used to sort and refine annotations to needed specifications.
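As a sketch of sorting and refining annotations by property, the example below filters features by their `label` value using only the standard library. It assumes each file is a standard GeoJSON FeatureCollection whose features carry the properties listed above; the sample content and the function name are hypothetical.

```python
import json
from collections import Counter


def features_with_label(geojson, label):
    """Return the features whose 'label' property equals `label`."""
    return [
        feature
        for feature in geojson.get("features", [])
        if feature.get("properties", {}).get("label") == label
    ]


# Hypothetical, heavily shortened stand-in for a Country_City_#.geojson file.
sample = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "LineString", "coordinates": [[0.0, 0.0], [1.0, 1.0]]},
     "properties": {"label": "DL", "city": "Clyde"}},
    {"type": "Feature",
     "geometry": {"type": "Polygon",
                  "coordinates": [[[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]]},
     "properties": {"label": "DT", "city": "Clyde"}}
  ]
}
""")

distribution_lines = features_with_label(sample, "DL")
label_counts = Counter(f["properties"]["label"] for f in sample["features"])
print(len(distribution_lines), dict(label_counts))
```

For a real file, `json.load(open(path))` would replace the inline sample.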
CSV File Structure
Country_City_#.csv
Field    Description
Label    Energy infrastructure label - ex. “DL”. This label identifies what type of infrastructure the object represents. Annotation types include: Distribution tower (DT), Distribution line (DL), Transmission tower (TT), Transmission line (TL), Other tower (OT), Other line (OL), and Substation (SS).
Object   Unique ID for each annotation
Type     Geometry category for each annotation. This has 3 possible values: point, line, or polygon.
X        X pixel coordinate of vertex
Y        Y pixel coordinate of vertex
Height   Pixel height of associated image
Width    Pixel width of associated image
CSV files are used as input to generate multiclass masks for each annotated image (see Mask File Structure for more details). Each vertex of each annotation is an individual row within the CSV file. The “Object” value is the same for every vertex belonging to the same annotation.
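Because each row is a single vertex, a first processing step is usually to regroup rows into annotations by their Object ID. The following is a minimal sketch under the assumption that the CSV has a header row using exactly the field names in the table above; the sample rows and the function name are hypothetical.

```python
import csv
import io


def read_annotations(csv_file):
    """Group vertex rows into annotations keyed by their Object ID."""
    annotations = {}
    for row in csv.DictReader(csv_file):
        annotation = annotations.setdefault(
            row["Object"],
            {"label": row["Label"], "type": row["Type"], "vertices": []},
        )
        # Vertices are kept in file order, as (x, y) pixel coordinates.
        annotation["vertices"].append((float(row["X"]), float(row["Y"])))
    return annotations


# Hypothetical rows standing in for a Country_City_#.csv file.
sample_csv = io.StringIO(
    "Label,Object,Type,X,Y,Height,Width\n"
    "DL,1,line,10,20,10000,10000\n"
    "DL,1,line,30,40,10000,10000\n"
    "DT,2,polygon,5,5,10000,10000\n"
    "DT,2,polygon,9,5,10000,10000\n"
    "DT,2,polygon,9,9,10000,10000\n"
)

annotations = read_annotations(sample_csv)
for object_id, annotation in annotations.items():
    print(object_id, annotation["label"], annotation["type"], len(annotation["vertices"]))
```

For a file on disk, pass `open(path, newline="")` instead of the `io.StringIO` object.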
Mask File Structure
Country_City_#.tif
Country_City_#.npz
This dataset includes multiclass image masks in .tif and .npz formats. The complete set of class labels is listed below. Since overlap is possible (and common), we provide a default ordering priority, also shown below, with lines appearing on top and substations at the bottom. If a transmission line is co-located on a pixel with a substation, the pixel will be labeled as transmission line, overriding the co-occurring substation.
Masks were generated using the Python script multiclassmask.py, available on our GitHub (link on main page), using the CSV file labels as input. The class labels and ordering can be modified in multiclassmask.py to, for example, turn this into a multitask learning problem by generating binary masks for each class. One can also modify multiclassmask.py to generate other file formats.
Mask labels and ordering
Infrastructure type        Mask value
Distribution tower (DT)    1
Distribution line (DL)     2
Transmission tower (TT)    3
Transmission line (TL)     4
Other tower (OT)           5
Other line (OL)            6
Substation (SS)            7
Ordering priority: TL > DL > OL > TT > DT > OT > SS
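The priority rule can be sketched as painting classes from lowest to highest priority, so that higher-priority classes overwrite lower ones. This is an illustrative sketch in plain Python, not the logic of multiclassmask.py. It assumes that the last entry of the ordering refers to substations (SS), that background pixels are 0, and that each class has already been rasterised into a set of (row, col) pixels; the function names are hypothetical.

```python
MASK_VALUE = {"DT": 1, "DL": 2, "TT": 3, "TL": 4, "OT": 5, "OL": 6, "SS": 7}
PRIORITY = ["TL", "DL", "OL", "TT", "DT", "OT", "SS"]  # highest priority first


def merge_by_priority(pixels_by_label, height, width):
    """Build a multiclass mask from per-class sets of (row, col) pixels."""
    mask = [[0] * width for _ in range(height)]  # 0 = background (assumed)
    # Paint the lowest-priority class first so later classes overwrite it.
    for label in reversed(PRIORITY):
        for row, col in pixels_by_label.get(label, ()):
            mask[row][col] = MASK_VALUE[label]
    return mask


def binary_mask(mask, label):
    """Extract a 0/1 mask for one class, e.g. for a multitask setup."""
    value = MASK_VALUE[label]
    return [[1 if cell == value else 0 for cell in row] for row in mask]


# A substation covering two pixels, crossed by a transmission line on one.
pixels = {"SS": {(0, 0), (0, 1)}, "TL": {(0, 1)}}
mask = merge_by_priority(pixels, height=1, width=3)
print(mask)
```

Note that a binary mask derived from the merged mask loses any pixels that were overridden; generating per-class masks directly from the annotations avoids that.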
Source Dataset                     Country      City                     Resolution (meter)  Rural/Urban  Terrain
CT ECO (Orthoimagery)              USA          Hartford, CT             0.15                Suburban     Built environment
USGS (Orthoimagery)                USA          Clyde, NC                0.15                Rural        Mountainous
USGS (Orthoimagery)                USA          Wilmington, NC           0.15                Suburban     Coastal
USGS (Orthoimagery)                USA          Tucson, AZ               0.30                Urban        Desert
USGS (Orthoimagery)                USA          Colwich KS & Maize, KS   0.15                Rural        Plains
USGS (Orthoimagery)                Mexico       Matamoros, Mexico        0.15                Urban        Desert
LINZ (Orthoimagery)                New Zealand  Tauranga                 0.125               Urban        Forested Plains
LINZ (Orthoimagery)                New Zealand  Rotorua                  0.125               Suburban     Forested Plains
LINZ (Orthoimagery)                New Zealand  Gisborne                 0.125               Urban        Coastal
LINZ (Orthoimagery)                New Zealand  Palmerston North         0.125               Suburban     Plains
LINZ (Orthoimagery)                New Zealand  Dunedin                  0.125               Rural        Plains
SpaceNet (WorldView 3 satellite)   Sudan        Khartoum                 0.30                Urban        Desert
SpaceNet (WorldView 3 satellite)   China        Shanghai                 0.30                Urban        Built environment
SpaceNet (WorldView 3 satellite)   Brazil       Rio de Janeiro           0.30                Urban        Plains
Diversity of Imagery
Our dataset contains a diverse set of imagery from 14 cities around the world. The dataset is designed to be diverse in three respects: (1) human settlement density (i.e. rural vs urban), (2) terrain type, and (3) development index. The table above illustrates the specific diversity in imagery present in our dataset, while the accompanying charts show the overall distribution of the data in these categories.
[Charts: Human Settlement Density; Terrain Type; Development Index]
Country      Distribution Towers  Transmission Towers  Distribution Lines (km)  Transmission Lines (km)  Substations
USA          5365                 1995                 188.21                   182.16                   17
Sudan        5300                 35                   201.80                   7.91                     1
New Zealand  2049                 199                  104.17                   37.13                    18
Mexico       3338                 0                    82.92                    0                        2
China        1034                 307                  62.09                    59.22                    5
Brazil       63                   55                   6.83                     15.16                    2
Annotation Statistics
This dataset was designed to be geographically diverse. The table above summarizes the annotations by country. The accompanying figures display the density per sq km of each feature by country. Note: these figures are not representative of each country’s true energy infrastructure density; they represent only our annotations in each country’s respective images.
[Figures: Transmission density; Distribution density]
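A per-country density can be sketched by dividing the annotation counts above by the imaged area, summed per country from the Key Data Characteristics table. Whether the original density figures were computed exactly this way is an assumption, and the variable and function names are hypothetical.

```python
# Imaged area per country in sq. km, summed from the per-city areas.
AREA_SQ_KM = {
    "USA": 56.3 + 18.6 + 27.9 + 39.4 + 34.8,
    "Sudan": 26.0,
    "New Zealand": 2.8 + 2.8 + 6.6 + 6.2 + 2.1,
    "Mexico": 34.0,
    "China": 29.8,
    "Brazil": 33.8,
}

# (distribution towers, transmission towers, distribution km, transmission km)
ANNOTATIONS = {
    "USA": (5365, 1995, 188.21, 182.16),
    "Sudan": (5300, 35, 201.80, 7.91),
    "New Zealand": (2049, 199, 104.17, 37.13),
    "Mexico": (3338, 0, 82.92, 0.0),
    "China": (1034, 307, 62.09, 59.22),
    "Brazil": (63, 55, 6.83, 15.16),
}

FIELDS = ("distribution_towers", "transmission_towers",
          "distribution_line_km", "transmission_line_km")


def densities(country):
    """Annotation counts (or line km) per sq. km of imagery for one country."""
    area = AREA_SQ_KM[country]
    return {name: value / area for name, value in zip(FIELDS, ANNOTATIONS[country])}


for country in ANNOTATIONS:
    print(country, {k: round(v, 2) for k, v in densities(country).items()})
```

As the note above says, these values describe only the annotated images, not each country's actual infrastructure density.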
Annotation Methodology
Infrastructure       Abbreviation  Annotation Type
Transmission line    TL            line
Distribution line    DL            line
Transmission Tower   TT            polygon
Distribution Tower   DT            polygon
Substation           SS            polygon
Other Tower          OT            polygon
Other Line           OL            line
Transmission towers, distribution towers, and substations are labeled with polygons, while transmission and distribution lines are labeled with lines. Consistency was ensured throughout the dataset by adopting consistent labeling conventions wherever possible, including using standardized labels (listed above) within the annotation tool and enclosing objects in polygons without including their shadows. All team members consulted each other regarding any ambiguous labels to ensure consistency among the labels. The labeling process depends on human judgement, and to minimize human error, all data curators involved in this process were trained to identify transmission and distribution lines in overhead imagery. Transmission and distribution infrastructure within images was labeled using pyimannotate, a Python-based image labeling tool developed by Artem Streltsov of the Duke Energy Initiative (https://github.com/astr93/pyimannotate). The GeoJSON files in our dataset can be converted to JSON files that are viewable in pyimannotate (script geoToPix.py on GitHub).
[Figures: use of standardized labels; example annotation not including shadows]
Dataset Sources and Licenses
LINZ (Land Information New Zealand)
https://data.linz.govt.nz/set/4702-nz-aerial-imagery/
License: Creative Commons Attribution 4.0 International License
CT-ECO (Connecticut Department of Energy and Environmental Protection)
Capitol Region Council of Governments. (2016). 2016 Aerial imagery. http://cteco.uconn.edu/data/flight2016/index.htm
License: Public Domain
USGS (United States Geological Survey)
Source of imagery tagged as from USGS: U.S. Geological Survey. https://earthexplorer.usgs.gov/
License: Public Domain
SpaceNet
SpaceNet on Amazon Web Services (AWS). “Datasets.” The SpaceNet Catalog. Last modified April 30, 2018. https://spacenetchallenge.github.io/
License: Creative Commons Attribution-ShareAlike 4.0 International License
Data Tools
GitHub repository and documentation for supplementary scripts that can be used to further manipulate the data in this dataset: https://github.com/varunnair18/DataPlus2018