Source: cvrr.ucsd.edu/ece285sp18/files/mandar_ece285.pdf
ANNOTATING OBJECT INSTANCES
WITH A POLYGON-RNN
Authors: Castrejón et al.
(Dept. of CS, University of Toronto)
Presented by Mandar Pradhan
OBJECTIVE OF THE PAPER
● Annotate object instances in an image as fast as possible (AUTOMATIC
ANNOTATION)
● Keep the annotation as close to the ground truth as possible (POLYGON
FOR ANNOTATION)
● Leave scope for human intervention to correct automated annotations
(SEMI-AUTOMATIC ANNOTATION)
MOTIVATION BEHIND THE IDEA
● More data means more annotation, which is time consuming and laborious
when done by manual polygon annotation
● Weaker forms of supervision (image tags, bounding boxes, scribbles, single
points) are easier to obtain but not as accurate as fully supervised ground
truth
● Human intervention is needed to correct automated annotations and
prevent the model from breaking down
EARLIER RELATED WORKS
● Semi-automatic annotations:
○ Scribbles / multi-scribbles - segmentation via graph cut, combining
appearance cues with a smoothness term (needs an additional layer of
training examples; not accurate)
○ GrabCut - annotation as a 2D bounding box + per-pixel labelling via an EM
algorithm (the idea has been extended to 3D bounding boxes + point clouds)
(Figures: Scribbles, GrabCut)
Drawbacks:
- Hard to incorporate shape priors
- Labellings may contain holes
- Hard to correct (not ideal)
EARLIER RELATED WORKS
● Semi-automatic annotations (superpixel-based):
- Done at the superpixel level
- May merge small objects or parts
● Object instance segmentation (**USED IN THIS PAPER)
- CNNs used for box- / patch-level labelling
- Detect edges and link them to obtain coherent regions
- Combine small polygons into object regions to label images
- HERE, RNNS ARE USED TO DIRECTLY PREDICT THE FINAL
POLYGONS
Polygon-RNN (high-level overview)
● Automates annotation with a CNN followed by an RNN
● The CNN extracts image features from the crop inside the instance's
bounding box
● RNN input: the CNN features of the crop + the vertices predicted at time
t-1 and t-2 + the initial vertex (details in a subsequent slide)
● RNN output: a "polygon object" outlining the instance inside the bounding
box (a polygon is a list of 2D vertices)
● Trained end to end
● The CNN is fine-tuned to object boundaries; the RNN encodes priors on
object shapes
Polygon-RNN (some more details)
● "Polygon object": the list of vertices of the bounding polygon
● A given polygon admits multiple parameterizations: we can choose any
vertex as the starting point and traverse the remaining vertices in either
orientation
● Convention: any starting point, clockwise orientation
● Why are the vertices from both t-1 and t-2 fed into the RNN input?
○ To account for the orientation
● Why is the initial vertex of the polygon fed into the RNN input?
○ To decide when to close the polygon
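The clockwise convention can be enforced with the classic shoelace formula. A minimal sketch (not from the paper; function names are my own): in image coordinates, where the y axis grows downward, a positive signed area corresponds to a clockwise traversal on screen.

```python
def signed_area(poly):
    """Shoelace formula. In image coordinates (y grows downward),
    a positive value means the vertices run clockwise on screen."""
    a = 0.0
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        a += x0 * y1 - x1 * y0
    return a / 2.0

def to_clockwise(poly):
    """Return the polygon with vertices in clockwise (screen) order."""
    return poly if signed_area(poly) >= 0 else poly[::-1]

# a counter-clockwise unit square gets reversed
print(to_clockwise([(0, 0), (0, 1), (1, 1), (1, 0)]))
```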
CNN Module - CNN + skip connections
● Based on the VGG16 architecture, with the fully connected layers and the
last max-pooling layer removed
● Skip connections from the lower layers are stacked after each passes
through a 3×3 convolutional layer + ReLU and is rescaled to 28×28
● The output is downsampled by a factor of 16
● Why skip connections? To pull out low-level features (edges and corners)
as well as the semantics of the instance
● How are skip connections of different spatial dimensions handled?
- Bilinear upsampling after an additional convolution at conv5
- 2×2 max-pooling before an additional convolution at pool2
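The shape bookkeeping of this fusion can be sketched with plain numpy (an illustration only: channel counts are VGG16's, but the per-branch 3×3 conv + ReLU is omitted, and nearest-neighbour upsampling stands in for the paper's bilinear upsampling):

```python
import numpy as np

def upsample2x(x):
    # nearest-neighbour stand-in for bilinear upsampling
    return x.repeat(2, axis=1).repeat(2, axis=2)

def maxpool2x(x):
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

# hypothetical feature maps (channels, H, W) for a 224x224 crop
pool2 = np.random.rand(128, 56, 56)   # stride-4 features -> pooled down
conv4 = np.random.rand(512, 28, 28)   # stride-8 features, already 28x28
conv5 = np.random.rand(512, 14, 14)   # stride-16 features -> upsampled

# stack all branches at the common 28x28 resolution
fused = np.concatenate([maxpool2x(pool2), conv4, upsample2x(conv5)], axis=0)
print(fused.shape)  # (1152, 28, 28)
```

The common 28×28 grid is what the RNN then consumes.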
RNN Module for vertex prediction
● Aim of the RNN: capture the history (previous edges) and predict the
future (next edges of the polygon)
● Makes coherent predictions in ambiguous cases (occlusion, shadows)
● Units: convolutional LSTMs - they operate in 2D, preserve the spatial
information from the CNN, and reduce the number of parameters
(Figure: overall network architecture)
RNN Module for vertex prediction
● A 2-layer ConvLSTM with 16 channels and 3×3 kernels
● Output vertex representation: a one-hot encoding over D×D + 1 entries
● The D×D entries represent the possible 2D grid coordinates of the vertex
● The additional entry denotes the end-of-sequence token (the polygon is
complete)
● At the input, apart from the CNN representation of the image, we have the
one-hot encodings of the vertices at t-1 and t-2 along with the initial vertex
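A minimal sketch of this output encoding (my own helper, not the paper's code), using `None` as a stand-in for the end-of-sequence token:

```python
import numpy as np

D = 28  # prediction grid resolution (28x28 in the paper)

def encode_vertex(v):
    """One-hot vector of length D*D + 1: one slot per grid cell,
    plus a final slot for the end-of-sequence token."""
    out = np.zeros(D * D + 1)
    if v is None:          # None encodes "polygon is complete"
        out[-1] = 1.0
    else:
        x, y = v           # grid coordinates, row-major flattening
        out[y * D + x] = 1.0
    return out

print(encode_vertex((3, 5)).argmax())  # 5*28 + 3 = 143
print(encode_vertex(None).argmax())    # 784 (the EOS slot)
```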
RNN Module for vertex prediction
● Prediction of the starting point
- Reuses the CNN architecture with two additional branches
- The first branch predicts object boundaries
- The second branch takes the first branch's output as well as the image
features as inputs and predicts the vertices
- Both are treated as binary classification problems
Training Details
● Loss: cross-entropy
● Smoothing of the target distribution (the D×D + 1 target is non-binary)
- Prevents over-penalising nearly correct predictions
- Non-zero probability is assigned to grid locations within a distance of 2
from the target
● Optimizer: Adam
● Batch size: 8
● Learning rate: 1e-4, decayed by a factor of 10 every 10 epochs
● β1 = 0.9, β2 = 0.999 (Adam momentum parameters)
● The boundary and vertex branches are trained as logistic regression
● Ground truth for object boundaries: edges of the ground-truth polygon
● Ground truth for the vertex layer: vertices of the ground-truth polygon
● GPU: NVIDIA TITAN X
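The target smoothing can be sketched as follows. The slides do not specify the distance metric or the weighting, so this assumes uniform mass over cells within Chebyshev distance 2 of the true vertex:

```python
import numpy as np

D = 28

def smoothed_target(vx, vy, radius=2):
    """Non-binary target over the D x D grid (+1 EOS slot): uniform mass
    on cells within `radius` of the true vertex. Uniform weighting and
    Chebyshev distance are assumptions, not stated on the slides."""
    grid = np.zeros((D, D))
    ys, xs = np.ogrid[:D, :D]
    mask = (np.abs(xs - vx) <= radius) & (np.abs(ys - vy) <= radius)
    grid[mask] = 1.0
    grid /= grid.sum()                     # normalize to a distribution
    return np.append(grid.ravel(), 0.0)    # EOS slot gets zero mass

t = smoothed_target(10, 10)
print(t.max(), t.sum())  # 1/25 on each of 25 cells; sums to 1
```

The cross-entropy loss is then taken against this distribution instead of a hard one-hot target.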
Implementation details
● How is the best vertex chosen at each RNN time step? Take the one with
the highest log-probability
● How does vertex correction take place? The annotator feeds the corrected
vertex in at the next time step
● Inference time: 250 ms
● Polygon simplification
- Remove the middle vertex of three collinear vertices, and merge vertices
that fall in the same grid cell
● Data augmentation:
- Flip the image crop and annotation at random
- Randomly enlarge the context (10-20% of the bounding box)
- Randomly pick the starting vertex
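The simplification step above can be sketched in a few lines (my own version; wrap-around at the polygon's start/end is ignored for brevity):

```python
def simplify(poly):
    """Merge consecutive vertices in the same grid cell, then drop the
    middle vertex of any collinear triple (zero cross product)."""
    # 1. merge consecutive duplicates (same grid cell)
    dedup = [poly[0]]
    for p in poly[1:]:
        if p != dedup[-1]:
            dedup.append(p)
    # 2. remove a vertex lying on the segment joining its neighbours
    out = [dedup[0]]
    for i in range(1, len(dedup) - 1):
        (x0, y0), (x1, y1), (x2, y2) = out[-1], dedup[i], dedup[i + 1]
        cross = (x1 - x0) * (y2 - y0) - (y1 - y0) * (x2 - x0)
        if cross != 0:
            out.append(dedup[i])
    out.append(dedup[-1])
    return out

print(simplify([(0, 0), (1, 0), (2, 0), (2, 1), (2, 1), (2, 2)]))
# [(0, 0), (2, 0), (2, 2)]
```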
Results
● Datasets: KITTI, Cityscapes
● Goals of the model:
- The polygon must be as accurate as possible
- Minimal number of clicks
● Yardsticks to gauge performance:
- Intersection-over-Union (IoU)
- Number of vertex corrections needed to predict the polygon
● Polygon annotation is done by the in-house model; bounding boxes are
easy to obtain, e.g. via AMT
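The IoU yardstick over binary instance masks is straightforward (a standard definition, shown here for concreteness rather than taken from the paper's code):

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection-over-Union between two boolean masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 0.0

a = np.zeros((4, 4), bool); a[:2, :] = True   # top two rows
b = np.zeros((4, 4), bool); b[1:3, :] = True  # middle two rows
print(iou(a, b))  # overlap 4 cells, union 12 -> 0.333...
```

The reported numbers average this quantity over all instances.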
Results: Cityscapes
● What is in this dataset? 27 cities; 2975 train, 500 validation and 1525 test
images
● Issue faced: the test set has no ground-truth instances
● Solution: the 500 validation images are used as the test set,
and the images from Weimar and Zurich become the validation set
● Labels: Person, Car, Rider, Truck, Bus, Train and Motorcycle
● Instance sizes: 28-1792 pixels
● The built-in instance segmentation ground truth comes both as pixel
labellings and as polygons
● New problem: Cityscapes polygons include the occluded portions
● Solution: use depth ordering to remove the occluded parts (we want only
the visible part)
Results: Cityscapes
● What about objects split into multiple components by occlusion?
● The authors treat each component as a single object
● What if the RNN keeps adding new vertices without terminating?
● The authors set a hard limit of 70 vertices for the RNN (GPU constraints)
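The decoding loop with this cap can be sketched as below. `predict_next` is a hypothetical stand-in for the trained RNN step (the real model consumes CNN features and one-hot vertex encodings); the toy predictor only illustrates the control flow:

```python
MAX_STEPS = 70  # hard limit from the slides (GPU memory)
EOS = None      # stand-in for the end-of-sequence token

def decode_polygon(predict_next, first_vertex):
    """Greedy decoding: feed the previous two vertices and the first
    vertex back in each step, stop at EOS or at the vertex cap."""
    poly = [first_vertex]
    prev2 = None
    while len(poly) < MAX_STEPS:
        v = predict_next(poly[-1], prev2, first_vertex)
        if v is EOS:
            break
        prev2 = poly[-1]
        poly.append(v)
    return poly

# toy predictor that walks a fixed square and then emits EOS
square = [(0, 0), (0, 5), (5, 5), (5, 0)]
def toy_predict(prev1, prev2, first):
    i = square.index(prev1)
    return square[i + 1] if i + 1 < len(square) else EOS

print(decode_polygon(toy_predict, (0, 0)))  # [(0, 0), (0, 5), (5, 5), (5, 0)]
```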
Results: Evaluation Metric
● Intersection-over-Union: obtained prediction vs. ground truth, averaged
over all instances
● How is the human action (correction of vertices) evaluated? By simulating
annotators who correct each predicted vertex that deviates from the
ground truth
● Testing game plan: first do a sanity check in PREDICTION mode (no
annotator interaction), then evaluate the amount of human intervention
needed
Results: Baselines
● DeepMask: uses a CNN to output pixel labels, agnostic to class
● SharpMask: improves on DeepMask by upsampling the output to obtain
better resolution
● Performance is reported with ground-truth boxes
● Network structure: 50-layer ResNet architecture trained on the COCO
dataset
● For DeepMask and SharpMask, the ResNet part is trained for 150 epochs,
and the upsampling part of SharpMask for a further 70 epochs
Results: Baselines
● SquareBox: the object is mapped to a bounding box of reduced
dimensions, with an individual box for each component of the object
● Dilation10: uses the semantic segmentation output; pixels mapped to the
same object are grouped into instance masks
Results: Baselines
● Verdict
- The baselines are hard to correct
- Polygon-RNN has a better overall average and tops the charts in 6 of 8
categories
- It outperforms SharpMask on the Car, Rider and Person classes by 12%,
6% and 7% respectively
- Why is this worth noting? SharpMask uses a ResNet architecture, which
is much more powerful than VGG
- The baselines have an advantage on larger objects such as bus and train,
due to their better output resolution
Results: Annotators in the loop
● How are annotation quality and the amount of human intervention
quantified? By the number of mouse clicks needed to reach different
levels of accuracy
● What is meant by different "levels" of segmentation accuracy? Thresholds
on the chessboard (Chebyshev) distance of the errors
● The resulting IoU is also shown for comparison
● Methodology in a nutshell
- In the first setup, pick 10 images per annotator and ask them to annotate
freely, without any cues or hints
- In the second setup, crop the images and place blue markers on the
instances to be annotated (to disambiguate)
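The simulated annotator can be sketched as below. This is a simplified, offline version of the protocol (in the real loop the corrected vertex is fed back into the RNN at the next step, which can change subsequent predictions), and it assumes predicted and ground-truth vertices are already aligned one-to-one:

```python
def simulate_corrections(pred, gt, T):
    """Replace every predicted vertex farther than T from its ground-truth
    vertex in chessboard (Chebyshev) distance; each replacement costs
    one simulated annotator click."""
    clicks = 0
    corrected = []
    for (px, py), (gx, gy) in zip(pred, gt):
        if max(abs(px - gx), abs(py - gy)) > T:   # chessboard distance
            corrected.append((gx, gy))
            clicks += 1
        else:
            corrected.append((px, py))
    return corrected, clicks

pred = [(0, 0), (4, 5), (9, 9)]
gt   = [(0, 1), (5, 5), (6, 6)]
print(simulate_corrections(pred, gt, T=1))  # only the last vertex is corrected
```

Sweeping the threshold T produces the "clicks vs. accuracy level" curves the paper reports.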
Results: Annotators in the loop
● Verdict
- Human annotator IoU: 69.5% in the free-viewing setup and 78.6% on
cropped images
- This indicates the need to collect multiple annotations to reduce variation
and bias among annotators
Results: Annotators in the loop
● Comparison with GrabCut:
- 54 randomly chosen instances
● GrabCut stats: 42.2 s and 17.5 clicks per instance, 70.7% IoU
● This model's stats: 5-9.6 clicks per instance, 77.6% IoU
● Verdict: this model is faster, as it needs fewer clicks for comparable
inference time
Results: Final Verdict
Advantages
● Polygon-RNN provides plausible annotations with relatively low latency
● Performance is good on smaller objects, visible both across instances of
varying sizes within the same dataset (Cityscapes) and between the two
datasets (smaller objects in KITTI vs. larger objects in Cityscapes)
● Competes well with SharpMask, which has a ResNet-based architecture
● Clearly reduces annotation cost at an IoU comparable to human annotation
● Human intervention adds scope to avoid extremely bad polygons
Results: Final Verdict
Disadvantages
● The low output resolution and the associated quantization error show up in
the segmentation of larger instances
● Memory intensive: polygons have many more vertices to predict than a
single bounding box, which may add latency in return for more accuracy
● Cannot exploit the Velodyne point clouds in the KITTI dataset as some
other methods do, which puts it at a disadvantage there
Results: Final Verdict
Takeaways
● Addresses both the speed and the accuracy of annotation
● Allowing human intervention keeps the worst-case performance in check
● Performance is good for smaller objects but degrades for larger ones
● Scope for future work: improve the output resolution and exploit the
Velodyne point-cloud data to address the remaining issues on KITTI
OTHER REFERENCES
[1] D. Lin, J. Dai, J. Jia, K. He, and J. Sun. ScribbleSup: Scribble-supervised
convolutional networks for semantic segmentation. In CVPR, 2016.
[2] C. Rother, V. Kolmogorov, and A. Blake. GrabCut: Interactive foreground
extraction using iterated graph cuts. In SIGGRAPH, 2004.
QUESTIONS??