
Cloud LSVA Large Scale Video Analysis

EUROPEAN COMMISSION

DG Communications Networks, Content & Technology

Horizon 2020 Research and Innovation Programme Grant Agreement No 688099

D5.2 Report on integration, validation & user trials

Project funded by the European Union’s Horizon 2020 Research and Innovation Programme (2014 – 2020)

Deliverable no.: D5.2
Dissemination level: Public
Work Package no.: WP5
Main author(s): Jos den Ouden (TU/e), Marcos Nieto (VICOMTECH)
Co-author(s): Kieran Flynn (IBM), Joachim Kreikemeier (VALEO), Sergio Sanchez Carballido (VICOMTECH), Suzanne Little (DCU), Brenda Rousseau (TOMTOM), Panagiotis Meletis (TU/e)
Version Nr (F: final, D: draft): F – v1.0
File Name: D5.2 - Report on integration, validation and user trials v1.0.docx
Project Start Date and Duration: 01 January 2016, 36 months

Ref. Ares(2018)6675195 - 30/12/2018


Document Control Sheet

Main author(s) or editor(s): Jos den Ouden (TU/e), Marcos Nieto (VICOMTECH)
Work area: WP5
Document title: D5.2 Report on integration, validation & user trials

Version history:

Version  Date        Main author                          Summary of changes
V0.1     05-12-2018  J. den Ouden                         Document set up - ToC
V0.2     14-12-2018  M. Nieto                             Adapted sections 2, 3 & 5
V0.3     19-12-2018  J. den Ouden, B. Rousseau            Added section 6
V0.4     20-12-2018  J. den Ouden, P. Meletis, K. Flynn   Adapted section 6, added section 4
V0.5     21-12-2018  J. den Ouden                         Ready for peer review
V0.6     21-12-2018  G. Dubbelman                         Peer reviewed and ready for submission
V1.0     23-12-2018  Marcos Nieto                         Final review

Approval:

             Name                                         Date
Prepared     J. den Ouden (TU/e)                          21-12-2018
Reviewed     G. Dubbelman (TU/e), Marcos Nieto (VICOM)    21-12-2018
Authorised   Oihana Otaegui (VICOM)                       21-12-2018

Circulation:

Recipient                Date of submission
EC                       30-12-2018
Cloud-LSVA consortium    30-12-2018

Legal Disclaimer
The information in this document is provided “as is”, and no guarantee or warranty is given that the information is fit for any particular purpose. The above referenced consortium members shall have no liability for damages of any kind including, without limitation, direct, special, indirect, or consequential damages that may result from the use of these materials, subject to any liability which is mandatory due to applicable law. © 2016 by Cloud LSVA Consortium.


Abbreviations and Acronyms

Acronym   Definition
ADAS      Advanced Driver Assistance Systems
CPU       Central Processing Unit
FPS       Frames Per Second
GNSS      Global Navigation Satellite System
GPS       Global Positioning System
GPU       Graphics Processing Unit
HAD       Highly Automated Driving
iSCSI     Internet Small Computer Systems Interface
NAS       Network Attached Storage
OEM       Original Equipment Manufacturer
OS        Operating System
PC        Personal Computer
RAM       Random-Access Memory
ROI       Region of Interest
SAN       Storage Area Network
SLAM      Simultaneous Localization and Mapping
UI        User Interface
VCD       Video Content Description
WP        Work Package


Table of Contents

Abbreviations and Acronyms
Table of Contents
List of Figures
List of Tables
Executive Summary
1. Introduction
   1.1 Purpose of Document
   1.2 Intended audience
2. Cloud-LSVA platform integration
   2.1 Integration process of final prototype Gamma
3. Cloud processing scalability
   3.1 Introduction
   3.2 Analytics as a service
   3.3 Methodology
       3.3.1 Analytics
       3.3.2 Monitoring
       3.3.3 Retrieving logs
       3.3.4 Running concurrent batches
       3.3.5 Managing compute resources for containers
   3.4 Tests and discussion: low to medium scale video analysis
       3.4.1 Scaling high computing consuming jobs (CNN detector)
       3.4.2 Economic analysis
       3.4.3 Scaling high memory and computing consuming jobs (transcoding)
   3.5 Tests and discussion: large scale video analysis
   3.6 Conclusions
4. Cloud data management
   4.1 Data Storage Strategies
       4.1.1 IBM Cloud Storage Options
       4.1.2 IBM Softlayer NAS Storage Options
       4.1.3 IBM NAS Storage Option Chosen
       4.1.4 Benefits and Performance
   4.2 Data Transfer Strategies
       4.2.1 Internal Data Transfer
       4.2.2 External Data Transfer
5. Annotation tool - User acceptance tests
   5.1 Big Data challenge for ADAS use case
   5.2 Evaluation of pixel-wise annotation use case
   5.3 Evaluation of 2D bounding box annotation use case
   5.4 Evaluation of 3D annotation use case
6. Cartography use case - integration, requirements and validation
   6.1 Big Data challenge for cartography
   6.2 Upload engine for SLAM mapping from crowd-sourced data
       6.2.1 Autostream map delivery platform
       6.2.2 Semantic SLAM
7. Conclusion


List of Figures

Figure 1: Cloud-LSVA prototype engines, components, and services. For a detailed overview, see Deliverable 5.5.
Figure 2: Integration of Cloud-LSVA prototypes: timeline and work lines.
Figure 3: Proposed architecture of the Kubernetes orchestration of computer vision applications.
Figure 4: Grafana dashboard showing cluster performance (CPU Load and Memory used).
Figure 5: Average CPU load per worker node for the (left) base cluster (4 worker nodes, 16 cores per node), and (right) horizontally scaled cluster (8 worker nodes, 16 cores per node).
Figure 6: Load per worker node stacked to 100% in the (top) base cluster and (bottom) horizontally scaled cluster during the experiment.
Figure 7: Duration of each batch concurrency for the three compared clusters.
Figure 8: Average duration of jobs according to concurrency for each of the three compared clusters.
Figure 9: Cost (in €) per job execution considering the effective job time of each experiment and the cluster cost (€/second).
Figure 10: Comparison of the scaled clusters and the base cluster in time and cost per job.
Figure 11: Cost per job compared to effective job duration for the three compared clusters for a concurrency jobs/cpu=2. The line sets a threshold between performance and cost.
Figure 12: Batch duration against total number of jobs (concurrency) for the two scaled clusters.
Figure 13: Load distribution by worker nodes during the execution of the process.
Figure 14: Pixel-wise annotation tool (see D3.7, section 6).
Figure 15: Intersection-over-Union (IoU) for non-expert (left) and expert (right) annotators for different classes.
Figure 16: IoU improvement (in percentage) obtained using the DNN-based pre-annotations.
Figure 17: Annotation time (minutes) for non-experts and experts using only manual annotation, and semi-automatic annotation (with DNN pre-annotations).
Figure 18: Non-expert annotator: example labelled image using the annotation tool with pre-annotation (top), and original ground truth (bottom). Average IoU = 0.76.
Figure 19: Expert annotator: example labelled image using the annotation tool with pre-annotation (top), and original ground truth (bottom). Average IoU = 0.83.
Figure 20: (Top) Cloud-LSVA interface where participants of the 2nd Annotation Workshop could access their annotation task. (Bottom) Web UI for 2D labelling.
Figure 21: User feedback using the Web UI for the 2D annotation use case: (top) positive affirmations, (bottom) negative affirmations.
Figure 22: Sample image of the Web UI annotation application: (left) lane annotation, (right) 3D object annotation.
Figure 23: Autonomous driving map system. In green the TomTom Autostream system and in blue the Cloud-LSVA research extensions.
Figure 24: Multi-vehicle distributed SLAM. In this set-up the vehicles only house the front-end, which turns the raw video data into more compressed pose-chain data. This compressed data is sent to the cloud, where a map is constructed from all observations.
Figure 25: Color-coded output sample from the developed software module. Each color represents a (sub)-cluster of vehicle poses with similar position and heading angle, i.e. candidate pairs for loop closure. Colors are assigned randomly, so some distinct clusters have more or less the same color.
Figure 26: Hierarchy of classes.
Figure 27: Semantic Segmentation.
Figure 28: Hierarchical Semantic Segmentation.
Figure 29: Panoptic Segmentation. Different objects are delineated by a white line.


List of Tables

Table 1: Metrics of the two different types of services on a single-node cluster.
Table 2: User test affirmations.
Table 3: Training datasets per model.
Table 4: mIoU for Semantic Segmentation & Hierarchical Semantic Segmentation using the CityScapes and Vistas datasets.
Table 5: PQ & SQ metrics for Panoptic Segmentation using the CityScapes and Vistas datasets.
Table 6: Number of classes & performance parameters for all 3 segmentation models.


Executive Summary

The aim of this project is to develop a software platform for efficient and collaborative semi-automatic labelling and exploitation of large-scale video data, addressing existing needs of the ADAS and Digital Cartography industries.

Cloud-LSVA uses Big Data technologies to address the open problem of a lack of software tools and hardware platforms to annotate petabyte-scale video datasets, with a focus on the automotive industry. Annotations of road traffic objects, events and scenes are critical for training and testing the computer vision techniques that are at the heart of modern Advanced Driver Assistance Systems and navigation systems. Providing this capability will establish a sustainable basis to drive forward automotive Big Data technologies.

As part of the Cloud-LSVA project, the objective of WP5 is to integrate, validate and test the prototypes that were generated during the lifetime of the project and to execute user trials.

In the other three WP5 deliverables (D5.3, D5.4 & D5.5) we focussed specifically on the detailed description of the three developed prototypes (Alpha, Beta and the final Gamma prototype).

In this deliverable, we focus on:

- An overall overview of the integration of the final (Gamma) prototype.

- The validation of the semi-automated annotation of objects and events in data coming from car-mounted sensors with relevance to ADAS modules and cartography.

- The performance of the Cloud-LSVA platform with respect to data management and storage.

- The user acceptance tests executed with the annotation platform and the overall performance of the system, assessing its effectiveness in terms of manual labour as well as the economic cost of the annotation task.


1. Introduction

1.1 Purpose of Document

In the Cloud-LSVA project, there are four deliverables directly related to the developed Cloud-LSVA system. Deliverables 5.3, 5.4 and 5.5 are the Cloud-LSVA prototypes (Alpha, Beta and Gamma, respectively).

Deliverable 5.2 is the report on integration, validation, and user trials. Its purpose is briefly described below.

Deliverable 5.2: Report on integration, validation, and user trials.

Due: month 36

Type: Report

In this report we detail the integration, validation, and user trials performed with the final Cloud-LSVA prototype (prototype Gamma). As this document is positioned in the final month of the project, its focus is mainly on the validation and user trials of the prototype.

This document describes the integration of the final prototype as well as the validation of the semi-automated annotation of objects and events in data coming from car-mounted sensors with relevance to ADAS modules and cartography.

Finally, it describes the user acceptance tests executed with the annotation platform and the overall performance of the system, assessing the effectiveness in terms of manual labour as well as the economic costs of the annotation task and the benefits of automatically updating maps.

In section 2, the integration process of the Gamma prototype is described, giving an overview of the high-level components developed in Cloud-LSVA and the development choices made throughout the project.

In section 3, cloud scalability is described, including the performance tests and an assessment of economic costs.

Section 4 describes the cloud data management, specifically data storage and transfer.

Section 5 describes the user tests that have been executed on the developed annotation tools, describing the annotation tools that were developed during the project, their performance with users and the associated economic costs.

Section 6 focusses on the work in the cartography use case, with a description of the components and the validation of both the SLAM and segmentation components.

1.2 Intended audience

This deliverable D5.2 is public and therefore serves to provide an overview of the achievements of the Cloud-LSVA project (specifically WP5) to the general public.


2. Cloud-LSVA platform integration

This chapter describes the status of the Cloud-LSVA prototype integration. An overview of the final Cloud-LSVA prototype is shown in Figure 1.

Figure 1: Cloud-LSVA prototype engines, components, and services. For a detailed overview, see Deliverable 5.5.

In general terms, the Cloud-LSVA system is a cloud-based system that exposes a number of functionalities related to the annotation of large volumes of data coming from sensorised vehicles.

There are five main elements around the Cloud-LSVA system:

• Data and metadata: in the form of video/sensor information (data) recorded from equipped vehicles, and the outcome of the annotation process (metadata).

• Front-end: the (web) interface of the system to the human users of the platform, which exposes services and functionalities to perform actions (e.g. annotating videos, registering recordings, etc.).

• Back-end: the core SW engines that provide the underlying functionality of the system (e.g. learning, deploying algorithms, storing data, formatting annotations, etc.).

• Cloud resources: the infrastructure that enables the functionalities, including storage resources (NAS system), and computing resources (e.g. GPU-enabled servers).


• Test vehicles: the main source of data for Cloud-LSVA are the vehicles equipped with sensors that produce large volumes of data. These vehicles include cost-effective recording and processing units which make it possible to record multiple streams along with real-time pre-annotations.

There are two main users of Cloud-LSVA:

• Annotators: operators that access Cloud-LSVA to perform annotation tasks through a GUI, such as identifying objects in images, time lapse with recognised actions, etc.

• Engineers: users of the Cloud-LSVA platform who can register uploaded recordings into the system, define annotation tasks, monitor the status of annotation tasks, etc.

By design, Cloud-LSVA offers a common GUI through a web application, which works as the front-end of the system. This Web App Engine has been implemented using the Angular framework. The front-end provides access to the different functionalities offered by the back-end.

The back-end of Cloud-LSVA is composed of the SW engines that provide the functionality of the system for both the ADAS and Digital Cartography use cases, and relies on the HW systems where the Cloud-LSVA platform is deployed (including the storage of raw content from sensors and the computing clusters where the SW is executed). The SW part is therefore composed of a number of modules, in the form of web applications (e.g. Web Application Archives), which define functions and REST interfaces for interoperability.

2.1 Integration process of final prototype Gamma

This section presents a brief overview on the integration process of the Gamma prototype, considering the different branches of activities related to the following integration aspects (as illustrated in Figure 2):

• Annotation GUI: interfaces for manual and semi-automatic annotation.

• Integration/platform: technologies and components to create a platform to manage content and metadata.

• Data & vehicles: activities related to the preparation of test vehicles, installation of sensors, recording activities, in-vehicle processing computation, etc.

• Computer vision & Deep Learning: creation of automatic annotation mechanisms based on computer vision and deep learning.

One of the key aspects of Cloud-LSVA is its cyclical implementation, using three cycles to iteratively produce functional prototypes: Alpha (first year), Beta (second year) and Gamma (third year). During the first year, most of the work was exploratory, but it served to set the basis for what the Cloud-LSVA platform would become, including the early set-up of in-vehicle recording equipment, the creation of the cloud infrastructure for storage and computation, and the definition of the data and metadata formats (RTMaps and VCD, respectively).

The Beta prototype incorporated a number of advanced features, especially in the computer vision and deep learning branch, along with enhanced web-based applications and the implementation of engines at the cloud-side.

Figure 2: Integration of Cloud-LSVA prototypes: timeline and work lines.

For the Gamma prototype the following work has been added to the existing Beta prototype to finalize the project:

• Creation of new GUI for additional annotation capabilities (3D annotation on point clouds and multi-view annotation for surround view cameras)

• Integration of DL techniques for additional annotation capabilities

• Platform scale-up using Kubernetes

• In-vehicle integration of automated functions, including vision-based algorithms (e.g. real-time semantic segmentation in TU/e’s vehicle, real-time 3D object detection in Valeo’s vehicle, and real-time 2D object detection in Vicomtech’s vehicle).

In total, three Annotation Workshops were held during the lifetime of the project.

During the final integration period, the third and last Annotation Workshop was held, in the form of the Cloud-LSVA Final Event (29 November 2018, Stuttgart), which gathered developments from the consortium, including live demonstrations of the platform, the analytics, and the in-vehicle real-time mechanisms.

It is noteworthy that Cloud-LSVA had to face the emergence and evolution of technologies and standards during the lifetime of the project: e.g. we needed to move from OpenTOSCA to Kubernetes, from Caffe to TensorFlow, and from video-only to three.js-based 3D web applications. In general, several technologies that were unknown or non-existent at the beginning of the project were identified, monitored, compared and finally adopted, which forced Cloud-LSVA to be flexible and adaptive to changes in the cloud-computing and autonomous-vehicle ecosystem.


3. Cloud processing scalability

3.1 Introduction

Nowadays, IT is becoming part of the business process in many sectors and domains. Cloud technologies, DevOps, microservice architectures, etc. are fulfilling the requirements of the industry in its different business areas. To be efficient and successful, it is becoming essential for data scientists and researchers to work jointly as a team with developers and IT experts, and also to be themselves aware of the technology stack and its solutions.

Container technologies such as Docker are standardizing the way components are encapsulated and deployed in complex cloud applications [1]. In this context, Kubernetes (k8s) has emerged as an open source solution for automatic deployment, scaling and management of containerized applications [2].

The Kubernetes API and objects allow describing the cluster's desired state: the applications or other workloads to run, the container images they use, the number of replicas, the network and storage resources to make available, and more. Kubernetes makes the cluster's current state match this desired state. To do so, Kubernetes performs a variety of tasks automatically, such as running continuous control loops that reconcile desired and current state, starting or restarting containers, scaling the number of replicas of a given application, and more.
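As a small illustration of this declarative model, the sketch below uses the official Kubernetes Python client to compare a Deployment's desired replica count with the replicas that are currently ready; the deployment name and namespace are illustrative assumptions, not taken from the Cloud-LSVA platform.

```python
# Minimal sketch (Python, official `kubernetes` client): compare a Deployment's
# desired replica count with the replicas that are actually ready.
# The deployment name and namespace are illustrative, not the project's own.
from kubernetes import client, config

config.load_kube_config()                      # use the local kubeconfig credentials
apps = client.AppsV1Api()

dep = apps.read_namespaced_deployment(name="cnn-detector", namespace="default")
desired = dep.spec.replicas                    # desired state declared by the user
ready = dep.status.ready_replicas or 0         # current state observed by Kubernetes

print(f"desired={desired}, ready={ready}")
if ready != desired:
    print("Kubernetes control loops will keep reconciling until the states match.")
```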

In this section we describe the proposed approach to orchestrate via Kubernetes the deployment of containerized computer vision analytics as a solution for providing scalability to the annotation platform in the ADAS context. The proposal considers the following aspects:

• An architecture design for analytics as a service following a microservices approach

• Storage of the volumes of data to be annotated

• Managing compute resources and studying their impact on performance

• Logging and monitoring the system for performance assessment

We have studied how the system manages a highly demanding task queue over a large dataset (i.e. large raw recordings), focusing on two questions:

• Given a certain infrastructure: how does it manage an increasing workload (pre-annotation batch jobs), even when the system reaches its maximum computing capacity?

• Given the workload (a pre-annotation batch job): how does performance increase when virtual hardware is added or enhanced?

The study also considers the economic costs of the most relevant situations.

[1] https://www.docker.com/
[2] https://kubernetes.io


3.2 Analytics as a service

We propose to integrate analytics components as containerized applications in a highly scalable and reliable Kubernetes architecture, agnostic to cloud providers and valid for any ADAS analytics, which includes:

• A web server to remotely reach the application through http calls

• A file puller to integrate the demanded application role or version

• File storage systems mounted to read and write the data in each case

Figure 3: Proposed architecture of the Kubernetes orchestration of computer vision applications.

The architecture parts are:

• Kubernetes service: provides a load balanced external IP endpoint

• Replicable Kubernetes deployment (highly available, status feedback, scalable by replicas, auto update, ...), with three containers:

o C1: File Puller: CV app definition, version control, ...

o C2: Proxy with k8s API: Create CV app as Kubernetes Jobs (creates one or more pods and ensures that a specified number of them successfully terminate. As pods successfully complete, the job tracks the successful completions)

o C3: Tomcat: Web server

• Persistent Kubernetes volumes (abstract storage resources, providing storage persistence beyond the pod lifetime), mounting the following resources:

o Vol1. Internal: File sharing


o Vol2. External: Share storage input data

o Vol3. External: Output data persistence

The general idea in the annotation context of Cloud-LSVA is to have the recordings stored on a NAS organized as a filesystem. In addition, there is a database with basic information about the recordings (stored in JSON and with SQL access), such as the URL of the video location on the NAS, the recording location, etc. Finally, to store the results of the annotations there is another, similar database that is independent of the previous one. File storage is only meant to be used by servers located in the same data centre as the storage.
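As an illustration of how the load-balanced Kubernetes service exposing the web server (Figure 3) can be declared, the sketch below uses the official Kubernetes Python client; the names, labels and port numbers are illustrative assumptions, not the project's actual configuration.

```python
# Minimal sketch (Python + `kubernetes` client) of the load-balanced Service that
# exposes the Tomcat web server of the analytics deployment (see Figure 3).
# All names, labels and port numbers below are illustrative assumptions.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

service = client.V1Service(
    metadata=client.V1ObjectMeta(name="cv-analytics"),
    spec=client.V1ServiceSpec(
        type="LoadBalancer",                       # external, load-balanced IP endpoint
        selector={"app": "cv-analytics"},          # must match the Deployment's pod labels
        ports=[client.V1ServicePort(port=80, target_port=8080)],  # Tomcat default port
    ),
)
core.create_namespaced_service(namespace="default", body=service)
```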

3.3 Methodology

3.3.1 Analytics

Data is recorded in equipped vehicles, creating about 10 TB/day/car, considering a sensor set-up that includes 4 surround-view HD cameras and a raw point cloud from a Velodyne HDL-64. The data is uploaded to the cloud storage system (NAS), where it becomes available for the annotation process. This process consists of two basic steps: (i) a transcoder which extracts video data from RTMaps containers, and (ii) a CNN detector which produces 2D object detections from the videos:

1. Transcoder: the transcoding starts as a k8s job for all videos (raw RTMaps recordings between 20 GB and more than 500 GB in size). The transcoding is very demanding since it requires high CPU and memory resources. After transcoding, the results are 30-second slices of the original video in MP4 format.

2. Detector: an SSD MobileNet detector provides a pre-annotation of the objects in each of the previously transcoded MP4 videos. The output of this analytic is a file which describes the detections in VCD format.

To minimize bottlenecks other than pure computation, the data is stored locally in a NAS infrastructure in the same location as the cluster (Germany, Fra05). Following the same reasoning, to avoid write conflicts, the output results are saved in the temporary filesystem of the pod.

3.3.2 Monitoring

In the IBM Cloud, metrics are collected automatically. We use Grafana to monitor the cluster performance metrics during the experiments. Figure 4 shows an example Grafana dashboard for an experiment of 20 concurrent transcoding jobs, with different metrics related to the containers and the worker nodes.


Figure 4: Grafana dashboard showing cluster performance (CPU Load and Memory used).

However, the metrics collected by the IBM Monitoring Service are not stored permanently and are only available for some days (varying with the contracted account). For this reason, a metrics-retrieval application has been developed using the Metrics API of the IBM Cloud. Using this tool we have recorded the metrics of all executed experiments for further analysis.
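The project's tool queried the IBM Cloud Metrics API, which is not reproduced here; the sketch below shows the same pattern of periodically pulling a per-node CPU metric and persisting it locally, under the assumption of a Prometheus-compatible query endpoint. The URL and metric name are placeholders, not the project's actual configuration.

```python
# Sketch of periodic metric retrieval and local persistence. The project's tool used
# the IBM Cloud Metrics API; here a Prometheus-compatible endpoint is assumed instead
# (URL and metric name are placeholders).
import csv
import time
import requests

PROM_URL = "http://prometheus.example.local:9090"   # placeholder endpoint
QUERY = 'sum(rate(container_cpu_usage_seconds_total[1m])) by (node)'

def fetch_range(start: float, end: float, step: str = "30s"):
    """Pull a range of samples from the Prometheus HTTP API."""
    resp = requests.get(
        f"{PROM_URL}/api/v1/query_range",
        params={"query": QUERY, "start": start, "end": end, "step": step},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]["result"]

def dump_csv(series, path="cpu_metrics.csv"):
    """Save the retrieved samples so they survive the provider's retention window."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["node", "timestamp", "cpu_cores"])
        for s in series:
            node = s["metric"].get("node", "unknown")
            for ts, value in s["values"]:
                writer.writerow([node, ts, value])

if __name__ == "__main__":
    now = time.time()
    dump_csv(fetch_range(now - 3600, now))   # keep the last hour of samples locally
```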

3.3.3 Retrieving logs

The logs are retrieved using a bash shell script which sequentially runs the Kubernetes command-line tool to retrieve and save locally the logs of every deployed pod, along with information about the completed pods and jobs. Specifically, we have used the commands:

• kubectl describe pod <pod name>


• kubectl get pods -a

• kubectl get jobs -a

Also, a Python script has been developed to read the retrieved logs and save the starting and finishing time of every pod in a CSV file.
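A minimal sketch of such a script is shown below; instead of parsing the saved text logs, it reads the structured JSON returned by kubectl, so it approximates rather than reproduces the project's implementation.

```python
# Minimal sketch of the log-parsing step: collect each pod's start and finish time
# and save them to a CSV file. This variant reads structured JSON from kubectl
# rather than parsing the saved text logs.
import csv
import json
import subprocess

def pod_times():
    out = subprocess.run(
        ["kubectl", "get", "pods", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    for pod in json.loads(out)["items"]:
        name = pod["metadata"]["name"]
        started = pod["status"].get("startTime")
        finished = None
        # take the finish time of the last terminated container, if any
        for cs in pod["status"].get("containerStatuses", []):
            term = cs.get("state", {}).get("terminated")
            if term:
                finished = term.get("finishedAt")
        yield name, started, finished

with open("pod_times.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["pod", "start_time", "finish_time"])
    writer.writerows(pod_times())
```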

3.3.4 Running concurrent batches

Bash shell scripting and the Kubernetes command-line tool, kubectl, have been used to deploy and manage the batch experiments.

To enable further analysis of the metrics and logs, it is necessary to get a unique ID for each pod in the experiments. A code name for the deployed jobs has been defined as:

<job role (crop or cnn)> + <date (month, day)> + <version (a, b, c, ...)> + <batch concurrency> + <j> + <number of the task in the batch>

This code must be kept as short as possible to avoid URI-too-long errors.
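A small helper illustrating the naming convention above; the argument values in the example call are illustrative only.

```python
# Helper illustrating the job naming scheme defined above.
from datetime import date

def job_name(role: str, version: str, concurrency: int, task: int, when: date = None) -> str:
    """Build a short, unique job name: <role><MMDD><version><concurrency>j<task>."""
    when = when or date.today()
    assert role in ("crop", "cnn")
    return f"{role}{when:%m%d}{version}{concurrency}j{task}"

# e.g. the 17th task of a 64-job CNN batch launched on 5 December (example values):
print(job_name("cnn", "a", 64, 17, date(2018, 12, 5)))   # -> cnn1205a64j17
```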

3.3.5 Managing compute resources for containers

Pod specifications optionally allow setting how much CPU and memory (RAM) each container uses. When containers have resource requests specified, the Kubernetes scheduler can make better decisions about distributing pods over nodes. Analogously, if containers have limits specified, resource allocation on nodes can be handled in the specified manner: CPU is specified in units of cores, and memory in units of bytes.

When a pod is created, the Kubernetes scheduler selects a node for the pod to run on. Each node has a maximum capacity for each resource type: the amount of CPU and memory it can provide for pods. The scheduler ensures that, for each resource type, the sum of the resource requests of the scheduled containers is less than the total capacity of the node. It is noteworthy that even when actual memory or CPU usage on a node is very low, the scheduler may still refuse to place a pod on that node if the capacity check fails.

In order to accurately define the required resources for the two services (Transcoder and CNN Detector), a single-node cluster with 16 cores and 64 GB of RAM was created to run one instance of each type separately. Table 1 shows the results of the experiment.

Table 1: Metrics of the two different types of services on a single-node cluster.

                 Transcoder           CNN Detector
CPU usage        10 % (~1.6 cores)    30 % (~5 cores)
Memory usage     30 GB                0.7 GB
Time to finish   423 s                95 s

From Table 1 we can observe that the CPU request defined for each pod has been set lower than the CPU actually used by each service (0.5 cores for the Transcoder and 1 core for the CNN Detector). By doing so, we ensure the usage of the full computing capacity of the cluster in cases where the sum of the requests of concurrent services exceeds the capacity of the cluster. Additionally, this approach protects the cluster against oversaturation produced by scheduling all jobs simultaneously. In any case, Kubernetes is able to manage request peaks and completes all the tasks, although at the cost of having some tasks evicted temporarily and re-scheduled when resources are released.

Regarding RAM usage, it is not possible for a pod to use more memory than is available in the cluster or than the limit defined for the pod. Therefore, memory requests and limits are always kept high enough to ensure sufficient resources for every scheduled pod.
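As an illustration of these settings, the sketch below creates one Transcoder batch Job through the Kubernetes Python client with the low CPU request discussed above (0.5 cores) and generous memory settings; the image name, namespace and exact memory figures are assumptions for the example, not the project's configuration.

```python
# Sketch of one Transcoder batch Job with the CPU request discussed above
# (0.5 cores) and generous memory settings. Image name, namespace and memory
# figures are illustrative assumptions.
from kubernetes import client, config

config.load_kube_config()
batch = client.BatchV1Api()

container = client.V1Container(
    name="transcoder",
    image="registry.example.local/cloud-lsva/transcoder:latest",   # placeholder image
    resources=client.V1ResourceRequirements(
        requests={"cpu": "500m", "memory": "32Gi"},   # low CPU request, high memory request
        limits={"memory": "48Gi"},                    # hard memory ceiling for the pod
    ),
)

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="crop1205a16j1"),
    spec=client.V1JobSpec(
        backoff_limit=4,   # allow restarts after evictions or runtime failures
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
        ),
    ),
)
batch.create_namespaced_job(namespace="default", body=job)
```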

3.4 Tests and discussion: low to medium scale video analysis

For the tests, three different clusters have been used. A base cluster has been defined with 4 worker nodes, each with 16 computing cores and 64 GB of RAM. Two other clusters have then been created, using the base cluster as a basis and doubling its capacity with two different approaches:

• Vertically scaled cluster: keeping the base number of worker nodes as 4, but doubling the performance of each of them (16x2 = 32 cores, 64x2 = 128 GB).

• Horizontally scaled cluster: keeping the base worker nodes characteristics (16 cores, 64 GB) but doubling the number of worker nodes to 8.

Different job concurrencies have been tested: 16, 32, 64, 128 and 256 concurrent batch jobs were deployed on each of the clusters. We use a single video for the analysis, so that all tasks can be compared. Transcoding tasks use a 25 GB video, so the tested batches are equivalent to processing approximately 400 GB to 7 TB of video. CNN detection tasks use a 6.25 GB video. Each of the different batch concurrencies was deployed sequentially, not starting until the previous one had finished.

3.4.1 Scaling high computing consuming jobs (CNN detector)

Figure 5 shows the 1-minute average CPU load for each worker node of the Kubernetes host, for the base cluster (4 workers and 64 cores in total) and for the horizontally scaled cluster (8 workers and 128 cores in total). An average CPU load greater than the number of cores of the worker indicates that work on the host is queuing. Pod scheduling is based on requests: a pod is scheduled to run on a node only if the node has enough CPU resources available to satisfy the pod's CPU request. Therefore, these results show how Kubernetes is able to orchestrate computing requirements higher than the cluster capacity and complete all the jobs successfully.


Figure 5: Average CPU load per worker node for the (left) base cluster (4 worker nodes, 16 cores per node), and (right) horizontally scaled cluster (8 worker nodes, 16 cores per node)

The plots in Figure 6 show the same load per worker node stacked to 100% in the two compared clusters. This way it is shown how Kubernetes orchestrates the load between the nodes during the entire lifetime of the experiment.

Figure 6: Load per worker node stacked to 100% in the (left) base cluster and (right) horizontally scaled cluster during the experiment.

In Figure 7, the total duration of each batch process (spanning the number of jobs launched simultaneously at the batch, i.e. batch concurrency) is shown for the three compared clusters (base cluster, horizontally scaled, and vertically scaled).


Figure 7: Duration of each batch concurrency for the three compared clusters.

As expected, the graph confirms that the experiment duration is lower for higher-performance clusters. However, we can also observe that the ratio between cluster performance and batch duration is not linear (doubling the cluster scale does not halve the time). This non-linearity suggests that some processing time is inherent to how Kubernetes manages the orchestration as a function of the cluster architecture.

To clarify this point, we can analyse the mean duration of the jobs deployed in each batch experiment relative to the concurrency per core in the cluster (illustrated in Figure 8). It can be observed that jobs in the base cluster are always faster than in the scaled clusters. Moreover, jobs in the vertically scaled cluster show the worst performance.

Figure 8: Average duration of jobs according to concurrency for each of the three compared clusters.

From these results we can deduce that:

• scaling up improves the performance of the total process, and the improvement grows with higher concurrencies;

• more complex cluster architectures yield lower performance per individual job, and horizontal scaling is the most effective way to scale for the considered type of jobs.

3.4.2 Economic analysis

Figure 9 shows the cost (€) per job execution, computed from the effective job time (total batch duration / number of batch jobs) of each experiment and the cluster cost (€/second). Obviously, scaling up means higher costs. However, as the trend lines in the graph clearly show, the job cost gap between the base and the scaled clusters decreases as concurrency increases.

Figure 9: Cost (in €) per job execution considering the effective job time of each experiment and the cluster cost (€/second).

Figure 10 compares the two scaled clusters and the base cluster in terms of time and cost per job. The trend lines show that with horizontal scaling the performance of the process improves in both dimensions: processing time and cluster cost.

Figure 10: Comparison of the scaled clusters and the base cluster in time and cost per job.

Therefore, from a cloud architecture point of view, the scale of a cluster can be designed based on the cost per job versus the effective time per job. Figure 11 shows this comparison for a concurrency of twice the cluster capacity (jobs/CPU = 2).


Figure 11: Cost per job compared to effective job duration for the three compared clusters for a concurrency jobs/cpu=2. The line sets a threshold between performance and cost.

3.4.3 Scaling high memory and computing consuming jobs (transcoding)

Containers which consume a lot of memory (such as the Transcoder job defined in this report) pose greater difficulties for the orchestrator than CPU-intensive containers. A container may exceed its memory request if the node has memory available, but it is not allowed to use more than its memory limit. If a container allocates more memory than its limit, it becomes a candidate for termination; if it continues to consume memory beyond its limit, it is terminated. If a terminated container can be restarted, the kubelet restarts it, as with any other type of runtime failure.

During the batch experiments with different job concurrencies (16, 32, 64, 128 and 256) of the Transcoder task, the base cluster produced evicted jobs due to a lack of memory resources in the cluster. Evicted jobs are not desirable; however, Kubernetes restarted them and finally completed the execution of all tasks successfully.

Figure 12 shows the variation in the duration of batches of the transcoding jobs (high CPU and memory consumption) for the different levels of concurrency and for the two scaled clusters (vertical and horizontal). Both clusters show a similar trend. Scaling up in memory is therefore less about the computation speed of each job, and more about the reliability of the orchestration process (avoiding evicted jobs) and the capacity of the cluster to schedule jobs in parallel.

Figure 12: Batch duration against total number of jobs (concurrency) for the two scaled clusters.


3.5 Tests and discussion: large scale video analysis

The final tests focus on an analysis of the cluster characteristics needed to successfully execute the defined jobs (transcoding and CNN detection) for a dataset produced by an equipped test car during a one-day recording stage. Using the parameters of the cameras, we have estimated that approximately 200 MB of raw video are generated per camera each second, i.e. around 0.72 TB per camera per hour. Each car has 4 cameras, so roughly 3 TB of data are generated per car for each hour of recording. A normal recording journey is between 4 and 8 hours, so up to about 23 TB of data can be recorded at the end of a recording day. In order to reproduce this order of magnitude of video processing, while still using a single video for the analysis so that all tasks can be compared, 4000 CNN detection tasks have been defined using a 6.25 GB video, scaling the demand up to 25 TB of total video processing.
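These volumes follow directly from the per-camera data rate; a quick check of the arithmetic (assuming the stated 200 MB/s per camera, an 8-hour recording day and 1 TB taken as 10^6 MB):

```python
# Quick check of the data volumes quoted above (200 MB/s per camera, 4 cameras,
# up to an 8-hour recording day), with 1 TB approximated as 1e6 MB.
MB_PER_SECOND_PER_CAMERA = 200
CAMERAS = 4
HOURS = 8

per_camera_per_hour_tb = MB_PER_SECOND_PER_CAMERA * 3600 / 1e6        # ~0.72 TB
per_car_per_hour_tb = per_camera_per_hour_tb * CAMERAS                 # ~2.9 TB
per_day_tb = per_car_per_hour_tb * HOURS                               # ~23 TB
batch_tb = 4000 * 6.25 / 1000                                          # 25 TB processed in the test

print(per_camera_per_hour_tb, per_car_per_hour_tb, per_day_tb, batch_tb)
# 0.72  2.88  23.04  25.0 -> consistent with "up to 23 TB" per recording day
```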

Following the conclusions reached in the previous section, we adopt a horizontal scale-up strategy, defining a high-performance cluster equipped with 32 worker nodes with 16 cores and 64 GB RAM each (a total of 512 cores and 2048 GB RAM).

The duration of the complete process was 1:32:22 (H:M:S). The mean duration of each job was 0:20:33 (H:M:S) with a standard deviation of 0:02:24 (H:M:S). The job with the maximum duration took 0:22:56 (H:M:S) and the minimum took 0:09:58 (H:M:S). Considering that the 4000 jobs run in parallel across the worker nodes, the effective time per job (total duration / total jobs) is 1.39 s/job. The cost of this cluster in the IBM Kubernetes Service is 23,282.44 €/month. The cost per job (with the effective time of 1.39 s/job) is then 0.012 €/job, which means the total cost of the entire process is about 48 €.
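The cost figures can be checked directly from the monthly cluster price and the effective time per job (a 31-day month is assumed for the €/month to €/second conversion):

```python
# Verification of the cost figures quoted above (31-day month assumed for the
# conversion from EUR/month to EUR/second).
CLUSTER_EUR_PER_MONTH = 23_282.44
SECONDS_PER_MONTH = 31 * 24 * 3600

total_duration_s = 1 * 3600 + 32 * 60 + 22            # 1:32:22 -> 5542 s
jobs = 4000
effective_s_per_job = total_duration_s / jobs          # ~1.39 s/job

eur_per_second = CLUSTER_EUR_PER_MONTH / SECONDS_PER_MONTH   # ~0.0087 EUR/s
eur_per_job = eur_per_second * effective_s_per_job           # ~0.012 EUR/job
total_eur = eur_per_job * jobs                                # ~48 EUR

print(round(effective_s_per_job, 2), round(eur_per_job, 3), round(total_eur, 1))
# 1.39  0.012  48.2
```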

Figure 13 shows how the load of the batch jobs is distributed (stacked to 100% of the total load at any given time) across the 32 nodes of the cluster. The most interesting insight is how evenly the load is distributed across the nodes during the experiment; this way the cluster's computing capabilities are optimally exploited. Some discontinuities in orchestration appear at the start and the end of the experiment, which can be considered normal behaviour since the load is not stable during these transitory periods.

Figure 13: Load distribution by worker nodes during the execution of the process.


3.6 Conclusions

In this study we have analysed the problem of executing computer vision processing tasks in clusters of different scales and evaluated best practices to design clusters that are optimal in terms of cost and time. We have proposed an approach based on containerized services orchestrated with Kubernetes and deployed in a PaaS provided by IBM.

We can state that the proposed solution (Analytics as a Service, containerization and Kubernetes orchestration) is scalable: it is able to manage an increasing workload, and its performance increases when hardware (worker nodes and storage) is added or improved. The approach also remains efficient at large scale (i.e. large input data, a large number of users and a large number of participating nodes). Regarding costs, increasing workload volumes show a reduction of the effective cost per task in the scaled-up solutions.

The Kubernetes code developed to deploy our solution is highly agnostic to the cloud provider. The objects described in Kubernetes do not change between providers; only some variables to mount the storage systems must be adapted to the provider. The main barrier to changing providers is contracting their services and understanding the different restrictions inherent in the hardware that supports the cloud. The main bottlenecks are in the areas of security and of bandwidth for the transmission of large amounts of data through the cloud. Therefore, for the specific problem of automotive data annotation, we recommend a hybrid multi-zone cloud solution, where data is processed close to where it is created while still taking advantage of the benefits provided by the cloud: lower costs and greater reliability and availability of services when required.


4. Cloud data management

4.1 Data Storage Strategies

The need for efficient data management and storage became apparent in the project through the experience of one of the partners. In the current state of the art, the annotation process is still very much manual labour, but the management of the annotated data also proved to require much more manual labour than initially envisioned. Valeo noticed that integration with external partners and sharing data without a common platform still generates a huge effort in managing data exchange. Logistic expertise is needed to send the data via HDDs (currently not possible via direct connections given the required throughput), check the consistency of the data, and import/link annotation-result files to the correct traces. Using the cloud as the common platform proved to be a means to solve this issue.

As section 3.5 showed, the cloud data management strategies investigated and developed in this project are particularly critical, as the volume of data is significant and an inability to deliver data in a timely manner would have a dramatic effect on performance.

In order to ensure the best possible performance, a variety of potential storage options were considered, which are outlined below. The criteria cover metrics such as cost, reliability, resilience and, of course, suitability for Big Data.

4.1.1 IBM Cloud Storage Options

IBM Cloud provides a number of "true" cloud-based storage solutions including block storage, object storage and file storage.

Object storage is designed for the application to manage the storage, and there is no real file system in the traditional sense. Since the video data comes from a fixed source (i.e. specific file types), this option is not realistic.

File storage is the most straightforward option, providing a simple file system and requiring minimal management. The downside is that there is less inherent ability to deal with massive datasets and a very low level of control over where the data resides, how it is provided and how transfers are managed.

Block storage offers a variety of benefits including expandability, replication, snapshots, volume duplication across data centres and high availability, and does so without the overhead of having to build or manage system-level RAID arrays. The general benefit is that it requires little management and costs are predictable; however, the cost is relatively high and there is still a certain lack of control regarding some of the lower-level configuration elements.

4.1.2 IBM Softlayer NAS Storage Options

The NAS option goes in a different direction and uses cloud-provided hardware to build a personalized cloud storage solution. The advantage of this is a far greater level of control over the storage system itself, but also the ability to place it physically in the same location as the compute power behind the analytics systems. SoftLayer provides a variety of NAS configuration types:

1. Local Disk – This is really only useful for small datasets, 400 GB or less.


2. SAN Disk – This can handle substantially more data, up to 8 TB.

3. iSCSI SAN Disk – This expands the existing SAN size beyond 8 TB and also provides some replication and snapshot functionality.

4. NAS – This provides a dedicated bare-metal machine where storage can be configured to a wide variety of requirements and has large potential for expanding storage into the petabyte range.

4.1.3 IBM NAS Storage Option Chosen

For this project a QuantaStor NAS provided by OSNexus was chosen. This offers a variety of advantages, mainly the capacity to expand to petabyte scale, but also a huge level of control and configuration options. The NAS supports all major file, block and object protocols including iSCSI/FC and NFS/SMB, and provides the ability to directly manage disk RAID arrays as well as the network configuration of the NAS. Direct management of the storage grid is also facilitated via the UI, as is the ability to create and manage storage pools and volumes. There is also the ability to enable remote replication, interface with traditional cloud storage, and create GlusterFS and Ceph clusters. Finally, the NAS includes end-to-end security for data at rest and in transit.

4.1.4 Benefits and Performance

The chosen NAS solution covered a variety of use cases. It provided iSCSI interfaces for the main datasets so they could be accessed from a variety of sources, from virtual machine file systems to Kubernetes persistent volume claims. In addition, it allowed us to configure other systems to rely on the NAS; in this instance the ESXi cluster used the NAS for all VM storage requirements. During the various tests and trial runs the NAS facilitated rapid availability of the large datasets to the Docker images (and later k8s pods) that required them. During these tests the NAS and supporting SAN infrastructure were never stressed beyond 50% of their capacity, showing that the chosen setup easily supported the analytics tasks.

4.2 Data Transfer Strategies

4.2.1 Internal Data Transfer

At the core of the internal data transfer strategies is the SAN (Storage Area Network), whose primary purpose is to transfer data between the storage and compute elements in a given environment. It consists of a communication infrastructure, which provides the physical connections, and a management layer, which organizes the connections, storage elements and computer systems so that data transfers are optimized. To maximise data transfer speeds across the SAN, Fibre Channel is used: a high-speed transfer protocol allowing transfer speeds of up to 128 gigabit per second and providing lossless, in-order delivery of raw block data from SCSI disks.

4.2.2 External Data Transfer

On a wider level, there may be a need to make the data available in more than one cloud infrastructure location, depending on the use case, the stakeholders involved and any additional requirements for data outside the core “Big Data” video files in question. While initial transfers of bulk data may require physical shipment, as new data is made available it may be necessary to transfer it across WAN networks. IBM Aspera High-Speed File Transfer enables rapid transfer of large files and data sets over an existing WAN infrastructure. It provides predictable, reliable and secure delivery regardless of file size, transfer distance and network conditions. The software consists of client and server packages that can transfer time-critical files across multiple locations, even remote locations with poorly performing networks. It also provides a management console application that offers consolidated, single-point management of the entire Aspera network, as the transfer of such large datasets is not trivial and must be managed carefully. The Aspera network allows the entire Cloud-LSVA dataset to be transferred from a data centre in Europe to a data centre in the US in just 6 days, and supports incremental updates of approximately 0.5 TB per hour.
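
As a hedged back-of-envelope illustration of such transfer planning (the exact dataset size is not restated here), an assumed sustained rate and an assumed dataset volume give a rough transfer time as follows; the 70 TB figure below is purely hypothetical.

```python
def transfer_days(dataset_tb: float, rate_tb_per_hour: float) -> float:
    """Rough transfer time in days at an assumed sustained rate."""
    return dataset_tb / rate_tb_per_hour / 24.0

# Example: a hypothetical 70 TB bulk dataset at the ~0.5 TB/hour rate quoted
# above would need roughly 6 days of continuous transfer.
print(round(transfer_days(70, 0.5), 1))  # -> 5.8
```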


5. Annotation tool - User acceptance tests

5.1 Big Data challenge for ADAS use case

In order to develop ADAS systems further and bring more autonomous driving functionality into future vehicles, these systems need to be tested with accurate data. This makes highly accurate annotated datasets a high priority.

As mentioned before, in the current state of the art video still needs to be annotated manually to make sure that the required ground truth data is available for testing ADAS systems.

During the project several semi-automated annotation tools were developed; the user tests executed with them are described below.

5.2 Evaluation of pixel-wise annotation use case

In this section we present the evaluation of the developed pixel-wise annotation tool (reported in deliverable “D3.7 Final automatic video annotation tools”, section 6). This tool can be used to produce semantic labels for images, i.e. each pixel receives a label or colour which identifies the class the pixel belongs to (see Figure 14).

Figure 14: Pixel-wise annotation tool (see D3.7, section 6).

Semantic segmentation is well known to be extremely time-consuming: on average, 60 minutes of manual labour are needed to label an image of 2048x1024 pixels3.

The developed tool aims to provide mechanisms to reduce this annotation time while keeping the labelled image within the desired quality tolerances.

The two main features of this tool are:

• Super pixels: the tool segments the image automatically and presents groups of pixels which share similar texture (called super pixels), which can be labelled with a single click in a semi-manual process (see the illustrative sketch after this list).

3 Xie, J., Kiefel, M., Sun, M.-T., and Geiger, A.: “Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer,” in Proc. CVPR 2015


• Pre-annotation using DNN (Deep Neural Network): the tool has an embedded pre-trained model which is able to pre-annotate the entire image automatically, taking about 30 seconds to 1 minute depending on the machine.
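
The following is a minimal, illustrative sketch of the superpixel idea using the SLIC algorithm from scikit-image; the document does not specify which superpixel algorithm the tool actually uses, and the file name, segment count and class id below are assumptions.

```python
import numpy as np
from skimage import io
from skimage.segmentation import slic, mark_boundaries

image = io.imread("frame_000123.png")  # hypothetical input frame
# Group pixels with similar texture/colour into superpixels.
segments = slic(image, n_segments=1500, compactness=10, start_label=1)

def label_superpixel(label_map, segments, segment_id, class_id):
    """One 'click': assign class_id to every pixel of a single superpixel."""
    label_map[segments == segment_id] = class_id
    return label_map

labels = np.zeros(segments.shape, dtype=np.uint8)
labels = label_superpixel(labels, segments, segment_id=42, class_id=7)  # e.g. "car"
overlay = mark_boundaries(image, segments)  # visual check of superpixel borders
```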

For the evaluation of the tool we selected 20 representative images from the CityScapes dataset and presented them to 10 annotators, who were asked to use the tool to produce a labelled image as accurately as possible, with and without the use of pre-annotation. Among the annotator group, 5 were completely new to the tool and were thus classed as “non-experts”, while the other 5 were engineers involved in the development of the Cloud-LSVA annotation tools and had some previous knowledge of the tool; they were therefore labelled as “experts”.

The total annotation time was recorded, and the quality of the produced annotation was measured with the standard IoU (Intersection-over-Union) metric.
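
For reference, a minimal sketch of the per-class IoU computation on two integer label maps (predicted and ground truth) could look as follows; the class count and inputs are assumptions.

```python
import numpy as np

def per_class_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int):
    """Intersection-over-Union per class for two integer label maps of equal shape."""
    ious = []
    for c in range(num_classes):
        intersection = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious.append(intersection / union if union > 0 else float("nan"))
    return ious

# Mean IoU over the classes that actually appear:
# miou = float(np.nanmean(per_class_iou(pred, gt, num_classes=19)))
```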

The experiment was conducted to investigate the benefit of using the pre-annotation feature, as well as the influence of the annotator's expertise. The results are also computed at class level (e.g. “person”, “car”, “pole”, “sky”).

Figure 15: Intersection-over-Union (IoU) for non-experts (left) and expert (right) annotators for different classes.

Figure 16: IoU improvement (in percentage) obtained using the DNN-based pre-annotations.


Figure 17: Annotation time (minutes) for non-experts and experts using only Manual annotation, and Semi-automatic annotation (with DNN pre-annotations).

Figure 15 shows the quality of the produced annotations per class. The super-pixel feature seems to have a positive impact when annotating classes whose geometry is rounded or non-complex, such as vehicles, road, building or sky, reaching IoU values above 0.9. However, super-pixels tend not to be accurate enough for small or thin objects, such as poles or distant persons, where the IoU stays between 0.5 and 0.75. The DNN pre-annotation yields better annotations (see Figure 16) for non-expert annotators, with an average IoU improvement of 5.78%. For expert annotators the improvement is negligible (0.18% IoU on average).

As can be observed in Figure 17, the use of pre-annotation speeds up the annotation process, with an average 30% reduction in annotation time. As expected, non-expert annotators spend much more time annotating: the figures show that expert annotators spend almost half the time spent by non-experts. Additionally, non-expert annotators achieve an average IoU of 0.73, compared to the much better 0.84 obtained by expert annotators. Figure 18 and Figure 19 show examples of the produced labelled images together with the original ground truth from the CityScapes dataset.


Figure 18: Non-expert annotator: example labelled image using the annotation tool with pre-annotation (top), and original ground truth (bottom). Average IoU = 0.76.

Figure 19: Expert annotator: example labeled image using the annotation tool with pre-annotation (top), and original ground truth (bottom). Average IoU = 0.83.


In general terms, the developed tool is functional and ready to support annotators with advanced features that speed up the annotation process. Nevertheless, there is room for improvement: in particular, the results obtained with small objects suggest that the tool needs to offer pixel-level painting tools to add the details that super-pixels and the DNN cannot resolve by themselves.

5.3 Evaluation of 2D bounding box annotation use case

The second main annotation use case defined during the project was the annotation of 2D bounding boxes on video sequences. Objects of interest are usually vehicles or pedestrians.

After the development of the Beta prototype, the 2D bounding box annotation web interface was ready for evaluation. For that purpose, a coordinated user trials exercise was carried out during the 2nd Annotation Workshop (November 2017), in which 35 volunteers from across the consortium labelled two reference video sequences.

Figure 20: (Top) Cloud-LSVA interface where participants of the 2nd Annotation Workshop could access their annotation tasks. (Bottom) Web UI for 2D labelling.


Network performance and the computational performance of the servers were monitored in real time and analysed afterwards. As a result, a number of numerical conclusions were reached and reported in deliverable “D5.4 Cloud-LSVA Beta prototype”.

In this document we present the user feedback, from the point of view of the validation of the developed tool.

During the annotation workshop volunteers filled in a questionnaire with 24 affirmations that they ranked from 1 (Strongly disagree) to 5 (Strongly agree) to provide their feedback. Affirmations were classified into two groups, “Positive” and “Negative”: a higher rank is better for “Positive” affirmations, while a lower rank is better for “Negative” affirmations.

Table 2 shows the affirmations from the annotation workshop.

Table 2: user test affirmations

ID | Question | Positive/Negative
1 | I think that I would like to use this system frequently (if I'd have to annotate) | +
3 | I thought the system was easy to use | +
5 | I found the various functions in this system were well integrated | +
7 | I would imagine that most people would learn to use the system very quickly | +
9 | I felt very confident using the system | +
13 | The keyboard short-cuts are useful | +
15 | The new item's bounding boxes can be drawn quickly | +
17 | The system user interface is easy to understand | +
20 | Pre-annotations are helpful | +
2 | I found the system unnecessarily complex | -
4 | I would need the support of a technical person to be able to use this system | -
6 | I thought there was too much inconsistency in this system | -
8 | I found the system very cumbersome to use | -
10 | I needed to learn a lot of things before I could get going with this system | -
11 | The time-line is not very useful | -
12 | Too many input/interactions are required to obtain acceptable results | -
14 | It is difficult to keep track of the already annotated items | -
16 | I think there is too much information displayed in too many panels | -
18 | Semi-automatic annotation algorithms are too slow | -
19 | I needed to correct many errors made by the semi-automatic annotation algorithms | -
21 | I needed to correct many errors made by pre-annotations | -
22 | Managing pre-annotations is complex | -
23 | I prefer using only manual/basic annotation functionalities | -
24 | I found the annotation guidelines unclear/confusing in most cases | -
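
The report analyses the raw 1-5 ranks per group; as a minimal illustrative sketch of the scoring convention, negative affirmations can be inverted so that a higher value always means a better judgement of the tool. The item IDs follow Table 2; the example answers are made up.

```python
import numpy as np

POSITIVE_ITEMS = {1, 3, 5, 7, 9, 13, 15, 17, 20}  # "+" affirmations in Table 2

def to_satisfaction(item_id: int, score: int) -> int:
    """Map a 1-5 answer to a scale where higher is always better for the tool."""
    return score if item_id in POSITIVE_ITEMS else 6 - score

# Made-up example: a '2' (disagreement) on negative item 8 becomes a 4.
answers = {1: 4, 8: 2, 20: 5}
mean_satisfaction = np.mean([to_satisfaction(i, s) for i, s in answers.items()])
print(round(float(mean_satisfaction), 2))  # -> 4.33
```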

Figure 21 shows the gathered results for positive (top) and negative (bottom) affirmations. The results are presented by splitting the users into 4 different groups according to their experience with annotation tools:

• Group blue: “This is my first time annotating”

• Group red: “I’ve used the Cloud-LSVA platform a few times in the past”


• Group green: “I’ve used other annotation platforms/tools in the past, for short periods/limited tasks (sporadic annotator)”

• Group purple: “I’ve used other annotation platforms/tools in the past, for long periods (professional annotator)”

Figure 21: User feedback using the Web UI for the 2D annotation use case: (top) positive affirmations, (bottom) negative affirmations.

From Figure 21 we can conclude that positive affirmations reached an average of about 4 for most questions (close to strong agreement), with the non-expert annotators being the most satisfied with the tool. Negative affirmations were on average not above 2, which indicates general disagreement. Again, professional annotators were the most critical of the tool.

This result gives a twofold view:

• professional annotators are the most demanding regarding the tools

• professional annotators are usually familiar with other specific tools and dislike new tools that change the workflow they have mastered, while non-expert annotators tend to welcome and like new features


Once analysed, the feedback was used to draw several action lines which guided the re-definition of the architecture of the Web UI for the third period. The resulting tool is the final UI for annotation, which includes multi-stream capabilities, annotation of 3D information, visualization of point clouds and refined performance for streaming data across networks. A full description of the tool can be found in deliverable “D3.7 Final automatic video annotation tools”, section 5. The next section presents the evaluation of the tool.

5.4 Evaluation of 3D annotation use case

The last development effort concentrated on creating a refined version of the Web UI, in order to support additional annotation use cases. Guided by industrial requirements from Valeo, the tool included a large number of improvements and new functionalities compared to the Beta prototype. To name only a few:

• Multi-stream annotation: recordings with multiple cameras (e.g. 4 surround views) can be loaded and a top-view image is built automatically by the Web UI.

• Point cloud: sequences of laser point clouds can be loaded into the Web UI, using WebGL technology to render the data. These point clouds are used as annotation reference for 3D objects such as pedestrians, cars or lanes.

• Different use cases: the new Web UI is highly configurable, supporting different annotation use cases, such as 2D bounding boxes on images, 2D polygons on images, 3D bounding boxes and 3D polygons (e.g. for lane markings). The tool has been developed in such a way that it can easily be extended to new use cases.

Figure 22: Sample image of the Web UI annotation application: (left) lane annotation, (right) 3D object annotation.

The evaluation of this tool was carried out as a continuous beta-testing activity, instead of a single large user trial workshop. Valeo and Vicomtech users used the tool during the last 6 months of the project, producing reports on bugs and performance issues, which were monitored and used by the development team to improve the tool iteratively.

The history of modifications of the tool can be observed by looking at the change log file:

Web Annotation Tool


Current version: 0.1.2
Release date: 2018-11-30

0.1.2 (2018-11-30)

• New annotations:
  o Lane Sensing annotation
  o 3DOD Cuboid annotation
  o Image bbox annotation

• New features:
  o [Edition]: Lane width adjustment capabilities
  o [Edition]: Lane geometry merge(shift)/split capabilities
  o [Edition]: Lane Sensing pre-annotations from PC analytics
  o [Edition]: Temporal merge/split of annotations
  o [Security]: Added functionality for supporting csrf token
  o Added new contextMenu to PCView/VView to add existing objects
  o Add intervals functionality
  o Add/Remove intervals
  o Resize intervals
  o Propagation of geometries between intervals
  o Enable/disable edition of dynamic Attributes whether is on frame
  o Add loading information
  o Separate views handler for 2D and 3D views
  o Parted huge PC loaded as smaller pcd
  o [Optimization] Rendering optimization to increase frontend FPS

• Changes:
  o Memory usage: Static and dynamic annotation memory usage
  o Structure: Pedestrians annotated as polygons not polylines
  o [Optimization] Add content-type to pcd request in order to compress request
  o Update header buttons depends of use case

• Solved bugs/improvements/others:
  o Frame calculate without accumulating error on case 30fps
  o Wait until odometry load is load with PC
  o Removing from menu fixed
  o Visible/not visible bug fixed.
  o Geometry change saved in the correct place.
  o Adding point in polygon fixed
  o Coloring error when loading parted PCs
  o Two annotations selected bug fixed
  o Other minor fixes.

0.1.1 (2018-10-10)

• New features:
  o Add control before remove annotation.
  o Add option to load big pcd file as partitioned.
  o Annotation url validation using backend
  o Add Point cloud coloring with shaders (by height, by intensity, direct color)
  o External point addition in Open Polygons (alt+click)

• Changes:
  o More precise and optimized camera moving control
  o Adapt contextMenu to space of the view
  o Better management of life cycle of webworkers.
  o Improve threejs for video texture.
  o Tree component optimized
  o Improve data transfer between classes.


  o ObjectView show all annotations instance of frame related annotations.
  o Update structure files
  o Support for loading and saving '3dod' and 'pd' annotations

• Solved bugs/improvements/others:
  o [Critical Bug]: Loss of precision on frame time fixed (29.976).
  o [Critical Bug]: Resolve memory leak of threejs object's garbage collector
  o Navigation Bug: Final frame management fixed.
  o Annotations Bug: When VCD locally loads refresh view (to show annotations)
  o Annotations Bug: Annotations flip and visibility corrected
  o Interaction Bug: Points movement in polygons (used to disarrange the polygon).
  o Overlays Bug: pcd load ui management
  o Other minor fixes.

0.1.0 (2018-07-31)

• New features:
  o Objects pointcloud overlay added.
  o Option to generate a historic label in overlays (e.g. 3DOD ROI Historic). Details about configuration of this option will be provided.

• Changes:
  o Top view cameras (Point Cloud View and Camera Top View) now rotated so that front of the car is pointing up.

• Solved bugs/improvements/others:
  o Fixed bugs in overlays menu and visualization.
  o Other minor fixes.

0.1.0-beta3 (2018-07-25)

• Solved bugs/improvements/others:
  o Major improvement in pointcloud data/memory management.
  o Fixed bug in Object tree visualization.

0.1.0-beta2 (2018-07-23)

• New features:
  o Objects can now be scaled in one axis.
  o Complete timeline (not zoomed out). User can switch between complete and zoom-out timeline versions using the magnifier icon in the timeline panel.
  o Added video streams in preview tab to allow full speed playback of original videos.
  o Added Top View, synthetically created using the four video streams. Top view will be configurable in the corresponding configuration file.
  o Overlays now editable. User can select which overlays' layers are to visualize. Overlays layers will be configurable in the corresponding configuration file. Overlays of "label" type (e.g. ego vehicle) can be provided in front Axle or rear Axle coordinates system.
  o Ground point cloud overlay added.
  o Added mouse pointer icons, which change when interacting with objects and views.
  o Added option to manually set/unset key frames.
  o Added options to load/save annotation labels in world/frontAxle/rearAxle coordinates system (to be configured in structure file).
  o Previous XML-based annotation labels now loadable into tool (prior special XML->JSON conversion needed).
  o Added tool version information.


  o Option to open different traces with the tool.

• Changes:
  o Object interaction changed. Former Ctrl+click(+others) controls changed to simpler click(+others) controls.
  o View interaction changed. Panning former click+drag changed to Ctrl+click+drag.
  o Navigation playback icons now larger.
  o Scan point cloud data coloured in blue, to distinguish from ground data.

• Solved bugs/improvements/others:
  o Updated structure configuration files to match original XML ones (correct default coordinates for all objects).
  o Fixed bug when playing video in VView (not updating video).
  o Fixed bug of non-master video streams not updating when using playback.
  o Improved zoom for VView.
  o Improved point cloud data management.
  o Fixed some memory leaks.
  o Improved memory management.
  o Output VCD annotation labels now limits coordinates precision to 3 decimals.
  o Fixed other functional and operational bugs.

0.1.0-beta (2018-07-09)

• Initial version.

• Main features:
  o Interface with timeline, object tree, and player section
  o Focus on PD and 3DOD annotation functions.
  o VView for video-only annotation; Video3DView + PointCloudView for video-point cloud annotation.
  o Synchronized playback of video streams and point cloud data.
  o Navigation in timeline panel and with keyboard arrows/spacebar.
  o Object attributes and default coordinates as defined in structure configuration file.
  o Object attributes can be modified in object tree and/or timeline panel.
  o Scene attributes can be modified in object tree.
  o Basic zoom/pan controls in views.
  o Possibility to change current visualized video stream in Video3DView.
  o Point cloud overlays proof of concept (not editable).
  o Options to add/modify/move/scale/rotate objects (previously defined in structure configuration file).
  o Automatic interpolation of objects' coordinates (between defined key frames).
  o Local saving and loading of output annotation labels.


6. Cartography use case - integration, requirements and validation.

6.1 Big Data challenge for cartography

The higher the level of autonomous driving of a car, the more important the correctness of the map becomes.

TomTom is currently working on delivering HD-MAPs for that specific purpose.

Every day TomTom has 150 mobile mapping vans creating new data for the HD-MAP updates. One such day of driving results in 1 TB of data per car, mostly LIDAR, GPS/GNSS and 360° images. This data needs to be sent to the map creation process overnight in order to be post-processed by engineers, which has proven to be impossible in most situations: only special upload stations with high bandwidth can send this amount of data overnight. In most areas, however, these are not available, and the only way to transfer the data is to physically ship the hard disks by overnight delivery. Projects such as Cloud-LSVA, where the data is reduced by running algorithms in the car, are the only way to scale this to an entire fleet.

6.2 Upload engine for SLAM mapping from crowd-sourced data

To ensure that changes detected by the algorithms running in the car are quickly reflected in a map change, the car uploads its SLAM detections to a SLAM observation data service. Multiple observations are combined and, if needed, requests for extra data (e.g. photos/video) can be sent to the car. This results in evidence for a map change, which is sent to the TomTom data gateway.

TomTom map production is a continuous map delivery system that can create several map layers, such as HD-Road, Traffic and SD-Map. In Cloud-LSVA this process is extended to create an up-to-date SLAM map layer (see Figure 23). This new layer is sent back to the vehicle through the TomTom Autostream system. An additional layer with mining locations will also be generated, containing locations where full sensor recording is needed to improve the map. This data will be sent after the trip, when the car is in range of its home Wi-Fi.

Figure 23: Autonomous driving map system. In green the TomTom Autostream system and in blue the Cloud-LSVA research extensions.


Requirements of SLAM observation processing system:

The requirements for the big-data processing platform cover both functional and non-functional aspects. Starting with the non-functional ones, it is important that the platform is flexible so that it can adapt to changing functional requirements as the research progresses. Therefore, we chose to work with general off-the-shelf components that can be re-configured and deployed easily; these include, for example, Docker, CUDA and MariaDB. These components act as the cornerstones of the big-data processing platform and are also used in other big-data research programmes. In post-project productization and deployment, it is quite possible that more tailored (home-made) components will be used.

Concerning the functional requirements, it is important that the big-data processing platform supports both non-linear graph optimization and deep learning. For these, the current state of the art is G2O and TensorFlow, respectively, and both are therefore also used in our research. Concerning scalability, TensorFlow is designed for large-scale heterogeneous distributed-memory computing and is therefore very well suited to big-data processing. The non-linear graph optimizer G2O currently only supports shared-memory parallelism and will therefore be a bottleneck in general (non-private) cloud-based systems. However, the underlying optimizers of G2O, which perform matrix factorization, can in post-project development efforts be replaced with their distributed-memory counterparts. Within the scope of this research project, the current shared-memory optimizers suffice in terms of compute power.

Figure 24 shows the final implemented concept framework using visual odometry and semantic segmentation on the vehicles and COP-SLAM and G2O on the back-end to create the maps with additional object features.

Figure 24. Multi-vehicle distributed SLAM. In this set-up the vehicles only house the front-end, which turns the raw video data into more compressed pose-chain data. This compressed data is sent to the cloud, where a map is constructed from all observations.


6.2.1 Autostream map delivery platform

The Autostream platform consists of a server and a client library, which together ensure the availability of the most recent data relevant to the car. By giving the client library the current location and hints such as the most probable path or the planned route, the client can request from the server the latest version of the data tiles covering its area of interest.

The system is kept stateless to ensure easy scalability through load balancers and different availability regions. This service is currently under development within TomTom. In Cloud-LSVA we use the same concepts as Autostream, but focused only on the new SLAM map layer. This saves the cost of deploying a full continuous map delivery system in the Cloud-LSVA cloud, while by staying close to the concept this layer could easily be added to the production process if proven successful.

Data Transfer:

By using a tiling concept in Autostream we reduce the transfer of unused data to a minimum. On the upload side, only a small amount of metadata is sent to the cloud. When photos or videos are needed for the machine learning processes, they are requested but only sent when the car is under the right conditions (Wi-Fi/Ethernet/unmetered 4G).

For the HD-Road map layer there are some metrics available for a typical road car. We expect that the SLAM data layer will have similar characteristics.

Assumptions:
• Tile size: 2.4 x 2.4 km2
• Chance requiring a unique tile every 2.4 km: 75%
• Average tile size: 30 KB (HD-ROAD FRC-0)
• Protocol overhead factor: 20%
• Average driven kms per year: 13000 km
• Unique driven kms per year: 5000 km
• Average updates of individual tiles per year: 1

Data usage per car:
• Estimated # of unique tiles/year (rounded up): < 1600 tiles/year
• Estimated data usage (rounded up): < 60 MB/year
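
A minimal sketch reproducing this back-of-envelope estimate from the assumptions above (the SLAM layer is assumed to behave like the HD-Road layer):

```python
# Assumptions listed above for the HD-Road layer (SLAM layer assumed similar).
unique_km_per_year = 5000
tile_edge_km       = 2.4
p_new_tile         = 0.75    # chance of requiring a unique tile every 2.4 km
avg_tile_size_kb   = 30      # HD-ROAD FRC-0
protocol_overhead  = 1.20    # +20 % protocol overhead
updates_per_tile   = 1       # average updates of an individual tile per year

tiles_per_year = unique_km_per_year / tile_edge_km * p_new_tile * updates_per_tile
data_mb_per_year = tiles_per_year * avg_tile_size_kb * protocol_overhead / 1024

print(round(tiles_per_year))        # ~1563 -> "< 1600 tiles/year"
print(round(data_mb_per_year, 1))   # ~54.9 -> "< 60 MB/year"
```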

Loop closure across multiple vehicles, using global vehicle pose clustering

To improve map updates, we can perform loop closure between multiple vehicles and between multiple passes of the same vehicle. To achieve this, we developed a software module for clustering pose information. It can determine which recorded vehicle poses (from different vehicles and different passes) are close to each other in terms of Euclidean distance and heading angle. The image data associated with these vehicle poses likely shows largely the same scene. Using keypoint matching, loop closure can be performed with these images, increasing the accuracy of the associated vehicle poses and corresponding sensor data and leading to better map updates.


The clustering software module takes as input one or more recorded (and optimized) traces from one or more vehicles. The individual vehicle poses are clustered using their x and y coordinates, using the Birch clustering algorithm from scikit-learn. The resulting position clusters are subjected to another simple clustering algorithm, developed from scratch, which divides the position clusters into sub-clusters of poses that have similar heading angles. Figure 25 shows the result of these two steps combined. Vehicle poses belonging to the same (sub-)cluster are candidate pairs for loop closure, as the associated images likely show largely the same scene.
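
A minimal sketch of this two-step clustering, assuming poses are given as an N x 3 array of x, y and heading (degrees); the distance threshold and heading bin width are illustrative, and the simple heading binning below stands in for the sub-clustering algorithm developed in the project.

```python
import numpy as np
from sklearn.cluster import Birch

def cluster_poses(poses: np.ndarray, xy_threshold: float = 5.0,
                  heading_bin_deg: float = 15.0) -> dict:
    """Group vehicle poses (x, y, heading_deg) into loop-closure candidate clusters."""
    # Step 1: cluster on position only, using Birch from scikit-learn.
    position_labels = Birch(threshold=xy_threshold,
                            n_clusters=None).fit_predict(poses[:, :2])

    clusters = {}
    for cluster_id in np.unique(position_labels):
        members = np.where(position_labels == cluster_id)[0]
        # Step 2: split each position cluster into sub-clusters of similar heading.
        heading_bins = (poses[members, 2] % 360.0) // heading_bin_deg
        for b in np.unique(heading_bins):
            clusters[(int(cluster_id), int(b))] = members[heading_bins == b]
    return clusters

# poses = np.loadtxt("optimized_trace.txt")   # hypothetical N x 3 pose file
# candidates = cluster_poses(poses)           # indices of loop-closure candidates
```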

Figure 25: Color-coded output sample from the developed software module. Each color represents a (sub)-cluster of vehicle poses with similar position and heading angle, i.e. candidate pairs for loop closure. Colors are assigned randomly, so some distinct clusters have more or less the same color.

This software module is fast, and it scales well. We already demonstrated it in an integrated setting. During this demonstration, the input folder of this software module was repeatedly fed with new (optimized) vehicle traces, repeatedly triggering the clustering procedure. The output was automatically fed to a loop closure software module and global optimizer, which are still under development.

6.2.2 Semantic SLAM

The above describes how to obtain the map geometry. Another crucial part is obtaining the semantics of the street scene. For this we have developed a semantic segmentation method based on deep learning. Example output is shown in Figure 27, Figure 28 and Figure 29. The network is currently able to segment up to 108 classes, of which several are relevant for mapping purposes. The current version of the network is being trained to also detect lane attribute markings. All three versions (with different functionalities) are described below in order of development:

• Semantic Segmentation


• Goal: Predict, using a Fully Convolutional Network (FCN), a semantic class for every pixel in the input image, from a predefined set of a few dozen classes.

• Hierarchical Semantic Segmentation

• Goal: Extend the set of semantic classes that the FCN can predict to hundreds, through a tree of hierarchical classifiers.

• Panoptic Segmentation (Simultaneous Semantic and Instance Segmentation)

• Goal: Predict a semantic class, from a predefined set of stuff and things classes, and an object id for every pixel in the input image. Pixels assigned to stuff classes (e.g. vegetation, sky, road) do not have an object id. Pixels assigned to things classes have an id that separates different instances from each other.

Design

All three models consist of the same feature extractor backbone (modified fully convolutional ResNet-504), where the majority of computations happen, and shallow extensions together with different classification and/or regression heads for each specific model.

The feature extractor backbone contains conventional and dilated convolutional and pooling layers. The convolutional layers explicitly leverage 2D spatial image information (relations), while the pooling layers summarize spatial context by reducing the spatial dimensions of the representation. The output of the feature extractor is fed to the following modules according to each model:

Semantic Segmentation: the feature representation of the input image is sent to a per-pixel softmax classifier, which outputs the final per-pixel semantic predictions according to a MAP decision rule (a minimal sketch of this decision follows the three model descriptions below).

Hierarchical Semantic Segmentation: the feature representation of the input image is fed to a hierarchy of per-pixel softmax classifiers. Before each classifier we insert a few extra convolutional layers in order to adapt the representation to the needs of each classifier. The output is generated from all classifiers according to a hierarchical MAP decision rule.

Panoptic Segmentation: the feature representation of the input image is shared between a detection branch, based on Mask-RCNN, and a segmentation branch, based on the module above. The detection branch outputs (possibly overlapping) per-pixel object masks. The segmentation branch outputs per-pixel semantic predictions. The outputs of the two branches are merged using advanced heuristics, which resolve per-pixel conflicts of overlapping semantic classes and/or object ids.5
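
A minimal TensorFlow sketch of the flat per-pixel MAP decision used by the semantic head (the hierarchical rule composes such decisions along the class tree); tensor shapes are assumptions.

```python
import tensorflow as tf

def map_decision(logits: tf.Tensor):
    """Per-pixel MAP decision for logits of shape [batch, height, width, num_classes].

    Softmax is monotonic, so taking the arg-max over logits yields the same
    labels as taking the arg-max over the per-pixel class posteriors.
    """
    posteriors = tf.nn.softmax(logits, axis=-1)
    labels = tf.argmax(logits, axis=-1, output_type=tf.int32)
    return labels, posteriors
```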

4 Meletis, P., & Dubbelman, G. (2018). Training of Convolutional Networks on Multiple Heterogeneous Datasets for Street Scene Semantic Segmentation. IEEE IV 2018.
5 de Geus, D., Meletis, P., & Dubbelman, G. (2018). Panoptic Segmentation with a Joint Semantic and Instance Segmentation Network. arXiv preprint arXiv:1809.02110.


Implementation

All models are implemented in TensorFlow using Python. The Semantic Segmentation and Panoptic Segmentation models are trained on a single machine with 4 Titan V (Volta – 12 GB) GPUs. The Hierarchical Semantic Segmentation model is trained on a Titan Xp (Pascal – 12 GB) GPU.

Table 3: Training datasets per model

Model | Training datasets
Semantic Segmentation | Cityscapes, Vistas, Apolloscape, AutoNUE, KITTI, Wild Dash
Hierarchical Semantic Segmentation | Cityscapes, Vistas, GTSDB
Panoptic Segmentation | Vistas

Quantitative Evaluation

Semantic Segmentation and Hierarchical Semantic Segmentation

We evaluate using two metrics:

1. Intersection over Union (IoU): This metric quantifies the amount of overlap between the predicted and the ground truth pixel masks for each class, and is maximized when the overlap is perfect. It is averaged across all classes (mIoU) and has range [0%, 100%].

2. Accuracy (Acc): This metric quantifies the accuracy of the predicted class for each pixel for a multiclass classification problem and coincides with the recall definition for a binary problem. It is averaged across all classes (mAcc) and has range [0%, 100%].
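
As a minimal sketch (assuming a pixel-level confusion matrix has been accumulated over the evaluation set), both metrics can be computed as follows:

```python
import numpy as np

def miou_macc(conf: np.ndarray):
    """Mean IoU and mean per-class accuracy from a confusion matrix.

    conf[i, j] = number of pixels with ground-truth class i predicted as class j.
    Returns values in percent, as reported in Table 4.
    """
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp            # predicted as the class but wrong
    fn = conf.sum(axis=1) - tp            # missed pixels of the class
    iou = tp / np.maximum(tp + fp + fn, 1.0)
    acc = tp / np.maximum(tp + fn, 1.0)   # per-class recall
    return iou.mean() * 100.0, acc.mean() * 100.0
```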

Table 4: mAcc and mIoU for Semantic Segmentation & Hierarchical Semantic Segmentation using the CityScapes, Vistas and GTSDB datasets

Metric | Semantic Segmentation (Cityscapes) | Semantic Segmentation (Vistas) | Hierarchical Semantic Segmentation (Cityscapes) | Hierarchical Semantic Segmentation (Vistas) | Hierarchical Semantic Segmentation (GTSDB)
mAcc [%] | 86.1 | 53.3 | 66.6 | 38.9 | 57.7
mIoU [%] | 77.5 | 43.8 | 57.3 | 31.9 | 41.5
mIoU oracle [%] | 100.0 | 100.0 | 97.2 | 96.24 | N/A

The drop in performance for the hierarchical model is due to limited memory resources (1 GPU) and the need to include examples from every dataset into the training batch.

Looking at three specific classes relevant for the ADAS and cartography use cases, however, the Hierarchical Semantic Segmentation IoU on the CityScapes dataset is 95.4 for road (purple in Figure 27, Figure 28 and Figure 29), 91.2 for cars (blue in the same figures) and 62.7 for traffic signs (yellow in Figure 27 and Figure 29), making it a promising automated segmentation solution.


Panoptic Segmentation

We evaluate using the Panoptic Quality6 (PQ) metric, an extension of the IoU metric which also incorporates the notion of object separation. This metric evaluates predictions by matching predicted things and stuff pixel masks with the ground truth and heavily penalizes unmatched pixel masks. This makes the metric strict and unintuitive, and it often does not reflect the qualitative results, as can be seen from the low “human oracle” scores on many datasets. The PQ metric can be decomposed into the product of Segmentation Quality (SQ) and Recognition Quality (RQ). PQ, SQ and RQ are averaged across classes and have range [0%, 100%].
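
A minimal sketch of the PQ = SQ x RQ decomposition for a single class, assuming predicted segments have already been matched to ground-truth segments (a match requires IoU > 0.5 in the original definition):

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Panoptic Quality for one class, plus its SQ and RQ factors.

    matched_ious: IoU values of predicted segments matched to ground truth.
    num_false_positives / num_false_negatives: unmatched predicted / ground-truth segments.
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * num_false_positives + 0.5 * num_false_negatives
    sq = sum(matched_ious) / tp if tp else 0.0   # Segmentation Quality
    rq = tp / denom if denom else 0.0            # Recognition Quality
    return sq * rq, sq, rq                       # PQ = SQ * RQ
```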

Table 5: PQ & SQ metrics for Panoptic Segmentation using CityScapes and Vistas datasets

Metric | Panoptic Segmentation (Cityscapes) | Panoptic Segmentation (Vistas)
PQ [%] | 23.9 | 30.5
SQ [%] | 66.0 | 72.9
RQ [%] | 31.2 | 39.6
PQ human [%] | 69.6 | 57.7

Results – Inference

All three models run on a Titan X (Pascal – 12 GB) GPU in the car. Semantic classes include various subcategories of the following high-level classes: flat surfaces, humans, vehicles, constructions, objects (e.g. traffic light, traffic signs) and nature.

Table 6: Number of classes & performance parameters for all 3 segmentation models

Model | Number of semantic classes | Resolution | Frames per second
Semantic Segmentation | 20 | 604 x 960 | 20
Hierarchical Semantic Segmentation | 108 | 604 x 960 | 18
Panoptic Segmentation | 47 things, 18 stuff, max 64 objects per image | 604 x 960 | 7

6 Kirillov, A., He, K., Girshick, R., Rother, C., & Dollár, P. (2018). Panoptic Segmentation. arXiv preprint arXiv:1801.00868.


Figure 26. Hierarchy of classes.

Figure 27. Semantic Segmentation.

Figure 28. Hierarchical Semantic Segmentation.


Figure 29. Panoptic Segmentation. Different objects are delineated by a white line.

Performance testing and validation on AUTOSTREAM

The AUTOSTREAM map update process is still at an early stage. For the Gamma prototype we ran tests in a controlled environment where the ground truth was captured by a mobile mapping van (MoMa). By removing a traffic sign from the original map and having the car create evidence, we could see exactly whether the process was working. The evidence is uploaded in a data format called Roadagram. By adding several reference points, the map alignment process can now position the detected traffic sign with a precision similar to that of the mobile mapping van.

However, this is still a very heavyweight process and is not yet ready for millions of cars producing such evidence.

Economic benefit of using AUTOSTREAM with respect to previous products

TomTom currently updates its map weekly. Having Autostream at the end of the chain enables near real-time updates of the HD-MAP, which is a huge step forward towards the fresh and accurate map that is needed for autonomous driving. The Cloud-LSVA project gave valuable input to this process and allowed prototyping of the update ideas.


7. Conclusion

The aim of the Cloud-LSVA project is to develop a software platform for efficient and collaborative semi-automatic labelling and exploitation of large-scale video data, addressing existing needs of the ADAS and Digital Cartography industries.

We have shown in this document the final stage of the integration of all developed components into the final Gamma prototype. This was demonstrated during the Final Event on 29 November in Stuttgart with several live demonstrations of the developed software and hardware solutions (focusing on semi-automated annotation and back-end cloud solutions) and in 3 different vehicles featuring both the semi-automated annotation and the SLAM map-update technology.

We have also shown the performance and open issues of the cloud data management, transfer, storage and scalability, by executing performance tests at different component levels.

The proposed solution (Analytics as a Service, containerization and Kubernetes orchestration) is scalable: it is able to manage an increasing workload and its performance increases when hardware (worker nodes and storage) is added or improved. The approach also remains efficient at large scale (i.e. large input data, a large number of users and a large number of participating nodes). Regarding costs, increasing workload volumes show a reduction of the effective cost per task in the scaled-up solutions.

The implementation of the annotation software and the user tests have been presented together with their results, including the evaluation of non-expert versus expert annotators and of the semi-automated annotation tools. Using pre-annotation proved to improve the performance of the annotators.

Finally, the cartography update framework was presented, with the validation of both the mapping and the implemented semantic segmentation functionality for updating objects (such as traffic signs) in maps. In tests with TomTom, the use of a semantic segmentation network for updating the maps with traffic sign information proved promising, although it is still at an early stage. The Autostream map-updating process developed, using cloud updates, proves to be of high value.