DIRTY DATA AND HOW TO FIX IT

ShipServ Smart Procurement 2017, Hamburg

[email protected]

Georgina Gavin, Chief Commercial Officer, 29.03.2017

AGENDA

1. What do we want from Big Data?

2. What do we mean by ‘Dirty Data’?

3. The importance of cleaning

4. Data discipline

5. Investing in people and technology

6. Non-technical overview of VV coding and database

7. Why go to all this effort?

INTRODUCTION TO VESSELSVALUE

SERVICES

VALUES

• Daily updated values for Vessels, Companies, Portfolios

• Tankers, Bulkers, Containers, LNG, LPG, PSVs, AHTSs, AHTs, MODUs

• Accuracy tested and reported

• Full supporting information

SERVICES

SEARCH

• Powerful, highly accurate, interactive database

• Search and compare by any combination of criteria

• Fleet search: vessels, companies, specifications, incidents, locations, laden/ballast

• Deals search: S&P, Newbuilds, Demolitions, Charters

SERVICES

MAP

• AIS Satellite and Terrestrial mapping/tracking

• GIS Maritime energy infrastructure (oil fields, platforms, pipelines, windfarms)

• Automated alerts (e.g. pre-defined sanction zones, OSV activity around rigs), which we currently provide to banks and regulators

BIG DATA AND WHAT WE WANT

“GURUS AMONG US HAVE PROCLAIMED 2017 WILL BE THE YEAR BIG DATA GOES MAINSTREAM”

FORBES, JAN 31 2017

WHAT IS IT?

• Big data is when you have too much data to comprehend, arriving so fast that you struggle to process it

• AIS, for example

• New ways to process and store this data

• However, big data sets aren’t the challenge; understanding what to do with them is!

• The important shift we’re all looking for: rather than simply reflecting performance, big data needs to help drive business operations

HOW BIG IS BIG?

• Byte = one grain of rice

• Kilobyte = a cup of rice

• Megabyte = 8 sacks

• Gigabyte = 3 trucks

• Terabyte = 2 container ships

• Petabyte = covers an area the size of London

• Exabyte = covers an area the size of the UK

• Zettabyte = fills the Pacific Ocean

• Yottabyte = a rice ball the size of Earth!
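Each step on the rice ladder is a factor of roughly 1,000. A minimal Python sketch, assuming decimal SI prefixes (1 KB = 1,000 bytes; binary units such as KiB step by 1,024 instead), that finds the right rung for a raw byte count:

```python
# Decimal (SI) prefixes, assumed here: each rung is 1,000 times the one below.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes: float) -> str:
    """Format a byte count using the largest unit that keeps the value below 1,000."""
    if num_bytes < 0:
        raise ValueError("byte count must be non-negative")
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if value < 1000:
            return f"{value:.1f} {unit}"
        value /= 1000  # climb one rung of the ladder
    # Anything this large is expressed in yottabytes, the top rung.
    return f"{value:.1f} {UNITS[-1]}"
```

For example, `human_readable(10 * 10**12)` returns `'10.0 TB'`.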

VV’S BIG DATA EXPLAINED

• 64,222 ships

• 47,639 valuable

• 14.4M rows of AIS position data daily, on average

• 5M captains’ reports, 75k changes

• 16.2 billion rows archived

• 386M valuations, +1.5M by user request

WHAT DO WE WANT TO ACHIEVE?

The answer obviously depends on what your business is.

Turning data into INFORMATION:

• KPIs

• BI analytics

VV: AIS linked with economic data

• Analyse different types of risk: commercial, and voyage risk related to the environment and navigational safety…

• Define for yourself how risk should be quantified, and set your own parameters

• Identify opportunities to optimise your business today

• Identify NEW opportunities

• Solid information (data you can trust to be accurate and commercially sound) will make you better informed and give you the confidence to make braver decisions

DATA DELIVERY VIA API (APPLICATION PROGRAMMING INTERFACE)

• APIs provide access to specific datasets

• Datasets are dynamically updated with up-to-the-minute ‘live’ information

• You, the receivers, can run complex queries, using parameters (quality indicators) to specify your request

• Query the data at any time

• Currently available in JSON, CSV and XML formats

ADVANTAGES

➢ Easy to implement

➢ Reliable and proven technology

➢ Receivers can send feedback instantaneously

➢ Cloud storage is now available to support it
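A minimal client-side sketch of the delivery model above, in Python. The endpoint URL and parameter names are hypothetical placeholders, not VesselsValue's actual API; the sketch only shows how query parameters are encoded into a request and how JSON, CSV and XML responses can be normalised into the same list of rows. The XML branch assumes each record is a child element carrying its fields as attributes.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
BASE_URL = "https://api.example.com/v1/vessels"


def build_query_url(base_url: str, **params) -> str:
    """Encode query parameters (filters, quality indicators) into a GET URL."""
    # Sorting keeps the URL stable, which helps caching and logging.
    return f"{base_url}?{urlencode(sorted(params.items()))}"


def parse_payload(body: str, fmt: str) -> list:
    """Turn a JSON, CSV or XML response body into a list of row dictionaries."""
    if fmt == "json":
        # Assumes the JSON body is an array of objects.
        return json.loads(body)
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(body)))
    if fmt == "xml":
        root = ET.fromstring(body)
        # Assumed layout: <rows><row field="value" .../></rows>
        return [dict(child.attrib) for child in root]
    raise ValueError(f"unsupported format: {fmt}")
```

`build_query_url(BASE_URL, vessel_type="tanker", format="json")` gives `https://api.example.com/v1/vessels?format=json&vessel_type=tanker`. The HTTP call itself (for example with `urllib.request.urlopen`) is left out so the parsing can be tested offline.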

‘DIRTY DATA’

THERE ARE NO CLEAN DATA SETS!

• Data is inevitably "dirty" because of obsolete, inaccurate and missing information.

• Cleaning it up is an increasingly important, yet overlooked, job that can help prevent costly mistakes.

• Although techniques are improving all the time, scrubbing data can only accomplish so much. Even with a relatively tidy set of information, getting useful results can be arduous and time-consuming.

• To reduce the risk of dirty data, every single person in your organisation must buy in to the value that analytics brings, from data gathering to management.
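To make the three kinds of dirt concrete, here is a hedged Python sketch that sorts records into missing, inaccurate and obsolete buckets. The record layout, the 30-day staleness cut-off and the 50-knot speed limit are all illustrative assumptions, not figures from the talk.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical record layout, for illustration only:
# {"mmsi": int | None, "speed_knots": float | None, "last_seen": datetime | None}
REQUIRED_FIELDS = ("mmsi", "speed_knots", "last_seen")
MAX_AGE = timedelta(days=30)  # assumed: older records count as obsolete
MAX_SPEED_KNOTS = 50.0        # assumed plausibility limit for a merchant ship


def profile(records, now):
    """Count each record in the first dirty-data bucket it falls into."""
    counts = {"missing": 0, "inaccurate": 0, "obsolete": 0, "clean": 0}
    for rec in records:
        if any(rec.get(field) is None for field in REQUIRED_FIELDS):
            counts["missing"] += 1
        elif not 0.0 <= rec["speed_knots"] <= MAX_SPEED_KNOTS:
            counts["inaccurate"] += 1
        elif now - rec["last_seen"] > MAX_AGE:
            counts["obsolete"] += 1
        else:
            counts["clean"] += 1
    return counts


# Made-up sample data, one record per bucket.
now = datetime(2017, 3, 29, tzinfo=timezone.utc)
sample = [
    {"mmsi": 235000001, "speed_knots": 12.5, "last_seen": now - timedelta(hours=2)},
    {"mmsi": None, "speed_knots": 10.0, "last_seen": now},
    {"mmsi": 235000003, "speed_knots": 102.0, "last_seen": now},
    {"mmsi": 235000004, "speed_knots": 0.0, "last_seen": now - timedelta(days=90)},
]
print(profile(sample, now))
# {'missing': 1, 'inaccurate': 1, 'obsolete': 1, 'clean': 1}
```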

MANUAL FIXING

Untangling ‘jumping ships’ and multiple ships reporting on the same unique identifier

[Figure: BEFORE and AFTER views]
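One way to surface ‘jumping ships’ automatically, before a person untangles them, is to check the speed implied by consecutive position reports on the same identifier: two ships sharing an MMSI produce a track that appears to teleport. A minimal Python sketch using the haversine great-circle distance; the 40-knot threshold and the tuple layout are assumptions.

```python
import math
from collections import defaultdict

EARTH_RADIUS_NM = 3440.065  # mean Earth radius (6,371 km) in nautical miles
MAX_PLAUSIBLE_KNOTS = 40.0  # assumed threshold; would need tuning per vessel type


def haversine_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in nautical miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def find_jumps(reports):
    """Flag consecutive reports per identifier whose implied speed is implausible.

    reports: iterable of (mmsi, unix_seconds, lat, lon) tuples (assumed layout).
    Returns a list of (mmsi, t_prev, t_next, implied_knots).
    """
    by_id = defaultdict(list)
    for mmsi, t, lat, lon in reports:
        by_id[mmsi].append((t, lat, lon))
    jumps = []
    for mmsi, track in by_id.items():
        track.sort()  # order each track by timestamp
        for (t0, la0, lo0), (t1, la1, lo1) in zip(track, track[1:]):
            hours = (t1 - t0) / 3600
            if hours <= 0:
                continue  # duplicate timestamps are left for a separate check
            knots = haversine_nm(la0, lo0, la1, lo1) / hours
            if knots > MAX_PLAUSIBLE_KNOTS:
                jumps.append((mmsi, t0, t1, round(knots, 1)))
    return jumps
```

A flag only says that a pair of reports cannot both belong to one ship; deciding which report belongs to which vessel is still the manual step described on the slide.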

OUTLIERS

Write algorithms to spot outliers and determine whether they are within an acceptable tolerance

• Because the volume of data is so huge, software can automatically sift through numbers and text to look for anything unusual that needs further review

• Over time, computers can become more accurate at spotting what belongs and what doesn't. They can also better understand what words and phrases mean by clustering similar examples together and then grading their interpretations for accuracy (AI)

• Remember that models take time to improve
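The slide does not say which outlier test VV uses. As one widely used, swapped-in technique, the modified z-score compares each value with the median and the median absolute deviation (MAD) and flags anything beyond a tolerance, conventionally 3.5. A minimal Python sketch:

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the tolerance.

    The median and MAD are far less distorted by the outliers themselves
    than the mean and standard deviation would be.
    """
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        # More than half the values are identical: anything different stands out.
        return [i for i, v in enumerate(values) if v != median]
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > threshold
    ]
```

`mad_outliers([12.1, 11.8, 12.4, 12.0, 11.9, 35.0])` returns `[5]`, flagging the 35-knot reading for review rather than deleting it.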

HANDLING BIG DATA

“Senior shipping executives need to start looking more closely into how data analytics can augment human decisions, while bringing the workforce up to speed… With technology changing rapidly today, the industry will develop slower than others if it does not harness and use big data successfully.”

Oh Bee Lock, PSA

Many organisations are purchasing data but may not currently have the technical capabilities, or the economic data that can be linked to it, to produce useful analytics.

Data processors vs data providers!

VV has a large, dedicated team of mathematicians and developers with the freedom to use the best data available.

CLEANING

HOW DO WE HANDLE BIG DATA?

• Understand what we’re receiving

• Consider potential problems now and in the future, for example unrealistic distance travelled or loss of signal

• Solution: flag or alert when one of those problems occurs

• Use algorithms to identify and fix problems automatically

• Some problems require manual fixing, for example when a captain enters the wrong MMSI number for the ship on AIS

• All of this happens in real time, 24/7

• A team of 20 continuously monitors and analyses our data to turn it into useful information
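Two of the checks above lend themselves to simple automatic rules. A hedged Python sketch, with a hypothetical six-hour gap threshold: a format check for MMSI numbers (an MMSI is nine digits, so the value is kept as a string to preserve leading zeros) and a loss-of-signal detector over a vessel's report timestamps.

```python
import re

MAX_GAP_SECONDS = 6 * 3600  # hypothetical: silence longer than six hours raises an alert


def mmsi_looks_valid(mmsi: str) -> bool:
    """Format check only: an MMSI is exactly nine decimal digits."""
    return re.fullmatch(r"[0-9]{9}", mmsi) is not None


def signal_gaps(timestamps, max_gap=MAX_GAP_SECONDS):
    """Return (last_seen, next_seen) pairs where reports stop for too long."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


def alerts_for_vessel(mmsi: str, timestamps):
    """Combine both checks into a list of human-readable alerts."""
    alerts = []
    if not mmsi_looks_valid(mmsi):
        alerts.append(f"malformed MMSI {mmsi!r}")
    for start, end in signal_gaps(timestamps):
        alerts.append(f"no signal from {start} to {end}")
    return alerts
```

Note the limit of the format check: a captain who types a different but well-formed nine-digit MMSI passes it, which is why such cases still need manual fixing.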

DATA DISCIPLINE

ORGANISING DATA

✓ Structured databases

Free-form electronic notes

Data inputters need training to input data correctly

HELPFUL TECHNIQUES

Input validation

Standardised fields will help

Suggested drop-downs

Outlier analysis – predefined correct ranges

Sister analysis
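A minimal Python sketch of three of the helpful techniques: a standardised field with a fixed drop-down vocabulary, predefined correct ranges, and a sister check. It assumes ‘sister analysis’ means comparing a ship's recorded specification with the median of its sister ships; the vessel types, ranges and 5% tolerance are illustrative assumptions.

```python
import statistics

# Hypothetical standardised fields; values are illustrative assumptions only.
ALLOWED_TYPES = {"Tanker", "Bulker", "Container", "LNG", "LPG"}  # drop-down list
VALID_RANGES = {"dwt": (1_000, 450_000), "year_built": (1950, 2017)}


def validate_entry(entry):
    """Return a list of problems found in one manually entered record."""
    problems = []
    if entry.get("vessel_type") not in ALLOWED_TYPES:
        problems.append(f"vessel_type {entry.get('vessel_type')!r} not in drop-down list")
    for field, (low, high) in VALID_RANGES.items():
        value = entry.get(field)
        if value is None or not low <= value <= high:
            problems.append(f"{field}={value!r} outside {low}-{high}")
    return problems


def sister_check(value, sister_values, tolerance=0.05):
    """True if value deviates from its sister ships' median by more than tolerance."""
    reference = statistics.median(sister_values)
    return abs(value - reference) > tolerance * reference
```

Rejecting an entry at input time, as `validate_entry` does, is usually cheaper than cleaning it after it has spread into downstream tables.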

INVESTING IN PEOPLE AND TECHNOLOGY

VV OFFICES

87 STAFF

STOKE: 21 highly skilled programmers, most with a mathematics background. Product development for internal and external systems.

IOW: 45 dedicated researchers and data inputters

LONDON HQ: Commercial, analysts, economists, quants

SINGAPORE: Representative office

THE FIGURES

• 59,164 development hours (between 2009 and 2016)

• £4.1M cost @ £70/hour

• 18 experienced developers required to recode in a single year, given a complete spec

• 15.3 billion rows of AIS position data (> world population of 7.4B)

• 5,400 columns of data in 450+ tables

• 10+ TB of storage to store all VV data

• 40 servers: 22 database, 18 compute

• 2.8M lines of code

• 29 development testing sites
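The cost line follows directly from the hours and the hourly rate. A quick Python check of the slide's arithmetic, using only the slide's own figures:

```python
DEV_HOURS = 59_164  # development hours, 2009-2016 (slide figure)
RATE_GBP = 70       # £ per hour (slide figure)
DEVELOPERS = 18     # developers needed to recode in one year (slide figure)

cost_gbp = DEV_HOURS * RATE_GBP
hours_per_dev = DEV_HOURS / DEVELOPERS

print(f"Cost: £{cost_gbp:,} (about £{cost_gbp / 1e6:.1f}M)")
# Cost: £4,141,480 (about £4.1M)
print(f"Implied workload: {hours_per_dev:,.0f} hours per developer in the year")
# Implied workload: 3,287 hours per developer in the year
```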

2.8M LINES OF CODE

WHY ARE WE DOING THIS?

SEARCH THE DATABASE

TRADE ANALYTICS START AT VESSEL LEVEL

VESSEL LEVEL STOPPAGES
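Stoppages can be derived from AIS speed over ground. A hedged Python sketch for a single vessel's track; the 0.5-knot stop speed and two-hour minimum duration are assumptions, not VV's parameters:

```python
STOP_SPEED_KNOTS = 0.5       # assumed: below this speed the ship counts as stopped
MIN_STOP_SECONDS = 2 * 3600  # assumed: ignore pauses shorter than two hours


def find_stoppages(track, stop_speed=STOP_SPEED_KNOTS, min_duration=MIN_STOP_SECONDS):
    """Return (first_slow, last_slow) time pairs where a ship stayed below stop_speed.

    track: list of (unix_seconds, speed_knots) AIS reports for ONE vessel.
    """
    stoppages = []
    start = last = None
    for t, speed in sorted(track):
        if speed < stop_speed:
            if start is None:
                start = t  # first slow report of a potential stoppage
            last = t
        else:
            # Ship is moving again: keep the stoppage only if it lasted long enough.
            if start is not None and last - start >= min_duration:
                stoppages.append((start, last))
            start = None
    # The track may end while the ship is still stopped.
    if start is not None and last - start >= min_duration:
        stoppages.append((start, last))
    return stoppages
```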

VESSEL LEVEL JOURNEYS

VESSEL LEVEL PROBABLE EVENTS

AGGREGATED UP TO COMPANY OR SECTOR LEVEL

AVERAGE SPEEDS & TON MILE
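A ton-mile is one tonne of cargo carried one nautical mile. A minimal Python sketch of rolling vessel-level journeys up to company level, with hypothetical field names; average speed is computed as total distance over total time underway, so long voyages weigh more than short ones:

```python
from collections import defaultdict


def aggregate_by_company(journeys):
    """Roll vessel-level journeys up to company level.

    journeys: iterable of dicts with hypothetical keys
      company, cargo_tonnes, distance_nm, hours_underway
    Returns {company: {"ton_miles": ..., "avg_speed_knots": ...}}.
    """
    totals = defaultdict(lambda: {"ton_miles": 0.0, "distance_nm": 0.0, "hours": 0.0})
    for j in journeys:
        t = totals[j["company"]]
        # Ballast legs carry zero cargo, so they add distance but no ton-miles.
        t["ton_miles"] += j["cargo_tonnes"] * j["distance_nm"]
        t["distance_nm"] += j["distance_nm"]
        t["hours"] += j["hours_underway"]
    return {
        company: {
            "ton_miles": t["ton_miles"],
            "avg_speed_knots": t["distance_nm"] / t["hours"] if t["hours"] else 0.0,
        }
        for company, t in totals.items()
    }
```

The same function aggregates to sector level if the `company` key is replaced by a sector label.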

PERFORMANCE AND SPEED

This is the flow of requests for a single valuation. The darker colour highlights where the largest share of the time is spent.

It took 213 ms (roughly one fifth of a second) to calculate, log and return the values.

Knowledge like this allows us to continuously optimise and eliminate bottlenecks.

Once the DCF (discounted cash flow) model was complete, it took over 3 hours to value every ship; after a few days of optimisation, this was down to 13 minutes.
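A minimal Python sketch of the kind of per-stage timing that reveals where a request spends its time. The stage names mirror the slide's calculate, log and return steps; the work inside each stage is a stand-in, not VV's valuation code.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage of a request."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raised an exception.
            self.totals[name] += time.perf_counter() - start

    def shares(self):
        """Return (stage, fraction of total time) pairs, largest first."""
        total = sum(self.totals.values())
        if total == 0:
            return []
        return sorted(
            ((name, t / total) for name, t in self.totals.items()),
            key=lambda pair: pair[1],
            reverse=True,
        )


if __name__ == "__main__":
    timer = StageTimer()
    for _ in range(10):
        with timer.stage("calculate"):
            sum(i * i for i in range(20_000))  # stand-in for the valuation maths
        with timer.stage("log"):
            pass  # stand-in for writing an audit record
        with timer.stage("return"):
            pass  # stand-in for serialising the response
    for name, share in timer.shares():
        print(f"{name}: {share:.0%}")
```

The stage with the largest share is the first candidate for optimisation, which is how a run of hours can be cut to minutes.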

TO SUMMARISE

• Explore other modern ways to support your business; embrace change!

• Understand your own capabilities and be realistic – this will dictate the plan you need

• Establish clear and simple goals

• Remain informed

• Question your suppliers and processors

• Demand transparency

• It’s not what you know. It’s what you do with what you know.

THANK YOU