data visualization for social problems

DATA VISUALIZATION FOR SOCIAL PROBLEMS S Anand, Chief Data Scientist, Gramener

Upload: gramener

Post on 14-Jul-2015


TRANSCRIPT

Page 1: Data visualization for social problems

DATA VISUALIZATION

FOR SOCIAL PROBLEMS

S Anand, Chief Data Scientist, Gramener


Most discussions of decision-making assume that only senior executives make decisions or that only senior executives’ decisions matter. This is a dangerous mistake…

-- Peter F. Drucker

Generating and analysing data is not enough.

The team must consume it together and act cohesively.


THERE ARE MANY WAYS TO AID DATA CONSUMPTION

SHOW me what is happening with the data

EXPLAIN to me why it’s happening

Allow me to EXPLORE and figure it out

Just EXPOSE the data to me

Axes: effort for the creator (low to high) and effort for the consumer (high to low)



EDUCATION

PREDICTING MARKS

What determines a child’s marks?

Do girls score better than boys?

Does the choice of subject matter?

Does the medium of instruction matter?

Does community or religion matter?

Does their birthday matter?

Does the first letter of their name matter?

[Chart: TN CLASS X: ENGLISH (x-axis: 0 to 100 in steps of 5; y-axis: 0 to 40,000)]

[Chart: TN CLASS X: SOCIAL SCIENCE (x-axis: 0 to 100 in steps of 5; y-axis: 0 to 40,000)]

[Chart: TN CLASS X: MATHEMATICS (x-axis: 0 to 100 in steps of 5; y-axis: 0 to 40,000)]


DETECTING FRAUD

“We know meter readings are incorrect, for various reasons.

We don’t, however, have the concrete proof we need to start the process of meter reading automation.

Part of our problem is the volume of data that needs to be analysed. The other part is our inexperience with the tools and analyses needed to identify such patterns.”

-- ENERGY UTILITY


This plot shows the frequency of all meter readings from Apr-2010 to Mar-2011. An unusually large number of readings are aligned with the tariff slab boundaries.

This clearly shows collusion of some form with the customers.
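One way to quantify “an unusually large number of readings aligned with the tariff slab boundaries” is to compare how often readings land exactly on a boundary with how often they land on nearby values. The Python sketch below assumes the readings are available as a flat list of integer units. The boundary values (50, 100, 150, 200) and the ±5-unit window are hypothetical choices, not the utility’s actual tariff slabs.

```python
from collections import Counter

# Hypothetical slab boundaries (units); the real tariff slabs are not given in the talk.
SLAB_BOUNDARIES = [50, 100, 150, 200]


def boundary_excess(readings, boundaries=SLAB_BOUNDARIES, window=5):
    """For each boundary, return count(readings == boundary) divided by the
    average count of readings within +/- `window` units (boundary excluded).

    A ratio near 1 means the boundary is no more common than its neighbours;
    a ratio well above 1 means readings pile up exactly on the boundary.
    """
    counts = Counter(readings)
    result = {}
    for b in boundaries:
        neighbours = [b + d for d in range(-window, window + 1) if d != 0]
        baseline = sum(counts[n] for n in neighbours) / len(neighbours)
        if baseline == 0:
            # Nothing nearby: report infinity if the boundary itself occurs, else 0.
            result[b] = float("inf") if counts[b] else 0.0
        else:
            result[b] = counts[b] / baseline
    return result


# Synthetic example: 30 extra readings of exactly 100 on top of one reading per value 90-110.
readings = [100] * 30 + list(range(90, 111))
print(boundary_excess(readings))  # → {50: 0.0, 100: 31.0, 150: 0.0, 200: 0.0}
```

Comparing against the immediate neighbourhood, rather than the overall distribution, keeps the test robust to the general shape of consumption.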

This happens with specific customers, not randomly. Here are such customers’ meter readings.

Apr-10  May-10  Jun-10  Jul-10  Aug-10  Sep-10  Oct-10  Nov-10  Dec-10  Jan-11  Feb-11  Mar-11
217     219     200     200     200     200     200     200     200     350     200     200
250     200     200     200     201     200     200     200     250     200     200     150
250     150     150     200     200     200     200     200     200     200     200     150
150     200     200     200     200     200     200     200     200     200     200     50
200     200     200     150     180     150     50      100     50      70      100     100
100     100     100     100     100     100     100     100     100     100     110     100
100     150     123     123     50      100     50      100     100     100     100     100
0       111     100     100     100     100     100     100     100     100     50      50
0       100     27      100     50      100     100     100     100     100     70      100
1       1       1       100     99      50      100     100     100     100     100     100
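Because the pattern is concentrated in specific customers, a natural follow-up is to rank customers by how often their monthly readings fall exactly on a boundary. The sketch below assumes the boundaries are the values that recur in the readings table (50, 100, 150, 200) and uses an arbitrary 75% threshold. Both are illustrative assumptions, and `flag_customers` is a hypothetical helper.

```python
# Hypothetical boundaries, taken from the values that recur in the readings table.
SLAB_BOUNDARIES = {50, 100, 150, 200}


def boundary_share(readings, boundaries=SLAB_BOUNDARIES):
    """Fraction of a customer's monthly readings that sit exactly on a boundary."""
    return sum(r in boundaries for r in readings) / len(readings)


def flag_customers(customers, threshold=0.75):
    """customers: {customer_id: [monthly readings]}.

    Returns the ids whose boundary share is at least `threshold`
    (an assumed cut-off, to be tuned against known cases).
    """
    return sorted(
        cid for cid, readings in customers.items()
        if boundary_share(readings) >= threshold
    )


customers = {
    # Two rows copied from the readings table (Apr-10 to Mar-11).
    "row_1": [217, 219, 200, 200, 200, 200, 200, 200, 200, 350, 200, 200],
    "row_6": [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 110, 100],
    # A hypothetical customer with no boundary-aligned readings.
    "other": [123, 187, 96, 143, 171, 158, 132, 119, 164, 177, 141, 109],
}
print(flag_customers(customers))  # → ['row_1', 'row_6']
```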

Section     Apr-10  May-10  Jun-10  Jul-10  Aug-10  Sep-10  Oct-10  Nov-10  Dec-10  Jan-11  Feb-11  Mar-11
Section 1   70%     97%     136%    65%     110%    116%    121%    107%    114%    88%     74%     109%
Section 2   66%     92%     66%     87%     70%     64%     63%     50%     58%     38%     41%     54%
Section 3   90%     46%     47%     43%     28%     31%     50%     32%     19%     38%     8%      34%
Section 4   44%     24%     36%     39%     21%     18%     24%     49%     56%     44%     31%     14%
Section 5   4%      63%     -27%    20%     41%     82%     26%     34%     43%     2%      37%     15%
Section 6   18%     23%     30%     21%     28%     33%     39%     41%     39%     18%     0%      33%
Section 7   36%     51%     33%     33%     27%     35%     10%     39%     12%     5%      15%     14%
Section 8   22%     21%     28%     12%     24%     27%     10%     31%     13%     11%     22%     17%
Section 9   19%     35%     14%     9%      16%     32%     37%     12%     9%      5%      -3%     11%

If we define the “extent of fraud” as the percentage excess of 100-unit meter readings, the value varies considerably across sections and over time.

New section manager arrives

… and is transferred out

… with some explainable anomalies.

Why would these happen?
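The “extent of fraud” measure can be sketched as follows. The talk does not spell out the baseline, so this version assumes it is the average number of readings on the values within ±5 units of 100 (excluding 100 itself), computed separately for each section and month. Under that assumption, negative values such as -27% mean fewer 100-unit readings than the neighbourhood would suggest.

```python
from collections import Counter, defaultdict


def extent_of_fraud(records, mark=100, window=5):
    """records: iterable of (section, month, reading) tuples.

    Returns {(section, month): percentage excess} where the excess compares the
    number of readings exactly at `mark` with the average count of readings on
    the neighbouring values mark-window .. mark+window (mark excluded).
    Groups with no neighbouring readings are skipped.
    """
    groups = defaultdict(Counter)
    for section, month, reading in records:
        groups[(section, month)][reading] += 1

    neighbours = [mark + d for d in range(-window, window + 1) if d != 0]
    result = {}
    for key, counts in groups.items():
        baseline = sum(counts[n] for n in neighbours) / len(neighbours)
        if baseline:
            result[key] = round(100 * (counts[mark] / baseline - 1))
    return result


# Synthetic data: ten readings on every value 95..105, plus extra readings of exactly 100.
records = [("Section 1", "Apr-10", r) for r in range(95, 106) for _ in range(10)]
records += [("Section 1", "Apr-10", 100)] * 7
print(extent_of_fraud(records))  # → {('Section 1', 'Apr-10'): 70}
```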

SHOW me what is happening with the data

EXPLAIN to me why it’s happening

Allow me to EXPLORE and figure it out

Just EXPOSE the data to me

… to inform and to entertain

[Chart labels: Jain, Harini, Shweta, Sneha, Pooja, Ashwin, Shah, Deepti, Sanjana, Varshini, Ezhumalai, Venkatesan, Silambarasan, Pandiyan, Kumaresan, Manikandan, Thirupathi, Agarwal, Kumar, Priya]


Based on the results of the 20 lakh (2 million) students who took the Class XII exams in Tamil Nadu over the last 3 years, it appears that the month you were born in can make a difference of as much as 120 marks out of 1,200.

June-borns score the lowest

The marks shoot up for Aug-borns

… and peak for Sep-borns

120 marks out of 1,200 explainable by month of birth

An identical pattern was observed in 2009 and 2010…

… and across districts, genders, subjects, and Classes X and XII.

“It’s simply that in Canada the eligibility cutoff for age-class hockey is January 1. A boy who turns ten on January 2, then, could be playing alongside someone who doesn’t turn ten until the end of the year—and at that age, in preadolescence, a twelve-month gap in age represents an enormous difference in physical maturity.”

-- Malcolm Gladwell, Outliers
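The month-of-birth comparison boils down to grouping students by birth month and comparing average marks. A minimal sketch, assuming each record is a (birth month, total marks out of 1,200) pair. The sample records are invented for illustration.

```python
from collections import defaultdict


def marks_by_birth_month(students):
    """students: iterable of (birth_month, total_marks) with birth_month in 1..12.

    Returns ({month: mean marks}, spread), where spread is the gap between the
    best- and worst-scoring months.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for month, marks in students:
        totals[month] += marks
        counts[month] += 1
    means = {month: totals[month] / counts[month] for month in sorted(counts)}
    spread = max(means.values()) - min(means.values())
    return means, spread


# Invented records: June, August and September births.
sample = [(6, 800), (6, 820), (8, 900), (9, 930), (9, 910)]
means, spread = marks_by_birth_month(sample)
print(means)   # → {6: 810.0, 8: 900.0, 9: 920.0}
print(spread)  # → 110.0
```

On real data, the same spread would be the “120 marks out of 1,200” figure. Repeating it per district, gender or subject is a matter of filtering the records first.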


LET’S LOOK AT 15 YEARS OF US BIRTH DATA

This dataset (1975 – 1990) has been around for several years and has been studied extensively. Yet a visualization can reveal patterns that are neither obvious nor well known.

For example,

• Are birthdays uniformly distributed?

• Do doctors or parents exercise the C-section option to move dates?

• Is there any day of the month with an unusually high or low number of births?

• Are there any months with relatively many or few births?

Very high births in September. But this is fairly well known: most conceptions happen during the winter holiday season.

Relatively few births during the Christmas and Thanksgiving holidays, as well as New Year and Independence Day.

Most people prefer not to have children on the 13th of any month, since it is considered unlucky.

Some special days like April Fool’s Day are avoided, but Valentine’s Day is quite popular.

[Chart: More births / Fewer births … on average, for each day of the year (from 1975 to 1990)]
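Questions like “is the 13th avoided?” can be answered by comparing the average number of births on each day of the month with the overall daily average. A minimal sketch, assuming the data is a mapping from calendar date to number of births. The synthetic year below is made up.

```python
from datetime import date
from statistics import mean


def day_of_month_index(births: dict[date, int]) -> dict[int, float]:
    """births: {calendar date: number of births}.

    Returns {day of month: index}, where index = mean births on that day of the
    month / mean births over all dates. 1.0 is average; below 1.0 is avoided.
    """
    by_day: dict[int, list[int]] = {}
    for d, n in births.items():
        by_day.setdefault(d.day, []).append(n)
    overall = mean(births.values())
    return {day: mean(counts) / overall for day, counts in sorted(by_day.items())}


# Synthetic year: 100 births a day, except 90 on the 13th of each month.
start = date(2001, 1, 1).toordinal()
births = {}
for ordinal in range(start, start + 365):
    d = date.fromordinal(ordinal)
    births[d] = 90 if d.day == 13 else 100

index = day_of_month_index(births)
print(round(index[13], 2), round(index[12], 2))  # → 0.9 1.0
```

Grouping by day of year instead of day of month gives the calendar-style view used on the slide.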


THE PATTERN IN INDIA IS QUITE DIFFERENT

This birth-date dataset is obtained from school admission data for over 10 million children. When we compare it with births in the US, we see none of the same patterns.

For example,

• Is there an aversion to the 13th, or is there a local cultural nuance?

• Are holidays avoided for births?

• Which months have a higher propensity for births, and why?

• Are there any patterns not found in the US data?

Very few children are born in August and the months after it. Most births are concentrated in the first half of the year.

We see a large number of children born on the 5th, 10th, 15th, 20th and 25th of each month: round-numbered dates.

Such round-numbered patterns are a typical indication of fraud. Here, birthdates are brought forward to aid early school admission.

[Chart: More births / Fewer births … on average, for each day of the year (from 2007 to 2013)]
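The round-numbered-dates signal can be measured by comparing the share of birthdates on the 5th, 10th, 15th, 20th and 25th with the share of calendar days that fall on those dates. A minimal sketch, assuming a list of birthdates as `datetime.date` values and a non-leap reference year for the expected share. A ratio well above 1.0 suggests dates are being rounded.

```python
from datetime import date

# The round-numbered days called out on the slide.
ROUND_DAYS = {5, 10, 15, 20, 25}


def days_in_year(year: int) -> list[date]:
    """Every calendar date in `year`."""
    start = date(year, 1, 1).toordinal()
    end = date(year + 1, 1, 1).toordinal()
    return [date.fromordinal(o) for o in range(start, end)]


def round_date_ratio(birthdates: list[date], reference_year: int = 2001) -> float:
    """Observed share of birthdates on ROUND_DAYS divided by the share of days in
    `reference_year` that are ROUND_DAYS. About 1.0 means no excess."""
    observed = sum(d.day in ROUND_DAYS for d in birthdates) / len(birthdates)
    calendar = days_in_year(reference_year)
    expected = sum(d.day in ROUND_DAYS for d in calendar) / len(calendar)
    return observed / expected


uniform = days_in_year(2001)  # one birth per day
rounded = uniform + [date(2001, m, 5) for m in range(1, 13)] * 5  # 60 extra on the 5th
print(round(round_date_ratio(uniform), 3), round(round_date_ratio(rounded), 3))
```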


THIS ADVERSELY IMPACTS CHILDREN’S MARKS

It’s a well-established fact that older children tend to do better at school in most activities. Since many children have had their birth dates brought forward, they are actually younger than their classmates, and they suffer for it.

Children “born” on the 1st, 5th, 10th, 15th etc. of the month tend to score lower marks on average.


[Chart: Higher marks / Lower marks … on average, for children born on a given day of the year (from 2007 to 2013)]

Children “born” on round-numbered days score lower marks on average, due to a higher proportion of younger children.
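The marks comparison can be sketched by splitting students on whether their recorded birth date falls on a round-numbered day. The set of days is an assumption: the slide lists the 1st, 5th, 10th and 15th followed by “etc.”, so the default below adds the 20th and 25th to match the India slide. Adjust it as needed.

```python
from datetime import date
from statistics import mean

# Assumed set of "round-numbered" days; the slide lists 1, 5, 10, 15 "etc.".
DEFAULT_ROUND_DAYS = frozenset({1, 5, 10, 15, 20, 25})


def round_day_marks_gap(students, round_days=DEFAULT_ROUND_DAYS):
    """students: iterable of (recorded birth date, marks).

    Returns mean marks of students recorded on other days minus mean marks of
    students recorded on round-numbered days (positive = round days score lower).
    """
    on_round, others = [], []
    for birthdate, marks in students:
        (on_round if birthdate.day in round_days else others).append(marks)
    return mean(others) - mean(on_round)


# Invented records for illustration.
students = [
    (date(2005, 1, 5), 600),
    (date(2005, 2, 10), 620),
    (date(2005, 3, 7), 700),
    (date(2005, 4, 18), 720),
]
print(round_day_marks_gap(students))
```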

[Chart: More contestants did not reduce the winner margin. Karnataka, Assembly Elections 2008. x-axis: # contestants (0 to 18); y-axis: winner margin (0% to 60%)]

[Chart: More contestants did reduce the runner-up margin. Karnataka, Assembly Elections 2004. x-axis: # contestants (0 to 18); y-axis: runner-up margin (0% to 60%)]
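Both charts plot one point per constituency. The minimal sketch below shows how those points could be computed from raw vote counts. It assumes two definitions that the talk does not give: the winner margin is (1st - 2nd) / total votes, and the runner-up margin is (2nd - 3rd) / total votes.

```python
def margins(votes):
    """votes: vote counts for every contestant in one constituency (any order).

    Returns (number of contestants, winner margin, runner-up margin), with both
    margins expressed as a share of total votes. Assumed definitions:
    winner margin = (1st - 2nd) / total, runner-up margin = (2nd - 3rd) / total.
    """
    if len(votes) < 2:
        raise ValueError("need at least two contestants")
    ranked = sorted(votes, reverse=True)
    total = sum(ranked)
    third = ranked[2] if len(ranked) > 2 else 0
    return len(ranked), (ranked[0] - ranked[1]) / total, (ranked[1] - third) / total


# Invented constituencies: each result is one point (x = contestants) on the two charts.
constituencies = [[500, 300, 200], [600, 400]]
points = [margins(v) for v in constituencies]
print(points)  # → [(3, 0.2, 0.1), (2, 0.2, 0.4)]
```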

[Chart: What topics did parties focus on during questions? Karnataka, 2008-2012. Topics shown include Adult Education, Administrative Reforms, Agriculture, Health and family welfare, P.W.D., Revenue, Urban Development and Youth and Sports. Legend: BJP focus, JD(S) focus, INC focus]

[Chart: What topics did the young & old focus on during questions? Karnataka, 2008-2012. Topics shown include P.W.D., Health and family welfare, Revenue, Rural Development and Panchayat Raj, Primary Education, Information & Technology and Public Library. Legend: Young, Old]

SHOW me what is happening with the data

EXPLAIN to me why it’s happening

Allow me to EXPLORE and figure it out

Just EXPOSE the data to me

… to connect the dots for your readers


https://gramener.com/aapdonations


EXPLORING THE MAHABHARATA

How does the Mahabharata, one of the largest epics at 1.8 million words, lend itself to text analytics?

Can this ‘unstructured data’ be processed to extract analytical insights?

What does sentiment analysis of this tome convey?

Is there a better way to explore relations between characters?

How can the closeness of characters be analysed and visualised?
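One common way to approach closeness of characters is to count how often two characters are mentioned in the same passage. The resulting counts can then be drawn as a weighted network. The sketch below assumes the text is already split into passages and uses plain substring matching on a hypothetical list of names. This matching ignores epithets and aliases (Arjuna is also called Partha, for instance).

```python
from collections import Counter
from itertools import combinations


def co_occurrence(passages, characters):
    """Count how often each pair of characters is mentioned in the same passage.

    passages: list of strings (e.g. chapters or paragraphs).
    characters: list of names; matching is plain substring search, so aliases
    must be added separately.
    Returns a Counter keyed by alphabetically ordered (name, name) pairs.
    """
    pairs = Counter()
    for text in passages:
        present = sorted({name for name in characters if name in text})
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return pairs


passages = [
    "Krishna advises Arjuna before the battle.",
    "Arjuna and Bhima set out together.",
    "Krishna smiles at Arjuna.",
]
print(co_occurrence(passages, ["Arjuna", "Bhima", "Krishna"]).most_common())
# → [(('Arjuna', 'Krishna'), 2), (('Arjuna', 'Bhima'), 1)]
```

The pair counts can serve directly as edge weights. Normalising by how often each character appears overall stops frequently mentioned characters from dominating every pair.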

SHOW me what is happening with the data

EXPLAIN to me why it’s happening

Allow me to EXPLORE and figure it out

Just EXPOSE the data to me

… to allow your users to tell stories


VISUALISATION IS IMPERATIVE FOR DATA → INSIGHTS → ACTION

Spot the unusual. Communicate patterns. Simplify decisions.


We handle terabyte-size data via non-traditional analytics and visualise it in real time.

A data analytics and visualisation company

gramener.com for more examples