development of dynamic census -...

14
Development of Dynamic Census: Estimating demographics and trajectories of actual populations in Bangladesh using CDR data University of Tokyo Shibasaki & Sekimoto Lab. Dynamic Census Development Team Ayumi Arai* Apichon Witayangkurn Hiroshi Kanasugi Zipei Fan Ryosuke Shibasaki What is CDR data? Localization Trajectory How the data look like Starting time of calls Location of antenna CDR data can provide partial views of large- scale human mobility and distribution Call Detail Records = CDR data

Upload: phamliem

Post on 11-Mar-2018

214 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Development of Dynamic Census - LIRNEasialirneasia.net/wp-content/uploads/2016/07/DynamicCensus_13July2016.… · Development of Dynamic Census: ... hourly basis, e.g. working male,

Development of Dynamic Census:�Estimating demographics and trajectories of actual populations in Bangladesh using CDR data �

University of Tokyo Shibasaki & Sekimoto Lab.

Dynamic Census Development Team Ayumi Arai*

Apichon Witayangkurn Hiroshi Kanasugi

Zipei Fan Ryosuke Shibasaki

��

What is CDR data? �

��

Localization Trajectory

How the data look like Starting time of calls

Location of antenna

CDR data can provide partial views of large-scale human mobility and distribution

Call Detail Records = CDR data

Page 2: Development of Dynamic Census - LIRNEasialirneasia.net/wp-content/uploads/2016/07/DynamicCensus_13July2016.… · Development of Dynamic Census: ... hourly basis, e.g. working male,

��

Motivation �¡  Popula'onsta's'csareimportantforac'vi'esbothinprivateandpublicsectors.Butaretheseenoughforunderstandinghumanac'vity?

¡  CDRdataareusefulforunderstandinghumanmobility.But..¡  Interpreta'onofanalysisresultsmaybemisleadingifCDRscan

representlimitedpartofsociety(James&Versteeg,2007;Tatem&Smith,2010)

¡  Difficulttoexaminetheimpactofrepresenta'vebiaswithoutknowingwhichpartofsocietyCDRsdepict(Wesolowskietal.,2013).

Canwedevelophumantrajectorydata,whicharelabeledwithdemographicaSributesandrepresentactualpopula'onsusingCDR?

��

Advantages and challenges of CDR data �¡  Advantages

1.  Poten'allyhighpopula'oncoverage2.  Nearreal-'mehumanmobility3.  Rou'nelycollectedbythemobilenetworkoperator(MNO)

¡  Challenges

1.  Recordedatirregularintervals2.  Spa'alresolu'ondependingoncellantennaloca'ons3.  Anonymized4.  Representa'venessbias

A novel data set “Dynamic Census” is developed by addressing these challenges

Page 3: Development of Dynamic Census - LIRNEasialirneasia.net/wp-content/uploads/2016/07/DynamicCensus_13July2016.… · Development of Dynamic Census: ... hourly basis, e.g. working male,

��

What is “Dynamic Census”? �

¡  Trajectoriesanddemographicsofactualpopula3oninareascoveredbyCDRdata

¡  Griddedpopula3onsta3s3csonkeydemographicaSributesathourlybasis,e.g.workingmale,housewife,student,andother

Working males’ 24 hour population distribution in Dhaka

CDRs �Census � Dynamic Census�x�

y �

Years �

x�

y �

x�

y �

Hours� Hours�

T0 �

T0+5 �

t0 �

t0+1 �

t0+2

t0+3 �

t0 �

t0+3 �

Page 4: Development of Dynamic Census - LIRNEasialirneasia.net/wp-content/uploads/2016/07/DynamicCensus_13July2016.… · Development of Dynamic Census: ... hourly basis, e.g. working male,

Impacts and Uniqueness of Dynamic Census �

CancaptureBOPwhichhasnon-marginalimpactsoneconomy.Difficult-to-reachpopula'onforfieldsurveycanbealsocaptured. �

95% World’s cellular

network coverage�

Applicableanywherecoveredwithcellularnetworks �

�1% Cost necessary for

developing Dynamic Census�

40% Those who belong to Base of Pyramid �

Timeandfinancialcostsaremuchlowerthanconduc'ngconven'onalcensus�

CDR data �

Interpola3on

Spa3aldisaggrega3on&routeinterpola3on

Es3ma3onoftheunobservablepopula3on

How to address challenges in CDR data�

Dynamic Census�

Loca3onlabeling

Irregularrecordinterval

Non-uniformresolu3on

Anonymized

Challenges in CDR data

DemographicaDributees3ma3on

Representa3veness

① �

④ �

② �

③ �

Fieldsurveydata�mobilephoneusers�

Fieldsurveydata&buildingdata

(users&non-users)

Supplement data

Buildingmapdata&Roadnetworkdata

Page 5: Development of Dynamic Census - LIRNEasialirneasia.net/wp-content/uploads/2016/07/DynamicCensus_13July2016.… · Development of Dynamic Census: ... hourly basis, e.g. working male,

��

CDR data �

Interpola3on

Spa3aldisaggrega3on&routeinterpola3on

Es3ma3onoftheunobservablepopula3on

Dynamic Census�

Loca3onlabeling

Irregularrecordinterval

Non-uniformresolu3on

Anonymized

Challenges in CDR data

DemographicaDributees3ma3on

Representa3veness

① �

④ �

② �

③ �

Fieldsurveydata�mobilephoneusers�

Fieldsurveydata&buildingdata

(users&non-users)

Supplement data

Buildingmapdata&Roadnetworkdata

���

1. Irregular record interval�

Home

Office

Calledat8:40am

Calledat3:15pm

Interpolate CDRs based on the routine observed from longer-term data

H

W

Time� Place�8:00~8:59 � H �9:00~9:59 � W�

10:00~10:59 � W�11:00~11:59 � W�12:00~12:59 � W�13:00~13:59 � W�14:00~14:59 � W�15:00~15:59 � W�

Interpola3on

Time� Place�8:00~8:59 � H �9:00~9:59 �

10:00~10:59 �

11:00~11:59 �

12:00~12:59 �

13:00~13:59 �

14:00~14:59 �

15:00~15:59 � W�

CDR data Actual behavior

8:00 Departure�

8:45 Arrival�

Noinforma'oninCDRdata

Page 6: Development of Dynamic Census - LIRNEasialirneasia.net/wp-content/uploads/2016/07/DynamicCensus_13July2016.… · Development of Dynamic Census: ... hourly basis, e.g. working male,

���

Interpolation �¡  Extracting routine patterns

¡  Atopicmodelisemployed

¡  Rou'nepaSernisexpressedastheprobabilitydistribu'onofkeyloca'ons(Home,Work,andOther)

¡  Spatiotemporal interpolation

¡  HiddenMarkovModelisemployed('mingoftransi'onisiden'fied)

Topic model Hidden Markov Model

Collaborative filtering approach

+

���

CDR data �

Interpola3on

Spa3aldisaggrega3on&routeinterpola3on

Es3ma3onoftheunobservablepopula3on

Dynamic Census�

Loca3onlabeling

Irregularrecordinterval

Non-uniformresolu3on

Anonymized

Challenges in CDR data

DemographicaDributees3ma3on

Representa3veness

① �

④ �

② �

③ �

Fieldsurveydata�mobilephoneusers�

Fieldsurveydata&buildingdata

(users&non-users)

Supplement data

Buildingmapdata&Roadnetworkdata

Page 7: Development of Dynamic Census - LIRNEasialirneasia.net/wp-content/uploads/2016/07/DynamicCensus_13July2016.… · Development of Dynamic Census: ... hourly basis, e.g. working male,

���

2. No demographic attribute info�

Popula'onsinCDRsdonotalwaysrepresentthepopula'onunderstudy

Canspecifythepopula'onunderstudy

DemographicaDributesarees3mated

Target population group�

���

Demographic attribute estimation�¡  Approach

¡  RandomForestisemployedforbuildinganes'ma'onmodel

¡  One-month-call-recordsfrom58volunteersareusedastrainingdata

¡  One-day-call-recordsfrom922mobilephoneusersareusedforexaminingrela'onshipbetweencallingbehavioranddemographicaSributes

¡  Estimated features

¡  Workingmale,housewife,student,andother

¡  Incomelevel(individual)andAgegroup(-20/21-35/36-60/61-)←Resultstobeimproved

Class � Accuracy � Precision� Recall�Working male� 0.79� 0.63� 0.70�

Housewife� 0.67� 0.47� 0.82�

Student� 0.89� 0.40� 0.22�

Other� 0.63� 0.20� 0.07�

Estimation results

Page 8: Development of Dynamic Census - LIRNEasialirneasia.net/wp-content/uploads/2016/07/DynamicCensus_13July2016.… · Development of Dynamic Census: ... hourly basis, e.g. working male,

���

Calling behavior survey to relate demographic attributes and CDR data�¡  Purpose

¡  Relatecallingbehavior(callrecords)anddemographicaSributes¡  Surveyed area and population

¡  15Wardsarechosenbasedonlanduse.ForeachWard,18HHseacharechosenfrom3incomegroupsinGreaterDhaka(Two-stagestra'fiedsampling)

¡  Allmembersareinterviewed¡  InterviewedondemographicaSribute,travel-ac'vity,andmobilephoneuse

¡  Key of this survey ¡  Incomelevelisdeterminedbasedonthetypeofbuildings

Interviewataslumhousehold �

Interviewatahighincomehousehold �

���

CDR data �

Interpola3on

Spa3aldisaggrega3on&routeinterpola3on

Es3ma3onoftheunobservablepopula3on

Dynamic Census�

Loca3onlabeling

Irregularrecordinterval

Non-uniformresolu3on

Anonymized

Challenges in CDR data

DemographicaDributees3ma3on

Representa3veness

① �

④ �

② �

③ �

Fieldsurveydata�mobilephoneusers�

Fieldsurveydata&buildingdata

(users&non-users)

Supplement data

Buildingmapdata&Roadnetworkdata

Page 9: Development of Dynamic Census - LIRNEasialirneasia.net/wp-content/uploads/2016/07/DynamicCensus_13July2016.… · Development of Dynamic Census: ... hourly basis, e.g. working male,

��

3. Non-uniform spatial resolution�

Spatial disaggregation

Loca'oninCDRdataisatantennalevel

Disaggregatedbasedonthedistribu'onofbuildings

Disaggregated

��

Stay point reallocation �¡  Modifying spatial resolution

¡  Stay points are reallocated to building POIs ¡  Antennabasisloca'onsarereallocatedtobuildingPOIswithinvoronoi¡  Eachvoronoicellisconsideredtobeanareacoveredbyanantenna

¡  Allocation probability is based on the area size of building

¡  Types of buildings are used as the proxy of the income level

��2016.03.11�

Distribution of POIs�

Stay point (antenna) �

Voronoi cell generated from antenna location �

POIs within voronoi�

Page 10: Development of Dynamic Census - LIRNEasialirneasia.net/wp-content/uploads/2016/07/DynamicCensus_13July2016.… · Development of Dynamic Census: ... hourly basis, e.g. working male,

���

CDR data �

Interpola3on

Spa3aldisaggrega3on&routeinterpola3on

Es3ma3onoftheunobservablepopula3on

Dynamic Census�

Loca3onlabeling

Irregularrecordinterval

Non-uniformresolu3on

Anonymized

Challenges in CDR data

DemographicaDributees3ma3on

Representa3veness

① �

④ �

② �

③ �

Fieldsurveydata�mobilephoneusers�

Fieldsurveydata&buildingdata

(users&non-users)

Supplement data

Buildingmapdata&Roadnetworkdata

���

4. Representativeness�

mobile users �

Populations in CDR data

Suppose you have CDRs from Dhaka … �

Those who are not

included in CDRs �

Unobservable in CDR data

Entire population in Dhaka

Scalingfactorsarecomputedtoes'mate

thispart�

Page 11: Development of Dynamic Census - LIRNEasialirneasia.net/wp-content/uploads/2016/07/DynamicCensus_13July2016.… · Development of Dynamic Census: ... hourly basis, e.g. working male,

Entire living populations �

(B) People in HHs which do NOT include any GP users Unobservables�

(A) People in HHs which include GP users GP users + unobservables�

Understanding population covered by CDRs on household basis

���

Estimation of the unobservable �¡  Scaling factor

¡  Approx.householdnumberiscalculatedbasedonthenumberofbuildings

¡  ScalingfactoriscomputedfromthetypicalHHstructure,obtainedthroughfieldsurvey

Real population of areas covered by CDR data Non users �

Users�

Non users (Not appear in CDR

data)�

Users�

Non users �

User

x scaling factor �

(a) HHs including users

User

x scaling factor �

(b) HHs consisting of no users alone User

x (scaling factor �+�)

Page 12: Development of Dynamic Census - LIRNEasialirneasia.net/wp-content/uploads/2016/07/DynamicCensus_13July2016.… · Development of Dynamic Census: ... hourly basis, e.g. working male,

���

Purpose •  Inves'gatethepopula'onstructureforeachincomelevel•  Obtaindatatocalculatescalingfactorstocomputethenumberof

popula'onsfromthedistribu'onofbuildingbyincomelevel

Surveyed Voronoi area

Surveyed area and population •  En'repopula'onsinaVoronoicellwere

surveyedinDecember2014•  2,839HHsconsis'ngof11,521people

from366buildings(outof367buildings)

Key of SCC •  Incomelevelisdeterminedbasedonthe

typeofbuildings

Small-scale census survey (SSC) to see population structure for each income level

Average HH structures obtained from survey

HHs not including GP users HHs including GP users

Page 13: Development of Dynamic Census - LIRNEasialirneasia.net/wp-content/uploads/2016/07/DynamicCensus_13July2016.… · Development of Dynamic Census: ... hourly basis, e.g. working male,

���

Type of building and income level

Contents of the map data •  Approx.650,000buildings(withthetypeofbuildings)•  Residen'albuildingsareclassifiedintofourgroupsbytheheightof

buildings

Sample of the map

Criteria of the type of buildings •  High(Sevenormorestories)•  Middle(Morethantwostories)•  Low(Onetotwostories)•  Slum(Onestory)

:High

:Middle

:Low

:Slum

Legend of the type of building

���

Future work �

CDR data �

Dynamic Census�

Spa3otemporalinterpola3on

DemographicaDributees3ma3on

Es3ma3onoftheunobservablepopula3on

Staypointextrac3on&loca3onlabeling

Spa3aldisaggrega3on&routeinterpola3on

Willbeimprovedwiththeuseofsmartphones

Resultsdependondataquality→Compara3vestudies

Buildingmapdataiscostly→Satelliteimageprocessing

Varietyinlife-styles→Compara3vestudies

Page 14: Development of Dynamic Census - LIRNEasialirneasia.net/wp-content/uploads/2016/07/DynamicCensus_13July2016.… · Development of Dynamic Census: ... hourly basis, e.g. working male,

Anyques'ons/sugges'[email protected]