how to use data for good
TRANSCRIPT
#DataTalk How to Use Data for Good
Join our #DataTalk on Thursdays at 5 p.m. ET
This week, we talked with DataKind; Real Impact Analytics; Elissa Redmiles, a Data Science for Social Good Summer Fellow at the University of Chicago; Nick Eng, Data Scientist at the Center for Data Science and Public Policy at the University of Chicago; and Kevin Chen, Chief Scientist at the Experian Data Lab.
Check out the resources and tweets from this chat:
ex.pn/dataforgood
What is a “data for good” project?
Real Impact Analytics@RIAnalytics
Data for good is the use of big data to help policymakers and aid workers foster the social public good and maximize impact.
ex.pn/datatalk#DataTalk
Elissa Redmiles, Data Science for Social Good Summer Fellow at the University of Chicago @eredmil1 ex.pn/datatalk
#DataTalk
Data for good projects use data and data science to help nonprofits better reach their
mission and assist their target audience.
Nick Eng, Data Scientist, Center for Data Science at the University of Chicago @nick_eng ex.pn/datatalk
#DataTalk
Projects that use data to improve society as a whole, rather than any single individual. To be more specific: communities/cities.
DataKind@DataKind
ex.pn/datatalk#DataTalk
Data and subject matter experts working together to use data to address humanitarian
challenges. Collaboration is key!
Kevin Chen, Chief Data Scientist, Experian Data Lab @kevincchen
ex.pn/datatalk#DataTalk
Projects that improve social equality by leveraging public and private data.
Nick Eng, Data Scientist, Center for Data Science at the University of Chicago @nick_eng ex.pn/datatalk
#DataTalk
Using data to help the underprivileged, especially for those who might not know
how data can be used as a tool.
What are some favorite examples of how data has been used for good?
ex.pn/datatalk#DataTalk
Elissa Redmiles, Data Science for Social Good Summer Fellow at the University of Chicago @eredmil1
One of the projects that drew me to @DataSciFellows was the
#NurseFamilyPartnership project, which used data science to predict people in need.
ex.pn/datatalk#DataTalkDataKind
@DataKind
Using anonymous mobile location data in aggregated ways to identify mobility patterns
and design better public transportation.
ex.pn/datatalk#DataTalkLitterati
@Litterati
We leverage data to get smarter about our litter patterns.
ex.pn/datatalk#DataTalk
Our favorite D4G example is the use of telecom data with the Global Pulse UN team in Uganda to detect and tackle food
crises and prioritize actions against poverty. More specifically, we have developed a mapping of income inequality and income
shocks in Africa using changes in pre-paid patterns.
Real Impact Analytics@RIAnalytics
ex.pn/datatalk#DataTalkDataKind
@DataKind
Another neat one from Data Science Bowl: convolutional neural nets to predict ocean health.
ex.pn/datatalk#DataTalk
A second powerful example of D4G is the use of telecom mobility data to identify, prevent and treat contagious diseases
such as Ebola, malaria and cholera. We have been able to identify micro-communities as well as mobility patterns. This helps identify key routes to block and assess the
potential impact on the spread of a disease.
Real Impact Analytics@RIAnalytics
ex.pn/datatalk#DataTalk
Nick Eng, Data Scientist, Center for Data Science at the University of Chicago @nick_eng
Beyond predictive models and confidential datasets, products like clearstreets.org simplify
our lives using open data.
ex.pn/datatalk#DataTalkDataKind
@DataKind
@DataKindUK volunteers mapped public data to help @SSChospices find children in need of
hospice care.
ex.pn/datatalk#DataTalk
Melissa Correia@melissacorreia
Child welfare agencies are using sophisticated analyses to improve outcomes for kids in foster care.
What challenges do organizations face when working on data
philanthropy projects?
ex.pn/datatalk#DataTalkDataKind
@DataKind
One challenge is defining a clear question upfront for the project that will help an organization
maximize impact.
ex.pn/datatalk#DataTalk
Elissa Redmiles, Data Science for Social Good Summer Fellow at the University of Chicago @eredmil1
Organizational culture is very important. Having good data to analyze and resources directed
toward analysis are key.
ex.pn/datatalk#DataTalk
Finding a balance between retaining proprietary knowledge of either data or technology and applying
it to data for good projects can be hard.
Kevin Chen, Chief Data Scientist, Experian Data Lab @kevincchen
ex.pn/datatalk#DataTalk
Nick Eng, Data Scientist, Center for Data Science at the University of Chicago @nick_eng
Implementation! Fancy models or cool visualizations are only step one. Making these tools part of the day-to-day is number two.
We can see 3 types of challenges: (i) design of the tools/apps; (ii) access to data; (iii) alignment of the ecosystem.
ex.pn/datatalk#DataTalkReal Impact Analytics
@RIAnalytics
The operational challenge is mostly to repackage research insights so that they have a real impact on the decisions of aid
workers in the field. Many tools are not simple enough for daily field use, are not actionable enough, and usually have not
been designed around an actual worker’s needs.
ex.pn/datatalk#DataTalkReal Impact Analytics
@RIAnalytics
The technical challenge is to be able to connect to relevant data sources, whether external data sources (e.g. WHO,
World Bank) or telecom data sources.
ex.pn/datatalk#DataTalkReal Impact Analytics
@RIAnalytics
The legal and regulatory challenge is to syndicate our approach with local regulators and secure the data
handling process, in terms of privacy, anonymization of data or remote access. All data must remain at the telecom operator premises within the country. This last challenge
can be partly addressed by securing a sustainable ecosystem involving all parties.
ex.pn/datatalk#DataTalkReal Impact Analytics
@RIAnalytics
ex.pn/datatalk#DataTalk
Nick Eng, Data Scientist, Center for Data Science at the University of Chicago @nick_eng
And figuring out what the problem exactly is, and framing it. We don’t always know the domain.
We need your help and feedback.
ex.pn/datatalk#DataTalkIoT Channel
@IoTchannel
Key challenge is to maintain protection of user/client info and data without it being compromised/leaked.
ex.pn/datatalk#DataTalkDataKind
@DataKind
There is also the challenge (and fun) of prepping and cleaning data before you dive in.
What type of data can be used for data for good projects?
ex.pn/datatalk#DataTalkDataKind
@DataKind
Time series, text, audio, geo, etc. We need to make sure privacy is preserved and it
doesn’t promote discrimination.
ex.pn/datatalk#DataTalk
Elissa Redmiles, Data Science for Social Good Summer Fellow at the University of Chicago @eredmil1
Many different formats are usable: database, Excel and CSV data are all
easy to process, but text and web data work, too.
ex.pn/datatalk#DataTalkReal Impact Analytics
@RIAnalytics
Telecom data are unique in emerging markets, as they are collected systematically, locally and in real time. These data can be
complemented by 2 data sources: (i) external or public databases, such as occurrences of a specific disease in a specific location;
(ii) additional / ad-hoc data collected through a mobile application. The most important limitation is the risk that individual people
could be re-identified from the shared insights or tools. This would dramatically undermine the scaling up of Data for Good.
ex.pn/datatalk#DataTalk
Elissa Redmiles, Data Science for Social Good Summer Fellow at the University of Chicago @eredmil1
Real good can be done with access to internal data without releasing this data publicly.
ex.pn/datatalk#DataTalk
Nick Eng, Data Scientist, Center for Data Science at the University of Chicago @nick_eng
Open data and APIs are a great start. Check out the new CitySDK from the census.
ex.pn/datatalk#DataTalk
Elissa Redmiles, Data Science for Social Good Summer Fellow at the University of Chicago @eredmil1
We focus more on internal vs. external data and on complete data than on formats.
ex.pn/datatalk#DataTalk
Nick Eng, Data Scientist, Center for Data Science at the University of Chicago @nick_eng
And when structured data isn’t available, you can get creative to make your own data
(e.g. scraping websites).
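A minimal sketch of the scraping idea, assuming the data of interest sits in an HTML table. It uses only Python's standard-library `html.parser`; the `TableScraper` class, the `scrape_table` helper and the sample shelter table are hypothetical and not from the chat.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_table(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page content, made up for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Shelter</th><th>Beds</th></tr>
  <tr><td>North Side</td><td>40</td></tr>
  <tr><td>Lakeview</td><td>25</td></tr>
</table>
"""

rows = scrape_table(SAMPLE_HTML)
print(rows)
```

The rows come back as plain lists of strings, so they can be written to a CSV file or loaded into whatever analysis tool a project already uses.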
ex.pn/datatalk#DataTalkDataKind
@DataKind
Totally agree with Nick Eng on getting creative with scraping websites or not forgetting about
data sources like satellite imagery.
ex.pn/datatalk#DataTalkDataKind
@DataKind
One example, @kvarshney worked with @Give_Directly using satellite imagery to target
villages in need.
What are some best practices for using data for good?
#DataTalk Kevin Chen, Chief Data Scientist, Experian Data Lab @kevincchen
Garbage in, garbage out. Validate and carefully examine the data.
#DataTalk Real Impact Analytics @RIAnalytics
The best D4G solutions provide action-oriented insights to end-users, which are supported by
science and easily accessible by mobile.
#DataTalk Real Impact Analytics @RIAnalytics
We need to understand the actual needs of the potential users, assess correlation between
available data and possible actions and outcomes and adapt apps and algorithms accordingly.
#DataTalk Real Impact Analytics @RIAnalytics
We need to be able to refresh and operationalize the tools offering mobile access to insights; we need to technically secure access to the data and ensure privacy; and we need to be able to measure impact and correct the algorithms accordingly. Overall, trust is one of
the key overarching success factors, as it allows a smooth decision flow and maximizes impact. Therefore, we need to build
strong partnerships with international institutions to ensure global impact and scalability of our actions.
#DataTalk EWD Rozier @PrarieScience
The biggest step for using data for good is finding a committed, involved, partner who
will help transition to practice.
#DataTalk DataKind @DataKind
Love this guide from @engrnroom. Great read on how to practice responsible
development data.
#DataTalk Nick Eng, Data Scientist, Center for Data Science at the University of Chicago @nick_eng
When doing a project, make sure it’s a constant partnership with your other
stakeholders (e.g. nonprofits).
#DataTalk Elissa Redmiles, Data Science for Social Good Summer Fellow at the University of Chicago @eredmil1
Talk to SMEs and find the domain knowledge you don’t have. Data is only
half the puzzle.
What are ways to use data for good, while protecting privacy?
#DataTalkex.pn/datatalk
EWD Rozier@PrarieScience
Right now, it’s very ad-hoc; to move forward we need new data privacy solutions, which allow
proofs of privacy preservation.
#DataTalkex.pn/datatalk
Elissa Redmiles, Data Science for Social Good Summer Fellow at the University of Chicago @eredmil1
@DataSciFellows keeps data secure while letting the code for processing the
data be open source.
#DataTalkex.pn/datatalk
Kevin Chen, Chief Data Scientist, Experian Data Lab @kevincchen
Use the data in aggregates (e.g. finding activity patterns of city dwellers using
aggregated mobile phone activity).
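A minimal sketch of working in aggregates, assuming each activity record is a `(user_id, area, hour)` tuple. The record layout, the `aggregate_activity` name and the `min_count` threshold for dropping sparsely populated cells are illustrative assumptions, not something described in the chat.

```python
from collections import Counter


def aggregate_activity(records, min_count=3):
    """Count events per (area, hour) and drop cells with fewer than min_count events.

    User ids are discarded, so only group-level counts leave this function.
    """
    counts = Counter((area, hour) for _user, area, hour in records)
    return {cell: n for cell, n in counts.items() if n >= min_count}


# Hypothetical records, made up for illustration only.
records = [
    ("u1", "downtown", 8), ("u2", "downtown", 8), ("u3", "downtown", 8),
    ("u4", "harbor", 8),
    ("u1", "downtown", 18), ("u2", "downtown", 18),
]

print(aggregate_activity(records))
```

Dropping small cells is one common precaution, because a count of one or two can still point to a specific person. The right threshold depends on the data and the setting.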
#DataTalkex.pn/datatalk
Elissa Redmiles, Data Science for Social Good Summer Fellow at the University of Chicago @eredmil1
We keep data science code open source so that other nonprofit organizations can use these
resources to process their own data.
#DataTalkex.pn/datatalk
EWD Rozier@PrarieScience
We’ve been working on solutions for homomorphisms for database operations to
create a privacy aware kernel for data science.
#DataTalkex.pn/datatalk
DataKind@DataKind
Shouting out @CrisisTextLine: they provide personalized care to those in crisis via text
messages while protecting privacy.
#DataTalkex.pn/datatalk
EWD Rozier@PrarieScience
The hard part about privacy-preserving operations is the current limits on
performable homomorphisms.
#DataTalkex.pn/datatalk
Kevin Chen, Chief Data Scientist, Experian Data Lab @kevincchen
A few ways: add noise to the data, bucket the values (e.g. age), or use a coarser level of info (e.g. zip3
vs. zip5) when possible.
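The three techniques in this tweet can be sketched as follows. The function names, the 10-year bucket width and the choice of Laplace noise are assumptions made for illustration. The tweet does not say which kind of noise to add, and this sketch makes no formal privacy guarantee.

```python
import random


def bucket_age(age, width=10):
    """Replace an exact age with a range label, e.g. 37 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def coarsen_zip(zip5):
    """Keep only the first three digits of a five-digit ZIP code (zip3)."""
    return zip5[:3]


def add_laplace_noise(value, scale=1.0, rng=random):
    """Add zero-mean Laplace noise with the given scale to a numeric value.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return value + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


# Hypothetical record, made up for illustration only.
record = {"age": 37, "zip": "60637", "monthly_visits": 12}
masked = {
    "age": bucket_age(record["age"]),
    "zip": coarsen_zip(record["zip"]),
    "monthly_visits": add_laplace_noise(record["monthly_visits"], scale=2.0),
}
print(masked)
```

Larger buckets, shorter ZIP prefixes and a larger noise scale all make individuals harder to single out but make the data less precise, so the settings are a trade-off to agree with the data owner.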
What type of data philanthropy would you like to see happen?
#DataTalkex.pn/datatalk
Elissa Redmiles, Data Science for Social Good Summer Fellow at the University of Chicago @eredmil1
We need more public info showcasing the impact of using data science for good.
#DataTalkex.pn/datatalk
Nick Eng, Data Scientist, Center for Data Science at the University of Chicago @nick_eng
And more scalable ways to help nonprofits determine how data can help them.
#DataTalkex.pn/datatalk
I would really like to see projects that use data to help understand, prevent, intervene,
and treat cancers.
Kevin Chen, Chief Data Scientist, Experian Data Lab @kevincchen
#DataTalkex.pn/datatalk
EWD Rozier@PrarieScience
Biggest plausible projects I want to see are cities becoming more data savvy like Chicago.
Public access democratizes science.
#DataTalkex.pn/datatalk
Real Impact Analytics@RIAnalytics
We would like to co-design a sustainable, scalable and open ecosystem of mobile anti-poverty apps together with other
developers, NGOs, international agencies and philanthropists. Different types of apps will be required, such as apps supporting NGOs in disaster relief or a sudden outbreak of a contagious disease, or apps supporting ministries or public
authorities in their decision-making, optimizing the targeting and impact of public policies. Most emerging countries lack
data about their populations.
#DataTalkex.pn/datatalk
Elissa Redmiles, Data Science for Social Good Summer Fellow at the University of Chicago @eredmil1
I also think it’s important for more corporations like Experian and IBM to raise
awareness of #Data4Good projects.
What are ways to support organizations and data scientists
working in data philanthropy?
#DataTalkex.pn/datatalk
Real Impact Analytics@RIAnalytics
Philanthropists can best support D4G by joining the dialogue with app developers and end-users on the public
questions to address. There is a clear need to fund specific apps and an operational platform to ensure that
Data for Good becomes not only a science but also fosters operational impact. Securing such a platform with a first set of apps will generate spillovers and a positive dynamic among the communities of developers, NGOs and public
institutions.
#DataTalkex.pn/datatalk
EWD Rozier@PrarieScience
Focus on open source tools. I will be controversial: we need to move away from prototyping in Python
to a performable ecosystem.
#DataTalkex.pn/datatalk
Elissa Redmiles, Data Science for Social Good Summer Fellow at the University of Chicago @eredmil1
Agree with @PrarieScience. @NSF funding for outcomes based
#DataScience projects is key especially for training data scientists.
#DataTalkex.pn/datatalk
Corporations can provide funding and recognition to their data scientists to
encourage participation in data philanthropy projects.
Kevin Chen, Chief Data Scientist, Experian Data Lab @kevincchen
#DataTalkex.pn/datatalk
Nick Eng, Data Scientist, Center for Data Science at the University of Chicago @nick_eng
Maybe start by finding your local #Data4Good community. Strength in numbers!
ex.pn/datatalk#DataTalkDataKind
@DataKind
Funders can play a big role supporting nonprofits to expand the
use of data beyond reporting.
Any final tips for data scientists who want to use data for good?
ex.pn/datatalk#DataTalkReal Impact Analytics
@RIAnalytics
Our main tip is to collaborate, as an operational open ecosystem is critical to realize our shared vision of a healthy and poverty-free
world. Data for Good is at the crossroads of multiple skill sets, such as data science, software development, algorithm design, epidemiology, traffic modelling, field work involving poor
communities in emerging markets, and telecom regulation. There is no chance one organization could offer all of these internally. Data for Good
needs to offer both operational tools and scientific insights.
ex.pn/datatalk#DataTalk
Elissa Redmiles, Data Science for Social Good Summer Fellow at the University of Chicago @eredmil1
Don’t be discouraged by imperfect data!
ex.pn/datatalk#DataTalk
Nick Eng, Data Scientist, Center for Data Science at the University of Chicago @nick_eng
Data will always be messy. Especially from nonprofits.
ex.pn/datatalk#DataTalk
Kevin Chen, Chief Data Scientist, Experian Data Lab @kevincchen
Let the data speak. Interpret the results objectively.
ex.pn/datatalk#DataTalk
Nick Eng, Data Scientist, Center for Data Science at the University of Chicago @nick_eng
Start simple. Simple projects can sometimes make the
biggest impact.
Join our #DataTalk on Twitter on Thursdays at 5 p.m. ET.
experian.com/datatalk