get your data ready for ai - nethope · building a data driven culture - nethope webinar v3 (1)...
Post on 21-Jul-2020
© Copyright Microsoft Corporation. All rights reserved.
Ria Sankar, Director of Program Management, AI for Good Research Lab
Get your Data ready for AI
Types of AI systems
AI amplifies human ingenuity: Balancing interactions between humans and AI
Source: https://medium.com/@davidbessis/building-ai-first-products-…
Pre-AI
AI Inside
https://www.businessinsider.com/amazon-alexa-voice-recordings-how-to-access-them-…
https://www.androidcentral.com/google-assistant
AI First
https://www.tesla.com/autopilot
http://www.automotivetechnologies.com/autonomous-self-driving-cars
Becoming data ready
Becoming data ready… do you need AI?
Becoming data ready… starts with a diverse team
Step 1: Understand your Target Audience
• Circumstance
• Desired Progress
• Definition of Quality
• Barriers
• Workarounds
Source: wikimedia.org
Source: nethope.org (Credit: …)
CASE STUDY: Jobs-to-be-done Framework
Seeing tasks from a customer vs. program context
Functional: “Help me find the best volunteers for disaster response”
Social: “Give volunteers a chance to connect with my like-minded peers”
Emotional: “Give veterans a new sense of mission and purpose”
Step 2: Define your Problem
• Link to key strategies
• Prioritize learning goals
• Make educated guesses
• Write hypotheses
Source: hubspot.net
...to find measurable KPIs
Sales Effectiveness
CASE STUDY: AI for Humanitarian Action
The WHY: Disaster Operations require organized, effective teams
Partner’s goal: Scale up formation of teams and management of volunteer deployments
Complexity: 10,000s of volunteers across the world with various types of skills, levels of seniority and availability
Problem Statement communicated:
1. Need to validate 100,000s of documents with volunteer skills
2. Need to improve the team assignment process, which is currently manual and sub-optimal
Algorithm to match people to tasks
VOLUNTEERS: EXPERTISE, LOCATION, AVAILABILITY
OPERATION: SKILLS, LOCATION, DATES
Algorithm to create teams
Seniority
Deployment History
Notification and Scheduling
Special thanks: Rahul Dodhia
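The slides name the inputs to the matching step (volunteer expertise, location and availability against an operation's skills, location and dates) but not the algorithm itself. The following is only a minimal sketch of one way such a match could be scored. The class names, the 0.7/0.3 weights, the seniority scale and the sample data are all hypothetical, not the partner's actual method.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class Volunteer:
    name: str
    expertise: set[str]
    location: str
    available_from: date
    available_to: date
    seniority: int  # hypothetical scale: higher = more experienced


@dataclass
class Operation:
    name: str
    skills: set[str]
    location: str
    start: date
    end: date


def match_score(v: Volunteer, op: Operation) -> float:
    """Score a volunteer for an operation; 0.0 if the dates do not fit."""
    if v.available_from > op.start or v.available_to < op.end:
        return 0.0
    skill_overlap = len(v.expertise & op.skills) / len(op.skills) if op.skills else 0.0
    same_location = 1.0 if v.location == op.location else 0.0
    # Hypothetical weights: skills count for more than proximity.
    return 0.7 * skill_overlap + 0.3 * same_location


def rank_volunteers(volunteers: list[Volunteer], op: Operation, top_n: int = 3) -> list[str]:
    """Return names of the top-scoring volunteers; zero scores are dropped,
    and ties on score are broken by seniority."""
    scored = [(match_score(v, op), v.seniority, v.name) for v in volunteers]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(reverse=True)
    return [name for _, _, name in scored[:top_n]]


# Made-up sample data for illustration only.
op = Operation("flood-response", {"first aid", "logistics"}, "Houston",
               date(2020, 8, 1), date(2020, 8, 14))
volunteers = [
    Volunteer("Ana", {"first aid", "logistics"}, "Houston", date(2020, 7, 15), date(2020, 9, 1), 2),
    Volunteer("Ben", {"first aid"}, "Denver", date(2020, 7, 1), date(2020, 8, 31), 5),
    Volunteer("Cy", {"first aid", "logistics"}, "Houston", date(2020, 8, 5), date(2020, 8, 20), 4),
    Volunteer("Dee", {"logistics"}, "Houston", date(2020, 7, 1), date(2020, 12, 31), 1),
]
print(rank_volunteers(volunteers, op))
```

In this toy data Cy has the right skills but is not available for the full operation, so the date check removes him before skills are even compared. A real team-building step would also need the deployment history and scheduling inputs listed above.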
Step 3: Prepare your Data
• Where?
• How?
• How good?
• When?
• What?
Audience
Data
Business Problem
AI
You might need more data:
• To reduce bias & noise
• Across categories
• To find new segments
Data preparation is essential for AI systems
How can you help?
1. Provide a data dictionary
2. Remember 5Cs of high quality data
• Correct
• Conforms
• Current
• Consistent
• Consolidated
Source: http://….bp.blogspot.com/
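As an illustration of the two asks above, a data dictionary can be written down as plain data and used to check incoming records. This is a minimal sketch: the field names, allowed values and the 365-day freshness window are hypothetical, and the mapping of checks to "Conforms", "Correct" and "Current" is one reading of the 5Cs, which the slide does not define. "Consistent" and "Consolidated" need cross-record checks and are not covered here.

```python
from datetime import date

# Hypothetical data dictionary: one entry per field.
DATA_DICTIONARY = {
    "volunteer_id": {"type": str, "required": True,
                     "description": "Unique volunteer identifier"},
    "skill": {"type": str, "required": True,
              "allowed": {"first aid", "logistics", "medical"},
              "description": "Primary skill"},
    "last_updated": {"type": date, "required": True,
                     "description": "Date the record was last verified"},
}


def check_record(record, dictionary, today, max_age_days=365):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field, spec in dictionary.items():
        value = record.get(field)
        if value is None:
            if spec.get("required"):
                problems.append(f"{field}: missing")
            continue
        # Conforms: the value has the documented type.
        if not isinstance(value, spec["type"]):
            problems.append(f"{field}: expected {spec['type'].__name__}")
            continue
        # Correct: the value is one of the documented options.
        if "allowed" in spec and value not in spec["allowed"]:
            problems.append(f"{field}: value {value!r} not in allowed set")
        # Current: dates older than the freshness window are flagged.
        if spec["type"] is date and (today - value).days > max_age_days:
            problems.append(f"{field}: older than {max_age_days} days")
    for field in record:
        if field not in dictionary:
            problems.append(f"{field}: not in data dictionary")
    return problems


good = {"volunteer_id": "V-001", "skill": "logistics", "last_updated": date(2020, 6, 1)}
bad = {"volunteer_id": 17, "skill": "juggling", "last_updated": date(2018, 1, 1)}
today = date(2020, 7, 21)
print(check_record(good, DATA_DICTIONARY, today))
print(check_record(bad, DATA_DICTIONARY, today))
```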
Data scientists spend a staggering 70% of their time on data preparation
Special thanks: Kristin Tolle
Data Preparation Stages:
1. Identify your (diverse) team across marketing, legal, privacy, data science, business development – collaborate!
2. Build a Data Dictionary for internal data
3. Run a privacy / legal review
4. Identify public or partner datasets needed to supplement the dictionary
5. Map data to problem statements defined in Step 2 – focus ONLY on the data you need
6. 3Rs: Is your dataset reliable, repeatable, reproducible?
7. Analyze data issues: Gaps/Missing data, Duplicates, Null values, Joins, Long tail of distribution (if unsure, share sample dataset)
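Stage 7 can be started with very little tooling. The sketch below counts missing values, duplicate keys and long-tail categories in a list of records. The `profile` function, its field names and the `tail_share` threshold are assumptions made for illustration, and the rows are made-up sample data.

```python
from collections import Counter


def profile(rows, key_field, category_field, tail_share=0.05):
    """Summarize common data issues in a list of dict records."""
    total = len(rows)

    # Gaps / null values: count empty or None entries per field.
    missing = Counter()
    for row in rows:
        for field, value in row.items():
            if value is None or value == "":
                missing[field] += 1

    # Duplicates: keys that appear more than once.
    key_counts = Counter(row[key_field] for row in rows)
    duplicate_keys = {k: n for k, n in key_counts.items() if n > 1}

    # Long tail: categories that make up less than tail_share of all rows.
    categories = Counter(row.get(category_field) for row in rows if row.get(category_field))
    long_tail = sorted(c for c, n in categories.items() if n / total < tail_share)

    return {
        "rows": total,
        "missing": dict(missing),
        "duplicate_keys": duplicate_keys,
        "long_tail": long_tail,
    }


rows = [
    {"id": 1, "region": "East", "skill": "first aid"},
    {"id": 2, "region": "East", "skill": ""},
    {"id": 3, "region": "East", "skill": "logistics"},
    {"id": 3, "region": "East", "skill": "logistics"},
    {"id": 4, "region": "West", "skill": None},
    {"id": 5, "region": "East", "skill": "medical"},
]
report = profile(rows, key_field="id", category_field="region", tail_share=0.2)
print(report)
```

A report like this is also a convenient thing to share alongside a sample dataset when the team is unsure how serious an issue is.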
CASE STUDY: AI for Humanitarian Action
Problem Statement: Build a recommendation algorithm to match new sponsors with beneficiaries
Graphs can help you find issues in your data - 101
MAP CHART
Ensure balanced data across categories
Step 4: Design your AI solution
• Art, not science
• Iterative process
• Remember DISC
  • Details
  • Insights
  • Simple
  • Consistent
4 main types of models
https://devzum.com/wp-content/uploads/…/best-machine-learning-cheat-sheets3.jpg
https://azure.microsoft.com/en-us/blog/announcing-automated-ml-capability-in-azure-machine-learning/?WT.mc_id=azure-presentation-lazzeri
https://docs.microsoft.com/en-us/azure/machine-learning/studio/media/algorithm-cheat-sheet/machine-learning-algorithm-cheat-sheet-small_v_0_6-01.png
Factors influencing model selection:
• Supervised vs. Unsupervised
• Sample size
• Predicting categories vs. values
What model will you choose?
• Classifying cheese into Brie, Mexican, Parmesan, Mozzarella
What model will you choose?
• Analyzing weather patterns to uncover trends to predict if a day will be “Sunny”, “Cloudy”, “Rainy”, or “Windy”
What model will you choose?
• Analyzing weather patterns to FORECAST the temperature for the coming days
What model will you choose?
• To find anomalies in your dataset?
• For spam, fraud filtration?
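The selection factors above can be read as a small decision rule. The sketch below is one hedged way to write it down. The four family names (classification, regression, anomaly detection, clustering) are an assumption, since the slides do not spell out the four model types, and sample size is left out because no thresholds are given.

```python
def suggest_model_family(has_labels: bool, predicts: str = "",
                         looking_for_outliers: bool = False) -> str:
    """Map the slide's selection factors to a broad model family.

    has_labels: True for supervised problems (known outcomes to learn from).
    predicts: "category" or "value"; only used for supervised problems.
    looking_for_outliers: True when the goal is to flag unusual records.
    """
    if looking_for_outliers:
        return "anomaly detection"
    if not has_labels:
        return "clustering"
    if predicts == "category":
        return "classification"
    if predicts == "value":
        return "regression"
    raise ValueError("predicts must be 'category' or 'value' for supervised problems")


# The quiz questions above, answered under these assumptions:
print(suggest_model_family(True, "category"))                  # cheese types
print(suggest_model_family(True, "category"))                  # Sunny / Cloudy / Rainy / Windy
print(suggest_model_family(True, "value"))                     # temperature forecast
print(suggest_model_family(False, looking_for_outliers=True))  # anomalies in a dataset
```

Spam and fraud filtering can reasonably be framed either as classification (when labeled examples exist) or as anomaly detection (when they do not), which is why the has_labels question comes first.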
A 4 step process to get your Data ready for AI
Understand your Target Audience
Define your Problem
Prepare your Data
Design your AI solution
Lessons learned with Data/AI/ML + the importance of bias and ethics
Terrible news for left-handed people
• In 1991, Halpern and Coren, of California State University at San Bernardino and the University of British Columbia, ran a study [8]
• Random sample of people who had died; their families were asked whether they were left-handed
• They concluded that left-handed people die 9 years younger…
• Study was published in The New England Journal of Medicine, a peer-reviewed medical journal published by the Massachusetts Medical Society and among the most prestigious in the world
• It was also cited in The New York Times [9]
If this were true, being left-handed = smoking 120 cigarettes a day
[8] Coren S, Halpern DF. Left-handedness: a marker for decreased survival fitness. Psychol Bull. 1991 Jan;109(1):90-106.
[9] http://www.nytimes.com/1991/04/04/us/being-left-handed-may-be-dangerous-to-life-study-says.html
Special thanks: Juan Lavista
Be wary of bias
Lessons learned in AI/ML
What's the problem with the study?
The study assumed that the percentage of left-handed people stayed steady over time.
The sample, even though random, is biased against left-handed people. [10]
[7] http://en.wikipedia.org/wiki/Handedness
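The effect of that assumption can be shown with a small calculation. In this sketch both groups have exactly the same mortality, and the only difference is that the recorded share of left-handers rises for later birth years. The 3% and 12% shares, the birth-year range and the shape of the death curve are hypothetical numbers chosen for illustration, not figures from the study.

```python
STUDY_YEAR = 1991


def left_handed_share(birth_year: int) -> float:
    """Hypothetical recorded share of left-handers by birth year:
    3% for people born in 1900 or earlier, rising linearly to 12% for 1970 or later."""
    clamped = min(max(birth_year, 1900), 1970)
    return 0.03 + (0.12 - 0.03) * (clamped - 1900) / 70


def deaths_at_age(age: int) -> float:
    """Hypothetical relative number of deaths at each age, identical for both groups."""
    return float(age)


def mean_age_at_death(ages, share_at_age) -> float:
    """Average age at death within a group, weighting each age by deaths times group share."""
    weighted = sum(age * deaths_at_age(age) * share_at_age(age) for age in ages)
    total = sum(deaths_at_age(age) * share_at_age(age) for age in ages)
    return weighted / total


ages = range(20, 91)
left = mean_age_at_death(ages, lambda a: left_handed_share(STUDY_YEAR - a))
right = mean_age_at_death(ages, lambda a: 1 - left_handed_share(STUDY_YEAR - a))

print(f"mean age at death, left-handed:  {left:.1f}")
print(f"mean age at death, right-handed: {right:.1f}")
```

Even though nobody in this model dies earlier because of handedness, the left-handed group shows a lower average age at death, simply because fewer of the oldest people were recorded as left-handed.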
Historical behavior can be a bad influence
Correlation does not imply causation
Unintended consequences - the Cobra Effect
1. Set in the times of British India.
2. Government offered a reward for each dead cobra.
3. Entrepreneurs began to raise cobras for the reward by delivering them dead.
Source: Wikimedia, Creative Commons
Recommendation #1: Use FATE Principles as a framework for trust
https://www.microsoft.com/en-us/ai/our-approach-to-ai
Source: Creative Commons
Recommendation #2: Take concrete steps in your dataset to prevent bias
• Comprehensive test cases
• Data stratification
• Diverse workforce (avoid tech bro AI)
• Unconscious bias – review model outputs for correlations to race and gender
Source: Creative Commons, res.cloudinary.com
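One concrete way to review model outputs against a sensitive attribute is to compare how often each group receives a positive prediction. This is a minimal sketch of that check. The function names and the sample predictions are hypothetical, and a gap in rates is a prompt for closer review rather than proof of bias.

```python
from collections import defaultdict


def positive_rate_by_group(predictions, groups):
    """Share of positive (1) predictions within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for pred, group in zip(predictions, groups):
        counts[group][0] += int(pred == 1)
        counts[group][1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


def max_rate_gap(rates):
    """Largest difference in positive rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Made-up model outputs and group labels for illustration only.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = positive_rate_by_group(predictions, groups)
print(rates)
print(max_rate_gap(rates))
```

The same per-group counts are also a quick way to see whether a dataset is stratified well enough, since a group with very few rows will show up with a tiny total.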
Even with these challenges, the power of data is real.
Thank you!