Public
Jason Cao / SAP Digital Experience Marketing – Las Vegas
Elizabeth Imm / SAP Market Introduction Services – Berlin
EA104 – Wrangling Messy Data
A True Story
© 2014 SAP SE or an SAP affiliate company. All rights reserved. Public
Disclaimer
This presentation outlines our general product direction and should not be relied on in making a purchase decision. This presentation is not subject to your license agreement or any other agreement with SAP. SAP has no obligation to pursue any course of business outlined in this presentation or to develop or release any functionality mentioned in this presentation. This presentation and SAP's strategy and possible future developments are subject to change and may be changed by SAP at any time for any reason without notice. This document is provided without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. SAP assumes no responsibility for errors or omissions in this document, except if such damages were caused by SAP intentionally or grossly negligent.
Agenda
The Setting – The SAP Community Network (SCN)
The Players – Understanding Roles in Data Analysis
The Plot – Introducing Gamification on SCN
• The goals for gamifying SCN
The Conflict – Messy data & other data problems
The Resolution – Approaching the problem
• 3 SAP Lumira Use-Cases with SCN Gamification Data
The End – “And they lived happily ever after”… or do they?
• What’s next for SCN’s Gamification team?
The SAP Community Network
About SAP Community Network (SCN)
A High-Tech, Professional Community
An 11-year-old, mature community:
• Open to all
• Need to encourage quality contributions
A place to grow reputation:
• Contributions showcase expertise
• Serious game
Gamification on SCN
What is Gamification: Definition
Gamification is the use of game-thinking and game mechanics in a non-game context in order to engage users and solve problems. Gamification is used in applications and processes to improve user engagement, ROI, data quality, timeliness, and learning.
Source: Wikipedia
Why Introduce Advanced Game Mechanics?
• Boost Participation
• Build Reputation
• Inject Fun
• Drive Behaviors
• Address Challenges with Old System
The SCN Experience
Goal: Boost Participation
Registration: Missions and badges for first-time login and onboarding
Contributions: Missions and badges for first contributions and more
The SCN Experience
Goal: Drive Behaviors
Ethics: Requirements in missions to read the rules of engagement and how to search
Feedback: Pay It Forward mission to reward positive engagement
The SCN Experience
Goal: Build Reputation
Quality: Missions that require not only content creation but receiving good feedback on the content
Quality: Prerequisite of a certain level before you can earn some missions
The SCN Experience
Goal: Inject Fun
• Hidden missions, unexpected recognition
• SCN 10-year anniversary
• Data Geek Challenge
Messy Data & Other Data Problems
Mission Completion (Badges Awarded)
Member Activity Comparison (Total Action Count)
Increase in Active Users (Those who logged an action)
[Chart: Active Users (Logged an Action) per week, x-axis APR-28-2013 to MAY-26-2013, y-axis 0 to 40,000]
Week | Active Users
WK 1 | 19,445
WK 2 | 21,728
WK 3 | 22,623
WK 4 | 36,084
Chart callout: 143%
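As a rough illustration of how growth figures can be derived from the weekly counts above, the following sketch computes week-over-week and overall percentage change. The function and variable names are assumptions made for this example. The chart's 143% callout does not state its baseline, so the sketch does not try to reproduce it.

```python
# Weekly active-user counts transcribed from the slide.
active_users = {"WK 1": 19445, "WK 2": 21728, "WK 3": 22623, "WK 4": 36084}


def pct_change(old: int, new: int) -> float:
    """Percentage change going from old to new."""
    return (new - old) / old * 100.0


weeks = list(active_users)

# Change of each week relative to the week before it, rounded to one decimal.
week_over_week = {
    later: round(pct_change(active_users[earlier], active_users[later]), 1)
    for earlier, later in zip(weeks, weeks[1:])
}

# Change from the first week shown to the last week shown.
overall = round(pct_change(active_users["WK 1"], active_users["WK 4"]), 1)

print(week_over_week)
print(overall)
```

Which baseline to compare against (previous week, first week, or a pre-launch average) is a choice that should be stated next to any percentage reported.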
The Plot Thickens…
Duplicate data
Senseless data
Disparate data
Corrupted data
Roles in Data Analysis
People of Many Hats
DECISION MAKERS: Leverage Knowledge
ANALYSTS: Activate Data
DESIGNERS: Build Data Assets
Approaching the Problem
Whiteboarding for our Proof of Concept
Result of Initial Data Cleansing
Total: 100M Rows
3M Clean Rows
Sensible data…
Data Acquisition
Prepare Room (and Object Picker)
Visualize/Compose Room
Predict Room
Share Room
Seven Common Data Analysis Tasks
Data analyst workflows often consist of seven high-level activity groups:
• Iterative activities, not linear
• Not all equal weight/time
Find
Wrangle
Profile
Enhance
Visualize
Predict
Share
Seven High-level Data Analysis Tasks
Find – Examples:
• Data Connectivity / Opening Files
• Knowing who to ask
Data Analysis
Seven High-level Data Analysis Tasks
Wrangle – Examples:
• Merge datasets
• Clean data
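The two Wrangle examples, cleaning and merging, can be sketched in plain Python. This is an illustration only, not how SAP Lumira implements them: the records, field names, and the rule that a negative count is "senseless" are all assumptions invented for the example.

```python
# Hypothetical records; field names are illustrative, not the SCN schema.
members = [
    {"member_id": "m1", "name": "Ada"},
    {"member_id": "m2", "name": "Ben"},
]
actions = [
    {"member_id": "m1", "action": "login", "count": 3},
    {"member_id": "m1", "action": "login", "count": 3},  # duplicate row
    {"member_id": "m2", "action": "post", "count": -5},  # senseless: negative count
    {"member_id": "m9", "action": "like", "count": 1},   # no matching member
    {"member_id": "m2", "action": "like", "count": 2},
]


def clean(rows):
    """Drop exact duplicates and rows with negative counts, keeping first-seen order."""
    seen, kept = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen or row["count"] < 0:
            continue
        seen.add(key)
        kept.append(row)
    return kept


def merge(action_rows, member_rows):
    """Inner-join actions to members on member_id."""
    by_id = {m["member_id"]: m for m in member_rows}
    return [
        {**row, "name": by_id[row["member_id"]]["name"]}
        for row in action_rows
        if row["member_id"] in by_id
    ]


merged = merge(clean(actions), members)
for row in merged:
    print(row)
```

An inner join silently drops actions without a matching member; whether that is acceptable depends on the question being asked.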
Data Analysis
Seven High-level Data Analysis Tasks
Profile:
• Comparing data (e.g. spelling)
• Leading spaces or zeros
• Structured and unstructured data
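A minimal profiling sketch for the first two checks listed above (spelling variants, and leading spaces or zeros) might look like the following. The sample values and the `profile` helper are hypothetical and were invented for this illustration.

```python
from collections import Counter

# Hypothetical raw values for one column; not taken from the SCN data.
raw_values = ["Berlin", " berlin", "BERLIN ", "Las Vegas", "las vegas", "007", "7"]


def profile(values):
    """Report padded values, leading-zero numbers, and case/spacing variants."""
    # Values that change when surrounding whitespace is stripped.
    padded = [v for v in values if v != v.strip()]
    # Multi-digit numeric strings that start with a zero.
    leading_zero = [
        v for v in values
        if len(v.strip()) > 1 and v.strip().isdigit() and v.strip().startswith("0")
    ]
    # Values that collapse to the same key once trimmed and case-folded.
    counts = Counter(v.strip().casefold() for v in values)
    collisions = {key: n for key, n in counts.items() if n > 1}
    return {"padded": padded, "leading_zero": leading_zero, "collisions": collisions}


report = profile(raw_values)
print(report)
```

Note that "007" and "7" are kept distinct here; treating them as the same value is a decision to make during wrangling, not profiling.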
Data Analysis
Seven High-level Data Analysis Tasks
Enhance:
• Add/Remove columns
• Hierarchies and calculations
• Geography and time
Data Analysis
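To make the Enhance step concrete, here is a small sketch that adds a time hierarchy (year, quarter, month) and one calculated column to each row. The rows, column names, and the `points_per_action` calculation are assumptions for illustration and are not taken from the SCN dataset.

```python
from datetime import date

# Hypothetical rows; the column names are assumptions for illustration.
rows = [
    {"member_id": "m1", "action_date": "2013-05-05", "points": 10, "actions": 4},
    {"member_id": "m2", "action_date": "2013-11-26", "points": 9, "actions": 3},
]


def enhance(row):
    """Return a copy of row with a year > quarter > month hierarchy and a calculated column."""
    d = date.fromisoformat(row["action_date"])
    return {
        **row,
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,
        "month": d.month,
        "points_per_action": row["points"] / row["actions"],
    }


enhanced = [enhance(r) for r in rows]
for row in enhanced:
    print(row)
```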
Seven High-level Data Analysis Tasks
Visualize:
• Creating charts and visualizations
• (Note: a table is considered a visualization as well)
Data Analysis
Seven High-level Data Analysis Tasks
Data Analysis
Seven High-level Data Analysis Tasks
Share:
• Cloud
• Server
• Email (visualization and dataset)
Data Analysis
Sharing with Lumira Cloud and Lumira Server
Desktop
IT
SAP Lumira Server
Mobile
Web
3 Data Workflows for SAP Lumira Server
Lumira Datasets (real-time or static)
Lumira Stories
SAP Lumira
SAP HANA Studio
Use Case 1
How are missions influencing community adoption?
Playing Around With Visualizations
Too many segments to be useful!
Interesting visualization, but irrelevant!
Data Preparation Basics
• Replace
• Filter
• Rank
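The three preparation steps named on this slide can be approximated in plain Python. This is only an analogue of the idea, not SAP Lumira's own feature set, and the award log below is invented: the mission names echo ones mentioned earlier in the deck, but the counts and the `TEST_MISSION` value are hypothetical.

```python
from collections import Counter

# Hypothetical badge-award log; values are invented for illustration.
awards = [
    "First Login", "first login", "Pay It Forward", "First Login",
    "Pay it forward", "Data Geek", "TEST_MISSION",
]

# Replace: map spelling variants onto one canonical label.
canonical = {"first login": "First Login", "pay it forward": "Pay It Forward"}
replaced = [canonical.get(a.casefold(), a) for a in awards]

# Filter: drop values that are not real missions.
filtered = [a for a in replaced if a != "TEST_MISSION"]

# Rank: order missions by completion count, highest first.
ranking = Counter(filtered).most_common()
print(ranking)
```

Doing the replace before the rank matters: without it, "First Login" and "first login" would be counted as two separate missions.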
Insight to Help Us Focus on What Matters
Email reminders add limited value to onboarding.
Stop mass emails, & focus on closing onboarding gap.
Better use of resources to focus on activating members.
Mass email campaign
Use Case 2
What’s the best day to launch missions on SCN?
Data Preparation Basics
• Group
• Replace
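One way to approach the "best day to launch" question is to group launches by weekday and compare an outcome measure across the groups. The sketch below does this with invented data; the launch dates, the completion counts, and the choice of "completions in the first week" as the measure are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical (launch_date, completions_in_first_week) pairs; values are invented.
launches = [
    ("2013-05-06", 120),  # Monday
    ("2013-05-13", 150),  # Monday
    ("2013-05-10", 40),   # Friday
    ("2013-05-18", 25),   # Saturday
]

# Group: collect completion counts under the weekday of each launch.
by_day = defaultdict(list)
for iso_day, completions in launches:
    weekday = DAY_NAMES[date.fromisoformat(iso_day).weekday()]
    by_day[weekday].append(completions)

# Average completions per launch weekday, then pick the highest.
averages = {day: sum(values) / len(values) for day, values in by_day.items()}
best_day = max(averages, key=averages.get)

print(averages)
print(best_day)
```

With so few launches per weekday, an average like this is only suggestive; promotion and mission design also differ between launches.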
Increasing Mission Design Maturity
Design + planning can influence mission success.
Revise launch practice to include timing and promotion.
More mature launch plan, with greater community awareness.
Use Case 3
Can we reduce bad behavior?
~ quoted from SCN discussion thread
Observations | 9 Months After Launch
Current Challenges
• Increased point cheating and plagiarism (perceived as disproportionate to activity increase).
• Dissatisfaction of loyal, well-established members and newer members alike.
• High operational effort needed to address these issues.
Opportunities
• Strengthen quality requirements in badged missions.
• Further de-emphasize quantity and points in favor of increased quality and meaningful engagement.
• Think of a transforming approach in 2 steps.
Data Preparation Basics
Group Hierarchy and Filter
Turning Data Into Insight
Feedback from Moderators
“Wow! These features really did wonders. Most of the people we reported from SD space are gone, that itself indicates the impact.” (Jyoti Prakash)
“I especially think the removal of points from Likes has eliminated about 2/3 of the point games.” (Michael Appleby)
78% of Moderators who responded to our poll report that changes had a positive impact
Creating Awareness for Community Challenges
We can influence specific member behaviors.
Create missions that harness gaming energy to desired behaviors.
Greater awareness of community topics. Good behaviors.
Conclusion
Focus on your goal - cut out the rest.
The more we know, the more questions we have.
Start with the end in mind.
Epilogue
Get into the Game!
while (true)
SAP d-code Virtual Hands-on Workshops and SAP d-code Online
Continue your SAP d-code education after the event!
SAP d-code Online
• Access replays of keynotes, Demo Jam, SAP d-code live interviews, select lecture sessions, and more!
• Hands-on replays
http://sapdcode.com/online
SAP d-code Virtual Hands-on Workshops
• Access hands-on workshops post-event
• Starting January 2015
• Complimentary with your SAP d-code registration
http://sapdcodehandson.sap.com
Further Information
SAP Education and Certification Opportunities
www.sap.com/education
Watch SAP d-code Online
www.sapcode.com/online
SAP Public Web
scn.sap.com/community/lumira
www.saplumira.com
www.sap.com/LearnBI
Feedback
Please complete your session evaluation for session EA104.
Jason Cao {[email protected]}
Follow me on Twitter {@JayChaos}
Thanks for attending this SAP TechEd && d-code session.
© 2014 SAP SE or an SAP affiliate company. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company.
SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. Please see http://global12.sap.com/corporate-en/legal/copyright/index.epx for additional trademark information and notices.
Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors.
National product specifications may vary.
These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP SE or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP SE or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.
In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation, and SAP SE’s or its affiliated companies’ strategy and possible future developments, products, and/or platform directions and functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.