Exploring the Hidden Unknown Using Self-Learning Text Analytics
‹#›
Current Approach in Data Discovery
• Activities are a form of customer engagement
• User training – limited to a few presumably relevant keywords for searching
• Most dashboards provide:
  • content reach, fans, retweets, followers, …
• Sentiment analysis:
  • These metrics disregard context
‹#›
Current Approach
“Without an appropriate query, the analyzed data might be irrelevant and the end results inaccurate”
Infegy Blog
Search/Filtering
Noise/Irrelevancy
Missing Information
Data is not relevant / result is not accurate
Additionally, there are about 327,000,000 documents on search tips on Google
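A minimal sketch of how a single keyword query can produce both noise and missing information. The four documents and their relevance labels are invented for illustration (they are not data from this presentation), and "TIFF" is assumed here to mean the film festival. Precision measures how much of what the query returns is relevant; recall measures how much of the relevant material the query finds.

```python
# Toy corpus of (text, is_relevant) pairs. The texts and labels are
# hypothetical; "relevant" means "about the film festival".
DOCS = [
    ("Lining up for the TIFF premiere tonight", True),
    ("Red carpet photos from the film festival", True),
    ("We had a small tiff about dinner plans", False),
    ("Exporting the scan as a .tiff file", False),
]


def keyword_filter(docs, keyword):
    """Return the documents whose text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [doc for doc in docs if needle in doc[0].lower()]


def precision_recall(docs, keyword):
    """Compute precision and recall of a keyword query against the labels."""
    matched = keyword_filter(docs, keyword)
    relevant_total = sum(1 for _, relevant in docs if relevant)
    relevant_matched = sum(1 for _, relevant in matched if relevant)
    precision = relevant_matched / len(matched) if matched else 0.0
    recall = relevant_matched / relevant_total if relevant_total else 0.0
    return precision, recall


precision, recall = precision_recall(DOCS, "tiff")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy corpus the query matches two irrelevant documents (noise) and misses one relevant document that never uses the keyword (missing information).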
‹#›
An Example of Noise [monitoring tweets on the first days of TIFF]
‹#›
What’s happening when you define rules & keywords?
What do you see here?
Source: http://nexalogy.com/
Cozy restaurant
Quiet neighborhood
Beach side
Beautiful view
‹#›
Full Story
The smaller version of the photo was exactly 10% of the overall photograph
Source: http://nexalogy.com/
“What you do not know is far more relevant than what you know.”
Nassim Nicholas Taleb
‹#›
Pain Points
• Too much information
• Labour intensive
• Time consuming
• Noisy data
• Missing information
“The costs, the shortcomings with accuracy, and the time needed to build and refine data dictionaries are frustrating at times.”
Text Analytics 2014, Alta Plana
We are turning to an unstructured world
‹#›
Unstructured World
• Uber
• Airbnb
• Facebook
• etc.
Tom Goodwin, "Something interesting is happening"
‹#›
Profound Changes in Approaching Data
In the past
• People knew what to collect and how to use the data before collecting it
• Data was collected in a controlled environment
• Data analysis looked for causes
Big data era
• Data is generated and collected without knowing what its purpose will be
• Data is noisy and irrelevant at large scale
• Data analysis looks for correlations
‹#›
What if we had a self-learning text analytics solution?
‘Traditional’ Content Analytics
difficult & expensive process for cleansing, training, curation, classification, categorization
human bias, noise, missing data, incorrect and incomplete results
Traditional Content Analytics
requires no training
intuitive user interface
system is self-learning
find hidden unknowns
more meaningful results
Let’s Change the Paradigm
‹#›
Self-Learning Text Analytics
• Ability to learn based on prior knowledge obtained from the text itself
• Understand meaning in context
• Automatic discovery of relevancy, correlations and concepts
• Cold start KB
• Root cause analysis
Deep Learning Text Analytics
• No training
• No taxonomy or dictionary is required
• Find hidden unknowns
• Qualitative analysis
• No human labeling, no Boolean queries
Automatic Discovery
• No preprocessing or cleaning
• Robust to noise
• Real-time analytics
Self Analytics Service
‹#›
Applied in Various Segments
Insurance
Eliminate preprocessing, training data
Dramatically reduces operational expense and time of delivery
Self-service analytics
Customer Experience
Increasing positive customer experience (CX)
Improving client retention
Recommendation
Qualitative Analysis
Open-ended questions and qualitative surveys
Automatic categorization, eliminating manual labeling
Enabling in-hand insights
Risk Management/ Public Safety
Early detection of evidence of fraud
Pilot Project
Public Safety
Risk Management
Retail
Category Management
Classification
Customer Intent
Increase sales
‹#›
Case Studies
‹#›
Case Study #1 [Analysing Social Media for the Purpose of Public Safety]
Data source: tweets on April 15, 2013
‹#›
Case Study #2 [Analyzing 1M tweets on Chrysler brand]
• Data: 1 Million tweets
• Challenges:
  – Noise, irrelevancy & volume
    • Ram (Chrysler brand) vs. RAM (memory) vs. Ram (name)
    • Compass (Chrysler brand) vs. Compass (device) vs. Compass (name of watch/group…)
    • Core (Chrysler brand) vs. Core (processor)
    • ….
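To make the ambiguity concrete, here is a minimal sketch of a naive baseline that guesses which sense of "Ram" a tweet uses by counting overlapping context words. It is not the self-learning method this presentation describes: the sense labels, the hand-picked context words in `SENSE_CONTEXT`, and the helper names are all hypothetical, and the hand-built word lists are exactly the kind of dictionary the presentation argues is costly to build and maintain.

```python
import re

# Hypothetical, hand-picked context words for two senses of "ram".
# A real dictionary would need far more entries and constant upkeep.
SENSE_CONTEXT = {
    "truck": {"truck", "pickup", "towing", "dealer", "drive"},
    "memory": {"gb", "memory", "laptop", "upgrade", "pc"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def guess_sense(text, senses=SENSE_CONTEXT):
    """Return the sense with the most context-word overlap, or None if no overlap."""
    tokens = set(tokenize(text))
    scores = {sense: len(tokens & words) for sense, words in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_sense("Saw the new Ram pickup at the dealer today"))
print(guess_sense("Need more RAM, 4 GB is not enough for this laptop"))
print(guess_sense("Ram is my cousin's name"))
```

A tweet with no listed context words returns `None`, which shows how quickly a fixed word list runs out of coverage on noisy, high-volume data.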
‹#›
Case Study #3 [Analyzing Blogs & Forums]
• Source: watchuseek community discussions
‹#›
Case Study #3 [Analyzing Blogs & Forums]
- Discover hidden unknowns
- Beyond keyword search
- Finding correlations between topics
- Discovery of the knowledge in the data without using external resources
Finding correlations between water-resistant, watch and Stowa Marine
Beyond keyword indexing: discovering the relationship between Bottadesign, Germany and Swiss heart
Understanding that 600t divingstar is a Doxa watch even without mentioning the keyword “doxa”
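One simple way to picture how relationships between terms could be found from the text alone is co-occurrence counting: terms that repeatedly appear in the same posts become associated, with no external dictionary. The sketch below is a plain-Python illustration under that assumption, not the method used in this case study; the three posts, the stopword list, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
import re

# Invented forum posts, used only to exercise the sketch.
POSTS = [
    "Doxa 600T Divingstar on a rubber strap",
    "Doxa 600T Divingstar arrived today",
    "Stowa Marine water resistant rating?",
]

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "is", "it", "my", "on", "for", "to", "and", "of"}


def tokenize(text):
    """Lowercase, split into alphanumeric tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_cooccurrence(posts):
    """Count, for every term, how many posts it shares with every other term."""
    co = defaultdict(Counter)
    for post in posts:
        tokens = set(tokenize(post))
        for a in tokens:
            for b in tokens:
                if a != b:
                    co[a][b] += 1
    return co


def related(co, term, top_n=3):
    """Return the top_n terms that most often share a post with `term`."""
    return [word for word, _ in co[term].most_common(top_n)]


co = build_cooccurrence(POSTS)
print(related(co, "divingstar", top_n=2))
```

Once "divingstar" is strongly associated with "doxa" in the corpus, a later post that mentions only "600t divingstar" can still be linked to the brand through that learned association.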
‹#›
Case Study #4 [Root cause analysis by self-learning text analytics]
‹#›
Case Study #5 [2012-2015 US Customer Complaints in the Financial Industry]
Finding unknown issues in financial institutions
‹#›
Case Study #6 [Qualitative Survey]
• City Council's Executive Committee requested the City Manager to seek the public's input on the establishment of a casino in Toronto.
Survey Response Form
‹#›
Case Study [Qualitative Survey]
• www.Kaypok.com
• Twitter: @KaypokINC
Confidential Information – Not for Distribution
Contact Information
Razieh Niazi, Founder & [email protected]
ca.linkedin.com/pub/razieh-niazi/8/9b8/ba8
416-731-3624 (Mobile)