
Exploring the Hidden Unknown Using Self-Learning Text Analytics


Current Approach in Data Discovery

• Activities are a form of customer engagement
• User training – limited to a few presumably relevant keywords for searching
• Most dashboards provide:
– Content reach, fans, retweets, followers, …
– Sentiment analysis
• These metrics disregard context


Current Approach

“Without an appropriate query, the analyzed data might be irrelevant and the end results inaccurate.” Infegy Blog

• Search/Filtering
• Noise/Irrelevancy
• Missing Information
• Data is not relevant / result is not accurate

Additionally, there are about 327,000,000 documents on search tips on Google


An Example of Noise [monitoring tweets on the first days of TIFF]


What’s happening when you define rules & keywords?

What do you see here?

Source: http://nexalogy.com/

Cozy restaurant

Quiet neighborhood

Beach side

Beautiful view


Full Story

The smaller version of the photo was exactly 10% of the overall photograph

Source: http://nexalogy.com/

“What you do not know is far more relevant than what you know.” Nassim Nicholas Taleb


Pain Points

• Too much information
• Labour intensive
• Time consuming
• Noisy data
• Missing information

“The costs, the shortcomings with accuracy, and the time needed to build and refine data dictionaries are frustrating at times.”

Text Analytics 2014, Alta Plana

We are turning to an unstructured world


Unstructured World

• Uber
• Airbnb
• Facebook
• etc.

Tom Goodwin, "Something interesting is happening"


Profound Changes in Approaching Data

In the past

• People knew what to collect and how to use data before collecting it
• Data was collected in a controlled environment
• Data analysis looked for causes

Big data era

• Data is generated and collected without knowing what its purpose will be
• Data is noisy and irrelevant at large scale
• Data analysis looks for correlations


What If We Had a Self-Learning Text Analytics Solution?

‘Traditional’ Content Analytics

• Difficult and expensive process for cleansing, training, curation, classification, categorization
• Human bias, noise, missing data, incorrect and incomplete results

Let’s Change the Paradigm

• Requires no training
• Intuitive user interface
• System is self-learning
• Finds hidden unknowns
• More meaningful results


Self-Learning Text Analytics

Deep Learning Text Analytics

• Ability to learn based on prior knowledge obtained from the text itself
• Understand meaning in context
• Automatic discovery of relevancy, correlations and concepts
• Cold start KB
• Root cause analysis

Automatic Discovery

• No training
• No taxonomy or dictionary is required
• Find hidden unknowns
• Qualitative analysis
• No human labeling or Boolean queries

Self Analytics Service

• No preprocessing or cleaning
• Robust to noise
• Real-time analytics


Applied in Various Segments

Insurance

• Eliminates preprocessing and training data
• Dramatically reduces operational expense and time of delivery
• Self-service analytics

Customer Experience

• Increasing positive customer experience (CX)
• Improving client retention
• Recommendation

Qualitative Analysis

• Open-ended questions and qualitative surveys
• Automatic categorization, eliminating manual labeling
• Enabling in-hand insights

Risk Management / Public Safety

• Early detection of evidence of fraud
• Pilot project
• Public safety
• Risk management

Retail

• Category management
• Classification
• Customer intent
• Increase sales

Case Studies


Case Study #1 [Analysing social media for the purpose of public safety]

Data source: tweets on April 15, 2013


Case Study #2 [Analyzing 1M tweets on Chrysler brand]

• Data: 1 million tweets
• Challenges:
– Noise, irrelevancy & volume
• Ram (Chrysler brand) vs. RAM (memory) vs. Ram (name)
• Compass (Chrysler brand) vs. Compass (device) vs. Compass (name of watch/group…)
• Core (Chrysler brand) vs. Core (processor)
• …
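To make the ambiguity above concrete, here is a minimal Python sketch of context-based sense scoring for an ambiguous keyword such as "Ram". It is not the method used in this case study, which the slides do not describe. The seed vocabularies in SENSE_CONTEXTS and the function names are hypothetical and were invented for illustration; a self-learning system would presumably derive such context from the data rather than from a hand-written list.

```python
import re

# Hypothetical seed contexts, invented for illustration only.
SENSE_CONTEXTS = {
    "brand": {"truck", "pickup", "dodge", "chrysler", "towing", "dealer"},
    "memory": {"gb", "laptop", "memory", "upgrade", "ddr3", "pc"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def guess_sense(tweet, keyword="ram"):
    """Return the sense whose seed context overlaps most with the tweet.

    Returns None if the keyword is absent, and "unknown" if no seed
    context word appears alongside it.
    """
    tokens = tokenize(tweet)
    if keyword not in tokens:
        return None
    context = set(tokens) - {keyword}
    scores = {
        sense: len(context & words)
        for sense, words in SENSE_CONTEXTS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

A plain keyword filter on "ram" would keep every tweet containing the word; the sketch shows that the surrounding words are what separate the truck from the memory module, and that a tweet with no helpful context stays unresolved.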


Case Study #3 [Analyzing Blogs & Forums]

• Source: watchuseek community discussions


- Discover hidden unknowns
- Beyond keyword search
- Finding correlations between topics
- Discovery of the knowledge in the data without using external resources

Finding correlations between water-resistant, watch and stowa marine

Beyond keyword indexing: discovering the relationship between Bottadesign, Germany and Swiss heart

Understanding that 600t divingstar is a doxa watch even without mentioning the keyword “doxa”
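One common way to surface correlations between terms using only the text itself is co-occurrence scoring. The Python sketch below ranks term pairs by pointwise mutual information (PMI) across posts. It is an assumption-laden illustration, not the technique behind the results above, which the slides do not specify; the function names and the min_count parameter are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Return the set of lowercase alphanumeric tokens in a post."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pmi_pairs(posts, min_count=2):
    """Rank term pairs by PMI, computed from post-level co-occurrence.

    PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) ), where the probabilities
    are the fractions of posts containing the term or the pair.
    Pairs seen in fewer than min_count posts are skipped.
    """
    n = len(posts)
    term_df = Counter()  # number of posts containing each term
    pair_df = Counter()  # number of posts containing each pair
    for post in posts:
        terms = tokenize(post)
        term_df.update(terms)
        pair_df.update(combinations(sorted(terms), 2))
    scores = {}
    for (a, b), count in pair_df.items():
        if count < min_count:
            continue
        scores[(a, b)] = math.log2(count * n / (term_df[a] * term_df[b]))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up posts for illustration; not taken from the forum data.
posts = [
    "doxa 600t divingstar on the wrist today",
    "my 600t divingstar arrived",
    "stowa marine is water resistant",
    "is the stowa marine water resistant enough",
]
ranked = pmi_pairs(posts)
```

In this toy setup, pairs that appear together in several posts, such as "600t" and "divingstar", receive a score, while pairs that never co-occur are absent from the ranking. A real pipeline would also need stop-word handling and far more data before such scores become meaningful.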


Case Study #4 [Root cause analysis by self-learning text analytics]


Case Study #5 [2012-2015 US Customer Complaints in the Financial Industry]

Finding unknown issues in financial institutions


Case Study #6 [Qualitative Survey]

• City Council's Executive Committee requested the City Manager to seek the public's input on the establishment of a casino in Toronto.

Survey Response Form


Contact Information

Razieh Niazi, Founder & [email protected]
ca.linkedin.com/pub/razieh-niazi/8/9b8/ba8
416-731-3624 (Mobile)

• www.Kaypok.com
• Twitter: @KaypokINC

Confidential Information – Not for Distribution

http://www.kaypok.com/
