morgan floyd - intuit's live community

Floyd Morgan [email protected] @fmorgan Lucene Revolution, 2011

Upload: lucid-imagination

Post on 17-Mar-2016

TRANSCRIPT

Page 1: Morgan Floyd - Intuit's Live Community

Floyd Morgan [email protected]

@fmorgan Lucene Revolution, 2011

Page 2: Morgan Floyd - Intuit's Live Community

Agenda

•  About Me
•  About Live Community
•  Live Community Search
•  NLP
•  Next Steps
•  Questions? Answers?

Page 3: Morgan Floyd - Intuit's Live Community

About Me

•  Principal Software Engineer at Intuit  

Page 4: Morgan Floyd - Intuit's Live Community

Intuit Inc. is a leading provider of business and financial management solutions for small and mid-sized businesses; financial institutions, including banks and credit unions; consumers and accounting professionals.

More than 200 applications and 7700 employees worldwide.

Intuit QuickBase

Page 5: Morgan Floyd - Intuit's Live Community

About Me

•  Principal Software Engineer at Intuit
•  TurboTax Engineering

Page 6: Morgan Floyd - Intuit's Live Community

TurboTax is the nation’s No. 1 rated, best-selling, do-it-yourself tax preparation software. TurboTax helps more than 20 million people a year.

$1 billion in revenue

Page 10: Morgan Floyd - Intuit's Live Community

About Me

•  Principal Software Engineer at Intuit
•  TurboTax Engineering
   – Core tax engine
   – TurboTax Online
   – TurboTax Live Community
•  Central Technology Organization
   – Live Community Platform

Page 11: Morgan Floyd - Intuit's Live Community
Page 17: Morgan Floyd - Intuit's Live Community

About Live Community

•  It’s a user contribution system
   –  Q&A
•  It can be integrated into an application, contextually
   –  Page-to-page relevance
•  We use social, technology and data
   –  To create our value proposition…assisting users
•  We launched our Beta in 2007
   –  TurboTax Online Home & Business
•  We use open source…primarily open source
   –  Apache HTTP, Ruby on Rails, MySQL, memcached ...
•  It’s a platform
   –  APIs, skinning, dynamic provisioning (AWS in progress)

Page 18: Morgan Floyd - Intuit's Live Community

Intuit Money Manager, India

Page 19: Morgan Floyd - Intuit's Live Community

QuickBooks Online, UK

Page 20: Morgan Floyd - Intuit's Live Community

devZone, Intuit dev

Page 21: Morgan Floyd - Intuit's Live Community

QuickBooks Online, US

Page 22: Morgan Floyd - Intuit's Live Community

TurboTax Desktop & Online, US

Page 23: Morgan Floyd - Intuit's Live Community

Terminology

Page 24: Morgan Floyd - Intuit's Live Community

Consumers (in the millions)

Page 25: Morgan Floyd - Intuit's Live Community

Contributors (in the thousands)

Page 26: Morgan Floyd - Intuit's Live Community

Top Contributors (in the hundreds)

Page 27: Morgan Floyd - Intuit's Live Community

Employees (contribute too)

Page 28: Morgan Floyd - Intuit's Live Community

Tax Season

Officially begins on December 1 and ends on April 15.

Page 33: Morgan Floyd - Intuit's Live Community

About TurboTax Live Community

•  Largest community
   – 150+ servers, 200 thousand concurrent users
•  Over 23 million users have used the service
   – Over 8 million last tax season alone
•  Over 32 million page views last tax season
   – In-product views in the billions
•  Over 750 thousand answered questions
   – 10 thousand questions asked on peak day
•  Our contributors answer thousands of questions
   – Top contributor – 70 thousand answers

Page 34: Morgan Floyd - Intuit's Live Community

Demo

Page 35: Morgan Floyd - Intuit's Live Community

Live Community Search

•  Why Solr?
•  Auto suggest
•  In-product search
•  Web-site search
•  Instant answer
•  Instant question
•  Answer bot
•  Advertising
•  Search everywhere
•  Architecture

Page 36: Morgan Floyd - Intuit's Live Community
Page 42: Morgan Floyd - Intuit's Live Community

Why Solr?

•  Lots of features/functionality
•  Ease of integration
•  We can scale it independently
•  You’ll need some search expertise…that’s ok
   – Community and Lucid Imagination!
•  Search is really important
   – Search everywhere…

Page 43: Morgan Floyd - Intuit's Live Community

Page 44: Morgan Floyd - Intuit's Live Community
Page 45: Morgan Floyd - Intuit's Live Community
Page 46: Morgan Floyd - Intuit's Live Community
Page 47: Morgan Floyd - Intuit's Live Community
Page 52: Morgan Floyd - Intuit's Live Community

Auto suggest

•  Provides a glimpse of our vast content
•  facet query (Solr 1.2)
•  We use NLP…
•  It’s used on every search touch point
•  Second most frequent request
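
The "facet query" bullet suggests that suggestions come from Solr facet counts. One common way to do this, assumed here and not confirmed by the talk, is a `facet.prefix` request that returns the indexed terms starting with what the user has typed. The field name `subject_terms` and the `/select` path are hypothetical; only the host and port are borrowed from the talk's other examples.

```ruby
require 'uri'

# Hypothetical endpoint and field name, for illustration only.
SOLR_SELECT_URL = "http://localhost:8090/solr/posts/select"
SUGGEST_FIELD   = "subject_terms"

# Build a request that returns no documents (rows=0), only the indexed
# terms of SUGGEST_FIELD that start with the typed prefix, with counts.
def suggest_url(typed, limit = 10)
  params = {
    "q"              => "*:*",
    "rows"           => 0,
    "facet"          => "true",
    "facet.field"    => SUGGEST_FIELD,
    "facet.prefix"   => typed.downcase,
    "facet.limit"    => limit,
    "facet.mincount" => 1
  }
  "#{SOLR_SELECT_URL}?#{URI.encode_www_form(params)}"
end

puts suggest_url("1099")
```

Because the request asks for zero rows, the response carries only term counts, which keeps it small enough to issue on every keystroke.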

Page 53: Morgan Floyd - Intuit's Live Community

Page 54: Morgan Floyd - Intuit's Live Community
Page 55: Morgan Floyd - Intuit's Live Community
Page 60: Morgan Floyd - Intuit's Live Community

In-product “mini” search

•  Primary search interface for consumers
•  It appears integrated
•  Now the most utilized search interface
•  It makes all content available
•  Over 3 million users last tax season

Page 61: Morgan Floyd - Intuit's Live Community

# using Solr is easy!
require 'solr'
c = Solr::Connection.new( "http://localhost:8090/solr/posts" )
c.search( "how do i input 1099", :filter_queries => "post_status: #{Post::ANSWERED}" )

Page 62: Morgan Floyd - Intuit's Live Community

Page 63: Morgan Floyd - Intuit's Live Community
Page 64: Morgan Floyd - Intuit's Live Community
Page 68: Morgan Floyd - Intuit's Live Community

Web-site “full” search

•  Primary search interface for contributors and employees

•  More real estate, more facets, more suggestions ...

•  Faceted search empowers development teams to narrow on issues

•  200+ TurboTax issues discovered last tax season

Page 69: Morgan Floyd - Intuit's Live Community
Page 70: Morgan Floyd - Intuit's Live Community
Page 71: Morgan Floyd - Intuit's Live Community

# using Solr is easy!
require 'solr'
c = Solr::Connection.new( "http://localhost:8090/solr/posts" )
c.search( "bug", :filter_queries => "post_status: #{Post::OPEN}" )

Page 72: Morgan Floyd - Intuit's Live Community

Page 73: Morgan Floyd - Intuit's Live Community
Page 78: Morgan Floyd - Intuit's Live Community

Instant answer

•  Present similar answered question
•  Search with the terms of the new question
•  Narrow the focus to the subject
•  Show snippet of a recommended answer
•  Accidental A/B test

Page 79: Morgan Floyd - Intuit's Live Community

Demo

Page 80: Morgan Floyd - Intuit's Live Community

# using Solr is easy!
require 'solr'
c = Solr::Connection.new( "http://localhost:8090/solr/posts" )
c.search( "how do i input 1099", { :query_fields => "subject", :filter_queries => "post_status: #{Post::ANSWERED}" } )

Page 81: Morgan Floyd - Intuit's Live Community

Page 82: Morgan Floyd - Intuit's Live Community
Page 87: Morgan Floyd - Intuit's Live Community

Instant question

•  Present similar unanswered questions
•  Answer reuse
•  Search with the terms of the answered question
•  Narrow the focus to the subject
•  We also use a date filter

Page 88: Morgan Floyd - Intuit's Live Community

“Aren’t we addicted enough!”

Page 89: Morgan Floyd - Intuit's Live Community

Demo

Page 90: Morgan Floyd - Intuit's Live Community

# using Solr is easy!
require 'solr'
c = Solr::Connection.new( "http://localhost:8090/solr/posts" )
today = DateTime.now.at_beginning_of_day.utc.to_time
date_from = 7.to_i.days.ago( today ).getutc.iso8601
c.search( "how do i input 1099", { :query_fields => "subject", :filter_queries => "post_status: #{Post::OPEN} AND created_at_d:[#{date_from} TO *]" } )

Page 91: Morgan Floyd - Intuit's Live Community

Page 92: Morgan Floyd - Intuit's Live Community
Page 97: Morgan Floyd - Intuit's Live Community

Answer bot

•  We continue to search for you
   – The day after you ask
•  Send an email
•  Runs for 7 days
•  We only send another email if the results have changed
•  From our explicit feedback
   – 39% answered question
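
The bullets above describe a daily re-search with change detection. A minimal sketch of that decision follows, assuming results are compared by their post ids and that a pure re-ordering does not count as a change; both are assumptions, and `email_needed?` and `fingerprint` are hypothetical names, not part of the Live Community code.

```ruby
require 'date'
require 'digest/sha1'

RUN_DAYS = 7 # from the slide: the bot runs for 7 days

# Order-independent fingerprint of a result list (assumption: a mere
# re-ranking of the same posts is not worth another email).
def fingerprint(post_ids)
  Digest::SHA1.hexdigest(post_ids.sort.join(","))
end

# asked_on, today: Date objects
# post_ids:  ids returned by today's search for the user's question
# last_sent: fingerprint of the results in the last email, or nil
def email_needed?(asked_on, today, post_ids, last_sent)
  age = (today - asked_on).to_i
  return false unless (1..RUN_DAYS).cover?(age) # starts the day after asking
  return false if post_ids.empty?               # nothing to report
  fingerprint(post_ids) != last_sent            # only when results changed
end
```

A scheduler would call `email_needed?` once per day per open question and store the new fingerprint whenever an email goes out.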

Page 98: Morgan Floyd - Intuit's Live Community
Page 99: Morgan Floyd - Intuit's Live Community

Page 100: Morgan Floyd - Intuit's Live Community
Page 104: Morgan Floyd - Intuit's Live Community

Advertising

•  We use our user generated content in advertising

•  Has 300% higher click through rate than static banner ads

•  Ads displayed throughout the tax season on many ad networks

•  Content selection is automated and continuous

 

Page 105: Morgan Floyd - Intuit's Live Community
Page 106: Morgan Floyd - Intuit's Live Community

[Diagram labels: Logs, Logs, Logs, Carrot2, Solr, Heuristics, MapReduce]

Page 107: Morgan Floyd - Intuit's Live Community

<?xml version="1.0" encoding="UTF-8"?>
<lc_trending end_date="2011-05-21" include_popular="true" type="queries" duration="day">
  <topic>
    <rank>1</rank>
    <text>Ptp</text>
    <post>
      <post_id>aBHMBWxzar4lKMacfArRo0</post_id>
      <subject>Final K-1 Disposition of PTP Units</subject>
      <detail>I bought units in a PTP in five separate transactions in 2008; I sold all my units in five separate transactions in 2010. TT does not allow me to report all 5 transactions while stepping through the K-1 form -- these transactions are reported on Schedule D, but also need to be on Form 4797, Part II, Box 10. I can't seem to make the linkage work. I would appreciate some guidance on how to make this happen.</detail>
      <response>OK, several steps needed for your situation: 1) on the K-1 on the screen entitled Describe the Partnership Disposal, choose "Disposition was not via a sale" 2) Then search for the topic "sale of business property" - you will be taked to a topic entitled "Any Other Property Sales?" - select the first option. Ove rthe next few screens here you will have the opportunityut to enter the sale amounts associated witht he Form 4797. 3) then choose the topic on the income landing table for "Stocke, Mutual Funds, Bonds, other - here you will enter the rest of the sale, that portion attributable to capital gains. Hope this helps you, </response>
      <viewsCount>60</viewsCount>
      <answersCount>2</answersCount>
      <asker>Xuxan</asker>
      <display_post_url>https://ttlc.intuit.com/post/show_full/aBHMBWxzar4lKMacfArRo0?rmode=ad</display_post_url>
    </post>

Page 108: Morgan Floyd - Intuit's Live Community

Page 109: Morgan Floyd - Intuit's Live Community
Page 110: Morgan Floyd - Intuit's Live Community
Page 112: Morgan Floyd - Intuit's Live Community
Page 113: Morgan Floyd - Intuit's Live Community
Page 117: Morgan Floyd - Intuit's Live Community

Search everywhere

•  Search first, ask second
   – Used to be ask first, search later or never!
•  Auto complete everywhere too
   – 64 bit Linux, 10 (8 core) slaves, 300 req/s
•  Search requests
   – 900% increase
•  Questions asked
   – 50% decrease…is that good?
•  Increased consumption
   – 38% users, 43% content…very good!

Page 118: Morgan Floyd - Intuit's Live Community

Page 119: Morgan Floyd - Intuit's Live Community

App server

Search cluster

Indexing server

Database cluster

Page 121: Morgan Floyd - Intuit's Live Community

NLP

•  Search is not enough…unfortunately
•  Our domain is noisy…ugly at times

Page 122: Morgan Floyd - Intuit's Live Community

Uh, what?

Page 123: Morgan Floyd - Intuit's Live Community

Too much what!

Page 124: Morgan Floyd - Intuit's Live Community

?

Page 125: Morgan Floyd - Intuit's Live Community

I wish NLP could help!

Page 126: Morgan Floyd - Intuit's Live Community

NLP

•  Search is not enough…unfortunately
•  Our domain is noisy…ugly at times
•  How it works…

Page 127: Morgan Floyd - Intuit's Live Community

HwO do iput 10 99 i don,t know what to do need help help me.

Page 128: Morgan Floyd - Intuit's Live Community

Where do I enter a 1099?

Page 129: Morgan Floyd - Intuit's Live Community

schema.xml

<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldtype>

Page 130: Morgan Floyd - Intuit's Live Community

dictionary

<?xml version="1.0" encoding="US-ASCII"?>
<dictionary>
  <entry score="10" root="none" synonym="none" domain="ttlc" id="suitcas">suitcase</entry>
  <entry score="10" root="form" synonym="none" domain="ttlc" id="2210"></entry>
  <entry score="10" root="none" synonym="none" domain="ttlc" id="xrai">x-ray</entry>
  <entry score="10" root="none" synonym="townhom" domain="ttlc" id="townhous">townhouse</entry>
  <entry score="10" root="none" synonym="none" domain="ttlc" id="grosssal">gross sale</entry>
  <entry score="10" root="none" synonym="none" domain="ttlc" id="trinidad">Trinidad</entry>
  <entry score="10" root="none" synonym="none" domain="ttlc" id="home"></entry>
  <entry score="10" root="none" synonym="know" domain="ttlc" id="knew"></entry>
  <entry score="10" root="none" synonym="none" domain="ttlc" id="massachusett">Massachusetts</entry>
  <entry score="10" root="none" synonym="none" domain="ttlc" id="denver">Denver</entry>
  <entry score="5" root="none" synonym="none" domain="ttlc" id="instead"></entry>
  <entry score="10" root="none" synonym="unallow" domain="ttlc" id="disallow">not allowed</entry>
  <entry score="5" root="none" synonym="see" domain="ttlc" id="saw"></entry>

Page 131: Morgan Floyd - Intuit's Live Community

regular expressions (many)

if text =~ / any/
  text.gsub!(/ any where /, ' anywhere ')
  text.gsub!(/ any(body| body| one) /, ' anyone ')
  text.gsub!(/ any( thing| things|things) /, ' anything ')
  text.gsub!(/ any(one|thing|where) else /, ' any\1 ')
end
if text =~ / don /
  text.gsub!(/ don i /, ' do not i ')
  text.gsub!(/ don (have|know|see|want) /, ' do not \1 ')
  text.gsub!(/ (are|be|have|is|was|were) don /, ' \1 done ')
  text.gsub!(/ don (not|nt|t) /, ' do not ')
end
text.gsub!(/ (do|can) (ai|ii) /, ' \1 i ')
text.gsub!(/ d (oyou|you) /, ' do you ')
text.gsub!(/ (1|ai|ii|my) (did|do|had|have|was) /, ' i \2 ')
text.gsub!(/ crap{1,10} /, ' crap ')
text.gsub!(/ gr{1,} /, ' ')
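
Every rule above matches a leading and a trailing space, so the input presumably arrives lower-cased, with whitespace collapsed and one space of padding on each side. That preparation step is not shown in the talk; the `pad` helper below is a hypothetical stand-in that makes two of the rules runnable in isolation.

```ruby
# Hypothetical preparation step (not from the talk): lower-case the text,
# collapse runs of whitespace, and pad with one space on each side so that
# the first and last words can match patterns like / don (know) /.
def pad(raw)
  " " + raw.downcase.gsub(/\s+/, " ").strip + " "
end

text = pad("I don know   any where to enter it")
text.gsub!(/ any where /, ' anywhere ')
text.gsub!(/ don (have|know|see|want) /, ' do not \1 ')
puts text.strip # prints "i do not know anywhere to enter it"
```

Without the padding, a word at the very start or end of the text would never match, because the patterns require a space on both sides.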

 

Page 132: Morgan Floyd - Intuit's Live Community

Spell Checker

Stemmer (Porter)

Word Collocation

Stop Phrase Correction

Stop Word Removal

Synonyms Substitution

Tax Domain Correction

Phrase Encoding

Page 133: Morgan Floyd - Intuit's Live Community

# NLP is not easy!
# this class wraps our NLP
sf = SemanticFilter.new
# does it work?
sf.act_on_post( "HwO do iput 10 99 i don,t know what to do need help help me." )
=> [" wheretoent 1099 "]
sf.act_on_post( "Where do I enter a 1099?" )
=> [" wheretoent 1099 "]
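
The internals of `SemanticFilter` are not shown. As a rough illustration of how a chain of the stages listed earlier can collapse two differently worded questions into one key, here is a toy pipeline. Everything in it is an assumption: the word tables are invented for this one example, stemming and phrase encoding are left out, and the resulting key differs from the real filter's output.

```ruby
require 'set'

# Invented lookup tables, just large enough for the example.
SPELLING = { "hwo" => "how", "iput" => "input" }.freeze
STOP     = Set.new(%w[a do don help i know me need t to what]).freeze
SYNONYMS = { "how" => "where", "input" => "enter" }.freeze

# Each stage takes a string and returns a string.
STAGES = [
  ->(s) { s.downcase.gsub(/[^a-z0-9]+/, " ").strip },                 # normalize case and punctuation
  ->(s) { s.gsub(/(\d) (?=\d)/, '\1') },                              # rejoin split numbers: "10 99" -> "1099"
  ->(s) { s.split.map { |w| SPELLING.fetch(w, w) }.join(" ") },       # spelling correction
  ->(s) { s.split.reject { |w| STOP.include?(w) }.join(" ") },        # stop word removal
  ->(s) { s.split.map { |w| SYNONYMS.fetch(w, w) }.uniq.join(" ") }   # synonym substitution, de-duplicated
].freeze

def toy_filter(text)
  STAGES.reduce(text) { |acc, stage| stage.call(acc) }
end

puts toy_filter("HwO do iput 10 99 i don,t know what to do need help help me.")
puts toy_filter("Where do I enter a 1099?")
# both lines print "where enter 1099"
```

Keeping each stage as a separate string-to-string step makes it easy to reorder or drop stages, which matters because the order changes the result (for example, synonyms applied before stop word removal would see different tokens).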

Page 134: Morgan Floyd - Intuit's Live Community

NLP

•  Search is not enough…unfortunately
•  Our domain is noisy…ugly at times
•  How it works…
•  It works well, but it’s not perfect

Page 135: Morgan Floyd - Intuit's Live Community

“Stop guessing what I’m looking for!”

Page 136: Morgan Floyd - Intuit's Live Community

NLP

•  Search is not enough…unfortunately
•  Our domain is noisy…ugly at times
•  How it works…
•  It works well, but it’s not perfect
•  Not just for search…

Page 137: Morgan Floyd - Intuit's Live Community
Page 142: Morgan Floyd - Intuit's Live Community

Recommendations

•  Deliver unanswered questions to contributors
•  Too much content to scan manually
•  Based on past answering behavior
•  Recommend a question to multiple contributors
•  Uses Mahout machine learning library
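
The talk does not say which Mahout algorithm is used. To make the idea of matching an unanswered question to several contributors concrete, the sketch below swaps in plain cosine similarity between a post's term vector and each contributor's term vector (built, hypothetically, from the questions they answered before). It is not Mahout's API, and the names and weights are invented.

```ruby
# Cosine similarity between two sparse term vectors (term => weight).
def cosine(a, b)
  dot   = a.sum { |term, w| w * b.fetch(term, 0.0) }
  norm  = ->(v) { Math.sqrt(v.values.sum { |w| w * w }) }
  denom = norm.(a) * norm.(b)
  denom.zero? ? 0.0 : dot / denom
end

# Return up to n contributors whose past answers look most like the post.
# Ties are broken by name so the result is deterministic.
def recommend_to(post_vector, user_vectors, n = 2)
  user_vectors
    .map { |user, vec| [user, cosine(post_vector, vec)] }
    .select { |_, score| score > 0 }
    .sort_by { |user, score| [-score, user] }
    .first(n)
    .map(&:first)
end

users = {
  "ann" => { "1099" => 3.0, "enter" => 1.0 },
  "bob" => { "k1" => 2.0, "ptp" => 2.0 },
  "cy"  => { "1099" => 1.0, "ptp" => 1.0 }
}
p recommend_to({ "1099" => 1.0, "enter" => 1.0 }, users) # prints ["ann", "cy"]
```

Returning more than one contributor per question mirrors the slide's point about recommending a question to multiple contributors, so an answer does not depend on a single person being online.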

Page 143: Morgan Floyd - Intuit's Live Community

[Diagram labels: Post vectors, Mahout, Heuristics, User vectors, NLP, NLP, Unanswered, Answered]

Page 144: Morgan Floyd - Intuit's Live Community
Page 151: Morgan Floyd - Intuit's Live Community

Next Steps

•  We’re going to rewrite it! … most of it ;)
•  Real-time indexing
•  Question vs. Query
•  Social feedback
   – Page ranking
•  Social dictionaries
   – Content classification
•  Beer?!

Page 152: Morgan Floyd - Intuit's Live Community

Thank you.

[email protected] @fmorgan

Page 153: Morgan Floyd - Intuit's Live Community

Appendix

•  User search
•  SEO