![Page 1: Timoshenko Hauser: Customer needs from UGC June 2018](https://reader031.vdocuments.site/reader031/viewer/2022012411/616ae15cd8269f2ed41f32a4/html5/thumbnails/1.jpg)
Identifying Customer Needs from User-Generated Content
The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters.
Citation Timoshenko, Artem and John R. Hauser. "Identifying Customer Needs from User-Generated Content." Marketing Science 38, 1 (January 2019): 1-192, ii-ii © 2019 INFORMS
As Published http://dx.doi.org/10.1287/mksc.2018.1123
Publisher Institute for Operations Research and the Management Sciences (INFORMS)
Version Author's final manuscript
Citable link https://hdl.handle.net/1721.1/124203
Terms of Use Creative Commons Attribution-Noncommercial-Share Alike
Detailed Terms http://creativecommons.org/licenses/by-nc-sa/4.0/
Identifying Customer Needs from User-Generated Content
by
Artem Timoshenko
and
John R. Hauser
June 2018
Artem Timoshenko is a PhD student at the MIT Sloan School of Management, Massachusetts Institute of Technology, E62-584, 77 Massachusetts Avenue, Cambridge, MA 02139, (617) 803-5630,
John R. Hauser is the Kirin Professor of Marketing, MIT Sloan School of Management, Massachusetts Institute of Technology, E62-538, 77 Massachusetts Avenue, Cambridge, MA 02139, (617) 253-2929,
We thank John Mitchell, Steven Gaskin, Carmel Dibner, Andrea Ruttenberg, Patti Yanes, Kristyn Corrigan and Meaghan Foley for their help and support. We thank Regina Barzilay, Clarence Lee, Daria Dzyabura, Dean Eckles, Duncan Simester, Evgeny Pavlov, Guilherme Liberali, Theodoros Evgeniou, and Hema Yoganarasimhan for helpful comments and discussions. We thank Ken Deal and Ewa Nowakowska for suggestions on earlier versions of this paper. This paper has benefited from presentations at the 2016 Sawtooth Software Conference in Park City Utah, the MIT Marketing Group Seminar, the 39th ISMS Marketing Science Conference, and presentations at Applied Marketing Science, Inc. and Cornerstone Research, Inc. The applications in §6 were completed by Applied Marketing Science, Inc. Finally, we thank the anonymous reviewers and Associate Editor for constructive comments that enabled us to improve our research.
Abstract
Firms traditionally rely on interviews and focus groups to identify customer needs for marketing strategy and product development. User-generated content (UGC) is a promising alternative source for identifying customer needs. However, established methods are neither efficient nor effective for large UGC corpora because much content is non-informative or repetitive. We propose a machine-learning approach to facilitate qualitative analysis by selecting content for efficient review. We use a convolutional neural network to filter out non-informative content and cluster dense sentence embeddings to avoid sampling repetitive content. We further address two key questions: Are UGC-based customer needs comparable to interview-based customer needs? Do the machine-learning methods improve customer-need identification? These comparisons are enabled by a custom dataset of customer needs for oral care products identified by professional analysts using industry-standard experiential interviews. The analysts also coded 12,000 UGC sentences to identify which previously identified customer needs and/or new customer needs were articulated in each sentence. We show that (1) UGC is at least as valuable a source of customer needs for product development as conventional methods, and likely more valuable, and (2) machine-learning methods improve the efficiency of identifying customer needs from UGC (unique customer needs per unit of professional services cost).
Keywords: Voice of the Customer; Machine Learning; User-generated Content; Customer Needs; Online Reviews; Market Research; Text Mining; Deep Learning; Natural Language Processing
1. Introduction
Marketing practice requires a deep understanding of customer needs. In marketing strategy, customer needs help segment the market, identify strategic dimensions for differentiation, and make efficient channel management decisions. For example, Park, Jaworski, and MacInnis (1986) describe examples of strategic positioning based on fulfilling customer needs: “attire for the conservative professional” (Brooks Brothers) or “a world apart—let it express your world” (Lenox China). In product development, customer needs identify new product opportunities (Herrmann, Huber, and Braunstein 2000), improve the design of new products (Krishnan and Ulrich 2001; Sullivan 1986; Ulrich and Eppinger 2004), help manage product portfolios (Stone, et al. 2008), and improve existing products and services (Matzler and Hinterhuber 1998). In marketing research, customer needs help to identify the attributes used in conjoint analysis (Orme 2006).
Understanding of customer needs is particularly important for product development (Kano, et al. 1984; Mikulić and Prebežac 2011). For example, consider the breakthrough laundry detergent, “Attack,” developed by the Kao Group in Japan. Before Kao’s innovation, firms such as Procter & Gamble competed in fulfilling the (primary) customer needs of excellent cleaning, ready to wear after washing, value (quality and quantity per price), ease of use, smell good, good for me and the environment, and personal satisfaction. New products developed formulations to compete on these identified primary customer needs, e.g., the products that would clean better, smell better, be gentle for delicate fabrics, and not harm the environment. The market was highly competitive; perceived value played a major role in marketing and detergents were sold in large “high-value” boxes. Kao Group was first to recognize that Japanese customers wanted “a detergent that is easy to transport home by foot or bicycle,” “in a container that fits in limited apartment space,” but “gets my clothes fresh and clean.” Guided by this insight, Kao launched a highly-concentrated detergent in an easy-to-store and easy-to-carry package. Despite a premium price, Attack quickly commanded almost 50% of the Japanese laundry market (Kao Group 2016). American firms soon introduced their own concentrated detergents, but by being the first to identify an unfulfilled and previously unrecognized customer need, Kao gained a competitive edge.
There is an important distinction between customer needs and product attributes. A customer need is an abstract context-dependent statement describing the benefits, in the customer’s own words, that the customer seeks to obtain from a product or service (Brown and Eisenhardt 1995; Griffin, et al., 2009). Product attributes are the means to satisfying the customer needs. For example, when describing their experience with mouthwashes, a customer might express the need “to know easily the amount of mouthwash to use.” This customer need can be satisfied by various product attributes (solutions), including ticks on the cap and textual or visual descriptions on the bottle.
To effectively capture rich information, customer needs are typically described with sentences or phrases that describe in detail the benefits the customers wish to obtain from products. Complete formulations communicate more precise messages compared to “bags of words,” such as developed by latent Dirichlet allocation (LDA), word counts, or word co-occurrence (e.g., Büschken and Allenby 2017; Lee and Bradlow 2011; Netzer, et al. 2012; Schweidel and Moe 2014). For example, consider one “bag of words” from Büschken and Allenby (2017):
“Real pizza:” pizza, crust, really, like, good, Chicago, Thin, Style, Best, One, Just, New, Pizzas, Great, Italian, Little, York, Cheese, Place, Get, Know, Much, Beef, Lot, Sauce, Chain, Got, Flavor, Dish, Find
Word combinations give insight into dimensions of Italian restaurants, and such combinations are useful to generate attributes for conjoint analysis. However, for new product development, product-development teams want to know how the customers use these words in context. For example:
• Pizza arrives to the table at the right temperature (e.g., not too hot and not cold).
• Pizza that is cooked all the way through (i.e., not too doughy).
• Ingredients (e.g., sauce, cheese, etc.) are neither too light nor too heavy.
• Crust that is flavorful (e.g., sweet).
• Toppings stay on the pizza as I eat it.
Our paper focuses on the problem of identifying the customer needs. While relative importances of customer needs are valuable to product-development teams, methods such as conjoint analysis and self-explicated measures are well-studied and in common use. We assume that preference measures are used later in product development to decide among product concepts (Ulrich and Eppinger, 2016; Urban and Hauser, 1993).
The identification of customer needs in context requires a deep understanding of a customer’s experience. Traditional methods rely on human interactions with customers, such as experiential interviews and focus groups. However, traditional methods are expensive and time-consuming, often resulting in delays in time to market. To avoid the expense and delays, some firms use heuristics, such as managerial judgment or a review of web-based product comparisons. However, such heuristic methods often miss customer needs that are not fulfilled by any product that is now on the market.
User-generated content (UGC), such as online reviews, social media, and blogs, provides extensive rich textual data and is a promising source from which to identify customer needs more efficiently. UGC is available quickly and at a low incremental cost to the firm. In many categories, UGC is extensive; for example, there are over 300,000 reviews on health and personal care products on Amazon alone. If UGC can be mined for customer needs, UGC has the potential to identify as many customer needs as direct customer interviews, or perhaps more, and to do so more quickly and at lower cost. UGC provides additional advantages: (1) it is updated continuously, enabling the firm to update its understanding of customer needs, and (2) unlike customer interviews, firms can return to UGC at low cost to explore new insights further.
There are multiple concerns with identifying customer needs from UGC. First, the very scale of UGC makes it difficult for human readers to process. We seek methods that scale well and, possibly, make human readers more efficient. Second, much UGC is repetitive or not relevant. Sentences such as “I highly recommend this product” do not express customer needs. Repetitive and irrelevant content makes a traditional manual analysis inefficient. Third, we expect, and our analysis confirms, that most UGC concentrates on relatively few customer needs. Although such information might be useful, we seek methods to efficiently search more broadly in order to obtain a reasonably complete set of customer needs (within cost and feasibility constraints), including rarely mentioned customer needs. Fourth, UGC data are unstructured and mostly text-based. To identify abstract context-dependent customer needs, researchers need to understand rich meanings behind the words. Finally, unlike traditional methods based on a representative sample of customers, customers self-select to post UGC. Self-selection might cause analysts to miss important categories of customer needs.
Our primary goals in this paper are two-fold. First, we examine whether a reasonable corpus of UGC provides sufficient content to identify a reasonably complete set of customer needs. We construct and analyze a custom dataset in which we persuaded a professional marketing consulting firm to provide (a) customer needs identified from experiential interviews with a representative set of customers and (b) a complete coding of a sample of sentences from Amazon reviews in the oral-care category. Second, we develop and evaluate a machine-learning hybrid approach to identify customer needs from UGC. We use machine learning to identify relevant content and remove redundancy from a large UGC corpus, and then rely on human judgment to formulate customer needs from selected content. We draw on recent research in deep learning, in particular, convolutional neural networks (CNN; Collobert, et al. 2011; Kim 2014) and dense word and sentence embeddings (Mikolov, et al. 2013a; Socher, et al. 2013). The CNN filters out non-informative content from a large UGC corpus. Dense word and sentence embeddings embed semantic content in a real-valued vector space. We use sentence embeddings to sample a diverse set of non-redundant sentences for manual review. Both the CNN and word and sentence embeddings scale to large datasets. Manual review by professional analysts remains necessary in the last step because of the context-dependent nature of customer needs.
We evaluate UGC as a source of customer needs in terms of the number and variety of customer needs identified in a feasible corpus. We then evaluate the efficiency improvements achieved by the machine learning methods in terms of the expected number of unique customer needs identified per unit of professional services costs. Professional services costs, or the billing rates of experienced professionals, are the dominant costs in industry for identifying customer needs. Our comparisons suggest that, if we limit costs to that required to review experiential interviews, then UGC provides a comparable set of customer needs to those obtained from experiential interviews. Despite the potential for self-selection, UGC does at least as well (in the tested category) as traditional methods based on a representative set of customers. When we relax the professional services constraint for reviewing sentences, but maintain professional services costs to be less than would be required to interview and review, then UGC is a better source of customer needs. We further demonstrate that machine learning helps to eliminate irrelevant and redundant content and, hence, makes professional services investments more efficient. By selecting more-efficient content for review, machine learning increases the probability of identifying low-frequency customer needs. UGC-based analyses reduce research time substantially, avoiding delays in time-to-market.
2. Related Research
2.1. Traditional Methods to Identify Customer Needs (and Link Needs to Product Attributes)
Given a set of customer needs, product-development teams use a variety of methods, such as quality function deployment (QFD), to identify customer solutions or product attributes that address customer needs (Akao 2004; Hauser and Clausing 1988; Sullivan 1986). For example, Chan and Wu (2002) review 650 research articles that develop, refine, and apply QFD to map customer needs to solutions. Zahay, Griffin, and Fredericks (2004) review the use of customer needs in the “fuzzy front end,” product design, product testing, and product launch. Customer needs can also be used to identify attributes for conjoint analysis (Green and Srinivasan 1978; Orme 2006). Kim, et al. (2017) propose a benefit-based conjoint-analysis model which maps product attributes to latent customer needs before estimation.
Researchers in marketing and engineering have developed and refined many methods to elicit customer needs directly from customers. The most common methods rely on focus groups, experiential interviews, or ethnography as input. Trained professional analysts then review the input, manually identify customer needs, remove redundancy, and structure the customer needs (Alam and Perry 2002; Goffin, et al. 2012; Kaulio 1998). Some researchers augment interviews with structured methods such as repertory grids (Wu and Shich 2010).
Typically, customer-need identification begins with 20-30 qualitative experiential interviews. Multiple analysts review transcripts, highlight customer needs, and remove redundancy (“winnowing”) to produce a basic set of approximately 100 abstract context-dependent customer-need statements. Affinity groups or clustered customer-card sorts then provide structure for the customer needs, often in the form of a hierarchy of primary, secondary, and tertiary customer needs (Griffin and Hauser 1993; Jiao and Chen 2006). Together, identification and structuring of customer needs are often called voice-of-the-customer (VOC) methods. Recently, researchers have sought to explore new sources of customer needs to supplement or replace common methods. For example, Schaffhausen and Kowalewski (2015; 2016) proposed using a web interface to ask customers to enter customer needs and stories directly. They then rely on human judgment to structure the customer needs and remove redundancy.
2.2. UGC Text Analysis in Marketing and Product Development
Researchers in marketing have developed a variety of methods to mine unstructured textual data to address managerial questions. See reviews in Büschken and Allenby (2016) and Fader and Winer (2012). The research closest to our goals uses word co-occurrences and variations of LDA to identify word groupings in product discussions (Archak, Ghose, and Ipeirotis 2016; Büschken and Allenby 2006; Lee and Bradlow 2011; Tirunillai and Tellis 2014; Netzer, et al. 2012). Some researchers analyze these word groupings further by linking them to sales, sentiment, or movie ratings (Archak, Ghose and Ipeirotis 2016; Schweidel and Moe 2014; Ying, Feinberg, and Wedel 2006). The latter two papers deal explicitly with self-selection or missing ratings by analyzing UGC from the same person over different movies or from multiple sources such as different venues. We address the self-selection concern by comparing customer needs identified from UGC to the customer needs identified from the interviews with a representative sample of customers. We assume that researchers can rely on standard methods to map customer needs to the outcome measures such as preferences for product concepts in each customer segment (Griffin and Hauser 1993; Orme 2006).
In engineering, the product attribute elicitation literature is closest to the goals of our paper, although the focus is primarily on physical attributes rather than more-abstract context-dependent customer needs. Jin, et al. (2015) and Peng, Sun, and Revankar (2012) propose automated methods to identify engineering characteristics. These papers focus on particular parts of speech or manually identified word combinations and use clustering techniques or LDA to identify product attributes and levels to be considered in product development. Kuehl (2016) proposes identifying intangible attributes together with physical product attributes with supervised classification techniques. Our methods augment the literatures in both marketing and engineering by focusing on the more-context-dependent, deeper-semantic nature of customer needs.
2.3. Deep Learning for Natural Language Processing
We draw on two literatures from natural language processing (NLP): convolutional neural networks (CNNs) and dense word and sentence representations. A CNN is a supervised prediction technique which is particularly suited to computer vision and natural language processing tasks. A CNN often contains multiple layers which transform numerical representations of sentences to create input for a final logit-based layer, which makes the final classification. CNNs demonstrate state-of-the-art performance with minimum tuning in such problems as relation extraction (Nguyen and Grishman 2015), named entity recognition (Chiu and Nichols 2016), and sentiment analysis (dos Santos and Gatti 2014). We demonstrate that, on our data, CNNs do at least as well as a support-vector machine (SVM), a multichannel CNN (Kim 2014), and a Recurrent Neural Network with Long Short-Term Memory cells (LSTM; Hochreiter and Schmidhuber 1997).
Dense word and sentence embeddings are real-valued vector mappings (typically 20-300 dimensions), which are trained such that vectors for similar words (or sentences) are close in the vector space. The theory of dense embeddings is based on the Distributional Hypothesis, which states that words that appear in a similar context share semantic meaning (Harris 1954). High-quality word and sentence embeddings can be used as an input for downstream NLP applications and models (Lample, et al. 2016; Kim 2014). Somewhat unexpectedly, high-quality word embeddings capture not only semantic similarity, but also semantic relationships (Mikolov, et al. 2013b). Using the convention of bold type for vectors, if $\mathbf{v}(\text{‘word’})$ is the word embedding for ‘word,’ Mikolov et al. (2013b) demonstrate that word embeddings trained on the Google News Corpus have the following properties:
$$\mathbf{v}(\text{king}) - \mathbf{v}(\text{man}) + \mathbf{v}(\text{woman}) \approx \mathbf{v}(\text{queen})$$
$$\mathbf{v}(\text{walking}) - \mathbf{v}(\text{swimming}) + \mathbf{v}(\text{swam}) \approx \mathbf{v}(\text{walked})$$
$$\mathbf{v}(\text{Paris}) - \mathbf{v}(\text{France}) + \mathbf{v}(\text{Italy}) \approx \mathbf{v}(\text{Rome})$$
We train word embeddings using a large unlabeled corpus of online reviews. We then apply the trained word embeddings (1) to enhance the performance of the CNN and (2) to avoid repetitiveness among the sentences selected for manual review.
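The vector-arithmetic property can be illustrated with a minimal Python sketch. The two-dimensional vectors below are hand-picked and purely hypothetical (one axis loosely standing for "royalty", the other for "gender"); they are not trained embeddings, and the helper names `cosine` and `analogy` are our own. The sketch only shows the mechanics of searching for the word whose vector is closest, by cosine similarity, to the result of the arithmetic.

```python
import math

# Hypothetical toy embeddings chosen by hand for illustration only.
# Axis 0 loosely encodes "royalty", axis 1 loosely encodes "gender".
toy_vectors = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "apple": (0.2, 0.1),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(positive, negative, vectors):
    """Return the word closest to sum(positive) - sum(negative),
    excluding the query words themselves."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + x for t, x in zip(target, vectors[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, vectors[word])]
    excluded = set(positive) | set(negative)
    candidates = [w for w in vectors if w not in excluded]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


# v(king) - v(man) + v(woman): nearest remaining word in the toy space
result = analogy(["king", "woman"], ["man"], toy_vectors)
```

With trained embeddings the match is approximate rather than exact, which is why the properties above are stated with an approximate-equality sign.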
3. A Proposed Machine Learning Hybrid Method to Identify Customer Needs
We propose a method that uses machine learning to screen UGC for sentences rich in a diverse set of context-dependent customer needs. Identified sentences are then reviewed by professional analysts to formulate customer needs. Machine-human hybrids have proven effective in a broad set of applications. For example, Qian, et al. (2001) combine machine learning and human judgment to locate research when authors’ names are ambiguous (e.g., there are 117 authors with the name Lei Zhang). Supervised learning identifies clusters of similar publications and human readers associate authors with the clusters. The resulting hybrid is more accurate than machine learning alone and more efficient than human classification. Colson (2016) describes Stitch Fix’s machine-human hybrid in which machine learning helps create a short list of apparel from vast catalogues, then human curators make the final recommendations to consumers.
Figure 1 summarizes our approach. The proposed method consists of five stages:
1. Preprocess UGC. We harvest readily available UGC from either public sources or proprietary company databases. We split UGC into sentences, eliminate stop-words, numbers, and punctuation, and concatenate frequent combinations of words.
2. Train Word Embeddings. We train word embeddings using a skip-gram model (§3.2) on preprocessed UGC sentences, and use word embeddings as an input in the following stages.
3. Identify Informative Content. We label a small set of sentences as informative or non-informative, and then train and apply a CNN to filter out non-informative sentences from the rest of the corpus. Without the CNN, human readers would sample content randomly and likely review many uninformative sentences.
4. Sample Diverse Content. We cluster sentence embeddings and sample sentences from different clusters to select a set of sentences likely to represent diverse customer needs. This step is designed to identify customer needs that are different from one another so that (1) the process is more efficient and (2) hard-to-identify customer needs are less likely to be missed.
5. Manually Extract Customer Needs. Professional analysts review the diverse, informative sentences to identify customer needs. The customer needs are then used to identify new opportunities for product development.
Figure A1 in the Appendix illustrates each of the four steps with an example drawn for one product review. Our architecture achieves the same goals as voice-of-the-customer approaches in industry (§2.1). The preprocessed UGC replaces experiential interviews, the automated sampling of informative sentences is analogous to manual highlighting of informative content, and the clustering of sentence embeddings is analogous to manual winnowing to identify as many distinct customer needs as feasible. Methods to identify a hierarchical structure of customer needs and/or methods to measure the tradeoffs (preferences) among customer needs, if required, can be applied equally well to customer needs generated from UGC or from experiential interviews.
Figure 1. System Architecture for Identifying Customer Needs from UGC
[Figure 1 shows five sequential stages. Preprocess UGC: (1) split UGC into sentences, (2) remove stop-words, punctuation, etc., (3) identify frequent combinations of words. Train Word Embeddings: (1) estimate word embeddings on a large UGC corpus (skip-gram model). Identify Informative Content: (1) label a small sample of sentences as informative/non-informative, (2) train a machine learning classifier (CNN), (3) identify informative content in the rest of the corpus. Sample Diverse Content: (1) average word embeddings to create sentence embeddings, (2) cluster sentence embeddings using Ward’s algorithm, (3) sample one sentence from each of Y clusters. Manually Extract Customer Needs: (1) review the Y selected sentences and formulate customer needs.]
3.1. Stage 1: Preprocessing Raw UGC
Prior experience in the manual review of UGC by professional analysts suggests that sentences are most likely to contain customer needs and are a natural unit by which analysts process experiential interviews and UGC. We preprocess raw UGC to transform the UGC corpus into a set of sentences using an unsupervised sentence tokenizer from the natural language toolkit (Kiss and Strunk 2006). We automatically eliminate stop-words (e.g., ‘the’ and ‘and’) and non-alphanumeric symbols (e.g., question marks and apostrophes), and transform numbers into number signs and letters to lowercase.
We join words that appear frequently together with the ‘_’ character. For example, in oral care, the bigram ‘Oral B’ is treated as a combined word pair, ‘oral_b.’ We join words $a$ and $b$ into a single phrase if they appear together relatively often in the corpus. The specific criterion is:
$$\frac{\mathrm{count}(a, b) - \delta}{\mathrm{count}(a) \cdot \mathrm{count}(b)} \cdot M > \tau$$
where $M$ is the total vocabulary size. The tuning parameter, $\delta$, prevents concatenating very infrequent words, and the tuning parameter, $\tau$, is balanced so that the number of bigrams is not too few or too many for the corpus. Both parameters are set by judgment. For our initial test, we set $(\delta, \tau) = (5, 10)$.
We drop sentences that are less than four words or longer than fourteen words after preprocessing. The bounds are selected to drop approximately 10% of the shortest and 10% of the longest sentences. (Long sentences are usually an artifact of missing punctuation. In our case, the dropped sentences were subsequently verified to contain no customer needs that were not otherwise identified.)
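The preprocessing steps of Stage 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the paper: the stop-word list is a tiny illustrative subset, sentence splitting is assumed to have been done already, and the function names (`clean`, `find_bigrams`, `join_bigrams`, `keep_length`) are hypothetical. The bigram rule follows the criterion given in the text, with the vocabulary size taken as the number of distinct words in the corpus.

```python
import re
from collections import Counter

# Illustrative subset only; a real application would use a full stop-word list.
STOP_WORDS = {"the", "and", "a", "an", "is", "it", "to", "of"}


def clean(sentence):
    """Lowercase, map numbers to '#', drop other symbols and stop-words."""
    text = sentence.lower()
    text = re.sub(r"\d+", "#", text)          # numbers -> number signs
    text = re.sub(r"[^a-z#\s]", " ", text)    # drop non-alphanumeric symbols
    return [w for w in text.split() if w not in STOP_WORDS]


def find_bigrams(sentences, delta=5, tau=10):
    """Pairs (a, b) with (count(a,b) - delta) / (count(a) * count(b)) * M > tau."""
    unigrams = Counter(w for s in sentences for w in s)
    pairs = Counter(p for s in sentences for p in zip(s, s[1:]))
    vocab_size = len(unigrams)  # M, assumed here to be the number of distinct words
    return {
        (a, b)
        for (a, b), n_ab in pairs.items()
        if (n_ab - delta) / (unigrams[a] * unigrams[b]) * vocab_size > tau
    }


def join_bigrams(sentence, bigrams):
    """Replace each selected adjacent pair with a single 'a_b' token."""
    out, i = [], 0
    while i < len(sentence):
        if i + 1 < len(sentence) and (sentence[i], sentence[i + 1]) in bigrams:
            out.append(sentence[i] + "_" + sentence[i + 1])
            i += 2
        else:
            out.append(sentence[i])
            i += 1
    return out


def keep_length(sentence, min_words=4, max_words=14):
    """Keep sentences with between four and fourteen words, inclusive."""
    return min_words <= len(sentence) <= max_words
```

In a usage such as `join_bigrams(clean("The Oral B brush costs 25 dollars!"), {("oral", "b")})`, the pair is merged into the single token `oral_b` before the length filter is applied. On very small corpora the default $(\delta, \tau)$ will select no bigrams, so smaller values are needed for toy experiments.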
As is typical in machine learning systems, our model has multiple tuning parameters. We indicate which are set by judgment and which are set by cross-validation. When we set tuning parameters by judgment, we draw on the literature for suggestions and we choose parameters likely to work in many categories. When there is sufficient data, these parameters can also be set by cross-validation.
3.2. Stage 2: Training Word Embeddings with a Skip-Gram Model
Word embeddings are the mappings of words onto a numerical vector space, which incorporate contextual information about words and serve as an input to Stages 3 and 4 (Baroni, Dinu, and Kruszewski, 2014). To account for product-category and UGC-source-specific words, we train our word embeddings on the preprocessed UGC corpus using a skip-gram model (Mikolov, et al. 2013a). The skip-gram model is a predictive model which maximizes the average log-likelihood of words appearing together in a sequence of $c$ words. Specifically, if $N$ is the number of words in the corpus, $V$ is the set of all feasible words in the vocabulary, and $\mathbf{v}_i$ are $d$-dimensional real-vector word embeddings, we select the $\mathbf{v}_i$ to maximize:
$$\frac{1}{N} \sum_{i=1}^{N} \sum_{-c \le j \le c,\; j \ne 0} \log p(\text{word}_{i+j} \mid \text{word}_i)$$
$$p(\text{word}_j \mid \text{word}_i) = \frac{\exp(\mathbf{v}_j' \mathbf{v}_i)}{\sum_{k=1}^{|V|} \exp(\mathbf{v}_k' \mathbf{v}_i)}$$
To make calculations feasible, we use ten-word negative sampling to approximate the denominator in the conditional probability function. (See Mikolov, et al. 2013b for details on negative sampling.) For our application, we use $d = 20$ and $c = 5$.
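To make the objective concrete, the following Python sketch evaluates the average log-likelihood for a given set of embeddings. It is an illustration under assumptions rather than a training routine: it computes the full softmax denominator exactly (the paper approximates it with negative sampling), it uses a single vector per word as the conditional probability in the text is written, and the names `cond_prob` and `avg_log_likelihood` are hypothetical. Words are represented by integer ids that index into the list of vectors.

```python
import math


def cond_prob(j, i, vectors):
    """p(word_j | word_i) = exp(v_j . v_i) / sum_k exp(v_k . v_i), full softmax."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    scores = [dot(v_k, vectors[i]) for v_k in vectors]
    shift = max(scores)  # subtract the max for numerical stability
    denom = sum(math.exp(s - shift) for s in scores)
    return math.exp(scores[j] - shift) / denom


def avg_log_likelihood(corpus_ids, vectors, c=5):
    """Average over corpus positions of the summed log-probabilities of the
    words within c positions on either side of the center word."""
    n = len(corpus_ids)
    total = 0.0
    for pos, center in enumerate(corpus_ids):
        for offset in range(-c, c + 1):
            # skip the center word itself and positions outside the corpus
            if offset == 0 or not 0 <= pos + offset < n:
                continue
            context = corpus_ids[pos + offset]
            total += math.log(cond_prob(context, center, vectors))
    return total / n
```

Training would adjust the vectors to increase this quantity. The exact denominator costs a pass over the whole vocabulary for every word pair, which is the reason for the negative-sampling approximation mentioned in the text.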
The trained word embeddings in our application capture semantic meaning in oral care. For example, the three words closest to ‘toothbrush’ are ‘pulsonic’, ‘sonicare’ and ‘tb’, with the last being a commonly-used abbreviation for toothbrush. Similarly, variations in spellings such as ‘recommend’, ‘would_recommend’, ‘highly_recommend’, ‘reccommend’, and ‘recommed’ are close in the vector space.
3.3. Stage 3: Identifying Informative Sentences with a Convolutional Neural Network (CNN)
Depending on the corpus, UGC can contain substantial amounts of content that does not represent customer needs. Such non-informative content includes evaluations, complaints, and non-informative lists of features such as “This product can be found at CVS.” or “It really does come down to personal preference.” Informative content might include: “This product can make your teeth super-sensitive.” or “The product is too heavy and it is difficult to clean.” Machine learning improves the efficiency of manual review by eliminating non-informative content. For example, suppose that only 40% of the sentences are informative in the corpus, but after machine learning screening, 80% are informative. If analysts are limited in the number of sentences they can review (professional services costs constraint), they can identify customer needs much more efficiently by focusing on a sample of $Y$ prescreened sentences rich in informative content than on $Y$ randomly selected sentences. With a higher concentration of informative sentences, low-frequency customer needs are more likely to be found in the $Y$ prescreened sentences than in the $Y$ randomly selected sentences.
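The 40% versus 80% example can be worked through numerically. The Python sketch below is a back-of-the-envelope illustration, not an analysis from the paper. It assumes that sentences are drawn independently, that a given rare customer need is expressed in a fixed share of informative sentences, and that screening does not change that share. The review budget of 1,000 sentences and the need share of 0.2% are hypothetical values chosen only to show the mechanics.

```python
def expected_informative(n_sentences, share_informative):
    """Expected number of informative sentences among those reviewed."""
    return n_sentences * share_informative


def prob_need_found(n_sentences, share_informative, need_share):
    """P(at least one reviewed sentence expresses a given need), assuming
    independent draws and that the need appears in `need_share` of the
    informative sentences."""
    p_single = share_informative * need_share
    return 1.0 - (1.0 - p_single) ** n_sentences


budget = 1000        # hypothetical number of sentences analysts can review
need_share = 0.002   # hypothetical share of informative sentences with the need

random_hit = prob_need_found(budget, 0.40, need_share)    # random sample
screened_hit = prob_need_found(budget, 0.80, need_share)  # prescreened sample
```

Under these assumptions the expected number of informative sentences doubles for the same review budget, and the probability of encountering the rare need at least once rises accordingly.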
To train the machine learning classifier, some sentences must be labeled by professional analysts as informative ($y = 1$) or non-informative ($y = 0$). There are efficiency gains because such labeling requires substantially lower professional services costs than formulating customer needs from informative sentences. Moreover, in a small-sample study, we found that Amazon Mechanical Turk (AMT) has a potential to identify informative sentences for training data at a cost below that of using professional analysts. With further development to reduce costs and enhance accuracy, AMT might be a viable source of training data.
We use a convolutional neural network (CNN) to identify informative sentences. A major advantage of the CNN is that CNNs quantify raw input automatically and endogenously based on the training data. CNNs apply a combination of convolutional and pooling layers to word representations to generate “features,” which are then used to make a prediction. (“Features” in the CNN should not be confused with product features.) In contrast, traditional machine-learning classification techniques, such as a support-vector machine or decision trees, depend critically on handcrafted features, which are the transformations of the raw data designed by researchers to improve prediction in a particular application. High-quality features require substantial human effort for each application. CNNs have been proven to provide comparable performance to traditional handcrafted-feature methods, but without substantial application-specific human effort (Kim 2014; Lei, Barzilay, and Jaakkola 2015).
A typical CNN consists of multiple layers. Each layer has hyperparameters, such as the number of filters and the size of the filters. We custom select these hyperparameters, and the number and type of layers, by cross-validation. Each layer also has numerical parameters, such as the parameters of the filters used in the convolutional layers. These parameters are calibrated during training. We train the CNN by selecting the parameter values that maximize the CNN’s ability to label sentences as informative vs. non-informative.
Figure 2 illustrates the architecture of the CNN in our application. We stack a convolutional layer, a pooling layer, and a softmax layer. This specification modifies Kim’s (2014) architecture for the sentence classification task to account for the amount of training data available in customer-need applications.
Figure 2. Convolutional Neural Network Architecture for Sentence Classification
3.3.1. Numerical Representations of Words for Use in the CNN
For every word in the text corpus, the CNN stores a numerical representation of the word. Numerical representations of words are the real vector parameters of the model which are calibrated to improve prediction. To facilitate training of the CNN, we initialize representations with word embeddings from Stage 2. However, we allow the CNN to update the numerical representations to enhance predictive ability (Lample, et al. 2016). In our application, this flexibility enhances out-of-sample accuracy of prediction.
The CNN quantifies sentences by concatenating word embeddings. If $\mathbf{v}_i$ is the word embedding for the $i^{\text{th}}$ word in the sentence, then the sentence is represented by a vector $\mathbf{v}$
$$\mathbf{v} = [\mathbf{v}_1, \ldots, \mathbf{v}_n] \in \mathbb{R}^{d \times n}$$
where $n$ is the number of words in the sentence and $d = 20$ is the dimensionality of the word embeddings.
3.3.2. Convolutional Layer
Convolutional layers create multiple feature maps by applying convolutional operations with varying filters to the sentence representation. A filter is a real-valued vector, $w_t \in \mathbb{R}^{d \times h_t}$, where $h_t$ is the size of the filter. Filters are applied to different parts of the vector $v$ to create feature maps ($c^t$):
$$c^t = [c_1^t, \ldots, c_{n-h_t+1}^t]$$
$$c_i^t = \sigma(w_t \cdot v_{i:i+h_t-1} + b_t)$$
where $t$ indexes the feature maps, $\sigma(\cdot)$ is a non-linear activation function with $\sigma(x) = \max(0, x)$, $b_t \in \mathbb{R}$ is an intercept, and $v_{i:i+h_t-1}$ is a concatenation of representations of words $i$ to $i + h_t - 1$ in the sentence:
$$v_{i:i+h_t-1} = [v_i, \ldots, v_{i+h_t-1}]$$
We consider filters of the size $h_t \in \{3, 4, 5\}$, and use three filters of each size. The number of filters and their size are selected to maximize prediction on the validation set. The numerical values for filters, $w_t$, and intercepts, $b_t$, are calibrated when the CNN is trained. As an illustration, Figure 3 shows how a feature map is generated with a filter of size $h_t = 3$. On the left is a sentence, $v$, consisting of five words. Each word is a 20-dimensional vector (only 5 dimensions are shown). Sentence $v$ is split into triplets of words as shown in the middle. Representations of word triplets are then transformed to the real-valued $c_i^t$’s in the next column. The $t^{\text{th}}$ feature map, $c^t$, is the vector of these values. Processing sentences in this way allows the CNN to interpret words that are next to one another in a sentence together.
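
The feature-map computation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the toy two-dimensional embeddings (the paper uses 20 dimensions), and the filter values are all assumptions chosen so the arithmetic is easy to follow.

```python
def relu(x):
    """Activation sigma(x) = max(0, x)."""
    return max(0.0, x)

def feature_map(sentence, w, b):
    """Slide one filter over a sentence and return its feature map.

    sentence: list of n word vectors, each of length d
    w:        flat filter of length d * h (h = filter size in words)
    b:        scalar intercept
    """
    d = len(sentence[0])   # embedding dimensionality
    h = len(w) // d        # number of adjacent words the filter spans
    values = []
    for i in range(len(sentence) - h + 1):
        # Concatenate the embeddings of words i .. i+h-1
        window = [x for word in sentence[i:i + h] for x in word]
        values.append(relu(sum(wk * xk for wk, xk in zip(w, window)) + b))
    return values

# Hypothetical toy inputs: five words, d = 2, one filter of size h = 3
sentence = [[1, 0], [0, 1], [1, 1], [0, 0], [2, -1]]
c = feature_map(sentence, w=[1, 1, 1, 1, 1, 1], b=-1)
```

With five words and a filter spanning three words, the feature map has 5 - 3 + 1 = 3 entries, one per word triplet, which mirrors the illustration in Figure 3.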
Figure 3. Example Feature Map, $c^t$, Generated with a Filter, $w_t$, of Size $h_t = 3$.
3.3.3. Pooling Layer
The pooling layer transforms feature maps into shorter vectors. The role of the pooling layer is to reduce the dimensionality of the output of the convolutional layer to be used in the next layer. Pooling to the $k$ largest features or simply using the largest feature has proven effective in NLP applications (Collobert, et al. 2011). We selected $k = 1$ with cross-validation. The output of the pooling layer is a vector, $z$, that summarizes the results of pooling operators applied to the feature maps:
$$z_t = \max[c_1^t, \ldots, c_{n-h_t+1}^t]$$
$$z = [z_1, z_2, \ldots, z_9]$$
The vector, $z \in \mathbb{R}^9$, is now an efficient numerical representation of the sentence and can be used to classify the sentence as either informative or not informative. The nine elements in $z$ correspond to the number of filter sizes (3) times the number of filters (3) within each size.
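
As a small sketch of the pooling step, the snippet below reduces each feature map to its largest value. The helper names and the toy feature maps are hypothetical; `k_max_pool` is included only to illustrate the more general "k largest features" variant mentioned above, and here it simply returns the values in descending order.

```python
def max_pool(feature_maps):
    """Keep the single largest value of each feature map (k = 1)."""
    return [max(c) for c in feature_maps]

def k_max_pool(c, k):
    """Keep the k largest values of one feature map, in descending order."""
    return sorted(c, reverse=True)[:k]

# Two hypothetical feature maps; the application described above would have nine.
z = max_pool([[3, 2, 2], [0, 5, 1]])
```

Because each feature map contributes exactly one number, the length of the pooled vector equals the number of filters regardless of sentence length.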
3.3.4. Softmax Layer
The final layer of the CNN is called the softmax layer. The softmax layer transforms the output of the pooling layer, $z$, into a probabilistic prediction of whether the sentence is informative or not informative. Marketing researchers will recognize the softmax layer as a binary logit model which uses the $z$ vector as explanatory variables. The estimate of the probability that the sentence is informative, $P(y = 1 \mid z)$, is given by:
$$P(y = 1 \mid z) = \frac{1}{1 + e^{-\theta z}}$$
The parameters of the logit model, $\theta$, are determined when the CNN is trained. In our application, we declare a sentence to be informative if $P(y = 1 \mid z) > 0.5$, although other criteria could be used and tuned to a target tradeoff.
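
The binary-logit reading of the softmax layer can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, no separate intercept is modeled (the formula above shows none), and the threshold argument is exposed only to show how a different decision criterion could be plugged in.

```python
import math

def prob_informative(z, theta):
    """Binary logit: P(y = 1 | z) = 1 / (1 + exp(-theta . z))."""
    score = sum(t * x for t, x in zip(theta, z))
    return 1.0 / (1.0 + math.exp(-score))

def is_informative(z, theta, threshold=0.5):
    """Label a sentence informative when the probability exceeds the threshold."""
    return prob_informative(z, theta) > threshold
```

Lowering `threshold` below 0.5 would label more sentences as informative, trading precision for recall.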
3.3.5. Calibration of the Parameters of the CNN
For our application, we calibrate the nine filters, $w_t \in \mathbb{R}^{d \times h_t}$, and the nine intercepts, $b_t$, in the convolutional layer, and the vector $\theta$ in the softmax layer. In addition, we fine-tune the word embeddings, $v$, to enhance the CNN’s predictive ability (e.g., Kim 2014). We calibrate all parameters simultaneously by minimizing the cross-entropy error on the training set of professionally labeled sentences ($w$ is a concatenation of the $w_t$’s):
$$(\hat{w}, \hat{b}, \hat{\theta}, \hat{v}) = \arg\min_{w, b, \theta, v} L(w, b, \theta, v)$$
$$L(w, b, \theta, v) = -\frac{1}{N} \sum_{n=1}^{N} \left[ \gamma\, y_n \log \hat{y}_n + (1 - y_n) \log(1 - \hat{y}_n) \right]$$
$N$ is the size of the training set, $y_n$ are the manually assigned labels, and $\hat{y}_n$ are the predictions of the CNN. The parameter, $\gamma$, enables the user to weight false negatives more (or less) than false positives. We initially set $\gamma = 1$ so that identifying informative sentences and eliminating non-informative sentences are weighted equally, but we also examine asymmetric costs ($\gamma > 1$) in which we place more weight on identifying informative sentences than eliminating uninformative sentences.
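
To make the role of the asymmetric-cost parameter concrete, the following sketch evaluates the weighted cross-entropy for given labels and predicted probabilities. It only computes the loss and does not train anything; the function name and the toy inputs are assumptions for illustration.

```python
import math

def weighted_cross_entropy(labels, preds, gamma=1.0):
    """Cross-entropy in which informative sentences (label 1) are weighted by gamma.

    labels: 0/1 labels assigned by analysts
    preds:  predicted probabilities, assumed to lie strictly between 0 and 1
    """
    total = 0.0
    for y, p in zip(labels, preds):
        total += gamma * y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(labels)

# One informative and one non-informative sentence, both predicted at 0.5
loss_symmetric = weighted_cross_entropy([1, 0], [0.5, 0.5], gamma=1.0)
loss_asymmetric = weighted_cross_entropy([1, 0], [0.5, 0.5], gamma=3.0)
```

With a weight greater than one, an uncertain prediction on an informative sentence costs more than the same prediction on a non-informative sentence, which pushes a trained model toward higher recall.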
We solved the optimization problem iteratively with the RMSProp optimizer on mini-batches of size 32 and a drop rate of 0.3. Optimization terminated when the cross-entropy error on the validation set did not decrease over five consecutive iterations. See Tieleman and Hinton (2012) for details and definitions of terms such as “drop rate.”
3.3.6. Evaluating the Performance of the CNN
We evaluate the quality of the CNN classifier using an $F_1$ score (Wilson, Wiebe, and Hoffmann 2005):
$$F_1 = \frac{\text{precision} \cdot \text{recall}}{\frac{1}{2}(\text{precision} + \text{recall})}$$
where precision is the share of informative sentences among the sentences identified as informative and recall is the share of informative sentences correctly identified by the classifier. Accuracy, when reported, is the percent of classifications that were correct.
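
These definitions translate directly into code. The sketch below is illustrative only: the function name and the five-sentence example are hypothetical, and it assumes at least one sentence is predicted informative and at least one truly is, so that no division by zero occurs.

```python
def classification_scores(labels, preds):
    """Return (precision, recall, F1, accuracy) for 0/1 labels and predictions."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # informative, flagged
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # not informative, flagged
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # informative, missed
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # not informative, dropped
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = precision * recall / (0.5 * (precision + recall))
    accuracy = (tp + tn) / len(pairs)
    return precision, recall, f1, accuracy

precision, recall, f1, accuracy = classification_scores(
    labels=[1, 1, 0, 0, 1], preds=[1, 0, 1, 0, 1]
)
```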
3.4. Stage 4: Clustering Sentence Embeddings and Sampling to Reduce Redundancy
UGC is repetitive and often focuses on a small set of customer needs. Consider the following sentences:
• “When I am done, my teeth do feel ‘squeaky clean.’”
• “Every time I use the product, my teeth and gums feel professionally cleaned.”
• “I am still shocked at how clean my teeth feel.”
These three sentences are different articulations of a customer need that could be summarized as “My mouth feels clean.” Manual review of such repetitive content is inefficient. Moreover, repetitiveness makes the manual review onerous and boring for professional analysts, causing analysts to miss excitement customer needs that are mentioned rarely. If the analysts miss excitement customer needs, then the firm misses valuable new product opportunities and/or strategic positionings. To avoid repetitiveness, we seek to “span the set” of customer needs. We construct sentence embeddings which encode semantic relationships between sentences, and use sentence embeddings to reduce redundancy by sampling content for manual review from maximally different parts of the space of sentence embeddings.
Researchers often create sentence embeddings by taking a simple average of word embeddings corresponding to the words in the sentence (Iyyer et al., 2015), explicitly modeling semantic and syntactic structure of the sentences with neural methods (Tai, Socher and Manning 2015), or training sentence embeddings together with word embeddings (Le and Mikolov, 2014). Because averaging demonstrates similar performance to other methods and is both scalable and transferable (Iyyer et al., 2015), we use averaging in our application.
Being the average of word embeddings, sentence embeddings represent semantic similarity among sentences. For example, the three similar sentences mentioned above have sentence embeddings that are reasonably close to one another in the sentence-embedding vector space. Using this property, we group sentences into clusters. We choose Ward’s hierarchical clustering method because it is commonly used in VOC studies (Griffin and Hauser 1993), and other areas of marketing research (Dolnicar 2003). To identify Y sentences for professional analysts to review, we sample one sentence randomly from each of Y clusters. If the clustering worked perfectly, sentences within each of the Y clusters would articulate the same customer need, and each of the Y clusters would produce a sentence that an analyst would recognize as a distinct customer need. In real data, redundancy remains, but, hopefully, less redundancy than that which would be present in Y randomly sampled sentences.
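
The Stage 4 steps (average word embeddings, cluster with Ward's criterion, sample one sentence per cluster) can be sketched as below. This is a deliberately naive, pure-Python stand-in for illustration: it recomputes all pairwise merge costs at every step and would not scale to a real corpus. All function names and the toy points are assumptions.

```python
import random

def sentence_embedding(word_vectors):
    """Average the word embeddings of one sentence."""
    n = len(word_vectors)
    return [sum(column) / n for column in zip(*word_vectors)]

def ward_clusters(points, k):
    """Naive agglomerative clustering using Ward's merge cost.

    Repeatedly merges the two clusters whose union least increases the
    within-cluster sum of squares: n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2.
    Returns k lists of point indices.
    """
    clusters = [([i], list(p)) for i, p in enumerate(points)]  # (members, centroid)
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        merged = ma + mb
        centroid = [(len(ma) * x + len(mb) * y) / len(merged) for x, y in zip(ca, cb)]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((merged, centroid))
    return [members for members, _ in clusters]

def sample_one_per_cluster(clusters, rng):
    """Pick one sentence index at random from each cluster."""
    return [rng.choice(members) for members in clusters]

# Hypothetical 2-D "sentence embeddings": two tight pairs and one outlier
points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
clusters = ward_clusters(points, k=3)
picked = sample_one_per_cluster(clusters, random.Random(0))
```

In this toy example the two near-duplicate pairs each collapse into one cluster, so only one sentence from each pair is sent for review.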
3.5. Stage 5: Manually Extracting Customer Needs
To achieve high relevancy in formulating abstract context-dependent customer needs, the final extraction of customer needs is best done by trained analysts. We evaluate in §5 whether manual extraction becomes more efficient using informative, diverse sentences identified with the CNN and sentence-embedding clusters.
4. Evaluation of UGC’s Potential in the Oral-Care Product Category
We use empirical data to examine two questions. (§4) Does UGC contain sufficient raw material from which to identify a broad set of customer needs? And (§5) Does each of the machine-learning steps enhance efficiency? We address both questions with a custom dataset in the oral-care category. We selected oral care because oral-care customer needs are sufficiently varied, but not so numerous as to overcomplicate comparisons. As a proof-of-concept test, our analyses establish a key example. We discuss applications in other categories in §6.
4.1. Baseline Comparison: Experiential Interviews in Oral Care
We obtained a detailed set of customer needs from an oral-care voice-of-the-customer (VOC) analysis that was undertaken by a professional market research consulting firm. The firm has almost thirty years of VOC experience spanning hundreds of successful product-development applications across a wide variety of industries. The oral-care VOC provided valuable insights to the client and led to successful new products. The VOC was based on standard methods: experiential interviews, with transcripts highlighted by experienced analysts aided by the firm’s proprietary software. After winnowing, customer needs were structured by a customer-based affinity group. The output is 86 customer needs structured into six primary and 22 secondary need groups. An appendix lists the primary and secondary need groups and provides an example of a tertiary need from each secondary-need group. Examples of customer needs include: “Oral care products that do not create any odd sensations in my mouth while using them (e.g. tingling, burning, etc.)” or “My teeth feel smooth when I glide my tongue over them.” Such customer needs are more than their component words; they describe a desired outcome in the language that the customer uses to describe the desired outcome.
The underlying experiential interview transcripts were based on a representative sample of oral care customers and were not subject to self-selection biases. If UGC can identify a set of customer needs that is comparable to the benchmark, then we have initial evidence in at least one product category that UGC self-selection does not undermine the basic goals of finding a reasonably complete set of customer needs.
Professional analysts estimate that the professional-services costs necessary to review, highlight, and winnow customer needs from experiential-interview transcripts are slightly more than the professional-services costs required to review 8,000 UGC sentences to identify customer needs. The professional-services costs required to review, highlight, and winnow customer needs are about 40%-55% of the professional-services costs required to schedule and interview customers. At this rate, professional analysts could review approximately 22,000 to 28,000 UGC sentences using the methods and professional-services costs involved in a typical VOC study.
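
One way to reproduce the stated range, offered as our reading rather than the authors' own calculation, is to express all costs in "sentence equivalents": treat transcript review as roughly 8,000 sentences of effort, infer the interviewing cost from the 40%-55% ratio, and add the two. The function name and default arguments below are hypothetical.

```python
def sentence_budget(review_equiv=8000, review_to_interview=(0.40, 0.55)):
    """Total study cost in UGC-sentence equivalents under the assumed reading.

    review_equiv:        transcript-review cost, in sentences (about 8,000)
    review_to_interview: review cost as a share of interviewing cost
    """
    low_share, high_share = review_to_interview
    low = review_equiv + review_equiv / high_share   # interviewing is relatively cheap
    high = review_equiv + review_equiv / low_share   # interviewing is relatively costly
    return low, high

low, high = sentence_budget()
```

Under these assumptions the budget runs from about 22,500 to 28,000 sentences, in line with the approximate range quoted above.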
4.2. Fully-Coded UGC Data from the Oral-Care Category
To compare UGC to experiential interviews and evaluate a proposed machine learning method, we needed a fully-coded sample of a UGC corpus. In particular, we needed to know and classify every customer need in every sentence in the UGC sample. We received in-kind support from professional analysts to generate a custom dataset to evaluate UGC and the machine-learning efficiencies. The in-kind support was approximately that which the firm would have allocated to a typical VOC study, a substantial time-and-cost commitment from the firm.
From the 115,099 oral-care reviews on Amazon spanning the period from 1996 to 2014, we randomly sampled 12,000 sentences split into an initial set of 8,000 sentences and a second set of 4,000 sentences (McAuley, et al. 2015). To maintain a common level of training and experience for reviewing UGC and experiential interview transcripts, the sentences were reviewed by a group of three experienced analysts from the same firm that provided the interview-based VOC. These analysts were not involved in the initial interview-based VOC. Using a team of analysts is recommended by Griffin and Hauser (1993, p. 11).
We chose 8,000 sentences for our primary evaluation because the professional services costs to review 8,000 sentences are comparable to, albeit slightly less than, the professional services costs to review a typical set of experiential-interview transcripts. For these sentences, the analysts fully coded every sentence to determine whether it contained a customer need and, if so, whether the customer need could be mapped to a customer need identified by the VOC, or whether the customer need was a newly identified customer need. Matching needs from the UGC to the interview-based needs is fuzzy. For example, the three sentences that were mapped to “My mouth feels clean” were judged by the analysts to articulate that customer need even though the wording was not exact (§3.4).
In addition to the fully-coded 8,000 sentences, we were able to persuade the analysts to examine an additional 4,000 sentences to focus on any customer needs that were identified by the traditional VOC, but not identified from the UGC. This second dataset enables us to address whether there exist customer needs that are not in UGC per se, or whether the customer needs are sufficiently rare that more than 8,000 sentences are required to identify them. Finally, to assess coding reliability, we asked another analyst, blind to the prior coding, to recode 200 sentences using two different task descriptions.
4.3. Descriptive Statistics and Comparisons
Using Amazon reviews, the three human coders determined that 52% of the 8,000 sentences contained at least one customer need and 9.2% of the sentences contained two or more customer needs. However, the corpus was highly repetitive; the 10% most frequent customer needs were articulated in 54% of the informative sentences. On the other hand, 17 customer needs were articulated no more than 5 times in the corpus of 8,000 sentences.
We consider first the 8,000 sentences; in this scenario analysts allocate at most as much time coding UGC as they would have allocated to review experiential interview transcripts. This section addresses the potential of the UGC corpus; hence, for this section, we do not yet exploit machine-learning efficiencies. From the 8,000 sentences, analysts identified 74 of the 86 tertiary experiential-interview-based customer needs, but also identified an additional 8 needs.
We now consider the set of 4,000 sentences as a supplement to the fully-coded 8,000 sentences; in this scenario analysts still allocate substantially less time than they would to interview customers and review transcripts. From the second set of 4,000 sentences, the analysts identified 9 of the 12 missing customer needs. With 12,000 sentences, that brings the total to 83 of the 86 experiential-interview-based customer needs and 91 of the 94 total needs (97%). In the second set of 4,000 sentences, the analysts did not try to identify any customer needs other than the 12 missing needs. Had we had the resources to do so, we would likely have increased the number of UGC-based incremental customer needs. Overall, analysts identified 91 customer needs from UGC and 86 customer needs from experiential interviews. These results are summarized in Figure 4. At least in oral care, analyzing UGC has the potential to identify at least as many, possibly more, customer needs at a lower overall cost of professional services, even without machine-learning efficiencies. Furthermore, because the experiential-interview benchmark is drawn from a representative sample of consumers, the potential for self-selection in UGC oral-care postings does not seem to impair the breadth of customer needs contained in UGC sentences. We cannot rule out self-selection issues for other product categories. When self-selection is feared, we recommend analyses that build on multiple sources such as the methods developed by Schweidel and Moe (2014).
Figure 4. Comparison of Customer Needs Obtained from Experiential Interviews with Customer Needs Obtained from an Exhaustive Review of a UGC Sample
Whether customer needs are based on interviews or UGC, the final identification of customer needs is based on imperfect human judgment. We asked an analyst, blind to the prior coding, to evaluate 200 sentences using two different approaches. For the first evaluation, the analyst (1) explicitly formulated customer needs from each sentence, (2) winnowed the customer needs to remove duplicates, (3) matched the identified customer needs to the interview-based hierarchy, (4) added new needs to the hierarchy if necessary, and (5) mapped each of the 200 sentences to the customer needs. For the second evaluation, the analyst followed the same procedures that produced Figure 4. These two evaluations were conducted two weeks apart.
We compare the codes produced by the additional analyst versus the codes produced by the three analysts. Inter-task accuracy (first vs. second evaluation by the new analyst) was 80%, which is better than the inter-coder accuracy (new analyst vs. previous analysts) of 70%. The additional analyst identified 71.4% of the customer needs that were previously identified by the three analysts. The additional analyst’s hit rate compares favorably to Griffin and Hauser (1993, p. 8) who report that their individual analysts identified 45-68% of the needs, where the universe was all customer needs identified by the seven analysts who coded their data. This evidence suggests that Figure 4 is a conservative estimate of the potential of the UGC as a source of customer needs.
4.4. Prioritization of Customer Needs
To address whether the eight incremental UGC customer needs and/or the three incremental experiential-interview customer needs were important, we conducted a prioritization survey. We randomly selected 197 customers from a professional panel (PureSpectrum), screened for interest in oral care, and asked customers to rate the importance of each tertiary customer need on a 0-to-100 scale. Customers also rated whether they felt that their current oral-care products performed well on these customer needs on a 0-to-10 scale. Such measures are used commonly in VOC studies and have proven to provide valuable insights for product development. (Review citations in §2.1.)
Table 1 summarizes the survey results. On average, the customer needs identified in both the interviews and UGC are the most important customer needs. Those that are unique to UGC or unique to experiential interviews are of lower importance and performance. We gain further insight by categorizing the customer needs into quadrants via median splits. High-importance-low-performance customer needs are almost perfectly identified by both data sources. Such customer needs provide insight for product improvement.
Table 1. Importance and Performance Scores for Customer Needs Identified from UGC and from Experiential Interviews (Imp = Importance, Per = Performance). The last four columns are quadrant counts based on median splits.

| Source of Customer Need | Count | Average Imp | Average Per | High Imp, High Per | High Imp, Low Per | Low Imp, High Per | Low Imp, Low Per |
|---|---|---|---|---|---|---|---|
| Interviews ∩ 8,000 UGC (a) | 74 | 65.5 | 7.85 | 29 | 11 | 11 | 23 |
| Interviews ∩ 4,000 UGC (b) | 9 | 63.9 | 7.97 | 6 | 0 | 0 | 3 |
| UGC only | 8 | 50.3 | 7.12 | 0 | 0 | 1 | 7 |
| Interviews only | 3 | 52.8 | 7.47 | 0 | 1 | 0 | 2 |

(a) Based on the first 8,000 UGC sentences that were fully-coded
(b) Based on the second 4,000 UGC sentences that were coded to test for interview-identified customer needs
Focusing on highly important customer needs is tempting, but we cannot ignore low-importance customer needs. In new product development, identifying hidden opportunities for innovation often leads to successful new products. Customers often evaluate needs below the medians on importance and performance when they anticipate that no current product fulfills those customer needs (e.g., Corrigan 2013). If the new product satisfies the customer need, customers reconsider its importance, and the innovator gains a valuable strategic advantage. Thus, we define low-importance–low-performance customer needs as hidden opportunities. By this criterion, the UGC-unique customer needs identify 20% of the hidden opportunities and the interview-unique needs identify 8% of the hidden opportunities. For example, two UGC-unique hidden opportunities are “An oral-care product that does not affect my sense of taste,” and “An oral care product that is quiet.” An interview-based hidden opportunity is “Oral care tools that can easily be used by left-handed people.”
In summary, UGC identifies the vast majority of customer needs (97%), opportunities for product improvement (92%), and hidden opportunities (92%). UGC-unique needs identify at least seven hidden opportunities while interview-only needs identify two hidden opportunities. We have not been able to identify any qualitative insights from the comparison of the customer needs between the two sources, which suggests that nothing systematic is missing in the UGC. Table A2 in the appendix lists all eleven customer needs that are unique to either UGC or experiential interviews.
4.5. Tests of Non-Machine-Learning Prescreening of UGC Data
4.5.1. Helpfulness Ratings
Reviews are often rated by other users based on their helpfulness. In our data, 41% of the reviews are rated on helpfulness. Because helpful reviews tend to be longer, this corresponds to 52% of the sentences. We examine whether or not helpful reviews are particularly informative using the 8,000 fully-coded sentences. Fifty-four percent (54%) of non-rated reviews contain a customer need compared to 51% of rated reviews, 48% of reviews with rating above the median, and 48% of reviews with rating in the upper quartile. Helpfulness is not correlated with informativeness ($\rho = -0.01$, $p = 0.56$). When we examine individual sentences, we see that a sentence can be rated as helpful, but not necessarily describe a customer need (be informative). Two examples of helpful but uninformative sentences are: "I finally got this toothbrush after I have seen a lot of people use them." or "I'm so happy I'm just about beside myself with it!" Overall, helpfulness does not seem to imply informativeness.
4.5.2. Number of Times a Customer Need is Mentioned
For experiential interviews, the frequency with which a customer need is mentioned is not correlated with the measured importance of the customer need (Griffin and Hauser 1993, p. 13). However, in experiential interviews, the interviewer probes explicitly for new customer needs. The lack of correlation may be due to endogeneity in the interviewing process. In UGC, customers decide whether or not to post, hence frequency might be an indicator of the importance of a customer need. For oral care, frequency of mention is marginally significantly correlated with importance ($\rho = 0.21$, $p = 0.06$). Frequency of mention is not significantly correlated with performance ($\rho = 0.09$, $p = 0.44$). However, if we were to focus only on customer needs with frequency above the median of 7.9 mentions, we would miss 29% of the high-importance customer needs, 44% of the high-performance customer needs, and 72% of the hidden opportunities. Thus, while frequency is related to importance, it does not enhance the efficiency with which customer needs or new-product ideas can be identified.
5. Oral Care: Evaluation of Machine-Human Hybrid Method
5.1. CNN to Eliminate Non-Informative Sentences
There is a tradeoff to be made when training a CNN. With a larger training sample, the CNN is better at identifying informative content, but there is an opportunity cost to using analysts to classify informative sentences. Fortunately, labeling sentences as informative or not is faster and easier than identifying abstract context-dependent customer needs from sentences. The ratio of time spent on identifying informative sentences vs. formulating customer needs is approximately 20%. Furthermore, as described earlier, exploratory research suggests that Amazon Mechanical Turk might be used as a lower-cost way to obtain a training sample.
Figure 5 plots the F1-score of the CNN as a function of the size of the training sample. We conduct 100 iterations where we randomly draw a training set, train the CNN with the architecture described in §3.3, and measure performance on the test set. Figure 5 suggests that performance of the CNN stabilizes after 500 training sentences, with some slight improvement thereafter. We plot precision and recall as a function of the size of the training sample in the appendix, Figure A2.
Figure 5. $F_1$ score as a Function of the Size of the Training Sample
To test whether we might improve performance using alternative natural-language processing methods, we train a multichannel CNN (Kim 2014), a support-vector machine, and a recurrent neural network with long short-term memory cells (LSTM, Hochreiter and Schmidhuber 1997). We also train a CNN with a higher penalty for false negatives ($\gamma = 3$) to investigate the effect of asymmetric costs on the performance of the model. The evaluation is based on the 6,700 of 8,000 fully-coded sentences that remain after we eliminated sentences that were too short or too long. From the 6,700 sentences, we randomly select 3,700 sentences to train the methods and 3,000 to act as holdout sentences to test the performance of the alternative methods. We summarize the results in Table 2.
Table 2. Alternative Machine-Learning Methods to Identify Informative Sentences

| Method | Precision | Recall | Accuracy | $F_1$ |
|---|---|---|---|---|
| Convolutional Neural Network (CNN) | 74.4% | 73.6% | 74.2% | 74.0% |
| CNN with Asymmetric Costs ($\gamma = 3$) | 65.2% | 85.3% | 70.0% | 74.0% |
| Recurrent Neural Network-LSTM | 72.8% | 74.0% | 73.2% | 73.4% |
| Multichannel CNN | 70.5% | 74.9% | 71.8% | 72.6% |
| Support Vector Machine | 63.7% | 67.9% | 64.6% | 65.7% |

Focusing on F1, the CNN outperforms the other methods, although the other deep-learning methods do reasonably well. Conditioned on a given F1, we favor methods that miss fewer informative sentences (higher recall, at the expense of a lower precision). Thus, in subsequent analyses, we use the CNN with asymmetric costs.
The deep-learning methods achieve accuracies in the range of 70-74%, which is lower than that achieved in some sentence-classification tasks. For example, Kim (2014) reports accuracies in the range of 45-95% across seven datasets and eighteen methods (average 80%). A more-relevant benchmark is the capabilities of the human coders on which the deep-learning models are trained. The deep-learning models achieve higher accuracy identifying informative sentences than the inter-coder accuracy of 70%. The abstract context-dependent nature of the customer needs appears to make identifying informative content more difficult than typical sentence-classification tasks.
To be effective, the CNN should be able to correctly identify both sentences that contain frequently mentioned customer needs and sentences that contain rarely mentioned customer needs. We conduct iterations to evaluate this property. In each iteration, we randomly split the 6,700 preprocessed sentences into 3,700 training and 3,000 holdout sentences, and train the CNN using the training set. We then compare the needs in the holdout sentences and the needs in the sentences identified by the CNN as informative. On average over iterations, the CNN identified sentences with 100% of the frequently mentioned customer needs, 91% of the rarely mentioned customer needs, and 84% of the customer needs that were new to the holdout data. Because all customer needs were identified in at least one iteration, we expect these percentages to approach 100% if it were feasible to expand the holdout set from 3,000 sentences to a larger number of sentences, such as the 12,000 sentences used in Figure 4.
5.2. Clustering Sentence Embeddings to Reduce Redundancy
In Stage 4 of the proposed hybrid approach, we encode informative sentences into a 20-dimensional real-valued vector space (sentence embeddings), group sentence embeddings into Y clusters, and sample one sentence from each cluster. To visualize whether or not sentence embeddings separate the customer needs, we use a principal components analysis to project the 20-dimensional sentence embeddings onto two dimensions. Information is lost when we project from 20 dimensions to two dimensions, but the two-dimensional plot enables us to visualize whether sentence embeddings separate sentences articulating different customer needs. (We use principal components analysis purely as a visualization tool to evaluate Stage 4. The dimensionality reduction is not a part of our approach.)
Figure 6 reports the projection for two primary needs. The axes correspond to the first two principal components. The red dots are the projections of sentence embeddings that were coded (by analysts) as belonging to the primary customer need: “strong teeth and gums.” The blue crosses are sentence embeddings that were coded as “shopping/product choice.” (Review Table A1 in the appendix.) The ovals represent the smallest ellipses enclosing 90% of the corresponding set. Figure 6 suggests that, while not perfect, the clusters of sentence embeddings achieved separation among primary customer needs and, hence, are likely to reduce redundancy and enable analysts to identify a diverse set of customer needs when they analyze Y sentences, each chosen from one of Y clusters. Sampling diverse sentences likely increases the probability that low-frequency customer needs are contained in a sample of Y sentences.
Figure 6. Projections of 20-Dimensional Embeddings of Sentences onto Two Dimensions (PCA). Dots and Crosses Indicate Analyst-Coded Primary Customer Needs.
5.3. Gains in Efficiency Due to Machine Learning
We seek to determine whether the proposed combination of machine-learning methods improves efficiency of identifying customer needs from UGC. Efficiency is important because the reduced time and costs enable more firms to use advanced VOC methods to identify new product opportunities. Efficiency is also important because it enhances the probability of identifying low-frequency needs given a constraint on the number of sentences that analysts can process.
In our approach, machine learning helps to identify content for review by professional analysts. We compare content selection approaches in terms of the expected number of unique customer needs identified in Y sentences. The baseline method for selecting sentences for review is current practice: a random draw from the corpus. The second method uses the CNN to identify informative sentences, and then randomly samples informative sentences for review. The third method uses the sentence-embedding clusters to reduce redundancy among sentences identified as informative by the CNN. For each method, and for each value of Y, we (1) randomly split the 6,700 preprocessed sentences, which are neither too short nor too long, into 3,700 training and 3,000 hold-out samples, (2) train the CNN using the training sample, and (3) draw Y sentences from the hold-out sample for review. We count the unique needs identified in the Y sentences and repeat the process 10,000 times. An upper bound for the number of customer needs identified in the Y sentences is the number of customer needs contained in 3,000 hold-out sentences; this is fewer customer needs than are contained in the entire corpus.
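
The core of this comparison, the expected number of unique needs found in Y reviewed sentences, can be sketched as a small Monte Carlo simulation. Everything below is a toy under stated assumptions: the corpus is synthetic, the "filtered" pool stands in for a perfect classifier rather than a trained CNN, and the function name is hypothetical.

```python
import random

def expected_unique_needs(pool, y, draws, rng):
    """Average number of distinct customer needs in y sentences drawn from pool.

    pool: list of sets; each set holds the need ids coded for one sentence
          (an empty set means the sentence is not informative).
    """
    total = 0
    for _ in range(draws):
        sample = rng.sample(pool, y)          # sentences drawn without replacement
        total += len(set().union(*sample))    # distinct needs across the sample
    return total / draws

# Synthetic hold-out set: 50 non-informative sentences, 50 spread over 5 needs
corpus = [set() for _ in range(50)] + [{i % 5} for i in range(50)]
informative = [s for s in corpus if s]        # stand-in for the classifier's output

rng = random.Random(1)
baseline = expected_unique_needs(corpus, y=10, draws=200, rng=rng)
filtered = expected_unique_needs(informative, y=10, draws=200, rng=rng)
```

In this toy setting, drawing only from the informative pool finds more distinct needs for the same Y, which is the kind of gain the comparison above is designed to measure.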
From 3,000 sentences in the holdout sample, the largest possible value of Y for which we can evaluate the CNN is the number of sentences that the CNN classified as informative. The number of sentences identified by the CNN as informative varies across iterations, and in our experiment the minimum is 1,790 sentences. While it is tempting to consider Y in the full range from 0 to 1,790, it would be misleading to do so. At Y = 1,790, there would be 1,790 clusters, the same number as if we sampled all available informative sentences. To minimize this saturation effect on the oral-care corpus, we consider Y = {200, 300, …, 1200} to evaluate efficiency.
The blue dashed line in Figure 7 reports benchmark performance. The CNN improves efficiency as indicated by the red dotted line. Using the CNN and clustering sentence embeddings increases efficiency further as indicated by the solid black line. Over the range of Y, there are gains due to using the CNN to eliminate non-informative sentences and additional gains due to using sentence embeddings to reduce redundancy within the corpus.
We also interpret Figure 7 horizontally. The benchmark requires, on average, 824.3 sentences to identify 62.4 customer needs. If we prescreen with machine learning to select non-redundant informative sentences, analysts can identify the same number of customer needs from approximately 700 sentences, or 85% of the sentences required by the baseline. The efficiencies are even greater at 200 sentences (78%) and 400 sentences (79%). At professional billing rates across many categories, this represents substantial time and cost savings and could expand the use of VOC methods in product development. VOC customer-need identification methods have been optimized over almost thirty years of continuous improvement; we expect the machine-learning methods, themselves, to be subject to continuous improvement as they are applied in the field.
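
The 85% figure is simply the ratio of the two sentence counts quoted above; a one-line check (with a hypothetical helper name) makes the horizontal reading explicit.

```python
def share_of_baseline(hybrid_sentences, baseline_sentences):
    """Sentences needed by the hybrid method as a share of the baseline."""
    return hybrid_sentences / baseline_sentences

# Figures quoted in the text: about 700 sentences vs. 824.3 for the benchmark
share = share_of_baseline(700, 824.3)
reduction = 1 - share
```

A share of about 0.85 corresponds to reviewing roughly 15% fewer sentences for the same number of identified customer needs.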
FigureA3intheAppendixprovidescomparableanalysesforlower-frequencyandforhigher-
frequencycustomerneedsusingamediansplittodefinefrequency.Asexpected,efficiencygainsare
greaterforlower-frequencycustomerneeds.FigureA4pushesthecomparisonfurthertotheleast
frequentcustomerneeds(lowest10%)andforthosecustomerneedsuniquetoUGC.Asexpected,
machine-learningefficienciesareevengreaterfortheleast-frequentcustomerneeds.
Figure 7. Efficiencies among Various Methods to Select UGC Sentences for Review
5.4. Scalability of the Machine-Learning Methods
The proposed methods scale well. With a training sample size of 1,000-4,000 sentences, the CNN typically
converges in 20-30 epochs (stochastic gradient descent iterations) and does so in under a minute on a
standard MacBook Pro. We use the fastcluster package implementation of Ward’s clustering
algorithm. The asymptotic worst-case time complexity is $O(N^2)$. In our experiments, clustering of
500,000 informative sentences was completed in under 5 minutes. Once programmed, the methods are
relatively easy to apply as indicated by the applications in §6.
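To make the clustering step concrete, the following Python sketch implements Ward's criterion in its most naive form: at each step it merges the two clusters whose union increases the within-cluster sum of squares the least. It is an illustration only. It is not the fastcluster implementation, it runs in cubic time, and it is suitable only for tiny examples; the function name and the toy points are hypothetical.

```python
def ward_clusters(points, n_clusters):
    """Naive Ward agglomeration down to n_clusters clusters.

    Returns a list of clusters, each a sorted list of point indices.
    """
    # Each cluster is stored as (centroid, size, member indices).
    clusters = [(list(p), 1, [i]) for i, p in enumerate(points)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        (ca, na, _), (cb, nb, _) = a, b
        sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return na * nb / (na + nb) * sq_dist

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = merge_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (ca, na, ma), (cb, nb, mb) = clusters[i], clusters[j]
        # The merged centroid is the size-weighted mean of the two centroids.
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        merged = (centroid, na + nb, ma + mb)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    return [sorted(members) for _, _, members in clusters]

# Toy 2-D points standing in for sentence embeddings.
toy = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(sorted(ward_clusters(toy, 2)))
```

For corpora of realistic size, an optimized library implementation such as the fastcluster package named above is the practical choice.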
5.5. Efficiency Gains in Terms of the Professional Services Costs
Professional services costs dominate the expenses in a typical VOC study. Analysts and managers
estimate that these costs are allocated about 40% to interviewing customers, 40-55% to identifying and
winnowing customer needs from transcripts, and 5-20% to organizing customer needs into a hierarchy
and preparing the final report (§4.1). UGC eliminates the first 40% (§4.2). The proposed machine-learning
hybrid approach allows a 15-22% reduction in the time allocated to identifying and winnowing
customer needs (§5.3). Applying our methods thus eliminates approximately 46%-52% of the overall
professional services costs: the 40% for interviewing plus 6 to 12 percentage points (15-22% of the
40-55% identification share). These are substantial savings to the firm and its clients, which can
facilitate market research for new product development. Furthermore, machine-learning methods
enhance the probability that the lowest-frequency customer needs are identified within a given cost
constraint. The lowest-frequency customer needs may be the customer needs that lead to new product
success.
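The 46%-52% range can be reproduced with a short calculation. The sketch below assumes that the lower bound pairs the smallest identification share with the smallest machine-learning reduction and the upper bound pairs the largest with the largest; that pairing is our reading of the figures, and the function name is hypothetical.

```python
def overall_savings(interview_share, identify_share, ml_reduction):
    """Fraction of total professional-services cost eliminated.

    interview_share: share of cost spent interviewing (removed by using UGC)
    identify_share:  share spent identifying and winnowing customer needs
    ml_reduction:    fractional time reduction from machine learning on that step
    """
    return interview_share + identify_share * ml_reduction

low = overall_savings(0.40, 0.40, 0.15)   # 0.40 + 0.06
high = overall_savings(0.40, 0.55, 0.22)  # 0.40 + 0.121
print(f"{low:.0%} to {high:.0%}")
```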
6. Additional Applications
The proposed human-machine hybrid methods have been applied three more times for product
development. In all cases, the firm identified attractive new product ideas.
Kitchen appliances. During this application, the firm identified 7,000 online product reviews
containing more than 18,000 sentences. The firm wanted to evaluate the efficiency of the machine-learning
method and devoted sufficient resources to manually review 4,000 sentences. Of these,
2,000 sentences were selected randomly from the corpus and 2,000 were selected using machine-learning
methods. The two sets of sentences were merged, processed to identify unique customer
needs (blind to source), and then re-split by source. Ninety-seven (97) customer needs were identified in
the machine-learning corpus and 84 customer needs were identified in the random corpus. While 66
customer needs were in both corpora, more unique customer needs (31) were identified from the
machine-learning corpus than from the random corpus (18). The firm found the combined customer
needs extremely helpful and will continue to use UGC in the future. In particular, insights obtained from
UGC tended to be closer to the customer’s moment of experience. Customers post when the experience
is fresh in their minds. These posts are more likely to describe malfunctions, difficulties in use or repair,
challenges with customer service, or unique surprises. Such customer needs are often among the most
useful customer needs for product development.
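The counts reported for the kitchen-appliance comparison are internally consistent, as the following Python check shows. The function name is hypothetical, and the combined total is derived here from the reported counts rather than stated in the text.

```python
def split_counts(found_a, found_b, found_both):
    """Return (only in A, only in B, total distinct) from overlap counts."""
    only_a = found_a - found_both
    only_b = found_b - found_both
    total = only_a + only_b + found_both
    return only_a, only_b, total

# 97 needs in the machine-learning corpus, 84 in the random corpus, 66 in both.
only_ml, only_random, total = split_counts(97, 84, 66)
print(only_ml, only_random, total)
```

The first two values match the 31 and 18 unique customer needs reported above; the third is the implied number of distinct customer needs across the two corpora.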
Skin treatment. This was a pure application in which the firm identified a relevant set of over
11,000 online reviews, used machine learning to select sentences for review, and then identified
customer needs from the selected sentences. The firm used a follow-up quantitative study to assess the
importance of the customer needs. Important customer needs that were previously unmet by any
competitor provided the basis for the firm to optimize its product portfolio with new product
introductions. The firm feels that it has enhanced its ability to compete successfully in the market for
skin treatment.
Prepared foods. One of the largest prepared-food firms in North America applied machine
learning to analyze a combined corpus of over 500,000 sentences extracted from its social-listening tool
and over 10,000 sentences from product reviews. The social-listening sources included forums, blogs,
micro-blogs, and social media. The product reviews were obtained from five different sources. In this
application, there were synergies between social-listening UGC and product-review UGC, with about
two-thirds of the customer needs coming from one or the other source. By combining the two UGC
corpora, the firm identified more than thirty categories of customer needs to provide valuable insight
for both new product development and marketing communications. As a result, the firm is now applying
the machine-human hybrid method to adjacent categories.
7. Discussion, Summary, and Future Research
We addressed two questions: (1) Can UGC be used to identify abstract customer needs? And (2)
can machine learning enhance the process? The answer to both questions is yes. UGC is at least a
comparable source of customer needs to experiential interviews, and likely a better source. The proposed
machine-learning architecture successfully eliminates non-informative content and reduces redundancy.
In our initial test, machine-learning efficiency gains are 15-22%, but such gains are likely to increase with
more research. Overall gains of analyzing UGC with our approach over the traditional interview-based
VOC are 46-52%.
Answering these questions is significant. Every year thousands of firms rely on voice-of-the-customer
analyses to identify new opportunities for product development, to develop
positioning strategies, and to select attributes for conjoint analysis. Typically, VOC studies, while
valuable, are expensive and time-consuming. Time-to-market savings, such as those made possible with
machine learning applied to UGC, are extremely important to product development. In addition, UGC
seems to contain customer needs not identified in experiential interviews. New customer needs mean
new opportunities for product development and/or new strategic positioning.
While we are enthusiastic about UGC, we recognize that UGC is not a panacea. UGC is readily
available for oral care, but UGC might not be available for every product category. For example, consider
specialized medical devices or specialized equipment for oil exploration. The number of customers for
such products is small and such customers may not blog, tweet, or post reviews. On the other hand,
UGC is extensive for complex products such as automobiles or cellular phones. Machine-learning
efficiencies in such categories may be necessary to make the review of UGC feasible.
Although our research focuses on developing and testing new methods, we are beginning to
affect industry. Further research will enhance our ability to identify abstract context-dependent
customer needs with UGC. For example,
• Deep neural networks and sentence embeddings are active areas of research in the NLP
community. We expect the performance of the proposed architecture to improve significantly
with new developments in machine learning.
• UGC is updated continuously. Firms might develop procedures to monitor UGC continuously.
Sentence embeddings can be particularly valuable. For example, firms might concentrate on
customer needs that are distant from established needs in the 20-dimensional vector space.
• Future developments might automate the final step, or at least enhance the ability of analysts to
abstract customer needs from informative, non-redundant content.
• Other forms of UGC, such as blogs and Twitter feeds, may be examined for customer needs. We
expect blogs and Twitter feeds to contain more non-informative content, which makes machine-learning
filtering even more valuable.
• Self-selection to post UGC is a concern and an opportunity with UGC. For oral care, the
effectiveness of product reviews did not seem to be diminished by self-selection, at least
compared to experiential interviews of a representative set of customers. In other categories,
such as the food category in §6, self-selection and non-representative-sample issues might have
a larger effect. Firms might examine multiple channels for a complete set of customer needs.
• Field experiments might assess whether, and to what degree, abstract context-dependent
customer needs provide more insights for product development than insights obtained from lists
of words.
• Amazon Mechanical Turk is a promising means to replace analysts for labeling training sentences,
but further research is warranted.
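The monitoring idea in the list above, flagging new content that lies far from established needs in the embedding space, might be sketched as follows. This is a speculative illustration, not a procedure tested in this paper. It assumes Euclidean distance as the measure of "distant" and uses toy two-dimensional vectors in place of 20-dimensional sentence embeddings; all names and values are hypothetical.

```python
import math

def novelty_scores(candidates, established):
    """Distance from each candidate embedding to its nearest established need.

    Larger values suggest the sentence is less like any known customer need.
    """
    return [min(math.dist(c, e) for e in established) for c in candidates]

def most_novel(candidates, established, top_k=1):
    """Indices of the top_k candidates farthest from all established needs."""
    scores = novelty_scores(candidates, established)
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]

# Toy embeddings (made up for illustration).
established_needs = [(0.0, 0.0), (1.0, 0.0)]
new_sentences = [(0.1, 0.0), (5.0, 5.0), (1.0, 0.2)]

print(most_novel(new_sentences, established_needs))
```

Sentences ranked highest would be routed to analysts first, keeping the final abstraction of customer needs a human task.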
Appendix
Table A1. Voice of the Customer for Oral Care as Obtained from Experiential Interviews (22 examples of the 86 tertiary customer needs are shown, one for each secondary group. A full list of tertiary customer needs is available from the authors.)

| Primary Group | Secondary Group | # Needs | Examples of Tertiary Customer Needs (22 of 86 shown) |
|---|---|---|---|
| Feel Clean And Fresh (Sensory) | Clean Feeling in My Mouth | 4 | My mouth feels clean |
| Feel Clean And Fresh (Sensory) | Fresh Breath All Day Long | 4 | I wake up without feeling like I have morning breath |
| Feel Clean And Fresh (Sensory) | Pleasant Taste and Texture | 3 | Oral care liquids, gels, pastes, etc. are smooth (not gritty or chalky) |
| Strong Teeth And Gums | Prevent Gingivitis | 5 | Oral care products and procedures that minimize gum bleeding |
| Strong Teeth And Gums | Able to Protect My Teeth | 5 | Oral care products and procedures that prevent cavities |
| Strong Teeth And Gums | Whiter Teeth | 4 | Can avoid discoloration of my teeth |
| Product Efficacy | Effectively Clean Hard to Reach Areas | 3 | Able to easily get all particles, even the tiniest, out from between my teeth |
| Product Efficacy | Gentle Oral Care Products | 4 | Oral care items are gentle and don’t hurt my mouth |
| Product Efficacy | Oral Care Products that Last | 3 | It’s clear when I need to replace an oral care product (e.g. toothbrush, floss) |
| Product Efficacy | Tools are Easy to Maneuver and Manipulate | 6 | Easy to grasp any oral care tool; it won’t slip out of my hand |
| Knowledge And Confidence | Knowledge of Proper Techniques | 5 | I know the right amount of time to spend on each step of my oral care routine |
| Knowledge And Confidence | Long Term Oral Care Health | 4 | I am aware of the best oral care routine for me |
| Knowledge And Confidence | Motivation for Good Check-Ups | 4 | I want to be motivated to be more involved with my oral care |
| Knowledge And Confidence | Able to Differentiate Products | 3 | I know which products to use for any oral care issue I’m trying to address |
| Convenience | Efficient Oral Care Routine (Effective, Hassle-Free and Quick) | 7 | Oral care tasks do not require much physical effort |
| Convenience | Oral Care “Away From the Bathroom” | 5 | The oral care items I carry around are easy to keep clean |
| Shopping / Product Choice | Faith in the Products | 5 | Brands of oral care products that are well known and reliable |
| Shopping / Product Choice | Provides a Good Deal | 2 | I know I’m getting the lowest price for the products I’m buying |
| Shopping / Product Choice | Effective Storage | 1 | Easy to keep extra products on hand (e.g. packaged securely, doesn’t spoil) |
| Shopping / Product Choice | Environmentally Friendly Products | 1 | Environmentally friendly products and packaging |
| Shopping / Product Choice | Easy to Shop for Oral Care Items | 3 | Oral care items I want are available at the store where I shop |
| Shopping / Product Choice | Product Aesthetics | 5 | Products that have a “cool” or interesting look |
Note to Table A1. Each customer need is based on analysts’ fuzzy matching. For example, the customer need of “I want to be motivated to be more involved
with my oral care” is based on fourteen sentences in the UGC, including: “Saves money and time (and motivates me to floss more)...” “This floss was able to do
the impossible: get me to floss every day.” “Makes flossing much more enjoyable err...tolerable…” “…this tool is the lazy person's answer to flossing.”
Figure A1. Demonstration of the Application of the Proposed Machine Learning Hybrid Approach to an Amazon Review
Figure A2. Precision and Recall as a Function of the Size of the Training Sample
(a) Precision (b) Recall
Note to Figure A2. Below 500 sentences, the confidence bounds on recall are large in Figure A2. The effect on the confidence bounds on $F_1$ (Figure 5) is asymmetric. $F_1$ is a compromise between precision and recall. When either precision or recall is low, $F_1$ is low. When recall is extremely high, precision is likely
to be low, hence $F_1$ will also be low. This explains why the lower confidence bound for 500 sentences in Figure 5 is extremely low, but the upper confidence
bound tracks the median well.
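The asymmetry described in this note can be seen numerically. The sketch below assumes that the measure plotted in Figure 5 is the standard F1 score, the harmonic mean of precision and recall; the note itself describes it only as a compromise between the two. The input values are made up for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A low value on either side pulls the score down, however high the other is.
balanced = f1_score(0.5, 0.5)
high_recall_low_precision = f1_score(0.1, 0.9)
high_precision_low_recall = f1_score(0.9, 0.1)
print(balanced, high_recall_low_precision, high_precision_low_recall)
```

Because the harmonic mean is dominated by the smaller of its two inputs, an unusually high recall paired with low precision produces a low score, which is consistent with the wide lower confidence bound at small training samples.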
Table A2. Complete Set of Customer Needs that Were Unique to Either UGC or Experiential Interviews

| Customer Needs Unique to UGC | Customer Needs Unique to Experiential Interviews |
|---|---|
| Easy way to charge toothbrush. | Oral care tools that can be easily used by left-handed people. |
| An oral care product that is quiet. | I am able to tell if I have bad breath. |
| Responsive customer service (e.g., always answers my call or email, doesn't make me wait long for a response). | Advice that is regularly updated so that it is relevant to my current oral care needs; recognizes that needs change as I age. |
| An oral care product that does not affect my sense of taste (e.g. doesn't affect my taste buds). | |
| Oral care that helps me quit smoking. | |
| Easy to store products. | |
| Maintenance and repairs are simple and quick. | |
| Customer service can always resolve my issue. | |
Figure A3. Efficiencies among Various Methods to Select UGC Sentences for Review (Low- and High-Frequency Customer Needs)
Figure A4. Machine Learning Hybrid Can Efficiently Identify the Least Frequent Customer Needs and Customer Needs Unique to UGC