ACRE Text Categorization Platform
Greg Brewster, PhD
Associate Professor, School of Computing
CTO, Vertical Data LLC
CSC 594 – Text Mining and Analytics
October 15, 2015
ACRE is…
The Auto-Categorization and Retrieval Engine:
• A scalable Text Document Labeling system
  • Python (command line) and web2py (web, in development)
• A product (soon) of Vertical Data, LLC
  • Patent pending
• A research project on
  • Hybrid Text classification methods
  • Medical NLP
Unstructured Text
Businesses must monitor and manage a massive inflow of unstructured text.
Documents
Social Media
Big Data
What Everybody Wants…
Unstructured Text => Structured Data
This can be done by Labeling Text
(ties in with Network Technologies – Multi-Protocol Label Switching (MPLS))
What Labels Let You Do
• Filtering
• Data Routing
• Monitor / Notify
• Auto-Categorization
• Clustering
• Prioritization
• Etc.
Label Models
An ACRE Label Model analyzes each text item and selects one or more values from a list or Label Tree (taxonomy), using:
• Natural Language Processing
  • Stemming, Part-of-Speech analysis, Named Entity extraction, synonym substitutions, drop/go lists
• Pattern Extraction / Rules
  • Keyword/pattern matches determine the label value
• Machine Learning
  • Similarity to Trained Word Clouds determines values
Executing 1 Label Model adds 1 column of results (labels) to the Label Table.
Model Execution Results
Each Label Model Adds a Column
[Figure: Survey Results => ACRE => Labeled Survey Results]
Models Executed:
1. Topic
2. Sentiment
3. Before_Room
4. Alarm_Words
Why ACRE?
• Adds Value! => Creates new structured data from existing unstructured data
• Designed from the ground up for text analysis, as opposed to numerical analytics with a text add-on.
• Human-guided analysis
• Intuitive model definition, use and results
• Combined/extended models
• Iterative model improvement
• Fast prototyping
• Scalable
Hierarchical Labeling: Label Value Trees
• Each text item labeled with one or more leaf values
• Hierarchical evaluation during ML Stage
• Tree model for aggregated results visualization
Labeling and Indexing
• Indexing for Search has undergone massive growth
  • 1980/90s – Isolated topic-specific search engines
  • 1990/00s – Spiders index WWW – Google search
  • Today – Continuous indexing by OS
• Labeling is poised for similar growth and ubiquitous deployment
  • Today – Isolated auto-classification within specific applications.
  • Tomorrow – Auto-Labeling of all enterprise data for enhanced search, data filtering, monitoring.
Previous Label Model Examples
• Sentiment Model: possible labels = {Positive, Negative, Neutral}
• Language Model: {English, Spanish, French}
• Retention Model: {Discard, Retain, Discoverable, Legal Hold}
• Emotion Model: {Happy, Satisfied, Enthusiastic, Complaining, Angry, Threatening, Violent}
• Themes Model, Spam Model, Threat Analysis Model.
Model Library
• The Model Library contains tested Label Models.
• Users can copy models and edit for their own use.
• Users can contribute models.

Label Model Library

| Label Name | Description | Label Values |
| --- | --- | --- |
| Retention | Should data be retained and for how long? | Discard, Retain, Retain: 30 days, Retain: 90 days |
| Sentiment | Overall tone of text | Positive, Neutral, Negative |
| Emotion | Identifies presence of words with high emotional content | Angry, Anxious, Sad, Happy, Grateful, Enthusiastic, Threatening, None |
| HashTags | List of hashtags ("#tag") in text | Extracted text with pattern "#text" |
| Affinities | Identifies common user interests | MLB, NBA, NFL, Golf, Cars, Shopping, New York, Cooking, Technology, etc. |
Social Media Labeling
5 Label Models / 4 Stakeholders
ACRE for Web (Nov. 2015)
ACRE with SaaS (2016)
Deploy local or cloud. BigML for Text!
Origin of ACRE
• Dr. Peter Jackson’s work
  • Pioneered legal document classification at Thomson Reuters in the 1990s
  • Eliminated >90% of human effort.
• ACRE Additions:
  • Combined ES-ML modeling (expert-system rules combined with machine learning)
  • ‘Vertical’ Expertise
    • Medical
    • Security
Why is Text Analytics Hard?
• Lots of Domain-Specific details
  • Many models are ad-hoc and not very reusable
  • Hard to generalize techniques
  • Extensive dictionaries and topics lists often required
• Requires 2 experts: Modeler and Subject Expert
  • These two are often far apart in knowledge base
  • Requires a lot of time from both
• Multiple techniques used – often in combination
  • NLP
  • Search / Rules
  • Machine Learning
Combining ES – ML Models
Combined models can reduce effort and yield improved accuracy:
• ES Model trains ML Model
  • Each labeling result from the ES model is used as a training instance for the ML model
• ES Model fails over to ML Model
  • Evaluate each text input with the ES model first, then fail over to the ML model if there is no ES result.
• Iterative improvement: Add ES rules to improve ML model results incrementally
• (**Next steps**) ML Model => Rules
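The two combination modes on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ACRE's implementation: the `RULES` dictionary, the function names, and the toy term-overlap scorer standing in for the ML model are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical ES rules (label -> keyword pattern); not ACRE's rule syntax.
RULES = {
    "Positive": re.compile(r"\b(good|great)\b", re.IGNORECASE),
    "Negative": re.compile(r"\b(terrible|awful)\b", re.IGNORECASE),
}

def tokenize(text):
    """Lowercase word tokens; a stand-in for real NLP processing."""
    return re.findall(r"[a-z']+", text.lower())

def es_label(text):
    """Return the first label whose rule matches, or None if no rule fires."""
    for label, pattern in RULES.items():
        if pattern.search(text):
            return label
    return None

def train_from_rules(texts):
    """ES trains ML: each rule-labeled text becomes a training instance."""
    clouds = {}
    for text in texts:
        label = es_label(text)
        if label is not None:
            clouds.setdefault(label, Counter()).update(tokenize(text))
    return clouds

def ml_label(text, clouds):
    """Toy ML step: pick the label whose trained terms overlap the text most."""
    terms = tokenize(text)
    scores = {label: sum(cloud[t] for t in terms) for label, cloud in clouds.items()}
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None

def combined_label(text, clouds):
    """ES fails over to ML: use the rule result if there is one, else ML."""
    return es_label(text) or ml_label(text, clouds)

clouds = train_from_rules([
    "Great breakfast and a spotless room",
    "Awful smell and a broken shower",
])
for text in ["Good value", "The shower was broken", "Parking lot"]:
    print(text, "->", combined_label(text, clouds))
```

In this sketch "Good value" is labeled by a rule, "The shower was broken" matches no rule and is labeled by the fail-over step, and "Parking lot" stays unlabeled because neither stage has evidence for it.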
ACRE Labeling Process
Pattern Models
• Metadata Extraction by Pattern
• Decision Manager defines “Patterns”, which are:
  • A context pattern containing an extraction pattern in parentheses.
  • If the context pattern is matched, then the extraction pattern match becomes the label value
• Example: Extract social security numbers
  • Pattern: “(\d{3}-\d{2}-\d{4})”
• Example: Extract hashtags
  • Pattern: “(#\w*)”
• Example: Extract word before “Room”
  • Pattern: “(\w*) Room”
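As a rough illustration, the three example patterns can be tried with Python's standard `re` module. The sketch assumes the patterns are ordinary regular expressions and writes the social security pattern with `\d` (digit); the `PATTERNS` dictionary, the `extract` helper, and the sample sentence are hypothetical and are not part of ACRE.

```python
import re

# Slide patterns as Python regular expressions. The parenthesized group is
# the extraction pattern; anything outside it is context only.
# Assumption: the SSN pattern is meant to match digits, so it uses \d.
PATTERNS = {
    "SSN": r"(\d{3}-\d{2}-\d{4})",
    "HashTags": r"(#\w*)",
    "Before_Room": r"(\w*) Room",
}

def extract(pattern, text):
    """Return every value captured by the extraction group."""
    return re.findall(pattern, text)

text = "Met in the Board Room. SSN 123-45-6789 on file. #acre #textmining"
for name, pattern in PATTERNS.items():
    print(name, extract(pattern, text))
```

Note that in the last pattern the word " Room" must be present for a match, but only the word in parentheses is returned as the label value.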
Hashtag Extraction Results
Current ACRE Tools: Rules
• Rules associate text patterns with a label value.
• When text is processed, a rule match assigns the corresponding label value.
Rules Definition File
• Patterns are regular expressions connected by booleans
• If multiple patterns match in a given document, then:
  • If option Multi_Values = True, then all values are retained.
  • If option Multi_Values = False, then the value with the greatest sum of matched rule weights is selected.

| Category | Value | Pattern | Weight |
| --- | --- | --- | --- |
| Sentiment | Positive | Good | 1 |
| Sentiment | Positive | Great | 1.5 |
| Sentiment | Positive | "Not bad" | 0.4 |
| Sentiment | Positive | Gr[eio].* | 1 |
| Sentiment | Positive | Best & Good | 1.7 |
| Sentiment | Negative | Terrible | 2 |
| Sentiment | Negative | Awful | 1.5 |
| Sentiment | Negative | unhappy | 1 |
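The weighted selection described on this slide could be sketched as follows. The in-memory rule list mirrors the Sentiment rules shown; representing "Best & Good" as two regexes that must both match, matching case-insensitively, and the function names `score` and `label` are assumptions made for illustration, not ACRE's actual file parser or matcher.

```python
import re
from collections import defaultdict

# (value, patterns that must ALL match, weight). The AND-list encoding of
# "Best & Good" is a hypothetical representation of the boolean connector.
RULES = [
    ("Positive", ["Good"], 1.0),
    ("Positive", ["Great"], 1.5),
    ("Positive", ["Not bad"], 0.4),
    ("Positive", ["Gr[eio].*"], 1.0),
    ("Positive", ["Best", "Good"], 1.7),
    ("Negative", ["Terrible"], 2.0),
    ("Negative", ["Awful"], 1.5),
    ("Negative", ["unhappy"], 1.0),
]

def score(text):
    """Sum the weights of all matched rules, per label value."""
    totals = defaultdict(float)
    for value, patterns, weight in RULES:
        # Assumption: matching ignores case.
        if all(re.search(p, text, re.IGNORECASE) for p in patterns):
            totals[value] += weight
    return dict(totals)

def label(text, multi_values=False):
    """Multi_Values=True keeps every matched value; False keeps the heaviest."""
    totals = score(text)
    if not totals:
        return []
    if multi_values:
        return sorted(totals)
    return [max(totals, key=totals.get)]

review = "Good location but terrible service"
print(score(review))
print(label(review))
print(label(review, multi_values=True))
```

For the sample review, "Good" contributes weight 1 to Positive and "Terrible" contributes weight 2 to Negative, so the single-value mode selects Negative while the multi-value mode keeps both.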
Results Report (CSV)
ACRE Machine Learning
• Current: Nearest Word Cloud Machine Learning
  • Model is “trained” on example text for each label.
  • ACRE applies NLP Processing to each text item to isolate the best terms to be used in analysis
  • Model stores “Trained Word Cloud” (aggregated term frequencies calculated from training) for each label value.
  • Combined model option: Train ML model using Rule matches
  • To choose label for new text, ACRE calculates Confidence value for each possible label, based on vector similarity between new text cloud and Trained Word Cloud.
• Next: ML vector can be constructed from any data fields (structured or unstructured)
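A minimal sketch of the nearest-word-cloud idea, assuming plain term counts as the "word cloud" and cosine similarity as the Confidence value. The tokenizer, the `threshold` default, the "Neutral" fallback, and all function names are hypothetical; ACRE's actual term selection and scoring may differ.

```python
import math
import re
from collections import Counter

def terms(text):
    """Lowercase word tokens (stand-in for ACRE's NLP processing)."""
    return re.findall(r"[a-z']+", text.lower())

def train(examples):
    """Aggregate term frequencies per label: one trained word cloud each."""
    clouds = {}
    for label, text in examples:
        clouds.setdefault(label, Counter()).update(terms(text))
    return clouds

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(text, clouds, threshold=0.1, default="Neutral"):
    """Return (label, confidences); fall back to default below the threshold."""
    if not clouds:
        return default, {}
    cloud = Counter(terms(text))
    confidence = {label: cosine(cloud, trained) for label, trained in clouds.items()}
    best = max(confidence, key=confidence.get)
    return (best if confidence[best] >= threshold else default), confidence

clouds = train([
    ("Positive", "clean room friendly staff"),
    ("Positive", "friendly service lovely view"),
    ("Negative", "dirty room rude staff"),
    ("Negative", "rude service broken shower"),
])
print(classify("friendly staff and lovely view", clouds)[0])
print(classify("parking garage", clouds)[0])
```

The second call shares no terms with either trained cloud, so every confidence is zero and the fallback label is returned.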
Categorizing “Sentiment” using ACRE ML Algorithm
Compare New Document Word Cloud to the Trained Word Clouds for Positive and Negative. Pick best match, or Neutral if no good match.
NLP Processing for ML Vocabulary
To eliminate noise and increase accuracy, it is essential to minimize the “vocabulary” of words included in ML analysis.
ACRE’s NLP Processing does this:
• *Stemming groups different word forms (run, running, ran)
• Part-of-Speech (POS) tags word use (noun, verb, adjective, ..)
• Named Entity (NE) tags names of people, places, institutions, ..
• Synonym processing groups words with identical meaning
• Filtering eliminates data records that match filter rules
• *Drop List specifies words, phrases, stems, POS tags or NE tags that should be eliminated from analysis
• Go List specifies words, phrases, stems, POS tags or NE tags that must be included in analysis
* Implemented now
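To make the vocabulary-reduction idea concrete, here is a small sketch of stemming followed by Drop List and Go List filtering. Everything in it is an assumption for illustration: the lookup table `STEMS` stands in for a real stemmer or lemmatizer, the list contents are invented, and letting the Go List override the Drop List is one possible reading of "must be included".

```python
import re

# Hypothetical stem table; a real system would use a stemmer or lemmatizer.
STEMS = {"running": "run", "ran": "run", "runs": "run", "rooms": "room"}
# Hypothetical Drop List (terms excluded from analysis).
DROP_LIST = {"the", "a", "was", "and", "to", "not"}
# Hypothetical Go List (terms always kept, even if also on the Drop List).
GO_LIST = {"not"}

def vocabulary(text, drop=DROP_LIST, go=GO_LIST, stems=STEMS):
    """Tokenize, map each word to its stem, then apply Go and Drop Lists."""
    kept = []
    for word in re.findall(r"[a-z']+", text.lower()):
        term = stems.get(word, word)
        if term in go or term not in drop:
            kept.append(term)
    return kept

print(vocabulary("The guest was not running to the rooms"))
```

Here "running" and "rooms" collapse to their stems, common function words are dropped, and "not" survives because the Go List keeps it, which matters for a model such as Sentiment.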
Tuning Parameters
Options for tuning the ML algorithm: (**More research here**)
• Drop List options
• Term Weights
  • IDF (Inverse Document Frequency) options
• Confidence Threshold
• Max Term caps
• Similarity Measures
  • Cosine Similarity, MSE, Absolute error
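The three similarity measures named above can be compared on a pair of term-frequency vectors. This is a generic sketch of the standard formulas, with hypothetical function names; how ACRE converts an error measure into a Confidence value is not stated in the slides and is not modeled here.

```python
import math

def aligned(a, b):
    """Express two term-frequency dicts as vectors over their combined vocabulary."""
    vocab = sorted(set(a) | set(b))
    return [a.get(t, 0.0) for t in vocab], [b.get(t, 0.0) for t in vocab]

def cosine_similarity(a, b):
    """1.0 means same direction; insensitive to overall vector length."""
    x, y = aligned(a, b)
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0

def mean_squared_error(a, b):
    """0.0 means identical vectors; penalizes large differences heavily."""
    x, y = aligned(a, b)
    return sum((p - q) ** 2 for p, q in zip(x, y)) / len(x) if x else 0.0

def mean_absolute_error(a, b):
    """0.0 means identical vectors; penalizes differences linearly."""
    x, y = aligned(a, b)
    return sum(abs(p - q) for p, q in zip(x, y)) / len(x) if x else 0.0

new_cloud = {"clean": 1, "room": 1}
trained_cloud = {"clean": 2, "room": 2}
print(cosine_similarity(new_cloud, trained_cloud))
print(mean_squared_error(new_cloud, trained_cloud))
print(mean_absolute_error(new_cloud, trained_cloud))
```

The example shows why the choice matters: the two clouds have the same proportions, so cosine similarity treats them as a perfect match, while both error measures report a difference because the raw counts differ.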
Matching with Search and “Find Similar”
Search finds all data records for search query, filtered by Labels, POS, NE results.
“Find Similar” uses ML Confidence results to match similar data records.
Step 1: Create new ML model and train it on a single reference data record.
Step 2: Execute the ML model on other data records, generating Confidence factor for each one, measuring its similarity to the reference.
Step 3: Sort results by Confidence to group closest matches together.
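The three "Find Similar" steps can be sketched as below, assuming cosine similarity over raw term counts as the Confidence factor. The helper names and the sample records are hypothetical and are not taken from the tweet dataset used in the talk.

```python
import math
import re
from collections import Counter

def cloud(text):
    """Term-frequency cloud for one data record."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def confidence(a, b):
    """Cosine similarity between two clouds, used here as the Confidence factor."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def find_similar(reference, records):
    # Step 1: "train" on the single reference record.
    ref_cloud = cloud(reference)
    # Step 2: score every other record against the reference.
    scored = [(confidence(ref_cloud, cloud(r)), r) for r in records]
    # Step 3: sort by Confidence so the closest matches come first.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

records = [
    "Markets close higher today",
    "Coast guard rescues crew after storm",
    "Storm warning for the coast",
]
ranked = find_similar("Storm hits the coast tonight", records)
for score, record in ranked:
    print(round(score, 3), record)
```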
Example – Find Similar
• Dataset: 5000 CCN-International tweets.
• Find Similar results for first tweet listed.
Example: Language Labeling
• Dataset: 3625 survey responses from CFI Group.
• Objective: Determine language of each response.
• Results: For each language: Number of responses, list of responses, word cloud.
• Approach:
  • First, Rules-only analysis using common words in each language
  • Second, combined model where Rules train ML model and then ES fails over to ML.
Example: Language Labeling
• Define Language Label Model
  • Language = {English, Spanish, French}
  • Multi_Values = False (1 language per response)
• Create simple Rules:
  • Spanish: (todo OR perfecto OR siempre OR luces OR excelente)
  • French: (troit OR avec OR vendeuse OR normaux OR ajouter)
  • English: (clothes OR store OR best OR service OR jeans OR price OR sale)
• Experiment #1: Rules-only results:
  • English: 1237 items
  • Spanish: 38 items
  • French: 9 items
  • Unlabeled: 2346 items
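A rules-only pass like Experiment #1 might look like the sketch below. The keyword lists are the ones from the slide; counting keyword hits with equal weights, the `label_language` helper, and the sample responses are assumptions for illustration (the real survey data is not reproduced here).

```python
import re
from collections import Counter

# Keyword rules from the slide: a response matches a language if it
# contains any of that language's words (OR semantics).
LANGUAGE_RULES = {
    "Spanish": ["todo", "perfecto", "siempre", "luces", "excelente"],
    "French": ["troit", "avec", "vendeuse", "normaux", "ajouter"],
    "English": ["clothes", "store", "best", "service", "jeans", "price", "sale"],
}

def label_language(text):
    """Multi_Values = False: keep the language with the most keyword hits."""
    words = set(re.findall(r"\w+", text.lower()))
    hits = {lang: len(words & set(keywords)) for lang, keywords in LANGUAGE_RULES.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else "Unlabeled"

# Hypothetical responses, invented for this example.
responses = [
    "Best store for jeans",
    "Todo perfecto, excelente",
    "Vendeuse très aimable avec moi",
    "Nice staff",
]
counts = Counter(label_language(r) for r in responses)
print(counts)
```

As in the experiment, any response that contains none of the listed words ends up Unlabeled, which is the gap the combined ES-ML model is meant to close.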
Example: Language Labeling
Experiment #2: Combined model where Rules train ML model and ES fails-over to ML.
Results:
• English: 3540 items
• Spanish: 76 items
• French: 9 items
• Unlabeled: 5 items (below Threshold)
Combined ES-ML analysis correctly labels Language for nearly every survey response.
(**Research**) What other labeling decisions are well-suited for combined analysis?
Software History
• May, 2013: Vertical Data, LLC, incorporated
• October, 2014: ACRE v1.2 release
  • Categorization search engine on a Windows/SQL 2014 Server .Net platform – suspended.
• July 6, 2015: ACRE v1.3 deployed!
  • Command-line categorization models using Python/NLTK.
• November, 2015: ACRE for Web
  • Web2py deployment of ACRE
• 2016: ACRE 2.0
  • Full GUI, visualizations, SaaS interface.
ACRE v1.3 – Category Model Contents
ACRE Label Model (.alm) File contents
ACRE Model Execution Example
ACRE v1.3 Example – Label Table Results
ACRE v1.3 Example – Summary Results (per model)
[Chart legend: Room, Bath, Location, Service, Other]
ACRE v1.3: Term Frequency Results
Term frequency table and Term Word Cloud for
every node in the Category Tree.
ACRE v1.3 Model Execution Modes
• Rules-Only Execution:
  • Select labels based on rule pattern matches
  • Deterministic, auditable, reproducible.
• Classic ML Execution:
  • Train using exemplars for each value
  • Select labels based on Nearest Word Cloud (NWC)
• Rules with ML Fill-in:
  • Train using rule matches (labeling patterns)
  • Select labels based on rule pattern matches; if there is no rule match, select based on NWC.
• ML with Seeding Patterns:
  • Train using rule matches (seed patterns)
  • Select labels based on Nearest Word Cloud (NWC)
ACRE v1.3 for You!
I can provide Prof. Tomuro with a Python distribution of ACRE v1.3 for CSC 594 students to try out by next week (10/20/2015) if you are interested.
User documentation: http://vertical-data.com
• Click on PRODUCTS
• User’s Guide, Command Reference, Options Reference available.
ACRE v2.0 Wire Frames
ACRE v2.0 – Word Cloud Viewer
Reviewing Word Clouds
Reviewing Word Clouds
Reviewing Word Clouds
Graphing Results
Graphing Results
Ongoing Trials
Vertical Data has 2 product trials planned once the Web version is released in November:
• Optimization Group for Customer Survey data
• Mt. Sinai hospital / HealthIX research study of radiology electronic medical records:
  • Extracting Findings and SNOMED diagnosis codes from doctor’s comments.
  • Literature: 2007 Medical NLP Challenge results on extracting ICD9 codes from radiology reports.
Conclusions
ACRE provides a set of useful tools for text analytics, categorization and search.
I can provide a distribution of ACRE v1.3 (command line) for CSC 594 use by 10/20.
Contact me ([email protected]) if you would like to work on:
• Research on combined ES/ML model performance
• Research on Medical NLP.