Metadata Mining
Masoud Makrehchi
Supervisor: Prof. Mohamed Kamel
May 2004
Outlines
• Metadata Mining
• Proposed Approach
• Experimental Results
• Future Research
Metadata
• Metadata definition: data about data, for example a library catalogue
• Metadata applications:
  – cataloging (items and collections)
  – resource discovery
  – e-commerce and digital signatures
  – intelligent software agents
  – content rating
  – intellectual property rights (IP rights)
  – Semantic Web
  – Learning Objects (LO): LOM standards such as IEEE LOM, DC, SCORM, CANCORE
Metadata Mining
Metadata
[Figure: examples of metadata: map legend, LO metadata, library catalogue, RDF]
Metadata Specifications
[Figure: conceptual data architecture, with the labels "Content-based Data Mining" and "Context-based Data Mining"]
Metadata Specifications
• More structured (usually semi-structured)
• Low dimensional (~400,000 vs. 15,791 terms)
• More homogeneous than raw data
• Less noisy (stopwords)
• Less time-varying
Motivations
• No access to the raw data
• The raw data is inconvenient for mining (heterogeneous formats and non-text formats)
• Diversity of metadata standards and the need to merge different metadata repositories
• An attempt to use background knowledge in the mining process
Applications
• An alternative or a complementary approach, depending on whether the raw data is accessible
• Metadata enrichment (keyword extraction and automatic title generation), a step towards automatic metadata generation
• Semi-structured data mining, such as XML, RDF, OWL, and other semantically tagged scripts
• Semi-automatic ontology extraction
• Information integration based on metadata (LO aggregation and integration)
Outlines
• Metadata Mining
• Proposed Approach
• Experimental Results
• Future Research
Assumptions
• There is a corpus or a metadata collection, usually labeled
• No natural language processing (NLP) and no computational linguistics
• No semantic analysis or processing
• Metadata is treated as a text document: no graph, tree, or other data structure
• No dictionaries, thesauri, or other global vocabularies
• Only frequency-based measures are used
Proposed Approach
Problem Statement
• Mining metadata as a text document using classical document mining techniques, while taking into account the background knowledge, which is represented by the proposed data model, and using a fuzzy approach to capture expert knowledge
Metadata Representation
• Metadata as a text document (semi-structured format)
• Using only statistical measures
  – frequency-based measures (the location and the order of words don’t matter)
Metadata Representation
[Figure: title’s objective semantic]
[Figure: whole document’s semantic about the title]
[Figure: title’s subjective semantic and title’s objective semantic]
Metadata Representation
• The meaning of a metadata record is broken into a set of objective semantics, which can be easily extracted using a parser (to extract the contents and attributes of every tag)
• The document vector space model is adopted (so that all classical document mining techniques can be used)
Metadata representation model = a set of objective semantics + Salton’s vector space model = multiple-partition document vector space model
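The parsing step can be illustrated with a minimal sketch. It assumes the metadata record is XML and that each top-level tag is one partition. The record, the tag names, and the function `partition_record` are hypothetical and not taken from the slides, and tag attributes are ignored for brevity.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical metadata record; the tag names are illustrative only.
RECORD = """
<record>
  <title>Fuzzy term weighting</title>
  <author>A. Author</author>
  <abstract>Term weighting with fuzzy sets</abstract>
</record>
"""

def partition_record(xml_text):
    """Map each top-level tag to a bag of lower-cased words (one partition per tag)."""
    root = ET.fromstring(xml_text.strip())
    partitions = {}
    for child in root:
        tokens = (child.text or "").lower().split()
        partitions.setdefault(child.tag, Counter()).update(tokens)
    return partitions

partitions = partition_record(RECORD)
```

Each partition is an ordinary bag of words, so classical vector space techniques can be applied to every partition separately.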
Metadata Representation
[Figure: background knowledge. Labels: expert; concepts about the domain (discourse); document; subjective background knowledge; objective background knowledge (title, abstract, subject, author)]
Metadata Representation
Metadata as a text document
• The mining process is translated into classical document mining.
• Metadata mining has no advantage, because the background knowledge is ignored.
Metadata Representation
Metadata as a multiple-partition text document
• The mining process is translated into classical document mining.
• The background knowledge is taken into account.
Metadata Representation
• Vector space model: fuzzy representation
[Figure: document d_i represented over the vocabulary]
Metadata Representation
• Multi-partition vector space model
[Figure: document d_i represented over the vocabulary]
To model the objective semantics (a portion of the background knowledge)
Metadata Representation
• Converting to standard vector space model
Metadata Representation
• Weight of each partition
  – To be determined by an expert, for example:
    • W_abstract = 1.0
    • W_title = 1.5
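A minimal sketch of converting a multi-partition document to a standard vector follows. The slides do not give the combination rule, so a weighted sum of per-partition term counts is assumed here. The function name `to_standard_vector` and the sample document are hypothetical; only the two weights come from the slide.

```python
from collections import Counter

# Expert-assigned partition weights (the two example values from the slide).
PARTITION_WEIGHTS = {"abstract": 1.0, "title": 1.5}

def to_standard_vector(partitions, weights, default_weight=1.0):
    """Collapse per-partition term counts into one weighted term vector.

    Assumption: partitions are combined by a weighted sum of counts.
    """
    combined = Counter()
    for name, counts in partitions.items():
        weight = weights.get(name, default_weight)
        for term, count in counts.items():
            combined[term] += weight * count
    return dict(combined)

# Hypothetical document with two partitions.
doc = {
    "title": Counter({"metadata": 1, "mining": 1}),
    "abstract": Counter({"metadata": 2, "fuzzy": 1}),
}
vector = to_standard_vector(doc, PARTITION_WEIGHTS)
```

With these weights, a term that appears in the title counts 1.5 times as much as the same term in the abstract.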
Metadata Representation
• Membership degree of each term in every partition
  – By expert (considering the vocabulary)
  – Statistical (considering the collection)
    • Absolute frequency-based measures (like tf-idf)
    • Relative frequency-based (geometric) measures (location of each term in the partition)
Frequency Measures
• Types of frequency measures
  – Within partition
  – Within document: by document-term frequency (like tf)
  – Within class: by class-term frequency (like term significance)
  – Within collection: by collection-term frequency (like the mean of term significances)
Class-Term Matrix
• Document-term matrix
  – The matrix is very large (thousands of documents in the collection and millions of terms in the vocabulary).
  – The matrix is sparse: usually only a small number of its elements are nonzero (Zipf's law).
  – The matrix is dual with respect to terms and documents.
Class-Term Matrix
• Class-term matrix
  – The matrix is large (tens of classes and millions of terms in the vocabulary).
  – The matrix is less sparse.
  – The matrix is dual with respect to terms and classes.
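One way to picture the step from a document-term matrix to a class-term matrix is sketched below. It assumes the class-term entry is the sum of the term counts over all documents labeled with that class; the slides may use a different aggregation. The function name and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def class_term_matrix(doc_term_counts, labels):
    """Sum document-term counts over the documents of each class."""
    matrix = defaultdict(Counter)
    for counts, label in zip(doc_term_counts, labels):
        matrix[label].update(counts)
    return {label: dict(row) for label, row in matrix.items()}

# Toy collection: three documents, two classes (illustrative values only).
docs = [
    Counter({"query": 2, "index": 1}),
    Counter({"query": 1, "learn": 1}),
    Counter({"learn": 3, "agent": 1}),
]
labels = ["DB", "DB", "AI"]
class_term = class_term_matrix(docs, labels)
```

Because many documents are merged into each row, the result has far fewer rows and fewer zero entries than the document-term matrix.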
Class-Term Matrix
Significance factor: represents the relationship between a term and a class
Concept Terminology
Terminology
• All terms which occur in a class (or concept)
• A fuzzy set of all terms in the vocabulary
Term Definition
Definition
• All concepts (classes) to which the term belongs
• A fuzzy set of all concepts (classes)
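The duality between a concept's terminology and a term's definition can be sketched as reading a class-term matrix by row and by column. The membership function on the slides is not recoverable from this transcript, so the sketch assumes max-normalized counts. All names and values are hypothetical.

```python
def normalize(counts):
    """Scale counts into [0, 1] by dividing by the largest count (an assumption)."""
    peak = max(counts.values(), default=0)
    return {key: (value / peak if peak else 0.0) for key, value in counts.items()}

def terminology(class_term, cls):
    """Fuzzy set over terms for one class: a row of the class-term matrix."""
    return normalize(class_term[cls])

def definition(class_term, term):
    """Fuzzy set over classes for one term: a column of the class-term matrix."""
    return normalize({cls: row.get(term, 0) for cls, row in class_term.items()})

# Toy class-term matrix (illustrative values only).
class_term = {
    "DB": {"query": 3, "index": 1, "learn": 1},
    "AI": {"learn": 3, "agent": 1},
}
db_terminology = terminology(class_term, "DB")
learn_definition = definition(class_term, "learn")
```

Here "learn" belongs fully to the AI class and only partially to the DB class, which is the kind of graded membership a fuzzy definition is meant to capture.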
Term Relationships
Fuzzy Similarity
For example, “sum” and “product” are similar if they partially (fuzzily) co-occur.
Similarity is a bidirectional relationship.
Fuzzy Inclusion
For example, “web” includes “world” if “world” occurs wherever “web” appears.
Inclusion is a one-directional relationship.
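The formulas behind fuzzy similarity and fuzzy inclusion are not preserved in this transcript. The sketch below substitutes two common fuzzy-set measures as an assumption: sum of minima over sum of maxima for similarity, and sum of minima over the first set's total membership for inclusion. The function names and membership values are hypothetical.

```python
def fuzzy_similarity(a, b):
    """Symmetric overlap of two fuzzy sets: sum of minima over sum of maxima."""
    keys = set(a) | set(b)
    overlap = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    union = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return overlap / union if union else 0.0

def fuzzy_inclusion(a, b):
    """Degree to which b is present wherever a is present (not symmetric)."""
    total = sum(a.values())
    overlap = sum(min(degree, b.get(k, 0.0)) for k, degree in a.items())
    return overlap / total if total else 0.0

# Hypothetical fuzzy term definitions (membership of each term in each class).
web = {"Networking": 0.8, "DB": 0.2}
world = {"Networking": 0.9, "DB": 0.4, "AI": 0.5}

similarity = fuzzy_similarity(web, world)
web_includes_world = fuzzy_inclusion(web, world)
world_includes_web = fuzzy_inclusion(world, web)
```

In this toy data, "world" is present wherever "web" is, so "web" includes "world" to a high degree, while the reverse inclusion is weaker. Similarity gives the same value in both directions.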
Fuzzy Document Model
• Fuzzy document representation
  – a fuzzy set of all terms in the vocabulary
  – obtaining the keyword set for the document: either a threshold on the fuzzy set or term categorization
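The threshold option can be sketched as a cut on the fuzzy set. The threshold value, the function name, and the membership degrees below are hypothetical.

```python
def keyword_set(fuzzy_document, threshold=0.5):
    """Return the terms whose membership degree reaches the threshold."""
    return {term for term, degree in fuzzy_document.items() if degree >= threshold}

# Hypothetical fuzzy document: membership degree of each term.
fuzzy_document = {"metadata": 0.9, "fuzzy": 0.6, "approach": 0.2}
keywords = keyword_set(fuzzy_document)
```

Raising the threshold gives a smaller, stricter keyword set; lowering it admits weaker terms.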
Term Categories
• Term categorization: categorizing all terms into three main groups
  – Features: most frequent terms within a class
  – Keywords: more frequent terms within some documents belonging to a given class
  – Stopwords: more frequent terms in all classes
[Figure: term categories. Labels: document, collection, class, features, keywords, stopwords]
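A rough sketch of term categorization from a class-term matrix follows. The slides do not state the decision rules, so several things are assumed: within-collection frequency is the mean of a term's relative frequencies across classes, the two cut-offs are arbitrary, and every term that is neither a stopword nor a feature is labeled a keyword. Document-level frequencies are left out for brevity.

```python
def relative_frequencies(counts):
    """Turn raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}

def categorize_terms(class_term, cls, class_cut=0.2, collection_cut=0.2):
    """Label each term of one class as stopword, feature, or keyword.

    Both cut-offs are hypothetical; the slides give no values.
    """
    freqs = {c: relative_frequencies(row) for c, row in class_term.items()}
    labels = {}
    for term, f_class in freqs[cls].items():
        # Assumed within-collection frequency: mean over all classes.
        f_collection = sum(f.get(term, 0.0) for f in freqs.values()) / len(freqs)
        in_all_classes = all(term in f for f in freqs.values())
        if in_all_classes and f_collection >= collection_cut:
            labels[term] = "stopword"
        elif f_class >= class_cut:
            labels[term] = "feature"
        else:
            labels[term] = "keyword"
    return labels

# Toy class-term matrix (illustrative values only).
class_term = {
    "DB": {"system": 4, "query": 4, "index": 1, "schema": 1},
    "AI": {"system": 4, "learn": 4, "agent": 1, "plan": 1},
}
labels = categorize_terms(class_term, "DB")
```

In the toy data, "system" is frequent in every class and is flagged as a domain-specific stopword, while "query" is frequent only in the DB class and becomes a feature.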
Domain-specific Stopwords
• Stopwords: terms that do not contribute to the meaning of the document
  – General stopwords (a, an, the, in, …)
  – Domain-specific stopwords
    • Politics: government, state
    • Medicine: patient
    • Education: learner, instructor
    • Social sciences: society
    • Anthropology: human
Class-Collection Map
Introducing the class-collection map to visualize the location of each category
[Figure: class-collection map showing the regions of features, keywords, and stopwords; x-axis: within-collection frequency, y-axis: within-class frequency]
Contributions
• A model for metadata representation: the multiple-partition document vector space model
• A class-term model, representing the relationship between classes and the vocabulary
• New fuzzy representations for documents, terms, and concepts (word definition and terminology)
• A class-collection map to visualize the distribution of terms
• Representing the keyword set of a document by a fuzzy model
• New definitions for fuzzy similarity, fuzzy inclusion, and fuzzy term-relation based on the fuzzy term definition
• A framework for extracting domain-specific stopwords
Outlines
• Problem Statement: Metadata Mining
• Proposed Approach
• Experimental Results
• Future Research
Data Set
• CiteSeer computer science directory (http://citeseer.ist.psu.edu/directory.html)
• ~400,000 terms (vocabulary size)
• 17 classes
• 2,912 documents
• Instead of the raw data (in PDF or PS), we collected BibTeX data (a kind of metadata or catalogue) and the abstracts of the articles.
Preliminary Results
Stopwords
[Figure: class-collection map for the AI class. Stemmed vocabulary terms are plotted by within-collection frequency (x-axis, 0 to 0.06) and within-class frequency (y-axis, 0 to 0.18). The plot also labels unstemmed words such as "might", "only", "its", "both", "either", "therefore", "though", "although", "however", "together", "rather", "still", "thus", "among", "must", and "already".]
Stopwords
[Figure: class-collection map for the DB class. Stemmed vocabulary terms are plotted by within-collection frequency (x-axis, 0 to 0.06) and within-class frequency (y-axis, 0 to 0.7). The plot also labels unstemmed words such as "only", "its", "both", "either", "therefore", "though", "although", "however", "together", "rather", "still", "thus", "among", and "must".]
[Figure: scatter plot of stemmed vocabulary terms; x-axis ticks from 0 to 0.06, y-axis ticks from 0 to 0.16]
nation naturnavig
ncube
necessarinecessarilinectar
need
needlessli
negneithernestnetcash
netw ork
neural
new
new linext
nicenightnii nodenomin
nonblocknonprehensilnonuniformnotablnote notion
novel
noveltinsnumer
obj
object
observobtainobviouoccur
offer
oftenoldolder
olymp
onoodb
oper
oppos
optic
optim
orca
order
organorganis orientoriginorthogon
os
outperform
overalovercomoverhead
overlapow n packetpagepalmpam
parallel
parametparctabpariti
partpartialparticip
particular
partit
pass
passiv
pastpathpathw ai
pattern
paymentpcpencilperceivperceptperfect
perform
periodpermitperpag personpgaphilosophi
physic
pin placeplacementplai planplatformplayer point
pointerpolicipolit
polygon
poorli
popularportabl
portion
positpossess
possiblpost
potenti
pow erpractic
pre precispredict
preemptibl
preferpreferenti
prefetchprefix
preliminariprepar
present
prevent
previou
primariprimarili
primit
princip
principl
privatekeiprobabilist
probe
problemproceprocedur process
processor
produc
productprofessorprofil
prof ilem
program
programm
progress projectproliferpromisproperti
propos
protect
protocol
prove
provid
providingaprovidingmodularprovis
proxi
pseudopublic
pure
purpospvm
qo qualitiquantif iquantit queri
questionqueue
quickquicklirabinraid
ramifrandi random
rang
rapidrapidlirare
rasp
rate
ratiorawrc rereachabl
reactiv
read
reader
real
realist
realitirealtim reason recent
receptrecord
recordlist
recoveri
recoveryrecord
recurrrecursredistribut
reduc
reductrefer
refinreflectreformulregardregim
regionregistregularrel relatrelax
releas
relireliablreliancremainremaind
remedi
remotreplic
reportrepres
represent
requestrequir
requisitrerout researchresili
resourc
respectrespons
restrict
result
retailretessel
retriev
returnreusabl
reviewrigidrigorriserithmrobinson
robotrobust
rolerout
routerrsa rule
run
runtim
safesameh
samplsandboxsatisf i
save
scalabl
scale
scanscarcscatterscenario
sceneschedul schemescheurichscienc
scientif
scientist
scout
searchsec secondsecrecisecretsection
secur
seek semantsemisendsepar
sequenc
sequentiseri
serial
serv
serverservic
session
setsever
sgishamir
share
shepherdshiftshort show
show nsidesignsignatur
signif ic
signif icantlisimilarsimilarli simplsimpler
simpli
simplicsimplif i simulsimultan
sinc
singl
site
situat
sizeslidesliderslight
slow
small
smoothsnoopiso
social
softw ar
solut
solv
sometimsomew hat
sonhetimsoon
sophist sourcspacespars specialspecialpurpos
specif
specif ispectralspectrum
speed
spispinsplicespuriousquid
stabilstall standardstanford
start statestatechartstatemstaticstatist
steerstem step
stevenstm
storestrategi
stream
strengthstress
strictstrike
stripe
strongstrongli
structurstudi
stylesubjectsubmitsubsequsubsumpt
subsystemsuccesssuccessfullisuffer suff ici
suit
summer
sun
suno
supercomput
suppli
support
supportingonlisurfacsurvei
svcsvrsw
sw itch
symbol
symmetrsympo
synchronsynergi
syntax
synthesisynthetsystemattabl
tacctackltailor take
tallitampertarget task
tcatcp
techniqu
technologtelecommun
temportendenctensor term
terminolog
test
textualtexturthank
themselvtheorem
theoret
theori
thesi
thingthirteenth
thoroughthoughtthousand
thread
threatthroughout
throughput
ti
tiger
time
tmtmctodaitoken
toler
tooltopologtow ard
tow n
trace
trade
tradeoff
tradit
tradition traff ic
transacttransendtransfer
transformtransitransistor translattransmisstranspartransport
treetritriangltriangultrytune
turn
type
typechecktypic
ubiquitouscomputucuhligunabl
underli
understand
unexpectunfortununif i
uniform
uniformli
uninterpretuniqu
universunixunknow nunnecessariunobtrus
unoccludunpredictunregistunresolvunsecur
untrustuntypupdatupperusag
userusual
utilvalid valuvaluablvari variablvarieti
variouvectorvendor
verif
verif iversionversuvi via videoviewview erview pointvigor
virtual
virtualiz
visibl
visionvisit visualvlsivote
voter
w aiw aitw arehousw astw aveletw eak
w ebw fw gw henevw hite
w idew idespreadw indoww ire
w iredvil
w ord w orkw orkload
w orkstat
w orldw ormholw orst
w rapw rite
w riterw ritervers
w ritten
w row t
w w w
year
zebra
zero might
only
its
both
other
either
thereforebecome threethough
although
how everw ay w ellunderaround
othersznontogether
ratherstillthus among
must
becomesletssay
onceitself
HW
within-collection frequency
with
in-c
lass
fre
quen
cy
Preliminary Results
Stopwords
[Figure: scatter plot of word stems, panel MI. Horizontal axis: within-collection frequency (0 to 0.06). Vertical axis: within-class frequency (0 to 0.45). Each point is labeled with its stem. Unstemmed function words also labeled in the plot: might, only, its, toward, both, other, either, therefore, become, three, although, however, way, well, under, others, z, non, together, want, thus, among, must, perhaps.]
[Figure: scatter plot of word stems, panel IR. Horizontal axis: within-collection frequency (0 to 0.06). Vertical axis: within-class frequency (0 to 0.4). Each point is labeled with its stem. Unstemmed function words also labeled in the plot: might, only, its, toward, both, other, either, therefore, become, three, although, however, way, well, under, around, others, non, together, rather, want, still, thus, among, must, itself, wants.]
Preliminary Results
Stopwords
[Figure: scatter plot of word stems, panel www. Horizontal axis: within-collection frequency (0 to 0.06). Vertical axis: within-class frequency (0 to 0.7). Each point is labeled with its stem. Unstemmed function words also labeled in the plot: might, only, its, both, other, either, therefore, become, three, although, however, way, well, under, around, others, non, together, rather, still, thus, among, already, becomes, once, itself, says.]
Preliminary Results
Stopwords
[Figure: scatter plot of stemmed terms. x-axis: within-collection frequency (0 to 0.06); y-axis: within-class frequency (0 to 0.7). Point labels are stemmed vocabulary terms (for example "network", "packet", "protocol"). A separate group of labels consists of function words such as "might", "only", "its", "toward", "both", "other", "either", "although", "however", "rather", "thus", "among" and "itself".]
Preliminary Results
Stopwords
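The two axes of the plot can be illustrated with a small sketch. The slides do not define either quantity, so the definitions below are assumptions made only for illustration: within-collection frequency is taken as the fraction of all documents containing a term, and within-class frequency as the largest fraction of documents in any single class containing it. The function name `term_frequencies` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def term_frequencies(docs):
    """Return {term: (within_collection, within_class)}.

    docs is a list of (class_label, set_of_stemmed_terms) pairs.
    Both frequency definitions are assumptions, not taken from the slides.
    """
    n_docs = len(docs)
    class_sizes = Counter(label for label, _ in docs)
    coll_df = Counter()               # documents containing the term
    class_df = defaultdict(Counter)   # per-class document counts per term
    for label, terms in docs:
        for term in set(terms):
            coll_df[term] += 1
            class_df[term][label] += 1
    result = {}
    for term, df in coll_df.items():
        within_collection = df / n_docs
        within_class = max(
            count / class_sizes[label]
            for label, count in class_df[term].items()
        )
        result[term] = (within_collection, within_class)
    return result


# Toy collection of four documents in two hypothetical classes.
docs = [
    ("net", {"packet", "rout", "thus"}),
    ("net", {"packet", "thus"}),
    ("ml", {"em", "expert", "thus"}),
    ("ml", {"em", "label"}),
]
freqs = term_frequencies(docs)
for term in sorted(freqs):
    print(term, freqs[term])
```

Each term becomes one point (within-collection, within-class), which is the kind of pair the scatter plot displays.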
Results
Preliminary Results
title = Convergence Results for the {EM} Approach to Mixtures of Experts Architectures,
Abstract: The Expectation-Maximization (EM) algorithm is an iterative approach to maximum likelihood parameter estimation. Jordan and Jacobs (1993) recently proposed an EM algorithm for the mixture of experts architecture of Jacobs, Jordan, Nowlan and Hinton (1991) and the hierarchical mixture of experts architecture of Jordan and Jacobs (1992). They showed empirically that the EM algorithm for these architectures yields significantly faster convergence than gradient ascent. In the current paper we
keywords: ascent, em, expert, faster, hinton, jacob, jordan, nowlan
title = Text Classification from Labeled and Unlabeled Documents using EM,
Abstract: This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available. We introduce an algorithm for learning from labeled and unlabeled documents based on the combination of Expectation-Maximization (EM) and a naive
keywords: accuraci, em, label, pool, quantiti, readili, unlabel, valuabl
Term counts, in order: ~400,000 → 15,971 → 12,044 → 5,605 → 226 Features, 4,872 Keywords, 507 Stopwords
Reduction steps, in order:
• using metadata instead of the document
• stemming
• multi-partition vector space model
• Fuzzy-based term clustering
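The counts on this slide can be checked with a few lines of arithmetic. The sketch below only uses the numbers shown above; the variable names are hypothetical, and no assumption is made about which step produces which count beyond their order on the slide.

```python
# Term counts as they appear on the slide, in order (the first is approximate).
counts = [400_000, 15_971, 12_044, 5_605]

# Final partition of the remaining terms, as reported on the slide.
final_split = {"features": 226, "keywords": 4_872, "stopwords": 507}

# Fraction of terms retained at each successive step.
ratios = [after / before for before, after in zip(counts, counts[1:])]
for before, after, ratio in zip(counts, counts[1:], ratios):
    print(f"{before:>7,} -> {after:>6,}  retained {ratio:.1%}")

# The three final groups should account for every remaining term.
split_total = sum(final_split.values())
print("partition total:", split_total)
```

The three final groups sum to 5,605, so they partition the terms left after the last reduction step.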
Outlines
• Problem Statement: Metadata Mining
• Proposed Approach
• Experimental Results
• Future Research
Future Research
Research Plan
• Evaluate the metadata representation model
• Test and evaluate the proposed fuzzy document and term representations in metadata mining problems such as dimensionality reduction, stopword and keyword extraction, and term association
• Evaluate the performance of the multiple-classifier architecture in the metadata classification problem
Thank You!