Progress Report - Year 2
Extensions of the PhD Symposium Presentation
Daniel McEnnis
Overview
Accomplishments
  Data set acquisition and cleaning
  Theoretical achievements
  Graph-RAT improvements
Current Data
40’s Jazz Recordings
  2000 annotated recordings from 80 CDs
  Covers nearly all 40’s popular music
LastFM by Song
  Retrieves tag and user info by song
  Data cleaning on user playcounts needed
Planned Data Set Acquisition
Explored the DBTunes XML version of MySpace.
Linking with LastFM data designed but not yet written.
Provides per-artist audio data for all recent artists.
Theoretical Achievements
Algorithm Literature Review
Theoretical Computer Science journal submission
NZCSRSC conference submission
Recommendation Tasks and Evaluation Metrics
Algorithm Literature
Systematic exploration of theoretical computer science and discrete mathematics.
Discovered 1973 SIAM paper for maximal clique algorithm.
Its maximal clique algorithm is the most efficient one found.
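For readers unfamiliar with the problem, maximal clique enumeration can be sketched as below. The slides do not say which algorithm the SIAM paper describes, so this sketch swaps in the widely taught Bron-Kerbosch method with pivoting; it is an illustration of the task, not the algorithm referred to above. The function name and the adjacency-dictionary input are assumptions made for the example.

```python
def maximal_cliques(adjacency):
    """Yield every maximal clique of an undirected graph as a frozenset.

    `adjacency` maps each vertex to the set of its neighbours
    (no self-loops). Illustrative Bron-Kerbosch with pivoting.
    """
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            # Nothing can extend the clique, so it is maximal.
            yield frozenset(clique)
            return
        # Pivot on the vertex covering the most candidates to prune branches.
        pivot = max(candidates | excluded,
                    key=lambda v: len(adjacency[v] & candidates))
        for v in list(candidates - adjacency[pivot]):
            yield from expand(clique | {v},
                              candidates & adjacency[v],
                              excluded & adjacency[v])
            # v has been fully explored: move it from candidates to excluded.
            candidates = candidates - {v}
            excluded = excluded | {v}

    yield from expand(set(), set(adjacency), set())


# A triangle a-b-c with a pendant vertex d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cliques = set(maximal_cliques(graph))
```

On this small graph the result contains the triangle and the pendant edge.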
Journal Submission
Submitted Graph Triples Census algorithm.
  Proof of correctness
  Proof of time complexity
  Proof of space complexity
Rediscovery of 2001 algorithm in Social Networks
Most efficient implementation known
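To make the task concrete, a triples census can be sketched with a naive baseline. This is not the submitted algorithm (which the slides do not describe) and it makes no efficiency claim: it enumerates every vertex triple, so it is cubic in the number of vertices. It also assumes an undirected graph for simplicity; the function name and input format are hypothetical.

```python
from itertools import combinations


def triple_census(vertices, edges):
    """Count vertex triples by how many of their three possible links exist.

    Returns a list `census` where census[k] is the number of triples
    with exactly k links present (k = 0..3). Naive O(n^3) baseline.
    """
    links = {frozenset(edge) for edge in edges}
    census = [0, 0, 0, 0]
    for a, b, c in combinations(vertices, 3):
        present = sum(frozenset(pair) in links
                      for pair in ((a, b), (a, c), (b, c)))
        census[present] += 1
    return census


# Triangle a-b-c plus the link c-d.
example = triple_census("abcd", [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
```

A baseline of this kind is mainly useful as a correctness check against a faster implementation on small graphs.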
NZCSRSC
Poster at the conference
Written as a short user's guide
Evaluation Exploration
Incorporating cross-validation into relational data.
9 types of music recommendation
  Personalized versus generic
  Open query versus targeted query
  Dynamic versus static data
  New music versus all music
Personalized Radio
Open query with personalized presentation
Static data vs dynamic data
New items prediction vs predict anything
Targeted Search
Not personalized
Similarity queries
Automatically generating targeted lists for a browsing hierarchy
New music vs all music
Static vs dynamic data
Personalized Tag Radio
Create a personalized play list matching a given query
New music vs all music
Static vs dynamic data
Excluded Types
‘Top 40’ prediction
Rendered obsolete by other types
Cross-Validation in Graphs
Actor removal
  Only form currently used
  All links to a particular actor are removed
Link removal
  Selected links from ground truth are removed
  Algorithm evaluated on reproducing missing links
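The two hold-out schemes above can be sketched as follows. This is a minimal illustration, assuming links are stored as (source, destination) tuples; the function names, the seed parameter, and the round-robin fold assignment are assumptions for the example and are not taken from Graph-RAT.

```python
import random


def actor_removal_fold(links, actor):
    """Hold out every link that touches `actor`; train on the rest."""
    held_out = [link for link in links if actor in link]
    training = [link for link in links if actor not in link]
    return training, held_out


def link_removal_folds(links, k, seed=0):
    """Yield k (training, held_out) splits; each link is held out once."""
    shuffled = list(links)
    random.Random(seed).shuffle(shuffled)
    for fold in range(k):
        held_out = shuffled[fold::k]
        training = [link for i, link in enumerate(shuffled) if i % k != fold]
        yield training, held_out


links = [("u1", "s1"), ("u1", "s2"), ("u2", "s1"), ("u2", "s3")]
training, held_out = actor_removal_fold(links, "u1")
folds = list(link_removal_folds(links, k=2))
```

In both cases an algorithm would be run on the training links and scored on how well it reproduces the held-out ones.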
Graph-RAT Improvements
Release of 0.4.4
  Finalized Graph-RAT as a relational programming language
  Added propositional algorithms
Release of 0.5.0
  New Query Subsystem
  Usability enhancements
  Space complexity improvements
Aggregators
8 algorithms with 9 helper functions
Cover each form of propositionalization
Cover mappings between links and properties
Core primitives for Graph-RAT as a programming language.
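As a rough illustration of propositionalization, the sketch below collapses each actor's outgoing links into a single property value. It assumes links are (source, destination, weight) tuples and uses a hypothetical `aggregate_links` function; the actual Graph-RAT aggregators and their helper functions are not described in these slides.

```python
from collections import defaultdict


def aggregate_links(links, reducer):
    """Map links to properties: reduce each actor's link weights to one value.

    `reducer` is any function over a list of weights, e.g. sum, len, max.
    """
    weights = defaultdict(list)
    for source, _destination, weight in links:
        weights[source].append(weight)
    return {actor: reducer(values) for actor, values in weights.items()}


# Play counts on user -> song links.
plays = [("u1", "s1", 3), ("u1", "s2", 5), ("u2", "s1", 1)]
total_plays = aggregate_links(plays, sum)
songs_heard = aggregate_links(plays, len)
```

Passing the reducer as an argument lets one routine cover several aggregation forms (totals, counts, maxima) over the same link data.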
Similarity
2 new similarity algorithms
1 new distance metric
Query Subsystem
28 primitives for searching in a graph
  10 graph primitives
  7 actor primitives
  7 link primitives
  4 property primitives
Functional: composition to build queries
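Building queries by functional composition might look like the sketch below. The primitives shown (`by_mode`, `has_property`, `both`, `either`) and the dictionary-based actor records are hypothetical stand-ins, not the 28 Graph-RAT primitives, which the slides do not list.

```python
def by_mode(mode):
    """Primitive: match actors of a given mode (type)."""
    return lambda actor: actor["mode"] == mode


def has_property(name):
    """Primitive: match actors that carry the named property."""
    return lambda actor: name in actor["properties"]


def both(*predicates):
    """Combinator: match when every sub-query matches."""
    return lambda actor: all(p(actor) for p in predicates)


def either(*predicates):
    """Combinator: match when at least one sub-query matches."""
    return lambda actor: any(p(actor) for p in predicates)


def run_query(actors, predicate):
    """Lazily yield the actors selected by a composed query."""
    return (actor for actor in actors if predicate(actor))


actors = [
    {"id": "u1", "mode": "user", "properties": {"age": 30}},
    {"id": "u2", "mode": "user", "properties": {}},
    {"id": "s1", "mode": "song", "properties": {"age": 1}},
]
users_with_age = both(by_mode("user"), has_property("age"))
matches = [actor["id"] for actor in run_query(actors, users_with_age)]
```

Because each primitive returns a function, larger queries are assembled by nesting combinators rather than by writing new search code.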
Performance Specs
Queries can return collections or iterators.
Collections
  Implemented as references into graphs
  Linear in number of references
Iterators
  Ordered sequences of objects
  Constant in space complexity (excluding Graph ID and AllGraphs)
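The space trade-off between the two return types can be illustrated with a generator versus a materialised list. This is a generic sketch with hypothetical function names and (source, destination) link tuples, not Graph-RAT's query API.

```python
def links_from(links, source):
    """Iterator form: yield matching links one at a time.

    Only the current link is held, so extra space does not grow
    with the number of matches.
    """
    for link in links:
        if link[0] == source:
            yield link


def links_from_collection(links, source):
    """Collection form: a list of references to the matching links.

    Space grows linearly with the number of matches.
    """
    return list(links_from(links, source))


all_links = [("u1", "s1"), ("u2", "s1"), ("u1", "s2")]
lazy = links_from(all_links, "u1")
first = next(lazy)  # only one result has been produced so far
materialised = links_from_collection(all_links, "u1")
```

The iterator suits single-pass algorithms over large graphs, while the collection suits results that must be revisited or sized.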
Usability Enhancements
Properties and Metadata
Interface enhancements
Dynamic Loading of Classes
XML scripting support
Properties and Metadata
Properties description
  Encapsulates all parameter code
  Utilizes Graph-RAT Property objects
  Comparison to JavaBeans
New Metadata Model
  Parameter model update
  Input/Output descriptors update
Interface Updates
Arrays->Lists
  graph, link, actor, and property objects
Iterators
  All graph operations support iterators
Dynamic Loading
Classes loaded from file at runtime.
Loading controlled by call to loader object
Automatic registering with relevant factories
All factories updated to support dynamic loading
  Extend Abstract Factory
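The loading flow above (load classes from a file, then register them with a factory) can be sketched in Python with the standard `importlib` machinery. Graph-RAT's own loader and factories are not shown in the slides, so `AlgorithmFactory`, `load_classes`, and the module name `dynamic_plugin` are hypothetical names chosen for the example.

```python
import importlib.util


class AlgorithmFactory:
    """Hypothetical factory that creates registered classes by name."""

    def __init__(self):
        self._registry = {}

    def register(self, cls):
        self._registry[cls.__name__] = cls

    def create(self, name, *args, **kwargs):
        return self._registry[name](*args, **kwargs)


def load_classes(path, factory, module_name="dynamic_plugin"):
    """Import a source file at runtime and register each class it defines."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    for value in vars(module).values():
        # Skip classes the file merely imported from elsewhere.
        if isinstance(value, type) and value.__module__ == module_name:
            factory.register(value)
    return module
```

After a call such as `load_classes("plugins/degree.py", factory)` (path hypothetical), any class defined in that file can be instantiated through `factory.create` without the factory code naming it in advance.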
XML Scripting support
SAX parser support for all components excepting crawling and parsing
Implemented using the Builder pattern
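Pairing a SAX parser with the Builder pattern can be sketched as below: the SAX handler reacts to element events and delegates construction to a builder, which assembles the result step by step. The element names (`graph`, `actor`, `link`) and attribute names are invented for the illustration and are not Graph-RAT's scripting schema.

```python
import xml.sax


class GraphBuilder:
    """Builder: accumulates parts, then produces the finished graph."""

    def __init__(self):
        self._actors = []
        self._links = []

    def add_actor(self, actor_id):
        self._actors.append(actor_id)
        return self

    def add_link(self, source, dest):
        self._links.append((source, dest))
        return self

    def build(self):
        return {"actors": list(self._actors), "links": list(self._links)}


class GraphHandler(xml.sax.ContentHandler):
    """SAX handler that forwards each parsed element to the builder."""

    def __init__(self, builder):
        super().__init__()
        self.builder = builder

    def startElement(self, name, attrs):
        if name == "actor":
            self.builder.add_actor(attrs["id"])
        elif name == "link":
            self.builder.add_link(attrs["source"], attrs["dest"])


def parse_graph(xml_text):
    """Parse an XML description and return the graph the builder produced."""
    builder = GraphBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), GraphHandler(builder))
    return builder.build()


script = '<graph><actor id="u1"/><actor id="s1"/><link source="u1" dest="s1"/></graph>'
graph_description = parse_graph(script)
```

Keeping construction in the builder means the handler stays a thin event dispatcher, and the same builder could be driven from code as well as from XML.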
Core Improvements
2 cross-validation algorithms
~20 algorithms with space complexity improvements
Iterators for all graph primitives
Macros for separation of graph data by cross-validation property.
Additional algorithms
2 new similarity algorithms
1 new distance metric added
Obsolete algorithms removed
LastFM crawler updates
LastFM upgraded its web-services, removing the old version
New version will link to the semantic web
~20 parsers completed
Still under construction
Planned Future Work
Contingent on arrival of computer
Testing of existing code
Cross-Validation Scheduler
Completion of LastFM Parser
DBTunes (from semantic web) parser
Experiments!
Write Thesis!
Unplanned Future Work
Full semantic web crawler
Incorporating GData protocols
Database backend
Colt-Matrix-Over-Graph adapter
Database-backed Weka instance
Beyond the Horizon
Support for Prolog primitives
Multi-database graph support
Semantic Web graph utilizing the proxy pattern
Support for dynamic updates and dynamic data