copy or not
Post on 23-Feb-2016
Copy or Not
Dawei (David) Shi
Copy Or Not
Introduction
Algorithm
Framework
Future work
Demo
Introduction
A web-based document comparator
Calculates an accurate similarity between 2 documents
Algorithm
Preprocessing
Vector space
Similarity calculation
Preprocessing
Lowercase
Stop words filtering
Stemming
Preprocessing: Stemming
› Porter Stemming Algorithm
› E.g.
  cat – cats
  meet – meeting
  agree – agreed
  correct – correctness
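The three preprocessing steps can be sketched roughly as below. This is only an illustration: the stop-word list is a tiny hypothetical one, and `crude_stem` is a toy suffix stripper standing in for the Porter algorithm, not an implementation of it. The names `STOP_WORDS`, `crude_stem` and `preprocess` are assumptions, not taken from the slides.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real list is much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in"}


def crude_stem(word):
    """Toy suffix stripper standing in for the Porter algorithm (NOT Porter)."""
    for suffix in ("ness", "ing", "ed", "s"):
        # Strip only when at least three characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("The cats are meeting"))
```

In a real deployment the stand-in stemmer would be swapped for a proper Porter stemming library, since simple suffix stripping mishandles many words.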
Vector Space
Build dictionary 1
› word -> frequency
Sort the keys of dictionary 1
Build dictionary 2
› key -> (index, count)
Build binary vectors
› index -> occurrence
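One possible reading of these steps is sketched below. It assumes that dictionary 1 pools the words of both documents (the slides do not say), and the function name `build_binary_vectors` is hypothetical.

```python
from collections import Counter


def build_binary_vectors(tokens1, tokens2):
    """Return (dictionary 2, binary vector for doc 1, binary vector for doc 2)."""
    # Dictionary 1: word -> frequency, pooled over both documents (assumption).
    freq = Counter(tokens1) + Counter(tokens2)
    # Dictionary 2: word -> (index, count), with indices in sorted key order.
    index = {word: (i, freq[word]) for i, word in enumerate(sorted(freq))}
    present1, present2 = set(tokens1), set(tokens2)
    v1 = [0] * len(index)
    v2 = [0] * len(index)
    # Binary vectors: position i holds 1 if the word occurs in that document.
    for word, (i, _count) in index.items():
        v1[i] = 1 if word in present1 else 0
        v2[i] = 1 if word in present2 else 0
    return index, v1, v2


index, v1, v2 = build_binary_vectors(["cat", "meet"], ["cat", "agre"])
print(index, v1, v2)
```

Sorting the keys gives every word a stable position, so both vectors share the same coordinate system and can be compared directly.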
Similarity Calculation
Vectors v1 and v2
Similarity = (v1 · v2) / (norm(v1) * norm(v2)), where v1 · v2 is the dot product
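As a small worked example, take two hypothetical binary vectors (not from the slides), v1 = (0, 1, 1) and v2 = (1, 1, 0). They share one of their two words each, and the formula gives a similarity of 0.5:

```latex
\[
\text{Similarity}
  = \frac{v_1 \cdot v_2}{\lVert v_1 \rVert \, \lVert v_2 \rVert}
  = \frac{0 \cdot 1 + 1 \cdot 1 + 1 \cdot 0}
         {\sqrt{0^2 + 1^2 + 1^2} \, \sqrt{1^2 + 1^2 + 0^2}}
  = \frac{1}{\sqrt{2} \, \sqrt{2}}
  = 0.5
\]
```

Identical vectors give 1, and vectors with no word in common give 0.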
Performance
Algorithms coded in Python
› Dynamic typing
› Not good at numerical operations
Solution: numpy
Numpy
A Python extension module
Written mostly in C
Defines numerical array and matrix types and basic operations on them
Numpy vs Python: Python code
› a = range(10000000)
› b = range(10000000)
› c = []
› for i in range(len(a)):
›     c.append(a[i] + b[i])
Takes up to 10 seconds on a several-GHz processor
Numpy vs Python: Numpy code
› import numpy as np
› a = np.arange(10000000)
› b = np.arange(10000000)
› c = a + b
Almost instant
Numpy Usage
Vector dot product
Vector normalization
Vector zero filling
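The sketch below shows how these three numpy operations might combine into the similarity calculation. It is an assumption-laden illustration, not the project's code: the slides do not say where zero filling is used, so here it pads the shorter vector, and "normalization" is read as taking the Euclidean norm. The function name `cosine_similarity` is hypothetical.

```python
import numpy as np


def cosine_similarity(v1, v2):
    """Cosine similarity of two vectors, zero-padding the shorter one."""
    a = np.asarray(v1, dtype=float)
    b = np.asarray(v2, dtype=float)
    # Zero filling (assumed use): bring both vectors to the same length.
    n = max(a.size, b.size)
    a_padded = np.zeros(n)
    b_padded = np.zeros(n)
    a_padded[: a.size] = a
    b_padded[: b.size] = b
    # Euclidean norms of both vectors.
    denom = np.linalg.norm(a_padded) * np.linalg.norm(b_padded)
    if denom == 0:
        # An all-zero vector has no direction; treat the similarity as 0.
        return 0.0
    # Dot product divided by the product of the norms.
    return float(np.dot(a_padded, b_padded) / denom)


print(cosine_similarity([0, 1, 1], [1, 1, 0]))
```

Returning 0.0 for an empty document is a design choice made here to avoid a division by zero; other conventions are possible.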
Framework
Django
› The web framework for perfectionists with deadlines
Libraries
Python
› Numpy
› Porter Stemming
jQuery
Hosting
Alwaysdata
› Django 1.3
› Python 2.6
Future Work
Support file uploading and comparison
Add HTML5 features
Thank you!