machine learning in web proxy caching
Post on 15-Apr-2017
Machine Learning Approach in Web Proxy Cache Replacement
Sivaraj Nimishan
2011/CSC/016
Supervisor: Sriskandarajah Shriparen
Web Proxy Caching
• Web proxy caching is a solution for improving the performance of Web-based systems.
Cache Replacements
• In proxy cache replacement, the proxy cache must decide effectively which objects are worth caching and which should be replaced by other objects.
LRU: The least recently used objects are removed first.
LFU: The least frequently used objects are removed first.
LFU-DA: A dynamic aging factor is incorporated into LFU.
GDSF: Size, cost of fetching, and a dynamic aging factor are integrated with frequency.
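To make the simplest of these policies concrete, here is a minimal Python sketch of LRU eviction. The class name `LRUCache`, its `request` method, and the byte-based capacity accounting are assumptions made for this illustration; this is not how Squid implements the policy.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache keyed by URL; capacity is counted in bytes (an assumption)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.entries = OrderedDict()  # url -> object size, least recent first

    def request(self, url, size):
        """Return True on a hit, False on a miss (the object is then cached)."""
        if url in self.entries:
            self.entries.move_to_end(url)  # mark as most recently used
            return True
        if size > self.capacity_bytes:
            return False  # object can never fit, so it is not cached
        # Evict least recently used objects until the new one fits.
        while self.used_bytes + size > self.capacity_bytes:
            _, evicted_size = self.entries.popitem(last=False)
            self.used_bytes -= evicted_size
        self.entries[url] = size
        self.used_bytes += size
        return False


# Hypothetical request stream: (url, size in bytes).
cache = LRUCache(capacity_bytes=100)
hits = [cache.request(url, size) for url, size in
        [("/a", 40), ("/b", 40), ("/a", 40), ("/c", 40), ("/b", 40)]]
```

LFU, LFU-DA and GDSF differ only in how the eviction victim is chosen: they rank objects by a priority value instead of by recency.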
Squid
LRU: The LRU policy keeps recently referenced objects.
heap GDSF: The heap GDSF policy optimizes object hit rate by keeping smaller popular objects in cache.
heap LFUDA: The heap LFUDA policy keeps popular objects in cache regardless of their size.
heap LRU: The LRU policy implemented using a heap.
Squid log format
timestamp
response time
client address
status codes
size
request method
URL
client identity
Hierarchy Code
content type
Machine Learning
Support Vector Machine
Decision tree
Data collection
Billion Triples Challenge 2012 Dataset
The dataset was crawled during May/June 2012. Several seed sets were collected from multiple sources.
Datahub: A Data Ecosystem for Individuals, Teams and People.
DBpedia: DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web.
Freebase: A community-curated database of well-known people, places, and things.
Rest: The seed set for the Rest crawl contained all other URIs involved in a relation in the DBpedia
Timbl: The Timbl crawl consisted of Tim Berners-Lee's Friend of a Friend (FOAF) project (2 files).
Preprocessing
Data Set   Size      From                        To
Datahub    136.8MB   [Thu Apr 26 20:07:13 2012]  [Fri Apr 27 16:20:16 2012]
DBpedia    170.3MB   [Tue May 1 07:46:29 2012]   [Fri Apr 27 21:19:02 2012]
Freebase   123.6MB   [Fri Apr 27 07:18:03 2012]  [Mon Apr 30 12:31:49 2012]
Rest       32MB      [Mon Apr 30 13:34:06 2012]  [Mon Apr 30 18:46:04 2012]
Timbl 1    138.5MB   [Sat May 5 21:05:02 2012]   [Tue May 8 07:50:56 2012]
Timbl 2    179.5MB   [Tue May 15 20:29:22 2012]  [Wed May 23 04:53:27 2012]
Data Set   Requests   Cacheable requests   %
Datahub    398547     181850               45.63
DBpedia    1382090    537038               38.86
Freebase   333956     145010               43.42
Rest       71972      18942                26.32
Timbl 1    889591     323451               36.36
Timbl 2    1675106    680952               40.65
Total      4751262    1887243              39.72
Preprocessing...
• Only successful entries (status code 200) are retained.
Preprocessing...
SWL: Sliding Window Length of 30 minutes (Romano and ElAarag)
The target attribute is obtained with a backward-looking sliding window:
Target attribute = 1, if the object is revisited within the sliding window
                 = 0, otherwise
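The labelling rule can be sketched in a few lines of Python. This is one reading of the definition above, assuming that "revisited within the sliding window" means the same URL was requested at most 30 minutes before the current request. The function name `label_requests` and the sample requests are hypothetical.

```python
SWL_SECONDS = 30 * 60  # sliding window length of 30 minutes, as stated above


def label_requests(requests, window=SWL_SECONDS):
    """Label each request with the 0/1 target attribute.

    `requests` is an iterable of (timestamp_in_seconds, url) pairs sorted by
    timestamp. A request gets 1 if the same URL was requested within `window`
    seconds before it (backward-looking), otherwise 0.
    """
    last_seen = {}  # url -> timestamp of its most recent earlier request
    targets = []
    for timestamp, url in requests:
        previous = last_seen.get(url)
        revisited = previous is not None and timestamp - previous <= window
        targets.append(1 if revisited else 0)
        last_seen[url] = timestamp
    return targets


# Hypothetical requests; timestamps are seconds from an arbitrary start.
sample = [
    (0, "/a"),
    (600, "/b"),
    (1200, "/a"),  # "/a" was requested 1200 s earlier: inside the window
    (4000, "/b"),  # "/b" was requested 3400 s earlier: outside the window
]
labels = label_requests(sample)
```

Keeping only the most recent timestamp per URL is enough here, because the rule only asks whether any earlier request falls inside the window.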
Attributes    Values
time          1335442301
duration      379
client        127.0.0.1
result_code   TCP_MISS/200
size          1609
method        GET
URL           http://www.opencalais.com/robots.txt
A Perl command used to convert the Unix timestamp to a human-readable timestamp:
    tail access.log | perl -p -e 's/^([0-9]*)/"[".localtime($1)."]"/e'
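The same step can be sketched in Python: split a log line into the attributes listed above, keep only status 200 entries, and convert the timestamp. The names `LogEntry`, `parse_line` and `is_successful` are hypothetical, and the sample line is assembled from the example values above; a real access.log line carries further fields after the URL, which this sketch ignores.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class LogEntry(NamedTuple):
    time: float
    duration: int
    client: str
    result_code: str
    status: int
    size: int
    method: str
    url: str


def parse_line(line):
    """Parse the first seven whitespace-separated fields of a log line."""
    fields = line.split()
    result_code = fields[3]                  # e.g. "TCP_MISS/200"
    status = int(result_code.split("/")[1])  # HTTP status after the slash
    return LogEntry(float(fields[0]), int(fields[1]), fields[2], result_code,
                    status, int(fields[4]), fields[5], fields[6])


def is_successful(entry):
    """Keep only entries whose HTTP status code is 200."""
    return entry.status == 200


# Sample line built from the attribute values shown above.
line = ("1335442301 379 127.0.0.1 TCP_MISS/200 1609 GET "
        "http://www.opencalais.com/robots.txt")
entry = parse_line(line)
# UTC is used here; the Perl one-liner uses the machine's local time zone.
readable = datetime.fromtimestamp(entry.time, tz=timezone.utc)
```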
Preprocessing...
Processing pipeline (from the slide diagram): access.log → connection.java, Labelinsert.java, InsertMongoDB.java → mongoexport → access.csv
Methodology
Performance Measure
Hit ratio is a widely used measure for evaluating the performance of Web caching.
It is defined as the percentage of requests that can be satisfied by the cache.
Hit Ratio = (Number of hits / Cacheable requests) * 100
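As a quick check of the formula, the sketch below computes the hit ratio in Python. The function name `hit_ratio` is hypothetical; the two input figures are the number of hits and total requests reported for the Datahub set in the first results table of this deck.

```python
def hit_ratio(hits, requests):
    """Return the percentage of requests that were served from the cache."""
    if requests == 0:
        return 0.0  # avoid division by zero on an empty trace
    return 100.0 * hits / requests


# Datahub figures from the first results table: 45357 hits out of 54557 requests.
datahub = hit_ratio(45357, 54557)
```

The result agrees with the 83.13 reported in the table, up to rounding in the second decimal place.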
Machine Learner
• WSO2 Machine Learner is a product that helps to manage and explore data, build machine learning models by analyzing the data with machine learning algorithms, compare and manage the generated models, and predict using the built models.
• Apache Spark is a fast and general engine for large-scale data processing.
• Easy graphical user interface for human-friendly viewing.
• Access the ML UI from a Web browser using the following URL: https://<ML_HOST>:<ML_PORT>/ml
• To run ML: <PRODUCT_HOME>/bin/wso2server.sh
SVM parameters
Iterations: 100
Learning Rate: 0.001
SGD Data Fraction: 1
Reg Type: L1
Reg Parameter: 0.001
Decision Tree parameters
Max Depth: 30
Max Bins: depends on the unique features
Impurity: gini/entropy
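The two impurity options for the decision tree can be illustrated with a short Python sketch. It uses the standard definitions (Gini = 1 - Σ p², entropy = -Σ p log2 p over the class proportions p); the function names are hypothetical and this is not the code used by WSO2 Machine Learner.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_i ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Entropy in bits of a list of class labels: -sum(p_i * log2(p_i))."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())


# Hypothetical node holding two "revisited" (1) and two "not revisited" (0) objects.
mixed = [0, 0, 1, 1]
pure = [1, 1, 1]
```

Both measures are zero for a node whose objects all share one class and largest for an even split, so a tree built with either criterion prefers splits that separate revisited from non-revisited objects.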
Data set   Total requests   Number of hits   Hit ratio (%)
Datahub2   54557            45357            83.13
DBpedia    181114           105883           58.46
Freebase   43507            32527            74.76
Rest       5685             4428             77.88
Timbl 1    97039            42390            43.68
Timbl 2    204288           135149           66.15
Data set   Total requests   Number of hits   Hit ratio (%)
Datahub2   54557            25470            46.68
DBpedia    181114           118418           65.38
Freebase   43507            26359            60.58
Rest       5685             1519             26.71
Timbl 1    97039            58243            60.02
Timbl 2    204288           96822            47.39
Conclusion
Data Set   Requests   Cacheable requests   Hit Ratio (%)
Datahub    398547     181850               83.13
DBpedia    1382090    537038               65.38
Freebase   333956     145010               74.76
Rest       71972      18942                77.88
Timbl 1    889591     323451               60.02
Timbl 2    1675106    680952               66.15
• In this study, SVM and Decision Tree classifiers were trained on proxy log files to classify the contents of the Web proxy cache.
• The hit ratio was calculated from the classification decisions made by the trained SVM and the trained decision tree.
• The performance of Web caching can be improved using supervised machine learning.
• Classifiers can be utilized to improve the hit ratio of traditional Web caching policies.
References
S. Romano and H. ElAarag, "A neural network proxy cache replacement strategy and its implementation in the Squid proxy server", Neural Computing & Applications, Vol. 20, No. 1 (2011), pp. 59-78.
A. I. Vakali, "LRU-based algorithms for Web Cache Replacement".
W. Ali, S. Sulaiman, and N. Ahmad, "Performance Improvement of Least-Recently Used Policy in Web Proxy Cache Replacement Using Supervised Machine Learning", Int. J. Advance. Soft Comput. Appl., Vol. 6, No. 1 (2014).
Introducing Machine Learner, https://docs.wso2.com/display/ML100/Introducing+Machine+Learner
Squid: Optimising Web Delivery, http://www.squid-cache.org/