Customized Search: Using Web Mash-up and Web Usage Mining

Ravi Kumar Mondeti1, Manoj Valesha2, Vivek Sadhwani3, Gresha Bhatia (Mentor)4

1,2&3 B.E. Computer Engg., VES Institute of Technology, Mumbai - 400074, Maharashtra, India

4 Professor, Department of Computer Engineering, VES Institute of Technology, Mumbai - 400074, Maharashtra, India

IPASJ International Journal of Computer Science (IIJCS), Volume 2, Issue 5, May 2014, ISSN 2321-5992. Web Site: http://www.ipasj.org/IIJCS/IIJCS.htm

Abstract

Web search has become very easy, and web users now have many search engines to choose from. Even with so many choices, user satisfaction is still not achieved: the user often has to search on several search engines to get the desired results. Our approach is to make web search easier and more effective for the user. Customized Search means the integration of various web-based search services on the basis of analyzed user web logs. With this application we try to shortlist three image search engines for a user and build a mash-up application from them. The aim is to enhance the quality of internet usage. Our experimental results indicate that the approach can improve user satisfaction by giving customized search results drawn from three search engines.

Keywords: Mash-up, Adaptive Resonance Theory, Encog, web log analysis.

1. INTRODUCTION

There has been exponential growth in the World Wide Web in terms of web sites and their users. Many advanced search engines are now available on the web for searching data, images, and so on. In spite of such advanced search technologies [7], the user often has to try three or more keyword variations, or switch between two or more search engines, to get the desired results, and even then the results are not fully satisfactory [7]. Different search engines use different approaches (algorithms) for search [7],[8], for example Google's PageRank algorithm and Yahoo's Yahoo Search. Each engine gives different results for the keywords used, and these may or may not match the user's requirements. Our application is an attempt to increase user satisfaction with search results.

A few definitions are needed before describing the concept.

Mash-up [11]: in web development, a web page or web application that uses content from more than one source to create a single new service displayed in a single graphical interface.

Clustering [2],[3],[9],[10]: the process of organizing objects into groups whose members are similar in some way.

Web log analyzer [1],[6]: a kind of web analytics software that parses a server log file from a web server and, based on the values contained in the log file, derives indicators about when, how, and by whom the web server is visited.

To understand the concept, consider the 10 most popular search engines. These 10 search engines are arranged into groups of three (our system can currently mash up only three search engines; the number will be increased in future), using some, but not all, of the possible permutations and combinations. For example, one group may contain Google, Yahoo, and Flickr, and another Google, Picasa, and Yahoo. In this way up to 20 groups are formed. The search engines in each group are mashed up and a few customizations are added to improve the results, so that we have 20 customized search engines. The Customized Search plugin is then installed in the user's web browser. The plugin retrieves the user's logs from the server (AWStats is used for this), from which the search engines used by the user are deduced. On the basis of this extracted data, one of the 20 mash-ups described above is allotted to the user, who is directed to the link of that mash-up.

Figure 1: System Block Diagram [12]
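The grouping step can be illustrated with a short Java sketch. With 10 engines there are C(10,3) = 120 possible three-engine groups, and the text keeps up to 20 of them without stating how they are chosen. The class name `EngineGroups`, the placeholder engine names `E5` to `E10`, and the rule of simply keeping the first 20 groups are all assumptions made for illustration, not the authors' selection method.

```java
import java.util.ArrayList;
import java.util.List;

/** Enumerates three-engine groups from a candidate list (illustrative sketch). */
final class EngineGroups {

    /** Returns every unordered group of exactly three engines, in index order. */
    static List<List<String>> groupsOfThree(List<String> engines) {
        List<List<String>> groups = new ArrayList<>();
        int n = engines.size();
        for (int i = 0; i < n - 2; i++) {
            for (int j = i + 1; j < n - 1; j++) {
                for (int k = j + 1; k < n; k++) {
                    groups.add(List.of(engines.get(i), engines.get(j), engines.get(k)));
                }
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical candidate list: the paper names only Google, Yahoo, Flickr and Picasa.
        List<String> engines = List.of(
                "Google", "Yahoo", "Flickr", "Picasa", "E5", "E6", "E7", "E8", "E9", "E10");
        List<List<String>> all = groupsOfThree(engines);
        // Assumption: keep the first 20 groups; the paper does not give its selection rule.
        List<List<String>> shortlisted = all.subList(0, Math.min(20, all.size()));
        System.out.println(all.size() + " possible groups, " + shortlisted.size() + " kept");
        System.out.println("First group: " + shortlisted.get(0));
    }
}
```

In practice the shortlist would more plausibly be chosen from the groups that users actually need, which is why only a subset of the 120 combinations is mashed up.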



2. IMPLEMENTATION

We divided the implementation of the system into three phases: the User log file Analyzer, the Mash-Up Allocator, and Creating the Web Mash-up.

The first two phases work in sequence, as the output of the Analyzer is the input to the Mash-Up Allocator. The third phase does not need to be carried out for every user.

2.1 User log Analyzer: In this phase the user log files are retrieved from the remote server using AWStats [1]. The logs are analyzed to extract the search engines most frequently used by the user [6]. The extracted data is passed to the Mash-Up Allocator in the following input format.

“O O O O”

This is a ten-bit bipolar input made up of “O” and “ ” (empty space) symbols. Each position represents one search engine: if the user has used that search engine the position is marked “O”, otherwise it is marked “ ”. The size of the input depends on the number of search engines considered; here we assume 10.

2.2 Mash-Up Allocator: The Adaptive Resonance Theory (ART) neural network algorithm [9],[10] is used in the Mash-Up Allocator. The allocator reads the output of the Analyzer phase as its input, runs the clustering algorithm, and assigns each user to one of the mash-up groups. The allocator is coded in Java using the Encog library [2],[3]. Its output has the following form:

User-IP            Mash-Up allotted
192.168.107.121    2

2.3 Creating Web Mash-up: The search engines grouped as explained in the Introduction are mashed up using Yahoo Pipes [4],[5]. A schema of the search engines is created and implemented as a mash-up. The APIs of the search engines are needed for the mash-up. For example, consider the mash-up of Google, Picasa, and Flickr. Google is an independent platform which allows users to access its servers in a variety of formats. Picasa, being a Google product, is also open for development. The only issue is with Flickr, which restricts its usage to limited purposes, so in order to use Flickr we had to request a secret key (an API key). Once we had obtained the key, the mash-up was implemented successfully.
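The allocation step of Section 2.2 can be sketched in plain Java. The class below is a simplified ART1-style clusterer with fast learning; it is not the Encog implementation the authors used. The class name `Art1Sketch`, the vigilance value 0.7, the choice-function constant 0.5, and the two extra user IPs are assumptions made for illustration. Cluster indices start at 0 and stand in for the mash-up number.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified ART1-style clustering with fast learning (illustrative; not Encog's ART1). */
final class Art1Sketch {
    private final double vigilance;   // minimum match ratio for resonance, in (0, 1]
    private final int maxClusters;    // e.g. the number of available mash-ups
    private final List<boolean[]> prototypes = new ArrayList<>();

    Art1Sketch(double vigilance, int maxClusters) {
        this.vigilance = vigilance;
        this.maxClusters = maxClusters;
    }

    /** 'O' marks a used engine; any other character (a space in the paper) marks unused. */
    static boolean[] parse(String pattern) {
        boolean[] bits = new boolean[pattern.length()];
        for (int i = 0; i < bits.length; i++) bits[i] = pattern.charAt(i) == 'O';
        return bits;
    }

    private static int count(boolean[] v) {
        int c = 0;
        for (boolean b : v) if (b) c++;
        return c;
    }

    private static int overlap(boolean[] a, boolean[] b) {
        int c = 0;
        for (int i = 0; i < Math.min(a.length, b.length); i++) if (a[i] && b[i]) c++;
        return c;
    }

    /** Returns the cluster index for the input, or -1 if nothing matches and no slot is free. */
    int assign(boolean[] input) {
        int ones = count(input);
        if (ones == 0) return -1; // no engine used: nothing to match on
        boolean[] tried = new boolean[prototypes.size()];
        for (int attempt = 0; attempt < prototypes.size(); attempt++) {
            // Pick the untried prototype with the highest choice score.
            int best = -1;
            double bestScore = -1.0;
            for (int j = 0; j < prototypes.size(); j++) {
                if (tried[j]) continue;
                boolean[] w = prototypes.get(j);
                double score = overlap(input, w) / (0.5 + count(w));
                if (score > bestScore) { bestScore = score; best = j; }
            }
            tried[best] = true;
            boolean[] w = prototypes.get(best);
            // Vigilance test: share of the input's active bits covered by the prototype.
            if ((double) overlap(input, w) / ones >= vigilance) {
                for (int i = 0; i < w.length; i++) w[i] = w[i] && i < input.length && input[i];
                return best; // resonance: prototype shrinks to the common bits
            }
        }
        if (prototypes.size() < maxClusters) {
            prototypes.add(input.clone()); // commit a new cluster to this pattern
            return prototypes.size() - 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        Art1Sketch allocator = new Art1Sketch(0.7, 20);
        // Hypothetical users and patterns; only the first IP appears in the paper.
        String[][] users = {
            {"192.168.107.121", "OOO       "},
            {"192.168.107.122", "       OOO"},
            {"192.168.107.123", "OOOO      "},
        };
        System.out.println("User-IP\t\tMash-Up allotted");
        for (String[] u : users) {
            System.out.println(u[0] + "\t" + allocator.assign(parse(u[1])));
        }
    }
}
```

A higher vigilance makes the allocator stricter, so users with only partly overlapping engine usage are placed in separate clusters; a lower value merges them into fewer mash-ups.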


Figure 2: Schema of Google, Picasa, Flickr mash-up[4]
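The effect of such a three-source schema can be approximated in a few lines of Java. The authors built their mash-up in Yahoo Pipes, so the code below is only a stand-in: it assumes each engine has already returned an ordered list of result URLs, and it interleaves the lists rank by rank while dropping repeated URLs. The class name `MashupMerge`, the interleaving rule, and the `example.com` URLs are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Merges ranked result lists from several sources into one list (illustrative sketch). */
final class MashupMerge {

    /** Interleaves the sources round-robin by rank, keeping the first occurrence of each URL. */
    static List<String> interleave(List<List<String>> sources) {
        Set<String> merged = new LinkedHashSet<>(); // preserves insertion order, rejects duplicates
        int longest = 0;
        for (List<String> source : sources) {
            longest = Math.max(longest, source.size());
        }
        for (int rank = 0; rank < longest; rank++) {
            for (List<String> source : sources) {
                if (rank < source.size()) {
                    merged.add(source.get(rank));
                }
            }
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Hypothetical result URLs standing in for the Google, Picasa and Flickr feeds.
        List<String> google = List.of("http://example.com/g1.jpg", "http://example.com/shared.jpg");
        List<String> picasa = List.of("http://example.com/p1.jpg");
        List<String> flickr = List.of("http://example.com/f1.jpg", "http://example.com/shared.jpg");
        for (String url : interleave(List.of(google, picasa, flickr))) {
            System.out.println(url);
        }
    }
}
```

Interleaving by rank gives each engine's top result equal prominence; a weighted merge driven by the user's usage data would be a natural refinement.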

3. SOFTWARE AND LIBRARIES

3.1 Yahoo Pipes: Yahoo Pipes [4],[5] is used to create all the mash-ups in this project. Yahoo Pipes is a web mash-up generator whose purpose is to create new pages by aggregating RSS feeds from different sources. It provides many modules, which can be used either to fetch data from sources or to edit the data that has been fetched. These modules are grouped into the categories sources, user inputs, operators, URL, string, date, location, and number.

3.2 AWStats: AWStats [1] is used to analyze the web log files of the user in order to generate the input for the Mash-Up Allocator. AWStats is a log file analyzer that reports the most used websites and groups them together.
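How per-site usage figures might be turned into the “O”/“ ” input pattern can be sketched as follows. The sketch assumes the hit counts per search engine host have already been extracted from the log report; it does not parse AWStats output. The class name `UsagePattern`, the host names, and the `minHits` threshold are hypothetical.

```java
import java.util.List;
import java.util.Map;

/** Encodes search engine usage as one 'O' or ' ' character per engine (illustrative sketch). */
final class UsagePattern {

    /**
     * Builds the allocator input: position i is 'O' if the user reached
     * engineHosts.get(i) at least minHits times, otherwise ' '.
     */
    static String encode(List<String> engineHosts, Map<String, Integer> hitsByHost, int minHits) {
        StringBuilder pattern = new StringBuilder(engineHosts.size());
        for (String host : engineHosts) {
            int hits = hitsByHost.getOrDefault(host, 0);
            pattern.append(hits >= minHits ? 'O' : ' ');
        }
        return pattern.toString();
    }

    public static void main(String[] args) {
        // Hypothetical fixed ordering of the 10 engines considered by the system.
        List<String> engines = List.of(
                "engine1.example", "engine2.example", "engine3.example", "engine4.example",
                "engine5.example", "engine6.example", "engine7.example", "engine8.example",
                "engine9.example", "engine10.example");
        // Hypothetical hit counts for one user, as they might be read from a log report.
        Map<String, Integer> hits = Map.of(
                "engine1.example", 42,
                "engine4.example", 7,
                "engine9.example", 1);
        // Brackets make the blank positions visible when printed.
        System.out.println("[" + encode(engines, hits, 5) + "]");
    }
}
```

The ordering of `engineHosts` must stay fixed across users, since each position in the pattern is what identifies a search engine to the allocator.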

3.3 Encog: The Encog [2],[3] library is used to code the Mash-Up Allocator in Java. Encog is a machine learning framework available for Java, .Net, and C++. It supports different learning algorithms such as Bayesian Networks, Hidden Markov Models, and Support Vector Machines. However, its main strength lies in its neural network algorithms. Encog contains classes to create a wide variety of networks, as well as support classes to normalize and process data for these neural networks.

4. RELATED WORK

Karthick Murugan, a Yahoo Pipes user, created a Flickr image search module which allows users to search Flickr images. A user posting under the name “a contributor”, who has published the most pipes among the members of the Yahoo blog, helped us with the concepts.

5. CONCLUSION

Customized search is an advanced searching approach in which different search engines are mashed up (combining the most useful methods of the shortlisted search engines) to create an altogether more powerful search engine, chosen according to the user's search engine usage data. Experimental results show that our approach increases the user's search result efficiency and satisfaction.

6. ACKNOWLEDGEMENT

This idea would not have been possible without the noteworthy contributions of Prof. Gresha Bhatia, who inspired us to make this project the way it is, and of “The Contributer”, who helped us in finalizing the pipes schema.

REFERENCES

[1] AWStats, http://www.awstats.org/docs/awstats_compare.html


[2] Encog Frame, http://www.heatonresearch.com/encog-

[3] Encog learning platform, http://www.youtube.com/channel/UCR1-GEpyOPzT2AO4D_eifdw?sub_confirmation=1

[4] Yahoo Pipes documentation, http://pipes.yahoo.com/pipes/docs

[5] Yahoo Pipes tutorial, http://pipes.tigit.co.uk/

[6] Jaideep Srivastava, Robert Cooley, Mukund Deshpande, Pang-Ning Tan, “Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data”, Department of Computer Science and Engineering, University of Minnesota, SIGKDD Explorations, ACM SIGKDD, Jan 2000.

[7] Meenakshi Shruti Pal, Dr. Sushil Kumar Garg, “Image Retrieval: A Literature Review”, International Journal of Advanced Research in Computer Engineering and Technology (IJARCET), Volume 2, Issue 6, June 2013.

[8] Poonam Bhusari, Rashmi Gupta, Amit Sinahal, “Personalized Image Search from Photo Sharing Websites Using Ranking Based Tensor Factorization Model (RMTF)”, International Journal of Advanced Research, Volume 3, Issue 8, August 2013.

[9] Vaishali A. Zilpe, Dr. Mohammad Atique, “Web Usage Mining Using Neural Network Approach: A Critical Review”, (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 3 (1), 2012, 3073-3077.

[10] Sharma, M. Varshney, “An Efficient Approach for Web Log Mining Using ART”, International Conference on Education and Management Technology, 2010 (ICEMT 2010).

[11] Volker Hoyer, Katarina Stanoevska-Slabeva, Simone Kramer, Andrea Giessmann, “What Are the Business Benefits of Enterprise Mashups?”, 1530-1605/11 $26.00 © 2011 IEEE.

[12] http://creately.com/, block diagrams are drawn using Creately.

AUTHORS

Ravi Kumar Mondeti (Corresponding Author) is pursuing his B.Engg degree at VES Institute of Technology, Mumbai - 400074, Maharashtra, India, affiliated to Mumbai University. He is currently working as Planning and Management Officer in ISTE-VESIT. His areas of interest are Neural Networks, Artificial Intelligence, Data Mining, and Machine Learning.

Manoj Valesha is pursuing his B.Engg degree at VES Institute of Technology, Mumbai - 400074, Maharashtra, India, affiliated to Mumbai University. His areas of interest are Web Mash-up, Computer Networks, and Game Design.

Vivek Sadhwani is pursuing his B.Engg degree at VES Institute of Technology, Mumbai - 400074, Maharashtra, India, affiliated to Mumbai University. His areas of interest are Web Log Analysis and Data Mining.