1 - people | computer sciencepeople.cs.ksu.edu/~shahid/report/4-report_content.doc  · web view,...



Post on 17-Mar-2020



1. Project Overview

1.1 Introduction

The Merriam-Webster Collegiate Dictionary defines plagiarism as follows:

Plagiarize:

Transitive senses: to steal and pass off (the ideas or words of another) as one's

own: use (another’s production) without crediting the source.

Intransitive senses: to commit literary theft: present as new and original an idea

or product derived from an existing source

Plagiarism remains one of the greatest temptations facing students, and the only

barrier to its frequent use is the fear of detection. With the spread of the Internet,

the ability to share bodies of work has increased manyfold and the risk of

detection has diminished almost proportionally. A search of the Internet reveals

many sources of plagiarizable papers. Many sites sell papers, and

several smaller ones offer smaller collections for free. As a

direct consequence of the alarming increase in plagiarism, writers and scholars are

getting discouraged from directly sharing their works over the Internet.

Several tools are available to combat plagiarism.


Turn-it-in.com: A well-respected tool used widely by many

institutions. The University of Dayton has a site license for this plagiarism

tool.

Essay Verification Engine (EVE): Available at www.canexus.com, it

accepts popular formats such as Word and plain text and returns a report

with the URL suspected of plagiarism and other statistical data. However

it has a subscription fee.

Moss: A tool developed at the University of California, Berkeley to detect similarities in

software programs.

Our aim at Kansas State University was to develop our own version of a

plagiarism detection tool. The tool was to use the Google search engine for the

web search for related documents and should have an intuitive Graphical User

Interface that was simple yet attractive. It should be able to reduce complex

remote file system operations to simple drag and drop operations on the GUI and

should be able to analyze source documents by just the click of a mouse button.

1.2 Architecture

Figure 1-1 shows the basic architecture for the entire application.


Fig 1-1 Architecture.

The web server acts as an interface between the client and the database/file

system. Most of the middle tier consists of SOAP services running on the web

server. The client end will be running an application delivered via Java Web Start.

The client end application authenticates user login, fetches file system data from

the server in the form of XML data and displays it as a tree structure at the client

end. The application also recurses through the client end file system to produce

another tree structure representing a hierarchical view of the client file system.

The user can then move files from his local directory to his web directory at the

server end, feed files into the IDM search engine and start the search operation

using simple drag and drop operations and mouse button clicks.


2 The Existing Scenario

2.1 Current IDM Interface

In the fall of 2001, Sorel Robledo of Kansas State University, under the guidance

of Dr. Daniel Andersen, developed a web-based tool capable of searching the

Internet for any document(s) that could have been totally or partially copied by a

student.

Figure 2-1 Screen shot of the old system's index page


Figure 2-2 Screen shot of the old system's search start page

Figure 2-2 shows a sample screen shot of the old IDM tool. The start page is a

simple HTML page with text boxes, where the user can select files from his file

system to upload. When the upload button is pressed, the files are uploaded to the

server using JSPSmartUpload, and the next screen lists the files that have been

uploaded and presents a search button. Pressing the search

button starts the search operation, and the results are presented to the

user.

While the IDM tool is easy to use, it has some inherent drawbacks. It provides no

persistent file store in which a user can upload and keep documents in his web

directory before adding them to the search engine. There was also a need to

restrict access to the tool, which requires the ability to create user accounts

and logins. Moreover, the current interface was thought to be too simplistic, and

a newer graphical user interface was proposed in which the user can view his local

system and his web directory in a Windows-like hierarchical view. Instead of the

simple upload mechanism shown in Figure 2-1, it was proposed to have a drag and

drop interface for the user to upload files for searching.

The other main issue to be addressed was obtaining permission from Google

before running the search. Because the system generates a large number of search

queries, Google blocked the IP address from which the requests originated once a

maximum number of automated requests had been served. It was resolved to use the

Google API instead of the current method of querying the Google web site

directly. Every user of the API is provided with a license key, which entitles

him to a maximum of 1000 queries per day.

2.2 The Intended Audience

The intended audience for the new IDM system is mainly instructors and their

assistants at Kansas State University and at universities throughout the United

States. The user is assumed to have a minimal knowledge of file system

operations in a Windows-type GUI and should be able to install, or already have

installed, Java Web Start on his machine.


For UNIX users, Netscape is expected to be on the user’s PATH. The tool is

easy to use and the user interface is intuitive, so that people with

minimal computer knowledge can use it.


3. Tools and Technologies Used

3.1 J2SE/JDK1.3

Java was the natural choice of programming language since the application

is graphics-intensive and is to be delivered via the web. As Sun

Microsystems advertises on its web site “Java 2 Platform, Standard Edition

(J2SE™) software is the premier solution for rapidly developing and deploying

mission-critical, enterprise applications. Version 1.4.1 builds upon Java

technology's cross-platform support and robust security model with new features

and functionality, enhanced performance and scalability, and improved reliability

and serviceability. Version 1.4.1 advances rich client application development

and provides the foundation for standards-based, interoperable Web services that

can be built and deployed today!”

J2SE 1.4.1, the Java 2 Standard Edition, is Sun Microsystems’ latest release, and

the JDK is the development kit that is associated with J2SE. Java is fairly easy

to understand and has a rich set of APIs for application development. In

particular, our application draws heavily from the Java Swing API for its

rendering and functioning.

3.2 Java Web Start

Java Web Start provides the technology to greatly simplify deployment of Java

applications. It includes the security features of the Java platform and allows the


user to use the latest Java 2 Technology with any web browser. It automatically

downloads any files required to run an application and caches them locally for

faster deployment.

Java Web Start can also run applications independent of a web browser.

Applications can also be launched through desktop shortcuts making the

launching of a web-deployed application similar to launching a native application.

From the security point of view, a signed Java Web Start application that requests

it can run outside the typical sandbox environment to which an applet is subjected.

Thus, applications deployed via Web Start can be given access to system resources.

3.3 Webserver (Resin)

Resin is a fairly lightweight, easy-to-configure web server from Caucho

Technology. Resin includes a full-featured HTTP/1.1 web server dedicated to

serving fast Java dynamic content. It supports the latest Servlet 2.3 specification

from Sun. Resin simplifies Java development by automatically recompiling and

reloading Java classes when the source changes. It has very short response times and,

most importantly, it is free.

3.4 SOAP

SOAP is an XML based lightweight protocol for exchange of information in a

decentralized, distributed environment. Data encoded in a SOAP message can be

used in a variety of situations such as Message Passing and Remote Procedure


Calls. SOAP can potentially be used in combination with a variety of other

protocols. SOAP itself does not define any programming model or

implementation specific semantics. It instead defines a simple mechanism by

providing a modular packaging model and encoding mechanisms for encoding

data within modules. SOAP consists of three parts:

1) The SOAP envelope – the top-level XML element, or root element, of an

XML-encoded SOAP message.

2) The SOAP encoding rules – which define a serialization mechanism.

3) The SOAP RPC representation – which defines a convention that can be used to

represent remote procedure calls and responses.
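
As an illustration of the three parts listed above, the following minimal Java sketch assembles a SOAP 1.1 RPC request by hand. The service id urn:loginService and the method name verifyUser are taken from the service descriptions elsewhere in this report; the parameter names username and password, the class name SoapEnvelopeSketch and the omission of per-parameter type annotations are assumptions made for brevity. The application itself relies on soap.jar to build such messages rather than on hand-written strings.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class SoapEnvelopeSketch {
    static final String ENVELOPE_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    static final String ENCODING_NS = "http://schemas.xmlsoap.org/soap/encoding/";

    /** Builds Envelope > Body > method element > one child element per parameter. */
    static String buildRpcRequest(String serviceUrn, String method, String[][] params) {
        StringBuilder sb = new StringBuilder();
        // Part 1: the envelope is the root element of the message.
        sb.append("<SOAP-ENV:Envelope xmlns:SOAP-ENV=\"").append(ENVELOPE_NS).append("\"");
        // Part 2: encodingStyle names the serialization rules used in the body.
        sb.append(" SOAP-ENV:encodingStyle=\"").append(ENCODING_NS).append("\">");
        sb.append("<SOAP-ENV:Body>");
        // Part 3: the RPC convention, a method element qualified by the service URN.
        sb.append("<ns1:").append(method).append(" xmlns:ns1=\"").append(serviceUrn).append("\">");
        for (String[] p : params) {
            sb.append('<').append(p[0]).append('>')
              .append(escape(p[1]))
              .append("</").append(p[0]).append('>');
        }
        sb.append("</ns1:").append(method).append('>');
        sb.append("</SOAP-ENV:Body></SOAP-ENV:Envelope>");
        return sb.toString();
    }

    /** Escapes the characters that are not allowed in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Parses the message with a namespace-aware parser and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }
}
```

A call such as buildRpcRequest("urn:loginService", "verifyUser", ...) yields a single Envelope element whose Body holds the method call; parseRoot is included only to check that the result is well-formed.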

3.5 JavaMail API

The JavaMail API provides a set of abstract classes that model a mail system and

can be used to build Java Technology based mail and messaging applications.

3.6 JavaBeans Activation Framework (JAF)

The JavaBeans Activation Framework enables developers to take advantage of standard

services to determine the type of an arbitrary piece of data, encapsulate access to

it, and instantiate the appropriate bean to perform the required operations.


3.7 Database (Oracle)

Oracle is a robust relational database management system. It was the automatic

choice for our system since it was already installed on the CIS system at Kansas

State University.

3.8 XML

Extensible Markup Language (XML) is the universal format for data on the Web.

XML allows developers to easily describe and deliver rich, structured data from

any application in a consistent way. Structured information contains both content

and some indication of what role that content plays. More and more

developers are migrating towards XML in their applications.

XML helps in creating richer indexes, databases and content management

systems. It lowers switching costs by letting software systems talk to each other,

such as a car manufacturer's system talking to a parts supplier's system. Data

represented using XML can be displayed on a variety of target devices. The same

XML document can be used to target a PDA or a PC.

3.9 Old IDM System

Rather than develop an all-new backend for the document matching system, we

have used the backend of the document matching system developed by Sorel

Robledo with some changes. The IDM system, written mainly in Java, uses a mix

of Java Servlet technology, Java Server Pages and static HTML files.


This system is described in detail in his report “Internet Document Matching

(IDM)” submitted as part of the requirements for his Master of Science Degree at

Kansas State University.

3.10 Rational Rose

Rational Rose Enterprise Edition was used to generate the class diagram.


4. Detailed Architecture and Implementation

4.1 Database Configuration

The database has a single table called the login table. The login table stores

information about the users of the system. It currently has four fields: Username,

Password, Social Security Number (SSN) and Date-of-Birth.

Of the four fields in the login table, only two are used to authenticate the user

when the application starts. These are the Username and the Password fields. The

other two fields, namely the Social Security Number and the Date-of-Birth fields,

are currently used only as additional information about the user and can be used to

validate a user in case he forgets his Username or Password.

4.2 Web-Server Configuration and Database Connectivity

The system can be broadly divided into the following: static HTML pages, JNLP

files, jar files, the backend IDM system, SOAP Services running on the server end

and the client end application packaged as a jar file.

4.2.1 HTML pages

There is just one static HTML page, which is basically the index page that

contains a link to the main JNLP file.

4.2.2 JNLP files


A JNLP file is basically an XML document. The following shows a complete

example of the Login.jnlp file:

Login.jnlp

<?xml version="1.0" encoding="utf-8"?>
<!-- JNLP File for Login Demo Application -->
<jnlp spec="1.0+"
      codebase="http://acrux.cis.ksu.edu:7070/test/SOAPlogin/"
      href="Login.jnlp">
  <information>
    <title>Login Window</title>
    <vendor>shahid, Inc.</vendor>
    <description> Login Frame</description>
    <description kind="short">Login into the IDM search .</description>
    <offline-allowed/>
  </information>
  <security>
    <all-permissions/>
  </security>
  <resources>
    <j2se version="1.4.0-beta3" max-heap-size="256M" href="http://java.sun.com/products/autodl/j2se"/>
    <j2se version="1.4.0-beta2" max-heap-size="256M" href="http://java.sun.com/products/autodl/j2se"/>
    <j2se version="1.4" max-heap-size="256M" href="http://java.sun.com/products/autodl/j2se"/>
    <j2se version="1.3+" max-heap-size="256M" href="http://java.sun.com/products/autodl/j2se"/>
    <j2se version="1.3+" max-heap-size="256M"/>
    <j2se version="1.3+"/>
    <j2se version="1.3.1" max-heap-size="256M" href="http://java.sun.com/products/autodl/j2se"/>
    <j2se version="1.3" max-heap-size="256M" href="http://java.sun.com/products/autodl/j2
    <j2se version="1.4.1+" max-heap-size="256M" href="http://java.sun.com/products/autodl/j2se"/>
    <jar href="Login.jar" main="true" download="eager"/>
    <extension name = "Additional Jars" href = "additional.jnlp"/>
    <extension name = "Additional Jars" href = "additionaltwo.jnlp"/>
  </resources>
  <application-desc main-class="LoginUser"/>
</jnlp>

Some of the attributes, as defined by the Java Web Start Developer's Guide, are

described below:

The JNLP Element

codebase attribute: All relative URLs specified in href attributes in the JNLP file

are resolved using this URL as a base.

href attribute: This is a URL pointing to the location of the JNLP file itself. The

Java Web Start software requires this attribute to be set in order for the

application to be included in the Application Manager.
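
The effect of the codebase attribute can be illustrated with a few lines of Java. This is a minimal sketch of relative URL resolution using java.net.URI, not the code Java Web Start itself runs; the class name CodebaseSketch and the method name resolve are hypothetical.

```java
import java.net.URI;

class CodebaseSketch {
    /** Resolves an href value from a JNLP file against the codebase URL. */
    static String resolve(String codebase, String href) {
        // URI resolution drops the last path segment of the base unless the
        // base ends with '/', so a trailing slash is added when it is missing.
        String base = codebase.endsWith("/") ? codebase : codebase + "/";
        // An href that is already absolute is returned unchanged by resolve().
        return URI.create(base).resolve(href).toString();
    }

    public static void main(String[] args) {
        String codebase = "http://acrux.cis.ksu.edu:7070/test/SOAPlogin/";
        System.out.println(resolve(codebase, "Login.jar"));
        System.out.println(resolve(codebase, "additional.jnlp"));
    }
}
```

With the codebase from Login.jnlp, the relative href Login.jar therefore refers to a file in the SOAPlogin directory on the server, while the absolute href values on the j2se elements are left as they are.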

The Security Element

Each application is, by default, run in a restricted execution environment, similar

to the Applet sandbox.  The security element can be used to request unrestricted

access.

If the all-permissions element is specified, the application will have full access to

the client machine and local network. If an application requests full access, then

all JAR files must be signed. The user will be prompted to accept the certificate

the first time the application is launched.


The Resources Element

The resources element is used to specify all the resources, such as Java class files,

native libraries, and system properties that are part of the application.

extension element: If the application needs to use more than one jar file signed

by different authorities, the extension element is used to indicate the JNLP file

that contains the description of the additional jar files.

A detailed description of all the elements and attributes of a JNLP file can be

obtained from the Sun Microsystems website. All JNLP files must be encoded in UTF-8.

The server must return “application/x-java-jnlp-file” MIME type for JNLP files in

order for the Java Web Start software to be invoked.

4.2.3 Jar Files

The client application, which is sent to the user as a signed jar file as

prescribed by Java Web Start, needs the following jar files in order to execute:

soap.jar: Downloadable from http://xml.apache.org/SOAP/

mail.jar: Downloadable from http://java.sun.com/products/javamail/

activation.jar: Downloadable from

http://java.sun.com/products/javabeans/glasgow/jaf.html


xercesImpl.jar and xmlParserAPIs.jar: Downloadable from

http://xml.apache.org/xalan-j/

These files need to be sent along with Login.jar (the client application) and

therefore need to be in the same directory as the client application at the server

end and must be mentioned in the JNLP files specified in the extension tag of

Login.jnlp.

4.2.4 SOAP Services

There are three SOAP services running on the server each with its own

deployment descriptor.

LoginService: This class accepts the user name and password from the client

application, connects to the database and checks whether the username is a valid

one. If the username and password combination is authentic, the LoginService class

generates a pseudo-random session id consisting of the username and a random

number appended together and returns it to the calling application. If user

authentication fails, it returns null.

The LoginService is deployed using the soaplogin.xml deployment descriptor. It

runs under the id “urn:loginService”.
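
The behaviour described for LoginService can be sketched as follows. This is an illustration under stated assumptions, not the service's actual code: an in-memory map stands in for the Oracle login table, the class name LoginServiceSketch and the helper addUser are hypothetical, and java.security.SecureRandom is substituted for whatever random number source the original uses. Only the method name verifyUser and the shape of the session id (username followed by a random number, or null on failure) come from this report.

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

class LoginServiceSketch {
    // Assumption: stands in for the Username and Password columns of the login table.
    private final Map<String, String> passwordsByUser = new HashMap<String, String>();
    private final SecureRandom random = new SecureRandom();

    /** Hypothetical helper used to populate the in-memory table. */
    void addUser(String username, String password) {
        passwordsByUser.put(username, password);
    }

    /**
     * Returns a session id made of the username followed by a random
     * non-negative number, or null when authentication fails.
     */
    String verifyUser(String username, String password) {
        String stored = passwordsByUser.get(username);
        if (stored == null || !stored.equals(password)) {
            return null; // unknown user or wrong password
        }
        return username + random.nextInt(Integer.MAX_VALUE);
    }
}
```

The caller only needs to test the result for null, which is how the client application decides whether to continue after login.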

ServerXMLgen: This SOAP service generates an XML file corresponding to the

user web directory at the server end and sends it to the requesting application.


The ServerXMLgen service is deployed using the xmlgen.xml deployment descriptor and

runs under the id “urn:xmlGenerator”.
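
The idea behind ServerXMLgen, turning a directory on the server into an XML description, can be sketched with a short recursive walk. The element names directory and file, the name attribute, and the class name DirectoryXmlSketch are assumptions for illustration; this report does not specify the XML vocabulary the service actually emits.

```java
import java.io.File;
import java.util.Arrays;

class DirectoryXmlSketch {
    /** Returns an XML string describing the directory tree rooted at the given file. */
    static String toXml(File root) {
        StringBuilder sb = new StringBuilder();
        append(root, sb);
        return sb.toString();
    }

    private static void append(File f, StringBuilder sb) {
        if (f.isDirectory()) {
            sb.append("<directory name=\"").append(escape(f.getName())).append("\">");
            File[] children = f.listFiles(); // null if the directory cannot be read
            if (children != null) {
                Arrays.sort(children); // File is Comparable; gives a stable order
                for (File child : children) {
                    append(child, sb);
                }
            }
            sb.append("</directory>");
        } else {
            sb.append("<file name=\"").append(escape(f.getName())).append("\"/>");
        }
    }

    /** Escapes characters that are not allowed inside an XML attribute value. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }
}
```

Nesting directory elements mirrors the hierarchy directly, so the client can rebuild the tree without any further calls to the server.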

OperationManager: This SOAP service runs the file system operations at the

server end. All file system operations are modeled as drag-and-drop operations

and mouse click operations at the client end. Each such client end operation sends

a message to this service with the necessary parameters in order to execute a

matching operation on the file system at the server end. The operations on the

client and server end are mapped on a one-to-one basis.

The operation manager also handles file uploads to the IDM system upload

directory and is also responsible for creating and later deleting the temporary

directory at the server end to store the files that the user uploads for a search.

The OperationManager is deployed using the operation.xml deployment

descriptor and runs under the id “urn:operationService”.
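
The one-to-one mapping between client operations and server file system operations might look like the following sketch. The class name OperationManagerSketch, its method names and the path check are assumptions made for illustration; the report does not list the real service's methods. Each method takes paths relative to the user's web directory and carries out the matching java.io.File operation on the server.

```java
import java.io.File;
import java.io.IOException;

class OperationManagerSketch {
    private final File webRoot; // the user's web directory on the server

    OperationManagerSketch(File webRoot) {
        this.webRoot = webRoot;
    }

    /** Maps a client-side relative path to a server file, rejecting paths outside the web directory. */
    File resolve(String relativePath) {
        try {
            String rootPath = webRoot.getCanonicalPath();
            File target = new File(webRoot, relativePath).getCanonicalFile();
            boolean inside = target.getPath().equals(rootPath)
                    || target.getPath().startsWith(rootPath + File.separator);
            if (!inside) {
                throw new IllegalArgumentException("outside web directory: " + relativePath);
            }
            return target;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Matches the "add directory node" operation in the client tree. */
    boolean createDirectory(String path) {
        return resolve(path).mkdir();
    }

    /** Matches both renaming a node and moving it to another parent. */
    boolean rename(String from, String to) {
        return resolve(from).renameTo(resolve(to));
    }

    /** Matches the "delete node" operation; directories are removed with their contents. */
    boolean delete(String path) {
        return deleteRecursively(resolve(path));
    }

    private static boolean deleteRecursively(File f) {
        File[] children = f.listFiles(); // null when f is not a directory
        if (children != null) {
            for (File child : children) {
                if (!deleteRecursively(child)) {
                    return false;
                }
            }
        }
        return f.delete();
    }
}
```

Resolving every path through one method keeps the individual operations short and gives a single place to refuse requests that point outside the user's directory.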

4.2.5 The Backend IDM system

The description of the files used in the backend IDM system can be obtained from

Sorel Robledo’s report.

4.3 Modifications to the IDM system

The following files of the original IDM system have been rendered redundant.

index.html - The original starting HTML page.


sample2.jsp – The JSP application responsible for uploading files to the IDM

upload directory and storing the file names as a value of the user session.

The following files have been modified:

Test.java – Originally Test.java was invoked from sample2.jsp. It picked up the

names of the files to be uploaded from the session and proceeded to initiate the

search from there. It has been modified so that the session now contains the

value of the new session_id, which corresponds to the directory name under the

upload directory that contains the user files to be uploaded.

IntSearch.java – This class has been modified extensively to use the Google API

so as to carry out searches with Google’s permission.

DocComp2.java - This is the class that produces the output when the document

comparison button on the results page is hit. It generates a side-by-side

comparison of the source document with the plagiarized version found on the net.

Since originally the source file names were present in the session as a parameter,

it located the files from these filename values. In the new version, the new

session_id is passed as a parameter in the session and the DocComp2 class uses

this session_id variable to find the uploaded user documents.

4.4 The Client Application

The client application is packaged as a jar file (Login.jar) and is deployed via Java

Web Start at the client end. Along with Login.jar, Java Web Start also downloads


other jar files required for the application to run, namely activation.jar, soap.jar,

mail.jar, xercesImpl.jar and xmlParserAPIs.jar.

The client application frame contains the following 3 components:

DirectoryTree – A JTree that displays the client file system in a tree-like

structure. The tree has two types of nodes corresponding to Files and

Directories. The directory nodes can be expanded to view the files within and

conversely they can be collapsed. The DirectoryTree recognizes DragEvents and

any node of this tree can be dragged and dropped into the XTree2.

XTree2 – Also extends JTree and represents the client’s files in his

web directory. It supports both Drag and Drop operations. Once again, the nodes

of this tree are of two types, indicating whether the modeled object is a File or a

Directory. Files from the client’s local system can be moved to his web directory

by simply dragging the required node in the DirectoryTree and dropping it into

his Web Directory tree (XTree2). Nodes in XTree2 can be added, deleted,

renamed and moved. There is also an option to delete all the files in the client’s

web directory by hitting the clear button. The files in XTree2 serve as the source

documents for the IDM search. Any file that needs to be searched can simply be

dragged from XTree2 and dropped into the DroppableList.

DroppableList – It extends JList and models a list of the files that need to be

searched. XTree2 serves as the source for the Drop enabled DroppableList. The

user can simply drag a node from XTree2 and drop it into the DroppableList.


Once the files are present in the list, the user needs to hit the Search button to start

the search. The Clear button can be used to purge files.

There are several other classes that are packaged in the Login.jar jar file that are

used by these three components and these are listed along with the class diagram

in the next section.

The main thing to note in the client application is that, by default, JTrees do not

support Drag and Drop operations. Instead, each of the trees must implement

the suitable Drag listeners and, when the corresponding event is fired, take the

necessary course of action.
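
A minimal sketch of how a tree can be given drag support with the java.awt.dnd classes is shown below. It is not the code of DirectoryTree or XTree2: the class names TreeDragSketch and FileTransferable are hypothetical, and the sketch assumes that each tree node is a DefaultMutableTreeNode whose user object is a java.io.File. A drag gesture listener is registered on the tree and, when a drag gesture is recognized, starts the drag with a Transferable that carries the selected file.

```java
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.Transferable;
import java.awt.datatransfer.UnsupportedFlavorException;
import java.awt.dnd.DnDConstants;
import java.awt.dnd.DragGestureEvent;
import java.awt.dnd.DragGestureListener;
import java.awt.dnd.DragSource;
import java.io.File;
import java.util.Collections;
import javax.swing.JTree;
import javax.swing.tree.DefaultMutableTreeNode;
import javax.swing.tree.TreePath;

class TreeDragSketch {
    /** Carries a single file as the payload of a drag operation. */
    static class FileTransferable implements Transferable {
        private final File file;

        FileTransferable(File file) {
            this.file = file;
        }

        public DataFlavor[] getTransferDataFlavors() {
            return new DataFlavor[] { DataFlavor.javaFileListFlavor };
        }

        public boolean isDataFlavorSupported(DataFlavor flavor) {
            return DataFlavor.javaFileListFlavor.equals(flavor);
        }

        public Object getTransferData(DataFlavor flavor) throws UnsupportedFlavorException {
            if (!isDataFlavorSupported(flavor)) {
                throw new UnsupportedFlavorException(flavor);
            }
            return Collections.singletonList(file); // the flavor expects a List of File
        }
    }

    /** Registers a drag gesture listener so the selected node can be dragged out of the tree. */
    static void enableDrag(final JTree tree) {
        DragSource source = DragSource.getDefaultDragSource();
        source.createDefaultDragGestureRecognizer(tree, DnDConstants.ACTION_COPY,
                new DragGestureListener() {
                    public void dragGestureRecognized(DragGestureEvent event) {
                        TreePath path = tree.getSelectionPath();
                        if (path == null) {
                            return; // nothing selected, nothing to drag
                        }
                        Object node = path.getLastPathComponent();
                        if (!(node instanceof DefaultMutableTreeNode)) {
                            return;
                        }
                        // Assumption: the node's user object is the File it models.
                        Object user = ((DefaultMutableTreeNode) node).getUserObject();
                        if (user instanceof File) {
                            event.startDrag(DragSource.DefaultCopyDrop,
                                    new FileTransferable((File) user));
                        }
                    }
                });
    }
}
```

The receiving component would register a matching drop listener and read the file list from the Transferable when the drop occurs.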

Another thing of interest is that Windows does not lend itself well to the Java

File.listRoots() method that is used to get the roots of the file system. It

inexplicably gets stuck trying to access A: and other removable drives and throws

up a window asking whether you would like to abort, retry or ignore, which refuses

to go away in spite of any amount of clicking. To get around this we use the

WindowsAltFileSystemView written by Steve Bohne of the Java Web Start

Engineering group.


5 Class Diagram

Figure 5-1 UML class diagram.

Figure 5-1 shows the UML class diagram for the entire application. Of the classes

shown, the OperationManager, ServerXMLgen and LoginService are the SOAP


services running on the server side. The rest of the classes are packaged as

Login.jar to run on the client side.

The main method is in LoginUser and therefore this is the class that gets

instantiated first. LoginUser makes use of the LoginDialog class to get the client

information. It then uses the verifyUser() method of the SOAP service

LoginService to authenticate the user. If the user is verified, LoginService

generates a session id and returns it to LoginUser. If the session id is not null,

meaning that the user has been verified, LoginUser then queries the

createDirModel() method of the SOAP service ServerXMLgen which returns an

XML document as a string corresponding to the user's web directory.

LoginUser then instantiates the XTreeTester frame and passes the XML string to

it as a parameter. XTreeTester in turn creates instances of DirectoryTree, XTree2

and the DroppableList. DirectoryTree recurses through the user’s local machine

and populates itself with this data. XTree2 uses the XML string corresponding to

the user’s web directory to populate itself.
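
How a tree can populate itself from such an XML string is sketched below. The XML vocabulary (directory and file elements with a name attribute) and the class name XmlTreeSketch are assumptions, since the report does not reproduce the format returned by createDirModel(). The sketch parses the string with the standard DOM parser and builds DefaultMutableTreeNode objects that a JTree could display.

```java
import java.io.StringReader;
import javax.swing.tree.DefaultMutableTreeNode;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmlTreeSketch {
    /** Parses the XML string and returns the root node of the corresponding tree. */
    static DefaultMutableTreeNode fromXml(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return toNode(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("not a valid directory listing", e);
        }
    }

    private static DefaultMutableTreeNode toNode(Element element) {
        boolean isDirectory = "directory".equals(element.getTagName());
        // The second argument states whether the node may have children,
        // which lets the tree tell directories from files.
        DefaultMutableTreeNode node =
                new DefaultMutableTreeNode(element.getAttribute("name"), isDirectory);
        if (isDirectory) {
            NodeList children = element.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child instanceof Element) {
                    node.add(toNode((Element) child));
                }
            }
        }
        return node;
    }
}
```

The resulting root node can be handed to a DefaultTreeModel, so the tree never has to touch the server file system directly.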

DirectoryTree, XTree2 and the DroppableList have various Drag and Drop

Listeners to listen to drag and drop events that are translated to File System

operations. XTree2 and the DroppableList use the SOAP service

OperationManager to carry out these File System operations. Finally

DroppableList uses BrowserControl to generate an instance of a web browser.


6 User Interface

The user interface starts with the index page as shown below.

Figure 6-1 The index page.

After the user clicks on the link, Java Web Start starts and proceeds to

download the jar files from the server. After the jar files are downloaded, the user

authentication dialog box pops up.


Figure 6-2 The Login Dialog.

On entering the correct user name and password, the application's main UI is

displayed as shown below.

Figure 6-3 Screen shot of the User Interface.

As can be seen, the application shows the user's current system and his web

directory in a tree structure.

Nodes in both the Trees can be expanded and contracted at will.

Figure 6-4 Screen shot of nodes being expanded.

Nodes (Files and Directories) can be copied from the Local Machine to the web

directory. Consider the node D:\190 moved over to shahid\girish\


Figure 6-5 Screen shot of nodes being moved.

The Web directory Tree also allows us to rename a node, add and delete a

Directory Node and delete a File node.

Figures 6-6 and 6-7 show the operation of renaming a node. The node shahid\girish

is renamed to shahid\gary.


Figure 6-6 Screen shot of a node being modified.

Figure 6-7 Screen shot of node modification complete.


Figures 6-8 and 6-9 below show the operation of deleting a File node,

shahid\gary\190\README.txt.

Figure 6-8 Screen shot of node selected to be deleted.

Figure 6-9 Screen shot of node deletion completed.

A node can be moved to any level in the hierarchy as long as it satisfies the rules of

the underlying File System. Consider moving the node shahid\gary\190 to

shahid\florida\cgiwig\190.
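The kind of check implied by "the rules of the underlying File System" can be sketched as follows. This is only an illustration under our own assumptions: it uses the java.nio.file API, which is newer than the Java version the tool was built with, and the class NodeMoveSketch and the method moveInto are hypothetical names rather than parts of the application.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not from the report): moves a file or directory node
// into another directory after a few basic sanity checks.
final class NodeMoveSketch {

    // Moves 'source' so that it becomes a child of 'targetDir'; returns the new path.
    static Path moveInto(Path source, Path targetDir) throws IOException {
        Path src = source.toAbsolutePath().normalize();
        Path dir = targetDir.toAbsolutePath().normalize();

        if (!Files.isDirectory(dir)) {
            throw new IOException("Target is not a directory: " + dir);
        }
        // A directory cannot be moved into itself or into one of its own descendants.
        if (dir.startsWith(src)) {
            throw new IOException("Cannot move a node into its own subtree: " + src);
        }
        Path destination = dir.resolve(src.getFileName());
        if (Files.exists(destination)) {
            throw new IOException("A node with that name already exists: " + destination);
        }
        // Note: moving a non-empty directory to a different file store may fail.
        return Files.move(src, destination);
    }
}
```

With this sketch, moving shahid\gary\190 into shahid\florida\cgiwig would yield shahid\florida\cgiwig\190, while an attempt to move shahid\florida into shahid\florida\cgiwig would be rejected.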

Figure 6-10 Screen shot before moving the node.

Figure 6-11 Screen shot after node has been moved.

Finally, files that need to be searched can be copied into the DroppableList that is

entitled “Files to Upload”. At this time only text files can be moved into the

search list and the files must be in the client’s web directory to be uploaded into

the search engine.
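The restrictions on the upload list can be summarized in a small sketch. The class UploadListSketch is hypothetical and is not the DroppableList used in the tool; it assumes that "text file" means a file name ending in .txt, and it also applies the five-document limit mentioned under Future Enhancements.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model (not from the report) of the "Files to Upload" list rules.
final class UploadListSketch {
    static final int MAX_FILES = 5; // limit stated in the Future Enhancements section

    private final Path webRoot;
    private final List<Path> files = new ArrayList<>();

    UploadListSketch(Path webRoot) {
        this.webRoot = webRoot.toAbsolutePath().normalize();
    }

    // Adds the file if it satisfies every rule; returns false otherwise.
    boolean add(Path candidate) {
        Path file = candidate.toAbsolutePath().normalize();
        Path name = file.getFileName();
        // Assumption: a text file is recognized by its .txt extension.
        boolean isText = name != null
                && name.toString().toLowerCase(Locale.ROOT).endsWith(".txt");
        boolean inWebDirectory = file.startsWith(webRoot);
        if (!isText || !inWebDirectory || files.size() >= MAX_FILES || files.contains(file)) {
            return false;
        }
        files.add(file);
        return true;
    }

    // Empties the list, as the clear button does.
    void clear() {
        files.clear();
    }

    // Returns a copy of the current contents of the list.
    List<Path> files() {
        return new ArrayList<>(files);
    }
}
```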

Consider moving the file shahid\test_files\EinsteinShort.txt into the upload

directory.

Figure 6-12 Screen shot before moving a file to Upload List.

Figure 6-13 Screen shot after files have been moved to Upload List.

Clicking the clear button under the list purges the files in the search list. To

start the document search, the search button has to be pressed. This calls the

Test.java Servlet of the IDM system and, when the search is complete, an HTML

page containing a link to the results page is opened in the default web browser.

Figure 6-14 Screen shot of the page with the results link for the search.

When the link is clicked, the results page is shown.

Figure 6-15 Screen shot of the Results Page.

Clicking on details brings up a web page corresponding to the following.

Figure 6-17 Screen shot of the Details Page.

Clicking on the Document Comparison button brings up the following page.

Figure 6-18 Screen shot of the Document Comparison page.

7. Testing and Results

The application has been tested on both the Sun Solaris and Windows operating

systems using Java Web Start version 1.0.1_02. While it runs seamlessly on the

Solaris terminals, there is one slight glitch on Windows. While generating the

DirectoryTree structure in the Windows environment, the user will encounter a

dialog saying that drive A: has no disk and asking whether to abort, retry or

ignore. This dialog may pop up a couple of times and will then disappear.
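One possible workaround, which we have not verified against the tool, is to leave the floppy drive roots out when the tree is built. The sketch below assumes that the dialog is triggered by probing drives A: and B:; the class DriveRootFilterSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical workaround (not from the report): skip floppy drive roots so
// that building the directory tree never touches an empty A: or B: drive.
final class DriveRootFilterSketch {

    // Assumption: on Windows, A: and B: are the floppy drive letters.
    static boolean isFloppyRoot(String rootPath) {
        String path = rootPath.toUpperCase(Locale.ROOT);
        return path.startsWith("A:") || path.startsWith("B:");
    }

    // Returns the roots that are safe to expand, preserving their order.
    static List<String> usableRoots(List<String> rootPaths) {
        List<String> usable = new ArrayList<>();
        for (String root : rootPaths) {
            if (!isFloppyRoot(root)) {
                usable.add(root);
            }
        }
        return usable;
    }
}
```

The input list could be filled from File.listRoots(), using getPath() on each entry. On Solaris the only root is "/", so the filter leaves it untouched.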

The application takes a while to load the first time because of the large number

and size of the jar files that need to be downloaded, but on every subsequent

invocation it starts almost instantaneously because it caches information from

the previous access.

There is no compatibility issue with Web Browsers because the application runs

independently of a Web Browser. Browsers are only used to display HTML pages.

While the system is robust in general, there is one key area that is prone to

errors: the Google API. Key words like “of” cause the Google API to throw an

error. In addition, once the quota of 1000 searches is exhausted, the Google API

throws an error saying that the search quota has been exceeded, and every

subsequent search request is denied.
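One way to avoid running into the quota error is to count searches on our side and stop before the limit is reached. The following is a minimal sketch of that idea; the class SearchQuotaSketch is hypothetical, it does not call the Google API, and the caller is assumed to supply the limit of 1000.

```java
// Hypothetical guard (not from the report): counts searches locally so that a
// request is refused before the remote quota error can occur.
final class SearchQuotaSketch {
    private final int limit;
    private int used;

    SearchQuotaSketch(int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must not be negative");
        }
        this.limit = limit;
    }

    // Reserves one search; returns false once the quota is used up.
    synchronized boolean tryAcquire() {
        if (used >= limit) {
            return false;
        }
        used++;
        return true;
    }

    // Number of searches that may still be issued.
    synchronized int remaining() {
        return limit - used;
    }
}
```

A search routine would call tryAcquire() before each request and report to the user when it returns false, instead of waiting for the API to reject the request.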

8. Future Enhancements

There are several enhancements that could make this tool a more viable option.

The underlying IDM system that compares the documents using Google could use

a faster, more efficient algorithm to break the documents down into meaningful

chunks. As of now, some of the chunks generated are meaningless with respect

to comparison with other documents. Chunks like “of” and “the”, and sometimes

even blank strings, find their way into the search. Such chunks have to be

filtered out because they slow the system down and increase the number of

searches that we have to carry out using Google.
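A filter of the kind suggested here could look roughly like the following sketch. It is not part of the IDM system: the class ChunkFilterSketch, the short stop-word list and the threshold of three content words are all assumptions made for illustration, and punctuation attached to a word is not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical filter (not from the report): drops chunks that are blank or
// consist almost entirely of very common words before they are searched.
final class ChunkFilterSketch {
    // Assumption: a small illustrative stop-word list, not an exhaustive one.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("of", "the", "a", "an", "and", "in", "to"));

    // Assumption: a chunk needs at least this many non-stop words to be searched.
    static final int MIN_CONTENT_WORDS = 3;

    static boolean isSearchable(String chunk) {
        if (chunk == null) {
            return false;
        }
        int contentWords = 0;
        for (String word : chunk.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                contentWords++;
            }
        }
        return contentWords >= MIN_CONTENT_WORDS;
    }

    // Keeps only the chunks worth sending to the search engine.
    static List<String> filter(List<String> chunks) {
        List<String> kept = new ArrayList<>();
        for (String chunk : chunks) {
            if (isSearchable(chunk)) {
                kept.add(chunk);
            }
        }
        return kept;
    }
}
```

Chunks such as “of”, “the” and blank strings are rejected by this filter, so they never count against the search quota.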

Another key area of concern is the large number of searches that the system

generates. Test documents of 5-10 KB generate about 300 searches each. Either

the number of searches has to be reduced, which again means that the chunking

algorithm has to be more efficient, or permission needs to be obtained from

Google to run a much higher number of searches; preferably both.
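The effect of these numbers can be worked out with simple arithmetic, shown here as a small sketch. The class SearchBudgetSketch and its methods are hypothetical; the figures of 1000 searches and about 300 searches per document are the ones reported in this document.

```java
// Hypothetical helper (not from the report): back-of-the-envelope search budget.
final class SearchBudgetSketch {

    // How many whole documents fit into the quota.
    static int documentsPerQuota(int quota, int searchesPerDocument) {
        if (searchesPerDocument <= 0) {
            throw new IllegalArgumentException("searchesPerDocument must be positive");
        }
        return quota / searchesPerDocument; // integer division rounds down
    }

    // How many searches a batch of documents is expected to need.
    static int searchesNeeded(int documents, int searchesPerDocument) {
        return documents * searchesPerDocument;
    }
}
```

With the reported figures, documentsPerQuota(1000, 300) gives 3, while searchesNeeded(5, 300) gives 1500, which is more than the quota of 1000.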

Currently the system allows only 5 documents to be uploaded to search at one go

because of the search limitations mentioned above. Once the previous issue has

been resolved, this issue can also be addressed.

The entire tool could be packaged as an application that can be run from the

user’s local machine, thereby increasing speed of execution and distributing

computation amongst users rather than concentrating computation onto a single

server. Rather than the administrator using his search key to run the searches,

each user could also register with Google and obtain his own private key to the

Google API.

Another useful addition to the tool would be to compare all the files with the

user’s own directories for possible plagiarism. The search mechanism could also

be enhanced to accept a richer variety of input files rather than just plain text

files.

9. Conclusion

With the growing incidence of plagiarism, we feel that this tool will go a long

way in helping instructors concentrate on the quality of work rather than worry

about its source and the possibility that a document may have been plagiarized.

The new GUI adds a lot of features to the IDM system developed by Sorel

Robeldo and adds some sorely needed user customization. Moreover, since the

user’s web directory serves as a persistent file store, once the user has uploaded

files, he can terminate the application, log in at a later time and continue

from where he left off.

Since the application is delivered online, the user is not geographically

constrained from using the tool. We have used the most recent Internet

technologies and have ensured that the application will lend itself well to

upgrades at a future time.
