![Page 1: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/1.jpg)
Johnson Graduate School of Management Library Project
Clients:
Ken Bolton
Angela K. Horne
JGSM Library Reference Team
Project Team:
Jonathan Gong
Benson Lee
Man Fai Matthew Lee
Greg Leedberg
Liz Xu
![Page 2: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/2.jpg)
Functional Requirements
Search Function: Simple Search, Advanced Search
Administrative Features: Add HTML Page, Remove HTML Page, Update Existing HTML Page
![Page 3: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/3.jpg)
Search Feature
Why? The client would like the content of their website to be more accessible.
Simple Search: a quick, easily accessible search
Advanced Search: limits the search to get better results
![Page 4: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/4.jpg)
Simple Search
A search box will be located on the home page of the JGSM library website. (http://www.library.cornell.edu/johnson/)
The system will return all pages that contain any or all of the words provided by the user (with exceptions).
Example: “Bloomberg FAQ”
“the Bloomberg FAQ”
![Page 5: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/5.jpg)
Advanced Search
Search fields:
• Find pages with all of the keywords
• Find pages with any of the keywords
• Find pages with “the exact phrase” (example: “Bloomberg FAQ”)
• Limit search to a specific category
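The three keyword fields can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the mode names "all", "any" and "phrase", the `tokenize` helper and the `matches` function are all assumptions made for the example, and category filtering is left out.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches(page_text, query, mode):
    """Return True if page_text satisfies the query under the given mode.

    mode is "all", "any", or "phrase" (hypothetical names for the three
    search fields on the advanced search form).
    """
    page_words = tokenize(page_text)
    query_words = tokenize(query)
    if not query_words:
        return False
    if mode == "all":
        return all(w in page_words for w in query_words)
    if mode == "any":
        return any(w in page_words for w in query_words)
    if mode == "phrase":
        # The query words must appear next to each other, in order.
        n = len(query_words)
        return any(page_words[i:i + n] == query_words
                   for i in range(len(page_words) - n + 1))
    raise ValueError(f"unknown mode: {mode}")
```

With a page reading "the Bloomberg terminal FAQ", the query “Bloomberg FAQ” matches in "all" and "any" mode but not in "phrase" mode, because the two words are not adjacent.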
![Page 6: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/6.jpg)
Viewing Search Results
The results of the search should be displayed 10 to a page in ranked order
Each search result will contain the page title, a link to the page, and a short description
Search results should put the links most useful to users first
Example: “Bloomberg FAQ”
![Page 7: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/7.jpg)
Administrative Features - Add
All administrative features must authenticate the user using a username and password.
Add HTML Page: the administrator can
• specify a URL to add to the search system; the system will add the page and key metadata to the database
• select a category for the page
• add an abstract to be associated with the page (optional); if there is no abstract, part of the text of the document will be displayed in search results
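The abstract fallback could look something like the sketch below. The function name `result_description` and the 160-character limit are assumptions for illustration; the slides do not say how much of the document text is shown.

```python
def result_description(abstract, full_text, max_chars=160):
    """Pick the text shown under a search result.

    Uses the administrator's abstract when one exists; otherwise falls
    back to the start of the page text, cut at a word boundary.
    max_chars is an assumed limit, not a figure from the requirements.
    """
    if abstract and abstract.strip():
        return abstract.strip()
    text = " ".join(full_text.split())  # collapse runs of whitespace
    if len(text) <= max_chars:
        return text
    # Drop the (possibly partial) last word before the limit.
    cut = text[:max_chars].rsplit(" ", 1)[0]
    return cut + "..."
```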
![Page 8: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/8.jpg)
Admin Features - Remove
Remove HTML Page: the administrator can
• specify a URL to remove from the search system; the system will remove the page and everything associated with the URL from the database
• upon removal, the page will no longer appear in user searches
• if the URL does not exist in the database, the system will display an error
![Page 9: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/9.jpg)
Admin Features - Update
Update HTML Page: the administrator can
• specify the page to update using its URL; the page metadata in the database is updated from the URL
• change the category of the page (optional)
• change the abstract of the page after viewing the old page abstract (optional)
![Page 10: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/10.jpg)
Non-Functional Requirements
Ease of Use, Documentation, Help System, Deployment, Scalability, Security, Design Criteria
![Page 11: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/11.jpg)
Ease of Use
The system will be extremely easy to use
Search:
• Search box on the main JGSM Library page
• A link on the main JGSM Library page to the advanced search page
• The advanced search’s 3 options are self-explanatory
![Page 12: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/12.jpg)
Ease of Use
Administration:
• The administration user interface is very straightforward
• Three functions: Add a Page, Remove a Page, Update a Page
![Page 13: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/13.jpg)
Ease of Use
After viewing the training slides and trying it out a few times…
An administrator should be able to maintain the database through the administration page immediately
![Page 14: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/14.jpg)
Documentation
All source code that we write will be documented inline
All source code that we use from another source will include information on where it came from
A separate document will contain our implementation strategies and describe all algorithms we use
![Page 15: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/15.jpg)
Help System
Search:
• There will be a link on the advanced search page to a help page that suggests ways to get better search results
• That help page will be displayed automatically if no search results are found
![Page 16: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/16.jpg)
Help System
Administration:
• A brief help page written by us, with instructions on usage, will be linked from the administration page
• If errors occur during database maintenance, error messages will indicate what went wrong
![Page 17: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/17.jpg)
Deployment
We will install and configure all necessary software and integrate the system into the JGSM Library system
After deployment, the system can be used immediately by anyone who accesses the page
![Page 18: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/18.jpg)
Scalability
The system will not experience visible slowdown as the document base grows, up to at least twice the number of documents currently in the database
This applies to both searching and database administration
![Page 19: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/19.jpg)
Security
The administration page will be accessed with a user name and password
We recommend that the client not link to the administration page from anywhere on the JGSM site
![Page 20: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/20.jpg)
Use Cases
The use scenarios of this system involve two actors:
• WebsiteUser: the website user who wishes to search
• Administrator: the administrator who manages the website
![Page 21: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/21.jpg)
Use Cases: WebsiteUser
[Use case diagram: WebsiteUser performs Quick search and Advanced Search; each «uses» View Results.]
![Page 22: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/22.jpg)
Name: Quick search
Actor: WebsiteUser
Flow of events:
1. WebsiteUser visits the Johnson Graduate School of Management Library website.
2. WebsiteUser clicks in the "simple search" box near the top of the page.
3. WebsiteUser types one or more search terms into the box.
4. WebsiteUser presses <enter>.
![Page 23: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/23.jpg)
5. WebsiteUser views results via the View Results use case.
6. When completed, WebsiteUser may either browse to another webpage, close their web browser, or perform another search.
Entry conditions: WebsiteUser knows URL of library website. WebsiteUser has a compatible browser.
![Page 24: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/24.jpg)
Name: Advanced Search
Actor: WebsiteUser
Flow of events:
1. WebsiteUser visits the Johnson Graduate School of Management Library website.
2. WebsiteUser clicks on the "Advanced Search" link.
3. WebsiteUser is presented with advanced search options: searching for "any" words, "all" words, an exact phrase, or within a certain category.
4. WebsiteUser types one or more search terms into the box that corresponds to the type of search they wish to perform.
5. WebsiteUser selects the category they wish to search within, if any.
6. WebsiteUser clicks the "search" button.
![Page 25: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/25.jpg)
7. WebsiteUser views results via the View Results use case.
8. When completed, WebsiteUser may either browse to another webpage, close their web browser, or perform another search.
Entry conditions: WebsiteUser must have a web browser capable of displaying the Johnson Graduate School of Management library website. WebsiteUser must know the URL of the JGSM Library website, or browse there from another site.
![Page 26: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/26.jpg)
Name: View Results
Actor: WebsiteUser
Flow of events:
1. Website presents WebsiteUser with a results page, containing a list of the first 10 results, ordered by relevance as determined by the search engine's ranking algorithm.
2. For each result, the results page includes a title of the page, a link to that page, and the context in which the search term(s) were used, OR an abstract of the page.
3. If a result seems useful to WebsiteUser, they click on the link and can visit the page. They may navigate back to the results page to see the results again.
4. If there are more than 10 results, WebsiteUser may see the next 10 by clicking a "next page" link at the bottom of the search results.
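The paging in steps 1 and 4 can be sketched as a simple slice over the ranked list. The names `page_of_results` and `RESULTS_PER_PAGE` are illustrative assumptions; only the page size of 10 comes from the requirements.

```python
RESULTS_PER_PAGE = 10  # page size stated in the requirements

def page_of_results(ranked_results, page_number, per_page=RESULTS_PER_PAGE):
    """Return (results_on_page, has_next) for a 1-based page number.

    ranked_results is assumed to be already sorted by relevance.
    has_next tells the results page whether to show a "next page" link.
    """
    if page_number < 1:
        raise ValueError("page_number starts at 1")
    start = (page_number - 1) * per_page
    end = start + per_page
    return ranked_results[start:end], end < len(ranked_results)
```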
![Page 27: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/27.jpg)
Use Cases: Administrator
[Use case diagram: Administrator performs Add a page to site/index, Remove a page from site/index, and Update a page's content in site/index; each «uses» Authenticate.]
![Page 28: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/28.jpg)
Name: Add a page to the site/index
Actor: Administrator
Flow of events:
1. Administrator adds HTML page to online website.
2. Administrator visits the Administration Page.
3. The “Authenticate” use case authenticates the Administrator.
4. In the "Add" section, Administrator enters the URL of the page just added.
5. If Administrator desires to store a description of the page (for use in the search results), they enter it in the description box.
6. If this page belongs to a category (used for advanced searching), they may select that category from the category pull-down menu.
7. Administrator clicks the "Add" button.
![Page 29: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/29.jpg)
8. Page is now indexed and available for searching.
9. Administrator is returned to the Administration page.
Entry conditions: Administrator must have a compatible web browser. Administrator must know the URL of the administration page.
![Page 30: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/30.jpg)
Name: Remove a page from the site/index
Actor: Administrator
Flow of events:
1. Administrator removes HTML webpage from the online website.
2. Administrator visits the Administration page.
3. The “Authenticate” use case authenticates the administrator.
4. In the "Remove" section, Administrator enters the URL of the page just removed.
5. Administrator clicks the "Remove" button.
![Page 31: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/31.jpg)
6. All data relating to that webpage is then removed from the index, and the page will no longer appear in search results.
7. Administrator is now returned to the administration page.
Entry conditions: Administrator must have a compatible web browser. Administrator must know the URL of the administration page.
![Page 32: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/32.jpg)
Name: Updating a page in the site/index
Actor: Administrator
Flow of events:
1. Administrator updates the HTML webpage on the online website.
2. Administrator visits the Administration page.
3. Administrator is authenticated through the “Authenticate” use case.
4. In the "Update" section, Administrator enters the URL of the page which has been updated.
5. Administrator clicks the "Continue" button.
![Page 33: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/33.jpg)
6. Administrator is now presented with the current abstract, if one exists, for the page being updated.
7. If the Administrator wishes to alter or remove the abstract, they may edit it here.
8. Administrator clicks the "Update" button.
9. All data relating to the updated page in the search index now reflects the updated contents.
10. Administrator is now returned to the administration page.
Entry conditions: Administrator must have a compatible browser. Administrator must know the URL of the administration page.
![Page 34: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/34.jpg)
Name: Authenticate
Actor: Administrator
Flow of events:
1. Administrator is prompted for a user name and password for the administration page.
2. Administrator supplies user name and password, and presses <enter>.
3. Administrator is granted access to administration page.
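One possible way to check the user name and password is sketched below. The slides only require a user name and password; storing a salted PBKDF2 hash per account, the `accounts` dictionary, and the function names are all assumptions made for this example.

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None, iterations=100_000):
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, iterations)
    return salt, digest

def authenticate(username, password, accounts, iterations=100_000):
    """Return True if the password matches the stored hash for username.

    accounts is a hypothetical mapping of username -> (salt, digest).
    """
    record = accounts.get(username)
    if record is None:
        return False
    salt, expected = record
    _, actual = hash_password(password, salt, iterations)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(actual, expected)
```

Only when `authenticate` returns True would the Administrator be granted access to the administration page.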
![Page 35: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/35.jpg)
User Interface
The design follows our understanding of the client’s requirements.
![Page 36: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/36.jpg)
Simple Search
![Page 37: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/37.jpg)
Advanced Search
![Page 38: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/38.jpg)
Search Results
![Page 39: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/39.jpg)
Ease of Use
Simple search for new users. Advanced search for skilled users.
![Page 40: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/40.jpg)
Administration Interface
![Page 41: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/41.jpg)
Adding a Page…
The add-page section on the Database Administration page
![Page 42: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/42.jpg)
Removing a Page…
The remove-page section on the Database Administration page
![Page 43: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/43.jpg)
Updating a Page…
The update-page section on the Database Administration page
![Page 44: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/44.jpg)
Update a Page…
![Page 45: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/45.jpg)
Consistent Procedures
Adding a page, removing a page and updating a page all follow similar procedures.
![Page 46: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/46.jpg)
Feedback
Feedback on each administrative operation will be displayed at the top of the main administration page after the operation.
![Page 47: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/47.jpg)
Error Handling
An error message is displayed for failed operations.
![Page 48: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/48.jpg)
Follow up
The user should feel in control
![Page 49: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/49.jpg)
Development Tools
![Page 50: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/50.jpg)
PhpDig vs. Home Brew
PhpDig pros:
• Easier to maintain if familiar with search API
• Potentially more flexible: for example, can automate indexing
• More robust than our solution
• PhpDig works right now
• Can index MS-Word, PDF, Excel documents with plug-ins
PhpDig cons:
• Documentation does not specify algorithms used
• Code is longer and more complex
• Indexing relatively slow, must use Firefox to add and update pages
• Using Help Forum on website requires $5.00 for 30 days access
• Simplistic ranking algorithm (based on cursory glance)
We currently favor using PhpDig as our solution.
![Page 51: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/51.jpg)
Database Schema (Ours)
WordTable: Word (PK), ID (PK), Count
PageTable: ID (PK), URL, Title, Category, FullText, DateModified, Abstract
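As a rough sketch, the two tables could be declared like this. The table and column names come from the slide; the column types, the `UNIQUE` constraint on URL, the reading of `WordTable.ID` as a reference to `PageTable.ID`, and the use of SQLite are assumptions for illustration.

```python
import sqlite3

# Column names follow the slide; types and constraints are assumed.
SCHEMA = """
CREATE TABLE PageTable (
    ID           INTEGER PRIMARY KEY,
    URL          TEXT NOT NULL UNIQUE,
    Title        TEXT,
    Category     TEXT,
    FullText     TEXT,
    DateModified TEXT,
    Abstract     TEXT
);
CREATE TABLE WordTable (
    Word  TEXT NOT NULL,
    ID    INTEGER NOT NULL REFERENCES PageTable(ID),
    Count INTEGER NOT NULL,
    PRIMARY KEY (Word, ID)
);
"""

def create_database(path=":memory:"):
    """Create the two tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

The composite key (Word, ID) means each word stem is stored once per page, with Count holding how often it occurs on that page.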
![Page 52: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/52.jpg)
Three Main Functions
• Add page to database
• Search for page in database
• Remove page from database
![Page 53: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/53.jpg)
Stop List (PhpDig + Ours)
Example words: a, the, I’m, isn’t, moreover
Example: The pig makes excellent soup
Filtered words: pig, makes, excellent, soup
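A stop list filter along these lines reproduces the example above. The `STOP_WORDS` set holds only the example words from the slide (the real list would be much longer), and the function name and tokenizing rule are assumptions.

```python
import re

# Only the example words from the slide; a real stop list is much longer.
STOP_WORDS = {"a", "the", "i'm", "isn't", "moreover"}

def filter_stop_words(text, stop_words=STOP_WORDS):
    """Split text into lowercase words and drop those on the stop list."""
    # Treat curly apostrophes like straight ones so "I’m" matches "i'm".
    normalized = text.lower().replace("\u2019", "'")
    words = re.findall(r"[a-z0-9']+", normalized)
    return [w for w in words if w not in stop_words]

print(filter_stop_words("The pig makes excellent soup"))
# prints ['pig', 'makes', 'excellent', 'soup']
```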
![Page 54: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/54.jpg)
Porter Stemmer (Ours)
Word stems are extracted from words
Implementation is from Porter’s website
Example:
Before: pig, makes, excellent, soup
After: pig, make, excel, soup
![Page 55: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/55.jpg)
Adding a Page
1. Scan page into database
2. Filter out common words with stop-list
3. Use Porter Stemming algorithm to retrieve word stems (PhpDig uses twoletters trick)
4. Add word stems to database
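The four steps can be sketched as one small function. Note the loud assumption: `toy_stem` is a placeholder that only strips a trailing "s" and is NOT the Porter algorithm; a real Porter implementation would be passed in through the `stem` parameter. The in-memory `word_table` dictionary stands in for the database word table.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "the", "i'm", "isn't", "moreover"}  # example words only

def toy_stem(word):
    """Placeholder stemmer: strips a trailing "s". NOT the Porter algorithm."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word

def index_page(page_id, text, word_table, stem=toy_stem):
    """Add one page's word stems to word_table.

    word_table maps (stem, page_id) -> count, mirroring a word table
    keyed by word and page ID.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())   # 1. scan the page
    kept = [w for w in words if w not in STOP_WORDS]  # 2. apply stop list
    stems = [stem(w) for w in kept]                   # 3. stem each word
    for s, count in Counter(stems).items():           # 4. store the counts
        word_table[(s, page_id)] = count
    return word_table
```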
![Page 56: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/56.jpg)
Search
1. Filter out common words from query
2. Use Porter Stemming algorithm on query (PhpDig uses twoletters)
3. Look for words in word table
4. Return pages that contain query terms
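Steps 3 and 4 amount to a lookup in the word table. The sketch below assumes the query has already been stop-filtered and stemmed, and that the word table is an in-memory dictionary mapping (stem, page ID) to a count; both the `search` function and its `mode` parameter are illustrative names.

```python
def search(query_stems, word_table, mode="any"):
    """Return the set of page IDs that contain the query stems.

    word_table maps (stem, page_id) -> count. query_stems is assumed to
    be already stop-word filtered and stemmed. mode "all" requires every
    stem to be present on a page; any other mode accepts one match.
    """
    pages_per_stem = []
    for stem in query_stems:
        pages = {pid for (word, pid) in word_table if word == stem}
        pages_per_stem.append(pages)
    if not pages_per_stem:
        return set()
    if mode == "all":
        return set.intersection(*pages_per_stem)
    return set.union(*pages_per_stem)
```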
![Page 57: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/57.jpg)
Removing a Page
PhpDig:
• Look up URL in sites table, get page ID, then get spider ID.
• Remove entries with corresponding spider IDs in engine and spider tables
• (Optional) Delete file from FTP server
Ours:
• Look up URL in page table, get page ID
• Remove word entries with corresponding page ID in word table
• Remove entry with specified URL in page table
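A minimal sketch of the "Ours" removal steps, assuming an SQLite database with tables named PageTable and WordTable as in the schema slide. The function name and the choice of `LookupError` for an unknown URL are assumptions; the requirement only says an error is displayed.

```python
import sqlite3

def remove_page(conn, url):
    """Remove a page and its word entries from the index.

    Raises LookupError if the URL is not in the page table, so the
    caller can show an error message to the administrator.
    """
    row = conn.execute("SELECT ID FROM PageTable WHERE URL = ?",
                       (url,)).fetchone()
    if row is None:
        raise LookupError(f"URL not in database: {url}")
    page_id = row[0]
    with conn:  # commit both deletes together, or roll back on error
        conn.execute("DELETE FROM WordTable WHERE ID = ?", (page_id,))
        conn.execute("DELETE FROM PageTable WHERE ID = ?", (page_id,))
```

Deleting the word entries first keeps the word table from ever pointing at a page that no longer exists.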
![Page 58: Johnson Graduate School of Management Library Project](https://reader035.vdocuments.site/reader035/viewer/2022062720/56813322550346895d99f9cb/html5/thumbnails/58.jpg)
Query Results
Ours: Sort by term frequency - inverse document frequency (tf-idf) score
PhpDig: sorted by occurrence
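For illustration, one common tf-idf variant is sketched below: raw term count as tf and log(N / df) as idf. The slides do not say which variant the team used, so the formula, the function name and the in-memory `word_table` (mapping (stem, page ID) to a count) are assumptions.

```python
import math

def tf_idf_rank(query_stems, word_table, total_pages):
    """Rank page IDs by summed tf-idf score, best first.

    word_table maps (stem, page_id) -> count. Uses the raw count as tf
    and log(total_pages / pages_containing_stem) as idf, which is one
    common variant and not necessarily the one used in the project.
    """
    scores = {}
    for stem in query_stems:
        postings = {pid: c for (w, pid), c in word_table.items() if w == stem}
        if not postings:
            continue  # stem appears on no page
        idf = math.log(total_pages / len(postings))
        for pid, count in postings.items():
            scores[pid] = scores.get(pid, 0.0) + count * idf
    # Highest score first; ties broken by page ID for a stable order.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))
```

Under this variant a stem that appears on every page gets an idf of zero, so very common words contribute nothing to the ranking.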
“Everything should be made as simple as possible, but not simpler.” -Einstein