efficient online information searching 251111 internet and online community week 3

Post on 26-Dec-2015


EFFICIENT ONLINE INFORMATION SEARCHING

251111 Internet And Online Community

Week 3

REVIEW

• Computer Technologies & The Modern World
• Evolution of Communication & Technology
• Telecommunication
• Input Devices
• Output Devices
• Future Technology
• Context Aware Computing
• Breakthrough Technologies

10 BREAKTHROUGH TECHNOLOGIES 2014

1. Agricultural Drones

2. Ultraprivate Smartphones

3. Brain Mapping

4. Neuromorphic chips

5. Genome Editing

6. Microscale 3-D Printing

7. Mobile Collaboration

8. Oculus Rift

9. Agile Robots

10. Smart Wind & Solar Power

http://www.technologyreview.com/lists/technologies/2014/

THIS WEEK

• Efficient Online Information Searching
• How do you search for information?
• Search Engines
• Search Engine Optimisation (SEO)

A SEARCH ENGINE

[Diagram of a search engine. Labels: Documents, Document Representations, Indexing, Matching, Relevance / Feedback, Query, Retrieved Documents]

MEASURING SEARCH EFFICIENCY

• Recall
  • (a.k.a. Sensitivity)
  • Fraction of relevant instances retrieved

• Precision
  • (a.k.a. Positive Predictive Value)
  • Fraction of retrieved instances that are relevant

RECALL & PRECISION

(Walber)

RETURNED RESULTS

• The “Blue” area represents all the relevant articles

• The “Orange” area represents other articles that could be returned

[Figure: diagram with regions labelled A, B and C]

RECALL

• A = Relevant Returned Articles

• C = Relevant Unreturned Articles

PRECISION

• A = Relevant Returned Articles

• B = Irrelevant Returned Articles

RECALL & PRECISION

• Suppose there are 200 relevant articles

• A search engine returns 40 articles, of which 25 are relevant…

• What is the recall?

• What is the precision?
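
The two questions above can be worked through with a short sketch. It uses the A, B and C regions defined on the earlier slides: recall is A / (A + C) and precision is A / (A + B). The function and variable names are chosen for this illustration only.

```python
def recall(relevant_returned: int, relevant_total: int) -> float:
    """Fraction of all relevant articles that were returned: A / (A + C)."""
    return relevant_returned / relevant_total


def precision(relevant_returned: int, returned_total: int) -> float:
    """Fraction of returned articles that are relevant: A / (A + B)."""
    return relevant_returned / returned_total


# Numbers taken from the example on the slide.
relevant_total = 200     # A + C: all relevant articles
returned_total = 40      # A + B: all returned articles
relevant_returned = 25   # A: returned articles that are relevant

r = recall(relevant_returned, relevant_total)
p = precision(relevant_returned, returned_total)

print(f"recall    = {r:.3f}")   # 0.125
print(f"precision = {p:.3f}")   # 0.625
```

So the engine finds only 12.5% of what is relevant, but 62.5% of what it does return is relevant.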

GOOGLE SEARCH REFINEMENT

• Quotes! “…”
  • Force Google to look for something
  • Star Wars I vs Star Wars “I”
  • Jobs in central LA vs Jobs in central “LA”

• -
  • Stop Google from looking for something
  • Dolphins -football

• ~
  • “Is similar to”: look for synonyms
  • ~inexpensive

GOOGLE SEARCH REFINEMENT

• OR or “|”
  • Bangkok | Chiangmai

• ..
  • Specify a range

• *
  • Replaces one or more words
  • google * my life

GOOGLE SEARCH REFINEMENT

• allintitle:
  • makes sure the search terms appear in the title
  • allintitle: ken cosh

• cache:
  • returns cached copy of page

• link:
  • returns pages that link to the specified page

• site:
  • restrict results to a particular website
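
As a rough illustration of how these operators combine, the sketch below assembles a query string and URL-encodes it. The helper `build_query` and the domain `example.com` are hypothetical, and the sketch assumes the search terms are passed in the `q` parameter of the search URL. Whether a given operator is still honoured is up to the search engine.

```python
from urllib.parse import urlencode


def build_query(terms, exact=None, exclude=None, site=None):
    """Join plain terms with the operators from the slides (hypothetical helper)."""
    parts = list(terms)
    if exact:
        parts.append(f'"{exact}"')                  # quotes: force an exact phrase
    if exclude:
        parts.extend(f"-{word}" for word in exclude)  # minus: exclude a word
    if site:
        parts.append(f"site:{site}")                # site: restrict to one website
    return " ".join(parts)


query = build_query(["dolphins"], exclude=["football"], site="example.com")
url = "https://www.google.com/search?" + urlencode({"q": query})

print(query)
print(url)
```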

SEARCH ENGINE MARKET SHARE

• Which Search Engine do you use?
• Which is the most popular?

VIDEO BREAK!

• How Search Works
  • https://www.youtube.com/watch?v=BNHR6IQJGZs

SEARCH ENGINES

• A great source of traffic for your site.

• But how do they decide which sites to display, and in which order, on their SERPs?
  • SERPs = Search Engine Results Pages

• Obviously being #1 in Google for a popular search term will bring you lots of traffic.

RANKING ALGORITHM

• We don’t know, but it takes plenty of factors into account:
  • Page Content
  • Meta tags
  • Age
  • Keyword density
  • Links

• And the algorithm appears to evolve over time.

GOOGLE’S MAGIC

• Gone are the days when you could just say what your page is about; now it’s much more technical…

• Much of Google’s magic comes from their patented “PigeonRank” algorithm
  • http://www.google.com/technology/pigeonrank.html

PIGEON -> PAGERANK

• PageRank is a numeric value that represents how important a page is on the web.

PAGERANK

• Google figures that when one page links to another page, it is effectively casting a vote for the other page.
• The more votes that are cast for a page, the more important the page must be.
• The importance of the page that is casting the vote determines how important the vote itself is.
• Google calculates a page's importance from the votes cast for it.
• How important each vote is is taken into account when a page's PageRank is calculated.

PAGERANK

• PageRank is Google's way of deciding a page's importance.

• It matters because it is one of the factors that determines a page's ranking in the search results.

• It isn't the only factor that Google uses to rank pages, but it is an important one.

LINK FARMS ETC.

• Not all links are counted by Google. For instance, they filter out links from known link farms. Some links can cause a site to be penalized by Google. They rightly figure that webmasters cannot control which sites link to their sites, but they can control which sites they link out to. For this reason, links into a site cannot harm the site, but links from a site can be harmful if they link to penalized sites. So be careful which sites you link to. If a site has PR0, it is usually a penalty, and it would be unwise to link to it.

CALCULATING PAGERANK

• To calculate the PageRank for a page, all of its inbound links are taken into account. These are links from within the site and links from outside the site.
• PR(A) = (1-d) + d(PR(t1)/C(t1) + ... + PR(tn)/C(tn))
• That's the equation that calculates a page's PageRank. It's the original one that was published when PageRank was being developed, and it is probable that Google uses a variation of it but they aren't telling us what it is. It doesn't matter though, as this equation is good enough.

CALCULATING PAGERANK

• PR(A) = (1-d) + d(PR(t1)/C(t1) + ... + PR(tn)/C(tn))

• 't1 - tn' are pages linking to page A

• 'C' is the number of outbound links that a page has

• 'd' is a damping factor, usually set to 0.85.
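
A minimal sketch of how the equation above can be evaluated, assuming we simply start every page at 1.0 and re-apply the formula a fixed number of times. The three-page link graph is made up for illustration, and the sketch assumes every page appears as a key and has at least one outbound link. This is not a claim about how Google computes it.

```python
def pagerank(links, d=0.85, iterations=50):
    """Repeatedly apply PR(A) = (1-d) + d * sum(PR(t)/C(t)) over pages t linking to A.

    `links` maps each page to the list of pages it links out to.
    """
    pr = {page: 1.0 for page in links}  # starting guess for every page
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            # Each linking page t contributes PR(t) / C(t),
            # where C(t) is its number of outbound links.
            inbound = sum(
                pr[src] / len(outs)
                for src, outs in links.items()
                if page in outs
            )
            new_pr[page] = (1 - d) + d * inbound
        pr = new_pr
    return pr


# Hypothetical site: A links to B and C, B links to C, C links back to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)

for page, value in sorted(ranks.items()):
    print(page, round(value, 3))
```

In this toy graph C collects votes from both A and B, so it ends up with the highest value, and B, which receives only half of A's vote, ends up with the lowest.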

PAGERANK SIMPLIFIED

• We can think of it in a simpler way:
• a page's PageRank = 0.15 + 0.85 * (a "share" of the PageRank of every page that links to it)
• “share” = the linking page’s PageRank divided by the number of outbound links on the page.
• A page "votes" an amount of PageRank onto each page that it links to. The amount of PageRank that it has to vote with is a little less than its own PageRank value (its own value * 0.85). This value is shared equally between all the pages that it links to.

PAGERANK

• From this, we could conclude that a link from a page with PR4 and 5 outbound links is worth more than a link from a page with PR8 and 100 outbound links.

• The PageRank of a page that links to yours is important but the number of links on that page is also important.

• The more links there are on a page, the less PageRank value your page will receive from it.
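
The comparison above can be checked with a few lines, assuming the PR numbers are used directly as PageRank values (a linear scale). The function name is chosen for this illustration.

```python
def vote_value(page_rank, outbound_links, d=0.85):
    """PageRank passed through one link: d * PR(t) / C(t)."""
    return d * page_rank / outbound_links


from_pr4 = vote_value(4, 5)     # PR4 page with 5 outbound links
from_pr8 = vote_value(8, 100)   # PR8 page with 100 outbound links

print(f"PR4, 5 links:   {from_pr4:.3f}")   # 0.680
print(f"PR8, 100 links: {from_pr8:.3f}")   # 0.068
```

Under this assumption the link from the PR4 page passes on ten times as much as the link from the PR8 page.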

OR PERHAPS NOT…

• If the PageRank value differences between PR1, PR2,.....PR10 were equal then that conclusion would hold up, but many people believe that the values between PR1 and PR10 (the maximum) are set on a logarithmic scale, and there is very good reason for believing it.

• Nobody outside Google knows for sure one way or the other, but the chances are high that the scale is logarithmic, or similar.

• If so, it means that it takes a lot more additional PageRank for a page to move up to the next PageRank level than it did to move up from the previous PageRank level.

• The result is that it reverses the previous conclusion, so that a link from a PR8 page that has lots of outbound links is worth more than a link from a PR4 page that has only a few outbound links.
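
To see how a logarithmic scale could reverse the conclusion, the sketch below assumes that a displayed value of PRn corresponds to an underlying value of base ** n. The base of 5 is purely hypothetical, since nobody outside Google knows the real scale.

```python
def vote_value_log(displayed_pr, outbound_links, base=5.0, d=0.85):
    """Value passed through one link if displayed PR is logarithmic.

    Assumption for illustration only: underlying value = base ** displayed_pr.
    """
    underlying = base ** displayed_pr
    return d * underlying / outbound_links


from_pr4 = vote_value_log(4, 5)     # PR4 page with 5 outbound links
from_pr8 = vote_value_log(8, 100)   # PR8 page with 100 outbound links

print(from_pr4, from_pr8)
print("PR8 link is worth more:", from_pr8 > from_pr4)
```

With these particular link counts, the PR8 link wins whenever base ** 4 is greater than 20, that is, for any base above roughly 2.1.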

EITHER WAY…

• Whichever scale Google uses, we can be sure of one thing. A link from another site increases our site's PageRank. Just remember to avoid links from link farms.

SEO

• Search Engine Optimisation
  • Has become an important job for website owners

WHAT IS SEO?

• Search Engine Optimisation
  • Making webpages more search engine friendly.

• SEO should be considered from the start.
  • Domain Name
  • Site Structure
  • Site Design
  • Site Navigation
  • Site Topics
  • Headings
  • Subheadings
  • Content
  • Links
  • Usability
  • Accessibility

WHY IS IT IMPORTANT?

• 24% of marketers said that >75% of their traffic comes from search engines
• 60% of students use search engines to find online retailers
• 55% of online purchases were made on sites found through search engines
• 80% of users reach sites through search engines
• 48% of websites depend on search engines for the majority of their traffic

(Various sources)

WHY IS IT IMPORTANT?

• Following Search Engine rules.
  • If your webpage fits the criteria for a certain search term, you’ll get top ranking.

• Search Engine Optimisers
  • Modify webpages to fit the criteria to give a page a better chance of being selected.

DESIGN WITH SEO IN MIND

• It’s tempting to build a website, and then think about SEO.

• Better to design with SEO in mind

DOMAIN NAME

• Get a domain name that contains your keywords

• But make sure it is still memorable…

• www.AAA1-Chiang-Mai-Travel-Hotel-Guide-Bookings-Tourist.com
  • Is not a good domain name!

WEBSITE STRUCTURE

• Usability
  • It doesn’t matter how good the content is if the site is frustrating to use.

• Linkability
  • Remember the internal linking structure, and its effect on PageRank

WEBSITE DESIGN

• Flash?
  • NO! Search Engines rely on keywords to classify pages, while Flash is mostly for entertainment.
  • Search Engines do not index Flash files.

• HTML
  • Yes! It’s easy and spiders have no problem indexing it.
  • But PHP etc. is fine so long as you use search engine friendly URLs & links

WEBPAGE CONTENT

• Spiders use the content to decide how to categorise each page.
  • A page with no text (Flash site)
    • Where should it be put?
  • A page with lots of text on lots of topics
    • Where should it be put? There are too many competing keywords.
• The amount of content is also important.

LINKS

• After content, links are the most important thing…
  • Some would even argue it’s the opposite way around.
  • PageRank

• The link text is just as important as the link.
  • It is tempting to use an attractive graphical button for the link, but how can the spider associate keywords with the link?

HOW MANY KEYWORDS?

• Keyword Frequency
  • The number of times a keyword, or phrase, appears within a page.

• Keyword Density
  • The ratio of keyword occurrences to the total number of indexable words in the page
  • Perhaps 1-3%
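
A minimal sketch of the two measures, assuming every word on the page counts as indexable and that the keyword is a single word. The sample text and function names are made up for illustration.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_frequency(text, keyword):
    """Number of times the keyword appears in the page."""
    return tokenize(text).count(keyword.lower())


def keyword_density(text, keyword):
    """Keyword occurrences divided by the total number of words."""
    words = tokenize(text)
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


page = "Chiang Mai hotels: compare Chiang Mai hotel prices and book hotels online."

print(keyword_frequency(page, "hotels"))           # 2
print(f"{keyword_density(page, 'hotels'):.1%}")    # 16.7%
```

A density of 16.7% is far above the 1-3% suggested on the slide. A real page would have much more text around the keyword.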

KEYWORD DENSITY

• Is more complicated than that.
• Different search engines have different preferences
• Different search engines will also calculate a different density for your page:
  • Stop words?
  • Word Stemming?
  • Keywords in particular HTML tags
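
The sketch below illustrates why two engines might compute different densities for the same page. The stop-word list and the very crude stemmer (it only strips a trailing "s") are hypothetical stand-ins, not what any particular search engine uses.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "to"}


def naive_stem(word):
    """Very crude stemming: strip a trailing 's' from words longer than 3 letters."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def density(text, keyword, drop_stop_words=False, stem=False):
    """Keyword density under different (assumed) indexing choices."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    keyword = keyword.lower()
    if drop_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    if stem:
        words = [naive_stem(w) for w in words]
        keyword = naive_stem(keyword)
    if not words:
        return 0.0
    return words.count(keyword) / len(words)


page = "We sell the best biscuits for dogs and the best toys for a dog."

plain = density(page, "dog")                                  # 1 of 14 words
no_stop = density(page, "dog", drop_stop_words=True)          # 1 of 8 words
stemmed = density(page, "dog", drop_stop_words=True, stem=True)  # 2 of 8 words

print(f"{plain:.1%}  {no_stop:.1%}  {stemmed:.1%}")
```

The same page and the same keyword give three different densities, depending only on how the words are counted.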

KEYWORD PROMINENCE

• As well as frequency and density, prominence is also a factor
  • Words appearing near the beginning of the page, paragraph, sentence.
  • Certain HTML tags (title)

KEYWORD PROXIMITY

• How close keywords are together could also be a factor.

• Consider a search for ‘dog biscuits’
  • “We sell delicious biscuits for all breeds of dogs!”
  • “We sell the most delicious dog biscuits in the world!”
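
One simple way to put a number on proximity is the smallest distance, in words, between the two keywords. The sketch below applies that to the two sentences above. It assumes a crude stemmer so that "dogs" matches "dog", and it is only an illustration, not how any search engine is known to measure proximity.

```python
import re


def naive_stem(word):
    """Very crude stemming: strip a trailing 's' from words longer than 3 letters."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def min_distance(text, first, second):
    """Smallest gap (in word positions) between two keywords, or None if one is missing."""
    words = [naive_stem(w) for w in re.findall(r"[a-z']+", text.lower())]
    first_pos = [i for i, w in enumerate(words) if w == naive_stem(first.lower())]
    second_pos = [i for i, w in enumerate(words) if w == naive_stem(second.lower())]
    if not first_pos or not second_pos:
        return None
    return min(abs(i - j) for i in first_pos for j in second_pos)


s1 = "We sell delicious biscuits for all breeds of dogs!"
s2 = "We sell the most delicious dog biscuits in the world!"

print(min_distance(s1, "dog", "biscuits"))   # 5
print(min_distance(s2, "dog", "biscuits"))   # 1
```

The second sentence keeps the two keywords adjacent, so it would be the closer match for the search ‘dog biscuits’.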
