
The Top Ten Largest Databases In The World

By Lewis Keller

2/27/2012



Introduction

When I was presented with the opportunity to research the largest databases in the world, I planned to discuss the top five in detail. However, I came across a list of the top 10 largest databases in the world, so I decided to expand my discussion to cover the whole list. One thing that does not surprise me is that the top two are owned by our government (the Library of Congress and the Central Intelligence Agency, respectively). What does surprise me is that Google made it only to #7 on the list. Considering how much knowledge it makes available to the public, I thought it would be somewhere in the top five. Overall, though, the sizes of these databases are astounding: several of them are hundreds of terabytes in size.

#1: The Library of Congress

The Library of Congress holds 130 million documents altogether. It has so much text data that, if all of it were digitized, it would total 20 terabytes! It has 5 million digital documents, and over 10,000 items are added to the database every day. However, many of these items are restricted from the general public.

I decided to test the online system by searching for “Vietnam”. I immediately ran into its 10,000-item result limit, which shows how immense the collection is. The newest document I came across in my search was an article from 1991, and there were several documents from the 1960s and 1970s. The only thing I don’t like about the system is that it gave me only five minutes to search before logging me out.


#2: The Central Intelligence Agency

One interesting thing about the CIA’s database is that its size is unknown, because so many of its files are classified. However, portions of it are available to the public, such as The World Factbook and the contents of the Freedom of Information Act Electronic Reading Room. The database also contains statistics on more than 250 countries and entities.

The Electronic Reading Room makes some (potentially sensitive) government documents available to the public, which can help a researcher find, for example, a copy of a previously passed law. Being curious, I decided to test it, too. A search on “Africa” returned 98 items, which were available in both GIF and PDF formats.

#3: Amazon.com

With the wealth of items Amazon sells online, one would expect it to have a large database. That expectation is correct: Amazon’s database contains 42 terabytes of data. According to a 2006 Seattle Post-Intelligencer report, this database could gather and keep massive amounts of intimate information about its millions of shoppers, including their religion, sexual orientation, ethnicity and income, by combining information that customers disclose voluntarily with facts gleaned from public databases. That would give Amazon more detailed information about its customers than any other retailer has.
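To make “combining information disclosed voluntarily by customers with facts gleaned from public databases” concrete, here is a minimal sketch of such a merge at the data level. It assumes both sources share a customer ID, which is a simplification (real record linkage usually has to match on names and addresses instead). The records, field names, and the `merge_profiles` helper are all hypothetical, not Amazon’s actual system.

```python
# Hypothetical data: customer IDs, field names, and values are illustrative only.
customer_profiles = {
    "c1": {"name": "A. Shopper", "city": "Seattle"},
    "c2": {"name": "B. Buyer", "city": "Portland"},
}

public_records = {
    "c1": {"income_bracket": "50-75k"},
    "c3": {"income_bracket": "75-100k"},
}


def merge_profiles(voluntary, public):
    """Combine voluntarily disclosed data with public-record data per customer.

    Only customers present in the voluntary data are kept; public-record
    fields are added wherever a matching customer ID exists.
    """
    merged = {}
    for customer_id, profile in voluntary.items():
        combined = dict(profile)  # copy so the input dict is not mutated
        combined.update(public.get(customer_id, {}))
        merged[customer_id] = combined
    return merged


profiles = merge_profiles(customer_profiles, public_records)
print(profiles["c1"])
```

Each extra source that can be matched this way adds more fields to the same profile, which is why a combined database can end up knowing more than any single source does.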

#4: YouTube


In 2006, back when YouTube was just starting to gain its foothold in our society, its database was projected to hold 45 terabytes of data. I seriously can’t imagine how many terabytes of data are on there now, six years later. The database is open to people who want to access it, which I find kind of astonishing, because users’ personal data could be exposed to the public. That said, to gain access to the database, you must request special developer and client keys. Because each video differs in file size and length, estimating the size of YouTube’s database is difficult. YouTube’s data API is geared toward developers who have experience programming in server-side languages.
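The difficulty mentioned above comes from the fact that a video’s storage size depends roughly on its bitrate multiplied by its duration, so no single per-video figure works. The sketch below estimates a total from per-video data; the bitrates and durations are made up for illustration, and the estimate ignores extra encoded copies, thumbnails, and metadata that a real service would also store.

```python
# Hypothetical sample: (average bitrate in kilobits per second, duration in seconds).
sample_videos = [
    (1000, 180),  # 3-minute video at about 1 Mbps
    (500, 600),   # 10-minute video at about 0.5 Mbps
    (2500, 60),   # 1-minute video at about 2.5 Mbps
]


def estimated_bytes(bitrate_kbps, duration_s):
    """Approximate file size: bits per second times seconds, converted to bytes."""
    return bitrate_kbps * 1000 * duration_s // 8


total = sum(estimated_bytes(rate, secs) for rate, secs in sample_videos)
print(f"{total / 1e6:.2f} MB for {len(sample_videos)} videos")
```

With these made-up numbers the three videos come to 78.75 MB, and the 10-minute low-bitrate video alone accounts for almost half of that, which shows why length and quality both matter.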

#5: ChoicePoint

ChoicePoint’s database of 17 billion public records, consisting of 250 terabytes of personal data, is used for background checks, insurance applications and tenant screening. The database contains information on approximately 250 million people. One thing that I don’t like about ChoicePoint is that it sells data to the highest bidders, which include the U.S. government. However, much of its business is regulated by the Fair Credit Reporting Act.
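The three figures above (17 billion records, 250 million people, 250 terabytes) can be related with simple division. This sketch assumes decimal units (1 TB = 10^12 bytes) and treats the results as averages only; real records are certainly not spread evenly across people or sizes.

```python
RECORDS = 17_000_000_000    # public records, as stated above
PEOPLE = 250_000_000        # people covered, approximately
TOTAL_BYTES = 250 * 10**12  # 250 terabytes in decimal units

records_per_person = RECORDS / PEOPLE  # average records per person
bytes_per_record = TOTAL_BYTES / RECORDS  # average storage per record

print(f"{records_per_person:.0f} records per person")
print(f"{bytes_per_record / 1000:.1f} KB per record")
```

On average that works out to 68 records per person and roughly 14.7 KB per record.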

#6: Sprint

Sprint has 53 million subscribers worldwide, and its database is vast. Large telecommunications companies like Sprint are known for keeping immense databases to track all of the calls on their networks. Sprint’s database holds 2.85 trillion data insertions (the largest number of any database in the world), and it processes 365 million call detail records per day. However, information has leaked out of the database before: the unreleased LG Optimus Slider showed up in it along with a September 11 release date.
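To put 365 million call detail records per day into perspective, dividing by the number of seconds in a day gives the average ingest rate. This is only an average; actual traffic peaks during busy hours would be higher.

```python
CDRS_PER_DAY = 365_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

average_per_second = CDRS_PER_DAY / SECONDS_PER_DAY
print(f"about {average_per_second:,.0f} call detail records per second")
```

That is about 4,225 new records every second, around the clock.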

#7: Google

Google’s database contains virtual profiles of countless users, along with every word used in search queries. Google searches account for more than 50% of all internet searches. Like the CIA’s database, the size of Google’s database is unknown, because Google keeps it locked away.

For searches of Google’s database to work, a crawler visits a page, copies its content, and follows the links on that page to the pages they point to, repeating this process over and over until it has crawled billions of pages on the web.
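The visit, copy, and follow-links loop described above is essentially a breadth-first traversal of the web’s link graph. The sketch below runs that loop over a tiny hypothetical in-memory “web” instead of fetching real pages; it is a toy illustration, not Google’s actual crawler. The `seen` set matters because pages link back to each other (here `c.example` links to `a.example`), and without it the loop would revisit pages forever.

```python
from collections import deque

# Hypothetical miniature "web": each page maps to the pages it links to.
WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
}


def crawl(start, max_pages=100):
    """Breadth-first crawl: visit a page, copy its content, queue its links."""
    seen = {start}
    queue = deque([start])
    copied = {}  # url -> stored copy of the page content
    while queue and len(copied) < max_pages:
        url = queue.popleft()
        copied[url] = f"<content of {url}>"  # stand-in for downloading the page
        for link in WEB.get(url, []):
            if link not in seen:  # skip pages already queued or visited
                seen.add(link)
                queue.append(link)
    return copied


pages = crawl("a.example")
print(list(pages))
```

Starting from `a.example`, this visits `a`, `b`, `c`, and `d` once each. A real crawler would replace the dictionary lookup with HTTP requests and link extraction, and `max_pages` stands in for the practical limits on how much it can fetch.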

#8: AT&T

AT&T’s database contains 323 terabytes of data and holds 1.9 trillion phone call records. AT&T is so careful with its records that it has kept calling data from decades ago, before the technology to store hundreds of terabytes of data existed. As a former AT&T customer, I have to say that this is very impressive, because one never knows when an old call record might help put somebody in jail for a crime committed 20 years ago.
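Dividing the stated storage by the stated number of call records gives a rough upper bound on how much space each record takes. This assumes decimal units and that all 323 TB were call records, which overstates the per-record size, since a real database also stores indexes and other data.

```python
TOTAL_BYTES = 323 * 10**12        # 323 terabytes in decimal units
CALL_RECORDS = 1_900_000_000_000  # 1.9 trillion call records

bytes_per_record = TOTAL_BYTES / CALL_RECORDS
print(f"about {bytes_per_record:.0f} bytes per call record")
```

The result is about 170 bytes per record, which suggests that each call record is a compact row of fields rather than anything like a recording of the call.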

#9: NERSC

NERSC (the National Energy Research Scientific Computing Center) holds 2.8 petabytes of data and is operated by more than 2,000 computer scientists. Its data includes simulations of the early universe, atomic energy research, and more. What distinguishes it from the others is that it has successfully created an environment that makes its resources usable for research.

#10: The World Data Centre for Climate

This database is, by far, the largest database in the world! It contains 330 terabytes of web-accessible and climate simulation data, plus 6 petabytes of additional data on magnetic tape. The database is so large that it has to be hosted on a machine that cost 35 million euros ($46,942,000).
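The figures above can be combined to show the total volume and how much of it lives on tape, and the dollar amount implies the exchange rate that was used for the conversion. This sketch assumes decimal units (1 PB = 1,000 TB). Because exchange rates change, the dollar figure is only accurate for the time it was calculated.

```python
TB = 10**12  # decimal terabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes

online_bytes = 330 * TB
tape_bytes = 6 * PB
total_pb = (online_bytes + tape_bytes) / PB
tape_share = tape_bytes / (online_bytes + tape_bytes)

cost_eur = 35_000_000
cost_usd = 46_942_000
implied_rate = cost_usd / cost_eur  # US dollars per euro

print(f"total: {total_pb:.2f} PB, {tape_share:.0%} of it on tape")
print(f"implied rate: {implied_rate:.4f} USD per EUR")
```

This gives about 6.33 PB in total, roughly 95% of it on tape, and an implied rate of 1.3412 dollars per euro.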

Conclusion

In conclusion, with the immense amount of data they contain, each of these databases helps the general public find something it wants or needs in some fashion. More importantly, though, they set a precedent for future databases through their size, their accuracy, and the data they contain. I honestly think that databases will continue to grow in all three respects, providing more and more information to those who request it.
