Million Books to the Web
An Example of Indo-US Collaboration
Lessons Learnt & The Road Ahead
Prof N. Balakrishnan
Indo-US Workshop on Open Digital Libraries & Interoperability
Washington, DC
June 23, 2003
Supercomputer Education and Research Centre
Indian Institute of Science
Bangalore India
School of Computer Science
Carnegie Mellon University
Pittsburgh USA
Lessons from the past
• fires of Alexandria – irrevocably severed our access to any of the works of the ancients.
• introduction of printing technology – much Indian and Chinese knowledge disseminated by word of mouth and on palm leaves virtually disappeared or became inaccessible
• New cultural revolutions – edifices built by destroying the past irrevocably
– later revolutions seek solace in attempting to preserve what was destroyed
– we need to preserve our heritage independent of the political and social ups and downs
A single wanton act of destruction can destroy an entire line of heritage
Lessons from Reality
In a thousand years:
only a few of the paper documents we have today will survive the ravages of deterioration, loss, and outright destruction.
Existing paper archives and many other works still in existence today are rare
- only accessible to a small population of scholars and collectors at specific geographic locations
Contrary to popular belief, libraries, museums, and publishers do not routinely maintain broadly comprehensive archives of the considered works of man
No one can afford to do this, unless the archive is digital
The Approach
• Technology Driven Vision
• Decide on the stakeholders
– Never make it exclusive
• Pilot projects to perfect technology
• Bring in advanced management concepts
– like People Maturity Models
– Quality assurance
– automate wherever possible
The Approach
• Lessons from the past
– Too many Digital Library projects
– with a half-life of less than 2 years from the date of “Launch”, or a long incubation time
– Follow Nike – JUST DO IT
• A Digital Library must have two ingredients
– A knowledge amplifier
– Free access, giving avenues for everyone to make economic benefit
• still contribute to multiplication of knowledge by circulation
• In India, it should be a test bed for our Language Technology Research
– a showcase for our heritage
Elements of Technology
• Microprocessors
• Memory
• Connectivity
• Software
All these technologies are growing exponentially
Communication Revolution
If you are amazed at the drop in cost of computing, wait till you see what is going to happen to bandwidth.
Network technology will increase 10-100 times faster than processor technology
-Andy Grove, Titan of Intel
Bandwidth will double every year
Network speeds become comparable to interconnect speeds
Together, the Computer and Communications Revolutions aim at
Death of Time and Distance
Anytime, Anyplace and Anyone
The World of Computers & Communication
Small fish eat the big fish
Microprocessors offer performance comparable to supercomputers; paradigm shift from dinosaurs to mammals, from performance to functionality
The NETWORK is everywhere
The Web is a preferred medium of communication for everyone, including the military & the terrorists
Companies that make more and more software free capitalize more – open archives
Processor of Tomorrow
• Carbon Nano Tubes
– 5 to 10 atoms wide
– promise to replace silicon soon
• Flexible Transistors
– made from plastic, organic materials
• Silicon will live for 15 years
• Moore’s law will live longer
• 1000 times growth in 10 years
The winner will be decided by: Material Convergence + Human-like interactions
Processor of Tomorrow
• A billion transistors at 10 to 20 GHz clock rates by 2010
• 128 GB of main memory
• Terabyte of disk storage, maybe holographic
• Speech input/output, ASR
• Multilingual
• Terabit connectivity at the PC
• The DL plans of today must be sensitive to this
The Road Ahead
[Figure: evolution of computing plotted against knowledge content (Poor, Medium, Rich, Brilliant), from Scientific Calculations through Data Analysis and Expert Systems to Super Humans. Emulating human performance: See, Hear, Talk, and “Think”. Other labels: Bill Joy’s Nightmare, Evolution, Nanosystems]
The future trends:
• Browser will be the only medium of communication.
• It will be active, with voice and video, language independent.
• Mobility will be the key.
• Small form factor devices such as Palms, PDAs and Tablets would be the future.
• We would soon see TVPCT at the cost of a TV
• We will witness major convergence between ICT, Nano Technologies and Biological Sciences
Electronic Resources and the Library of the Future
E-mags; E-books; E-music; E-Movies
Dedicated E-book Readers
• Dedicated readers – about 20,000
• Palm devices – 6,000,000
• PCs – hundreds of millions
• “For people accustomed to reading text on a computer for hours at a time, e-book screen clarity is a non-issue.”
• A low-cost E-Book reader design is under way in India
http://www.eink.com/technology/index.htm
• E Ink is made up of millions of microcapsules
– each the diameter of a human hair
• Each microcapsule contains
– positively charged white particles &
– negatively charged black particles
• that float in a clear fluid
• A film of transistors supplies the voltage to the capsules
• A negative charge makes the white particles move to the top of the microcapsule
– an opposite electric field pulls the black particles to the bottom of the microcapsules, mimicking the effect of print.
• Electronic ink is a real power miser
E-ink/e-paper (Lucent)
The technology has been identified and development is well under way
By the year 2003, we envision
• electronic books that can display volumes of information as easily as flipping a page,
• permanent newspapers that update themselves daily via wireless broadcast
• Just as today's books give people easy access to everyday information, tomorrow's books will provide the same easy access to the dynamic data of the information age
The world of publishing will never be the same
Indian Institute of Science’s Simputer
• A hand-held Linux box at around US$ 200
• Has a state-of-the-art browser
• Color screen
• Very good speech synthesizer
– In English and many Indian languages
• A very powerful tool for access with wireless
• Soon to be modified as an E-book
www.simputer.org
www.picopeta.com
www.ncoretech.com
The Challenges in Computing
Tomorrow’s computing needs are not in Mflops and Gflops
The computer to process information, recognition and DM like a human
Small inexpensive robots, swarms will be a reality
Ray Kurzweil: The Age of Spiritual Machines
“A $1,000 PC (in 1999 dollars)…”
– 2009 = trillion calculations/second
– 2019 = 20 million billion calculations/second (the human brain)
– 2029 = 2 × 10^19 calculations/second (1,000 human brains)
Ray Kurzweil: The Age of Spiritual Machines
• 2009: “Computer displays have all the display qualities of paper- high resolution, high contrast, large viewing angle, and no flicker. Books, magazines, and newspapers are now routinely read on displays that are the size of small books.”
• 2009: “At least half of all (business) transactions are conducted online.”
• 2009: “There is effective convergence of all media, which exist as digital objects (that is, files) distributed by the ever-present high-bandwidth, wireless information web. Users can instantly download books, magazines, newspapers, television, radio, movies, and other forms of software to their highly portable personal communication devices.”
Ray Kurzweil: The Age of Spiritual Machines
2009
• A $1,000 PC delivers Terahertz speeds
• PCs with high resolution visual displays come in a range of sizes
– from those small enough to be embedded in clothing and jewelry
– to the size of a thin book
• Cables are disappearing
– Communication between components uses wireless technology, as does access to the Web
• The majority of text is created using continuous speech recognition
– Also ubiquitous are language user interfaces.
• Most routine business transactions (purchases, travel, etc.) take place between a human and a virtual personality
– Often the virtual personality includes an animated visual presence that looks like a human face
• 2019: “Reading books, magazines, newspapers, and other Web documents; listening to music; watching three-dimensional moving images (for example, television, movies); engaging in three-dimensional visual phone calls; entering virtual environments (by yourself, or with others who may be geographically remote); and various combinations of these activities are all done through the ever-present communications Web and do not require any equipment, devices, or objects that are not worn or implanted.”
Ray Kurzweil: The Age of Spiritual Machines
2029: “The ever learning Society”
• Learning now constitutes the primary focus of the human species.
• Human learning is accomplished using virtual teachers (and virtual libraries?).
• Learning is enhanced by widely available neural implants, which improve memory and perception but cannot yet download knowledge directly.
• Automated agents are learning on their own, without human assistance. Machines can now create significant new knowledge with little or no human intervention; unlike humans, machines easily share knowledge structures with one another.
Ray Kurzweil: The Age of Spiritual Machines
And Then There Was Music
• RealJukebox
• Winamp
• MP3
• Napster
The Growth rates
• The processor performance doubles every 18 Months
• The Network bandwidth doubles every year
• The storage capacity doubles every nine months
• Soon you will have a processor bottleneck
• 1000 times growth in storage in 10 years
– I already have 250 GB on a single disk
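The doubling periods quoted on this slide can be turned into ten-year growth factors. The following is a minimal sketch, assuming smooth exponential growth at exactly the stated doubling periods; `growth_factor` is a hypothetical helper name, not something from the talk.

```python
def growth_factor(doubling_months: float, years: float) -> float:
    """Growth multiple after `years` if capacity doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)

# Doubling periods as quoted on the slide (in months).
rates = {"processor": 18, "network": 12, "storage": 9}

for name, months in rates.items():
    print(f"{name}: x{growth_factor(months, 10):,.0f} in 10 years")
```

Under these assumptions a yearly doubling corresponds to roughly a thousand-fold increase per decade, an 18-month doubling to about a hundred-fold, and a nine-month doubling to considerably more than a thousand-fold.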
Recognition versus Recall
• Recognition is like seeing your friend’s face in a sea of faces
– even if he has changed since you last saw him
– storage intensive and fast
• Recall is like figuring out how to repair your car’s carburetor using a manual when you have never done that before
– applying knowledge to a new situation
– processor intensive and less storage
• The brain works on recognition
• Present day computers prefer recall – remember the Y2K
• Future computers would work like the brain: recognition
Recognition versus Recall – what it does to our DL
• We will move away from quantitative search (key word match) to “aboutness” and content based retrieval
• In future the documents will be read more by computers than by humans – will it change the way we write? Would we think in HTML or in XML?
• From mere text data to 3D objects, voice and video
• Multilingual
• Every conceivable form of knowledge expression
Technology Driven vision for The Digital Library
• We can store everything
– all the knowledge of the human race
– in all forms
– that is the Universal Digital Library
• Cost of Selection is stationary but storage cost is plummeting
It is not about contents alone- It is about networking of people
Education
Real-time Engineering Science Business
Universities Colleges Schools
3 Ls of Learning
1. Face-to-Face Lectures
2. Virtual Labs
3. Universal Digital Library
Universal Library Vision
All recorded information online
• instantly available
– To anyone
– Anywhere in the world
– In any language
– searchable, browsable, navigable by humans and machines
Digital Library Contents
• Books
• Periodicals (journals, newspapers)
• Art, photographs
• Databases, software
• Movies, video
• Music, opera, dance
Suppose all of this were on the Web
Digital Library of the future
• Digital library
• Digital museum
• Digital tour guide
• Research assistant
• Knowledge amplifier
Can we store all human knowledge in digital form?
There are about 100 million books written by the human race
Multiply by 10 for all other forms of knowledge
1 book = 500 pp. = 1 MB uncompressed
– 10^9 books = 10^15 bytes = 1 petabyte
140 million computers on the Internet
– At 20 GB free space each: 2.8 exabytes now
1 GB of disk costs ~$1
– 1 petabyte < $1 million
– Our Peta Byte server Initiative
– Storage is not the limitation, but creation and coordination are
– Avoiding duplication and connectivity are
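The back-of-the-envelope storage estimate on this slide can be checked with a few lines of arithmetic. This is a sketch using only the slide's own figures and decimal units (1 GB = 10^9 bytes); the variable names are illustrative. Note that 140 million machines with 20 GB free each works out to 2.8 × 10^18 bytes, which is 2.8 exabytes.

```python
BOOKS_WRITTEN = 100_000_000         # slide estimate of books written by the human race
OTHER_FORMS_MULTIPLIER = 10         # slide: multiply by 10 for other forms of knowledge
BYTES_PER_BOOK = 10**6              # slide: 1 book = 500 pp. = 1 MB uncompressed

items = BOOKS_WRITTEN * OTHER_FORMS_MULTIPLIER
total_bytes = items * BYTES_PER_BOOK          # 10**15 bytes = 1 petabyte

HOSTS = 140_000_000                 # slide: computers on the Internet
FREE_BYTES_PER_HOST = 20 * 10**9    # slide: 20 GB of free space each
free_bytes = HOSTS * FREE_BYTES_PER_HOST      # 2.8 * 10**18 bytes

COST_PER_GB_USD = 1.0               # slide: 1 GB of disk costs about $1
cost_usd = total_bytes / 10**9 * COST_PER_GB_USD

print(total_bytes / 10**15, "PB needed")
print(free_bytes / 10**18, "EB of free space on the Internet")
print(free_bytes // total_bytes, "times the required capacity")
print(f"about ${cost_usd:,.0f} of disk at $1 per GB")
```

The free space on the network exceeds the one-petabyte requirement by a factor of 2,800, which supports the slide's point that storage is not the limiting factor.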
Universal Digital Library
• More than 120 million PCs on the net
• Each having at least 20 GB of free space
• Peer to peer communication
• Can we store all the human knowledge in the computers?
This is today
The time-consuming process is taking the printed books to the web. The technology is not an impediment
Technology Driven Vision for the Universal Digital Library
• A vision to store everything that the human race ever produced
• A mission to digitize 1 Million Books and make them freely available
The Strategy for Scanning of books
• A planetary scanner like the Minolta PS 7000
• Takes about two hours to scan a 500 page book, crop, OCR and convert it to TIFF, HTML and XML files
• About 10,000 pages to the web in a day
• Storage per book is around 60 MB
• 100 Terabyte is not an issue
• Our partner Internet Archives has 370 TB, adding 30 TB a day
• Distributed databases
Process Involved
[Figure: process flow with the stages Identification of Books, Pre-Scanning Process, Scanning Process, Image Processing Process, Conversion Process]
Scanning
• 2 pages at a time
• Stored in TIF format
Post scanning operations
• Skew Correction
• Document Registration
• Dot Shading and Speck Removal
• Image centering
• Image Cropping
• Smoothing and Completion
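One of the post-scanning steps listed above, image cropping, can be illustrated with a small sketch. The slides do not describe the project's cropping software, so this is only an assumed approach: crop a grayscale page to the bounding box of its dark pixels. The function name `crop_to_content` and the threshold value are hypothetical.

```python
def crop_to_content(page, threshold=128):
    """Crop a grayscale page (list of rows; 0 = black, 255 = white)
    to the bounding box of pixels darker than `threshold`."""
    # Rows that contain at least one dark pixel.
    rows = [r for r, row in enumerate(page) if any(p < threshold for p in row)]
    if not rows:
        return page  # blank page: nothing to crop
    # Columns that contain at least one dark pixel.
    cols = [c for c in range(len(page[0]))
            if any(row[c] < threshold for row in page)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in page[top:bottom + 1]]

# A 4x4 page with a white border around a 2x2 printed area.
page = [
    [255, 255, 255, 255],
    [255,   0,  40, 255],
    [255, 255,  10, 255],
    [255, 255, 255, 255],
]
cropped = crop_to_content(page)
print(cropped)
```

A production pipeline would normally run speck removal first, since a single stray dark pixel near the page edge would otherwise widen the bounding box.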
Image comparison
Original Image
Processed Image SW 1
OCR CONVERSION
Performance evaluation for various fonts in Kannada language OCR
Series1: Average performance efficiency before using the cropping software.
Series2: Average performance efficiency after using cropping software.
The Digitized book
• Average book size ~ 500 pages
• Size of page as image ~ 50-150 KB
• Size of page as text file (rtf/htm) ~ 8-15 KB
• Average size of digitized book ~ 60 MB
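The per-book figures above follow from the per-page sizes. A short sketch of the arithmetic, assuming decimal units (1 MB = 1000 KB) and the slide's 500-page average; `book_size_mb` is an illustrative helper.

```python
PAGES_PER_BOOK = 500
IMAGE_KB_PER_PAGE = (50, 150)   # slide: size of a page as an image
TEXT_KB_PER_PAGE = (8, 15)      # slide: size of a page as rtf/htm text
AVERAGE_BOOK_MB = 60            # slide: average size of a digitized book

def book_size_mb(kb_per_page_range, pages=PAGES_PER_BOOK):
    """Return the (low, high) size of one book in MB for a per-page KB range."""
    low, high = kb_per_page_range
    return (low * pages / 1000, high * pages / 1000)

image_mb = book_size_mb(IMAGE_KB_PER_PAGE)   # page images only
text_mb = book_size_mb(TEXT_KB_PER_PAGE)     # OCR text only
million_books_tb = 1_000_000 * AVERAGE_BOOK_MB / 10**6

print("images per book (MB):", image_mb)
print("text per book (MB):", text_mb)
print("one million books (TB):", million_books_tb)
```

The 60 MB average sits inside the 25 to 75 MB range for page images, and a million such books come to about 60 TB.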
Brightness – Dark(1 in scale) and contrast – 9(in scale)
Original image
Cropped image
Million Books to the Web – Stakeholders as Partners
• Academia: CS, IS and users
• Researchers and Language Technologists
• Cultural and Religious Organizations
• Public Libraries
• Government Agencies
• None too exclusive
Background and Status
• Collaborative project between India and US
• Lead roles by CMU and IISc
• Initiated by CMU sending scanners free of cost to India. NSF supported
• Initiated by the Office of the Principal Scientific Advisor to GOI by a seed funding to IISc
• Fuelled by MCIT’s whole-hearted support
• More than 16 centres in academic, religious and government institutions spread across the country
• 69 scanners in place
• China, Egypt (Alexandria Library), Sri Lanka, Australia joining in
• There is light on the other side of the tunnel
Hubs of DL Activities in India
Anna University, Chennai, Tamil Nadu
Arulmigu Kalasalingam College of Engineering, Srivilliputur, Madurai, Tamil Nadu
Goa University, Goa
Indian Institute of Information Technology, Allahabad, Uttar Pradesh
International Institute of Information Technology, Hyderabad, Andhra Pradesh
City and State Central Library, Andhra Pradesh
Shanmugha Art, Science, Technology & Research Academy, Thanjavore, Tamil Nadu
Sringeri Mutt, Sringeri, Karnataka
Tirumala Tirupathi Devasthanams, Tirupathi, Andhra Pradesh
Maharashtra Industrial Development Corporation, Maharashtra
University of Pune, Pune
Kanchi University, Kanchi, Tamil Nadu
Indian Institute of Astrophysics, Karnataka
Scanner Operation at Hubs
[Chart: number of scanners in operation at each hub; the per-hub values are not recoverable from this transcript]
Progress of Various Centres in Scanning
[Chart: number of books scanned per centre (IISc, AKCE, SASTRA, TTD, MIDC, PUNE, AU, Kanchi, CCL, SCL); vertical axis: No. of Books]
Number of Pages Scanned
[Chart: number of pages scanned per centre; horizontal axis: Centre, vertical axis: No. of Pages]
Category of Books
[Chart: number of books by language: English, Telugu, Tamil, Sanskrit, Kannada, Urdu, Others]
Cumulative Status
Books: 16,550
Pages: 4,771,184
More Centres and Initiatives – already 61 scanners in operation + 39 in the pipeline
• Rashtrapathi Bhavan
• Punjab Technical University
• IIIT Hyderabad and University of Hyderabad
MCIT’s Initiatives
• Mobile van with VSAT for the Book Mobile
• ERNET providing connectivity to all centres
• Many centres supported with funds for computers and for scanning operations
• Total spending from Government support and from Scanning Centre’s resources is ten times more than the Scanning equipment cost and effectively 100 times more
• Support from all quarters of the government, religious leaders, academia and private agencies
• Universal Digital Library of India to be launched
Some Observations and the Road Ahead
• More than 5 million pages have been scanned
• The highest average rate of sustained scanning was about 4,000 pages per day at Hyderabad during February.
• Our goal is to establish best practices to reach 6,000 pages a day
• 3 years – 1 M Books
• By 2020 – 20 Million Books, 2 Million Songs, 200,000 Movies
• The most enviable content creation
Road Ahead
• Establishing the Digital Library of India on the same lines as the E-Governance Initiative
• Under the MCIT
• Headquartered in AP
• A think tank for content selection, delivery, technology and policy directions for the country
• Creation of special funds for 4C
Criteria for Selecting Mega Centres – 5 of them planned
• Geographical distribution
• Availability of contents of interest to a larger user base
• Local enthusiasm to support and sustain this activity
• Budget of US$ 200,000 initially and around 0.5 cent per page of output
• One single scanner can produce 2 million pages a year
• We will have 300 scanners – a million books a year
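The capacity claim on this slide can be checked against the per-scanner figure. A rough sketch, assuming the 500-page average book used elsewhere in the talk and that every scanner reaches the quoted output; the variable names are illustrative.

```python
PAGES_PER_SCANNER_PER_YEAR = 2_000_000   # slide figure
SCANNERS = 300                           # slide figure
PAGES_PER_BOOK = 500                     # average book size quoted elsewhere in the talk
COST_PER_PAGE_USD = 0.005                # slide: around 0.5 cent per page of output

pages_per_year = PAGES_PER_SCANNER_PER_YEAR * SCANNERS
books_per_year = pages_per_year // PAGES_PER_BOOK
cost_per_book_usd = COST_PER_PAGE_USD * PAGES_PER_BOOK

print(f"{pages_per_year:,} pages per year")
print(f"{books_per_year:,} books per year")
print(f"about ${cost_per_book_usd:.2f} of running cost per book")
```

At full output the 300 scanners would slightly exceed a million books a year, leaving some margin for downtime and rescans.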
Road Ahead
• Mega Content Creation Centres
• New Delhi, Varanasi, Allahabad, Hyderabad, Far East (Tawang or Guwahati), Kolkata and Chennai
• Each centre having around 40 scanners and 5 mobile scanners
• Content Creation Centres with up to 5 scanners in Gujarat, Rajasthan so as to cover the entire country
• Spearheading Language Technology Initiatives
• Adding voice and video of our heritage
Universal Digital Library
• Goal: To have all public knowledge online, available for free to all, everywhere
• An achievable goal
– There are only some 100,000,000 books in the world
– A few billion dollars could bring these online
• Limitations
– Copyright and licensing issues
– Different language books and character recognition technologies
• We must ensure that English is not necessarily the de facto language
• Universal Library
TECHNOLOGICAL CHALLENGES
• Input (scanning, digitizing, OCR)
• Data representation
– text, notations, images, web pages
• Navigation and Search
• Multilingual Issues
• Output (voice, pictures, virtual reality)
• Synthetic Documents
SEARCH ENGINE of UDL
• Very powerful, lightweight and scalable CMU search engine
• Greenstone
• Both are working and are being evaluated for the choice
• Both have been modified for use as Indian Language search engines – language independent search
• Future: Semantic web and content based retrieval – Speech input and speech output
COMPARATIVE ANALYSIS – GREENSTONE Vs UDL SEARCH ENGINES
Search Engine | Time Taken | Boolean | Proximity | Case | Stemming
Greenstone | Not depending on the number of hits | OR & NOT; Default: AND | Phrase searching | User can select the option | Stemming allowed
UDL | Highly depending on the number of hits | OR; Default: AND | No | No case sensitivity | Not available
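The features compared above (Boolean operators with a default AND, case handling) can be illustrated with a toy inverted index. This is a generic sketch, not the Greenstone or CMU/UDL implementation; `build_index`, `search` and the sample documents are all hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lower-cased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query, mode="AND"):
    """Case-insensitive keyword search; terms are combined with AND by default."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    combine = set.intersection if mode == "AND" else set.union
    return combine(*postings)

docs = {
    1: "Digital library of India",
    2: "Universal digital library",
    3: "Library of Alexandria",
}
index = build_index(docs)

print(search(index, "Digital Library"))                # default AND
print(search(index, "india alexandria", mode="OR"))    # explicit OR
```

Lower-casing at both index and query time is what makes the search case-insensitive; stemming and proximity (phrase) search would need word normalization and position lists, which this sketch leaves out.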
Choice of Collection
• Use books from libraries that are beyond copyright
• Administrative metadata from OCLC, ISBN, and other sources
• Dublin Core for Indian books
• A copyright metadata – aggressive attempts to obtain copyright – free copyright from many agencies including GoI
• Source library metadata
• Converge towards focussed collection
Funding – Road Ahead
• Funding effort must be an organized activity
• Commercial funding unlikely for “public good” activity
– Must go to governments, NGOs
• World Bank
• Qatar (if CMU deal succeeds)
• Benefits of UDL:
– Digital Opportunity
– Use in distance education
– International involvement – cultural diversity
– Technology dissemination
– Low cost v. conventional libraries
• Funding is tied to Outreach (next slide)
Outreach
• The UDL message must be disseminated
• Present at World Summit (WSIS) in Geneva (12/03)
• Pre-WSIS meeting at CERN (12/03)
• Establish liaison with UN Decade of Literacy (2003-2013)
• Points:
– Terabyte servers
– “Free to read” policy
– Universal Dictionary (applicability to other domains)
Access by Public
• All content free to read, print one page at a time
• Restrictions imposed by donors will be respected
• Categories of use will be recognized, e.g. cannot print entire document
• Buttons, links to fulfillment houses and publishers are allowed- to take in “born Digital” copyrighted material
Partner Relations – Future
• All material scanned or input as part of the UDL will be shared by all partners
• Preference for national umbrella organizations to simplify international partner relations
• Relationships between partners and their national DLs encouraged
• Online communication and collaboration tools needed to facilitate partner questions and interchanges
• Written partnership agreement will be made
Standards
• Published standards within the UDL
• Quality control and testing standard
• Funding to be sought to support standards development
• Logo to be developed (graphic device without words). Must appear on all sites, all pages
• Logo should have a hot link to a gateway site that links all UDL sites
• Local variability in look and feel of sites is permitted so long as the logo is displayed
Scanning/OCR Policy
• We scan what gives greatest impetus to continued funding
• Language: majority of content in English; otherwise no restriction
• Scans will be previewed for minimum quality; OCR will not be corrected unless local site desires
Metadata
• All entries MUST have metadata according to MARC or Dublin Core
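As an illustration of what a Dublin Core entry might look like, the sketch below builds a small XML record with Python's standard library. The namespace URI is the standard Dublin Core element set; the `record` wrapper element, the helper name and all field values are placeholders, not the project's actual schema or data.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)

def dublin_core_record(fields):
    """Build a <record> element with one dc:* child per (name, value) pair."""
    record = ET.Element("record")            # wrapper element name is an assumption
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record

record = dublin_core_record([
    ("title", "Example Title"),              # placeholder values, not real holdings
    ("creator", "Example Author"),
    ("language", "ta"),
    ("rights", "Public domain"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Keeping the rights statement in the same record as the descriptive fields fits the copyright metadata mentioned under Choice of Collection.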
Copyright
• Public domain materials: no restrictions, tools for printing entire document provided
• Works of uncertain copyright status:
– Good faith effort to determine status, locate owner
– Scan and index work
– After a waiting period (at least one month), make work viewable
• Archival material (old but unique)
– Allow resolution restriction to avoid devaluation of original
• Out-of-print in-copyright (OPIC)
– Seek blanket permissions from publishers
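The copyright categories above amount to a small decision rule. The following sketch encodes it under stated assumptions: the category labels and return strings are invented for illustration, and "at least one month" is taken as 30 days.

```python
from datetime import date, timedelta

WAITING_PERIOD = timedelta(days=30)   # assumption: "at least one month" read as 30 days

def access_policy(status, indexed_on=None, today=None):
    """Return the access level for a work, following the categories on the slide."""
    today = today or date.today()
    if status == "public_domain":
        return "view and print entire document"
    if status == "uncertain":
        # Scan and index first; make viewable only after the waiting period.
        if indexed_on is not None and today - indexed_on >= WAITING_PERIOD:
            return "viewable"
        return "indexed only"
    if status == "archival":
        return "viewable at restricted resolution"
    if status == "opic":
        return "needs publisher permission"
    raise ValueError(f"unknown status: {status}")

print(access_policy("uncertain", indexed_on=date(2003, 5, 1), today=date(2003, 6, 23)))
print(access_policy("uncertain", indexed_on=date(2003, 6, 10), today=date(2003, 6, 23)))
```

The good-faith search for the owner is a manual step and sits outside this rule; the code only models what happens once a work has been categorized.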
Possible Intake Model
[Figure: intake model. Inside India: scanning centres with English, Tamil, Gujarati, Hindi and Art intake, together with local materials, feed the India central mirror site. Outside India: CMU UL server, Chinese mirror site, Australian mirror site]
The Digital Library a Test Bed for language research
• Rich data in many languages from the Million Books to the Web Project – at least 10,000 books in any language
• Translations in many languages- Gita, NBT, NCERT etc- an excellent tool for language translation-
• Training data for the OCR• The case insensitive ITRANS standard
The Digital Library a Test Bed for language research
• Rich data makes the creation of OCRs in Indian languages easy – in Tamil, Kannada and Malayalam – a rapid prototyping
• Speech synthesis and recognition
• Indian Language Search Engines
• Example Based Machine Translation
• Universal Dictionary
Word | English | POS | Pron | Use | Lang
danúbia | linen tape | | | | HUN
danum | water | | | | PMP
danun | early | | | | PMP
danup | hunger | | | | PMP
danup | hunger, starvation | | | | PMP
danupan | hungry, starving | | | | PMP
daný | existent | | | | SLO
daný | existing | | | | SLO
daný | given | | | | SLO
daný číslom | numerical | | | | SLO
daný na pospas | obnoxious | | | | SLO
danyag | landscape | n | | | HIL
daog | overturn | v | | | CEB
daog | prevail | v | | | CEB
daogdaog | manhandle | v | | | CEB
daong | boat with a covered cabin, ark | | | | TAG
daong | bring the ship to shore | | | | TAG
daot | harm | v | | | CEB
daot | mar | v | | | CEB
daotan | bad | adj | | | CEB
daotan'g buut | dislike | n | | | CEB
daotan'g hitabo | mishap | n | | | CEB
daotan'g tinguha | malice | n | | | CEB
daotan'g tuyo | malice | n | | | CEB
dapa | granary | n | | | CEB
dapa | lie flat on stomach or face down | | | | PMP
dapa | lie flat on stomach or face down | | | | TAG
dapače | on the contrary | adv | | | BOS
dapadnúť (na nohy) | to land | | | | SLO
d'apaiser | to appease | v | | | FRE
Languages: HUN = Hungarian, PMP = Kampampangan, SLO = Slovak, HIL = Hiligaynon, CEB = Cebuano, TAG = Tagalog, BOS = Bosnian, FRE = French
The Universal Dictionary
Aboutness Hierarchy – Dr Shamos
[Figure: hierarchy of units with the labels Universe, Collection, Book, Newspaper, Chapter, Article, Section, Paragraph, Sentence, Word, Glyph, Photograph, Object, 3D Artifact. Annotations: “keyword searching occurs here” and “subject searching occurs here”]
Legal and Business Challenges
• Use of copyrighted material
• Economics (Who pays? Who gets?)
• Privacy
• Reliability of information
• Change in the nature of teaching
• Change in the nature of information creation and use
Philosophy of Copy Right Laws
• Protect the inventor so that private investments in R & D would flow
• Disseminate the information so that society grows
• Protect fair use
• Ensure you get what you paid for
What can be copyrighted ?
• Must be tangible, e.g. a lecture can’t be copyrighted, a transcript of it can
• Work must be original
• Work must be creative - even minimal efforts usually count as creative
Fair use doctrine
Authorizes any person to make fair use of a published or unpublished copyrighted work (including the making of unauthorized copies) in these contexts:
• In connection with criticism of or comment on the work
• In the course of news reporting
• For teaching purposes, or
• As part of scholarship or research activity
Four basic Factors:
1. The purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes
2. The nature of the copyrighted work
3. The amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
4. The effect of the use upon the potential market for or value of the copyrighted work
www.library.org principles
1. Scholarly and government information and knowledge is a public good
• that should be available, maintaining the balance of the rights of the individual creator vs. the needs of the public
2. The Library is the intellectual crossroads of the community.
3. Librarians will conceptualize and ensure
• implementation of innovative new systems
• for the creation and dissemination of information for succeeding generations.
“This rule provides that the first sale of a copy of a work to a member of the public ‘exhausts’ the rights holder’s ability to control further distribution of that copy. A library is thus free to lend, or even rent or sell, its copies of books to patrons”
How does this work in the Digital World ?
Music, Movie and Entertainment Industry
• Much larger part of most of the economies
• Large production costs
• Need to protect business interest
• Need for technology to protect
• NAPSTER – peer to peer communication
• DeCSS
• NAPSTER for video??
• Consumer is different from the creator
New paradigms in the Digital Library
• Should the laws used for protecting commercially attractive enterprise such as patents, music, entertainment be applied to DL
• The dissemination of information creates multiplication unlike in music etc
• Shorter life cycles for the information
Copyright Conflicting requirements
Need to protect the financial interests of creators in order to encourage private investments to the economy
Need to create a framework for every human being to create
The 2nd principle should dominate in DL
The 1st principle should dominate the others
The Concept of FourC
The scientific community is the only one that is both creator and consumer of information
It pays for both
The SW industry has shown the way with freeware
Can we do it in scholarly communication, text books etc.?
The Concept of FourC
In the 20th Century, in the interest of public good the Governments created BBC, PBS, AIR and also the Public Library System- provided compensation for artists and writers while providing free access to public
Total global expenditure on public broadcasting and public libraries exceeds US$ 100 billion
Look at our kings who supported all the poets and scholars
We need to find the 21st Century equivalent of BBC, AIR and PBS.
The Concept of FourC
Learn from NAPSTER- will we have a video equivalent of NAPSTER
It is impossible to police and protect IP Rights at gigabit rate connections
Some countries and WIPO under pressure from lobbying groups form the draconian Copy Right Laws
Remember the FAIR USE Doctrine- and what the creators want- recognition and compensation
The Solution – FourC
Consortium for Compensation of Creative Contents – FourC
Set aside 25% of the current national expenditure on public broadcasting and PLs
Authors are encouraged to put the work on the web after a few years of commercial exploitation – many models – in return get tax exemption etc.
India showing the way: IASc and INSA
Books out of print
Titanic effect
Authors can take back the copyright
The Solution -FourC
Authors’ compensation based on the hits
Future versions of text books may be FAQs and XMLised
Many economic models – can work for courseware as well
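One of the many possible models for hit-based compensation is a simple proportional split of a fund. The slide only says compensation is "based on the hits", so the proportional rule, the function name `share_by_hits` and the sample numbers below are all assumptions for illustration.

```python
def share_by_hits(fund, hits):
    """Split `fund` among authors in proportion to their hit counts."""
    total = sum(hits.values())
    if total == 0:
        return {author: 0.0 for author in hits}   # no hits: nothing to distribute
    return {author: fund * n / total for author, n in hits.items()}

# Hypothetical fund and hit counts.
payouts = share_by_hits(1000.0, {"author_a": 600, "author_b": 300, "author_c": 100})
print(payouts)
```

Other models mentioned in the talk, such as tax exemptions, would need different inputs; this sketch covers only the hit-proportional case.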
The Solution -FourC
The changing trend in publications – we want the documents to be readable by machines as well as humans
Born digital documents
Can we compensate those creating contents for the web?
Can we compensate those who create music and movies for the web – really small form factor – small screens?
• Knowledge multiplies whenever bits are circulated on the web
• Technology has a habit of creating a problem (by knowledge explosion) and spending the rest of its time in trying to solve it- through Digital Library
• The Universal Digital Library with 20 Million Books by 2020 – A year our President dreams India to become a developed nation
• A FourC Policy and a Digital Library Act are on the anvil in India to meet this mission
• If a billion people sneeze- together we can create a Hurricane
• With the technology of the two nations we will convert this hurricane into useful energy and light up the world of knowledge
Conclusion
• If you are creating a digital library, it should be for access by anyone, anytime and from any place
• If Your Digital Library Is For Exclusive Use, Let Us Talk About Weather
• There Is Nothing Called, Your DL, My DL
– It Is Our DL– The Universal Digital Library
It happens only in India