1
INF 389F—ORGANIZATION OF RECORDS INFORMATION
Professor Fran Miksa
November 18, 2003
Data, Metadata, Metadata Formats, and Databases
2
Data & Databases
Data are strings of characters that record assertions, etc., about something.
Data in computers are strings of codes representing characters.
Databases are computer programs that allow us to manipulate data in computers.
IE data are data pertaining to IEs, including, especially, attribute data.
IE databases are databases whose data pertain to IEs as objects.
3
How Does Data Become Machine-Readable (i.e., Computerized)?
Basic question: We want the computer to write data into its memory, but how does it do that?
Substitution codes: a basic approach to representing data in computers.
4
Computer Switches as Codes
1 switch: 2 positions (on/off). How many different signals are possible? (2^1 = 2 possible signals)
2 switches, always used together: 2^2 = 4 possible signals, because of 4 possible switch-setting combinations
3 switches: 2^3 = 8 possible signals
4 switches: 2^4 = 16 possible signals
5 switches: 2^5 = 32 possible signals
6 switches: 2^6 = 64 possible signals
7 switches: 2^7 = 128 possible signals
8 switches: 2^8 = 256 possible signals
Of course, in each case, the meanings of the switch combinations have to be agreed upon.
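The counts above can be checked by listing every on/off combination for a given number of switches. The following is a minimal Python sketch; the function name `switch_settings` is a hypothetical helper, not something from the slides.

```python
from itertools import product

def switch_settings(n_switches):
    """Return every on/off (1/0) combination for switches used together."""
    return list(product((0, 1), repeat=n_switches))

for n in range(1, 9):
    settings = switch_settings(n)
    # The number of combinations always equals 2 ** n.
    print(f"{n} switch(es): {len(settings)} possible signals")

# With 2 switches there are four combinations: off/off, off/on, on/off, on/on.
two_switch_settings = switch_settings(2)
```

The code only enumerates the combinations. What each combination means still has to be agreed upon separately, as the slide notes.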
5
Where are Codes Placed?
First, each switch or spot is called a ‘bit’ (BInary digiT).
Second, each set of basic bits in a given character set of codes is called a ‘byte.’
Bit codes can be transferred to, triggered, or “set” as a series of “switches” on a computer “chip.”
Codes can be represented as magnetized or not magnetized positions (i.e., spots, locations) on a magnetic surface such as a disk.
The bits of each byte are kept together as a unit.
6
Coding for Colors in Graphics
Colors are also encoded the same way, though the number of bits used for each coding may vary, for example, 8 bit, 16 bit, 24 bit codes for colors.
8 bit color codes mean that each point that is coded has 256 bit combinations to represent all the colors, or all the shades in a “grayscale.”
16 bit color codes have 65,536 bit combinations (in groups of 16 bits), and 24 bit color codes have 16,777,216 bit combinations (in groups of 24 bits).
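The combination counts for 8, 16, and 24 bit color codes follow from the same powers of two. The sketch below computes them and then packs one color into a single 24 bit code. It assumes the common layout of 8 bits each for red, green, and blue, which the slides do not specify; `pack_rgb` is a hypothetical helper.

```python
def color_combinations(bits):
    """Number of distinct codes available with the given number of bits."""
    return 2 ** bits

def pack_rgb(red, green, blue):
    """Pack three 8-bit channel values (0-255) into one 24-bit code.

    Assumed layout: red in the high 8 bits, then green, then blue.
    """
    return (red << 16) | (green << 8) | blue

counts = {bits: color_combinations(bits) for bits in (8, 16, 24)}

# An orange tone as one 24-bit code, shown as its 24 individual bits.
orange = pack_rgb(255, 165, 0)
orange_bits = format(orange, "024b")
```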
7
Bits used in Graphics
Pixel = a location in a grid of locations superimposed on a graphic image.
At 300 pixels to the inch in each dimension, an 8” by 5” picture has 2,400 pixels (locations/spots/dots, etc.) down and 1,500 pixels across, or 3,600,000 pixels total, each of which is coded for a color in an 8 bit, 16 bit, 24 bit code, etc. File formats that store the pixels are known by such names as TIFF, JPEG, GIF, etc.
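The pixel arithmetic can be worked through in a few lines. This sketch also estimates how many bytes the raw, uncompressed color codes would occupy. That estimate is an added illustration, not a figure from the slides, and both function names are hypothetical.

```python
def pixel_grid(width_in, height_in, pixels_per_inch):
    """Return (pixels across, pixels down, total pixels) for an image."""
    across = width_in * pixels_per_inch
    down = height_in * pixels_per_inch
    return across, down, across * down

def uncompressed_bytes(total_pixels, bits_per_pixel):
    """Bytes needed to store one color code per pixel with no compression."""
    return total_pixels * bits_per_pixel // 8

# The picture from the slide: 5 inches across, 8 inches down, 300 pixels per inch.
across, down, total = pixel_grid(5, 8, 300)
size_8_bit = uncompressed_bytes(total, 8)
size_24_bit = uncompressed_bytes(total, 24)
```

Actual files in formats such as JPEG are usually smaller than this estimate because they compress the pixel data.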
8
Text & Control Characters as Codes
Lower case letters (26 total)
Upper case letters (26 total)
Numerals (10 total)
Special signs . , ; : “ ” ? / < > [ ] { } \ | - _ = + ` ~ @ # $ % ^ & * ( ) (31 total) [93 to here]
Blank space & other special symbols
Special codes for computer operation
Foreign language special signs
9
Character Codes
ASCII, EBCDIC
– See “A Brief History of Character Codes”
– <http://tronweb.super-nova.co.jp/characcodehist.html>
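As a small illustration of a substitution code, the sketch below looks up the ASCII code for each character of a short string and shows it as a 7 bit pattern. ASCII is a 7 bit code; the sample string is arbitrary.

```python
text = "Data"

# Each character is replaced by its agreed-upon numeric code.
codes = [ord(ch) for ch in text]

# Each code written out as the 7 bits that ASCII uses.
bit_patterns = [format(code, "07b") for code in codes]

# Encoding the string yields the same codes as a sequence of bytes.
stored = text.encode("ascii")

for ch, code, bits in zip(text, codes, bit_patterns):
    print(ch, code, bits)
```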
10
ASCII Code-I
11
ASCII Code-II
12
Sequencing Codes in a Computer Space--example 1
13
Sequencing Codes in a Computer Space--example 2
14
Databases
Flat File Databases
Relational Databases
Data Modeling
– Entity-relationship data models
– Object oriented data models
15
Flat File Database
From geekgirls reading: “Databases from Scratch III”
16
Relatable Tables within the Database
From geekgirls reading: “Databases from Scratch III”
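The difference between a flat file and related tables can be sketched with Python's built-in `sqlite3` module. The table names, column names, and sample rows below are hypothetical and are not taken from the reading. The flat rows repeat the publisher details on every line, while the relational version stores each publisher once and links items to it.

```python
import sqlite3

# Flat file: one row per item, with publisher details repeated each time.
flat_rows = [
    ("Title A", "Publisher X", "Austin"),
    ("Title B", "Publisher X", "Austin"),
    ("Title C", "Publisher Y", "Boston"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT UNIQUE, place TEXT)"
)
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT, "
    "publisher_id INTEGER REFERENCES publisher(id))"
)

for title, pub_name, place in flat_rows:
    # Store each publisher only once; repeats are ignored.
    conn.execute(
        "INSERT OR IGNORE INTO publisher (name, place) VALUES (?, ?)",
        (pub_name, place),
    )
    (pub_id,) = conn.execute(
        "SELECT id FROM publisher WHERE name = ?", (pub_name,)
    ).fetchone()
    conn.execute(
        "INSERT INTO item (title, publisher_id) VALUES (?, ?)", (title, pub_id)
    )

publisher_count = conn.execute("SELECT COUNT(*) FROM publisher").fetchone()[0]

# Relating the two tables reproduces the original flat view on demand.
joined = conn.execute(
    "SELECT item.title, publisher.name, publisher.place "
    "FROM item JOIN publisher ON item.publisher_id = publisher.id "
    "ORDER BY item.id"
).fetchall()
```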
17
What Kinds of IE data might be useful?
Names (Persons, Corporate bodies)
Titles
Dates, Publishers, Places
Other physical details of packaging
Statements of editions, issues, etc.
Topics, genre, audiences, uses
Relationships
18
Two Forms of Data
Data that represent IE attributes and are simply recorded in some sequence
Among the foregoing, those data that are used specifically for searching (called access points, index terms, etc.)
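The two forms can be contrasted in a short sketch: every attribute is recorded, but only some are selected as access points and placed in a search index. The records, field names, and the choice of which fields to index are all hypothetical.

```python
from collections import defaultdict

# Form 1: attribute data simply recorded for each IE.
records = {
    1: {"name": "Doe, Jane", "title": "Organizing Records", "date": "2001"},
    2: {"name": "Roe, Richard", "title": "Records and Databases", "date": "2003"},
}

# Form 2: the subset of attributes chosen as access points (an assumption here).
ACCESS_POINT_FIELDS = ("name", "title")

index = defaultdict(set)
for record_id, record in records.items():
    for field in ACCESS_POINT_FIELDS:
        for word in record[field].lower().replace(",", "").split():
            index[word].add(record_id)

def search(term):
    """Return the ids of records whose access points contain the term."""
    return sorted(index.get(term.lower(), set()))
```

In this sketch a search for a date finds nothing, because dates are recorded but were not chosen as access points.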
19
Metadata & Metadata Formats
Metadata consists of strings of data within computers that record the attributes of informational objects (IEs).
Metadata formats are organized arrangements of categories of metadata
20
Original use of term metadata
Object = Students; Data = attributes of students; Metadata = Data about data.
Diagram labels: D = Data; M = Metadata
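In this original sense, the data describe the students and the metadata describe the data. A minimal sketch, assuming hypothetical field names and sample values:

```python
# Data: attributes of the objects (students).
students = [
    {"name": "Student A", "year": 2},
    {"name": "Student B", "year": 4},
]

# Metadata: data about the data, here the name and type of each field.
metadata = {field: type(value).__name__ for field, value in students[0].items()}

print(metadata)
```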
21
Use of the term Metadata in Information Organization
When the object became an IE, it represented data in and of itself.
Therefore, what would the phrase “Metadata = data about data” mean?
Metadata came to mean all data inside the computer about
22
Metadata Formats
The purpose of metadata formats is to “code” metadata in terms of categories.
The categories have a wide variety of uses (e.g., content categories, computer instructions, formatting of content as text, etc.).
Some codes are used within databases only and are not generally seen by the information user (e.g., the codes in the MARC format).
Some codes are attached to metadata and text through “markup” in HTML or XML (though they are not usually seen by a user in a browser unless a special switch is clicked).
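The two kinds of coding can be sketched side by side. The same metadata is first paired with numeric field tags in the manner of MARC (100 for a personal name main entry, 245 for the title statement, 260 for publication details), and then wrapped in XML markup. This is a simplification: real MARC records also carry indicators and subfields, and the XML element names and sample values here are hypothetical.

```python
import xml.etree.ElementTree as ET

record = {
    "author": "Doe, Jane",
    "title": "An Example Title",
    "publisher": "Example Press",
}

# Category codes in the manner of MARC field tags (greatly simplified).
marc_tags = {"author": "100", "title": "245", "publisher": "260"}
marc_like = [(marc_tags[category], value) for category, value in record.items()]

# The same categories expressed as markup attached to the text.
root = ET.Element("record")
for category, value in record.items():
    ET.SubElement(root, category).text = value
xml_text = ET.tostring(root, encoding="unicode")

print(marc_like)
print(xml_text)
```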
23
Mark-up Languages
A text-processing language which embeds commands into the text that is to be processed. These commands then instruct a display device or a printer to carry out some formatting.
From “Markup language,” A Dictionary of the Internet. Darrel Ince. Oxford University Press, 2001. Oxford Reference Online. Oxford University Press. 23 September 2003
<http://www.oxfordreference.com/views/ENTRY.html?subview=Main&entry=t12.002053>
24
From “A Gentle Introduction to SGML” <http://etext.virginia.edu/bin/tei-tocs?div=DIV1&id=SG>
Historically, the word markup has been used to describe annotation or other marks within a text intended to instruct a compositor or typist how a particular passage should be printed or laid out.
Generalizing from that sense, we define markup, or (synonymously) encoding, as any means of making explicit an interpretation of a text.
By markup language we mean a set of markup conventions used together for encoding texts.
25
From “A Gentle Introduction to SGML” (cont’d)
A markup language must specify what markup is allowed, what markup is required, how markup is to be distinguished from text, and what the markup means.
SGML provides the means for doing the first three; documentation such as these Guidelines is required for the last.
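How markup is distinguished from text can be seen by running a marked-up string through a parser. The sketch below uses Python's standard `html.parser` module on a short HTML fragment, as a stand-in for SGML processing in general; the class name and sample string are hypothetical.

```python
from html.parser import HTMLParser

class MarkupSplitter(HTMLParser):
    """Collect tag names and text content separately."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Called for markup: an opening tag delimited by < and >.
        self.tags.append(tag)

    def handle_data(self, data):
        # Called for everything that is text rather than markup.
        self.text_parts.append(data)

sample = "<p>Data are <em>strings</em> of characters.</p>"
splitter = MarkupSplitter()
splitter.feed(sample)
splitter.close()

tags_found = splitter.tags
plain_text = "".join(splitter.text_parts)
```

The parser can separate markup from text, but it cannot say what `em` means. That still has to come from documentation, as the passage above notes.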
26
Specific Markup “Languages”
SGML: Standard Generalized Markup Language
– For texts
– DTD: Document Type Definition
– Header
HTML: Hypertext Markup Language
– An application of SGML for marking up text for browsers in a platform-independent way
27
Specific Markup Languages (cont’d)
XML: Extensible Markup Language
– Based on SGML, but adds the capacity to define or otherwise insert special categories.
XHTML: Extensible Hypertext Markup Language
Other markup languages exist for every special purpose imaginable, e.g., Geography ML, Chemical ML, Gene Expression ML (GEML), Wireless ML, Rule ML (for XML), Theological ML, Bean ML (for JavaBean), etc.
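The capacity to define special categories can be sketched with a small XML document whose element and attribute names are invented for the occasion. They are hypothetical and are not taken from Chemical ML or any other published markup language. Python's standard `xml.etree.ElementTree` module reads the document and retrieves values by category.

```python
import xml.etree.ElementTree as ET

# A document using made-up categories: molecule, name, and a formula attribute.
document = """
<molecules>
  <molecule formula="H2O"><name>water</name></molecule>
  <molecule formula="CO2"><name>carbon dioxide</name></molecule>
</molecules>
"""

root = ET.fromstring(document)

# Retrieve content by the categories the markup defines.
formulas = {
    molecule.findtext("name"): molecule.get("formula")
    for molecule in root.findall("molecule")
}

print(formulas)
```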
28
Why is a Knowledge of Markup Languages Important for Information Organization?
MLs contain Document Description capabilities.
MLs contain categories that can be used in databases.
At some point, an information organizer must use a markup language for displaying information organization data.
29
Metadata Category Codes
No metadata category codes will be useful unless they are consciously deployed in a computer program.
Metadata codes become especially useful for information organization when they are deployed in an IE organization system—i.e., in an IE database.
30
IE Databases
An IE database is an organized structure of metadata that is used for organizing and retrieving IEs in computers.
Organizing and retrieving IEs by means of a database is possible because the database allows us to manipulate the metadata in terms of the categories represented by the metadata.
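Manipulating metadata in terms of its categories can be sketched with a single table of category-coded values. The same word then retrieves different IEs depending on the category searched. The table layout, category names, and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (ie_id INTEGER, category TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [
        (1, "name", "Austin, Mary"),
        (1, "title", "A Hypothetical Book"),
        (2, "name", "Doe, Jane"),
        (2, "place", "Austin"),
    ],
)

def find(category, term):
    """Return ids of IEs whose metadata in the given category contains the term."""
    rows = conn.execute(
        "SELECT DISTINCT ie_id FROM metadata "
        "WHERE category = ? AND value LIKE ? ORDER BY ie_id",
        (category, f"%{term}%"),
    )
    return [row[0] for row in rows]

# "Austin" as a name and "Austin" as a place lead to different IEs.
by_name = find("name", "Austin")
by_place = find("place", "Austin")
```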
31
A General Maxim
A professional information entity organizer must understand the place of data, metadata, metadata formats, and databases in his or her work:
• Their general roles
• The particular details of specific systems used