

Page 1: INF 389F—ORGANIZATION OF RECORDS INFORMATION

1

INF 389F—ORGANIZATION OF RECORDS INFORMATION

Professor Fran Miksa

November 18, 2003

Data, Metadata, Metadata Formats, and Databases


Data & Databases

Data are strings of characters that record assertions, etc., about something.

Data in computers are strings of codes representing characters.

Databases are computer programs that allow us to manipulate data in computers.

IE data are data pertaining to IEs (information entities), including, especially, attribute data.

IE databases are databases whose data pertain to IEs as objects.


How Does Data Become Machine-Readable (i.e., Computerized)?

Basic question: we want the computer to write data into its memory, but how does it do that?

Substitution codes: a basic approach to representing data in computers, in which each character is replaced by an agreed-upon code.
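A substitution code can be sketched in a few lines of Python. The four-symbol alphabet and its 2-bit codes below are hypothetical, chosen only to show the principle that whoever writes the codes and whoever reads them must share the same table; real systems use standard tables such as ASCII.

```python
# Hypothetical toy code: each of four symbols is replaced by an agreed-upon
# 2-bit pattern. Encoder and decoder must use the same table.
CODEBOOK = {"A": "00", "B": "01", "C": "10", "D": "11"}
DECODEBOOK = {bits: sym for sym, bits in CODEBOOK.items()}

def encode(text: str) -> str:
    """Substitute each symbol with its 2-bit code."""
    return "".join(CODEBOOK[ch] for ch in text)

def decode(bits: str, width: int = 2) -> str:
    """Split the bit string into fixed-width groups and look each one up."""
    return "".join(DECODEBOOK[bits[i:i + width]] for i in range(0, len(bits), width))

encoded = encode("BAD")
print(encoded)          # 010011
print(decode(encoded))  # BAD
```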


Computer Switches as Codes

1 switch, 2 positions (on/off): how many different signals are possible? ($2^1 = 2$ possible signals)

2 switches, always used together: $2^2 = 4$ possible signals, because there are 4 possible combinations of switch settings

3 switches, used the same way: $2^3 = 8$ possible signals
4 switches: $2^4 = 16$ possible signals
5 switches: $2^5 = 32$ possible signals
6 switches: $2^6 = 64$ possible signals
7 switches: $2^7 = 128$ possible signals
8 switches: $2^8 = 256$ possible signals

In general, $n$ switches used together give $2^n$ possible signals. Of course, in each case, the meanings of the switch combinations have to be agreed upon.
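As a quick check of the doubling pattern, the hypothetical helper `switch_settings` below lists every on/off combination for a given number of switches using Python's standard `itertools.product`; the count always comes out to two raised to the number of switches.

```python
from itertools import product

def switch_settings(n: int) -> list[tuple[int, ...]]:
    """Every on/off combination of n switches used together (1 = on, 0 = off)."""
    return list(product((0, 1), repeat=n))

for n in range(1, 9):
    # The number of distinct settings doubles with each added switch.
    print(n, "switches:", len(switch_settings(n)), "possible signals")

print(switch_settings(2))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```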


Where are Codes Placed?

Where Are Codes Placed?

First, each switch or spot is called a ‘bit’ (BInary digiT).

Second, the set of bits used to code one character in a given character set is called a ‘byte’ (today, conventionally 8 bits).

Bit codes can be transferred to, triggered, or “set” as a series of “switches” on a computer “chip.”

Codes can also be represented as magnetized or unmagnetized positions (i.e., spots, locations) on a magnetic surface such as a disk.

The bits of each byte are kept together as a unit.
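To make bits and bytes concrete, the sketch below shows the 8-bit pattern stored for each character of a short string. It assumes plain ASCII text, where each character occupies exactly one byte; `byte_bits` is a hypothetical helper name.

```python
def byte_bits(text: str) -> list[str]:
    """Return each character's byte as an 8-bit string (ASCII text only)."""
    return [format(b, "08b") for b in text.encode("ascii")]

for ch, bits in zip("IE", byte_bits("IE")):
    print(ch, bits)
# I 01001001
# E 01000101
```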


Coding for Colors in Graphics

Colors are encoded the same way, though the number of bits used for each code may vary: for example, 8-bit, 16-bit, or 24-bit color codes.

8-bit color codes mean that each coded point has 256 ($2^8$) possible bit combinations to represent all the colors, or all the shades of a “grayscale.”

16-bit color codes have 65,536 ($2^{16}$) bit combinations (in groups of 16 bits), and 24-bit color codes have 16,777,216 ($2^{24}$) bit combinations (in groups of 24 bits).
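The color counts follow directly from the number of bits. The sketch also packs red, green, and blue values into one 24-bit code; the 8-bits-per-channel, red-green-blue layout in the hypothetical `pack_rgb` is an assumption (a common convention), not something stated here.

```python
def combinations(bits: int) -> int:
    """Number of distinct codes available at a given bit depth."""
    return 2 ** bits

for depth in (8, 16, 24):
    print(f"{depth}-bit: {combinations(depth):,} combinations")
# 8-bit: 256 combinations
# 16-bit: 65,536 combinations
# 24-bit: 16,777,216 combinations

def pack_rgb(red: int, green: int, blue: int) -> int:
    """Pack three 0-255 channel values into one 24-bit code (assumed R, G, B order)."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each channel must fit in 8 bits")
    return (red << 16) | (green << 8) | blue

print(f"{pack_rgb(255, 128, 0):024b}")  # 111111111000000000000000
```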


Bits Used in Graphics

Pixel = a location in a grid of locations superimposed on a graphic image.

At 300 pixels to the inch in each dimension, an 8” by 5” picture yields 2,400 pixels (locations, spots, dots, etc.) down and 1,500 pixels across, or 3,600,000 pixels in total, each of which is coded for a color in an 8-bit, 16-bit, 24-bit, etc., code. File formats for storing the pixels are known by names such as TIFF, JPEG, and GIF.
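The pixel arithmetic can be written out as a small calculation. The helper names are hypothetical, and `raw_size_bytes` gives only the uncompressed size; compressed formats such as JPEG and GIF typically store far fewer bytes.

```python
def pixel_grid(width_in: float, height_in: float, ppi: int) -> tuple[int, int, int]:
    """Pixels across, pixels down, and total pixels at ppi pixels per inch."""
    across = round(width_in * ppi)
    down = round(height_in * ppi)
    return across, down, across * down

def raw_size_bytes(total_pixels: int, bits_per_pixel: int) -> int:
    """Uncompressed size: every pixel stores one color code of bits_per_pixel bits."""
    return total_pixels * bits_per_pixel // 8

across, down, total = pixel_grid(5, 8, 300)  # 5" across, 8" down
print(across, down, total)                    # 1500 2400 3600000
for depth in (8, 16, 24):
    print(depth, "bit:", raw_size_bytes(total, depth), "bytes")
# 8 bit: 3600000 bytes
# 16 bit: 7200000 bytes
# 24 bit: 10800000 bytes
```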



Text & Control Characters as Codes

Lower-case letters (26 total)
Upper-case letters (26 total)
Numerals (10 total)
Special signs . , ; : “ ” ? / < > [ ] { } \ | - _ = + ` ~ @ # $ % ^ & * ( ) (31 total) [93 characters so far]
Blank space & other special symbols
Special codes for computer operation
Foreign-language special signs
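Counting the groups listed so far shows why 7 bits are enough for them: 93 characters need at least 7 bits ($2^7 = 128$ combinations), leaving 35 codes for the blank space, other symbols, and control codes. The strings below reproduce the slide's lists (the curly quotes count as two signs); `bits_needed` is a hypothetical helper.

```python
import string

lower = string.ascii_lowercase                # 26
upper = string.ascii_uppercase                # 26
numerals = string.digits                      # 10
special = ".,;:“”?/<>[]{}\\|-_=+`~@#$%^&*()"  # 31, as listed on the slide

total = len(lower) + len(upper) + len(numerals) + len(special)

def bits_needed(n_symbols: int) -> int:
    """Smallest number of bits whose 2**bits combinations cover n_symbols."""
    return max(1, (n_symbols - 1).bit_length())

print(total, bits_needed(total))  # 93 7
```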


ASCII Code-I


ASCII Code-II
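ASCII is a 7-bit code, so it defines 128 codes (0 to 127), including control characters such as tab and line feed. The sketch below prints the decimal and 7-bit binary codes for a few sample characters; `ascii_row` is a hypothetical helper.

```python
def ascii_row(ch: str) -> tuple[str, int, str]:
    """Return the character as Python writes it, its ASCII code, and that code in 7 bits."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return repr(ch), code, format(code, "07b")

for ch in ["A", "a", "0", " ", "\t", "\n"]:
    print(*ascii_row(ch))
# 'A' 65 1000001
# 'a' 97 1100001
# '0' 48 0110000
# ' ' 32 0100000
# '\t' 9 0001001
# '\n' 10 0001010
```

Note that upper- and lower-case forms of the same letter differ by exactly 32, a single bit.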


Sequencing Codes in a Computer Space--example 1


Sequencing Codes in a Computer Space--example 2


Databases

Flat file databases
Relational databases
Data modeling
– Entity-relationship data models
– Object-oriented data models


Flat File Database

From the geekgirls reading “Databases from Scratch—III”
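A flat file database keeps all of its data in one table. The sketch below models one with Python's standard `csv` module; the book and publisher data are invented for illustration. Note how publisher details repeat in every row that needs them.

```python
import csv
import io

# Hypothetical flat file: one table, one row per record.
FLAT_FILE = """title,author,publisher,publisher_city
Book One,Smith,Acme Press,Austin
Book Two,Jones,Acme Press,Austin
Book Three,Smith,Zenith Books,Boston
"""

rows = list(csv.DictReader(io.StringIO(FLAT_FILE)))

# Searching means scanning every row and comparing field values.
acme_titles = [r["title"] for r in rows if r["publisher"] == "Acme Press"]
print(acme_titles)  # ['Book One', 'Book Two']

# The publisher's city is stored once per row, so changing it means editing
# every matching row (and risking inconsistent copies).
print(sum(r["publisher_city"] == "Austin" for r in rows))  # 2
```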


Relatable Tables within the Database

From the geekgirls reading “Databases from Scratch—III”
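Relatable tables split repeated data into separate tables linked by keys. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table names, column names, and data are hypothetical, not taken from the reading.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publisher (
    publisher_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    city TEXT NOT NULL
);
CREATE TABLE book (
    book_id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    author TEXT NOT NULL,
    publisher_id INTEGER REFERENCES publisher(publisher_id)
);
INSERT INTO publisher VALUES (1, 'Acme Press', 'Austin'), (2, 'Zenith Books', 'Boston');
INSERT INTO book VALUES
    (1, 'Book One', 'Smith', 1),
    (2, 'Book Two', 'Jones', 1),
    (3, 'Book Three', 'Smith', 2);
""")

# A join relates the two tables back together at query time.
query = """
SELECT book.title, publisher.name, publisher.city
FROM book JOIN publisher ON book.publisher_id = publisher.publisher_id
ORDER BY book.book_id
"""
for row in conn.execute(query):
    print(row)

# The city is stored once, so one UPDATE changes it for every related book.
conn.execute("UPDATE publisher SET city = 'Dallas' WHERE publisher_id = 1")
print(conn.execute(query).fetchall()[0])  # ('Book One', 'Acme Press', 'Dallas')
```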


What Kinds of IE Data Might Be Useful?

Names (persons, corporate bodies)
Titles
Dates, publishers, places
Other physical details of packaging
Statements of editions, issues, etc.
Topics, genres, audiences, uses
Relationships


Two Forms of Data

Data that represent IE attributes and are simply recorded in some sequence

Among these, the data used specifically for searching (called access points, index terms, etc.)
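The distinction between recorded attribute data and access points can be sketched as an index built over only some fields. The records, the field names, and the choice of `name` and `topic` as access points are all hypothetical.

```python
from collections import defaultdict

# Attribute data recorded for each IE (hypothetical records).
records = {
    1: {"name": "Smith, Ann", "title": "Rivers of Texas", "date": "1999", "topic": "Rivers"},
    2: {"name": "Jones, Bo", "title": "Texas Birds", "date": "2001", "topic": "Birds"},
    3: {"name": "Smith, Ann", "title": "Gulf Coast Birds", "date": "2003", "topic": "Birds"},
}

# Only these fields serve as access points; the rest is simply recorded.
ACCESS_POINTS = ("name", "topic")

def build_index(recs: dict[int, dict[str, str]]) -> dict[tuple[str, str], list[int]]:
    """Map each (field, value) access point to the IDs of records that carry it."""
    index: dict[tuple[str, str], list[int]] = defaultdict(list)
    for rec_id, rec in recs.items():
        for field in ACCESS_POINTS:
            index[(field, rec[field])].append(rec_id)
    return dict(index)

index = build_index(records)
print(index[("topic", "Birds")])      # [2, 3]
print(index[("name", "Smith, Ann")])  # [1, 3]
```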


Metadata & Metadata Formats

Metadata consists of strings of data within computers that record the attributes of information entities (IEs).

Metadata formats are organized arrangements of categories of metadata.


Original Use of the Term Metadata

Object = students; Data = attributes of students; Metadata = data about data.

(Diagram: D = data; M = metadata.)
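In the original sense, metadata describes the fields of a data set rather than the objects themselves. The sketch below pairs hypothetical student data with a small data dictionary and uses it to check records; all names and values are invented for illustration.

```python
# Data: attributes of the objects (hypothetical students).
students = [
    {"id": "S001", "name": "Lee, Kim", "major": "Information Studies"},
    {"id": "S002", "name": "Diaz, Ana", "major": "History"},
]

# Metadata in the original sense: data about the data, describing each field.
student_metadata = {
    "id": {"type": "str", "description": "Unique student identifier"},
    "name": {"type": "str", "description": "Student name, surname first"},
    "major": {"type": "str", "description": "Declared field of study"},
}

def conforms(record: dict[str, str], metadata: dict[str, dict[str, str]]) -> bool:
    """Check that a data record has exactly the fields its metadata describes."""
    return set(record) == set(metadata)

print(all(conforms(s, student_metadata) for s in students))  # True
```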


Use of the Term Metadata in Information Organization

When the object became an IE, the object itself represented data.

Therefore, what would the phrase “Metadata = data about data” mean?

Metadata came to mean all data inside the computer about IEs.


Metadata Formats

The purpose of metadata formats is to “code” metadata in terms of categories. The categories have a wide variety of uses (e.g., content categories, computer instructions, formatting of content as text).

Some codes are used only within databases and are not generally seen by the information user (e.g., the codes in the MARC format).

Some codes are attached to metadata and text through “markup” in HTML or XML (though a user does not usually see them in a browser unless choosing an option such as viewing the page source).
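As an illustration of category codes that stay inside the database, the sketch below uses three MARC bibliographic tags: 100 (main entry, personal name), 245 (title statement), and 260 (publication, distribution, etc.). It is a deliberate simplification, not a MARC implementation: real records also carry a leader, a directory, indicators, and subfield codes. The record data and display labels are hypothetical.

```python
# Simplified MARC-style record: numeric tags are the internal category codes.
record = {
    "100": "Doe, Jane",
    "245": "An example title",
    "260": "Austin : Example Press, 2003",
}

# Labels shown to the user in place of the internal tags (hypothetical wording).
LABELS = {"100": "Author", "245": "Title", "260": "Publication"}

def display(rec: dict[str, str]) -> list[str]:
    """Translate internal tags into user-facing labels, in tag order."""
    return [f"{LABELS[tag]}: {rec[tag]}" for tag in sorted(rec)]

for line in display(record):
    print(line)
# Author: Doe, Jane
# Title: An example title
# Publication: Austin : Example Press, 2003
```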


Mark-up Languages

“A text-processing language which embeds commands into the text that is to be processed. These commands then instruct a display device or a printer to carry out some formatting.”

From “Markup language,” A Dictionary of the Internet, Darrel Ince, Oxford University Press, 2001. Oxford Reference Online, Oxford University Press, accessed 23 September 2003, <http://www.oxfordreference.com/views/ENTRY.html?subview=Main&entry=t12.002053>
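The idea of commands embedded in text can be seen by separating the two with Python's standard `html.parser.HTMLParser`. `MarkupSplitter` is a hypothetical class name, and the sample markup is invented.

```python
from html.parser import HTMLParser

class MarkupSplitter(HTMLParser):
    """Separate embedded markup commands (tags) from the text they format."""

    def __init__(self) -> None:
        super().__init__()
        self.commands: list[str] = []
        self.text: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.commands.append(f"start {tag}")

    def handle_endtag(self, tag):
        self.commands.append(f"end {tag}")

    def handle_data(self, data):
        self.text.append(data)

splitter = MarkupSplitter()
splitter.feed("<p>Data about <b>data</b></p>")
splitter.close()
print(splitter.commands)       # ['start p', 'start b', 'end b', 'end p']
print("".join(splitter.text))  # Data about data
```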


From “A Gentle Introduction to SGML” (http://etext.virginia.edu/bin/tei-tocs?div=DIV1&id=SG):

Historically, the word markup has been used to describe annotation or other marks within a text intended to instruct a compositor or typist how a particular passage should be printed or laid out.

Generalizing from that sense, we define markup, or (synonymously) encoding, as any means of making explicit an interpretation of a text.

By markup language we mean a set of markup conventions used together for encoding texts.


From “A Gentle Introduction to SGML” (cont’d)

A markup language must specify what markup is allowed, what markup is required, how markup is to be distinguished from text, and what the markup means.

SGML provides the means for doing the first three; documentation such as these Guidelines is required for the last.
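The first two requirements (what markup is allowed and what is required) can be checked mechanically, while an XML parser handles the third (telling markup from text). The rule sets and the `check` helper below are hypothetical; as the quotation notes, what the markup means still has to be documented separately.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for a tiny markup language.
ALLOWED = {"record", "title", "author", "date"}
REQUIRED = {"title"}

def check(document: str) -> list[str]:
    """Return rule violations; the parser itself decides what is markup and what is text."""
    root = ET.fromstring(document)
    used = {el.tag for el in root.iter()}
    problems = [f"not allowed: <{tag}>" for tag in sorted(used - ALLOWED)]
    problems += [f"missing required: <{tag}>" for tag in sorted(REQUIRED - used)]
    return problems

print(check("<record><title>Maps</title><date>2003</date></record>"))  # []
print(check("<record><author>Doe</author><color>red</color></record>"))
# ['not allowed: <color>', 'missing required: <title>']
```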


Specific Markup “Languages”

SGML (Standard Generalized Markup Language): for texts
– DTD (Document Type Definition)
– Header

HTML (Hypertext Markup Language): an application of SGML for marking up text for browsers, independent of platform


Specific Markup Languages (cont’d)

XML (Extensible Markup Language): based on SGML, but, unlike HTML's fixed set of tags, adds the capacity to define or otherwise insert special categories.

XHTML (Extensible Hypertext Markup Language)

Other markup languages, e.g., for every special purpose imaginable: Geography ML, Chemical ML, Gene Expression ML (GEML), Wireless ML, Rule ML (for XML), Theological ML, Bean ML (for JavaBean), etc.
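XML's capacity to define special categories means a record can use element names suited to its own purpose. The element names below (`ie`, `title`, `creator`, `subject`) are invented for this sketch, which reads them with Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML record using self-defined categories.
XML_RECORD = """
<ie>
  <title>Gulf Coast Birds</title>
  <creator role="author">Smith, Ann</creator>
  <subject>Birds</subject>
  <subject>Texas</subject>
</ie>
"""

root = ET.fromstring(XML_RECORD.strip())
print(root.find("title").text)                    # Gulf Coast Birds
print(root.find("creator").get("role"))           # author
print([s.text for s in root.findall("subject")])  # ['Birds', 'Texas']
```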


Why Is a Knowledge of Markup Languages Important for Information Organization?

MLs (markup languages) contain document-description capabilities.

MLs contain categories that can be used in databases.

At some point, an information organizer must use a markup language to display information organization data.


Metadata Category Codes

No metadata category codes will be useful unless they are consciously deployed in a computer program.

Metadata codes become especially useful for information organization when they are deployed in an IE organization system—i.e., in an IE database.


IE Databases

An IE database is an organized structure of metadata that is used for organizing and retrieving IEs in computers.

Organizing and retrieving IEs by means of a database is possible because the database allows us to manipulate the metadata in terms of the categories represented by the metadata.
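Manipulating metadata in terms of its categories can be as simple as sorting and grouping records by one category at a time. The records and the `arrange` helper below are hypothetical; a real IE database would do this through its own query facilities.

```python
from itertools import groupby

# Hypothetical IE metadata records, all organized into the same categories.
records = [
    {"title": "Texas Birds", "creator": "Jones, Bo", "date": 2001, "topic": "Birds"},
    {"title": "Rivers of Texas", "creator": "Smith, Ann", "date": 1999, "topic": "Rivers"},
    {"title": "Gulf Coast Birds", "creator": "Smith, Ann", "date": 2003, "topic": "Birds"},
]

def arrange(recs: list[dict], category: str) -> dict[object, list[str]]:
    """Collocate titles by the value of one category, in sorted order."""
    ordered = sorted(recs, key=lambda r: (r[category], r["title"]))
    return {value: [r["title"] for r in group]
            for value, group in groupby(ordered, key=lambda r: r[category])}

print(arrange(records, "creator"))
# {'Jones, Bo': ['Texas Birds'], 'Smith, Ann': ['Gulf Coast Birds', 'Rivers of Texas']}
print(arrange(records, "topic"))
# {'Birds': ['Gulf Coast Birds', 'Texas Birds'], 'Rivers': ['Rivers of Texas']}
```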


A General Maxim

A professional information entity organizer must understand the place of data, metadata, metadata formats, and databases in his or her work:
• Their general roles
• The particular details of the specific systems used