ed 094 758 18 000 958 reintjes, j. francis; marcus

63
DOCUMENT 'RESUME ED 094 758 18 000 958 AUTHOR Reintjes, J. Francis; Marcus, Richard S. TITLE Research in the Coupling'of Interactive Information Systems. Final Report. ESL-FR-556. INSTITUTION Massachusetts Inst. of Tech., Cambridge. Electronic Systems Lab. SPONS AGENCY National S6iende Foundation, Washington, D.C. REPORT NO ESL-FR-556; MIT-OSP-80720 PUB DATE 30 Jun 74' NOTE 62p. EDRS PRICE ,MF-$0.75 HC -$3.15 PLUS POSTAGE 'DESCRIPTORS *Computers; Data Bases; *Information Networks; *Information Retrieval; Information Science; Information Services;'*Information Systegis IDENTIFIERS ARPANET; Compatibility; MEDLINE; Network Interfaces ABSTRACT This reported research centered on development of the concept of a translating computer interface by which the networking of heterogeneous interactive information systems may be achieved during the period in which information retrieval system and network' standards are evolving. The particular concepts and techniques investigated are'thevirtual system concept, a common command language, a master index and thesaurus, and.a common bibliographic data structure. In addition to the theoretical study of the problem, -an experimental interface haS been developed that connects the MEDLIgE and Interex retrieval system via ARPANET communication links and that performs some of the networking functions of the virtual system. (Author/WH)

Upload: others

Post on 17-Mar-2022

5 views

Category:

Documents


0 download

TRANSCRIPT

DOCUMENT 'RESUME

ED 094 758 18 000 958

AUTHOR Reintjes, J. Francis; Marcus, Richard S.TITLE Research in the Coupling'of Interactive Information

Systems. Final Report. ESL-FR-556.INSTITUTION Massachusetts Inst. of Tech., Cambridge. Electronic

Systems Lab.SPONS AGENCY National S6iende Foundation, Washington, D.C.REPORT NO ESL-FR-556; MIT-OSP-80720PUB DATE 30 Jun 74'NOTE 62p.

EDRS PRICE ,MF-$0.75 HC -$3.15 PLUS POSTAGE'DESCRIPTORS *Computers; Data Bases; *Information Networks;

*Information Retrieval; Information Science;Information Services;'*Information Systegis

IDENTIFIERS ARPANET; Compatibility; MEDLINE; NetworkInterfaces

ABSTRACTThis reported research centered on development of the

concept of a translating computer interface by which the networkingof heterogeneous interactive information systems may be achievedduring the period in which information retrieval system and network'standards are evolving. The particular concepts and techniquesinvestigated are'thevirtual system concept, a common commandlanguage, a master index and thesaurus, and.a common bibliographicdata structure. In addition to the theoretical study of the problem,

-an experimental interface haS been developed that connects theMEDLIgE and Interex retrieval system via ARPANET communication linksand that performs some of the networking functions of the virtualsystem. (Author/WH)

U.J

A

I

June 30, '1974 Report ESC-R-556

MIT-OSP Proiect.80720 \.,`NSF Grant GN-36520 ,

RESEARCH IN THE COUPLING OF INTERACTIVE

INFORMATION SYSTEMS FINAL REPORT

J. Francis Reintjes

Richard S. Marcus

Hccironic Swems Laburator)

I

MASSACHUSETTS INSTITUTE OF TECHNOLOGY, CAMBRIDGE. MASSACHUSETTS um

Dr pdritural of frit di optsccfmg

CO

CD

w June 30, V.974' Report ESL -FR -556

RESEARCH IN THE COUPLINGOF

INTERACTIVE INFORMATION SYSTEMS

FINAL REPORT

J. Francis ReintjesRichard S. Marcus

U1 DI PA4TMENT Of NEAL TolDUCATION I WELFARE

NATIONAL ...SMUT( OfE Du( A TION

11,11 1V1 N1 1111 WI 1,o,1 . 1 A AL 1+1 e 1 .1 1 1 1,11 A..

off PI . AN /A 141,4 .114

,11, 111 ...11 4111. nu wi Pwl

The research reported herein was made possiblethrough the support extended by the NationalScience Foundation through Grant GN-36520.

Electronic Systems LaboratoryDepartment of Electrical EngineeringMassachusetts Institute of Technology

Cambridge, Massachusetts 021 39

JACKNOWLEDGEMENT

.The authors would like to acknowledge the efforts

of several project co-workers: Dr. Charles W. Therrien

and M. Don atntor in the area of computer systems;

Mr. Alan Benenfeld in the area of bibliographic data

structures; Dr. Peter Kugel in the area of reei.1:6al

language analysis; and Messrs. Charles.E. Hurlburt,

Leonard Goodman, and Hsin-Kuo Kan in the area of computer

/ programming and analysis.

We would also like to acknowledge the support and

cooperation of Mr. Davis McCarn and Mrs. Barbara Sternick

of the staff of the National Library of Medicine in our

efforts to couple, experimentally, MEDLINE to our local

interface.

ABSTRACT

This report describes results of an 18-monthresearch effort in the coupling of interactive informationsystems. The research has' centered on development of the

-concept of a translating computer interface by which thenetworking of heterogeneous interactive information systemsmay be achieved in the period during which I-R system andnetwork standards are evolving. Particular concepts andtechniques which have been investigatld are:. (1) thevirtual system concept by which users perceive the networkas a single homogeneous system; (2) a common commandlanguage synthesized from a basic language of primitive I-Rfunctions; (3) a master index and thesaurus which stores thevocabularies of the separate data bases along with indexterm interrelationships and counts; and (4) a common biblio-graphic data structure in which the data elements forbibliographic information are hierarchically structured andinterrelated among different data bases. In addition to thetheoretical study of the problem, an experimental interfacehas been developed that connects the MEDLINE and Intrexretrieval system via ARPANET communication links and thatperforms some of the networking functions of the virtualsystem.

ii

I. INTRODUCTION

This report describes results obtained under National

Science Foundation Grant GN-36520, entitled Research in

the Coupling Of Interactive Information SysteMs', The

period of the grantwas January 1, 1973 through June 30, 1974.

The research was motivated by our concern that the

information systems that are beginning to appear in

operational environments will not be effectively utilized

because of an inability of users and information specialists

to master them.' Degradation in the quality of service is

likely to occur because of the many differences that exist

among the systems and the time required to gain an

underdtanding of their specialized features.

Under the present state of affairs, in which each

information system has its own unique features and is

accessed in accordance with its own set of special rules,

it is out of the question to expect users themselves to

make eilicient use of the various systems. ThcFre are just.

too many subtle procedures to master in a short time. We

have even observed that considerable training; of informa-

tion sTecialieNt is required to brine, them to a MO level

2

of proficiexty and we foresee an upper limit to the

number of disparately designed systems the specialists

will be able to handle.

It is clear that until such time as the user

community becomes highly proficient in online-access

pr educes, information specialists must be on hand td

serve user_needs. Although it may turn out that a certain

percentage of users will always want to delegate search

responsibiXities to the specialist, one would like to move

in the direction 9f se'lf- service as aqueans of getting

the user involved directly in the solution of his own

informational problem, thereby reducing'his costs and

improving personal satisfaction. This will not be_

possible unless system access is made simple and straight-

forward.

Hence, future courses of action become obvious;.

either uniform standards must be adopted for all systems,

or computerized interfaces must be developed which

accommodate nonuniformities and re-present them to the

user in a single, standardized form. It is along the

latter line that we haiie been working.

3

I.C. DISCUSSION : THE NEED FOR A NETWORK OF HETEROGENEOUS

INFORMATION SYSTEMS

A. Recent Advances in Interactive Retrieval Systets.

A number of interactive bibliographic information retrieval

systems, have been developed in recent years.*

This type of

online computer system has been widely acclaimed by users

for rapid and easy access to large data bases of bibliographic

references. The economic viability of these systems is

attested to by their continued growth and by the fact that a

number of commercially sponsored systems are currently

**available. In fact, it is now possible togain access

from most points in this country at costs ranging from about

$6 to 00 per connect hour to these retrieval systems. These

A collection of descriptions of several of these systemsis found in Walker, Donald E. (ed), Interactive BiblOgraphicSearch: The User/Computer Interface, AFIPS Press, Montvale,N.J., 1971

Economic viability is diseused in the following two_papers.:

C.W. Therrien and J.F. Reintjes, Modeling of Informa-tion Systems; Proceedings of the Sixth Annual PrincetonConference on Information Sciences and Systems, PrincetonUniversity, March, 1972

Davis B. McCarn and Joseph Letter, On-Line Servicesin Medicine and Beyond, Science, Vol. 181, No. 4097,27 July 1973, pp 318-324

syStems contain in the aggregate references to documents

numbering in the millions in such subject areas as

chemistry, aeronautics and astronautics, education, agricul-

ture,'nuclear science,.toxicology, medicine, engineering,

and environmental studies as well as data bases covering

several subject areas for such document

articles, government-sponsored reports,

cataloged monographs, .and news articles.

(types as journal

Library of Congress

B. Limitations of Present Systems. A major

limitation of current systems is the size of the data base

that can be stored online. A data base containing bibliol

graphic information for a million documents is aboutjhe

maximum size for effective online operation with current

hatdware/software environments for single computer systems.

HOwever, a collection of this size represents a very few00

dOcuments when measured against the total amount of pub-

lished literature. In particular, a million documents

will cover the literature of a single discipline fof

example, chemistry -4-- for only a very few years.

One might argue tnat most researchers work only in a

fairly narrow area and could be adequately served by a data

base of the size of a million documents. There are

several problems with this kind of argument. In thefAk

place, the information needs, of users are often most

critical in areas outside their own specialty. Thus, for

example, when starting out in a new aspect or when seeking

an experimental devide for instrumentation, a-researcher

may have more need for information than when-working strict-.

ly in his own specialty. In Project Intrex we found that111.

there were many more serious users who were from outside

the 5 specialties for which the-data base was se.lected than

there were from within thosekspecialties. Also; there seems to

be a growing trend toward interdisciplinary activity with

the concomitant need for multiple data bases. Even users of

systems with many hundreds of thousands of documents regular-

ly ask for a broader coverage. Then, too, much of-the use

of tbese systems is by information specialists 'acting as

delegated searchers; these specialists may have many clients

from different subject areas and, hence, a need for a

multiplicity of large data bases.

Another limitation on current syStems is their

capacity in terms of number of simultaneous online users.

See Project Intrex Semiannual Activity Report15 September 1971, Massachusetts Institute of Technology,pp 6-6. PB 202 860.

6

This capacity is usually numbered in the tens wheieasr

there is a potential for thousands of simultaneous users,

even if only the United States is considered.

C. .The Ultimate Uniform I4etwork Solution. Ultimately,

.,the solution to these problems may necessitate the"construc-

tion of a large-scale, on-line information retrieval net-.

work made up tof many similar.-- preferably identical

computer nodes, each node being associated with an online

to base of a millioh or more documents on a separate set

of topics. For maximum efficiency users connected to each

node --- there might be several hundred online users at

any one time-- would make requests in a common retrieval

language. Such requests would lead to parallel searching

of e data bases whi4h would orgLiized

within a standard file,structure.' Intercommunication among

computer nodes would be accomplished over high-speed.

communication lipls for hich data-concentrator techniques

would be employed to gain-further efficiencies and to

"reduce response times.

Thus, in order to achieve economies of scale,

with data bases created only once and used many times by

a large user community, and to: provide easy transfer of

information among this large community, the ultiMate

solutionrappears to be a uniform network of standardized

parts. The telephone and railroad (standard gauge)

networks would seem to provide good analogies.

D. dpbstacles to Immediate Implementation of the

Ultimate Network. For the next several years, however:the

degree of standardization required for the ultimate network

is unlikelyiin view of the\41ready heavy investments that

have been made in existing heterogeneous, nonstandaidized

.retrieval systems. Lack of standardization is &pervasive

barrier to intercommunication. A potential user of dif-

ferent retrieval systems ig faced with a series of-obstacles

right from the start': the,necessity to discover these

'systems in the-first place, to make separate procedures to

gain access and account for dosts, and-, quite possibly, to

make actual acceis via different terminals and separate

locations. Other obstacles face the user once access is

-made: different command languages, retrieval' functions,

,indexing vocabularies, and output. formats. If the pro-

grammers of one system wanted their system to communicate

directly with another, they would face problems of dif-

ferent operating systems, hardware, programming languages,

t

character codes, word /byte /bit orgahizaion, file

organizations, and most directly for the majority of

8

I-R sy6tems, no established computer.-to-computer

communication links.I

Because of the established character of the

different I-R systems, their environments,, and their user

clientele, and the cost of remakinedata bases in dif-

ferent file organizations --- even if peimissionto do so

were .granted -, it seems unlikely that any. existing I-R'.

system and environment will soon become a de facto standard.

I

E. The Computer Interface. In view of the

foregoing obstacles to the immediate implementation of thea

ultimate network, we decided on a course of action

for the intermediate term which seeks to approximate the,-/

effectivends and efficiency inherent in the ultimo.%

'work as best as possible through a currently achievable

.network based on computer-interface techniques. Such

an interface achieves compatibility among systems of

heterogeneous hardware and software components through

translating and conversion algorithms. It is this kind of

interfae that we have investigated and are reporting upon

here.

9

The over-all objective of our work was to establish

through analysis and experiment the feasibility (or in-

feasibility) of a computer-stored.common language interface

for disparately designed information systems. Our program

was conducted under three major4teadings:

Study of I-R Systems and Research Planning

Advanced Network Research

Uperimental Interface Decigr., Implementationand Analysis

10

III. RESULTS OF TEE PROJECT

A. Stud of I-R S stems and Research Plannin

effort called for an examination of several online I-R

systems from the viewpoint of the requirements they would

impose on a computer interface design. The purpose was to

find at least one I-R system outside which would be

suitable for our network experiments and whose administra-

tors and technical staff would he willing to cooperate LT,

these experiments_

We revieljied many-of the important I-R systems.

We participated in the feature analysis of twelve 1t0er-.

act:-.re fetrms: performed by Dr. Thomas Martin f :itanfotO

an 4ticndcd 4 three-day oftsirtar for this putptisc at

:aanford University: Our revi.ele of theitc ayptcluite led wa

to the CfrIclual(rn that Q4t otlEftlal idcaa fot 4 fletiecti.

interface mstitEd (Icfailc4 4cyclopmcNt It+ astWiticitt. we

iclatttlISetl the NSLDI1NL tctticvai tvctric of tfic Naticterial

I II,: at v 1'40(11c lItc a. th,c t.c t (:4c f<3 ttc t o (hoopc at tic

fitat aotc,u, tctiutc Sttim M 1,7. with which chvcticuct,t

Cite tcacc,ttc th,JoaltiE MVA.T.Nt. int ltvIc

MAI t iti, t:ctv aI I. Itti vc I cIt V S t t, 11 ot t.,1 a

; t I tT,c lt,zt It k:tc fot ttInstr,stii. at 1 ,to,

1.,cccc1,1, ct ctc c(WqICA;1.1., A 4tt. iWted .OritlpAit4

I SVC atlti4yglif

( 1) 1311,1 staff wqp very r.(popctst ivy In Aiding(flit TP1:1G41"( h Pi int t . Ploy pr(iliit 3 rsr1 brit INV Anil Tcar1V a( cFp t,n t hc.

y t

( 2 ) tic c- r rent t r,1 ) c p1,,i1pry341 pt irv1,1c c ,ritt t fni Pt, 1,c tt3c.nti33it tc1 t t, j IP I ;it: 140;1c* t-

Int tr-P rot ricvA1 evetctt; ,)1 pi 1.7

( ) 1/01. has t A1,1 i ci.cl An cirt,cr ; tucnt Ar rnr)P r t irm tr, the At. .r t lc API A r,c t to, t1 4/1. I ( to

W4'/3 (1 j,/ fr".1c he- j,f I In I t ts/i ,,,q7 ,33,3pt, rtcIw( 71. cititct Itticnt c

in the r r.t)/ t c r,f ftsfl" thic t s,c we

tiat)cnc al 11,c I cvIcw c1111.1 It. 310c 11.c. Icy; c-w ck ct

( t C ;lctWr,7lrc I 4 11 LID C 1.1 r.ctWrrtlr

c j Pc 3" 1113c rt Tiilz cv3 Pvc t c`wzI .1 nk In t i.I wc.

.1; e. t,%-ct cc3 bin( we c1Ict t Svc 3v .;cc in rtL;1 cfpci 3ntel'Ag1

I! kt; rA.3'31:1 3:14341 T17VVI-.7 I 1c r 3tituthit ci e.c. I 3,43 t le I t he.4

P I c t c t F.0 # t au* tr.r. t i. (.11 10,1 a t i r'T, t I.

Isc 3 a t 3 trna 1 en ; .c ; I 1 ; s.j_*! 4 i c I-I C- t c-ilia 41: c

s. one-11 1.c JUP I 14 t I.J 1 7 33: 11 sAt3c

hc-wluz ti ( t 1.. rt, I I...Luc t 3. a C ( (A 40.

rf 3 3 I t .1 c t 3 Isc 1i I ; r. =: cizt cct I;

.;:t c.tipc I 1i:4c:A =1 tic t w. c- .I 4c .1 In.; c 13- Ic I .rot

...1 tc3 rat i. 1 le cictru.: vc 1c

c c . c .l cti c i C : 4 :- 0., 4 i,c 1 t . 1'2'1

12

frero each EtrJupo 'Pc 711C4WPF. SyAtc-m DcP1,111k4cnt

Cntri4)/otirtn (FD0C). to At(mair Vnotgy Comvoloolcin. Aotto130.

tanfnrd Ohl-vas-yr-Pity, and !WA.

01434v*T4,-4-4 )14-tv,41-4. ;,c-scot,-J4 to.zi, 4-41.444

Et ant woo a 1.7 (,34 Loy c-4 4 f 1)fly f , f fc ot41,3c..tv ( i f Int CI-

rytorac-f t irift, rf I vc Oc giro, kccktOph; f 43 iy ccpSit 141 11-4 (4/ aka 1 crn

4Tott le31 ovette.tibt A ttsts.t tictet jt:c,.r.k t hc, cal ly Tirroolt

#,a Itccr, Ttct,atCr rat I 102 tc.pt)3It wctc *16..4 141accr,tc.i3 At

the Po t w(11 ; nt (rnr,c( t ; (Tr) far,c. 3 orf 1)kc aarer,wa; Luc c ; r,k ,,1

t tic AP.MA ;ct a' ff./ It414',Ittasit Z.( or.t

tut441,c/ Itart, of j.r a to4c1co, Co3:ft1n10 $110,330-ite

Ihcec jcetylt mat* klycr tc.1.44to

1 c Cormirstrir-A C ( ostip; t c t I t.t t, fat c

.1) F to t.-.; 3 f (-t 3-rti et t tetrte, t Itrik

; t .01 41 res= It ar.0 at E kauts(i Ci 41.1 At f c I ( -ttactw, r 1,c ci A., I 1A-A ttif r t usAt IkAt t I Aril Ipt r, ft tic .lt,t c fa( c flits*1 f rl ki au.

tt.I t , k 1.&r,E EA . Cr,.l I 1-, f 11,A t I,1T4 1Ct lAtml Avs:kzul ct C '17

I t 1.c Mai A I Ar,.; ( ; c1 i so:4 11-4 t c1,AI at 1.rt4) 1. Int1,(trLi c*, I ;at rrt ft tin ;t,E t-ar 14Utic I VC, 103

:t.ic.ictt Lt, Ic ,:: ( kt t.k I (r.tIE lizEc 7...ILrl.,&11 I anti 11-

I 1 gum . c c ( 1 I. tut .41 t :t.t t C S.-I t .ptaivat CI4 C Cl sa:, :c. 1,1... 1. Lt. hitt rLzt ZI ;' :444

IPA LA41 inc r. t , t t if IfIVIC I C

13

t :Ect,c,114 ittf,,77F#2t t# ?At 7 ic,'A) C ye t Fre Lc cl--"ac '3

t ,}trrt, r, f Itt,,, , c ir.tctfa,c , p tr: rn2 t 7 C v C. f(.7

cffc, t it, t t rc., t PII,Pyl A .,!Cc-7

.1.4. C C 113. 1 1 3

T;L! 1

,1; c c 1 i, cvctc th: ; r, . flurry ,t, f 7 AtItc 1,4, 71

(0,37 t t 11 C 'Jr 1,1C j113C- C t c.r,, cr.; 1,,

; 1L t c u:1-1;r-

; r, ; t.4 t :A roc c. ,1 < r ; t c c ; a F, ; .:41 3, r ; r Fri I t,

i :.; 1 ; , , , i -1 1 .c , mot, et, ; f.t c t'l :., c c }, ,,11; 1,c, .,c... 1 F.c..

i, F . i.c rII, ,f :5 ,..:,1::::+1 evetem 11,4...I

),,I e 1. I .1 et. 1 i,e z 7

a3 0;t,iic :1:1,11:, It I},C ;.:C't Urtl, :133 }ht.- ,,f,- `C:'C't V

, .t1.1, .111 lc: , f = vett ;c c; :,:tc,rt ;#cto;l: Doc k;,-ct.

;r,f'.c t t F.c ic., t , : , ttsti,, r,c ,r. t : I 1 11.c ;r1c1fA(c. i;ctc,`} 1,c

i , 'to,

;;,, , ,c t , c..r.,1 tr, I C mon cr t Ir c

. /CI, t I t< c C i 1

el. c:t4c r. t c , c 7 i,c t . ,

c ; c

1 I '71 v t 1.; , ,;;;;

c : :`,c c : I oe. 133, t C.

c

: sy

0 1 11 t I t

///

4.

I

Mc ,*ct ./.14 .ro C .

s t h4 I, 1*,es,r, ,

SP I 1S,1.1 'pH vs ,/p t!"*44,* !I( 0 #0c, It

ft

,111

t

11.

4,

;

14

5 1, v .5 114' I I: 4

rp' ,ss r 441..4( 46% 1: IC

s i 1 ,

f.... i', ..- "...,

., --; 4-'.F... ,

1 1

1 , 4 4 411 , o r , S ' ,.0., A ;

C 'tit vv., t .1,4, Ir 114:

c, vr x It. t. Ikt , .

S.: :SO: 4... 0-1,4 /011, I: t. t I, WI.'

::041 r t 114

'

4*,.. vs: *r*, p t C I ! , ?toys:

t ',IL I, 41..4 t C.14111, 0.), ' Yrt 4 ISCS4

IticC1,4,t4 Sr at tG Lae tve111:14 'II t yr I c.,

15

fully in :',oction C. Tho development of more advancedt

c crurmini ( t onP ri,anno1a Pwa t. c a Tro)re c(Imp1cte expoaiticm

of t onc, tr1 be ;11 ormrd by t he interface

1 c,i wino rn Cc-JIMA-1,1(i

1 of OT MA 4 ( 4

I 1.0)'# E o

Out A t i)dy of t hr r (mmand 1 anEuale f or Ar vet. a 1

xctricval AyPt 414 41'40 41 4._ 0;:i411( t. A (

a

d uc; I n a n1.$44lrC3 of 1 wpor t a'nt . 1f At 1 11

tentative, onc. 11)0 1 ono

4

a poi ticval 1-4 ic pc Tall A. ( Epp?' c ;Iv 3 c of3r(dividua1 f uric .tri a .

TI, ata 1 y; r hr (irriavrio!:. ticityAtc3y, 1 hoti lec tut /0 i

truc late t hr H.= v pi 'a ivr f t117, 3 co.:7 1 cm,

t i cc Ings...21-,0 c *4-; t .c *i;c0 tt.AC2 c A

lic fwc.Vat ;c!l" acct (,r,vet;lcttcc 11 1e 7IC4 c c ' I I

clatu.27.1(1: i,C tItA Iztl re* t t 1+=t$ i 1,t! 3 v I 41.4* !yr11:: t S t Ttlz

V.11 A 1 t. c trttlttiat,c1 1 AT ti.:(F.:,c v c-/ri,;:, f t t

hc v 7tc !tit a arc c lic 1 v 1 r f

c(millc:c .1 ((v.STIchrl-1:1vc I .c r..1:21c 1 tt.. cfs

tic r ( f (,13:4 .O

,,s t1.c z.1.2c.vc 1...c:tic I 1 : :r 1c (

At.11.1At c cfA( S 1. 11t.:z. At,r I.:IN-et. ...outt.'44.4 I:. c

A t 1 313, c 4 Inri4.2 1141 t I 11;11: C

1,h 1; t t c-31.: 1 .c-11 out:3.33131,1 :Atec. 1c (.a:.a

C 41 I .1 it I c c c nam, 4,; I

=:11, s- : t c

7.c. (.11 (. hc 1 ,,At !is, f L33 t1,cI zit,z,

10

It is a common misconception that the difficulty intranslating among command languages results ft6mthe different names given to the functionally similarcommandn and their argumentn in the differentsystems. As nuggehted above it _in rather, thediver-Pity of 4function bundlers" and the incompati-bility of exact translation that are the primedifficulties. It in, however, convenient for theliner to he able to reassign command names into morefamiliar or Pony-to-remember labels. Thin in onerequirement of an interface language which, in effect,allow; many dialects within the common-language,framework.

II. The ry f or , it is proc1pnt to ronrarc h (-cry-ary1 lanFoJap,c,4a 1 onp the 1011(twitig tn.; 4 :

Ann1ynin of retrieval commanda in order todinnect thvm into their ptisnitiva functions.

Development of 4 common command lanRunEecqnnintiny. of macron of these primitive funcltiona.

iii 1=464e:tip 141_1 Cl/41190CD foie aaB1Pt10E 00Ct P inflituaW;rit of Kenetal Income- at lit y and its

cx:Kt4iam of trannlatimi. as well ac, Ficncral

gricla fort capict owatem UDC.

iv initial cmIlhapio on acccza to tbc 1N netwotitt tt301 Ibe f cletevrti let EC t 11C: t ban t (10E11the lanEneKce of env of tfic 'ecvclal I 14 eyztemcthcaricc1vcc.

Opcn cncicclncpa, flexii,;11tv en4 motinialitT incicciEn of the fnucTontl lanEncE.c in ileeliable

in he env oquzieni of tc..layco 1 N

cycjemc.

10(clfacillg P.c(Wccti itivctac Vocal,nlaiicc

10 (*tA cat iT effoite we inveetiEsted the notitql

r jut, tI.c It,11cx 12 Cc rc i atL11.ci v tcc Itr,icltica 101 1411 CIC.C:

17

decomposition and stemming, to alleviate the difficulties

of searching data bases that have been indexed under

different controlled vocabularies. During the reseIrch,

this notion was extended to comprise the concept of

the Master Index and Thesaurus (MIT). The MIT contains all

the index and thenauron elements of each of the damn banes,

including an ordered lint of all vocabulary ,terms used for

indexing t01-;ether with the country of the number of docu-

mcnta indexed by each and the thesaurus relations for each

In additioa. Iffirougb ur.ie of the techniques of phrase.de-

ompoa t ion ( that ia, brcakins, a phrase down into it et

Ind v dna I word j and A t earn nF. (dropping: word endings r4o a a

Co «rapider only thr,w0td atrMa) we can automatically identi-

fV mat intetvoc=t-ulaty relationahip.a in addition to the

idetitv telatitphip.

vhe lot readlr etoriny. and referrIng

tt, t!..e.te 7elat.,ra:-. t( rtovidp itt thc mactcl iti(1,..x and

The c =LA %Jr:: t ei et en, ea ti. a 11 indcp, %o( al=n 1 at v t el ma under

cal 1, k= t rl ztem that ai,pealm in that total. ( :.cc r14,. 2 )

4

:I es, am/41 I he 1,A7,A fetal etc, ts i( al inzn 1 at i00 is

I'.!. A Thc : A 1 1.1.ahr t i=a1 ltpdat e :fept 1 .4 7 1

electr-

icityicol

is

H T Nelectric fields T,

Helectric intolotinq popert

Helectric sporl1

al electric/1 intirIntinnT'N

T

RT

4/11.11.1 electrical intulotont

18

;owl-

otorsN

ationT 'N

1. MO

oting

Omo

ftift11.

.N..

thermal intalotionT'N

B1--NT ill copocitor poperT

,RT ... i asbestos rN

..RT dielectricsN

ft.ft.

1 T,NRIwiring

UTft.ft.

U

RT ieloctrIcol portcloinT

PAY

relationship ettalithed outomaticollycl 01 i en from e,ittin9 thesaurus

Don, TIST Tt4ISAIIRUS

Jv NASA 1.11SAlJRUS

R1 RIAA11.0 11RM

N1 NARROW, R 11RM

liT fiROAtitR T(KM

111 11SIO FOR

f i0. 2 Sarnplo Relationship aftx>ng Ttren as Maintained In Master Index and Thesaurus

19

is automatically found to be related to the following

TEST* terms in the way specified: specific to electricity

and insulation, generic to electric insulating papers,

synonymous with electrical insulators and, of course,

electrical insulation, and otherwise related to electric

fields, electric sparks, and thermal insulation.

The MIT can be used to help determine which terms

to search under and, indeed, which data bases to search in.

The MST can be used either in a purely automatic mode ar in

a Manual or prompting mode in whith the user is given a

display of terms to search under. Interestingly, the MIT

should be a useful adjunct even for a single system with a

single data base having a controlled vocabulary. In sh6rt-,

the MIT concept clearly has important potential in the

development of I-R networks**

5. Bibliographic Data Elements and.Structures

In our research we have determined that another

prime consideration in the development of means for users

to in

*

teract conveniently with different data bases is

to.

Thesaurus of Engineering and SCientific Terms, preparedfor U.S. Department of Defense by O,ffice of Naval ResearchProject LEX, 1967

**See Section C.6 and the appendices for additiohal details.

the interrelation of the div rse4data elements and

structures Lim those data es. Three ways in which

.

the interrelations are imp r etant thay be enumerated. First,

searching is done on one or more data elements:, in order

to translate a search done on ope system into another,

the correct correspondence of data eleme is must be found.,...

20

Similarly, user output requests iequire he specification

of combinations of data elements from the catalog records.

Finally, in order -to combine retrieved document sets from

different data bases and to create keychable document

sets from separate'data bases, we need to: identify when

document references from different systems refer to the

same document: establish dommonreference formats: and

create common index (inverted file) and catalog data

structures.

Our basic solution to this probleceis the

concept of a common data structure based on the identifica-

tion of data primitives or basic data elements analogous

to the basic component functions of the common command

language. Compound data elements in any system can then.

be translated into, or composed from, combinations of

basic data elements in the common data structure. The

21

basic dataelements would be hietarchically arranged'

into a data structure' and, typically, the compound data

elements.of a system would be equated to a higher level

node of the common data structure.-

At the'highest level we haVe subdivided the

common data structure for bibliogiaphic data ineo

seven major categories. An initial breakdown of one of

these, the Abstract-Indexing-Contpntg category, is shown

in Fig. 3. There are 21 basic data elements identified

in this category with 14' higher level hierarchical

groupings. Note that the abstract sentences, which are434

included under the abstraCt grouping, are separated out

individually to make it easier to use this information in

subject indexing.

Three additional major categories which have been

similarly analyzed are Titles, Names-and-Relations, and

Related-Document-Citations as shown in Figs. 4-6, respective-

ly. The other three major categories which were identified --

but not fully analyzed --- are DescriptLve Document Features,

Library Systems Holdings and Shelving, and COntrol Fields.

Alr

Ir1A

.: T

.4...

0:

AL

S rP

,AC

I$T

IAT

.

&!

A(

a4S1

14{1

.

C C

Of

Mrf

Cf.

As

CA v.

1.9

I

SO, C

AS

..A41

4 44,

tnn.

,*

'Sr,

,,.. ,

fl.S

4,fe

r,rn

S.

.A,..

"A

AA

nstr

,,1

.10

9.11

I .1

1yr

IU1.

1

(SC

Cl

IiC

CO

E

S -

C11

C .0E

-O

A -

rASS

-1A

: r,

cia

mae

4

C L

A S

S r

5

FTL

ASS

-

,

C

CL

ASS

-.y

.Q..1

11

C A

SS

jA

:2

S'IØ

5).

'C

ei

az,

1 A

OC

4-I

._R

NI0

.«,-

I-.

7C

-:;.-

-cIC

iJ

C.T

.AT

E E

.,Ar

PHR

ASE

'*5

A-

Ccc

i

'-.1.

Ai a

-t55

Ci

I

of A

l1,

15

Fig.

3C

omm

onB

iblio

grap

hic

Dat

a N

hine

nts

and

Str

uctu

refo

rth

e In

dexi

ng C

ateg

ory

(bni

tiol V

ersi

on)

v.( ,ztillt'V

1*1_1.1:33

S*

J:

c.,rz:

2I'

-

Vitir

J

I-

r-i-*

2i. i.0.,

.

go,,r. _

,: 164 .1:.,.,

10, ,I

.-.1.

.s.1

1,1 '415

;,.

, 11-1.

C.

is.

-

Zy 3

t.-

t4:-).T

1S

1L1

-vrelf

I

I3111: - S- 311.11.

I

0,, 4

."II .%

wi,. tic

14I.

..V

V-

.

I

.

10.

..I a:

iti

It,'II.

t tA

' -I'A

O,.

IkI

',oil

4,,1.- ',1

.1Al.,.

..t;1.21i4IG

-ot%J.. a .1

.,;.*.;

.V

- MI

14

, :14.

V

1.61...,

It,...I

I1.

Etl# 1.v.

nr.*V

, VP

..9-

,i4161111:,.

5,U

.I

.,I

i II

I.

I.t.

Iv

'

Oh ...A

.4.1 J-55,51

01I55A

O -I

....I.e.

0_..

(...

U...

1r.7 .6

'41.)tyit1.1

r.1.1:41 4,h.44)

,1.411/eii.)* 4.

ay

.tv1.1.314.,3.

,/et

434T

A:

-

0...f3/4/4:1

e43.4140

IA,

111.0

111

,

43.Ia

.0,'

t!

1 0,

ay

l

-4

A.

IA

.443

Act

elfrV. 1

"131.

10.1

s

4:11031I

3.

1,3WC

I3.4

JrI

W.'

IV...131_ .

.r* ,.:4433

1-61

-;,10,

..14- 47.:4

V

34'4 ...IN

.3.4,

; r

; Ic ,IC t 't t Cr C tc I fl. C 111C t ! : r .

!

WC, Itt.: c t :., I c. c f

1 C : COY

c : : c i , t.c , c. t c 1 :.,, c :.-,- c A, i sC : C C.?:,:

1.C. C 1 rt-44c 1-101c. r, t ; r , : tr. c c I :,1 t , c. 1

4

t t c c t t ; tc Ct

c t a C c

(.cs. ctrl .0 C zit ,,! 1 jtttcY atc

1,c

-11 it 2, I ; tcY IttC 1,t ..t

j1i1C/ f.:sf C

:)C -111, .t! t c t

t!t it?. r t ler . T tut :kr.

lc Icv:tj trci

ccI ; c-,(4;c7 r crItt.tet t it n t t cl t } t

c f c anEltai..:c cctsti a

?lc an: 5, Y t Y an clat 1 ; nt, thic at]rua c t

cithrt

tctlat.tiucc f ait1It. in tltc t; ant lat 1.-ttinklr it Vt.( aF.31:11 cl

1 c vitic Si lc at (1, and t. f (,E. out putI tt t hr t I 1:11:1/1 11 1

%7

t ccA7,i .1,f1c/ct,tc t r, A Ittestv or, f i I c

, i Al jilt.; to %let C;tht 7 t'-Ctctt, t, tt,c ftillt C t,t 1 it:\t, :7 I cr,t 33 t icc t,c t,E

I t IttrliA r.'3 A 7.E t LA c 1 f c, ,1.;c1Icit

1 '1,<( r 1 a I c:t Lc-0 r t cpt,ct ittscnt lxith the°iL! Pf io t c 1-4.r. r;:,c. ttic- t .t,c ,,t hc

It ;, 1 :(,i I. ; Ti ct.. t IAt ,,tC t 1114 t 1,.ttA( }.1C vAt it 1.47, t }cat r t jc.cc

A 3r ;. 3 4 3c nt. ,1,31:41 t 1 ht. :( 11:4t v ptt,c n 1

I c v , t 1 . 1 a : i r ;JO I 4 j Leno. cac at,: 1,c

-r,: 1 t c r, t t (Pt? (otiplyrti (ft1,1,1 I

lcitft t hc r)c 1 4 tic :pi T.C1 t1J3 ittit1A1

i n t (-7 l a ( t c l : a r t .1 ( t 1 N 3 7 1 ; iAr (i S 1 ; 0 that t l,c 1 ( ( 1 1 ( 1 w -

I pC I- A t jt cr,t)111 I C, IC f ()7 :tit 41 1):0 t,h tl.c 3 11 C f f c

A :+c 1r t t (roc t wc, C 1 ;) cnt (trticct r0 j )1 t c- 1)1't) Ac4:think (can thIA :,),CtcM A)

1 :,cicc t c-Ithr.t t I tc « Farb( rn (:t di IT 1ar:EtsaFc c.y t lirc.f t c:x., A In Ch I( h it, AD:

II :.yttctr) A la Fc is L.:cc ct an l' tctjcicc,ta n I'..vctc-a, A that atc 1tt,r-r-i1)1c wittii:1 that ayatcrn.

41 I CON I .1 1 att!,:nat.:c7 c pct t tic ly41(.nEc IltUak.47,3 ti A

I . Maltc Ic pc, at t tt t ccincnt

it . ria14.4- a t ittplc tr1i.tlt IrtucDt t (. r,cc tata1t1hill! ()I t i t tl (1r1 vt'd 1 I it:: tit,i t ; t otstld i ti t tic3,1 cv lour_ It I

iii itartc a f i) c int tl.c tccttl1C 3.1 tt.c1,7jttt 7 c (.;)Ic C t an lc Cl (r7 f t ct$1,

c-cf4lIcnt ; ttk

a V ; 7 ;r 7 tr. V cµ'i alcc '')°, IZI caVcrjt4 t iii).

'Pr-7 I i '777 e re t hc fipct cr ihtic4 P770t hcr:11117 f(4' a1r1 in 4c. i 114. i At ccat( f, t crmc

t.c. o cur I tic 333/133a3-331c k.1vc-n jn ( a) anti (I)) cnau 23, ualcr t an() oc h .1 hc hcr I cyttcmv.),13" h ;e t;rt cnt ( tai,

)',3 t c t 1,zt t ht (tk,101 j)7 i At c (07a017)At

t hc cc ( telt:AA:At tFtt tcs,)11c (,f scot ( hinE in 01 lIcTcnt

vc..tc-tIA Can tc '11c, tc(1 1t1 a, f igittoyn fl lc

fl.c t 1 c 12 t at tie (if (q4t implement rot (yn

I 17 1 (lc c En 10 E vcta 1,c 1(444

4 Al Int (ttInc, t I (MC and INC I Wci7 1, nistitin IL at I.< kti

0117 (it I El nal 1 racnt ct31 %cap t ci tua Mc a s <ttit3cc t S Ott of

t ttc Int Cl r. AC C to I110 cx Ictt irval tvc.tc-333 at M I I atul

.:Ir :a( tr trt i rya rlx; tat 41;11 chinE t (ntrauni Z. a-

2.,,1 (In lc 1 tilt t t) 41.) thta G 1.011 3 C11 c c I (('3 t than c ()1 I E. I tIA 1 1 V

.u.ut it i pat r tl liowcyc t we I na 11 v t vctt ccvc.3al tli111 3. 131

t IC0 and %ea Qta ac 131c-Vc a 3333x311itt 13 at jtt11C haac t hat colt

1,1v paucc 31 out trti Eoailu, 1:3 pat t It ul al (1t11

t c l ; at c III (iF. 3 att3 t r I tlic t1 CAIN 1 ( (01111c t ((r7 I 07 he t 10:03 itct;

I T, f ; a.a t I, n 71 vcc, c TtR t a)]c ro-) t

'in.; r ,,r!!! ci jr.t c V;1"; 7 I (,f. c 1.,.ct t T

tl,c Atr4,;.1.7 t _.(i)x 47 int c-1 1;4, j c ,c c. (141: Ar c. c. 1

,;:c..7 f t , orl, 1 LT t itc t

iiC- I 144.,.

c T ..cc- I t F. J

711C. ;Tit c_ j l At C., IA, t, 11411tt :1i; 1 A I c 1:1 I I

, i c t t i c v p ; ,,1 VA b41

IC I r.at rti 4,11 At. /144114,),...1 7 c f ir :1 t nccc is tLtt

11 -if cp :vat I. i 4 cr.,1 l t cc l (lc IA tit, I hc )1 I 7.

7 () / t fftLVljt C 7'; rc ccr ta,,J C Vtt Ctli is 1.1U)1 4 NI v, is

I ,1c tit Atl I 1N,!4, () / 1 *, lr, rtc t Plat v la nil In

4 clt11 t tun we. c c t Al, 1 i C 11 I hc c An: t r, (.7,roc-c t t,,11 int c.

1 a c, with TY1VtI;7 kit t }..c I It y ((rtuutit,ItAIc

1,c1 t 1. t vc paw tit C,(.1 rilot 1,cri (11:1.11 c vtt

t be Ii1A11K, :vc t t he 1'.at t c-11c 1,7, : yot vat,

the I n $ o t tua t it .L( v c A \CC 1 A: Al., A CS Si= t ;1St ,I1C.

t t n'ITIC I tt 'II I I N:r ya t ic-tr.,

ft tai I rt.: der:, 1X1.1 i I tl.c ttc., n=t, I th: I...1.i, it

t ci onnc, t ion: r tit vc(1 i r. lvct, it. c rs,t c-

(Alai lc a 1. ilartt ict, Dat a ( trti:r4ttn 11 at iota: I ci Al I . i,C 1 1111C tat A

121; :a ket ,c 1 ,i,c.t '. ii Intel A c rf, 1 1,, 11, IA,izl,oint. Ictl.tlit al .1r 1:1( '1 A1111.:;.:, i..l ; 11

t 1 1'4

t .

vc c , .4,7.

r, ii), r.r 1 ,C mr11,01', A I f4f,e

I 7 s.i.r., t .7.: I c 1.c ; t

C

.4n

C 7,:: A 1 f. 1..t !. .4-1'4! 7 7 c tT.4 ) nl I

t ;'1..rcc.7c1 .1n 11.1. ;A/ Cc?: f

I ;1 iAl AtDc-tct C he cpteit,A) 3 t, C C t C ,t4i1,c, 1 )(I ; c c tt, r.:; 1 e. ; cc I v he- 3 ic 5IIA ct t c ::1 T.:4 I 1 t 1.c I c sr Thc, frwc-e

t 1 sx t III ;4; ;!*1 1c ) y ss.i3c t I, A 11(4 t in the.

NA1 I 01. r c AT;11.7: 7 II' 71.0 ht ir3 1-11 cil V

(1. Me- ; r I C 3 vc l C ticc f f 1 c ty:/c h iitUAC

t r.: f t 11;1.1 : nt t,c t he Nr.,7, e,,, t hA i we

Luny 15143 c,ic t Tx. 114 c 3 3 ",:tic tit t

t wc Cr'. t I,c. !1 )1 c t,t1 t 1,c 713 itcrtc atc 1-v

ol, AC 1 . :.0

1C I fo.'c c t I I.c I x.: c S A. c Ca;J,1

;:: I c ; .1.: : LICVC .1 t 1.1 A I, t t I.c 1 11

: A t a 1 4::tei 1. 1, t I 11

I I , 1..t A.:11. 1 11 t 4; 1.1 144:c\I

Hi . t 4:11i c I,N ; ct ctt:-.11,11ch

; :.!e., . 1.1 .::. lIc 113 1. .'t.c 1.AtA Ccl AId

,

I

L...

1.

r'Lrs

,

.o

::N"'"

--JN

,,g,

,..,

,,1.

.4' A

.

--".

.N,

1-

,1

5.'

: 1%

.Z

-77"

.`--

--,,_

,..__

,,,,,

Th,

....-

--...

..,,,

...._

..._.

.:---

[i

tt

th_

11, \

\ \ \

.--"

-.,

!41

\-,

.N

._--

-1_j

Vi.

1

....-

---.

...e

<

,

\--

.'1

444,

v, .,

-J

t

I.ty

I*V

' IA

s2'

71.4

,,, "

A I

11:

:'1

7.

!I. .

OV

A ''

.

TT

A14

1,14

.-,t

;;,7

I17

1-3

.72

1-

:71.

44.1

071/

'g4

' 5''t

5 d

'zte

'TT

S

32

zrttini: remain pararrrtrrt in the TIP for thr connection.

irtminal it thn onne, tro imul data Art

to the '1.1.7. MI 370/116 in which iP located.

Finallv. a dircit «mnrction- in entahliphed hetwern the

tr flata thr couinonication link

.r:twrrn thr APPANVT TIP and Intrrx. The electrical and

twihanicai hAdwAte by which the teiminal 14 nwitched from

TIP to 370 (o,DIYoter, and hy which the two data Netn are

connected toi..,rther. haa been termed the "network

crocaovrt boy" And war.: deriined and conntrucied at the

irrtyonit :iyatemtz Imhoratory. (Sec t4chematic dinKram

in,

Thr mar. mrchanitim. jr card to entablinh

(c-nnrction brtwrrn thr TIP and thr TYMNET network by

taryrly rnakinE thr urcond terminal call be to the local

1y!1NLI aatr11itc (omputrr rathrt than to thr M.I.TI 370

t ):11.111 r.hould he noted that the network crOSGover-

hAr, the ilryihility to he connected (nt

diflricnt tittirt) to diffrivnt coaputrrt,. Thin Ilexibility

is of imVoltan.c in c4pclimcntinF. with diiierent 1-11

nyntrma.fr

(to 370)

./DATA SET B

33

(to TIP)

DATA SET AV

CROSSOVER BOXTTY INPUT

J

POWERON/OFF

SELECTORSWITCH

RS-232 INPUT

INPUT SELECT

TTY PRINT/NO PRINT

MODEL 33 TELETYPE

Fig. 8 Hardware Components Used in the TIP to System/370 Connection'

a

34

In any .case, once any TIP has been connected

to a non-ARPANET computer, the CONIT interface program

must establish access to the specified TIP port through

appropriate calls on ARPANET software. This having been

done, CONIT can send commands to either of the two

currently connected I-R systems through the ,respective_

TIPs and receive responses from these I-R systems Over

these same full. duplex connections.

Thus, at any one time, CONIT can be

communicating ith either of the two I-R-systems current --I,

ly connected: one being the MEDLINE system through the

NBS TIP, the other being either Intrex or one'of the' -

TYMNET. systems through the CCA TIP. The different TYMNET

systems can be connected to CONIT sequentially by having

CONIT send the commands to logout the currently connecte1

system and thenlogon a second TYMNET I-R system. _Finally,

CONIT can switch from a TYMNET system to Intrex, or viceJ

versa, by redialing the connection o the CCA TIP from:,

the network crossover box.

3. Thelr'Basic Interface Programs -

The CONIT-1 system contains as a nucleus a set

of program modules ..that enable a user to have an

3.5

interactive dialog with the computer. These modules

include, for example, routines,that provide for message

transfer between a user terminal and CONIT, the parsing

of user requests into individual command and argument

strings, transfer of control tp routindthat execute

commands, and the storage and selection of messages sent

the user in, full or abbrTiated_formats.

Another set of routines enables and disables

the ARPANEt TIP connections,,as mentioned in the previous

section.

A third set of routines allows for the

translation of user commands into commands in CONIT or

external retrieval systems. These routines are based

upon the concept of translation table. CONIT automatical- .

ly selects the appropriate translation table when, for

example, the user is speaking the CONIT language to

MEDLINE or Intrek. Special purpose routines also perform

other functions reqdired for proper translation tp extern-

al systems; far-example, MEDLINE.

in upper case, and user commands

verted.

must receive characters

to MEDLINE are so con-'

36

4. CON IT -1 Command Language

In the design of command language for CONIT-1

the question naturally arose as to the structure and

characteristics of such a language. We have inclined toward

a simple command-name/argument string type format with common

English words for the vocabulary (Ibbreadations allowed),

very simple punctuation (e.g., spaces) for delimiters, and

general indepenakice of command and argument ordering. This

structure seems preferred for ease of use --- especially com-

pared to a more complicated programming-type language.

English, used as a common language, suffers from the twin

. defects of being of complicated structure and being very am--

bigupous; these defects make it hard to explain to users what

precisely can be accomplished in the system at hand and even

more difficult for the stem to parse the user's request

syntactically and semantically.

These views on language structure have been supported

by our own previous work in Project Intrex*and the direction

See, Marcus, R.S., Benenfeld, A.R,, and Kugel, P., _"TheUser Interface for the Intrex Retrieval System" infnOetactive Bibliographic Search, D.E. Walker, (ed), AFIPSPreas 1971; pp: 159-201.

.

31

we see existing systems taking as evidenced, for example,

by the previously mentioned Stanford seminar. With these

views in mind, we began the design and implementation-of

the CONIT command 1.4nguage. The initial set of commands for

CONIT-1 that were implemented are listed in Table 1. Note

that in several cases e.g., FIND, PRINT only the

simplest default conditions were implemented. In addition

to these regular CONIT commands, if a language other than

CONIT is being spoken, any command of that language may be

given; i.e., a "transparent" mode is possible. It can be

seen by reviewing the explanation of the commands that

with one exception all the goals for CONIT-1 as set

forth in the initial design given in Section -C.1 above have

been met including the translation of a couple of commands

from the common command language into the ,languages of.two

target systems. (The status,of the one exception --- display

of the Master Index and Thesaurus --- is des"cribed below.)

The details of the CONIT command language, while

following the general principles described above, were de-

veloped in only a very preliminary stage; should our work

be continued, they are subject to change. Nevertheless, the

Table 1. List of Commands Executable in CONIT-1

COMMAND

SELECT x:

SPEAK x:

LIST CONDITIONS:

EXPLANATION

38

Select a system x to sdarch in andenable TIP connections.

Select command language x.

List current language and system beingsearched.

VERBOSE: Have CONIT give full-form, instructiveresponses.

BRIEF: Have CONIT give abbreviated responses.

SET TABLE x: Pick translation table named x.'

REPLACE $x "y: In current translation table, add therule that the string x is replaced by y.

DELETE RULE $x: Delete from translation table that rulewith x as the string to be replaced.

LIST TABLE: Print out rules in current translationtable.

DISCONNECT x: Disable TIP connection to system x,

(T) FIND x: Do a simple search on x in system currentlyselected.

(T) PRINT: Print out standard catalog information ondocuments in current list.

NAME FILE x: Select file x (create a new file if xdoes not already exist) to be used to savefuture catalog output.

Add response of next request to currentlyselected system'to save file.

EXIT: Leave CONIT and return to MULTICS commandlevel.

FILE:

*(T) indicates Xhat a translation to the selected sfttem isrequired.

39

decision-making process for choosing certain of these

details may he of ROMP interest. The use of the command name

FIND for the basin search request deserves some comment.

It in different in form from both of the systems it now

translates into. In Intrex there are three different search

commands (SUBJECT, AUTHOR, TITLE) to specify the different in-

verted files or inverted file subsets (title words are a

subset Of the subject inverted file) to be searched. In

MEDLINE the absence of a command name implies a search request

(just the term to be searched is given), although one may use,

of need to use in certain situations, the form "FIND x".

Thus, both Intrex and MEDLINE may be characterized as

having, in general, the search command being a default situa-

tion; i.e., being unspecified as such. Our Intrex experience

has suggested that users find the command language easier to

learn and remember when it is more consistent. The default

command upsets the simple rule: command name first then ar-

guments. Leaving off the command name does not save much

since an abbreviation --- as short as one letter --- may be

used. It also makes parsing and error checking more difficult.

Thus an explicit command, one in common use, was chosen.

It should also be noted that even in this rudimentary

interface it can be seen how a number of the complexities of

40

tine of multiple diverge systems can be mitigated. For

example, the CONIT unclY need not know the intricacies of

logging on to the remote systems --- only the SELECT command

need be given; the rent is taken care of for him by CONIT.

In the case of regular MEDLINE use through\the ARPANET, there

is the special problem of accessing through one of 5 TIP

ports. CONIT automatically transmits thp proper ?rotocols

and even cycles through the S ports, one at a time, until

it finds one that is available. Similarly, CONIT given the

proper log off, or QUIT, message to system x whether because

of a DISCONNECT x or implicitly required through an EXIT1

from CONIT. Proper termination of a uqer from a system is

important not only for the individual user (e.g., for proper

charging) but also for other users who might otherwise face

problems in logging on.

5. Implications for Command Language DesignpT

In consideqng how to design a common

language that could serve as an interface to existing

retrieval systems, we came to a rather paradoxical conclusion:

it is both impossible and not too difficult to translate from

such a language to the languages of the existing systems.

It is impossible in the sense described above in Section B.3

61

in that existing nyntemn are usually rather limited in the

functions they can perform and simply calloot perform an

arbitrary request. On the other hand, if an approximate

translation in allowable, it in often not too difficult Co

make a reasonably good one. For example, the simple PRINT I

command in CONIT can be translated to the "PRINT" command

in MEDLINE or the OUTPUT command in intrex with reanenably

close results even though the catalog information printed

might occasionally vary slightly in the default modes of

thene commands.

Two ways to aid in the translation process pre

described elsewhere in thin report: the Master Index and

Thesaurus (MIT) andjthe common bibiographic data structure.

Two other elements we have found important in the interface

problem are the identification of search statements and the

disambiguation of named entities (commands, arguments, neta,

etc.), as discussed below.

Firstly, it should be recognized that the names given

entities in the target systems can be isolated from those

names given in the common language, an ldng an an unambiguous

translation in possible from common-to-target language. Thus

the main task is to provide a mechanism for insuring an

42

unambiguous set of named entities at the common system

leVel. (Of course, the naming of entitieR, such as

coananda, at the common level should consider such factors

an the advantage of using already entablinhed widely used

terms_ --- e.g., MINT.) The common interface then needs to

keep account of all entities that are named a priori in the

coamon command langungn or that may be named in the course

of a given uner'n neanion. Such entities can be generically

classified as:

1. command names

2. names of argumento to commanda

3. namen given to retrieval aearch setsI

4. new names given by a user (an through aREPLACE function) to any one or combinationof entities in the above 3 categories.

Argument names cover such entities AS OAMC5 of systems, data

bases, paved filed, catalog elements, vocabularien, vocabulary

elements, modes of command operation, and so forth. The

common interface would insure Against dmbiguitiea by checking

namen a user might initiate against the current list of named

entities and request )that the user choose again if a duplica-

tion is detected. It is pasible, of course, to diaambiguate

at times on the basin of context, but it seems simpler and

tt 43

easier to explain ayatcm %Jae to a oart if all ambiguities

are avoided in the ffrnt place.

The identification and recording of nearch ntatementn

and their renultn appears to be an especially vital tank where

multiple nyntemn and data banen are involved. The elements

aanociated with a nearch that need to be recorded include:

the aearch statement number (annigned by apnea));

2. an alternate nearch ntatment name given by user;

3. the nenrch ntntement itnelf;

4. the aystem(n) and data bane(n) for whichintended;

5. the number of references (or documents)retrieved;

6. the name of search given in any target system;

7. the actual hint of references retrieved;

8. for an subnenrchrn nrcennitated by thinuenrch either at level of common system orthe target nyntem, all the above items ofinformation.

All items of information above --- except, for acme systems.

itrm (7) --- could be obtained and maintained ntsthr vommon

interface level without requiring modilication of the target

See the discussion in Section, C.6 below of the automaticUNC Of the MIT for an exempla of how subsearcheN may begenerated.

44

6 The Mater lndPs and Theaaoros

The Maator Index and Theeaurus (MIT) concept

described in Section 13.4 %cap developed for inclusion

in the experimental interface. A detailed deaign war* pre-

pared describing the specific information to be contained

in the MIT and the logical interrelations among, the various

parametera no contained. Appendix A describes 6hia initial

deaign.

How the MIT could be diaplayed under manual control

in deacribed in Appendix B. How the MIT could be uacid in an

automatic way has alao been atudied. Aaaume a toter mAkan a

search request of the form FIND A 13 C, where A, 13, and C are,

ordinary Engliah words. The ayatem could then search the

MIT for all words or phraaca that had a word atom that

matched the stem for thy word A. Call these trrma Al; A2,

A3.... Then

11a

Or4rCh consiating of the union of OCAY0104 on

all throe trrul A would be found; i.e.; As FIND Al OR A2

OR A3.... Similarly, B and C are found. Thy search art

answering the initial request would than be the intersection

ofthrthrreartaA.AND Ds AND C.. Previous Project Intrex

/45

'work quEgrrtg that tide treuling Pct . w'ruld generally give

very rffrctivo mrarrh rreulta

Of courgr, many of }'per modra Of IICP of thP MT may be

fieeirabir in pattirular fontrxtA in lading auttnatit. arlrc-

tion/A aynonymrtua and Aprcific rrlatrd trrma and URFT aelec-

tion from Bata of trrma loin-id automatically. as abovc.

Programp were wTittrn to traneform Information frcnn

the data b4PPP Of oia two main rxperimental ayAtema

MLIAINL and INTRO( into the Maatrr Index and Tbraaurun

itorganization ahown in Appendix A The Intrrx data haat,

waa grnrratrd from INSPLC tapro containing bibliographic

data fin thrrr arpatatr date baara: Science Abatracta;

Electrical Englnrerink Abatracta; and Computrra and Control

Abatracta. The MIT formcd from thrar data baara contained

both the frar and controlled-vocahulary information that

could by gleaned from both the INSITC cata10).!. record:a as

reformattrd in the Inttvx r:yatcm and the Itarcx inverted file

Qtroultin, Ito= invctiing certain of the INSPLC firlda. In

particular. thy controllrd-vocabulary tenon both claaaifica-

See. 1) Project Intrex Srmiannual Activity Report. I1 -1:,

15 Srptrmbrt 19/1, Maaaachuartta Inatitutr of Trchnology.pP. 29-47.

2) Ovyvh4F.e. C.F.J.. and Rcintjeu. I. "ProjectIntrrx. A General Revirw." Co be publiahed in Information

St.Pr4L.c_11!1d

**It IN of intr.:nut to note that the catalog relcoVdri for thelntrex IN SPEC data baar were organi:.r,I In thr hierarchicaldata structure an dracribed in Section h.5.

146

tiOn tprmg and controlled index terms and their Index

counts are captured an are all free-vocabulary word stems

and their index counts from titlen, abntractn, and free

subject expresnionn. In addition, the noting of controlled

terra under each of the word ntemn they Contain wan generated.

Similarly, programa were written to generate MIT

information from MEDLINE data batten in the new MEDLARS-II

format. This information included the thesaurus information

given in the MEDLARS vocabulary filen.

sIV. CONCLUSIONS

A. Work AccoTpliahed. Research on the coupling of

interactive information aystemn has progrenned in a

nignificant way. The research has fc\cunncd on the concept of

a trannlating computer interface by which the networking of

heterogeneous interactive information retrieval nystems can

hr achieved. Thin concept appears to be a viable approach to

the development of I-R networks in the interim per.lod during

which I-R nyntem and network ntandardn are gradually evolving.

Particular concepts and techniques which'appear --- through

analyain and the denign, implementation, and tenting of ex-

perimental int_erfackn --- to be especially useful in developing

47

the over-all interface concept include: (1). the virtual

system concept by which users perceive the network as a

single homogeneous system; (2) a common - command language

synthesized from a basic language of atomic I-R functions;

(3) a master index and thesaurus which. stores theo

vocabularies, of the separate data bases along with index

term interrelatioilships gild counts; (4) a common

bibliographic data structure by which the data elements for

`bibliographic information may be enumerated, hierarchically

structured, and interrelated among different data bases.

B. Recommendations for Further Work. While a basis

for research into the coupling of retrieval systems has been

laid; much additional work is obviously needed including the

further elaboration of the techniques listed above; their

implementation in additional demonstration systems which

connect several systems and several data bases and cover most

retrieval functions; the testing and evaluation of these

systems with real users; and the development of more effective

computer-to-computer communications. Also, we see the need

for additional study of the relationship of the computer

interface to the network of I-R systems including such

48

questions as:

1. How many of the I-R functions shbuld be.

performed within the interface as distinct, from

being performed by the separate I-R systems?

2. What are the technological and economic

reasons for treating the separate I-R systems withIn

the network as _inviolate :"black b,oxesq (tt'llAsumption

we have made in our research to date), as cons asted1

with the alternate concept,that they should be modi-4

fied to interact more effectively with the interfate?

3.. To what extent can and should the common

interface concept becbme a de facto standard toward

which existing systemslimay evolve in order to take

maximum advantage of networking potentials?

OP"

L_

APPENDIX A

DATA 'COMPONENTS FOR THE MASTER INDEX AND THESA4URUS (MIT)

Each entry in the Master Index and Thesaurus contains up tofive fields. These fields with their data components arelisted below.

A. Header

1. Is'the entry for a single word stem or a phrase? (1 bit)

2. Entry type (3 bits) (e.g., subject, author, other).

3. Name (variable length A/N string) (e.g., "magnet ","magnetic resonance", "Smith", "Van Pelt")

4. (a) Number of English words in phrase (6-bits). (phrase4entry only)

(b) Number of affixes (stem entry only)

5.. Counts (document and reference counts for each collec=tion --- i.e., data base, for which counts are beingkept in MIT)

B. Affixes (This field needed only for word-stem entry).

This field contains, the set Of affixes for a word stem.

For each affix there would be:

1. Relative (to this entry) affix number (6 bits) (sortendings alphabetically)

2. Absolute affix code (12+bits) (at present this would bethe Intrex 12-bit suffix code; however, we should allowfor futuFe expansion to permit prefixing, as well).

'49

50

3. Document and Reference counts (i.e., 2nanc

counts,

where na= number of affixes, and n

c= number of

collections as in A.5).'

C. Controlled Vocabulary Terms (may be null)

This field lists all the cons olled vocabulary terms forthis entry.

For each term there would be:

1. Term numbgr (relative to this entry) (6,bits)

2. Controlled vocabulary code (6 bits) (i.e., what vocabu-lary the term is in). (sort by ending, this code, andthen type).

3. Type of term (4 bits) (e.g., classification or indexterms).

4. Af.fix code (5 bits) (needed only for single word --code comes from B.1) .

5. Status (2 bits). Is this a currently used term? (Ifnot, we will eventually include dates when it wasused --- also date of initial use for recent term).

6. Reference counts fr each collection, i.e., nent counts

where nt= number of controlled vocabulary terms; note

nt

includes separate item for each vocabulary - type -

affix combination.

D. Phrases With This Stem (For word-stem entries only; may be null).

For each such phrase there would be

1.' Phrase number (6 bits).I

2. Term number of phraie (from Field C.1 of entry forphrase) (6 bits)

0

51

2

3. Type of term of phrase (from Field C.3 of ehtry orphrase) (4 bits).

4.. Word number of word stem in phrase (4 bits).

5. Controlled vocabulary code (6 bits).

6. Phrase name (variable-length A/N string) (sortalphabetically, then by vocabulary and term type).

7. Reference counts for each collection.

E. Thesaurus Relations (may be null)

The standard thesaurus relations are given here. For eachrelation there would be:

y. Term code for giveti term (6 bits) (from Field C.1).

2., Type of relation (4 bits) (e.g., broader term, narrowerterm, synonym, use, used for, related term,morphological variant).

3. Controlled vocabulary for related term (6 bits).(Could conceivably be different than for given term - --note could also relate "free vocabulary' to controlledvocabulary).

4. Name of related term (variable-length A/N string). (sortalphabetically).

5. Is related term a word or a phrase? (1 bit).

6. Automatic search expansion? (2 bits) --- should therelated term be automatically added tq a search when userasks for given term.

7. Remarks (variable-length A/N string) (probably null).(Remarks in free-form English on nature of the relationand when it should be applied).

8. Reference counts for related term for each collection.

APPENDIX B

DISPLAYING THE MASTER INDEX AND THESAURUS

Preliminary specifications for the display of the MIT underuser control are given below.

The command RELATE is used to look up and display informationfrom the MIT. The syntax is

RELATE A B C X

The last argument (X) is that word or phrase eo be looked upin the MIT. The first argument (A) specifies the type ofrelation to be displayed as follows:

ALPHA--terms that surround X alphabetically

PHRASE--terms having wora'stems in common with X

THESAURUS--thesaurus relations for X

Eventually, some combination of relation types may be permissibleon one display. For now, the default option should probably beALPHA)

The second argument (B) specifies that only terms and relationsin a particular vocabulary be considered. Sample values for Bmight be MESH or INSPEC or INSPEC PHYSICS etc. Sekreral

vocabularies could be specified at once, so that the actualsyntax is Bl B2 B3...;-i.e., the-connector OR is implicit. Thedefault condition would mean all vocabularies considered.

The third argument (C) specifies, analagoUsly to B, the particularcollection(s) to be considered; e.g., MEDLINE, SDILINE, INTREX/INSPEC, etc. 'Again, the default condition means all collectionsconsidered.

These specifications will be further illustrated through examples ofdisplay output for some RELATE commands. Note the default optionsfor arguments B and C. The index terms are not meant to be actualterms from the given vocabularies.

52

53

A. SAMPLE DISPLAY OF ALPHABETICAL RELATIONS

READY

relate alpha radiation

The index terms alphabetically near the term radiation (7) area.

listed below, (1)(MRA

(6)

TERM NO. (3) : TERM(4)

: VOCABU1ARY (5): COLLECTION

220 TY rabid dogs MESH; HEADING MEDLINE

15 TZ rachet wheels INSPEC EE; TERM INTREX

20 INSPEC COMP; TERM INTREA

305 TA rad- (English stem) INTREX

21 SDILINE:

107 TA1 radical (English) INTREX

5 SDILLNE

203 TA2 radiation (English) INTREX

17 SDILLNE

1276 MESH; HEADING MEDLINE

45 SDILLNE

410410 INSPEC EE; HEADING INTREX

13 TA3 radius (English) INTREX

192 TB radiation Y(8)

damage INSPEC; HEADING INTKI.

0 217 MESH; HEADING MEDLINE

130 TC radiation ef-fects ontissues MESH;HEADING MEDLINE

(8)

If you want to see terms that follow alphabetically,type relate more.

To see phrase thesaurus relations type relate phrase X or relate thesaurus' X,

where X is the name or'term no. (see above) of the term whose relations(6)

you want to see. <MRAZ>

ft

54

NOTES FOR SAMPLE DISPLAY OF ALPHABETICAL RELATIONS

(1) This is the VERBOSE mode message: in TERSE mode no message

would be -liven. If the argument X did not correspond to

any term in the MIT the following message/would be given:

There is no index term radiation. [kowever, there

are terms with the same stem: rad-.] Terms alphabetically

near radiation are listed below. <MRA3>(6)

Theowntence in brackets would be inserted when the stem

for (the single word) X does exist in the MIT, although the

full word X does not. In the former case, the display would.be:

TA radiation (No Such Term)

(2) The document (or reference) counts are given in this column

when available. (A stem where no free vocabulary, word

exists would not have a count.)

(3) The' term number is assigned byCONIT relative to the term for

argument X, which is TA: The two terms coming alphabetically

before TA are given first and labeled TY and TZ. As many

terms after TA are given so of to have at least 20 lines of

term display. Note full words under word stems are given

numbered final ch'aracters. If a continuation display is

requested (relate more), then numbering is continued from

previous display.

(4) Full words under word stems are indented one space. If a term

is repeated (or a different vocabulary or different collection),

its printing is not repeated. For each full word in Field B of

55

the MIT, first the free-vocabulary countslrould be given

(from 'Field B.3) and then the controlled counts (from

Field C).

(5) The vocabulary is given followed by a semicolon and then

an indication of the type of vocabulary term (e.g., class

heading or index term). The vocabulary and type term

parameters come from Field C.2 and C.3, respectively, of

the MIT (see Appendix A).

(6) These are message names.' They could be arguments to a

SUPPRESS command which thereafter causes their deletion (or

replacement by TERSE form) or the EXPMIN command.

(7) Some emphasis on the playback ot argument X is desirable.

Here, for example, radiation could be in a different color

or capitalized.

(8) If information under a given column is too long to fit in

that column, some convention such as suggested here will

be desirable.

56

READY

B. SAMPLE DISPLAY OF PHRASE RELATIONS

1

relate phrase radfaiii?n damage

The index terms with words whose stems match the-stem(s)(1)

of

rad-cation (and damag-e)(1)

are listed below: (MRP>

POSTINGS TERM NO. : STEM 1(4)

: VOCABULARY : COLLECTION

305 TA 'rad- (English Stem) INTREX

21 SDILINE4

107 TA1 radial ,(English) INTREX(2)

13 TA3 rgdius (English) INTREX

1

PHRASES WITH STEM(4)

50 TB polarized radiation INSPEC EE: HEAD INTREX

192 TC radiation damage INSPEC EE: HEAD INTREX

217 MESH: HEAD MEDLINElor

: STEM 2

TD damag- (Stem) INTREX

PHRASES WITH STEM

18 TE damaged goods BUSINESS INFORM

* see term TC above(3)radiation damage

..

To see thesaurus relations type relate thesaurus X, where X is

the name or term no. (see above) of the term whose relations you

want to see. C1tRP25

READY

57

NOTES FOR SAMPLE DISPLAY OFT' PHRASE: RELATIONS

(I) Delete parenqhetical parts if only one word in X. Null

result message:

There are no index terms with words whose stems

match the stem(9 for rad-iation (or damag-e). (MRP3>

(2) The full display for this stem, as in Section A, goes here.

(3) Duplicate phrases are indicated in this fashion.

(4) Stem information is given in addition to multi-word phrases

as a convenient way to indicate what the stemming is for

the given words in argument X as well as to show single-

word "phrases".

58

SAMPLE DISPLAY OF THESAURUS RELATIONS

READY

relate thesaurus radiation damage

The index terror; with thesaurus relations to the term radiation

damage are given below:(1)

KMRT>

POSTINGS TERM NO. TYPE(2)

: RELATED TERM : VOCABULARY : COLLECTION'

NONE TA CODE 6.21 4NSPEC

NONE TB WOE 11.50.31 MESH

2005 TC BROADER radiology -MESH: HEAD MEDLINE

75 TD NARROWER radiation dosage MESH: HEAD MEDLINE

READY

NOTES FOR SAMPLE DISPLAY OF THESAURUS RELATIONS

(1) Null messages:

a

There is no index term radiation damage cHRT2>

There are no thesaurus relations for the term radiation damage cMRT3>

(2) From Field E.2 of MIT (see Appendix A)

9