Location Terminologies
Ron Daniel, Jr., Principal, Taxonomy Strategies. ASIS&T Annual Meeting, Austin, TX, November 7, 2006.
Taxonomy Strategies LLC
November 7, 2006 Copyright 2006 Taxonomy Strategies LLC. All rights reserved.
Location Terminologies
ASIS&T Annual Meeting
Austin, TX
November 7, 2006
Taxonomy Strategies LLC: The business of organized information
Agenda
Who we are
Overview
Using ISO 3166
Accommodating special needs
Who we are: Ron Daniel, Jr.
Over 15 years in the business of metadata & automatic classification
  Principal, Taxonomy Strategies
  Standards Architect, Interwoven
  Senior Information Scientist, Metacode Technologies (acquired by Interwoven, November 2000)
  Technical Staff Member, Los Alamos National Laboratory
  Doctoral and post-doctoral research in pattern recognition
Metadata and taxonomies community leadership
  Chair, PRISM (Publishers Requirements for Industry Standard Metadata) working group
  Acting chair, XML Linking working group
  Member, RDF working groups
  Co-editor, PRISM, XPointer, 3 IETF RFCs, and Dublin Core 1 & 2 reports.
Recent & current projects
8 Common Taxonomy Facets
Facet | Definition | Potential Sources
Organization | Organizational structure. | FIPS 95-2, U.S. Government Manual, Your organizational structure, etc.
Content Type | Structured list of the various types of content being managed or used. | DC Types, AGLS Document Type, AAT Information Forms, Your records management policy, etc.
Industry | Broad market categories such as lines of business, life events, or industry codes. | FIPS 66, SIC, NAICS, Your market segments, etc.
Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div, US Postal Service, Your sales regions, etc.
Function | Functions and processes performed to accomplish mission and goals. | FEA Business Reference Model, Enterprise Ontology, AAT Functions, Your business functions, etc.
Topic | Business topics relevant to your mission & goals. | Federal Register Thesaurus, NAL Agricultural Thesaurus, LCSH, Your research areas, etc.
Audience | Subset of constituents to whom a piece of content is directed or intended to be used. | GEM, ERIC Thesaurus, IEEE LOM, Your psycho-graphics or personas, etc.
Products & Services | Names of products/programs & services. | ERP system, UNSPSC, Your products and services, etc.
Potential facets in the petroleum industry
[Diagram: petroleum industry facets. Legend: Community Standard; Should be part of community standard; Company Facets; Strongly related to location; Moderately related to location.]
Facets shown: E&P Lifecycle; Hydrocarbon System; Geologic Age; Process Mgmt; Lease Mgmt; Orgs.; Basins, Reservoirs & Fields; Facilities; Wells; Disciplines; Maint.; Reserves; Human Resources; Content Types; Production; Locations; Company Org
Location names serve as surrogates for other things
Company divisions
Company facilities
Regulatory regimes
Currency regions
Product marketing areas
Sales territories
Customer locations
What is a good taxonomy?
A means to an end, and not the end in itself.
Not perfect, but it does the job it is supposed to do—such as improving search and navigation.
Improved over time, and maintained.
Incremental, extensible process that identifies and enables owners, and engages stakeholders.
Quick implementation that provides measurable results as quickly as possible.
Not monolithic—has separately maintainable facets.
Re-uses existing IP as much as possible.
Location names are used for different purposes
Typical correspondence and shipping: “Libya”, “South Korea”
Official correspondence with government ministers: “Great Socialist People's Libyan Arab Jamahiriya”, “Republic of Korea”
Corporate division of responsibility: “Western Region” (does that include Montana?)
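One way to hold several name forms for the same place is to key them by a stable code and select the form by purpose. The sketch below is only an illustration: the `NAME_FORMS` structure, the purpose labels, and the fallback rule are assumptions, while the names come from the slide and the keys are the ISO 3166-1 alpha-2 codes for Libya and the Republic of Korea.

```python
# Hypothetical structure: name forms per purpose, keyed by ISO 3166-1 alpha-2 code.
NAME_FORMS = {
    "LY": {
        "shipping": "Libya",
        "official": "Great Socialist People's Libyan Arab Jamahiriya",
    },
    "KR": {
        "shipping": "South Korea",
        "official": "Republic of Korea",
    },
}


def name_for(code: str, purpose: str) -> str:
    """Return the name form for a purpose, falling back to the shipping form.

    The fallback is an assumed policy, not something stated in the slides.
    """
    forms = NAME_FORMS[code]
    return forms.get(purpose, forms["shipping"])


print(name_for("KR", "official"))
print(name_for("LY", "invoice"))  # unknown purpose falls back to the shipping form
```

Keeping the code as the key means a change to one name form does not disturb records that reference the place.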
Location terminologies may be used to organize different collections of information
[Diagram: facets and values for the example site ABC Computers.com]
Audience: All, Business, Employee, Education, Gaming Enthusiast, Home, Investor, Job Seeker, Media, Partner, Shopper, First Time, Experienced, Advanced, Supplier
Line of Business: All, Home & Home Office, Gaming, Government, Education & Healthcare, Medium & Large Business, Small Business
Region-Country: All, Asia-Pacific, Canada, EMEA, Japan, Latin America & Caribbean, United States
Product Family: Desktops, MP3 Players, Monitors, Networking, Notebooks, Printers, Projectors, Servers, Services, Storage, Televisions, Other Brands
Content Type: Award, Case Study, Contract & Warranty, Demo, Magazine, News & Event, Product Information, Services, Solution, Specification, Technical Note, Tool, Training, White Paper, Other Content Type
Competency: Business & Finance, Interpersonal Development, IT Professionals Technical Training, IT Professionals Training & Certification, PC Productivity, Personal Computing Proficiency
Industry: Banking & Finance, Communications, E-Business, Education, Government, Healthcare, Hospitality, Manufacturing, Petrochemicals, Retail / Wholesale, Technology, Transportation, Other Industries
Service: Assessment, Design & Implementation; Deployment; Enterprise Support; Client Support; Managed Lifecycle; Asset Recovery & Recycling; Training
Location terminologies may be used to limit search results
Category
Company
City
State
Salary
Problems with location vocabularies
Placenames change over time
Codes may be reused over time
Familiarity leads to proliferation
  Many versions of pseudo-standard lists
  Guessing what the standard will become (e.g. KOS as a code for Kosovo)
Approximate alignment between placenames and business functions leads to errors when mapping data from one purpose to another
  Geopolitical names get applied to sales territories with different company history and importance (e.g. Japan vs. Asia-Pac)
Natural messiness of human affairs
  States vs. Provinces vs. Protectorates, Territories, Possessions, Tribal territories,…
  Disputed territories (Palestine, Kashmir, Taiwan, Kurdistan)
  Proto-states (Kosovo, Somaliland)
Complexity tradeoff in software
  Very few invariant properties of countries and their groupings
Passions
  Boycotts and death threats have been received by people who do or do not list particular places in their lists of ‘countries’
ISO 3166 is a fundamental vocabulary for dealing with locations
UPS maintains a central World Wide Code Repository (WWCR) to store the metadata used throughout the corporation
  Based on the data identified in the enterprise data models
They also have a Corporate Code Table Database, populated via extract files from the WWCR.
  These tables contain the complete list of standardized corporate code values for each code type.
  Country codes are ISO 3166-1, with local extensions obeying ISO restrictions.
  The data modeler for the Corporate Code Table Database is the primary contact from UPS to ISO and the UN with respect to codes for countries.
Source: Barbara LaRobardier, “Taxonomy and Metadata at United Parcel Service (UPS): World Wide Code Repository and Corporate Code Tables”; Semantic Technologies Conference, San Francisco, 2005.
ISO 3166 is the world’s most widely-used list of country names
Country or area name | numeric-3 | alpha-2 | alpha-3
Afghanistan | 004 | AF | AFG
Åland Islands | 248 | AX | ALA
Albania | 008 | AL | ALB
Algeria | 012 | DZ | DZA
American Samoa | 016 | AS | ASM
Andorra | 020 | AD | AND
… | … | … | …
Zimbabwe | 716 | ZW | ZWE
3166 is divided into 3 lists:
  3166-1: Countries
  3166-2: Sub-regions (country subdivisions)
  3166-3: Changes (codes for formerly used country names)
3166-1 contains three different codes for the same places:
  alpha-2
  alpha-3
  numeric-3
The source for the list is the UN Statistics Division
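Because the three code types identify the same places, a single lookup table can translate between them. The sketch below uses only the rows shown in the table above; the `Country` type and `lookup` function are illustrative names, not part of any library. Numeric codes are kept as strings so that leading zeros such as `004` survive.

```python
from typing import NamedTuple


class Country(NamedTuple):
    name: str
    numeric: str  # kept as a string so leading zeros are preserved
    alpha2: str
    alpha3: str


# Only the rows visible on the slide; a real table would hold the full list.
ROWS = [
    Country("Afghanistan", "004", "AF", "AFG"),
    Country("Åland Islands", "248", "AX", "ALA"),
    Country("Albania", "008", "AL", "ALB"),
    Country("Algeria", "012", "DZ", "DZA"),
    Country("American Samoa", "016", "AS", "ASM"),
    Country("Andorra", "020", "AD", "AND"),
    Country("Zimbabwe", "716", "ZW", "ZWE"),
]

# The three code types cannot collide: two letters, three letters, three digits.
BY_CODE = {}
for row in ROWS:
    for code in (row.numeric, row.alpha2, row.alpha3):
        BY_CODE[code] = row


def lookup(code: str) -> Country:
    """Find a country by any of its three codes, ignoring case and padding."""
    return BY_CODE[code.strip().upper()]


print(lookup("dz").alpha3)
print(lookup("004").name)
```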
ISO 3166 codes change, and are even re-assigned!
Country | alpha-2 | Assigned | Removed
CZECHOSLOVAKIA | CS | 1974* | 1993
SERBIA AND MONTENEGRO | CS | 2003-07-23 | 2006
SERBIA | RS | 2006-09-26 | current
MONTENEGRO | ME | 2006-09-26 | current
* ISO 3166 first published in 1974. Czechoslovakia dates from 1918.
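Since a code such as CS can mean different countries at different times, a code alone is not enough to interpret old records; the date of the record matters too. A minimal sketch of a date-aware lookup follows. The `CodeAssignment` type and `resolve` function are hypothetical names, and year-only dates from the table are expanded to January 1 of that year purely as an assumption so that comparisons work.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CodeAssignment:
    alpha2: str
    country: str
    assigned: date
    removed: Optional[date]  # None means the code is still current


# Rows from the slide. Year-only dates are expanded to January 1 of that
# year, which is an assumption made for this sketch.
ASSIGNMENTS = [
    CodeAssignment("CS", "CZECHOSLOVAKIA", date(1974, 1, 1), date(1993, 1, 1)),
    CodeAssignment("CS", "SERBIA AND MONTENEGRO", date(2003, 7, 23), date(2006, 1, 1)),
    CodeAssignment("RS", "SERBIA", date(2006, 9, 26), None),
    CodeAssignment("ME", "MONTENEGRO", date(2006, 9, 26), None),
]


def resolve(alpha2: str, as_of: date) -> Optional[str]:
    """Return the country a code referred to on a given date, if any."""
    for a in ASSIGNMENTS:
        if a.alpha2 != alpha2:
            continue
        if a.assigned <= as_of and (a.removed is None or as_of < a.removed):
            return a.country
    return None


print(resolve("CS", date(1990, 1, 1)))
print(resolve("CS", date(2004, 1, 1)))
print(resolve("CS", date(1998, 1, 1)))  # no assignment was in effect
```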
What is the code for Kosovo?
No code currently exists for Kosovo, but “KS” is unassigned. Should we use it in the expectation that eventually it will be assigned?
No.
To quote from ISO 3166-1:1997, clause 8.1.3 User-assigned code elements:
"If users need code elements to represent country names not included in this part of ISO 3166, the series of letters AA, QM to QZ, XA to XZ, and ZZ, and the series AAA to AAZ, QMA to QZZ, XAA to XZZ, and ZZA to ZZZ respectively and the series of numbers 900 to 999 are available."
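The user-assigned ranges quoted from clause 8.1.3 are simple enough to check mechanically before a local code is adopted. The following is a minimal sketch of such a check; `is_user_assigned` is an illustrative name, and the ranges are taken directly from the quoted clause.

```python
import re


def is_user_assigned(code: str) -> bool:
    """Check a code against the user-assigned ranges quoted from ISO 3166-1."""
    code = code.upper()
    if re.fullmatch(r"[A-Z]{2}", code):
        # AA, QM to QZ, XA to XZ, and ZZ
        return (
            code in ("AA", "ZZ")
            or (code[0] == "Q" and "M" <= code[1] <= "Z")
            or code[0] == "X"
        )
    if re.fullmatch(r"[A-Z]{3}", code):
        # AAA to AAZ, QMA to QZZ, XAA to XZZ, and ZZA to ZZZ
        return (
            code[:2] in ("AA", "ZZ")
            or (code[0] == "Q" and "M" <= code[1] <= "Z")
            or code[0] == "X"
        )
    if re.fullmatch(r"[0-9]{3}", code):
        # 900 to 999
        return 900 <= int(code) <= 999
    return False


print(is_user_assigned("KS"))  # unassigned, but not in a user-assigned range
print(is_user_assigned("XK"))  # falls in XA to XZ
```

A check like this could run when a custodian proposes a local extension, so that a guess such as "KS" is rejected in favor of a code from a range ISO has promised not to assign.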
There are many categories of ISO 3166-1 alpha-2 codes
[Decoding table: a 26 × 26 grid of every two-letter combination from AA to ZZ, with each cell marked by one of the categories below.]
http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/iso_3166-1_decoding_table.html#AW
Category | Usage
Officially assigned code element | Code element may be used without restriction
User-assigned code element | Code element may be used without restriction
Exceptionally reserved code element | Code element may be used but restrictions may apply
Transitionally reserved code element | Code element deleted from ISO 3166-1; stop using ASAP
Indeterminately reserved code element | Code element must not be used in ISO 3166-1
Code elements not used at present stage | Code element must not be used in ISO 3166-1
Un-assigned code elements | Code element free for assignment (by ISO 3166/MA only!)
User-assigned code elements are reserved for local extensions. Use them when you need a new code!
Usual and unusual requirements for handling country names
One client needed to maintain multiple country lists:
  ISO 3166 used in most systems
  Maintained a separate editorial style list for correspondence and reports
  Still other lists were used for statistical information on country subdivisions and multi-country regions
Organization maintained a variety of historical information on countries and regions:
  Effective dates for codes were needed (note: dates were for codes within a system, not for the countries)
  Mappings from old countries to successors were also needed
Country | Alpha-2 | Start Date | End Date
Bosnia and Herzegovina | | 1992 |
Czech Republic | CZ | 1993-06-15 |
Czechoslovakia | CS | 1974 | 1993-06-15
Yugoslavia | YU | 1974 | 2003
USSR | | 1974 | 1992-08-30
Zaire | ZR | 1974 | 1997-07-14
Congo, Dem. Rep. of | CD | 1997-07-14 |
3166 Short Name | Redbook Country Name | Redbook Full Form | Redbook Short Form | STA Code
Afghanistan | Afghanistan, Islamic State of | | Afghanistan, I.S. of | 512
Åland Islands | not in Redbook | | |
Albania | Albania | | | 914
Aruba | Aruba | Kingdom of the Netherlands-Aruba | | 314
… | … | … | … | …
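The requirement to map old countries to their successors can be sketched as a small graph walk. The example below is an assumption-laden illustration: `SUCCESSORS` and `current_successors` are hypothetical names, and the links are limited to ones suggested by this deck (Zaire to Congo, Dem. Rep. of; Yugoslavia to Serbia and Montenegro; Serbia and Montenegro to Serbia and Montenegro as separate countries).

```python
from typing import Dict, List

# Hypothetical successor links, limited to cases that appear in this deck.
SUCCESSORS: Dict[str, List[str]] = {
    "Zaire": ["Congo, Dem. Rep. of"],
    "Yugoslavia": ["Serbia and Montenegro"],
    "Serbia and Montenegro": ["Serbia", "Montenegro"],
}


def current_successors(name: str) -> List[str]:
    """Follow successor links until reaching names with no further successor."""
    result: List[str] = []
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against accidental cycles in the mapping
        seen.add(current)
        following = SUCCESSORS.get(current)
        if following:
            # Reverse so that successors are visited in their listed order.
            stack.extend(reversed(following))
        else:
            result.append(current)
    return result


print(current_successors("Yugoslavia"))
print(current_successors("Zaire"))
```

Storing only the immediate successor for each name keeps the table easy to maintain; chains such as Yugoslavia to Serbia are resolved at lookup time.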
Problems when mapping between location terminologies
ISO Code | ISO Official Short Name | ISO Full Name | Redbook Country Name | Redbook Full Form | STA Name (60 chars) | Issues
(no entry) | | | | | | Missing entities not listed in any of the recommended country lists (e.g. The Azores, Kosovo).
CIV | CÔTE D'IVOIRE | Republic of Côte d'Ivoire | Côte d’Ivoire | - | Côte d'Ivoire | Use of accents in country names.
BIH | BOSNIA AND HERZEGOVINA | - | Bosnia and Herzegovina | - | Bosnia & Herzegovina | Inconsistent use of conjunctions and special characters ('and' vs. ampersand ‘&’).
TLS | TIMOR-LESTE | Democratic Republic of Timor-Leste | Timor-Leste | Democratic Republic of Timor-Leste | Timor-Leste | Direct order of official country name does not alphabetize where users expect to find it.
HKG | HONG KONG | Hong Kong Special Administrative Region of China | China,P.R.: Hong Kong | China,P.R.: Hong Kong SRA | China,P.R.: Hong Kong | Variation between ISO and company practices.
MKD | MACEDONIA, THE FORMER YUGOSLAV REPUBLIC OF | The former Yugoslav Republic of Macedonia | Macedonia, former Yugoslav Republic of | - | Macedonia, FYR | Long names are more frequently abbreviated.
PSE | PALESTINIAN TERRITORY, OCCUPIED | Occupied Palestinian Territory | West Bank and Gaza | - | West Bank and Gaza | Unclear what the correct form of name is. Note: Redbook name is from front matter, not table.
KNA | SAINT KITTS AND NEVIS | - | St. Kitts and Nevis | - | St. Kitts and Nevis | ISO spells out “Saint” but company uses abbreviation.
VNM | VIET NAM | Socialist Republic of Viet Nam | Vietnam | - | Vietnam | Spelling and name order variations between ISO and company.
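Several of the issues in the table (accents, 'and' vs. '&', “Saint” vs. “St.”, letter case) can be reduced by comparing names through a normalized key rather than as raw strings. The function below is a sketch under that assumption; `normalize_name` is an illustrative name and the rules cover only the cases listed above.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Build a comparison key that ignores accents, case, '&', and 'St.'."""
    # Decompose accented letters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Unify curly apostrophes and letter case.
    text = text.replace("\u2019", "'").casefold()
    # Treat the ampersand as the word "and".
    text = text.replace("&", " and ")
    # Expand "st" or "st." at the start of a word to "saint".
    text = re.sub(r"\bst\.?(?=\s)", "saint", text)
    # Replace remaining punctuation with spaces, then collapse whitespace.
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return " ".join(text.split())


print(normalize_name("CÔTE D'IVOIRE") == normalize_name("Côte d’Ivoire"))
print(normalize_name("BOSNIA AND HERZEGOVINA") == normalize_name("Bosnia & Herzegovina"))
print(normalize_name("SAINT KITTS AND NEVIS") == normalize_name("St. Kitts and Nevis"))
```

Normalization does not resolve every row: “VIET NAM” and “Vietnam” still produce different keys, and inverted or entirely different names (Hong Kong, West Bank and Gaza) need an explicit mapping maintained by the vocabulary custodians.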
Enterprise taxonomy governance environment
[Diagram: Taxonomy Governance Environment. Elements shown: sources (ISO 3166-1, Other External, ERP, Other Internal); Vocabulary Management System (CVs, Other Controlled Items); Custodians; Change Requests & Responses; Notifications; Published Facets; Consuming Applications (Intranet Search, Intranet Nav., Web CMS, DAM, Archives, ERMS, …).]
1: External vocabularies change on their own schedule, with some advance notice.
2: Team decides when to update facets within Taxonomy.
3: Team adds value via mappings, translations, synonyms, training materials, etc.
4: Updated versions of facets published to consuming applications.
CV (Controlled Vocabulary): The list of values for one facet in the Taxonomy.
The client defined a process for country vocabulary changes
The different vocabularies had different processes.
Custodians of the different vocabularies communicate so that if one changes, the others know about it.
[Flowchart: change request process. Steps shown: Submit Change Request; Marked as REQUESTED; Review Request by Rejection Criteria (Violates Criteria? Wrong CV?); Delegate to Other Custodian; Inform Requester; Marked as DENIED; Fast-track from ED?; Mark as IN-PROCESS; SEC drafts circular, sends to ED; ED approval?; Send to Board; Notify Board; Update CV and Mapping; Mark as PROVISIONAL; Mark as APPROVED; Updates Published; Exit.]
Each step is annotated with the role(s) and tool(s) involved:
  R: Requester
  C: Custodian
  ED: Exec. Dir.
  V: VM System
  E: Email from VM System
  F: Forms Interface
  O: Other (Phone, Fax, etc.)
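The status labels in the flowchart (REQUESTED, IN-PROCESS, PROVISIONAL, APPROVED, DENIED) suggest a small state machine for change requests. The sketch below is only a guess at how such statuses might be enforced: the `ALLOWED` transitions are an assumption reconstructed from the order of the labels, not the client's actual rules, and `Status` and `advance` are hypothetical names.

```python
from enum import Enum


class Status(Enum):
    REQUESTED = "REQUESTED"
    IN_PROCESS = "IN-PROCESS"
    PROVISIONAL = "PROVISIONAL"
    APPROVED = "APPROVED"
    DENIED = "DENIED"


# Assumed transitions; the original flowchart is not fully recoverable.
ALLOWED = {
    Status.REQUESTED: {Status.IN_PROCESS, Status.DENIED},
    Status.IN_PROCESS: {Status.PROVISIONAL, Status.DENIED},
    Status.PROVISIONAL: {Status.APPROVED, Status.DENIED},
    Status.APPROVED: set(),
    Status.DENIED: set(),
}


def advance(current: Status, target: Status) -> Status:
    """Move a change request to a new status, rejecting disallowed jumps."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


status = Status.REQUESTED
status = advance(status, Status.IN_PROCESS)
status = advance(status, Status.PROVISIONAL)
print(status.value)
```

Encoding the transitions as data makes it straightforward for each vocabulary to carry its own table, which fits the observation that the different vocabularies had different processes.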
Conclusion
Location terminologies are commonly used
  They fulfill many different purposes
Keeping up-to-date is an ongoing effort
  The rate of change is low, but ongoing
  Codes will be reassigned at times, get ready for it
The issues can be complex
  Anything out of the ordinary will not be well-served by off-the-shelf software
Most organizations have a proliferation of pseudo-3166 vocabularies. Start there to get things under control.
Questions?
Ron Daniel
925-368-8371