Location Terminologies

Taxonomy Strategies LLC
Location Terminologies
ASIS&T Annual Meeting
Austin, TX
November 7, 2006
Copyright 2006 Taxonomy Strategies LLC. All rights reserved.
Agenda
Who we are
Overview
Using ISO 3166
Accommodating special needs
Taxonomy Strategies LLC
The business of organized information
2
Who we are: Ron Daniel, Jr.
• Over 15 years in the business of metadata & automatic classification
  - Principal, Taxonomy Strategies
  - Standards Architect, Interwoven
  - Senior Information Scientist, Metacode Technologies (acquired by Interwoven, November 2000)
  - Technical Staff Member, Los Alamos National Laboratory
  - Doctoral and post-doctoral research in pattern recognition
• Metadata and taxonomies community leadership
  - Chair, PRISM (Publishers Requirements for Industry Standard Metadata) working group
  - Acting chair, XML Linking working group
  - Member, RDF working groups
  - Co-editor, PRISM, XPointer, 3 IETF RFCs, and Dublin Core 1 & 2 reports.
Recent & current projects
Agenda
Who we are
Overview
Using ISO 3166
Accommodating special needs
8 Common Taxonomy Facets
Facet | Definition | Potential Sources
Organization | Organizational structure. | FIPS 95-2, U.S. Government Manual, Your organizational structure, etc.
Content Type | Structured list of the various types of content being managed or used. | DC Types, AGLS Document Type, AAT Information Forms, Your records management policy, etc.
Industry | Broad market categories such as lines of business, life events, or industry codes. | FIPS 66, SIC, NAICS, Your market segments, etc.
Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div, US Postal Service, Your sales regions, etc.
Function | Functions and processes performed to accomplish mission and goals. | FEA Business Reference Model, Enterprise Ontology, AAT Functions, Your business functions, etc.
Topic | Business topics relevant to your mission & goals. | Federal Register Thesaurus, NAL Agricultural Thesaurus, LCSH, Your research areas, etc.
Audience | Subset of constituents to whom a piece of content is directed or intended to be used. | GEM, ERIC Thesaurus, IEEE LOM, Your psychographics or personas, etc.
Products & Services | Names of products/programs & services. | ERP system, UNSPSC, Your products and services, etc.
Potential facets in the petroleum industry
[Diagram: candidate facets, each marked as moderately or strongly related to location, and grouped into company facets and facets that should be part of a community standard. Facets shown: Wells; Maint.; Disciplines; Facilities; Production; Content Types; Lease Mgmt; Orgs.; Process Mgmt; Reserves; Human Resources; E&P Lifecycle; Hydrocarbon System; Geologic Age; Basins, Reservoirs & Fields; Locations; Company Org.]
Location names serve as surrogates for other things
• Company divisions
• Company facilities
• Regulatory regimes
• Currency regions
• Product marketing areas
• Sales territories
• Customer locations
What is a good taxonomy?
• A means to an end, and not the end in itself.
• Not perfect, but it does the job it is supposed to do, such as improving search and navigation.
• Improved over time, and maintained.
• Incremental, extensible process that identifies and enables owners, and engages stakeholders.
• Quick implementation that provides measurable results as quickly as possible.
• Not monolithic: has separately maintainable facets.
• Re-uses existing IP as much as possible.
Location names are used for different purposes
• Typical correspondence and shipping
  - “Libya”
  - “South Korea”
• Official correspondence with government ministers
  - “Great Socialist People's Libyan Arab Jamahiriya”
  - “Republic of Korea”
• Corporate division of responsibility
  - “Western Region” – does that include Montana?
Location terminologies may be used to organize different collections of information
Example: ABC Computers.com facets and their values
• Content Type: Award; Case Study; Contract & Warranty; Demo; Magazine; News & Event; Product Information; Services; Solution; Specification; Technical Note; Tool; Training; White Paper; Other Content Type
• Competency: Business & Finance; Interpersonal Development; IT Professionals Technical Training; IT Professionals Training & Certification; PC Productivity; Personal Computing Proficiency
• Industry: Banking & Finance; Communications; E-Business; Education; Government; Healthcare; Hospitality; Manufacturing; Petrochemicals; Retail / Wholesale; Technology; Transportation; Other Industries
• Service: Assessment, Design & Implementation; Deployment; Enterprise Support; Client Support; Managed Lifecycle; Asset Recovery & Recycling; Training
• Product Family: Desktops; MP3 Players; Monitors; Networking; Notebooks; Printers; Projectors; Servers; Services; Storage; Televisions; Other Brands
• Audience: All; Business; Employee; Education; Gaming Enthusiast; Home; Investor; Job Seeker; Media; Partner; Shopper (First Time, Experienced, Advanced); Supplier
• Line of Business: All; Home & Home Office; Gaming; Government, Education & Healthcare; Medium & Large Business; Small Business
• Region-Country: All; Asia-Pacific; Canada; EMEA; Japan; Latin America & Caribbean; United States
Location terminologies may be used to limit search results
• Category
• Company
• City
• State
• Salary
Problems with location vocabularies
• Placenames change over time
  - Codes may be reused over time
• Familiarity leads to proliferation
  - Many versions of pseudo-standard lists
  - Guessing what the standard will become (e.g. KOS as a code for Kosovo)
• Approximate alignment between placenames and business functions leads to errors when mapping data from one purpose to another
  - Geopolitical names get applied to sales territories with different company history and importance (e.g. Japan vs. Asia-Pac)
• Natural messiness of human affairs
  - States vs. Provinces vs. Protectorates, Territories, Possessions, Tribal territories, …
  - Disputed territories (Palestine, Kashmir, Taiwan, Kurdistan)
  - Proto-states (Kosovo, Somaliland)
• Complexity tradeoff in software
  - Very few invariant properties of countries and their groupings
• Passions
  - Boycotts and death threats have been received by people who do or do not list particular places in their lists of ‘countries’
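One way to keep geopolitical names and sales territories from blurring together is to hold territory membership as its own mapping over ISO 3166-1 alpha-2 codes. The following is a minimal Python sketch under stated assumptions: the region memberships are invented for illustration, and the names SALES_REGIONS, regions_for and unassigned are hypothetical, not taken from any client system.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical sales territories keyed by ISO 3166-1 alpha-2 codes.
# Memberships are illustrative assumptions, not real company data.
SALES_REGIONS: Dict[str, FrozenSet[str]] = {
    "Japan": frozenset({"JP"}),
    "Asia-Pacific": frozenset({"AU", "CN", "IN", "KR", "SG"}),
    "Canada": frozenset({"CA"}),
    "United States": frozenset({"US"}),
}


def regions_for(alpha2: str) -> List[str]:
    """Return every sales region that explicitly lists the country code."""
    code = alpha2.strip().upper()
    return sorted(r for r, members in SALES_REGIONS.items() if code in members)


def unassigned(codes: Iterable[str]) -> List[str]:
    """Return the codes that no sales region claims."""
    return sorted(c for c in codes if not regions_for(c))
```

Because membership is explicit, a question such as whether Japan belongs to Asia-Pacific is answered by the data rather than by the region's name, and codes that no territory claims can be listed for review.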
Agenda
Who we are
Overview
Using ISO 3166
Accommodating special needs
ISO 3166 is a fundamental vocabulary for dealing with locations
• UPS maintains a central World Wide Code Repository (WWCR) to store the metadata used throughout the corporation
  - Based on the data identified in the enterprise data models
• They also have a Corporate Code Table Database, populated via extract files from the WWCR.
  - These tables contain the complete list of standardized corporate code values for each code type.
  - Country codes are ISO 3166-1, with local extensions obeying ISO restrictions.
  - The data modeler for the Corporate Code Table Database is the primary contact from UPS to ISO and the UN with respect to codes for countries.
Source: Barbara LaRobardier, “Taxonomy and Metadata at United Parcel Service (UPS): World Wide Code Repository and Corporate Code Tables”; Semantic Technologies Conference, San Francisco, 2005.
ISO 3166 is the world’s most widely-used list of country names
• 3166 is divided into 3 lists:
  - 3166-1: Countries
  - 3166-2: Country subdivisions (sub-regions)
  - 3166-3: Formerly used country names (changes)
• The lists contain three different codes for the same places:
  - alpha-2
  - alpha-3
  - numeric-3
• The source for the list is the UN Statistics Division

Country or area name | numeric 3 | alpha 2 | alpha 3
Afghanistan | 004 | AF | AFG
Åland Islands | 248 | AX | ALA
Albania | 008 | AL | ALB
Algeria | 012 | DZ | DZA
American Samoa | 016 | AS | ASM
Andorra | 020 | AD | AND
… | … | … | …
Zimbabwe | 716 | ZW | ZWE
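Since the same place carries three codes, a lookup that accepts any of them is a common first utility. The sketch below uses only the sample rows from the table above; the names Country, SAMPLE and lookup are illustrative assumptions. Numeric codes are kept as strings so that leading zeros are preserved.

```python
from typing import Dict, List, NamedTuple, Optional


class Country(NamedTuple):
    name: str
    numeric: str  # stored as text so "004" keeps its leading zeros
    alpha2: str
    alpha3: str


# Sample rows copied from the slide; a real table would hold the full list.
SAMPLE: List[Country] = [
    Country("Afghanistan", "004", "AF", "AFG"),
    Country("Åland Islands", "248", "AX", "ALA"),
    Country("Albania", "008", "AL", "ALB"),
    Country("Algeria", "012", "DZ", "DZA"),
    Country("American Samoa", "016", "AS", "ASM"),
    Country("Andorra", "020", "AD", "AND"),
    Country("Zimbabwe", "716", "ZW", "ZWE"),
]

BY_NUMERIC: Dict[str, Country] = {c.numeric: c for c in SAMPLE}
BY_ALPHA2: Dict[str, Country] = {c.alpha2: c for c in SAMPLE}
BY_ALPHA3: Dict[str, Country] = {c.alpha3: c for c in SAMPLE}


def lookup(code: str) -> Optional[Country]:
    """Find a country by numeric-3, alpha-2 or alpha-3 code."""
    key = code.strip().upper()
    if key.isdigit():
        return BY_NUMERIC.get(key.zfill(3))  # pad "4" to "004"
    if len(key) == 2:
        return BY_ALPHA2.get(key)
    if len(key) == 3:
        return BY_ALPHA3.get(key)
    return None
```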
ISO 3166 codes change, and are even reassigned!
Country | alpha-2 | Assigned | Removed
CZECHOSLOVAKIA | CS | 1974* | 1993
SERBIA AND MONTENEGRO | CS | 2003-07-23 | 2006
SERBIA | RS | 2006-09-26 | current
MONTENEGRO | ME | 2006-09-26 | current
* ISO 3166 first published in 1974. Czechoslovakia dates from 1918.
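Because a code such as CS can mean different countries at different times, a code can only be resolved together with a date. The following Python sketch shows one way to do that with the rows above. It makes one loud assumption: where the slide gives only a year, the date is taken as 1 January of that year. The names Assignment, ASSIGNMENTS and resolve are hypothetical.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Assignment(NamedTuple):
    alpha2: str
    country: str
    assigned: date
    removed: Optional[date]  # None means the code is still current


# Rows from the slide. Year-only values are ASSUMED to mean 1 January.
ASSIGNMENTS: List[Assignment] = [
    Assignment("CS", "CZECHOSLOVAKIA", date(1974, 1, 1), date(1993, 1, 1)),
    Assignment("CS", "SERBIA AND MONTENEGRO", date(2003, 7, 23), date(2006, 1, 1)),
    Assignment("RS", "SERBIA", date(2006, 9, 26), None),
    Assignment("ME", "MONTENEGRO", date(2006, 9, 26), None),
]


def resolve(alpha2: str, on: date) -> Optional[str]:
    """Return the country a code stood for on a given date, if any."""
    code = alpha2.strip().upper()
    for a in ASSIGNMENTS:
        in_force = a.assigned <= on and (a.removed is None or on < a.removed)
        if a.alpha2 == code and in_force:
            return a.country
    return None
```

With this shape, resolve("CS", ...) gives Czechoslovakia for a 1980 record, Serbia and Montenegro for a 2004 record, and nothing for a 1998 record, which is exactly the ambiguity that undated codes hide.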
What is the code for Kosovo?
• No code currently exists for Kosovo, but “KS” is unassigned. Should we use it in the expectation that eventually it will be assigned?
• No.
• To quote from ISO 3166-1:1997, clause 8.1.3 User-assigned code elements:
"If users need code elements to represent country names not included in this part of ISO 3166, the series of letters AA, QM to QZ, XA to XZ, and ZZ, and the series AAA to AAZ, QMA to QZZ, XAA to XZZ, and ZZA to ZZZ respectively and the series of numbers 900 to 999 are available."
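The ranges quoted above are simple enough to check mechanically before a local code is introduced. A minimal Python sketch of such a check follows; the function names are illustrative assumptions, and the ranges are taken directly from the quoted clause.

```python
def _is_ascii_letters(code: str, length: int) -> bool:
    return len(code) == length and code.isascii() and code.isalpha()


def is_user_assigned_alpha2(code: str) -> bool:
    """True for AA, QM to QZ, XA to XZ, and ZZ."""
    c = code.strip().upper()
    if not _is_ascii_letters(c, 2):
        return False
    return (
        c in ("AA", "ZZ")
        or (c[0] == "Q" and "M" <= c[1] <= "Z")
        or c[0] == "X"
    )


def is_user_assigned_alpha3(code: str) -> bool:
    """True for AAA to AAZ, QMA to QZZ, XAA to XZZ, and ZZA to ZZZ."""
    c = code.strip().upper()
    if not _is_ascii_letters(c, 3):
        return False
    # Each alpha-3 range is a user-assigned alpha-2 prefix plus any letter.
    return is_user_assigned_alpha2(c[:2])


def is_user_assigned_numeric(code: str) -> bool:
    """True for the numbers 900 to 999."""
    c = code.strip()
    return len(c) == 3 and c.isascii() and c.isdigit() and 900 <= int(c) <= 999
```

For example, is_user_assigned_alpha2("KS") is False, which is the point of the slide: KS is merely unassigned, whereas a code such as XA or QM is safe for a local extension.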
There are many categories of ISO 3166-1 alpha-2 codes
[Figure: ISO 3166-1 decoding table, a grid of all two-letter combinations from AA to ZZ, color-coded by the categories below.]
Legend:
• Officially assigned code element: may be used without restriction
• User-assigned code element: may be used without restriction
• Exceptionally reserved code element: may be used, but restrictions may apply
• Transitionally reserved code element: deleted from ISO 3166-1; stop using ASAP
• Indeterminately reserved code element: must not be used in ISO 3166-1
• Code elements not used at present stage: must not be used in ISO 3166-1
• Un-assigned code elements: free for assignment (by ISO 3166/MA only!)
User-assigned code elements are reserved for local extensions. Use them when you need a new code!
http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/iso_3166-1_decoding_table.html#AW
Agenda
Who we are
Overview
Using ISO 3166
Accommodating special needs
Usual and unusual requirements for handling country names
• One client needed to maintain multiple country lists:
  - ISO 3166 used in most systems
  - Maintained a separate editorial style list for correspondence and reports
  - Still other lists were used for statistical information on country subdivisions and multi-country regions
• Organization maintained a variety of historical information on countries and regions:
  - Effective dates for codes were needed (note: dates were for codes within a system, not for the countries)
  - Mappings from old countries to successors were also needed
[Table with columns: 3166 Short Name | Redbook Country Name | Redbook Full Form | Redbook Short Form | STA Code. Sample rows, with values listed in no particular column order:]
Afghanistan: Afghanistan; Afghanistan, I.S. of; Afghanistan, Islamic State of; 512
Åland Islands: not in Redbook
Albania: Albania; 914
Aruba: Aruba; Kingdom of the Netherlands-Aruba; 314
…

Country | Alpha-2 | Start Date | End Date
Bosnia and Herzegovina | | 1992 |
Czech Republic | CZ | 1993-06-15 |
Czechoslovakia | CS | 1974 | 1993-06-15
Yugoslavia | YU | 1974 | 2003
USSR | | 1974 | 1992-08-30
Zaire | ZA | 1974 | 1997-07-14
Congo, Dem. Rep. of | CD | 1997-07-14 |
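The requirement for mappings from old countries to successors can be met with a small successor table that is followed until only current entries remain. The Python sketch below is illustrative only: the table is keyed by name, its entries are a simplified reading of the examples in this deck plus the well-known split of Czechoslovakia, and SUCCESSORS and current_successors are hypothetical names.

```python
from typing import Dict, List

# Illustrative successor table; a real one would be maintained by a custodian.
SUCCESSORS: Dict[str, List[str]] = {
    "Czechoslovakia": ["Czech Republic", "Slovakia"],
    "Zaire": ["Congo, Dem. Rep. of"],
    "Yugoslavia": ["Serbia and Montenegro"],
    "Serbia and Montenegro": ["Serbia", "Montenegro"],
}


def current_successors(name: str) -> List[str]:
    """Follow successor links until reaching entries with no successor."""
    result: List[str] = []
    stack = [name]
    seen = set()
    while stack:
        entry = stack.pop()
        if entry in seen:  # guard against accidental cycles in the table
            continue
        seen.add(entry)
        nxt = SUCCESSORS.get(entry)
        if nxt:
            stack.extend(reversed(nxt))  # keep the listed order
        else:
            result.append(entry)
    return result
```

A name with no entry in the table is returned unchanged, so the same function can be applied to every value in a historical data set.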
Problems when mapping between location terminologies
ISO Code | ISO Official Short Name | ISO Full Name | Redbook Country Name | Redbook Full Form | STA Name (60 chars) | Issues
 | | | | | | Missing entities not listed in any of the recommended country lists (e.g. The Azores, Kosovo).
CIV | CÔTE D'IVOIRE | Republic of Côte d'Ivoire | Côte d’Ivoire | - | Côte d'Ivoire | Use of accents in country names.
BIH | BOSNIA AND HERZEGOVINA | | Bosnia and Herzegovina | - | Bosnia & Herzegovina | Inconsistent use of conjunctions and special characters ('and' or ampersand ‘&’).
TLS | TIMOR-LESTE | Democratic Republic of Timor-Leste | Timor-Leste | Democratic Republic of Timor-Leste | Timor-Leste | Direct order of official country name does not alphabetize where users expect to find it.
HKG | HONG KONG | Hong Kong Special Administrative Region of China | China,P.R.: Hong Kong | China,P.R.: Hong Kong SRA | China,P.R.: Hong Kong | Variation between ISO and company practices.
MKD | MACEDONIA, THE FORMER YUGOSLAV REPUBLIC OF | The former Yugoslav Republic of Macedonia | Macedonia, former Yugoslav Republic of | - | Macedonia, FYR | Long names are more frequently abbreviated.
PSE | PALESTINIAN TERRITORY, OCCUPIED | Occupied Palestinian Territory | West Bank and Gaza | - | West Bank and Gaza | Unclear what the correct form of name is. Note: Redbook name is from front matter, not table.
KNA | SAINT KITTS AND NEVIS | | St. Kitts and Nevis | - | St. Kitts and Nevis | ISO spells out “Saint” but company uses abbreviation.
VNM | VIET NAM | Socialist Republic of Viet Nam | Vietnam | - | Vietnam | Spelling and name order variations between ISO and company.
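Several of the issues above (accents, 'and' versus '&', “Saint” versus “St.”, upper versus mixed case) can be reduced by comparing names through a normalized matching key rather than as raw strings. The following Python sketch is one possible approach, not a description of the client's system; match_key is a hypothetical helper, and the normalization rules are assumptions chosen to cover the examples in the table.

```python
import re
import unicodedata


def match_key(name: str) -> str:
    """Build a loose comparison key for a country name."""
    # Decompose accented letters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Treat typographic and plain apostrophes alike, ignore case.
    text = text.replace("\u2019", "'").casefold()
    # Unify conjunction styles and the common "St." abbreviation.
    text = text.replace("&", " and ")
    text = re.sub(r"\bst\.?\s", "saint ", text)
    # Replace remaining punctuation with spaces and collapse whitespace.
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return " ".join(text.split())
```

A key like this only helps to propose candidate matches. Spelling variants such as “Viet Nam” and “Vietnam”, inverted forms such as “Macedonia, FYR”, and genuinely different names such as “West Bank and Gaza” still need an explicit, reviewed mapping.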
Enterprise taxonomy governance environment
[Diagram: Taxonomy Governance Environment. Elements shown: vocabulary sources (ISO 3166-1, Other External, ERP, Other Internal); the Vocabulary Management System holding CVs and other controlled items; custodians; change requests & responses; notifications; published facets; and consuming applications (Archives, Intranet Search, ERMS, Web CMS, Intranet Nav., DAM, …).]
1: External vocabularies change on their own schedule, with some advance notice.
2: Team decides when to update facets within Taxonomy.
3: Team adds value via mappings, translations, synonyms, training materials, etc.
4: Updated versions of facets published to consuming applications.
CV (Controlled Vocabulary) – The list of values for one facet in the Taxonomy.
The client defined a process for country vocabulary changes
• The different vocabularies had different processes.
• Custodians of the different vocabularies communicate so that if one changes, the others know about it.
[Flowchart: change-request process. Steps and decisions shown: Submit Change Request; Marked as REQUESTED; Review Request by Rejection Criteria; Violates Criteria?; Wrong CV?; Delegate to Other Custodian; Marked as DENIED; Inform Requester; Mark as INPROCESS; Fast-track from ED?; SEC drafts circular, sends to ED; ED approval?; Mark as PROVISIONAL; Send to Board; Notify Board; Update CV and Mapping; Mark as APPROVED; Updates Published; Exit.]
Legend:
Roles: R – Requester; C – Custodian; ED – Exec. Dir.
Tools: V – VM System; F – Forms Interface; E – Email from VM System; O – Other (Phone, Fax, etc.)
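The status labels in the flowchart (REQUESTED, INPROCESS, PROVISIONAL, APPROVED, DENIED) suggest a small state machine for change requests. The Python sketch below is only an illustration: the status names come from the slide, but the allowed transitions are an assumption, since the arrows of the original flowchart are not recoverable here, and ChangeRequest and ALLOWED are hypothetical names.

```python
from enum import Enum
from typing import Dict, List, Set


class Status(Enum):
    REQUESTED = "REQUESTED"
    INPROCESS = "INPROCESS"
    PROVISIONAL = "PROVISIONAL"
    APPROVED = "APPROVED"
    DENIED = "DENIED"


# ASSUMED transitions; the real process may allow other paths.
ALLOWED: Dict[Status, Set[Status]] = {
    Status.REQUESTED: {Status.INPROCESS, Status.DENIED},
    Status.INPROCESS: {Status.PROVISIONAL, Status.DENIED},
    Status.PROVISIONAL: {Status.APPROVED, Status.DENIED},
    Status.APPROVED: set(),
    Status.DENIED: set(),
}


class ChangeRequest:
    """A vocabulary change request that records its status history."""

    def __init__(self, term: str) -> None:
        self.term = term
        self.status = Status.REQUESTED
        self.history: List[Status] = [Status.REQUESTED]

    def advance(self, new_status: Status) -> None:
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"{self.status.value} -> {new_status.value} is not allowed"
            )
        self.status = new_status
        self.history.append(new_status)
```

Keeping the history on the request gives custodians of other vocabularies an audit trail to consult when one list changes.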
Conclusion
• Location terminologies are commonly used
  - They fulfill many different purposes
• Keeping up-to-date is an ongoing effort
  - The rate of change is low, but ongoing
• The issues can be complex
  - Anything out of the ordinary will not be well-served by off-the-shelf software
• Most organizations have a proliferation of pseudo-3166 vocabularies. Start there to get things under control.
Taxonomy Strategies LLC
Questions?
Ron Daniel
925-368-8371
[email protected]