Transcript Document

GIS Data: Types and Structures
Geographic Data:
Concepts, File Formats, Topology
Anatomy of Spatial File Formats
shapefile, geodatabase, coverage
Coordinate Systems and Projections
Spring 2008
GISC 6382 Applied GIS
UT-Dallas Briggs
1
Geographic Data: Classic Approach
• Two components of geographic data
– Spatial Data: representations of geographic features
associated with real-world locations
• Stored in files and managed by the GIS software
– Attribute Data: descriptive information
• stored in tables and managed by an RDBMS (relational database
management system)
(originally ESRI’s proprietary INFO system, but now any standard
commercial system such as Access, Oracle, SQL Server)
• Two formats for geographic data
– Raster data
• Rectangular array of cells or pixels
– Vector data: three feature types
– points/nodes (single x,y locations)
– lines/arcs (linear string of x,y locations)
– areas/polygons (closed string of x,y locations)
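A minimal sketch of the three vector feature types as plain coordinate lists. The coordinate values are invented for illustration, and `is_closed` is a hypothetical helper, not part of any GIS package.

```python
# Illustrative only: the coordinate values below are made up.
point = (2.0, 3.0)                                           # single x,y location
line = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0)]                  # linear string of x,y locations
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 0.0)]   # closed string of x,y locations


def is_closed(coords):
    """Return True when the first and last x,y pairs coincide (hypothetical helper)."""
    return len(coords) >= 4 and coords[0] == coords[-1]


print(is_closed(line))     # False
print(is_closed(polygon))  # True
```

The only structural difference between a line and a polygon here is that the polygon's string of coordinates returns to its starting location.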
Geographic Data: Another (object-oriented) View
Object View
• The real world is a series of entities located in space (houses, poles, soil types)
– Some locations have values, others are null
• An object is a digital representation of an entity, with three types
• Point objects
• Line objects
• Area objects
– The same entity can be represented at different scales by different object types: the
multi-representation problem
– Behavior can be associated with objects thus they can change over time
Field View
• Real world properties vary continuously over space;
– every place has a value
– represent as raster data or as vector data in a TIN (triangulated irregular network)
Object versus Field View
• Not as distinct as first appears
• If the value is a categorical or integer variable, then places with the same value
(e.g. soil type) can be grouped, which gives us area objects!
• The world is how we decide to look at it!
From O’Sullivan and Unwin
File Formats for Vector Spatial Data
Coverage: vector data format introduced with ArcInfo in 1981
• multiple physical files (12 or so) in a folder
• proprietary: no published specs & ArcInfo required for changes
• Can be “exported” to a single E00 (E-zero-zero) file for transfer
Shape ‘file’: vector data format introduced with ArcView in 1993
• comprises several (at least 3) physical disk files (with extension of
.shp, .shx, .dbf), all of which must be present
• openly published specs so other vendors can create shape files
Geodatabase: new format introduced with ArcGIS 8.0 in 2000
• Proprietary, next generation spatial data model
• Can be saved in several different physical formats (as of 9.2)
– File based, MS Access based, commercial DBMS based
– Versions available which support multi-user editing and replication
Shapefiles are the simplest and most commonly used format. Used them in
GIS Fund. Will use Geodatabases in Applied GIS (and some coverages).
Database Environments
Old Model: Geo-relational Database
• the old “classic” environment
• coverages in proprietary INFO database
• Raster data (in GRIDS) and 3-D data (in TINS) kept in separate,
proprietary files
• shapefiles use openly published dBase IV database (readable by Excel)
• Based on points, lines, polygon model
• Attribute data kept in separate databases and must be combined
with coverages or shapefiles for spatial applications
[Diagram labels: GIS User, SDE, db]
New Model: Geodatabase
Replacement for coverages, with support for:
• Simple features: points, lines, polygons
• Complex features: real world entities modeled as
objects with properties, behavior, rules, &
relationships
Three Formats (as of 9.2):
MS Access-based Personal Geodatabase (8.0>)
• Single-user editing, multiple read-only users
• Stored as one .mdb (Access) file
• Max 2GB total & 250,000 features per layer
(effective max is 250-500MB)
File-based Geodatabase (9.2>)
• Single-user editor, many read-only users
• Faster and more efficient than personal gdb.
• Unix and Microsoft supported
• Max 1 TB (256 TB for raster)
SDE-based Geodatabase
• Personal (4), Workgroup (10) and Enterprise (??)
versions
• Multi-user simultaneous editing via versioning and
long transactions
• uses standard db: ORACLE, SQL Server, etc
• Attribute and spatial data in same database
GIS Data Models: File-based and “Databased”
[Figure: file-based workspace compared with the geodatabase as one repository.
Labels: Workspace, Coverages, Shapes, Grids, Tins, Images, Tables, Geodatabase,
Features, Rules, Relationships, One Repository]
Source: ESRI, Inc.
Concept of Topology
• Topology distinguishes GIS data models from non-topological
data models supported by many CAD, mapping and graphics
systems
• Topology refers to knowledge about relative spatial
positioning of features.
– knowledge about how features are connected and which features are
adjacent to each other.
• Can be viewed as a mathematical procedure that determines
spatial relationships and properties, including:
– The three Cs
• Connectivity (US 75 connects to IH 45)
• Congruency--same location (Red River & TX/OK border)
• Contiguity--adjacency or “next door” (TX & OK)
– Lengths of arcs and the areas of polygons
Topology Rules for Coverages:
the classic view of topology
– Each arc has a beginning node and an ending node, which determine its directionality. Directionality is
set during digitizing.
• Actual direction is important only if your application requires
directional modeling.
– Arcs connect to other arcs at nodes
• Nodes must be present wherever arcs join or cross
– Connected arcs form polygon boundaries
• arc coordinates are stored only once because two adjacent
polygons share the common arc between them.
– Arcs have polygons on their left and right sides
The next three slides illustrate this
Topology Concept I: Arc-node topology
– Nodes are the end-points of arcs. Arc-node topology
keeps track of which arcs are connected to other arcs
through shared nodes. It defines length, direction,
and connectivity for arcs.
The from-node is an arc’s starting point; the to-node is
its ending point.
• They are determined as you digitize your data.
• You can see the from-node and to-node whenever you list
attribute records for a coverage containing lines.
• Arcs connect if they share a node.
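A minimal sketch of arc-node connectivity, assuming a hypothetical arc table that maps each arc id to its (from-node, to-node) pair. The ids are invented, and this is not how ARC/INFO stores or queries the data internally.

```python
from collections import defaultdict

# Hypothetical arc table: arc id -> (from-node, to-node).
arcs = {1: (10, 11), 2: (11, 12), 3: (11, 13), 4: (14, 15)}


def connected_arcs(arcs):
    """Map each arc id to the set of other arcs that share a node with it."""
    arcs_at_node = defaultdict(set)
    for arc_id, (fnode, tnode) in arcs.items():
        arcs_at_node[fnode].add(arc_id)
        arcs_at_node[tnode].add(arc_id)
    return {
        arc_id: (arcs_at_node[fnode] | arcs_at_node[tnode]) - {arc_id}
        for arc_id, (fnode, tnode) in arcs.items()
    }


neighbours = connected_arcs(arcs)
print(sorted(neighbours[1]))  # [2, 3]  (arcs 1, 2 and 3 meet at node 11)
print(sorted(neighbours[4]))  # []      (arc 4 shares no node)
```

No coordinates are compared: connectivity is read entirely from the shared node numbers.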
Topology Concept II: Polygon-arc topology
– Polygon-arc topology expresses the relationship
between the arc features and the polygon features for
which the arcs create boundaries. It defines area and
adjacency. Arcs or a set of arcs that form a closed
figure define the area of a polygon. Two polygons are
adjacent if they share an arc. Polygons are stored as a
list of arcs to avoid redundancy.
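A minimal sketch of polygon-arc topology: two adjacent squares stored as lists of arcs, with the shared arc's coordinates held once. The signed-id convention (a negative id means the arc is traversed in reverse) and all coordinates are assumptions of this sketch, not a description of the coverage file layout.

```python
# Hypothetical arcs: arc id -> ordered x,y coordinates. Arc 1 is the shared boundary.
arcs = {
    1: [(2, 0), (2, 2)],
    2: [(2, 2), (0, 2), (0, 0), (2, 0)],
    3: [(2, 0), (4, 0), (4, 2), (2, 2)],
}
# Polygons as lists of arc ids; a negative id means "traverse the arc backwards".
polygons = {"A": [1, 2], "B": [3, -1]}


def ring(arc_ids):
    """Assemble the closed coordinate ring of a polygon from its arcs."""
    coords = []
    for arc_id in arc_ids:
        part = arcs[abs(arc_id)]
        if arc_id < 0:
            part = part[::-1]
        # Later arcs start at the previous arc's end node, so skip the repeat.
        coords.extend(part if not coords else part[1:])
    return coords


def area(arc_ids):
    """Shoelace formula over the assembled ring."""
    pts = ring(arc_ids)
    twice = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
    return abs(twice) / 2


def adjacent(p, q):
    """Two polygons are adjacent if they share at least one arc."""
    return bool({abs(a) for a in polygons[p]} & {abs(a) for a in polygons[q]})


print(area(polygons["A"]), area(polygons["B"]))  # 4.0 4.0
print(adjacent("A", "B"))                        # True
```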
Topology Concept III: Left-right topology
– Left-right topology refers to contiguity -- how
polygons are associated with their neighboring
polygons. Each arc has a list of which polygons are on
the right side and which are on the left side.
Commands in Arc/INFO use this information to
determine from one polygon what the adjacent
polygons are:
[Figure: diagram with features numbered 1 to 7]
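A minimal sketch of how left-right information yields polygon adjacency. The arc records and polygon numbers are invented, and the function is a hypothetical illustration rather than an Arc/INFO command.

```python
from collections import defaultdict

# Hypothetical arc records: (arc id, left polygon, right polygon).
arc_table = [
    (1, 1, 2),
    (2, 2, 3),
    (3, 1, 3),
    (4, 3, 4),
]


def adjacency(arc_table):
    """Polygons on opposite sides of the same arc are neighbours."""
    neighbours = defaultdict(set)
    for _arc_id, lpoly, rpoly in arc_table:
        if lpoly != rpoly:
            neighbours[lpoly].add(rpoly)
            neighbours[rpoly].add(lpoly)
    return dict(neighbours)


adj = adjacency(arc_table)
print(sorted(adj[3]))  # [1, 2, 4]
```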
Topology: Coverages v. Geodatabase v. Shapefiles
Coverages (classic view of topology)
• Topology is a property of the data itself
• Applying Topology potentially changes the data file (coverage) via Clean
(location of points) and Build (table structure) commands
• A single coverage may have multiple geographic data types (points and lines,
polygons and lines, but not points and polygons)
Geodatabase (new view introduced with ArcGIS 8.3)
• Topology is a set of rules selectively applied by the user (28 or so currently
defined)
• Does not alter the data file (feature class), unless user chooses to ‘fix’ violations
• Topology saved as a relationship class within a geodatabase feature dataset
• A feature class contains only one geographic data type (point or line or polygon),
but all can be related together by a topology relationship class providing they are
in the same feature dataset
Shapefiles
• share some similarities with coverages but are not fully topological
– May need to convert to coverages for some analyses.
Discuss topology for coverages later today and for geodatabases later in the course.
Anatomy of Spatial File Formats
Shapefile
Geodatabase
Coverage
The following two diagrams show how geographic files appear in:
•ArcCatalog
•Windows Explorer
We will refer back to these as we discuss each of these file formats.
Spatial File Formats: example
ArcCatalog View
[Figure: ArcCatalog tree with the following labeled items]
Personal Geodatabase
  Feature data set
    Feature class (feature type = polygon)
    Feature class (feature type = arc)
Coverage (= feature class)
  Feature type (arc)
  Feature type (point)
  Feature type (polygon)
  Feature type (point)
Coverage (= feature class)
  Feature type (arc)
  Feature type (point)
Locator (table)
Raster
Shapefile
Shapefile
Notes on the figure:
• In a gdb, a feature class can have only one feature type.
• A coverage can have multiple feature types, now viewed as a shortcoming.
• Tracts feature class table: attributes in columns, features in rows; labeled
columns are Feature ID (key field), Feature type, and Secondary or Foreign key.
Spatial File Formats: NT Explorer View
[Figure: Windows Explorer listing with the following labeled items]
Info ‘master’ folder for AVCAT workspace
Tracts coverage
Trans coverage
Locator (table)
Personal Geodatabase
Raster
Tracts shapefile
Trans shapefile
Shapefiles
• openly published structure for spatial data (Coverages &
Geodatabases are proprietary)
– Partially an attempt (successfully!) by ESRI to make “their” format the
industry standard
• much simpler than coverages: rather than multiple
folders and files, three main files with same name (road)
but different extensions, e.g.
– road.shp
road.shx
road.dbf
• Attribute (feature) data stored in dBase (.dbf) file
– Can be edited in Excel (or other) but do not change the number of rows
– If you add columns, may need to change “refers to” definition via
Insert/Name/Define
• Files can be dragged, dropped, cut and pasted into other
folders -- providing the complete file set is moved.
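A minimal sketch of checking that the complete file set is present before moving a shapefile. `missing_parts` is a hypothetical helper, the `road` name follows the slide's example, and the temporary folder exists only for the demonstration.

```python
import tempfile
from pathlib import Path

# The three files named on this slide; all must travel together.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_parts(shp_path):
    """Return the required extensions that are absent next to shp_path."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS if not base.with_suffix(ext).exists()]


# Demonstration: create road.shp and road.shx but "forget" road.dbf.
with tempfile.TemporaryDirectory() as folder:
    for ext in (".shp", ".shx"):
        (Path(folder) / f"road{ext}").touch()
    result = missing_parts(Path(folder) / "road.shp")

print(result)  # ['.dbf']
```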
Geodatabase (gdb) File Structure
Anatomy of a Geodatabase
Geodatabase (gdb)
  Feature (vector) datasets
    Spatial Reference
    Object classes and subtypes
    Feature Classes and subtypes
    Relationship classes
    Network Topology
    Planar topology
  Domains
  Validation Rules
  Raster Datasets
    rasters
  TIN (3-D) datasets
    nodes, edges, faces
  Locators
    addresses
    x,y locations
    Zip codes, place names
    route locations
Geodatabases may contain: feature datasets,
raster datasets, TIN datasets, locators
Feature datasets contain vector data
All data in a single feature dataset share a
common spatial reference system
Similar Objects (e.g. Jane Blow, land owner) are
instances of object classes (e.g. land owners)
and have no spatial form.
Features and feature classes are spatial objects
(e.g. land parcels) which are similar and have
same spatial form (e.g. polygon)
Object (or feature) classes are the tables, and
objects (or features) are the rows of the table
Attributes are in the columns of the table
Subtypes are an alternative to multiple object (or
feature) classes (e.g. ‘concrete’, ‘asphalt’,
‘gravel’ road subtypes): think of subtype as
the most significant classification variable
(attribute) in the class table
Domains define permitted data values.
Topology is saved as a relationship between the
feature classes in the feature dataset.
Organizing Information: Classes
Object Classes
a set of non-spatial entities with similar characteristics e.g.
owners of property
Feature Classes
a set of spatial entities with similar characteristics e.g. property
parcels
Classes are represented in Tables which are physically stored in the
computer system in one or more files
Object or Feature class = Table
Attribute = column
Object or feature = row
name   address      dob   ssn
jane   201 N. Hi    45    274-54-8910
joan   207 N Main   55    234-81-7890
jim    20 Elm       75    890-75-9876
jean   40 Oak       80    x04-23-7890
Key Field = attribute which uniquely identifies each feature or object
Feature classes (FC), feature datasets (fds) and subtypes
• feature datasets (fds) are “spatial folders” which contain feature classes (spatial
data sets, such as land parcel file or street file)
– All feature classes in a fds must have the same spatial reference system, but may
have different topology (can have points and lines and polygons in same fds)
– Organize by thematic similarity e.g transportation
– If you wish to create topology, must be in same fds
– If they share geometry (street forms political boundary), should be in same fds
– If you create a geometric network (e.g. to model water flow) must be in same fds
– Security (read/write permissions, etc..) applied at the fds not the fc level!!!!
• feature classes are spatial data sets containing geographic features (e.g. land
parcels): a table with spatial data
– Data in FC must have same topology type (all points, all lines, all polygons)
• Water feature class with lakes (polygon) and streams (line) not permitted
– Minimizing the number of feature classes improves performance
• Use different feature classes only when attributes are significantly different
– Use roads feature class rather than freeway, arterial, streets feature classes
– Use subtype to differentiate freeway, arterials, streets (all have similar attributes)
• Subtypes are “subclasses” within a feature class that allow you to further
distinguish objects without creating new feature classes
– based on a single column’s values (must be integer or long integer)
– Same subtype has similar attribute values and behaviors
– Use where attributes are the same across all subtypes
Attribute Data Types: Geodatabase
• For every attribute field, must select a data type
• Each RDBMS stores data slightly differently
• ESRI generic data types will translate into closest RDBMS equivalent
• Values given below may differ with RDBMS used
ESRI Generic Data Types
String: text field. Be sure its length (number of characters), absolute or what you
specify, is sufficient to record longest data value.
Short Integer: (or integer) whole numbers (no decimal point) generally
+/-32,767 (2 bytes). OK for size of family, not OK for city size
Long Integer: (or long) only supports integers to +/- 2,147,483,647 (4 bytes)
Float: (or single) single precision floating point; again, be careful-- supports
decimal point but perhaps only 6 digits long with decimal moveable 38 places
(E38) (4 bytes)
Double: double precision floating point; the safest-- supports 12-15 digits with
decimal moveable up to 308 places (E308) (8 bytes)
Blob: binary large object, for special programming applications
Note terminology:
• Precision: the total number of digits (before plus after decimal)
• Scale: number of digits after decimal
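A minimal sketch of why the data type choice matters, using Python's standard `struct` module to mimic 2-byte integer and 4-byte single precision storage. This only illustrates the byte sizes listed above; as the slide notes, actual limits depend on the RDBMS, and the helper names are hypothetical.

```python
import struct


def fits_short(value):
    """True if value can be stored in a 2-byte signed integer."""
    try:
        struct.pack("<h", value)
        return True
    except struct.error:
        return False


def as_single(value):
    """Round-trip a number through 4-byte single precision storage."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


print(fits_short(5))        # True: size of family is fine
print(fits_short(1200000))  # False: city size does not fit
print(as_single(1234567.89) == 1234567.89)  # False: digits are lost in a Float
print(as_single(0.5) == 0.5)                # True: exactly representable
```

A value that needs more than about six or seven significant digits silently changes when stored as Float, which is the reason Double is described as the safest choice.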
Domains and Defaults
Why Use Them?
• Data Integrity: prevents entry of invalid (“obviously wrong”) data values
• Data Efficiency: choose from a set of valid values rather than type in each time
Domains define a set of legal values for a field’s attributes
• Range domain: specifies a valid range of values for numerical attributes
– A water pipe must be between 1 and 100 inches wide
• Coded value domain: specifies a valid set of values for an attribute. Can apply
to any type of attribute
– Parcels can only have RES or VAC land use values
• Domains are defined as a geodatabase property & then applied as appropriate
– Multiple objects in the same database may use the same domain
– May be applied to an entire field (attribute), or separately by subtype
Defaults are values automatically assigned when a feature is created
– Of course, may be changed during data entry/edit process
– Again, may be applied to an entire field (attribute), or separately by subtype
Provide a way by which “business rules” can be incorporated.
Lesson: Geodatabases contain more than just “data”!
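A minimal sketch of range domains, coded value domains, and defaults, using the two examples from this slide. The field names, the choice of VAC as a default, and the functions are assumptions for illustration; they are not the geodatabase API.

```python
# Hypothetical domain definitions mirroring the slide's examples.
RANGE_DOMAINS = {"pipe_width_in": (1, 100)}      # water pipe: 1 to 100 inches
CODED_DOMAINS = {"land_use": {"RES", "VAC"}}     # parcels: RES or VAC only
DEFAULTS = {"land_use": "VAC"}                   # assumed default value


def is_valid(field, value):
    """Check one value against the domain defined for its field, if any."""
    if field in RANGE_DOMAINS:
        low, high = RANGE_DOMAINS[field]
        return low <= value <= high
    if field in CODED_DOMAINS:
        return value in CODED_DOMAINS[field]
    return True  # no domain defined for this field


def new_feature(**values):
    """Create a feature: defaults are filled in first, then values are validated."""
    feature = {**DEFAULTS, **values}
    bad = [f for f, v in feature.items() if not is_valid(f, v)]
    if bad:
        raise ValueError(f"values outside domain: {bad}")
    return feature


print(new_feature(pipe_width_in=12))  # {'land_use': 'VAC', 'pipe_width_in': 12}
print(is_valid("land_use", "COM"))    # False
```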
Relationships and Relationship
Classes
• Contain “associations” between feature classes, or
between individual features within a feature class
• A join between feature classes may be stored in the gdb as
a relationship e.g. join between “parcel” and “owners”
files
• Topology may be stored in a relationship class
e.g. information on which Red River segments also
form Texas/Oklahoma state line
• Geometric networks may be stored in a relationship class,
e.g. water lines associated with water valves
Lesson: Geodatabases contain more than just “data”!
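A minimal sketch of the parcel-to-owner join that a relationship could store. The tables, key values, and field names are invented; a real relationship class is defined in the geodatabase, not in Python.

```python
# Hypothetical feature class rows (parcels) carrying a foreign key to owners.
parcels = [
    {"parcel_id": 101, "owner_id": 1},
    {"parcel_id": 102, "owner_id": 2},
    {"parcel_id": 103, "owner_id": 1},
]
# Hypothetical object class (owners), keyed by its key field.
owners = {1: {"name": "jane"}, 2: {"name": "jim"}}


def join(parcels, owners):
    """Attach each parcel's owner attributes by matching owner_id to the owner key."""
    return [{**parcel, **owners[parcel["owner_id"]]} for parcel in parcels]


joined = join(parcels, owners)
print(joined[0])  # {'parcel_id': 101, 'owner_id': 1, 'name': 'jane'}
```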
Spatial Reference for a Geodatabase
All feature classes within a feature dataset must have the same spatial reference.
• Coordinate System
– Datum
– Geographic (lat/long) or projected?
– Projection parameters: central meridian, standard parallels, coordinate system origin
(false easting and northing)
– Measurement (map) units: dd (for lat/long), feet, meters, etc. (for proj.)
• Spatial domain
– The allowable coordinate range for the geographic coordinates
• X/Y Domain: MinX, MaxX, MinY, MaxY (horizontal extent)
• Z Domain: Min, Max (vertical extent)
• M Domain: Min, Max (other parameter, e.g. distance from river mouth ) (can differ within
feature data set)
– Once created, the spatial domain for feature dataset/class cannot be changed.
– Data outside extent will require a new feature dataset or standalone feature class.
• Precision
– Number of system storage units (SU) per one map measurement unit (MU)
• If precision is 1 and mu= 1 meter ( 1 SU per MU), cannot record values less than 1 meter
• If precision is 100 and mu= 1 meter (100 SUs per MU), can record values
to 1/100 = .01 = 1 cm
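A minimal sketch of the precision arithmetic above, assuming coordinates are held as whole storage units counted from the domain minimum. The function names and the sample coordinate are hypothetical.

```python
def to_storage_units(coord, domain_min, precision):
    """Convert a map coordinate to a whole number of storage units."""
    return round((coord - domain_min) * precision)


def from_storage_units(units, domain_min, precision):
    """Convert storage units back to map units."""
    return domain_min + units / precision


x = 1234.56  # metres, invented sample value
for precision in (1, 100):
    su = to_storage_units(x, 0.0, precision)
    print(precision, su, from_storage_units(su, 0.0, precision))
```

With precision 1 the value comes back as a whole metre; with precision 100 the centimetres survive the round trip.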
Coverage File Structure
The Coverage
• Digital version of a single map sheet layer; generally contains one type of
map feature such as streets, parcels, or soils.
• Can contain both the coordinate/spatial data and the descriptive data for features
in a given geographic area.
• Additional attribute data about features (entities) can be stored in data base
tables using proprietary INFO relational data base system
– Allowed user to customize, organize and store substantial amounts of attribute data
and relate to spatial data
• Spatial data stored in indexed binary files for performance
• Full topological relationship information maintained: e.g. nodes that delimit a
line
– Permits sophisticated spatial analysis
• Coverage will be stored as a directory (folder) within a workspace. An identifier
(feature ID), a unique number for each feature in the coverage, ensures strict
correspondence between spatial and attribute data and between the various data
types (e.g. point feature ID also identifies the from or to node for an arc)
• Names for coverages are maximum 13 characters in length and cannot include
blanks or “special characters” (-,#, etc) other than under_score
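A minimal sketch of the naming rule above as a check: at most 13 characters, no blanks, and no special characters other than the underscore. The function is hypothetical, and any further restrictions ArcInfo may impose are not modeled here.

```python
import re

# Letters, digits and underscore only, 1 to 13 characters.
_COVERAGE_NAME = re.compile(r"[A-Za-z0-9_]{1,13}")


def is_valid_coverage_name(name):
    """Apply only the rule stated on the slide (length and allowed characters)."""
    return _COVERAGE_NAME.fullmatch(name) is not None


print(is_valid_coverage_name("soil_2008"))       # True
print(is_valid_coverage_name("land use"))        # False: contains a blank
print(is_valid_coverage_name("tracts-1"))        # False: contains a dash
print(is_valid_coverage_name("a_very_long_name"))  # False: 16 characters
```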
Workspace
• Coverages must be stored in workspaces
• A workspace is the work area used during an
ARC/INFO session.
• Within the computer file system, the workspace is
a directory (folder) containing one or more
geographic data sets (e.g., coverage, tin, grid), a
local INFO database, and other supporting data.
• at a minimum it is a folder containing an INFO
subfolder (subdirectory)
• More than one user can read data from the same
workspace; however, it is strongly recommended
that only one user access a workspace for creating
or updating data.
Role of Feature IDs
File Structure: Coverage
• ArcInfo coverages consist of a series of files in two folders
– The INFO folder
– And a folder named the same as the coverage (e.g. water, soil)
– both are at the same directory level, which is called a “workspace”.
• The INFO folder contains the feature attribute tables and related tables for all
coverages in that workspace.
• Unfortunately, file names do not correspond to the names of files we work with!
ARC/INFO Spatial Database Structure (coverage)
[Figure: workspace containing an INFO folder and a Soil coverage folder.
Labels: POLYGON, ARC, AAT, PAT, TIC, BND, etc.]
These are the files we work with within ArcInfo:
--PAT: Polygon (or Point) attribute table
--AAT: Arc Attribute Table
--BND: bounding box
--TIC: tie coverage to real world location
Manipulating Coverage File Structure
• Ramifications of Coverage File Structure
– Do not drag and drop, cut, copy, paste, delete, or rename a
coverage from the NT explorer window. Any of these actions may
result in corruption (and loss) of not only the coverage
manipulated, but of the entire workspace.
– Must use ArcCatalog GUI application, or use ArcInfo Workstation
and issue Arc commands (see next slide for full list) within the
relevant workspace to work with coverages:
• Exceptions:
– Can drag and drop, cut, copy, paste, and delete the entire
workspace
– Can drag and drop, cut, copy, paste, and delete the interchange file
(e00) created by exporting the coverage
• Naming Coverages
– Names for coverages are maximum 13 characters in length and
cannot include blanks or “special characters” (-,#, etc) other than
under_score
Topology Maintenance for Coverages
• BUILD and CLEAN are the essential commands for
creating/maintaining topology and defining/updating feature
attribute tables for coverages
• You must BUILD topology after creation of a new coverage or
after modifications to the coverage such as in ArcEdit or after
changing the projection.
• You must CLEAN a coverage if the BUILD command detects
errors. CLEAN will correct geometric relations (and thus changes
spatial structure and/or point locations) using the parameters you
specify, by
• adding nodes at intersections
• fixing dangling nodes (if within dangle length)
• combining nodes (if within fuzzy tolerance)
• BUILD constructs topology and defines and updates feature
attribute tables for a coverage. After creating a coverage you will
not have attribute tables unless topology is constructed.
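A minimal sketch of the idea behind combining nodes within a fuzzy tolerance. This greedy snapping is an illustration only and is not the algorithm CLEAN actually uses; the node coordinates and tolerance are invented.

```python
import math


def snap_nodes(nodes, fuzzy_tolerance):
    """Snap each node to the first earlier node lying within fuzzy_tolerance."""
    kept = []      # nodes that survive as distinct locations
    snapped = []   # output location for every input node, in order
    for x, y in nodes:
        for kx, ky in kept:
            if math.hypot(x - kx, y - ky) <= fuzzy_tolerance:
                snapped.append((kx, ky))  # point location changes here
                break
        else:
            kept.append((x, y))
            snapped.append((x, y))
    return snapped


nodes = [(0.0, 0.0), (0.004, 0.003), (5.0, 5.0)]
print(snap_nodes(nodes, 0.01))  # [(0.0, 0.0), (0.0, 0.0), (5.0, 5.0)]
```

The second node moves onto the first, which shows why cleaning can alter point locations and why the tolerance should be chosen with care.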
Feature Attribute Tables
• When Arc/INFO constructs topology for a coverage, topological and
geometric properties are defined and stored in a file called the feature
attribute table.
• Depending on the feature type (e.g., point, arc, polygon), the contents of
feature attribute tables differ; however, they all have some characteristics
in common, including
– Feature attribute tables are INFO data files
– Each feature in a coverage occupies one record or row of data in the feature
attribute table
– Attribute data comprise columns (items) placed after the internally stored data
– You can have more than one feature attribute table for a coverage, e.g. arcs
and polygons define both streets and blocks.
– But you cannot have both points and polygons in the same coverage.
• Common feature attribute tables:
– Points - Point attribute table - PAT
– Arcs - Arc attribute table - AAT
– Polygons - Polygon attribute table - PAT
Data Stored for Points
• Coordinate information is stored in a LAB file. Each point is described
by a single x,y coordinate pair and an internal sequence number.
• A point attribute table (PAT) is created when topology is constructed for
a point coverage. The PAT is used to hold the attribute data about points.
There is one record (row) in the PAT for each point. The record is
related to the point by the sequence number.
• At a minimum the PAT contains four items
– AREA: Holds the area of a polygon. The value is 0 for points
– PERIMETER: Holds the perimeter of a polygon. The value is 0 for points
– <cover>#: Arc/Info assigned unique internal sequence number of the
point feature in the LAB file. Same as RECNO - do not tamper
with these values (sometimes called “pound id”)
– <cover>-id: User assigned unique feature ID for each point (sometimes
called “dash id” or “user id”)
You can add items (columns) to the PAT after the <cover>-id item.
Data Stored for Arcs
• Coordinate information is stored in an ARC file. Each arc is described in a
single record by a series of x,y coordinates, the from-node and to-node (for
arc-node topology) and an internal sequence number.
• An arc attribute table (AAT) is created when topology is constructed for an arc
coverage. There is one record in the AAT for each arc in the coverage. The record
is related to the feature (ARC file) by the internal sequence number.
• At a minimum the AAT contains seven items
– FNODE#: Internal sequence number of the from-node
– TNODE#: Internal sequence number of the to-node
– LPOLY#: Internal sequence number of the left polygon; set to 0 if the
coverage does not have polygon topology
– RPOLY#: Internal sequence number of the right polygon; set to 0 if the
coverage does not have polygon topology
– LENGTH: Length of the arc in coverage units
– <cover>#: Arc/Info assigned unique internal sequence number of the
arc in the ARC file. NEVER modify this value.
– <cover>-id: User assigned unique feature ID for each arc
• You can add items (attributes) to the AAT after the <cover>-id item.
Data Stored for Polygons (PAT)
• A polygon is defined by the arcs comprising its border and interior
islands, with polygon-arc topology stored in the PAL file,
arc-node/left-right topology stored in the ARC file, and a label point
inside the polygon stored in the LAB file. The label point id identifies
the polygon and is consistent between files.
• A polygon attribute table (PAT) is created when topology is
constructed for a polygon coverage. The PAT is used to hold the
attribute data about polygons. There is one record in the PAT for each
polygon. The record is related to the polygon by the label point id.
• At a minimum the PAT contains four items (same as point attribute table)
– AREA: Holds the area of a polygon, in coverage units.
– PERIMETER: Holds the perimeter of a polygon, in coverage units.
– <cover>#: Arc/Info assigned unique internal sequence number of the
polygon feature in the LAB, ARC and PAL files
– <cover>-id: User assigned unique feature ID for each polygon
You can add items (attributes) to the PAT after the <cover>-id item.
• The first polygon is always the universal polygon which represents the
coverage boundary.
Polygon data stored in PAT
Understanding Item Definitions
• An item (variable stored in a column) is defined by four
characteristics
– name - the name of the item, up to 16 characters in length
• e.g. cover-id, landuse, pop97, etc.
– type - the data types used to store values
• I - integer (one byte per digit)
• B - binary integer (requires less storage than I types)
• C - character
• N - floating point (e.g. decimal) number stored as one byte per digit
• F - floating point binary number
• D - date (e.g. yyyymmdd)
– width - the width of the item in bytes required for storage
• I - 1-16 bytes
• B - either 2 or 4 bytes
• C - 1 to 320 characters
• N - 1 to 16 digits
• F - 4 for single, 8 for double precision
• D - always 8 bytes
For F or N also provide the number of decimal places for real numbers
– Output width - the width of item values when displayed
An Example of Item Definitions
DATA VALUE     TYPE                    ABBREV.   WIDTH
Main Street    Character               C         1 to 320
10/15/1990     Date                    D         8
23675          Integer                 I         1-16
347.22         Numeric                 N         1-16
1344719822     Binary number           B         2 or 4
99378164.788   Binary floating point   F         4 or 8
Maximum 4 byte binary is 2,147,483,647; maximum 4 byte integer is 9,999
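A minimal sketch of the storage arithmetic behind the I and B item types: an I item spends one byte per decimal digit, while a B item packs the value in binary. The function names are hypothetical, and the B calculation assumes a signed two's-complement integer.

```python
def max_integer_item(width):
    """Largest value of an I item: one decimal digit per byte of width."""
    return 10 ** width - 1


def max_binary_item(width):
    """Largest value of a signed B item (width of 2 or 4 bytes assumed)."""
    return 2 ** (8 * width - 1) - 1


print(max_integer_item(4))  # 9999
print(max_binary_item(4))   # 2147483647
print(max_binary_item(2))   # 32767
```

For the same four bytes, the binary type holds values several hundred thousand times larger, which is why B requires less storage than I for large numbers.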
How to Convert Between File Formats:
multiple different ways!
In ArcCatalog:
• By importing from one format into another
– E.g. import shapefile into geodatabase
• By exporting from one format into another
– E.g. export shapefile to a geodatabase
(Each achieves same thing. gdb must already exist)
In ArcMap:
• ArcMap can read and overlay all three data types
• Can use data/export to output (and thus potentially convert) to
a gdb feature class or a shapefile (but not a coverage)
– Note: will read coverages but cannot export to a coverage
In ArcToolbox:
• The greatest number of conversion options are available here.
Coordinate Systems
Coordinate Systems
• All spatial data is in a coordinate system
– You must know what it is!
• Often loosely, but incorrectly, called a map projection
• Coordinate System consists of two main things:
– Datum: normally NAD 27 or NAD 83
• The same location may have different coordinates solely because of the datum
– Projection
• The transformation by which 3D lat/long is converted to 2D X/Y Cartesian values
– parameters normally required to describe the exact nature of the projection
– measurement units: usually feet or meters, also must always be specified
• A “geographic projection” uses lat/long values as X/Y Cartesian coordinates (not
recommended)
• Thus, for any spatial data set, knowing simply the name of the
projection is not sufficient. Must also know:
– Datum
– Parameter(s)
– Measurement units
We often say map projection, when we really mean coordinate system!
Define versus Project: a critical distinction!
Define
• Informs the ArcGIS system of the data’s actual, current projection.
• Is essentially metadata. For shapefiles or coverages, saved in a .prj
file
• Does not change the actual data.
• Define it wrong, and all subsequent analyses or projections of that
data will be wrong!
• The existing projection is specified with Define command
Project
• Actually projects the data. Think of this as “reproject.”
• The data does change.
• The current projection (input) must already be known by the ArcGIS
system,
– That is, you have to do a Define first, if somebody has not already done it
• The desired projection (output) is specified with Project command.
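A minimal sketch of the Define versus Project distinction using a toy data set. The dictionary layout, the function names, and the use of a spherical Mercator formula are assumptions for illustration; datums are ignored entirely, and this is not how the ArcGIS tools are implemented.

```python
import math

R = 6378137.0  # sphere radius in metres used by this toy projection


def define(dataset, crs):
    """Define: record the coordinate system as metadata; coordinates are untouched."""
    return {**dataset, "crs": crs}


def project_to_mercator(dataset):
    """Project: compute new coordinates; requires the input system to be defined."""
    if dataset.get("crs") != "geographic":
        raise ValueError("define the input coordinate system first")
    coords = [
        (R * math.radians(lon),
         R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2)))
        for lon, lat in dataset["coords"]
    ]
    return {"coords": coords, "crs": "spherical_mercator"}


raw = {"coords": [(-96.75, 32.99)]}       # lon/lat in decimal degrees, invented
defined = define(raw, "geographic")       # data unchanged, metadata added
projected = project_to_mercator(defined)  # data changed

print(defined["coords"] == raw["coords"])    # True
print(projected["coords"] == raw["coords"])  # False
```

Calling `project_to_mercator(raw)` raises an error because nothing has been defined yet. If the wrong system were defined instead, the function would run and return wrong coordinates without complaint, which is the more dangerous case.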
How to Project (and Define) Data:
multiple different ways!
In ArcToolbox
• Generally, use tools in ArcToolbox to project data
• Tools to DEFINE and PROJECT all data types are available
• Coordinate system must be “defined” before running Project
In ArcCatalog
• You can define the projections for shapefiles and coverages, but you cannot generally
reproject the original data without multiple steps.
• Providing that it is already defined, data brought into a new or existing geodatabase
feature dataset will automatically be reprojected to the coordinate system of the feature
dataset as it is saved there
– It can be exported in this (potentially) new projection, if desired.
• In effect, this “projects” the data.
In ArcMap
• Providing that it is already defined (projection system known to ArcGIS), data brought
into a data frame (whose coordinate system is also known) will be reprojected in
memory to the coordinate system of the frame for display.
– It can be exported in this (potentially) new projection, if desired.
• In effect, this “projects” the data.
– Note “double proviso:” known coordinate system for data inputted and for frame.
Warning!
• Failure to correctly deal with datums and
projection is the single major source of
problems in GIS!
• Assuming that “the software will take care
of it” is an invitation for eventual disaster!
Appendix
ESRI Vector Definitions:
Primitives
• label point: a point defined by a single pair of x,y coordinates
– point feature (tree, airport)
– polygon User-ID
• arc: line defined by ordered set of x,y coordinate pairs
– may be straight or curved
• vertices: points on an arc, which are not nodes; used to define curves
• node: endpoints of an arc, or intersection of two arcs, including
features at the intersection (e.g. stop lights)
• polygon: an area defined by the arcs making up its boundary
[Figure labels: Vertex, Node]
ESRI Vector Definitions: Topology
The spatial relationships between adjacent or connected primitives
(arcs, nodes, polygons, points).
• arcs have direction, therefore have:
– from-node/to-node
– left polygon/right polygon
– left side/right side feature attributes (e.g. address range)
– first from-node and last to-node in polygon must be identical.
• route: linear feature made up of two or more arcs
– may be divided into sections (arcs or portions of arcs)
• region: area made up of two or more polygons
[Figure labels: from-node, to-node (also, to-node for arc # 3), right polygon,
Route, Sections, Arcs, Three polys, Region = Poly 2 & 3]
ArcView & ARC/INFO
Additional Terms/Concepts
• annotation: feature labels & names
• tic: points on map which are known locations on earth’s surface; used for
registration; allow all coverages to be related to a common coord. system
• links: ‘forced’ connections or ‘snaps’ so features line up (e.g. at map edges)
• tile: map subdivision used for storage/data handling; can be regular (squares)
or irregular (e.g. a county)
• map extent: outer limits of map: xmin, xmax, ymin, ymax
[Figure label: Main Street]