What is Stata - Firestone Library | Princeton University

SOC 505 Research Seminar in
Empirical Investigation
Data Resources at
Princeton University
Data and Statistical Services
DSS Computer Lab, A-16-H-3
Research and Instructional Services
Firestone Library
[email protected]
http://dss.princeton.edu
Contacts
 Mary George ([email protected])
Senior Reference Librarian (data)
 Susan White ([email protected])
Sociology Librarian (literature)
 Bobray Bordelon
Data Librarian
 Oscar Torres-Reyna & 2nd position vacant
Data & Statistics Consultant
[email protected]
Coverage:
 Time lag: data files are often released 2+ years
after the survey is conducted.
 Sub-national data: mainly the U.S., plus other large
nations (China, India, Canada) where such data
are collected. The U.S. has many state-level
surveys, some covering large cities
(N.Y. and L.A.), and some case studies
(primarily on crime) for other cities and areas.
Numeric Data Holdings
 Micro-data: Survey or administrative
data about an entity (e.g., person,
family, establishment).
 Summary statistics: Aggregated
counts of survey or administrative
data, e.g., the number of persons in an area.
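As a minimal Stata sketch of this distinction: the person-level rows below are micro-data, and collapsing them produces summary statistics (a count of persons per area). The variable names `area` and `age` and all values are invented for illustration.

```stata
* Hypothetical micro-data: one row per person (values are made up)
clear
input str2 area byte age
"NJ" 34
"NJ" 51
"NY" 27
"NY" 45
"NY" 62
end

* Summary statistics: number of persons and mean age in each area
collapse (count) persons=age (mean) mean_age=age, by(area)
list, clean
```

After `collapse`, the data set holds one row per area rather than one row per person, so the individual records are no longer recoverable.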
A Few Definitions (adapted
from ICPSR)
longitudinal or panel study
 same group of individuals is interviewed at
intervals over a period of time. Note that some
cross-sectional studies are done regularly. For
instance, the General Social Survey and the
Current Population Survey: Annual Demographic
File are conducted once a year, but different
individuals are surveyed each time. Such a
study is not a true longitudinal study. An
example of a longitudinal study is the National
Longitudinal Survey of Labor Market Experience,
in which the same individuals have been
followed over time.
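A short Stata sketch may help show why the distinction matters. It assumes a hypothetical long-format file with one row per person per year; `id`, `year`, and `income` are made-up names and values. Because the same `id` recurs across years, within-person change can be computed. In a repeated cross-section the ids would not recur, so this step would not be possible.

```stata
* Hypothetical panel data: the same two persons observed in three years
clear
input int id int year double income
1 2001 30
1 2002 32
1 2003 35
2 2001 40
2 2002 41
2 2003 45
end

* Declare the panel structure: id identifies persons, year the wave
xtset id year

* Year-to-year change in income for each person
generate d_income = D.income
list, sepby(id)
```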
A Few Definitions (from ICPSR)
cross-sectional study
 data from particular subjects are obtained
only once. Contrast with longitudinal
studies, in which a panel of individuals is
interviewed repeatedly over a period of
time. Note that questions in a cross-sectional study can apply to previous time
periods.
A Few Definitions (from ICPSR)
hierarchical file
 contains information collected on multiple units of
analysis in different record types. For example, the
physical housing structure may be 1 unit, and individual
persons within the structure are another. An example is
the Current Population Survey: Annual Demographic
File which has household, family, and person units of
analysis. Studies that include data for different units of
analysis often link those units to each other so that, for
instance, one can analyze the persons as they group in
a structure. Such studies are sometimes referred to as
having a relational structure.
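The sketch below illustrates the idea in Stata under stated assumptions: a hypothetical file in which a `rectype` variable marks household (H) and person (P) records, and `hhid` ties persons to their household. None of these names or values come from an actual study.

```stata
* Hypothetical hierarchical file: household (H) and person (P) records
* sit in one file, and hhid ties each person to a household.
* For H records, value = number of rooms; for P records, value = age.
clear
input str1 rectype int hhid byte value
"H" 1 3
"P" 1 34
"P" 1 36
"H" 2 1
"P" 2 58
end

* Work at the person level: keep only the person record type
keep if rectype == "P"
rename value age

* Analyze persons as they group within a household
bysort hhid: generate hhsize = _N
bysort hhid: egen mean_age = mean(age)
list hhid age hhsize mean_age, sepby(hhid)
```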
A Few Definitions (from ICPSR)
relational structure
 A study that includes different units of analysis, particularly when those
units are not arranged in a strict hierarchy as they are in a
hierarchical file, has a relational structure. Note that the data
could be arranged in several different physical structures to
handle such a data structure. For instance, each unit of
analysis might be stored in a separate rectangular file with
identification numbers linking each case to the other units; or,
the different units of analysis might be stored in one large file
with a hierarchical file structure; or the different units could be
stored in a special database structure used by a relational
database management system. An example of a study with a
relational structure is the Survey of Income and Program
Participation, which has 8 or more record types; these record
types are related to each other but are not all members of a
hierarchy of membership. For instance, there are record types
for household, family, person, wage and salary job, and
general income amounts.
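The first storage option above (separate rectangular files linked by identification numbers) can be sketched in Stata as follows. The person and job files, the linking variable `pid`, and all values are hypothetical.

```stata
* Hypothetical job records: a person may hold several jobs
clear
input int pid byte jobno double wage
1 1 20
1 2 15
2 1 30
end
tempfile jobs
save `jobs'

* Hypothetical person records: one row per person
clear
input int pid byte age
1 34
2 58
3 41
end

* Link each person to all of that person's job records via pid
merge 1:m pid using `jobs'
list, sepby(pid)
```

The `_merge` variable that Stata creates records which rows matched; here person 3 has no job record and appears only from the person file.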
A Few Definitions (from ICPSR)
rectangular file
 contains the same number of card images
or the same physical record length for each
respondent or unit of analysis. Contrast
with hierarchical files.
Coverage:
 International macro-economic, social,
political, & financial indicators.
 National surveys, statistics for U.S.,
many European nations, public
opinion surveys from many nations,
internationally sponsored surveys
dealing with health, fertility, and
nutrition.
Major Data Archive Subscriptions
 Inter-university Consortium for
Political and Social Research
 Roper Center for Public Opinion
Research
 Social Science Electronic Data
Archive
More data!
 Economic, business and
financial data services.
http://firestone.princeton.edu/econlib
 DSS Subject and Regional
Guides.
 CPANDA www.cpanda.org
Even more data!
 Federal, state, & independent
government agencies
 Other data archives from around the
world
 Academic institutions, scholars, think
tanks, & private organizations.
 Consult the Main Catalog.
 Google
Literature about data sets:
 A good way to find useful data
sets is to look at the literature of
the field.
 ICPSR’s Bibliography of Data-related
Literature
 Sociological Abstracts
 Annual Review of Sociology
Data Analysis Options
 Refer to published statistics
 Use on-line analysis tools
 Download data and use a statistical
program
On-line Analysis Tools
Current Population
Survey (Unicon version)
General Social Survey (in
SDA format)
ICPSR SDA files
When you download data…
 Documentation
 Survey – sampling methods
 weights
 Variables
 Format
 Size
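To see why the weights item matters, here is a small Stata sketch. It assumes the study's documentation names a weight variable, here called `wt`; that name, the `employed` variable, and the values are made up. Which weight type is appropriate depends on the survey design described in the codebook.

```stata
* Hypothetical respondents: the third one represents more of the population
clear
input byte employed double wt
1 100
1 100
0 300
end

* Unweighted share employed
summarize employed

* Weighted share employed, using the hypothetical weight variable
summarize employed [aweight=wt]
display r(mean)
```

The two means differ because the weighted version lets each respondent count in proportion to `wt`.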
Downloading data
Statistical Software Extensions
                        SPSS          Stata          SAS
Command File            .sps          .do (.dct)     .sas
Data File (transport)   .sav, .por    .dta           .sas7bdat, .xport (.xpt)
Log File                .spo          .smcl, .log    .log
Output File             .spo          .smcl, .log    .out
Unformatted files: .raw .txt .asc .dat
Reading Data into Stata
Common Data Format                  Extension       Command
Stata                               .dta            use
Text (ASCII), Free/Fixed Columns    .txt            infile using
Text (ASCII), Comma Separated       .csv            insheet using
Text (ASCII), Fixed Columns         .dat            infix using
Excel                               .xls, .xlsx     import excel
SAS Export                          .xport, .xpt    fdause
Database                            .mdb            dbms
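Two of the commands listed above can be tried with the self-contained sketch below. It writes a tiny comma-separated file first so that nothing needs to be downloaded; the file names `example_read.csv` and `example_read.dta` and the values are hypothetical. Newer Stata releases document `import delimited` for the same job as `insheet`.

```stata
* Write a tiny comma-separated file so the example is self-contained
file open fh using "example_read.csv", write replace text
file write fh "id,age" _n
file write fh "1,34" _n
file write fh "2,58" _n
file close fh

* Comma-separated text: read with insheet (first row holds the names)
insheet using "example_read.csv", comma clear
describe

* Stata format: save a .dta copy, then load it back with use
save "example_read.dta", replace
use "example_read.dta", clear
list

* Clean up the example files
erase "example_read.csv"
erase "example_read.dta"
```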
Microdata Structure
 Cross-sectional
 Hierarchical
 Time-series
 Panel
Transforming Data
 Adding Variables
 merge
 Adding Cases
 append
 Reshaping Data
 reshape
 Transposing Data
 xpose
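A minimal Stata sketch of adding cases and adding variables, using invented data. The tempfile names `morecases` and `morevars`, the key variable `id`, and all values are assumptions made for the example.

```stata
* Hypothetical file with extra cases (same variables, new ids)
clear
input int id byte age
3 41
4 29
end
tempfile morecases
save `morecases'

* Hypothetical file with an extra variable for the same ids
clear
input int id double income
1 30
2 40
3 35
4 50
end
tempfile morevars
save `morevars'

* Master data
clear
input int id byte age
1 34
2 58
end

* append adds cases (rows)
append using `morecases'

* merge adds variables (columns), matching on id
merge 1:1 id using `morevars'
list
```

It is worth tabulating `_merge` after every merge to confirm that the records matched as expected.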
[Figure: diagrams of append, merge, reshape, and transpose]
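Reshaping and transposing can be sketched in Stata as follows, assuming a hypothetical wide file with one income column per year. The stub name `inc` and all values are invented.

```stata
* Hypothetical wide data: one row per person, one income column per year
clear
input int id double inc2001 double inc2002
1 30 32
2 40 41
end

* reshape: wide to long gives one row per person-year
reshape long inc, i(id) j(year)
list, sepby(id)

* reshape back to wide
reshape wide inc, i(id) j(year)

* xpose: swap rows and columns; old variable names go into _varname
xpose, clear varname
list
```

`reshape` keeps the meaning of each variable and changes only the layout, whereas `xpose` turns every observation into a variable (`v1`, `v2`, and so on), which is mainly useful for purely numeric tables.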
Using Set Up Files
Define variables and values
 Use set up files
   do file (command file): .do
   dictionary file: .dct
   data file: .dat, .txt
 Read in a few variables
 Create your own set up
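How the three set up files fit together can be sketched in Stata as below. This is an illustration under stated assumptions: the file names `example_setup.dat` and `example_setup.dct`, the column layout, and the values are all made up, and set up files distributed with real studies will look different.

```stata
* Write a tiny fixed-column data file (hypothetical layout:
* columns 1-3 = id, columns 4-5 = age, columns 6-10 = income)
file open fh using "example_setup.dat", write replace text
file write fh "0013430000" _n
file write fh "0025841500" _n
file close fh

* Write the dictionary file that says where each variable sits
file open fh using "example_setup.dct", write replace text
file write fh "infix dictionary using example_setup.dat {" _n
file write fh "    id     1-3" _n
file write fh "    age    4-5" _n
file write fh "    income 6-10" _n
file write fh "}" _n
file close fh

* The do-file step: read the data through the dictionary
infix using "example_setup.dct", clear
list

* Read in only a few variables by naming their columns directly
infix id 1-3 age 4-5 using "example_setup.dat", clear
list

* Clean up the example files
erase "example_setup.dat"
erase "example_setup.dct"
```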
How to Access Stata
 Locally –
 Princeton (OIT) cluster computers,
 Data and Statistical Services Computer
Lab, Firestone Library A-16-H-3
 Remotely –
Through Research Computing
(nobel). Use Secure Shell.
Stata Remote Access
 Register for a server account
http://helpdesk.princeton.edu/kb/display.plx?id=9682
 Download Secure Shell (Windows)
http://helpdesk.princeton.edu/kb/display.plx?id=4104
 X terminal (Macintosh)
Using Unix Stata
 Accessing files
 save them in the “H” drive from Windows
 use Secure Shell File Transfer
 Unix Stata
 interactive
type stata at the command prompt
 background
nohup stata -b do yourdofilename.do &
https://dss.wikidot.com/stata-batch-job
 x-windows
http://dss.wikidot.com/system:page-tags/tag/xwindow
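For the background option, the do-file has to run from start to finish without anyone at the keyboard. Below is a minimal sketch of what `yourdofilename.do` might contain. It uses the `auto` data set that ships with Stata as a stand-in for your own file, and the analysis commands are placeholders. In batch mode Stata typically writes the output to a log named after the do-file (`yourdofilename.log`) in the working directory.

```stata
* yourdofilename.do: a hypothetical do-file suited to background runs
clear all

* Never pause for --more-- when nobody is watching the screen
set more off

* Example data shipped with Stata stand in for your own file
sysuse auto, clear

* Placeholder analysis: descriptive statistics and a regression
summarize price mpg weight
regress price mpg weight
```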
Demonstration
 log in using secure shell
 save a file
 using FileExplorer
 using SSH
 run interactive stata
 submit a do file