What is Stata - Firestone Library | Princeton University
SOC 505 Research Seminar in Empirical Investigation
Data Resources at Princeton University
Data and Statistical Services
DSS Computer Lab, A-16-H-3
Research and Instructional Services
Firestone Library
[email protected]
http://dss.princeton.edu
Contacts
Mary George ([email protected])
Senior Reference Librarian (data)
Susan White ([email protected])
Sociology Librarian (literature)
Bobray Bordelon
Data Librarian
Oscar Torres-Reyna (second position vacant)
Data & Statistics Consultant
[email protected]
Coverage:
Time lag: data files are often released two or more years after a survey is conducted.
Sub-national data: chiefly for the U.S., plus other large nations (e.g., China, India, Canada) where such data are collected. The U.S. has many state-level surveys, some dealing with large cities (N.Y. and L.A.), and some case studies (primarily crime) for other cities and areas.
Numeric Data Holdings
Micro-data: survey or administrative data about an individual entity (e.g., a person, family, or establishment).
Summary statistics: aggregated counts from survey or administrative data (e.g., the number of persons in an area).
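To make the distinction concrete, the Python sketch below (an illustration, not Stata code) aggregates invented person-level micro-data into a summary statistic: the number of persons in each area. The records and the field names `person_id` and `area` are hypothetical.

```python
from collections import Counter

# Hypothetical person-level micro-data: one record per person.
microdata = [
    {"person_id": 1, "area": "Mercer"},
    {"person_id": 2, "area": "Mercer"},
    {"person_id": 3, "area": "Essex"},
    {"person_id": 4, "area": "Mercer"},
    {"person_id": 5, "area": "Essex"},
]


def persons_per_area(records):
    """Aggregate person records into a count of persons per area."""
    return dict(Counter(record["area"] for record in records))


summary = persons_per_area(microdata)
print(summary)  # {'Mercer': 3, 'Essex': 2}
```

The reverse is not possible: the area counts cannot reproduce the individual records, which is why micro-data support a wider range of analyses.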
A Few Definitions (adapted from ICPSR)
longitudinal or panel study
A study in which the same group of individuals is interviewed at intervals over a period of time. Note that some cross-sectional studies are done regularly. For instance, the General Social Survey and the Current Population Survey: Annual Demographic File are conducted once a year, but different individuals are surveyed each time. Such a study is not a true longitudinal study. An example of a longitudinal study is the National Longitudinal Survey of Labor Market Experience, in which the same individuals have been followed over time.
cross-sectional study
A study in which data from particular subjects are obtained only once. Contrast with longitudinal studies, in which a panel of individuals is interviewed repeatedly over a period of time. Note that questions in a cross-sectional study can apply to previous time periods.
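The practical difference between the two designs shows up in the identifiers: in a panel the same respondent IDs recur across waves, while a repeated cross-section draws new respondents each time. The Python sketch below (an illustration, not Stata code) checks for that overlap; the ID sets and the helper name `is_panel` are hypothetical.

```python
# Hypothetical respondent IDs from two annual survey waves.
wave_2000 = {101, 102, 103, 104}
wave_2001_panel = {101, 102, 103}        # same people re-interviewed (one lost to attrition)
wave_2001_cross = {205, 206, 207, 208}   # a fresh sample of different people


def is_panel(*waves):
    """Return True if at least one respondent ID appears in every wave.

    Assumes IDs identify the same person across waves; independent samples
    that happen to reuse ID numbers would fool this check.
    """
    return bool(set.intersection(*waves))


print(is_panel(wave_2000, wave_2001_panel))  # True
print(is_panel(wave_2000, wave_2001_cross))  # False
```

The check is only as good as the identifiers: the study documentation says whether an ID links the same person across waves.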
hierarchical file
A file that contains information collected on multiple units of analysis in different record types. For example, the physical housing structure may be one unit, and the individual persons within the structure another. An example is the Current Population Survey: Annual Demographic File, which has household, family, and person units of analysis. Studies that include data for different units of analysis often link those units to each other so that, for instance, one can analyze persons as they are grouped within a structure. Such studies are sometimes referred to as having a relational structure.
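Reading a hierarchical file means telling record types apart and linking each lower-level record to its parent. The Python sketch below uses an invented two-level layout (household "H" records, each followed by its person "P" records, with column positions given in the comments); real files such as the CPS define their own record layouts in the codebook.

```python
# Hypothetical hierarchical file. Column 1 holds the record type:
#   H (household): cols 2-5 household ID, cols 6-7 state
#   P (person):    cols 2-5 household ID, col 6 person number, cols 7-8 age
# Each household record is followed by the records of its persons.
raw_lines = [
    "H0001NJ",
    "P0001134",
    "P0001229",
    "H0002NY",
    "P0002151",
]


def parse_hierarchical(lines):
    """Group each person record under the household record that precedes it."""
    households = []
    for line in lines:
        record_type = line[0]
        if record_type == "H":
            households.append({"hh_id": line[1:5], "state": line[5:7], "persons": []})
        elif record_type == "P":
            person = {"hh_id": line[1:5], "person_no": int(line[5]), "age": int(line[6:8])}
            # A person must belong to the most recent household record.
            if not households or households[-1]["hh_id"] != person["hh_id"]:
                raise ValueError(f"person record without matching household: {line!r}")
            households[-1]["persons"].append(person)
        else:
            raise ValueError(f"unknown record type: {record_type!r}")
    return households


for hh in parse_hierarchical(raw_lines):
    print(hh["hh_id"], hh["state"], len(hh["persons"]))
# 0001 NJ 2
# 0002 NY 1
```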
relational structure
A study that includes different units of analysis, particularly when those units are not arranged in a strict hierarchy as they are in a hierarchical file, has a relational structure. Note that the data could be arranged in several different physical structures to handle such a data structure. For instance, each unit of analysis might be stored in a separate rectangular file with identification numbers linking each case to the other units; or the different units of analysis might be stored in one large file with a hierarchical file structure; or the different units could be stored in a special database structure used by a relational database management system. An example of a study with a relational structure is the Survey of Income and Program Participation, which has eight or more record types; these record types are related to each other but are not all members of a hierarchy of membership. For instance, there are record types for household, family, person, wage and salary job, and general income amounts.
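The first arrangement mentioned above (separate rectangular files linked by identification numbers) is what a relational database manages. As a sketch, Python's built-in `sqlite3` module can hold three invented record types (household, person, job) in separate tables and link them with joins; the table and column names are assumptions for illustration, not SIPP's actual layout.

```python
import sqlite3

# Each unit of analysis lives in its own rectangular table; ID columns link them.
# Table and column names are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE household (hh_id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, hh_id INTEGER, age INTEGER);
    CREATE TABLE job (job_id INTEGER PRIMARY KEY, person_id INTEGER, wage REAL);
""")
con.executemany("INSERT INTO household VALUES (?, ?)", [(1, "NJ"), (2, "NY")])
con.executemany("INSERT INTO person VALUES (?, ?, ?)",
                [(10, 1, 34), (11, 1, 29), (20, 2, 51)])
# Person 10 holds two jobs, person 11 none, person 20 one.
con.executemany("INSERT INTO job VALUES (?, ?, ?)",
                [(100, 10, 18.5), (101, 10, 12.0), (102, 20, 25.0)])

# Link the three record types: jobs -> persons -> households.
rows = con.execute("""
    SELECT h.hh_id, h.state, COUNT(j.job_id) AS n_jobs,
           COALESCE(SUM(j.wage), 0) AS total_wage
    FROM household AS h
    JOIN person AS p ON p.hh_id = h.hh_id
    LEFT JOIN job AS j ON j.person_id = p.person_id
    GROUP BY h.hh_id, h.state
    ORDER BY h.hh_id
""").fetchall()

for row in rows:
    print(row)
# (1, 'NJ', 2, 30.5)
# (2, 'NY', 1, 25.0)
```

The LEFT JOIN keeps person 11, who has no job record, so the household is not dropped just because one member lacks a lower-level record.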
rectangular file
A file that contains the same number of card images or the same physical record length for each respondent or unit of analysis. Contrast with hierarchical files.
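Household-and-person data can also be stored rectangularly by repeating the household fields on every person record, so that every record has the same length. The Python sketch below does that for invented data; the layout (columns 1-4 household ID, 5-6 state, 7 person number, 8-9 age) is hypothetical.

```python
# Hypothetical households, each with (person number, age) pairs.
households = [
    {"hh_id": "0001", "state": "NJ", "persons": [(1, 34), (2, 29)]},
    {"hh_id": "0002", "state": "NY", "persons": [(1, 51)]},
]


def to_rectangular(hhs):
    """Write one fixed-length record per person, repeating the household fields.

    Assumed layout: cols 1-4 household ID, 5-6 state, 7 person number, 8-9 age.
    """
    records = []
    for hh in hhs:
        for person_no, age in hh["persons"]:
            records.append(f"{hh['hh_id']}{hh['state']}{person_no:1d}{age:02d}")
    return records


rect = to_rectangular(households)
for record in rect:
    print(record)
# 0001NJ134
# 0001NJ229
# 0002NY151

# Every record in a rectangular file has the same length.
assert len({len(record) for record in rect}) == 1
```

The trade-off is redundancy: the household fields are stored once per person instead of once per household.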
Coverage:
International macroeconomic, social, political, and financial indicators.
National surveys and statistics for the U.S. and many European nations; public opinion surveys from many nations; internationally sponsored surveys dealing with health, fertility, and nutrition.
Major Data Archive Subscriptions
Inter-university Consortium for Political and Social Research (ICPSR)
Roper Center for Public Opinion Research
Social Science Electronic Data Archive
More data!
Economic, business, and financial data services: http://firestone.princeton.edu/econlib
DSS Subject and Regional Guides
CPANDA: www.cpanda.org
Even more data!
Federal, state, and independent government agencies
Other data archives from around the world
Academic institutions, scholars, think tanks, and private organizations
Consult the Main Catalog.
Google
Literature about data sets:
A good way to find useful data sets is to look at the literature of the field.
ICPSR's Bibliography of Data-related Literature
Sociological Abstracts
Annual Review of Sociology
Data Analysis Options
Refer to published statistics
Use on-line analysis tools
Download data and use a statistical program
On-line Analysis Tools
Current Population Survey (Unicon version)
General Social Survey (in SDA format)
ICPSR SDA files
When you download data, check:
Documentation
Survey: sampling methods and weights
Variables
Format
Size
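Weights matter because survey samples often over- or under-represent groups on purpose. The Python sketch below compares an unweighted and a weighted mean for four invented respondents; it treats each weight as the number of population members a respondent represents, which is a simplifying assumption. In practice, use the weight variable the study documentation specifies.

```python
# Hypothetical respondents: income and survey weight. Here a weight is read as
# the number of population members the respondent represents (an assumption).
incomes = [20_000, 35_000, 50_000, 120_000]
weights = [3.0, 2.0, 1.0, 0.5]


def weighted_mean(values, wts):
    """Weighted mean: sum(w * x) / sum(w)."""
    if len(values) != len(wts):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(wts, values)) / sum(wts)


unweighted = sum(incomes) / len(incomes)
weighted = weighted_mean(incomes, weights)
print(f"unweighted mean: {unweighted:,.0f}")  # unweighted mean: 56,250
print(f"weighted mean:   {weighted:,.0f}")    # weighted mean:   36,923
```

Only the point estimate is handled here; correct standard errors also depend on the sampling design (for example strata and clusters), which the documentation describes.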
Downloading data
Statistical Software Extensions
File type: SPSS | Stata | SAS
Command file: .sps | .do (dictionary: .dct) | .sas
Data file: .sav (transport: .por) | .dta | .sas7bdat (transport: .xport, .xpt)
Log file: .spo | .smcl, .log | .log
Output file: .spo | .smcl, .log | .out
Unformatted (plain text) files: .raw, .txt, .asc, .dat
Reading Data into Stata
Common data format (extension): command
Stata (.dta): use
Text (ASCII), free or fixed columns (.txt): infile using
Text (ASCII), comma separated (.csv): insheet using
Text (ASCII), fixed columns (.dat): infix using
Excel (.xls, .xlsx): import excel
SAS Export (.xport, .xpt): fdause
Database (.mdb): dbms
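The comma-separated and fixed-column rows differ in how a program finds each field: by delimiter or by column position. The Python sketch below (not Stata) reads the same three invented records both ways; the column layout is an assumption, written as 1-based inclusive ranges in the same spirit as an `infix` column specification such as `1-3`.

```python
import csv
import io

# The same three hypothetical records in two text layouts.
csv_text = "id,age,state\n1,34,NJ\n2,29,NY\n3,51,PA\n"
fixed_text = "00134NJ\n00229NY\n00351PA\n"

# Comma separated: each field is located by the delimiter.
csv_rows = [
    {"id": int(r["id"]), "age": int(r["age"]), "state": r["state"]}
    for r in csv.DictReader(io.StringIO(csv_text))
]

# Fixed columns: each field is located by position.
# Assumed layout as 1-based inclusive column ranges, plus a type for each field.
layout = {"id": (1, 3, int), "age": (4, 5, int), "state": (6, 7, str)}


def read_fixed(text, spec):
    rows = []
    for line in text.splitlines():
        # Convert each 1-based inclusive range into a Python slice.
        rows.append({name: kind(line[start - 1:end])
                     for name, (start, end, kind) in spec.items()})
    return rows


fixed_rows = read_fixed(fixed_text, layout)
print(csv_rows == fixed_rows)  # True
```

A fixed-column file carries no field names or delimiters, so the layout has to come from the documentation or a setup file.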
Microdata Structure
Cross-sectional
Hierarchical
Time-series
Panel
Transforming Data
Adding variables: merge
Adding cases: append
Reshaping data: reshape
Transposing data: xpose
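The four operations can be sketched on toy data in Python, with lists of dictionaries standing in for data sets. This only illustrates what each Stata command does conceptually, under simplifying assumptions noted in the comments; in Stata itself the wide-to-long step for these variables would be written along the lines of `reshape long inc, i(id) j(year)`. The variable names `id`, `inc1990`, `inc1991`, and `sex` are hypothetical.

```python
# Toy data (hypothetical): person records keyed by "id".
incomes = [{"id": 1, "inc1990": 30, "inc1991": 32},
           {"id": 2, "inc1990": 45, "inc1991": 47}]
demog = [{"id": 1, "sex": "F"}, {"id": 2, "sex": "M"}]
new_people = [{"id": 3, "inc1990": 52, "inc1991": 55}]


def merge_1to1(left, right, key):
    """Adding variables: attach right's fields to the left record with the same key.

    Simplification: unmatched right records are dropped here, whereas Stata's
    merge keeps them and flags the match status in _merge.
    """
    lookup = {r[key]: r for r in right}
    return [{**row, **lookup.get(row[key], {})} for row in left]


def append(top, bottom):
    """Adding cases: stack the records of two files with the same variables."""
    return top + bottom


def reshape_long(rows, key, stub):
    """Wide to long: one row per key and suffix for columns named stub + suffix."""
    long_rows = []
    for row in rows:
        for col, val in row.items():
            if col.startswith(stub):
                long_rows.append({key: row[key], "year": int(col[len(stub):]), stub: val})
    return long_rows


def transpose(matrix):
    """Transposing: rows become columns and columns become rows."""
    return [list(col) for col in zip(*matrix)]


merged = merge_1to1(incomes, demog, "id")
stacked = append(incomes, new_people)
long_form = reshape_long(incomes, "id", "inc")
flipped = transpose([[1, 2, 3], [4, 5, 6]])

print(merged[0])      # {'id': 1, 'inc1990': 30, 'inc1991': 32, 'sex': 'F'}
print(len(stacked))   # 3
print(long_form[:2])  # [{'id': 1, 'year': 1990, 'inc': 30}, {'id': 1, 'year': 1991, 'inc': 32}]
print(flipped)        # [[1, 4], [2, 5], [3, 6]]
```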
Using Setup Files
Define variables and values
Use setup files:
do file (command file): .do
dictionary file: .dct
data file: .dat, .txt
Read in a few variables
Create your own setup
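A dictionary file tells the program where each variable sits in the raw data file and, together with the do file, what its codes mean. The Python sketch below imitates that idea with an invented dictionary (names, 1-based start columns, widths, value labels) and reads only two of its variables. It is not the `.dct` syntax, and it substitutes label text for codes, whereas Stata keeps the numeric codes and attaches value labels to them.

```python
# Hypothetical dictionary: variable name, 1-based start column, width, value labels.
dictionary = [
    {"name": "id",   "start": 1, "width": 3, "labels": None},
    {"name": "sex",  "start": 4, "width": 1, "labels": {1: "Male", 2: "Female"}},
    {"name": "age",  "start": 5, "width": 2, "labels": None},
    {"name": "vote", "start": 7, "width": 1, "labels": {1: "Yes", 2: "No", 9: "Missing"}},
]

# Hypothetical raw data file contents: one fixed-column record per line.
data_lines = ["0012341", "0021672", "0032259"]


def read_with_dictionary(lines, dct, keep=None):
    """Read only the variables named in `keep` (all if None), applying labels.

    Simplification: label text replaces the numeric code; Stata instead keeps
    the code and attaches a value label to it.
    """
    wanted = [v for v in dct if keep is None or v["name"] in keep]
    rows = []
    for line in lines:
        row = {}
        for var in wanted:
            start = var["start"] - 1
            code = int(line[start:start + var["width"]])
            row[var["name"]] = var["labels"].get(code, code) if var["labels"] else code
        rows.append(row)
    return rows


subset = read_with_dictionary(data_lines, dictionary, keep={"id", "vote"})
print(subset)
# [{'id': 1, 'vote': 'Yes'}, {'id': 2, 'vote': 'No'}, {'id': 3, 'vote': 'Missing'}]
```

Reading a subset keeps the working file small when the raw data contain hundreds of variables.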
How to Access Stata
Locally: Princeton (OIT) cluster computers; Data and Statistical Services Computer Lab, Firestone Library A-16-H-3
Remotely: through Research Computing (nobel), using Secure Shell
Stata Remote Access
Register for a server account: http://helpdesk.princeton.edu/kb/display.plx?id=9682
Download Secure Shell (Windows): http://helpdesk.princeton.edu/kb/display.plx?id=4104
X terminal (Macintosh)
Using Unix Stata
Accessing files: save them in the “H” drive from Windows, or use Secure Shell File Transfer
Unix Stata:
interactive: type stata at the command prompt
background: nohup stata -b do yourdofilename.do &
(see https://dss.wikidot.com/stata-batch-job)
X Windows: http://dss.wikidot.com/system:page-tags/tag/xwindow
Demonstration
log in using secure shell
save a file using FileExplorer or using SSH
run interactive stata
submit a do file