Economic Data Time Travel Adrienne Brennecke September 30, 2011 New York Times Article.
Quick demo
• http://alfred.stlouisfed.org/

Value of this history
• Determine the accuracy of early estimates
• Evaluate policy decisions using information available at the time, not what is known in hindsight
• Allows economists to model the economy using data that was actually available

Value of this website
• Users can save data sets to their own account
• Share Published Data Lists
• Average about 3,000 unique visitors a month

Looking for revisions, and then solutions
• Former Research Director was looking for the economic data that were released originally, not the revised data
• We searched high and low...
  – Libraries removed news releases when the final version was published
  – Agencies historically wrote over the data, as the computing storage costs were high

Help from libraries
• Searched online catalogs for press releases
• Called documents librarians all over the country
• Contacted issuing agencies and the Library of Congress
• Depository libraries came through for us

Challenges
• How to design ALFRED to store revisions
  – See Developing Time-Oriented Database Applications in SQL
• Finding and verifying old data and release dates
• Early electronic information lost
• Underestimating amount of work involved
• Figuring out the best process, and dealing with changing workloads for staff

Technical details
• These data are saved only when there are revisions; each data value has three pieces of information
  – The time period it applies to (e.g., 2nd quarter 2011)
  – The time period it is true for (e.g., from July 30th to August 26th)
  – The date that the information was entered into the database to allow for tracking of data entry errors

Technical details
• Underneath the hood, FRED and ALFRED are the same application.
  – ALFRED was populated by collecting historical data for series in FRED, and ALFRED continues to be extended by capturing "expiring" FRED values when new ones are published.
  – The coverage dates for data series are the same in both FRED and ALFRED

Conclusion
• ALFRED shows revisions to a series and presents data as they were at a particular point in time
• Unique information, FREE and easily accessed
• Preserving important data for future research

FRASER: Federal Reserve Archival System for Economic Research

Technical Aspects of FRASER
• Variation on LAMP software bundle
  – Linux operating system
  – Apache web server
  – PostgreSQL database (rather than the more common MySQL)
  – PHP programming
• Google search appliance
  – Metadata plus full text (OCR)
  – Basic and advanced search options available
  – Standard Google search functions, plus a couple filters unique to FRASER

Topic Collections
Special/Archival Collections

Publications
• Originally, data publications
• Now include various types of serials and monographs
• Statistical releases
Available issues, arranged by date
Bibliographic information

Historical Documents
• Based on categories
• Originally "non-data" publications
Categories
Documents

Special Collections

Page Stacking
• Purpose:
  – View a single data series over time
• Solution:
  – Grouped page files
  – PDFLib+PDI

Personnel
• Center for Economic Documents Digitization (CEDD) consists of
  – 1 manager
  – 1 librarian
  – 5 part-time scanning clerks
• Additional support from
  – Web group
  – Library director

Digitization Process
Selection and preparation; Scan; Quality check (QC)
• Review paper documents & establish scanning procedures
• Additional review, page by page
• This is done by a person other than the scanner
Transfer to server; Create PDF; Clean scanned image
• This must be done by one of the two librarians
• OCR
• Add metadata
• QC (brief)
• Process varies based on project
Post to FRASER
• Items can be posted as publications, historical documents, special collections, each with their own interface and metadata options
Add link to catalog record and OCLC record
• This is done by the library's cataloger, outside of the CEDD

Locating Paper Copies
• We scan documents from
  – Our own library collection, and other Fed libraries
  – FDLP Needs and Offers lists
  – Interlibrary loan
  – Partner institutions
• But…
  – As we digitize, libraries throw out paper copies

Copyright
• We focus on public domain materials
  – Federal Reserve Bank publications
    • Not technically public domain, but we have an agreement to digitize
  – Federal Government publications
  – Pre-1923 publications

Hardware and Software
Hardware
• Automatic Document Feeder (ADF)
  – 3 - Fujitsu fi-5650C
  – 2 - Fujitsu fi-6670 (newer model)
• Overhead/planetary scanner
  – 1 - Indus Color Book Scanner 5002
• Flatbed scanner
  – 1 - Epson Expressions Graphic Arts 10000XL
Software
• ImageWare BCS-2
  – Indus scanning
• Techsoft PixEdit7
  – Fujitsu and Epson scanning, and all cleaning
• ABBYY FineReader 10.0
  – OCR
• Adobe Acrobat 9 Pro
  – Metadata
Also:
• Microsoft Access 2007
  – Metadata and tracking purposes for some larger collections
• PDF Summary Maker
  – Embedding metadata from Access into pdfs

• Image/text areas as recognized by OCR software
  Green=text Blue=table Red=picture
• Text recognized by OCR software
  Blue=uncertain character(s)

Data Entry
• Web-based forms for data entry
• Here: setting up the overall publication (library catalog-level metadata)

Data Entry
• Issue-level metadata
  – Issue date
  – Issue title (text-formatted date, or other title)
  – Attach pdf
  – Enter table names and page titles for the page stacking described earlier

Data Entry
• Historical and Special Collection documents have both publication- and issue-level metadata

Special Collection Document

Output
• 3 image files
  – Original multipage tiff
  – Cleaned multipage tiff
  – PDF
• 3 types of text/metadata
  – Underlying text in pdf (OCR)
  – Title and author embedded in pdf
  – Other metadata entered in database when posting

Contact Us
Adrienne Brennecke
Data Acquisitions, Reference Librarian
314-444-7479
[email protected]
alfred.stlouisfed.org/

Pamela Campbell
Digital Projects Librarian
314-444-8907
[email protected]
fraser.stlouisfed.org/
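
Supplementary sketch: ALFRED's three-part record
The "Technical details" slides say each data value carries the period it applies to, the period it was true for, and the date it was entered, and that a new row is saved only when a value is revised. The Python below is a minimal sketch of that idea, not ALFRED's actual schema or code. The names Vintage, publish and as_of, the half-open date interval, and the sample values 100.0 and 98.5 are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Vintage:
    """One published value of one observation (hypothetical layout)."""
    period: str                # time period the value applies to, e.g. "2011Q2"
    value: float
    valid_from: date           # first day this value was the published one
    valid_to: Optional[date]   # day it was superseded (exclusive); None = still current
    entered_on: date           # when the row was keyed in, to trace entry errors


def publish(history: List[Vintage], period: str, value: float,
            release_date: date, entered_on: date) -> List[Vintage]:
    """Return a new history with `value` recorded for `period`.

    A row is added only when the value differs from the current one,
    mirroring "saved only when there are revisions". The "expiring"
    row is closed off rather than overwritten.
    """
    current = next((v for v in history
                    if v.period == period and v.valid_to is None), None)
    if current is not None and current.value == value:
        return list(history)  # no revision, nothing to store
    updated = [replace(v, valid_to=release_date) if v is current else v
               for v in history]
    updated.append(Vintage(period, value, release_date, None, entered_on))
    return updated


def as_of(history: List[Vintage], period: str, when: date) -> Optional[float]:
    """Return the value for `period` as it was published on `when`."""
    for v in history:
        if (v.period == period and v.valid_from <= when
                and (v.valid_to is None or when < v.valid_to)):
            return v.value
    return None  # nothing had been published yet on that date


# Illustrative values only; the dates echo the example on the slide.
history: List[Vintage] = []
history = publish(history, "2011Q2", 100.0, date(2011, 7, 30), date(2011, 7, 30))
history = publish(history, "2011Q2", 98.5, date(2011, 8, 26), date(2011, 8, 26))

print(as_of(history, "2011Q2", date(2011, 8, 1)))    # 100.0, the first release
print(as_of(history, "2011Q2", date(2011, 9, 30)))   # 98.5, the revised value
```

Closing the old row instead of overwriting it is what lets a query "time travel" to any past date; the separate entered_on field keeps data-entry timing apart from the release calendar.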