My Website Was Lost,
But Now It’s Found
Frank McCown
CS 110 – Intro to Computer Science
April 23, 2007
Frank McCown

Education
Ph.D. in Computer Science – Old Dominion University (expected 2007)
M.S. in Computer Science – University of Arkansas at Little Rock (2002)
B.S. in Computer Science – Harding University (1996)

Work Experience
1997-2004 – Instructor of CS at Harding University (Searcy, AR)
1996-1997 – Software Engineer for Lockheed Martin (Denver, CO)
1995 – Software Engineer Intern for Auto-trol (Denver, CO)

Honors
2007 – Outstanding Graduate Research Assistant
2006 – College of Sciences Dissertation Fellowship
2005 – Outstanding Graduate Assistant
2004 – Dominion Scholar

Industry vs. Academia

A 2000 survey by The Scientist magazine asked its readers:
"Overall, which environment do you prefer?"

[Pie chart: No preference 18.0%; the Academia and Industry segments are labeled 42.5% and 39.5%]

73% of survey respondents had held research positions in both industry and academia.
Source: http://www.the-scientist.com/2001/4/16/28/2/

Industry vs. Academia

Movement
  Academia → Industry is common
  Industry → Academia is very uncommon
Flexibility
  Schedule
  Focus
Compensation

Research Interests

Digital preservation
  Will we be able to see our websites 20 years from now?
Web crawling
  How can search engines and web archives duplicate/download our websites more efficiently and effectively?
Search engines
  How much and what content do commercial search engines index and cache?
  How synchronized are search engine APIs with what the general user sees?

Black hat: http://img.webpronews.com/securitypronews/110705blackhat.jpg
Virus image: http://polarboing.com/images/topics/misc/story.computer.virus_1137794805.jpg
Hard drive: http://www.datarecoveryspecialist.com/images/head-crash-2.jpg
Web Infrastructure
Cached Image






Warrick

First developed in fall of 2005
Available for download at http://www.cs.odu.edu/~fmccown/warrick/
www2006.org – first lost website reconstructed (Nov 2005)
DCkickball.org – first website someone else reconstructed without our help (late Jan 2006)
www.iclnet.org – first website we reconstructed for someone else (mid Mar 2006)
Internet Archive officially endorses Warrick (mid Mar 2006)

Warrick-related Publications

Frank McCown, Norou Diawara, and Michael L. Nelson. Factors Affecting Website Reconstruction from the Web Infrastructure. JCDL 2007. June 2007. Vancouver, British Columbia, Canada.

Catherine C. Marshall, Frank McCown, and Michael L. Nelson. Evaluating Personal Archiving Strategies for Internet-based Information. IS&T Archiving 2007. May 2007. Arlington, Virginia, USA.

Frank McCown and Michael L. Nelson. Characterization of Search Engine Caches. IS&T Archiving 2007. May 2007. Arlington, Virginia, USA.

Frank McCown, Joan A. Smith, Michael L. Nelson, and Johan Bollen. Lazy Preservation: Reconstructing Websites by Crawling the Crawlers. WIDM 2006. November 2006. Arlington, Virginia, USA.

Frank McCown and Michael L. Nelson. Evaluation of Crawling Policies for a Web-Repository Crawler. HYPERTEXT 2006. August 2006. Odense, Denmark.

Search Engine APIs

Frank McCown and Michael L. Nelson. Poster: Search Engines and Their Public Interfaces: Which APIs are the Most Synchronized? WWW 2007.

Frank McCown and Michael L. Nelson. Agreeing to Disagree: Search Engines and their Public Interfaces. JCDL 2007.

Thank You
Questions?