Web Site Creation: Good Practice Guidelines Designing For Migration, Preservation and Dissemination
Brian Kelly
UK Web Focus
UKOLN
University of Bath
UKOLN is supported by:
Email: [email protected]
URL: http://www.ukoln.ac.uk/
Contents
• We’ve Been Here Before
• Web-Based Dissemination
• Mirroring, Migration & Preservation
• Conclusions
What Happens When The Funding Stops?
When the NOF project funding finishes, what happens?
• The project gracefully turns into a fully-fledged service, with new funding from NOF, the EU, your organisation, etc.
• The project staff all leave and the Web site is shut down, is moved and can’t be found, or is broken and there is no-one with the interest, expertise or permissions to fix it
We’ve Been Here Before
The UK Higher Education sector has been
here before:
CTI Projects
• CBL applications locked into obsolete hardware
TLTP Projects
• CBL developers using Toolbook on standalone
PC, which could not be deployed on campus LAN
eLib Projects
• Web sites disappear
EU Programmes
• …
Survey of EU Web Sites
WebWatching Telematics For Libraries Project Web Sites (Fourth Framework)
• Exploit Interactive article published in Oct 2000
• Web site availability:
Yes – 65
Never – 16
Domain Gone – 11
Page Gone – 12
• Server details:
Apache – 41
Netscape – 3
IIS – 10
NCSA – 3
Other – 6 (e.g. Mac, GN)
• See <http://www.exploit-lib.org/issue7/webwatch/>
Survey of eLib Web Sites
WebWatching eLib Project Web Sites
• Ariadne article published in Jan 2001
• Of 71 Web sites, 3 domains no longer available and 2 entry points have gone
• LinkPopularity.com results shown:
SOSIG 7,076
OMNI 5,830
EEVL 3,865
History 2,605
Netskills 2,363
Ariadne 2,144
…
xxx ~10
• Survey also includes:
Analysis of entry points (links, HTML, accessibility)
Nos. of pages indexed by AltaVista
- 0 in some cases
Due to robots.txt file
Due to frames interface or other robots barrier
• See <http://www.ariadne.ac.uk/issue26/web-watch/>
Web Site Promotion
You want:
• Your quality pages to be found in a timely fashion
by users of search engines
• To encourage others to link to you
To ensure this happens you should:
• Have a domain and URL naming policy
• Exploit the Robots Exclusion Protocol - see
<http://www.robotstxt.org/wc/norobots.html>
• Be aware of barriers to robots (which may also be
barriers to humans)
• Think about a linking policy and procedures
URL Naming Policy
Issues:
• Having your own domain is a good idea
(e.g. http://www.ariadne.ac.uk/)
• Short URLs are good (more memorable;
search engines tend not to index deeply)
• Sub-domains may be a useful compromise
(e.g. http://ariadne.bath.ac.uk/)
• Keep URLs short by using directory defaults:
Rather than: www.ariadne.ac.uk/issue5/metadata/intro.htm
Use: www.ariadne.ac.uk/issue5/metadata/
The directory form is shorter, less prone to typos and allows for format and language negotiation, new server management tools, etc., so the same URL can be served by:
…/issue5/metadata/intro.fr.html
…/issue5/metadata/intro.pdf
(.cfm, .asp, .jsp)
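The directory-default idea can be sketched in a few lines of Python. This is an illustration only: the function name `directory_url` and the set `DEFAULT_DOCUMENTS` are hypothetical, and which file names count as a directory default depends entirely on how your own Web server is configured.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: file names that the server is configured to serve as the
# directory default. Adjust this set to match your own server.
DEFAULT_DOCUMENTS = {"index.html", "index.htm", "intro.htm", "default.asp"}


def directory_url(url):
    """Return the URL with a trailing default document name removed."""
    parts = urlsplit(url)
    head, _, last = parts.path.rpartition("/")
    if last.lower() in DEFAULT_DOCUMENTS:
        # Keep the trailing slash so the server resolves the directory default.
        parts = parts._replace(path=head + "/")
    return urlunsplit(parts)


print(directory_url("http://www.ariadne.ac.uk/issue5/metadata/intro.htm"))
```

A helper of this kind could be run over the links you publish (on paper, in articles, in directory submissions) so that only the short directory form is ever advertised.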
Planning Search Engine Strategy
You search for your project name and find a personal page of a former colleague with informal information
To avoid this:
• Distinguish between (a) initial information about the project, (b) information for project partners, funders, etc. and (c) information for end users
• Use search engine techniques, as appropriate, to:
- Ban search engines from indexing certain pages
- Register key pages (e.g. list of new resources)
Robots
Make use of the Robots Exclusion Protocol (REP) to ban robots from indexing:
• Non-public areas (e.g. area for partners)
• Pre-release Web sites
• Pages prior to an official launch
Note: Remember to switch off the ban after launch!
Example /robots.txt file in the Web root:
User-agent: *
Disallow: /partners
Disallow: /draft
Note that use of directories to group related resources will
have many benefits: controlling indexing robots, mirroring and
auditing software, etc.
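Before relying on a robots.txt file, it can help to check what it actually blocks. The sketch below uses Python's standard `urllib.robotparser` module with the rules shown on this slide; the host `www.example.org`, the robot name `ExampleBot`, the sample paths and the helper `allowed` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules from the slide, as they would appear in /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /partners
Disallow: /draft
"""


def allowed(path, robots_txt=ROBOTS_TXT, agent="ExampleBot"):
    """Report whether a robot obeying the rules may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # The host name is a placeholder; only the path is matched against rules.
    return parser.can_fetch(agent, "http://www.example.org" + path)


for path in ("/", "/partners/minutes.html", "/draft/home.html", "/issue5/metadata/"):
    print(path, allowed(path))
```

The same check, run after launch with the ban removed, is a simple way to confirm that the pre-release rules really have been switched off.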
Other Barriers To Indexing
Other barriers to indexing robots:
• Frames
Most search engines can’t index framesets and rely on appropriate <NOFRAMES> tags
• Flash (and other proprietary formats)
Most search engines can’t index proprietary formats
• Poorly implemented JavaScript pages
Search engines may not have JavaScript interpreters and can’t index text generated by JavaScript
• Poorly implemented user-agent negotiation (client- or server-side)
Most search engines don’t have a Netscape or IE user-agent string and so will index the “Upgrade to Netscape” message
• Invalid HTML pages
Search engines may not be as tolerant of HTML errors as Web browsers
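To get a rough feel for these barriers, the following Python sketch extracts the text that a very simple robot, one with no JavaScript interpreter, might see in a page. It is not how any particular search engine works; the class `IndexableText`, the function `indexable_text` and both sample pages are invented for illustration.

```python
from html.parser import HTMLParser


class IndexableText(HTMLParser):
    """Collect the text a simple robot could index, skipping scripts and styles."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Script and style content is never executed or rendered, so skip it.
        if not self._skip_depth:
            self.words.extend(data.split())


def indexable_text(html):
    parser = IndexableText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.words)


# A page whose only text is written by JavaScript.
SCRIPTED = "<html><body><script>document.write('Project news');</script></body></html>"
# A frameset that supplies alternative content in <noframes>.
FRAMESET = ("<html><frameset><frame src='menu.html'>"
            "<noframes>Project home: see the site map</noframes></frameset></html>")

print(repr(indexable_text(SCRIPTED)))
print(repr(indexable_text(FRAMESET)))
```

The scripted page yields no text at all, while the frameset yields only what its author put in the `<noframes>` element, which is why that content deserves the same care as the visible pages.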
Accessibility
• Robots have similarities to the visually
impaired
• Good design for robots is likely to be good
design for people with disabilities (and vice
versa)
• Make use of tools such as Bobby, WAVE,
etc. to check accessibility – see
<http://www.cast.org/bobby/>
You should formulate plans for making your Web site search-engine friendly and accessible
Other Ways Of Dissemination
Users find your Web site by:
• Search engines
• Following a link
• Entering a URL which they found on a mouse mat,
pen, in an article, etc
Links to your Web site are valuable as they:
• Drive traffic to your Web site
• Improve ranking in citation-based search engines
such as AltaVista
Possible problems with links:
• “Link-spamming services”
• Being in the “Web sites that suck” portal
• Resources needed to encourage linking
Encouraging Links
You can:
• Submit to directories (e.g. Yahoo!)
• Use directory (and search engine) submission
services
• Have clear entry points with static URLs for key
menu pages
• Think about who you want to link to you and why
they would do so
• Target them and think of motivation (e.g. attractive
small icon)
• Monitor trends in links to your Web site (e.g. try
<http://www.linkpopularity.com/>)
News Feeds
Providing automated news feeds which can be included in third-party Web sites with no manual intervention is a good way to support dissemination
Extension to News Feeds
The RDN (Resource Discovery Network):
• Wants to provide news feeds about developments by RDN hubs
• Is using the RSS standard for news feeds (an XML/RDF application)
• A CGI-based RSS parser (and authoring tool) has been created
• To allow potential users to try it out easily, a JavaScript parser has also been written
• See <http://rssxpress.ukoln.ac.uk/>
Can this (slightly) heavyweight CGI solution, complemented by a lightweight JavaScript solution, be used within your NOF-digi project?
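As a rough indication of what consuming such a feed involves, here is a minimal Python sketch that reads item titles and links from an RSS 1.0 (RDF) document and turns them into an HTML list for inclusion in another page. It is not the RDN's parser: the feed content, the `example.org` URLs and the function names `feed_items` and `items_as_html` are all made up, and the sample feed is simplified (it omits the channel's item list).

```python
import xml.etree.ElementTree as ET
from html import escape

# Namespace URIs used by RSS 1.0 documents.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

# A made-up, simplified feed for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://www.example.org/news.rss">
    <title>Example project news</title>
    <link>http://www.example.org/</link>
    <description>Made-up feed for illustration</description>
  </channel>
  <item rdf:about="http://www.example.org/news/launch">
    <title>Web site launched</title>
    <link>http://www.example.org/news/launch</link>
  </item>
  <item rdf:about="http://www.example.org/news/resources">
    <title>New resources added</title>
    <link>http://www.example.org/news/resources</link>
  </item>
</rdf:RDF>
"""


def feed_items(rss_text):
    """Return (title, link) pairs for every item in an RSS 1.0 feed."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.findall("rss:item", NS):
        title = item.findtext("rss:title", default="", namespaces=NS)
        link = item.findtext("rss:link", default="", namespaces=NS)
        items.append((title, link))
    return items


def items_as_html(items):
    """Render the items as an HTML list that a third-party page could embed."""
    lines = ["<ul>"]
    for title, link in items:
        lines.append('  <li><a href="%s">%s</a></li>'
                     % (escape(link, quote=True), escape(title)))
    lines.append("</ul>")
    return "\n".join(lines)


print(items_as_html(feed_items(SAMPLE_FEED)))
```

Because the feed is structured data, the receiving site decides how to present it, and new items appear there without anyone editing a page by hand.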
Mirroring and Preservation
Another way to maximise impact of your Web
site is for it to be mirrored:
• Use of Web mirroring software to install service at
another location (e.g. overseas to overcome
network bandwidth problems or behind a firewall)
• Issues about whether you are mirroring output
from a service or the service itself (affected by
push vs pull mode of mirroring)
• NOF, for example, may wish to mirror your
service in order to preserve it (once funding runs
out and everyone leaves)
Note that you may wish to mirror only the project deliverables Web site, and not the Web site for partners or the Web site about the project, which is another reason for having separate Web sites
Conclusions
To conclude:
• Make plans for the architecture of your Web
service (URL naming, mirrorability,
dissemination, etc.) at the start
• Ensure your Web site is friendly to robots
• Think about use of neutral resources which can
be processed automatically by software (avoid
the human bottleneck)