Web archiving: Recommendations
1. Both JISC and the Wellcome Trust should attempt to foster good institutional practice with regard to the management of websites. For example, they could consider the development of website management guidelines for adoption by their user communities or for inclusion in grant conditions etc.
2. Until the exact position is clarified by legislation, a selective approach to web archiving - with appropriate permissions secured - would be the best way to proceed for both the JISC and the Wellcome Trust. Other methods of archiving will need to be approached with caution due to problems with copyright and other legal issues (see also the conclusions and recommendations in the associated legal study [PDF 343KB] by Andrew Charlesworth).
3. If the Wellcome Trust is to meet its strategic objectives in extending its collecting activities into the digital environment, it will need to consider undertaking some kind of web archiving activity. To achieve this the following approach is recommended:
- Establish a pilot medical web archiving project using the selective approach, as pioneered by the National Library of Australia (see also recommendation 5).
- This pilot should consider using the NLA's PANDAS software for this archiving activity. This pilot could be run independently or as part of a wider collaborative project with other partners.
- The high-quality medical websites identified in the RDN gateway OMNI should be considered as the starting point for any medical web archiving initiative.
- The Wellcome Library will need to develop a web archiving selection policy to help ensure that it can archive a broad, representative sample of medical websites. This policy should allow for the inclusion of 'low-quality' (e.g. medical quackery) sites that may be of interest to future historians.
4. If JISC is to meet its strategic objectives for the management of JISC electronic records, digital preservation and collection development, then it will also need to consider undertaking some form of web archiving. To achieve this the following approach is recommended:
- Establish a pilot project to test capture and archiving of JISC records and publications on project websites using the selective approach, as pioneered by the National Library of Australia (see also recommendation 5).
- As part of this pilot, JISC should define selection policies and procedures.
- This pilot should consider using the NLA's PANDAS software for this archiving activity. This pilot could be run independently or as part of a wider collaborative project with other partners.
- Work in collaboration with emerging initiatives from the British Library and Wellcome Trust. There are significant synergies with some existing JISC services. Websites identified and described by the RDN gateways could be the starting points for any selective subject-based web archiving initiatives in the UK. The RDN gateways contain (November 2002) descriptions of over 60,500 internet resources available on the web.
5. Research: the current generation of harvesting technologies has limitations with regard to dealing with 'deep web' sites. This has a particular impact on web archiving approaches based on automatic harvesting. While some research is being carried out on this issue from a web search perspective, there is a need for more collaborative research into this issue from the perspective of web archives.
6. Collaboration: for both the JISC and the Wellcome Trust there is significant opportunity for partnership on web archiving. For example, there will be opportunities to collaborate on strategic, technical, organizational or content issues.
For the UK, both should attempt to work closely with the British Library, the other copyright libraries, the Public Record Office, data archives and the e-Science centres that have experience of managing large volumes of data. The focus for this collaborative activity could be within the Digital Preservation Coalition (DPC). On an international level, close co-operation with institutions like the US National Library of Medicine and the Internet Archive will be important.
As an exemplar of collaboration, it is recommended that the JISC and the Wellcome Library should seek to work together and with other partners to create their pilot web archiving services. Not only will this realise economies of scale but, more importantly, it will provide a model demonstrating how collaboration can work in practice.