The Library of Congress Initiative for a World Digital Library: Discussion Paper for UNESCO Experts Meeting, UNESCO Headquarters, Paris, 1 December 2006
- Introduction
Librarian of Congress James H. Billington proposed the establishment of a World Digital Library (WDL) in a speech to the U.S. National Commission for UNESCO in June 2005. Billington suggested "that the time may be right for our country's delegation [to UNESCO] to consider introducing to the world body a proposal for the cooperative building of a World Digital Library. This would be a new type of activity that ... would hold out the promise of bringing people closer together precisely by celebrating the depth and uniqueness of different cultures in a single global undertaking."
This paper is intended to identify the main issues that need to be discussed and eventually resolved in connection with the creation of a WDL. It has been prepared as background material for the UNESCO Experts Meeting on the WDL on December 1, 2006. The participants in this meeting include representatives of leading libraries in countries with which the Library of Congress already has pursued or is pursuing digital library partnerships (Brazil, Egypt, France, the Netherlands, Russia, and Spain), and from two international organizations - UNESCO itself and the International Federation of Library Associations (IFLA) - that officially have communicated to the Library of Congress their interest in exploring cooperation in connection with this project. In addition to these participants, a number of participants from other countries have been invited to establish a better balance among regions and to solicit additional viewpoints.
- Background to the Project
In the past fifteen years, libraries and other cultural institutions have launched a large number of projects aimed at providing online access to digital collections. The Digital Collections Registry of the Digital Library Foundation (DLF) lists 754 publicly accessible digital collections hosted by the partners and allies of the DLF. Together, the UNESCO Libraries Portal and the UNESCO Archives Portal contain more than 22,000 links, many of which are to content-rich online digital library and archive projects. The OAIster project at the University of Michigan claims to be serving 9,624,092 records from 698 institutions. Other major projects include the Carnegie Mellon Million Book Digital Library Project and, in the commercial sector, Google Book Search and the competing mass book digitization projects that are now emerging.
In addition, a number of major regional initiatives are underway or under discussion. These initiatives were discussed at the UNESCO Open Forum at the IFLA conference in Seoul in August 2006 and will be reviewed again at the December 1 Experts Meeting. They include the European Library, the El Dorado Digital Library for Latin America and Iberia, and several initiatives in Asia and the Middle East.
While much has been achieved at the national level and many of the regional projects promise substantial benefits to their users, two challenges are likely to persist for the foreseeable future.
First, not enough digital content is being created. This is true for Europe (as witnessed by the attention that the European Digital Library initiative has drawn to this issue), North America, and developed Asia. It may even become more of a problem in the future, as resources shift, in relative terms, from digital conversion to the preservation of born digital content and from the digital conversion of cultural artifacts to the mass scanning of books. The situation in the developing world is of course far more problematic. In many countries, relatively little is being done to digitize collections and to make them available on the Internet. The result is that the distribution of digital content on the Internet is uneven with regard to geographic regions, cultures, language, and types of institution.
Second, content is often hard to find, difficult to search, and presented in a multiplicity of ways that confuse and frustrate users. Multilingual search and display are not well developed, and many features that young people are used to finding on commercial sites are not available on the cultural and educational sites maintained by libraries, archives, and other cultural institutions.
The WDL project will not be able to solve these problems on its own, but it can attempt to make progress toward doing so and help to point the way for other national and international projects aimed at improving both the quantity and quality of cultural content on the Internet.
- Objectives and Audience
The major objectives of the WDL should include the following: (1) the promotion of international and inter-cultural understanding and awareness; (2) service to education; (3) the expansion of non-English and non-Western content on the Internet; (4) promotion of awareness of foreign languages (which in turn could encourage and facilitate language learning); and (5) contribution to scholarly research.
To achieve these objectives, the WDL will have to appeal to a broad range of audiences on a worldwide basis. The target audience for the WDL should include students, teachers, librarians, and the general public (life-long learners). The WDL should be of value to scholars and researchers as well, but it should not attempt to compete with or supplant other scholarly projects, e.g., union catalogs that tell researchers where physical items can be found, or highly specialized sites that cater to the needs of particular scientific disciplines. The WDL can link to these projects and increase awareness of and direct traffic to them.
The strategy for appealing to a wide range of audiences should be not to provide a "lowest common denominator" user experience, but to create a website that is attractive to and useful for different users with different needs and expectations. This can be done through a sophisticated but user-friendly site that provides excellent connectivity and reliability, large (but not indiscriminately collected and unbalanced) amounts of content, multiple access points, powerful searching, and interpretation and presentation of content by leading scholars and curators. In keeping with the trend toward interactivity and user participation, the WDL also should provide opportunities for direct involvement on the part of users, without compromising its intellectual and curatorial integrity.
In order to be useful in developing countries, many of which lack widespread broadband access, the WDL also will have to deliver some services through low-bandwidth or mobile-device solutions.
In addition to the objectives that will be furthered by the creation of a WDL website (and that are being served by many other digital library projects around the world), a large, international WDL project could have other positive spin-offs and could promote a range of ancillary objectives. The latter might include: (1) assisting developing countries to create their own national and regional digital libraries by providing them with equipment and expertise that can be used to create content for both the WDL and these national and regional projects (based on the principles of non-exclusivity and re-purposing); (2) promoting standards and best practices that institutions in developing countries (and smaller institutions in the developed world) can use as they create their own digital library projects, whether or not they participate in the WDL; (3) contributing to the security and preservation of artifactual collections by assisting with the creation of inventories and catalogs and by carrying out pre-digitization preservation work; (4) promoting economic development by creating jobs in participating countries for the performance of various tasks (cataloging, digitization, web design and programming, translation, and so forth) and through the promotion of tourism.
- Content Selection and Creation
The World Digital Library should concentrate on presenting rare and unique collections that are physically stored in geographically dispersed locations, and which, when brought together with other collections through cross-national and cross-cultural multilingual search and browse capabilities, will yield new knowledge and insights and will be of value to educators, students, and the general public, as well as to researchers. Examples of such collections are given in Appendix I.
Such content can come from two sources: it can be re-purposed from existing institutional, national, and regional digital library projects and contributed to the WDL; or it can be scanned expressly for the WDL (and then re-purposed, if desired, for use in institutional, national, and regional projects). The WDL should use both methods.
For developed country participants that already have staff and equipment for digitization and that already have digitized many cultural treasures, re-purposing is mainly what is needed.
For the developing world, the WDL project should aim to provide equipment, training, and whatever else might be needed (including preservation work if it is required prior to scanning) to enable institutions in these countries to participate in the project. The Library of Congress already has experience in working with partner institutions in Brazil, Egypt, and Russia on establishing dedicated installations for cooperative digitization projects. Much more could be done in other countries. Other methods that could be used to generate digital content are mobile scanning operations dedicated to smaller institutions (used by the Library of Congress and Russian partners in Siberia and the Russian Far East) and the engagement of local contractors with the requisite skills and equipment. This aspect of the project is likely to be complicated and expensive, and will require skilled and dedicated staff. However, expanding scanning activities in the developing world will be essential to ensure that the WDL is truly representative of all countries and cultures. Moreover, rather than being seen as an expensive burden on the project, it could be made a selling point to potential corporate, foundation, and government donors who will be attracted to the idea of working with cultural institutions in both the developed and developing world to set up turn-key scanning operations in places where they do not exist.
Irrespective of how the content is obtained for the WDL (re-purposed or originally scanned), two further issues need to be addressed: how big (in terms of numbers of images and collections) should the WDL be, and what particular collections should be solicited or, perhaps more controversially, what material should be rejected as inappropriate for the WDL.
Suggestions regarding the optimal size of the WDL have ranged from relatively small (a digital exhibit that will present from a few dozen to at most a few hundred representative images from the world's major cultures) to vast (an enormous portal that will link all of the other major national and regional portals into a single "union catalog" of digital content).
There is no need at this stage to give a definitive answer to the question of ultimate size, which will be determined by resources, the number and enthusiasm of the participating institutions, and other factors. Most likely, the optimal size for the WDL will be somewhere between these two extremes: it will be large, perhaps even very large, but it will not aim to be exhaustive. There are many reasons to reject the digital exhibit model. The existence of Google and other search engines has conditioned users to expect that "at least something" should be turned up by a search, no matter how obscure the topic or unintelligible the search request. The WDL cannot and should not try to compete with these expectations, but it should proceed from the assumption that users will expect to find their country, their topic of interest, and their language on the site. This argues for a large volume of material. Researchers also will want to find whole collections with significant volumes of material on the site, or they will quickly lose interest and turn elsewhere. On the other hand, the WDL need not be exhaustive. It will not be under any obligation to encompass all relevant content for a given topic or country. Rather, it can direct users to other projects and portals that do have the ambition to be comprehensive for a given country or region (South Korea's National Digital Library and the European Digital Library are examples that come to mind).
Apart from the question of size, the issue of what content is selected for inclusion in the WDL and by whom needs to be addressed. Selection inevitably touches upon national and cultural sensitivities and could lead to disagreements among the partners or to external criticism of the project as a whole. Mechanisms therefore will need to be put into place to minimize the likelihood and intensity of possible disputes and criticisms.
One way to minimize the possibility of such controversies arising, at least in the early stages of the project, would be to work off existing lists of collections that already command the support of the relevant expert and cultural communities. UNESCO's Memory of the World Register, for example, lists more than 120 collections and/or items of major historical and cultural significance. Not all of these collections may be suitable for digitization, but a key goal of the WDL project might be to support the digitization and making available on the Internet of as many of these collections as possible. This would not be on an exclusive basis; rather, the same collections could be re-purposed for inclusion in the WDL, on UNESCO's own website, on major national and/or institutional websites, and in the major regional digital library projects.
Beyond working off such pre-established lists, the WDL will need to ensure a representative and intellectually credible process for the selection of content. This should involve setting up committees of scholars and curators for countries, regions, or cultural areas that will be responsible for identifying collections that are particularly important and/or representative and for vetting suggestions put forward by actual and prospective partner institutions. Such committees should be composed of leading scholars from both the countries concerned and from other parts of the world.
- Web Presentation
A key goal of the WDL project should be to offer a seamless user experience and a high degree of functionality. The WDL site must be fast, reliable, and easy to use.
Planning for the WDL should start with a recognition that the way in which scholars, researchers, and the general public are using the Internet has changed fundamentally in recent years. Traffic increasingly is coming from commercial search engines, with the search terms that bring users to library sites extremely varied and fragmented. This trend suggests that resources need to be devoted not just to helping users find material, but also to creating higher value-added content that will encourage users to spend more time on a given site and to take greater advantage of internal search and browse capabilities. The latter must be improved, however, if this is to happen.
Providing a good user experience will depend at least in part on the volume of digital content available, as discussed in the previous section. In addition, the WDL should offer intellectual added value that users can quickly recognize and exploit. This value will be provided in three areas: (A) access through multilingual search, retrieval and display, including cross-cultural, cross-national, and cross-temporal searching and browsing using multiple access points; (B) context through the provision of narrative and interpretive content; and (C) participation by inclusion of social networking and related features that allow users to interact with the WDL and with other WDL users, rather than just passively view its content.
- Access
Multilingualism
In order to qualify as credibly "multilingual," the WDL interface should be offered in the six official UN languages (Arabic, Chinese, English, French, Russian, and Spanish), plus Portuguese (given the participation of the National Library of Brazil as a founding partner). A medium-term goal of the project should be to add, as resources permit, interfaces in other major languages, e.g., Hindi, Japanese, and German.
Search, browse and display also should be in these seven languages. Since the content on the site will come from many institutions around the world, including those which catalog their holdings in languages other than the seven initial interface languages, in practice, search, retrieval, and (perhaps to a lesser degree) display will be in a 7 + 1 + 1 + 1 + ... mode. In other words, a user from Vietnam will not be able to access all of the material on the WDL using Vietnamese. However, he or she should be able to search in Vietnamese for at least all of the Vietnamese items (i.e., items contributed by institutions in Vietnam and cataloged in Vietnamese). Over time, it is to be hoped that leading institutions (national libraries and universities) in such countries will take responsibility for translating other parts of the WDL into additional languages, beginning with the materials most relevant to the interests of their national users.
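The "7 + 1" mode described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that every item carries metadata in the seven interface languages plus the language in which the contributing institution cataloged it. The item records, the `catalog_languages` field, and the function name are hypothetical and are not part of any WDL design.

```python
# Hypothetical sketch of "7 + 1" search: the seven interface languages
# cover every item, while any other language covers only the items that
# were cataloged in it.
INTERFACE_LANGUAGES = {"ar", "zh", "en", "fr", "ru", "es", "pt"}


def searchable_items(items, query_language):
    """Return the items whose metadata exists in query_language."""
    return [item for item in items if query_language in item["catalog_languages"]]


# Assumed example records: one item cataloged in Vietnamese, one not.
items = [
    {"id": "vn-001", "catalog_languages": INTERFACE_LANGUAGES | {"vi"}},
    {"id": "br-001", "catalog_languages": set(INTERFACE_LANGUAGES)},
]

vietnamese_results = [item["id"] for item in searchable_items(items, "vi")]
french_results = [item["id"] for item in searchable_items(items, "fr")]
```

Under these assumptions, a query in French reaches both items, while a query in Vietnamese reaches only the item contributed and cataloged in Vietnamese.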
Search and Browse
The following methods of searching and browsing (including further sorts within browses and search results) are shown on the WDL mockup, and are representative of the kinds of features the WDL should offer. Final selection and elaboration of these methods should be subject to discussion among the partners and to the full working out of the technical and resource requirements for implementation.
Search should be through a simple Google-type search box. Advanced search capabilities also will be available.
The browse feature should offer six additional entry points: institutional repository and collection; type of item; year or time period; place; topic; and today's featured theme. These features can be seen on the WDL mockup that will be presented at the December 1 Experts Meeting.
Browse by Institutional repository and collection will be the main place on the WDL where collections will be displayed qua collections. This will be an important consideration for the contributing partners and their internal custodial divisions, and will ensure that the integrity and identity of their collections is maintained on the site. This browse feature also will be useful to certain specialized categories of users (collectors, curators, and some scholars).
Browse by Type of item (rare book, manuscript, map, photograph, film, sound recording, and so forth) will help to make users aware of the range of materials on the WDL, meet the demands of teachers and scholars for multi-format access, and encourage the comparison of similar items from different cultures over different periods of time. Types of item should be grouped under headings in simple, non-technical language, without regard to the bureaucratic subdivisions of the contributing partner institutions.
The browse by Year or time period feature will help to facilitate cross-cultural and cross-national comparisons (e.g., of what was happening in China when something else was happening in Europe) and provide time lines that teachers find especially useful.
Place will enable geographic search and display, and lay the groundwork for a range of potential Geographic Information System (GIS)-related applications that can be developed for later versions of the WDL or by users of the WDL. Geographic tools (enabled by encoding of latitudinal and longitudinal coordinates in the metadata) also can be used to track items that refer to the same place even though its name has changed over time or differs between languages (St. Petersburg/Petrograd/Leningrad/St. Petersburg).
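As a rough illustration of how coordinates in the metadata could bring together items that name the same place differently, the sketch below matches items by distance rather than by place name. The item records, the `lat` and `lon` field names, and the 25 km radius are assumptions made for the example; the distance calculation is the standard haversine formula.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points (haversine formula)."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def items_near(items, lat, lon, radius_km=25.0):
    """Return items whose coordinates fall within radius_km of (lat, lon)."""
    return [it for it in items if haversine_km(lat, lon, it["lat"], it["lon"]) <= radius_km]


# Hypothetical records: three names for one city, plus an unrelated place.
items = [
    {"id": "map-1", "place_name": "St. Petersburg", "lat": 59.94, "lon": 30.31},
    {"id": "photo-7", "place_name": "Petrograd", "lat": 59.93, "lon": 30.34},
    {"id": "poster-3", "place_name": "Leningrad", "lat": 59.95, "lon": 30.32},
    {"id": "map-9", "place_name": "Moscow", "lat": 55.75, "lon": 37.62},
]

same_place = [it["id"] for it in items_near(items, 59.94, 30.31)]
```

A name-based search for "Leningrad" would miss the first two items; the coordinate-based lookup returns all three regardless of the name recorded in the metadata.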
Browse by Topic will facilitate research and education with regard to general topics. Whether the topic browse feature uses Dewey Decimal classifications or some other system remains to be seen, but one objective of this feature probably should be aggregation to general categories that may not necessarily turn up in a literal Google-type search of the metadata. ("Religion," for example, might aggregate pictures of churches and mosques, sacred music and books, and so forth, even though the word "religion" might not appear in the item metadata.)
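The difference between a literal search and topic aggregation can be sketched as follows. The mapping from subject terms to a general topic is a hypothetical, hand-curated table invented for this example; it does not represent Dewey Decimal classes or any classification scheme chosen for the WDL.

```python
# Hypothetical curated mapping from a general topic to specific subject terms.
TOPIC_TERMS = {
    "Religion": {"church", "mosque", "sacred music", "prayer book"},
}


def literal_search(items, word):
    """Return ids of items whose subject metadata literally contains the word."""
    word = word.lower()
    return [it["id"] for it in items if any(word in s.lower() for s in it["subjects"])]


def browse_by_topic(items, topic):
    """Return ids of items whose subjects fall under the given general topic."""
    terms = TOPIC_TERMS.get(topic, set())
    return [it["id"] for it in items if terms & {s.lower() for s in it["subjects"]}]


# Assumed example records.
items = [
    {"id": "photo-1", "subjects": ["Church", "Architecture"]},
    {"id": "score-2", "subjects": ["Sacred music"]},
    {"id": "map-3", "subjects": ["Railways"]},
]

literal_hits = literal_search(items, "religion")
topic_hits = browse_by_topic(items, "Religion")
```

Here the literal search finds nothing, because no record contains the word "religion", while the topic browse gathers the church photograph and the sacred music score under one heading.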
Today's featured theme is simply a way to introduce variety in the opening page of the WDL and to encourage exploration by providing an additional route into the collections. It also could be a way to generate additional participation by partner institutions and their curators, as these institutions could be asked to volunteer to select and develop the themes - using items from different institutions and different parts of the world - on a rotating basis so that the 365 days of the year are covered.
Item-Level Display
All search and browse paths will lead to item-level displays that will contain basic bibliographical data (provided in the language of the interface) and such features (zoom, enlarge, translate) as may be appropriate to the item.
- Context
In addition to its search and browse features, the WDL should offer presentations in intensively edited and curated digital features that build upon and showcase selected items from the digital repository. The purpose of these presentations will be to provide context and interpretation, and to offer general introductions to the collections and items contained in the WDL.
As will be seen on the WDL mockup, these country- or culture-specific presentations are envisioned as "Memory of ..." features that will be developed by committees of scholars, linguists, and historians. The central organizing feature of each "Memory of ..." presentation will be a time line showing the major periods in a given country's history, which will be introduced and explained by a brief narrative (vetted by scholars, but written to appeal to the non-specialist, general user). These narratives will be illustrated by images selected from the collections in the repository.
In addition to the basic time line, the "Memory of ... " sections may include other access points (e.g., maps), selections of top treasures, and other means of providing context and interpretation, such as webcast interviews with scholars and curators.
- User Participation
Features that would allow users to participate in the WDL project might include blogs, chats with curators, and the lightbox feature that allows users to create their own "My WDL."
- System Architecture
The WDL system architecture should be distributed in nature, with each participating site contributing to the creation or maintenance of content and/or to ensuring the accessibility and preservation of content, depending upon the site's strengths and capacities. Some sites will primarily focus on selection and creation of digital content (mainly digitization from analog content, at least initially). Other sites will both select and create digital content and operate content collections and their supporting technologies and infrastructure to ensure regional/global availability and access.
In this suggested architecture, the hosting and maintenance sites are geographically distributed. The content in the WDL collections is fully replicated on every hosting and maintenance site. We refer to the hosting and maintenance sites as mirror sites. One of the mirror sites is a special central site that receives digital content from other sites (e.g., digitization sites or translation sites, if translation is to be done at sites other than the digitization sites or the central site). After receiving digital content from other sites, the central site ingests the content into an instance of a digital repository. A copy of the populated digital repository is then transferred from the central site to every other mirror site for hosting and maintenance. Figure 1 provides an example of a site layout for this type of architecture with content and metadata flow between sites. The figure shows the start of information flow (content and metadata) from digitization sites (circles in the figure) and the distribution of populated repositories to mirror sites from the central site.
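The flow described above (receive, ingest at the central site, then copy the populated repository to every mirror) can be sketched in a few lines. This is an illustrative model only: the class names, method names, and the in-memory dictionary standing in for a digital repository are all assumptions, not a proposed implementation.

```python
import copy


class MirrorSite:
    """A hosting and maintenance site holding a full copy of the repository."""

    def __init__(self, name):
        self.name = name
        self.repository = {}


class CentralSite(MirrorSite):
    """The special mirror site that receives, stages, and ingests content."""

    def __init__(self, name):
        super().__init__(name)
        self.staging = []  # content received but not yet ingested

    def receive(self, object_id, content, metadata):
        """Accept content and metadata from a digitization or translation site."""
        self.staging.append((object_id, content, metadata))

    def ingest(self):
        """Move everything in staging into the central repository."""
        for object_id, content, metadata in self.staging:
            self.repository[object_id] = {"content": content, "metadata": metadata}
        self.staging.clear()

    def replicate_to(self, mirrors):
        """Give every mirror its own independent copy of the populated repository."""
        for mirror in mirrors:
            mirror.repository = copy.deepcopy(self.repository)


# Hypothetical site names and content, for illustration only.
central = CentralSite("central")
mirrors = [MirrorSite("europe"), MirrorSite("south-america")]
central.receive("map-001", b"<image bytes>", {"title": "Early printed map"})
central.ingest()
central.replicate_to(mirrors)
```

Keeping ingestion in one place is what lets every mirror receive an identical, already populated repository instead of assembling its own from heterogeneous inputs.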
Each mirror site will have a modular design and may have a distributed architecture in itself, but with co-located components, to support requirements as needed. Each mirror site will support preservation and access functionalities. In addition to preservation and access functionalities, the central site also has an acquisition sub-system that interfaces with content producers to receive content and stage it prior to ingestion into the repository. Technical analysis will be needed to determine the optimal number of mirror sites, with six or seven probably a reasonable starting assumption (East Asia, Europe with possibly an additional site in Russia, the Middle East/Africa, North America, South America, and South Asia).
An important issue in the architecture is the management of heterogeneity as content is created, transferred and ingested into the repository. By the time content and associated metadata are ingested into the repository and are made accessible in an efficient and useful manner on mirror sites, they will go through several stages and transformations within sites and between sites. As it is first created in digital form, content may be in several different file and aggregation formats with different naming conventions and qualities from one digitization site to the next. The content sets grow in complexity and volume as they are translated into several other languages, and may be transformed again into other formats at this stage of their lifecycle. Metadata is created and collected from the very beginning and is added to from one stage of the lifecycle to the next. Metadata schemas and their formats and structures thus vary from one stage to the next, generally increasing in complexity. As transfer of content from one stage to the next occurs, transformation into different forms and formats will occur as content is wrapped and packaged to facilitate transfer.
In the proposed WDL architecture, the number of file and aggregation formats for different content types and for different stages of the content lifecycle is kept to a select few. The architecture will adopt commonly used standards and best practices in this area. In addition, the uniformity of formats, structures and processes should increase as we move from creation to ingested, accessible and searchable content in the repository. Within the repository (i.e., the site that is replicated on mirror sites), the content formats and structures are normalized to provide uniformity within and between collections, as much as possible. This characteristic will enable effective search within and between diverse collections in ways that heterogeneous information sources could not support.
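One way to picture the normalization step is as a mapping from each contributing site's field names onto a single common schema applied at ingestion. The sketch below assumes two hypothetical sites with invented field names and a three-field common schema; actual schemas would follow whatever standards the partners adopt.

```python
# Hypothetical common schema used inside the repository.
COMMON_FIELDS = ("title", "year", "language")

# Hypothetical per-site mappings from local field names to the common schema.
FIELD_MAPS = {
    "site_a": {"Title": "title", "Date": "year", "Lang": "language"},
    "site_b": {"name": "title", "created": "year", "lang_code": "language"},
}


def normalize(site, record):
    """Convert a site-specific metadata record into the common schema."""
    mapping = FIELD_MAPS[site]
    normalized = {field: None for field in COMMON_FIELDS}
    for source_key, value in record.items():
        target = mapping.get(source_key)
        if target is not None:
            normalized[target] = value
    if normalized["year"] is not None:
        normalized["year"] = int(normalized["year"])  # one type for every site
    return normalized


record_a = normalize("site_a", {"Title": "Map of Brazil", "Date": "1750", "Lang": "pt"})
record_b = normalize("site_b", {"name": "Map of Brazil", "created": 1750, "lang_code": "pt"})
```

Once both records share the same field names and types, a single query over `title`, `year`, or `language` works across collections from either site.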
This proposed architecture would support the following features:
Preservation: Existence of mirror sites ensures that multiple copies of every digital object exist in geographically distributed locations. Loss of a copy at one site can be recovered by transfer of that object from another site.
Performance: Existence of regional mirror sites increases performance by providing better response times to a greater number of users.
Reproducibility: Developing the technology and formally populating the repository in a uniform way at a central site, before replicating it to other sites, makes it feasible to create additional mirror sites in a timely manner.
Ease of operation: A turnkey populated repository with uniform structure at the mirror sites makes operating the WDL system simple and possible at a lower cost.
Platform independence: The WDL system will be able to run on a multitude of hardware and operating system platforms to facilitate local diversity at the mirror sites.
Global search: Uniformity of formats and structures within the repository will enable global search within and across diverse collections.
Availability: Existence of mirror sites ensures that if a site is temporarily down, access continues to be provided via other running mirror sites.
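Two of the features listed above, preservation and availability, lend themselves to a short sketch. It assumes that each site is modeled as a dictionary with `name`, `up`, and `objects` keys and that a SHA-256 checksum is recorded for every object; these structures and function names are hypothetical and chosen only to make the idea concrete.

```python
import hashlib


def checksum(data):
    """Return the SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def repair(site, object_id, expected_checksum, peers):
    """Preservation: restore a lost or damaged object from an intact peer copy."""
    data = site["objects"].get(object_id)
    if data is not None and checksum(data) == expected_checksum:
        return True  # local copy is intact, nothing to do
    for peer in peers:
        candidate = peer["objects"].get(object_id)
        if candidate is not None and checksum(candidate) == expected_checksum:
            site["objects"][object_id] = candidate
            return True
    return False  # no intact copy found at any peer


def fetch(object_id, sites):
    """Availability: serve the object from the first site that is up and has it."""
    for site in sites:
        if site["up"] and object_id in site["objects"]:
            return site["name"], site["objects"][object_id]
    raise LookupError(f"{object_id} is not available from any running site")
```

In use, `repair` would be run by a site that detects a missing or corrupted object, while `fetch` stands in for whatever routing layer directs a user request to a running mirror.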
- Finance and Governance
Digital library projects are expensive and becoming more so over time. The Library of Congress American Memory project to place an initial five million items online in the period 1995-2000 cost over $60 million. The Meeting of Frontiers project carried out by partners in Russia, Germany, and the United States spent approximately $3 million to create a site that contains just under a million images, displayed and searchable in two languages. The national librarian of France has suggested that Europe needs to spend $150 million to $200 million to create a European portal to compete with Google.
Estimates for international projects with a high degree of engagement in the developing world and the unique problems posed by multilingual search and display are inherently tricky. The initial estimate by the Library of Congress for a global digital library pilot effort involving just twelve countries was for $27 million over five years. An important priority of the WDL planning process should be to come up with more precise cost estimates, based upon ongoing pilot projects, and to decide how smaller, manageable chunks of a WDL can be created, block by block over time, using available financial and in-kind resources.
A possible model for governance might be a membership organization that individual libraries and cultural institutions would join as WDL participants. Institutions would subscribe to a charter that would include general provisions relating to intellectual property, technical standards, and other issues that otherwise might have to be painstakingly negotiated in bilateral agreements between each of the participating institutions. A general meeting and an executive council might in turn govern the WDL organization. The WDL also might have something like an overall editorial board to review the quality and balance of the collections and presentations on the WDL.
- Next Steps
It is proposed that the next steps in developing the WDL include the following:
- The convening of expert working groups to explore the key issues outlined above, plus any others that might be identified at the December 1 meeting.
- Development of a working prototype, based on the mockup presented at the December 1 meeting and the feedback regarding it, to be presented to UNESCO in October 2007 and used as a basis to generate political support, recruit partners from around the world, and solicit private sector, foundation, and governmental financial and in-kind support for the project.
- Additional suggestions for follow-up activity are welcome, and can be made at the December 1 Experts Meeting or directly to the Library of Congress or any of the working group chairs.
Appendix I: Examples of Collections that Might be Included in the World Digital Library
- Rare Books. Possible candidates would include significant volumes on history, culture, science, and other topics, in all languages.
- Other Printed Material. Pamphlets and ephemera tend to be of great interest to scholars; railroad and shipping timetables are also of interest, as are early advertisements and directories.
- Maps. Hand-drawn maps by explorers; early printed maps; printed maps that played significant roles in shaping national consciousness in different countries; maps by indigenous peoples or that show geographic knowledge obtained from indigenous peoples; topical maps showing industry, transportation, ethnographic and linguistic distribution.
- Photographs. 19th and early 20th century collections that document everyday life, buildings, and infrastructure; albums (published or unpublished) created for special occasions such as visits and anniversaries; collections owned or created by important historical personages.
- Postcards. Late 19th and early 20th century collections that document the rise of tourism and the tourist industry.
- Prints, Posters, Lithographs. Collections that document buildings, infrastructure, flora and fauna, important historical events.
- Architectural drawings. Anything that sheds light on important buildings and structures; monuments from civilizations around the world.
- Manuscripts. Key historical documents; illuminated manuscripts; examples of calligraphy in all scripts; sacred texts and other religious texts; personal diaries and letters of general interest. Collections/items that most likely would not be candidates for inclusion would be large volumes of government or other institutional papers or the personal papers of individuals.
- Musical Scores. Any of particular historic or aesthetic significance.
- Film Clips. Early films; documentary clips that show events and individuals of historic importance.
- Sound Recordings. Ethnographic recordings (folk songs); sound recordings of important historical personages.