The CoLib Project: Enabling Digital Botany for the 21st Century

John L. Schnase[1], John J. Leggett[2], Ted Metcalfe[1], Nancy R. Morin[3], Edward L. Cunnius[1], Jonathan S. Turner[4], Richard K. Furuta[2], Leland Ellis[5], Michael S. Pilant[6], Richard E. Ewing[6], Scott W. Hassan[1], and Mark E. Frisse[1]

[1] Advanced Technology Group, School of Medicine Library, Washington University School of Medicine; [2] Hypermedia Research Laboratory, Department of Computer Science, Texas A&M University; [3] Missouri Botanical Garden; [4] Department of Computer Science, Washington University; [5] W.M. Keck Center for Genome Informatics, Institute of Bioscience and Technology, Texas A&M University; [6] Institute for Scientific Computation, Texas A&M University


The CoLib Project is a multi-institutional, inter- and intra-disciplinary effort aimed at establishing a large-scale, distributed, botanical digital library and research facility. Its initial focus is on enabling the Flora of North America Project (FNA), a collaborative effort to gather and disseminate information on all the plants of North America. CoLib's long-range goals, however, are the following: (1) to scale FNA into a comprehensive, world-wide, internetworked, botanical digital library, (2) to enable the scientific practice of botany, (3) to use digital library technology to create new opportunities for cross-disciplinary sharing, and (4) to create a context wherein the librarianship that is intrinsic to systematic botany can be extended to the work that is currently underway on digital libraries.

Large-scale hypermedia technology and broadband communications will be key components of future digital libraries. CoLib's research agenda focuses on collaborative hypermedia library systems, interactive multimedia computing and communications systems, and librarianship in the botanical digital library. The CoLib Project is providing a rich environment for research on large-scale, distributed, digital library systems and organizational processes associated with the use of digital libraries.

Keywords -- ATM, botanical informatics, collaborative hypermedia library systems, hypermedia-in-the-large, scientific practice, systematics, digital library machine.

1. Introduction

In the future, our work will be mediated through rapid, coordinated access to shared information. Digital libraries will provide the substrate for this dialog. Through shared digital libraries, people will collaborate with colleagues across geographic and temporal distances. They will use these libraries to organize personal information spaces and to read, write, teach, learn, and create. In traditional fashion, their intellectual work will be shared with others through the medium of the library--but their contributions and interactions will be elements of a global and universally accessible library that can be used by many different people and many different communities. By increasing the effectiveness and speed with which information is communicated and used, digital libraries are likely to potentiate major paradigm shifts in science. They will advance existing areas of study, promote disciplinary fusions, and enable entirely new discourses of study.

The CoLib Project is a multi-institutional, inter- and intradisciplinary effort aimed at establishing a large-scale, distributed, botanical digital library and research facility. The technology component of CoLib's research agenda focuses on collaborative hypermedia library systems and interactive multimedia computing and communications systems. We are using Asynchronous Transfer Mode (ATM) network technology for our communications infrastructure. We believe that large-scale, collaborative hypermedia technology and broadband communications will play a critical role in enabling scientific practice.

As a starting point, CoLib is focusing on enabling the Flora of North America Project (FNA), a major collaborative effort to gather and make available, in digital and printed form, the most up-to-date information on the 20,000 species of vascular plants and bryophytes of the continental United States and Canada. CoLib will scale FNA into a comprehensive, worldwide, internetworked, botanical digital library. This will allow the librarianship that is intrinsic to the practice of systematic botany to be extended to the broad spectrum of work that is currently underway on digital libraries. More important, however, we approach CoLib as a means of opening many scientific disciplines through the clarifying and unifying frame of biological diversity.

The major participants in the CoLib Project include Washington University, Texas A&M University, and the Missouri Botanical Garden. We call the initiative "CoLib" because its focus is on expanding the range of collaborations, participation, and activities that can occur when digital libraries enable scientific practice. Our hope is that the CoLib Project will provide an opportunity for theoretical and empirical research that is firmly grounded in real-world needs and will help bring the conduct of botanical and library sciences into a new age where computer systems, high-speed networks, and ubiquitous access to information will revolutionize existing concepts of publication and research.

This paper provides an overview of the CoLib Project. We present our rationale for CoLib, describe the project, and then discuss its implications for the science of botany and for research on digital libraries in general.

2. Why Botany?

The fate and economic prosperity of human populations are inextricably linked to the natural world. Plants, in particular, are critical to our ability to maintain livable communities. All life on Earth depends on plants, algae, and certain bacteria to convert the energy of the sun into food and other forms of usable chemical energy. Beyond their value as a biological resource, plants contribute to the aesthetic quality of life, moderate climates, and increase our understanding of environmental processes.

It is not surprising that information relating to plants is vital to a wide range of scientific, educational, commercial, and governmental uses. Unfortunately, much of this information exists in forms that are not easily used. From rare book manuscripts to scattered and eclectic databases and physical specimens mounted on sheets of paper and preserved in herbaria throughout the world, our record of botanical information is largely inaccessible. Except for traditional, manual practices, there exists no comprehensive technology or organizational framework that allows this information to be used effectively by research scientists or other potential client communities.

There are, therefore, several important reasons why a large-scale, internetworked, botanical digital library would be of value:

* Botanical information is important. There are currently about 250,000 species[1] of plants on Earth. At least 25,000 species probably await discovery. While this represents only a small percentage of the estimated 100 million existing species of organisms, the importance of plants is unequaled [29]. For example, about 100 species of plants are directly or indirectly the source of virtually all the calories consumed by the human population. Most medicines, including 20 with the largest worldwide sales, contain ingredients derived from plants. Plants contribute to the integrity of global ecosystems and are a critical component of America's economic base. The direct and indirect value of our botanical resource and its related information is virtually impossible to estimate [15].

* Botanical information is abundant and media-rich. It is difficult to estimate the quantity of information required to describe living species. If all the diversity of the world were revealed and described one-page-to-a-species and then published in bound thousand-page volumes, it would take nearly six kilometers of shelving to house this information [29]. This is the size of a medium-size public library. However, the complete record of biological information is orders of magnitude greater than this and exists in media types far more complex than paper. Botanical information certainly resides in traditional library holdings, such as books, monographs, and journals. It also exists in scores of institutional and individual databases and in hundreds of laboratory and personal field journals scattered throughout the country. The use of spatial information, geographic information, simulation, and visualization techniques is proliferating, along with an increasing reliance on two- and three-dimensional images, full motion video, and sound. Depending on the area of research, information may exist in the form of genetic maps of plant chromosomes or electron micrographs of subcellular structures.

Two important forms of botanical information are Floras[2] and herbarium specimens[3]. A botanical digital library built on next-generation communications and multimedia computing technologies can bring coherence to this vast and diverse assemblage of information.

* A large-scale, botanical digital library would be an important national resource. Over the years, the United States has created laws and policies to encourage responsible stewardship of botanical resources. Regulatory programs, acquisition of public lands, private conservation efforts, etc., have aimed at understanding how to sustainably support the plant-related goods and services upon which we all depend. Despite these endeavors, the nation's botanical biodiversity is in decline, and our understanding of key biological processes is incomplete. Critical long-term environmental decisions continue to be made based on inadequate information. There is no effective cross-institutional framework for identifying and conducting research of the highest priority, coordinating among current and future research activities, or making botanical information available in coherent and usable ways to the many agencies and organizations that have responsibilities for protecting, restoring, and managing our biological heritage [15]. A national botanical digital library would provide an organized framework for collaboration among federal, regional, state, and local organizations in the public and private sectors; provide improved programmatic efficiencies and economies of scale through better coordination of efforts; and provide an extensive and common information base that could be used to anticipate and lessen potential conflicts about biological resources.

* A botanical digital library can focus ongoing independent research efforts. Numerous national and regional efforts are underway to improve the way researchers access collections data and communicate with one another, and significant funding from private and public sources sustains the ongoing collections management and informatics efforts of our nation's free-standing museums of natural history and botanical gardens [1]. Additionally, the National Research Council Committee on the Formation of the National Biological Survey confirmed the need for a National Biological Survey for resource management and future research. It has also been recommended that work on plants have high priority [15]. A botanical digital library would provide a common focus for these independent efforts and would build upon the substantial investments that have been and will be made in components of such a library. A botanical digital library would do more than simply complement ongoing informatics efforts; it would significantly leverage these activities by creating a more global and neutral context for sharing information, thus broadening participation and creating opportunities for greater "buy in" from new client communities.

* A botanical digital library would serve many client groups. Of the various types of scientific information, botanical information is of value to perhaps the widest range of client groups. Botanists, ecologists, scientists and researchers from other communities, forest managers, farmers, municipal and regional planners, agriculture consultants and extension agents, weed and pest controllers, recreation managers and planners, flood control engineers, landscape architects, interior designers, plant breeders, seed companies, animal feed companies, dermatologists, rare and endangered species agents, poison control centers, publishers, artists, and teachers are among those who would use information contained in a convenient, network-accessible botanical digital library. It is important to realize that most of these client groups are currently isolated from botanical information either because the information is not available in an electronic form, or because the lack of a coherent and usable information infrastructure results in ineffective use of existing digital information [1, 14, 15].
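
As a rough check on the shelving estimate quoted in the list above, the arithmetic can be sketched as follows. The page and volume sizes are the ones stated in the text; pairing them with the 100 million species estimate is an assumption made here for illustration, and the cited source may have used a different species count. The variable names are hypothetical.

```python
# Figures quoted in the text.
SPECIES_ESTIMATE = 100_000_000  # "estimated 100 million existing species" (assumed to be the basis)
PAGES_PER_SPECIES = 1           # "one-page-to-a-species"
PAGES_PER_VOLUME = 1_000        # "bound thousand-page volumes"
SHELF_LENGTH_M = 6_000          # "nearly six kilometers of shelving"

# Number of thousand-page volumes needed for one page per species.
volumes = SPECIES_ESTIMATE * PAGES_PER_SPECIES // PAGES_PER_VOLUME

# Shelf width each volume would occupy if the six-kilometer figure holds.
shelf_cm_per_volume = SHELF_LENGTH_M * 100 / volumes

print(f"{volumes:,} volumes, about {shelf_cm_per_volume:.0f} cm of shelf per volume")
```

Under these assumptions the estimate works out to 100,000 volumes at roughly 6 cm of shelf each, which is a plausible thickness for a bound thousand-page volume.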

4. The CoLib Project

We assume that botanical information in the future will be produced, transmitted, and consumed primarily in electronic form [1,14]. This will represent a major paradigm shift for the science of botany and will require substantial rethinking of how to design, implement, and use botanical information systems at all levels. In order to achieve effective solutions and bring coherence to the field, we feel that it is necessary to view the challenges of botanical informatics from a digital libraries perspective.

The CoLib Project consists of the following components: (1) a large-scale, geographically-distributed, collaborative research project that will benefit from the application of digital library technologies, (2) a substantial collection of botanical information whose development, analysis, access, and associations with other resources will be extended by the effort, (3) specific client communities and software environments that will utilize CoLib's technology and information products now and in the future, (4) high-capacity ATM networks as the primary communications infrastructure, (5) institutional infrastructure to support the library, and (6) a long-term plan for management of the library and coordination of research activities and commercial and other development. We now briefly describe these elements.

4.1. The Flora of North America Project

As indicated above, the Flora of North America Project (FNA) is at the heart of the CoLib Project. FNA will gather and make available, in digital and printed form, information on the 20,000 species of vascular plants and bryophytes of the continental United States, Canada, and Greenland--the first overall account of the plants of this vast area. The FNA workforce currently numbers over 500 scientists, including professional plant taxonomists in North America and other parts of the world and biologists in government agencies such as the U.S. Forest Service, Bureau of Land Management, U.S. Fish and Wildlife Service, and state conservation and biological survey offices. The FNA editorial committee consists of 34 plant taxonomists distributed throughout the United States and Canada. Thirty institutions have committed major staff time and facilities to the successful completion of the Flora. The project began in 1985 and is expected to be completed early in the next century [13,14].

FNA researchers study plants by examining herbarium specimens and critically evaluating published reports of previous work. Sometimes it is necessary to perform detailed biochemical, electron microscopic, or other studies of the plants. The assemblage of information resulting from the analysis of a plant is referred to as a "treatment." Every treatment includes original research as well as a fresh consideration of any previous studies, and each treatment is prepared by a taxonomic specialist. Treatments fully integrate knowledge about the plant from vast and diverse sources of information. Many of the treatments present for the first and only time knowledge resulting from a researcher's lifetime of study. The treatments also incorporate results of recent research that otherwise might not become available to nonspecialists for many decades.

The Missouri Botanical Garden is the Organizational Center for the Flora of North America Project. At present, the following work is done at the Organizational Center once a treatment is received: information is incorporated into the FNA database; the treatment is technically edited and sent for bibliographic and nomenclatural editing; the final sequence of taxa is checked, and information is added to tracking files for maps and illustrations; synonymy is checked against lists of names accepted in major floras and checklists; distribution is checked against maps; maps are scanned or redrawn; illustrations, which have been prepared through close consultation with the author, based on living material, photographs, and herbarium specimens, are completed. Manuscripts, maps, and illustrations are usually sent out for review by regional collaborators. In addition to regional reviewers, special reviewers in the areas of weeds, agriculture, and horticulture are asked to review all relevant treatments, and two editorial committee members act as additional outside reviewers. Twenty-six regional floristic specialists have agreed to review treatments of taxa that occur in their areas. Overall coordination of the effort is the responsibility of the convening editor (Dr. Nancy Morin).

Through a cooperative agreement with Oxford University Press, the Flora of North America Project is being published as 14 printed volumes at the rate of approximately one per year. Plans are also underway for Oxford University Press to produce CD-ROM versions of the Flora and to make the Flora available on-line. The information produced by FNA will provide a unified framework for basic and applied research dealing with North American plants and plant products. The project has also become the model and focus of debate on the future of floristics and systematic botany.

4.2. Collaborating Institutions

CoLib's primary institutional collaborators include:

* Missouri Botanical Garden. The Missouri Botanical Garden, founded by Henry Shaw and opened to the public in 1859, is the oldest botanical garden in the United States. The Garden's Research Division conducts the world's most active program for collecting and studying vascular plants and bryophytes from throughout the world, especially those of the New World and African tropics. Because of the size of its herbarium, the size of its library's holdings, its large proportion of historical specimens, and its formal and informal collaborative agreements with numerous other institutions, the Garden is recognized as one of the most significant global repositories and resources of botanical information.

* Washington University. The Computer Science Department at Washington University is recognized for its role in the intellectual and practical development of Asynchronous Transfer Mode (ATM) network technology. Project Zeus focuses on the practical deployment of ATM technology in university and metropolitan settings. Its multipoint switching technology has become the flagship product of one of the leading commercial vendors of ATM LAN switches. Washington University School of Medicine is the nation's fourth largest institution for biomedical research and has been a leader in medical education and education reform since the turn of the century. The School of Medicine Library is one of the most active medical libraries in the country and is the center for research and deployment of advanced information systems and communications technologies within the medical school.

* Texas A&M University. The Hypermedia Research Laboratory at Texas A&M University is known for its work on large-scale hypermedia systems, hypermedia databases (hyperbases), and the application of these technologies to collaborative work processes. Its early work in sponsoring the Dexter Workshops has led to one of the most influential hypermedia models in the literature today. Texas A&M is the home of the nation's largest agricultural college and, in cooperation with the USDA and the W.M. Keck Center for Genome Informatics, maintains the nation's collection of scientific information relating to cotton and sorghum, two of the most important agricultural plants. Texas A&M is also recognized for its work in high-performance computing, scientific computation, and scientific visualization. Its Institute for Scientific Computation has pioneered the use of wavelets for multiple-resolution visualization and compression.

* Southwestern Bell Telephone Company. The laboratories affiliated with Southwestern Bell have a long and distinguished record of research in broadband communications and collaborative systems technologies. Southwestern Bell has been a major corporate sponsor of large-scale ATM testbed development in the St. Louis area. The company is committed to expanding the availability of high-speed communications into new application domains and to new client communities, such as those participating in the CoLib Project.

The CoLib Project also includes an important array of secondary collaborators who will become the first extended tier of clients to utilize the CoLib Library. These are discussed in Section 4.4.

4.3. Information Resources

The Missouri Botanical Garden is the primary source of information for the botanical digital library during the initial phase of CoLib's development. In subsequent phases, this base will be expanded to include the resources contained in the network-accessible repositories of virtually all the nation's major botanical institutions. The information associated with the Flora of North America Project is a subset of the vast amount of information managed by the Garden and its affiliated institutions. The following are some of the specific elements that will be included in the CoLib Library.

TROPICOS. TROPICOS, developed and managed by the Missouri Botanical Garden, is the world's largest research- and education-oriented botanical taxonomic database [1]. As of September, 1993, TROPICOS contained information on 600,000 plant names, 50,000 of which are needed for FNA. It also contained 501,000 specimen records and 57,120 bibliographic records. Extensive authority files are used in the system, including 25,443 records for persons and 18,183 records for periodicals and books. In collaboration with The New York Botanical Garden and INGRES, a program is underway to significantly refine and expand the TROPICOS database.

TROPICOS supports a number of programs at the Garden, including herbarium specimen label production and the index to plant chromosome numbers. It also holds all moss-related literature references published in the past 20 years, all scientific data associated with the Flora of North America Project, and data associated with numerous worldwide projects, such as the Flora of Peru, Flora of Madagascar, Flora Mesoamericana, and Flora of China Projects.

Herbarium. The cornerstone of the research program at the Garden is its herbarium. The herbarium collection presently consists of nearly 4.5 million mounted specimens, making it the fourth largest in the United States and among the top fifteen of the world's 2600-plus herbaria. The herbarium's accession rate is about 200,000 specimens a year, the highest in the world. Use of the Garden's collection and its other facilities by researchers worldwide is high, with hundreds of research scientists using the collection each year and nearly 160 mounted specimens per day being loaned from St. Louis. The collection contains many specimens of historical and nomenclatural importance, including an estimated 150,000 type specimens[4]. An important component of CoLib is an imaging initiative that will result in the creation of a digital archive of the type specimens and rare book manuscripts associated with the Flora of North America Project.

Libraries. The Missouri Botanical Garden houses one of the most comprehensive collections of literature devoted to systematic botany and floristics. The general collection contains over 215,000 volumes, including 110,000 volumes of monographs and journals. The library has seven special book collections containing between five and six thousand rare books, including the first printing of the first edition of Charles Darwin's On the Origin of Species, John James Audubon's Birds of America, and the Linnean Collection from the 1700s that includes most of Carl Linnaeus'[5] botanical works in original editions and subsequent revisions. Among its non-book collections are more than 220,000 archived items including professional papers, historic manuscripts, photographs, oral histories, and more than 40,000 microfiche images of plant specimens from other herbaria.

Creating a digital archive of many of these manuscripts is important, because they contain the original descriptions of species corresponding to the physical type specimens contained in the herbarium collection. At least 25 percent of all citations in the current literature of systematic botany refer to publications that date from 1900 or earlier. Because these materials exist only in non-digital form, researchers must often travel to St. Louis to view type specimens and their associated published information together.

Publications. The Garden is responsible for the coordination, editing, and publication of articles and studies in numerous scientific journals and books. These include scientific journals critical to the field, such as Annals of the Missouri Botanical Garden, Monographs in Systematic Botany, Index to Plant Chromosome Numbers, and other publications. Typically, over 200 manuscripts are being processed at one time by the publications department.

The material that we have chosen for initial inclusion in the CoLib library represents a rich cross section of informational forms. However, other important collections will eventually be incorporated as well.

Herbarium Consortium. Formal and informal relationships between the Missouri Botanical Garden and the nation's largest herbaria will enable future expansion of the CoLib Project to include the country's most important sources of botanical information. The ten institutions are the Academy of Natural Sciences, the Bishop Museum, the California Academy of Sciences, the Field Museum of Natural History, Harvard University, the New York Botanical Garden, the Smithsonian Institution, the University of California at Berkeley, the University of Michigan, and the Missouri Botanical Garden. The aggregate holdings of these herbaria are over 28 million specimens, including 84 percent of the North American type specimens. Their collections are the most frequently used in the United States, and they maintain the most active specimen loan program in the world [1].

These herbaria are similar to the Missouri Botanical Garden in the kinds of information that they manage. Each has associated libraries, publications departments, laboratories, etc. The primary difference among them is the plant groups or geographic areas upon which their collections and scientific research focus. Other differences are the extent to which their information exists in digital form and the sophistication of their computing infrastructure. However, all are initiating large-scale, individual and cooperative efforts to bring their information on-line, and arrangements are in place for the inclusion of their information in CoLib.

W. M. Keck Center for Genome Informatics. Under the auspices of the USDA, the Keck Center maintains the national working collection of approximately 7,000 species for the genus Gossypium (cotton)--economically, one of the nation's most important agricultural crops. Data and herbarium specimens are maintained on the A&M campus for each species. The Keck Center is transitioning this information into electronic, network-accessible form and is expanding the information base to include image data and molecular genetic data. A similar effort is underway with sorghum, where the responsibility for a national repository is split between Texas A&M University and the University of Georgia. The sorghum collection includes approximately 20,000 accessions.

The inclusion of these information repositories in a botanical digital library creates two important interfaces. Washington University School of Medicine and the Keck Center have important ties to the Plant and Human Genome Initiative, thereby creating a unique and potentially valuable interface between the botanical and agricultural research communities and the biomedical research community. Perhaps more important to the broader issues relating to the conduct of science, using digital libraries as the context for sharing information between molecular biologists and botanical systematists can help promote the convergence of these disciplines [30].

4.4. Client Groups

The Flora of North America Project is a gold mine of information for the botanical research community. As we have indicated, however, the CoLib Project seeks to enable more than this single client group--it intends to use FNA to catalyze the formation of a large-scale, distributed, botanical digital library that will be useful to many other people and communities. In addition to the primary collaborators, we will include among our clients one of the nation's major science centers, two magnet schools, a major national educational consortium, the National Biological Survey, and the pharmaceutical industry.

St. Louis Science Center. The St. Louis Science Center, with over 1.7 million visitors a year, is ranked third among science centers in the U.S. and fourth in the world. The Center collaborates with the Missouri Botanical Garden, the St. Louis Zoo, and the St. Louis public schools on a wide range of educational programs [6,17]. The Center is a recognized leader in the use of multimedia educational materials in its education, community outreach programs, and museum exhibits.

Curriculum Development Collaborative. The Curriculum Development Collaborative is a national effort involving the Ontario Institute for Studies in Education, the University of California at Berkeley, Vanderbilt University, and the St. Louis Science Center. The goals of the project are to deploy technology-based curricula in science-math magnet schools throughout the country and to increase collaboration between students and active research communities [6,17]. The CoLib testbed will include two St. Louis-area math-science magnet schools associated with this national initiative: the Mullanphy Botanical Garden Investigative Learning Center (an elementary school affiliated with the Missouri Botanical Garden) and the St. Louis Science Center Middle School and High School.

Inclusion of these institutions and programs as part of CoLib enables a diverse program of research on the use of digital libraries in education. Since the Missouri Botanical Garden has an active graduate program and most of the botanists at the Garden also share academic appointments at Washington University or the University of Missouri, the addition of these magnet schools provides an educational interface that spans grades K-12, undergraduate and graduate programs, and general community outreach.

National Biological Survey. The U.S. Department of the Interior has formed a new agency, the National Biological Survey (NBS). The mission of this new agency is to gather, analyze, and disseminate the information necessary for the wise stewardship of our nation's natural resources and to foster an understanding of our biological systems and the benefits they provide to society. The NBS will have responsibilities to inventory, map, and monitor biotic resources. It will also help coordinate basic and applied research on species, populations, and ecosystems, and create a basis for sound environmental decision-making. The NBS will provide, for the first time, an organized framework for collaboration among federal, regional, state, and local organizations over an extensive and common base of biological information [15]. The CoLib botanical digital library will significantly contribute to this effort.

Pharmaceutical Industry. The rosy periwinkle (Catharanthus roseus) is a rare plant native only to Madagascar, yet it produces two alkaloids, vinblastine and vincristine, that cure most people of two of the deadliest cancers, Hodgkin's disease and acute lymphocytic leukemia. The income produced from the manufacture and sale of these two substances alone exceeds $180 million a year. Most medicines are obtained directly from plants or are synthetic versions of molecules first discovered in nature. Yet when it comes to anticancer agents of this type, fewer than 3 percent of the world's flowering plants have been examined for alkaloids [29]. The Missouri Botanical Garden, in cooperation with the National Cancer Institute and the Monsanto Company, collects plants around the world to be screened for potential therapeutic value. Biodiversity is the foundation of biotechnology, and we expect that the pharmaceutical industry will make extensive use of the information contained in the CoLib library to "prospect" for new drugs.

4.5. Communications Infrastructure

CoLib will expand Washington University's experimental ATM network to support research on digital libraries. Project Zeus is a long-term effort aimed at introducing advanced Asynchronous Transfer Mode (ATM) network technology into the university campus and metropolitan St. Louis area [2]. An ATM switch, developed in conjunction with the Zeus Project, has been licensed to SynOptics Communications and is now being sold commercially. Eight switches are currently in place in St. Louis, serving about fifty users.

Each of the six main CoLib sites (Missouri Botanical Garden, Washington University Medical School Library, Washington University Computer Science Department, Texas A&M University Computer Science Department, Institute of Bioscience and Technology, Institute for Scientific Computation) is deploying SynOptics switches. Each switch supports 16 links operating at 155 Mb/s. The switches are being used to connect researchers at each site to local servers and to other sites in the same geographic location.
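To give a rough sense of scale for these figures, the following Python sketch computes the aggregate link capacity of one switch and an idealized transfer time for a single image over one link. The link count and link rate come from the text above; the 53-byte cell with a 48-byte payload is the standard ATM cell format. The 50 MB image size is a purely hypothetical value chosen for illustration, and the estimate ignores everything except cell-header overhead, so it should be read as a back-of-the-envelope bound rather than a measurement.

```python
LINKS_PER_SWITCH = 16      # from the text
LINK_RATE_MBPS = 155.0     # from the text, megabits per second per link

ATM_CELL_BYTES = 53        # standard ATM cell size
ATM_PAYLOAD_BYTES = 48     # payload bytes carried in each cell


def aggregate_capacity_mbps(links=LINKS_PER_SWITCH, rate_mbps=LINK_RATE_MBPS):
    """Sum of the raw link rates on one switch, in Mb/s."""
    return links * rate_mbps


def transfer_seconds(size_megabytes, rate_mbps=LINK_RATE_MBPS, cell_overhead=True):
    """Idealized time to move size_megabytes (10^6 bytes each) over one link.

    With cell_overhead=True, only the payload fraction of each ATM cell
    counts as usable bandwidth. All other overheads are ignored.
    """
    bits = size_megabytes * 8_000_000
    usable_bps = rate_mbps * 1_000_000
    if cell_overhead:
        usable_bps *= ATM_PAYLOAD_BYTES / ATM_CELL_BYTES
    return bits / usable_bps


if __name__ == "__main__":
    HYPOTHETICAL_IMAGE_MB = 50  # assumption, not a figure from the paper
    print("aggregate capacity (Mb/s):", aggregate_capacity_mbps())
    print("seconds per image:", round(transfer_seconds(HYPOTHETICAL_IMAGE_MB), 2))
```

Under these assumptions a single switch offers 16 x 155 = 2480 Mb/s of raw link capacity, and an image of the assumed size would cross one link in a few seconds.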

While long-distance ATM services among collaborating sites are not readily available yet, we anticipate that situation will change dramatically in the near future. Initially the project will rely on existing Internet connectivity to support collaboration between the St. Louis and A&M sites and between Consortium institutions and FNA scientists. CoLib will move to higher-speed connectivity as inexpensive ATM services become available, either commercially or through anticipated developments within NSF-Net, Southwestern Bell, or other telecommunications companies.

4.6. Research Program

The advent of digital libraries poses a staggering array of research problems, and our understanding of the technological and social complexities of digital libraries will not progress in the absence of sound, well-designed theoretical and practical studies. Fundamental research is the overarching mission of the CoLib Project. Its research program is focusing on three major areas: (1) collaborative hypermedia library systems, (2) interactive multimedia computing and communications systems, and (3) librarianship in the botanical digital library.

Each of the projects in these areas requires an interdisciplinary, team approach and, in most cases, an interinstitutional collaboration. The specific details of the research projects will be presented in future publications; here, we briefly describe the major issues that are being addressed:

* Collaborative Hypermedia Library Systems. It is important to challenge the limited conception of information-seeking that colors much of the work currently being done on digital libraries. Digital libraries permit a departure from the archaic indexing and presentation schemes derived from static, paper-based libraries, and we must avoid retaining the limitations inherent in the representational medium of print on paper. Collaborative hypermedia systems enable flexible and efficient mechanisms for locating, organizing, and personalizing information. They also permit multiple conceptual mappings over an information space and allow multiple users engaged in a common task to interact synchronously or asynchronously over shared resources. This is clearly one of the most promising computer technologies for use in digital libraries of the future [7-9, 18, 19, 23].

In CoLib, we are extending collaborative hypermedia technology into the domain of digital libraries through research that focuses on hypermedia-in-the-large. We define hypermedia-in-the-large as open, extensible, large-scale systems that support hypermedia-based collaboration across high-speed, wide-area networks [7,10]. We are also examining the use of interface agents in large-scale, hypermedia-based digital library systems [16]. Finally, we are studying issues relating to the use of hypermedia digital library systems in the computer-augmented environment where users interact with information through whiteboards, laptop computers, tablets, and personal communicators [21, 22, 28].

* Interactive Multimedia Computing and Communications Systems. Distributed multimedia digital library systems of the future will require computing platforms that are closely coupled to high-speed networks and capable of processing a potentially large number of continuous high-bandwidth data streams. They will be expected to handle huge computational demands for everything from handwriting and speech recognition to real-time image processing. They will need to support massive amounts of storage as applications make greater use of visual information. These demands require new ideas in system architecture and a rethinking of how operating systems, computer hardware, and networks interact.

In CoLib, we are examining issues relating to multimedia conferencing, focusing on ways of optimizing the interplay between hardware and collaborative software in order to enable n-way audio and video conferencing in the context of a digital library. We are also studying the design of flexible, high-performance computing platforms that will be required by the types of distributed multimedia applications that will be developed in CoLib. Finally, we are examining the feasibility of using scalable ATM network technology to deliver multimedia digital library materials and to support the types of interactivity that will occur over digital library information [2, 26, 27].

* Librarianship in the Botanical Digital Library. Historically, research libraries provided a service to the research community on the basis of internal workings that were essentially isolated from the communities that they served. Networked information services have begun to change this. Today, the library is often a subset of these services and is involved, to varying degrees, in informatics research, computing services, and technology education. As a result, libraries have become flatter, less formalized institutions that are no longer the specialized purview of trained librarians.

In CoLib, we hope to strengthen the integration of libraries into the organizational challenges of large-scale, internetworked information systems. Research in this area focuses on combining technologies with work practices to create new possibilities for librarianship. The Flora of North America Project is an example of editing- and authoring-in-the-large and, as such, allows CoLib to explore new tools and roles for a librarianship that is better integrated with community research, decision-making, and communication. We are examining the use of discipline-oriented structured documents and collaborative publication in the setting of a botanical digital library. We are also studying ways to enhance image-based collections management through wavelet compression techniques for multiple-resolution imaging of herbarium specimens, cataloging and retrieval by image features, and 3D visualization [3-5, 24, 25].
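The idea of multiple-resolution imaging through wavelets can be illustrated with a minimal Python sketch. It uses the Haar wavelet, which is an assumption made here because it is the simplest case; the text does not say which wavelet family CoLib uses. The function names and the tiny test image are hypothetical. Each decomposition level halves the resolution by averaging 2x2 pixel blocks and keeps three detail coefficients per block, so the full-resolution image can be rebuilt exactly when needed.

```python
def haar_level(image):
    """One level of a 2D Haar decomposition of a grayscale image.

    image is a list of equal-length rows with even dimensions.
    Returns (approx, details): approx holds the 2x2 block averages
    (a half-resolution image); details holds three difference
    coefficients per block.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")
    approx, details = [], []
    for r in range(0, rows, 2):
        a_row, d_row = [], []
        for c in range(0, cols, 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((a + b + d + e) / 4)
            d_row.append((
                (a - b + d - e) / 4,  # left column minus right column
                (a + b - d - e) / 4,  # top row minus bottom row
                (a - b - d + e) / 4,  # diagonal difference
            ))
        approx.append(a_row)
        details.append(d_row)
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_level, recovering the full-resolution image."""
    image = []
    for a_row, d_row in zip(approx, details):
        top, bottom = [], []
        for s, (h, v, g) in zip(a_row, d_row):
            top += [s + h + v + g, s - h + v - g]
            bottom += [s + h - v - g, s - h - v + g]
        image += [top, bottom]
    return image


def pyramid(image, levels):
    """Return the image followed by successively halved approximations."""
    out = [image]
    for _ in range(levels):
        image, _details = haar_level(image)
        out.append(image)
    return out


if __name__ == "__main__":
    specimen = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    approx, details = haar_level(specimen)
    print(approx)
    print(haar_reconstruct(approx, details) == specimen)
```

In a scheme of this kind, a client could display the coarse approximation first and request detail coefficients only for the region being examined. Discarding small detail coefficients would trade fidelity for storage, which is one common route to lossy compression.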

We embody the fusion of these fundamental technologies and processes in the concept of the Digital Library Machine. Digital Library Machines consist of hardware and software tools that enable the collaborative workgroup practices of a community. Whether virtual or real, Digital Library Machines support communities of scholars in the conduct of their day-to-day intellectual activities, including collaborations with other researchers, distribution of research results, publication in digital journals, access to and personalization of existing literature, and the education of future researchers and students.

5. Discussion

We believe that the CoLib Project has important implications both for botany and for research on digital libraries.

5.1. Implications for Botany

We believe that the emergence of digital libraries and advanced computing and communications technologies will fundamentally change the practice of botany and the way in which botanical information is used. The delivery of high-resolution plant specimen images across gigabit networks will enable the "virtual herbarium"--scientists and students anywhere in the world will be able to retrieve images of plants, rotate them in 3D, and zoom in on key structural elements. "Distance microscopy" will enable physical specimens to be examined closely even at remote sites. Pen computers, global positioning systems, remote sensors, and microminiature color CCD cameras will change the way information is gathered in the field--high-resolution, 3D images will be collected in situ and immediately sent to the virtual herbarium for processing along with associated collection data.

Scientists will be able to forge links to related information and store these links in private hyperbases. They will be able to personalize the shared information space with annotations and reuse it for new purposes. If it is necessary to collaborate with colleagues over this material, distributed multimedia computing systems will be used for real-time interactions. In the research labs, teams will gather in computer-augmented conference rooms, where large, interactive displays will be used to plan their research, discuss results, and share information. Computations, models, and simulations will be as much a part of the information space as images and text. New client communities and new discourses of study will be enabled. We predict that an era of "telebotany" and "virtual science" will emerge based on the use of digital libraries of botanical information [1,11, 12, 14, 20, 21].

5.2. Implications for Digital Libraries

There is an important but subtle reason why research in a botanical digital library can have a significant and wide-ranging influence on other lines of digital libraries research. The key lies in understanding information as an activity and systematic biology as a form of librarianship.

Systematic botany attempts to group organisms according to their physical characteristics in a way that reflects their evolutionary relationships. This results in taxonomic categories that help to distinguish biological species. Species, however, are not defined in terms of their intrinsic properties, but in relation to other groups. It is the goal of systematic biology to infer these relationships through complex processes of categorization.

This synthesis is subject to constant revision through debates and developments in supporting biological disciplines. Systematic botany thus pursues categorization on a par with inquiry--categories are not a corollary of information but, instead, are a mode of creation and argumentation, an activity that provides the nomenclature and unifying context through which botanical information and enigmas are communicated within and beyond the scientific community. In this way, botanical systematics performs a large-scale, distributed, and collaborative scientific librarianship--it provides a framework for the vast diversity of information vital to this scientific discourse.

Within many disciplines--linguistics, cognitive science, anthropology, psychology, etc.--information is conceived as an activity. It is not the component elements stored on paper or on a networked fileserver; instead, information is integrally constituted in the activities that involve these elements. Libraries have traditionally provided an organizational frame in which to use and develop the diversity of information. Digital libraries of the future will likewise be more than mere repositories for disembodied ideas--they will be inseparable from the activities that model or stratify information. As a discipline, systematic botany is in a unique position to help our understanding of these issues. What better circumstance to study digital libraries than in the context of a science whose essential activity is the progressive and natural categorization of diversity?

6. Conclusion

This paper has provided a high-level overview of the CoLib Project. In summary, the following are among the major goals we hope to achieve:

* CoLib will increase our understanding of hypermedia-in-the-large, agency, and ubiquitous computing in the context of a digital library;

* CoLib will increase our understanding of the use of broadband communications technology in the digital libraries setting;

* CoLib will promote the inter- and intradisciplinary development of botanical information;

* CoLib will make significant corpora of botanical information more readily available to diverse communities that have been isolated until now from this important information;

* CoLib will enable the transition from a specimen- and paper-based herbarium to the virtual herbarium and telebotany and will become a central clearinghouse for floristic taxonomic information for the world;

* CoLib will create a large-scale testbed for fundamental and applied research on scientific digital libraries that is open to many research communities and methodologies;

* CoLib will promote the fusion of systematic and molecular biology;

* CoLib will help botanists and the public come to understand what a botanical digital library is, how it is used, and how it can influence the fundamental conduct of science within the discipline. It will yield an understanding of digital librarianship.

Research in the setting of a botanical digital library will allow the science and librarianship that are intrinsic to the practice of systematic botany to be extended to the broad spectrum of work underway on digital libraries. We believe that CoLib will contribute to the basic understanding needed to build digital library systems and organizational infrastructures to advance libraries and botany into the twenty-first century.


Special thanks to Peter Raven, Chris McMahon, Debbie Kama, Jim Solomon, David Brunner, Connie Wolf, Doug Stevens, Jerry Cox, Gil Jost, Mary Lamon, Dwight Crandell, Christine Roman, Jim Myers, Mark Radle, Al Winterbauer, Jim Carpenter, Mike McCarthy, Cindy Kunz, Myrna Harbison, Frank Almeida, Pat Gunn, Paul Schoening, Martha Hill, and Jim Smith for their help in launching the CoLib Project.

References

1. Cooley, G.P., Harrington, M.B., and Lawrence, L.M. (Eds.). 1993. Analysis and Recommendations for Scientific Computing and Collections Information Management of Free-Standing Museums of Natural History and Botanical Gardens, Vol. I, II. MITRE Corporation, McLean, VA. (NSF-sponsored study.)

2. Cox, J., Jr., Gaddis, M., and Turner, J. 1993. Project Zeus: Design of a broadband network and its application on a university campus. IEEE Network, pp. 20-30.

3. Frisse, M.E. 1988. Searching for information in a hypertext medical handbook. Communications of the ACM, Vol. 31, No. 7, pp. 880-886.

4. Frisse, M.E., Marrs, K., and Schoening, P.A. 1992. A method for publishing genomic maps. In Proceedings of the Sixteenth Annual Symposium for Computer Applications in Medical Care (Washington, D.C.), pp. 376-382.

5. Furuta, R.K. and Stotts, P.D. 1989. Separating hypertext content from structure in Trellis. In Proceedings of the Hypertext II Conference (University of York, June, 1989).

6. Lamon, M., Lee, E., and Scardamalia, M. Cognitive technologies and peer collaboration: The growth of reflection. In Design Experiments: School Restructuring Through Technology. Collins, A. and Hawkins, J. (Eds.). Cambridge University Press, New York. (To appear.)

7. Leggett, J.J. and Schnase, J.L. 1994. Viewing Dexter with open eyes. Communications of the ACM, Vol. 37, No. 2, pp. 76-86.

8. Leggett, J.J., Schnase, J.L., Fox, E.A., and Smith, J.B. (Eds.). 1993. Proceedings of the NSF Workshop on Hyperbase Management Systems. Department of Computer Science Technical Report No. TAMU-HRL 93-002, Texas A&M University, College Station, TX.

9. Lokken, S.T. and Leggett, J.J. 1993. Document representations in hypermedia library systems. (In prep.)

10. Malcolm, K.C., Poltrock, S.E., and Schuler, D. 1991. Industrial strength hypermedia: Requirements for a large engineering enterprise. In Proceedings of the Hypertext '91 Conference (San Antonio, TX, Dec.), pp. 13-24.

11. Milton, E.O., Ferris, H., Fortuner, R., and Diederich, J.R. (Eds.). 1990. Artificial intelligence and modern computer methods for systematic studies in biology (ARTISYST). University of California at Davis, Napa, CA. (NSF-sponsored workshop.)

12. Morain, S. 1993. Emerging technology for biological data collection and analysis. Annals of the Missouri Botanical Garden, Vol. 80, No. 2, pp. 309-316.

13. Morin, N.R. 1991. Beyond the hardcopy: Databasing Flora of North America information. In Proceedings of the International Congress for Systematic and Evolutionary Biology IV (Portland, OR), pp. 973-980.

14. Morin, N.R., Whetstone, R.D., and Tomlinson, K.L. (Eds.). 1989. Floristics for the 21st Century. Monographs in Systematic Botany from the Missouri Botanical Garden, Vol. 28. The Missouri Botanical Garden Press, St. Louis, MO.

15. NRC. 1993. A Biological Survey for the Nation. National Academy Press, Washington, DC.

16. Sánchez, J.A. 1993. HyperActive: Extending an Open Hypermedia Architecture to Support Agency. M.S. Thesis. Department of Computer Science, Texas A&M University, College Station, TX.

17. Scardamalia, M. and Bereiter, C. 1993. Technologies for knowledge-building discourse. Communications of the ACM, Vol. 36, No. 5, pp. 37-42.

18. Schatz, B.R. 1993. Building an electronic community system. In Readings in Groupware and Computer-Supported Cooperative Work: Assisting Human-Human Collaboration. Baecker, R.M. (Ed.). Morgan Kaufmann Publishers, New York, pp. 550-560.

19. Schnase, J.L., Leggett, J.J., Hicks, D.L., Nürnberg, P.J., and Sánchez, J.A. 1993. Design and implementation of the HB1 hyperbase management system. Electronic Publishing - Origination, Dissemination and Design, Vol. 6, No. 1, pp. 1-29.

20. Schnase, J.L., Grant, W.E., Maxwell, T.C., and Leggett, J.J. 1991. Time and energy budgets of Cassin's Sparrow (Aimophila cassinii) during the breeding season: Evaluation through modelling. Ecological Modelling, Vol. 55, No. 4, pp. 101-135.

21. Schnase, J.L. and Leggett, J.J. 1989. Computational hypertext in biological modelling. In Proceedings of the Hypertext '89 Conference (Pittsburgh, PA, Nov.), pp. 181-198.

22. Schnase, J.L., Leggett, J.J., Hicks, D.L., Nürnberg, P.J., and Sánchez, J.A. 1994. Open architectures for integrated, hypermedia-based information systems. In Proceedings of the 27th Annual Hawaii International Conference on System Sciences (HICSS '94) (Maui, HI, Jan.), pp. 386-396.

23. Schnase, J.L., Leggett, J.J., Hicks, D.L., and Szabo, R.L. 1993. Semantic data modeling of hypermedia associations. ACM Transactions on Information Systems, Vol. 11, No. 1, pp. 27-50.

24. Stotts, P.D. and Furuta, R.K. 1989. Petri-net-based hypertext: Document structure with browsing semantics. ACM Transactions on Information Systems, Vol. 7, No. 1, pp. 3-29.

25. Stotts, P.D., Furuta, R.K., and Ruiz, J.C. 1992. Hyperdocuments as automata: Trace-based browsing property verification. In Proceedings of the European Conference on Hypertext (ECHT '92), pp. 272-281.

26. Turner, J.S. 1992. Managing bandwidth in ATM networks with bursty traffic. IEEE Network, Vol. 6.

27. Turner, J.S. 1988. Design of a Broadcast Packet Network. IEEE Transactions on Communications, Vol. 41.

28. Weiser, M. 1993. Some computer science issues in ubiquitous computing. Communications of the ACM, Vol. 36, No. 7, pp. 74-84.

29. Wilson, E.O. 1992. The Diversity of Life. Harvard University Press, Cambridge, MA.

30. Wilson, E.O. and Raven, P.H. 1993. A fifty-year plan for biodiversity surveys. Science, Vol. 258, pp. 1099-1100.

Authors' Addresses

John L. Schnase, Edward S. Metcalfe, Edward L. Cunnius, Scott W. Hassan, and Mark E. Frisse: Advanced Technology Group, School of Medicine Library, Washington University School of Medicine, 660 South Euclid Avenue (Campus Box 8132), St. Louis, Missouri, 63110. {schnase, metcalfe, edc, hassan, frisse};

John J. Leggett and Richard K. Furuta: Hypermedia Research Laboratory, Department of Computer Science, Texas A&M University, College Station, Texas, 77843. {leggett, furuta};

Nancy R. Morin: Missouri Botanical Garden, P.O. Box 299, St. Louis, Missouri, 63166;

Jonathan S. Turner: Department of Computer Science, Washington University, One Brookings Drive (Box 1045), St. Louis, Missouri, 63130;

Leland Ellis: W.M. Keck Center for Genome Informatics, Institute of Bioscience and Technology, Texas A&M University, 2121 Holcombe Avenue, Houston, Texas, 77030;

Michael S. Pilant and Richard E. Ewing: Institute for Scientific Computation, College of Natural Sciences, Texas A&M University, College Station, Texas, 77843. {mpilant, ewing}

[1] A species is a category of biological classification generally comprising organisms capable of interbreeding with one another in natural conditions, but not with members of other species. The number of living species of all kinds of organisms currently known is approximately 1.4 million. This includes 250,000 plants, 750,000 insects, 280,000 other animals, and 132,500 species of protozoa, algae, fungi, viruses, and bacteria. (This definition for species best applies to sexually reproducing organisms and is often referred to as the "biological-species" concept. Other definitions for species exist, e.g., genetic definitions, ecological definitions, etc. The concept of species is a complex and natural frame for argumentation.)

[2] Flora refers to the plants occurring within a given region as well as to a publication describing those plants. To distinguish between the two, the word is generally capitalized when a publication is meant. A Flora may contain anything from a simple list of the plants occurring in an area to a very detailed account of those plants. Floras almost always contain scientific and common names, literature references, descriptions, habitats, geographical distribution, illustrations, flowering times, and miscellaneous notes. They also may include chemical information, chromosome numbers, population occurrences, as well as identification devices, such as "keys," that consist of mutually exclusive statements.

[3] Herbarium specimens consist of pressed, dried plant specimens that have been mounted on 30x40 cm sheets of archival-quality paper with a label that indicates date and place of collection, collector, and associated information. The sheets are given an accession record and stored in sealed cabinets. When properly preserved, herbarium specimens retain indefinitely the features needed for accurate identification. Herbarium specimens are critical to documenting and studying the world's flora. The oldest portions of the Missouri Botanical Garden's collection date from the mid-1700s, and include specimens collected by Carl Linnaeus, Charles Darwin, and George Engelmann, one of the foremost botanists of the 19th century.

[4] A type specimen is a specimen that is cited in the original publication as the basis of a new species name. It therefore determines the correct application of that name and becomes a critically important element of scientific information. Types are consulted by taxonomists to resolve discrepancies between descriptions of species and to prevent the publication of further discrepancies. Since types are the basis of plant names, they must be consulted, along with original descriptions, as part of any sound monographic or floristic research.

[5] In 1753, Carl Linnaeus published Species Plantarum, which established the binomial system of naming plants that is followed today. The most frequently used categories are family, genus, and species. Naming is governed by a set of rules established by an international committee. Each new species must be described in Latin and published in a recognized scientific journal.