Digital Library as a Foundation for Decision Support Systems

Sulin Ba[1], Aimo Hinkkanen[2], and Andrew B. Whinston[1]

[1] Department of Management Science and Information Systems, Graduate School of Business, CBA 5.202, The University of Texas at Austin, Austin, TX 78712, USA, {sulin, abw}

[2] Department of Mathematics, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA,


Organizations often face complicated decision-making problems. As corporate knowledge becomes more and more dispersed, there is a need to analyze organization-wide issues that incorporate a wide range of knowledge representations and data types, and computers can support this analysis.

The growth of distributed computing and the emergence of research on digital libraries provide new insights for decision support systems (DSS) research. In this paper, we look at the digital library from a business point of view: what services provided by a digital library would be particularly useful to industry? The digital library in this context is a repository of executable documents whose parts are scattered on different platforms across the network. We propose to use this digital library to build an enterprise-wide problem solving system based on executable documents that contain knowledge represented in mathematical form, given that a considerable amount of company information is mathematical. The system aims to answer "what if" and "what to do" questions and to provide explanations for the proposed approach.

Keywords: Decision support systems, digital library, document composition, organizational decision support.

1. Introduction

During the past 25 years, great progress has been made in research and commercial applications of decision support systems (DSS). Conceived originally as the application of computing technology to support decision making, DSS research focused on the implementation of tools from operations research. By allowing end users to state business problems in a higher level language and use software to translate requests, build suitable models, access required databases, integrate and execute models, and finally provide answers to the user, decisions can be made in a more effective manner.

While there are still many challenges and opportunities within this paradigm, significant changes have taken place in the environment surrounding DSS so that radically new approaches are required. These changes are based on end-user demand and technological changes. Instead of focusing on highly structured aspects of company operations that can be modeled using operations research tools, there is a need to analyze organization wide issues that incorporate a wide range of knowledge representations and data types. Companies face complex decision problems which must be resolved collectively by several individuals and that involve multiple phases, including the specification of the problem, discovery and support for alternatives, eventual solution and implementation. There is, in effect, a complex process that underlies decision making that can be supported by computers.

The growth of distributed computing and the emergence of research on the "digital library" provide new insights for DSS research. The main purpose of this project is to build an enterprise-wide problem solving system based on executable documents that contain knowledge represented in mathematical form. The system aims to answer "what if" and "what to do" questions and to provide explanations for the proposed approach. The enormous amount of information scattered around the organization, in whatever form, will be stored in a digital library that serves as a repository of executable documents. When users pose a query, the system extracts all, and only, the relevant documents from the library, executes them, and returns an answer based on a composite model assembled from different pieces of knowledge.

Since the mid-1980s, electronic documents have become more powerful and more widely available each year. The term electronic document has come to encompass a wide variety of knowledge forms including text (reference volumes, books, journals, newspapers, etc.), illustrations, tables, mathematical equations, scientific data, scanned images, video, voice, hypertext links, and animation [2]. In the meantime, digital networks and the number of their users have been growing exponentially. The massive information sources available on the network (i.e., the Internet) form the basic ingredient of a digital library. With the National Science Foundation's initiative [13], research on digital libraries may yield revolutionary results as to how knowledge is stored and disseminated.

Organizational knowledge is very complex in terms of both level of detail and level of analysis. It is heterogeneous: it may be dynamic or static, qualitative or quantitative. For example, while the knowledge base for a company's production and inventory distribution system is largely quantitative, some functional relationships in the same company could be represented in a qualitative form: "If you increase partnership, productivity will increase." There could also be statements expressed in logical form, such as "Don't cut off service to an elderly customer before x months." Organizational knowledge is becoming more and more distributed and heterogeneous, which makes the digital library concept valid in an organizational setting. In our earlier work [8], we developed the idea of using a compositional modeling approach to model organizations that contain heterogeneous knowledge (functional relationships, company-specific knowledge, internal empirical data stored in databases, etc.). That system aims to predict or explain the performance of organizations, answering "what if" and "what to do" questions in particular. Taking the same idea, we can develop an organization-wide problem solving system that answers user queries by assembling pieces of knowledge, which we call model fragments, that are stored in our digital library.

The digital library in our context can be thought of as a repository of executable documents that has different parts scattered on different platforms across the network. The problem solving process can be tied to identifying parts of documents as mathematical formulas (or groups of formulas) that can be executed. Each part is a model fragment that can be composed with others to form an executable compositional model. When users retrieve information from the library as to a particular query, the system should be able to isolate the relevant model fragments, execute them, and return a sufficient answer to the user.

However, the term digital library covers far more than a set of existing documents that are interconnected and digitized. The capture of data of all forms, the categorization and organization of electronic information in a variety of formats, the browsing, searching, filtering, abstracting, and summarizing techniques, the combination of large volumes of data, and the utilization of distributed databases are all in the realm of digital library research. Our research is based on all of these but goes beyond this scope. We propose to use the digital library for enterprise modeling, that is, to look at the digital library from a business point of view: what services provided by a digital library would be particularly useful to industry? How could they enhance competitiveness and time-to-market [6]? What would happen to the company's profit if it invested in a new product? The main reason for choosing mathematical documents as our first step in developing such a system is that a considerable amount of company information is mathematical. Examples include spreadsheets in the accounting department and empirical operational data. Logical data and qualitative relationships also fall within the realm of mathematics.

We propose to use Mathematica to represent electronic documents. Mathematica is a completely integrated software package capable of numeric, symbolic, and a wide range of graphical computation. It offers a flexible structure for a great deal of symbolic manipulation and numerical calculation. In addition, many of the built-in functions in Mathematica can be used as building blocks to create users' own customized programs, routines, or applications. The notebook interface of Mathematica allows users to mix unlimited amounts of text, graphics, equations, and even sound into an organized, live, presentation-quality document. This document can be saved, edited, and read on any other computer that has a notebook version of Mathematica. In our approach, documents, as well as formulas inside documents, are represented in the Mathematica language.

Some work has been done in developing mathematical software to integrate numerical computation, mathematical typesetting, computer algebra and "technical electronic mail" (mail that contains formatted mathematical expressions) [1]. The CaminoReal system developed in Xerox PARC can handle direct manipulation of mathematical expressions, whether as part of technical documents or as inputs and outputs of computational engines. There are two unique features in CaminoReal: first, the tight coupling of its computation facilities with a sophisticated document system which opens interesting opportunities for computed and interactive documents; second, its access to computational "algebra servers" on the local network. The framework we are proposing is a step further in the sense that the compositional method can greatly enhance the system's flexibility in answering user queries and manipulating documents in a distributed fashion.

In this paper, we intend to point out how digital library could be developed and utilized as a foundation for enterprise wide decision support systems. In section 2, we discuss some major issues involved in organizing the documents and maintaining consistencies in the library. The algorithm design issues, such as data representation and time scales, are discussed in section 3. Section 4 focuses on the qualitative optimization aspect which is important for a decision support system operating in a heterogeneous environment. Section 5 concludes the paper.

2. Documents in Digital Library

In our framework, we mainly focus on mathematical documents, that is, documents that contain mathematical formulas and equations, and represent them using Mathematica. These documents will be interconnected in a way that relevant ones for a particular query can be pulled out to form a composite model that is sufficient to answer the query. Some specific issues in organizing the documents are discussed in the following subsections.

2.1. Heterogeneity of Documents

An important issue that needs to be addressed is how to deal with the heterogeneity of documents. In most cases, documents have different formats, which makes the interchange of documents a difficult task. To make it worse, the language that represents knowledge could be very different from document to document. Although we are only dealing with mathematical documents, relationships inside these documents could be very diverse. They could be of logic form, qualitative form, or quantitative form. For example, the logic inference rules in an existing expert system may contain a series of Horn clauses, whereas a spreadsheet is represented by cells containing numbers and/or formulas. Besides tables, formulas, and equations, there is always discussion text, such as the origin of data, the analytical procedure, or the method of data collection. With the development of multimedia representation, documents could include voice and animation as well. These all increase the heterogeneity of the documents in our digital library. The question is how to separate these different pieces in a way that they can later on be assembled to answer queries in a suitable context.

2.2. Composition of Documents

One important question to be answered in this framework is, given a query, how to decide which fragments to use and how to put them together. In the modeling process, it is crucial to focus on the relevant aspects of the problem of interest, that is, to include all the relevant objects and constraints, exclude irrelevant ones and ignore unnecessary details.

The compositional modeling method we are proposing (for a discussion and an example, see [5]) contains three levels. First, some model fragments may be combined as components before any query takes place at all, since these fragments may often appear together with a meaning. Therefore, they can be combined as reusable building blocks and stored in the library independently of query execution. That is, those model fragments will simply be grouped together and the combination takes place at the same time as the execution of the whole model regarding a particular query.

At the second level, after a query is issued, we need to find the appropriate model fragments and/or components that are sufficient and consistent to model the situation of interest. A challenging issue is how documents from heterogeneous sources can be found. Some distributed index structures might be needed to complete this task. For each query, one model will often suffice. However, multiple models may appear to be suitable or possible. In this case, some heuristics are needed to decide which model to take. For example, we could choose the one that has the smallest number of fragments/components.
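The "smallest number of fragments" heuristic mentioned above can be sketched as follows. This is only an illustration: the fragment names and the tie-breaking rule (alphabetical order) are assumptions made for the example, not part of the proposed system.

```python
from typing import FrozenSet, Iterable


def pick_smallest_model(candidates: Iterable[FrozenSet[str]]) -> FrozenSet[str]:
    """Return the candidate model with the fewest fragments.

    Ties are broken by the sorted fragment names so the choice is
    deterministic (an assumption; the text does not specify a tie rule).
    """
    return min(candidates, key=lambda model: (len(model), sorted(model)))


# Hypothetical candidate models, each given as a set of fragment names.
candidates = [
    frozenset({"demand_forecast", "price_elasticity", "inventory_cost"}),
    frozenset({"demand_forecast", "unit_margin"}),
]
chosen = pick_smallest_model(candidates)
```

Other heuristics (for example, preferring fragments that were updated most recently) could be substituted by changing the key function.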

The execution of the model is the third level in combining a set of documents. This is a rather complicated process since new types of combination may be needed. For example, the output of one fragment may be the input of another. Some questions arise here: Should the execution be done sequentially or in parallel? How is the convergence of the execution ensured if it is done in parallel? These are challenging problems that have to be worked out.
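For the sequential case, one possible way to order execution when the output of one fragment feeds another is a topological sort of the fragment dependencies. The sketch below assumes each fragment produces exactly one named variable and that there are no cyclic dependencies; the fragment names and formulas are hypothetical. It does not address the parallel case or the convergence question raised above.

```python
from graphlib import TopologicalSorter

# Hypothetical fragments: name -> (input variables, output variable, function).
FRAGMENTS = {
    "revenue": (("price", "units"), "revenue", lambda price, units: price * units),
    "cost": (("units",), "cost", lambda units: 4.0 * units + 100.0),
    "profit": (("revenue", "cost"), "profit", lambda revenue, cost: revenue - cost),
}


def execute(fragments, known):
    """Run fragments in dependency order, starting from externally known values."""
    # Map each output variable to the fragment that produces it.
    producer = {out: name for name, (_, out, _) in fragments.items()}
    # A fragment depends on the producers of its inputs (if any exist).
    graph = {
        name: {producer[v] for v in needs if v in producer}
        for name, (needs, _, _) in fragments.items()
    }
    values = dict(known)
    # static_order() yields predecessors first and raises CycleError on cycles.
    for name in TopologicalSorter(graph).static_order():
        needs, out, fn = fragments[name]
        values[out] = fn(*(values[v] for v in needs))
    return values


result = execute(FRAGMENTS, {"price": 10.0, "units": 50.0})
```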

2.3. Maintaining Model Consistency

Some research on digital library has been concerned with the integration of documents in different formats that are created using different hardware/software, which is also one of our concerns. However, since we are linking and executing documents, we have to be concerned with the integration of the contents as well. With all the model fragments in the library, we need to find a way to isolate all the coherent and adequate composite models for a particular query. Coherent means that all the assumptions and user-posed constraints are satisfied, whereas adequate means that the composite model is able to answer this query, taking into account all the assumptions that need to be included in the model.

Since there is an enormous amount of information in the library that is scattered across the network and is constantly updated, it is inevitable that inconsistencies exist, which will result in contradictory models. Therefore, it is crucial for the system to maintain consistency in each composite model. The underlying rationale is that each model fragment has its own governing assumptions, that is, the context, or the set of conditions, in which it holds. [This is built on the idea of assumption-based truth maintenance systems, developed by de Kleer in the artificial intelligence area (see [4]). However, we will not discuss the technical details in this paper.] When choosing model fragments to be combined, these assumptions or constraints have to be satisfied, i.e., the chosen model fragments have to be valid in that context. We cannot combine model fragments that are inconsistent with each other. For example, when we forecast sales, we have to take the season into account: for a summer forecast, we need to use data and relationships that hold in summer, while another set of data and relationships is needed for a winter forecast. The idea is that the model fragments to be combined have to be carefully chosen so that they are consistent with each other and satisfy the modeling assumptions and constraints.
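The seasonal example can be made concrete with a small consistency check. The sketch below is a deliberately simplified stand-in, not an assumption-based truth maintenance system: it assumes each fragment's governing assumptions can be written as attribute/value pairs, and it treats two fragments as compatible when they agree on every attribute they share. All fragment names and attributes are hypothetical.

```python
from itertools import combinations

# Hypothetical fragments, each with the assumptions under which it holds.
FRAGMENTS = {
    "summer_demand": {"season": "summer"},
    "winter_demand": {"season": "winter"},
    "summer_pricing": {"season": "summer", "region": "north"},
    "shipping_cost": {"region": "north"},
}


def consistent(a, b):
    """Two assumption sets agree if they give the same value to every shared key."""
    return all(a[key] == b[key] for key in a.keys() & b.keys())


def coherent(names, constraints):
    """Check the chosen fragments against each other and the user's constraints."""
    assumption_sets = [FRAGMENTS[name] for name in names] + [constraints]
    return all(consistent(a, b) for a, b in combinations(assumption_sets, 2))


summer_ok = coherent(["summer_demand", "summer_pricing", "shipping_cost"],
                     {"season": "summer"})
mixed_ok = coherent(["summer_demand", "winter_demand"], {})
```

Here `summer_ok` is true while `mixed_ok` is false, because the summer and winter fragments disagree on the season.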

While the system needs to maintain consistencies when choosing model fragments, it also needs to give explanations for its answers, that is, for each answer it returns, the problem solving system must identify responsibility for its conclusions by providing rational explanations of how it reaches the conclusions. For example, it is not adequate for the system to simply tell an engineer that his new design for an airplane does not work. Instead, if the system points out that no material will stand the projected stresses imposed by the design, the engineer will have a way of going back and modifying the design. In other words, an enterprise wide decision support system must have the capability of tracing what assumptions or constraints lead to the conclusion.

2.4. "Cataloging" of Documents

Traditional libraries use catalogs to organize their documents. Each document is assigned a classification number according to which documents are organized and located. These classification systems have an internal structure which reveals the relationships between different categories and/or documents. For example, in the Library of Congress classification system, each category has a classification number assigned to it. Categories are arranged in a hierarchical order by attaching more digits to a category than its super category. With the massive amount of documents in our digital library, we also need some sort of "catalog" to describe the relationships between documents. A graph/tree structure may be needed to show the interrelationships and dependencies among different pieces of documents (model fragments) which also help decide how the documents should be combined.

2.5. Incorporating SGML

One main strategy emerging for making documents computable across applications and platforms is tagging languages. So far, the most widely used tagging language is SGML. It is a document encoding mechanism designed to enable the "markup" of information content of documents. A basic design goal of the Standard Generalized Markup Language (SGML) was to ensure that documents encoded according to its provisions should be transportable from one hardware/software environment to another without loss of information. The structure of documents therefore can be understood or interpreted by other software applications that have SGML data interpretation capability. SGML provides a method for describing the relationship between the structure and the content of a document. It also enables documents to be stored and exchanged independently of formatting, software applications, and computing platforms.

The Electronic Publishing Special Interest Group has begun to refine its SGML application for markup of complex mathematics and tables [14], which suggests the incorporation of SGML in our problem solving system. The notion of document type definition (DTD) introduced by SGML enables documents to be formally defined by their constituent parts and their structures. For example, a document designer might write a DTD that enables the analytical discussion of a mathematical paper to be marked up as such. The primary purpose is that the text identified as forming part of a paper's analytical discussion can then be organized in a particular way when the SGML source document is combined with other SGML documents, giving explanations for the particular model derived. If, at some later date, it is decided that this part of the analytical discussion is useful in another context, it can easily be extracted and combined with other documents.

There are several reasons for proposing the use of SGML in our problem solving system. First, since the library operates in a heterogeneous environment containing documents scattered on different platforms across a heterogeneous network (i.e., a set of computers connected using different character encoding schemes), SGML is well suited for document interchange in our context. Second, as we mentioned in section 2.3, documents in the library need to be constantly updated. Documents in a structured, machine-readable form are easy for people to modify. Third, the compositional approach in our problem solving system requires documents to be used in multiple ways and for multiple purposes. Consider, as an example, a company's annual report that contains a large amount of data. These data might be used as input to a spreadsheet analysis. The spreadsheet, in turn, can be presented in different ways. The data can also be exported to a database system. Finally, SGML makes documents independent of the lifetime of any application, which is an important property for documents such as technical manuals [7].

3. The Design of Algorithms

Algorithms and software are needed to achieve the composition of documents in a suitable environment and format. This includes two levels: the composition of documents and the execution of documents. Composition, as we mentioned above, refers to choosing appropriate fragments and/or components. Since we are focusing on mathematical documents, the isolation of mathematics in each document gives rise to problems that do not arise in "ordinary" digital library because we need to link and execute the fragments.

The execution of documents is part of the problem solving process in the digital library. Given a query of a suitable type/format, we need to put together a model --- a combined virtual document in the spirit of digital library --- for the user. Finding appropriate fragments is the first step to take. We need to decide what types of documents can appear as fragments and how to use Mathematica to represent these fragments. During execution, some issues that have to be addressed are: how to move results from one fragment to interact with another? how to combine results from several fragments?

There may be many suitable models for a particular query. Some criteria need to be employed to automatically choose one from them. After a model has been fixed, the system needs to decide in which order to execute fragment documents, in which order to put intermediate results together. Another very important aspect to be considered is that the mathematical expressions and data in documents might be in quite different forms, e.g., qualitative, quantitative, or logical. Some documents may even contain incomplete information. There are existing tools and algorithms that handle quantitative information well (e.g., GAMS, a linear, quadratic, integer mathematical programming system that has been used to solve real industrial problems of respectable size). However, when different forms are intertwined, i.e., there is qualitative data, quantitative data, and logical type data as well in one model, the execution of the model becomes a difficult endeavor. We need to have a formal way of integrating different data representations. These all have to be considered when designing the algorithm. We propose to develop this algorithm based on an existing one, Rules-Constraints-Reasoning (RCR). Developed by Kiang et al. [10], RCR is a method of reasoning with imprecise knowledge that is aimed mainly at discrete dynamic systems. It proposes a model representation which is essentially an interval-based abstraction of difference equation systems. (See [10] for a detailed and formal description.) First of all, we will discuss some data representation problems that are important to the design of algorithms.

3.1. Data Representation

Suppose that data in some documents are represented in a monotonic form, for example, a certain variable is increasing or decreasing on a certain interval. A system taking this form of data will give its conclusions also in the form of monotonicity. Typically, there will be a huge number of possibilities for the resulting dynamic behavior, many of which may never occur in a practical situation. Therefore, the RCR algorithm uses numerical data representation dealing with sets (described by means of finitely many numbers), rather than listing all the states of behavior.

However, there may be situations where it is desirable to incorporate into the model description not only numerical data but logical type data as well. The question arises as to how to best handle such data. In recent years, a connection has emerged between logical deduction and integer programming (cf. [9]). By formulating problems where one needs to satisfy a set of logical conditions, as problems of solving a set of inequalities in integers, one can profitably solve the problem in a relatively brief period of time by certain integer programming methods.
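To illustrate how logical conditions become inequalities in 0/1 variables, the following sketch encodes three hypothetical rules over logic variables A, B, and C and then finds all satisfying assignments. Exhaustive enumeration is used here only as a stand-in for an integer programming method; the rules themselves are assumptions made for the example.

```python
from itertools import product

VARS = ("A", "B", "C")

# Each inequality is (coefficients, bound), meaning sum(coeff * x) >= bound,
# where every x is 0 (false) or 1 (true).
INEQUALITIES = [
    ({"A": -1, "B": 1}, 0),    # A implies B:    x_B - x_A >= 0
    ({"A": 1, "C": 1}, 1),     # A or C:         x_A + x_C >= 1
    ({"B": -1, "C": -1}, -1),  # not (B and C):  -x_B - x_C >= -1
]


def feasible_assignments():
    """Enumerate all 0/1 assignments that satisfy every inequality."""
    solutions = []
    for bits in product((0, 1), repeat=len(VARS)):
        x = dict(zip(VARS, bits))
        if all(
            sum(coeff * x[var] for var, coeff in coeffs.items()) >= bound
            for coeffs, bound in INEQUALITIES
        ):
            solutions.append(x)
    return solutions


solutions = feasible_assignments()
```

With these three rules only two assignments survive: A false, B false, C true; and A true, B true, C false.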

The RCR algorithm deals with qualitative reasoning when both the initial information and certain specifications of the system may not be completely known. The uncertainty is expressed by saying that, at a given time, a quantity, whose value need not be exactly known, belongs to a set of a suitable type. For computational purposes, the set should be describable by finitely many parameters. So, for a subset of the set of real numbers, unions of finitely many intervals seem suitable. In practice, intervals are used so far. The algorithm then amounts to a propagation of intervals, starting with those initially given. [This is not the same as the so called interval arithmetic. The idea of propagating information using intervals has also been considered by Davis [3] mainly in connection with certain linear systems.] The interval obtained for a quantity at a given time then gives the best information about that quantity that can be obtained. The accuracy may be improved by using more than one interval, creating an issue of how much gain can be expected from making the computations longer and more complicated. This algorithm does not incorporate logical (non-numerical) information at this point.
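To give a feel for what "propagating intervals" through a discrete dynamic system means, the sketch below pushes bounds through a hypothetical one-variable difference equation, x(t+1) = g * x(t) + u, where the growth factor g and the inflow u are each known only to lie in an interval. It uses plain interval arithmetic, which the text above explicitly distinguishes from RCR, so it should be read as a simplified illustration of the idea and not as the RCR algorithm of [10].

```python
def propagate(x, growth, inflow, steps):
    """Propagate the interval x = (lo, hi) through x' = g * x + u.

    growth and inflow are (lo, hi) intervals for g and u. The equation
    and all names are assumptions made for this illustration.
    """
    lo, hi = x
    for _ in range(steps):
        # The extreme values of g * x occur at endpoint combinations.
        products = [g * v for g in growth for v in (lo, hi)]
        lo, hi = min(products) + inflow[0], max(products) + inflow[1]
    return lo, hi


bounds = propagate((100.0, 120.0), growth=(1.0, 2.0), inflow=(0.0, 10.0), steps=2)
```

After two steps the interval has widened from [100, 120] to [100, 510], which shows how quickly uncertainty can accumulate when a single interval per quantity is used.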

It does not seem practical to produce all possible behaviors indicated by logical-type variables. Suppose we have a situation where ten intervals and two variables are studied. During each time interval considered, a logic variable can be true or false. There are $4^{10} = 1{,}048{,}576$ possible ways for this behavior to occur. Even if 99% of them are ruled out, more than 10,000 still remain. It seems doubtful that a human being inspecting the answer would be able to comprehend such an answer. Therefore, asking a question like "What are all the possible states of qualitative behavior that could arise from a given set of assumptions" does not seem very useful. It might be more practical to ask what the final result is after a certain period of time, in other words, if a statement is true or false at that time. Looking at the problem this way produces not only the advantage of cutting down the length of output, but also allows the kind of logical reasoning that could apparently be phrased in terms of integer programming.

We need to develop a unifying data representation language which can represent, in addition to precisely known numerical and logical data, qualitative knowledge and incompletely known data. This would include knowledge that is currently represented using various languages in logic. It seems that, in order to incorporate statements of logical type, it will be desirable to code all such information numerically, so that ultimately, the computational system will only have to deal with numbers. This should allow us to tie the numerical and logical components of the system together, making the ranges of certain quantities or the bounds for certain functions dependent on the truth values of certain logical variables. The description of logical conditions should be attainable in many ways, and many possibilities could be allowed in the system for added flexibility. For example, in circumscription [12], logical conditions can be described by giving some main rules and then a number of exceptions. This is analogous to describing upper and lower bounds for functions by formulas that involve something simple most of the time, but a few different function forms at a number of "exceptional" intervals.

When introducing logic variables, we need to incorporate variables that take only integer values (possibly only 0 and 1 as in the case of logic), and therefore deal with the case when the most natural type of set containing some possible values of a variable is not a full interval of the real axis, but a discrete set. Data representation involving both numerical and logical (say 0 or 1 valued) or more generally integer valued variables could be handled using Cartesian products or finite unions of Cartesian products in case of several variables and many conditions. Thus, the case when a logical variable takes a value 1 and simultaneously, or conditionally on the logical variable being equal to 1, a numerical variable is known to lie on the interval [4, 6], would be expressed using the Cartesian product $\{1\} \times [4, 6]$. If the logical variable can be 0 in the particular situation considered and is 0, the numerical variable might belong to another interval, say [3, 5], and the whole thing would be expressed using $(\{0\} \times [3, 5]) \cup (\{1\} \times [4, 6])$.
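The union of Cartesian products in this example can be represented directly as a small data structure. The sketch below is one possible encoding, with class and function names chosen for illustration only; it stores each piece as a logical value paired with a closed interval and tests whether a (logical, numerical) pair belongs to the union.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Piece:
    """One Cartesian product {flag} x [lo, hi]."""
    flag: int    # value of the 0/1 logical variable
    lo: float    # lower end of the numerical interval
    hi: float    # upper end of the numerical interval

    def contains(self, flag, value):
        return flag == self.flag and self.lo <= value <= self.hi


def in_union(pieces, flag, value):
    """True if (flag, value) lies in at least one piece of the union."""
    return any(piece.contains(flag, value) for piece in pieces)


# ({0} x [3, 5]) union ({1} x [4, 6]), as in the example above.
STATE = (Piece(0, 3.0, 5.0), Piece(1, 4.0, 6.0))
```

For instance, the pair (1, 5.5) belongs to the set, while (0, 5.5) does not, because the numerical variable is restricted to [3, 5] when the logical variable is 0.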

One of the challenges is to make sure that we can smoothly integrate genuinely numerical methods with those of integer programming in order to deal with logical information in a unified fashion. It is worth noting that one way to deal with integer information is to consider the corresponding "continuous" problem for real numbers and check at the end of the computation if any integer (or 0 or 1 valued) solutions were generated. Many methods of integer programming proceed using this idea.

Another important issue is the development of theoretical principles needed to create a deductive system that could be used to start with given system specification and initial conditions expressed in terms of the unifying language, and deduce the state of the system at a future time.

3.2. Time Scales and Compositional Modeling

We recognize that it is an integral problem to find ways of organizing the knowledge used in each individual problem, as it may consist of a large number of pieces of information that are to be collected together and used in a coherent way. We need an algorithm that allows information propagation over time, from an initial moment of time to some future moment, and draws the best possible inferences from this mechanism. One aspect of the algorithm design then is to devise ways of incorporating both numerical and logical data into such a system.

Some issues related to the model building process, which thus need to be considered in the algorithm, are time scales and compositional modeling. Note that when applying an algorithm to solve a problem, it is assumed that a model describing some real situation has been developed, satisfying suitable criteria so that the system description is in accordance with algorithm specifications. There is, however, the question of how to model huge, complex enterprises or other organizations, involving perhaps thousands of variables and relations. This is of interest not only for model building but also for computational issues: how to organize the computations most effectively when so many variables and equations have to be potentially taken into account?

As we mentioned in section 2, the approach we intend to follow is based on the idea of compositional modeling. This means that one forms model fragments from the different pieces of data and descriptions of various parts of the system, each relating together perhaps only a few of the many variables. A suitable, generally rather small, number of fragments is joined together to form so-called components. There are then a great number of interrelationships and dependencies among the components. However, to solve a specific problem, one may only need to deal with a limited number of variables and hence only relatively few components. We, therefore, need to develop a systematic way of determining the minimal number of components, consisting of all, and only, the relevant ones to perform analyses.
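One simple way to select all, and only, the relevant components for a query is to trace dependencies backward from the queried variable. The sketch below assumes that each component defines exactly one variable in terms of others and that each variable has at most one producing component; under those assumptions the backward closure is also the minimal set. The component names are hypothetical.

```python
# Hypothetical components: name -> (output variable, input variables).
COMPONENTS = {
    "profit_model": ("profit", ("revenue", "cost")),
    "revenue_model": ("revenue", ("price", "units")),
    "cost_model": ("cost", ("units",)),
    "headcount_model": ("headcount", ("budget",)),
}


def relevant_components(components, query_variable):
    """Collect the components needed to compute query_variable."""
    producer = {out: name for name, (out, _) in components.items()}
    needed, pending = set(), [query_variable]
    while pending:
        variable = pending.pop()
        name = producer.get(variable)
        if name is None or name in needed:
            continue  # externally supplied input, or component already selected
        needed.add(name)
        pending.extend(components[name][1])
    return needed


for_profit = relevant_components(COMPONENTS, "profit")
```

A query about profit pulls in the profit, revenue, and cost components and leaves the unrelated headcount component out of the composite model.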

Another issue is the question of time scales. In most real life situations, there are many different processes at work, proceeding at different speeds (for an example related to medicine/biology, see [11]). Furthermore, the points of view that analysts want their models to reflect may depend on daily, monthly, quarterly, or annual changes or updates. In a complex organization, the model fragments would presumably involve very different time scales. Thus there is not only the question of separating or interconnecting such fragments or components, but also of designing the algorithm so that it takes time scales into account when handling a problem that requires input on several time scales.

4. Optimization

Suppose that we are using the RCR algorithm to predict the behavior of a partially known dynamic system. One of the variables, L, is initially restricted to an interval [c, d]. Consider a variable M, which may or may not be the same as L, at a given time t. The algorithm will then predict that the values of M lie in an interval [a, b]. The numbers a and b are obtained by applying the RCR computations, and they depend on the values of c and d.

Instead of considering c and d to be given and fixed, we now assume that they depend on an external variable x (which is not part of the model so far). Then all quantities of the model that were obtained by using c and d become functions of x. Hence the question arises as to how to optimize the values of M at time t. In an application to economics or management, the variable M might, for example, be related to profits, income, or the stability of the enterprise.

To optimize the interval [a, b], one could use many different criteria. Suppose that we want to make the lower bound a as high as possible. In other words, we want to maximize a as a function of x. The calculation that yields a as a function of x is usually so complicated that one cannot expect to find an explicit formula for a in terms of x. The challenge, then, is to find computational procedures to solve this extremal problem.
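Because no explicit formula for a in terms of x is expected, the simplest computational procedure is to treat a(x) as a black box and sample it. The sketch below illustrates this under loudly stated assumptions: the RCR computations are not reproduced, and plain interval propagation of a made-up update rule M <- g*M - w stands in for them. The dependence of [c, d] on x, the parameter intervals, and all function names are hypothetical.

```python
def propagate(interval, steps, rate=(1.02, 1.10), outflow=(0.5, 0.8)):
    """Push an interval for M through `steps` updates M <- g*M - w.

    g and w are only known to lie in `rate` and `outflow`, so each update
    takes the worst and best case over both (plain interval arithmetic).
    """
    lo, hi = interval
    for _ in range(steps):
        products = [g * m for g in rate for m in (lo, hi)]
        lo, hi = min(products) - outflow[1], max(products) - outflow[0]
    return lo, hi


def initial_interval(x):
    """Hypothetical dependence of [c, d] on the external variable x."""
    c = 4.0 + 2.0 * x - x * x
    return c, c + 1.0 + x


def lower_bound(x, steps=5):
    """a(x): lower end of the predicted interval for M at time `steps`."""
    return propagate(initial_interval(x), steps)[0]


# a(x) is treated as a black box and sampled on a grid of x values in [0, 3].
grid = [i / 100 for i in range(301)]
best_x = max(grid, key=lower_bound)
print(best_x, lower_bound(best_x))
```

A grid search is used only because it makes no assumptions about the shape of a(x). Any derivative-free search could replace it when each evaluation of a(x) is expensive.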

As we suggested in the last section, logical variables would be involved in many cases. Functions/quantities to be optimized, as described above, could depend on logical variables. In general, they would be expected to depend on both numerical and logical variables, resulting in combined or hybrid optimization.

The current directions taken by businesses offer motivation for the study of qualitative optimization. Many corporations are interested in or have undertaken enterprise modeling, to keep track of the development of the organization, financially and otherwise. Effectiveness demands the setting of concrete corporate/organizational goals. To achieve them, different lines of strategy can be considered. Alternative states of planning and behavior arise and must be determined. For example, the company may be simply asking whether or not a certain loan should be taken out, and this reduces to a question involving a logical variable.

Thus, there could be a number of logical and numerical variables whose values are to be chosen when senior management plans the future of the organization. We can then use the (usually incompletely known) system and optimize some given quantity at a future time to determine the best configuration of logical and numerical variables to be chosen at the present time. When models like this are constructed for a large organization, the number of variables and functions involved will be very large, and the model will often be so complicated that it would prove difficult for anyone to make decisions on a "common sense" basis. Therefore, it is important to have a computational system to assist in decision making, by giving the best conclusions that can be obtained in view of the incomplete information given to the model, and by making choices that optimize key quantities chosen with regard to the corporate goals.

Therefore, two theoretical developments are needed: an optimization approach to qualitative systems, and the foundations of robust optimization (optimization of a quantity in an incompletely known system). Logic variables could also be used as control variables with respect to which a given quantity in the system is to be optimized. The final result is a hybrid situation where the control variables can be of numerical or logical type, and where the concept of optimization might combine optimizing a numerical quantity with seeking a configuration of some logical variables that is deemed optimal. This involves defining a preferential order among the configurations that one may obtain for such variables.

A simple example of qualitative and logic optimization illustrates the flavor of the issues we have in mind. Consider the financial situation of a company, described by the variables Cash C(t), Sales S(t), Inventory I(t), Profit P(t), Debt D(t), and Stability X(t), which is defined in terms of the other quantities. There is also the logic variable Raid R(t), which is true if, at time t, the company buys another company of a size whose qualitative description is assumed to be given. We assume that the model gives equations or inequalities describing how these quantities are related to each other and, in particular, how the condition R(t) = 1 affects the other quantities. Let us assume, for example, that we try to maximize the midpoint q = (a+b)/2 of the interval [a, b] containing the possible values of X(5) (stability 5 years after the starting point). As control variables that may be varied in this experiment, we may take, for example, the limits of the interval containing I(0), the value of R(0), and perhaps some or all future values of R(t) for 1 <= t <= 4. Now q will depend on both numerical and logic variables. If we generalize and expand the model, and define X to depend also on some logic variables, then both the control variables and the variables to be optimized would involve a mixture of numerical and logic variables. Thus, we need an algorithm able to handle such a very general situation.
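To make the example concrete, the sketch below scores every combination of an inventory interval and a raid schedule by the midpoint q of the resulting stability interval. It is an illustration only: the equations linking cash, inventory, and debt, the definition X = C - D, and every constant and function name are assumptions invented for this sketch, not the model referred to above, and Sales and Profit are folded into a single assumed margin.

```python
from itertools import product

MARGIN = (0.2, 0.4)      # assumed net yearly return per unit of inventory
INTEREST = (1.03, 1.07)  # assumed yearly growth factor of debt
RAID_COST = 50.0         # assumed cash paid when R(t) is true
RAID_LOAN = 40.0         # assumed debt added when R(t) is true
RAID_GROWTH = 1.5        # assumed inventory multiplier after a raid


def stability_interval(inventory, raids, cash=(100.0, 120.0), debt=(50.0, 50.0)):
    """Interval [a, b] for X = C - D after len(raids) years.

    Intervals are (low, high) pairs. Inventory and debt bounds are
    nonnegative, so each bound is propagated with the matching bound of
    MARGIN or INTEREST.
    """
    inv_lo, inv_hi = inventory
    cash_lo, cash_hi = cash
    debt_lo, debt_hi = debt
    for raid in raids:
        cash_lo += MARGIN[0] * inv_lo
        cash_hi += MARGIN[1] * inv_hi
        debt_lo *= INTEREST[0]
        debt_hi *= INTEREST[1]
        if raid:  # the logic variable R(t) switches the equations
            cash_lo -= RAID_COST
            cash_hi -= RAID_COST
            debt_lo += RAID_LOAN
            debt_hi += RAID_LOAN
            inv_lo *= RAID_GROWTH
            inv_hi *= RAID_GROWTH
    return cash_lo - debt_hi, cash_hi - debt_lo


def midpoint(interval):
    """q = (a + b) / 2."""
    a, b = interval
    return (a + b) / 2


def best_configuration(inventory_options, years=5):
    """Exhaustively score every (inventory interval, raid schedule) pair."""
    schedules = product([False, True], repeat=years)
    candidates = product(inventory_options, schedules)
    return max(candidates, key=lambda cfg: midpoint(stability_interval(*cfg)))


# Hypothetical candidate intervals for I(0).
inventory_options = [(40.0, 60.0), (80.0, 120.0)]
inventory, raids = best_configuration(inventory_options)
print(inventory, raids, midpoint(stability_interval(inventory, raids)))
```

With two inventory intervals and five yearly raid decisions there are only 64 configurations, so exhaustive enumeration is feasible. The number of configurations doubles with every additional logic variable, which is why a general algorithm is needed.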

A rather crude view of optimization is the following. There are a number of different scenarios to be studied, corresponding to different initial values for certain variables. If there were only finitely many scenarios, then, at least in theory, one could perform all the calculations for each of them individually, compare the final results, and check which one appears the most satisfactory. Thus one answers a whole series of "what if" questions. We intend to develop a more sophisticated method, one that answers such questions without performing the calculations for each scenario separately and instead provides the inference power to arrive at the answer directly. At this point the method is at the level of calculus-type analysis, but we hope to improve it further to accommodate techniques that reduce the branching that occurs when different alternatives for the formulas used in functional definitions are encountered.

5. Conclusions

Research on digital libraries should go beyond categorizing and organizing electronic information and searching and filtering large volumes of data. The tremendous amount of information scattered across the network is a powerful source from which modern organizations can gain strategic and tactical advantages in today's tough competition, provided they fully utilize this information. This is the main motivation for using a digital library as a foundation for an enterprise wide decision support system. However, as we discussed in this paper, many problems have to be solved before such a system can be put to use, which is a great challenge but a great opportunity as well.


References

[1] Arnon, D., R. Beach, K. McIsaac, and C. Waldspurger (1988) "CaminoReal: an interactive mathematical notebook." In EP88: Document Manipulation and Typography. Proceedings of the International Conference on Electronic Publishing, Document Manipulation and Typography. Nice (France). Apr. 20-22. Ed. by J.C. van Vliet. Cambridge Univ. Press. Cambridge.

[2] Bier, E. A. and A. Goodisman (1990) "Documents as user interfaces." In EP90: Proceedings of the International Conference on Electronic Publishing, Document Manipulation & Typography. Gaithersburg, Maryland, September. Ed. by R. Furuta. Cambridge University Press. Cambridge.

[3] Davis, E. (1987) "Constraint propagation with interval labels." Artificial Intelligence. 32: 281-331.

[4] de Kleer, J. (1986) "An assumption-based TMS." Artificial Intelligence. 28: 127-162.

[5] Falkenhainer, B. and K. D. Forbus (1991) "Compositional modeling: finding the right model for the job." Artificial Intelligence. 51: 95-144.

[6] Fox, E. A. (1993) Source book on digital libraries. Virginia Tech.

[7] Herwijnen, E. van (1990) Practical SGML. Dordrecht/Boston/London: Kluwer Academic Publishers.

[8] Hinkkanen, A., K. R. Lang, and A. B. Whinston (1993) "On the Usage of Qualitative Reasoning as Approach Towards Enterprise Modeling." Forthcoming in Annals of Operations Research.

[9] Hooker, J. N. (1988) "A qualitative approach to logic inference." Decision Support Systems. 4: 45-69.

[10] Kiang, M. Y., A. B. Whinston, and A. Hinkkanen (1993) "An interval propagation method for solving qualitative difference equation systems." In Qualitative Reasoning and Decision Technologies. Ed. by N. P. Carrete and M. G. Singh. International Center for Numerical Methods in Engineering, Barcelona, Spain.

[11] Kuipers, B. (1988) "Qualitative simulation using time-scale abstraction." Artificial Intelligence in Engineering. 3(4): 185-191.

[12] Lifschitz, V. (1993) "Circumscription." Manuscript for a chapter in Handbook of Logic in AI and Logic Programming. University of Texas at Austin.

[13] National Science Foundation (1993) Request for Proposals on Digital Libraries.

[14] Wright, H. (1992) "SGML frees information." Byte. June: 279-286.