The CHROMA Application

When bioscientists want to analyse the behaviour of a molecule, they must first obtain it in pure form from a biological sample. This sample may be a biological tissue (animal or plant) that contains, besides the molecule of interest, hundreds of other molecules that must be removed. Protein purification is the process that isolates one molecule from among many others. Once the molecule of interest has been isolated, its structure, function, electrical and physical properties, and behaviour can be analysed. The development of techniques and methods for separating and purifying biological macromolecules (such as proteins) has been an important prerequisite for many of the advances made in bioscience and biotechnology over the past three decades. The main problems that can arise in a purification process are generally related to denaturation, proteolysis, and contamination with pyrogens, nucleic acids, bacteria, and viruses.

The usual way to find a purification procedure is to search the literature for previous purifications of the protein of interest. If the same source used in those experiments is available, the same purification process can be reused. The main difficulty is that the sources used in the published experiments are often unavailable, so the purification process has to be adapted to the characteristics of the available source. The chosen purification process is then optimized by systematically varying parameters of the extraction method, such as its composition. Extracting a protein from a solid source involves a trade-off between the amount of protein recovered and its purity.

CHROMA has a case base containing experiments obtained from the literature (the journal Comparative Biochemistry and Physiology). CHROMA searches this case base, and the result of the search is one or more experiments similar to the new experiment, providing a first approximation of how the protein of interest can be purified. We want to emphasize that the adequacy of the proposed solution can only be evaluated in the laboratory, which makes CHROMA difficult to evaluate.

The main task of CHROMA is the purification task. Given a new experiment and a base of solved experiments, the goal of the purification task is to find a sequence of chromatographic techniques (purification plan) purifying the protein of the new experiment. The domain expert uses different strategies to find a purification plan for the current protein:

M1) Searching for an experiment using exactly the same sample for the same protein.

M2) Searching for experiments purifying the same protein but from other kinds of sample. If more than one is found, the domain expert chooses one of them according to some specific criteria.

M3) If the sample of the current experiment satisfies some specific domain properties (e.g. the current protein belongs to a special family of proteins), the domain expert knows which purification plan to apply without searching for past experiments.

M4) If the domain expert has not found any experiment in the literature purifying the protein of the current experiment, he tries to build a purification plan by trial and error in the laboratory. The steps of this purification plan are built according to the characteristics of each purification technique.

Each of these strategies has been modelled in CHROMA by a different problem-solving method. In particular, strategy M1 has been modelled by the equal-sample method, which detects whether there is an experiment in the case base with the same protein and sample as the current experiment. Strategy M2 is modelled by the analogy-by-determination method, a case-based method that retrieves the experiments in the case base that purify the same protein. Given a protein P, several experiments purifying P can be retrieved, so the analogy-by-determination method interacts with the user to let him decide which precedent is most appropriate.
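The two retrieval-based methods can be sketched as follows. This is a minimal illustration, not CHROMA's implementation: the `Experiment` record, the representation of a sample as a feature-to-value dictionary, the toy case base, and the `choose` callback (which stands in for the interaction with the user) are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Plan = tuple[str, ...]  # ordered sequence of chromatographic techniques


@dataclass
class Experiment:
    protein: str
    sample: dict[str, str]  # hypothetical: feature -> value description of the sample
    plan: Plan = ()         # empty for a new, still unsolved experiment


def equal_sample(new: Experiment, case_base: list[Experiment]) -> Optional[Plan]:
    """Reuse the plan of a precedent with the same protein and the same sample."""
    for case in case_base:
        if case.protein == new.protein and case.sample == new.sample:
            return case.plan
    return None  # the method fails: no identical precedent


def analogy_by_determination(
    new: Experiment,
    case_base: list[Experiment],
    choose: Callable[[list[Experiment]], Experiment],
) -> Optional[Plan]:
    """Retrieve precedents purifying the same protein; `choose` plays the user's role."""
    candidates = [case for case in case_base if case.protein == new.protein]
    if not candidates:
        return None
    return choose(candidates).plan


# Toy, hypothetical case base used only to exercise the functions.
case_base = [
    Experiment("P1", {"tissue": "liver"}, ("ion-exchange", "gel-filtration")),
    Experiment("P1", {"tissue": "muscle"}, ("affinity", "gel-filtration")),
]
print(equal_sample(Experiment("P1", {"tissue": "liver"}), case_base))
```

Passing a `choose` that always takes the first candidate would remove the user from the loop; the method described above deliberately leaves this decision to the user.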

Strategy M3 has been modelled by a classification method called purify-by-class. This method uses intensional concept descriptions to determine the class to which an experiment belongs. The purify-by-class method needs two input models: new experiment and class descriptions. The new experiment model contains the description of a sample from which a protein has to be purified. The class descriptions model contains the descriptions of the classes to which a purification experiment can belong. This model is not provided by the domain expert, so during the KM analysis a KA-Task has to be associated with it. This KA-Task is solved using a learning method, called induce-classes, that induces the class descriptions from the experiments contained in the experiments model.
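The following sketch illustrates purify-by-class with intensional class descriptions. Its learning step is a deliberately simple stand-in for induce-classes, whose actual algorithm is described in the cited work and is not reproduced here: it assumes each class corresponds to one purification plan and describes the class as a conjunction, over features, of the disjunction of values observed in the experiments that used that plan. The names `induce_classes`, `satisfies`, and `purify_by_class` and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Optional

Sample = dict[str, str]                  # hypothetical: feature -> value
Plan = tuple[str, ...]                   # ordered chromatographic techniques
Description = dict[str, frozenset[str]]  # feature -> admissible values


def induce_classes(experiments: list[tuple[Sample, Plan]]) -> dict[Plan, Description]:
    """Simplified stand-in for induce-classes: one class per plan, described by the
    set of values each feature takes in the experiments that used that plan."""
    observed = defaultdict(lambda: defaultdict(set))
    for sample, plan in experiments:
        for feature, value in sample.items():
            observed[plan][feature].add(value)
    return {
        plan: {feature: frozenset(values) for feature, values in features.items()}
        for plan, features in observed.items()
    }


def satisfies(sample: Sample, description: Description) -> bool:
    """Intensional test: every constrained feature must take an admissible value."""
    return all(sample.get(feature) in admissible for feature, admissible in description.items())


def purify_by_class(sample: Sample, classes: dict[Plan, Description]) -> Optional[Plan]:
    """Return the plan of the first class whose description the sample satisfies."""
    for plan, description in classes.items():
        if satisfies(sample, description):
            return plan
    return None  # the sample satisfies no class description


experiments = [
    ({"taxon": "mammal", "tissue": "liver"}, ("ion-exchange", "gel-filtration")),
    ({"taxon": "mammal", "tissue": "kidney"}, ("ion-exchange", "gel-filtration")),
    ({"taxon": "fish", "tissue": "muscle"}, ("affinity",)),
]
classes = induce_classes(experiments)
print(purify_by_class({"taxon": "mammal", "tissue": "kidney"}, classes))
```

Because values are combined feature by feature, a description also accepts unseen combinations of observed values; that is the only generalization this stand-in performs.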

During the KM analysis of the domain, four problem-solving methods (PSMs) have been associated with the purification task. Suppose that in the CHROMA application the methods equal-sample, purify-by-class, analogy-by-determination, and default-plan are tried sequentially in this order. If a new experiment aims to purify a protein that does not appear in any experiment in the case base, the only applicable method is default-plan. With the sequential order, all the other methods have to be executed (and fail) before the solution is obtained from default-plan.
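To make the cost of the fixed order concrete, here is a hypothetical sketch of sequential method trial. Each method is modelled as a function that returns a plan, or `None` when it fails; the stubs are placeholders standing in for the real methods, and the plan returned by the `default_plan` stub is arbitrary.

```python
from typing import Callable, Optional

Plan = tuple[str, ...]
Method = Callable[[dict], Optional[Plan]]  # hypothetical: returns None when the method fails


def solve_sequentially(problem: dict, methods: list[tuple[str, Method]]) -> tuple[Optional[Plan], list[str]]:
    """Try each method in order until one returns a plan; also report which were executed."""
    tried: list[str] = []
    for name, method in methods:
        tried.append(name)
        plan = method(problem)
        if plan is not None:
            return plan, tried
    return None, tried


# Placeholder stubs: for a protein absent from the case base, the first three methods fail.
def always_fail(problem: dict) -> Optional[Plan]:
    return None


def default_plan(problem: dict) -> Optional[Plan]:
    return ("ion-exchange", "gel-filtration")  # arbitrary placeholder plan


ORDER = [
    ("equal-sample", always_fail),
    ("purify-by-class", always_fail),
    ("analogy-by-determination", always_fail),
    ("default-plan", default_plan),
]

plan, tried = solve_sequentially({"protein": "unseen protein"}, ORDER)
print(tried)  # all four methods are executed before default-plan answers
```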

The KM analysis of the domain suggests a more intelligent strategy for selecting the appropriate method. We propose a lazy, problem-centred selection of the method that takes into account an Applicability Conditions model and a Preferences model. In particular, the Applicability Conditions model in CHROMA contains the following knowledge:

If the protein of the current problem is not purified in any experiment in the case base, the only applicable method is default-plan.

If there is no experiment in the case base using the same sample as the current problem, the applicable methods are the purify-by-class method, the analogy-by-determination method, and the default-plan method.

If the sample of the current problem does not satisfy any class description, the applicable methods are the analogy-by-determination method and the default-plan method. As we will see later, to evaluate this condition CHROMA needs an additional model called control sample.
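One way to read the Applicability Conditions model is as a set of constraints that each narrow the candidate methods: when several conditions hold, the applicable methods are those permitted by all of them. The sketch below assumes that reading and assumes the three conditions have already been evaluated as booleans; the function name and its arguments are illustrative.

```python
ALL_METHODS = frozenset(
    {"equal-sample", "purify-by-class", "analogy-by-determination", "default-plan"}
)


def applicable_methods(
    protein_in_case_base: bool,
    same_sample_in_case_base: bool,
    satisfies_some_class: bool,
) -> frozenset[str]:
    """Intersect the method sets allowed by every applicability condition that holds."""
    methods = set(ALL_METHODS)
    if not protein_in_case_base:
        methods &= {"default-plan"}
    if not same_sample_in_case_base:
        methods &= {"purify-by-class", "analogy-by-determination", "default-plan"}
    if not satisfies_some_class:
        methods &= {"analogy-by-determination", "default-plan"}
    return frozenset(methods)


print(sorted(applicable_methods(True, False, False)))
```

The intersection makes the conditions order-independent: a problem with a known protein, no identical sample, and no matching class is left with analogy-by-determination and default-plan.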

The Preferences model contains preferences provided by the domain expert in order to choose one method if more than one is applicable. In CHROMA the Preferences model contains the following preferences:

1) If applicable, equal-sample is preferable to the others (since an identical precedent ensures an appropriate solution).

2) default-plan is the least preferable.

3) analogy-by-determination and purify-by-class are equally preferable if both are applicable.
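The Preferences model can be encoded as a ranking in which a lower rank is preferred and ties are kept, since analogy-by-determination and purify-by-class are equally preferable. The numeric ranks and the function name are assumptions of this sketch.

```python
# Lower rank = more preferred; equal ranks encode "equally preferable".
PREFERENCE_RANK = {
    "equal-sample": 0,
    "purify-by-class": 1,
    "analogy-by-determination": 1,
    "default-plan": 2,
}


def most_preferred(applicable: set[str]) -> set[str]:
    """Return the applicable methods with the best (lowest) rank; ties are all returned."""
    if not applicable:
        return set()
    best = min(PREFERENCE_RANK[method] for method in applicable)
    return {method for method in applicable if PREFERENCE_RANK[method] == best}


print(sorted(most_preferred({"purify-by-class", "analogy-by-determination", "default-plan"})))
```

Returning the tie as a set mirrors preference 3: the sketch does not decide between the two equally preferable methods.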

The lazy problem-centred strategy has been implemented using a selection task at the meta-level of the purification task. The selection task takes as input the control sample model, which contains the description of a sample S. Each feature A of S takes as its value the disjunction of the values that A takes across all the experiments in the case base. The selection task is solved using the following method:

If there is no experiment in the case base using the current protein, the purification plan must be obtained using the default-plan method.

If the protein was already used and there is an experiment with the same sample as the new one, the equal-sample method can be used.

If the new experiment belongs to some solution class, the purify-by-class method can be used (the analogy-by-determination method could also be used).

Otherwise, the new experiment can only be solved using the analogy-by-determination method.
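The selection method, together with the construction of the control sample, might look roughly as follows. This is a sketch under stated assumptions: every sample is described by the same features, each class description constrains every feature with admissible values drawn from the case base, and the control sample is used only as a cheap early exit (under those assumptions, a sample with a value never seen in the case base cannot satisfy any class description). How CHROMA actually exploits the control sample may differ, and all names and data here are hypothetical.

```python
from collections import defaultdict

Sample = dict[str, str]                  # hypothetical: feature -> value
Description = dict[str, frozenset[str]]  # feature -> admissible values


def build_control_sample(samples: list[Sample]) -> Description:
    """Each feature takes the disjunction (here: the set) of its values in the case base."""
    values = defaultdict(set)
    for sample in samples:
        for feature, value in sample.items():
            values[feature].add(value)
    return {feature: frozenset(vs) for feature, vs in values.items()}


def satisfies(sample: Sample, description: Description) -> bool:
    return all(sample.get(feature) in admissible for feature, admissible in description.items())


def select_methods(
    protein: str,
    sample: Sample,
    case_base: list[tuple[str, Sample]],
    classes: list[Description],
) -> list[str]:
    """Return the usable methods, most preferred first."""
    if not any(p == protein for p, _ in case_base):
        return ["default-plan"]
    if any(p == protein and s == sample for p, s in case_base):
        return ["equal-sample"]
    control = build_control_sample([s for _, s in case_base])
    # Early exit (under the assumptions above): an unseen value rules out every class.
    if satisfies(sample, control) and any(satisfies(sample, d) for d in classes):
        return ["purify-by-class", "analogy-by-determination"]
    return ["analogy-by-determination"]


case_base = [
    ("P1", {"taxon": "mammal", "tissue": "liver"}),
    ("P2", {"taxon": "mammal", "tissue": "kidney"}),
    ("P1", {"taxon": "fish", "tissue": "muscle"}),
]
classes = [
    {"taxon": frozenset({"mammal"}), "tissue": frozenset({"liver", "kidney"})},
    {"taxon": frozenset({"fish"}), "tissue": frozenset({"muscle"})},
]
print(select_methods("P1", {"taxon": "mammal", "tissue": "kidney"}, case_base, classes))
```

Checking the cheap conditions first (protein, then identical sample, then control sample) follows the lazy spirit of the strategy: the class descriptions are only consulted when nothing simpler decides the case.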

A detailed description of CHROMA and the methods that it uses can be found in:

E. Armengol and E. Plaza (1995). Integrating Induction in a Case-Based Reasoner. Lecture Notes in Artificial Intelligence, vol. 984, pp. 3-17. Springer. (Extended version: IIIA-RR-95-02.)