PRISM: PRogramming In Statistical Modeling

Introduction — What is PRISM?

(For theoretical details of PRISM and our recent research, see prism-intro.pdf [411KB], which is a compilation of past slides.)

PRISM is a general programming language intended for symbolic-statistical modeling. It is a new programming language with the unprecedented ability to learn the statistical parameters embedded in programs. Its programming system, called the “PRISM system” here for short, is a powerful tool for building complex statistical models. The theoretical background of the PRISM system is the distribution semantics for parameterized logic programs and EM learning of their parameters from observations [Sato 95, Sato 01c]. The PRISM system comprises two subsystems, one for execution and the other for learning. The execution subsystem supports various probabilistic built-in predicates conforming to the distribution semantics and makes it possible to use a program as a random sampler. The learning subsystem, on the other hand, supports abductive reasoning, i.e. searching for logical explanations of observations through a program, and learns the parameters associated with those explanations so that the program behaves adaptively.
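
As a small concrete sketch (the switch name coin and the predicate direction/1 are our own illustrative choices, in the style of the introductory examples shipped with the system):

    % A coin flip probabilistically decides which way to go.
    values(coin, [head, tail]).   % declare the outcome space of switch 'coin'

    direction(D) :-
        msw(coin, Face),          % probabilistic choice made by the switch
        ( Face == head -> D = left
        ; D = right
        ).

Once the program is loaded, the query ?- sample(direction(D)) runs it as a random sampler, while ?- prob(direction(left), P) computes the probability of an observation under the current parameters; sample/1 and prob/2 are standard built-ins of the system.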

Speaking mathematically, a PRISM program is a logic program in which facts have a parameterized probability distribution, so that the program can be seen as a parameterized statistical model. The program defines a probability distribution (probability measure) over the set of possible Herbrand interpretations. In execution, probabilities or samples of various probabilistic constructs in the program are calculated according to the defined distribution. In learning, we perform ML (maximum likelihood) estimation of the program parameters from incomplete data by the EM (Expectation-Maximization) algorithm, so that the defined distribution comes closer to the observed distribution. Because PRISM programs can be arbitrarily complex, our models can be arbitrarily complex as well, which means we have the opportunity to build large yet understandable symbolic-statistical models that go beyond traditional statistical models. [Sato 01c] and [Sato 08a] provide a comprehensive view of the theoretical background of PRISM.
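
Continuing the sketch above (set_sw/2, learn/1 and show_sw/0 are standard built-ins; the observed data are illustrative), parameter setting and ML estimation look like this:

    ?- set_sw(coin, [0.5, 0.5]).   % set the parameters of 'coin' by hand
    ?- prob(direction(left), P).   % P = 0.5 under the current parameters
    ?- learn([direction(left), direction(left), direction(right)]).
                                   % ML estimation from a list of observed goals
    ?- show_sw.                    % inspect the estimated parameters

For this toy program the data are complete, so the ML estimate is just the relative frequency (2/3 for head here); the EM machinery becomes essential when, as in hidden Markov models, the outcomes of msw are not observable.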

At this point we should say, for fairness, that understanding PRISM programming may require some learning effort on the user's side, and care must be taken not to violate certain statistical conditions when writing code. However, a PRISM program looks just like a Prolog program with a couple of new built-in predicates. Accordingly, PRISM programming should not be too difficult for anyone who is already familiar with Prolog and statistics.
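
For example, the following sketch, modeled on the classic HMM example in the PRISM distribution (the string length 10 and the state and symbol names are illustrative), is ordinary Prolog except for the values/2 declarations and the calls to msw/2:

    % A two-state hidden Markov model over the alphabet {a, b}.
    values(init,   [s0, s1]).     % initial state distribution
    values(out(_), [a, b]).       % symbol emission at each state
    values(tr(_),  [s0, s1]).     % state transitions

    hmm(L) :-                     % L is an output string of length 10
        msw(init, S),             % choose the initial state
        hmm(1, 10, S, L).

    hmm(T, N, _, []) :- T > N, !.
    hmm(T, N, S, [C|L]) :-
        msw(out(S), C),           % emit a symbol at state S
        msw(tr(S), Next),         % choose the next state
        T1 is T + 1,
        hmm(T1, N, Next, L).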

Merits of PRISM programming

Here we list the (expected) merits of PRISM programming:

Current research topics

Relaxing modeling assumptions

The current programming system makes several modeling assumptions about PRISM programs for the sake of efficient probabilistic inference. On the other hand, these assumptions sometimes turn out to be a serious restriction for users. We have already shown a way to relax the uniqueness condition (all observable goal patterns are exclusive to each other and their probabilities sum to unity) by considering failures in the generation process of possible observations, as sketched below. Along this line, we are now exploring ways to relax the other assumptions so as to enhance the expressivity and flexibility of the PRISM language.
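
For instance, the following sketch (the switch coin/1 and the predicate agree/1 are our own) violates the uniqueness condition: the constraint A = B makes some derivations fail, so the probabilities of the observable patterns agree(head) and agree(tail) sum to less than unity, and a failure-aware learning scheme becomes necessary.

    values(coin(_), [head, tail]).

    agree(A) :-
        msw(coin(a), A),          % toss the first coin
        msw(coin(b), B),          % toss the second coin
        A = B.                    % fails whenever the two coins disagree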

Downloading PRISM system

PRISM is built on top of B-Prolog and runs on Windows, Linux with x86 processors, and Mac OS X. Since version 2.0, the source code of the PRISM-specific part has been open to the public. Visit the release notes for details, and here for previous versions. The latest package is version 2.2 as of March 30, 2016. The package contains executable binaries, the source code of the PRISM-specific part, and program examples. Under the license agreement for version 2.x, users may use this package free of charge for academic purposes, and the source code is available under the modified BSD license.

Highlights of 2.3:

Notes on platforms:

Selected publications on PRISM

Comprehensive descriptions of PRISM

First, [Sato 01c] includes a full description of the distribution semantics and of the efficient architecture for probabilistic inference. [Sato 08a] adds a couple of more recent results: it discusses bridging the gap between discrete Bayesian networks and probabilistic context-free grammars, with a performance report, and also presents a detailed description of our approach to dealing with failures in generative models. [Sato 08b] gives another overview of PRISM, exploring probabilistic context-free graph grammars. [Kameya 07] is a guide to PRISM for Japanese users.

Distribution semantics and PRISM as a modeling language

[Sato 95] introduced the distribution semantics, which is the theoretical (semantic) basis of PRISM programs, and also presented an algorithm for parameter learning of a certain class of logic programs called BS programs. [Sato 97] describes the design of the PRISM language and the PRISM system, and shows examples of writing hidden Markov models (HMMs), Bayesian networks (BNs), and so on. In [Sato 98], we model the complicated interaction between gene inheritance and a tribal social system observed in the Kariera tribe. [Sato 03] and [Sato 05b] show how our generic approach can reduce the modeler's effort, and [Sato 05b] gives PRISM programs representing probabilistic versions of two different types of context-free graph grammars.

Efficient probabilistic inferences for PRISM

Since proposing PRISM, we have tackled the problem of efficiency in EM learning for PRISM programs. While [Kameya 99] could only state the necessity of the uniqueness condition (under which all observable goal patterns are exclusive and their probabilities sum to unity) and of a compact data structure, [Kameya 00] and [Sato 00] proposed the use of OLDT search and probability computation based on dynamic programming for efficient EM learning and for finding the most likely explanation. [Zhou 02], [Zhou 03a], and [Kameya 04] describe further developments of the system. Most recently, we presented an insight that bridges the gap between BNs and PCFGs from the viewpoint of probabilistic inference algorithms [Sato 07], and proposed a data-parallel version of our EM algorithm [Izumi 06].
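
In the released system, these techniques surface as built-in queries over tabled explanations; for instance, with the HMM program sketched earlier loaded (probf/1 and viterbif/1 are standard built-ins):

    ?- probf(hmm([a,b,a,a,b,b,a,a,b,a])).
           % print the explanation graph shared via tabling
    ?- viterbif(hmm([a,b,a,a,b,b,a,a,b,a])).
           % find the most likely explanation by dynamic programming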

Learning through failure

On the basis of Cussens's FAM (failure-adjusted maximization) algorithm [Cussens 01] and program transformation techniques [Sato 89], we proposed a new learning scheme in which users can add constraints (which cause failures) to PRISM programs while keeping learning efficient.

Bayesian learning

In an engineering context, Bayesian learning gives us a systematic route to robust prediction and model selection. Variational Bayes (VB) is known as an efficient approximation method for Bayesian learning, and it combines well with our dynamic programming approach based on tabled search. In [Sato 09a], we proposed a VB-EM algorithm for PRISM programs and confirmed its usefulness on examples from natural language processing and bioinformatics. Furthermore, in [Sato 11], M. Johnson's MCMC algorithm for PCFGs was generalized to PRISM programs.
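
In the released system, VB learning is selected by an execution flag; a minimal sketch (learn_mode is a standard PRISM flag, and the observed data are illustrative):

    ?- set_prism_flag(learn_mode, vb).   % switch from ML/EM to variational Bayes
    ?- learn([direction(left), direction(left), direction(right)]).
                                         % VB-EM learning from the observed goals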

Generatively defined discriminative models

It is well known that discriminative models tend to perform better than generative models in discriminative tasks such as classification. [Sato 13] proposes a way of constructing discriminative models on top of PRISM, as a generalization of generative-discriminative pairs such as naive Bayes classifiers and logistic regression. Like traditional PRISM and linear-chain conditional random fields (CRFs), this new framework is equipped with efficient inference algorithms based on dynamic programming.

Relaxing modeling assumptions

As described above, we are currently seeking ways to relax the modeling assumptions that can be an obstacle to building useful models. Following [De Raedt 07] and [Minato 07], in [Ishihata 08a] and [Ishihata 08b] we proposed the use of (zero-suppressed) binary decision diagrams ((Z)BDDs) as a new data structure for the EM algorithm. This may be a clue to removing the exclusiveness condition assumed in PRISM. In [Ishihata 10], multi-valued random variables are handled efficiently in EM learning on BDDs. Recently, [Sato 14a] proposed Viterbi training, a parameter learning framework based on Viterbi computation. Viterbi training is interesting because it does not necessarily require the exclusiveness condition, and it seems appropriate for applications where we eventually need just one explanation for every goal (for example, statistical parsing). Another line of recent work is [Sato 14b], [Kojima 14], and [Kojima 15], in which a generalized algorithm for prefix probability computation is proposed; this inference algorithm can lead to a relaxation of the acyclicity condition. In addition, the discriminative models of [Sato 13] above require neither the exclusiveness condition nor the uniqueness condition.

Related publications and references

Applications

So far, PRISM has been applied to biological sequence analysis (thanks to Henning Christiansen and the members of the LoSt project), the analysis of music (thanks to Jon Sneyers and his colleagues), probabilistic planning, and so on. [Christiansen 09] experiments with the optimization of PRISM models inspired by genomic sequence analysis (e.g. gene finding in E. coli). The authors introduce hidden Markov models to discriminate coding from non-coding regions, and stochastic context-free grammars to capture the nested hairpin structures in the secondary structure of RNA. [Christiansen 10] integrates side constraints with HMMs written in PRISM, and applies such constrained HMMs to global pairwise alignment, a problem in biological sequence analysis. [Mørk 12] fully utilizes the PRISM system's learning facilities for the statistical evaluation of potential HMM structures in a bacterial gene-finding task. [Sneyers 06] proposes a probabilistic-logical model of music that incorporates the synthetic and analytic aspects of music simultaneously. Later, Abdallah et al. also used PRISM to build probabilistic grammars for the analysis of music structure [Abdallah 15].

Extensions

We believe that PRISM can still be extended further. [Sneyers 10] built a system for probabilistic constraint handling rules (CHRs) on top of the PRISM system.

Advanced logic programming techniques

As described above, [Tamaki and Sato 86], [Zhou 03b], [Zhou 04] and [Zhou 08] proposed tabled search techniques for logic programs, which play a central role in efficient EM learning of PRISM programs. Also, the learning algorithm for generative models with failure builds on a program transformation proposed in [Sato 89].

Advanced statistical learning techniques

[Cussens 01] proposed the FAM (failure-adjusted maximization) algorithm, which has been implemented in the programming system (since version 1.8) to deal with failures in generative models. [Kurihara 06] provides a variational Bayesian (VB) learning method for probabilistic context-free grammars, which forms the basis of the VB learning routines of the programming system (since version 1.11). The deterministic annealing EM algorithm, which is used to avoid undesirable local maxima in EM learning, and its extension to variational Bayesian learning were proposed in [Ueda 98] and [Katahira 08], respectively.
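
In the programming system, these routines are again controlled through execution flags; a sketch assuming the flag names daem, itemp_init and itemp_rate from the user's manual (the numeric values are illustrative):

    ?- set_prism_flag(daem, on).          % enable deterministic annealing EM
    ?- set_prism_flag(itemp_init, 0.1).   % initial inverse temperature
    ?- set_prism_flag(itemp_rate, 1.5).   % growth rate of the inverse temperature
    ?- learn([direction(left), direction(left), direction(right)]).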

Integration with BDDs

Binary decision diagrams (BDDs) are known as a compact representation of Boolean (propositional) formulas and are utilized in recent research on statistical relational learning (SRL) or probabilistic logic learning (PLL). ProbLog [De Raedt 07] is the first probabilistic-logical framework (to our knowledge) to introduce BDDs as a basic data structure for probability computation. [Minato 07] proposes efficient probabilistic inference on Bayesian networks by compiling them into zero-suppressed BDDs.

Fast EM learning for probabilistic CFGs

As an instance of the learning framework of PRISM, in [Kameya 01] we focused on parametric (EM) learning of probabilistic CFGs, or CFGs with context-sensitive probabilities. For these model classes, efficient CFG parsers (e.g. the CYK, Earley, and Tomita parsers) can be used instead of the OLDT interpreter. Using two real corpora and hand-crafted grammars of moderate size with different characteristics, we experimentally confirmed in [Sato 01a] that the per-iteration speed-up of EM learning by our approach ranges from hundreds of times to over a thousand times compared with the traditional approach based on the Inside-Outside algorithm. [Sato 01b] deals with probabilistic Earley parsing in our approach.
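
For reference, a PCFG itself is directly expressible in PRISM; below is a minimal sketch of a probabilistic version of the grammar S -> S S | a | b (the predicates pcfg/1, pcfg/3, pcfg_list/3 and nonterminal/1 are our own illustrative names):

    values(s, [[s,s], [a], [b]]).    % the production rules of nonterminal s

    pcfg(L) :- pcfg(s, L, []).       % L is a terminal string derived from s

    pcfg(A, L0, L) :-
        ( nonterminal(A) ->
            msw(A, RHS),             % choose a production rule probabilistically
            pcfg_list(RHS, L0, L)
        ;   L0 = [A|L]               % A is a terminal symbol
        ).

    pcfg_list([], L, L).
    pcfg_list([X|Xs], L0, L) :-
        pcfg(X, L0, L1),
        pcfg_list(Xs, L1, L).

    nonterminal(s).

Sentences are then just lists of terminals, so a query like ?- learn([pcfg([a,b]), pcfg([a,a,b])]) re-estimates the rule probabilities from the given strings; note that sampling from this grammar terminates with probability one only when the probability of s -> s s stays below 1/2.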

Contact information

The system is under development, and we keep the latest version downloadable from this page. Please send license requests, questions, or bug reports to the following address (replacing [at] with @):

prism [at] ccml.meijo-u.ac.jp

Acknowledgments

The development of PRISM 2.3 has been partially supported by a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).

Links

Last update: Aug. 5, 2017