changed :math:'blabla' to :math: (valid ReST) in distributions.py. Minor changes to the docs. Replaced the distribution definitions by the epydoc generated docs.

git-svn-id: https://pymc.googlecode.com/svn/trunk@714 15d7aa0b-6f1a-0410-991a-d59f85d14984
david.huard committed Apr 29, 2008
1 parent a12e1ee commit e4a58e3
Showing 17 changed files with 4,612 additions and 208 deletions.
17 changes: 15 additions & 2 deletions INSTALL.txt
@@ -35,6 +35,11 @@ dependencies, and all are freely available online.
* `IPython`_ (optional): A convenient python shell coming with parallel
computing facilities.

* `pyTables`_ (optional): An interface to the HDF5 library for storing datasets
in binary format.



There are prebuilt distributions that include all the needed dependencies. For
Mac OS X and Linux users, we recommend the `ActiveState`_ distributions.
Windows users should download and install `Enthought Python`_. The Mac OS X
@@ -67,6 +72,10 @@ tested with PyMC but may work nonetheless.
.. _`IPython`:
http://ipython.scipy.org/

.. _`pyTables`:
http://www.pytables.org/moin


Platform-specific instructions
==============================

@@ -154,7 +163,7 @@ To make sure everything is working correctly, open a python shell and type::

You should see a lot of tests being run, and messages appear if errors are
raised or if some tests fail. In case this happens (it shouldn't), please report
the problems on the issue tracker, specifying the version you are using and the
the problems on the `issue tracker`_, specifying the version you are using and the
environment. Some of the tests require SciPy, if it is not installed on your
system, you should not worry too much about failing tests.

@@ -163,4 +172,8 @@ Bugs and feature requests
=========================

Report problems with the installation, bugs in the code or feature request at
the issue tracker at http://code.google.com/p/pymc/issues/list .
the `issue tracker`_ at http://code.google.com/p/pymc/issues/list .

.. _`issue tracker`:
http://code.google.com/p/pymc/issues/list .

9 changes: 3 additions & 6 deletions README.txt
@@ -47,7 +47,7 @@ Features
What's new in 2.0
=================

* New, more flexible object model and syntax.
* New flexible object model and syntax (non backward compatible).

* Reduced redundant computations: only relevant log-probability terms are
computed, and these are cached.
@@ -89,14 +89,13 @@ From a python shell, type::
S.sample(iter=10000, burn=5000, thin=2)

where problem_definition is a module or a dictionary containing Node, Data and
Parameter instances defining your problem. Read the `user guide`_ for a
complete description of the package, classes and some examples to get started.
Parameter instances defining your problem.


History
=======

PyMC began development in 2003, as an effort to generalize the process of building Metropolis-Hastimgs samplers, with an aim to making Markov chain Monte Carlo more accessible to non-statisticians (particularly ecologists). The choice to develop PyMC as a python module, rather than a standalone application, allowed the use MCMC methods in a larger modeling framework, in contrast to the BUGS environment. By 2005, PyMC was reliable enough for version 1.0 to be released to the public. A small group of regular users, most associated with the University of Georgia, provided much of the feedback necessary for the refinement of PyMC to its current state.
PyMC began development in 2003, as an effort to generalize the process of building Metropolis-Hastings samplers, with an aim to making Markov chain Monte Carlo more accessible to non-statisticians (particularly ecologists). The choice to develop PyMC as a python module, rather than a standalone application, allowed the use MCMC methods in a larger modeling framework, in contrast to the BUGS environment. By 2005, PyMC was reliable enough for version 1.0 to be released to the public. A small group of regular users, most associated with the University of Georgia, provided much of the feedback necessary for the refinement of PyMC to its current state.

In 2006, David Huard and Anand Patil joined Chris Fonnesbeck on the development team for PyMC 2.0. This iteration of the software strives for more flexibility, better performance and a better end-user experience than any previous version of PyMC.

@@ -110,5 +109,3 @@ See the `INSTALL.txt`_ file.
.. _`INSTALL.txt`:
./INSTALL.txt

.. _`user guide`:
docs/pdf/new_interface.pdf
8 changes: 4 additions & 4 deletions builddocs
@@ -1,12 +1,12 @@
#!/usr/bin/env bash
#epydoc --verbose --debug --config epydoc.conf

#cp docs/pdf/pymc.distributions-module.tex docs/

# Make manual
cd docs
rst2latex.py ../README.txt | ./extract.py -s \\\\setlength{\\\\locallinewidth}{\\\\linewidth} -e \\hypertarget{installation}{} -o README.tex
rst2latex.py ../INSTALL.txt | ./extract.py -s \\\\setlength{\\\\locallinewidth}{\\\\linewidth} -e \\end{document} -o INSTALL.tex
rst2latex.py ../PyMC/database/README.txt | ./extract.py -s \\\\setlength{\\\\locallinewidth}{\\\\linewidth} -e \\end{document} -o database.tex
rst2latex ../README.txt | ./extract.py -s \\\\setlength{\\\\locallinewidth}{\\\\linewidth} -e \\hypertarget{installation}{} -o README.tex
rst2latex ../INSTALL.txt | ./extract.py -s \\\\setlength{\\\\locallinewidth}{\\\\linewidth} -e \\end{document} -o INSTALL.tex
rst2latex ../pymc/database/README.txt | ./extract.py -s \\\\setlength{\\\\locallinewidth}{\\\\linewidth} -e \\end{document} -o database.tex

mkdir pdf
pdflatex -output-directory=pdf guide2.0
8 changes: 6 additions & 2 deletions docs/INSTALL.tex
@@ -36,6 +36,10 @@ \section*{Dependencies}
\href{http://ipython.scipy.org/}{IPython} (optional): A convenient python shell coming with parallel
computing facilities.

\item {}
\href{http://www.pytables.org/moin}{pyTables} (optional): An interface to the HDF5 library for storing datasets
in binary format.

\end{itemize}

There are prebuilt distributions that include all the needed dependencies. For
@@ -215,7 +219,7 @@ \section*{Running the test suite}

You should see a lot of tests being run, and messages appear if errors are
raised or if some tests fail. In case this happens (it shouldn't), please report
the problems on the issue tracker, specifying the version you are using and the
the problems on the \href{http://code.google.com/p/pymc/issues/list.}{issue tracker}, specifying the version you are using and the
environment. Some of the tests require SciPy, if it is not installed on your
system, you should not worry too much about failing tests.

@@ -228,6 +232,6 @@ \section*{Bugs and feature requests}
\label{bugs-and-feature-requests}

Report problems with the installation, bugs in the code or feature request at
the issue tracker at \href{http://code.google.com/p/pymc/issues/list}{http://code.google.com/p/pymc/issues/list} .
the \href{http://code.google.com/p/pymc/issues/list.}{issue tracker} at \href{http://code.google.com/p/pymc/issues/list}{http://code.google.com/p/pymc/issues/list} .

\
7 changes: 3 additions & 4 deletions docs/README.tex
@@ -63,7 +63,7 @@ \section*{What's new in 2.0}
\label{what-s-new-in-2-0}
\begin{itemize}
\item {}
New, more flexible object model and syntax.
New flexible object model and syntax (non backward compatible).

\item {}
Reduced redundant computations: only relevant log-probability terms are
@@ -118,8 +118,7 @@ \section*{Usage}
}\end{quote}

where problem{\_}definition is a module or a dictionary containing Node, Data and
Parameter instances defining your problem. Read the \href{docs/pdf/new_interface.pdf}{user guide} for a
complete description of the package, classes and some examples to get started.
Parameter instances defining your problem.


%___________________________________________________________________________
@@ -129,7 +128,7 @@ \section*{Usage}
\section*{History}
\label{history}

PyMC began development in 2003, as an effort to generalize the process of building Metropolis-Hastimgs samplers, with an aim to making Markov chain Monte Carlo more accessible to non-statisticians (particularly ecologists). The choice to develop PyMC as a python module, rather than a standalone application, allowed the use MCMC methods in a larger modeling framework, in contrast to the BUGS environment. By 2005, PyMC was reliable enough for version 1.0 to be released to the public. A small group of regular users, most associated with the University of Georgia, provided much of the feedback necessary for the refinement of PyMC to its current state.
PyMC began development in 2003, as an effort to generalize the process of building Metropolis-Hastings samplers, with an aim to making Markov chain Monte Carlo more accessible to non-statisticians (particularly ecologists). The choice to develop PyMC as a python module, rather than a standalone application, allowed the use MCMC methods in a larger modeling framework, in contrast to the BUGS environment. By 2005, PyMC was reliable enough for version 1.0 to be released to the public. A small group of regular users, most associated with the University of Georgia, provided much of the feedback necessary for the refinement of PyMC to its current state.

In 2006, David Huard and Anand Patil joined Chris Fonnesbeck on the development team for PyMC 2.0. This iteration of the software strives for more flexibility, better performance and a better end-user experience than any previous version of PyMC.

105 changes: 58 additions & 47 deletions docs/database.tex
@@ -1,6 +1,6 @@


By default, PyMC keeps the sampled data in memory and keeps no trace of it on the hard drive. To save this data to disk, PyMC provides different strategies, from simple ASCII files to compressed binary formats. These strategies are implemented different \emph{database backends}, behaving identically from the user's perspective. In the following, the interface to these backends is discussed, and a description of the different backends is given.
By default, PyMC keeps the sampled data in memory and keeps no trace of it on the hard drive. To save this data to disk, PyMC provides different storing strategies, which we refer to as \emph{database backends}. All those backends provide the same user interface, making it trivial to switch from one backend to another. In the following, this common interface is presented, along with an individual description of each backend.


%___________________________________________________________________________
@@ -10,31 +10,16 @@
\section*{Accessing Sampled Data: User Interface}
\label{accessing-sampled-data-user-interface}

The choice of database backend is made when a sampler is created using the \titlereference{db} keyword:
The database backend is selected by the \titlereference{db} keyword:
\begin{quote}{\ttfamily \raggedright \noindent
S~=~MCMC(DisasterModel,~db='txt',~dirname='test')
S~=~MCMC(DisasterModel,~db='ram')
}\end{quote}

This instructs the sampler to tally samples in txt files stored in a directory named \titlereference{test}. Other choices for the database are given in the table below, the default being \titlereference{ram}. When the \titlereference{sample} method is called, a \titlereference{chain} is created storing the sampled variables. The data in this chain can be accessed for each variable using its trace object
\begin{quote}{\ttfamily \raggedright \noindent
S.e.trace()
}\end{quote}

When \titlereference{S.db.close()} is called, the data is flushed to disk. That is, directories are created for each chain, with samples from each stochastic variable in a separate file. To access this data during a following session, each database provides a \titlereference{load} function instantiating a \titlereference{Database} object
\begin{quote}{\ttfamily \raggedright \noindent
DB~=~Database.txt.load('test')
}\end{quote}
Here, we instructed the MCMC sampler to keep the trace in the computer's live memory. This means that when the Python session closes, all data will be lost. This is the default backend.

This object can then be linked to a model definition using
Each time MCMC's \titlereference{sample} method is called, a \titlereference{chain} is created storing the sampled variables. The data in this chain can be accessed for each variable using the call method of its trace attribute:
\begin{quote}{\ttfamily \raggedright \noindent
S~=~Sampler(DisasterSampler,~db=DB)
}\end{quote}

For some databases (\titlereference{hdf5}, \titlereference{pickle}), loading an existing database restores the previous state of the sampler. That is, the attribtues of the Sampler, its Stochastic parameters and StepMethods are all set to the value they had at the time \titlereference{D.db.close()} was called.

The \titlereference{trace} object has the following signature .. {[}{\#}{]}:
\begin{quote}{\ttfamily \raggedright \noindent
trace(self,~~burn=0,~thin=1,~chain=-1,~slicing=None)
S.e.trace(burn=0,~thin=1,~chain=-1,~slicing=None)
}\end{quote}

with arguments having the following meaning:
@@ -45,7 +30,7 @@ \section*{Accessing Sampled Data: User Interface}

\item[{thin}] \leavevmode (\textbf{int})

Number of samples to step.
The stride, i.e. the number of samples to step for each returned value.

\item[{chain}] \leavevmode (\textbf{int or None})

@@ -56,7 +41,34 @@ \section*{Accessing Sampled Data: User Interface}
Slice object used to parse the samples. Overrides burn and thin parameters.

\end{description}
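
The interplay of the `burn`, `thin` and `slicing` arguments described in this hunk can be sketched in plain Python. This is only an illustration, not PyMC's implementation: `select_samples` is a hypothetical helper, it assumes that `burn` counts initial samples to skip, and it assumes that a `slicing` object simply replaces `slice(burn, None, thin)`, as the argument descriptions suggest.

```python
def select_samples(samples, burn=0, thin=1, slicing=None):
    """Hypothetical helper mimicking the documented trace arguments."""
    if slicing is None:
        # Skip the first `burn` samples, then keep every `thin`-th one.
        slicing = slice(burn, None, thin)
    # An explicit slice overrides burn and thin entirely.
    return samples[slicing]


chain = list(range(20))  # stand-in for one chain of sampled values
kept = select_samples(chain, burn=10, thin=2)
overridden = select_samples(chain, burn=10, thin=2, slicing=slice(0, 6, 3))
print(kept)        # [10, 12, 14, 16, 18]
print(overridden)  # [0, 3]
```

The second call shows the override: `burn` and `thin` are ignored once `slicing` is given.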
% [#]: The `trace` attribute of stochastic parameters is in fact an instance of a Trace class, defined for each backend. This class has a method called `gettrace` that returns the trace of the object, and which is called by `trace()` .


%___________________________________________________________________________

\hypertarget{loading-data-from-a-previous-session}{}
\pdfbookmark[0]{Loading data from a previous session}{loading-data-from-a-previous-session}
\section*{Loading data from a previous session}
\label{loading-data-from-a-previous-session}

To store a copy of the trace on the hard disk, a number of backends are available: \titlereference{txt}, \titlereference{pickle}, \titlereference{hdf5}, \titlereference{sqlite} and \titlereference{mysql}. These all write the data to disk, in such a way that it can be loaded back in a following session and appended to. So for instance, to save data in ASCII format, we would do:
\begin{quote}{\ttfamily \raggedright \noindent
S~=~MCMC(DisasterModel,~db='txt',~dirname='disaster{\_}data')~\\
S.sample(10000)~\\
S.db.close()
}\end{quote}

When \titlereference{S.db.close()} is called, the data is flushed to disk. That is, directories are created for each chain, with samples from each stochastic variable in a separate file. To access this data during a following session, each database provides a \titlereference{load} function instantiating a \titlereference{Database} object
\begin{quote}{\ttfamily \raggedright \noindent
DB~=~Database.txt.load('disaster{\_}data')
}\end{quote}

This \titlereference{Database} object can then be linked to a model definition using
\begin{quote}{\ttfamily \raggedright \noindent
S~=~Sampler(DisasterSampler,~db=DB)~\\
S.sample(10000)
}\end{quote}

For some databases (\titlereference{hdf5}, \titlereference{pickle}), loading an existing database restores the previous state of the sampler. That is, the attributes of the Sampler, its Stochastic parameters and StepMethods are all set to the value they had at the time \titlereference{S.db.close()} was called.


%___________________________________________________________________________
@@ -86,7 +98,7 @@ \subsection*{ram}
\subsection*{txt}
\label{txt}

The \titlereference{txt} backend is a modified \titlereference{ram} backend, the only difference being that when the database is closed, the data is written to disk in ascii files. More precisely, the data for each chain is stored in a directory called \titlereference{Chain{\_}{\textless}{\#}{\textgreater}}, the trace for each variable being stored in a file names`{\textless}variable name{\textgreater}.txt`. This backend makes it easy to load the data using another application, but for large datasets, files tend to be embarassingly large and slow to load into memory.
With the \titlereference{txt} backend, the data is written to disk in ASCII files when the database's \titlereference{close()} method is called. More precisely, the data for each chain is stored in a directory called \titlereference{Chain{\_}{\textless}{\#}{\textgreater}}, the trace for each variable being stored in a file named \titlereference{{\textless}variable name{\textgreater}.txt}. This backend makes it easy to load the data using another application, but for large datasets, files tend to be embarrassingly large and slow to load into memory.
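
The on-disk layout described for the `txt` backend (one directory per chain, one file per variable) can be mimicked with the standard library alone. The sketch below is not the backend's code: `dump_chains` and `load_trace` are hypothetical helpers, and both the chain numbering starting at 0 and the one-value-per-line file content are assumptions made for illustration. Only the directory and file naming comes from the text.

```python
import os
import tempfile


def dump_chains(dirname, chains):
    """Write each chain to dirname/Chain_<#>/<variable name>.txt (hypothetical)."""
    for index, chain in enumerate(chains):
        chain_dir = os.path.join(dirname, "Chain_%d" % index)
        os.makedirs(chain_dir)
        for name, values in chain.items():
            with open(os.path.join(chain_dir, name + ".txt"), "w") as handle:
                # Assumed format: one sample per line.
                handle.write("\n".join(repr(value) for value in values))


def load_trace(dirname, name, chain=-1):
    """Read one variable's samples back; chain=-1 selects the last chain."""
    chain_dirs = sorted(
        (d for d in os.listdir(dirname) if d.startswith("Chain_")),
        key=lambda d: int(d.split("_")[1]),  # numeric order, so Chain_10 follows Chain_9
    )
    path = os.path.join(dirname, chain_dirs[chain], name + ".txt")
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


root = tempfile.mkdtemp()
dump_chains(root, [{"e": [0.5, 1.5]}, {"e": [2.0, 3.0, 4.0]}])
last = load_trace(root, "e")
first = load_trace(root, "e", chain=0)
```

Because every sample is stored as text, another application can read these files directly, which is the portability the paragraph mentions. It is also why they grow large for long chains.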


%___________________________________________________________________________
@@ -96,7 +108,17 @@ \subsection*{txt}
\subsection*{pickle}
\label{pickle}

As its name implies, the \titlereference{pickle} database used the \titlereference{Cpickle} module to save the trace objects. Use of this backend is not suggested since the generated files may become unreadable after a Python update.
As its name implies, the \titlereference{pickle} database relies on the \titlereference{Cpickle} module to save the trace objects. Use of this backend is appropriate for small scale, short-lived projects. For longer term or larger projects, the \titlereference{pickle} backend should be avoided since generated files might be unreadable across different Python versions.


%___________________________________________________________________________

\hypertarget{hdf5}{}
\pdfbookmark[1]{hdf5}{hdf5}
\subsection*{hdf5}
\label{hdf5}

The hdf5 backend uses \href{http://www.pytables.org/moin}{pyTables} to save data in binary HDF5 format. The main advantage of this backend is that data is flushed regularly to disk, reducing memory usage and allowing sampling of datasets much larger than the available RAM memory, speeding up data access. For this backend to work, pyTables must be installed, which in turn requires the hdf5 library.


%___________________________________________________________________________
@@ -106,7 +128,7 @@ \subsection*{pickle}
\subsection*{sqlite}
\label{sqlite}

Chris ...
The sqlite backend is based on the python module sqlite3. It is not as mature as the other backends, in the sense that it does not support saving/restoring of state and plug and play reloading.


%___________________________________________________________________________
@@ -116,17 +138,7 @@ \subsection*{sqlite}
\subsection*{mysql}
\label{mysql}

Chris ...


%___________________________________________________________________________

\hypertarget{hdf5}{}
\pdfbookmark[1]{hdf5}{hdf5}
\subsection*{hdf5}
\label{hdf5}

The hdf5 backend uses pyTables to save data in binary HDF5 format. The main advantage of this backend is that data is flushed regularly to disk, reducing memory usage and allowing sampling of datasets much larger than the available memory. Data access is also very fast.
The mysql backend is based on the MySQLdb python module. It also is not as mature as the other backends.

\leavevmode
\begin{longtable}[c]{|p{0.133\locallinewidth}|p{0.447\locallinewidth}|p{0.307\locallinewidth}|}
@@ -144,13 +156,12 @@ \subsection*{hdf5}
no{\_}trace
&
Do not tally samples at all.
Use only for testing purposes.
& \\
\hline

ram
&
Store samples in memory.
Store samples in live memory.
& \\
\hline

@@ -166,6 +177,14 @@ \subsection*{hdf5}
& \\
\hline

hdf5
&
Store samples in the HDF5 format.
&
pytables ({\textgreater}2.0), libhdf5
\\
\hline

sqlite
&
Store samples in a sqlite database.
@@ -181,14 +200,6 @@ \subsection*{hdf5}
MySQLdb
\\
\hline

hdf5
&
Store samples in the HDF5 format.
&
pytables ({\textgreater}2.0), libhdf5
\\
\hline
\end{longtable}

For more information about individual backends, refer to the \href{docs/API.pdf}{API} documentation.