jakebiesinger
diff --git a/INSTALL.txt b/INSTALL.txt
Lines changed: 15 additions & 2 deletions
diff --git a/README.txt b/README.txt
Lines changed: 3 additions & 6 deletions
diff --git a/builddocs b/builddocs
Lines changed: 4 additions & 4 deletions
diff --git a/docs/INSTALL.tex b/docs/INSTALL.tex
Lines changed: 6 additions & 2 deletions
diff --git a/docs/README.tex b/docs/README.tex
Lines changed: 3 additions & 4 deletions
diff --git a/docs/database.tex b/docs/database.tex
Lines changed: 58 additions & 47 deletions
@@ -35,6 +35,11 @@ dependencies, and all are freely available online.
 * `IPython`_ (optional): A convenient python shell coming with parallel 
   computing facilities. 
 
+* `pyTables`_ (optional): An interface to the HDF5 library for storing datasets
+  in binary format.
+
+
+
 There are prebuilt distributions that include all the needed dependencies. For 
 Mac OS X and Linux users, we recommend the `ActiveState`_ distributions. 
 Windows users should download and install `Enthought Python`_. The Mac OS X 
@@ -67,6 +72,10 @@ tested with PyMC but may work nonetheless.
 .. _`IPython`:
    http://ipython.scipy.org/
 
+.. _`pyTables`:
+   http://www.pytables.org/moin
+
+
 Platform-specific instructions
 ==============================
 
@@ -154,7 +163,7 @@ To make sure everything is working correctly, open a python shell and type::
 
 You should see a lot of tests being run, and messages appear if errors are 
 raised or if some tests fail. In case this happens (it shouldn't), please report
-the problems on the issue tracker, specifying the version you are using and the
+the problems on the `issue tracker`_, specifying the version you are using and the
 environment. Some of the tests require SciPy, if it is not installed on your 
 system, you should not worry too much about failing tests. 
 
@@ -163,4 +172,8 @@ Bugs and feature requests
 =========================
 
 Report problems with the installation, bugs in the code or feature request at 
-the issue tracker at http://code.google.com/p/pymc/issues/list .
+the `issue tracker`_ at http://code.google.com/p/pymc/issues/list.
+
+.. _`issue tracker`:
+   http://code.google.com/p/pymc/issues/list
+
@@ -47,7 +47,7 @@ Features
 What's new in 2.0
 =================
 
-* New, more flexible object model and syntax.
+* New, more flexible object model and syntax (not backward compatible).
 
 * Reduced redundant computations: only relevant log-probability terms are 
   computed, and these are cached.
@@ -89,14 +89,13 @@ From a python shell, type::
 	S.sample(iter=10000, burn=5000, thin=2)
 
 where problem_definition is a module or a dictionary containing Node, Data and 
-Parameter instances defining your problem. Read the `user guide`_ for a 
-complete description of the package, classes and some examples to get started.
+Parameter instances defining your problem. 
 
 
 History
 =======
 
-PyMC began development in 2003, as an effort to generalize the process of building Metropolis-Hastimgs samplers, with an aim to making Markov chain Monte Carlo more accessible to non-statisticians (particularly ecologists). The choice to develop PyMC as a python module, rather than a standalone application, allowed the use MCMC methods in a larger modeling framework, in contrast to the BUGS environment. By 2005, PyMC was reliable enough for version 1.0 to be released to the public. A small group of regular users, most associated with the University of Georgia, provided much of the feedback necessary for the refinement of PyMC to its current state.
+PyMC began development in 2003, as an effort to generalize the process of building Metropolis-Hastings samplers, with the aim of making Markov chain Monte Carlo more accessible to non-statisticians (particularly ecologists). The choice to develop PyMC as a Python module, rather than a standalone application, allowed the use of MCMC methods in a larger modeling framework, in contrast to the BUGS environment. By 2005, PyMC was reliable enough for version 1.0 to be released to the public. A small group of regular users, most associated with the University of Georgia, provided much of the feedback necessary for the refinement of PyMC to its current state.
 
 In 2006, David Huard and Anand Patil joined Chris Fonnesbeck on the development team for PyMC 2.0. This iteration of the software strives for more flexibility, better performance and a better end-user experience than any previous version of PyMC.
 
@@ -110,5 +109,3 @@ See the `INSTALL.txt`_ file.
 .. _`INSTALL.txt`:
    ./INSTALL.txt
 
-.. _`user guide`:
-   docs/pdf/new_interface.pdf
@@ -1,12 +1,12 @@
 #!/usr/bin/env bash
 #epydoc --verbose --debug --config epydoc.conf
-
+#cp docs/pdf/pymc.distributions-module.tex docs/
 
 # Make manual
 cd docs
-rst2latex.py ../README.txt | ./extract.py -s \\\\setlength{\\\\locallinewidth}{\\\\linewidth} -e \\hypertarget{installation}{} -o README.tex
-rst2latex.py ../INSTALL.txt | ./extract.py -s \\\\setlength{\\\\locallinewidth}{\\\\linewidth} -e \\end{document} -o INSTALL.tex
-rst2latex.py ../PyMC/database/README.txt | ./extract.py -s \\\\setlength{\\\\locallinewidth}{\\\\linewidth} -e \\end{document} -o database.tex
+rst2latex ../README.txt | ./extract.py -s \\\\setlength{\\\\locallinewidth}{\\\\linewidth} -e \\hypertarget{installation}{} -o README.tex
+rst2latex ../INSTALL.txt | ./extract.py -s \\\\setlength{\\\\locallinewidth}{\\\\linewidth} -e \\end{document} -o INSTALL.tex
+rst2latex ../pymc/database/README.txt | ./extract.py -s \\\\setlength{\\\\locallinewidth}{\\\\linewidth} -e \\end{document} -o database.tex
 
 mkdir pdf
 pdflatex -output-directory=pdf guide2.0
 
@@ -36,6 +36,10 @@ \section*{Dependencies}
 \href{http://ipython.scipy.org/}{IPython} (optional): A convenient python shell coming with parallel
 computing facilities.
 
+\item {} 
+\href{http://www.pytables.org/moin}{pyTables} (optional): An interface to the HDF5 library for storing datasets
+in binary format.
+
 \end{itemize}
 
 There are prebuilt distributions that include all the needed dependencies. For
@@ -215,7 +219,7 @@ \section*{Running the test suite}
 
 You should see a lot of tests being run, and messages appear if errors are
 raised or if some tests fail. In case this happens (it shouldn't), please report
-the problems on the issue tracker, specifying the version you are using and the
+the problems on the \href{http://code.google.com/p/pymc/issues/list}{issue tracker}, specifying the version you are using and the
 environment. Some of the tests require SciPy, if it is not installed on your
 system, you should not worry too much about failing tests.
 
@@ -228,6 +232,6 @@ \section*{Bugs and feature requests}
 \label{bugs-and-feature-requests}
 
 Report problems with the installation, bugs in the code or feature request at
-the issue tracker at \href{http://code.google.com/p/pymc/issues/list}{http://code.google.com/p/pymc/issues/list} .
+the \href{http://code.google.com/p/pymc/issues/list}{issue tracker} at \href{http://code.google.com/p/pymc/issues/list}{http://code.google.com/p/pymc/issues/list}.
 
 \
@@ -63,7 +63,7 @@ \section*{What's new in 2.0}
 \label{what-s-new-in-2-0}
 \begin{itemize}
 \item {} 
-New, more flexible object model and syntax.
+New, more flexible object model and syntax (not backward compatible).
 
 \item {} 
 Reduced redundant computations: only relevant log-probability terms are
@@ -118,8 +118,7 @@ \section*{Usage}
 }\end{quote}
 
 where problem{\_}definition is a module or a dictionary containing Node, Data and
-Parameter instances defining your problem. Read the \href{docs/pdf/new_interface.pdf}{user guide} for a
-complete description of the package, classes and some examples to get started.
+Parameter instances defining your problem.
 
 
 %___________________________________________________________________________
@@ -129,7 +128,7 @@ \section*{Usage}
 \section*{History}
 \label{history}
 
-PyMC began development in 2003, as an effort to generalize the process of building Metropolis-Hastimgs samplers, with an aim to making Markov chain Monte Carlo more accessible to non-statisticians (particularly ecologists). The choice to develop PyMC as a python module, rather than a standalone application, allowed the use MCMC methods in a larger modeling framework, in contrast to the BUGS environment. By 2005, PyMC was reliable enough for version 1.0 to be released to the public. A small group of regular users, most associated with the University of Georgia, provided much of the feedback necessary for the refinement of PyMC to its current state.
+PyMC began development in 2003, as an effort to generalize the process of building Metropolis-Hastings samplers, with the aim of making Markov chain Monte Carlo more accessible to non-statisticians (particularly ecologists). The choice to develop PyMC as a Python module, rather than a standalone application, allowed the use of MCMC methods in a larger modeling framework, in contrast to the BUGS environment. By 2005, PyMC was reliable enough for version 1.0 to be released to the public. A small group of regular users, most associated with the University of Georgia, provided much of the feedback necessary for the refinement of PyMC to its current state.
 
 In 2006, David Huard and Anand Patil joined Chris Fonnesbeck on the development team for PyMC 2.0. This iteration of the software strives for more flexibility, better performance and a better end-user experience than any previous version of PyMC.
 
 
@@ -1,6 +1,6 @@
 
 
-By default, PyMC keeps the sampled data in memory and keeps no trace of it on the hard drive. To save this data to disk, PyMC provides different strategies, from simple ASCII files to compressed binary formats. These strategies are implemented different \emph{database backends}, behaving identically from the user's perspective. In the following, the interface to these backends is discussed, and a description of the different backends is given.
+By default, PyMC keeps the sampled data in memory and keeps no trace of it on the hard drive. To save this data to disk, PyMC provides different storage strategies, which we refer to as \emph{database backends}. All backends share the same user interface, making it trivial to switch from one to another. The following sections present this common interface, followed by a description of each backend.
 
 
 %___________________________________________________________________________
@@ -10,31 +10,16 @@
 \section*{Accessing Sampled Data: User Interface}
 \label{accessing-sampled-data-user-interface}
 
-The choice of database backend is made when a sampler is created using the \titlereference{db} keyword:
+The database backend is selected with the \titlereference{db} keyword argument when the sampler is created:
 \begin{quote}{\ttfamily \raggedright \noindent
-S~=~MCMC(DisasterModel,~db='txt',~dirname='test')
+S~=~MCMC(DisasterModel,~db='ram')
 }\end{quote}
 
-This instructs the sampler to tally samples in txt files stored in a directory named \titlereference{test}. Other choices for the database are given in the table below, the default being \titlereference{ram}. When the \titlereference{sample} method is called, a \titlereference{chain} is created storing the sampled variables. The data in this chain can be accessed for each variable using its trace object
-\begin{quote}{\ttfamily \raggedright \noindent
-S.e.trace()
-}\end{quote}
-
-When \titlereference{S.db.close()} is called, the data is flushed to disk. That is, directories are created for each chain, with samples from each stochastic variable in a separate file. To access this data during a following session, each database provides a \titlereference{load} function instantiating a \titlereference{Database} object
-\begin{quote}{\ttfamily \raggedright \noindent
-DB~=~Database.txt.load('test')
-}\end{quote}
+Here, the MCMC sampler is instructed to keep the trace in the computer's live memory (RAM), so all data are lost when the Python session closes. This is the default backend.
 
-This object can then be linked to a model definition using
+Each time the \titlereference{sample} method of \titlereference{MCMC} is called, a new \titlereference{chain} is created to store the sampled values. The samples of each variable in a chain are retrieved by calling that variable's \titlereference{trace} attribute:
 \begin{quote}{\ttfamily \raggedright \noindent
-S~=~Sampler(DisasterSampler,~db=DB)
-}\end{quote}
-
-For some databases (\titlereference{hdf5}, \titlereference{pickle}), loading an existing database restores the previous state of the sampler. That is, the attribtues of the Sampler, its Stochastic parameters and StepMethods are all set to the value they had at the time \titlereference{D.db.close()} was called.
-
-The \titlereference{trace} object has the following signature .. {[}{\#}{]}:
-\begin{quote}{\ttfamily \raggedright \noindent
-trace(self,~~burn=0,~thin=1,~chain=-1,~slicing=None)
+S.e.trace(burn=0,~thin=1,~chain=-1,~slicing=None)
 }\end{quote}
 
 with arguments having the following meaning:
@@ -45,7 +30,7 @@ \section*{Accessing Sampled Data: User Interface}
 
 \item[{thin}] \leavevmode (\textbf{int})
 
-Number of samples to step.
+The stride, that is, the number of samples to step between returned values.
 
 \item[{chain}] \leavevmode (\textbf{int or None})
 
@@ -56,7 +41,34 @@ \section*{Accessing Sampled Data: User Interface}
 Slice object used to parse the samples. Overrides burn and thin parameters.
 
 \end{description}
-% [#]: The `trace` attribute of stochastic parameters is in fact an instance of a Trace class, defined for each backend. This class has a method called `gettrace` that returns the trace of the object, and which is called by `trace()` . 
+
+
+%___________________________________________________________________________
+
+\hypertarget{loading-data-from-a-previous-session}{}
+\pdfbookmark[0]{Loading data from a previous session}{loading-data-from-a-previous-session}
+\section*{Loading data from a previous session}
+\label{loading-data-from-a-previous-session}
+
+Several backends store a copy of the trace on the hard disk: \titlereference{txt}, \titlereference{pickle}, \titlereference{hdf5}, \titlereference{sqlite} and \titlereference{mysql}. Each writes the data in such a way that it can be loaded back in a later session and appended to. For instance, to save the data in ASCII format:
+\begin{quote}{\ttfamily \raggedright \noindent
+S~=~MCMC(DisasterModel,~db='txt',~dirname='disaster{\_}data')~\\
+S.sample(10000)~\\
+S.db.close()
+}\end{quote}
+
+When \titlereference{S.db.close()} is called, the data is flushed to disk. With the \titlereference{txt} backend, a directory is created for each chain, and the samples of each stochastic variable are written to a separate file. To access this data in a later session, each backend provides a \titlereference{load} function that instantiates a \titlereference{Database} object:
+\begin{quote}{\ttfamily \raggedright \noindent
+DB~=~Database.txt.load('disaster{\_}data')
+}\end{quote}
+
+This \titlereference{Database} object can then be linked to a model definition as follows:
+\begin{quote}{\ttfamily \raggedright \noindent
+S~=~MCMC(DisasterModel,~db=DB)~\\
+S.sample(10000)
+}\end{quote}
+
+For some backends (\titlereference{hdf5}, \titlereference{pickle}), loading an existing database restores the previous state of the sampler: the attributes of the sampler, its stochastic parameters and its step methods are all set to the values they had when \titlereference{S.db.close()} was called.
 
 
 %___________________________________________________________________________
@@ -86,7 +98,7 @@ \subsection*{ram}
 \subsection*{txt}
 \label{txt}
 
-The \titlereference{txt} backend is a modified \titlereference{ram} backend, the only difference being that when the database is closed, the data is written to disk in ascii files. More precisely, the data for each chain is stored in a directory called \titlereference{Chain{\_}{\textless}{\#}{\textgreater}}, the trace for each variable being stored in a file names`{\textless}variable name{\textgreater}.txt`. This backend makes it easy to load the data using another application, but for large datasets, files tend to be embarassingly large and slow to load into memory.
+With the \titlereference{txt} backend, the data is written to disk in ASCII files when the database's \titlereference{close()} method is called. More precisely, the data for each chain is stored in a directory called \titlereference{Chain{\_}{\textless}{\#}{\textgreater}}, and the trace of each variable is stored in a file named \titlereference{{\textless}variable name{\textgreater}.txt}. This backend makes it easy to load the data in another application, but for large datasets the files tend to be very large and slow to load into memory.
 
 
 %___________________________________________________________________________
@@ -96,7 +108,17 @@ \subsection*{txt}
 \subsection*{pickle}
 \label{pickle}
 
-As its name implies, the \titlereference{pickle} database used the \titlereference{Cpickle} module to save the trace objects. Use of this backend is not suggested since the generated files may become unreadable after a Python update.
+As its name implies, the \titlereference{pickle} database relies on the \titlereference{cPickle} module to save the trace objects. This backend is appropriate for small-scale, short-lived projects. For longer-term or larger projects, the \titlereference{pickle} backend should be avoided, since the generated files might be unreadable across different Python versions.
+
+
+%___________________________________________________________________________
+
+\hypertarget{hdf5}{}
+\pdfbookmark[1]{hdf5}{hdf5}
+\subsection*{hdf5}
+\label{hdf5}
+
+The hdf5 backend uses \href{http://www.pytables.org/moin}{pyTables} to save data in binary HDF5 format. The main advantage of this backend is that data is flushed regularly to disk, reducing memory usage and allowing the sampling of datasets much larger than the available RAM. Data access is also fast. For this backend to work, pyTables must be installed, which in turn requires the HDF5 library.
 
 
 %___________________________________________________________________________
@@ -106,7 +128,7 @@ \subsection*{pickle}
 \subsection*{sqlite}
 \label{sqlite}
 
-Chris ...
+The sqlite backend is based on the Python module \titlereference{sqlite3}. It is less mature than the other backends, in the sense that it does not support saving and restoring the sampler state, nor plug-and-play reloading.
 
 
 %___________________________________________________________________________
@@ -116,17 +138,7 @@ \subsection*{sqlite}
 \subsection*{mysql}
 \label{mysql}
 
-Chris ...
-
-
-%___________________________________________________________________________
-
-\hypertarget{hdf5}{}
-\pdfbookmark[1]{hdf5}{hdf5}
-\subsection*{hdf5}
-\label{hdf5}
-
-The hdf5 backend uses pyTables to save data in binary HDF5 format. The main advantage of this backend is that data is flushed regularly to disk, reducing memory usage and allowing sampling of datasets much larger than the available memory. Data access is also very fast.
+The mysql backend is based on the \titlereference{MySQLdb} Python module. Like the sqlite backend, it is less mature than the other backends.
 
 \leavevmode
 \begin{longtable}[c]{|p{0.133\locallinewidth}|p{0.447\locallinewidth}|p{0.307\locallinewidth}|}
@@ -144,13 +156,12 @@ \subsection*{hdf5}
 no{\_}trace
  & 
 Do not tally samples at all.
-Use only for testing purposes.
  &  \\
 \hline
 
 ram
  & 
-Store samples in memory.
+Store samples in live memory.
  &  \\
 \hline
 
@@ -166,6 +177,14 @@ \subsection*{hdf5}
  &  \\
 \hline
 
+hdf5
+ & 
+Store samples in the HDF5 format.
+ & 
+pytables ({\textgreater}2.0), libhdf5
+ \\
+\hline
+
 sqlite
  & 
 Store samples in a sqlite database.
@@ -181,14 +200,6 @@ \subsection*{hdf5}
 MySQLdb
  \\
 \hline
-
-hdf5
- & 
-Store samples in the HDF5 format.
- & 
-pytables ({\textgreater}2.0), libhdf5
- \\
-\hline
 \end{longtable}
 
 For more information about individual backends, refer to the \href{docs/API.pdf}{API} documentation.
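As a hedged illustration of the txt backend and of the trace() arguments documented in docs/database.tex, the Python sketch below reads the backend's ASCII files without PyMC and applies the same burn/thin/slicing selection. It assumes the layout described in the diff (<dirname>/Chain_<#>/<variable name>.txt), one scalar sample per line, chain directories numbered with integers, and lines starting with # being comments. load_txt_trace and select are hypothetical helpers, not part of PyMC, and chain=None is not handled.

```python
import os
import tempfile


def load_txt_trace(dirname, name, chain=-1):
    """Return the samples of variable `name` from one chain as a list of floats.

    Hypothetical helper. Assumes a layout <dirname>/Chain_<#>/<name>.txt with one
    scalar sample per line; blank lines and lines starting with '#' are skipped.
    """
    # Sort chain directories numerically so that Chain_10 comes after Chain_2.
    chains = sorted(
        (d for d in os.listdir(dirname) if d.startswith("Chain_")),
        key=lambda d: int(d.split("_", 1)[1]),
    )
    # Negative indices count from the last chain, as with chain=-1 in trace().
    path = os.path.join(dirname, chains[chain], name + ".txt")
    samples = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                samples.append(float(line))
    return samples


def select(samples, burn=0, thin=1, slicing=None):
    """Mimic trace()'s burn/thin arguments; an explicit slice overrides both."""
    if slicing is None:
        # Drop the first `burn` samples, then keep every `thin`-th one.
        slicing = slice(burn, None, thin)
    return samples[slicing]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Fake two chains of a variable called "e", mirroring the assumed layout.
        for c in (0, 1):
            chain_dir = os.path.join(root, "Chain_%d" % c)
            os.makedirs(chain_dir)
            with open(os.path.join(chain_dir, "e.txt"), "w") as f:
                f.write("# e\n")
                f.writelines("%d\n" % (10 * c + i) for i in range(10))
        last = load_txt_trace(root, "e")  # chain=-1: the most recent chain
        print(select(last, burn=4, thin=2))  # [14.0, 16.0, 18.0]
```

Building slice(burn, None, thin) only when no explicit slice is given reflects the documented rule that slicing overrides burn and thin; PyMC's own trace() implementation may differ in its details.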