Commit e4a58e3

Author: david.huard
changed :math:'blabla' to :math: (valid ReST) in distributions.py. Minor changes to the docs. Replaced the distribution definitions by the epydoc generated docs.
git-svn-id: https://pymc.googlecode.com/svn/trunk@714 15d7aa0b-6f1a-0410-991a-d59f85d14984
1 parent a12e1ee commit e4a58e3

17 files changed, 4612 additions and 208 deletions

INSTALL.txt

Lines changed: 15 additions & 2 deletions
@@ -35,6 +35,11 @@ dependencies, and all are freely available online.
 * `IPython`_ (optional): A convenient python shell coming with parallel
 computing facilities.
 
+* `pyTables`_ (optional): An interface to the HDF5 library for storing datasets
+in binary format.
+
+
+
 There are prebuilt distributions that include all the needed dependencies. For
 Mac OS X and Linux users, we recommend the `ActiveState`_ distributions.
 Windows users should download and install `Enthought Python`_. The Mac OS X
@@ -67,6 +72,10 @@ tested with PyMC but may work nonetheless.
 .. _`IPython`:
 http://ipython.scipy.org/
 
+.. _`pyTables`:
+http://www.pytables.org/moin
+
+
 Platform-specific instructions
 ==============================
 
@@ -154,7 +163,7 @@ To make sure everything is working correctly, open a python shell and type::
 
 You should see a lot of tests being run, and messages appear if errors are
 raised or if some tests fail. In case this happens (it shouldn't), please report
-the problems on the issue tracker, specifying the version you are using and the
+the problems on the `issue tracker`_, specifying the version you are using and the
 environment. Some of the tests require SciPy, if it is not installed on your
 system, you should not worry too much about failing tests.
 
@@ -163,4 +172,8 @@ Bugs and feature requests
 =========================
 
 Report problems with the installation, bugs in the code or feature request at
-the issue tracker at http://code.google.com/p/pymc/issues/list .
+the `issue tracker`_ at http://code.google.com/p/pymc/issues/list .
+
+.. _`issue tracker`:
+http://code.google.com/p/pymc/issues/list .
+

README.txt

Lines changed: 3 additions & 6 deletions
@@ -47,7 +47,7 @@ Features
 What's new in 2.0
 =================
 
-* New, more flexible object model and syntax.
+* New flexible object model and syntax (non backward compatible).
 
 * Reduced redundant computations: only relevant log-probability terms are
 computed, and these are cached.
@@ -89,14 +89,13 @@ From a python shell, type::
 S.sample(iter=10000, burn=5000, thin=2)
 
 where problem_definition is a module or a dictionary containing Node, Data and
-Parameter instances defining your problem. Read the `user guide`_ for a
-complete description of the package, classes and some examples to get started.
+Parameter instances defining your problem.
 
 
 History
 =======
 
-PyMC began development in 2003, as an effort to generalize the process of building Metropolis-Hastimgs samplers, with an aim to making Markov chain Monte Carlo more accessible to non-statisticians (particularly ecologists). The choice to develop PyMC as a python module, rather than a standalone application, allowed the use MCMC methods in a larger modeling framework, in contrast to the BUGS environment. By 2005, PyMC was reliable enough for version 1.0 to be released to the public. A small group of regular users, most associated with the University of Georgia, provided much of the feedback necessary for the refinement of PyMC to its current state.
+PyMC began development in 2003, as an effort to generalize the process of building Metropolis-Hastings samplers, with an aim to making Markov chain Monte Carlo more accessible to non-statisticians (particularly ecologists). The choice to develop PyMC as a python module, rather than a standalone application, allowed the use MCMC methods in a larger modeling framework, in contrast to the BUGS environment. By 2005, PyMC was reliable enough for version 1.0 to be released to the public. A small group of regular users, most associated with the University of Georgia, provided much of the feedback necessary for the refinement of PyMC to its current state.
 
 In 2006, David Huard and Anand Patil joined Chris Fonnesbeck on the development team for PyMC 2.0. This iteration of the software strives for more flexibility, better performance and a better end-user experience than any previous version of PyMC.
 
@@ -110,5 +109,3 @@ See the `INSTALL.txt`_ file.
 .. _`INSTALL.txt`:
 ./INSTALL.txt
 
-.. _`user guide`:
-docs/pdf/new_interface.pdf

builddocs

Lines changed: 4 additions & 4 deletions
@@ -1,12 +1,12 @@
 #!/usr/bin/env bash
 #epydoc --verbose --debug --config epydoc.conf
-
+#cp docs/pdf/pymc.distributions-module.tex docs/
 
 # Make manual
 cd docs
-rst2latex.py ../README.txt | ./extract.py -s \\\\setlength{\\\\locallinewidth}{\\\\linewidth} -e \\hypertarget{installation}{} -o README.tex
-rst2latex.py ../INSTALL.txt | ./extract.py -s \\\\setlength{\\\\locallinewidth}{\\\\linewidth} -e \\end{document} -o INSTALL.tex
-rst2latex.py ../PyMC/database/README.txt | ./extract.py -s \\\\setlength{\\\\locallinewidth}{\\\\linewidth} -e \\end{document} -o database.tex
+rst2latex ../README.txt | ./extract.py -s \\\\setlength{\\\\locallinewidth}{\\\\linewidth} -e \\hypertarget{installation}{} -o README.tex
+rst2latex ../INSTALL.txt | ./extract.py -s \\\\setlength{\\\\locallinewidth}{\\\\linewidth} -e \\end{document} -o INSTALL.tex
+rst2latex ../pymc/database/README.txt | ./extract.py -s \\\\setlength{\\\\locallinewidth}{\\\\linewidth} -e \\end{document} -o database.tex
 
 mkdir pdf
 pdflatex -output-directory=pdf guide2.0

docs/INSTALL.tex

Lines changed: 6 additions & 2 deletions
@@ -36,6 +36,10 @@ \section*{Dependencies}
 \href{http://ipython.scipy.org/}{IPython} (optional): A convenient python shell coming with parallel
 computing facilities.
 
+\item {}
+\href{http://www.pytables.org/moin}{pyTables} (optional): An interface to the HDF5 library for storing datasets
+in binary format.
+
 \end{itemize}
 
 There are prebuilt distributions that include all the needed dependencies. For
@@ -215,7 +219,7 @@ \section*{Running the test suite}
 
 You should see a lot of tests being run, and messages appear if errors are
 raised or if some tests fail. In case this happens (it shouldn't), please report
-the problems on the issue tracker, specifying the version you are using and the
+the problems on the \href{http://code.google.com/p/pymc/issues/list.}{issue tracker}, specifying the version you are using and the
 environment. Some of the tests require SciPy, if it is not installed on your
 system, you should not worry too much about failing tests.
 
@@ -228,6 +232,6 @@ \section*{Bugs and feature requests}
 \label{bugs-and-feature-requests}
 
 Report problems with the installation, bugs in the code or feature request at
-the issue tracker at \href{http://code.google.com/p/pymc/issues/list}{http://code.google.com/p/pymc/issues/list} .
+the \href{http://code.google.com/p/pymc/issues/list.}{issue tracker} at \href{http://code.google.com/p/pymc/issues/list}{http://code.google.com/p/pymc/issues/list} .
 
 \

docs/README.tex

Lines changed: 3 additions & 4 deletions
@@ -63,7 +63,7 @@ \section*{What's new in 2.0}
 \label{what-s-new-in-2-0}
 \begin{itemize}
 \item {}
-New, more flexible object model and syntax.
+New flexible object model and syntax (non backward compatible).
 
 \item {}
 Reduced redundant computations: only relevant log-probability terms are
@@ -118,8 +118,7 @@ \section*{Usage}
 }\end{quote}
 
 where problem{\_}definition is a module or a dictionary containing Node, Data and
-Parameter instances defining your problem. Read the \href{docs/pdf/new_interface.pdf}{user guide} for a
-complete description of the package, classes and some examples to get started.
+Parameter instances defining your problem.
 
 
 %___________________________________________________________________________
@@ -129,7 +128,7 @@ \section*{Usage}
 \section*{History}
 \label{history}
 
-PyMC began development in 2003, as an effort to generalize the process of building Metropolis-Hastimgs samplers, with an aim to making Markov chain Monte Carlo more accessible to non-statisticians (particularly ecologists). The choice to develop PyMC as a python module, rather than a standalone application, allowed the use MCMC methods in a larger modeling framework, in contrast to the BUGS environment. By 2005, PyMC was reliable enough for version 1.0 to be released to the public. A small group of regular users, most associated with the University of Georgia, provided much of the feedback necessary for the refinement of PyMC to its current state.
+PyMC began development in 2003, as an effort to generalize the process of building Metropolis-Hastings samplers, with an aim to making Markov chain Monte Carlo more accessible to non-statisticians (particularly ecologists). The choice to develop PyMC as a python module, rather than a standalone application, allowed the use MCMC methods in a larger modeling framework, in contrast to the BUGS environment. By 2005, PyMC was reliable enough for version 1.0 to be released to the public. A small group of regular users, most associated with the University of Georgia, provided much of the feedback necessary for the refinement of PyMC to its current state.
 
 In 2006, David Huard and Anand Patil joined Chris Fonnesbeck on the development team for PyMC 2.0. This iteration of the software strives for more flexibility, better performance and a better end-user experience than any previous version of PyMC.
 

docs/database.tex

Lines changed: 58 additions & 47 deletions
@@ -1,6 +1,6 @@
 
 
-By default, PyMC keeps the sampled data in memory and keeps no trace of it on the hard drive. To save this data to disk, PyMC provides different strategies, from simple ASCII files to compressed binary formats. These strategies are implemented different \emph{database backends}, behaving identically from the user's perspective. In the following, the interface to these backends is discussed, and a description of the different backends is given.
+By default, PyMC keeps the sampled data in memory and keeps no trace of it on the hard drive. To save this data to disk, PyMC provides different storing strategies, which we refer to as \emph{database backends}. All those backends provide the same user interface, making it trivial to switch from one backend to another. In the following, this common interface is presented, along with an individual description of each backend.
 
 
 %___________________________________________________________________________
@@ -10,31 +10,16 @@
 \section*{Accessing Sampled Data: User Interface}
 \label{accessing-sampled-data-user-interface}
 
-The choice of database backend is made when a sampler is created using the \titlereference{db} keyword:
+The database backend is selected by the \titlereference{db} keyword:
 \begin{quote}{\ttfamily \raggedright \noindent
-S~=~MCMC(DisasterModel,~db='txt',~dirname='test')
+S~=~MCMC(DisasterModel,~db='ram')
 }\end{quote}
 
-This instructs the sampler to tally samples in txt files stored in a directory named \titlereference{test}. Other choices for the database are given in the table below, the default being \titlereference{ram}. When the \titlereference{sample} method is called, a \titlereference{chain} is created storing the sampled variables. The data in this chain can be accessed for each variable using its trace object
-\begin{quote}{\ttfamily \raggedright \noindent
-S.e.trace()
-}\end{quote}
-
-When \titlereference{S.db.close()} is called, the data is flushed to disk. That is, directories are created for each chain, with samples from each stochastic variable in a separate file. To access this data during a following session, each database provides a \titlereference{load} function instantiating a \titlereference{Database} object
-\begin{quote}{\ttfamily \raggedright \noindent
-DB~=~Database.txt.load('test')
-}\end{quote}
+Here, we instructed the MCMC sampler to keep the trace in the computer's live memory. This means that when the Python session closes, all data will be lost. This is the default backend.
 
-This object can then be linked to a model definition using
+Each time MCMC's \titlereference{sample} method is called, a \titlereference{chain} is created storing the sampled variables. The data in this chain can be accessed for each variable using the call method of its trace attribute:
 \begin{quote}{\ttfamily \raggedright \noindent
-S~=~Sampler(DisasterSampler,~db=DB)
-}\end{quote}
-
-For some databases (\titlereference{hdf5}, \titlereference{pickle}), loading an existing database restores the previous state of the sampler. That is, the attribtues of the Sampler, its Stochastic parameters and StepMethods are all set to the value they had at the time \titlereference{D.db.close()} was called.
-
-The \titlereference{trace} object has the following signature .. {[}{\#}{]}:
-\begin{quote}{\ttfamily \raggedright \noindent
-trace(self,~~burn=0,~thin=1,~chain=-1,~slicing=None)
+S.e.trace(burn=0,~thin=1,~chain=-1,~slicing=None)
 }\end{quote}
 
 with arguments having the following meaning:
@@ -45,7 +30,7 @@ \section*{Accessing Sampled Data: User Interface}
 
 \item[{thin}] \leavevmode (\textbf{int})
 
-Number of samples to step.
+The stride, ie the number of samples to step for each returned value.
 
 \item[{chain}] \leavevmode (\textbf{int or None})
 
@@ -56,7 +41,34 @@ \section*{Accessing Sampled Data: User Interface}
 Slice object used to parse the samples. Overrides burn and thin parameters.
 
 \end{description}
-% [#]: The `trace` attribute of stochastic parameters is in fact an instance of a Trace class, defined for each backend. This class has a method called `gettrace` that returns the trace of the object, and which is called by `trace()` .
+
+
+%___________________________________________________________________________
+
+\hypertarget{loading-data-from-a-previous-session}{}
+\pdfbookmark[0]{Loading data from a previous session}{loading-data-from-a-previous-session}
+\section*{Loading data from a previous session}
+\label{loading-data-from-a-previous-session}
+
+To store a copy of the trace on the hard disk, a number of backends are available: \titlereference{txt}, \titlereference{pickle}, \titlereference{hdf5}, \titlereference{sqlite} and \titlereference{mysql}. These all write the data to disk, in such a way that it can be loaded back in a following session and appended to. So for instance, to save data in ASCII format, we would do:
+\begin{quote}{\ttfamily \raggedright \noindent
+S~=~MCMC(DisasterModel,~db='txt',~dirname='disaster{\_}data')~\\
+S.sample(10000)~\\
+S.db.close()
+}\end{quote}
+
+When \titlereference{S.db.close()} is called, the data is flushed to disk. That is, directories are created for each chain, with samples from each stochastic variable in a separate file. To access this data during a following session, each database provides a \titlereference{load} function instantiating a \titlereference{Database} object
+\begin{quote}{\ttfamily \raggedright \noindent
+DB~=~Database.txt.load('disaster{\_}data')
+}\end{quote}
+
+This \titlereference{Database} object can then be linked to a model definition using
+\begin{quote}{\ttfamily \raggedright \noindent
+S~=~Sampler(DisasterSampler,~db=DB)~\\
+S.sample(10000)
+}\end{quote}
+
+For some databases (\titlereference{hdf5}, \titlereference{pickle}), loading an existing database restores the previous state of the sampler. That is, the attributes of the Sampler, its Stochastic parameters and StepMethods are all set to the value they had at the time \titlereference{S.db.close()} was called.
 
 
 %___________________________________________________________________________
@@ -86,7 +98,7 @@ \subsection*{ram}
 \subsection*{txt}
 \label{txt}
 
-The \titlereference{txt} backend is a modified \titlereference{ram} backend, the only difference being that when the database is closed, the data is written to disk in ascii files. More precisely, the data for each chain is stored in a directory called \titlereference{Chain{\_}{\textless}{\#}{\textgreater}}, the trace for each variable being stored in a file names`{\textless}variable name{\textgreater}.txt`. This backend makes it easy to load the data using another application, but for large datasets, files tend to be embarassingly large and slow to load into memory.
+With the \titlereference{txt} backend, the data is written to disk in ASCII files when the class \titlereference{close()} method is called. More precisely, the data for each chain is stored in a directory called \titlereference{Chain{\_}{\textless}{\#}{\textgreater}}, the trace for each variable being stored in a file names`{\textless}variable name{\textgreater}.txt`. This backend makes it easy to load the data using another application, but for large datasets, files tend to be embarassingly large and slow to load into memory.
 
 
 %___________________________________________________________________________
@@ -96,7 +108,17 @@ \subsection*{txt}
 \subsection*{pickle}
 \label{pickle}
 
-As its name implies, the \titlereference{pickle} database used the \titlereference{Cpickle} module to save the trace objects. Use of this backend is not suggested since the generated files may become unreadable after a Python update.
+As its name implies, the \titlereference{pickle} database relies on the \titlereference{Cpickle} module to save the trace objects. Use of this backend is appropriate for small scale, short-lived projects. For longer term or larger projects, the \titlereference{pickle} backend should be avoided since generated files might be unreadable across different Python versions.
+
+
+%___________________________________________________________________________
+
+\hypertarget{hdf5}{}
+\pdfbookmark[1]{hdf5}{hdf5}
+\subsection*{hdf5}
+\label{hdf5}
+
+The hdf5 backend uses \href{http://www.pytables.org/moin}{pyTables} to save data in binary HDF5 format. The main advantage of this backend is that data is flushed regularly to disk, reducing memory usage and allowing sampling of datasets much larger than the available RAM memory, speeding up data access. For this backend to work, pyTables must be installed, which in turn requires the hdf5 library.
 
 
 %___________________________________________________________________________
@@ -106,7 +128,7 @@ \subsection*{pickle}
 \subsection*{sqlite}
 \label{sqlite}
 
-Chris ...
+The sqlite backend is based on the python module sqlite3. It is not as mature as the other backends, in the sense that is does not support saving/restoring of state and plug and play reloading.
 
 
 %___________________________________________________________________________
@@ -116,17 +138,7 @@ \subsection*{sqlite}
 \subsection*{mysql}
 \label{mysql}
 
-Chris ...
-
-
-%___________________________________________________________________________
-
-\hypertarget{hdf5}{}
-\pdfbookmark[1]{hdf5}{hdf5}
-\subsection*{hdf5}
-\label{hdf5}
-
-The hdf5 backend uses pyTables to save data in binary HDF5 format. The main advantage of this backend is that data is flushed regularly to disk, reducing memory usage and allowing sampling of datasets much larger than the available memory. Data access is also very fast.
+The mysql backend is based on the MySQLd python module. It also is not as mature as the other backends.
 
 \leavevmode
 \begin{longtable}[c]{|p{0.133\locallinewidth}|p{0.447\locallinewidth}|p{0.307\locallinewidth}|}
@@ -144,13 +156,12 @@ \subsection*{hdf5}
 no{\_}trace
 &
 Do not tally samples at all.
-Use only for testing purposes.
 & \\
 \hline
 
 ram
 &
-Store samples in memory.
+Store samples in live memory.
 & \\
 \hline
 
@@ -166,6 +177,14 @@ \subsection*{hdf5}
 & \\
 \hline
 
+hdf5
+&
+Store samples in the HDF5 format.
+&
+pytables ({\textgreater}2.0), libhdf5
+\\
+\hline
+
 sqlite
 &
 Store samples in a sqlite database.
@@ -181,14 +200,6 @@ \subsection*{hdf5}
 MySQLdb
 \\
 \hline
-
-hdf5
-&
-Store samples in the HDF5 format.
-&
-pytables ({\textgreater}2.0), libhdf5
-\\
-\hline
 \end{longtable}
 
 For more information about individual backends, refer to the \href{docs/API.pdf}{API} documentation.
