Changed :math:'blabla' to :math: (valid ReST) in distributions.py. Minor changes to the docs. Replaced the distribution definitions with the epydoc-generated docs.
README.txt (3 additions, 6 deletions)
@@ -47,7 +47,7 @@ Features
 What's new in 2.0
 =================

-* New, more flexible object model and syntax.
+* New, flexible object model and syntax (not backward compatible).

 * Reduced redundant computations: only relevant log-probability terms are
   computed, and these are cached.
@@ -89,14 +89,13 @@ From a python shell, type::
 S.sample(iter=10000, burn=5000, thin=2)

 where problem_definition is a module or a dictionary containing Node, Data and
-Parameter instances defining your problem. Read the `user guide`_ for a
-complete description of the package, classes and some examples to get started.
+Parameter instances defining your problem.

 History
 =======

-PyMC began development in 2003, as an effort to generalize the process of building Metropolis-Hastimgs samplers, with an aim to making Markov chain Monte Carlo more accessible to non-statisticians (particularly ecologists). The choice to develop PyMC as a python module, rather than a standalone application, allowed the use MCMC methods in a larger modeling framework, in contrast to the BUGS environment. By 2005, PyMC was reliable enough for version 1.0 to be released to the public. A small group of regular users, most associated with the University of Georgia, provided much of the feedback necessary for the refinement of PyMC to its current state.
+PyMC began development in 2003, as an effort to generalize the process of building Metropolis-Hastings samplers, with the aim of making Markov chain Monte Carlo more accessible to non-statisticians (particularly ecologists). The choice to develop PyMC as a Python module, rather than a standalone application, allowed the use of MCMC methods in a larger modeling framework, in contrast to the BUGS environment. By 2005, PyMC was reliable enough for version 1.0 to be released to the public. A small group of regular users, most associated with the University of Georgia, provided much of the feedback necessary for the refinement of PyMC to its current state.

 In 2006, David Huard and Anand Patil joined Chris Fonnesbeck on the development team for PyMC 2.0. This iteration of the software strives for more flexibility, better performance and a better end-user experience than any previous version of PyMC.
docs/INSTALL.tex (6 additions, 2 deletions)
@@ -36,6 +36,10 @@ \section*{Dependencies}
 \href{http://ipython.scipy.org/}{IPython} (optional): A convenient python shell coming with parallel
 computing facilities.

+\item {}
+\href{http://www.pytables.org/moin}{pyTables} (optional): An interface to the HDF5 library for storing datasets
+in binary format.
+
 \end{itemize}

 There are prebuilt distributions that include all the needed dependencies. For
@@ -215,7 +219,7 @@ \section*{Running the test suite}
 You should see a lot of tests being run, and messages appear if errors are
 raised or if some tests fail. In case this happens (it shouldn't), please report
-the problems on the issue tracker, specifying the version you are using and the
+the problems on the \href{http://code.google.com/p/pymc/issues/list}{issue tracker}, specifying the version you are using and the
 environment. Some of the tests require SciPy, if it is not installed on your
 system, you should not worry too much about failing tests.

@@ -228,6 +232,6 @@ \section*{Bugs and feature requests}
 \label{bugs-and-feature-requests}

 Report problems with the installation, bugs in the code or feature requests at
-the issue tracker at \href{http://code.google.com/p/pymc/issues/list}{http://code.google.com/p/pymc/issues/list} .
+the \href{http://code.google.com/p/pymc/issues/list}{issue tracker}.
docs/database.tex (58 additions, 47 deletions)
@@ -1,6 +1,6 @@
 
-By default, PyMC keeps the sampled data in memory and keeps no trace of it on the hard drive. To save this data to disk, PyMC provides different strategies, from simple ASCII files to compressed binary formats. These strategies are implemented different \emph{database backends}, behaving identically from the user's perspective. In the following, the interface to these backends is discussed, and a description of the different backends is given.
+By default, PyMC keeps the sampled data in memory and keeps no trace of it on the hard drive. To save this data to disk, PyMC provides different storage strategies, which we refer to as \emph{database backends}. All these backends provide the same user interface, making it trivial to switch from one backend to another. In the following, this common interface is presented, along with an individual description of each backend.
-The choice of database backend is made when a sampler is created using the \titlereference{db} keyword:
+The database backend is selected by the \titlereference{db} keyword:
 \begin{quote}{\ttfamily\raggedright\noindent
-S~=~MCMC(DisasterModel,~db='txt',~dirname='test')
+S~=~MCMC(DisasterModel,~db='ram')
 }\end{quote}
 
-This instructs the sampler to tally samples in txt files stored in a directory named \titlereference{test}. Other choices for the database are given in the table below, the default being \titlereference{ram}. When the \titlereference{sample} method is called, a \titlereference{chain} is created storing the sampled variables. The data in this chain can be accessed for each variable using its trace object
-\begin{quote}{\ttfamily\raggedright\noindent
-S.e.trace()
-}\end{quote}
-
-When \titlereference{S.db.close()} is called, the data is flushed to disk. That is, directories are created for each chain, with samples from each stochastic variable in a separate file. To access this data during a following session, each database provides a \titlereference{load} function instantiating a \titlereference{Database} object
-\begin{quote}{\ttfamily\raggedright\noindent
-DB~=~Database.txt.load('test')
-}\end{quote}
+Here, we instructed the MCMC sampler to keep the trace in the computer's live memory. This means that when the Python session closes, all data will be lost. This is the default backend.
 
-This object can then be linked to a model definition using
+Each time MCMC's \titlereference{sample} method is called, a \titlereference{chain} is created storing the sampled variables. The data in this chain can be accessed for each variable using the call method of its trace attribute:
 \begin{quote}{\ttfamily\raggedright\noindent
-S~=~Sampler(DisasterSampler,~db=DB)
-}\end{quote}
-
-For some databases (\titlereference{hdf5}, \titlereference{pickle}), loading an existing database restores the previous state of the sampler. That is, the attribtues of the Sampler, its Stochastic parameters and StepMethods are all set to the value they had at the time \titlereference{D.db.close()} was called.
-
-The \titlereference{trace} object has the following signature .. {[}{\#}{]}:
@@ -45,7 +30,7 @@ \section*{Accessing Sampled Data: User Interface}
 \item[{thin}] \leavevmode (\textbf{int})

-Number of samples to step.
+The stride, i.e. the number of samples to step for each returned value.

 \item[{chain}] \leavevmode (\textbf{int or None})
@@ -56,7 +41,34 @@ \section*{Accessing Sampled Data: User Interface}
 Slice object used to parse the samples. Overrides burn and thin parameters.

 \end{description}

-% [#]: The `trace` attribute of stochastic parameters is in fact an instance of a Trace class, defined for each backend. This class has a method called `gettrace` that returns the trace of the object, and which is called by `trace()`.
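The burn/thin/slice semantics described for `trace()` above can be sketched with plain Python list slicing. This is a standalone illustration only, not PyMC's actual Trace class; the function name `get_trace` and the sample values are hypothetical:

```python
# Sketch of trace(burn, thin, slicing) semantics: skip `burn`
# initial samples, then take every `thin`-th value; an explicit
# slice object overrides both (illustration only, not PyMC code).

def get_trace(samples, burn=0, thin=1, slicing=None):
    """Return samples[burn::thin], unless an explicit slice is given."""
    if slicing is not None:
        return samples[slicing]
    return samples[burn::thin]

chain = list(range(10))                          # pretend 10 sampled values
print(get_trace(chain, burn=4, thin=2))          # -> [4, 6, 8]
print(get_trace(chain, slicing=slice(8, None)))  # -> [8, 9]
```

The slice override mirrors the documented behaviour that the `slicing` parameter "overrides burn and thin parameters".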
+\pdfbookmark[0]{Loading data from a previous session}{loading-data-from-a-previous-session}
+\section*{Loading data from a previous session}
+\label{loading-data-from-a-previous-session}
+
+To store a copy of the trace on the hard disk, a number of backends are available: \titlereference{txt}, \titlereference{pickle}, \titlereference{hdf5}, \titlereference{sqlite} and \titlereference{mysql}. These all write the data to disk, in such a way that it can be loaded back in a following session and appended to. So for instance, to save data in ASCII format, we would do:
+
+When \titlereference{S.db.close()} is called, the data is flushed to disk. That is, directories are created for each chain, with samples from each stochastic variable in a separate file. To access this data during a following session, each database provides a \titlereference{load} function instantiating a \titlereference{Database} object
+\begin{quote}{\ttfamily\raggedright\noindent
+DB~=~Database.txt.load('disaster{\_}data')
+}\end{quote}
+
+This \titlereference{Database} object can then be linked to a model definition using
+\begin{quote}{\ttfamily\raggedright\noindent
+S~=~Sampler(DisasterSampler,~db=DB)~\\
+S.sample(10000)
+}\end{quote}
+
+For some databases (\titlereference{hdf5}, \titlereference{pickle}), loading an existing database restores the previous state of the sampler. That is, the attributes of the Sampler, its Stochastic parameters and StepMethods are all set to the value they had at the time \titlereference{S.db.close()} was called.
-The \titlereference{txt} backend is a modified \titlereference{ram} backend, the only difference being that when the database is closed, the data is written to disk in ascii files. More precisely, the data for each chain is stored in a directory called \titlereference{Chain{\_}{\textless}{\#}{\textgreater}}, the trace for each variable being stored in a file names`{\textless}variable name{\textgreater}.txt`. This backend makes it easy to load the data using another application, but for large datasets, files tend to be embarassingly large and slow to load into memory.
+With the \titlereference{txt} backend, the data is written to disk in ASCII files when the class \titlereference{close()} method is called. More precisely, the data for each chain is stored in a directory called \titlereference{Chain{\_}{\textless}{\#}{\textgreater}}, the trace for each variable being stored in a file named \titlereference{{\textless}variable name{\textgreater}.txt}. This backend makes it easy to load the data using another application, but for large datasets, files tend to be embarrassingly large and slow to load into memory.
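The `Chain_<#>` directory layout the new text describes can be mimicked with the standard library alone. This is a sketch of the on-disk format only, not the txt backend's implementation; the variable names and values are hypothetical:

```python
# Sketch of the txt backend's layout: one directory per chain,
# one ASCII file per variable (illustration only, not PyMC code).
import os
import tempfile

traces = {"e": [1.0, 1.5, 1.2], "l": [0.9, 1.1, 1.0]}  # fake sampled values

base = tempfile.mkdtemp()
chain_dir = os.path.join(base, "Chain_0")      # directory for chain #0
os.mkdir(chain_dir)
for name, values in traces.items():
    # each variable's trace goes to <variable name>.txt
    with open(os.path.join(chain_dir, name + ".txt"), "w") as f:
        f.write("\n".join(str(v) for v in values))

# Loading back is just re-reading each file, which is why other
# applications can consume the data easily:
with open(os.path.join(chain_dir, "e.txt")) as f:
    e_trace = [float(line) for line in f]
print(e_trace)  # -> [1.0, 1.5, 1.2]
```

One file per variable keeps the format trivially portable, at the cost of the large-file problem the paragraph mentions.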
-As its name implies, the \titlereference{pickle} database used the \titlereference{Cpickle} module to save the trace objects. Use of this backend is not suggested since the generated files may become unreadable after a Python update.
+As its name implies, the \titlereference{pickle} database relies on the \titlereference{cPickle} module to save the trace objects. Use of this backend is appropriate for small scale, short-lived projects. For longer term or larger projects, the \titlereference{pickle} backend should be avoided since generated files might be unreadable across different Python versions.
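The pickle strategy amounts to serializing the whole trace container in one binary file. The sketch below uses the stdlib `pickle` module (cPickle's modern equivalent) on a hypothetical traces dictionary; it is not PyMC's actual Database class:

```python
# Sketch of the pickle strategy: dump the entire trace container
# to one binary file, then load it back in a later session
# (stdlib pickle shown; illustration only, not PyMC code).
import os
import pickle
import tempfile

traces = {"e": [1.0, 1.5, 1.2]}                 # fake sampled values

path = os.path.join(tempfile.mkdtemp(), "traces.pickle")
with open(path, "wb") as f:
    pickle.dump(traces, f)                      # one dump() writes everything

with open(path, "rb") as f:
    restored = pickle.load(f)                   # later session: load it back
print(restored["e"])  # -> [1.0, 1.5, 1.2]
```

Because the file format is tied to pickle protocols and the pickled classes, it is convenient for short-lived work but, as the text warns, fragile across Python versions.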
-The hdf5 backend uses pyTables to save data in binary HDF5 format. The main advantage of this backend is that data is flushed regularly to disk, reducing memory usage and allowing sampling of datasets much larger than the available memory. Data access is also very fast.
+The hdf5 backend uses \href{http://www.pytables.org/moin}{pyTables} to save data in binary HDF5 format. The main advantage of this backend is that data is flushed regularly to disk, reducing memory usage, allowing sampling of datasets much larger than the available RAM, and speeding up data access. For this backend to work, pyTables must be installed, which in turn requires the hdf5 library.
+
+The sqlite backend is based on the python module sqlite3. It is not as mature as the other backends, in the sense that it does not support saving/restoring of state and plug-and-play reloading.
+
+The mysql backend is based on the MySQLdb python module. It also is not as mature as the other backends.