Skip to content

Commit dfe5a99

Browse files
Added example from README (#98)
1 parent 5394ca0 commit dfe5a99

File tree

1 file changed

+121
-0
lines changed

1 file changed

+121
-0
lines changed

welcome.md

+121
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,127 @@ Here is what sets it apart:
1919
* **Batteries included**: Includes probability distributions, Gaussian processes, ABC, SMC and much more. It integrates nicely with {doc}`ArviZ <arviz:index>` for visualizations and diagnostics, as well as {doc}`Bambi <bambi:index>` for high-level mixed-effect models.
2020
* **Community focused**: Ask questions on [discourse](https://discourse.pymc.io), join [MeetUp events](https://meetup.com/pymc-online-meetup/), follow us on [Twitter](https://twitter.com/pymc_devs), and start [contributing](https://www.pymc.io/projects/docs/en/latest/contributing/index.html).
2121

22+
## Example from Linear Regression
23+
24+
This example demonstrates how to perform Bayesian inference for a linear regression model to predict plant growth based on environmental factors.
25+
26+
Plant growth can be influenced by multiple factors, and understanding these relationships is crucial for optimizing agricultural practices.
27+
28+
```python
29+
import pymc as pm
30+
31+
# Taking draws from a normal distribution
32+
seed = 42
33+
x_dist = pm.Normal.dist(shape=(100, 3))
34+
x_data = pm.draw(x_dist, random_seed=seed)
35+
36+
# Independent Variables:
37+
# Sunlight Hours: Number of hours the plant is exposed to sunlight daily.
38+
# Water Amount: Daily water amount given to the plant (in milliliters).
39+
# Soil Nitrogen Content: Percentage of nitrogen content in the soil.
40+
41+
42+
# Dependent Variable:
43+
# Plant Growth (y): Measured as the increase in plant height (in centimeters) over a certain period.
44+
45+
46+
# Define coordinate values for all dimensions of the data
47+
coords={
48+
"trial": range(100),
49+
"features": ["sunlight hours", "water amount", "soil nitrogen"],
50+
}
51+
52+
# Define generative model
53+
with pm.Model(coords=coords) as generative_model:
54+
x = pm.Data("x", x_data, dims=["trial", "features"])
55+
56+
# Model parameters
57+
betas = pm.Normal("betas", dims="features")
58+
sigma = pm.HalfNormal("sigma")
59+
60+
# Linear model
61+
mu = x @ betas
62+
63+
# Likelihood
64+
# Assuming we measure deviation of each plant from baseline
65+
plant_growth = pm.Normal("plant growth", mu, sigma, dims="trial")
66+
67+
68+
# Generating data from model by fixing parameters
69+
fixed_parameters = {
70+
"betas": [5, 20, 2],
71+
"sigma": 0.5,
72+
}
73+
with pm.do(generative_model, fixed_parameters) as synthetic_model:
74+
idata = pm.sample_prior_predictive(random_seed=seed) # Sample from prior predictive distribution.
75+
synthetic_y = idata.prior["plant growth"].sel(draw=0, chain=0)
76+
77+
78+
# Infer parameters conditioned on observed data
79+
with pm.observe(generative_model, {"plant growth": synthetic_y}) as inference_model:
80+
idata = pm.sample(random_seed=seed)
81+
82+
summary = pm.stats.summary(idata, var_names=["betas", "sigma"])
83+
print(summary)
84+
```
85+
From the summary, we can see that the mean of the inferred parameters are very close to the fixed parameters
86+
87+
| Params | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
88+
|-------------------------|-------|------|--------|---------|-----------|---------|----------|----------|-------|
89+
| betas[sunlight hours] | 4.972 | 0.054 | 4.866 | 5.066 | 0.001 | 0.001 | 3003 | 1257 | 1 |
90+
| betas[water amount] | 19.963 | 0.051 | 19.872 | 20.062 | 0.001 | 0.001 | 3112 | 1658 | 1 |
91+
| betas[soil nitrogen] | 1.994 | 0.055 | 1.899 | 2.107 | 0.001 | 0.001 | 3221 | 1559 | 1 |
92+
| sigma | 0.511 | 0.037 | 0.438 | 0.575 | 0.001 | 0 | 2945 | 1522 | 1 |
93+
94+
```python
95+
# Simulate new data conditioned on inferred parameters
96+
new_x_data = pm.draw(
97+
pm.Normal.dist(shape=(3, 3)),
98+
random_seed=seed,
99+
)
100+
new_coords = coords | {"trial": [0, 1, 2]}
101+
102+
with inference_model:
103+
pm.set_data({"x": new_x_data}, coords=new_coords)
104+
pm.sample_posterior_predictive(
105+
idata,
106+
predictions=True,
107+
extend_inferencedata=True,
108+
random_seed=seed,
109+
)
110+
111+
pm.stats.summary(idata.predictions, kind="stats")
112+
```
113+
The new data conditioned on inferred parameters would look like:
114+
115+
| Output | mean | sd | hdi_3% | hdi_97% |
116+
|-------------------|-------|------|--------|---------|
117+
| plant growth[0] | 14.229 | 0.515 | 13.325 | 15.272 |
118+
| plant growth[1] | 24.418 | 0.511 | 23.428 | 25.326 |
119+
| plant growth[2] | -6.747 | 0.511 | -7.740 | -5.797 |
120+
121+
```python
122+
# Simulate new data, under a scenario where the first beta is zero
123+
with pm.do(
124+
inference_model,
125+
{inference_model["betas"]: inference_model["betas"] * [0, 1, 1]},
126+
) as plant_growth_model:
127+
new_predictions = pm.sample_posterior_predictive(
128+
idata,
129+
predictions=True,
130+
random_seed=seed,
131+
)
132+
133+
pm.stats.summary(new_predictions, kind="stats")
134+
```
135+
The new data, under the above scenario would look like:
136+
137+
| Output | mean | sd | hdi_3% | hdi_97% |
138+
|-------------------|-------|------|--------|---------|
139+
| plant growth[0] | 12.149 | 0.515 | 11.193 | 13.135 |
140+
| plant growth[1] | 29.809 | 0.508 | 28.832 | 30.717 |
141+
| plant growth[2] | -0.131 | 0.507 | -1.121 | 0.791 |
142+
22143
## Get started
23144
* [Installation instructions](https://www.pymc.io/projects/docs/en/latest/installation.html)
24145
* [Beginner guide (if you **do not** know Bayesian modeling)](https://www.pymc.io/projects/docs/en/latest/learn/core_notebooks/pymc_overview.html)

0 commit comments

Comments
 (0)