Binomial python #3

Open
wants to merge 11 commits into
base: binomial_python
from
78 changes: 78 additions & 0 deletions python/binomial_test.qmd
@@ -0,0 +1,78 @@
---
title: "Binomial Test"
format: html
editor: visual
---

The binomial test is a statistical test used to determine whether the proportion of successes in a binary-outcome experiment is equal to a specific value. Because it is an exact test, it remains valid when the sample size is small. It tests the success probability $p$ against a hypothesized value $p_0$.
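
For a two-sided test, the hypotheses are $H_0: p = p_0$ against $H_1: p \neq p_0$. The block below is a minimal sketch of how an exact two-sided p-value can be computed by hand, assuming the common definition that sums the probabilities of all outcomes no more likely than the observed one. The helper name `exact_two_sided_pvalue` and the numbers used (7 successes in 10 trials) are illustrative assumptions, not part of this document's data.

```python
import numpy as np
from scipy.stats import binom


def exact_two_sided_pvalue(k, n, p0):
    """Hypothetical helper: exact two-sided binomial p-value.

    Sums the null probabilities of every outcome that is no more
    likely than the observed count k.
    """
    outcomes = np.arange(n + 1)
    pmf = binom.pmf(outcomes, n, p0)   # null probability of each possible count
    observed = binom.pmf(k, n, p0)     # null probability of the observed count
    # A small relative tolerance guards against floating-point ties
    total = pmf[pmf <= observed * (1 + 1e-7)].sum()
    return float(min(1.0, total))


# Illustrative numbers only: 7 successes in 10 trials, tested against p0 = 0.5
p_value = exact_two_sided_pvalue(7, 10, 0.5)
print(p_value)
```

In practice there is no need to write this by hand, since `scipy.stats.binomtest` performs the test directly.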

## Creating a sample dataset

- We will generate a dataset where we record the outcomes of 1000 coin flips.

- We will use the `binomtest` function from `scipy.stats` to test if the proportion of heads is significantly different from 0.5.

```{python}
import numpy as np
from scipy.stats import binomtest

# Set seed for reproducibility
np.random.seed(19)
coin_flips = np.random.choice(['H', 'T'], size=1000, replace=True, p=[0.5, 0.5])
```

Now, we will count the heads and tails and summarize the data.

```{python}
# Count heads and tails
heads_count = np.sum(coin_flips == 'H')
tails_count = np.sum(coin_flips == 'T')
total_flips = len(coin_flips)

heads_count, tails_count, total_flips
```

## Conducting Binomial Test

```{python}
# Perform the binomial test
binom_test_result = binomtest(heads_count, total_flips, p=0.5)
binom_test_result
```

### Results:

The test returns a p-value of `{python} binom_test_result.pvalue`, which is $> 0.05$ (the chosen level of significance). Hence, we fail to reject the null hypothesis: the data give no evidence that the **coin is unfair**.
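
The object returned by `binomtest` carries more than the p-value. The sketch below shows how the individual pieces might be pulled out, using illustrative counts (520 heads in 1000 flips) that are an assumption and not the counts produced by the seeded simulation above.

```python
from scipy.stats import binomtest

# Illustrative counts only, not the output of the simulation above
result = binomtest(520, 1000, p=0.5)

# Observed proportion of successes, computed from the stored counts
observed_proportion = result.k / result.n

# Confidence interval for the true proportion (exact method by default)
ci = result.proportion_ci(confidence_level=0.95)

print(f"observed proportion: {observed_proportion:.3f}")
print(f"p-value: {result.pvalue:.4f}")
print(f"95% CI: ({ci.low:.4f}, {ci.high:.4f})")
```

Reporting the confidence interval alongside the p-value shows which range of proportions is compatible with the data, rather than only whether 0.5 was rejected.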

# Example of Clinical Trial Data

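We use the `lung` dataset, which ships with the R `survival` package and is read here from a CSV file. We want to test whether the proportion of patients who died differs significantly from a hypothesized proportion. In this copy of the data the status flag is flipped relative to the R version, so the code below counts a death as `status == 0`.

We will calculate the number of deaths and the total number of patients.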

```{python}
import pandas as pd

# Load the lung dataset as an example (using a mock dataset for demonstration purposes)
lung = pd.read_csv("../data/lung_cancer.csv")

# The 'status' flag in the dataset here is flipped compared to the R version
num_deaths = np.sum(lung['status'] == 0)
total_pat = lung.shape[0]

num_deaths, total_pat
```
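
Because the coding of `status` differs between copies of this dataset, it can help to inspect the codes and name the death code explicitly before counting. The following is a minimal sketch on a small hypothetical data frame (`toy`), not the real `lung_cancer.csv`. The constant `DEATH_CODE` and the `died` column are illustrative names, and the value 0 is an assumption taken from the comment in the chunk above.

```python
import pandas as pd

# Hypothetical stand-in rows; the real data is read from ../data/lung_cancer.csv
toy = pd.DataFrame({"status": [0, 1, 0, 0, 1, 1, 0, 1]})

# Inspect which codes occur before deciding which one means "dead"
print(toy["status"].value_counts())

# Assumption from the chunk above: 0 marks a death in this copy of the data
DEATH_CODE = 0
toy["died"] = toy["status"] == DEATH_CODE

num_deaths_toy = int(toy["died"].sum())   # number of rows flagged as deaths
total_pat_toy = len(toy)                  # total number of patients
print(num_deaths_toy, total_pat_toy)
```

Keeping the death code in one named constant makes it easier to switch if a differently coded copy of the data is used.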

## Conduct the Binomial Test

We will conduct the binomial test with the null hypothesis that the proportion of deaths is 19%.

```{python}
# Perform the binomial test
binom_test_clinical = binomtest(num_deaths, total_pat, p=0.19)
binom_test_clinical
```

## Results:

The test returns a p-value of `{python} binom_test_clinical.pvalue`, which is $< 0.05$ (the chosen level of significance). Hence, we reject the null hypothesis and conclude that **the proportion of deaths is significantly different from 19%**.