|
| 1 | +--- |
| 2 | +Title: 'Descriptive Stats' |
| 3 | +Description: 'Summarizes and describes the essential features of a dataset.' |
| 4 | +Subjects: |
| 5 | + - 'Computer Science' |
| 6 | + - 'Data Science' |
| 7 | +Tags: |
| 8 | + - 'Data' |
| 9 | + - 'Functions' |
| 10 | + - 'Math' |
| 11 | + - 'Python' |
| 12 | +CatalogContent: |
| 13 | + - 'learn-python-3' |
| 14 | + - 'paths/computer-science' |
| 15 | +--- |
| 16 | + |
| 17 | +**Descriptive statistics** summarize the key characteristics of a dataset, such as its central tendency, variability, and distribution shape. In SciPy, they are computed with functions from the `scipy.stats` module. |
| 18 | + |
| 19 | +The **`.describe()`** function in the `scipy.stats` module calculates the following descriptive statistics of a given array in a single call: |
| 20 | + |
| 21 | +- Number of observations (`nobs`) |
| 22 | +- Minimum and maximum values (`minmax`) |
| 23 | +- Mean (`mean`) |
| 24 | +- Variance (`variance`) |
| 25 | +- Skewness (`skewness`) |
| 26 | +- Kurtosis (`kurtosis`), using Fisher's definition, in which a normal distribution has a kurtosis of `0` |
| 27 | + |
| 28 | +## Syntax |
| 29 | + |
| 30 | +```pseudo |
| 31 | +stats.describe(a, axis=0, ddof=1, bias=True, nan_policy='propagate') |
| 32 | +``` |
| 33 | + |
| 34 | +- `a`: The input data to describe. |
| 35 | +- `axis` (Optional): The axis along which to compute the descriptive statistics (default is `0`). If set to `None`, the statistics are calculated for the whole array. |
| 36 | +- `ddof` (Optional): Delta Degrees of Freedom used when calculating the variance. The divisor is `N - ddof`, where `N` is the number of observations (default is `1`). |
| 37 | +- `bias` (Optional): If set to `False`, the skewness and kurtosis calculations are corrected for statistical bias (default is `True`). |
| 38 | +- `nan_policy` (Optional): Defines the handling method to use when the input contains NaN. The options include: |
| 39 | + - `propagate` (Default): Returns NaN. |
| 40 | + - `raise`: Raises an error. |
| 41 | + - `omit`: Ignores NaN values and performs the calculations. |
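The optional parameters described above can be sketched as follows. This is a minimal illustration that assumes a small, made-up two-column array (`data`) and a made-up array containing a NaN (`with_nan`); neither comes from the original entry.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data: 3 observations (rows) of 2 variables (columns)
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# Default axis=0: one set of statistics per column
by_column = stats.describe(data)
print(by_column.mean)

# axis=None: the array is flattened and all six values are described together
overall = stats.describe(data, axis=None)
print(overall.nobs)

# ddof=0: the variance is divided by N instead of N - 1
population = stats.describe(data, ddof=0)
print(population.variance)

# nan_policy='omit': NaN values are skipped instead of propagated
with_nan = np.array([1.0, 2.0, np.nan, 4.0])
cleaned = stats.describe(with_nan, nan_policy='omit')
print(cleaned.nobs, cleaned.mean)
```

With the default `nan_policy='propagate'`, the last call would instead report NaN for the statistics affected by the missing value.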
| 42 | + |
| 43 | +## Example |
| 44 | + |
| 45 | +The following example demonstrates the usage of the `.describe()` function to calculate the descriptive statistics of a given array: |
| 46 | + |
| 47 | +```py |
| 48 | +import numpy as np |
| 49 | +from scipy import stats |
| 50 | + |
| 51 | +# Define an array |
| 52 | +arr = np.array([12, 23, 34, 45, 56]) |
| 53 | + |
| 54 | +# Calculate the descriptive statistics of the array |
| 55 | +res = stats.describe(arr) |
| 56 | + |
| 57 | +# Print the result |
| 58 | +print(res) |
| 59 | +``` |
| 60 | + |
| 61 | +The above code produces the following output: |
| 62 | + |
| 63 | +```shell |
| 64 | +DescribeResult(nobs=5, minmax=(12, 56), mean=34.0, variance=302.5, skewness=0.0, kurtosis=-1.3) |
| 65 | +``` |
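The returned `DescribeResult` behaves like a named tuple, so individual statistics can be read by field name or unpacked in order. The sketch below reuses the array from the example above; the derived values (`value_range`, `std`) are illustrative additions and not part of what `.describe()` returns.

```python
import numpy as np
from scipy import stats

arr = np.array([12, 23, 34, 45, 56])
res = stats.describe(arr)

# Access individual statistics by field name
print(res.nobs)
print(res.mean)

# minmax is a (minimum, maximum) pair, so the range is easy to derive
low, high = res.minmax
value_range = high - low
print(value_range)

# The standard deviation is the square root of the reported variance
std = np.sqrt(res.variance)
print(std)

# The result can also be unpacked in field order
nobs, minmax, mean, variance, skewness, kurtosis = res
```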
| 66 | + |
| 67 | +## Codebyte Example |
| 68 | + |
| 69 | +The following codebyte example demonstrates the usage of the `.describe()` function to calculate the descriptive statistics of a given array: |
| 70 | + |
| 71 | +```codebyte/python |
| 72 | +import numpy as np |
| 73 | +from scipy import stats |
| 74 | + |
| 75 | +# Define an array |
| 76 | +arr = np.array([5, 10, 20, 40, 80]) |
| 77 | + |
| 78 | +# Calculate the descriptive statistics of the array |
| 79 | +res = stats.describe(arr) |
| 80 | + |
| 81 | +# Print the result |
| 82 | +print(res) |
| 83 | +``` |