Skip to content

Pyglm new #358

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from
Draft

Pyglm new #358

wants to merge 6 commits into from

Conversation

abstractqqq
Copy link
Owner

No description provided.

@abstractqqq
Copy link
Owner Author

@anath2 Let's continue the chat in the new thread.

I am testing logistic regression here by using this following example:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
y[y != 0] = 1 # turn y into a binary classification problem

from polars_ds.linear_models import GLM

glm = GLM(family = "logistic", add_bias=True, max_iter = 100, tol = 1e-4)
glm.fit(X, y)
print(glm.coeffs())


import statsmodels.api as sm

X_ = sm.add_constant(X)
# Instantiate a gamma family model with the default link function.
gamma_model = sm.GLM(
    y, 
    X_, 
    family=sm.families.Binomial()
)
gamma_results = gamma_model.fit()
print(gamma_results.summary())

I understand that glm doesn't have unique solutions, which means that coeffs could be different. However, the fact that our implementation of IRLS doesn't converge, and statsmodel converges (at iteration 24) is a little strange.

@anath2
Copy link
Contributor

anath2 commented May 20, 2025

Yes strange indeed. I tried it out on pure rust implementation of IRLS and it looks like it converges after 12 iterations.

My Testing code:

use std::fs::{self, File};
use std::path::Path;
use std::io::Write;
use serde::Deserialize;
use std::io::{Seek, SeekFrom};
use faer::Mat

/// Helpers
    
#[derive(Debug, Deserialize)]
struct IrisRecord {
    #[serde(rename = "0")]
    sepal_length: f64,
    #[serde(rename = "1")]
    sepal_width: f64,
    #[serde(rename = "2")]
    petal_length: f64,
    #[serde(rename = "3")]
    petal_width: f64,
    #[serde(rename = "4")]
    species: String,
}

fn download_iris_dataset() -> std::io::Result<String> {
    let data_dir = Path::new("data");
    let iris_path = data_dir.join("iris.csv");
  
    if !data_dir.exists() {
       fs::create_dir(data_dir)?;
    }

    if !iris_path.exists() {
       let url = "http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data";
       let response = reqwest::blocking::get(url)
          .unwrap()
          .text()
          .unwrap();
      
       let mut file = File::create(&iris_path)?;
       file.write_all(response.as_bytes())?;
    }
  
   Ok(iris_path.to_str().unwrap().to_string())
}

fn make_binomial_regression_dataset() -> (Mat<f64>, Mat<f64>) {
    let iris_path = download_iris_dataset().unwrap();
    let mut file = File::open(&iris_path).unwrap();

    let nrows = {
        let mut rdr = csv::ReaderBuilder::new()
            .has_headers(false)
            .from_reader(&mut file);
        let count = rdr.records().count();
        count
    };

    file.seek(SeekFrom::Start(0)).unwrap();  // rewind and populate
    
    let mut X: Mat<f64> = Mat::zeros(nrows, 4);
    let mut y: Mat<f64> = Mat::zeros(nrows, 1);

    let mut rdr = csv::ReaderBuilder::new()
        .has_headers(false)
        .from_reader(file);

    for (i, result) in rdr.deserialize().enumerate() {
        let record: IrisRecord = result.unwrap();
        X[(i, 0)] = record.sepal_length;
        X[(i, 1)] = record.sepal_width;
        X[(i, 2)] = record.petal_length;
        X[(i, 3)] = record.petal_width;

        if record.species == "Iris-setosa" {   // iris-sentosa == 0 otherwise 1
            y[(i, 0)] = 0.0;
        } else {
            y[(i, 0)] = 1.0;
        }
    }
   
    (X, y)
}

fn test_binomial_regression() {
    let (X, y) = make_binomial_regression_dataset();

    let mut model = Binomial::new(add_bias=true);
    model.fit_unchecked(X.as_ref(), y.as_ref());
    let preds = model.fitted_values();

    // Inverse link - probabilities and accuracy
   // ...
}


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants