Peer Review: NYC Public Schools... #6

@alexhubbard89

First off, this is a very interesting topic, and something that is starting to gain more attention. I'm very interested to see your findings.

There are many factors that could influence whether a student is college ready. I’m very impressed with the number of features you were able to collect to investigate the problem. I also really like the visuals of the ‘Percent of Null Values by Feature’ and the ‘Heatmap of Correlated Features’ in your exploratory data analysis. It’s extremely helpful to see those plotted out, as opposed to numbers in a spreadsheet.

My initial concern was the number of features you are using to predict. Personally, I think I would find it difficult to keep track of that many features (kudos to you!). I was also afraid that you would run the risk of overfitting your model. However, after exploring your workbooks and seeing your learning curves, your results look very promising!

I have two main critiques for your project:

  1. In the final-presentation-draft-michelle-cronin.pdf under the Modeling section, you have the minimum benchmark of 87%. I understand how you calculated that metric from diving into your work, but a short sentence explaining it would be helpful for people who will only be viewing your presentation.
  2. Add more notes to your code. When I first went through your project and documentation I had a few questions. After going through everything a second time, I found answers to my questions within your ipython notebooks. For example, in 2_data_cleaning_eda.ipynb you give a nice overview of your data set, and in 3_models.ipynb you drop features and leave short comments in your code. It would be awesome to see your thought process in greater detail. Maybe when you're dropping these features you could add some markdown notes with a couple of sentences to explain your actions to your audience.
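
To illustrate the first point: if the benchmark is a majority-class baseline (that is my assumption here, the slides may define it differently), the explanatory sentence could be backed by something as small as the sketch below. The function name and the toy labels are made up for illustration and are not taken from your data.

```python
from collections import Counter


def majority_class_baseline(labels):
    """Return (most common label, accuracy of always predicting that label)."""
    if not labels:
        raise ValueError("labels must not be empty")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Hypothetical toy labels: 1 = college ready, 0 = not college ready.
toy_labels = [0] * 8 + [1] * 2
label, baseline = majority_class_baseline(toy_labels)
print(f"Always predicting {label} gives {baseline:.0%} accuracy")
```

A model only adds value if it beats this number, so one sentence saying where the benchmark comes from would give viewers of the presentation the context they need.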

Overall, your code looks good, and it's well organized into separate notebooks and blocked off into sections. I think that you are on the right path for next steps, as highlighted in your To Do list in 3_models.ipynb. I agree that SVM is probably having a hard time computing due to the high number of features. I'm interested to see the results of an SVM model with your data set.

Keep up the good work!

@masongallo @lemonsoup
