This project is a from-scratch (no ML libraries) neural network implementation in Java, trained and evaluated on the classic MNIST handwritten digits dataset.
Main.java contains the neural network implementation:
- `ActivationFunction`: sigmoid, sigmoid derivative, ReLU/Leaky ReLU, and softmax
- `LossFunction`: cross-entropy loss (for softmax classification)
- `Neuron` and `Layer`: store weights/biases for each layer
- `NeuralNetwork`: forward pass, backpropagation, training, prediction, and testing
- `public class Main`: CLI menu (`1` = Train, `2` = Test)
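As a rough illustration of the activations listed above, here is a minimal, self-contained sketch; the actual `ActivationFunction` in Main.java may name and structure things differently, and the leaky slope here is an assumed value:

```java
// Illustrative sketch only -- not the project's actual ActivationFunction class.
public class ActivationSketch {
    static final double ALPHA = 0.01; // leaky slope (assumed value)

    // Leaky ReLU: identity for positive inputs, small slope for negative ones
    static double leakyRelu(double x) {
        return x > 0 ? x : ALPHA * x;
    }

    // Derivative used during backprop
    static double leakyReluDerivative(double x) {
        return x > 0 ? 1.0 : ALPHA;
    }

    // Numerically stable softmax over a layer's raw outputs (logits):
    // subtracting the max before exponentiating avoids overflow
    static double[] softmax(double[] z) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : z) max = Math.max(max, v);
        double sum = 0.0;
        double[] out = new double[z.length];
        for (int i = 0; i < z.length; i++) {
            out[i] = Math.exp(z[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }
}
```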
PreProcess.java contains the dataset and math utilities:
- Reads MNIST `.idx` files for images/labels (`train-*` and `t10k-*`)
- Normalizes pixel values
- Converts labels to one-hot vectors
- Provides matrix helpers used by the network (`dot`, `transpose`, `sumSecondAxis`, etc.)
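The IDX parsing and one-hot conversion above can be sketched as follows. This is a hypothetical outline, not PreProcess.java's actual API: IDX image files begin with a big-endian header (magic number 2051, image count, rows, cols) followed by one unsigned byte per pixel, which `DataInputStream` reads natively:

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Illustrative sketch only -- method names are not the project's actual API.
public class IdxSketch {
    // Read an IDX image file into one row per image, pixels scaled to [0, 1]
    static double[][] readImages(String path) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            int magic = in.readInt(); // expected 2051 (0x00000803) for image files
            int count = in.readInt();
            int rows = in.readInt();
            int cols = in.readInt();
            double[][] images = new double[count][rows * cols];
            for (int i = 0; i < count; i++) {
                for (int p = 0; p < rows * cols; p++) {
                    // Global scaling: every pixel divided by 255.0
                    images[i][p] = in.readUnsignedByte() / 255.0;
                }
            }
            return images;
        }
    }

    // Convert a digit label (0-9) to a one-hot vector of length 10
    static double[] oneHot(int label) {
        double[] v = new double[10];
        v[label] = 1.0;
        return v;
    }
}
```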
The dataset files live in mnist-dataset/ in this repo:
- `train-images.idx3-ubyte`, `train-labels.idx1-ubyte`
- `t10k-images.idx3-ubyte`, `t10k-labels.idx1-ubyte`
This code is compiled to Java 8 bytecode for compatibility with older Java runtimes:
- Ensure you have a JDK (not just a JRE) installed
- Recommended: Java 8 (or newer tooling compiled with `--release 8`)
Download the MNIST dataset (the same `*.idx` files referenced by the code) from Kaggle.
Place the dataset files into the mnist-dataset/ folder so the paths match what the code loads:
- `mnist-dataset/train-images.idx3-ubyte`
- `mnist-dataset/train-labels.idx1-ubyte`
- `mnist-dataset/t10k-images.idx3-ubyte`
- `mnist-dataset/t10k-labels.idx1-ubyte`
From the repo root (javaMNIST/):
```sh
rm -rf out
mkdir -p out

# Compile to Java 8 bytecode (prevents UnsupportedClassVersionError)
javac --release 8 -d out Main.java PreProcess.java

# Run
java -cp out Final.Main
```

The program will prompt you:
- `1` to train (then it will run a test pass on the trained weights)
- `2` to test (runs inference with the current in-memory weights; if you didn't train in the same run, this will be near-random accuracy)
Key improvements made to get the network working well on MNIST:
- Correct MNIST input normalization: switched to global scaling (`pixel / 255.0`) instead of per-image min/max scaling
- Better hidden-layer nonlinearity: switched hidden activation from sigmoid to Leaky ReLU and updated backprop accordingly
- Better training procedure: replaced full-batch gradient descent with mini-batch training plus momentum
- Increased model capacity: changed network shape from `784 -> 10 -> 10` to `784 -> 64 -> 10`
- Added/finished test-time evaluation on `t10k-*` so we can report accuracy properly
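The mini-batch-plus-momentum update mentioned above can be sketched as a classical momentum step; the variable names and hyperparameter values here are assumptions, not the code's actual settings:

```java
// Illustrative sketch of a momentum update applied once per mini-batch.
public class MomentumSketch {
    // velocity = mu * velocity - lr * gradient;  weight += velocity
    // (gradient is assumed to be averaged over the mini-batch)
    static void momentumStep(double[] weights, double[] velocity,
                             double[] gradient, double lr, double mu) {
        for (int i = 0; i < weights.length; i++) {
            velocity[i] = mu * velocity[i] - lr * gradient[i];
            weights[i] += velocity[i];
        }
    }
}
```

Compared with full-batch descent, each epoch performs many such small steps, and the velocity term smooths the noisy per-batch gradients.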
Example results observed with the current default settings (train on 4000 samples, test on 2000 samples, epochs = 401):
- Training accuracy reaches ~100% on the 4000-sample training subset
- Test accuracy: about 90.65%
Before the accuracy improvements listed above, training on 4000 images typically plateaued in the low-to-mid 70% range (for example, ~73.7% at epoch 400), and t10k evaluation was not yet implemented, so it was harder to tell how well the model generalized.