Please excuse our dust!

This code is very much in beta and needs a thorough deep clean.

Writing the talk (and, let's be honest, eating Italian food) took precedence, but expect cleaner code by mid-October.

Order

config.py -- change the paths to match your system before running anything else
ocr_and_image_processing_batch.py -- runs the OCR and PDF mining to get raw data from pages
pull_check_makesense.ipynb -- uses a current model to "guess" boxes, and prepares them to check with MakeSense.ai
process_annotations_and_generate_features_batch.py -- processes annotations after they are downloaded (as a csv file) from MakeSense.ai, ~~generate features as well if you want~~
generate_features_only.py -- generates features in batches for certain feature sets, saves them in tfrecords format
mega_yolo_train_tfrecords.ipynb -- to be run in the cloud (set up for Google Colab), trains the model. Make sure to download the weights if you are not doing all the work on Colab.
post_processing_tfrecords.py -- post-processes results for the test dataset (in tfrecords format) using the saved weights
explore_calculate_metrics.ipynb -- takes in the post-processed results, calculates various metrics, makes nice plots for the talk
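
The run order above can be written down as a small checklist helper. This is only a sketch, not part of the repo: the `PIPELINE` list and `describe_steps` function are hypothetical names, and it assumes all the files sit in the repository root and that `config.py` has already been edited. It does not run anything; it only reports each stage, whether it is a script or a notebook, and whether the file is present.

```python
from pathlib import Path

# Stages in the order listed above (config.py is edited by hand first).
# PIPELINE and describe_steps are illustrative names, not part of the repo.
PIPELINE = [
    "ocr_and_image_processing_batch.py",
    "pull_check_makesense.ipynb",
    "process_annotations_and_generate_features_batch.py",
    "generate_features_only.py",
    "mega_yolo_train_tfrecords.ipynb",
    "post_processing_tfrecords.py",
    "explore_calculate_metrics.ipynb",
]


def describe_steps(repo_root="."):
    """Return one (step number, file name, kind, exists) tuple per stage."""
    root = Path(repo_root)
    steps = []
    for number, name in enumerate(PIPELINE, start=1):
        # Notebooks are run interactively; everything else is a script.
        kind = "notebook" if name.endswith(".ipynb") else "script"
        steps.append((number, name, kind, (root / name).exists()))
    return steps


if __name__ == "__main__":
    for number, name, kind, exists in describe_steps():
        status = "found" if exists else "MISSING"
        print(f"{number}. [{kind}] {name} ({status})")
```

Running it from the repository root prints the seven stages in order, which is a quick way to confirm nothing was renamed or moved before starting a full pass.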

for pull_check_makesense.ipynb -- make sure there is an option for the case where no model has been run yet
for process_annotations_and_generate_features_batch.py -- maybe this should just handle annotation generation, with generate_features_only.py as the follow-up?
which file is the PDF mining in?
