This tool scans files from a given root path to detect potential corruption or integrity issues. It supports multiple file types, including Git repositories, images, PDFs, documents, spreadsheets, archives, text files, scripts, and media files.
- Multi-threaded for efficiency
- Supports rerunning failed checks from a previous results CSV
- Real-time CSV logging
- Safe handling of file paths in CSV
- Automatic detection of file types and associated integrity checks
- Code is provided as-is, with no tests and no guarantee that it works on platforms other than macOS
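
The multi-threaded scan and real-time CSV logging listed above could look roughly like the following sketch. It is not the tool's actual implementation: `scan`, `check_file`, the CSV column names and the 8-worker default are assumptions, and `check_file` only confirms that each file can be read to the end. Worker threads run the checks while the main thread writes each result as soon as it completes and flushes the file, so progress is on disk even if a long scan is interrupted; `csv.writer` takes care of quoting paths that contain commas, quotes or newlines.

```python
import csv
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Tuple


def check_file(path: Path) -> Tuple[str, str]:
    """Hypothetical placeholder check: a file passes if it can be read to the end."""
    try:
        with path.open("rb") as fh:
            while fh.read(1 << 20):  # 1 MiB chunks keep memory use flat
                pass
    except OSError as exc:
        return "ERROR", str(exc)
    return "OK", ""


def scan(root: str, outpath: str, workers: int = 8) -> int:
    """Check every file under root in a thread pool, logging results as they finish.

    Hypothetical sketch: the column names and worker count are assumptions.
    """
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    with open(outpath, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)  # quotes paths containing commas, quotes or newlines
        writer.writerow(["path", "status", "detail"])
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {pool.submit(check_file, p): p for p in files}
            # Only the main thread writes, so the CSV writer needs no lock.
            for future in as_completed(futures):
                status, detail = future.result()
                writer.writerow([str(futures[future]), status, detail])
                out.flush()  # make each row visible on disk immediately
    return len(files)
```

For example, `scan("/mnt/data", "results.csv")` returns the number of files checked and leaves one row per file in `results.csv`.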

```bash
python3 -m files_integrity_check --path <root_path> --outpath <output_csv> [--ignore <patterns>] [--rerun <previous_csv>]
```

- `--path`: Root path to check (required)
- `--outpath`: Path to the output CSV file (required)
- `--ignore`: Space-separated list of patterns to ignore (optional)
- `--rerun`: Path to a previous CSV file whose failed checks should be rerun (optional)
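
The documented options map naturally onto Python's `argparse`. The parser below is an assumption about how such an interface could be declared, not the tool's own code; `build_parser` is a hypothetical name. `nargs="+"` is what lets `--ignore` take several space-separated patterns, and parsing stops at the next flag.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the tool's source)."""
    parser = argparse.ArgumentParser(prog="files_integrity_check")
    parser.add_argument("--path", required=True, help="Root path to check")
    parser.add_argument("--outpath", required=True, help="Path to the output CSV file")
    parser.add_argument(
        "--ignore",
        nargs="+",  # one or more space-separated patterns: --ignore '*.tmp' '*.log'
        default=[],  # used when the flag is absent
        metavar="PATTERN",
        help="Patterns to ignore",
    )
    parser.add_argument(
        "--rerun",
        metavar="PREVIOUS_CSV",
        help="Previous CSV file whose failed checks should be rerun",
    )
    return parser
```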

For example, to check `/mnt/data` while ignoring temporary and log files:

```bash
python3 -m files_integrity_check --path /mnt/data --outpath results.csv --ignore '*.tmp' '*.log'
```
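
The patterns in the example are quoted so that the shell passes `*.tmp` and `*.log` to the tool literally instead of expanding them against the current directory. How the tool matches patterns is not documented here; the sketch below assumes shell-style globs checked with Python's `fnmatch` against both the file name and the path relative to `--path`. `is_ignored` is a hypothetical helper.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Iterable


def is_ignored(rel_path: str, patterns: Iterable[str]) -> bool:
    """Hypothetical helper: True if the file name or relative path matches a pattern.

    Assumes --ignore patterns are shell-style globs such as '*.tmp'.
    """
    name = PurePosixPath(rel_path).name
    return any(fnmatch(name, p) or fnmatch(rel_path, p) for p in patterns)
```

With `fnmatch`, `*` also matches `/`, so a pattern such as `node_modules/*` would skip everything below that directory.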

To install the Python dependencies in a virtual environment:

```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
Depending on the file types you want to check, you may need some external programs installed on your system. Below are the main dependencies and how to install them.
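
On Debian/Ubuntu (the TypeScript compiler `tsc` is packaged as `node-typescript`):

```bash
sudo apt update
sudo apt install ffmpeg git npm nodejs node-typescript make bash fish zsh inkscape
```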
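
On macOS with Homebrew (`npm` ships with the `node` formula, `tsc` with `typescript`, and Inkscape is a cask):

```bash
brew update
brew install ffmpeg git node typescript make bash fish zsh
brew install --cask inkscape
```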

On Windows, download and install the following programs:

- FFmpeg: ffmpeg.org
- Git: git-scm.com
- Node.js: nodejs.org
- Inkscape: inkscape.org
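
Before a long scan, it can help to confirm which of these programs are on your `PATH`. The grouping below mirrors the "External dependencies" column of the file type table in this README (the platform-specific `otool`/`readelf` for binaries are left out); `missing_tools` is a hypothetical helper, and whether the tool skips or fails a check when a program is missing is not documented here.

```python
import shutil
from typing import Dict, List

# Hypothetical grouping taken from the README's external dependency column.
EXTERNAL_TOOLS: Dict[str, List[str]] = {
    "Media": ["ffmpeg"],
    "Git repos": ["git"],
    "Code files": ["git", "npm", "node", "tsc"],
    "Shell scripts": ["bash", "fish", "zsh"],
    "Vector images (.ai, .eps)": ["inkscape"],
}


def missing_tools(groups: Dict[str, List[str]] = EXTERNAL_TOOLS) -> Dict[str, List[str]]:
    """Map each file-type group to the programs that shutil.which cannot find."""
    missing: Dict[str, List[str]] = {}
    for group, tools in groups.items():
        absent = [tool for tool in tools if shutil.which(tool) is None]
        if absent:
            missing[group] = absent
    return missing


for group, tools in missing_tools().items():
    print(f"{group}: missing {', '.join(tools)}")
```
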

Supported file types:

File type | Extensions | Supported | Python dependencies | External dependencies |
---|---|---|---|---|
Archives | .zip, .gzip, .gz, .tar, .tgz | Yes | None | None |
Binary Files | * | Yes | pefile (Windows) | otool (Mac), readelf (Linux) |
Code Files | .py, .js, .jsx, .ts, .tsx | Yes | esprima, fonttools, PyYAML, lxml | git, npm, node, tsc |
CSV/TSV | .csv, .tsv | Yes | None | None |
Documents | .docx, .xlsx, .odt, .ods, .odp, .doc, .xls, .xlsm, .xlsb | Yes | python-docx, openpyxl, odfpy | None |
Environment and Config | .ini, .cfg, .conf, .env, .env.*, .yml, .yaml, Makefile, .toml | Yes | PyYAML | None |
Fonts | .ttf, .otf | Yes | fonttools | None |
Git Repos | .git | Yes | None | git |
HTML | .html, .htm | Yes | beautifulsoup4 | None |
Images | .jpg, .jpeg, .png, .gif, .bmp, .tiff, .ico, .webp, .psd, .svg, .ai, .eps | Yes | Pillow | Inkscape (.ai, .eps only) |
JSON | .json | Yes | None | None |
Markdown | .md | Yes | None | None |
Media | .mp3, .wav, .flac, .midi, .mp4, .mov, .avi, .mkv, .webm, .ogg, .opus, .m4a, .3gp, .aac, .aiff, .mpg, .mpeg, .wmv | Yes | None | FFmpeg |
Notebooks | .ipynb | Yes | nbformat | None |
PDF | .pdf | Yes | PyPDF2 | None |
Shell Scripts | .sh, .bash, .zsh, .fish | Yes | None | bash, fish, zsh |
SQL | .sql | Yes | None | None |
Templates | .jinja, .tmpl, .j2 | Yes | jinja2 | None |
XML | .xml | Yes | lxml | None |
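
The automatic per-type checks can be pictured as a dispatch table from extension to checker. The sketch below is an illustration under assumptions, not the tool's code: it covers only the archive and JSON rows with the standard library, plus a subset of the media extensions through the common FFmpeg decode test (`ffmpeg -v error -i FILE -f null -`). The status names (`OK`, `CORRUPT`, `SKIPPED`, `UNSUPPORTED`) and all helper names are hypothetical.

```python
import gzip
import json
import shutil
import subprocess
import tarfile
import zipfile
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple

CHUNK = 1 << 20  # read in 1 MiB chunks to bound memory use


class MissingTool(Exception):
    """Hypothetical: raised when a required external program is not on PATH."""


def drain(fh) -> None:
    """Read a stream to the end; decoders raise on corrupt or truncated data."""
    while fh.read(CHUNK):
        pass


def check_zip(path: Path) -> Optional[str]:
    with zipfile.ZipFile(path) as zf:
        bad = zf.testzip()  # CRC-checks every member, returns the first bad name
    return f"bad member: {bad}" if bad else None


def check_gzip(path: Path) -> Optional[str]:
    with gzip.open(path, "rb") as fh:
        drain(fh)  # the trailing CRC32 and length are verified at end of stream
    return None


def check_tar(path: Path) -> Optional[str]:
    with tarfile.open(path) as tf:  # default mode auto-detects gzip/bz2/xz
        for member in tf:
            if member.isfile():
                drain(tf.extractfile(member))
    return None


def check_json(path: Path) -> Optional[str]:
    with path.open(encoding="utf-8") as fh:
        json.load(fh)
    return None


def check_media(path: Path) -> Optional[str]:
    if shutil.which("ffmpeg") is None:
        raise MissingTool("ffmpeg")
    # Decode every stream and discard the output; decoding errors go to stderr.
    proc = subprocess.run(
        ["ffmpeg", "-v", "error", "-nostdin", "-i", str(path), "-f", "null", "-"],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0 or proc.stderr.strip():
        return proc.stderr.strip() or f"ffmpeg exited with code {proc.returncode}"
    return None


CHECKS: Dict[str, Callable[[Path], Optional[str]]] = {
    ".zip": check_zip, ".gz": check_gzip, ".gzip": check_gzip,
    ".tar": check_tar, ".tgz": check_tar, ".json": check_json,
    **{ext: check_media for ext in (".mp3", ".wav", ".flac", ".mp4", ".mov", ".mkv")},
}


def check(path: Path) -> Tuple[str, str]:
    """Dispatch on the lower-cased extension and return (status, detail)."""
    checker = CHECKS.get(path.suffix.lower())
    if checker is None:
        return "UNSUPPORTED", ""
    try:
        problem = checker(path)
    except MissingTool as exc:
        return "SKIPPED", f"{exc} not found"
    except Exception as exc:  # corruption surfaces as many exception types
        return "CORRUPT", f"{type(exc).__name__}: {exc}"
    return ("CORRUPT", problem) if problem else ("OK", "")
```

Reading each archive member to the end matters: ZIP and gzip store CRCs that are only verified once the data has been fully decompressed.
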

The following file types are not supported:

File type | Extensions | Supported | Python dependencies | External dependencies |
---|---|---|---|---|
3D Model | .obj, .fbx, .stl, .dae, .gltf, .glb, .ply | No | None | None |
3D Project | .blend, .max, .ma, .mb, .c4d | No | None | Blender, Autodesk Maya, 3ds Max, Cinema 4D |
ActionScript | .as | No | None | None |
Ada | .adb, .ads, .ada | No | None | None |
Audio Project | .als, .flp, .ptx, .rpp | No | None | Ableton Live, FL Studio, Pro Tools, Reaper |
C | .c, .h | No | None | None |
C++ | .cpp, .cc, .cxx, .hpp, .hh, .hxx | No | None | None |
C# | .cs | No | None | None |
CAD | .dwg, .dxf, .step, .stp, .iges, .igs | No | None | AutoCAD, FreeCAD |
Clojure | .clj, .cljs, .cljc | No | None | None |
CoffeeScript | .coffee | No | None | None |
Dart | .dart | No | None | None |
Digital Audio Workstation (DAW) | .logicx, .aif, .band | No | None | Logic Pro, GarageBand |
Elixir | .ex, .exs | No | None | None |
Erlang | .erl, .hrl | No | None | None |
Game Engine Project | .unity, .uproject, .godot | No | None | Unity, Unreal Engine, Godot |
Go | .go | No | None | None |
Gradle | .gradle | No | None | None |
Groovy | .groovy | No | None | None |
Haskell | .hs | No | None | None |
Java | .java | No | None | None |
Julia | .jl | No | None | None |
Kotlin | .kt, .kts | No | None | None |
Liquid | .liquid | No | None | None |
Lisp | .lisp, .lsp | No | None | None |
Lua | .lua | No | None | None |
Motion Graphics/Animation | .aep, .aepx, .prproj, .drp | No | None | Adobe After Effects, Premiere Pro, DaVinci Resolve |
Mustache | .mustache | No | None | None |
Objective-C | .m | No | None | None |
Objective-C++ | .mm | No | None | None |
OCaml | .ml, .mli | No | None | None |
Pascal | .pas | No | None | None |
Perl | .pl, .pm | No | None | None |
PowerShell | .ps1 | No | None | None |
R | .r, .R | No | None | None |
RMarkdown | .rmd | No | None | None |
reStructuredText | .rst | No | None | None |
Scala | .scala | No | None | None |
Svelte | .svelte | No | None | None |
Swift | .swift | No | None | None |
TeX/LaTeX | .tex, .ltx, .sty | No | None | None |
Vector Graphics | .ai, .eps, .svg | No | None | Adobe Illustrator, Inkscape |
Video Editing | .veg, .prproj, .drp | No | None | Sony Vegas, Adobe Premiere Pro, DaVinci Resolve |
Visual Basic | .vb | No | None | None |
VFX Project | .nk, .hip | No | None | Nuke, Houdini |
Vue | .vue | No | None | None |
Web Design Project | .xd, .fig, .sketch | No | None | Adobe XD, Figma, Sketch |
This project is licensed under the Apache 2.0 License. See the LICENSE file for details.