GitHub - evancolvin/Benfords-Law: A python class that checks data against Benford's Law empirical distribution

Benfords-Law Repo

This is a repo for a project written in Python 2.7 that compares empirical data to Benford's Law.

We expect the first digits of empirical data to roughly follow a logarithmic rule: the proportion of values whose first digit is d is P(d) = log10(1 + 1/d), where d is a digit from 1 to 9. This is useful in fraud detection, because when people make up numbers they usually don't follow this pattern.
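As a quick illustration of the formula, here is a minimal sketch that tabulates the expected proportions. The function name `benford_expected` is made up for this example and is not necessarily how BenfordsLaw.py does it.

```python
import math


def benford_expected(d):
    """Expected proportion of first digit d (1-9) under Benford's Law."""
    # Hypothetical helper for illustration; not taken from BenfordsLaw.py.
    if d not in range(1, 10):
        raise ValueError("d must be an integer from 1 to 9")
    return math.log10(1 + 1.0 / d)


# Expected proportion for each possible first digit.
expected = {d: benford_expected(d) for d in range(1, 10)}

for d in sorted(expected):
    print("%d: %.3f" % (d, expected[d]))
```

The nine proportions sum to 1, and a first digit of 1 is expected about 30% of the time, falling to under 5% for a first digit of 9.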

Goals:

Currently the library can take in a matrix of data and check the distribution of the first digits, but it could work better. Eventually, it will scan documents, ignoring dates and section headings, and compare the digits to Benford's Law. I also want it to interface directly with Pandas data frames and other data formats.
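To give a rough idea of what checking the first digits of a matrix involves, here is a small sketch. It assumes the matrix is a plain list of rows holding finite numbers; the names `first_digit` and `first_digit_proportions` and the sample values are hypothetical and are not the class's actual interface.

```python
from collections import Counter


def first_digit(value):
    """Return the first nonzero digit of a finite number, or None for zero."""
    # Strip leading zeros and the decimal point, e.g. 0.0045 -> "45".
    digits = str(abs(value)).lstrip("0.")
    return int(digits[0]) if digits else None


def first_digit_proportions(matrix):
    """Observed proportion of each first digit (1-9) across a list of rows."""
    counts = Counter()
    for row in matrix:
        for value in row:
            d = first_digit(value)
            if d is not None:  # zeros have no first digit, so skip them
                counts[d] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no nonzero values to count")
    return {d: counts[d] / float(total) for d in range(1, 10)}


# Made-up sample data, only to show the call.
data = [[123, 0.0045, 19.5], [2048, 1.7, 0], [31, 160, 0.12]]
observed = first_digit_proportions(data)
```

The resulting proportions can then be set side by side with P(d) for each digit.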

I included an example that runs the methods over the milk data set from Regression Analysis By Example, by Chatterjee and Hadi. The data is available here

Here's a picture of what the output currently looks like:

In the plot, the curve shows the (smoothed) distribution we expect to see under Benford's Law, and the bars show what we see in the actual data.

You can read more about the uses of Benford's Law here, and here.

