pinformatics/HetGen

HetGen

A package to introduce errors into a dataset and create a dirty version of it, enabling us to benchmark record linkage frameworks at different error rates.

This open-source code helps us inject different levels of the data heterogeneity most often found in record linkage projects (duplicates, twins, suffixes, day-month swaps, first-last name swaps, nicknames, last-name changes due to marriage, and typos in names and dates) into any given dataset. The user controls the overall rate of heterogeneity in the data, which makes it easy to run systematic, controlled experiments.
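
To make the idea of a controlled heterogeneity rate concrete, here is a minimal Python sketch. It is not HetGen's actual code or API: the function names (`make_dirty`, `swap_day_month`, `swap_names`, `typo`), the record fields (`first_name`, `last_name`, `dob`), and the ISO date format are all assumptions chosen for illustration, and only three of the error types listed above are covered.

```python
import random


def swap_day_month(date_str):
    """Swap day and month in a 'YYYY-MM-DD' string (assumed format).

    The swap is skipped when the day is greater than 12, because the
    result would not be a valid month.
    """
    year, month, day = date_str.split("-")
    if int(day) <= 12:
        return f"{year}-{day}-{month}"
    return date_str


def swap_names(record):
    """Return a copy of the record with first and last name exchanged."""
    out = dict(record)
    out["first_name"], out["last_name"] = record["last_name"], record["first_name"]
    return out


def typo(text, rng):
    """Transpose two adjacent characters at a random position."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def make_dirty(records, error_rate, seed=0):
    """Apply one random error to a fixed fraction of the records.

    Returns the dirty copy and the set of indices selected for corruption.
    A selected record can occasionally come out unchanged (for example a
    day-month swap on a day greater than 12), so the observed rate of
    altered records may be slightly below error_rate.
    """
    rng = random.Random(seed)  # seeded so experiments are repeatable
    n_dirty = round(error_rate * len(records))
    dirty_idx = set(rng.sample(range(len(records)), n_dirty))

    dirty = []
    for i, rec in enumerate(records):
        rec = dict(rec)  # never mutate the clean input
        if i in dirty_idx:
            kind = rng.choice(["typo", "name_swap", "date_swap"])
            if kind == "typo":
                rec["first_name"] = typo(rec["first_name"], rng)
            elif kind == "name_swap":
                rec = swap_names(rec)
            else:
                rec["dob"] = swap_day_month(rec["dob"])
        dirty.append(rec)
    return dirty, dirty_idx
```

Calling `make_dirty(records, 0.1)`, `make_dirty(records, 0.2)`, and so on against the same clean records would give a series of dirty datasets at increasing error rates, and the returned index set can serve as ground truth when scoring a linkage framework.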

About

A heterogeneity generator for benchmarking record linkage algorithms
