Skip to content

R package for demographers: survey-weighted summaries, flexible bucketing, fast Oaxaca–Blinder decompositions, scalable with DuckDB.

License

Notifications You must be signed in to change notification settings

lorae/demographr

Repository files navigation

🌎 demographR

An R package providing tools for demographers: data bucketing with lookup tables, aggregation, and validation, optimized for performance with databases like DuckDB.

📎 Quick Start

git clone https://github.com/lorae/demographr demographr

📋 Unit Testing

Tests are located in the tests/testthat folder. To run all tests:

library("devtools")
devtools::test()

🛠️ Contribution Guidelines

Variable naming conventions

1. Data Variables: In this R package, variables often refer to data that is either:

  • Stored in R memory as a tibble.
  • Stored on a DuckDB server as a pointer to a database table.

These two data types are handled differently, and failing to distinguish between them may lead to subtle bugs. To prevent this:

  • Variables referring to an in-memory tibble are suffixed with _tb.
  • Variables referring to an on-server database table are suffixed with _db.

Example:

library(duckdb)
library(dplyr)

# The name of `iris` data represented as a tibble is `iris_tb`
iris_tb <- tibble(iris)

# The name of `iris` data represented as a pointer to a database table is `iris_db`. But the name of the table itself, in the connection, is just `iris`.
con <- dbConnect(duckdb::duckdb(), ":memory:")
dbWriteTable(con, "iris", iris_tb, overwrite = TRUE)
iris_db <- tbl(con, "iris")
dbDisconnect(con)

2. Column name arguments: Many functions have arguments that are strings representing column names in the data. To clearly indicate that an argument represents a column name, we suffix it with _col.

Example:

library(dplyr)

# Define the function
calculate_group_average <- function(
  data, 
  group_col, 
  value_col
  ) {
  data |>
    group_by(.data[[group_col]]) |>
    summarize(average = mean(.data[[value_col]], na.rm = TRUE)) |>
    ungroup()
}

# Example usage
calculate_group_average(
  data = iris_tb,
  group_col = "Species",
  value_col = "Petal.Width"
)

📜 License

MIT License (see LICENSE file).

About

R package for demographers: survey-weighted summaries, flexible bucketing, fast Oaxaca–Blinder decompositions, scalable with DuckDB.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages