Test a leave-one-out, time-weighted, rolling spatial lag feature #343

Open
dfsnow opened this issue Feb 11, 2025 · 0 comments
Labels
method ML technique or method change

dfsnow commented Feb 11, 2025

We should test adding a spatial lag feature similar to the one added to the condo model in ccao-data/model-condo-avm#101. This would be a leave-one-out, time-weighted, rolling spatial lag of sale prices. In plain English: for each sale, you'd take the time-weighted average of nearby sales (within 500m) that occurred in the 5 years prior to that sale. The idea here is that sale prices are autocorrelated, i.e. a sale price itself has some effect on future sale prices.
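
One way to write this down, as a rough formalization rather than the exact condo-model definition: let $p_j$ be the price of sale $j$, $t_j$ its sale date, and $d(i, j)$ the distance between the parcels of sales $i$ and $j$. The weight function $w$ is an assumption here (any function that is non-increasing in elapsed time would fit the description); the issue does not specify it.

$$
\text{lag}_i = \frac{\sum_{j \in N_i} w(t_i - t_j)\, p_j}{\sum_{j \in N_i} w(t_i - t_j)},
\qquad
N_i = \{\, j \neq i : d(i, j) \le 500\,\text{m},\; 0 < t_i - t_j \le 5\,\text{years} \,\}
$$

The $j \neq i$ condition is the leave-one-out part, the 5-year condition is the rolling window, and $w$ is the time weighting. Sales with an empty $N_i$ have no defined lag and would need a missing value or a fallback.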

Constructing such a feature is straightforward but compute/memory intensive. You basically have two steps:

  1. Find all the sales within N meters of each PIN.
  2. For each PIN with a sale, take the weighted average of the surrounding PIN sales that occurred before the target sale.

The latter step is already demoed in the condo model. The former can be done using spatial dependency packages; here's some starter R code:

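# Packages used below (sf and sfdep are called via `::`)
library(dplyr)
library(tidyr)
library(data.table)

# Grab all PIN locations, regardless of sale status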
spt_pin_all_dt <- assessment_data_clean %>%
  select(meta_pin, loc_longitude, loc_latitude) %>%
  filter(!is.na(loc_latitude) & !is.na(loc_longitude)) %>%
  distinct(meta_pin, .keep_all = TRUE) %>%
  sf::st_as_sf(coords = c("loc_longitude", "loc_latitude"), crs = 4326)

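# For every PIN, find the PINs that are within 500m. This works but takes
# an hour or two to run. Outputs a list where each element is an integer
# vector of index positions of the neighboring PINs. Note: `upper` should be
# in km here (0.5 = 500m), since the points are unprojected lon/lat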
spt_pin_nb_list <- sfdep::st_dist_band(
  spt_pin_all_dt,
  lower = 0,
  upper = 0.5
)

# Pivot the nested list to a tibble. Each PIN has N rows, where N
# is the number of neighbors
spt_pin_adj_dt <- tibble(
  target_pin = spt_pin_all_dt$meta_pin,
  neighbor_pin = lapply(spt_pin_nb_list, \(x) spt_pin_all_dt$meta_pin[x])
) %>%
  unnest(neighbor_pin)

setDT(spt_pin_adj_dt, key = c("target_pin", "neighbor_pin"))
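
For reference, here is a minimal base R sketch of step 2 on toy data. It is not the condo model's implementation: the `adj` and `sales` tables, the `spatial_lag()` helper, and the linear-decay `time_weight()` function are all hypothetical, and `adj` only mimics the shape of `spt_pin_adj_dt`. The loop is quadratic in the number of sales, so at full scale it would likely need to be replaced with something like a data.table non-equi join.

```r
# Toy adjacency table shaped like spt_pin_adj_dt (hypothetical data)
adj <- data.frame(
  target_pin   = c("A", "A", "B", "C"),
  neighbor_pin = c("B", "C", "A", "A")
)

# Toy sales table, one row per sale (hypothetical data)
sales <- data.frame(
  pin   = c("A", "B", "C", "B"),
  date  = as.Date(c("2024-06-01", "2023-06-01", "2021-06-01", "2015-01-01")),
  price = c(300000, 200000, 100000, 50000)
)

# Rolling window of 5 years, expressed in days
window_days <- 5 * 365.25

# Assumed weighting: linear decay from 1 (same day) toward 0 (5 years prior)
time_weight <- function(days_before) 1 - days_before / window_days

# Weighted average of neighboring sales that occurred strictly before sale i
# and within the window. The target sale itself is never included because
# days_before must be > 0 and a PIN is not listed as its own neighbor
spatial_lag <- function(i) {
  nbr_pins <- adj$neighbor_pin[adj$target_pin == sales$pin[i]]
  days_before <- as.numeric(sales$date[i] - sales$date)
  keep <- sales$pin %in% nbr_pins &
    days_before > 0 &
    days_before <= window_days
  if (!any(keep)) {
    return(NA_real_)
  }
  w <- time_weight(days_before[keep])
  sum(w * sales$price[keep]) / sum(w)
}

sales$lag_price <- vapply(seq_len(nrow(sales)), spatial_lag, numeric(1))
```

In this toy data only the 2024 sale of PIN A has prior neighboring sales inside the window (the 2023 sale of B and the 2021 sale of C), so its lag lands between those two prices and closer to the more recent one. The other sales get `NA`, which is the case a real feature would need to handle explicitly.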
@dfsnow dfsnow added the method ML technique or method change label Feb 11, 2025