We should test adding a spatial lag feature similar to the one added to the condo model by ccao-data/model-condo-avm#101. This would be a leave-one-out, time-weighted, rolling spatial lag of sale prices. In plain English, for each sale, you'd take the average of nearby sales (within 500m) that occurred in the 5 years prior to that sale, weighted by time. The idea here is that sale prices are autocorrelated, i.e., a sale price itself has some effect on future nearby sale prices.
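As a rough sketch, the feature for sale $i$ (price $p_i$, date $t_i$) could be written as below. The weight function $w$ is an assumption here, since the exact time weighting isn't pinned down above; any function that decreases with the age of the neighboring sale would fit the description.

$$
\text{lag}_i = \frac{\sum_{j \in N(i)} w(t_i - t_j)\, p_j}{\sum_{j \in N(i)} w(t_i - t_j)},
\qquad
N(i) = \{\, j \neq i : d(i, j) \le 500\text{ m},\ 0 < t_i - t_j \le 5\text{ years} \,\}
$$

The $j \neq i$ condition is the leave-one-out part: a sale never contributes to its own lag.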
Constructing such a feature is straightforward but compute/memory intensive. You basically have two steps:
1. Find all the sales within N meters of each PIN.
2. For each PIN with a sale, take the weighted average of the surrounding PIN sales that occurred before the target sale.
The latter step is already demoed in the condo model. The former can be done using spatial dependency packages; here's some starter R code:
# Packages used below (sf and sfdep are called via ::)
library(dplyr)
library(tidyr)
library(data.table)

# Grab all PIN locations, regardless of sale status
spt_pin_all_dt <- assessment_data_clean %>%
  select(meta_pin, loc_longitude, loc_latitude) %>%
  filter(!is.na(loc_latitude) & !is.na(loc_longitude)) %>%
  distinct(meta_pin, .keep_all = TRUE) %>%
  sf::st_as_sf(coords = c("loc_longitude", "loc_latitude"), crs = 4326)

# For every PIN, find the PINs that are within 500m. This works but takes
# an hour or two to run. Outputs a list where each element is a list of index
# positions of the neighboring PINs
spt_pin_nb_list <- sfdep::st_dist_band(
  spt_pin_all_dt,
  lower = 0,
  upper = 0.5 # 0.5 km = 500m; distances are in km for lon/lat input
)

# Pivot the nested list to a tibble. Each PIN has N rows, where N
# is the number of neighbors
spt_pin_adj_dt <- tibble(
  target_pin = spt_pin_all_dt$meta_pin,
  neighbor_pin = lapply(spt_pin_nb_list, \(x) spt_pin_all_dt$meta_pin[x])
) %>%
  unnest(neighbor_pin)

setDT(spt_pin_adj_dt, key = c("target_pin", "neighbor_pin"))
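For the second step, here's a minimal sketch of how the adjacency table could be turned into the lag itself. It is not the condo model's implementation: the `sales` table, its column names, the `spatial_lag()` helper, and the half-life decay used for the time weights are all assumptions made for illustration, and the toy `adj` table just mimics the shape of `spt_pin_adj_dt`.

```r
library(data.table)

# Toy adjacency table with the same columns as spt_pin_adj_dt
adj <- data.table(
  target_pin   = c("A", "A", "B", "C"),
  neighbor_pin = c("B", "C", "A", "A")
)

# Hypothetical sales table; the column names are assumptions
sales <- data.table(
  pin        = c("A", "B", "C", "B"),
  sale_date  = as.Date(c("2023-01-01", "2022-01-01", "2021-01-01", "2016-06-01")),
  sale_price = c(300000, 200000, 100000, 150000)
)

# Time-weighted average of neighboring sales in the window before each sale.
# The half-life decay is an assumed weighting scheme, not the condo model's.
spatial_lag <- function(sales, adj, window_days = 5 * 365, half_life_days = 365) {
  # Pair every target sale with its neighboring PINs
  pairs <- merge(
    sales[, .(target_pin = pin, target_date = sale_date)],
    adj,
    by = "target_pin", allow.cartesian = TRUE
  )
  # Attach every sale of each neighboring PIN
  pairs <- merge(
    pairs,
    sales[, .(neighbor_pin = pin, nb_date = sale_date, nb_price = sale_price)],
    by = "neighbor_pin", allow.cartesian = TRUE
  )
  # Keep only neighbor sales strictly before the target sale and inside the
  # window, so the target's own price is never used
  pairs[, age := as.numeric(target_date - nb_date)]
  pairs <- pairs[age > 0 & age <= window_days]
  # More recent neighbor sales get more weight
  pairs[, w := 0.5^(age / half_life_days)]
  pairs[
    , .(lag_price = sum(w * nb_price) / sum(w)),
    by = .(pin = target_pin, sale_date = target_date)
  ]
}

lag_dt <- spatial_lag(sales, adj)
```

Sales with no qualifying neighbor sales drop out of the result, so you'd left-join `lag_dt` back onto the sales table to get `NA` for those rows. On the full data the cartesian merges are where the memory pressure shows up, so chunking by township or using a non-equi join may be worth testing.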