Description
When I calculate the change in the relative frequency of a word over time using the following two methods, I get very different results, both in the trend and in the level of the frequencies. Do you know why? Is there a mistake in how the aggregation is coded in the second version? For the second version, I assume that the relative frequency per year is calculated as the average of the daily relative frequencies throughout that year. Is that correct? Thanks for your insights!
Version 1:
di <- dispersion("MIGPARL", query = '"(M|m)uslim.*"', cqp = TRUE, s_attribute = "year", freq = TRUE)
# Assign the sorted table so the bars are plotted in chronological order
di <- di[order(di$year)]
barplot(
height = di[["freq"]],
names.arg = di[["year"]],
ylim = c(0, 200)
)
Version 2:
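library(xts)
# Same query as in Version 1, but dispersion by date instead of by year
dtm <- dispersion("MIGPARL", query = '"(M|m)uslim.*"', cqp = TRUE, s_attribute = "date", freq = TRUE)
# Drop rows whose date cannot be parsed
dtm <- dtm[!is.na(as.Date(dtm[["date"]]))]
tsm <- xts(x = dtm[["freq"]], order.by = as.Date(dtm[["date"]]))
# Aggregate the daily values by year (each date is mapped to 1 January of its year)
tsm_year <- aggregate(tsm, as.Date(sprintf("%s-01-01", gsub("^(\\d{4})-.*?$", "\\1", index(tsm)))))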
plot(as.xts(tsm_year))
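
To make the aggregation question concrete, here is a minimal sketch in base R with made-up numbers (the `daily` table, its `count` and `size` columns and all values are hypothetical, not MIGPARL data). It assumes that `freq` is the number of hits divided by the number of tokens for each value of the s-attribute, and compares three ways of getting a yearly figure from daily data. As far as I can tell from the zoo documentation, `aggregate()` on a zoo/xts object uses `FUN = sum` when no function is given, so Version 2 may correspond to case (c) rather than to an average.

```r
# Hypothetical daily data for a single year (made-up numbers).
daily <- data.frame(
  date  = as.Date(c("2000-01-10", "2000-03-15", "2000-11-20")),
  count = c(2, 1, 30),          # hits for the query on that day
  size  = c(1000, 500, 60000)   # total number of tokens on that day
)
daily$freq <- daily$count / daily$size  # daily relative frequency

year <- format(daily$date, "%Y")

# (a) Relative frequency computed on the whole year:
#     all hits of the year divided by all tokens of the year.
yearly_freq <- tapply(daily$count, year, sum) / tapply(daily$size, year, sum)

# (b) Unweighted mean of the daily relative frequencies.
mean_daily <- tapply(daily$freq, year, mean)

# (c) Sum of the daily relative frequencies.
sum_daily <- tapply(daily$freq, year, sum)

print(c(
  yearly = yearly_freq[["2000"]],
  mean   = mean_daily[["2000"]],
  sum    = sum_daily[["2000"]]
))
```

The three figures only coincide in special cases: (b) equals (a) only when every day has the same number of tokens, and (c) grows with the number of days that have data. If that reading is right, passing `FUN = mean` explicitly to `aggregate()` would give case (b), while case (a) requires aggregating counts and sizes separately before dividing.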