
Difference in Dispersion (by year and date) #16

@svenjakopyciok


When I calculate the change in the relative frequency of a word over time with the following two methods, I get very different results, both in the trend and in the level of the frequencies. Do you know why? Is there a mistake in how I coded the aggregation in the second version? In the second version, I assume that the relative frequency per year is calculated as the average of the daily relative frequencies throughout that year. Thanks for your insights!

Version 1:
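library(polmineR)  # assumes the MIGPARL corpus is installed and activated
library(data.table)  # di is subset with data.table syntax below
di <- dispersion("MIGPARL", query = '"(M|m)uslim.*"', cqp = TRUE, s_attribute = "year", freq = TRUE)
di <- di[order(di[["year"]])]  # assign the result; otherwise the bars are not ordered by year
barplot(
  height = di[["freq"]],
  names.arg = di[["year"]],
  ylim = c(0, 200)
)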

Version 2:
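library(xts)  # provides xts(), as.xts(); attaches zoo for index()
dtm <- dispersion("MIGPARL", query = '"(M|m)uslim.*"', cqp = TRUE, s_attribute = "date", freq = TRUE)
dtm <- dtm[!is.na(as.Date(dtm[["date"]]))]  # drop rows whose date cannot be parsed
tsm <- xts(x = dtm[["freq"]], order.by = as.Date(dtm[["date"]]))
tsm_year <- aggregate(tsm, as.Date(sprintf("%s-01-01", gsub("^(\\d{4})-.*?$", "\\1", index(tsm)))))  # FUN defaults to sum: daily values are summed per year, not averaged
plot(as.xts(tsm_year))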
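A minimal base-R sketch of why the two versions can disagree. It uses made-up daily counts and corpus sizes: the data frame `daily` and its columns `count`, `size` and `freq` are hypothetical and chosen for illustration, not output from MIGPARL or `dispersion()`. Assuming a daily relative frequency is hits divided by tokens for that date, it compares three ways of getting a yearly figure: the sum of the daily relative frequencies (what `aggregate()` on a zoo/xts object computes when `FUN` is left at its default, `sum`), the unweighted mean of the daily relative frequencies, and the pooled ratio of total hits to total tokens per year, which is presumably what `s_attribute = "year"` returns.

```r
# Hypothetical toy data (illustrative numbers only, not from MIGPARL)
daily <- data.frame(
  date  = as.Date(c("2000-03-01", "2000-03-02", "2000-11-15",
                    "2001-02-07", "2001-06-20")),
  count = c(2, 0, 6, 1, 3),              # hits of the query on that date
  size  = c(10000, 5000, 20000, 8000, 12000)  # tokens on that date
)
daily$freq <- daily$count / daily$size   # daily relative frequency
daily$year <- format(daily$date, "%Y")   # year as a character grouping key

# (a) Sum of daily relative frequencies, the default of aggregate() for zoo/xts
sum_daily <- tapply(daily$freq, daily$year, sum)

# (b) Unweighted mean of daily relative frequencies (every day counts equally)
mean_daily <- tapply(daily$freq, daily$year, mean)

# (c) Pooled yearly relative frequency: total hits / total tokens per year
pooled <- tapply(daily$count, daily$year, sum) /
  tapply(daily$size, daily$year, sum)

print(rbind(sum_daily, mean_daily, pooled))
```

With these toy numbers the 2000 sum (0.0005) is more than twice the pooled value (about 0.000229), because it grows with the number of sitting days; the unweighted mean (about 0.000167) is lower, because short days weigh as much as long sessions. So passing `FUN = mean` to `aggregate()` would still not match Version 1 exactly. Summing counts and sizes per year before dividing should, provided the `year` s-attribute agrees with the year of each `date`.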
