Description
For CWB subcorpora, the columns defined in types_src are not returned. I think that this is due to the hard-coded assignment of columns here:
Lines 857 to 867 in 9392164
tab <- links[,
  list(
    cpos_left = expand_fun(.SD, direction = "left"),
    cpos_right = expand_fun(.SD, direction = "right"),
    dbpedia_uri = .SD[["dbpedia_uri"]],
    text = .SD[["text"]],
    types = .SD[["types"]]
  ),
  by = c("start", "end"),
  .SDcols = c("start", "end", "dbpedia_uri", "text", "types")
]
If I understand this correctly, the intent is to add two columns, cpos_left and cpos_right, to the links object created earlier, and to leave all other columns unchanged. However, those other columns are carried over through a hard-coded list.
This results in two issues. First, additional columns returned in links, such as "DBpedia_type" or "Wikidata_type", are not kept. Second, columns that might be missing cause problems: in recent versions of dbpedia you can drop the types column, but if you drop it in this scenario, an error occurs because the column is expected but not available.
I was wondering whether it is possible to simplify the code above to really only add cpos_left and cpos_right as follows:
links[, "cpos_left" := expand_fun(.SD, direction = "left"), by = c("start", "end"), .SDcols = c("start", "end")]
links[, "cpos_right" := expand_fun(.SD, direction = "right"), by = c("start", "end"), .SDcols = c("start", "end")]
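To illustrate the expected effect, here is a minimal sketch on a toy table. It assumes that expand_fun returns one value per (start, end) group; since the real function is not shown here, a hypothetical stand-in called expand_stub is used, and all values in the toy links table are invented.

```r
library(data.table)

# Toy stand-in for `links`; all values are invented for illustration.
# There is no `types` column, and there is an extra `Wikidata_type` column.
links <- data.table(
  start = c(0L, 10L),
  end = c(4L, 15L),
  dbpedia_uri = c("http://dbpedia.org/resource/A", "http://dbpedia.org/resource/B"),
  text = c("Alpha", "Bravo"),
  Wikidata_type = c("Q5", "Q43229")
)

# Hypothetical replacement for expand_fun(): returns one value per group.
# The real function's behavior is not reproduced here.
expand_stub <- function(sd, direction) {
  if (direction == "left") sd[["start"]][1] + 100L else sd[["end"]][1] + 100L
}

# Add the two columns by reference; every existing column is left untouched.
links[, cpos_left := expand_stub(.SD, direction = "left"),
      by = c("start", "end"), .SDcols = c("start", "end")]
links[, cpos_right := expand_stub(.SD, direction = "right"),
      by = c("start", "end"), .SDcols = c("start", "end")]

print(names(links))
```

Because `:=` modifies links in place, the extra Wikidata_type column survives and the missing types column does not trigger an error. Note that links itself is changed, whereas the current code creates a new object tab, so downstream code would need to refer to links (or to a copy made with copy(links) beforehand).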