Function for obtaining / printing the text for a pair of duplicated blocks #44
Comments
Note that the LCS algorithm in {stringdist} only computes the LCS distance (a number); it doesn't return the longest common subsequence itself. I can't find a good LCS implementation on CRAN (and I don't want to depend on Bioconductor packages, since dupree is on CRAN now).
Possibly include an LCS implementation in dupree ({stringdist} can still be used for computing the distances, with a local LCS for recovering the duplicated strings).
[Could call {textreuse} with the original code strings rather than integer vectors], but that would require r-textreuse to be pushed to conda-forge for me to use it locally.
Just print the contents of the two (+) blocks for now. Finding the actual LCS can be implemented at a later stage.
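For reference, a local LCS that returns the subsequence itself (not just a number) could look roughly like the sketch below. It is only an illustration of the standard dynamic-programming approach: `lcs_elements` is a hypothetical name, not something that exists in dupree or {stringdist}, and it assumes the two blocks are plain atomic vectors (for example the integer token vectors mentioned above).

```r
# Hypothetical helper (not part of dupree or {stringdist}): returns one longest
# common subsequence of two atomic vectors using the classic DP table.
lcs_elements <- function(a, b) {
  n <- length(a)
  m <- length(b)
  if (n == 0L || m == 0L) {
    return(a[0])
  }
  # len[i + 1, j + 1] holds the LCS length of a[1:i] and b[1:j]
  len <- matrix(0L, nrow = n + 1L, ncol = m + 1L)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      len[i + 1L, j + 1L] <- if (a[i] == b[j]) {
        len[i, j] + 1L
      } else {
        max(len[i, j + 1L], len[i + 1L, j])
      }
    }
  }
  # Walk back from the bottom-right corner to recover one LCS
  out <- vector(mode = typeof(a), length = len[n + 1L, m + 1L])
  i <- n
  j <- m
  k <- length(out)
  while (i > 0L && j > 0L) {
    if (a[i] == b[j]) {
      out[k] <- a[i]
      k <- k - 1L
      i <- i - 1L
      j <- j - 1L
    } else if (len[i, j + 1L] >= len[i + 1L, j]) {
      i <- i - 1L
    } else {
      j <- j - 1L
    }
  }
  out
}

# Example on integer token vectors
lcs_elements(c(1L, 2L, 3L, 4L, 5L), c(2L, 4L, 5L, 6L))
```

This is O(n * m) in time and memory, which is probably acceptable for code blocks of modest size, but it has not been profiled against real dupree output.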
Hi all,
For example,
|
Neat. Thanks.
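I improved the previous script a little.
# Requires R >= 4.1 (native pipe) plus the {dplyr} and {diffr} packages
dup_diff <- function(dupree_res, min_score = 0.45, nlines = 10) {
  # Keep only the block pairs whose score exceeds the threshold
  dup_misc_filter <- dupree_res$dups_df |>
    dplyr::filter(score > min_score)
  res <- list()
  for (i in seq_len(nrow(dup_misc_filter))) {
    # One temporary directory per pair of blocks
    pair_dir <- file.path(tempdir(), i)
    dir.create(pair_dir, showWarnings = FALSE)
    file_a_temp <- file.path(pair_dir, paste0("a_", basename(dup_misc_filter$file_a[i])))
    file_b_temp <- file.path(pair_dir, paste0("b_", basename(dup_misc_filter$file_b[i])))
    # Extract the lines of each block; indices past the end of a file give NA and are dropped
    lines_a <- readLines(dup_misc_filter$file_a[i])[dup_misc_filter$line_a[i] + 0:nlines]
    lines_b <- readLines(dup_misc_filter$file_b[i])[dup_misc_filter$line_b[i] + 0:nlines]
    writeLines(lines_a[!is.na(lines_a)], file_a_temp)
    writeLines(lines_b[!is.na(lines_b)], file_b_temp)
    # Side-by-side diff widget for this pair
    res[[i]] <- diffr::diffr(file_a_temp, file_b_temp)
  }
  return(res)
}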
For example, print_dup(dup_df[1, ]).
Or, if we change dupree to return a list of class Dups, wherein each entry is of class Dup, then print(dups[[1]]) might be better syntax.
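A rough sketch of what the S3 route might look like, purely as an illustration. `new_dup` is a hypothetical constructor, and the method assumes each entry carries the `file_a`, `file_b`, `line_a`, `line_b` and `score` fields seen in `dups_df`, with `line_a` / `line_b` taken to be the first line of each block. None of this is existing dupree API.

```r
# Hypothetical constructor: wraps one row of a dups_df-like data frame.
# Assumes file_a / file_b are character columns (not factors).
new_dup <- function(row) {
  structure(as.list(row), class = "Dup")
}

# S3 print method: shows up to `nlines` lines from the start of each block
print.Dup <- function(x, nlines = 10, ...) {
  show_block <- function(file, start) {
    lines <- readLines(file, warn = FALSE)
    idx <- seq_along(lines)
    # Keep only line numbers inside the requested window and inside the file
    idx <- idx[idx >= start & idx < start + nlines]
    cat(sprintf("--- %s (from line %d)\n", file, start))
    writeLines(lines[idx])
  }
  cat(sprintf("Duplication score: %.3f\n", x$score))
  show_block(x$file_a, x$line_a)
  show_block(x$file_b, x$line_b)
  invisible(x)
}

# Usage with two small temporary files standing in for real source files
f1 <- tempfile(fileext = ".R")
f2 <- tempfile(fileext = ".R")
writeLines(c("a <- 1", "b <- 2", "c <- 3"), f1)
writeLines(c("x <- 0", "a <- 1", "b <- 2"), f2)
dup <- new_dup(data.frame(
  file_a = f1, file_b = f2, line_a = 1, line_b = 2, score = 0.5,
  stringsAsFactors = FALSE
))
print(dup, nlines = 2)
```

A Dups container would then just be a list of such objects, so print(dups[[1]]) dispatches to the method above.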