
Function for obtaining / printing the text for a pair of duplicated blocks #44

Open
russHyde opened this issue Nov 6, 2019 · 7 comments
Labels
enhancement New feature or request

Comments

@russHyde
Owner

russHyde commented Nov 6, 2019

For example,
print_dup(dup_df[1, ])

Or, if we change dupree to return a list of class Dups, where each entry is of class Dup, then
print(dups[[1]]) might be better syntax.
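Here is a minimal R sketch of the class-based idea. Everything in it is hypothetical: dupree does not define `Dup` or `Dups` classes, and the constructor names `new_dup()` / `new_dups()` and their fields (file, start line and text of each block, plus a score) are assumptions made for illustration. Once each entry carries class `"Dup"`, an S3 `print.Dup()` method is enough to make `print(dups[[1]])` work, because `[[` on the `Dups` list returns the element with its own class intact.

```r
# Hypothetical constructor: one duplicated pair of code blocks
new_dup <- function(file_a, line_a, text_a, file_b, line_b, text_b, score) {
  structure(
    list(
      file_a = file_a, line_a = line_a, text_a = text_a,
      file_b = file_b, line_b = line_b, text_b = text_b,
      score = score
    ),
    class = "Dup"
  )
}

# Hypothetical container: a list whose entries must all be Dup objects
new_dups <- function(...) {
  dups <- list(...)
  stopifnot(all(vapply(dups, inherits, logical(1), what = "Dup")))
  structure(dups, class = "Dups")
}

# S3 print method: the score, then each block's location followed by its code
print.Dup <- function(x, ...) {
  cat(sprintf("Score: %.3f\n", x$score))
  cat(sprintf("--- %s (line %d)\n", x$file_a, x$line_a))
  writeLines(x$text_a)
  cat(sprintf("--- %s (line %d)\n", x$file_b, x$line_b))
  writeLines(x$text_b)
  invisible(x)
}

# Usage
d <- new_dup("a.R", 3L, c("x <- 1", "y <- 2"), "b.R", 10L, c("x <- 1", "z <- 2"), 0.8)
dups <- new_dups(d)
print(dups[[1]])
```

A `print.Dups()` method could be added in the same way to print the whole collection as a summary.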

@russHyde russHyde mentioned this issue Nov 11, 2019
@russHyde
Owner Author

Note that the LCS algorithm in {stringdist} only computes a distance derived from the length of the LCS (the number of unpaired characters); it doesn't return the longest common subsequence itself. I can't find a good LCS implementation on CRAN (and don't want to depend on Bioconductor packages, since dupree is on CRAN now).

@russHyde
Owner Author

Maybe include an LCS implementation within dupree? (We could still use {stringdist} for computing the distances, but use a local LCS implementation for extracting the duplicated strings.)
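A sketch of what a local LCS could look like: the textbook dynamic-programming algorithm plus a backtracking pass that recovers the subsequence itself rather than only its length. The function name `lcs()` is hypothetical. It accepts any two atomic vectors, so it would work on single characters, on lines of code, or on integer-encoded tokens. For comparison, {stringdist}'s `"lcs"` distance equals `nchar(a) + nchar(b) - 2 * (LCS length)`. The nested R loops make this O(length(a) * length(b)) in time and memory, which is probably fine for block-sized inputs but slow for whole files.

```r
# Longest common subsequence of two atomic vectors, returning the
# subsequence itself (one of possibly several optimal ones).
lcs <- function(a, b) {
  n <- length(a)
  m <- length(b)
  # dp[i + 1, j + 1] holds the LCS length of a[1:i] and b[1:j]
  dp <- matrix(0L, nrow = n + 1, ncol = m + 1)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      dp[i + 1, j + 1] <- if (a[i] == b[j]) {
        dp[i, j] + 1L
      } else {
        max(dp[i, j + 1], dp[i + 1, j])
      }
    }
  }
  # Walk back from the bottom-right cell, filling the result from the end
  out <- vector(mode = typeof(a), length = dp[n + 1, m + 1])
  k <- length(out)
  i <- n
  j <- m
  while (i > 0 && j > 0) {
    if (a[i] == b[j]) {
      out[k] <- a[i]
      k <- k - 1
      i <- i - 1
      j <- j - 1
    } else if (dp[i, j + 1] >= dp[i + 1, j]) {
      i <- i - 1
    } else {
      j <- j - 1
    }
  }
  out
}

# Usage: common lines of two (made-up) code blocks
lcs(c("x <- 1", "y <- 2", "z <- 3"), c("x <- 1", "w <- 0", "z <- 3"))
```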

@russHyde
Owner Author

Alternatively, we could call {textreuse} with the original code strings rather than with integer vectors, but that would require r-textreuse to be pushed to conda-forge for me to use it locally.

@russHyde
Owner Author

For now, just print the contents of the two (or more) blocks. Finding the actual LCS can be implemented at a later stage.
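A minimal sketch of that "just print the blocks" step. The name `print_dup()` is the one floated at the top of this issue, but this body is hypothetical. It assumes a one-row data frame with the `file_a`, `line_a`, `file_b` and `line_b` columns used by the `dup_diff()` code in this thread, and, because the row is assumed to carry only each block's start line, it shows a fixed `nlines` lines from each start.

```r
# Hypothetical helper: print the code at each block of one duplicated pair.
# Assumes `dup_row` is a one-row data frame with columns
# file_a, line_a, file_b and line_b.
print_dup <- function(dup_row, nlines = 10) {
  for (side in c("a", "b")) {
    file <- as.character(dup_row[[paste0("file_", side)]])
    start <- dup_row[[paste0("line_", side)]]
    code <- readLines(file)
    # Clamp the range so we never index past the end of the file
    end <- min(start + nlines - 1, length(code))
    cat(sprintf("==> %s (lines %d-%d)\n", file, start, end))
    writeLines(code[start:end])
  }
  invisible(dup_row)
}

# Usage, with dup_df as in print_dup(dup_df[1, ]) above:
# print_dup(dup_df[1, ], nlines = 15)
```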

@russHyde russHyde mentioned this issue Mar 30, 2020
@russHyde russHyde added the enhancement New feature or request label Mar 30, 2020
@adrientaudiere

Hi all,
I wrote a small function to view the diff between each pair of duplicated code blocks. I hope this helps somebody. It needs the {diffr} package (and {dplyr} for filtering).

dup_diff <- function(dupree_res, min_score = 0.45, nlines = 10) {
  # Keep only the duplicated pairs that score above `min_score`
  # (the native pipe |> needs R >= 4.1)
  dup_misc_filter <- dupree_res$dups_df |>
    dplyr::filter(score > min_score)

  res <- list()
  for (i in seq_len(nrow(dup_misc_filter))) {
    # One temporary sub-directory per duplicated pair
    pair_dir <- file.path(tempdir(), i)
    dir.create(pair_dir, showWarnings = FALSE)
    # Copy nlines + 1 lines, starting at each block's first line, into temp files
    writeLines(readLines(dup_misc_filter$file_a[i])[dup_misc_filter$line_a[i] + 0:nlines],
               file.path(pair_dir, "file_a"))
    writeLines(readLines(dup_misc_filter$file_b[i])[dup_misc_filter$line_b[i] + 0:nlines],
               file.path(pair_dir, "file_b"))
    # HTML diff widget comparing the two excerpts
    res[[i]] <- diffr::diffr(file.path(pair_dir, "file_a"),
                             file.path(pair_dir, "file_b"))
  }
  return(res)
}

For example,

library(dupree)

example_file <- system.file("extdata", "duplicated.R", package = "dupree")
dup <- dupree(example_file, min_block_size = 10)
dup
dif <- dup_diff(dup)

@russHyde
Owner Author

Neat. Thanks

@adrientaudiere

I improved the previous script a little: the temporary files are now named after the source files, and filter() is called as dplyr::filter().

dup_diff <- function(dupree_res, min_score = 0.45, nlines = 10) {
  # Keep only the duplicated pairs that score above `min_score`
  dup_misc_filter <- dupree_res$dups_df |>
    dplyr::filter(score > min_score)

  res <- list()
  for (i in seq_len(nrow(dup_misc_filter))) {
    # showWarnings is an argument of dir.create(), not of paste0()
    pair_dir <- file.path(tempdir(), i)
    dir.create(pair_dir, showWarnings = FALSE)
    # Prefix with a_/b_ and keep the original file name to identify each side
    file_a_temp <- file.path(pair_dir, paste0("a_", basename(dup_misc_filter$file_a[i])))
    file_b_temp <- file.path(pair_dir, paste0("b_", basename(dup_misc_filter$file_b[i])))
    writeLines(readLines(dup_misc_filter$file_a[i])[dup_misc_filter$line_a[i] + 0:nlines],
               file_a_temp)
    writeLines(readLines(dup_misc_filter$file_b[i])[dup_misc_filter$line_b[i] + 0:nlines],
               file_b_temp)
    res[[i]] <- diffr::diffr(file_a_temp, file_b_temp)
  }
  return(res)
}
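One remaining rough edge: when a block's start line plus nlines runs past the end of its file, indexing the readLines() result returns NA, and writeLines() writes those as literal "NA" lines that then show up in the diff. A minimal sketch of a hypothetical helper, read_block(), that truncates at the end of the file instead:

```r
# Hypothetical helper: lines `start` to `start + nlines` of `path`,
# truncated at the end of the file instead of padded with NA.
read_block <- function(path, start, nlines = 10) {
  code <- readLines(path)
  idx <- start + 0:nlines
  code[idx[idx <= length(code)]]
}
```

It could stand in for the readLines() + index expression inside dup_diff(), e.g. `read_block(dup_misc_filter$file_a[i], dup_misc_filter$line_a[i], nlines)`.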
