Skip to content

Commit 2e331e9

Browse files
author
Tyler Rinker
committed
* classification_template added to manually add a classification script
template. This template has a suggested **termco** based workflow that may be useful for classification projects. Updated inst/extra_docs/02_classification.R close #60 close #59
1 parent 2030a21 commit 2e331e9

15 files changed

+268
-62
lines changed

DESCRIPTION

+4-3
Original file line numberDiff line numberDiff line change
@@ -4,13 +4,14 @@ Version: 0.5.4
44
Authors@R: c(person("Tyler", "Rinker", email = "[email protected]", role = c("aut", "cre")), person("Steven",
55
"Simpson", role = "ctb"))
66
Maintainer: Tyler Rinker <[email protected]>
7-
Description: A suite of functions used to count terms and substrings in strings. The tools can be used to build an expert rules, regular expression based text classification model.
7+
Description: A suite of functions used to count terms and substrings in strings. The tools can be used to build an
8+
expert rules, regular expression based text classification model.
89
Depends: R (>= 3.4.0)
910
Imports: ca, clipr, cowplot, data.table, dplyr (>= 0.4.3), ggplot2, ggraph, ggstance, gofastr (>= 0.3.0), graphics,
1011
grDevices, grid, gridExtra, igraph, jsonlite, methods, numform (>= 0.2.1), quanteda (>= 0.99.2), slam,
11-
SnowballC, stats, stringi, textshape (>= 1.5.2), tidyr, tm, utils, wordcloud
12+
SnowballC, stats, stringi, textclean, textshape (>= 1.5.2), tidyr, tm, utils, wordcloud
1213
Suggests: intergraph, knitr, lexicon, rmarkdown, qdapRegex, rgl, testthat
13-
Date: 2018-01-21
14+
Date: 2018-01-22
1415
License: MIT + file LICENSE
1516
LazyData: TRUE
1617
Roxygen: list(wrap = FALSE)

NAMESPACE

+1
Original file line numberDiff line numberDiff line change
@@ -56,6 +56,7 @@ export(as_term_list)
5656
export(as_terms)
5757
export(assign_validation_task)
5858
export(classification_project)
59+
export(classification_template)
5960
export(classify)
6061
export(collapse_tags)
6162
export(colo)

NEWS

+4
Original file line numberDiff line numberDiff line change
@@ -59,6 +59,10 @@ NEW FEATURES
5959
* `read_term_list`/`write_term_list` added to aid in the reading in/writing out
6060
and formating of term list files.
6161

62+
* `classification_template` added to manually add a classification script
63+
template. This template has a suggested **termco** based workflow that may be
64+
useful for classification projects.
65+
6266
MINOR FEATURES
6367

6468
* `important_terms` picks up a plot method corresponding to the `frequent_terms`

NEWS.md

+4
Original file line numberDiff line numberDiff line change
@@ -59,6 +59,10 @@ termco 0.5.0 -
5959
* `read_term_list`/`write_term_list` added to aid in the reading in/writing out
6060
and formating of term list files.
6161

62+
* `classification_template` added to manually add a classification script
63+
template. This template has a suggested **termco** based workflow that may be
64+
useful for classification projects.
65+
6266
**MINOR FEATURES**
6367

6468
* `important_terms` picks up a plot method corresponding to the `frequent_terms`

R/classification_project.R

+2-2
Original file line numberDiff line numberDiff line change
@@ -39,8 +39,8 @@ classification_project <- function (path = "new", open = is.global(2)){
3939
cat(paste(rproj, collapse = "\n"), file = file.path(path, ".Rproj"))
4040

4141
cat(paste(dat_clean, collapse = "\n"), file = file.path(path, "scripts", "01_data_cleaning.R"))
42-
script <- system.file("extra_docs/02_classification.R", package = "termco")
43-
file.copy(script, file.path(path, "scripts"), TRUE, TRUE)
42+
classification_template(path = 'scripts/02_classification.R', file.ext = 'csv',
43+
categories.file = 'categories/categories.R')
4444

4545
verify <- all(c(dirs, ".Rproj") %in% dir(path, all.files = TRUE)) &&
4646
all(file.exists(file.path(path, c("categories/categories.R",

R/classification_template.R

+69
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,69 @@
1+
#' Generate a Basic Classification Analysis Template
2+
#'
3+
#' Generates a basic template for a classification using \pkg{termco} tools.
4+
#'
5+
#' @param path Path to classification template script.
6+
#' @param file.ext A supported file extension for the text data. This dictates
7+
#' the packages used in the template for reading in data. Currently,
8+
#' '.csv' (\pkg{readr})and '.fst' (\pkg{fst}) are supported. This can be
9+
#' manually changed after the template is created.
10+
#' @param categories.file A path to the categories (term list) file. Defaults
11+
#' to \file{'scripts/02_classification.R'}.
12+
#' @param verbose logical. If \code{TRUE} a message is printed if the template
13+
#' inclusion was successful.
14+
#' @param \ldots ignored.
15+
#' @export
16+
#' @examples
17+
#' \dontrun{
18+
#' classification_template()
19+
#' }
20+
classification_template <- function(
21+
path = 'scripts/02_classification.R', file.ext = 'fst',
22+
categories.file = 'categories/categories.R', verbose = TRUE, ...
23+
){
24+
25+
26+
if (dirname(path) != '.' && !dir.exists(dirname(path))) {
27+
dir.create(dirname(path), recursive = TRUE)
28+
}
29+
30+
if (path == Sys.getenv("R_HOME")) stop("path can not be `R_HOME`")
31+
if (file.exists(path)) {
32+
message(paste0("\"", path, "\" already exists:\nDo you want to overwrite?\n"))
33+
ans <- utils::menu(c("Yes", "No"))
34+
if (ans == "2") {
35+
stop("template write aborted")
36+
} else {
37+
unlink(path, recursive = TRUE, force = FALSE)
38+
}
39+
}
40+
41+
script <- system.file("extra_docs/02_classification.R", package = "termco")
42+
43+
read_package <- switch(file.ext,
44+
csv = 'readr',
45+
fst = 'fst',
46+
{
47+
warning('not a supported file.ext; either "csv" or "fst"\n Using \'csv\'; this can be manually changed after file is created')
48+
'csv'
49+
}
50+
)
51+
52+
53+
out <- textclean::mgsub(
54+
readLines(script),
55+
c('{{read_package}}', '{{read_file_type}}', '{{categories_file}}'),
56+
c(read_package, file.ext, categories.file)
57+
)
58+
59+
cat(out, file = path, sep = '\n')
60+
61+
62+
if (file.exists(path) && verbose) {
63+
message(sprintf('\'%s\' exists!\nLooks like everything went according to plan!', path))
64+
}
65+
66+
if (!file.exists(path)) {
67+
warning(sprintf('\'%s\' does not appear to exist\nLooks like something went wrong', path))
68+
}
69+
}

R/probe_colo_plot_list.R

+6-2
Original file line numberDiff line numberDiff line change
@@ -68,21 +68,25 @@ probe_colo_plot_list <- function(terms, data.name, copy2clip = getOption("termco
6868
#' \code{probe_colo_plot} - Make the plots of \code{probe_colo_plot_list}
6969
#' directly to an external \file{.pdf} file.
7070
#' @param data A vector of character strings.
71-
#' @param file A \file{.pdf} file to plot to.
71+
#' @param file A path for the \file{.pdf} file to plot to.
7272
#' @param width The width of the graphics region in inches.
7373
#' @param height The height of the graphics region in inches.
7474
#' @param \ldots Other arguments passed to frequent terms.
7575
#' @family probe functions
7676
#' @export
7777
#' @rdname probe_colo__plot
78-
probe_colo_plot <- function(terms, data, file = "Rplots.pdf", width = 5.5, height = 7, ...){
78+
probe_colo_plot <- function(terms, data,
79+
file = sprintf('%scolo_term_plots.pdf', ifelse(dir.exists('plots/'), 'plots', '')),
80+
width = 5.5, height = 7, ...){
7981

8082
terms <- stringi::stri_escape_unicode(terms)
8183

8284
grDevices::pdf(file=file, width = width, height = height)
85+
8386
for(i in terms){
8487
print(graphics::plot(search_term_collocations(data, i, ...)) + ggplot2::ggtitle(i))
8588
Sys.sleep(1)
8689
}
90+
8791
grDevices::dev.off()
8892
}

R/read_term_list.R

+22-7
Original file line numberDiff line numberDiff line change
@@ -67,7 +67,7 @@
6767
#' )
6868
#'
6969
#' ## writing to the console (not that useful)
70-
#' write_term_list(discoure_markers)
70+
#' write_term_list(discoure_markers, path = '')
7171
#'
7272
#' trpl_list <- list(
7373
#' list(
@@ -81,7 +81,7 @@
8181
#' )
8282
#'
8383
#' ## writing to the console (not that useful)
84-
#' write_term_list(trpl_list)
84+
#' write_term_list(trpl_list, path = '')
8585
#'
8686
#' ## Writing to an external file
8787
#' temp <- tempdir()
@@ -101,7 +101,7 @@
101101
#' stringi::stri_unescape_unicode() %>%
102102
#' cat(file = 'testing.json')
103103
#' }
104-
read_term_list <- function(path, indices = NULL, term.list, ...){
104+
read_term_list <- function(path = 'categories/categories.R', indices = NULL, term.list, ...){
105105

106106
obj <- 'unspecified_termco_obj1234'
107107

@@ -198,7 +198,7 @@ read_term_list <- function(path, indices = NULL, term.list, ...){
198198

199199
#' @rdname read_term_list
200200
#' @export
201-
source_term_list <- function(path, indices = NULL, ...){
201+
source_term_list <- function(path = 'categories/categories.R', indices = NULL, ...){
202202

203203
read_term_list(path = path, collapse = FALSE, indices = indices, ...)
204204

@@ -344,7 +344,7 @@ search_open_or <- function(x, ...){
344344
}
345345

346346
## json write double backslashes
347-
write_model <- function(term.list, path, ...) {
347+
write_model <- function(term.list, path = 'categories/model_categories.json', ...) {
348348

349349
df <- textshape::tidy_list(lapply(term.list, textshape::tidy_list, 'tag', 'regex'), 'iteration')
350350

@@ -369,15 +369,30 @@ write_model <- function(term.list, path, ...) {
369369

370370
#' Title
371371
#'
372-
#' \code{write_term_list} - Write-out a term list out to a file.
372+
#' \code{write_term_list} - Write-out a term list to a file.
373373
#'
374374
#' @rdname read_term_list
375375
#' @export
376-
write_term_list <- function(term.list, path = "", ...){
376+
write_term_list <- function(term.list, path = 'categories/categories.R', ...){
377377

378378
stopifnot(is.list(term.list))
379379
stopifnot(!is.list(term.list[[1]][[1]]))
380380

381+
if (!is.null(path) && path != '' && dirname(path) != '.' && !dir.exists(dirname(path))) {
382+
dir.create(dirname(path), recursive = TRUE)
383+
}
384+
385+
if (!is.null(path) && path == Sys.getenv("R_HOME")) stop("path can not be `R_HOME`")
386+
if (!is.null(path) && file.exists(path)) {
387+
message(paste0("\"", path, "\" already exists:\nDo you want to overwrite?\n"))
388+
ans <- utils::menu(c("Yes", "No"))
389+
if (ans == "2") {
390+
stop("template write aborted")
391+
} else {
392+
unlink(path, recursive = TRUE, force = FALSE)
393+
}
394+
}
395+
381396
if (!is.list(term.list[[1]])) {
382397

383398
cat(sprintf(list1, paste(unlist(unname(Map(function(x, y){

R/term_list_template.R

+8-4
Original file line numberDiff line numberDiff line change
@@ -16,10 +16,14 @@
1616
#' @export
1717
#' @examples
1818
#' cats <- c("Summons", "Justification", "Exclamation", "Empty")
19-
#' term_list_template(cats)
20-
#' term_list_template(cats, hierarchical = FALSE)
21-
term_list_template <- function(categories = NULL, path = NULL, hierarchical = TRUE,
22-
overwrite = FALSE, copy2clip = getOption("termco.copy2clip"), ...) {
19+
#' term_list_template(cats, path = '')
20+
#' term_list_template(cats, hierarchical = FALSE, path = '')
21+
term_list_template <- function(categories = NULL, path = 'categories/categories.R',
22+
hierarchical = TRUE, overwrite = FALSE, copy2clip = getOption("termco.copy2clip"), ...) {
23+
24+
if (!is.null(path) && path != '' && dirname(path) != '.' && !dir.exists(dirname(path))) {
25+
dir.create(dirname(path), recursive = TRUE)
26+
}
2327

2428
if (!is.null(path) && path == Sys.getenv("R_HOME")) stop("path can not be `R_HOME`")
2529
if (!is.null(path) && file.exists(path)) {

README.Rmd

+1
Original file line numberDiff line numberDiff line change
@@ -63,6 +63,7 @@ Most of the functions *count*, *search*, *plot* terms, and *covert* between outp
6363
| `match_word` | search | Extract words from a text matching a regular expression |
6464
| `search_term_collocations` | search | Wrapper for `search_term` + `frequent_terms` |
6565
| `classification_project` | modeling | Make a classification modeling project template |
66+
| `classification_template` | modeling | Make a classification analysis script template |
6667
| `as_dtm`/`as_tdm` | modeling | Coerce `term_count` object into `tm::DocumentTermMatrix`/`tm::TermDocumentMatrix` |
6768
| `split_data` | modeling | Split data into `train` & `test` sets |
6869
| `evaluate` | modeling | Check accuracy of model against human coder |

0 commit comments

Comments
 (0)