Hey EDIutils team! I had a conversation with Colin Smith and Greg Maurer recently about creating a make_query function to help make Solr queries for people with some R literacy but limited prior exposure to Solr. The hope is that this new function would make it easier for R users to make good use of EDIutils::search_data_packages.
I've taken a stab at this function and will attach the full code to this issue. Note that I also wrote two helper functions, solr_wild and solrize, to make the internal components of make_query as streamlined as possible. I'm definitely a novice to Solr queries, so make_query may be missing crucial arguments, but I think it's a reasonable starting point: it is built to be semi-modular and could easily support additional arguments. All functions are written in base R (version 4.3.2).
Let me know if this doesn't work on your end and/or if you'd like me to make any changes before it could possibly be built into EDIutils. Thanks!
Function Demo Script
# Load needed libraries
library(EDIutils)
# Clear environment
rm(list = ls())
# Define helper function
## Swaps human equivalents of wildcards for Solr wildcard
solr_wild <- function(bit){
# Handle empty `bit`
if(is.null(bit) == TRUE){
# Replace with wildcard
bit_v2 <- "*"
}
# Handle English equivalents for wildcard
else if(length(bit) == 1){
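# Replace allowed keywords with wildcard
## Pattern is anchored so that words merely containing "all"/"any" (e.g., "rainfall") are left alone
bit_v2 <- gsub(pattern = "^(all|any)$", replacement = "*", x = bit)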
}
# If neither condition is met, return whatever was originally supplied
else { bit_v2 <- bit }
# Return finished product
return(bit_v2) }
# Example(s)
solr_wild(bit = NULL)
solr_wild(bit = "any")
solr_wild(bit = "something else")
# Define helper function
## Parses English text into Solr syntax (i.e., right delimiters, etc.)
solrize <- function(bit){
# Replace spaces with hyphens
bit_v2 <- gsub(pattern = " ", replacement = "-", x = bit)
# If more than one value, handle that
if(length(bit_v2) > 1){
# Collapse with plus signs
bit_v3 <- paste0("(", paste0(bit_v2, collapse = "+"), ")")
} else { bit_v3 <- bit_v2 }
# Return finished bit
return(bit_v3) }
# Example(s)
solrize(bit = c("primary production", "plants"))
# Define function to generate query
make_query <- function(keywords = NULL, subjects = NULL, authors = NULL,
scopes = NULL, excl_scopes = NULL,
return_fields = "all", limit = 10){
## Error Checking ----
# Define supported return 'return_fields'
good_fields <- c("*", "all", "abstract", "begindate", "doi", "enddate", "funding", "geographicdescription", "id", "methods", "packageid", "pubdate", "responsibleParties", "scope", "site", "taxonomic", "title", "authors", "spatialCoverage", "sources", "keywords", "organizations", "singledates", "timescales")
# Error out for unsupported ones
if(all(return_fields %in% good_fields) != TRUE)
stop("Unrecognized return field(s): ",
paste(base::setdiff(x = return_fields, y = good_fields), collapse = "; "))
# Fall back to the default (with a message) for non-numeric limit
if(is.numeric(limit) != TRUE){
message("`limit` must be numeric, coercing to 10")
limit <- 10 }
## Solr Query Construction ----
# Make start of query object
query_v0 <- "q="
# Handle keywords (if none are provided, `NULL` becomes the wildcard):
### 1. Turn into Solr Syntax
solr_kw <- solrize(bit = solr_wild(bit = keywords))
### 2. Add to query
query_v1 <- paste0(query_v0, "keyword:", solr_kw)
# Handle authors
solr_aut <- solrize(bit = solr_wild(bit = authors))
query_v2 <- paste0(query_v1, "&fq=", "author:", solr_aut)
# Handle subjects
solr_sub <- solrize(bit = solr_wild(bit = subjects))
query_v3 <- paste0(query_v2, "&fq=", "subject:", solr_sub)
# Handle scopes
solr_scp <- solrize(bit = solr_wild(bit = scopes))
query_v4 <- paste0(query_v3, "&fq=", "scope:", solr_scp)
# EXCLUDED scopes
## Handled differently because don't want to swap `NULL` for wildcard
if(is.null(excl_scopes) != TRUE){
# Solr-ize
solr_excl_scp <- solrize(bit = excl_scopes)
# Add to query
query_v5 <- paste0(query_v4, "&fq=", "-scope:", solr_excl_scp)
# Or skip
} else { query_v5 <- query_v4 }
# Parse return fields
## Solr syntax for multiple entries differs here from other elements of query
solr_fl <- paste(solr_wild(bit = return_fields), collapse=",")
query_v6 <- paste0(query_v5, "&fl=", solr_fl)
# Finally, assemble full query with row limit
solr_query <- paste0(query_v6, "&rows=", limit)
# Return that to the user
return(solr_query) }
# Invoke function
( request <- make_query(keywords = "*",
scopes = "knb-lter-fce",
excl_scopes = c("ecotrends", "lter landsat"),
return_fields = c("title", "authors", "id", "doi"),
limit = 10) )
# Test assembled query
EDIutils::search_data_packages(query = request)
# Test use of `make_query` inside of `search_data_packages`
EDIutils::search_data_packages(query = make_query(excl_scopes = "knb-lter-fce",
return_fields = c("title", "id")))
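One way to sanity-check an assembled query without contacting the server is to split the string back into its Solr parameters. The sketch below is only an illustration: `parse_query` is a hypothetical helper (not part of EDIutils), and the `request` string is written out by hand as what the first `make_query` call above is expected to return.

```r
# Query string expected from the first `make_query` call above (typed by hand here)
request <- paste0("q=keyword:*&fq=author:*&fq=subject:*&fq=scope:knb-lter-fce",
                  "&fq=-scope:(ecotrends+lter-landsat)",
                  "&fl=title,authors,id,doi&rows=10")

# Hypothetical helper: split a Solr query string into its parameters
parse_query <- function(query){
  # Split on "&" to get each "name=value" pair
  pairs <- strsplit(x = query, split = "&", fixed = TRUE)[[1]]
  # Everything before the first "=" is the parameter name
  nms <- sub(pattern = "=.*$", replacement = "", x = pairs)
  # Everything after the first "=" is the value
  vals <- sub(pattern = "^[^=]*=", replacement = "", x = pairs)
  # Group values by parameter name (e.g., all `fq` filters together)
  split(x = vals, f = factor(nms, levels = unique(nms)))
}

# Inspect the pieces
parts <- parse_query(query = request)
parts$q
parts$fq
parts$fl
parts$rows
```

Seeing the `fq` filters side by side makes it easier to spot a filter that was added by accident or a scope that was not excluded as intended.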
I just heard about the query function in the dataone package which seems like it could be a nice 'middle path' for constructing Solr queries (see here).
Users can create their own Solr queries (A) by hand/manually, (B) by supplying a named list that breaks queries into four chunks, or (C) by using something like the function I supplied above where each Solr parameter is mapped to a separate argument.
I'm biased, but I think mapping each parameter to its own argument is novel enough (relative to dataone::query) to warrant inclusion as its own function. Still, I wanted to point out that a similar function already exists.
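To make the difference between options (A) and (B) concrete, here is a rough sketch in base R. It assumes the "four chunks" are the standard Solr parameters `q`, `fq`, `fl`, and `rows`; whether dataone::query expects exactly this list shape is an assumption on my part. `list_to_query` is a hypothetical helper written just for this comparison. Option (C) is the `make_query` function from the script above.

```r
# (A) By hand: the whole query typed as a single string
by_hand <- "q=keyword:*&fq=scope:knb-lter-fce&fl=title,id&rows=10"

# (B) Named list: one element per Solr parameter (names assumed, see note above)
as_list <- list(q = "keyword:*",
                fq = "scope:knb-lter-fce",
                fl = "title,id",
                rows = "10")

# Hypothetical helper: collapse the named list into a query string
list_to_query <- function(params){
  # Pair each name with its value, then join the pairs with "&"
  paste0(names(params), "=", unlist(params), collapse = "&")
}

# Both routes should describe the same query
from_list <- list_to_query(params = as_list)
identical(by_hand, from_list)
```

The list form keeps each Solr parameter separate, but the user still has to know the Solr field syntax inside each element, which is the gap that per-argument mapping is meant to close.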