-
Notifications
You must be signed in to change notification settings - Fork 72
General best-practices cleanup #51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Avoid calling jq more than once per operation. To do this, we have `jq` generate assignments with eval-safe escaping setting shell variables. - Avoid passing anything but options through eval (keeping options themselves from being eval'd will be a compatibility-breaking change, and is the subject of cloud-gov#50). - Avoid using `echo` to handle non-constant data (which introduces substantial portability problems across different implementations of `sh`; see https://unix.stackexchange.com/a/65819/3113 and https://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html). - Handle "null" return value from `aws s3api list-objects` gratefully (cloud-gov#49)
assets/check
Outdated
| payload=`cat` | ||
| bucket=$(echo "$payload" | jq -r '.source.bucket') | ||
| prefix="$(echo "$payload" | jq -r '.source.path // ""')" | ||
| eval "$(jq -r '.source | [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the reason to prefer using eval over the existing code is this:
Avoid calling jq more than once per operation. To do this, we have jq generate assignments with eval-safe escaping setting all required shell variables from a single invocation.
The code is definitely more efficient, but it's the "eval-safe escaping" part that worries me. Since there are numerous articles identifying the perils of eval.
I'm also a bit confused given #50. If that issue is saying that eval presents a shell-injection risk, which is my primary concern, then why refactor the code to use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The very short form of the argument is that jq's @sh is trustworthy because it doesn't try to be clever. :)
I've reported my share of shell injection bugs in code (mostly in Java) that tried to do eval-safe escaping and got it wrong (see f/e MSHARED-297 and PLXUTILS-161), but in all those cases the tools in question were trying to do the minimum quoting necessary, leaving trivial content unquoted or using double quotes instead of single quotes if there were literal single quotes in the string to be escaped, or otherwise attempting to do something interesting to generate "better" output.
The POSIX sh rules for single-quoted strings, though, are dirt simple: Absolutely everything is literal, including backslashes, newlines, and every other character until the next single quote; no mechanism to escape them is allowed (so one cannot have a single quote inside a single-quoted string). To insert a literal single quote character, jq exits the single-quoted string, adds a backslash-escaped single quote in an unquoted context, and then reenters the single-quoted context.
Dirt simple, next to impossible to get wrong as long as its authors continue to reject any PRs that would try to make @sh clever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To put it a bit differently: eval presents a shell injection risk whenever it's passed non-constant data that hasn't been through a correct implementation of eval-safe escaping.
In the PR here, the varname= parts are constant, and the values to the right of them go through jq's eval-safe escaping implementation, which (as discussed above) is implemented in a simple enough way to be easy to audit as trivially correct for all POSIX-compliant shells.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said -- if y'all are willing to give up sh compatibility to switch to bash, there are options that become available to drop eval, avoid shell injection, and still run jq only once; something like:
declare -A source_vars=( )
while IFS='' read -r -d '' element; do
[[ $element = *=* ]] || continue
source_vars[${element%%=*}]=${element#*=}
done < <(jq -j '
def clean_nuls: . | sub("\u0000"; "<NUL>");
.source | to_entries[] | "\(.key|clean_nuls)=\(.value|clean_nuls)\u0000"
')after which we work with ${source_vars[bucket]} and the like. That way we import every defined variable, but unexpected values like PATH, LD_PRELOAD, &c can't be used to attack the system -- one needs to explicitly set AWS_ACCESS_KEY_ID=${source_vars[access_key_id]} to get variables out of the map into a context where they're meaningful to anything that doesn't know about the associative array and intentionally access it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation @charles-dyfis-net ! Ultimately, given that the only benefit of these changes is to eliminate duplicate calls to jq, I'm inclined not to introduce the use of eval. While the changes may be safe in their current form as you say, if jq changes it @sh behavior, that could change. Or, perhaps more likely, other maintainers of this code (myself included) who don't understand the risks of eval as well could introduce a shell-injection risk.
So I'd like to remove the changes to use eval
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated per request. Let me know if you want the change and its revert squashed together.
I do plan on a separate PR to use xargs to remove the eval for options -- might come back with a concrete eval-free single-jq-invocation config-parsing implementation then.
|
BTW, one thing that may be notable in the recently added commit (returning to a separate jq call per variable) is the change from If we were specifying a specific shell (like bash), making assumptions about |
|
Approved. Thanks @charles-dyfis-net for the contribution! |
Changes proposed in this pull request:
jqgenerate assignments with eval-safe escaping setting all required shell variables from a single invocation.evalintroduces possibility of shell injection unnecessarily. #50). The prior logic runningeval aws s3 sync "s3://$bucket/$path" $dest $optionsgained none of the benefit from quotes arounds3://$bucket/$path, becauseevalitself concatenated the literal arguments (with syntactic quotes discarded from the quote-removal parsing stage) into a single string before starting the parsing process over from the beginning; the new logic no longer usesevalfor this entire command, but instead only usesevalwhen processing$optionsinto"$@".echoto handle non-constant data (which introduces substantial portability problems across different implementations ofsh; see https://unix.stackexchange.com/a/65819/3113 and https://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html). Also avoidecho $variablewhere$variableis unquoted, which can result in values being glob-expanded before being passed toecho, as well as transforming characters in IFS to spaces and collapsing runs of multiple such characters to a single instance per each.aws s3api list-objectsgratefully (fixes Check fails if bucket is empty #49)"$0"inside"$(dirname "$0")"; the modern command substitution format used here introduces a new quoting context, so directory names with spaces (or an unexpectedIFSvalue) could resultdirnamebeing passed an unexpected number of arguments with the old code.Security considerations
Because we deliberately do not modify the behavior described in #50, the preexisting opportunity for shell injection via
$optionsremains intact. However, we do narrow the code to only beevaling$options, so it's no longer possible to perform shell injection via$bucketor$path.In every other respect, this PR works to reduce runtime ambiguity, and thus to also reduce attack surface.