Skip to content

Conversation

@charles-dyfis-net
Copy link

@charles-dyfis-net charles-dyfis-net commented Dec 28, 2022

Changes proposed in this pull request:

  • Avoid calling jq more than once per operation. To do this, we have jq generate assignments with eval-safe escaping setting all required shell variables from a single invocation.
  • Avoid passing anything but options through eval (keeping options themselves from being eval'd will be a compatibility-breaking change, and is the subject of Use of eval introduces possibility of shell injection unnecessarily. #50). The prior logic running eval aws s3 sync "s3://$bucket/$path" $dest $options gained none of the benefit from quotes around s3://$bucket/$path, because eval itself concatenated the literal arguments (with syntactic quotes discarded from the quote-removal parsing stage) into a single string before starting the parsing process over from the beginning; the new logic no longer uses eval for this entire command, but instead only uses eval when processing $options into "$@".
  • Avoid using echo to handle non-constant data (which introduces substantial portability problems across different implementations of sh; see https://unix.stackexchange.com/a/65819/3113 and https://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html). Also avoid echo $variable where $variable is unquoted, which can result in values being glob-expanded before being passed to echo, as well as transforming characters in IFS to spaces and collapsing runs of multiple such characters to a single instance per each.
  • Handle "null" return value from aws s3api list-objects gratefully (fixes Check fails if bucket is empty #49)
  • Quote "$0" inside "$(dirname "$0")"; the modern command substitution format used here introduces a new quoting context, so directory names with spaces (or an unexpected IFS value) could result dirname being passed an unexpected number of arguments with the old code.

Security considerations

Because we deliberately do not modify the behavior described in #50, the preexisting opportunity for shell injection via $options remains intact. However, we do narrow the code to only be evaling $options, so it's no longer possible to perform shell injection via $bucket or $path.

In every other respect, this PR works to reduce runtime ambiguity, and thus to also reduce attack surface.

- Avoid calling jq more than once per operation. To do this, we have `jq` generate assignments with eval-safe escaping setting shell variables.
- Avoid passing anything but options through eval (keeping options themselves from being eval'd will be a compatibility-breaking change, and is the subject of cloud-gov#50).
- Avoid using `echo` to handle non-constant data (which introduces substantial portability problems across different implementations of `sh`; see https://unix.stackexchange.com/a/65819/3113 and https://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html).
- Handle "null" return value from `aws s3api list-objects` gratefully (cloud-gov#49)
assets/check Outdated
payload=`cat`
bucket=$(echo "$payload" | jq -r '.source.bucket')
prefix="$(echo "$payload" | jq -r '.source.path // ""')"
eval "$(jq -r '.source | [

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the reason to prefer using eval over the existing code is this:

Avoid calling jq more than once per operation. To do this, we have jq generate assignments with eval-safe escaping setting all required shell variables from a single invocation.

The code is definitely more efficient, but it's the "eval-safe escaping" part that worries me. Since there are numerous articles identifying the perils of eval.

I'm also a bit confused given #50. If that issue is saying that eval presents a shell-injection risk, which is my primary concern, then why refactor the code to use it?

Copy link
Author

@charles-dyfis-net charles-dyfis-net Jan 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The very short form of the argument is that jq's @sh is trustworthy because it doesn't try to be clever. :)

I've reported my share of shell injection bugs in code (mostly in Java) that tried to do eval-safe escaping and got it wrong (see f/e MSHARED-297 and PLXUTILS-161), but in all those cases the tools in question were trying to do the minimum quoting necessary, leaving trivial content unquoted or using double quotes instead of single quotes if there were literal single quotes in the string to be escaped, or otherwise attempting to do something interesting to generate "better" output.

The POSIX sh rules for single-quoted strings, though, are dirt simple: Absolutely everything is literal, including backslashes, newlines, and every other character until the next single quote; no mechanism to escape them is allowed (so one cannot have a single quote inside a single-quoted string). To insert a literal single quote character, jq exits the single-quoted string, adds a backslash-escaped single quote in an unquoted context, and then reenters the single-quoted context.

Dirt simple, next to impossible to get wrong as long as its authors continue to reject any PRs that would try to make @sh clever.

Copy link
Author

@charles-dyfis-net charles-dyfis-net Jan 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To put it a bit differently: eval presents a shell injection risk whenever it's passed non-constant data that hasn't been through a correct implementation of eval-safe escaping.

In the PR here, the varname= parts are constant, and the values to the right of them go through jq's eval-safe escaping implementation, which (as discussed above) is implemented in a simple enough way to be easy to audit as trivially correct for all POSIX-compliant shells.

Copy link
Author

@charles-dyfis-net charles-dyfis-net Jan 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said -- if y'all are willing to give up sh compatibility to switch to bash, there are options that become available to drop eval, avoid shell injection, and still run jq only once; something like:

declare -A source_vars=( )
while IFS='' read -r -d '' element; do
  [[ $element = *=* ]] || continue
  source_vars[${element%%=*}]=${element#*=}
done < <(jq -j '
  def clean_nuls: . | sub("\u0000"; "<NUL>");
  .source | to_entries[] | "\(.key|clean_nuls)=\(.value|clean_nuls)\u0000"
')

after which we work with ${source_vars[bucket]} and the like. That way we import every defined variable, but unexpected values like PATH, LD_PRELOAD, &c can't be used to attack the system -- one needs to explicitly set AWS_ACCESS_KEY_ID=${source_vars[access_key_id]} to get variables out of the map into a context where they're meaningful to anything that doesn't know about the associative array and intentionally access it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation @charles-dyfis-net ! Ultimately, given that the only benefit of these changes is to eliminate duplicate calls to jq, I'm inclined not to introduce the use of eval. While the changes may be safe in their current form as you say, if jq changes it @sh behavior, that could change. Or, perhaps more likely, other maintainers of this code (myself included) who don't understand the risks of eval as well could introduce a shell-injection risk.

So I'd like to remove the changes to use eval

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated per request. Let me know if you want the change and its revert squashed together.

I do plan on a separate PR to use xargs to remove the eval for options -- might come back with a concrete eval-free single-jq-invocation config-parsing implementation then.

@charles-dyfis-net
Copy link
Author

BTW, one thing that may be notable in the recently added commit (returning to a separate jq call per variable) is the change from echo to printf. For background on that, see the excellent answer by Stéphane Chazelas on Why is printf better than echo?, or the APPLICATION USAGE and RATIONALE sections of the POSIX standard for echo.

If we were specifying a specific shell (like bash), making assumptions about echo behavior would be slightly justifiable (though only slightly: even with bash, configuration parameters like xpg_echo modifying behaviors in the cases the standard describes as ambiguous can be set at compile time, or via environment variables at runtime, or via explicit runtime commands); but if we're using /bin/sh, best to avoid cases the POSIX sh standard describes as ambiguous altogether.

@markdboyd
Copy link

Approved. Thanks @charles-dyfis-net for the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Check fails if bucket is empty

2 participants