Code Golf

Part of the Google CTF 2019 (https://capturetheflag.withgoogle.com/)

Our task is to write a Haskell function g :: [String] -> String which takes a list of transparencies, i.e., Strings with characters and spaces and we have to stack them in such a way that no two characters overlap.

The maximum allowed length of the program (excluding imports) is 181 bytes.

These are the rules from the dask description:

For example, given the strings "ha  m" and "ck e":

If you overlay them:
    "ha  m" 
    "ck e" 

Shift them by an appropriate offset:
    "ha  m" 
      "ck e"

And combine them, you get "hackme".

For the data we're working with, the following rules seem to always hold:

1. The correct offset will never cause two characters to occupy the same column.
2. The correct offset will minimize the length of the final text after trimming
   leading and trailing spaces.
3. If there are multiple possible decryptions that follow the above rules, the
   lexicographically first one is correct.

To make matters worse, we only have a limited number of bytes in our payload:
your solution may contain at most 181 bytes of Haskell code.

We have the following modules which are already imported, meaning the import is not part of the character limit.

Prelude
Control.Applicative
Control.Arrow
Control.Monad
Data.Array
Data.Bits
Data.Bool
Data.Char
Data.Complex
Data.Dynamic
Data.Either
Data.Eq
Data.Fixed
Data.Function
Data.Graph
Data.Int
Data.Ix
Data.List
Data.Maybe
Data.Monoid
Data.Ord
Data.Ratio
Data.Tree
Data.Tuple
Data.Typeable
Data.Word
Debug.SimpleReflect
Text.Printf
ShowFun

"Uncompressed" solution

import Data.List

-- Check if two strings can be merged
-- zip stops at the end of shorter string which is fine since the rest of the longer string can obviously be merged
canMerge :: String -> String -> Bool
canMerge x = all (\(x,y) -> x==' ' || y==' ') . zip x

-- Take two transparencies and overlay them (assumes no two characters overlap)
merge :: String -> String -> String
merge []     y      = y
merge x      []     = x
merge (x:xs) (y:ys) = (if x == ' ' then y else x):(merge xs ys)

-- strip (since Data.Text is not imported)
-- this is necessary in case we get e.g. "zz  " and "aa" as input, since otherwise
-- "zzaa" would be preferred because it is shorter than "aazz  " although the
-- latter would result in "aazz" with the same length and being first in
-- lexicographical order
strip :: String -> String
strip = stripL . stripR
stripL :: String -> String
stripL = dropWhile (==' ')
stripR :: String -> String
stripR = reverse . stripL . reverse

-- move each string to be inserted space by space to the right until it can be
-- merged, insert as far left as possible
shiftAndMerge :: [String] -> String
shiftAndMerge = foldl (\r -> merge r . until (canMerge r) (' ':)) ""

-- the g function works as follows
-- 1. strip the input strings
-- 2. take all permutations of the input strings and for each one, beginning with
--    an empty string, insert every transparency in that order as far left as
--    possible
-- 3. sort in lexicographical order
-- 4. (stable) sort by length, if two strings have the same length they are still
--    sorted in lexicographical order from before
-- 5. return the first string
g :: [String] -> String
g = head . sortOn length . sort . map shiftAndMerge . permutations

Not that Data.List is manually imported here so the script can easily be run on my local machine, however, it is not counted when determining the program length.

Shortening `canMerge`

Starting with the uncompressed function:

-- Check if two strings can be merged
-- zip stops at the end of shorter string which is fine since the rest of the longer string can obviously be merged
canMerge :: String -> String -> Bool
canMerge x = all (\(x,y) -> x==' ' || y==' ') . zip x

Removing all comments, the signature and unnecessary whitespaces brings the function down to 44 bytes:

canMerge x=all(\(x,y)->x==' '||y==' ').zip x

Replacing the two comparisons with an elem call saves one additional byte (43 bytes). Note that the whitespace after elem can not be removed.

canMerge x=all(\(x,y)->elem ' '[x,y]).zip x

We can save another 7 bytes (plus 7 bytes for the call to canMerge) bringing it down to 36 bytes by giving it a shorter name, however, the function will be turned into a lambda function later anyways.

c x=all(\(x,y)->elem ' '[x,y]).zip x

Shortening `merge`

Starting with the uncompressed function:

-- Take two transparencies and overlay them (assumes no two characters overlap)
merge :: String -> String -> String
merge []     y      = y
merge x      []     = x
merge (x:xs) (y:ys) = (if x == ' ' then y else x):(merge xs ys)

As before, we start by simply removing "unnecessary stuff" and begin with 79 bytes, including two newlines:

merge[]y=y
merge x[]=x
merge(x:xs)(y:ys)=(if x==' 'then y else x):(merge xs ys)

We can remove the first two patterns by matching merge x y at the end, we will only get there if one of the strings is empty since otherwise the third pattern would be matched. We can simply concatenate x and y since the one of the strings is empty anyways, so the order does not matter. This brings it down to 71 bytes:

merge(x:xs)(y:ys)=(if x==' 'then y else x):(merge xs ys)
merge x y=x++y

The if-then-else can be replaced with a list built with list comprehension and a call to last. If x==' ' is true, y will be added to the list and thus selected as the last element, otherwise the list contains only x. This saves 5 bytes bringing the function to 66 bytes.

merge(x:xs)(y:ys)=(last$x:[y|x==' ']):(merge xs ys)
merge x y=x++y

Obviously, another 12 bytes (plus 4 for the call to merge) can be saved by renaming the function (54 bytes).

m(x:xs)(y:ys)=(last$x:[y|x==' ']):(m xs ys)
m x y=x++y

xs and ys can be renamed to a and b, saving 4 bytes (50 bytes):

m(x:a)(y:b)=(last$x:[y|x==' ']):(m a b)
m x y=x++y

We will apply further "optimizations" later when looking at the whole program.

Shortening `strip`

Starting with the uncompressed function:

-- strip (since Data.Text is not imported)
-- this is necessary in case we get e.g. "zz  " and "aa" as input, since otherwise "zzaa" would be preferred because it is shorter than "aazz  " although the latter would result in "aazz" with the same length and being first in lexicographical order
strip :: String -> String
strip = stripL . stripR
stripL :: String -> String
stripL = dropWhile (==' ')
stripR :: String -> String
stripR = reverse . stripL . reverse

By removing unnecessary stuff, renaming the functions to s,t and u and renaming xs to x we start with 44 bytes (4 bytes are saved for the call to strip):

s=t.u
t=dropWhile(==' ')
u=reverse.t.reverse

Since we never need to use stripL and stripR separately, we can remove u (which was stripR) and instead replace s with t.reverse.t.reverse (saving 4 bytes => 40 bytes)

s=t.reverse.t.reverse
t=dropWhile(==' ')

Again, we will apply further optimizations later when looking at the whole program.

Shortening `shiftAndMerge`

Since we renamed a few functions already the inital function looks different than in the first listing:

-- move each string to be inserted space by space to the right until it can be merged, insert as soon as possible
shiftAndMerge :: [String] -> String
shiftAndMerge = foldl (\r -> m r . until (c r) (' ':)) ""

Again, we remove unnecessary stuff and rename the function to v, which gives us 35 bytes:

v=foldl(\r->m r.until(c r)(' ':))""

We can save 1 byte by replacing foldl and the initial value "" with foldl1 which takes the first list element as initial value. However this means we can no longer work on empty lists (resulting in an error), but this case is not checked when submitting (34 bytes):

v=foldl1(\r->m r.until(c r)(' ':))

Again, more optimizations will follow when looking at the complete program.

Shortening `g`

The original:

g :: [String] -> String
g = head . sortOn length . sort . map v . permutations . map s

can be shortened to start with 50 bytes:

g=head.sortOn length.sort.map v.permutations.map s

We can replace length with 0<$, which replaces all elements in the list with a 0. This still works, since a shorter list with 0 is considered to be less than a longer list with 0, i.e., [0] < [0,0]. This saves 2 bytes (48 bytes):

g=head.sortOn(0<$).sort.map v.permutations.map s

Shortening the whole the program

After combining everything we did earlier we end up with a program which is 212 bytes in size (reminder: the goal is at most 181 bytes). The values from the previous sections do not add up because we need to consider additional newlines now.

The changes we apply in this section were ignored before since they change multiple functions at once.

The current program is:

c x=all(\(x,y)->elem ' '[x,y]).zip x
m(x:a)(y:b)=(last$x:[y|x==' ']):(m a b)
m x y=x++y
s=t.reverse.t.reverse
t=dropWhile(==' ')
v=foldl1(\r->m r.until(c r)(' ':))
g=head.sortOn(0<$).sort.map v.permutations.map s

We can turn c and v into lambda functions, saving 8 and 3 bytes respectively (201 bytes):

m(x:a)(y:b)=(last$x:[y|x==' ']):(m a b)
m x y=x++y
s=t.reverse.t.reverse
t=dropWhile(==' ')
g=head.sortOn(0<$).sort.map(foldl1(\r->m r.until(all(\(x,y)->elem ' '[x,y]).zip r)(' ':))).permutations.map s

Instead of defining s=t.reverse.t.reverse, which wastes a lot of bytes due to having reverse twice, we define s=t.reverse and replace the call to s with s.s.

This saves 10 bytes in the definition but costs 3 bytes when calling (s.s) since we need the parantheses, meaning we end up at 194 bytes:

m(x:a)(y:b)=(last$x:[y|x==' ']):(m a b)
m x y=x++y
s=t.reverse
t=dropWhile(==' ')
g=head.sortOn(0<$).sort.map(foldl1(\r->m r.until(all(\(x,y)->elem ' '[x,y]).zip r)(' ':))).permutations.map(s.s)

We now also no longer need the definition of t and can move it directly to the definition of s, saving 4 bytes (190 bytes):

m(x:a)(y:b)=(last$x:[y|x==' ']):(m a b)
m x y=x++y
s=dropWhile(==' ').reverse
g=head.sortOn(0<$).sort.map(foldl1(\r->m r.until(all(\(x,y)->elem ' '[x,y]).zip r)(' ':))).permutations.map(s.s)

The function m can be turned into an inline operator named !, saving 4 bytes due to removing spaces in the definition, but costing 1 byte because the call to m r becomes (!r) (187 bytes):

(x:a)!(y:b)=(last$x:[y|x==' ']):(a!b)
x!y=x++y
s=dropWhile(==' ').reverse
g=head.sortOn(0<$).sort.map(foldl1(\r->(!r).until(all(\(x,y)->elem ' '[x,y]).zip r)(' ':))).permutations.map(s.s)

The space character ' ' is used four times and costs 3 bytes each. Since none of the occurrences are in a pattern, we can introduce a variable w=' ' which costs 6 bytes including the newline, but each of the occurrences now only costs 1 byte, saving 2 bytes overall (185 bytes):

w=' '
(x:a)!(y:b)=(last$x:[y|x==w]):(a!b)
x!y=x++y
s=dropWhile(==w).reverse
g=head.sortOn(0<$).sort.map(foldl1(\r->(!r).until(all(\(x,y)->elem w[x,y]).zip r)(w:))).permutations.map(s.s)

Final Solution (`151` bytes)

At this point we were stuck four bytes short of the goal. However, as it turned out the checker service does not actually supply strings with leading or trailing spaces (woopsie ¯\_(ツ)_/¯), so we can throw the whole stripping logic away saving 34 bytes at once (note that we also replace w with ' ' again, which ends up being +/-0 bytes, but looks nicer) for a total of 151 bytes:

(x:a)!(y:b)=(last$x:[y|x==' ']):(a!b)
x!y=x++y
g=head.sortOn(0<$).sort.map(foldl1(\r->(!r).until(all(\(x,y)->elem ' '[x,y]).zip r)(' ':))).permutations

This yields the flag CTF{since-brevity-is-the-soul-of-wit-I-will-be-brief} when submitted. Yay!

It should however be noted that the solution does not yield correct results when called with e.g. ["z ","a"] and it crashes when called with []. The shortest solution we have that covers all edge cases is 186 bytes in length (since we need to replace foldl1 with foldl and "" again). If you have any idea how to get rid of the remaining 5 bytes, please let me know.

w=' '
(x:a)!(y:b)=(last$x:[y|x==w]):(a!b)
x!y=x++y
u=reverse.dropWhile(==w)
g=head.sortOn(0<$).sort.map(foldl(\r->(!r).until(all(\(x,y)->elem w[x,y]).zip r)(w:))"").permutations.map(u.u)

Update (2019-06-26)

As pointed out by @mcpower in their writeup and in issue #1, our solution fails to handle yet another special case, namely e.g. ["a a","b"], which should yield "a ba" according to the rules but yields "ab a" with our solution.

But they also came up with a better variant of the ! function, which simply returns max of the first two inputs. This works since any printable character will always be greater than ' '. Similarly, they also managed to shorten our all(\(x,y)->elem w[x,y]).zip r even further. This brings the "wrong" and "somewhat right" solutions down to 132 bytes and 168 bytes respectively.

However, while one would expect the "somewhat right" solution to be accepted as well (since even the "wrong" solution was accepted), the only difference being the stripping of leading/trailing spaces which the second rule requires, it actually causes the checker to fail.

@mcpower also tested whether the strings should be trimmed when determining the length and ordering them but not when returning them, but this failed as well.

At this point it's probably best to wait for Google to release the source code of the checker, as has been done for their CTFs in the past, but the second rule seems to be either misleading or there is something wrong with the checker.

Acknowledgements

Thanks to the authors of all the useful replies in this StackExchange thread: https://codegolf.stackexchange.com/questions/19255/tips-for-golfing-in-haskell

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!