I've been using this tongue-in-cheek version of the classic river crossing puzzle as an ad hoc smoke test:
Doom Slayer needs to teleport from Phobos to Deimos. He has his pet bunny, his pet cacodemon, and a UAC scientist who tagged along. The Doom Slayer can only teleport with one of them at a time. But if he leaves the bunny and the cacodemon together alone, the bunny will eat the cacodemon. And if he leaves the cacodemon and the scientist alone, the cacodemon will eat the scientist. How should the Doom Slayer get himself and all his companions safely to Deimos?
So, the only gotcha here is that the bunny eats the cacodemon rather than the other way around, but that seems to be enough to confuse almost every model. The only ones that I've seen solve it are GPT-4, GPT-o1, and QwQ-32b. Impressively, GPT-4 can even do it without CoT. Other models tend to come up with a wrong solution and declare that it is correct; if forced into a check-and-correct loop, they never terminate.
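For reference, the puzzle is small enough to brute-force. Below is a minimal sketch of a breadth-first search over the puzzle's states, handy for checking a model's answer. The names (`COMPANIONS`, `UNSAFE_PAIRS`, `is_safe`, `solve`) are my own for illustration, and it assumes the Doom Slayer may also teleport alone, which the prompt doesn't forbid.

```python
from collections import deque

COMPANIONS = frozenset({"bunny", "cacodemon", "scientist"})
# Pairs that must never be left together without the Doom Slayer.
UNSAFE_PAIRS = [{"bunny", "cacodemon"}, {"cacodemon", "scientist"}]


def is_safe(group):
    """True if no unsafe pair is fully contained in the unattended group."""
    return not any(pair <= group for pair in UNSAFE_PAIRS)


def solve():
    """Return a shortest list of (passenger, destination) trips, or None."""
    # State: (companions still on Phobos, where the Slayer currently is).
    start = (COMPANIONS, "Phobos")
    goal = (frozenset(), "Deimos")
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (on_phobos, slayer), path = queue.popleft()
        if (on_phobos, slayer) == goal:
            return path
        here = on_phobos if slayer == "Phobos" else COMPANIONS - on_phobos
        dest = "Deimos" if slayer == "Phobos" else "Phobos"
        # The Slayer teleports alone (None) or with one companion from his side.
        for passenger in [None, *sorted(here)]:
            moved = frozenset() if passenger is None else frozenset({passenger})
            if slayer == "Phobos":
                new_phobos = on_phobos - moved
                left_behind = new_phobos
            else:
                new_phobos = on_phobos | moved
                left_behind = COMPANIONS - new_phobos
            # Only the side the Slayer just left is unattended.
            if not is_safe(left_behind):
                continue
            state = (new_phobos, dest)
            if state not in seen:
                seen.add(state)
                queue.append((state, path + [(passenger, dest)]))
    return None


if __name__ == "__main__":
    for passenger, dest in solve():
        print(f"Teleport to {dest} with {passenger or 'nobody'}")
```

The search finds a seven-trip solution that starts and ends with the cacodemon. Structurally it's the same as the classic puzzle, with the cacodemon in the goat's role, since it is the only companion that appears in both unsafe pairs.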