
River crossing variations #8

Open
int19h opened this issue Nov 29, 2024 · 1 comment

Comments

@int19h

int19h commented Nov 29, 2024

I've been using this tongue-in-cheek version of the classic river crossing puzzle as an ad hoc smoke test:

Doom Slayer needs to teleport from Phobos to Deimos. He has his pet bunny, his pet cacodemon, and a UAC scientist who tagged along. The Doom Slayer can only teleport with one of them at a time. But if he leaves the bunny and the cacodemon together alone, the bunny will eat the cacodemon. And if he leaves the cacodemon and the scientist alone, the cacodemon will eat the scientist. How should the Doom Slayer get himself and all his companions safely to Deimos?

So, the only gotcha here is that "bunny eats cacodemon", but that seems to be enough to confuse almost every model. The only ones that I've seen solve it are GPT-4, GPT-o1, and QwQ-32b. GPT-4 can even do it without CoT, impressively enough. Other models tend to come up with a wrong solution and declare that it is correct; if forced into a check-and-correct loop, they never terminate.
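
For checking a model's answer against the puzzle as stated, a brute-force search is enough. The following is only a minimal sketch, not something used in the tests above: the names `COMPANIONS`, `FORBIDDEN`, `is_safe`, and `solve` are made up for illustration, and it assumes that a pair is "left alone" whenever both are on the side the Doom Slayer is not on.

```python
from collections import deque

COMPANIONS = ("bunny", "cacodemon", "scientist")
# Pairs that must not be left without the Doom Slayer (taken from the puzzle text).
FORBIDDEN = (
    frozenset({"bunny", "cacodemon"}),
    frozenset({"cacodemon", "scientist"}),
)


def is_safe(on_phobos, slayer_on_phobos):
    """A state is safe if the side without the Slayer holds no forbidden pair."""
    everyone = frozenset(COMPANIONS)
    unattended = (everyone - on_phobos) if slayer_on_phobos else on_phobos
    return not any(pair <= unattended for pair in FORBIDDEN)


def solve():
    """Breadth-first search; returns a shortest list of moves, or None."""
    everyone = frozenset(COMPANIONS)
    start = (everyone, True)      # all companions and the Slayer on Phobos
    goal = (frozenset(), False)   # nobody left on Phobos, Slayer on Deimos
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (on_phobos, slayer_on_phobos), path = queue.popleft()
        if (on_phobos, slayer_on_phobos) == goal:
            return path
        here = on_phobos if slayer_on_phobos else everyone - on_phobos
        # The Slayer teleports alone (None) or with one companion from his side.
        for passenger in (None, *sorted(here)):
            new_phobos = on_phobos
            if passenger is not None:
                if slayer_on_phobos:
                    new_phobos = on_phobos - {passenger}
                else:
                    new_phobos = on_phobos | {passenger}
            state = (new_phobos, not slayer_on_phobos)
            if state in seen or not is_safe(*state):
                continue
            seen.add(state)
            destination = "Deimos" if slayer_on_phobos else "Phobos"
            queue.append((state, path + [(passenger or "nobody", destination)]))
    return None


if __name__ == "__main__":
    for passenger, destination in solve():
        print(f"Teleport to {destination} with {passenger}")
```

Under these assumptions the shortest plan has seven teleports, and it starts and ends with the cacodemon: it is the one companion that conflicts with both of the others, so it plays the role the goat plays in the classic version.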

@cpldcpu
Owner

cpldcpu commented Nov 30, 2024

I love the setting! Nice one.

Indeed, Claude figures out that this is similar to the river crossing puzzle, but then it fails to make the right associations.

QwQ had a nice start in assigning symbols to the entities and trying some combinations, but then concluded that there is no solution.
