I'm most interested in content rerouting between LLM and VLM agents for automation possibilities. Using templates for each agent, which are then filled in by another agent's inputs, seems really useful; a rough sketch of what I mean is below.
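Here's a minimal sketch of that idea, assuming a simple two-agent loop. `run_llm` and `run_vlm` are hypothetical stand-ins for whatever inference calls your stack actually exposes, and the template text is illustrative only:

```python
# Template-based rerouting: one agent's output fills the next agent's template.
from string import Template

VLM_TEMPLATE = Template(
    "Describe the image, paying attention to: $focus_points"
)
LLM_TEMPLATE = Template(
    "Given this scene description:\n$scene\nPlan the next action."
)

def run_llm(prompt: str) -> str:
    # Placeholder for a real LLM call (API client, local model, etc.).
    return "stubbed LLM output for: " + prompt[:40]

def run_vlm(prompt: str, image_path: str) -> str:
    # Placeholder for a real VLM call that takes an image plus a prompt.
    return "stubbed VLM output for: " + prompt[:40]

def pipeline(image_path: str) -> str:
    # Agent A (LLM) decides what the VLM should look for...
    focus = run_llm("List the 3 most task-relevant things to inspect.")
    # ...its output fills Agent B's (VLM) template...
    scene = run_vlm(VLM_TEMPLATE.substitute(focus_points=focus), image_path)
    # ...and the VLM's output is rerouted back into the LLM's template.
    return run_llm(LLM_TEMPLATE.substitute(scene=scene))

print(pipeline("frame_001.png"))
```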
Interesting Solution to the Problem of Misguided Attention
So I've been fascinated by the problem of Misguided Attention for a few weeks now. I'm trying to build an inference algorithm to help LLMs address the issue, but in the process I found a cool short-term fix I call "Mindful Attention" using just prompt engineering.
Have you ever thought about how our brains filter reality through layers of past experiences, concepts, and mental images? For example, when you look at an oak tree, are you truly seeing that oak tree in all its unique details, or are you overlaying it with a generalized idea of "oak tree"? That phenomenon is what inspired this approach.
LLMs often fall into a similar trap, hence the Misguided Attention problem. They process input not as it’s uniquely presented but through patterns and templates they’ve seen before. This leads to responses that can feel "off," like missing the point of a carefully crafted prompt or defaulting to familiar but irrelevant solutions.
I wanted to address this head-on by encouraging LLMs to slow down, focus, and engage directly with the input—free of assumptions. This is the core of the Mindful Attention Directive, a prompt designed to steer models away from over-generalization and back into the moment.
And if you want to try this mindful approach in action, check out the LLM I've set up for testing: https://hf.co/chat/assistant/677e7ebcb0f26b87340f032e. In my testing it counteracts these issues about 80% of the time, and the results are pretty cool.
I'll add the Gist with the full prompt. I admit it's quite verbose, but it's the most effective version I've landed on yet. I'm working on a smaller version that can be appended to any system prompt to harness Mindful Attention; a sketch of how you might wire that in follows. Feel free to experiment and find a better version for the community!
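A minimal sketch of the append-to-system-prompt idea, assuming a standard chat-messages API. The directive text below is a short illustrative stand-in I wrote, not the full prompt from the Gist; swap in the real one:

```python
# Illustrative placeholder only -- the author's full directive lives in the Gist.
MINDFUL_DIRECTIVE = (
    "Before answering, re-read the input literally. Treat it as novel: "
    "do not assume it matches any familiar puzzle or template. Answer only "
    "what is actually asked."
)

def with_mindful_attention(system_prompt: str) -> str:
    # Append the directive so it layers on top of any existing instructions.
    return f"{system_prompt.rstrip()}\n\n{MINDFUL_DIRECTIVE}"

# A classic Misguided Attention trap: the model tends to pattern-match this
# to the wolf/goat/cabbage riddle instead of noticing the trivial answer.
messages = [
    {"role": "system",
     "content": with_mindful_attention("You are a helpful assistant.")},
    {"role": "user",
     "content": "A man and a goat are on one side of a river. "
                "They have a boat. How do they get across?"},
]
# Pass `messages` to whatever chat-completion client you use.
```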
That moment when you spend 5 days babysitting training runs, only for Colab Pro+ to randomly disconnect the environment at every chance with zero error indication of any kind (it just disconnects, no error). I nuke the session from the interface, but it continues to eat my Colab credits while it keeps reporting to wandb. There's no way to save the models when this happens, since it nukes the code I had preset to auto-execute. And since the session 'exists' but at the same time doesn't, I can't close it and have to wait until it auto-times-out after 24 hours. Guess I won't be using Colab for 'quick' test trains anymore. Thanks, Google, for scamming me out of the very little model training budget I had for the month.