Ksenia Se
Kseniase's activity
So I've been fascinated by the problem of Misguided Attention for a few weeks. I am trying to build an inference algorithm to help LLMs address that issue, but in the process I found a cool short-term fix I call "Mindful Attention," using just prompt engineering.
Have you ever thought about how our brains filter reality through layers of past experiences, concepts, and mental images? For example, when you look at an oak tree, are you truly seeing that oak tree in all its unique details, or are you overlaying it with a generalized idea of "oak tree"? This phenomenon inspired the new approach.
LLMs often fall into a similar trap, hence the Misguided Attention problem. They process input not as it’s uniquely presented but through patterns and templates they’ve seen before. This leads to responses that can feel "off," like missing the point of a carefully crafted prompt or defaulting to familiar but irrelevant solutions.
I wanted to address this head-on by encouraging LLMs to slow down, focus, and engage directly with the input—free of assumptions. This is the core of the Mindful Attention Directive, a prompt designed to steer models away from over-generalization and back into the moment.
You can read more about the broader issue here: https://github.com/cpldcpu/MisguidedAttention
And if you want to try this mindful approach in action, check out the LLM I’ve set up for testing: https://hf.co/chat/assistant/677e7ebcb0f26b87340f032e. In my testing it counteracts these issues about 80% of the time, and the results are pretty cool.
I'll add the Gist with the full prompt. I admit it is quite verbose, but it's the most effective one I have landed on yet. I am working on a smaller version that can be appended to any system prompt to harness Mindful Attention. Feel free to experiment to find a better version for the community!
Here is the Gist: https://gist.github.com/severian42/6dd96a94e546a38642278aeb4537cfb3
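If you want to wire a directive like this into your own pipeline before the smaller version lands, here is a minimal sketch: it simply prepends a directive as the system message via huggingface_hub's InferenceClient. The directive text below is a short placeholder paraphrase, not the real prompt (that's in the Gist), and the model name is just an illustrative choice:

```python
from huggingface_hub import InferenceClient

# Placeholder directive paraphrasing the idea; the full, tested prompt is in the Gist above.
MINDFUL_DIRECTIVE = (
    "Read the prompt below exactly as written. Do not map it onto a familiar "
    "puzzle or template you have seen before. Answer only what is actually asked."
)

# Model choice is an assumption; any chat-capable model on the Hub works.
client = InferenceClient("meta-llama/Llama-3.1-8B-Instruct")

response = client.chat_completion(
    messages=[
        {"role": "system", "content": MINDFUL_DIRECTIVE},
        {"role": "user", "content": "Your tricky, carefully crafted prompt here"},
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```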
However cool your LLM is, without being agentic it can only go so far.
Enter smolagents: a new agent library by Hugging Face that lets your LLM write code, run analyses, and automate the boring stuff!
Here's our blog for you to get started https://huggingface.co/blog/smolagents
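For a quick taste, here is the quickstart from the blog, lightly annotated (requires `pip install smolagents`):

```python
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

# A CodeAgent writes and executes Python code to solve the task,
# calling the web-search tool whenever it needs fresh information.
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())

agent.run(
    "How many seconds would it take for a leopard at full speed "
    "to run through Pont des Arts?"
)
```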
Hugging Face's new agentic library smolagents has gathered nearly 4k stars 🤯
➡️ But we are just getting started on agents: so we are hiring an ML Engineer to join me and double down on this effort!
The plan is to build GUI agents: agents that can act on your computer with mouse & keyboard, like Claude Computer Use.
We will make it work better, and fully open. ✨
Sounds like something you'd like to do? Apply here 👉 https://apply.workable.com/huggingface/j/AF1D4E3FEB/
The quality, size, and relevance of a dataset directly impact the effectiveness of fine-tuning and the model's real-world applications. Among the numerous datasets for different tasks, it can be challenging to pick the one that best suits your purposes.
So today, we invite you to explore the top 10 free datasets for natural language processing and math (a quick loading sketch follows the list):
1. fka/awesome-chatgpt-prompts offers a huge variety of prompts for use with ChatGPT. Over 700 models have been trained on this dataset.
2. HuggingFaceFW/fineweb from Hugging Face includes 15T tokens of cleaned and deduplicated English web data. It’s suitable for LLM training, benchmarking, and model validation.
3. HuggingFaceFW/fineweb-2 is a follow-up version of FineWeb that brings high-quality pretraining data to over 1,000 languages.
4. O1-OPEN/OpenO1-SFT with Chinese and English data can be used for Chain-of-Thought activation.
5. yahma/alpaca-cleaned is a curated version of the original Alpaca Dataset released by Stanford.
6. lmsys/lmsys-chat-1m, with 1 million real-world conversations with 25 state-of-the-art LLMs, offers diverse use cases like content moderation, safety benchmarks, and training instruction-following models.
7. allenai/dolma from Allen AI includes 3T tokens from a diverse mix of web content, academic publications, code, books, and encyclopedic materials.
Math datasets:
1. HuggingFaceTB/finemath consists of educational math content and has two versions: 34B tokens and 54B tokens.
2. amphora/QwQ-LongCoT-130K for training O1-like LLMs.
3. openai/gsm8k for training multi-step reasoning.
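All of these load the same way with the `datasets` library. A minimal sketch, with configs and field names as documented on each dataset's Hub page (FineWeb is streamed because of its size):

```python
from datasets import load_dataset

# GSM8K ships two configs, "main" and "socratic"; each row has "question" and "answer".
gsm8k = load_dataset("openai/gsm8k", "main", split="train")
print(gsm8k[0]["question"])

# FineWeb holds 15T tokens, so stream it instead of downloading everything.
fineweb = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)
print(next(iter(fineweb))["text"][:200])
```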
This year, we started our “AI Agents and Agentic Workflows” series (https://www.turingpost.com/t/AI-Agents) to explore everything about AI agents step by step: all the vocabulary, how they work, and how to build them.
The huge interest in this series and the sheer number of studies published on agents showed that it was one of the most popular and important themes of the year. In 2025, agents will most likely reach new heights – and we will be covering that for you. For now, let’s review the agentic systems that emerged this year.
Here is a list of 15 agentic systems and frameworks of 2024:
1. GUI Agents: A Survey (2412.13501)
2. Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level (2411.03562)
3. The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (2408.06292)
4. MALT: Improving Reasoning with Multi-Agent LLM Training (2412.01928)
5. Agent S: An Open Agentic Framework that Uses Computers Like a Human (2410.08164)
6. Automated Design of Agentic Systems (2408.08435)
7. AgentInstruct: Toward Generative Teaching with Agentic Flows (2407.03502)
8. AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant (2410.18603)
9. WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents (2410.07484)
10. Generative Agent Simulations of 1,000 People (2411.10109)
11. DynaSaur: Large Language Agents Beyond Predefined Actions (2411.01747)
12. PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking (2410.12375)
13. Generative World Explorer (2411.11844)
14. Bel Esprit: Multi-Agent Framework for Building AI Model Pipelines (2412.14684)
15. AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions (2410.20424)
Thanks for reading Turing Post!
Subscribe to receive new posts straight into your inbox -> https://www.turingpost.com/subscribe
Six predictions for AI in 2025:
- There will be the first major public protest related to AI
- A big company will see its market cap divided by two or more because of AI
- At least 100,000 personal AI robots will be pre-ordered
- China will start to lead the AI race (as a consequence of leading the open-source AI race).
- There will be big breakthroughs in AI for biology and chemistry.
- We will begin to see the economic and employment growth potential of AI, with 15M AI builders on Hugging Face.
How my predictions for 2024 turned out:
- A hyped AI company will go bankrupt or get acquired for a ridiculously low price
✅ (Inflection, AdeptAI, ...)
- Open-source LLMs will reach the level of the best closed-source LLMs
✅ with QwQ and dozens of others
- Big breakthroughs in AI for video, time-series, biology and chemistry
✅ for video 🔴for time-series, biology and chemistry
- We will talk much more about the cost (monetary and environmental) of AI
✅Monetary 🔴Environmental (😢)
- A popular media format will be mostly AI-generated
✅ with NotebookLM by Google
- 10 million AI builders on Hugging Face leading to no increase in unemployment
🔜 currently 7M AI builders on Hugging Face
We also got Yoshua Bengio's story: "My own insight really became strong in the context of the machine translation task. Prior to our introduction of attention, we were using a recurrent network that read the whole input source language sequence and then generated the translated target language sequence. However, this is not at all how humans translate. Humans pay very particular attention to just one word or a few input words at a time, in their context, to decide on the next word (or few words) to generate to form the sequence of words in the translation. "
This is the paper he mentions as the one that influenced them: https://www.cs.toronto.edu/~hinton/absps/nips_eyebm.pdf
You can find his full reply here: https://www.turingpost.com/p/attention
Origin: First proposed in 2014 by @Dzmitry Bahdanau, @KyunghyunCho, and Yoshua Bengio in Neural Machine Translation by Jointly Learning to Align and Translate (1409.0473). Inspired by cognitive processes, it was initially introduced under the model name "RNNSearch"; the name "attention" came later.
Key Idea: A data-dependent weighted average for pooling and communication, enabling flexible and powerful neural network connections.
Breakthrough: Bahdanau's "soft search" mechanism (softmax + weighted averaging; see the formulas below) solved the encoder-decoder bottleneck in machine translation.
Transformer Revolution: Attention Is All You Need (1706.03762) (2017) by @ashishvaswanigoogle et al. simplified architectures by stacking attention layers, introducing multi-headed attention and positional encodings.
Legacy: Attention replaced RNNs, driving modern AI systems like ChatGPT. It emerged independently but was influenced by contemporaneous work like Alex Graves’s Neural Turing Machines (1410.5401) and Jason Weston’s Memory Networks (1410.3916) .
Attention to history: Jürgen Schmidhuber claims his 1992 Fast Weight Programmers anticipated modern attention mechanisms. While conceptually similar, the term “attention” was absent, and there’s no evidence it influenced Bahdanau, Cho, and Bengio’s 2014 work. Paying attention (!) to history might have brought us to genAI earlier – but credit for the breakthrough still goes to Montreal.
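For concreteness, here is the core of Bahdanau's mechanism in the original paper's notation: a small learned network $a$ scores each encoder annotation against the decoder state, softmax turns the scores into weights, and the weighted average becomes the context vector:

$$
e_{ij} = a(s_{i-1}, h_j), \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad
c_i = \sum_{j=1}^{T_x} \alpha_{ij}\, h_j
$$

where $s_{i-1}$ is the previous decoder state, $h_j$ the annotation of source word $j$, and $c_i$ the context used to predict target word $i$.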
Referenced Papers:
Attention Origin: Neural Machine Translation by Jointly Learning to Align and Translate (1409.0473)
Transformers: Attention Is All You Need (1706.03762)
Alex Graves' Work: Neural Turing Machines (1410.5401), Generating Sequences With Recurrent Neural Networks (1308.0850)
Jason Weston's ( @spermwhale ) Memory Networks (1410.3916)
Sequence to Sequence Learning with Neural Networks (1409.3215) by Ilya Sutskever ( @ilyasut ), Oriol Vinyals, Quoc V. Le
Who else deserves recognition in this groundbreaking narrative of innovation? Let’s ensure every contributor gets the credit they deserve. Leave a comment below 👇🏻🤗
1. Jürgen is always right!
2. If Jürgen is wrong, see 1.
The original post by Karpathy: https://x.com/karpathy/status/1864023344435380613