
Tutorials on machine learning, artificial intelligence, and data science with math explanations and reusable code (in Python and R)


SalvatoreRa/tutorial


Tutorials

Tutorials on machine learning, artificial intelligence in general, and biomedical research

Image: cancer cells (photo by National Cancer Institute on Unsplash)

 

I have collected here a set of tutorials and articles (with companion code) about artificial intelligence, machine learning, and data science. I have divided this repository into different sections, each covering a different macro area of data science. You will find tutorials, code, scripts, datasets, and a list of resources related to the different topics (links to articles, free books, free courses, libraries, and so on).

Here, you will also find the Jupyter Notebooks for the tutorials I published on Medium. I suggest reading each tutorial together with its companion code, in the order provided in the tables below. For practical reasons, I have divided some of the tutorials into more than one part, so that one part can focus on the theory and the others on the programming. Tutorials dedicated only to theory do not have a linked Jupyter Notebook.

Most of the code is written in Python, but you can also find some Excel files that I created to make some of the concepts easier to understand. I have also added R scripts, since R is widely used by statisticians, biologists, and so on.

Moreover, you may find here some Colab notebooks without a theoretical tutorial (yet). I decided to upload the code before finishing the theoretical part (this is indicated). I am convinced that the code alone is already useful. I will later publish the written article on Medium, with details and comments on the code.

Feel free to write to me with any requests, suggestions, and comments. If you find this repository useful, please follow and/or share it (a star is always appreciated).

Medium account  

Index

This is the general index of this repository:

About me

I am Salvatore Raieli, a Senior Data Scientist at a pharmaceutical company. My work consists of applying machine learning and artificial intelligence to the drug discovery process. I have a PhD in immunology and years of experience in coding, machine learning, and bioinformatics. I have worked on different projects related to machine learning and biology (for work or for passion). I attended an MSc in Artificial Intelligence to dive deeper into the theory. I have always been passionate about artificial intelligence and biology, and about understanding how complex systems work.

I think that artificial intelligence will drive the next wave of innovation and will revolutionize biology, medicine, and the pharma industry. I have always thought that science should be more democratic and that sharing knowledge is fundamental for the advancement of science. For this reason, I decided to write tutorials that try to explain machine learning and artificial intelligence in the simplest way possible.

Medium account

 

Stay updated with the most important news and research on machine learning and artificial intelligence; you can find them every week here:

ML news of the week

 

Back to General index -- Index of tutorials

How to use this repository

This repository is intended for anyone who wants to learn artificial intelligence and machine learning in general. I have collected scripts, additional functions, tutorials, and articles I have written on different topics, all of which can be used freely. Moreover, you can find more than 30 datasets to use in your own projects.


Cite this repository

If this repository has been useful for your work, please consider citing it:

@software{Raieli_Tutorial_and_articles_2024,
author = {Raieli, Salvatore},
license = {Apache-2.0},
month = mar,
title = {{Tutorial and articles on machine learning and artificial intelligence}},
url = {https://github.com/SalvatoreRa/tutorial},
version = {1.0},
year = {2024}
}


What is new

  • Oct 25 - updated with all the latest articles
  • May 25 - 50 datasets available
  • Sep 24 - tutorial reorganization.


Index of tutorials

Here is an index of the different sections and subsections:

Tutorials on machine learning

This series of tutorials focuses on classical machine learning (regression, classification, dimensionality reduction, and so on). I will discuss the basics, the math behind the models, and how to implement them.

Introduction to medical image analysis

Articles notebook description
Introduction to medical image analysis -- Brief introduction to medical image analysis
Introduction to point processing Jupyter Notebook Whether you are doing medical image analysis or using Photoshop, you are using point processing
Introduction to Thresholding Jupyter Notebook A simple but powerful method for segmenting images
A practical guide to neighborhood image processing Jupyter Notebook Love thy neighbors: How neighbors influence a pixel
A practical guide to morphological image processing Jupyter Notebook Simple but powerful operations to analyze images
Dividi et Impera: A Practical Guide to BLOB Analysis and Extraction with Python Jupyter Notebook Simple yet powerful techniques to extract objects.
Harnessing the power of colors in Python Jupyter Notebook Color images have more hidden information than you think
Image Segmentation with Simple and Elegant Methods Jupyter Notebook Why do we need a deep learning model with hundreds of layers? Sometimes, there are simpler and faster models.
A Guide to Geometric Transformation with Python Jupyter Notebook Why do you need Photoshop when you can have fun with Python?

 


Graph machine learning

Articles notebook description
Graph ML: A Gentle Introduction to Graphs -- A deep introduction to these mysterious creatures.
Graph ML: fantastic graphs and where to find them -- Why use a graph? Which applications?
Graph ML: introduction to NetworkX Jupyter Notebook How to start handling graphs in Python using the most popular library
Graph ML: Introduction to Python iGraph Jupyter Notebook Python iGraph is a widely used library for handling graphs. How do you start using it? Why?
Graph ML: Graph traversal algorithms in a nutshell Jupyter Notebook A quick glance at breadth-first and depth-first search algorithms for graph machine learning
Graph ML: Graph Data Representation Jupyter Notebook How do you represent graph data? How do you store it? How do you do it in Python?
Graph ML: How Do you Visualize a Large network? Jupyter Notebook Seeing is understanding: How to visualize large networks
Trapped in the Net: Where is a Foundation Model for Graphs? -- Disconnected from the other modalities, graphs wait for their AI revolution: is it coming?


Tutorials on artificial intelligence

In this series of tutorials, I will focus on artificial intelligence (neural networks, convolutional neural networks, and many other related topics). I will discuss the basics, the math behind the models, and how to implement them. I will use Keras and PyTorch.


Artificial intelligence's bases

Articles notebook description
Learning is a simple task, complex plans are doomed to fail -- How simplicity in model design and training fosters generalization and the phenomenon of grokking in neural networks
The Good, the Bad, and the Ugly: Memory for a Neural Network -- Memory can play tricks; to learn best it is not always good to memorize
Forever Learning: Why AI Struggles with Adapting to New Challenges -- Understanding the limits of deep learning and the quest for true continual adaptation
Learning to Learn: How AI and Humans Learn -- Understanding learning to create better AI and understand ourselves
Tensors: a Gentle Introduction PyTorch Code, TensorFlow Code, Excel What are they? Why should you care? The name is intimidating, but do not fear them!
Grokking: Learning Is Generalization and Not Memorization -- Understanding how a neural network learns helps us prevent the model from forgetting what it learns
A fAIry tale of the Inductive Bias -- Do we need inductive bias? How simple models can reach the performance of complex models
Unsupervised data pruning: less data to learn better -- More data does not always mean a more accurate model, but how do you choose your data?


Tabular learning

Articles notebook description
Traditional ML Still Reigns: Why LLMs Struggle in Clinical Prediction? -- Clinical prediction is more than medical knowledge: An LLM may not be the solution for every task
Tabula Rasa: Why Do Tree-Based Algorithms Outperform Neural Networks -- Tree-based algorithms are the winner in tabular data: Why?
Tabula Rasa: How to save your network from the category drama -- Neural networks do not like categories, but there are techniques to save your favorite model
Neural Ensemble: what’s Better than a Neural Network? A group of them -- Neural ensemble: how to combine different neural networks into a powerful model
Tabula rasa: Give your Neural Networks Rules, They Will Learn Better -- From great powers derive great responsibilities: regularization allows AI to exploit its power
Tabula rasa: take the best of trees and neural networks -- Hybrid ideas for complex data: how to join two powerful models in one
Tabula rasa: Could We Have a Transformer for Tabular Data -- We are using large language models for everything, so why not for tabular data?
Tabula Rasa: not enough data? Generate them! -- How you can apply generative AI to tabular data
Tabula Rasa: Fill in What Is Missing Jupyter Notebook - Scripts: 1, 2, 3 Missing values are a known problem: why it matters and how we can solve it
Tabula Rasa: Large Language Models for Tabular Data -- Tabular data are everywhere, why and how you can use LLMs for them
Tabula Rasa: A Deep Dive on Kolmogorov-Arnold Networks (KANs) -- A Deep Dive into Next-Gen Neural Networks


AI and science

Articles notebook description
On the Other Side of the Mirror: How Language Models Align with the Human Brain -- Exploring the Evolution of Linguistic Competence and Brain Alignment in Large Language Models
I, LLM: Mapping Cognitive Parallels in Humans and AI -- Exploring Specialization and Functionality in Neural and Artificial Minds
The Cybernetic Neuroscientist: Smarter Than Experts? -- Exploring How AI Outperforms Human Expertise in Predicting Neuroscience Breakthroughs
AI Planning or Serendipity? Where Do the Best Research Ideas Come From? -- Can AI Planning Replace Human Researchers in Generating Novel Ideas?
Charting the Linguistic Seas: Navigating the Uncharted Waters of Human Language with an LLM -- Exploring the Brain’s Language Networks with Spatially Organized AI
A Brave New World for Scientific Discovery: Are AI Research Ideas Better? -- Can AI Lead Scientific Discovery? Or is it just another uchronia?
DeepMind’s AlphaProteo: Revolutionizing Protein Design with Machine Learning -- Harnessing AI to Create High-Affinity Protein Binders in a Single Step
Can AI Replace Human Researchers -- The AI Scientist: Does Sakana’s New Method Mean Fully Automated Research?
Safekeep Science’s Future: Can LLMs Transform Peer Review? -- Peer review is the core of today’s science, but it is flawed by bias and burdens researchers. Can we improve it?
Beyond AlphaFold: The Future Of LLM in Medicine -- AlphaFold leaves a complex legacy: What will be the future of LLM in biology and medicine?
How LLMs Can Fuel Gene Editing Revolution -- Gene editing could cure most diseases, and LLMs can make it a reality sooner
AI’s Emerging Role in Disease Detection from Human Speech -- Disease prediction from speech can be the next revolution in healthcare
Unlocking the Dance of Proteins: AlphaFold meets Diffusion -- AlphaFlow Takes Protein Structure Prediction From Static to Dynamic
Beyond Words: Unraveling Speech from Brain Waves with AI -- AI is capable of decoding speech from non-invasive brain recordings
Google Med-PaLM M: Towards the Medical AI Generalist -- Google unveils a multi-modal model capable of incredible skills
ClinicalGPT: the LLM clinician -- A new model for medicine that uses a clever trick to be more factually correct
Google Med-PaLM 2: is AI ready for medical residency? -- Google's new model achieves impressive results in the medical domain
scGPT: When Transformers Meet Biology and Fall in Love -- Exploring the Potential of Generative Pre-Training for Single-Cell Sequencing and Analysis
PMC-LLaMA: Because Googling Symptoms is Not Enough -- A small model that can be your best friend in medical school (or on trivia night)
Looking into Your Eyes: How Google AI Model Can Predict Your Age from the Eye -- The new model can unlock secrets of aging by analyzing eye photos
Through the Looking Glass, and What Google Finds There in the Eye -- Or How Google is Using Deep Learning to Diagnose Diseases in Eye Photos
Making Language Models Similar to the Human Brain -- There is still a gap between LMs and the human brain in NLP; drawing inspiration from the latter could help fill it
Google Med-PaLM: The AI Clinician -- Google's new model is trained to answer medical questions. How?
PCA: Bioinformatician’s Favorite Tool Can Be Misleading -- A new study assesses how one of the most used techniques can be problematic
Stable diffusion and the brain: how AI can read our minds -- Researchers were able to reconstruct images using fMRI data
Stable diffusion to fill gaps in medical image data -- A new study shows that stable diffusion could help with medical image analysis and rare diseases. How?
Artificial intelligence to search for alien intelligence -- How the SETI project is using AI to answer the question: are we alone?
AI enables designing new proteins from scratch -- How artificial intelligence can enable the production of never-before-seen proteins
This Is Your Brain On Code -- New research highlights what happens in the brain while coding
The decline of disruptive science -- We are publishing more than ever but we are now less innovative: why?
Twitter’s Acquisition Raises Red Flags for Scientific Community -- Why scientists and data scientists are concerned
Data sovereignty: sharing is not caring -- Researchers are urging more data transparency, but is it right to always grant data access?
Meta’s ESMfold: the rival of AlphaFold2 -- Meta uses a new approach to predict over 600 million protein structures
Cancer Research Needs Better Data -- We have many open questions, and we need data to answer them
Code Reproducibility Crisis in Science And AI -- Saving AI and scientific research requires we share more
Nobel prize Cyberpunk -- A computational view of the most important prize and perspective on AI in scientific discovery
How AI could save a pillar of science -- Peer review is a human job, but we may need the aid of the machine
How Science Contribution Has Become a Toxic Environment -- How computer science has inherited the same mistakes as other disciplines
Machine learning: a friend or a foe for science? -- How machine learning is affecting science reproducibility and how to solve it
AlphaFold2 Year 1: Did It Change the World? -- DeepMind promised us a revolution. Did it happen?
The Curious Case of How MS Excel Was a Nightmare for Bioinformatics -- An example of how MS Excel can be deleterious in data science
Speaking the Language of Life: How AlphaFold2 and Co. Are Changing Biology -- AI is reshaping research in biology and opening new frontiers in therapy

 


AI and art

Articles notebook description
OpenAI Sora: Welcome to a Simulated World -- A new text-to-video model shows astonishing capabilities but it is also terrifying experts
How AI is reading a forgotten history -- Ancient scrolls that contain lost literature have been read by AI
MobileDiffusion: Can we generate images on the phone? -- A small and fast model could be used to generate images on the device
Google UniTune: Text-driven Image Editing -- How to use words to modify your images
ControlNet: control your AI art generation -- A new model allows fine control and gets the most out of Stable Diffusion
InstructPix2Pix: use text instructions to edit your images -- A new model that allows you to modify your images just by writing the editing instructions
Exploring the Wisdom of the Ages: Using AI art to Draw Philosopher Quotes -- Is an image worth a thousand words? Or do some words remain elusive?
Unleashing the Power of Generative AI: the Definitive List -- Exploring the latest advancements in AI technology and how they can benefit you
How AI reimagines emotions -- Could AI turn into images concepts that are hard to explain even with words?
AI Reimagines Mythical Creatures -- A modern bestiary inspired by medieval ones.
Restore your images with AI -- How to easily restore images with AI
How AI Could Help Preserve Art -- Art masterpieces are at risk at any time; AI and new technologies can give a hand
AI reimagines the world’s 20 most beautiful words -- How to translate words that cannot be translated?
Reimagining The Little Prince with AI -- How AI can reimagine The Little Prince’s characters from their descriptions
Meta’s new model can turn text prompts into videos -- Make-A-Video, a new breakthrough in generative art
Blending the power of AI with the delicacy of poetry -- AI models are now able to generate images from text, what if we furnish them with the words of great poets? A dreamy trip between poetry and AI.

 


AI and Climate change

Articles notebook description
Generative AI Fuels Climate Change -- What is the associated carbon dioxide of your favorite model?
How artificial intelligence could save the Amazon rainforest -- Amazonia is at risk and AI could help preserve it
How AI could fuel global warming -- New large models are energy intensive. How much CO2 is needed for their training?
Machine learning to tackle climate change -- How AI could help against global warming and save the world from humans
Robotics Join Machine Learning for an Electric Future -- How robotics and AI can speed up the energy transition and reduce emissions

 


Natural Language Processing and LLMs

Articles notebook description
Too Many Thoughts for Nothing: Can Large Reasoning Models Really Reason? -- Going deep into the Capabilities, Limits, and Failure Modes of Thought-Generating AI
OpenAI Thinks Overconfidence is LLM’s Hallucination Cause -- The new recipe: reward humility, curb hallucinations.
The Socratic AI: Knows Others, But Not Itself -- How Language Models Excel at Prediction but Fail at Introspection
To Know or To Do? Your LLM Can’t Have It All (Yet) -- When Bigger Isn’t Always Better for Every Skill
Creativity in LLMs: Optimizing for Diversity and Uniqueness -- Creative writing does not provide a single “gold” answer but allows many valid answers. In short, no, LLMs are capable of generating text but lack creativity.
Knowing Isn’t Doing: Teach Reasoning, Not Facts, to Your LLM -- Decoupling Knowledge and Cognition for Efficient Domain Reasoning
Can Machines Dream? On the Creativity of Large Language Models -- Exploring the Role of Hallucinations, Dependencies, and Imagination in AI Creativity
Do You Know Yourself, ChatGPT? Can You Explain Your Behavior? -- Exploring the Spontaneous Articulation of Implicit Behaviors in Large Language Models
Follow the Ants, They Know the Path: Enhancing LLM Reasoning with ACO-ToT -- Harnessing Swarm Intelligence and Tree of Thought Optimization to Unlock Advanced AI Reasoning
The LLMs’ Dilemma: Thinking Too Much OR Too Little? -- Exploring the fine line between deep reasoning and computational overkill in large language models.
How Much Data Does ChatGPT Need to Reason? Less Than You Think -- Challenging the Big Data Myth: How AI Achieves Complex Reasoning with Surprisingly Few Examples, and How Does It Work
Adapt to Survive: LLMs Meet Evolution -- Evolving Language Models for Better Adaptability, Accuracy, and Performance
Only the Beginning Matters: How the LLM Decides Where to Focus Attention -- Understanding How the First Token Shapes LLM’s Focus and Responses
What if Hallucination Is a Spark of Creativity? Harnessing LLM Flaws for Drug Discovery -- Exploring the Unexpected Potential of AI Hallucinations to Revolutionize Drug Development
Can I Really Trust You, ChatGPT? Bridging AI Confidence and Human Understanding -- Exploring the Calibration and Communication Gaps Shaping Trust in AI-Driven Decisions
The Dream Machine: Decoding Why LLMs Hallucinate Reality -- HALoGEN: Benchmarking and Verifying Hallucinations in Generative Language Models
How Far Is AI from Human Intelligence? -- The LLM revolution has led to various speculations, but besides marketing and fears, how close are we?
Know Yourself: How Much AI is Aware of Itself? -- Exploring Situational Awareness in LLMs through Behavioral Testing and Benchmarking
A Memory for All Transformers: Sharing to Perform Better -- Unlocking Collective Intelligence: How Shared Memory Enhances Transformer Efficiency and Performance
From Solution to Problem: The Reverse Path to Smarter AI -- Reverse reasoning as a trick to better reasoning in LLMs
You’re Not a Writer, ChatGPT — But You Sound Like One. -- The AI That Dreamed of Being Hemingway but Found Itself an Echo
The Art of LLM Bonsai: How to Make Your LLM Small and Still Beautiful -- Mastering the Balance Between Efficiency and Accuracy in LLM Quantization
Open the Artificial Brain: Sparse Autoencoders for LLM Inspection -- A deep dive into LLM visualization and interpretation using sparse autoencoders
Teach What You Know, Learn What Is Hard to Master -- Adaptive Knowledge Distillation for Efficient Learning from Large Language Models
The Savant Syndrome: Is Pattern Recognition Equivalent to Intelligence? -- Exploring the limits of artificial intelligence: why mastering patterns may not equal genuine reasoning
What if LLMs Are Better Than We Think? Or Is It Our Judgement That’s Flawed? -- A Study of Label Errors and Their Impact on LLM Performance Evaluations
Believe In Yourself: Do LLMs Internally Know What Is True? -- Leveraging Internal Representations to Detect and Understand Errors in Large Language Models
Taming the Attention Hydra: Is Too Much Attention Slowing Down Transformers -- Pruning Attention Layers to Boost Transformer Efficiency Without Performance Loss
Speak About Yourself: Using SAEs and LLMs to Decode the Inner Workings of LLMs -- How Sparse Autoencoders and Language Models Collaborate to Make Complex Neural Activations Understandable
Through the Uncanny Mirror: Do LLMs Remember Like the Human Mind? -- Exploring the Eerie Parallels and Profound Differences Between AI and Human Memory
Lie to Me: Why Large Language Models Are Structural Liars -- Unveiling the Inherent Hallucinations and Limitations of AI-Language Models
How the LLM Got Lost in the Network and Discovered Graph Reasoning -- Enhancing large language models: A journey through graph reasoning and instruction-tuning
AI Emergent Properties: What Makes AI Suddenly Learn New Tricks -- The Critical Moment: When and Why AI Learns New Abilities
Strength in Weakness: How ‘Weak’ Models Can Be a Better Teacher than Large LLMs -- Teaching is about diversity, and small LLMs can sometimes offer more
From Syntax to Semantics: How Code Turns LLMs into Better Models -- Exploring the Transformative Impact of Code Data on LLM Performance Across Diverse Tasks
Short and Sweet: Enhancing LLM Performance with Constrained Chain-of-Thought -- Sometimes few words are enough: reducing output length for increasing accuracy
To CoT or Not to CoT: Do LLMs Really Need Chain-of-Thought? -- Looking for Reasoning in LLMs: Is Chain-of-Thought Really the Key to Smarter AI?
AI Hallucinations: Can Memory Hold the Answer? -- Exploring How Memory Mechanisms Can Mitigate Hallucinations in Large Language Models
Can Generative AI Lead to AI Collapse? -- AI eating its own tail: the risk of model collapse in generative systems
Beyond Human Feedback: How to Teach a Genial AI Student -- New Approaches for Guiding AI Evolution Beyond Human Oversight
Expanding Language, Expanding Thought: Vocabulary Size in LLM Scaling -- Optimizing the LLM Vocabulary to Unlock Enhanced Performance and Cognitive Potential
Navigating the Seas of Reason: A Geometric Odyssey to Enhance LLM Reasoning Capabilities -- Exploring the Depths of Self-Attention Graphs and Intrinsic Dimensions in Large Language Models
Chat Quijote and the Windmills: Navigating AI Hallucinations on the Path to Accuracy -- Strategies and Tools for Enhancing Reliability in Large Language Models
Is LLM Performance Predetermined by Their Genetic Code? -- Exploring phylogenetic algorithms to predict the future of large language models
Are Long-Context LLMs Truly Revolutionary? -- Assessing the Impact and Potential of Long-Context Language Models
Can LLMs Truly Learn to Reason Implicitly? -- Unraveling the Mechanisms Behind Grokking and Systematic Generalization in LLMs
An LLM Student’s Handbook: Mastering the Art of Learning and Retaining Knowledge -- Learning and Forgetting: How to Improve the Balance
Maybe GPT Isn’t the Best: BERTs Can Master Generative In-Context Learning -- Challenging AI Paradigms with DeBERTa’s Surprising Capabilities
Clear Waters: What an LLM Thinks Under the Surface -- Anthropic’s Take at Decoding Abstract Features in Large Language Models
Can Transformer Substitute Graph Neural Networks? -- Are transformers able to do graph reasoning and to which extent?
Can an LLM Really Learn New Things? -- The Double-Edged Sword of Fine-Tuning Large Language Models
The AI Student Dilemma: Trust Yourself Or The Book? -- LLMs have to decide whether to trust their own knowledge or the additional context. What will they choose?
When More is More? When For an LLM is Enough? -- In-context length is the LLM’s secret weapon, but with long context everything is changing
Infini-attention: Can we Really have an Infinite Context Length? -- Google believes that we can have an LLM with an infinite context length
Crossing Boundaries or Building Walls? The Declining Interdisciplinarity of NLP -- In a deluge of information, research is becoming more and more isolated, and this is a problem
You Know Nothing, ChatGPT. How Much Does Your LLM Know? -- Knowledge is power, but how much can an LLM know, and is it enough?
LLM redundancy? It is Time for a Massive Layoff of Layers -- Almost half of a model’s layers are useless, can we get rid of them? How and why?
Do Really Long-Context LLMs Exist -- Long-context LLMs are the topic of the moment, but beyond companies' claims, is it true?
Think, Then Speak: How Researchers Gave AI an Inner Monologue -- QuietStar is a new promising approach for LLM reasoning
The AI worm and the LLM leaf -- New research warns how an LLM can be poisoned and spread around
Indirect Reasoning for LLMs: Not Always There is a Direct Way to the Answer -- Contrapositive and Contradiction for Automated Reasoning can help your model find the right answer
A Requiem for the Transformer? -- Will the transformer be the model leading us to artificial general intelligence? Or will it be replaced?
Teaching is Hard: How to Train Small Models and Outperform Large Counterparts -- Distilling the knowledge of a large model is complex, but a new method shows incredible performance
Order Matters: How AI Struggles with the Reverse -- How and why does the reversal curse impact the large language models
Prompt Engineering to Leverage In-Context Learning in Large Language Models -- How to modify your text prompt to obtain the best from an LLM without training
All You Need to Know about In-Context Learning -- What it is and how it works: the mechanism that makes Large Language Models so powerful
Speak to me: How many words a model is reading -- Why and how to overcome the inner limit of a Large Language Model
The AI college student goes back to the bench -- How LLMs can solve college exams and why this is important
Can we detect AI-generated text? -- Watermarking could be the solution for detecting it
Say Once! Repeating Words Is Not Helping AI -- How and why is repeating tokens harming LLMs? Why is this a problem?
Is AI funny? Maybe, a Bit -- Why AI is still struggling with humor and why this is an important step
The imitation game: Taming the gap between open source and proprietary models -- Can imitation models reach the performance of proprietary models like ChatGPT?
Human-Centered Loss Functions: Not All the Risks Are the Same -- Aligning large language models with human behavior in uncertain futures
SwitchHead: Be Faster To Catch the Prey -- How MoE applied to self-attention can make your model faster and better performing
Make it simple! Can we have simple models for complex tasks? -- Can we simplify the current architectures without losing performance?
Scaling Isn’t Everything: How Bigger Models Fail Harder -- Are Large Language Models really understanding programming languages?
Emergent Abilities in AI: Are We Chasing a Myth? -- Changing Perspective on Large Language Models emerging properties
Welcome Back 80s: Transformers Could Be Blown Away by Convolution -- The Hyena model shows how convolution could be faster than self-attention
Speak Only About What You Have Read: Can LLMs Generalize Beyond Their Pretraining Data? -- Unveiling the Limits and Wonders of In-Context Learning in Large Language Models

 


RAG and agents

Articles notebook description
The Limits of Embeddings: Why One Vector Can’t Fit All Queries -- Exploring the theoretical and practical limits of embedding vector retrieval
AI Agent, You’re Better Off Alone. Debate Is a Trap -- Why adding more agents can reduce accuracy, and how to avoid the trap
Apes Strong Together, AI Agents Not: Why Multi-Agent Systems Fail -- A Taxonomy of Failure Modes Across 150 Tasks and Five MAS Frameworks
Beyond Text: Navigating Toward a Multimodal RAG Horizon -- Harnessing the Power of Videos and Multimodal Integration for Next-Generation Retrieval-Augmented Generation
You Cache Only Once: Cache-Augmented Generation (CAG) Instead Of RAG -- Streamlining Knowledge Tasks with Cache-Augmented Generation: A Simpler Alternative to Retrieval-Based Approaches
Do Not Flip a Coin: When to Use RAG or Long Context LLMs -- Understanding the Trade-offs and Best Practices for Optimizing LLMs with External Knowledge Sources
Context vs. Prior Knowledge: How to Modify LLM Behavior -- Unveiling the Mechanism Behind Controlling Sensitivity in Language Models
Neighbors Count: Boosting Document Embeddings with Contextual Encoding -- Harnessing Neighboring Documents to Elevate Retrieval Accuracy through Context-Aware Embeddings
AI Search Engine: Finding Ariadne’s Thread or Losing the Way -- Exploring the Pathways and Pitfalls of Multimodal AI Search with Large Language Models
Sometimes Noise is Music: How Beneficial Noise Can Improve Your RAG -- Unveiling the Dual Nature of Noise in Retrieval-Augmented Generation
J’accuse! The Unjust Demise of RAG in Favor of Long-Context LLMs: A Rebuttal -- Reassessing Retrieval-Augmented Generation in the Age of Long-Context Models
The Convergence of Graph and Vector RAGs: A New Era in Information Retrieval -- Harnessing the Power of Hybrid Models to Transform AI-Driven Knowledge Systems
Knowledge is Nothing Without Reasoning: Unlocking the Full Potential of RAG through Self-Reasoning -- Enhancing Reliability and Traceability in Retrieval-Augmented Generative Models
Balancing Cost and Performance: A Comparative Study of RAG and Long-Context LLMs -- What is better between these approaches? Could they coexist?
GraphRAG: Combining Retrieval and Summarization -- Enhancing Large Language Models for Complex Question Answering over Extensive Text Corpora
How to Achieve Performance and Efficiency in RAG -- Exploring Optimal Strategies for Streamlined Retrieval-Augmented Generation Workflows
PlanRAG: Plan Your Way to Better Decisions -- Navigating complex decisions requires a plan: Can LLMs be used for decision-making?
David vs. Goliath: Beating Long-Context Tasks with Small Models -- Unveiling LC-Boost: A Framework for Efficient and Effective Long-Context Processing
HippoRAG: Endowing Large Language Models with Human Memory Dynamics -- Copy the brain for better knowledge integration and retrieval
RAG is Dead, Long Live RAG -- Is it really true that long-context LLMs are killing the RAG?
War and Peace: A Conflictual Love Between the LLM and RAG -- There is a complex relationship between an LLM's prior knowledge and RAG.
Bring Your AI Agents from Virtual to Reality -- AI agents are the new frontier, but how are they doing in the real world?
Follow the Echo: How to Get a Good Embedding from your LLM -- How to overcome the limits of Autoregressive Models for embedding
DeepMind’s SIMA: Rule the Simulated World Before Taking Over the Real One -- A new agent by DeepMind shows impressive new generalization skills in video games
Cosine Similarity and Embeddings Are Still in Love? -- Cosine similarity is the most used method, but is it really the best?
HuggingGPT: Give Your Chatbot an AI Army -- HuggingGPT is capable of managing other models and solving complex tasks

 

Back to General index -- Index of tutorials

LLM models

Articles notebook description
What Is The Best Therapy For a Hallucinating AI Patient? -- Exploring the Art and Science of Prompt Engineering to Cure LLM Hallucinations
LLMs and the Student Dilemma: Learning to Solve or Learning to Remember? -- Investigating Whether Large Language Models Rely on Genuine Understanding or Clever Heuristics in Arithmetic Reasoning
You Know Nothing, John LLM: Why Do You Answer Anyway? -- Distinguishing Knowledge Gaps from Misguided Confidence in Large Language Models
Less Distraction, More Precision: The Diff Transformer’s Secret to Better Language Models -- Unlocking Efficiency in AI: How the Diff Transformer Filters Noise to Enhance Accuracy and Performance
Kolmogorov-Arnold Transformer (KAT): Is the MLP Headed for Retirement? -- Exploring how the Kolmogorov-Arnold Transformer (KAT) challenges the MLP dominance in modern deep learning
OpenAI’s New ‘Reasoning’ AI Models Arrived: Will They Survive the Hype? -- Will the Captain Catch the Whale of Reasoning or Sink in the Pursuit?
DeepMind’s AlphaProof: Achieving Podium Glory at the Math Olympiad Model -- Google DeepMind’s new artificial intelligence systems can solve complex mathematical problems
Google Gemma: is it Really a Gem? -- Google has just released two new open-source LLMs and is pushing for their adoption
MiQu: Can a mysterious model be a GPT-4 rival? -- An open-source model seems to perform as well as GPT-4, but we do not know much about it
Are xLSTMs a Menace to Transformer Dominion? -- Researchers have massively improved LSTM, but what does it mean for the future?
GPT-4O, One Model is All You Need -- The best part is it should be free for everyone
OpenELM Can Be The End of Siri -- Apple thinks the future of generative AI is on devices, but how?
LLaMA 3 is Here. Will It Be The Winning Animal in The Generative AI Zoo? -- LLaMA 3 is in early release, but META’s new animal now has fierce competition
Does Grok Really Matter? -- Musk claims he has open-sourced Grok, but does it matter, or is it just another move in a larger play?
PlanGPT: LLM domain specific to revolutionizing industries -- Knowledge and planning give the power to reshape industries
LeMA: For an LLM Learning Math is Making Mistakes -- learning from mistakes helps large language models achieve better performance in reasoning tasks
LLemma: a Model Speaking Math -- A model beating previous competitors for mathematical reasoning
Mistral 7B: a New Wind Blowing Away Other Language Models -- Mistral 7B performs better and is faster than other LLMs
GPT-InvestAR: LLMs for better investment -- From Text to Trade: Could an LLM exploit annual reports to predict which stocks to buy?
Platypus: Quick, Cheap, and Powerful LLM -- Winning over the others with only one GPU and 5 hours of fine-tuning
META LLaMA 2.0: the most disruptive AInimal -- Meta LLaMA can reshape the chatbot and LLM usage landscape
The Intelligence Quotient of GPT-4: how to determine intelligence -- From Artificial Intelligence to Artificial General Intelligence: Where Does GPT-4 Stand?
Did ChatGPT have an impact? -- Three months after the chatbot took the world by storm, what happened?
FinGPT: open-source LLM for finance -- Why is this important? Why do we need it?
META’S LIMA: Marie Kondo’s way for LLMs training -- Less, tidier data to create a model capable of rivaling ChatGPT
Google USM: how Google plans a 1,000-language AI model -- Can we create a model for all the spoken languages?
SpikeGPT: a 260M-parameter LM not afraid of competition -- Spiking Neural Networks are a promising alternative for the new generative AI models
Is ChatGPT losing its capabilities? -- The updated version of GPT-4 seems to perform worse: is it true?
CodeGen2: a new open-source model for coding -- Salesforce’s take on how to design an efficient model for coding
META’s LLaMA: A small language model beating giants -- META’s open-source model will help us to understand how LM biases arise
SparseGPT: are fewer parameters better? -- How to get rid of 100 billion parameters and happily infer on one GPU
Microsoft BioGPT: Towards the ChatGPT of life science? -- BioGPT achieves the SOTA in different biomedical NLP tasks
Microsoft or: How I Learned to Stop Worrying and Love ChatGPT -- how Google disapproves of this love, and other related stories
META’s CICERO: beating humans at diplomacy -- A model able to converse, persuade, and beat you in a game of trust and betrayal
META’s PEER: A Collaborative Language Model -- PEER (Plan, Edit, Explain, Repeat): collaborate with the AI to write a text
Meta’s Hokkien: AI Translates an Unwritten Language for the First Time -- Speech-to-speech model for a language that is passed down predominantly orally
No Language Left Behind -- Meta’s new model is able to translate between 200 different languages making the internet more accessible
Google’s Minerva, Solving Math Problems with AI -- Quantitative reasoning is hard for humans and it is hard for computers. Google’s new model just got astonishing results in solving math problems.
A New BLOOM in AI? Why the BLOOM Model Can Be a Gamechanger -- We are now used to large language models, why is this so special?
Everything but everything you need to know about ChatGPT -- what is known, the latest news, what it is impacting, and what is changing. all in one article
The Unbearable Lightness of Being ChatGPT -- An ethical discussion with the most talked-about chatbot of the moment
DeepMind’s AlphaTensor: The AI That Is Reinventing Math -- How DeepMind’s latest model could revolutionize math

 

Back to General index -- Index of tutorials

Computer vision

Articles notebook description
The Computer Vision’s Battleground: Choose Your Champion -- Which is the best computer vision model? Which one is best for a particular task?
Have convolutional networks become obsolete? -- Vision transformers seem to have replaced convolutional networks, but are they really better?
UniverSeg: Universal Scissor for Medical Image Segmentation -- Medical segmentation is hard and expensive. Is it possible to have one model to cut them all?
META’s Hiera: reduce complexity to increase accuracy -- Simplicity allows AI to reach incredible performance and surprising speed
META’S ImageBind: The Embedding Glue for Your Modalities -- META’s new model can obtain a single shared embedding for up to six modalities.
META DINO: how self-supervised learning is changing computer vision -- Curated data, visual features, and knowledge distillation: the foundations of the next computer vision models
META’S SAM: A Unique Model to Segment Anything -- Segmentation needs a foundation model: why is it important?
Why Do We Have Huge Language Models and Small Vision Transformers? -- Google ViT-22 paves the way for new large transformers and a revolution in computer vision
Create your painting app with AI and Streamlit App Link / GitHub repository How to make an app with a few lines of code and a spare afternoon
A Visual Journey in What Vision-Transformers See -- How some of the largest models see the world

 

Back to General index -- Index of tutorials

Artificial intelligence and music

Articles notebook description
Meta’s MusicGen: a melody is worth 1000 tokens -- Meta's new model has incredible results on text-to-audio
AudioGPT: bridging text to music -- A new AI model connects ChatGPT with audio and music models
Google’s MusicLM: from text description to music -- A new model is generating impressive music from just a text prompt
Generate a piano cover with AI -- A new model generates a piano cover from a pop song: how does it work? How can you try it?
Microsoft’s Museformer: AI music is the new frontier -- AI art is exploding; music could be next.
Google’s AudioLM: Generating Music by Hearing a Song’s Snippet -- Whether music or speech, Google's new model can continue playing what it is hearing.

 

Back to General index -- Index of tutorials

Multi-modal

Articles notebook description
Is Apple ready to launch its own AI? -- MM1 appears to be a sign that Apple is intent on accelerating on AI
Stable Diffusion 3: Can You Still Believe in Your Eyes? -- Stable Diffusion 3 has been announced: all we know so far
Lord of Vectors: One Embedder to Rule Them All -- Embedders are back in vogue, so why not have a universal one?
Meta-Transformer: one model to rule all -- From text to video, from graph to images, what if we could use just one model?
MiniGPT-4: small chatbot, large vision-language understanding -- Meet the most efficient and open-source rival of GPT-4
BLIP-2: when ChatGPT meets images -- BLIP-2, a new visual language model capable of holding a dialogue about images
Data2vec: one AI to rule all -- A model that can learn across modalities, learning by itself
Google’s PaLI: language-image learning in 100 languages -- A new impressive model able to reach state-of-the-art in complex tasks
Multimodal Chain of Thought: Solving Problems in a Multimodal World -- The world is not only text: How to extend chain of thought to images and text?

 

Back to General index -- Index of tutorials

AI and ethics

Articles notebook description
Lie to Win: When Competition Makes AI Deceptive -- Inside the race to the bottom: why competitive pressures push AI toward deception.
A Malicious Seed: Fine-Tuning LLMs to Grow Bad Behaviors -- How Narrow Training on Insecure Code Leads to Broad AI Misalignment and Deception
Is Wikipedia an Endangered Species? Is ChatGPT Its Predator? -- Exploring the Impact of Large Language Models on Wikipedia’s Future
The Cultural Lens of AI: Which Party Would Your LLM Vote? -- Unveiling Ideological Bias Across Languages and Cultures in Large Language Models
Be Yourself: Does Assigning Roles Hurt AI Performance? -- Does Personality Matter? How Roles in System Prompts Affect AI Output
Power Corrupts: Hierarchies, Persuasion, and Anti-Social Behavior in LLMs -- Unraveling Power Dynamics and Ethical Implications in LLM Agents
AI Won’t Steal Your Job — But Get Ready for the World’s Most Annoying Coworker -- How AI Assistants Are Boosting Productivity While Becoming the Overachievers of the Office
Past Imperfect: Jailbreaking LLMs with Past Tense Requests -- How Historical Reformulations Expose Vulnerabilities in AI Safety Measures
The Goldfish LLM: Swimming Through Data Without Memorizing It -- Novel Training Approaches to Avoid Data Memorization and Privacy Risks Without Impacting Performance
How transparent are large language models? -- Stanford proposes an index to measure LLM transparency, and the results are not encouraging
Scaling Data, Scaling Bias: A Deep Dive into Hateful Content and Racial Bias in Generative AI -- Scaling seems to be the solution to every issue in machine learning, but is it true?
Reshaping the Model’s Memory without the Need for Retraining -- Erasing any echo of problematic content a large language model has learned
PrAIde and Prejudice: Tracking and Minimizing Political Bias in LLMs -- How to track political biases and their impact on NLP
The Mechanical Symphony: Will AI Displace the Human Workforce? -- GPT-4 shows impressive skills: what will be the impact on the labor market?
The EU wants to regulate your favorite AI tools -- EU is preparing a new AI bill and generative AI is included
Machine unlearning: The duty of forgetting -- How and why it is important to erase data point information from an AI model

 

Back to General index -- Index of tutorials

Others

Articles notebook description
5 Haikus of Artificial Intelligence -- The inner poetry of artificial intelligence
This Is the End of Normalization, and the Transformer Feels Fine -- Exploring the Power of Dynamic Tanh in Transformer Models Without Normalization Layers
Can an LLM Outperform Human Analysts in Financial Analysis? -- The University of Chicago Has Conducted A Comparative Study of AI and Human Expertise in Earnings Forecasting
The 2023 AI year in brief -- A recap of an incredible AI year
Is AI funny? Maybe, a Bit -- Why AI is still struggling with humor and why this is an important step
To AI or not to AI: how to survive? -- With generative AI threatening businesses and side hustles, how can you find your space?
The Infinite Babel Library of LLMs -- Open-source, data, and attention: How the future of LLMs will change
RazzAIe awards 2022: what are the worst AI of the year? -- What are the worst models of the year? What went wrong?
Deep learning can tell if you are above the drinking limit -- A new algorithm that can measure your alcohol consumption from your speech
2023: what should we expect to see in AI? -- A discussion on emerging trends and possible scenarios
The Rise of AI: A Look at the 2022 Landscape -- Innovation and disruption: a look at what happened in AI in 2022
Can an AI be a data scientist? notebook OpenAI’s ChatGPT is blowing data scientists' minds. Could it steal their job?
Is AI Changing Football? -- Data science has arrived in football. How are teams and companies using it?
Make an app with Streamlit in minutes code here Build an app to predict yoga positions from photos with Python
DreamFusion: 3D models from text -- A new Google diffusion model that allows 3D models to be obtained from text.
A critical analysis of your dataset -- Stop finetuning your model: your model is already good, but not your data
How AI and X-rays To Detect Explosives Could Also Identify Cancers -- How AI enhances X-rays to detect concealed explosives, and potentially tumors and wall breaches, by their textures

 

Back to General index -- Index of tutorials

Articles and tutorials of Bioinformatics/AI/ML applied to Biology

This is a series of tutorials on using machine learning with transcriptomic data. I will also add tutorials about the use of machine learning with biomedical images.

Tutorial notebook description
AML introduction -- Acute Myeloid Leukemia: A general introduction
Introduction to AI in leukemia -- Artificial intelligence in leukemia
Introduction to computer vision in AML -- Medical image diagnosis in leukemia
Introduction to computer vision in COVID-19 -- Medical Image Diagnosis in COVID-19
Complexity reduction techniques Jupyter notebook Python: PCA, t-SNE, UMAP
Clustering techniques Jupyter notebook Python: Hierarchical clustering, k-means
Clustering: DBSCAN and GMM Jupyter notebook Python: DBSCAN and GMM
Linear regression -- Introduction and the linear regression math
Linear regression Jupyter notebook Python: Linear regression, training, evaluation, inspection and solution
Logistic regression -- Python: Introduction to logistic regression math
Logistic regression Jupyter notebook Python: Logistic regression, training, evaluation, inspection and solution

 

Back to General index -- Index of tutorials

Contributing

Open an issue if you find any errors or want to provide feedback

License

This project is licensed under the MIT License

Bugs/Issues

Comment or open an issue on GitHub