Vincent Granville PRO

vincentg64

https://mltechniques.com/resources/

AI & ML interests

GenAI, LLM, synthetic data, optimization, fine-tuning, model evaluation

Recent Activity

posted an update about 22 hours ago

Blueprint: Next-Gen Enterprise RAG & LLM 2.0 – Nvidia PDFs Use Case In my most recent articles and books, I discussed our radically different approach to building enterprise LLMs from scratch, without training, hallucinations, prompt engineering or GPU, while delivering higher accuracy at a much lower cost, safely, at scale and at lightning speed (in-memory). It is also far easier to adapt to specific corpuses and business needs, to fine-tune, and modify, giving you full control over all the components, based on a small number of intuitive parameters and explainable AI. Now, I assembled everything into a well-structured 9-page document (+ 20 pages of code) with one-click links to the sources including our internal library, deep retrieval PDF parser, real-life input corpus, backend tables, and so on. Access to all this is offered only to those acquiring the paper. Our technology is so different from standard LLMs that we call it LLM 2.0. This technical paper is much more than a compact version of past documentation. It highlights new features such as un-stemming to boost exhaustivity, multi-index, relevancy score vectors, multi-level chunking, various multi-token types (some originating from the knowledge graph) and how they are leveraged, as well as pre-assigned multimodal agents. I also discuss the advanced UI — far more than a prompt box — with unaltered concise structured output, suggested keywords for deeper dive, agent or category selection to increase focus, and relevancy scores. Of special interest: simplified, improved architecture, and upgrade to process word associations in large chunks (embeddings) even faster. ➡️ See how to get a free copy, at https://mltblog.com/4fPuvTb

posted an update 16 days ago

LLM 2.0, RAG & Non-Standard Gen AI on GitHub https://mltblog.com/3DsyZSq In this article, I share my latest Gen AI and LLM advances, featuring innovative approaches radically different from both standard AI and classical ML/NLP. The focus is on doing better with less, using efficient architectures, new algorithms and evaluation metrics. It originates from research that I started long ago. It gained significant momentum in the last two years. See background and history at https://mltblog.com/4g2sKTv. OpenAI, Perplexity, Anthropic, Llama and others typically follow the trend and implement solutions very similar to mines within 3 to 6 months after I publish new milestones. For instance, multi-tokens, knowledge graph tokens, multi-indexes, real-time fine-tuning, mixtures of experts, LLM routers, small enterprise sub-LLMs, prompt distillation, relevancy scoring engine, deep contextual retrieval, optimum agentic chunking, and modern UI instead of the basic prompt box. I keep adding new features all the time, staying ahead of competition. ➡️ Read full article with links to GitHub, at https://mltblog.com/3DsyZSq

posted an update 28 days ago

Where LLMs Fail the Most, and How to Fix it https://mltblog.com/41BcGDY Here I illustrate my two most recent interactions with AI-powered GPT. It was an awful failure, a lot worse than before GenAI. Indeed, I had to revert back to old Google search to get help. This is typical of what hundreds of millions of users now experience every day. ➡️ First example: I get payments from Stripe. I asked how I can pay someone, as opposed to getting paid, as I had a contact asking me to pay him with Stripe. After 30 mins of prompts to AI support, I got nowhere. In the end I decided to pay my contact using a different platform. I could not figure out how to a meaningful answer: see featured image. ➡️ Second example: A VC guy I started to interact with sent me a few messages, but I never received any of them. I tried to contact my email provider, but was faced with a GenAI bot to answer the following precise question: his email address is xyz, mine is abc, his messages do not even show up in my spam box, and I did not block their domain name; how to fix this? After receiving irrelevant answers, I ask point blank: can I chat with a real human? Again, irrelevant answers, no matter how I phrase my question. In the end I told my contact to send messages to an alternate email address. ➡️ Read the article explaining causes, offering solutions, at https://mltblog.com/41BcGDY

View all activity

Organizations

None yet

vincentg64's activity

posted an update about 22 hours ago

Post

384

Blueprint: Next-Gen Enterprise RAG & LLM 2.0 – Nvidia PDFs Use Case

In my most recent articles and books, I discussed our radically different approach to building enterprise LLMs from scratch, without training, hallucinations, prompt engineering or GPU, while delivering higher accuracy at a much lower cost, safely, at scale and at lightning speed (in-memory). It is also far easier to adapt to specific corpuses and business needs, to fine-tune, and modify, giving you full control over all the components, based on a small number of intuitive parameters and explainable AI.

Now, I assembled everything into a well-structured 9-page document (+ 20 pages of code) with one-click links to the sources including our internal library, deep retrieval PDF parser, real-life input corpus, backend tables, and so on. Access to all this is offered only to those acquiring the paper. Our technology is so different from standard LLMs that we call it LLM 2.0.

This technical paper is much more than a compact version of past documentation. It highlights new features such as un-stemming to boost exhaustivity, multi-index, relevancy score vectors, multi-level chunking, various multi-token types (some originating from the knowledge graph) and how they are leveraged, as well as pre-assigned multimodal agents. I also discuss the advanced UI — far more than a prompt box — with unaltered concise structured output, suggested keywords for deeper dive, agent or category selection to increase focus, and relevancy scores. Of special interest: simplified, improved architecture, and upgrade to process word associations in large chunks (embeddings) even faster.

➡️ See how to get a free copy, at https://mltblog.com/4fPuvTb

posted an update 16 days ago

Post

2220

LLM 2.0, RAG & Non-Standard Gen AI on GitHub https://mltblog.com/3DsyZSq

In this article, I share my latest Gen AI and LLM advances, featuring innovative approaches radically different from both standard AI and classical ML/NLP. The focus is on doing better with less, using efficient architectures, new algorithms and evaluation metrics. It originates from research that I started long ago. It gained significant momentum in the last two years. See background and history at https://mltblog.com/4g2sKTv.

OpenAI, Perplexity, Anthropic, Llama and others typically follow the trend and implement solutions very similar to mines within 3 to 6 months after I publish new milestones. For instance, multi-tokens, knowledge graph tokens, multi-indexes, real-time fine-tuning, mixtures of experts, LLM routers, small enterprise sub-LLMs, prompt distillation, relevancy scoring engine, deep contextual retrieval, optimum agentic chunking, and modern UI instead of the basic prompt box. I keep adding new features all the time, staying ahead of competition.

➡️ Read full article with links to GitHub, at https://mltblog.com/3DsyZSq

1 reply

posted an update 28 days ago

Post

1828

Where LLMs Fail the Most, and How to Fix it https://mltblog.com/41BcGDY

Here I illustrate my two most recent interactions with AI-powered GPT. It was an awful failure, a lot worse than before GenAI. Indeed, I had to revert back to old Google search to get help. This is typical of what hundreds of millions of users now experience every day.

➡️ First example:

I get payments from Stripe. I asked how I can pay someone, as opposed to getting paid, as I had a contact asking me to pay him with Stripe. After 30 mins of prompts to AI support, I got nowhere. In the end I decided to pay my contact using a different platform. I could not figure out how to a meaningful answer: see featured image.

➡️ Second example:

A VC guy I started to interact with sent me a few messages, but I never received any of them. I tried to contact my email provider, but was faced with a GenAI bot to answer the following precise question: his email address is xyz, mine is abc, his messages do not even show up in my spam box, and I did not block their domain name; how to fix this? After receiving irrelevant answers, I ask point blank: can I chat with a real human? Again, irrelevant answers, no matter how I phrase my question. In the end I told my contact to send messages to an alternate email address.

➡️ Read the article explaining causes, offering solutions, at https://mltblog.com/41BcGDY

posted an update about 1 month ago

Post

1427

From 10 Terabytes to Zero Parameter: The LLM 2.0 Revolution https://mltblog.com/4g2sKTv

LLM 2.0 has been brewing for a long time. Now it is becoming mainstream and replacing LLM 1.0, for its ability to deliver better ROI to enterprise customers, at a much lower cost. Much of the past resistance towards its adoption lied in one question: how can you possibly do better with no training, no GPU, and zero parameter? It is as if everyone believed that multi-billion parameter models are mandatory, due to a long tradition.

However, this machinery is used to train models on tasks irrelevant to the purpose, relying on self-reinforcing evaluation metrics that fail to capture desirable qualities such as depth, conciseness or exhaustivity. Not that standard LLMs are bad: I use OpenAI and Perplexity a lot for code generation, writing my investor deck, and even to answer advanced number theory questions. But their strength comes from all the sub-systems they rely upon, not from the central deep neural network. Remove or simplify that part, then you get a product far easier to maintain and upgrade, costing far less in development, and if done right, delivering more accurate results without hallucination, without prompt engineering and without the need to double-check the answers. Many times, errors are quite subtle and can be overlooked.

Good LLM 1.0 still saves a lot of time but requires significant vigilance. There is plenty of room for improvement, but more parameters and Blackbox DNNs have shown their limitations.

➡️ To read full article and learn how LLM 2.0 changes the game, see https://mltblog.com/4g2sKTv

posted an update about 1 month ago

Post

1228

LLM 2.0, the New Generation of Large Language Models https://mltblog.com/49ksOLL

I get many questions about the radically different LLM technology that I started to develop 2 years ago. Initially designed to retrieve information that I could no longer find on the Internet, not with search, OpenAI, Gemini, Perplexity or any other platform, it evolved to become the ideal solution for professional enterprise users. Now agentic and multimodal, automating business tasks at scale with lightning speed, consistently delivering real ROI, bypassing the costs associated to training and GPU with zero weight and explainable AI, tested and developed for Fortune 100 company.

So, what is behind the scenes, how different is it compared to LLM 1.0 (GPT and the likes), how can it be hallucination-free, what makes it a game changer, how did it eliminate prompt engineering, how does it handle knowledge graphs without neural networks, and what are the other benefits?

In a nutshell, the performance is due to building a robust architecture from the ground up and at every step, offering far more than a prompt box, relying on home-made technology rather than faulty Python libraries, and designed by enterprise and tech visionaries for enterprise users.

Contextual smart crawling to retrieve underlying taxonomies, augmented taxonomies, long contextual multi-tokens, real-time fine-tunning, increased security, LLM router with specialized sub-LLMs, an in-memory database architecture of its own to efficiently handle sparsity in keyword associations, contextual backend tables, agents built on the backend, mapping between prompt and corpus keywords, customized PMI rather than cosine similarity, variable-length embeddings, and the scoring engine (the new “PageRank” of LLMs) returning results along with the relevancy scores, are but a few of the differentiators.

➡️ Read the full article, at https://mltblog.com/49ksOLL

1 reply

posted an update about 1 month ago

Post

522

LLM Deep Contextual Retrieval and Multi-Index Chunking: Nvidia PDFs, Case Study https://mltblog.com/3OBfU2p

The technology described here boosts exhaustivity and structuredness in LLM prompt results, efficiently exploiting the knowledge graph and contextual structure present in any professional or enterprise corpus. The case study deals with public financial reports from Nvidia, available as PDF documents.

In this article, I discuss the preprocessing steps used to turn a PDF repository into input suitable for LLMs. It includes contextual chunking, indexing text entities with hierarchical multi-index system, and retrieving contextual elements including lists, sub-lists, fonts (type, color, and size), images and tables – some not detected by standard Python libraries. I also discuss how to build additional contextual information such as agents, categories, or tags, to add to text entities to further improve any LLM architecture, and prompt results.

posted an update about 2 months ago

Post

1186

There is no such thing as a Trained LLM https://mltblog.com/3CEJ9Pt

What I mean here is that traditional LLMs are trained on tasks irrelevant to what they will do for the user. It’s like training a plane to efficiently operate on the runway, but not to fly. In short, it is almost impossible to train an LLM, and evaluating is just as challenging. Then, training is not even necessary. In this article, I dive on all these topics.

➡️ Training LLMs for the wrong tasks

Since the beginnings with Bert, training an LLM typically consists of predicting the next tokens in a sentence, or removing some tokens and then have your algorithm fill the blanks. You optimize the underlying deep neural networks to perform these supervised learning tasks as well as possible. Typically, it involves growing the list of tokens in the training set to billions or trillions, increasing the cost and time to train. However, recently, there is a tendency to work with smaller datasets, by distilling the input sources and token lists. After all, out of one trillion tokens, 99% are noise and do not contribute to improving the results for the end-user; they may even contribute to hallucinations. Keep in mind that human beings have a vocabulary of about 30,000 keywords, and that the number of potential standardized prompts on a specialized corpus (and thus the number of potential answers) is less than a million.

➡️ Read the full articles at https://mltblog.com/3CEJ9Pt, also featuring issues with evaluation metrics and the benefits of untrained LLMs.

posted an update 2 months ago

Post

1634

xLLM: New Generation of Large Language Models for Enterprise

Read full article at https://mltblog.com/4ftTko9

In this article, you will find my PowerPoint presentation describing the most recent features of xLLM, a CPU-based, full context, secure multi-LLM with real-time fine-tuning & explainable AI. It includes several new diagrams describing the innovative architecture, upcoming developments, new features and different use cases.

Content

➡️Enterprise use case: corporate corpus of a Fortune 100 company.
➡️Original version dealing with large websites such as Wolfram and Wikipedia. Comparison with OpenAI.
➡️xLLM for clustering and predictive analytics. Use case: unstructured text (articles) from a media company.
➡️Integration of our game-changing NoGAN tabular data synthesizer, and state-of-the-art model evaluation technology.
➡️Integration of external tools, for instance to solve math problems.
➡️Upcoming version for auto-indexing and cataloging large repositories.
➡️Demo: enterprise xLLM in action, featuring the modern user interface (full web API, not just a prompt box) with command menu and numerous options not found in other LLMs, including debugging, suggested prompts, choice of agents, and fine-tuning in real time.
➡️Relevancy score displayed to the user, for each returned item. I call it the new PageRank for RAG/LLM, using a technology radically different from Google search. See picture.

New startup coming soon!

We will be launching soon (January) a new startup focusing on GenAI at scale for Enterprises; xLLM will be part of the offer with exclusive features. We are looking for early adopters to partner with us on the Journey. The co-founder and CEO, to be announced soon, is Senior Director of GenAI at a Fortune 100 company, where the first version of Enterprise xLLM was implemented. More to come!

Read more, and access the PPT, at https://mltblog.com/4ftTko9

posted an update 3 months ago

Post

1000

New Book: Building Disruptive AI & LLM Technology from Scratch https://mltblog.com/404F1BZ

This book features new advances in game-changing AI and LLM technologies built by GenAItechLab.com. Written in simple English, it is best suited for engineers, developers, data scientists, analysts, consultants and anyone with an analytic background interested in starting a career in AI. The emphasis is on scalable enterprise solutions, easy to implement, yet outperforming vendors both in term of speed and quality, by several orders of magnitude.

Each topic comes with GitHub links, full Python code, datasets, illustrations, and real-life case studies, including from Fortune 100 company. Some of the material is presented as enterprise projects with solution, to help you build robust applications and boost your career. You don’t need expensive GPU and cloud bandwidth to implement them: a standard laptop works.

➡️ Part 1: Hallucination-Free LLM with Real-Time Fine-Tuning

➡️ Part 2: Outperforming Neural Nets and Classic AI

➡️ Part 3: Innovations in Statistical AI

About the author

Vincent Granville is a pioneering GenAI scientist and machine learning expert, co-founder of Data Science Central (acquired by a publicly traded company in 2020), Chief AI Scientist at ML Techniques and GenAI Techlab, former VC-funded executive, author (Elsevier) and patent owner — one related to LLM. Vincent’s past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, and CNET.

➡️ See content and get your copy, at https://mltblog.com/404F1BZ

posted an update 3 months ago

Post

1284

Building a Ranking System to Enhance Prompt Results: The New PageRank for RAG/LLM

Read full article at https://mltblog.com/4gT62y9

In this document, you will learn how to build a system that decides, among dozens of candidate paragraphs selected from the corpus to answer a prompt, which ones to show in the results, and in what order. The goal is to maximize relevancy while not overwhelming the user with a long, cluttered answer. Think of it as the new PageRank for RAG/LLM, although the algorithm is radically different, and much simpler. The approach is generic and works for all RAG/LLM systems whether based on neural networks or not. It is implemented in xLLM.

The article includes Python code (with links to GitHub) and case study featuring the anonymized augmented corpus of a fortune 100 company, as well as future LLM developments (auto-indexing and LLM for glossary generation).

posted an update 4 months ago

Post

1825

No-Code LLM Fine-Tuning and Debugging in Real Time: Case Study

Full doc at https://mltblog.com/47DisG5

Have you tried the xLLM web API? It allows you to fine-tune and debug an agentic multi-LLM in real time. The input data is part of the anonymized corporate corpus of a Fortune 100 company, dealing with AI policies, documentation, integration, best practices, references, onboarding, and so on. It features one sub-LLM. The full corpus is broken down into 15 sub-LLMs.

One of the goals is to return concise but exhaustive results, using acronyms (a specific table for each sub-LLM) to map multi-tokens found in prompts but not in the corpus, with multi-tokens in the corpus. Exhaustivity is the most overlooked metric when evaluating LLMs designed for search / retrieval. Using xLLM in combination with another LLMs is one of the best approaches, and both can be used to evaluate each other. Yet, thanks to fast in-memory processing, no weight, and no training, the xLLM web API is one of its kind, with capabilities not found in any competing product, free or not.

Read more at https://mltblog.com/47DisG5

replied to their post 4 months ago

You are welcome Stephen!

posted an update 4 months ago

Post

1454

Hyperfast Contextual Custom LLM with Agents, Multitokens, Explainable AI, and Distillation https://mltblog.com/4dNPSnB

New additions to this ground-breaking system include multi-token distillation when processing prompts, agents to meet user intent, more NLP, and a command prompt menu accepting both standard prompts and various actions.

I also added several illustrations, featuring xLLM in action with a full session and sample commands to fine-tune in real-time. All the code, input sources (anonymized corporate corpus from fortune 100 company), contextual backend tables including embeddings, are on GitHub. My system has zero weight, no transformer, and no neural network. It relies on explainable AI, does not require training, is fully reproducible, and fits in memory. Yet your prompts can retrieve relevant full text entities from the corpus with no latency — including URLs, categories, titles, email addresses, and so on — thanks to well-designed architecture.

Read more, get the code, paper and everything for free, at https://mltblog.com/4dNPSnB

2 replies

posted an update 5 months ago

Post

642

30 Features that Dramatically Improve LLM Performance - Part 1 https://mltblog.com/3Aq9iAb

Many are ground-breaking innovations that make LLMs much faster and not prone to hallucinations. They reduce the cost, latency, and amount of computer resources (GPU, training) by several orders of magnitude. Some of them improve security, making your LLM more attractive to corporate clients. I introduced a few of these features in my previous article "New Trends in LLM Architecture". Now I offer a comprehensive list, based on the most recent developments.

Read full article, learn about agentic LLMs, LLM routers, contextual tables, fast search, and more, at https://mltblog.com/3Aq9iAb

replied to their post 5 months ago

See my tests: Python badly fails the congruential equidistribution among others, whatever generator is implemented in Python 3.10.

The one-line formula is this:

(5^n >> n) % (2^n)

Can be executed very efficiently. Each new n gives you n new bits independent from the previous ones. That's one of many sequences proposed in my paper. With n = 10^6, you get a total of 5 x 10^11 bits.

As for non-reproducibility, all of what I tested, you run the code twice, you get two different results. You have to set a seed for all sources of randomness, not some of them as set_seed does.

Finally, I have 40 years of experience designing random generators of increasing quality and PhD in computational stats (postdoc at the statslabs, Cambridge University).

Your Dieharder battery of tests is a joke designed by amateurs who know basic stuff in stats and nothing in number theory. The fact that everyone uses it does not make my statement less true. It does not even test "strong randomness", a concept defined in one of my books.

Actually, you only need one single test to check strong randomness: the full multivariate Kolmogorov-Smirnov distance. As far as I know, I am the only one to have implemented it in any dimension: https://pypi.org/project/genai-evaluation/

posted an update 5 months ago

Post

453

Most LLMs are not reproducible because the underlying deep neural networks are not. Because that's something LLM creators don't care about. We do, and ours are reproducible, including our GenAI that uses GAN.

All you have to do is allow the user to specify the seeds of the random number generators involved. First, you need a good random generator you have full control over. Better than numpy.random. See ours, with infinite period and one line of code, faster and better than what's in Python and elsewhere. Here is the link: https://mltblog.com/4fGDLu0

2 replies

posted an update 5 months ago

Post

1778

Free LLM/RAG course at https://mltblog.com/48GebAG - learn how to build custom architectures from scratch, and earn an LLM certification, all free.

The GenAItechLab Fellowship program allows participants to work on state-of-the-art, enterprise-grade projects, entirely for free, at their own pace, at home or in their workplace. The goal is to help you test, enhance, and further implement applications that outperform solutions offered by AI startups or organizations such as Google or OpenAI.

You will learn how to quickly build faster and lighter systems that deliver better results based on sound evaluation metrics, with a focus on case studies and best practices. Not the least, you will learn modern methods here to stay, designed by world-class expert and investor, Dr. Vincent Granville, founder of GenAItechLab.com.

posted an update 6 months ago

Post

573

Custom Enterprise LLM/RAG with Real-Time Fine-Tuning https://mltblog.com/3WcTS9C -- Just released!

This article features an application of xLLM to extract information from a corporate corpus, using prompts referred to as “queries”. The goal is to serve the business user — typically an employee of the company or someone allowed access — with condensed, relevant pieces of information including links, examples, PDFs, tables, charts, definitions and so on, to professional queries.

My custom sub-LLM designed from scratch does not rely on any Python library or API, and performs better than search tools available on the market, in terms of speed and results relevancy. It offers the user the ability to fine-tune parameters in real time, and can detect user intent to deliver appropriate output. The good performance comes from the quality of the well-structured input sources, combined with smart crawling to retrieve the embedded knowledge graph and integrate it into the backend tables. Traditional tools rely mostly on tokens, embeddings, billions of parameters and frontend tricks such as prompt engineering to fix backend issues.

To the contrary, my approach focuses on building a solid backend foundational architecture from the ground up. Tokens and embeddings are not the most important components, by a long shot. Cosine similarity and dot products are replaced by pointwise mutual information. There is no neural network, no training, and a small number of explainable parameters, easy to fine-tune.

Read more, access the code and data, at https://mltblog.com/3WcTS9C

posted an update 6 months ago

Post

1238

How to create custom LLMs from scratch

See my new podcast on this topic, at https://mltblog.com/3xS1bf5

Despite GPT, Claude, Gemini, LLama and the other host of LLMs that we have access to, a variety of organizations are still exploring their options when it comes to custom LLMs. Logging in to ChatGPT is easy enough, and so is creating a ‘custom’ openAI GPT, but what does it take to create a truly custom LLM? When and why might this be useful, and will it be worth the effort?

posted an update 6 months ago

Post

468

New Trends in LLM: Overview with Focus on xLLM

Read full article and download PowerPoint presentation at https://mltblog.com/3KqlNO7

If you ever wondered how xLLM is different from other LLM and RAG architectures, what are the foundational changes that make it appealing to fortune 100 companies, and what are the innovations being copied by competitors, read on. In this article, I share the latest trends and provide a high-level summary of xLLM, describing the ground-breaking technologies that make it unique, faster, and better for professional users and experts. In particular, I share my PowerPoint presentation on the topic.

Search is becoming hot again, this time powered by RAG and LLMs rather than PageRank. New LLMs may not use transformers, and energy-efficient implementations are gaining popularity, with an attempt to lower GPU usage, and thus costs. Yet all but xLLM still rely on Blackbox neural networks.

Great evaluation metrics remain elusive and will remain so probably forever: in the end, LLMs, just like clustering, are part of unsupervised learning. Two users looking at a non-trivial dataset will never agree on what the “true” underlying cluster structure is. Because “true” is meaningless in this context. The same applies to LLMs. With some exceptions: when used for predictive analytics, that is, supervised learning, it is possible to tell which LLM is best in absolute terms (to some extent; it also depends on the dataset).