Commit a2a0d3f (parent 5381931)

First draft of embedding blog post (#426)

* First draft of embedding blog post. This is focused on a technical audience, with the goal of sharing how and why we're building niche targeting. There will be a second post focusing on the marketing side of things.
* A couple updates
* Rename page to niche targeting
* Add syntax highlighting
* Fix a mistake in python formatting
* Update content/posts/2024-niche-ad-targeting.md
* Update content/posts/2024-niche-ad-targeting.md
* Update content/posts/2024-niche-ad-targeting.md
* Update content/posts/2024-niche-ad-targeting.md
* Address feedback
* Even more dev-focused title
* Apply suggestions from code review
* Update image
* Update content/posts/2024-niche-ad-targeting.md

Co-authored-by: David Fischer <[email protected]>

5 changed files with 214 additions and 1 deletion.
Title: Using embeddings in production with Postgres & Django for niche ad targeting
Date: Apr 23, 2024
description: How we built better contextual ad targeting using PostgreSQL, Django, and embeddings, which are a way to encode the "relatedness" of text and pages for use in machine learning.
tags: content-targeting, engineering
authors: Eric Holscher
image: /images/posts/niche-targeting.png

This is an update to our original post on [content-based ad targeting](https://www.ethicalads.io/blog/2022/11/a-new-approach-to-content-based-targeting-for-advertising/).
In this post, I'll cover our next step: using machine learning (specifically, embeddings) to build better contextual ad targeting.

At the end of our last post,
we were crawling all of our publishers' pages
and categorizing them into _Topics_ based on their text.
We did this by training a model with ~100 examples of each topic,
and then storing the topics in our database for fast ad serving.

This gave us a good starting point for targeting ads by topic,
but we wanted to get more granular.

## Targeting each page individually with embeddings

Our new approach is to use [word embeddings](https://en.wikipedia.org/wiki/Word_embedding) to represent both the advertiser's landing page and each publisher page.
This gives us a numeric representation (a vector) of every page,
and those vectors can be compared against each other to measure how related two pages are.

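One common way to compare two embeddings, and the measure behind pgvector's `CosineDistance` (its `<=>` operator), is cosine distance: one minus the cosine of the angle between the two vectors. Below is a minimal NumPy sketch using made-up three-dimensional vectors; real embeddings have hundreds of dimensions, and these values are purely illustrative.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def cosine_distance(a, b):
    """1 - cosine similarity: 0.0 for identical direction, up to 2.0 for opposite."""
    return 1.0 - cosine_similarity(a, b)


# Toy vectors standing in for page embeddings (illustrative values only).
landing_page = [0.9, 0.1, 0.0]
related_page = [0.8, 0.2, 0.1]
unrelated_page = [0.0, 0.1, 0.9]

print(round(cosine_distance(landing_page, related_page), 3))
print(round(cosine_distance(landing_page, unrelated_page), 3))
```

A smaller distance means more closely related pages, so sorting by ascending distance puts the best matches first.
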
We're currently using Python's [SentenceTransformers](https://www.sbert.net/) library to generate these embeddings.
We will likely upgrade to a more advanced model in the future,
but this was perfect for our initial tests.

### Generating, storing, and querying embeddings

A simplified example of generating an embedding for a page:

```python
import requests
from bs4 import BeautifulSoup
from sentence_transformers import SentenceTransformer

# Example settings: any model works as long as its output size matches the
# VectorField dimensions below (384); "all-MiniLM-L6-v2" is only an example.
MODEL_NAME = "all-MiniLM-L6-v2"
CACHE_FOLDER = "/tmp/sentence-transformers"  # where downloaded models are cached
url = "https://example.com/"  # the page to embed

# Generate embeddings for a page
model = SentenceTransformer(MODEL_NAME, cache_folder=CACHE_FOLDER)
response = requests.get(url, timeout=10)
response.raise_for_status()
text = BeautifulSoup(response.text, "html.parser").get_text(separator=" ", strip=True)
embedding = model.encode(text)  # NumPy array with one float per dimension
print(embedding.tolist())
```

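One caveat with encoding a whole page in a single call: SentenceTransformers models have a maximum input length (exposed as `model.max_seq_length`) and truncate anything longer, so only the start of a long page shapes its embedding. A common workaround, shown here as a hypothetical sketch rather than what we run, is to split the text into chunks, embed each chunk, and average the normalized vectors. The `encode` argument stands in for `model.encode`, and chunking by word count is only a rough proxy, since the real limit is measured in tokens.

```python
import numpy as np


def chunk_words(text, max_words=200):
    """Split text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed_long_text(text, encode, max_words=200):
    """Embed each chunk with `encode` and return the normalized mean vector.

    `encode` maps a list of strings to a 2-D array with one row per string,
    e.g. SentenceTransformer(...).encode (hypothetical wiring).
    """
    chunks = chunk_words(text, max_words) or [""]
    vectors = np.asarray(encode(chunks), dtype=float)
    # Normalize each chunk vector so every chunk contributes equally to the average.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    vectors = vectors / np.where(norms == 0, 1, norms)
    mean = vectors.mean(axis=0)
    norm = np.linalg.norm(mean)
    return mean / norm if norm else mean
```

With a real model this would be called as `embed_long_text(text, model.encode)`; the 200-word default is an arbitrary illustration, not a tuned value.
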
We then use [pgvector](https://github.com/pgvector/pgvector) and [pgvector-python](https://github.com/pgvector/pgvector-python) to store and query these embeddings with Django and Postgres,
which is the stack we already run in production.

```python
from django.db import models
from pgvector.django import VectorField

# Store the content in Postgres/Django.
# AnalyzedUrl is our existing model with metadata about each crawled URL (not shown).


class Embedding(models.Model):
    # FK where we keep metadata about the URL
    analyzed_url = models.ForeignKey(
        AnalyzedUrl,
        on_delete=models.CASCADE,
        related_name="embeddings",
    )

    # Model name so we can use different models in the future
    model = models.CharField(max_length=255, default=None, null=True, blank=True)

    # The actual embedding; 384 must match the output size of the model
    vector = VectorField(dimensions=384, default=None, null=True, blank=True)
```

Then we can query the database for the publisher pages most similar to an advertiser's landing page:

```python
from pgvector.django import CosineDistance
from .models import Embedding

# `embedding` is the landing page's vector, e.g. the output of model.encode() above.
# Find the publisher pages most similar to it, closest first.
similar_pages = Embedding.objects.annotate(
    distance=CosineDistance("vector", embedding)
).order_by("distance")[:10]  # the limit of 10 is illustrative
```

## Try out a demo

You can see a screenshot of our niche targeting in action at the top of this page.
This is a simple proof of concept,
but it shows how, when serving a MongoDB ad,
we're able to find pages that focus specifically on MongoDB and databases.

You can [try out our Niche Targeting Demo](https://www.ethicalads.io/advertisers/similar-pages/?url=https%3A%2F%2Fwww.mongodb.com%2Fatlas),
and let us know how it goes!

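The demo link passes the advertiser's landing page as a URL-encoded `url` query parameter. Assuming the demo accepts other landing pages the same way (an assumption based only on the link's format), a small helper to build such a link could look like this; `demo_link` is a hypothetical name.

```python
from urllib.parse import urlencode

# Base URL of the demo page, taken from the link above.
DEMO_URL = "https://www.ethicalads.io/advertisers/similar-pages/"


def demo_link(landing_page_url):
    """Return a demo link with the landing page URL-encoded into `url`."""
    return DEMO_URL + "?" + urlencode({"url": landing_page_url})


print(demo_link("https://www.mongodb.com/atlas"))
```

For the MongoDB Atlas page, this reproduces the demo link above exactly.
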
## Advantages of Niche Targeting

This approach is a huge win for both privacy and user experience:

* **We're able to target ads to pages without needing to know anything about the user.** The better we get at targeting, the more powerful our ethical advertising approach becomes, and the more we can scale our network.
* Minimalist, well-targeted ads make for a better user experience. Because they perform better, we can show fewer ads and charge more for them. This is a win-win for everyone.
* We were able to implement this approach with minimal changes to our existing infrastructure, mostly because we're already heavily invested in the Python ecosystem and Postgres.

## Challenges and Considerations

We still have a few challenges to overcome with this approach:

* We aim for ad response times under 50ms, and this approach is currently slower than that. We're working on optimizing it with indexing, and might look at using an in-memory vector store in the future.
* Embeddings currently work pretty well, but they can often associate things that are not relevant. For example, we've run into issues where the same word is used to mean different things (e.g. "model"), and the embeddings get confused.

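To make the in-memory option more concrete: the simplest form of an in-memory vector store keeps every publisher-page vector in one normalized NumPy matrix, so a query is a single matrix-vector product followed by a partial sort. This is a hypothetical sketch, not our production code: the `InMemoryIndex` name is made up, it assumes all vectors fit in RAM, and at larger scale approximate indexes (such as HNSW, which pgvector also supports) are the usual choice.

```python
import numpy as np


class InMemoryIndex:
    """Brute-force cosine search over vectors held in RAM (hypothetical sketch)."""

    def __init__(self, ids, vectors):
        self.ids = list(ids)
        matrix = np.asarray(vectors, dtype=np.float32)
        if matrix.ndim != 2 or len(self.ids) != matrix.shape[0]:
            raise ValueError("need one 1-D vector per id")
        # Normalize rows once so each query is a single matrix-vector product.
        norms = np.linalg.norm(matrix, axis=1, keepdims=True)
        self.matrix = matrix / np.where(norms == 0, 1, norms)

    def search(self, query, k=5):
        """Return up to k (id, cosine_distance) pairs, closest first."""
        q = np.asarray(query, dtype=np.float32)
        q = q / (np.linalg.norm(q) or 1.0)
        distances = 1.0 - self.matrix @ q
        k = min(k, len(self.ids))
        if k <= 0:
            return []
        # argpartition finds the k smallest distances without sorting everything.
        top = np.argpartition(distances, k - 1)[:k]
        top = top[np.argsort(distances[top])]
        return [(self.ids[i], float(distances[i])) for i in top]


index = InMemoryIndex(["page-a", "page-b", "page-c"], [[1, 0], [0, 1], [1, 1]])
print(index.search([1, 0.1], k=2))
```

Normalizing once at load time trades a little memory for cheaper queries, which matters when the goal is staying inside a tight response-time budget.
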
## Conclusion

This approach is still in its early stages, but we're excited about its potential.
The better we get at ethical ad targeting,
the more everyone in our network benefits:

* **Advertisers** get better ad targeting, ensuring they show up in front of the right users.
* **Publishers** earn more money while showing a single ad, rather than resorting to multiple, larger ads that take over their site.
* **Users** get a better experience, with ads that are relevant to the content they are reading.

This is our vision for advertising,
and we're excited to keep building toward it.

Thanks so much to Simon Willison for his [blog post on embeddings](https://simonwillison.net/2023/Oct/23/embeddings/),
which inspired me to try this approach.
.codehilite {
  border: 1px solid $gray-400;
  margin-bottom: 1rem;
}
.codehilite pre {
  padding: 1rem;
  margin-bottom: 0 !important;
}

// Default Pygments styles
//
// Generated with:
// pygmentize -S default -f html -a .codehilite

pre { line-height: 125%; }
td.linenos .normal { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; }
span.linenos { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; }
td.linenos .special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; }
span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; }
.codehilite .hll { background-color: #ffffcc }
.codehilite { background: #f8f8f8; }
.codehilite .c { color: #408080; font-style: italic } /* Comment */
.codehilite .err { border: 1px solid #FF0000 } /* Error */
.codehilite .k { color: #008000; font-weight: bold } /* Keyword */
.codehilite .o { color: #666666 } /* Operator */
.codehilite .ch { color: #408080; font-style: italic } /* Comment.Hashbang */
.codehilite .cm { color: #408080; font-style: italic } /* Comment.Multiline */
.codehilite .cp { color: #BC7A00 } /* Comment.Preproc */
.codehilite .cpf { color: #408080; font-style: italic } /* Comment.PreprocFile */
.codehilite .c1 { color: #408080; font-style: italic } /* Comment.Single */
.codehilite .cs { color: #408080; font-style: italic } /* Comment.Special */
.codehilite .gd { color: #A00000 } /* Generic.Deleted */
.codehilite .ge { font-style: italic } /* Generic.Emph */
.codehilite .gr { color: #FF0000 } /* Generic.Error */
.codehilite .gh { color: #000080; font-weight: bold } /* Generic.Heading */
.codehilite .gi { color: #00A000 } /* Generic.Inserted */
.codehilite .go { color: #888888 } /* Generic.Output */
.codehilite .gp { color: #000080; font-weight: bold } /* Generic.Prompt */
.codehilite .gs { font-weight: bold } /* Generic.Strong */
.codehilite .gu { color: #800080; font-weight: bold } /* Generic.Subheading */
.codehilite .gt { color: #0044DD } /* Generic.Traceback */
.codehilite .kc { color: #008000; font-weight: bold } /* Keyword.Constant */
.codehilite .kd { color: #008000; font-weight: bold } /* Keyword.Declaration */
.codehilite .kn { color: #008000; font-weight: bold } /* Keyword.Namespace */
.codehilite .kp { color: #008000 } /* Keyword.Pseudo */
.codehilite .kr { color: #008000; font-weight: bold } /* Keyword.Reserved */
.codehilite .kt { color: #B00040 } /* Keyword.Type */
.codehilite .m { color: #666666 } /* Literal.Number */
.codehilite .s { color: #BA2121 } /* Literal.String */
.codehilite .na { color: #7D9029 } /* Name.Attribute */
.codehilite .nb { color: #008000 } /* Name.Builtin */
.codehilite .nc { color: #0000FF; font-weight: bold } /* Name.Class */
.codehilite .no { color: #880000 } /* Name.Constant */
.codehilite .nd { color: #AA22FF } /* Name.Decorator */
.codehilite .ni { color: #999999; font-weight: bold } /* Name.Entity */
.codehilite .ne { color: #D2413A; font-weight: bold } /* Name.Exception */
.codehilite .nf { color: #0000FF } /* Name.Function */
.codehilite .nl { color: #A0A000 } /* Name.Label */
.codehilite .nn { color: #0000FF; font-weight: bold } /* Name.Namespace */
.codehilite .nt { color: #008000; font-weight: bold } /* Name.Tag */
.codehilite .nv { color: #19177C } /* Name.Variable */
.codehilite .ow { color: #AA22FF; font-weight: bold } /* Operator.Word */
.codehilite .w { color: #bbbbbb } /* Text.Whitespace */
.codehilite .mb { color: #666666 } /* Literal.Number.Bin */
.codehilite .mf { color: #666666 } /* Literal.Number.Float */
.codehilite .mh { color: #666666 } /* Literal.Number.Hex */
.codehilite .mi { color: #666666 } /* Literal.Number.Integer */
.codehilite .mo { color: #666666 } /* Literal.Number.Oct */
.codehilite .sa { color: #BA2121 } /* Literal.String.Affix */
.codehilite .sb { color: #BA2121 } /* Literal.String.Backtick */
.codehilite .sc { color: #BA2121 } /* Literal.String.Char */
.codehilite .dl { color: #BA2121 } /* Literal.String.Delimiter */
.codehilite .sd { color: #BA2121; font-style: italic } /* Literal.String.Doc */
.codehilite .s2 { color: #BA2121 } /* Literal.String.Double */
.codehilite .se { color: #BB6622; font-weight: bold } /* Literal.String.Escape */
.codehilite .sh { color: #BA2121 } /* Literal.String.Heredoc */
.codehilite .si { color: #BB6688; font-weight: bold } /* Literal.String.Interpol */
.codehilite .sx { color: #008000 } /* Literal.String.Other */
.codehilite .sr { color: #BB6688 } /* Literal.String.Regex */
.codehilite .s1 { color: #BA2121 } /* Literal.String.Single */
.codehilite .ss { color: #19177C } /* Literal.String.Symbol */
.codehilite .bp { color: #008000 } /* Name.Builtin.Pseudo */
.codehilite .fm { color: #0000FF } /* Name.Function.Magic */
.codehilite .vc { color: #19177C } /* Name.Variable.Class */
.codehilite .vg { color: #19177C } /* Name.Variable.Global */
.codehilite .vi { color: #19177C } /* Name.Variable.Instance */
.codehilite .vm { color: #19177C } /* Name.Variable.Magic */
.codehilite .il { color: #666666 } /* Literal.Number.Integer.Long */
@import "tables";
@import "theme";
@import "ads-styles";
@import "pygments_codehilite";