
Feature filter keywords 3710 #5263

Open
leoseg wants to merge 25 commits into main from Feature-Filter-Keywords-3710

leoseg
Contributor

@leoseg leoseg commented Dec 14, 2024

Adds an option to block keywords so that no posts containing these keywords in their name, body, or URL are returned. The blocked keywords are stored in a separate table related to person.

@aeharding

What happens if you block "X"? Will it match "Xylophone"? What about "x.com"?

Unlike person and community blocks this seems a bit more ambiguous.

@leoseg
Contributor Author

leoseg commented Dec 16, 2024

At the moment, yes, because ilike searches for case-insensitive substrings.
One possibility would be to block only posts whose title or description contains the blocked keyword as a word of its own (not as part of another word), or whose URL contains the keyword as a parameter or as an identifier without prefixes and suffixes. For example, for www.mcdonalds.com, the 'mcdonalds' part would have to equal the blocked keyword for the post to be filtered.
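To make the "X" example concrete, here is a minimal Rust sketch (not code from this PR) contrasting the current substring behaviour with the proposed whole-word matching. The function names are hypothetical, and Rust's to_lowercase only approximates ILIKE, whose case-insensitive matching follows the database's active locale.

```rust
/// Current behaviour: case-insensitive substring match, roughly what
/// `column ILIKE '%keyword%'` does.
fn matches_substring(text: &str, keyword: &str) -> bool {
    text.to_lowercase().contains(&keyword.to_lowercase())
}

/// Proposed alternative: the keyword only matches when it appears as a whole
/// word, i.e. as a maximal run of alphanumeric characters. Splitting on
/// punctuation also breaks URLs such as www.mcdonalds.com into
/// "www", "mcdonalds" and "com".
fn matches_whole_word(text: &str, keyword: &str) -> bool {
    let keyword = keyword.to_lowercase();
    let lowered = text.to_lowercase();
    lowered
        .split(|c: char| !c.is_alphanumeric())
        .any(|word| word == keyword)
}
```

With this sketch, blocking "X" no longer hides "Xylophone" but still hides a link to x.com, and "mcdonalds" matches www.mcdonalds.com but not mcdonaldsfans.net. Multi-word keywords such as "new york" would never match and would need separate handling.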

@Nutomic
Member

Nutomic commented Dec 17, 2024

You could also consider a minimum length of, e.g., 3 characters for each blocked string.

And you need to make sure that the checks are passing (click the details link at the bottom of this page). See .woodpecker.yml in the repo for the commands it runs, so you can check and fix them locally.

@leoseg
Contributor Author

leoseg commented Dec 19, 2024

Yes, this sounds simpler, and the Connect Lenny app does it like this. I will add the restriction, fix the checks, and add unit tests when I have time at the beginning of next year.

@leoseg force-pushed the Feature-Filter-Keywords-3710 branch from 0441a7d to 3d1e7d5 on February 6, 2025 22:51
@dessalines
Member

I apologize for being slow to review this; ping me or request a review when you'd like me to take a look.

@leoseg leoseg marked this pull request as ready for review March 5, 2025 10:14
@leoseg leoseg requested a review from dessalines as a code owner March 5, 2025 10:14
.route("/report/resolve", put().to(resolve_post_report)),
.route("/report/resolve", put().to(resolve_post_report))
.route("/site_metadata", get().to(get_link_metadata))
.route("/block", post().to(user_block_keyword_for_posts)),
Member

Don't change API v3; it is only kept for backwards compatibility.

Contributor Author

I removed the changes

@@ -355,7 +356,8 @@ pub fn config(cfg: &mut ServiceConfig, rate_limit: &RateLimitCell) {
scope("/block")
.route("/person", post().to(user_block_person))
.route("/community", post().to(user_block_community))
-.route("/instance", post().to(user_block_instance)),
+.route("/instance", post().to(user_block_instance))
+.route("/post", post().to(user_block_keyword_for_posts)),
Member

@Nutomic commented Mar 5, 2025

How about /account/block/post_keywords?

@@ -0,0 +1,3 @@
ALTER TABLE post_keyword_block
ALTER COLUMN keyword TYPE varchar(50);

Member

There should only be a single migration per pull request. For the table name user_post_keyword_block is clearer.

Contributor Author

I deleted the new migration and altered the first migration.

id serial PRIMARY KEY,
keyword varchar(255) NOT NULL,
person_id int REFERENCES person (id) ON UPDATE CASCADE ON DELETE CASCADE NOT NULL
);
Member

Needs a unique constraint for (keyword, person_id). You also might be able to use that as primary key and get rid of id.

Contributor Author

done

diesel::table! {
post_keyword_block (id) {
id -> Int4,
#[max_length = 20]
Member

This length doesn't match the migration.

Contributor Author

Fixed.

#[derive(Debug, Copy, Clone, Hash, Eq, PartialEq, Serialize, Deserialize, Default)]
#[cfg_attr(feature = "full", derive(DieselNewType, TS))]
#[cfg_attr(feature = "full", ts(export))]
/// The comment reply id.
Member

Wrong copy-paste.

Contributor Author

Removed, because id has been replaced by a composite primary key.

PostKeywordBlock::unblock_keyword(&mut context.pool(), &post_keyword_block_form).await?;
}
Ok(Json(SuccessResponse::default()))
}
Member

Instead of blocking or unblocking keywords one by one, you could also update the whole blocklist at once (just like we do for user languages). Not sure which approach is preferable.
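A sketch of the replace-the-whole-list approach, modelled in memory rather than with Diesel so it stays self-contained: each person maps to a set, so the (person_id, keyword) composite key is unique by construction, and an update drops the old rows and stores the new list, much like the user-languages update mentioned above. KeywordBlocks, update, for_person and the PersonId alias are hypothetical names for illustration only.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for the PersonId newtype.
type PersonId = i32;

/// In-memory model of a keyword block table keyed by (person_id, keyword).
#[derive(Default)]
struct KeywordBlocks {
    rows: HashMap<PersonId, BTreeSet<String>>,
}

impl KeywordBlocks {
    /// Replace the whole blocklist at once: every existing entry for the
    /// person is discarded and the new list is stored. Duplicate keywords
    /// collapse into one entry, similar to ON CONFLICT DO NOTHING on the
    /// composite key.
    fn update(&mut self, person_id: PersonId, keywords: &[String]) {
        let set: BTreeSet<String> = keywords.iter().cloned().collect();
        if set.is_empty() {
            self.rows.remove(&person_id);
        } else {
            self.rows.insert(person_id, set);
        }
    }

    /// Current blocklist for a person, sorted (BTreeSet order).
    fn for_person(&self, person_id: PersonId) -> Vec<String> {
        self.rows
            .get(&person_id)
            .map(|set| set.iter().cloned().collect::<Vec<String>>())
            .unwrap_or_default()
    }
}
```

The client always sends the complete list, so there is no separate unblock call: removing a keyword simply means sending the list without it.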

Contributor Author

Sounds good; that should be easier for the frontend to implement. I've updated it that way.

sql::<Bool>("NOT (post.body LIKE ANY(")
.bind::<Array<Text>, _>(transformed_strings.clone())
.sql("))"),
),
Member

Instead of this raw sql you can build a regex like (keyword1|keyword2|...) and filter with that.

Contributor Author

I thought this would be less performant, since the regex string could get fairly long with, for example, 15 keywords. If I did that, would I use LIKE together with the regex (keyword1|keyword2|...)?

Member

Not sure, cc @dessalines @dullbananas

Member

You should be able to use not similar to:

A regex would definitely be slower than this. But none of this custom binding is necessary. Also, extract the building of this string into its own function.
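One way to extract the pattern building into its own function, assuming the query ends up as lower(column) NOT SIMILAR TO $1 with the pattern passed as a bound parameter (Diesel's PostgreSQL text methods include not_similar_to, so no raw SQL should be needed). The function name is hypothetical and this is a sketch, not the PR's code. Keywords are lowercased to match lower() on the column, and SIMILAR TO metacharacters are backslash-escaped so a keyword such as "c++" is matched literally instead of being parsed as a repetition.

```rust
/// Characters with special meaning in a PostgreSQL SIMILAR TO pattern,
/// plus the default escape character (backslash).
const SIMILAR_TO_SPECIAL: &[char] = &[
    '\\', '%', '_', '|', '*', '+', '?', '{', '}', '(', ')', '[', ']',
];

/// Build one pattern of the form `%(kw1|kw2|...)%`. SIMILAR TO must match
/// the whole string, hence the leading and trailing `%`. Returns None for an
/// empty list, in which case the caller should skip the filter entirely.
fn build_blocked_keywords_pattern(keywords: &[String]) -> Option<String> {
    if keywords.is_empty() {
        return None;
    }
    let alternatives: Vec<String> = keywords
        .iter()
        .map(|kw| {
            let mut escaped = String::with_capacity(kw.len());
            for c in kw.to_lowercase().chars() {
                if SIMILAR_TO_SPECIAL.contains(&c) {
                    escaped.push('\\');
                }
                escaped.push(c);
            }
            escaped
        })
        .collect();
    Some(format!("%({})%", alternatives.join("|")))
}
```

For example, the keywords "Rust" and "c++" produce the pattern %(rust|c\+\+)%. Whether a single SIMILAR TO pattern beats LIKE ANY for realistic list sizes is still an open question in this thread and would need measuring.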

Collaborator

Use this loop:

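for keyword in blocked_keywords {
    // `not_like` takes its argument by value, so clone the pattern for all but the last use.
    let pattern = format!("%{keyword}%");
    query = query.filter(post::name.not_like(pattern.clone()));
    // url and body are nullable: NOT (NULL LIKE ...) is NULL, so posts with a
    // NULL url or body would also be filtered out unless that case is handled.
    query = query.filter(post::url.not_like(pattern.clone()));
    query = query.filter(post::body.not_like(pattern));
}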

Also, it should probably be ilike instead of like
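If the per-keyword loop is kept, a keyword that contains % or _ would act as a LIKE wildcard instead of matching literally. A minimal sketch of a pattern builder that escapes them with a backslash, which is PostgreSQL's default escape character for LIKE and ILIKE. ilike_contains_pattern is a hypothetical helper; its result would be passed to the not_ilike (or not_like) filter in place of the plain format! pattern.

```rust
/// Turn a blocked keyword into a `%keyword%` pattern for NOT ILIKE, escaping
/// the LIKE wildcards `%` and `_` and the escape character `\` itself so
/// the keyword matches literally anywhere in the text.
fn ilike_contains_pattern(keyword: &str) -> String {
    let mut pattern = String::with_capacity(keyword.len() + 2);
    pattern.push('%');
    for c in keyword.chars() {
        if matches!(c, '\\' | '%' | '_') {
            pattern.push('\\');
        }
        pattern.push(c);
    }
    pattern.push('%');
    pattern
}
```

Without the escaping, blocking "a_b" would also hide posts containing "axb", because _ matches any single character.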

leoseg added 3 commits March 9, 2025 13:10
…block and replaced id as primary key with the composite key (person_id, keyword). Also now uses an update function which replaces a user's blocked keywords with a new list instead of updating them keyword by keyword.
@leoseg force-pushed the Feature-Filter-Keywords-3710 branch from 3524070 to fc49f01 on March 10, 2025 09:39
.values(post_keyword_block_form)
.get_result::<Self>(conn)
.await
}
Member

This is only used in a test, you can replace it with update(). And the unblock method below is completely unused.

#[cfg_attr(feature = "full", derive(TS))]
#[cfg_attr(feature = "full", ts(export))]
pub struct BlockKeywordForPost {
pub keywords_to_block: Vec<String>,
Member

This shouldn't be here, but it should be part of the SaveUserSettings struct in crates/api_common/src/person.rs .

Look at its discussion_languages field for an example of a vector update.

Member

Most of this code should be in crates/api/src/local_user/save_settings.rs, because it's a user setting.

Also try to extract most of the checks into a separate function, to keep save_settings readable.

@@ -0,0 +1,87 @@
use crate::{
newtypes::PersonId,
schema::user_post_keyword_block::dsl::{keyword, person_id, user_post_keyword_block},
Member

Don't import the dsl; use keyword_block::table and keyword_block::{column_name} instead. Check the other db schema impl files for examples.

Comment on lines +38 to +41
conn
.build_transaction()
.run(|conn| {
Box::pin(async move {
Member

Remove this transaction: we realized that using transactions within DB code is a no-no, and transactions should be done at the highest level (the API level). Otherwise we can get untraceable errors from nested transactions.

Member

No this should be fine, see #5480 (comment)

Collaborator

Clarifying @Nutomic's comment: .build_transaction().run(...) must be changed to .transaction(...) to make it fine

Comment on lines +29 to +37
let current = UserPostKeywordBlock::for_person(&mut conn.into(), for_person_id).await?;
if current
.iter()
.map(|obj| obj.keyword.clone())
.collect::<Vec<_>>()
== keywords_to_block_posts
{
return Ok(());
}
Member

Remove this block; it's pointless. Just delete and then insert no matter what. The on_conflict_do_nothing below should handle everything fine.

If they passed Some(keywords_to_block) we can assume they want to update them.
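A sketch of the Option semantics described here, using a hypothetical SaveUserSettingsSketch struct with a hypothetical blocked_keywords field (modelled on the suggestion to put the list into SaveUserSettings, like discussion_languages): None leaves the stored list alone, while Some always replaces it, with no equality check against the current list first. Skipping duplicates stands in for on_conflict_do_nothing.

```rust
/// Hypothetical subset of a settings form; `None` means the client did not
/// send the field at all.
#[derive(Default)]
struct SaveUserSettingsSketch {
    blocked_keywords: Option<Vec<String>>,
}

/// Apply the keyword part of a settings update. Some(list) replaces the
/// stored keywords unconditionally (delete, then insert); None is a no-op.
fn apply_keyword_update(stored: &mut Vec<String>, form: &SaveUserSettingsSketch) {
    if let Some(keywords) = &form.blocked_keywords {
        stored.clear();
        for kw in keywords {
            // Skip duplicates, mirroring ON CONFLICT DO NOTHING.
            if !stored.contains(kw) {
                stored.push(kw.clone());
            }
        }
    }
}
```

Note that Some(vec![]) is meaningful here: it clears the blocklist, which is different from omitting the field.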

@@ -10,6 +10,9 @@ use strum::{Display, EnumIter};
#[non_exhaustive]
// TODO: order these based on the crate they belong to (utils, federation, db, api)
pub enum LemmyErrorType {
BlockKeywordLimitReached,
Member

I don't think we need a limit, right?

@@ -10,6 +10,9 @@ use strum::{Display, EnumIter};
#[non_exhaustive]
// TODO: order these based on the crate they belong to (utils, federation, db, api)
pub enum LemmyErrorType {
BlockKeywordLimitReached,
BlockKeywordToShort,
Member

BlockKeywordTooShort

@@ -10,6 +10,9 @@ use strum::{Display, EnumIter};
#[non_exhaustive]
// TODO: order these based on the crate they belong to (utils, federation, db, api)
pub enum LemmyErrorType {
BlockKeywordLimitReached,
BlockKeywordToShort,
BlockKeywordToLong,
Member

TooLong

@@ -0,0 +1,6 @@
CREATE TABLE user_post_keyword_block (
Member

This table name doesn't look like any other one.

  • Look at local_user_language for an example.
  • It should be named local_user_keyword_block, as it's a local user setting, not a person field where we need to worry about federation.
  • Its columns should be local_user_id, keyword (in that order).
  • I don't think we need to reference that this applies only to posts; keyword_block is enough. In the future we might want this to apply to comments and other things anyway.
  • The performance of this scares me, and I don't think there's really a way we can make lower(post.title) not similar to '%(keyword_1|keyword_2|...)%' performant. But I'm still willing to merge it and hack on that later.

Collaborator

Also, "REFERENCES person" becomes "REFERENCES local_user"

@@ -355,7 +356,8 @@ pub fn config(cfg: &mut ServiceConfig, rate_limit: &RateLimitCell) {
scope("/block")
.route("/person", post().to(user_block_person))
.route("/community", post().to(user_block_community))
-.route("/instance", post().to(user_block_instance)),
+.route("/instance", post().to(user_block_instance))
+.route("/post_keywords", post().to(user_block_keyword_for_posts)),
Member

This can be removed, as the keywords to block will be included as part of save_user_settings

@@ -34,3 +34,4 @@ dev_pgdata/

# database dumps
*.sqldump
tmp.schema
Collaborator

This is probably unnecessary. If you're locally running the commands from the "check_diesel_schema" CI step, then skip the cp and diff commands.

local_user_view: LocalUserView,
) -> LemmyResult<Json<SuccessResponse>> {
for keyword in &data.keywords_to_block {
let trimmed = keyword.trim();
Collaborator

Create a Vec of trimmed keywords, and use it in both the length check and the call to UserPostKeywordBlock::update.

Comment on lines +22 to +24
if data.keywords_to_block.len() >= 15 {
Err(LemmyErrorType::BlockKeywordLimitReached)?;
}
Collaborator

This is how length is checked in other places, and this will also change the limit to 1000 which is good. The newly created error type must also be removed.

Suggested change
if data.keywords_to_block.len() >= 15 {
Err(LemmyErrorType::BlockKeywordLimitReached)?;
}
if data.keywords_to_block.len() > MAX_API_PARAM_ELEMENTS {
Err(LemmyErrorType::TooManyItems)?;
}
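Combining this with the earlier suggestion to trim once and reuse the result, the checks could be extracted into one function along these lines. It is a sketch with assumptions: KeywordError is a local stand-in for the relevant LemmyErrorType variants (using the corrected TooShort/TooLong names), trimmed_keywords is a hypothetical name, the minimum of 3 characters comes from the suggestion earlier in the thread, the maximum of 50 is taken from one of the migrations shown above (the thread shows several different lengths), and MAX_API_PARAM_ELEMENTS is set to 1000 only because the comment above says the limit becomes 1000.

```rust
// Assumed limits, see the lead-in.
const MIN_KEYWORD_LENGTH: usize = 3;
const MAX_KEYWORD_LENGTH: usize = 50;
const MAX_API_PARAM_ELEMENTS: usize = 1000;

/// Local stand-in for the relevant LemmyErrorType variants.
#[derive(Debug, PartialEq)]
enum KeywordError {
    TooManyItems,
    BlockKeywordTooShort,
    BlockKeywordTooLong,
}

/// Trim every keyword once, validate the trimmed values, and return them so
/// the same Vec can be passed on to the database update.
fn trimmed_keywords(raw: &[String]) -> Result<Vec<String>, KeywordError> {
    if raw.len() > MAX_API_PARAM_ELEMENTS {
        return Err(KeywordError::TooManyItems);
    }
    let trimmed: Vec<String> = raw.iter().map(|k| k.trim().to_string()).collect();
    for keyword in &trimmed {
        // Count characters, not UTF-8 bytes, so non-ASCII keywords are
        // measured the same way as varchar(n) measures them.
        let len = keyword.chars().count();
        if len < MIN_KEYWORD_LENGTH {
            return Err(KeywordError::BlockKeywordTooShort);
        }
        if len > MAX_KEYWORD_LENGTH {
            return Err(KeywordError::BlockKeywordTooLong);
        }
    }
    Ok(trimmed)
}
```

For example, "  ab  " is rejected as too short after trimming, while "äöü" (3 characters but 6 bytes) is accepted.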

@leoseg
Contributor Author

leoseg commented Mar 14, 2025

Thanks for all the feedback! I'll look into it soon and update the PR.

5 participants