-
-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature filter keywords 3710 #5263
base: main
Are you sure you want to change the base?
Conversation
What happens if you block "X"? Will it match "Xylophone"? What about "x.com"? Unlike person and community blocks this seems a bit more ambiguous. |
In the moment yes, because the ilike searches for case insensitive substrings. |
You could also consider a minimum length of eg 3 characters for each blocked string. And you need to make sure that the checks are passing (click the details link at the bottom of this page). See .woodpecker.yml in the repo for the commands it runs, so you can check and fix them locally. |
yes this sounds more simple and in the Connect Lenny app they do it like this. I will add the restriction, fix checks add unit tests when I have time at the beginning of the next year. |
…nding functions in diesel
0441a7d
to
3d1e7d5
Compare
I apologize for being slow to review this, ping me or request review when you'd like me to take a look. |
# Conflicts: # crates/db_views/src/post/post_view.rs # src/api_routes_v3.rs
src/api_routes_v3.rs
Outdated
.route("/report/resolve", put().to(resolve_post_report)), | ||
.route("/report/resolve", put().to(resolve_post_report)) | ||
.route("/site_metadata", get().to(get_link_metadata)) | ||
.route("/block", post().to(user_block_keyword_for_posts)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dont change api v3, this is only kept for backwards compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the changes
src/api_routes_v4.rs
Outdated
@@ -355,7 +356,8 @@ pub fn config(cfg: &mut ServiceConfig, rate_limit: &RateLimitCell) { | |||
scope("/block") | |||
.route("/person", post().to(user_block_person)) | |||
.route("/community", post().to(user_block_community)) | |||
.route("/instance", post().to(user_block_instance)), | |||
.route("/instance", post().to(user_block_instance)) | |||
.route("/post", post().to(user_block_keyword_for_posts)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about /account/block/post_keywords
?
@@ -0,0 +1,3 @@ | |||
ALTER TABLE post_keyword_block | |||
ALTER COLUMN keyword TYPE varchar(50); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should only be a single migration per pull request. For the table name user_post_keyword_block
is clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I deleted the new migration and altered the first migration.
id serial PRIMARY KEY, | ||
keyword varchar(255) NOT NULL, | ||
person_id int REFERENCES person (id) ON UPDATE CASCADE ON DELETE CASCADE NOT NULL | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a unique constraint for (keyword, person_id)
. You also might be able to use that as primary key and get rid of id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
crates/db_schema/src/schema.rs
Outdated
diesel::table! { | ||
post_keyword_block (id) { | ||
id -> Int4, | ||
#[max_length = 20] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This length doesnt match the migration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixxed
crates/db_schema/src/newtypes.rs
Outdated
#[derive(Debug, Copy, Clone, Hash, Eq, PartialEq, Serialize, Deserialize, Default)] | ||
#[cfg_attr(feature = "full", derive(DieselNewType, TS))] | ||
#[cfg_attr(feature = "full", ts(export))] | ||
/// The comment reply id. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong copypaste
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is removed because id is replaced by compose primary key
PostKeywordBlock::unblock_keyword(&mut context.pool(), &post_keyword_block_form).await?; | ||
} | ||
Ok(Json(SuccessResponse::default())) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of blocking or unblocking keywords one by one, you could also update the whole blocklist at once (just like we do for user languages). Not sure which approach is preferable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, should be more easy for the Frontend to implement, updated that this way.
sql::<Bool>("NOT (post.body LIKE ANY(") | ||
.bind::<Array<Text>, _>(transformed_strings.clone()) | ||
.sql("))"), | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this raw sql you can build a regex like (keyword1|keyword2|...)
and filter with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this would be less performant, as the regex string could be a bit long when there are for example 15 keywords? If i would do that would this be with like and than the regex (keyword1|keyword2|...)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, cc @dessalines @dullbananas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to use not similar to:
- https://docs.rs/diesel/latest/diesel/expression_methods/trait.PgTextExpressionMethods.html#method.not_similar_to
- https://stackoverflow.com/a/4928155/1655478
Regex would def be slower than this. But yeah none of this custom binding is necessary. Also extract the building of this string to its own function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use this loop:
for keyword in blocked_keywords {
let pattern = format!("%{}%", s);
query = query.filter(post::name.not_like(pattern));
query = query.filter(post::url.not_like(pattern));
query = query.filter(post::body.not_like(pattern));
}
Also, it should probably be ilike instead of like
…nged of api path in v4
…block and replaced id as primary key with the composed key person_id and keyword. Also now use update function which updates the blocked keywords for an user with a new list instead keyword by keyword.
3524070
to
fc49f01
Compare
.values(post_keyword_block_form) | ||
.get_result::<Self>(conn) | ||
.await | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only used in a test, you can replace it with update()
. And the unblock method below is completely unused.
#[cfg_attr(feature = "full", derive(TS))] | ||
#[cfg_attr(feature = "full", ts(export))] | ||
pub struct BlockKeywordForPost { | ||
pub keywords_to_block: Vec<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be here, but it should be part of the SaveUserSettings
struct in crates/api_common/src/person.rs
.
Look at its discussion_languages
field for an example of a vector update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of this code should be in crates/api/src/local_user/save_settings.rs
, because its a user setting.
Try to extract most of the checking to a function also, to keep that function readable.
@@ -0,0 +1,87 @@ | |||
use crate::{ | |||
newtypes::PersonId, | |||
schema::user_post_keyword_block::dsl::{keyword, person_id, user_post_keyword_block}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't import the dsl
use keyword_block::table
and keyword_block::{column_name}
. Check the other db schema impl files for examples.
conn | ||
.build_transaction() | ||
.run(|conn| { | ||
Box::pin(async move { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this transaction, as we realized that using transactions within DB code is a no no, and the transactions should be done at the highest level (the API level). Otherwise we can get untraceable errors from embedded transactions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this should be fine, see #5480 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarifying @Nutomic's comment: .build_transaction().run(...)
must be changed to .transaction(...)
to make it fine
let current = UserPostKeywordBlock::for_person(&mut conn.into(), for_person_id).await?; | ||
if current | ||
.iter() | ||
.map(|obj| obj.keyword.clone()) | ||
.collect::<Vec<_>>() | ||
== keywords_to_block_posts | ||
{ | ||
return Ok(()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this block, its pointless. Just delete then update no matter what. The on_conflict_do_nothing below should handle everything fine.
If they passed Some(keywords_to_block)
we can assume they want to update them.
@@ -10,6 +10,9 @@ use strum::{Display, EnumIter}; | |||
#[non_exhaustive] | |||
// TODO: order these based on the crate they belong to (utils, federation, db, api) | |||
pub enum LemmyErrorType { | |||
BlockKeywordLimitReached, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need a limit, right?
@@ -10,6 +10,9 @@ use strum::{Display, EnumIter}; | |||
#[non_exhaustive] | |||
// TODO: order these based on the crate they belong to (utils, federation, db, api) | |||
pub enum LemmyErrorType { | |||
BlockKeywordLimitReached, | |||
BlockKeywordToShort, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BlockKeywordTooShort
@@ -10,6 +10,9 @@ use strum::{Display, EnumIter}; | |||
#[non_exhaustive] | |||
// TODO: order these based on the crate they belong to (utils, federation, db, api) | |||
pub enum LemmyErrorType { | |||
BlockKeywordLimitReached, | |||
BlockKeywordToShort, | |||
BlockKeywordToLong, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TooLong
@@ -0,0 +1,6 @@ | |||
CREATE TABLE user_post_keyword_block ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This table name doesn't look like any other one.
- Look at
local_user_language
for an example. - It should be named
local_user_keyword_block
, as its a local user setting, not a person field that we need to worry about federation. - Its columns should be
local_user_id, keyword
(in that order) - I don't think we need to reference that this applies only to posts :
keyword_block
is enough. In the future we might want this to apply to comments and other things anyway. - The performance of this scares me, and I don't think there's really a way we can make
lower(post.title) not similar to '%(keyword_1|keyword_2|...)%'
performant. But I'm still willing to merge it and hack on that later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, "REFERENCES person" becomes "REFERENCES local_user"
@@ -355,7 +356,8 @@ pub fn config(cfg: &mut ServiceConfig, rate_limit: &RateLimitCell) { | |||
scope("/block") | |||
.route("/person", post().to(user_block_person)) | |||
.route("/community", post().to(user_block_community)) | |||
.route("/instance", post().to(user_block_instance)), | |||
.route("/instance", post().to(user_block_instance)) | |||
.route("/post_keywords", post().to(user_block_keyword_for_posts)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be removed, as the keywords to block will be included as part of save_user_settings
@@ -34,3 +34,4 @@ dev_pgdata/ | |||
|
|||
# database dumps | |||
*.sqldump | |||
tmp.schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably unnecessary. If you're locally running the commands from the "check_diesel_schema" CI step, then skip the cp and diff commands.
local_user_view: LocalUserView, | ||
) -> LemmyResult<Json<SuccessResponse>> { | ||
for keyword in &data.keywords_to_block { | ||
let trimmed = keyword.trim(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create a Vec of trimmed keywords, and use it in both the length check and the call to UserPostKeywordBlock::update.
conn | ||
.build_transaction() | ||
.run(|conn| { | ||
Box::pin(async move { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarifying @Nutomic's comment: .build_transaction().run(...)
must be changed to .transaction(...)
to make it fine
if data.keywords_to_block.len() >= 15 { | ||
Err(LemmyErrorType::BlockKeywordLimitReached)?; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how length is checked in other places, and this will also change the limit to 1000 which is good. The newly created error type must also be removed.
if data.keywords_to_block.len() >= 15 { | |
Err(LemmyErrorType::BlockKeywordLimitReached)?; | |
} | |
if data.keywords_to_block.len() > MAX_API_PARAM_ELEMENTS { | |
Err(LemmyErrorType::TooManyItems)?; | |
} |
@@ -0,0 +1,6 @@ | |||
CREATE TABLE user_post_keyword_block ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, "REFERENCES person" becomes "REFERENCES local_user"
Thanks for all the feedback! I'll look into it soon and update the PR. |
Option to block keyswords so no post containing this keywords in their name, body or url are returned, keywords are stored in seperate table related to person