-
-
Notifications
You must be signed in to change notification settings - Fork 270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make remote follow username regex comply to rfc 7565 #3528
Make remote follow username regex comply to rfc 7565 #3528
Conversation
The failing unit test here is also failing on main, I have raisedan issue and it looks like that's already being fixed on #3431. @mouse-reeve I could use your input on the other two issues. The reason it's so long is because I'm capturing the entire RFC7565 allowed character set, but I see your regex for local names is only alpha numbeic plus dashes, dots, and underscores. If your intent for Bookwyrm is to only allow those (Mastodon does similar) then easy enough I'll just trim the regex down to that. |
@timothyjrogers Thanks for looking at this. I'm certainly not a regex expert, but a couple of things that may be helpful and/or need to be made consistent: Our class UsernameField(ActivitypubFieldMixin, models.CharField):
"""activitypub-aware username field"""
def __init__(self, activitypub_field="preferredUsername", **kwargs):
self.activitypub_field = activitypub_field
super(ActivitypubFieldMixin, self).__init__(
_("username"),
max_length=150,
unique=True,
validators=[validate_username],
error_messages={
"unique": _("A user with that username already exists."),
},
) GitHub is also complaining that your regex allows an unlimited number of dashes, so if it's possible to put a character limit on the regex check itself that would be handy. You'll also notice that def validate_username(value):
"""make sure usernames look okay"""
if not re.match(r"^[A-Za-z\-_\.0-9]+@[A-Za-z\-_\.0-9]+\.[a-z]{2,}$", value):
raise ValidationError(
_("%(value)s is not a valid username"),
params={"value": value},
) To my eyes this looks a little more similar to yours than to the current regex in the follow view, so maybe we just need to align them? |
…emote-follow-username-regex-comply-to-rfc-7566
…f github.com:timothyjrogers/bookwyrm into make-remote-follow-username-regex-comply-to-rfc-7565
Thank you for the feedback @hughrun and @ilkka-ollakka , excellent points. I did not see the username limit in the model, I've clamped the regex so it will only capture 150 characters or less. I also moved the regex over into the regex utility, and split it up into multiline strings to make it easier to read. I do generally agree with @ilkka-ollakka that there is probably a cleaner way to do this -- clamping the length on that regex will still persist this problem if there was a user with a 151+ character username, maybe on an instance running a modified fork or something. Some extra parsing and validating logic wouldn't go amiss in the follow controller, but that's a much less likely scenario and this bug was just about usernames being chopped at certain characters so I think maybe a separate issue could be logged to improve the validation in general? In any case I've tightened it up enough to make CodeQL and the linter happy, and confirmed locally that it will recognize remote usernames compliant to the spec and the character limit. If we want to address additional validation here I'm happy to do so, otherwise we can log another issue. Thanks again! |
Thanks @timothyjrogers I've just realised that I didn't properly think through your original question:
The answer to this is definitely "yes, the intent is to limit the character-set", though we may be able to be slightly more generous. Whilst we'd prefer to be compliant with standards where possible, in this case the generosity of RFC7565 is in conflict with the limitations of RFC3986, because usernames become part of URIs for various things. That means we can't allow any of these characters reserved in RFC3986 for usernames because they will potentially break the URIs:
Because a user account is created for every remote user when they are followed, that means we can't allow these characters in usernames anywhere. My apologies for not picking up on this earlier. |
Thanks @hughrun ! That makes sense. Given that, I have restricted the remote name regex as described above to just |
Description
This change updates the username parsing for remote follows, which is based on the acct: property of a Fediverse user, to allow more than just alphanumeric characters and instead allow for all characters allowed for in RFC 7565.
What type of Pull Request is this?
Does this PR change settings or dependencies, or break something?
Details of breaking or configuration changes (if any of above checked)
Documentation
Tests