Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weird behavior of Str::slug in particular language #54444

Closed
kenan-recebli opened this issue Feb 2, 2025 · 8 comments · May be fixed by #54446
Closed

Weird behavior of Str::slug in particular language #54444

kenan-recebli opened this issue Feb 2, 2025 · 8 comments · May be fixed by #54446

Comments

@kenan-recebli
Copy link

kenan-recebli commented Feb 2, 2025

Laravel Version

11.41.3

PHP Version

8.3.16

Database Driver & Version

No response

Description

I think it's better just to show:

Image

First, I couldn't realize why slug becomes longer, then noticed extra letters. Then tried to manually delete letters to find out letter combinations that cause this issue.

Maybe it happens in other languages too, but I tested some languages, they all return nad.

Maybe there is are other combinations also, that behaves weird, but I didn't test.

First, I thought that it's because of the ascii method, but no. I couldn't find the source of this issue.

UPDATE
Find out a little bit more combinations (in the second example letter is trimmed):

Image

Steps To Reproduce

Just do:
str('nad')->slug('az')
str('yab')->slug('az')
str('ab')->slug('az')

@rodrigopedra
Copy link
Contributor

The first parameter is not the language, but the separator.

You can use named parameters to specify a language while using the default separator:

$ php artisan tinker
Psy Shell v0.12.7 (PHP 8.2.27 — cli) by Justin Hileman
> str('nad')->slug(language: 'az')
= Illuminate\Support\Stringable {#6612
    value: "nad",
  }

> str('nad nad')->slug(language: 'az')
= Illuminate\Support\Stringable {#6637
    value: "nad-nad",
  }

@andreich1980
Copy link
Contributor

@rodrigopedra But it does not explain how is str('nad')->slug(separator: 'az') equal to nazd

@rodrigopedra
Copy link
Contributor

This line here, on the Str@slug() method, does this odd behavior:

// Replace all separator characters and whitespace by a single separator
$title = preg_replace('!['.preg_quote($separator).'\s]+!u', $separator, $title);

When the separator has more than one character, all of its characters are added to the regular expression's character class.

So the expression on the line above will evaluate to:

preg_replace('![az\s]+!u', 'az', 'nad')

Which means it will replace any letters a, z, or any blank characters (\s), by the sequence az (the separator).

Thus, as nad has a a character, this gets replaced by az, then becoming nazd.

You can try with this simple script:

<?php

$title = 'nad';
$separator = 'az';

$title = preg_replace('!['.preg_quote($separator).'\s]+!u', $separator, $title);

echo $title, \PHP_EOL;

One could propose a PR to assert a separator only has a single character.

On the other hand, people could be relying on the current behavior to generate strings like this:

$ php artisan tinker
Psy Shell v0.12.7 (PHP 8.2.27 — cli) by Justin Hileman
> str('PHP is great!')->slug('-=-')
= Illuminate\Support\Stringable {#6609
    value: "php-=-is-=-great",
  }

@shaedrich
Copy link
Contributor

Multiple characters might not be a problem, as long as they are proper delimiters 🤔

@kenan-recebli
Copy link
Author

The first parameter is not the language, but the separator.

You can use named parameters to specify a language while using the default separator:

$ php artisan tinker
Psy Shell v0.12.7 (PHP 8.2.27 — cli) by Justin Hileman
> str('nad')->slug(language: 'az')
= Illuminate\Support\Stringable {#6612
    value: "nad",
  }

> str('nad nad')->slug(language: 'az')
= Illuminate\Support\Stringable {#6637
    value: "nad-nad",
  }

Oh, you're right. I don't remember when I last used this method, until now (wrote it once in Sluggable trait, where I passed all arguments, and that's all). I didn't even notice that I should pass language as the second argument. An outside perspective always helps :)

@kenan-recebli
Copy link
Author

@rodrigopedra But it does not explain how is str('nad')->slug(separator: 'az') equal to nazd

Yeah, maybe if it worked like expected, without any replace, because the input doesn't have any spaces, it would help me to notice the source of the issue.

@vlakoff
Copy link
Contributor

vlakoff commented Feb 4, 2025

By the way, please note that the method is only designed for single-character separators, mainly due to the regex character classes.

I considered suggesting support for multi-character separators, but it would be somewhat difficult to implement because of the negated character class (here). There are various ways to solve this, but it would complicate the code, and I don't think it would be worth it.

To add some robustness, we could throw an exception if the provided separator is longer than one character.

@vlakoff
Copy link
Contributor

vlakoff commented Feb 4, 2025

I attempted this implementation of multi-character separators, and as expected, it made the code more complex, harder to understand, and more prone to regression. Therefore, I reaffirm that it is not worth pursuing.

Just as a reminder, the second argument of the final trim() (here) is a list of characters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants