Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split: Fix return type when passing in non-literal and distribute unions of parameter Delimiter #1047

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

ghaaj
Copy link
Contributor

@ghaaj ghaaj commented Jan 23, 2025

Currently, passing a non-literal string to Split's parameter S or Delimiter results in unexpected behavior:

import type {Split} from 'type-fest';

type SplitLiteralByNonLiteral = Split<'ABCD', string>;
	// expected:
	//	string[] or more precisely
	//	| ['ABCD'] | ['A', 'B', 'C', 'D'] | ['', ''] | ['', 'BCD']
	//	| ['A', ''] | ['A', 'CD'] | ['AB', ''] | ['AB', 'D']
	//	| ['ABC', ''] | ['A', 'D'] | ['', 'D'] | ['', 'CD']
	// actual: ['A', 'C', '']
type SplitNonLiteralByLiteral = Split<string, 'ABCD'>;
	// expected: string[]
	// actual: [string]
type SplitNonLiteralByEmptyString = Split<string, ''>;
	// expected: string[]
	// actual: []
type SplitNonLiteralByNonLiteral = Split<string, string>;
	// expected: string[]
	// actual: [string]

There is also a problem with Delimiter not being distributed correctly when it is a union type:

import type {Split} from 'type-fest';

type SplitByUnion = Split<'A,B.C,D.E,F.G.H', ',' | '.'>
	// expected: ['A', 'B.C', 'D.E', 'F.G.H'] | ['A,B', 'C,D', 'E,F', 'G', 'H']
	// actual:
	//	| ['A' | 'A,B', 'B.C' | 'B', 'C' | 'C,D', 'D.E' | 'D', 'E' | 'E,F', 'F', 'G', 'H']
	//	| ['A' | 'A,B', 'B.C' | 'B', 'C' | 'C,D', 'D.E' | 'D', 'E' | 'E,F', 'G', 'H']
	//	| ['A' | 'A,B', 'B.C' | 'B', 'C' | 'C,D', 'D.E' | 'D', 'F', 'G', 'H']
	//	| ['A' | 'A,B', 'B.C' | 'B', 'C' | 'C,D', 'E' | 'E,F', 'F', 'G', 'H']
	//	| ['A' | 'A,B', 'B.C' | 'B', 'C' | 'C,D', 'E' | 'E,F', 'G', 'H']
	//	| ['A' | 'A,B', 'B.C' | 'B', 'D.E' | 'D', 'E' | 'E,F', 'F', 'G', 'H']
	//	| ['A' | 'A,B', 'B.C' | 'B', 'D.E' | 'D', 'E' | 'E,F', 'G', 'H']
	//	| ['A' | 'A,B', 'B.C' | 'B', 'D.E' | 'D', 'F', 'G', 'H']
	//	| ['A' | 'A,B', 'C' | 'C,D', 'D.E' | 'D', 'E' | 'E,F', 'F', 'G', 'H']
	//	| ['A' | 'A,B', 'C' | 'C,D', 'D.E' | 'D', 'E' | 'E,F', 'G', 'H']
	//	| ['A' | 'A,B', 'C' | 'C,D', 'D.E' | 'D', 'F', 'G', 'H']
	//	| ['A' | 'A,B', 'C' | 'C,D', 'E' | 'E,F', 'F', 'G', 'H']
	//	| ['A' | 'A,B', 'C' | 'C,D', 'E' | 'E,F', 'G', 'H']

playground

This PR fixes the behavior of Split so that it meets the above requirements.
As for the first one (SplitLiteralByNonLiteral) I adopted the latter, but I don't think it necessarily have to be that way.

@som-sm
Copy link
Collaborator

som-sm commented Jan 23, 2025

As for the first one (SplitLiteralByNonLiteral) I adopted the latter, but I don't think it necessarily have to be that way.

@ghaaj I don't think that's practically very useful, it's much better to simply return string[] in such cases. And it will become overwhelming very quickly:
image

Also, what happens in cases like Split<"aBCd", Uppercase<string>>? At a glance, it seems that you're currently not using Delimiter to generate the resulting sequences. However, to handle cases like these, you'd have to utilize Delimiter, which would probably make the solution even more complicated—for no real benefit.

ghaaj and others added 3 commits January 23, 2025 20:44
* Use `IsStringLiteral` for conditional branches, which is more appropriate than `string extends ..`
* Add test cases where `S` or `Delimiter` is not `string` itself but also not a literal (e.g., Uppercase<string>)

Co-authored-by: Som Shekhar Mukherjee <[email protected]>
…extends true ? ..` into `SplitHelper` to fix ts2589
@som-sm
Copy link
Collaborator

som-sm commented Jan 23, 2025

Looks like Get fails if we return string[] for non-literal cases, because it expects Split<"foo.${number}", "."> to return ["foo", `${number}`] instead of string[].

So, we'd probably have to introduce this change as an option.

@ghaaj
Copy link
Contributor Author

ghaaj commented Jan 23, 2025

type SplitHelper<
	S extends string,
	_Delimiter extends string,
	Accumulator extends string[] = [],
> = S extends `${infer __}`
	? IsStringLiteral<_Delimiter> extends true
		? _Delimiter extends infer Delimiter extends string
			? S extends `${infer Head}${Delimiter}${infer Tail}`
				? SplitHelper<Tail, Delimiter, [...Accumulator, Head]>
				: Delimiter extends ''
					? Accumulator
					: [...Accumulator, S]
			: never
		: string[]
	: string[];

This would probably fix the behavior in the case where S is a template literal type, but I'm not sure that resolving Split<`foo.${string}.bar`, '.'> to ['foo', string, 'bar'] is appropriate.

@som-sm
Copy link
Collaborator

som-sm commented Jan 23, 2025

but I'm not sure that resolving Split<foo.${string}.bar, '.'> to ['foo', string, 'bar'] is appropriate.

Yeah, I also think this isn't the correct behaviour, because foo.${string}.bar could represent something like foo.a.b.c.bar, and splitting it by . wouldn't produce ['foo', string, 'bar'].

But, what initially seemed like a bug now seems like a breaking change, so I think it's better to introduce an option (like strict) with a default value of false for now. Then, in the next major release, we could update it's default value to true.

And, it makes sense to implement this as an option because there may be valid use cases where you'd want Split<`foo.${string}.bar`, '.'> to return ['foo', string, 'bar']. However, those cases should be opt-in rather than the default behaviour.

@som-sm
Copy link
Collaborator

som-sm commented Jan 24, 2025

@ghaaj So, even with strictLiteralChecks set to false, there's some changes to the existing behaviour. For example, Split<string, "foo"> now returns string[] instead of [string], which I think is fine!

However, not sure if we should keep this behaviour consistent with Delimiter as well, like should Split<"a1b", `${number}`> continue to return ["a", "b"]? Maybe we should only return string[] for non-template strings like string or Uppercase<string>, similar to S.

That said, this feels like it's getting overly complicated. I feel we shouldn't change anything when strictLiteralChecks is false, WDYT? And there's no good reason to return ["a", "b"] for Split<"a1b", `${number}`> and return string[] for Split<"aAb", Uppercase<string>>.

@ghaaj
Copy link
Contributor Author

ghaaj commented Jan 25, 2025

Why I restricted Delimiter to string literals only is that it is currently difficult to return correct types.

For example, how should we resolve Split<'a20.25b', `${number}`>? As 2, 0, 5, 20, 0., .2, 25, 20., 0.2, .25, 20.2, 0.25 and 20.25 can all be interpreted as number, the return type may be ['a', 'b'] | ['a', '', 'b'] | ['a', '', '', 'b'] | ['a', '', '', '', 'b'] | ['a', '', '.', '', 'b'] | ...['a', '0.', '5b'] | ['a2', '.25b'] | ['a20.2', 'b'] | ['a', '.25b'] | ['a2', '25b'] | ['a20', '5b'] | ['a20.', 'b'] | ...; it is almost impossible to count them all.
(The first example I gave was wrong, because the behavior was such that unions were not distributed.)
This is also the case with Split<'a_b_cd', `_${string}`>.

I think it's fine as it is now about the behavior when strictLiteralChecks is false. In this case, the breaking changes are as follows:

  • When S is not assignable to a template literal, the return type becomes string[]
  • When Delimiter is not a string literal, the return type becomes string[]

As for the first one, the change is acceptable because I think the behavior so far is a bug (SplitNonLiteralBy* on the first comment).
As for the second one, I've already mentioned it earlier.

@som-sm
Copy link
Collaborator

som-sm commented Jan 26, 2025

Umm...I still have few concerns:

  1. When strictLiteralChecks is false, the goal clearly isn't accuracy but rather adopting a solution with minimal friction. For example, returning ["a", "0.25b"] for Split<'a20.25b', `${number}`> is surely incorrect, but so is returning ["a", `${number}`] for Split<`a.${number}`, ".">. And there's no good reason for why we're lenient in the latter case but not in the former, it's just an assumption that the latter is more practical than the former.

  2. For Split<"aBc", Uppercase<string>>, we're returning string[], even though this case has no ambiguity and there’s only one reasonable result: ["a", "c"]. So, it’s inconsistent to return string[] here and return ["a", `${number}`] in the Split<a.${number}`, "."> case, where there's ambiguity.

  3. Most importantly, since it's unclear which use cases are practical and which are not, I would avoid introducing any breaking changes.

  4. Additionally, properly documenting the behaviour when strictLiteralChecks is false would be really challenging.

While I don’t really have a strong opinion here, I still believe leaving the behaviour unchanged when strictLiteralChecks is false is a better and safer choice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants