Skip to content

Conversation

@KristofferC
Copy link
Member

@KristofferC KristofferC commented Nov 6, 2025

I quite often get the whole history search filled with duplicates which feels like low signal ratio. This only keep uniques:

Before vs after

Screenshot 2025-11-06 at 21 27 19 Screenshot 2025-11-06 at 21 27 26

cc @tecosaur @topolarity

Tests written by Claude Code 🤖

@KristofferC KristofferC requested a review from tecosaur November 6, 2025 20:37
@KristofferC KristofferC added REPL Julia's REPL (Read Eval Print Loop) backport 1.13 labels Nov 6, 2025
@Keno
Copy link
Member

Keno commented Nov 6, 2025

How do the up and down arrow keys work with this. I sometimes deliberately choose a different identical copy not for the contents, but for the place in history (no pun intended).

@tecosaur
Copy link
Member

tecosaur commented Nov 7, 2025

So, I've already got something like this in #59953 (now that I look, it may not be pushed though: I brought in some commits from another PR to try building on, so I'll need to do some cherry-picking). There's a key difference in the approaches we've taken though, and I wonder which behaviour we actually want: here we use an allocated Set{String} to have only the first instance of a history entry be shown, in each processed chunk (chunks do not have a fixed on consistent size).

By contrast, I use the history content + mode + status to deduplicate consecutive entries. At the moment, I feel like this is probably the preferable behaviour, since I also like the history as a log of what I've done that I can look at, and this preserves that aspect better while getting rid of consecutive duplicates. I also like that the implementation is O(n) instead of O(n^2), particularly as someone with 200k history entries 🙂.

@KristofferC
Copy link
Member Author

@tecosaur If you have an alternative implementation that you say perform better please put it up so it can actually be benchmarked against. And please make it a separate change instead of in the amalgamation PR.

I sometimes deliberately choose a different identical copy not for the contents, but for the place in history

Can you elaborate on this. You write ENV[" and you see 15 ENV["JULIA_PKG_AUTO_PRECOMPILE"] and you pick the sixth? I can make a keybind to disable the duplication?

@tecosaur
Copy link
Member

tecosaur commented Nov 7, 2025

If you have an alternative implementation that you say perform better please put it up so it can actually be benchmarked against. And please make it a separate change instead of in the amalgamation PR.

Yea, I'll pick the bugfixes and tweaks off that so we can discuss/merge them separately to the expanded history format. Let's me know if you have a preference between a "minor changes" PR vs. individual fix/tweak PRs.

@Keno
Copy link
Member

Keno commented Nov 7, 2025

Can you elaborate on this. You write ENV[" and you see 15 ENV["JULIA_PKG_AUTO_PRECOMPILE"] and you pick the sixth? I can make a keybind to disable the duplication?

Usually not the sixth, but the second from the bottom because I messed up whatever sequence I was testing in the latest iteration and want to start over with the previous one (by hitting down arrow after running the line)

@KristofferC
Copy link
Member Author

and want to start over with the previous one (by hitting down arrow after running the line)

I'm not sure that feature exists any more with the new history search. But note that you can select multiple entries (with tab) so if you want a set of changes you can just bring all of those in in one go.

@KristofferC
Copy link
Member Author

Filtering out repeats seems to at least be done for whatever default behavior I have with zsh + fzf.

@KristofferC KristofferC changed the title only show unique entries in history search only show unique entries in history search when filtering Nov 7, 2025
@KristofferC
Copy link
Member Author

I changed this to not remove duplicates when you are not filtering in case you want to go and tab-collect a collection of consecutive ones them earlier use.

Also fixed not creating a Set inside the function that is called repeatedly when collecting entries to show.

@tecosaur
Copy link
Member

tecosaur commented Nov 7, 2025

Three things:

  • I don't think this should deduplicate the same content in different modes
  • Can't we have seen::Set{String}?
  • I'd rather reuse a Set{String} than create a new one every time (I'd like to also cycle through nine different vectors for the search, but that's trickier)

- use a consistent type for `seen` and pass in if we should deduplicate
- reuse a set accross searches
@KristofferC
Copy link
Member Author

updated based on that

@tecosaur
Copy link
Member

tecosaur commented Nov 7, 2025

One further comment: When !isfiltering we could just skip calling filterchunkrev! at all (the final history is hist) and set filter_idx to 0, allowing us to change seen from a Union{Set{...}, Nothing} to just be a Set, and drop the isnothing logic within filterchunkrev!.

@KristofferC
Copy link
Member Author

allowing us to change seen from a Union{Set{...}, Nothing} to just be a Set, and drop the isnothing logic within filterchunkrev!.

I did do that change already (but not in the way you described).

@tecosaur
Copy link
Member

tecosaur commented Nov 7, 2025

I have a feeling my comments came (i.e. were written) mid force-pushed changes, I'll have a look at the latest.

@tecosaur
Copy link
Member

tecosaur commented Nov 7, 2025

Hmm, I know filterchunkrev! is a little bit of a lost cause in terms of the number of arguments (it's never going to be a nice 2/3 argument function), but I do think it would be nice to minimise the number of arguments it takes, to the extent possible.

To this end, I'd advocate for a signature dropping deduplicate::Bool like

filterchunkrev!(state::SelectorState, candidates::DenseVector{HistEntry}, seen::Set{Tuple{Symbol, String}}, idx::Int = length(candidates);
                maxtime::Float64 = Inf, maxresults::Int = length(candidates))

and skipping filterchunkrev! when no filtering is needed.

It would also allow us to drop the isfiltering state from persisting outside while loop runs if we re-use the implication of filter_idx == 0 that no filtering is needed.

Unless there's something I've missed?

Edit: I'm thinking something like this:

cands_current = hist
if isempty(filter_spec.exacts) && isempty(filter_spec.negatives) &&
    isempty(filter_spec.regexps) && isempty(filter_spec.modes)
    # No filtering needed, show all history
    filter_idx = 0
    append!(state.candidates, cands_current)
else
    # Find the most strict candidate list available
    for (cond, cands) in Iterators.reverse(cands_cache)
        if ismorestrict(cands_cond, cond)
            cands_current = cands
            break
        end
    end
    # Start filtering candidates
    empty!(filter_seen)
    filter_idx = filterchunkrev!(
        state, cands_current, seen;
        maxtime = time() + 0.01,
        maxresults = outsize[1])
end

@KristofferC
Copy link
Member Author

👍

Copy link
Member

@tecosaur tecosaur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty happy with how this all looks now, thanks for putting up with my nitpicks!

@KristofferC
Copy link
Member Author

Thanks for review!

@ghyatzo
Copy link
Contributor

ghyatzo commented Nov 7, 2025

Cross posting from Zulip:
Thinking about this, what about having only consecutive items being filtered? So sequences like: A B C A B D A E C still show the call progression with its variations instead of just A B C D E which loses a lot of information regarding the context of D and E. While it would filter correctly sequences like A B A B A B if I filter only for A or B.

Basically Run length encoding the (currently shown, filtered or not) history. Independent in behavior if I am filtering or not. Same behavior, but auto off normally and auto on when filtering (which can then be toggled and configured (how many entries to show before collapsing)).

It would only need a last_entry and a counter for how many consecutive times you've seen it.

I know it would be a rather big change and Maybe It's too late and this train is gone, but in any case thanks for this work on the REPL 😊

@KristofferC
Copy link
Member Author

KristofferC commented Nov 7, 2025

I will merge this because I think it is an improvement over status quo but it could be modified in the future. Regarding what you wrote,

Think about the following history (each line is an entry):

ENV["JULIA_PRECOMPILE_AUTO"] = 1
a bunch of lines here
bla bla
print("this is a string that happens to have the word `env` in it")
some other lines
bla bla
ENV["JULIA_PRECOMPILE_AUTO"] = 1

and

ENV["JULIA_PRECOMPILE_AUTO"] = 1
a bunch of lines here
bla bla
print("this is a string that happens to not have it in it")
some other lines
bla bla
ENV["JULIA_PRECOMPILE_AUTO"] = 1

Now you filter on env. Why should the print line in the middle decide if ENV["JULIA_PRECOMPILE_AUTO"] = 1 is printed twice in the output? It has no relation to the other ENV entries except they both happen to be shown by a fuzzy search for "env".

@KristofferC KristofferC merged commit 6058082 into master Nov 7, 2025
5 of 7 checks passed
@KristofferC KristofferC deleted the kc/unique_history branch November 7, 2025 17:37
KristofferC added a commit that referenced this pull request Nov 10, 2025
Co-authored-by: KristofferC <[email protected]>
(cherry picked from commit 6058082)
KristofferC added a commit that referenced this pull request Nov 11, 2025
Co-authored-by: KristofferC <[email protected]>
(cherry picked from commit 6058082)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 1.13 REPL Julia's REPL (Read Eval Print Loop)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants