Regex matching in a rope #23

@muxcmux

Hi, this isn't an issue so much as a request for pointers on how to find regex matches in a crop::Rope.

I checked out the regex crate, but its API seems to be designed mainly around &str. Its documentation recommends the lower-level regex-automata crate when:

You want to use the lower level search APIs. For example, both the lazy DFA and fully compiled
DFAs support searching by exploring the automaton one state at a time. This might be useful, for
example, for stream searches or searches of strings stored in non-contiguous in memory.

This is what the Helix editor does for regex search/replace. They created a regex-cursor library, which also provides an implementation of its Cursor trait for use with ropey. From what I can tell, this is used to create a regex Input from a RopeSlice via their RopeyCursor (which implements said Cursor trait). I had a look at the implementation and it seems quite straightforward.

It takes a chunks iterator (also available in crop) and stores a reference to the current chunk's bytes and its offset, updating them as the advance and backtrack methods are called on the cursor. To advance the cursor, it delegates to the chunk iterator's next method; to backtrack, it calls a prev() method on ropey's chunk iterator.

And this is where I got stuck. I tried to create a similar CropCursor in my program that implements the Cursor trait from the regex-cursor crate, but I don't know how to move crop's Chunks iterator backwards the way ropey's can.
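
To make the problem concrete, here is a rough sketch of one possible workaround: buffer every chunk already pulled from a forward-only iterator, so that backtracking only moves an index. The type `CachedChunkCursor` and its methods are hypothetical names. They loosely mirror the advance/backtrack shape described above, but they are not the regex-cursor trait or any crop API, and the sketch assumes the iterator yields non-empty `&str` chunks.

```rust
use std::iter::Fuse;

/// Hypothetical cursor over a forward-only chunk iterator.
/// Chunks that were already visited are kept so the cursor can step back.
struct CachedChunkCursor<'a, I: Iterator<Item = &'a str>> {
    source: Fuse<I>,     // forward-only chunk source
    seen: Vec<&'a str>,  // every chunk pulled from `source` so far
    starts: Vec<usize>,  // byte offset at which each seen chunk begins
    current: usize,      // index of the current chunk in `seen`
}

impl<'a, I: Iterator<Item = &'a str>> CachedChunkCursor<'a, I> {
    fn new(source: I) -> Self {
        let mut source = source.fuse();
        // Pull the first chunk eagerly; an empty source yields an empty chunk.
        let first = source.next().unwrap_or("");
        Self { source, seen: vec![first], starts: vec![0], current: 0 }
    }

    /// Bytes of the current chunk.
    fn chunk(&self) -> &'a [u8] {
        self.seen[self.current].as_bytes()
    }

    /// Byte offset of the current chunk from the start of the text.
    fn offset(&self) -> usize {
        self.starts[self.current]
    }

    /// Moves to the next chunk; returns false at the end of the text.
    fn advance(&mut self) -> bool {
        if self.current + 1 == self.seen.len() {
            // Nothing cached ahead of us, so ask the source for a new chunk.
            match self.source.next() {
                Some(next) => {
                    let start = self.offset() + self.seen[self.current].len();
                    self.seen.push(next);
                    self.starts.push(start);
                }
                None => return false,
            }
        }
        self.current += 1;
        true
    }

    /// Moves to the previous chunk; returns false at the start of the text.
    fn backtrack(&mut self) -> bool {
        if self.current == 0 {
            return false;
        }
        self.current -= 1;
        true
    }
}
```

The cost is one pointer-sized entry per visited chunk, with no copying of text. Whether that is acceptable compared with a Chunks iterator that can genuinely step backwards is exactly what I'm unsure about.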

Internally it delegates to Tree -> Leaves, and from there, I can't tell what is going on without studying crop's code in more detail.

Ropey also has a public method that returns the chunk at a given byte offset, which Helix seems to use to match a regex against only part of a document.
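
For reference, this is roughly the behaviour I mean, written as a naive linear scan over any chunk iterator. `find_chunk_at_byte` is a hypothetical helper of my own, not ropey's or crop's method. A rope could presumably answer this in logarithmic time by descending its tree, which this sketch does not attempt.

```rust
/// Hypothetical helper: returns the chunk containing `byte_offset`,
/// together with the byte offset at which that chunk starts.
/// Returns `None` when `byte_offset` is at or past the end of the text.
fn find_chunk_at_byte<'a, I>(chunks: I, byte_offset: usize) -> Option<(&'a str, usize)>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut start = 0;
    for chunk in chunks {
        let end = start + chunk.len();
        if byte_offset < end {
            // `byte_offset` falls inside [start, end), so this is the chunk.
            return Some((chunk, start));
        }
        start = end;
    }
    None
}
```

A cursor for a partial-document search could start from the returned chunk and use the returned start offset to translate match positions back into document offsets.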

So I was hoping you could shed some light on what would be a sensible approach to implement Regex matching on a Rope. Should I simply .to_string() the rope, pass that to a Regex and call it a day? Or should I pursue this effort of implementing the Cursor trait from regex-cursor, in which case how do I go about moving crop's Chunk iterator backwards, and getting a chunk at a byte offset? Or maybe there is another way to do this, which I haven't considered?
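
To illustrate why I can't just run the search on each chunk separately: a match may straddle a chunk boundary. The sketch below uses `str::find` as a stand-in for a regex so that it needs only the standard library. The function names and the chunk boundaries in the usage note are made up for the example.

```rust
/// Searches each chunk on its own. Matches that cross a chunk
/// boundary are missed, which is the problem with this approach.
fn find_per_chunk(chunks: &[&str], needle: &str) -> Option<usize> {
    let mut start = 0;
    for chunk in chunks {
        if let Some(i) = chunk.find(needle) {
            return Some(start + i);
        }
        start += chunk.len();
    }
    None
}

/// Copies all chunks into one contiguous String first (the moral
/// equivalent of calling `.to_string()` on the rope), then searches.
fn find_contiguous(chunks: &[&str], needle: &str) -> Option<usize> {
    let text: String = chunks.concat();
    text.find(needle)
}
```

With the chunks `["hello wo", "rld"]` and the needle `"world"`, `find_per_chunk` returns `None` while `find_contiguous` returns `Some(6)`. So the contiguous copy is correct but costs an allocation of the whole text per search, whereas a cursor-based search would avoid the copy.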

Either way, thank you for your time and for your work on this library.

Labels: enhancement