refactor(go-segmenter): replace custom GoSegmenter with Tree-Sitter implementation #138
base: rh-aiq-main
4b877d9 to a627169
Hi @vbelouso, can you please rebase and resolve the conflicts before I start reviewing it?
…itter implementation Signed-off-by: Vladimir Belousov <[email protected]>
a627169 to 5a756be
Done
-        return re.search("[A-Z][a-z0-9-]*", function_name)
+        return bool(re.search("[A-Z][a-z0-9-]*", function_name))

     def get_function_name(self, function: Document) -> str:
@vbelouso This is an example of something that is not working correctly (the example test below fails): get_function_name should return the name of the variable that holds the anonymous function.

    @pytest.mark.asyncio
    async def test_transitive_search_golang_generic():
        parser = GoLanguageFunctionsParser()
        doc1 = Document(page_content=("greet := func() { // Assigning anonymous function to a variable 'greet'\n"
                                      " fmt.Println(\"Greetings from a variable-assigned anonymous function!\")\n"
                                      " }"))
        name = parser.get_function_name(doc1)
        print(f"name_of_function={name}")
        assert name == "greet"

Your revised GoSegmenter with Tree-Sitter parses the anonymous function assigned to a variable correctly, but instead of returning the variable's name in this case, it returns :=, which is incorrect. Please check.
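To make the expected behavior concrete, here is a minimal sketch of which name each kind of Go snippet should resolve to. It is a regex-based stand-in for illustration only, not the Tree-Sitter implementation in this PR; the function name expected_function_name and the patterns are hypothetical and only cover the simple single-declaration cases discussed here.

```python
import re

# Hypothetical stand-in, NOT the PR's Tree-Sitter code: these patterns only
# pin down which name each simple Go snippet is expected to yield.
_METHOD = re.compile(r"^\s*func\s*\([^)]*\)\s*([A-Za-z_]\w*)\s*[\[(]")
_FUNC = re.compile(r"^\s*func\s+([A-Za-z_]\w*)\s*[\[(]")
_SHORT_VAR = re.compile(r"^\s*([A-Za-z_]\w*)\s*:=\s*func\b")
_VAR_DECL = re.compile(r"^\s*var\s+([A-Za-z_]\w*)\b[^=\n]*=\s*func\b")


def expected_function_name(source: str) -> str:
    """Return the name a Go snippet should be indexed under, or "" if none."""
    for pattern in (_METHOD, _FUNC, _SHORT_VAR, _VAR_DECL):
        match = pattern.match(source)
        if match:
            # Group 1 is always the identifier, never the ':=' operator.
            return match.group(1)
    return ""


if __name__ == "__main__":
    print(expected_function_name("greet := func() { fmt.Println(\"hi\") }"))
```

The key point is that for a variable-assigned anonymous function the name comes from the left-hand side of the declaration, so the operator token between the identifier and the func keyword must never be returned.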
@zvigrinberg
Updated.
I also increased the number of test cases.
Signed-off-by: Vladimir Belousov <[email protected]>
Signed-off-by: Vladimir Belousov <[email protected]>
5310848 to 9261a8c

Summary
This PR replaces the legacy regex-based Go segmenter with a native Tree-Sitter parser (GoSegmenterExtended), enabling syntax-aware extraction of Go functions, methods, anonymous functions, and types with deterministic and reproducible results.
Rationale
Implementation highlights
Architectural impact
The segmentation layer now uses a structured syntax tree (Tree-Sitter) instead of regex parsing.
Downstream modules such as ChainOfCallsRetriever and function analyzers still operate on text chunks, but now those chunks are syntactically well-formed and consistent across runs.
This refactor lays the foundation for future AST-based semantic analysis (e.g. variable/type inference, symbol resolution).
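One way to check the "consistent across runs" property is to fingerprint the ordered chunk list and compare fingerprints over repeated runs. The sketch below assumes a segmenter can be called as a plain function from source text to a list of chunk strings; that interface, and the names chunk_fingerprint and is_deterministic, are hypothetical and not taken from this PR.

```python
import hashlib
from typing import Callable, Iterable, List


def chunk_fingerprint(chunks: Iterable[str]) -> str:
    """Hash an ordered sequence of chunks into one hex digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        data = chunk.encode("utf-8")
        # The length prefix keeps ["ab", "c"] distinct from ["a", "bc"].
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def is_deterministic(
    segment: Callable[[str], List[str]], source: str, runs: int = 3
) -> bool:
    """True if `segment` yields the same chunks, in the same order, every run."""
    fingerprints = {chunk_fingerprint(segment(source)) for _ in range(runs)}
    return len(fingerprints) == 1


if __name__ == "__main__":
    # Toy segmenter (assumption): split on blank lines.
    toy = lambda text: text.split("\n\n")
    print(is_deterministic(toy, "func A() {}\n\nfunc B() {}"))
```

A check like this only shows that repeated runs agree with each other; it says nothing about whether the chunks themselves are syntactically correct.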
Benchmark
Tested on https://github.com/openshift/origin with 35001 Go files