Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions README.hi-IN.md
Original file line number Diff line number Diff line change
Expand Up @@ -133,7 +133,7 @@ Build the code review graph for this project
| **23 भाषाएं + नोटबुक** | Python, TypeScript/TSX, JavaScript, Vue, Svelte, Go, Rust, Java, Scala, C#, Ruby, Kotlin, Swift, PHP, Solidity, C/C++, Dart, R, Perl, Lua, Zig, PowerShell, Julia, Jupyter/Databricks (.ipynb) |
| **ब्लास्ट-रेडियस विश्लेषण** | दिखाता है कि किसी भी बदलाव से कौन से फ़ंक्शन, क्लासेज़, और फ़ाइलें प्रभावित होती हैं |
| **ऑटो-अपडेट हुक्स** | बिना मैन्युअल हस्तक्षेप के हर फ़ाइल एडिट और git कमिट पर ग्राफ अपडेट होता है |
| **सिमेंटिक सर्च** | sentence-transformers, Google Gemini, MiniMax, या किसी भी OpenAI-compatible एंडपॉइंट (असली OpenAI, Azure, new-api, LiteLLM, vLLM, LocalAI) के ज़रिए वैकल्पिक वेक्टर एम्बेडिंग |
| **सिमेंटिक सर्च** | sentence-transformers, Google Gemini, MiniMax, या किसी भी OpenAI-compatible एंडपॉइंट (असली OpenAI, Azure, new-api, LiteLLM, vLLM, LocalAI) के ज़रिए वैकल्पिक वेक्टर एम्बेडिंग. फ़ंक्शन body का implementation हिस्सा एम्बेड होता है (signature और शुरुआती docstring/header comments हटा दिए जाते हैं ताकि snippet का token budget real logic पर खर्च हो, documentation पर नहीं). long-context मॉडल्स "कोड क्या करता है" के हिसाब से रैंक कर सकते हैं |
| **इंटरैक्टिव विज़ुअलाइज़ेशन** | सर्च, कम्युनिटी लीजेंड टॉगल, और डिग्री-स्केल्ड नोड्स के साथ D3.js फ़ोर्स-डायरेक्टेड ग्राफ |
| **हब और ब्रिज डिटेक्शन** | betweenness centrality के ज़रिए सबसे ज़्यादा कनेक्टेड नोड्स और आर्किटेक्चरल चोकपॉइंट्स खोजें |
| **सरप्राइज़ स्कोरिंग** | अप्रत्याशित कपलिंग का पता लगाएं: क्रॉस-कम्युनिटी, क्रॉस-लैंग्वेज, पेरीफ़ेरल-टू-हब एज़ेज़ |
Expand Down Expand Up @@ -196,6 +196,8 @@ code-review-graph register <path> # मल्टी-रिपो रजिस
code-review-graph unregister <id> # रजिस्ट्री से रिपो हटाएं
code-review-graph repos # रजिस्टर्ड रिपॉज़िटरीज़ की सूची
code-review-graph eval # मूल्यांकन बेंचमार्क चलाएं
code-review-graph embed # एम्बेडिंग वेक्टर्स कंप्यूट / रिफ्रेश (नए रेपो में body enrichment ऑटो-एक्टिव)
code-review-graph embed --include-body --confirm-reembed # मौजूदा DB: body enrichment को एक्सप्लिसिट opt-in
code-review-graph serve # MCP सर्वर शुरू करें
```

Expand Down Expand Up @@ -286,7 +288,7 @@ export CRG_OPENAI_BATCH_SIZE=100 # टाइट बैच

> **मॉडल चुनने की सलाह।** लंबे समय के उपयोग के लिए `-preview` / `-beta` / `-exp` वाले model ID (जैसे `google/gemini-embedding-2-preview`) से बचें — preview मॉडल्स के वज़न बदल सकते हैं (डाइमेंशन बदलने पर पूरा re-embed करना पड़ेगा) या बिना नोटिस deprecate हो सकते हैं। स्टेबल GA मॉडल्स की सलाह दी जाती है: `text-embedding-3-small` / `text-embedding-3-large` (OpenAI), `Qwen/Qwen3-Embedding-8B` (vLLM / LocalAI सेल्फ-होस्टेड के ज़रिए), या `gemini-embedding-001` (नेटिव Gemini provider के ज़रिए, `GOOGLE_API_KEY` चाहिए).
>
> साथ ही ध्यान दें: वर्तमान में `code-review-graph` केवल **फ़ंक्शन सिग्नेचर** एम्बेड करता है (प्रति नोड ~10 tokens, जैसे `"parse_file function (path: str) returns Tree"`). जिन मॉडल्स की क्वालिटी का मुख्य source लंबे context में function body को समझना है (जैसे Gemini 2 या Qwen3-8B के MTEB-code SOTA स्कोर्स), वे इस इनपुट लंबाई पर छोटे मॉडल्स से कम अंतर दिखाएंगे। Body / docstring एम्बेडिंग को फ़ॉलो-अप एन्हांसमेंट के रूप में ट्रैक किया जा रहा है।
> `code-review-graph` signature के साथ-साथ **फ़ंक्शन body के implementation** हिस्से का truncated snippet भी एम्बेड करता है (signature और शुरुआती docstring/header comments हटा दिए जाते हैं ताकि snippet token budget real logic पर जाए; प्रति प्रोवाइडर char budget: local 700 / google 3000 / minimax 3000 / openai 6000). अब long-context मॉडल्स (Gemini 2, Qwen3-8B) फ़ंक्शन के नाम के बजाय "वह क्या करता है" के आधार पर रैंक कर सकते हैं। नए रेपो में यह अपने आप active हो जाता है; मौजूदा DB legacy signature-only व्यवहार रखते हैं जब तक आप `code-review-graph embed --include-body --confirm-reembed` से opt-in नहीं करते (एक बार का पूरा re-embed क्लाउड प्रोवाइडर्स पर API tokens खर्च कर सकता है — signature-only पर रहने के लिए `CRG_EMBED_INCLUDE_BODY=0` सेट करें).

</details>

Expand Down
6 changes: 4 additions & 2 deletions README.ja-JP.md
Original file line number Diff line number Diff line change
Expand Up @@ -135,7 +135,7 @@ gitコミットやファイル保存のたびにフックが起動します。
| **23言語 + ノートブック** | Python, TypeScript/TSX, JavaScript, Vue, Svelte, Go, Rust, Java, Scala, C#, Ruby, Kotlin, Swift, PHP, Solidity, C/C++, Dart, R, Perl, Lua, Zig, PowerShell, Julia, Jupyter/Databricks (.ipynb) |
| **影響範囲分析** | 変更によって影響を受ける関数、クラス、ファイルを正確に表示 |
| **自動更新フック** | ファイル編集やgitコミットのたびに手動操作なしでグラフを更新 |
| **セマンティック検索** | sentence-transformers、Google Gemini、MiniMax、またはOpenAI互換エンドポイント(本家OpenAI、Azure、new-api、LiteLLM、vLLM、LocalAI)によるオプションのベクトル埋め込み |
| **セマンティック検索** | sentence-transformers、Google Gemini、MiniMax、またはOpenAI互換エンドポイント(本家OpenAI、Azure、new-api、LiteLLM、vLLM、LocalAI)によるオプションのベクトル埋め込み。関数 body の実装部分を埋め込みます(シグネチャと先頭の docstring / ヘッダーコメントは取り除き、snippet の token 予算を実ロジックに回します)。長 context モデルが「コードが何をしているか」でランキングできます |
| **インタラクティブ可視化** | D3.js力学レイアウトグラフ。検索、コミュニティ凡例切替、次数スケーリングノード対応 |
| **ハブ・ブリッジ検出** | 最も接続の多いノードと媒介中心性によるアーキテクチャのボトルネックを発見 |
| **サプライズスコアリング** | 予期しない結合を検出:コミュニティ間、言語間、周辺からハブへのエッジ |
Expand Down Expand Up @@ -198,6 +198,8 @@ code-review-graph register <path> # マルチリポジトリレジストリに
code-review-graph unregister <id> # レジストリからリポジトリを削除
code-review-graph repos # 登録済みリポジトリの一覧表示
code-review-graph eval # 評価ベンチマークの実行
code-review-graph embed # 埋め込みベクトルの計算 / 更新(新規リポジトリはbody enrichmentが自動有効)
code-review-graph embed --include-body --confirm-reembed # 既存DB: body enrichmentを明示的にopt-in
code-review-graph serve # MCPサーバーの起動
```

Expand Down Expand Up @@ -288,7 +290,7 @@ base URLがlocalhost(`127.0.0.1`、`localhost`、`0.0.0.0`、`::1`)を指し

> **モデル選択のヒント。** `-preview` / `-beta` / `-exp` 付きのmodel ID(例:`google/gemini-embedding-2-preview`)は長期運用には避けてください。preview モデルは重みが変更される(次元が変わると全ノード re-embed 必須)か、予告なく deprecate される可能性があります。安定版 GA モデル推奨:`text-embedding-3-small` / `text-embedding-3-large`(OpenAI)、`Qwen/Qwen3-Embedding-8B`(vLLM / LocalAI 自前ホスト経由)、または `gemini-embedding-001`(ネイティブ Gemini provider 経由、`GOOGLE_API_KEY` が必要)。
>
> また注意:現状 `code-review-graph` は**関数シグネチャのみ**を埋め込みます(ノードあたり約10トークン、例:`"parse_file function (path: str) returns Tree"`)。長 context で関数 body を理解する能力で差をつけるモデル(Gemini 2Qwen3-8B の MTEB-code SOTA スコア)は、この入力長では小型モデルとの差がかなり縮まります。Body / docstring 埋め込みはフォローアップ拡張として追跡中です
> `code-review-graph` はシグネチャに加えて**関数 body の実装**部分の切り取り片も埋め込みます(シグネチャと先頭の docstring / ヘッダーコメントは取り除き、snippet の token 予算を実ロジックに割り当てます。プロバイダごとの文字数上限:local 700 / google 3000 / minimax 3000 / openai 6000)。長 context モデル(Gemini 2Qwen3-8B)は「関数が何をしているか」でランク付けでき、名前だけに頼らなくなります。新規リポジトリでは自動で有効;既存 DB はレガシーの signature-only 動作を保持し、`code-review-graph embed --include-body --confirm-reembed` で明示的に opt-in するまで切り替わりません(一度限りの全件 re-embed はクラウドプロバイダで API トークンを消費します。signature-only のままにしたい場合は `CRG_EMBED_INCLUDE_BODY=0`)

</details>

Expand Down
6 changes: 4 additions & 2 deletions README.ko-KR.md
Original file line number Diff line number Diff line change
Expand Up @@ -135,7 +135,7 @@ git 커밋이나 파일 저장마다 훅이 실행됩니다. 그래프는 변경
| **23개 언어 + 노트북** | Python, TypeScript/TSX, JavaScript, Vue, Svelte, Go, Rust, Java, Scala, C#, Ruby, Kotlin, Swift, PHP, Solidity, C/C++, Dart, R, Perl, Lua, Zig, PowerShell, Julia, Jupyter/Databricks (.ipynb) |
| **영향 범위 분석** | 변경에 의해 영향 받는 함수, 클래스, 파일을 정확히 보여줍니다 |
| **자동 업데이트 훅** | 수동 개입 없이 파일 편집 및 git 커밋마다 그래프가 업데이트됩니다 |
| **시맨틱 검색** | sentence-transformers, Google Gemini, MiniMax, 또는 OpenAI 호환 엔드포인트(실제 OpenAI, Azure, new-api, LiteLLM, vLLM, LocalAI)를 통한 선택적 벡터 임베딩 |
| **시맨틱 검색** | sentence-transformers, Google Gemini, MiniMax, 또는 OpenAI 호환 엔드포인트(실제 OpenAI, Azure, new-api, LiteLLM, vLLM, LocalAI)를 통한 선택적 벡터 임베딩. 함수 body의 구현 부분을 임베딩합니다(시그니처와 선두 docstring/헤더 주석은 제거하여 snippet 토큰 예산을 실제 로직에 할당). 긴 context 모델이 "코드가 무엇을 하는지"로 랭킹할 수 있습니다 |
| **인터랙티브 시각화** | 검색, 커뮤니티 범례 토글, 차수 기반 노드 크기 조정이 가능한 D3.js 포스 다이렉티드 그래프 |
| **허브 및 브릿지 감지** | 가장 많이 연결된 노드와 매개 중심성을 통한 아키텍처 병목 지점 탐색 |
| **서프라이즈 스코어링** | 예상치 못한 결합 감지: 커뮤니티 간, 언어 간, 주변부-허브 엣지 |
Expand Down Expand Up @@ -198,6 +198,8 @@ code-review-graph register <path> # 멀티 레포 레지스트리에 저장소
code-review-graph unregister <id> # 레지스트리에서 저장소 제거
code-review-graph repos # 등록된 저장소 목록
code-review-graph eval # 평가 벤치마크 실행
code-review-graph embed # 임베딩 벡터 계산 / 갱신 (신규 repo는 body enrichment 자동 활성화)
code-review-graph embed --include-body --confirm-reembed # 기존 DB: body enrichment 명시적 opt-in
code-review-graph serve # MCP 서버 시작
```

Expand Down Expand Up @@ -288,7 +290,7 @@ base URL이 localhost(`127.0.0.1`, `localhost`, `0.0.0.0`, `::1`)를 가리킬

> **모델 선택 팁.** `-preview` / `-beta` / `-exp` 접미사가 붙은 model ID(예: `google/gemini-embedding-2-preview`)는 장기 운영용으로 피하세요. preview 모델은 가중치가 바뀌거나(차원 변경 시 전체 re-embed 필수) 예고 없이 deprecate될 수 있습니다. 안정 GA 모델 권장: `text-embedding-3-small` / `text-embedding-3-large`(OpenAI), `Qwen/Qwen3-Embedding-8B`(vLLM / LocalAI 자체 호스팅 경유), 또는 `gemini-embedding-001`(네이티브 Gemini provider 경유, `GOOGLE_API_KEY` 필요).
>
> 참고로 현재 `code-review-graph`는 **함수 시그니처만** 임베딩합니다(노드당 약 10 토큰, 예: `"parse_file function (path: str) returns Tree"`). 긴 context로 함수 body를 이해하는 능력으로 차별화되는 모델(Gemini 2 또는 Qwen3-8B의 MTEB-code SOTA 점수)은 이 입력 길이에서 소형 모델과의 품질 차이가 훨씬 좁아집니다. Body / docstring 임베딩은 후속 개선 과제로 추적 중입니다.
> `code-review-graph`는 시그니처에 더해 **함수 body 구현부** 스니펫도 함께 임베딩합니다(시그니처와 선두 docstring/헤더 주석은 제거하고 snippet 토큰 예산을 실제 로직에 할당. 프로바이더별 문자 예산: local 700 / google 3000 / minimax 3000 / openai 6000). 이제 긴 context 모델(Gemini 2, Qwen3-8B)이 함수 이름만이 아니라 "함수가 무엇을 하는지"로 랭킹할 수 있습니다. 새 레포지토리는 자동으로 활성화; 기존 DB는 레거시 signature-only 동작을 유지하며, `code-review-graph embed --include-body --confirm-reembed`로 명시적 opt-in해야 전환됩니다(일회성 전량 re-embed는 클라우드 프로바이더에서 API 토큰을 소비. signature-only로 유지하려면 `CRG_EMBED_INCLUDE_BODY=0`).

</details>

Expand Down
20 changes: 13 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -195,7 +195,7 @@ The blast-radius analysis never misses an actually impacted file (perfect recall
| **23 languages + notebooks** | Python, TypeScript/TSX, JavaScript, Vue, Svelte, Go, Rust, Java, Scala, C#, Ruby, Kotlin, Swift, PHP, Solidity, C/C++, Dart, R, Perl, Lua, Zig, PowerShell, Julia, Jupyter/Databricks (.ipynb) |
| **Blast-radius analysis** | Shows exactly which functions, classes, and files are affected by any change |
| **Auto-update hooks** | Graph updates on every file edit and git commit without manual intervention |
| **Semantic search** | Optional vector embeddings via sentence-transformers, Google Gemini, MiniMax, or any OpenAI-compatible endpoint (real OpenAI, Azure, new-api, LiteLLM, vLLM, LocalAI) |
| **Semantic search** | Optional vector embeddings via sentence-transformers, Google Gemini, MiniMax, or any OpenAI-compatible endpoint (real OpenAI, Azure, new-api, LiteLLM, vLLM, LocalAI). Embeds the function body logic (signature + leading docstrings/header comments are stripped so the token budget goes to implementation, not documentation) — long-context models can rank by what code *does* |
| **Interactive visualisation** | D3.js force-directed graph with search, community legend toggles, and degree-scaled nodes |
| **Hub & bridge detection** | Find most-connected nodes and architectural chokepoints via betweenness centrality |
| **Surprise scoring** | Detect unexpected coupling: cross-community, cross-language, peripheral-to-hub edges |
Expand Down Expand Up @@ -258,6 +258,8 @@ code-review-graph register <path> # Register repo in multi-repo registry
code-review-graph unregister <id> # Remove repo from registry
code-review-graph repos # List registered repositories
code-review-graph eval # Run evaluation benchmarks
code-review-graph embed # Compute / refresh embeddings (new repos auto-enable body)
code-review-graph embed --include-body --confirm-reembed # Existing DB: opt in to body enrichment
code-review-graph serve # Start MCP server
```

Expand Down Expand Up @@ -376,12 +378,16 @@ The cloud-egress warning is auto-skipped when the base URL points to localhost
> `gemini-embedding-001` (via the native Gemini provider, which requires
> `GOOGLE_API_KEY` instead of the OpenAI-compatible path).
>
> Also note: `code-review-graph` currently embeds **function signatures only**
> (~10 tokens per node, e.g. `"parse_file function (path: str) returns Tree"`).
> Models whose headline quality comes from long-context body understanding
> (such as Gemini 2 or Qwen3-8B at their MTEB-code SOTA scores) will see a
> much narrower quality gap against smaller models at this input length.
> Body/docstring embedding is tracked as a follow-up enhancement.
> `code-review-graph` also embeds a truncated **function body
> implementation** snippet alongside the signature (signature + leading
> docstrings / header comments are stripped so the snippet tokens go to
> real logic; per-provider char budget: local 700 / google 3000 /
> minimax 3000 / openai 6000). Long-context models (Gemini 2, Qwen3-8B)
> can rank by what a function *does* and not just what it's named. New repositories pick this up automatically;
> existing DBs keep the legacy signature-only behavior until you opt in
> with `code-review-graph embed --include-body --confirm-reembed` (a
> one-time re-embed that may spend API tokens on cloud providers — set
> `CRG_EMBED_INCLUDE_BODY=0` to stay on signatures).

#### Tool Filtering

Expand Down
Loading
Loading