@wingmann2

When picking a color for the classification banner, the banner text color is always white, hardcoded as "#fff". With certain background shades (e.g. bright yellow), the banner text is unreadable. This change lets the user select the text color to pair with the background color. The choice will typically be white or black, but the picker gives the user more flexibility with styling and further compliance.
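To illustrate why white text on a bright background is hard to read, here is a minimal sketch that computes a WCAG-style contrast ratio between a text color and a background color. It is not part of this change (the PR adds a manual picker, not an automatic check). The function names and the sample color `#ffff00` are assumptions chosen for illustration.

```python
def _hex_to_rgb(color):
    """Convert '#rgb' or '#rrggbb' to three floats in the range 0..1."""
    h = color.lstrip("#")
    if len(h) == 3:
        # Expand shorthand such as '#fff' to 'ffffff'.
        h = "".join(ch * 2 for ch in h)
    return tuple(int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))


def relative_luminance(color):
    """Relative luminance of an sRGB color (0 = black, 1 = white)."""
    def linearize(c):
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in _hex_to_rgb(color))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(text_color, background_color):
    """Contrast ratio between two colors, from 1 (none) to 21 (black on white)."""
    l1 = relative_luminance(text_color)
    l2 = relative_luminance(background_color)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical banner background: bright yellow.
background = "#ffff00"
white_on_yellow = contrast_ratio("#fff", background)
black_on_yellow = contrast_ratio("#000", background)
```

With a bright yellow background, the white-text ratio comes out close to 1 (almost no contrast), while black text scores far above the 4.5:1 minimum that WCAG AA sets for normal text. That is the kind of case the picker lets a user correct.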

Bionic711 and others added 4 commits October 3, 2025 22:50
* fix video indexer auth

* fixed video indexer support

* fixed video indexer support

* fix video indexer support

* fixed video indexer support

* added support for .xlsm

* fixed all scope issue

* added support for xml, yaml, and log

#### **JSON**

- Uses `RecursiveJsonSplitter`:
  - `max_chunk_size=600`
  - `convert_lists=True`
- Produces JSON strings that retain original structure.
- See `process_json_file`.

#### **XML**

- Uses `RecursiveCharacterTextSplitter` with XML-aware separators.
- **Structure-preserving chunking**:
  - Separators prioritized: `\n\n` → `\n` → `>` (end of XML tags) → space → character
  - Splits at logical boundaries to maintain tag integrity
- **Chunked by 4000 characters** with 200-character overlap for context preservation.
- **Goal**: Preserve XML structure while providing manageable chunks for LLM processing.
- See `process_xml`.
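The separator priority described above can be sketched in plain Python. This is a simplified illustration of the idea, not the `RecursiveCharacterTextSplitter` implementation and not the code in `process_xml`: it omits overlap, keeps each separator at the end of the piece it terminates, and uses a hypothetical function name `split_recursive`.

```python
# Separators in priority order; the empty string means "split by character".
XML_SEPARATORS = ["\n\n", "\n", ">", " ", ""]


def split_recursive(text, separators, chunk_size):
    """Split text into chunks of at most chunk_size characters,
    preferring the earliest separator in the list that applies."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard split by character count.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    if sep not in text:
        return split_recursive(text, rest, chunk_size)

    # Keep the separator attached to the end of each piece,
    # so a split on '>' never cuts a tag in half.
    parts = text.split(sep)
    pieces = [p + sep for p in parts[:-1]]
    if parts[-1]:
        pieces.append(parts[-1])

    chunks, current = [], ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Piece is too large on its own: flush, then try finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(split_recursive(piece, rest, chunk_size))
        elif len(current) + len(piece) > chunk_size:
            chunks.append(current)
            current = piece
        else:
            current += piece
    if current:
        chunks.append(current)
    return chunks


# Small made-up document and a small chunk size to make the behavior visible.
sample_xml = "<root>\n" + "".join(
    f"<item id='{i}'>value {i}</item>\n" for i in range(50)
) + "</root>\n"
xml_chunks = split_recursive(sample_xml, XML_SEPARATORS, chunk_size=200)
```

In this sample every chunk ends on a line break, because `\n` is the first separator that applies and each line fits within the limit. The `>` separator only comes into play when a single line is longer than the chunk size.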

#### **YAML / YML**

- Processed using regex word splitting (similar to TXT).
- **Chunked by 400 words**.
- Maintains YAML structure through simple word-based splitting.
- See `process_yaml`.
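A minimal sketch of word-based chunking as described above, assuming that "words" means runs of non-whitespace characters. The regex and the function name `chunk_by_words` are assumptions. The actual pattern used by `process_yaml` is not shown here.

```python
import re


def chunk_by_words(text, words_per_chunk=400):
    """Group whitespace-delimited tokens into chunks of words_per_chunk words."""
    words = re.findall(r"\S+", text)
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


# Made-up input: 1000 single-word tokens.
sample_text = "\n".join(f"key{i}:" for i in range(1000))
yaml_chunks = chunk_by_words(sample_text)
```

Note that in this sketch the tokens are rejoined with single spaces, so line breaks and indentation from the input are not kept in the output chunks.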

#### **LOG**

- Processed using line-based chunking to maintain log record integrity.
- **Never splits mid-line** to preserve complete log entries.
- **Line-Level Chunking**:
  1. Split file by lines using `splitlines(keepends=True)` to preserve line endings.
  2. Accumulate complete lines until reaching target word count ≈1000 words.
  3. When adding next line would exceed target AND chunk already has content:
     - Finalize current chunk
     - Start new chunk with current line
  4. If single line exceeds target, it gets its own chunk to prevent infinite loops.
  5. Emit chunks with complete log records.
- **Goal**: Provide substantial log context (1000 words) while ensuring no log entry is split across chunks.
- See `process_log`.
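The line-level steps above can be sketched as follows. This is an illustrative version written from the description, not the code in `process_log`. The function name `chunk_log_lines` and the sample log lines are assumptions.

```python
def chunk_log_lines(text, target_words=1000):
    """Group complete lines into chunks of roughly target_words words.

    A line is never split. A single line longer than the target
    becomes a chunk of its own.
    """
    chunks = []
    current_lines = []
    current_words = 0

    for line in text.splitlines(keepends=True):
        line_words = len(line.split())
        # Finalize the current chunk only if it already has content;
        # this is what lets an oversized line through without looping.
        if current_lines and current_words + line_words > target_words:
            chunks.append("".join(current_lines))
            current_lines = []
            current_words = 0
        current_lines.append(line)
        current_words += line_words

    if current_lines:
        chunks.append("".join(current_lines))
    return chunks


# Made-up log: 10 lines of 6 words each, chunked with a small target.
sample_log = "".join(
    f"2025-10-03 22:50:{i:02d} INFO request {i} ok\n" for i in range(10)
)
log_chunks = chunk_log_lines(sample_log, target_words=20)
```

Because lines are kept with their endings and only ever concatenated, joining the chunks reproduces the original file exactly, and every chunk boundary falls on a line boundary.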

* updated yaml to use recursivesplitter

* removed chunk overlap for yaml and xml

* added support for older .doc files and .docm

* added keyword and abstract for each doc in citation

* added multi-modal input support

* added ai vision analysis
@wingmann2

@Bionic711 @paullizer PR #526 was auto-closed after a rebase and the development branch got hosed. This is a fresh development branch with the isolated changes to review. Still unsure why conflicts were shown in fork > merge on files not included in this change, but I will revisit contributor status down the road when changes become more frequent.

@wingmann2 changed the base branch from main to Development on November 30, 2025 18:58

4 participants