
Fix/evaluation bugs#15

Open
R09722akaBennett wants to merge 7 commits into brandonstarxel:dev from R09722akaBennett:fix/evaluation-bugs


@R09722akaBennett

This PR addresses a bug encountered during the evaluation run:

  • Updates the `_chunker_to_collection` method to use `get_or_create_collection`, avoiding a `NotFoundError` when a collection does not exist.
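
For illustration only, the change can be sketched as a small helper. The actual body of `_chunker_to_collection` is not shown in this PR description, so the function name `chunker_collection` and its parameters are hypothetical. `client` is assumed to be a Chroma client, or any object exposing the same `get_or_create_collection` method.

```python
def chunker_collection(client, name, embedding_function=None):
    """Return the named collection, creating it first if it is missing.

    Hypothetical helper: `client` is assumed to expose
    `get_or_create_collection`, as a Chroma client does.
    """
    kwargs = {"name": name}
    if embedding_function is not None:
        # Forward the embedding function only when the caller supplies one.
        kwargs["embedding_function"] = embedding_function
    # A plain get_collection(name) fails when the collection is absent;
    # get_or_create_collection returns the existing one or creates it.
    return client.get_or_create_collection(**kwargs)
```

Calling the helper twice with the same name should hand back the same collection instead of failing on the first run, which is the behavior the fix relies on.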

Closes #13

brandonstarxel and others added 7 commits September 27, 2024 14:06
## UnicodeDecodeError Fix for Windows

### Problem
On Windows, Python's default text encoding is locale-dependent (commonly cp1252), so reading UTF-8 files that contain non-ASCII characters can raise `UnicodeDecodeError`. This breaks functionality for Windows users working with international text.

### Solution
This PR modifies the `_get_chunks_and_metadata` method to detect Windows environments and explicitly use UTF-8 encoding when opening files on that platform. The behavior on other platforms remains unchanged.
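
A minimal sketch of the platform check described above, not the PR's actual diff. The helper names `text_open_kwargs` and `read_corpus` are hypothetical, and the optional `system` argument exists only so the branch can be exercised on any machine.

```python
import platform


def text_open_kwargs(system=None):
    """Keyword arguments for open(), following the Windows-only check."""
    system = platform.system() if system is None else system
    # Only Windows gets an explicit encoding; elsewhere open() keeps its default.
    return {"encoding": "utf-8"} if system == "Windows" else {}


def read_corpus(path):
    """Read a text file, forcing UTF-8 on Windows only."""
    with open(path, "r", **text_open_kwargs()) as f:
        return f.read()
```

Passing `encoding="utf-8"` unconditionally would be a simpler alternative; the sketch keeps the platform check because the description states that behavior on other platforms is left unchanged.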

### Testing
Tested on:
- Windows 10 with files containing Unicode characters (previously errored, now works)
- Linux with the same files (behavior unchanged)

This change is minimally invasive but fixes a significant usability issue for Windows users.
…icodeDecodeErrorOnWindows

Update base_evaluation.py

4 participants