A high-performance CSV processing library for Ruby, providing efficient sorting, validation, and batch processing capabilities.
- Efficient CSV Sorting: Sort large CSV files with minimal memory usage using external merge sort
- URL Validation: Built-in validation for URL fields
- Protocol Validation: Validate protocol presence in fields
- Batch Processing: Process CSV data in configurable batch sizes
- Memory Management: Configurable buffer sizes for optimal memory usage
- Error Tracking: Detailed error reporting for validation failures
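To make the external merge sort idea from the feature list concrete, here is a minimal pure-Ruby sketch using only the standard `csv` and `tempfile` libraries. It is an illustration of the general technique, not csv_utils' implementation or API: the method name `external_sort` and its parameters are hypothetical, and it assumes every field is a non-empty string (values are compared lexicographically after the round trip through disk).

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper for illustration only; not part of csv_utils.
# Sorts rows by the given column indexes, holding at most `chunk_size`
# rows in memory while building runs and one row per run while merging.
def external_sort(rows, key_columns, chunk_size)
  key = ->(row) { row.values_at(*key_columns) }
  runs = []
  readers = []

  # Phase 1: sort each chunk in memory and spill it to disk as a sorted run.
  rows.each_slice(chunk_size) do |chunk|
    file = Tempfile.new(['run', '.csv'])
    file.close # keep the path; the file is reopened through CSV below
    CSV.open(file.path, 'w') do |csv|
      chunk.sort_by { |row| key.call(row) }.each { |row| csv << row }
    end
    runs << file
  end

  # Phase 2: k-way merge. Hold the next unread row of every run and
  # repeatedly emit the smallest one.
  readers = runs.map { |file| CSV.open(file.path) }
  heads = readers.map(&:shift) # CSV#shift returns nil at end of file
  loop do
    live = heads.each_index.reject { |i| heads[i].nil? }
    idx = live.min_by { |i| key.call(heads[i]) }
    break if idx.nil?

    yield heads[idx]
    heads[idx] = readers[idx].shift
  end
ensure
  readers.each(&:close)
  runs.each(&:close!) # delete the temporary run files
end

rows = [%w[b 2 x], %w[a 9 y], %w[c 1 z], %w[a 3 w]]
sorted = []
external_sort(rows, [0, 1], 2) { |row| sorted << row }
```

With a chunk size of 2, the four rows are written as two sorted runs and then merged. The linear scan over run heads keeps the sketch short; a priority queue would be the usual choice when the number of runs is large.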
Add this line to your application's Gemfile:
gem 'csv_utils', git: "https://github.com/movableink/csv_utils"
And then execute:
bundle install
Or install it yourself as:
gem install csv_utils
require 'csv_utils'
# Create a new sorter
sorter = CsvUtils::Sorter.new("my_source", [0, 1], 100)
# Add rows
sorter.add_row(["value1", "value2", "url1"])
sorter.add_row(["value3", "value4", "url2"])
# Sort and get results
result = sorter.sort!
puts "Total rows processed: #{result['total_rows']}"
# Read back the result in batches
sorter.each_batch(1000) do |batch|
  batch.each do |row|
    # Process each row
  end
end
# Set validation schema
sorter.set_validation_schema([:url, :protocol, nil])  # nil means ignore column
# Process rows with validation
sorter.add_row(["https://example.com", "http://", "ignored"])
require 'csv_utils'
# Create a validator
validator = CsvUtils::Validator.new(
  [
    { column_name: "url", validation_type: :url },
    { column_name: "protocol", validation_type: :protocol },
    { column_name: "name", validation_type: nil }
  ],
  "error_log.csv"
)
# Validate a CSV file
validator.add_file("input.csv")
# Get validation results
status = validator.status
puts "Total rows: #{status[:total_rows_processed]}"
puts "URL errors: #{status[:failed_url_error_count]}"
puts "Protocol errors: #{status[:failed_protocol_error_count]}"
puts "Parse errors: #{status[:parse_error_count]}"
After checking out the repo, run bundle to install dependencies. Then, run bundle exec rake compile to build the native code. Then run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and the created tag, and push the .gem file to rubygems.org.
Bug reports and pull requests are welcome on GitHub at https://github.com/movableink/csv_utils.
Copyright 2025 Movable, Inc.