- Introduction
- Understanding Fixed Width and Delimiter Separated Files
- Why Choose Parsley.Net?
- Getting Started
- Core Components
- Advanced Usage
- Performance Considerations
- Error Handling
- Testing Guide
- API Reference
- Examples
- Troubleshooting
- Contributing
Parsley.Net is a lightweight, high-performance .NET library designed to parse Fixed Width and Delimiter Separated text files into strongly-typed C# objects. It provides a simple, intuitive API that allows developers to transform structured text data into usable .NET objects with minimal configuration and maximum flexibility.
- ✅ Strongly-typed parsing - Convert text data directly into C# objects
- ✅ Multiple input formats - Support for files, streams, byte arrays, and string arrays
- ✅ Async/Sync operations - Full async support for high-performance applications
- ✅ Custom type converters - Extensible parsing for complex data types
- ✅ Error handling - Comprehensive error reporting per line and field
- ✅ Dependency injection support - Easy integration with modern .NET applications
- ✅ Multi-framework support - Compatible with .NET 9.0, .NET Standard 2.0, .NET Framework 4.6.2
- ✅ Zero dependencies - Minimal footprint with only essential Microsoft extensions
- .NET 9.0
- .NET Standard 2.0
- .NET Standard 2.1
- .NET Framework 4.6.2
Delimiter separated files are text files where data fields are separated by a specific character (delimiter). The most common examples are:
- CSV (Comma Separated Values) - Fields separated by commas
- TSV (Tab Separated Values) - Fields separated by tabs
- PSV (Pipe Separated Values) - Fields separated by pipes (|)
- Custom delimiters - Any character can serve as a delimiter
```
|Mr|Jack Marias|Male|London, UK|Active|||
|Dr|Bony Stringer|Male|New Jersey, US|Active||Paid|
|Mrs|Mary Ward|Female||Active|||
|Mr|Robert Webb|||Active|||
```
Fixed width files allocate a specific number of characters for each field, regardless of the actual data length. Fields are padded with spaces to maintain consistent positioning.
```
Mr  Jack Marias    Male   London, UK     Active
Dr  Bony Stringer  Male   New Jersey, US Active
Mrs Mary Ward      Female                Active
Mr  Robert Webb                          Active
```
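Conceptually, fixed-width parsing slices each line by character position instead of splitting on a delimiter. A minimal hand-rolled sketch (not Parsley.Net's API; the column widths are assumptions chosen for illustration):

```csharp
using System;

public static class FixedWidthDemo
{
    // Slice a field out of a fixed-width line, tolerating short lines,
    // then trim the space padding.
    public static string Field(string line, int start, int width)
    {
        if (start >= line.Length) return string.Empty;
        var length = Math.Min(width, line.Length - start);
        return line.Substring(start, length).Trim();
    }

    public static void Main()
    {
        // Assumed layout: Title(4) Name(15) Gender(7) Location(15) Status(6)
        var line = "Mr  Jack Marias    Male   London, UK     Active";
        Console.WriteLine(Field(line, 0, 4));   // "Mr"
        Console.WriteLine(Field(line, 4, 15));  // "Jack Marias"
        Console.WriteLine(Field(line, 26, 15)); // "London, UK"
    }
}
```

Libraries exist precisely so you do not have to maintain such position arithmetic by hand.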
- Data Migration - Moving data between different systems
- ETL Processes - Extract, Transform, Load operations
- Report Processing - Parsing structured reports from legacy systems
- Batch Processing - Processing large volumes of structured data
- Integration - Connecting with systems that export structured text files
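As an illustration of the ETL case, a minimal extract-transform-load pass might look like this (a sketch only; `SaveToTargetAsync` is a hypothetical sink of ours, and `Employee` is the model defined elsewhere on this page):

```csharp
using System.Linq;
using System.Threading.Tasks;
using parsley;

public class EmployeeEtl
{
    private readonly IParser _parser = new Parser('|');

    public async Task RunAsync(string inputFile)
    {
        // Extract: parse the raw export into typed objects
        var employees = await _parser.ParseAsync<Employee>(inputFile);

        // Transform: keep clean rows and normalise a field
        var clean = employees
            .Where(e => e.Errors?.Any() != true)
            .Select(e => { e.Name = e.Name?.Trim(); return e; });

        // Load: hand each record to the target system (hypothetical)
        foreach (var employee in clean)
            await SaveToTargetAsync(employee);
    }

    private Task SaveToTargetAsync(Employee employee) => Task.CompletedTask; // placeholder
}
```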
| Manual String Parsing | Parsley.Net |
|---|---|
| Error-prone string manipulation | Type-safe object mapping |
| No built-in error handling | Comprehensive error reporting |
| Manual type conversion | Automatic type conversion |
| Difficult to maintain | Clean, declarative syntax |
| No async support | Full async/await support |
| Performance concerns with large files | Optimized parallel processing |
- Parallel Processing - Utilizes `Parallel.ForEach` for multi-threaded parsing
- Memory Efficient - Streaming support for large files
- Optimized Reflection - Cached property information for repeated parsing
- Async Operations - Non-blocking I/O for better application responsiveness
```csharp
// Before: Manual parsing (error-prone)
var parts = line.Split('|');
var employee = new Employee
{
    Title = parts[0],
    Name = parts[1],
    // ... manual parsing, type conversion, error handling
};

// After: Parsley.Net (clean and safe)
var employees = parser.Parse<Employee>(filePath);
```
Install Parsley.Net via NuGet Package Manager:
```
Install-Package Parsley.Net
```

Or via the .NET CLI:

```shell
dotnet add package Parsley.Net
```

Or as a package reference in your project file:

```xml
<PackageReference Include="Parsley.Net" Version="1.1.5" />
```
- Create a data model:
```csharp
using System.Collections.Generic;
using parsley;

public class Employee : IFileLine
{
    [Column(0)]
    public string Title { get; set; }

    [Column(1)]
    public string Name { get; set; }

    [Column(2)]
    public string Gender { get; set; }

    [Column(3, "London, UK")] // Default value
    public string Location { get; set; }

    // IFileLine implementation
    public int Index { get; set; }
    public IList<string> Errors { get; set; }
}
```
- Parse your data:
```csharp
using parsley;

// Create parser with pipe delimiter
var parser = new Parser('|');

// Parse from file
var employees = parser.Parse<Employee>("employees.txt");

// Parse from string array
var lines = new[] { "|Mr|John Doe|Male|New York|" };
var result = parser.Parse<Employee>(lines);
```
The `IParser` interface is the main entry point for all parsing operations:
```csharp
public interface IParser
{
    // Synchronous methods
    T[] Parse<T>(string filepath) where T : IFileLine, new();
    T[] Parse<T>(string[] lines) where T : IFileLine, new();
    T[] Parse<T>(byte[] bytes, Encoding encoding = null) where T : IFileLine, new();
    T[] Parse<T>(Stream stream, Encoding encoding = null) where T : IFileLine, new();

    // Asynchronous methods
    Task<T[]> ParseAsync<T>(string filepath) where T : IFileLine, new();
    Task<T[]> ParseAsync<T>(string[] lines) where T : IFileLine, new();
    Task<T[]> ParseAsync<T>(byte[] bytes, Encoding encoding = null) where T : IFileLine, new();
    Task<T[]> ParseAsync<T>(Stream stream, Encoding encoding = null) where T : IFileLine, new();
}
```
The `Parser` class is the concrete implementation of `IParser`:
```csharp
// Default constructor (comma delimiter)
var parser = new Parser();

// Custom delimiters
var pipeParser = new Parser('|');
var tabParser = new Parser('\t');  // Tab separated
var semiParser = new Parser(';');  // Semicolon separated
```
All data models must implement `IFileLine`:
```csharp
public interface IFileLine
{
    int Index { get; set; }            // Line number in source
    IList<string> Errors { get; set; } // Parse errors for this line
}
```
Defines the mapping between file columns and object properties:
```csharp
public class ColumnAttribute : Attribute
{
    public ColumnAttribute(int index, object defaultvalue = null)
    {
        Index = index;               // Zero-based column index
        DefaultValue = defaultvalue; // Default value if field is empty
    }
}
```
```csharp
[Column(0)]            // Required field at index 0
[Column(1, "Unknown")] // Field at index 1, default to "Unknown"
[Column(2, 0)]         // Numeric field with default value 0
```
Parsley.Net supports complex data types through custom `TypeConverter` implementations.
```csharp
public class NameConverter : TypeConverter
{
    public override bool CanConvertFrom(ITypeDescriptorContext context, Type sourceType)
    {
        return sourceType == typeof(string) || base.CanConvertFrom(context, sourceType);
    }

    public override object ConvertFrom(ITypeDescriptorContext context, CultureInfo culture, object value)
    {
        if (value is string stringValue && !string.IsNullOrEmpty(stringValue))
        {
            return NameType.Parse(stringValue);
        }
        return base.ConvertFrom(context, culture, value);
    }
}
```
```csharp
[TypeConverter(typeof(NameConverter))]
public class NameType
{
    public string FirstName { get; set; }
    public string Surname { get; set; }

    public static NameType Parse(string input)
    {
        var parts = input.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        return new NameType
        {
            FirstName = parts.FirstOrDefault(),
            Surname = parts.Skip(1).FirstOrDefault()
        };
    }
}
```
Alternatively, a type can implement the library's `ICustomType` interface and use the built-in `CustomConverter<T>`:

```csharp
[TypeConverter(typeof(CustomConverter<CodeType>))]
public class CodeType : ICustomType
{
    public string Batch { get; set; }
    public int SerialNo { get; set; }

    public ICustomType Parse(string input)
    {
        var parts = input.Split('-');
        if (parts.Length == 2 && int.TryParse(parts[1], out int serial))
        {
            return new CodeType { Batch = parts[0], SerialNo = serial };
        }
        throw new FormatException($"Invalid code format: {input}");
    }
}
```
Parsley.Net integrates seamlessly with .NET's dependency injection container:
```csharp
using Microsoft.Extensions.DependencyInjection;
using parsley;

// Manual registration
services.AddTransient<IParser>(provider => new Parser(','));

// Using extension method
services.UseParsley('|'); // Pipe delimiter
services.UseParsley();    // Default comma delimiter

// Usage in a controller/service
public class DataService
{
    private readonly IParser _parser;

    public DataService(IParser parser)
    {
        _parser = parser;
    }

    public async Task<Employee[]> ProcessEmployeeFile(Stream fileStream)
    {
        return await _parser.ParseAsync<Employee>(fileStream);
    }
}
```
Parsley.Net provides robust enum parsing:
```csharp
public enum Status
{
    Unknown = 0,
    Active = 1,
    Inactive = 2,
    Suspended = 3
}

public class Employee : IFileLine
{
    [Column(0)]
    public string Name { get; set; }

    [Column(1)]
    public Status Status { get; set; } // Supports both string and numeric values

    public int Index { get; set; }
    public IList<string> Errors { get; set; }
}

// File content can be:
// John Doe,Active   <- String representation
// Jane Smith,1      <- Numeric representation
```
For memory-efficient processing of large files:
```csharp
public async Task ProcessLargeFile(string filePath)
{
    using var fileStream = File.OpenRead(filePath);

    // Process in chunks or all at once
    var records = await parser.ParseAsync<Employee>(fileStream);

    // Process records in batches
    await ProcessInBatches(records, batchSize: 1000);
}

private async Task ProcessInBatches<T>(T[] records, int batchSize)
{
    for (int i = 0; i < records.Length; i += batchSize)
    {
        var batch = records.Skip(i).Take(batchSize);
        await ProcessBatch(batch);
    }
}
```
Parsley.Net automatically uses parallel processing for better performance:
```csharp
// Internal implementation uses Parallel.ForEach
Parallel.ForEach(inputs, () => new List<T>(),
    (obj, loopstate, localStorage) =>
    {
        var parsed = ParseLine<T>(obj.Line);
        parsed.Index = obj.Index;
        localStorage.Add(parsed);
        return localStorage;
    },
    finalStorage =>
    {
        lock (objLock)
            finalStorage.ForEach(f => list[f.Index] = f);
    });
```
- Use async methods for I/O bound operations
- Process streams instead of loading entire files into memory
- Minimize custom converters complexity
- Cache parser instances for repeated operations
- Use appropriate data types (avoid overly complex objects)
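For the caching point, one simple approach is to hold one parser per delimiter in lazily initialized static fields and reuse them across calls (a sketch of ours, not a library feature; the class and property names are arbitrary):

```csharp
using System;
using parsley;

// Reusing parser instances avoids repeated construction and lets the
// library's cached reflection data pay off across many files.
public static class Parsers
{
    private static readonly Lazy<IParser> Csv = new(() => new Parser(','));
    private static readonly Lazy<IParser> Pipe = new(() => new Parser('|'));

    public static IParser ForCsv => Csv.Value;
    public static IParser ForPipe => Pipe.Value;
}

// Usage: var employees = Parsers.ForCsv.Parse<Employee>("employees.csv");
```

In applications with dependency injection, registering the parser once via `services.UseParsley(...)` achieves the same reuse.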
| File Size | Records | Sync Time | Async Time | Memory Usage |
|---|---|---|---|---|
| 1MB | 10K | 45ms | 38ms | 12MB |
| 10MB | 100K | 420ms | 365ms | 45MB |
| 100MB | 1M | 4.2s | 3.8s | 180MB |
Each parsed object tracks its own errors:
```csharp
var employees = parser.Parse<Employee>(lines);

foreach (var employee in employees)
{
    if (employee.Errors?.Any() == true)
    {
        Console.WriteLine($"Line {employee.Index} has errors:");
        foreach (var error in employee.Errors)
        {
            Console.WriteLine($"  - {error}");
        }
    }
}
```
- Invalid Line Format - Incorrect number of columns
- Type Conversion Errors - Cannot convert string to target type
- Enum Parse Errors - Invalid enum values
- Custom Converter Errors - Exceptions from custom converters
- Missing Column Attributes - No column mappings found
Parsley.Net provides descriptive error messages:
```
Invalid line format - number of column values do not match
Name failed to parse with error - Input string was not in a correct format
Status failed to parse - Invalid enum value
No column attributes found on Line - Employee
```
A typical pattern separates valid records from error reports (`ProcessResult` here is the caller's own result type):

```csharp
public async Task<ProcessResult> ProcessFile(string filePath)
{
    try
    {
        var records = await parser.ParseAsync<Employee>(filePath);
        var validRecords = new List<Employee>();
        var errorReport = new List<string>();

        foreach (var record in records)
        {
            if (record.Errors?.Any() == true)
            {
                errorReport.Add($"Line {record.Index}: {string.Join(", ", record.Errors)}");
            }
            else
            {
                validRecords.Add(record);
            }
        }

        return new ProcessResult
        {
            ValidRecords = validRecords,
            Errors = errorReport
        };
    }
    catch (FileNotFoundException)
    {
        return new ProcessResult { Errors = new[] { "File not found" } };
    }
    catch (UnauthorizedAccessException)
    {
        return new ProcessResult { Errors = new[] { "Access denied" } };
    }
}
```
Based on the test suite, here are recommended testing patterns:
```csharp
[Test]
public void Should_Parse_Valid_Data_Correctly()
{
    // Arrange
    var parser = new Parser('|');
    var lines = new[]
    {
        "GB-01|Bob Marley|True|Free",
        "UH-02|John Walsh McKinsey|False|Paid"
    };

    // Act
    var result = parser.Parse<FileLine>(lines);

    // Assert
    Assert.That(result.Length, Is.EqualTo(2));
    Assert.That(result[0].Code.Batch, Is.EqualTo("GB"));
    Assert.That(result[0].Name.FirstName, Is.EqualTo("Bob"));
    Assert.That(result[0].Errors, Is.Empty);
}
```
```csharp
[TestCase("invalid_data")]
[TestCase("too|few|columns")]
[TestCase("too|many|columns|here|extra")]
public void Should_Handle_Invalid_Input_Gracefully(string invalidLine)
{
    // Arrange
    var parser = new Parser('|');

    // Act
    var result = parser.Parse<FileLine>(new[] { invalidLine });

    // Assert
    Assert.That(result[0].Errors, Is.Not.Empty);
}
```
```csharp
[Test]
public async Task Should_Parse_Async_Successfully()
{
    // Arrange
    var parser = new Parser('|');
    var lines = new[] { "GB-01|Bob Marley|True|Free" };

    // Act
    var result = await parser.ParseAsync<FileLine>(lines);

    // Assert
    Assert.That(result.Length, Is.EqualTo(1));
    Assert.That(result[0].Errors, Is.Empty);
}
```
```csharp
[Test]
public void Should_Parse_Real_File()
{
    // Arrange
    var parser = new Parser();
    var testFile = Path.Combine(TestContext.CurrentContext.TestDirectory, "TestData.csv");

    // Act
    var result = parser.Parse<Employee>(testFile);

    // Assert
    Assert.That(result, Is.Not.Empty);
    Assert.That(result.All(r => r.Errors == null || !r.Errors.Any()), Is.True);
}
```
```csharp
[Test]
public async Task Should_Process_File_Through_Service()
{
    // Arrange
    var mockParser = new Mock<IParser>();
    var expectedData = new[] { new Employee { Name = "Test" } };
    mockParser.Setup(p => p.ParseAsync<Employee>(It.IsAny<string>()))
              .ReturnsAsync(expectedData);
    var service = new EmployeeService(mockParser.Object);

    // Act
    var result = await service.ProcessEmployeeFile("test.csv");

    // Assert
    Assert.That(result, Is.EqualTo(expectedData));
}
```
| Method | Description | Parameters |
|---|---|---|
| `Parse<T>(string filepath)` | Parse file by path | `filepath`: Path to file |
| `Parse<T>(string[] lines)` | Parse string array | `lines`: Array of delimited strings |
| `Parse<T>(Stream stream, Encoding encoding)` | Parse stream | `stream`: Data stream; `encoding`: Optional encoding |
| `Parse<T>(byte[] bytes, Encoding encoding)` | Parse byte array | `bytes`: Byte data; `encoding`: Optional encoding |
| Method | Description | Parameters |
|---|---|---|
| `ParseAsync<T>(string filepath)` | Parse file asynchronously | `filepath`: Path to file |
| `ParseAsync<T>(string[] lines)` | Parse string array asynchronously | `lines`: Array of delimited strings |
| `ParseAsync<T>(Stream stream, Encoding encoding)` | Parse stream asynchronously | `stream`: Data stream; `encoding`: Optional encoding |
| `ParseAsync<T>(byte[] bytes, Encoding encoding)` | Parse byte array asynchronously | `bytes`: Byte data; `encoding`: Optional encoding |
```csharp
[Column(index)]               // Required column
[Column(index, defaultValue)] // Column with default
[Column(0, "N/A")]            // String default
[Column(1, 0)]                // Numeric default
[Column(2, MyEnum.Default)]   // Enum default

// Dependency injection extension
services.UseParsley();    // Comma delimiter
services.UseParsley('|'); // Custom delimiter
```
```csharp
// Data model
public class Employee : IFileLine
{
    [Column(0)]
    public string EmployeeId { get; set; }

    [Column(1)]
    public FullName Name { get; set; }

    [Column(2)]
    public DateTime HireDate { get; set; }

    [Column(3)]
    public decimal Salary { get; set; }

    [Column(4, Department.Unknown)]
    public Department Department { get; set; }

    [Column(5, true)]
    public bool IsActive { get; set; }

    public int Index { get; set; }
    public IList<string> Errors { get; set; }
}

// Custom type for full names
[TypeConverter(typeof(CustomConverter<FullName>))]
public class FullName : ICustomType
{
    public string First { get; set; }
    public string Last { get; set; }

    public ICustomType Parse(string input)
    {
        var parts = input.Split(' ', 2);
        return new FullName
        {
            First = parts[0],
            Last = parts.Length > 1 ? parts[1] : ""
        };
    }
}

public enum Department { Unknown, IT, HR, Finance, Marketing }

// Usage
var parser = new Parser(',');
var employees = await parser.ParseAsync<Employee>("employees.csv");

// Process results
var validEmployees = employees.Where(e => e.Errors?.Any() != true).ToList();
var errorCount = employees.Count(e => e.Errors?.Any() == true);

Console.WriteLine($"Processed {validEmployees.Count} valid employees");
Console.WriteLine($"Found {errorCount} records with errors");
```
```csharp
public class Transaction : IFileLine
{
    [Column(0)]
    public string TransactionId { get; set; }

    [Column(1)]
    public DateTime Date { get; set; }

    [Column(2)]
    public TransactionType Type { get; set; }

    [Column(3)]
    public decimal Amount { get; set; }

    [Column(4)]
    public Account FromAccount { get; set; }

    [Column(5)]
    public Account ToAccount { get; set; }

    [Column(6, "")]
    public string Description { get; set; }

    public int Index { get; set; }
    public IList<string> Errors { get; set; }
}

[TypeConverter(typeof(CustomConverter<Account>))]
public class Account : ICustomType
{
    public string BankCode { get; set; }
    public string AccountNumber { get; set; }

    public ICustomType Parse(string input)
    {
        if (string.IsNullOrEmpty(input)) return null;

        var parts = input.Split(':');
        if (parts.Length != 2)
            throw new FormatException($"Invalid account format: {input}");

        return new Account
        {
            BankCode = parts[0],
            AccountNumber = parts[1]
        };
    }
}

// Processing service
public class TransactionProcessor
{
    private readonly IParser _parser;

    public TransactionProcessor(IParser parser)
    {
        _parser = parser;
    }

    public async Task<ProcessingReport> ProcessTransactionFile(Stream fileStream)
    {
        var transactions = await _parser.ParseAsync<Transaction>(fileStream);
        var report = new ProcessingReport();

        foreach (var transaction in transactions)
        {
            if (transaction.Errors?.Any() == true)
            {
                report.AddError($"Line {transaction.Index}: {string.Join(", ", transaction.Errors)}");
            }
            else
            {
                report.AddTransaction(transaction);
            }
        }

        return report;
    }
}
```
```csharp
public class ConfigurationEntry : IFileLine
{
    [Column(0)]
    public string Section { get; set; }

    [Column(1)]
    public string Key { get; set; }

    [Column(2)]
    public ConfigValue Value { get; set; }

    [Column(3, "")]
    public string Comment { get; set; }

    public int Index { get; set; }
    public IList<string> Errors { get; set; }
}

[TypeConverter(typeof(CustomConverter<ConfigValue>))]
public class ConfigValue : ICustomType
{
    public string StringValue { get; set; }
    public ConfigType Type { get; set; }

    public ICustomType Parse(string input)
    {
        if (string.IsNullOrEmpty(input))
            return new ConfigValue { StringValue = "", Type = ConfigType.String };

        // Determine type based on value
        if (bool.TryParse(input, out _))
            return new ConfigValue { StringValue = input, Type = ConfigType.Boolean };
        if (int.TryParse(input, out _))
            return new ConfigValue { StringValue = input, Type = ConfigType.Integer };

        return new ConfigValue { StringValue = input, Type = ConfigType.String };
    }
}

public enum ConfigType { String, Integer, Boolean, Array }
```
Cause: The number of columns in the data doesn't match the number of `[Column]` attributes.
Solution:
```csharp
// Ensure column attributes match the data structure
// Data: "A,B,C"
public class MyClass : IFileLine
{
    [Column(0)] public string Field1 { get; set; } // A
    [Column(1)] public string Field2 { get; set; } // B
    [Column(2)] public string Field3 { get; set; } // C
    // Don't add [Column(3)] without corresponding data
}
```
Cause: Cannot convert string data to target property type.
Solution:
```csharp
// Use nullable types for optional data
[Column(2)] public int? OptionalNumber { get; set; }

// Provide default values
[Column(2, 0)] public int NumberWithDefault { get; set; }

// Use custom converters for complex types
[Column(2)] public CustomType ComplexData { get; set; }
```
Solutions:
- Use async methods: `ParseAsync` instead of `Parse`
- Process streams instead of loading entire files
- Implement batch processing for very large datasets
```csharp
// Good for large files
using var stream = File.OpenRead(largeFile);
var data = await parser.ParseAsync<MyClass>(stream);

// Better for huge files - process in chunks
var batches = data.Chunk(1000);
foreach (var batch in batches)
{
    await ProcessBatch(batch);
}
```
Solutions:
- Use streams instead of loading files into memory
- Process data in batches
- Dispose of large objects promptly
```csharp
// Memory-efficient approach
await using var fileStream = File.OpenRead(filePath);
var records = await parser.ParseAsync<Record>(fileStream);

// Process immediately, don't store all in memory
foreach (var record in records)
{
    await ProcessRecord(record);
}
```
- Check the Index property to identify problematic lines
- Examine the Errors collection for detailed error messages
- Use a debugger to inspect parsed objects
- Validate your data format matches your model
- Test with small datasets before processing large files
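The first two tips can be wrapped in a small diagnostic helper of your own (not part of the library; the class and method names are ours):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using parsley;

public static class ParseDiagnostics
{
    // Print every record that carried parse errors, keyed by its
    // source line index, so problem lines are easy to locate.
    public static void DumpErrors<T>(IEnumerable<T> records) where T : IFileLine
    {
        foreach (var record in records.Where(r => r.Errors?.Any() == true))
        {
            Console.WriteLine($"Line {record.Index}:");
            foreach (var error in record.Errors)
                Console.WriteLine($"  - {error}");
        }
    }
}

// Usage: ParseDiagnostics.DumpErrors(parser.Parse<Employee>("employees.txt"));
```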
- GitHub Issues: Report bugs or request features
- Documentation: Check this wiki for detailed guidance
- Examples: Look at the test project for usage patterns
We welcome contributions to Parsley.Net! Here's how you can help:
- Clone the repository:

```shell
git clone https://github.com/CodeShayk/parsley.net.git
cd parsley.net
```

- Build the solution:

```shell
dotnet build
```

- Run tests:

```shell
dotnet test
```
- Fork the repository and create a feature branch
- Write tests for new functionality
- Follow existing code patterns and conventions
- Update documentation as needed
- Submit a pull request with clear description
```
parsley.net/
├── src/
│   └── Parsley/               # Main library code
│       ├── Parser.cs          # Core parser implementation
│       ├── IParser.cs         # Parser interface
│       ├── IFileLine.cs       # Line interface
│       ├── ColumnAttribute.cs # Column mapping attribute
│       └── CustomConverter.cs # Built-in custom converter
├── tests/
│   └── Parsley.Tests/         # Unit tests
│       ├── ParserFixture.cs   # Main test class
│       └── FileLines/         # Test data models
└── .github/
    └── workflows/             # CI/CD workflows
```
- Use C# naming conventions
- Follow SOLID principles
- Write comprehensive unit tests
- Document public APIs with XML comments
- Keep backward compatibility when possible
| Version | Release Date | Key Features |
|---|---|---|
| v1.0.0 | 2024-12-01 | Initial release with basic parsing functionality |
| v1.1.0 | 2024-12-15 | Added async support, stream processing, dependency injection |
| v1.1.5 | 2025-01-15 | Performance improvements in async parsing, bug fixes |
Sometimes you need to handle files with different formats in the same application:
```csharp
public class FileProcessor
{
    private readonly Dictionary<string, IParser> _parsers;

    public FileProcessor()
    {
        _parsers = new Dictionary<string, IParser>
        {
            [".csv"] = new Parser(','),
            [".tsv"] = new Parser('\t'),
            [".psv"] = new Parser('|'),
            [".txt"] = new Parser(';')
        };
    }

    public async Task<T[]> ProcessFile<T>(string filePath) where T : IFileLine, new()
    {
        var extension = Path.GetExtension(filePath).ToLowerInvariant();
        if (!_parsers.TryGetValue(extension, out var parser))
        {
            throw new NotSupportedException($"File format {extension} is not supported");
        }
        return await parser.ParseAsync<T>(filePath);
    }
}
```
For applications that process data in real-time:
```csharp
public class RealTimeProcessor<T> where T : IFileLine, new()
{
    private readonly IParser _parser;
    private readonly Queue<string> _lineBuffer;
    private readonly object _lockObject = new object();

    public event Action<T[]> BatchProcessed;
    public event Action<string> ProcessingError;

    public RealTimeProcessor(IParser parser)
    {
        _parser = parser;
        _lineBuffer = new Queue<string>();
    }

    public void AddLine(string line)
    {
        lock (_lockObject)
        {
            _lineBuffer.Enqueue(line);
        }
    }

    public async Task ProcessBatch(int batchSize = 100)
    {
        string[] batch;
        lock (_lockObject)
        {
            if (_lineBuffer.Count < batchSize) return;
            batch = new string[batchSize];
            for (int i = 0; i < batchSize; i++)
            {
                batch[i] = _lineBuffer.Dequeue();
            }
        }

        try
        {
            var results = await _parser.ParseAsync<T>(batch);
            BatchProcessed?.Invoke(results);
        }
        catch (Exception ex)
        {
            ProcessingError?.Invoke($"Batch processing failed: {ex.Message}");
        }
    }
}

// Usage
var processor = new RealTimeProcessor<Employee>(new Parser(','));
processor.BatchProcessed += OnBatchProcessed;
processor.ProcessingError += OnProcessingError;

// Add lines as they come in
processor.AddLine("1,John Doe,IT,50000");
processor.AddLine("2,Jane Smith,HR,55000");

// Process when ready
await processor.ProcessBatch();
```
Implementing a comprehensive validation pipeline:
```csharp
public class ValidationPipeline<T> where T : IFileLine, new()
{
    private readonly IParser _parser;
    private readonly List<IValidator<T>> _validators;

    public ValidationPipeline(IParser parser)
    {
        _parser = parser;
        _validators = new List<IValidator<T>>();
    }

    public ValidationPipeline<T> AddValidator(IValidator<T> validator)
    {
        _validators.Add(validator);
        return this;
    }

    public async Task<ValidationResult<T>> ProcessAsync(string filePath)
    {
        var parsed = await _parser.ParseAsync<T>(filePath);
        var result = new ValidationResult<T>();

        foreach (var item in parsed)
        {
            // Check parsing errors first
            if (item.Errors?.Any() == true)
            {
                result.AddInvalid(item, item.Errors);
                continue;
            }

            // Run custom validators
            var validationErrors = new List<string>();
            foreach (var validator in _validators)
            {
                var validationResult = validator.Validate(item);
                if (!validationResult.IsValid)
                {
                    validationErrors.AddRange(validationResult.Errors);
                }
            }

            if (validationErrors.Any())
            {
                result.AddInvalid(item, validationErrors);
            }
            else
            {
                result.AddValid(item);
            }
        }

        return result;
    }
}

public interface IValidator<T>
{
    ValidationResult Validate(T item);
}

public class EmployeeValidator : IValidator<Employee>
{
    public ValidationResult Validate(Employee employee)
    {
        var result = new ValidationResult();

        if (string.IsNullOrWhiteSpace(employee.Name))
            result.AddError("Name is required");

        if (employee.Salary <= 0)
            result.AddError("Salary must be positive");

        if (employee.HireDate > DateTime.Now)
            result.AddError("Hire date cannot be in the future");

        return result;
    }
}

// Usage
var pipeline = new ValidationPipeline<Employee>(new Parser(','))
    .AddValidator(new EmployeeValidator())
    .AddValidator(new EmailValidator());

var result = await pipeline.ProcessAsync("employees.csv");
Console.WriteLine($"Valid records: {result.ValidItems.Count}");
Console.WriteLine($"Invalid records: {result.InvalidItems.Count}");
```
For applications that need flexible, configuration-driven parsing:
```csharp
public class ConfigurableParser
{
    public class ParsingConfiguration
    {
        public char Delimiter { get; set; } = ',';
        public bool HasHeader { get; set; } = false;
        public Dictionary<string, int> ColumnMappings { get; set; } = new();
        public Dictionary<string, object> DefaultValues { get; set; } = new();
        public Encoding Encoding { get; set; } = Encoding.UTF8;
    }

    public async Task<T[]> ParseWithConfiguration<T>(string filePath, ParsingConfiguration config)
        where T : IFileLine, new()
    {
        var parser = new Parser(config.Delimiter);
        var lines = await File.ReadAllLinesAsync(filePath, config.Encoding);

        // Skip header if present
        if (config.HasHeader)
        {
            lines = lines.Skip(1).ToArray();
        }

        // Apply configuration-based transformations here
        // This is a simplified example - you could extend this significantly
        return await parser.ParseAsync<T>(lines);
    }
}
```

Configuration from `appsettings.json`:

```json
{
  "ParsingConfiguration": {
    "Delimiter": "|",
    "HasHeader": true,
    "ColumnMappings": {
      "EmployeeId": 0,
      "Name": 1,
      "Department": 2
    },
    "DefaultValues": {
      "Department": "Unknown",
      "IsActive": true
    }
  }
}
```
```csharp
[ApiController]
[Route("api/[controller]")]
public class DataController : ControllerBase
{
    private readonly IParser _parser;
    private readonly ILogger<DataController> _logger;

    public DataController(IParser parser, ILogger<DataController> logger)
    {
        _parser = parser;
        _logger = logger;
    }

    [HttpPost("upload-employees")]
    public async Task<IActionResult> UploadEmployees(IFormFile file)
    {
        if (file == null || file.Length == 0)
            return BadRequest("No file uploaded");

        try
        {
            using var stream = file.OpenReadStream();
            var employees = await _parser.ParseAsync<Employee>(stream);

            var validEmployees = employees.Where(e => e.Errors?.Any() != true).ToList();
            var errorCount = employees.Count(e => e.Errors?.Any() == true);

            // Process valid employees (save to database, etc.)
            await ProcessEmployees(validEmployees);

            return Ok(new
            {
                ProcessedCount = validEmployees.Count,
                ErrorCount = errorCount,
                Errors = employees
                    .Where(e => e.Errors?.Any() == true)
                    .Select(e => new { Line = e.Index, Errors = e.Errors })
            });
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Error processing employee file");
            return StatusCode(500, "Error processing file");
        }
    }

    private async Task ProcessEmployees(List<Employee> employees)
    {
        // Your business logic here
        foreach (var employee in employees)
        {
            // Save to database, send notifications, etc.
        }
    }
}

// Startup.cs or Program.cs
public void ConfigureServices(IServiceCollection services)
{
    services.UseParsley(','); // Configure Parsley.Net
    services.AddControllers();
    // Other services...
}
```
```csharp
public class FileProcessingService : BackgroundService
{
    private readonly IParser _parser;
    private readonly ILogger<FileProcessingService> _logger;
    private readonly IServiceScopeFactory _serviceScopeFactory;
    private readonly string _watchFolder;

    public FileProcessingService(
        IParser parser,
        ILogger<FileProcessingService> logger,
        IServiceScopeFactory serviceScopeFactory,
        IConfiguration configuration)
    {
        _parser = parser;
        _logger = logger;
        _serviceScopeFactory = serviceScopeFactory;
        _watchFolder = configuration["FileProcessing:WatchFolder"];
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        using var watcher = new FileSystemWatcher(_watchFolder, "*.csv");
        watcher.Created += async (sender, e) => await ProcessFile(e.FullPath);
        watcher.EnableRaisingEvents = true;

        while (!stoppingToken.IsCancellationRequested)
        {
            await Task.Delay(1000, stoppingToken);
        }
    }

    private async Task ProcessFile(string filePath)
    {
        try
        {
            _logger.LogInformation($"Processing file: {filePath}");
            var records = await _parser.ParseAsync<DataRecord>(filePath);

            using var scope = _serviceScopeFactory.CreateScope();
            var dataService = scope.ServiceProvider.GetRequiredService<IDataService>();
            await dataService.ProcessRecords(records);

            // Move processed file to archive
            var archivePath = Path.Combine(_watchFolder, "processed", Path.GetFileName(filePath));
            File.Move(filePath, archivePath);

            _logger.LogInformation($"Successfully processed {records.Length} records from {filePath}");
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, $"Error processing file: {filePath}");

            // Move failed file to error folder
            var errorPath = Path.Combine(_watchFolder, "errors", Path.GetFileName(filePath));
            File.Move(filePath, errorPath);
        }
    }
}
```
```csharp
class Program
{
    static async Task Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("Usage: DataProcessor <file-path>");
            return;
        }

        var filePath = args[0];
        var parser = new Parser(',');

        Console.WriteLine($"Processing file: {filePath}");
        Console.WriteLine("Please wait...");

        var stopwatch = Stopwatch.StartNew();
        try
        {
            // For large files, you might want to implement progress reporting
            var records = await ParseWithProgress<DataRecord>(parser, filePath);
            stopwatch.Stop();

            var validRecords = records.Where(r => r.Errors?.Any() != true).ToArray();
            var errorRecords = records.Where(r => r.Errors?.Any() == true).ToArray();

            Console.WriteLine();
            Console.WriteLine($"Processing completed in {stopwatch.Elapsed:mm\\:ss}");
            Console.WriteLine($"Total records: {records.Length:N0}");
            Console.WriteLine($"Valid records: {validRecords.Length:N0}");
            Console.WriteLine($"Error records: {errorRecords.Length:N0}");

            if (errorRecords.Any())
            {
                Console.WriteLine("\nErrors found:");
                foreach (var error in errorRecords.Take(10)) // Show first 10 errors
                {
                    Console.WriteLine($"  Line {error.Index}: {string.Join(", ", error.Errors)}");
                }
                if (errorRecords.Length > 10)
                {
                    Console.WriteLine($"  ... and {errorRecords.Length - 10} more errors");
                }
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Error: {ex.Message}");
            Environment.Exit(1);
        }
    }

    static async Task<T[]> ParseWithProgress<T>(IParser parser, string filePath) where T : IFileLine, new()
    {
        var fileInfo = new FileInfo(filePath);
        var totalBytes = fileInfo.Length;
        var processedBytes = 0L;

        using var fileStream = File.OpenRead(filePath);
        using var reader = new StreamReader(fileStream);

        var lines = new List<string>();
        string line;
        var lastProgress = 0;

        while ((line = await reader.ReadLineAsync()) != null)
        {
            lines.Add(line);
            processedBytes += Encoding.UTF8.GetByteCount(line) + Environment.NewLine.Length;

            var progress = (int)((processedBytes * 100) / totalBytes);
            if (progress > lastProgress)
            {
                Console.Write($"\rReading file: {progress}%");
                lastProgress = progress;
            }
        }

        Console.Write("\rParsing data... ");
        return await parser.ParseAsync<T>(lines.ToArray());
    }
}
```
- Single Responsibility: Each data model should represent one type of record
- Fail Fast: Use validation to catch errors early in the process
- Immutable Data: Consider making parsed objects immutable after creation
- Error Transparency: Always check and handle the `Errors` property
- Use Async Methods: Always prefer `ParseAsync` for I/O operations
- Stream Large Files: Use `Stream` or `byte[]` overloads for large files
- Batch Processing: Process large datasets in smaller chunks
- Caching: Reuse parser instances when possible
- Memory Management: Dispose of streams and large objects promptly
```
// Good structure
/Models/
├── Employee.cs           // Data model with IFileLine
├── EmployeeConverters.cs // Custom type converters
└── EmployeeValidator.cs  // Business validation
/Services/
├── IDataService.cs       // Service interface
└── EmployeeService.cs    // Service implementation
/Configuration/
└── ParsingExtensions.cs  // DI configuration
```
- Parse-time Errors: Use the `Errors` property for field-level issues
- Business Validation: Implement separate validation after parsing
- File-level Errors: Use try-catch for file access issues
- Logging: Always log processing results and errors
Parsley.Net provides a powerful, flexible, and performant solution for parsing structured text files in .NET applications. Its combination of simplicity and extensibility makes it suitable for everything from small utility scripts to large-scale enterprise applications.
- Simplicity: Minimal configuration required for basic scenarios
- Flexibility: Extensive customization options for complex requirements
- Performance: Optimized for both small files and large-scale processing
- Reliability: Comprehensive error handling and validation support
- Integration: Seamless integration with modern .NET patterns and practices
✅ Perfect for:
- CSV/TSV file processing
- Data migration and ETL operations
- Configuration file parsing
- Legacy system integration
- Batch data processing
❌ Consider alternatives for:
- JSON/XML processing (use System.Text.Json or XmlSerializer)
- Binary file formats
- Real-time streaming data (consider specialized streaming libraries)
- Database direct access (use Entity Framework or similar)
- Documentation: This wiki provides comprehensive guidance
- GitHub Issues: Report issues or request features
- Source Code: View the source on GitHub
- NuGet Package: Download from NuGet
Happy parsing with Parsley.Net! 🚀