CSVCoder

0.0.4

A Swift package for encoding and decoding CSV files using the Codable protocol
g-cqd/CSVCoder

What's New

CSVCoder 0.0.4

2026-01-01T04:33:52Z


Other Changes

  • refactor: deduplicate code, remove unused code, improve CI/CD (5f936d8)

Benchmark Results

[0/1] Planning build
Building for production...
[0/2] Write swift-version--3A98693DF01E6000.txt
[1/5] Write sources
[4/6] Compiling CSVCoderMacros CSVIndexedMacro.swift
[5/6] Compiling CSVCoder CSVDecoder.swift
[5/8] Write sources
[8/9] Compiling CSVCoderTestFixtures TestModels.swift
[9/10] Compiling CSVCoderBenchmarks HardwareInfo.swift
Build of product 'CSVCoderBenchmarks' complete! (10.19s)
running Raw Parse 1M rows (Iterate Only)... done! (5625.67 ms)
running Raw Parse 1M rows (Iterate + String)... done! (6060.75 ms)
running Raw Parse 100K Quoted Rows (Iterate Only)... done! (450.64 ms)
running Raw Parse 100K Quoted Rows (Iterate + String)... done! (530.52 ms)
running Decode 1K rows (simple)... done! (12.03 ms)
running Decode 10K rows (simple)... done! (114.11 ms)
running Decode 100K rows (simple)... done! (1188.34 ms)
running Decode 1M rows (simple)... done! (12385.73 ms)
running Decode 10K rows (complex, 8 fields)... done! (303.27 ms)
running Decode 100K rows (complex, 8 fields)... done! (2736.97 ms)
running Decode 10K rows (quoted fields)... done! (105.92 ms)
running Decode 10K rows (50 columns wide)... done! (844.05 ms)
running Decode 10K rows (500-byte fields)... done! (340.99 ms)
running Decode 100K rows (numeric fields)... done! (1124.77 ms)
running Decode 50K orders (18 fields, optionals)... done! (2476.56 ms)
running Decode 100K transactions (13 fields)... done! (3845.81 ms)
running Decode 100K log entries (12 fields)... done! (4080.83 ms)
running Decode 10K stress-quoted (nested quotes, newlines)... done! (108.42 ms)
running Decode 50K Unicode-heavy rows... done! (603.47 ms)
running Decode 1K rows (10KB fields)... done! (766.73 ms)
running Decode 1K rows (200 columns wide)... done! (327.91 ms)
running Encode 1K rows... done! (8.28 ms)
running Encode 10K rows... done! (67.73 ms)
running Encode 100K rows... done! (708.67 ms)
running Encode 1M rows... done! (6341.55 ms)
running Encode 10K rows (quoted fields)... done! (55.74 ms)
running Encode 10K rows (500-byte fields)... done! (315.32 ms)
running Encode 50K orders (18 fields, optionals)... done! (1000.83 ms)
running Encode 100K rows to Data... done! (613.38 ms)
running Encode 100K rows to String... done! (591.36 ms)
running Decode 1K rows with snake_case conversion... done! (12.55 ms)
running Decode 1K rows with flexible date parsing... done! (536.99 ms)
running Decode 1K rows with flexible number parsing... done! (700.49 ms)
running Decode 100K rows (sequential, p=1)... done! (2550.62 ms)
running Decode 100K rows (parallel, p=all)... done! (2050.15 ms)
running Decode 1M rows (parallel, p=all)... done! (42485.14 ms)
running Encode 100K rows (sequential, p=1)... done! (571.42 ms)
running Encode 100K rows (parallel, p=all)... done! (246.71 ms)
running Encode 1M rows (parallel, p=all)... done! (2522.13 ms)
running Decode 100K from file (parallel)... done! (2487.10 ms)
running Encode 100K to file (parallel)... done! (486.89 ms)
running Mixed: Decode + Transform + Encode 10K... done! (182.87 ms)
running Mixed: Filter + Aggregate 100K orders... done! (2613.24 ms)
════════════════════════════════════════════════════════════════════════
                    CSVCoder Benchmark Suite
════════════════════════════════════════════════════════════════════════

Hardware:
  CPU:      Apple M1 (Virtual)
  Cores:    3 total (3P + 0E)
  Memory:   7 GB

Software:
  OS:       Version 15.7.2 (Build 24G325)
  Swift:    6.2+
  Build:    Release

════════════════════════════════════════════════════════════════════════


name                                               time               std        iterations
-------------------------------------------------------------------------------------------
Raw Parse 1M rows (Iterate Only)                    1888373041.000 ns ±   2.43 %          3
Raw Parse 1M rows (Iterate + String)                2033232792.000 ns ±   3.14 %          3
Raw Parse 100K Quoted Rows (Iterate Only)            146072084.000 ns ±   4.96 %          3
Raw Parse 100K Quoted Rows (Iterate + String)        170585667.000 ns ±   6.62 %          3
Decode 1K rows (simple)                                3713167.000 ns ±  14.36 %          3
Decode 10K rows (simple)                              37930041.000 ns ±   0.50 %          3
Decode 100K rows (simple)                            388280500.000 ns ±   4.29 %          3
Decode 1M rows (simple)                             4109932541.000 ns ±   2.64 %          3
Decode 10K rows (complex, 8 fields)                  100891000.000 ns ±  10.48 %          3
Decode 100K rows (complex, 8 fields)                 896302709.000 ns ±   3.64 %          3
Decode 10K rows (quoted fields)                       35414875.000 ns ±   0.58 %          3
Decode 10K rows (50 columns wide)                    284072584.000 ns ±   2.33 %          3
Decode 10K rows (500-byte fields)                    113264542.000 ns ±   0.77 %          3
Decode 100K rows (numeric fields)                    371084208.000 ns ±   1.88 %          3
Decode 50K orders (18 fields, optionals)             826619084.000 ns ±   1.62 %          3
Decode 100K transactions (13 fields)                1250223042.000 ns ±   4.73 %          3
Decode 100K log entries (12 fields)                 1227192500.000 ns ±  21.33 %          3
Decode 10K stress-quoted (nested quotes, newlines)    35378125.000 ns ±   4.02 %          3
Decode 50K Unicode-heavy rows                        187145416.000 ns ±  17.71 %          3
Decode 1K rows (10KB fields)                         268664750.000 ns ±  14.28 %          3
Decode 1K rows (200 columns wide)                    110969708.000 ns ±   6.68 %          3
Encode 1K rows                                         2389750.000 ns ±  46.66 %          3
Encode 10K rows                                       19455083.000 ns ±  30.30 %          3
Encode 100K rows                                     252983625.000 ns ±  12.18 %          3
Encode 1M rows                                      2000760083.000 ns ±  12.14 %          3
Encode 10K rows (quoted fields)                       18305708.000 ns ±   3.01 %          3
Encode 10K rows (500-byte fields)                    104931042.000 ns ±   1.08 %          3
Encode 50K orders (18 fields, optionals)             330624333.000 ns ±   2.17 %          3
Encode 100K rows to Data                             193927333.000 ns ±  10.73 %          3
Encode 100K rows to String                           194850041.000 ns ±   4.41 %          3
Decode 1K rows with snake_case conversion              3871292.000 ns ±  14.80 %          3
Decode 1K rows with flexible date parsing            180241250.000 ns ±   1.85 %          3
Decode 1K rows with flexible number parsing          216465375.000 ns ±  17.98 %          3
Decode 100K rows (sequential, p=1)                   861428209.000 ns ±   3.49 %          3
Decode 100K rows (parallel, p=all)                   710027750.000 ns ±   7.05 %          3
Decode 1M rows (parallel, p=all)                   14221885750.000 ns ±   1.33 %          3
Encode 100K rows (sequential, p=1)                   188691792.000 ns ±   2.47 %          3
Encode 100K rows (parallel, p=all)                    82414000.000 ns ±   0.45 %          3
Encode 1M rows (parallel, p=all)                     838160666.000 ns ±   2.77 %          3
Decode 100K from file (parallel)                     769300667.000 ns ±  20.14 %          3
Encode 100K to file (parallel)                       157407458.000 ns ±   9.65 %          3
Mixed: Decode + Transform + Encode 10K                61664417.000 ns ±   3.71 %          3
Mixed: Filter + Aggregate 100K orders                848897250.000 ns ±   5.85 %          3

Installation

Swift Package Manager:

dependencies: [
    .package(url: "https://github.com/g-cqd/CSVCoder.git", from: "0.0.4")
]

CSVCoder

A Swift CSV encoder/decoder using the Codable protocol, similar to JSONEncoder/JSONDecoder.

Features

  • Type-safe CSV encoding/decoding via Swift's Codable protocol
  • Zero-boilerplate macros (@CSVIndexed, @CSVColumn) for headerless CSV
  • Multi-encoding support (UTF-8, ISO-8859-1, Windows-1252, UTF-16, UTF-32)
  • Streaming encoding/decoding for O(1) memory with large files
  • Parallel encoding/decoding for multi-core performance
  • Smart error suggestions with typo detection and strategy hints
  • Configurable delimiters (comma, semicolon, tab, etc.)
  • Multiple date encoding strategies (ISO 8601, Unix timestamp, custom format)
  • Flexible decoding strategies for dates, numbers, and booleans with auto-detection
  • Key decoding strategies (snake_case, kebab-case, PascalCase conversion)
  • Index-based decoding for headerless CSV files
  • CSVIndexedDecodable for automatic column ordering via CodingKeys
  • Rich error diagnostics with row/column location information
  • Optional value handling with configurable nil encoding
  • SIMD-accelerated parsing and field scanning
  • Thread-safe with Sendable conformance
  • Swift 6.2 Approachable Concurrency compatible with nonisolated types

Requirements

  • iOS 18.0+ / macOS 15.0+
  • Swift 6.2+

Installation

Swift Package Manager

Add CSVCoder to your Package.swift:

dependencies: [
    .package(url: "https://github.com/g-cqd/CSVCoder.git", from: "0.0.4")
]

Or in Xcode: File → Add Package Dependencies → Enter the repository URL.

Usage

Encoding

import CSVCoder

struct Person: Codable {
    let name: String
    let age: Int
    let email: String?
}

let people = [
    Person(name: "Alice", age: 30, email: "alice@example.com"),
    Person(name: "Bob", age: 25, email: nil)
]

let encoder = CSVEncoder()
let csvString = try encoder.encodeToString(people)
// Output:
// name,age,email
// Alice,30,alice@example.com
// Bob,25,

Decoding

import CSVCoder

let csvData = """
name,age,email
Alice,30,alice@example.com
Bob,25,
""".data(using: .utf8)!

let decoder = CSVDecoder()
let people = try decoder.decode([Person].self, from: csvData)

Configuration

let config = CSVEncoder.Configuration(
    delimiter: ";",                           // Use semicolon
    includeHeaders: true,                     // Include header row
    dateEncodingStrategy: .iso8601,           // ISO 8601 dates
    nilEncodingStrategy: .emptyString,        // Empty string for nil
    lineEnding: .crlf                         // Windows line endings
)

let encoder = CSVEncoder(configuration: config)

Date Encoding Strategies

  • .iso8601 - ISO 8601 format (default)
  • .secondsSince1970 - Unix timestamp in seconds
  • .millisecondsSince1970 - Unix timestamp in milliseconds
  • .formatted(String) - Custom date format string
  • .custom((Date) throws -> String) - Custom closure

Single Row Encoding

let person = Person(name: "Alice", age: 30, email: "alice@example.com")
let row = try encoder.encodeRow(person)
// Output: Alice,30,alice@example.com
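To illustrate what serializing one row involves, here is a minimal sketch of RFC 4180-style field quoting: a field containing the delimiter, a double quote, or a line break is wrapped in double quotes, and embedded quotes are doubled. The names `escapeField` and `renderRow` are hypothetical and not part of CSVCoder; the library's actual escaping rules may differ.

```swift
import Foundation

// Hypothetical helper (not CSVCoder API): quote a field only when it needs it.
func escapeField(_ field: String, delimiter: Character = ",") -> String {
    let needsQuoting = field.contains(where: { character in
        character == delimiter || character == "\"" || character.isNewline
    })
    guard needsQuoting else { return field }
    // Double every embedded quote, then wrap the whole field in quotes.
    return "\"" + field.replacingOccurrences(of: "\"", with: "\"\"") + "\""
}

// Hypothetical helper: join escaped fields with the delimiter.
func renderRow(_ fields: [String], delimiter: Character = ",") -> String {
    fields.map { escapeField($0, delimiter: delimiter) }
        .joined(separator: String(delimiter))
}

print(renderRow(["Alice", "30", "alice@example.com"]))
print(renderRow(["Doe, John", "say \"hi\""]))
```

The first row is emitted unchanged because none of its fields contain structural characters; both fields of the second row are quoted.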

Streaming Encoding

Encode large datasets with O(1) memory usage:

// Stream encode to file
try await encoder.encode(asyncSequence, to: fileURL)

// Stream encode array to file
try await encoder.encode(largeArray, to: fileURL)

// Encode to async stream of rows
for try await row in encoder.encodeToStream(asyncSequence) {
    sendToNetwork(row)
}

Parallel Encoding

Utilize multiple cores for faster encoding:

// Parallel encode to file
try await encoder.encodeParallel(records, to: fileURL,
    parallelConfig: .init(parallelism: 8))

// Parallel encode to Data
let data = try await encoder.encodeParallel(records)

// Batched parallel for progress reporting
for try await batch in encoder.encodeParallelBatched(records,
    parallelConfig: .init(chunkSize: 10_000)) {
    print("Encoded \(batch.count) rows")
}

Advanced Decoding

Key Decoding Strategies

Automatically convert CSV header names to Swift property names:

struct User: Codable {
    let firstName: String
    let lastName: String
    let emailAddress: String
}

let csv = """
first_name,last_name,email_address
John,Doe,john@example.com
"""

// snake_case headers → camelCase properties
let config = CSVDecoder.Configuration(
    keyDecodingStrategy: .convertFromSnakeCase
)
let decoder = CSVDecoder(configuration: config)
let users = try decoder.decode([User].self, from: csv)

Available strategies:

  • .useDefaultKeys - Use headers as-is (default)
  • .convertFromSnakeCase - first_name → firstName
  • .convertFromKebabCase - first-name → firstName
  • .convertFromScreamingSnakeCase - FIRST_NAME → firstName
  • .convertFromPascalCase - FirstName → firstName
  • .custom((String) -> String) - Custom transformation
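As an illustration of the kind of transformation these strategies perform, the sketch below converts a snake_case header to camelCase. It has the `(String) -> String` shape that `.custom` expects, but `camelCase(fromSnakeCase:)` is a hypothetical name and this is not CSVCoder's implementation; for example, it drops leading and trailing underscores, which the library may treat differently.

```swift
// Hypothetical transformation (not CSVCoder's code): snake_case -> camelCase.
func camelCase(fromSnakeCase header: String) -> String {
    // Split on underscores and lowercase each part, so FIRST_NAME also works.
    let parts = header.split(separator: "_").map { String($0).lowercased() }
    guard let first = parts.first else { return header }
    // Keep the first part lowercase and capitalize the first letter of the rest.
    let rest = parts.dropFirst().map { part in
        part.prefix(1).uppercased() + String(part.dropFirst())
    }
    return first + rest.joined()
}

print(camelCase(fromSnakeCase: "first_name"))
print(camelCase(fromSnakeCase: "EMAIL_ADDRESS"))
```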

Column Mapping

Map specific CSV headers to property names:

struct Product: Codable {
    let id: Int
    let name: String
    let price: Double
}

let csv = """
product_id,product_name,unit_price
1,Widget,9.99
"""

let config = CSVDecoder.Configuration(
    columnMapping: [
        "product_id": "id",
        "product_name": "name",
        "unit_price": "price"
    ]
)

Index-Based Decoding

Decode headerless CSV files by column index:

let csv = """
Alice,30,95.5
Bob,25,88.0
"""

let config = CSVDecoder.Configuration(
    hasHeaders: false,
    indexMapping: [0: "name", 1: "age", 2: "score"]
)
let decoder = CSVDecoder(configuration: config)
let records = try decoder.decode([Person].self, from: csv)

@CSVIndexed Macro (Zero Boilerplate)

Eliminate all boilerplate for headerless CSV with the @CSVIndexed macro:

@CSVIndexed
struct Person: Codable {
    let name: String
    let age: Int
    let score: Double
}

// No manual CodingKeys or typealias needed
let config = CSVDecoder.Configuration(hasHeaders: false)
let decoder = CSVDecoder(configuration: config)
let people = try decoder.decode([Person].self, from: csv)

The macro generates CodingKeys, CSVCodingKeys, and protocol conformance automatically.

Custom Column Names with @CSVColumn

Map properties to different CSV column names:

@CSVIndexed
struct Product: Codable {
    let id: Int

    @CSVColumn("product_name")
    let name: String

    @CSVColumn("unit_price")
    let price: Double
}

CSVIndexedDecodable (Manual Protocol)

For more control, conform to CSVIndexedDecodable manually:

struct Person: CSVIndexedDecodable {
    let name: String
    let age: Int
    let score: Double

    // CodingKeys order defines column order
    enum CodingKeys: String, CodingKey, CaseIterable {
        case name, age, score  // Column 0, 1, 2
    }

    typealias CSVCodingKeys = CodingKeys
}

// No indexMapping needed - decoder auto-detects CSVIndexedDecodable conformance
let config = CSVDecoder.Configuration(hasHeaders: false)
let decoder = CSVDecoder(configuration: config)
let people = try decoder.decode([Person].self, from: csv)

The order of cases in CodingKeys determines the column mapping automatically. The decoder detects CSVIndexedDecodable conformance at runtime, so you use the same decode() method as regular Codable types.
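To make the ordering rule concrete, the following sketch derives an index-to-name mapping from a `CaseIterable` CodingKeys enum, equivalent in shape to the `indexMapping` dictionary shown earlier. The helper `indexMapping(for:)` and the enum `PersonKeys` are hypothetical; how CSVCoder builds this mapping internally is an assumption.

```swift
// Hypothetical stand-in for a type's CodingKeys; case order is the column order.
enum PersonKeys: String, CodingKey, CaseIterable {
    case name, age, score
}

// Hypothetical helper (not CSVCoder API): column index -> property name.
func indexMapping<K: CodingKey & CaseIterable>(for keys: K.Type) -> [Int: String] {
    var mapping: [Int: String] = [:]
    for (index, key) in K.allCases.enumerated() {
        mapping[index] = key.stringValue
    }
    return mapping
}

print(indexMapping(for: PersonKeys.self))
```

Reordering the cases in the enum reorders the columns, which is why the case order has to match the file layout.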

Flexible Decoding Strategies

Date Decoding

Auto-detect dates from 20+ common formats:

let config = CSVDecoder.Configuration(
    dateDecodingStrategy: .flexible  // Auto-detect ISO, US, EU formats
)

Or provide a hint for better performance:

let config = CSVDecoder.Configuration(
    dateDecodingStrategy: .flexibleWithHint(preferred: "yyyy-MM-dd")
)

Available strategies:

  • .deferredToDate - Use Date's Decodable implementation (default)
  • .iso8601 - ISO 8601 format
  • .secondsSince1970 / .millisecondsSince1970 - Unix timestamps
  • .formatted(String) - Custom date format
  • .flexible - Auto-detect from common patterns
  • .flexibleWithHint(preferred:) - Try preferred format first, then auto-detect
  • .custom((String) throws -> Date) - Custom closure
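The idea behind trying a preferred format first can be sketched with Foundation's `DateFormatter`: the hint is attempted before any fallback format, so well-formed input is parsed on the first try. The function `parseDate(_:preferred:fallbacks:)` and its three-entry fallback list are assumptions made for illustration; CSVCoder's own format list and detection logic are not shown here.

```swift
import Foundation

// Hypothetical helper (not CSVCoder API): preferred format first, then fallbacks.
func parseDate(
    _ text: String,
    preferred: String,
    fallbacks: [String] = ["yyyy-MM-dd", "MM/dd/yyyy", "dd.MM.yyyy"]
) -> Date? {
    let formatter = DateFormatter()
    // A fixed locale and time zone keep parsing independent of user settings.
    formatter.locale = Locale(identifier: "en_US_POSIX")
    formatter.timeZone = TimeZone(secondsFromGMT: 0)
    for format in [preferred] + fallbacks {
        formatter.dateFormat = format
        if let date = formatter.date(from: text) {
            return date
        }
    }
    return nil
}

let parsed = parseDate("2025-12-31", preferred: "yyyy-MM-dd")
print(parsed != nil)
```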

Number Decoding

Handle international number formats:

let config = CSVDecoder.Configuration(
    numberDecodingStrategy: .flexible  // Auto-detect US/EU formats, strip currency
)

Available strategies:

  • .standard - Swift's standard number parsing (default)
  • .flexible - Auto-detect 1,234.56 (US) and 1.234,56 (EU), strip currency symbols
  • .locale(Locale) - Use specific locale for parsing
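One way to tell 1,234.56 from 1.234,56 is to treat whichever separator appears last as the decimal mark. The sketch below does that under stated assumptions: `parseFlexibleNumber` is a hypothetical name, a lone comma followed by exactly three digits is assumed to be a thousands separator, and none of this is taken from CSVCoder's implementation.

```swift
import Foundation

// Hypothetical helper (not CSVCoder API): normalize US/EU formats to a Double.
func parseFlexibleNumber(_ text: String) -> Double? {
    // Keep ASCII digits, separators, and minus; drops currency symbols and spaces.
    var kept = text.filter { $0.isASCII && ($0.isNumber || $0 == "." || $0 == "," || $0 == "-") }
    guard !kept.isEmpty else { return nil }
    switch (kept.lastIndex(of: "."), kept.lastIndex(of: ",")) {
    case let (dot?, comma?):
        if dot > comma {
            kept.removeAll { $0 == "," }                      // US: 1,234.56
        } else {
            kept.removeAll { $0 == "." }                      // EU: 1.234,56
            kept = kept.replacingOccurrences(of: ",", with: ".")
        }
    case let (nil, comma?):
        let commaCount = kept.filter { $0 == "," }.count
        let digitsAfter = kept.distance(from: comma, to: kept.endIndex) - 1
        if commaCount == 1 && digitsAfter != 3 {
            kept = kept.replacingOccurrences(of: ",", with: ".")  // 12,5 -> 12.5
        } else {
            kept.removeAll { $0 == "," }                      // assumed thousands
        }
    case (_?, nil):
        // Several dots can only be thousands separators: 1.234.567
        if kept.filter({ $0 == "." }).count > 1 { kept.removeAll { $0 == "." } }
    default:
        break
    }
    return Double(kept)
}

print(parseFlexibleNumber("$1,234.56") as Any)
print(parseFlexibleNumber("1.234,56 €") as Any)
```

Inputs such as 1,234 are genuinely ambiguous without a locale, which is the case `.locale(Locale)` is meant for.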

Boolean Decoding

Support international boolean values:

let config = CSVDecoder.Configuration(
    boolDecodingStrategy: .flexible  // Recognize oui/non, ja/nein, да/нет, etc.
)

Available strategies:

  • .standard - Recognize true/yes/1, false/no/0 (default)
  • .flexible - Extended i18n values (oui/non, ja/nein, да/нет, 是/否, etc.)
  • .custom(trueValues:falseValues:) - Custom value sets
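Matching against custom value sets can be pictured as a normalized set lookup. In this sketch `parseBool` is a hypothetical function; trimming whitespace and lowercasing before the lookup are assumptions, and CSVCoder may normalize values differently.

```swift
import Foundation

// Hypothetical helper (not CSVCoder API): look a cell up in two value sets.
func parseBool(_ text: String, trueValues: Set<String>, falseValues: Set<String>) -> Bool? {
    // Assumed normalization: ignore surrounding whitespace and letter case.
    let key = text.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
    if trueValues.contains(key) { return true }
    if falseValues.contains(key) { return false }
    return nil  // unrecognized value; a decoder would report an error here
}

let yes: Set<String> = ["oui", "ja", "да"]
let no: Set<String> = ["non", "nein", "нет"]
print(parseBool(" Oui ", trueValues: yes, falseValues: no) as Any)
```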

Error Diagnostics

Decoding errors include precise location information:

do {
    let records = try decoder.decode([Person].self, from: csv)
} catch let error as CSVDecodingError {
    print(error.errorDescription ?? "Unknown decoding error")
    // "Type mismatch: expected Int, found 'invalid' at row 3, column 'age'"

    if let location = error.location {
        print("Row: \(location.row ?? 0)")      // 3
        print("Column: \(location.column ?? "")")  // "age"
    }
}

Swift 6.2 Approachable Concurrency

CSVCoder is compatible with projects using SWIFT_DEFAULT_ACTOR_ISOLATION = MainActor. All encoding/decoding types are marked nonisolated to allow usage from any actor context.

Performance

Benchmark Environment:

  • CPU: Apple M2 Pro
  • Cores: 10 (6 performance + 4 efficiency)
  • Memory: 16 GB
  • OS: macOS 26.3
  • Swift: 6.2+
  • Build: Release

Performance Characteristics

CSVCoder uses SIMD-accelerated parsing with 64-byte vector operations and SWAR (SIMD Within A Register) for 8-byte fallback processing. This optimization is particularly effective for:

  • Quoted fields with long spans of non-structural bytes (~15% faster)
  • Large fields (500+ bytes) where vectorized scanning shines
  • Unicode-heavy content processed efficiently in bulk

For CSV files with many short, simple fields, the SIMD overhead is minimal but present. The trade-off favors real-world CSV data which typically contains quoted text fields.
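For readers unfamiliar with SWAR, the sketch below shows the classic 8-byte technique: one 64-bit word is tested for a given byte with a few arithmetic operations, and a byte-by-byte scan runs only when a structural byte is present or for the tail. The function names and the choice of structural bytes are assumptions for illustration; this is not CSVCoder's parser, and the 64-byte SIMD path is not shown.

```swift
// True if any of the 8 bytes in `word` equals `byte` (classic zero-byte test).
func containsByte(_ word: UInt64, _ byte: UInt8) -> Bool {
    let ones: UInt64 = 0x0101_0101_0101_0101
    let highs: UInt64 = 0x8080_8080_8080_8080
    // XOR turns every matching byte into 0x00.
    let x = word ^ (ones &* UInt64(byte))
    // A zero byte leaves its high bit set after this subtract-and-mask step.
    return ((x &- ones) & ~x & highs) != 0
}

// Hypothetical scanner: index of the first comma, quote, LF, or CR, if any.
func firstStructuralIndex(in bytes: [UInt8]) -> Int? {
    let structural: [UInt8] = [0x2C, 0x22, 0x0A, 0x0D]
    var i = 0
    // Skip 8 bytes at a time while the word has no structural byte.
    while i + 8 <= bytes.count {
        let word = bytes.withUnsafeBytes {
            $0.loadUnaligned(fromByteOffset: i, as: UInt64.self)
        }
        if structural.contains(where: { containsByte(word, $0) }) { break }
        i += 8
    }
    // Pinpoint the match inside the flagged word, or scan the tail.
    while i < bytes.count {
        if structural.contains(bytes[i]) { return i }
        i += 1
    }
    return nil
}

print(firstStructuralIndex(in: Array("hello world,foo".utf8)) as Any)
```

Long runs without structural bytes are skipped a word at a time, which is consistent with the gains reported above for large and quoted fields; with very short fields almost every word contains a delimiter, so the word-level test saves little.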

Decoding

Benchmark                     Time      Throughput
----------------------------------------------------
1K rows (simple)              3.2 ms    ~313K rows/s
10K rows (simple)             32 ms     ~313K rows/s
100K rows (simple)            326 ms    ~307K rows/s
1M rows (simple)              3.29 s    ~304K rows/s
10K rows (complex, 8 fields)  74 ms     ~135K rows/s
10K rows (quoted fields)      30 ms     ~333K rows/s
10K rows (50 columns wide)    259 ms    ~39K rows/s
10K rows (500-byte fields)    96 ms     ~104K rows/s
100K rows (numeric fields)    329 ms    ~304K rows/s
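The Throughput column appears to be the row count divided by the elapsed time. A small sketch of that arithmetic, with `rowsPerSecond` as a hypothetical helper name:

```swift
// Rows per second from a row count and a time in milliseconds.
func rowsPerSecond(rows: Int, milliseconds: Double) -> Double {
    Double(rows) * 1000 / milliseconds
}

// 10K rows in 32 ms, as in the "10K rows (simple)" entry.
print(rowsPerSecond(rows: 10_000, milliseconds: 32))  // 312500.0
```

That result rounds to the ~313K rows/s listed for the 10K simple case.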

Real-World Scenarios

Benchmark                                    Time      Throughput
-------------------------------------------------------------------
50K orders (18 fields, optionals)            765 ms    ~65K rows/s
100K transactions (13 fields)                1.15 s    ~87K rows/s
100K log entries (12 fields)                 1.11 s    ~90K rows/s
10K stress-quoted (nested quotes, newlines)  25 ms     ~400K rows/s
50K Unicode-heavy rows                       149 ms    ~336K rows/s
1K rows (10KB fields)                        154 ms    ~6.5K rows/s
1K rows (200 columns wide)                   91 ms     ~11K rows/s

Encoding

Benchmark                          Time      Throughput
---------------------------------------------------------
1K rows                            1.6 ms    ~625K rows/s
10K rows                           16 ms     ~625K rows/s
100K rows                          165 ms    ~606K rows/s
1M rows                            1.64 s    ~610K rows/s
10K rows (500-byte fields)         96 ms     ~104K rows/s
50K orders (18 fields, optionals)  291 ms    ~172K rows/s
100K rows to Data                  164 ms    ~610K rows/s
100K rows to String                167 ms    ~599K rows/s

Parallel Processing

Benchmark                  Sequential  Parallel  Speedup
--------------------------------------------------------
Encode 100K rows           168 ms      61 ms     2.75x
Encode 100K to file        -           66 ms     -
Encode 1M rows             -           609 ms    -
Decode 100K rows           759 ms      897 ms    0.85x
Decode 100K from file      -           935 ms    -
Decode 1M rows (parallel)  -           17.3 s    -
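The Speedup column reads as sequential time divided by parallel time, so a value below 1 means the parallel run was slower. A sketch of that calculation, with `speedup` as a hypothetical helper name:

```swift
// Ratio of sequential to parallel time; values above 1 favor the parallel run.
func speedup(sequentialMs: Double, parallelMs: Double) -> Double {
    sequentialMs / parallelMs
}

let encode = speedup(sequentialMs: 168, parallelMs: 61)   // about 2.75
let decode = speedup(sequentialMs: 759, parallelMs: 897)  // about 0.85
print(encode > 1, decode > 1)  // true false
```

In these measurements parallel encoding pays off, while parallel decoding of 100K rows was slower than the sequential path, so it is worth measuring on your own data before enabling it.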

Mixed Workloads (Real-World Simulation)

Benchmark                        Time
---------------------------------------
Decode + Transform + Encode 10K  51 ms
Filter + Aggregate 100K orders   760 ms

Raw High-Performance API (Codable Bypass)

For performance-critical tasks (pre-processing, filtering, or massive datasets), you can bypass Codable overhead entirely using the zero-copy CSVParser API. This achieves ~2x higher throughput.

Safe Usage: Use the CSVParser.parse(data:) wrapper to ensure memory safety.

let data = try Data(contentsOf: bigFile)

// Count rows where age > 18
let count = try CSVParser.parse(data: data) { parser in
    var validCount = 0
    for row in parser {
        // 'row' is a zero-allocation View
        // Access fields by index (0-based)
        if let ageStr = row.string(at: 1), let age = Int(ageStr), age > 18 {
            validCount += 1
        }
    }
    return validCount
}

This approach avoids allocating struct or class instances for every row, drastically reducing ARC traffic.

Raw API Benchmarks

Benchmark                                 Time      Throughput    Speedup vs Codable
------------------------------------------------------------------------------------
Raw Parse 1M rows (Iterate Only)          1.61 s    ~621K rows/s  2.04x
Raw Parse 1M rows (Iterate + String)      1.71 s    ~585K rows/s  1.92x
Raw Parse 100K Quoted (Iterate Only)      122 ms    ~820K rows/s  -
Raw Parse 100K Quoted (Iterate + String)  154 ms    ~649K rows/s  -

Special Strategies (1K rows)

Benchmark                  Time      Throughput
-------------------------------------------------
snake_case key conversion  3.3 ms    ~307K rows/s
Flexible date parsing      142 ms    ~7.0K rows/s
Flexible number parsing    222 ms    ~4.5K rows/s

Run benchmarks locally:

swift run -c release CSVCoderBenchmarks

License

MIT License

Description

  • Swift Tools 6.2.0

Dependencies

Last updated: Sun Apr 19 2026 07:31:52 GMT-0900 (Hawaii-Aleutian Daylight Time)