This project's purpose was to educate me (jjrscott) on how the LZW works. While build this project I realised that the LZW algorithm as originally states has two distinct parts:
- encoding characters into an alphabet and pre-populating a prefix table,
- (the part normally though of as LZW) expanding the prefix table and encoding table indexes into an output stream.
I can use Swift's enum
to abstract away the need for a pre-populated string table / alphabet to reveal the algorthim in a more pure form.
From Rosetta Code:
The Lempel-Ziv-Welch (LZW) algorithm provides loss-less data compression.
You can read a complete description of it in the Wikipedia article on the subject. It was patented, but it entered the public domain in 2004.
For some reason all demos use TOBEORNOTTOBEORTOBEORNOT
as the input string. The following example takes a string and produces an integer array, just as the original demos do.
import LZW
let original = "TOBEORNOTTOBEORTOBEORNOT"
let compressed = LZW.compress(original.utf8.map({$0}))
print("compressed: \(compressed)")
let encoded = compressed.map { unit in
switch unit {
case .element(let value):
return UInt(value)
case .index(let index):
return UInt(UInt8.max) + UInt(index)
}
}
print("encoded: \(encoded)")
Output:
compressed: [.element(84), .element(79), .element(66), .element(69), .element(79),
.element(82), .element(78), .element(79), .element(84), .index(0),
.index(2), .index(4), .index(9), .index(3), .index(5), .index(7)]
encoded: [84, 79, 66, 69, 79, 82, 78, 79, 84, 256, 258, 260, 265, 259, 261, 263]
A Technique for High-Performance Data Compression:
Initialize table to contain single-character strings
Read first input character K:
K → prefix string ω;
Step: Read next input character K
If no such K (input exhausted):
code(ω) → output;
EXIT.
If ωK exists in string table:
ωK → ω;
repeat Step.
else ωK not in string table:
code(ω) → output;
ωK → string table;
K → ω;
repeat Step.