running-length-encoding
Running-length encoding, or RLE, is a very simple kind of encoding/compression method. The idea is to convert data into datum-count pairs. For example the string "AAABBBBCC" would become something like "3A4B2C", and the string "ABC" would become "1A1B1C".
One could easily see the cases where one would benifit from this are those where the data contains lots of repeating "single-width" patterns; "single-width", as in the width of the repeating pattern is of the width of a single datum. (For example: the string "ABABABABABABABAB" (16 characters) would become something like "8AB" (3 characters) when the width of a single datum is two characters but will become "1A1B" repeated 8 times (32 characters) when the width of a single datum is one character.) For this reason, it was (and probably still is) very prominent in encoding bitmap textures and images, where one would often have big swashes of single color.
Encoding
Encoding problem (that are similar to those in other compression method, e.g. LZ77) exists. First of all, it obviously would be a waste to encode every single-width datum as "1 + datum" (which, in the worst case, would lead to 1/W percent extra cost, where W is the width of a single datum). A common way is to have two commands of forms (LIT, length, data) and (RLE, length, datum) respectively.
Consider this example encoding scheme of RLE:
| Command | Width (Bytes) | Format | Semantics |
|---|---|---|---|
| LIT | 1 + n | 0xxxxxxx … | Encode a literal sequence of length 1~128 (denoted with x). The data follows after the first byte. |
| RLE | 1 + 1 | 1xxxxxxx b | Encode a repeating sequence of length 1~128 (denoted with x) of the byte b. |
It has the following properties:
- In the worst case it introduces
1/Mpercent extra cost than original, whereMis the maximum length allowed in anLITcommand. - It doesn't make sense to encode repeating sequence of length equal or below 3, since the introduction of an
RLEcommand introduces 2 or 2+1 bytes of cost (2 for theRLEcommand, and 1 for the beginning of the possible nextLITcommand).
2024.8.17