Original post on reddit: https://old.reddit.com/r/VoxelGameDev/comments/9yu8qy/palettebased_compression_for_chunked_discrete/

Disclaimer:
This is the very first time I am writing an article about anything at all, and with a rather large amount of text to boot. The writing may not be coherent in some places, so if you find any mistakes (grammar, code/logic, etc.) or have a suggestion to improve it, throw it right in my face! I need the learning experience... ;D

This also happens to be my first post here, so hello /r/VoxelGameDev!


Palette-based compression for chunked discrete voxel data.

With the release of Minecraft 1.13, Mojang added lossless compression for voxel/block data to the game, based upon the basic concept of color palettes (think GIF's!).

This 'new' type of compression has numerous advantages...

There are probably more things that are possible with this new system (and maybe some things are made impossible?)... but these three points should be enough to make one ask "how does it work?"...


How does it work?

To make this work, it is necessary to separate block groups/chunks from their actual storage of block data. Doing so results in a new structure henceforth called BlockStorage.

The BlockStorage data-structure is basically a GIF-Image, but instead of storing RGB-colors in the palette, we store block types & states in it!

At its simplest, it is a combination of a buffer holding variable-bit-length indices, and a palette (array) into which these indices point, whose entries hold the actual block type & state data, including a reference counter, to track how often the block type is being used.

The magic of this technique is, that the less types of blocks are in a certain area, the less memory that area actually needs, both at runtime and when serialized. It is important to remember that a world of discrete voxels (eg: Minecraft), generally consists of large amounts of air and stone, with a few blocks mixed in; these regions thus only need one or two bits for every block stored. They are balanced out by regions of 'high entropy', like forests of any kind, player-built structures and overall the 'surface' of the world.

Below is a semi-complete implementation of BlockStorage, expressed in java-ish pseudocode:

// a group of blocks
class Chunk {
  public static final int CHUNK_SIZE = ...;
  private BlockStorage storage = new BlockStorage(CHUNK_SIZE pow 3);

  public void setBlock(x, y, z, BlockType type) {
     int index = /* turn x/y/z into a index */;
     storage.setBlock(index, type);
  }

  public BlockType getBlock(x, y, z) {
     int index = /* turn x/y/z into a index */;
     return storage.getBlock(index);
  }
}

// the actual storage for blocks
class BlockStorage {
  private final int size;
  private BitBuffer data;
  private PaletteEntry[] palette;
  private int paletteCount;
  private int indicesLength;

  public BlockStorage(int size) {
          this.size = size;
          this.indicesLength = 1;
          this.paletteCount = 0;
          this.palette = new PalleteEntry[2 pow indicesLength];
          this.data = new BitBuffer(size * indicesLength); // the length is in bits, not bytes!
  }

  public void setBlock(index, type) {
    int paletteIndex = data.get(index * indicesLength, indicesLength);
    PaletteEntry current = palette[paletteIndex];

    // Whatever block is there will cease to exist in but a moment...
    current.refcount -= 1;

    // The following steps/choices *must* be ordered like they are.

    // --- Is the block-type already in the palette?
    int replace = palette.indexOf((entry) -> {entry.type.equals(type)});
    if(replace != -1) {
      // YES: Use the existing palette entry.
      data.set(index * indicesLength, indicesLength, replace);
      palette[replace].refcount += 1;
      return;
    }

    // --- Can we overwrite the current palette entry?
    if(current.refcount == 0) {
      // YES, we can!
      current.type = type;
      current.refcount = 1;
      return;
    }

    // --- A new palette entry is needed!

    // Get the first free palette entry, possibly growing the palette!
    int newEntry = newPaletteEntry();

    palette[newEntry] = new PaletteEntry(1, type);
    data.set(index * indicesLength, indicesLength, newEntry);
    paletteCount += 1;
  }

  public BlockType getBlock(index) {
    int paletteIndex = data.get(index * indicesLength, indicesLength);
    return palette[paletteIndex].type;
  }

  private int newPaletteEntry() {
     int firstFree = palette.indexOf((entry) -> {entry == null || entry.refcount == 0});

     if(firstFree != -1) {
       return firstFree;
     }

     // No free entry?
     // Grow the palette, and thus the BitBuffer
     growPalette();

     // Just try again now!
     return newPaletteEntry();
  }

  private void growPalette() {
    // decode the indices
    int[] indices = new int[size];
    for(int i = 0; i < indices.length; i++) {
      indices[i] = data.get(i * indicesLength, indicesLength);
    }

    // Create new palette, doubling it in size
    indicesLength = indicesLength << 1;
    PaletteEntry[] newPalette = new PaletteEntry[2 pow indicesLength];
    System.arrayCopy(palette, 0, newPalette, 0, paletteCount);
    palette = newPalette;

    // Allocate new BitBuffer
    data = new BitBuffer(size * indicesLength); // the length is in bits, not bytes!

    // Encode the indices
    for(int i = 0; i < indices.length; i++) {
      data.set(i * indicesLength, indicesLength, indices[i]);
    }
  }

  // Shrink the palette (and thus the BitBuffer) every now and then.
  // You may need to apply heuristics to determine when to do this.
  public void fitPalette() {
    // Remove old entries...
    for(int i = 0; i < palette.length; i++) {
      if(palette[i].refcount == 0) {
        palette[i] = null;
        paletteCount -= 1;
      }
    }

    // Is the palette less than half of its closest power-of-two?
    if(paletteCount > powerOfTwo(paletteCount)/2) {
      // NO: The palette cannot be shrunk!
      return;
    }

    // decode all indices
    int[] indices = new int[size];
    for(int i = 0; i < indices.length; i++) {
      indices[i] = data.get(i * indicesLength, indicesLength);
    }

    // Create new palette, halfing it in size
    indicesLength = indicesLength >> 1;
    PaletteEntry[] newPalette = new PaletteEntry[2 pow indicesLength];

    // We gotta compress the palette entries!
    int paletteCounter = 0;
    for(int pi = 0; pi < palette.length; pi++, paletteCounter++) {
      PaletteEntry entry = newPalette[paletteCounter] = palette[pi];

      // Re-encode the indices (find and replace; with limit)
      for(int di = 0, fc = 0; di < indices.length && fc < entry.refcount; di++) {
        if(pi == indices[di]) {
          indices[di] = paletteCounter;
          fc += 1;
        }
      }
    }

    // Allocate new BitBuffer
    data = new BitBuffer(size * indicesLength); // the length is in bits, not bytes!

    // Encode the indices
    for(int i = 0; i < indices.length; i++) {
      data.set(i * indicesLength, indicesLength, indices[i]);
    }
  }
}

// Ideally, this would be a struct.
class PaletteEntry {
  public int refcount = 0;
  public BlockType type;
}

Notes:

interface BitBuffer {
  void set(int bitIndex, int bitLength, int bits);
  int get(int bitIndex, int bitLength);
}

Additional things that can be done to increase performance and reduce memory usage:


Things one should avoid doing: