Skip to content

Implementation Details

This page lists suggestions for when the hash cache is eventually implemented.

No Embedded Hash Table

The Hash Cache format does not include an embedded HashTable.

Instead, we generate one at runtime after loading.

Populating a HashTable with all the file name hashes is cheap because:

  • The total number of items is known.
  • All of the names are sequential in the file, thus we have high cache hit rate.

With a non-embedded hashtable, it's also possible to use a more efficient implementation; for example, SwissTable works better in usize groups due to high NEON latency.

Finding an Entry

Implementation tip for Rust HashBrown HashTable.

An entry could be stored as:

pub struct TableEntry
{
    pub key: XXH3, // u64
}

And query with:

table.find(KEY_HASH, |&val| val.key == KEY_HASH);

This performs a fast key only query.

File Path Size Optimization

The file entries can be rearranged to optimize for compression ratios.

The entries can be sorted such that the strings in the File Paths Section are lexicographically sorted before compression. This far increases the likelihood of similar paths being grouped together; leading to a smaller file, and thus faster load.