Skip to content

Optimizations

Some parts of this section will compare against the original unreleased C# implementation

This is to give an idea of where things can be improved while translating some existing code.

Strings

All strings internally are stored in the platform's native string format.

Encoding

Unix-based Systems

On Linux and other unix-based systems, a file or folder is usually represented by a sequence of bytes.

We'll assume UTF-8, as that's the case on pretty much all modern distros.

Windows

On Windows, that is a sequence of 16-bit characters (UTF-16).

Although Windows has ANSI (A) and Unicode (W) APIs side by side, under the hood, Windows converts everything to UTF-16.

In order to speed things up, it is planned to use UTF-16 strings internally for non-ASCII input to avoid re-encoding them like in the original C# implementation.

This may change, in Rust version, we may investigate the tradeoff.

Case Conversion

In Windows, expect case insensitive paths.

We have to convert paths to uppercase inside various hooks.

We choose uppercase because some languages e.g. greek have special rules for lowercase

This saves us from having to deal with those rules, improving performance.

Previous Implementation

In the C# implementation, we backported a ToUpper function from one of the .NET 8 Previews. Implementation Here. (This function is faster than the one which shipped in .NET 8 proper)

In said implementation, the code is optimized for 99% of the cases where the paths are pure ASCII; with rare other cases falling to a slower implementation.

Rust Implementation

In Rust we can use the make_ascii_uppercase function if we can trust the paths to be ASCII for comparisons. Alternatively there's also functions like to_uppercase.

We may want to copy out the implementation of to_uppercase to return result if string is ASCII

Rather than explicitly checking in another step, which is expensive.

These are not vectorised in the Rust source, but thanks to LLVM magic, they do get vectorised, so there is generally no need to write our own. The to_uppercase function is generally limited to unrolled SSE2, while make_ascii_uppercase will use rolled AVX2 if possible on CPU target.

Hashing

A hardware accelerated string hash function is used for file paths.

In the original C# implementation, this was a vectorized function which now lives here Reloaded.Memory String Hash.

For the Rust implementation, it's we will use AHash. See the Common/Hashing page for more details.

  • All strings stored as Wide Strings.
    • Windows APIs use Wide Strings under the hood, even for ANSI APIs.
    • Therefore we save time by not having to widen them again.

HashMaps

HashMaps need to be able to query string slices.

In original C# implementation, it was necessary to use a custom dictionary that could query string slices as keys, due to limitation of the built in one.

In Rust, this won't be needed; as thanks to the Equivalent trait. Or using the HashTable API raw.

Hashbrown (Swisstable) is also faster.

Memory Storage

In the Rust implementation, we will pack strings together for efficiency.

In C#, some code was limited to the standard String type due to interoperability with existing APIs. With some additional foresight and experience, we can do a bit better in Rust.

The idea is to use string pooling and custom string headers. Rather than having an allocation for each string, we can perform a single memory allocation and then store all of the strings inline in a single buffer.

// Pseudocode
pub struct StringEntry
{
    /// Truth Table for char_count_and_flag:
    ///
    /// | Bit Position (from LSB)        | 0 (IsAscii flag) | 1-15 (char_count)    |
    /// |--------------------------------|------------------|----------------------|
    /// | Value when ASCII               | 1                | ASCII char count     |
    /// | Value when non-ASCII (Windows) | 0                | UTF-16 char count    |
    /// | Value when non-ASCII (Unix)    | 0                | UTF-8 byte count     |
    ///
    /// - The least significant bit (LSB, position 0) is used as the `IsAscii` flag.
    ///   If it's 1, the string is ASCII. If it's 0, the string is non-ASCII.
    ///
    /// - The remaining 15 bits (positions 1-15) are used to store the character count.
    ///   If the string is ASCII, this is the count of ASCII characters.
    ///   If the string is non-ASCII, on Windows this is the count of UTF-16 characters,
    ///   and on Unix this is the count of UTF-8 bytes.
    char_count_and_flag: u16,

    // Inline in memory, right after header
    // Or u8 if using ANSI strings.
    data: u16[length]
}

In 99% of cases, file paths will map to ASCII, meaning we only need 1 byte per character.

On Windows, where paths are internally UTF-16, we can then dynamically widen these paths to UTF-16 when needed, as that operation is very cheap for ASCII strings.

Godbolt example.

#[no_mangle]
pub fn ascii_to_utf16(ascii_str: &str, buffer: &mut [u16]) {
    let bytes = ascii_str.as_bytes();
    let len = bytes.len();
    for i in 0..len {
        buffer[i] = bytes[i] as u16;
    }
}

Auto-vectorizes nicely, thanks to LLVM.

Pooled Strings in HashTables

And for Hashbrown's low level HashTable primitive.

// 16 bytes, nicely aligns with cache lines
pub struct TableEntry
{
    // 64-bit key
    pub key: u64,

    // Offset for directory string in pool.
    // We deduplicate strings by directory, reducing memory usage further.
    // On average there's typically 7 files per 1 folder in Sewer's Steam folder.
    pub dir: u32,

    // Offset for file string in the pool.
    pub file: u32,
}

// If additional metadata is required, trim `dir` and `file` to 30 bits,
// and use remaining space for metadata.

And to search, we should compare key at minimum

// To search (hash at minimum, as shown here)
table.find(KEY_HASH, |&val| val.key == KEY_HASH);

// TODO: Add string comparison here.

Lookup Tree

After creating the 'Redirection Tree' we create a 'Lookup Tree' for translating file paths

More details on both structures can be found in 'Implementation Details' section.

The LookupTree allows us to resolve file paths in O(3) time, at expense of some RAM.