2024 Cyber Apocalypse: Crushing

Challenge Information

Attribute	Details
Event	2024 Cyber Apocalypse
Category	Reverse Engineering
Challenge	Crushing
Difficulty	Easy

Crushing is a reverse engineering challenge requiring analysis of a custom compression/serialization algorithm. The binary reads input, builds linked list structures for each byte value, and serializes this structure to a file. By reverse engineering this process, the original message can be recovered.

Analysis

Algorithm Overview

The binary implements a custom serialization scheme:

Input Processing: Reads character input and tracks positions
Data Structure: Creates a map (array of 256 buckets) where each bucket contains a linked list
Linked List Building: For each character at position pos, a node is created with {pos, next}
Serialization: Writes to file in a specific format

Serialization Format

For each of the 256 possible byte values:

Write the count of occurrences (8-byte little-endian integer)
Write each position where that byte appeared (8-byte little-endian integers)

Example:

Byte 0x41 (A) appears at positions [0, 5, 10]:
- Write: 0x03 0x00 0x00 0x00 0x00 0x00 0x00 0x00 (count = 3)
- Write: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 (position 0)
- Write: 0x05 0x00 0x00 0x00 0x00 0x00 0x00 0x00 (position 5)
- Write: 0x0A 0x00 0x00 0x00 0x00 0x00 0x00 0x00 (position 10)

Solution

Step 1: Understand the Reverse Process

To decode, we need to:

Read the occurrence count for each byte value (0-255)
Read the positions where each byte appeared
Reconstruct the original data by placing each byte at its correct position

Step 2: Implement the Decoder

def deserialize_and_reconstruct(encoded_file):
    with open(encoded_file, 'rb') as f:
        # Step 1: Read and reconstruct the map
        byte_occurrences = {}
        for byte_val in range(256):
            # Read the length (number of occurrences) for this byte value
            length = int.from_bytes(f.read(8), 'little')
            if length > 0:
                byte_occurrences[byte_val] = []
                for _ in range(length):
                    # Read each occurrence order
                    order = int.from_bytes(f.read(8), 'little')
                    byte_occurrences[byte_val].append(order)

        # Step 2: Reconstruct the original sequence
        original_data = [None] * sum(len(orders) for orders in byte_occurrences.values())
        for byte_val, orders in byte_occurrences.items():
            for order in orders:
                original_data[order] = byte_val

    # Convert byte values back to bytes
    original_bytes = bytes(original_data)
    return original_bytes

Step 3: Execute the Decoder

def main():
    encoded_file = 'message.txt.cz'
    original_data = deserialize_and_reconstruct(encoded_file)

    # Output the original data
    with open('decoded_message.txt', 'wb') as f:
        f.write(original_data)
    print("Decoded message written to decoded_message.txt")
    print(original_data.decode())

if __name__ == '__main__':
    main()

Complete Decoder Script

from struct import unpack

def deserialize_and_reconstruct(encoded_file):
    with open(encoded_file, 'rb') as f:
        byte_occurrences = {}
        for byte_val in range(256):
            length = int.from_bytes(f.read(8), 'little')
            if length > 0:
                byte_occurrences[byte_val] = []
                for _ in range(length):
                    order = int.from_bytes(f.read(8), 'little')
                    byte_occurrences[byte_val].append(order)

        # Reconstruct original sequence
        max_pos = max(max(orders) for orders in byte_occurrences.values() if orders)
        original_data = [None] * (max_pos + 1)

        for byte_val, orders in byte_occurrences.items():
            for order in orders:
                original_data[order] = byte_val

    return bytes(original_data)

if __name__ == '__main__':
    encoded_file = 'message.txt.cz'
    original_data = deserialize_and_reconstruct(encoded_file)

    with open('decoded_message.txt', 'wb') as f:
        f.write(original_data)

    print("Decoded message:")
    print(original_data.decode())

Key Takeaways

Understanding data structures and their serialization formats is crucial for reverse engineering
Linked lists and their in-memory representations differ from serialized formats
Custom compression/serialization schemes must be completely reverse engineered
Binary file format analysis requires understanding byte ordering (little-endian vs big-endian)
Reconstructing original data from serialized format requires careful position/index tracking
Testing the decoder on sample data validates the understanding

Flag: HTB{cr4sh1ng_1nt0_r3v3rs1ng}