2024 Cyber Apocalypse: Crushing

Challenge Information

AttributeDetails
Event2024 Cyber Apocalypse
CategoryReverse Engineering
ChallengeCrushing
DifficultyEasy

Summary

Crushing is a reverse engineering challenge requiring analysis of a custom compression/serialization algorithm. The binary reads input, builds linked list structures for each byte value, and serializes this structure to a file. By reverse engineering this process, the original message can be recovered.


Analysis

Algorithm Overview

The binary implements a custom serialization scheme:

  1. Input Processing: Reads character input and tracks positions
  2. Data Structure: Creates a map (array of 256 buckets) where each bucket contains a linked list
  3. Linked List Building: For each character at position pos, a node is created with {pos, next}
  4. Serialization: Writes to file in a specific format

Serialization Format

For each of the 256 possible byte values:

  1. Write the count of occurrences (8-byte little-endian integer)
  2. Write each position where that byte appeared (8-byte little-endian integers)

Example:

Byte 0x41 (A) appears at positions [0, 5, 10]:
- Write: 0x03 0x00 0x00 0x00 0x00 0x00 0x00 0x00 (count = 3)
- Write: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 (position 0)
- Write: 0x05 0x00 0x00 0x00 0x00 0x00 0x00 0x00 (position 5)
- Write: 0x0A 0x00 0x00 0x00 0x00 0x00 0x00 0x00 (position 10)

Solution

Step 1: Understand the Reverse Process

To decode, we need to:

  1. Read the occurrence count for each byte value (0-255)
  2. Read the positions where each byte appeared
  3. Reconstruct the original data by placing each byte at its correct position

Step 2: Implement the Decoder

def deserialize_and_reconstruct(encoded_file):
with open(encoded_file, 'rb') as f:
# Step 1: Read and reconstruct the map
byte_occurrences = {}
for byte_val in range(256):
# Read the length (number of occurrences) for this byte value
length = int.from_bytes(f.read(8), 'little')
if length > 0:
byte_occurrences[byte_val] = []
for _ in range(length):
# Read each occurrence order
order = int.from_bytes(f.read(8), 'little')
byte_occurrences[byte_val].append(order)
# Step 2: Reconstruct the original sequence
original_data = [None] * sum(len(orders) for orders in byte_occurrences.values())
for byte_val, orders in byte_occurrences.items():
for order in orders:
original_data[order] = byte_val
# Convert byte values back to bytes
original_bytes = bytes(original_data)
return original_bytes

Step 3: Execute the Decoder

def main():
encoded_file = 'message.txt.cz'
original_data = deserialize_and_reconstruct(encoded_file)
# Output the original data
with open('decoded_message.txt', 'wb') as f:
f.write(original_data)
print("Decoded message written to decoded_message.txt")
print(original_data.decode())
if __name__ == '__main__':
main()

Complete Decoder Script

from struct import unpack
def deserialize_and_reconstruct(encoded_file):
with open(encoded_file, 'rb') as f:
byte_occurrences = {}
for byte_val in range(256):
length = int.from_bytes(f.read(8), 'little')
if length > 0:
byte_occurrences[byte_val] = []
for _ in range(length):
order = int.from_bytes(f.read(8), 'little')
byte_occurrences[byte_val].append(order)
# Reconstruct original sequence
max_pos = max(max(orders) for orders in byte_occurrences.values() if orders)
original_data = [None] * (max_pos + 1)
for byte_val, orders in byte_occurrences.items():
for order in orders:
original_data[order] = byte_val
return bytes(original_data)
if __name__ == '__main__':
encoded_file = 'message.txt.cz'
original_data = deserialize_and_reconstruct(encoded_file)
with open('decoded_message.txt', 'wb') as f:
f.write(original_data)
print("Decoded message:")
print(original_data.decode())

Key Takeaways

  • Understanding data structures and their serialization formats is crucial for reverse engineering
  • Linked lists and their in-memory representations differ from serialized formats
  • Custom compression/serialization schemes must be completely reverse engineered
  • Binary file format analysis requires understanding byte ordering (little-endian vs big-endian)
  • Reconstructing original data from serialized format requires careful position/index tracking
  • Testing the decoder on sample data validates the understanding

Flag: HTB{cr4sh1ng_1nt0_r3v3rs1ng}