Requesting Senior Feedback on HexSpell: My Executable Parser Library

Hi Rustaceans,

I'm looking for a code review of my Rust library, HexSpell, an open source tool for parsing and modifying PE (Windows), ELF (Linux) and Mach-O (macOS) binaries. The goal is to provide a flexible, low-dependency solution for handling executable file formats across platforms.

Context

I am relatively new to programming and Rust, and when I started this project, I knew very little about executable structures. I've focused on learning, and I've invested a lot of time in it.

While I'd welcome feedback on the Rust code itself, I'm especially interested in a review of how I've managed the development process - advice from experienced developers on how I can improve. This includes:

  • How I structured my commits and merges. Name formatting, etc...
  • How I documented the code. The fact of docstrings and their formatting.
  • The testing techniques I used
  • My use of GitHub shares for CI.
  • Any general comments on project organisation and management.

Some examples and code show

Now that I've put you in context, I'll share certain areas as examples and the link to the repository at the end of this post.

Structure

This is the folder src. I had 3 folders for type PE, ELF and Mach-O and the standar files on the src/ root.

β”œβ”€β”€β”€src
β”‚   β”‚   errors.rs
β”‚   β”‚   field.rs
β”‚   β”‚   lib.rs
β”‚   β”‚   utils.rs
β”‚   β”‚
β”‚   β”œβ”€β”€β”€elf
β”‚   β”‚       header.rs
β”‚   β”‚       mod.rs
β”‚   β”‚       program.rs
β”‚   β”‚       section.rs
β”‚   β”‚
β”‚   β”œβ”€β”€β”€macho
β”‚   β”‚       header.rs
β”‚   β”‚       load_command.rs
β”‚   β”‚       mod.rs
β”‚   β”‚       segment.rs
β”‚   β”‚
β”‚   └───pe
β”‚           header.rs
β”‚           mod.rs
β”‚           section.rs

Commits

For my commits, I use a syntax like ACTION FILE: DESCRIPTION. Here’s an example of a few commits:

  • Merge pull request #3 from M3str3/0.0.2
  • Update cargo.toml: 0.0.1 -> 0.0.2
  • Update readme.md: New examples of PE,ELF & mach-O
  • format utils: Formatting with cargo fmt
  • refactor tests/macho: Using is_empty instead !=""
  • refactor tests/elf: Using is_empty instead !=""
  • chore pe/mod: unnecessary conversion
  • refactor pe/header: to_string to fmt::Display trait
  • Refactor utils: simplify extract functions with map for byte conversion
  • Refactor Field: update buffer parameter from Vec to &mut [u8]
  • Format: formatting code with cargo fmt
  • New rustfmt.toml: define code formatting rules

Is there a better way to format or organize my commits?

Testing

In the tests folder, I have a separate test file for each file type (PE, ELF, and Mach-O). I also maintain a tests.toml file, which contains information about the binaries I use for testing, including entry points and manually extracted values.

└───tests
    β”‚   elf.rs
    β”‚   macho.rs
    β”‚   pe.rs
    β”‚   readme.md
    β”‚   tests.toml
    β”‚
    β”œβ”€β”€β”€samples
    β”‚       linux
    β”‚       machO-OSX-x86-ls
    β”‚       sample1.exe
    β”‚       sample2.dll
    β”‚
    β”œβ”€β”€β”€scripts
    β”‚       generator.py
    β”‚
    └───source
            machO-OSX-x86-ls
            sample1.c++
            sample2.c
            sample3.c

The binaries used in the tests are compiled from C or C++ code (I don’t yet have any Rust binaries :confused:). The source code for each binary is stored in source, while the binaries being tested are in samples.

The tests ensure that the structure of the parsed PE matches the data in tests.toml.


What you think about this testing technique?

Code quality

I’ll provide examples from pe/mod.rs and field.rs, as they are crucial to understanding the structure of the project.

pe/mod.rs

use std::fs;
use std::io::{self, Write};

use crate::errors::FileParseError;
use crate::field::Field;
use crate::utils::{extract_u16, extract_u32, extract_u64};

pub mod header;
pub mod section;

/// Struct to define a PeFile from attr
pub struct PE {
    pub buffer: Vec<u8>,
    pub header: header::PeHeader,
    pub sections: Vec<section::PeSection>,
}

impl PE {
    /// Write `self.buffer` on file.
    pub fn write_file(&self, output_path: &str) -> io::Result<()> {
        let mut file: fs::File = fs::File::create(output_path)?;
        file.write_all(&self.buffer)?;
        Ok(())
    }

    /// Calculate the checksum for a PE file, ignoring the checksum field itself
    ///
    /// # Examples
    /// ```
    /// use hexspell::pe::PE;
    /// let pe = PE::from_file("tests/samples/sample1.exe").unwrap(); // Sample checksum has to be the correct
    /// let calculed_check:u32 = pe.calc_checksum();
    /// assert_eq!(pe.header.checksum.value, calculed_check);
    /// ```
    pub fn calc_checksum(&self) -> u32 {
        let mut checksum: u64 = 0;
        let len = self.buffer.len();
        let mut i = 0;

        while i < len {
            // Skip checksum field
            if i == self.header.checksum.offset {
                i += self.header.checksum.size;
                continue;
            }

            if i + 1 < len {
                let word = u16::from_le_bytes([self.buffer[i], self.buffer[i + 1]]);
                checksum += word as u64;
                i += 2;
            } else {
                checksum += self.buffer[i] as u64;
                break;
            }
        }

        checksum = (checksum & 0xFFFF) + (checksum >> 16);
        checksum += len as u64;
        checksum = (checksum & 0xFFFF) + (checksum >> 16);
        checksum = (checksum & 0xFFFF) + (checksum >> 16);
        checksum as u32
    }

    /// Parses a PE file from a specified file path.
    ///
    /// # Arguments
    /// * `path` - A string slice that holds the path to the PE file.
    ///
    /// # Returns
    /// A `Result` that is either a `PeFile` on success, or a `FileParseError` on failure.
    ///
    /// # Example
    /// ```
    /// use hexspell::pe::PE;
    /// let pe_file = PE::from_file("tests/samples/sample1.exe").unwrap();
    /// ```
    pub fn from_file(path: &str) -> Result<PE, FileParseError> {
        let data: Vec<u8> = fs::read(path).map_err(|e: std::io::Error| FileParseError::Io(e))?;
        PE::from_buffer(data)
    }

    /// Parses a PE file from a byte vector.
    ///
    /// # Arguments
    /// * `buffer` - A byte vector containing the PE file data.
    ///
    /// # Returns
    /// A `Result` that is either a `PeFile` on success, or a `FileParseError` on failure.
    ///
    /// # Example
    /// ```
    /// use hexspell::pe::PE;
    /// let data = std::fs::read("tests/samples/sample1.exe").expect("Failed to read file");
    /// let pe_file = PE::from_buffer(data).unwrap();
    /// ```
    pub fn from_buffer(buffer: Vec<u8>) -> Result<Self, FileParseError> {
        if buffer.len() < 64 || buffer[0] != 0x4D || buffer[1] != 0x5A {
            return Err(FileParseError::InvalidFileFormat);
        }

        let e_lfanew_offset = 0x3C;
        let pe_header_offset = extract_u32(&buffer, e_lfanew_offset)? as usize;
        let optional_header_offset = pe_header_offset + 24;
        let optional_header_size = extract_u16(&buffer, pe_header_offset + 20)?;
        let sections_offset = optional_header_offset + optional_header_size as usize;
        let number_of_sections = extract_u16(&buffer, pe_header_offset + 6)?;

        let header = header::PeHeader {
            architecture: Field::new(
                header::Architecture::from_u16(extract_u16(&buffer, pe_header_offset + 4)?)
                    .to_string(),
                pe_header_offset + 4,
                2,
            ),
            entry_point: Field::new(
                extract_u32(&buffer, pe_header_offset + 40)?,
                pe_header_offset + 40,
                4,
            ),
            size_of_image: Field::new(
                extract_u32(&buffer, pe_header_offset + 80)?,
                pe_header_offset + 80,
                4,
            ),
            number_of_sections: Field::new(number_of_sections, pe_header_offset + 6, 2),
            checksum: Field::new(
                extract_u32(&buffer, pe_header_offset + 88)?,
                pe_header_offset + 88,
                4,
            ),
            file_alignment: Field::new(
                extract_u32(&buffer, optional_header_offset + 36)?,
                optional_header_offset + 36,
                4,
            ),
            section_alignment: Field::new(
                extract_u32(&buffer, optional_header_offset + 32)?,
                optional_header_offset + 32,
                4,
            ),
            base_of_code: Field::new(
                extract_u32(&buffer, optional_header_offset + 20)?,
                optional_header_offset + 20,
                4,
            ),
            base_of_data: Field::new(
                extract_u32(&buffer, optional_header_offset + 24)?,
                optional_header_offset + 24,
                4,
            ),
            image_base: Field::new(
                match extract_u16(&buffer, optional_header_offset)? {
                    0x10B => header::ImageBase::Base32(extract_u32(
                        &buffer,
                        optional_header_offset + 28,
                    )?),
                    0x20B => header::ImageBase::Base64(extract_u64(
                        &buffer,
                        optional_header_offset + 24,
                    )?),
                    _ => return Err(FileParseError::InvalidFileFormat),
                },
                optional_header_offset + 28,
                8,
            ),
            subsystem: Field::new(
                extract_u16(&buffer, optional_header_offset + 68)?,
                optional_header_offset + 68,
                2,
            ),
            dll_characteristics: Field::new(
                extract_u16(&buffer, optional_header_offset + 70)?,
                optional_header_offset + 70,
                2,
            ),
            size_of_headers: Field::new(
                extract_u32(&buffer, optional_header_offset + 60)?,
                optional_header_offset + 60,
                4,
            ),
            pe_type: match extract_u16(&buffer, optional_header_offset)? {
                0x10B => header::PEType::PE32,
                0x20B => header::PEType::PE32Plus,
                _ => return Err(FileParseError::InvalidFileFormat),
            },
        };

        let mut sections = Vec::with_capacity(number_of_sections as usize);
        let mut current_offset = sections_offset;
        for _ in 0..number_of_sections {
            let section = section::PeSection::parse_section(&buffer, current_offset)?;
            sections.push(section);
            current_offset += 40;
        }

        Ok(PE {
            buffer,
            header,
            sections,
        })
    }
// More code here ....
}

field.rs

#[derive(Debug, Clone, Copy)]
pub struct Field<T> {
    pub value: T,
    pub offset: usize,
    pub size: usize,
}

impl<T> Field<T> {
    pub fn new(value: T, offset: usize, size: usize) -> Self {
        Field {
            value,
            offset,
            size,
        }
    }
}

impl Field<u64> {
    /// Updates the buffer at the specified offset with the new value for u64.
    pub fn update(&mut self, buffer: &mut [u8], new_value: u64) {
        self.value = new_value;
        let bytes = new_value.to_le_bytes();
        buffer[self.offset..self.offset + self.size].copy_from_slice(&bytes[..self.size]);
    }
}

impl Field<u32> {
    /// Updates the buffer at the specified offset with the new value for u32.
    pub fn update(&mut self, buffer: &mut [u8], new_value: u32) {
        self.value = new_value;
        let bytes = new_value.to_le_bytes();
        buffer[self.offset..self.offset + self.size].copy_from_slice(&bytes[..self.size]);
    }
}
impl Field<u16> {
    /// Updates the buffer at the specified offset with the new value for u16.
    pub fn update(&mut self, buffer: &mut [u8], new_value: u16) {
        self.value = new_value;
        let bytes = new_value.to_le_bytes();
        buffer[self.offset..self.offset + self.size].copy_from_slice(&bytes[..self.size]);
    }
}

impl Field<String> {
    /// Updates the buffer at the specified offset with the new UTF-8 encoded string.
    pub fn update(&mut self, buffer: &mut [u8], new_value: &String) {
        self.value = new_value.to_string();
        let bytes: &[u8] = self.value.as_bytes();
        if bytes.len() > self.size {
            panic!("New string value exceeds the allocated field size.");
        }

        buffer[self.offset..self.offset + bytes.len()].copy_from_slice(bytes);
    }
}

Issues

You can see how I document issues by checking out Adding a new section causes a part of the code of the first section to be overwritten. Β· Issue #4 Β· M3str3/HexSpell Β· GitHub or the issue #5.

Repository

HexSpell on github

Thank You

Thank you for taking the time to review my project. Any suggestions or feedback (preferably constructive :wink:) are welcome. I'm relatively new to Rust and programming in general, with only two years of professional experience, so I greatly appreciate any advice you can offer.

1 Like

I simplified your loop ...

  let mut i = self.header.checksum.offset + self.header.checksum.size;   // Skip checksum field

    while i < len {
        checksum += self.buffer[i] as u64;
        i += 1;
    }

Hey man, thanks for your reply! I appreciate the suggestion, but I'm not getting the same results with that loop. The main issue is due to little-endian format. In my original loop, I'm grabbing 16 bits (2 bytes) at a time, which allows me to correctly interpret each word in little-endian . In your loop, since it's working on 8-bit (byte-by-byte) sums, it doesn't take into account how the bytes are combined into words, so the result won't match the expected checksum.

Tested with tests/sample1.exe

numbers are on decimal

File checksum(manual check with peviewer): 28934
Checksum with original code: 28934
Checksum with ur loop: 1735811

Thanks for ur time anyways!

I hoped you realize that this line is quite awkward:
´´´

  let word = u16::from_le_bytes([self.buffer[i], self.buffer[i + 1]]);

Let me start with 'word'. Word is platformdependent. It has a given byte oerder and length. So word says, that you didn't change anything. Word has the native byte order (LE, BE) and the native lenght (8, 16, 32, 64 bytes).
Use something like my_16.
Next, you advance the index by two. So the array is organized in u16. But then, how comes that at the end, you get a single byte? I doubt that (but I absolutely don't know).

This is much nicer:

use byteorder::{BigEndian, LittleEndian, ReadBytesExt}; // 1.3.4
    let my_16 = self.buffer.read_u16::<LittleEndian>();

... as it advances automagically by 2 bytes.
And also, you might consider skipping all the byte-swapping, depending on what platform your program is running.