Hi Rustaceans,
I'm looking for a code review of my Rust library, HexSpell, an open source tool for parsing and modifying PE (Windows), ELF (Linux) and Mach-O (macOS) binaries. The goal is to provide a flexible, low-dependency solution for handling executable file formats across platforms.
Context
I am relatively new to programming and Rust, and when I started this project, I knew very little about executable structures. I've focused on learning, and I've invested a lot of time in it.
While I'd welcome feedback on the Rust code itself, I'm especially interested in a review of how I've managed the development process - advice from experienced developers on how I can improve. This includes:
- How I structured my commits and merges. Name formatting, etc...
- How I documented the code. The fact of docstrings and their formatting.
- The testing techniques I used
- My use of GitHub shares for CI.
- Any general comments on project organisation and management.
Some examples and code show
Now that I've put you in context, I'll share certain areas as examples and the link to the repository at the end of this post.
Structure
This is the folder src. I had 3 folders for type PE, ELF and Mach-O and the standar files on the src/ root.
ββββsrc
β β errors.rs
β β field.rs
β β lib.rs
β β utils.rs
β β
β ββββelf
β β header.rs
β β mod.rs
β β program.rs
β β section.rs
β β
β ββββmacho
β β header.rs
β β load_command.rs
β β mod.rs
β β segment.rs
β β
β ββββpe
β header.rs
β mod.rs
β section.rs
Commits
For my commits, I use a syntax like ACTION FILE: DESCRIPTION
. Hereβs an example of a few commits:
- Merge pull request #3 from M3str3/0.0.2
- Update cargo.toml: 0.0.1 -> 0.0.2
- Update readme.md: New examples of PE,ELF & mach-O
- format utils: Formatting with cargo fmt
- refactor tests/macho: Using is_empty instead !=""
- refactor tests/elf: Using is_empty instead !=""
- chore pe/mod: unnecessary conversion
- refactor pe/header: to_string to fmt::Display trait
- Refactor utils: simplify extract functions with map for byte conversion
- Refactor Field: update buffer parameter from Vec to &mut [u8]
- Format: formatting code with cargo fmt
- New rustfmt.toml: define code formatting rules
Is there a better way to format or organize my commits?
Testing
In the tests
folder, I have a separate test file for each file type (PE, ELF, and Mach-O). I also maintain a tests.toml
file, which contains information about the binaries I use for testing, including entry points and manually extracted values.
ββββtests
β elf.rs
β macho.rs
β pe.rs
β readme.md
β tests.toml
β
ββββsamples
β linux
β machO-OSX-x86-ls
β sample1.exe
β sample2.dll
β
ββββscripts
β generator.py
β
ββββsource
machO-OSX-x86-ls
sample1.c++
sample2.c
sample3.c
The binaries used in the tests are compiled from C or C++ code (I donβt yet have any Rust binaries ). The source code for each binary is stored in source
, while the binaries being tested are in samples
.
The tests ensure that the structure of the parsed PE matches the data in tests.toml
.
What you think about this testing technique?
Code quality
Iβll provide examples from pe/mod.rs
and field.rs
, as they are crucial to understanding the structure of the project.
pe/mod.rs
use std::fs;
use std::io::{self, Write};
use crate::errors::FileParseError;
use crate::field::Field;
use crate::utils::{extract_u16, extract_u32, extract_u64};
pub mod header;
pub mod section;
/// Struct to define a PeFile from attr
pub struct PE {
pub buffer: Vec<u8>,
pub header: header::PeHeader,
pub sections: Vec<section::PeSection>,
}
impl PE {
/// Write `self.buffer` on file.
pub fn write_file(&self, output_path: &str) -> io::Result<()> {
let mut file: fs::File = fs::File::create(output_path)?;
file.write_all(&self.buffer)?;
Ok(())
}
/// Calculate the checksum for a PE file, ignoring the checksum field itself
///
/// # Examples
/// ```
/// use hexspell::pe::PE;
/// let pe = PE::from_file("tests/samples/sample1.exe").unwrap(); // Sample checksum has to be the correct
/// let calculed_check:u32 = pe.calc_checksum();
/// assert_eq!(pe.header.checksum.value, calculed_check);
/// ```
pub fn calc_checksum(&self) -> u32 {
let mut checksum: u64 = 0;
let len = self.buffer.len();
let mut i = 0;
while i < len {
// Skip checksum field
if i == self.header.checksum.offset {
i += self.header.checksum.size;
continue;
}
if i + 1 < len {
let word = u16::from_le_bytes([self.buffer[i], self.buffer[i + 1]]);
checksum += word as u64;
i += 2;
} else {
checksum += self.buffer[i] as u64;
break;
}
}
checksum = (checksum & 0xFFFF) + (checksum >> 16);
checksum += len as u64;
checksum = (checksum & 0xFFFF) + (checksum >> 16);
checksum = (checksum & 0xFFFF) + (checksum >> 16);
checksum as u32
}
/// Parses a PE file from a specified file path.
///
/// # Arguments
/// * `path` - A string slice that holds the path to the PE file.
///
/// # Returns
/// A `Result` that is either a `PeFile` on success, or a `FileParseError` on failure.
///
/// # Example
/// ```
/// use hexspell::pe::PE;
/// let pe_file = PE::from_file("tests/samples/sample1.exe").unwrap();
/// ```
pub fn from_file(path: &str) -> Result<PE, FileParseError> {
let data: Vec<u8> = fs::read(path).map_err(|e: std::io::Error| FileParseError::Io(e))?;
PE::from_buffer(data)
}
/// Parses a PE file from a byte vector.
///
/// # Arguments
/// * `buffer` - A byte vector containing the PE file data.
///
/// # Returns
/// A `Result` that is either a `PeFile` on success, or a `FileParseError` on failure.
///
/// # Example
/// ```
/// use hexspell::pe::PE;
/// let data = std::fs::read("tests/samples/sample1.exe").expect("Failed to read file");
/// let pe_file = PE::from_buffer(data).unwrap();
/// ```
pub fn from_buffer(buffer: Vec<u8>) -> Result<Self, FileParseError> {
if buffer.len() < 64 || buffer[0] != 0x4D || buffer[1] != 0x5A {
return Err(FileParseError::InvalidFileFormat);
}
let e_lfanew_offset = 0x3C;
let pe_header_offset = extract_u32(&buffer, e_lfanew_offset)? as usize;
let optional_header_offset = pe_header_offset + 24;
let optional_header_size = extract_u16(&buffer, pe_header_offset + 20)?;
let sections_offset = optional_header_offset + optional_header_size as usize;
let number_of_sections = extract_u16(&buffer, pe_header_offset + 6)?;
let header = header::PeHeader {
architecture: Field::new(
header::Architecture::from_u16(extract_u16(&buffer, pe_header_offset + 4)?)
.to_string(),
pe_header_offset + 4,
2,
),
entry_point: Field::new(
extract_u32(&buffer, pe_header_offset + 40)?,
pe_header_offset + 40,
4,
),
size_of_image: Field::new(
extract_u32(&buffer, pe_header_offset + 80)?,
pe_header_offset + 80,
4,
),
number_of_sections: Field::new(number_of_sections, pe_header_offset + 6, 2),
checksum: Field::new(
extract_u32(&buffer, pe_header_offset + 88)?,
pe_header_offset + 88,
4,
),
file_alignment: Field::new(
extract_u32(&buffer, optional_header_offset + 36)?,
optional_header_offset + 36,
4,
),
section_alignment: Field::new(
extract_u32(&buffer, optional_header_offset + 32)?,
optional_header_offset + 32,
4,
),
base_of_code: Field::new(
extract_u32(&buffer, optional_header_offset + 20)?,
optional_header_offset + 20,
4,
),
base_of_data: Field::new(
extract_u32(&buffer, optional_header_offset + 24)?,
optional_header_offset + 24,
4,
),
image_base: Field::new(
match extract_u16(&buffer, optional_header_offset)? {
0x10B => header::ImageBase::Base32(extract_u32(
&buffer,
optional_header_offset + 28,
)?),
0x20B => header::ImageBase::Base64(extract_u64(
&buffer,
optional_header_offset + 24,
)?),
_ => return Err(FileParseError::InvalidFileFormat),
},
optional_header_offset + 28,
8,
),
subsystem: Field::new(
extract_u16(&buffer, optional_header_offset + 68)?,
optional_header_offset + 68,
2,
),
dll_characteristics: Field::new(
extract_u16(&buffer, optional_header_offset + 70)?,
optional_header_offset + 70,
2,
),
size_of_headers: Field::new(
extract_u32(&buffer, optional_header_offset + 60)?,
optional_header_offset + 60,
4,
),
pe_type: match extract_u16(&buffer, optional_header_offset)? {
0x10B => header::PEType::PE32,
0x20B => header::PEType::PE32Plus,
_ => return Err(FileParseError::InvalidFileFormat),
},
};
let mut sections = Vec::with_capacity(number_of_sections as usize);
let mut current_offset = sections_offset;
for _ in 0..number_of_sections {
let section = section::PeSection::parse_section(&buffer, current_offset)?;
sections.push(section);
current_offset += 40;
}
Ok(PE {
buffer,
header,
sections,
})
}
// More code here ....
}
field.rs
#[derive(Debug, Clone, Copy)]
pub struct Field<T> {
pub value: T,
pub offset: usize,
pub size: usize,
}
impl<T> Field<T> {
pub fn new(value: T, offset: usize, size: usize) -> Self {
Field {
value,
offset,
size,
}
}
}
impl Field<u64> {
/// Updates the buffer at the specified offset with the new value for u64.
pub fn update(&mut self, buffer: &mut [u8], new_value: u64) {
self.value = new_value;
let bytes = new_value.to_le_bytes();
buffer[self.offset..self.offset + self.size].copy_from_slice(&bytes[..self.size]);
}
}
impl Field<u32> {
/// Updates the buffer at the specified offset with the new value for u32.
pub fn update(&mut self, buffer: &mut [u8], new_value: u32) {
self.value = new_value;
let bytes = new_value.to_le_bytes();
buffer[self.offset..self.offset + self.size].copy_from_slice(&bytes[..self.size]);
}
}
impl Field<u16> {
/// Updates the buffer at the specified offset with the new value for u16.
pub fn update(&mut self, buffer: &mut [u8], new_value: u16) {
self.value = new_value;
let bytes = new_value.to_le_bytes();
buffer[self.offset..self.offset + self.size].copy_from_slice(&bytes[..self.size]);
}
}
impl Field<String> {
/// Updates the buffer at the specified offset with the new UTF-8 encoded string.
pub fn update(&mut self, buffer: &mut [u8], new_value: &String) {
self.value = new_value.to_string();
let bytes: &[u8] = self.value.as_bytes();
if bytes.len() > self.size {
panic!("New string value exceeds the allocated field size.");
}
buffer[self.offset..self.offset + bytes.len()].copy_from_slice(bytes);
}
}
Issues
You can see how I document issues by checking out Adding a new section causes a part of the code of the first section to be overwritten. Β· Issue #4 Β· M3str3/HexSpell Β· GitHub or the issue #5.
Repository
Thank You
Thank you for taking the time to review my project. Any suggestions or feedback (preferably constructive ) are welcome. I'm relatively new to Rust and programming in general, with only two years of professional experience, so I greatly appreciate any advice you can offer.