I was working with nom to write an assembler, and found that some of the parsers return a tuple of my output type, &str. This made the code either very obnoxious to work with, or required cloning each string slice into an owned String to concatenate the tuple of string slices.
So I wrote this abomination to transform a pair of string slices into a single string slice:
#![deny(clippy::all)]
#![deny(clippy::pedantic)]
/// Merge two string slices into one.
///
/// # Panics
///
/// This function will panic if the `start` and `end` string slices are not in contiguous memory.
pub fn merge_str<'a>(start: &'a str, end: &'a str) -> &'a str {
    // Safety:
    // We are guaranteeing that the string slices are in contiguous memory, and that the resulting
    // string slice will contain valid UTF-8.
    unsafe {
        // Ensure string slices are in contiguous memory
        if start.as_ptr().add(start.len()) != end.as_ptr() {
            panic!("String slices must be in contiguous memory");
        }
        // Convert the two string slices into a single byte slice
        let s = std::slice::from_raw_parts(start.as_ptr(), start.len() + end.len());
        // Convert the byte slice into a string slice
        std::str::from_utf8_unchecked(s)
    }
}
#[cfg(test)]
mod tests {
    use super::*;
    use std::panic::catch_unwind;

    #[test]
    fn test_merge_str() {
        let s = "Hello, world!";
        let t = "Hello, world.";
        assert_eq!(merge_str(&s[..4], &s[4..]), s);
        assert_eq!(merge_str(&s[..0], &s[..0]), &s[..0]);
        assert_eq!(merge_str(&s[..0], &s[..1]), &s[..1]);
        assert_eq!(merge_str(&s[..1], &s[1..1]), &s[..1]);
        // Weird edge case: this should panic but does not, apparently because the two
        // string literals end up back to back in memory, so the contiguity check passes.
        // assert!(catch_unwind(|| merge_str(&s, &t)).is_err());
        assert!(catch_unwind(|| merge_str(&t, &s)).is_err());
        assert!(catch_unwind(|| merge_str(&s, &s)).is_err());
        assert!(catch_unwind(|| merge_str(&s[..4], &s[5..])).is_err());
        assert!(catch_unwind(|| merge_str(&s[..5], &s[4..])).is_err());
        assert!(catch_unwind(|| merge_str(&s[..4], &t[4..])).is_err());
    }
}
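The commented-out edge case above suggests that a pointer comparison alone cannot tell whether two slices came from the same string. One way around that, sketched here under the assumption that the caller still has the original input at hand, is to pass that input in and re-slice it with ordinary indexing. The name `merge_within` and its `Option` return type are hypothetical and not part of the code above.

```rust
/// Hypothetical safe variant: merge two adjacent sub-slices of `src`.
///
/// Returns `None` if either slice lies outside `src`, or if `end` does not
/// begin exactly where `start` stops.
pub fn merge_within<'a>(src: &'a str, start: &str, end: &str) -> Option<&'a str> {
    // Addresses are only compared as integers; nothing is dereferenced.
    let base = src.as_ptr() as usize;
    let limit = base + src.len();
    let s = start.as_ptr() as usize;
    let e = end.as_ptr() as usize;

    // Both slices must lie entirely inside `src`.
    if s < base || s + start.len() > limit || e < base || e + end.len() > limit {
        return None;
    }
    // `end` must begin exactly where `start` stops.
    if s + start.len() != e {
        return None;
    }

    // Re-slice the parent string; `get` also re-checks the char boundaries.
    let from = s - base;
    src.get(from..from + start.len() + end.len())
}
```

Because the result is always produced by slicing `src`, no `unsafe` is needed, and two unrelated strings that happen to sit next to each other in memory cannot be merged by accident. The cost is having to thread the original input through to the call site.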
`merge_str` maintains zero-copy parsing semantics and allows writing parsers like this:
// Assumed imports: the snippet did not show them, and the `complete` variants are a guess.
use nom::{
    bytes::complete::{tag, take_while, take_while_m_n},
    sequence::{pair, terminated},
    IResult,
};

#[derive(Debug, PartialEq)]
pub enum Inst<'a> {
    GlobalLabel(&'a str),
}

fn is_word_start(input: char) -> bool {
    input.is_ascii_alphabetic() || input == '_'
}

fn is_word(input: char) -> bool {
    input.is_ascii_alphanumeric() || input == '_'
}

/// Recognize global labels.
fn global_label(input: &str) -> IResult<&str, Inst> {
    terminated(
        pair(take_while_m_n(1, 1, is_word_start), take_while(is_word)),
        tag(":"),
    )(input)
    .map(|(rest, (start, end))| (rest, Inst::GlobalLabel(merge_str(start, end))))
}