I ended up representing the strings continuously using two vectors buffer: Vec<u8>
and splits: Vec<usize>
, to avoid fragmentation:
strings: ["foo", "bar", "baz"]
buffer: b"foobarbaz"
^ ^ ^
splits: [ 0, 3, 6 ]
Then, I used a vector links: Vec<Option<usize>>
to represent the concatenation. For example, given the input
2 1
3 0
2 3
(zero-based for simplicity), the process goes like follows.
# initially
link: [ None, None , None , None ]
# after processing "2 1", 2 is linked to 1
link: [ None, None , Some(1), None ]
# after processing "3 0", 3 is linked to 0
link: [ None, None , Some(1), Some(0) ]
# after processing "2 3", 1 is linked to 3
# (note that the previous links are followed)
link: [ None, Some(3), Some(1), Some(0) ]
I used the sum trick (with wrapping) to figure out which string is left at the end.
Here's the code:
use anyhow::{anyhow, Result};
use std::io::{prelude::*, BufWriter};
#[derive(Clone, Debug)]
struct Data {
buffer: Vec<u8>,
splits: Vec<usize>,
}
impl Data {
fn n_strings(&self) -> usize {
self.splits.len()
}
}
fn next_line(mut reader: impl BufRead) -> Result<String> {
let mut input = String::new();
reader.read_line(&mut input)?;
input.pop(); // remove newline character
Ok(input)
}
fn read_data(mut reader: impl BufRead) -> Result<Data> {
let n_strings: usize = next_line(&mut reader)?.parse()?;
let mut buffer = Vec::new();
let mut splits = Vec::with_capacity(n_strings);
for _ in 0..n_strings {
splits.push(buffer.len());
reader.read_until(b'\n', &mut buffer)?;
buffer.pop(); // remove newline character
}
Ok(Data { buffer, splits })
}
pub fn run(mut reader: impl BufRead, writer: impl Write) -> Result<()> {
let data = read_data(&mut reader)?;
let n_strings = data.n_strings();
let mut links = vec![None; n_strings];
let mut sum_of_unlinked_indexes = if n_strings % 2 == 0 {
(n_strings / 2).wrapping_mul(n_strings - 1)
} else {
((n_strings - 1) / 2).wrapping_mul(n_strings)
};
for _ in 1..data.n_strings() {
let next_line = next_line(&mut reader)?;
let mut indexes = next_line
.split_ascii_whitespace()
.map(|s| s.parse::<usize>());
let mut a = indexes.next().ok_or_else(|| anyhow!("wrong format"))?? - 1;
let b = indexes.next().ok_or_else(|| anyhow!("wrong format"))?? - 1;
while let Some(next) = links[a] {
a = next;
}
links[a] = Some(b);
sum_of_unlinked_indexes = sum_of_unlinked_indexes.wrapping_sub(b);
}
let mut writer = BufWriter::new(writer);
let mut link = Some(sum_of_unlinked_indexes);
while let Some(index) = link {
let split = data.splits[index];
let section = match data.splits.get(index + 1) {
Some(next_split) => &data.buffer[split..*next_split],
None => &data.buffer[split..],
};
writer.write_all(section)?;
link = links[index];
}
writer.write_all(b"\n")?;
Ok(())
}
There are a lot of things to clean up in the code; I'm feeling lazy.
Here's the benchmark (I/O included) on a file with 10^5
strings, each of length 10
, that I generated randomly:
test tests::bench ... bench: 55,350,300 ns/iter (+/- 9,862,579)
~0.06 seconds isn't too bad, I would say.