I am trying to parse XML with quick_xml into a struct without allocating. I think this is a minimal reproducible example (MRE) of my problem:
enum Results {
    One,
    Two,
}
struct Reader<R> {
    reader: R,
}
impl Reader<&[u8]> {
    fn read_into<'a>(&mut self, buf: &'a mut Vec<u8>) -> Results {
        todo!()
    }
    fn read_text(&mut self) -> &str {
        ""
    }
}
struct Doc<'a> {
    id: &'a str,
}
fn read_root<'a>(rd: &'a mut Reader<&'a [u8]>, doc: &'a mut Doc<'a>) {
    let mut buf: Vec<u8> = Vec::new();
    loop {
        match rd.read_into(&mut buf) {
            Results::One => {
                read_child_1(rd, doc);
            }
            Results::Two => {
                read_child_2(rd);
            }
        }
        buf.clear();
    }
}
fn read_child_1<'a>(rd: &'a mut Reader<&'a [u8]>, doc: &'a mut Doc<'a>) {
    doc.id = rd.read_text();
}
fn read_child_2(rd: &mut Reader<&[u8]>) {}
The compiler complains about multiple mutable borrows when the call to read_child_1 is present, but not when it is commented out. The only difference from read_child_2 is the second argument. What am I missing here?
The problem is caused by your definition of read_child_1, which conflicts with the definition of the Doc struct:
fn read_child_1<'a>(rd: &'a mut Reader<&'a [u8]>, doc: &'a mut Doc<'a>) {
    doc.id = rd.read_text();
}
Here you are saying that you are borrowing Doc for a lifetime 'a, but this lifetime comes from the point where you borrow it:
match rd.read_into(&mut buf) {
    Results::One => {
        read_child_1(rd, doc);
    }
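This effectively forces the 'a lifetime declared on read_child_1 to equal the 'a lifetime declared on read_root. The mutable borrows of rd and doc taken in one loop iteration then last for all of 'a, so the next iteration cannot borrow them again.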
I would start by not making Doc hold a lifetime.
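To illustrate the conflict described above, here is a minimal sketch. The names set_id, set_id_locked and fill are hypothetical and not part of the original code; the sketch assumes the text comes from a plain &str rather than from a reader. It shows that the trouble is the shared lifetime on the outer &mut, not the lifetime inside Doc:

```rust
struct Doc<'a> {
    id: &'a str,
}

// Problematic shape, left as a comment: it borrows `doc` mutably for all of 'a,
// so calling it from the loop in `fill` below would be rejected.
// fn set_id_locked<'a>(doc: &'a mut Doc<'a>, text: &'a str) { doc.id = text; }

// Decoupled shape: the `&mut` borrow of `doc` gets its own short lifetime,
// and only the stored text is tied to 'a.
fn set_id<'a>(doc: &mut Doc<'a>, text: &'a str) {
    doc.id = text;
}

// Calls the setter repeatedly; `doc` is reborrowed for each call only.
fn fill<'a>(doc: &mut Doc<'a>, source: &'a str) {
    for word in source.split_whitespace() {
        set_id(doc, word);
    }
}
```

With the decoupled signature, Doc can keep borrowing from the source while still being passed to several functions in a row.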
Can you explain? Where exactly should I do that?
Yes, that I know. Changing everything to owned types fixes everything. But this is not what I want. I don't want to allocate. I already have an owned XML string, I don't want to create a second copy of it.
Also, if I could process everything in the "parent" function that would work too. But my XML is very complex and I need to call multiple "child" functions to process it.
If zero allocation is impossible in my case, it is fine; I just want to make sure it is, before giving up on it.
vague, October 16, 2023, 3:57pm
As pointed out above, &'a mut Struct<'a> should never be used in Rust. So the simplest solution is to stop reusing one lifetime in every position, which removes most of the trouble.
If Doc has to keep a lifetime, the annotations should be:
fn read_root<'id>(rd: &'id mut Reader<&[u8]>, doc: &mut Doc<'id>) {
    let mut buf: Vec<u8> = Vec::new();
    // loop {
    ...
    // }
}
fn read_child_1<'id>(rd: &'id mut Reader<&'_ [u8]>, doc: &'_ mut Doc<'id>) { doc.id = rd.read_text(); }
For simplicity, here is code that compiles without the loop: Rust Playground
Then, based on the code above, one pattern that works with a loop is: Rust Playground
fn read_root(rd: &mut Reader<&[u8]>) {
    let mut buf: Vec<u8> = Vec::new();
    loop {
        match rd.read_into(&mut buf) {
            Results::One => {
                // Placeholder value; the Doc only lives for the temporary lifetime 'id.
                let mut doc = Doc { id: "" };
                read_child_1(/*&'id mut*/rd, &mut doc); // &mut Reader<&[u8]> reborrows with 'id
            }
            Results::Two => {
                read_child_2(rd);
            }
        }
        buf.clear();
    }
}
fn read_child_1<'id>(rd: &'id mut Reader<&'_ [u8]>, doc: &'_ mut Doc<'id>) {
    doc.id = rd.read_text();
}
I guess this is not the code you were hoping for, but it may be the best you can get with &mut.
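A runnable sketch of this per-iteration pattern might look as follows. The body of read_text (text up to the next `<`) and the helper total_id_len are assumptions made for illustration; they are not quick_xml behavior and not part of the original code:

```rust
struct Reader<R> {
    reader: R,
}

impl Reader<&[u8]> {
    // Elided lifetimes: the returned &str is tied to the `&mut self` borrow,
    // as in the original `read_text`.
    fn read_text(&mut self) -> &str {
        let input = self.reader;
        // Assumed rule: a text run ends at the next '<' or at end of input.
        let end = input.iter().position(|&b| b == b'<').unwrap_or(input.len());
        let next = if end < input.len() { end + 1 } else { end };
        self.reader = &input[next..];
        std::str::from_utf8(&input[..end]).unwrap_or("")
    }
}

struct Doc<'id> {
    id: &'id str,
}

fn read_child_1<'id>(rd: &'id mut Reader<&[u8]>, doc: &mut Doc<'id>) {
    doc.id = rd.read_text();
}

// Hypothetical driver: each iteration builds a short-lived Doc, so the
// reborrow of `rd` ends before the next iteration starts.
fn total_id_len(rd: &mut Reader<&[u8]>) -> usize {
    let mut total = 0;
    while !rd.reader.is_empty() {
        let mut doc = Doc { id: "" };
        read_child_1(rd, &mut doc);
        total += doc.id.len(); // use the borrowed text before `doc` goes away
    }
    total
}
```

The limitation is visible in total_id_len: anything taken from doc has to be consumed inside the iteration, because the Doc cannot outlive the reborrow of rd.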
You can go a little further down the & road: since & is Copy, &'id can be used everywhere (at the cost of interior/shared mutability): Rust Playground
struct Reader<R> {
    reader: Mutex<R>,
}
fn read_root<'id>(rd: &'id Reader<&[u8]>, doc: &mut Doc<'id>) {
    let mut buf: Vec<u8> = Vec::new();
    loop {
        match rd.read_into(&mut buf) {
            Results::One => {
                read_child_1(rd, doc);
            }
            Results::Two => {
                read_child_2(rd);
            }
        }
        buf.clear();
    }
}
fn read_child_1<'id>(rd: &'id Reader<&'_ [u8]>, doc: &'_ mut Doc<'id>) {
    doc.id = rd.read_text();
}
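The snippet above leaves out the methods that take &self. A minimal sketch of what read_text could look like with the Mutex is shown below; the parsing rule (text up to the next `<`) is an assumption for illustration only, not quick_xml behavior:

```rust
use std::sync::Mutex;

struct Reader<R> {
    reader: Mutex<R>,
}

impl Reader<&[u8]> {
    // Takes `&self`: the cursor is advanced through the Mutex, so a shared
    // reference to the reader is enough.
    fn read_text(&self) -> &str {
        let mut guard = self.reader.lock().unwrap();
        // `&[u8]` is Copy, so this copies the slice reference out of the guard.
        let input: &[u8] = *guard;
        let end = input.iter().position(|&b| b == b'<').unwrap_or(input.len());
        let next = if end < input.len() { end + 1 } else { end };
        *guard = &input[next..];
        std::str::from_utf8(&input[..end]).unwrap_or("")
    }
}

struct Doc<'id> {
    id: &'id str,
}

fn read_child_1<'id>(rd: &'id Reader<&'_ [u8]>, doc: &'_ mut Doc<'id>) {
    doc.id = rd.read_text();
}
```

Because rd is a shared reference, it can be handed to any number of child functions while doc still holds text obtained through it. In single-threaded code, std::cell::Cell or RefCell would likely serve the same purpose without locking.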
Overlapping with @vague's answer (they were faster than me), but here is a slightly different working version I got in the playground: Rust Playground
enum Results {
    One,
    Two,
}
struct Reader<R> {
    reader: R,
}
impl<'a> Reader<&'a [u8]> {
    fn read_into(&mut self, _buf: &mut Vec<u8>) -> Results {
        todo!()
    }
    fn read_text(&mut self) -> &'a str {
        ""
    }
}
struct Doc<'a> {
    id: &'a str,
}
fn read_root<'a>(rd: &mut Reader<&'a [u8]>, doc: &mut Doc<'a>) {
    let mut buf: Vec<u8> = Vec::new();
    loop {
        let results = rd.read_into(&mut buf);
        match results {
            Results::One => {
                read_child_1(rd, doc);
            }
            Results::Two => {
                read_child_2(rd);
            }
        }
        buf.clear();
    }
}
fn read_child_1<'a>(rd: &mut Reader<&'a [u8]>, doc: &mut Doc<'a>) {
    doc.id = rd.read_text();
}
fn read_child_2(_rd: &mut Reader<&[u8]>) {}
Reiterating this: beware of &'a mut Struct<'a>
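To see that this shape really is zero-copy, here is a small sketch with a non-trivial read_text. The parsing rule (text up to the next `<`), the extra title field, and the functions read_id and read_title are hypothetical additions for illustration; only the signatures follow the version above:

```rust
struct Reader<R> {
    reader: R,
}

impl<'a> Reader<&'a [u8]> {
    // The result borrows from the source buffer ('a), not from `&mut self`.
    fn read_text(&mut self) -> &'a str {
        let input: &'a [u8] = self.reader;
        // Assumed rule: a text run ends at the next '<' or at end of input.
        let end = input.iter().position(|&b| b == b'<').unwrap_or(input.len());
        let next = if end < input.len() { end + 1 } else { end };
        self.reader = &input[next..];
        std::str::from_utf8(&input[..end]).unwrap_or("")
    }
}

struct Doc<'a> {
    id: &'a str,
    title: &'a str, // hypothetical second field
}

fn read_id<'a>(rd: &mut Reader<&'a [u8]>, doc: &mut Doc<'a>) {
    doc.id = rd.read_text();
}

fn read_title<'a>(rd: &mut Reader<&'a [u8]>, doc: &mut Doc<'a>) {
    doc.title = rd.read_text();
}

// Parent function: both `rd` and `doc` are reborrowed for each child call.
fn read_root<'a>(rd: &mut Reader<&'a [u8]>, doc: &mut Doc<'a>) {
    read_id(rd, doc);
    read_title(rd, doc);
}
```

Since the &mut borrows have their own short lifetimes, the parent can call as many child functions as it needs, and every &str stored in Doc points straight into the original XML buffer.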
Thank you; looks promising. Let me try this with real code.
Beautiful! I used this signature for both parent and child functions and everything works great now:
fn <'a>(rd: &mut Reader<&'a [u8]>, doc: &mut Doc<'a>)