Hi everyone.
TLDR: I'm writing the xml support for rust-serde. I'd like some comments on what others expect from Rust<->Xml conversions.
Now, I'm not an expert when it comes to the xml-standard and xsd-schemata. I'm making most of this up as I go.
Some information about xml:
- every xml document must have exactly one root element
- an xml document usually starts with a declaration giving the xml version, the encoding and so on (it is optional in xml 1.0, so a document may also start directly with the root element)
  - looks like
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
  - I don't think that's relevant for Rust
  - I'm simply assuming it's always utf-8
  - I haven't implemented reading that
  - I'm probably going to simply ignore such a prefix
- xsd-schemata allow the description of xml documents
  - sequences, optional elements, choices between elements...
  - ordered and unordered child-elements (basically struct fields in order or out of order)
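Ignoring the declaration could be as simple as the following sketch. The function name `skip_xml_declaration` is made up for illustration and is not part of the serde xml code. The sketch assumes the input is already a UTF-8 `&str` and does not look at the declared encoding at all.

```rust
/// Skips a leading `<?xml ... ?>` declaration, if there is one, and returns
/// the rest of the input. The declared version and encoding are ignored.
/// Assumption: leading whitespace is tolerated, which is more lenient than
/// the xml standard.
fn skip_xml_declaration(input: &str) -> &str {
    let trimmed = input.trim_start();
    if let Some(rest) = trimmed.strip_prefix("<?xml") {
        // Require whitespace after `<?xml` so that other processing
        // instructions such as `<?xml-stylesheet ...?>` are left alone.
        if rest.starts_with(char::is_whitespace) {
            if let Some(end) = rest.find("?>") {
                // Drop everything up to and including the closing `?>`.
                return rest[end + 2..].trim_start();
            }
        }
    }
    trimmed
}
```

Called on the declaration shown above followed by `<Outer/>`, this returns just `<Outer/>`. Input without a declaration is returned unchanged apart from leading whitespace.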
Current state
I can parse any int, float, char or string from an xml like <start_tag>value</end_tag>. Note: start_tag and end_tag can be arbitrary, since xml does not support bare simple values: a value always has to be wrapped in some element.
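To make the idea concrete, here is a rough standalone sketch of parsing a wrapped simple value. It is not the actual serde deserializer. `parse_simple` is a hypothetical helper that relies on `FromStr`, ignores the tag names entirely and does no entity unescaping.

```rust
use std::str::FromStr;

/// Parses `<any_name>value</any_name>` into any type implementing `FromStr`.
/// The tag names are neither inspected nor compared with each other.
fn parse_simple<T: FromStr>(xml: &str) -> Option<T> {
    let xml = xml.trim();
    if !xml.starts_with('<') {
        return None;
    }
    // Skip the start tag: everything up to and including the first `>`.
    let start_tag_end = xml.find('>')?;
    let rest = &xml[start_tag_end + 1..];
    // The value ends where the end tag begins.
    let value_end = rest.rfind("</")?;
    rest[..value_end].parse().ok()
}
```

With this, `parse_simple::<i32>("<a>42</a>")` gives `Some(42)`, and mismatched names as in `<x>1.5</y>` are accepted just the same.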
I can parse structs containing struct fields whose types can be sequences (any tuple, [T;N], Vec), ints, floats, chars, strings, Options, or other structs. An example:
#[derive(PartialEq, Debug, Serialize, Deserialize)]
struct Inner {
    a: (),
    b: (usize, String, i8),
    c: Vec<String>,
}

#[derive(PartialEq, Debug, Serialize, Deserialize)]
struct Outer {
    inner: Option<Inner>,
}

Outer {
    inner: Some(Inner {
        a: (),
        b: (2, "boom".to_string(), 88),
        c: vec![
            "abc".to_string(),
            "xyz".to_string(),
        ]
    })
}

<Outer>
    <inner>
        <c>abc</c>
        <c>xyz</c>
        <a/>
        <b>2</b>
        <b>boom</b>
        <b>88</b>
    </inner>
</Outer>
As you can see, sequences in xml are simply the same element repeated over and over (with changing contents).
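A small sketch of what "the same element repeated" means for a reader of the document: gather every element with a given name, in document order. `collect_repeated` is a made-up helper, not the real implementation. It assumes flat children without attributes or nesting.

```rust
/// Collects the text of every `<name>text</name>` element found in `body`,
/// in document order. Assumption: the elements carry no attributes and are
/// not nested inside elements of the same name.
fn collect_repeated<'a>(body: &'a str, name: &str) -> Vec<&'a str> {
    let open = format!("<{}>", name);
    let close = format!("</{}>", name);
    let mut items = Vec::new();
    let mut rest = body;
    while let Some(start) = rest.find(open.as_str()) {
        let after_open = &rest[start + open.len()..];
        match after_open.find(close.as_str()) {
            Some(end) => {
                items.push(&after_open[..end]);
                // Continue searching after this element's end tag.
                rest = &after_open[end + close.len()..];
            }
            // Unterminated element: stop collecting.
            None => break,
        }
    }
    items
}
```

For the contents of `<inner>` above, asking for `c` yields `abc` and `xyz`, and asking for `b` yields `2`, `boom` and `88`, which then still have to be parsed into the tuple's element types.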
Future
Xml-Attributes
An encoded instance of struct A { x: String } could look like <A x="foo" /> instead of <A><x>foo</x></A>
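As a sketch of how both encodings could map to the same value, the following accepts either form. The helpers `attribute` and `parse_a` are hypothetical. They only handle double-quoted attribute values and do not check the tag name.

```rust
#[derive(Debug, PartialEq)]
struct A {
    x: String,
}

/// Looks up `name="value"` inside a single start tag.
/// Assumption: values are double-quoted and contain no escaped quotes.
fn attribute<'a>(start_tag: &'a str, name: &str) -> Option<&'a str> {
    let needle = format!(" {}=\"", name);
    let start = start_tag.find(needle.as_str())? + needle.len();
    let len = start_tag[start..].find('"')?;
    Some(&start_tag[start..start + len])
}

/// Accepts both `<A x="foo" />` and `<A><x>foo</x></A>`.
fn parse_a(xml: &str) -> Option<A> {
    let xml = xml.trim();
    let tag_end = xml.find('>')?;
    let start_tag = &xml[..=tag_end];
    // Attribute form: the field lives inside the start tag.
    if let Some(x) = attribute(start_tag, "x") {
        return Some(A { x: x.to_string() });
    }
    // Element form: the field is a child element.
    let body = &xml[tag_end + 1..];
    let value_start = body.find("<x>")? + "<x>".len();
    let value_len = body[value_start..].find("</x>")?;
    Some(A { x: body[value_start..value_start + value_len].to_string() })
}
```

Both `<A x="foo" />` and `<A><x>foo</x></A>` come out as `A { x: "foo" }` here. Whether the attribute form should win when both are present is an open design choice.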
Rust-Enums
The enum kind could be encoded as its own tag or just as an attribute of the outer tag:
<A>
    <x xsi:type="Cake" />
    <x><Cake/></x>
</A>
The issue is that using a special attribute would interfere with struct fields being encoded as attributes. It would be a requirement that the xsi:type attribute always comes first. Otherwise we cannot decide which enum type it is before trying to parse the enum contents, and that would require unbounded lookahead (up to the next >).
Encoding the enum kind as its own tag is the clean way when looking at it from a parser-designing point of view. But it would be incompatible with xsd.
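The two encodings can be compared in a small sketch. The `Dessert` enum with its `Pie` variant and all function names are invented for illustration. Note how the attribute version has to search the whole start tag up to the `>`, while the tag version only reads the next tag name.

```rust
#[derive(Debug, PartialEq)]
enum Dessert {
    Cake,
    Pie, // hypothetical second variant, only here to have a choice
}

fn variant_from_name(name: &str) -> Option<Dessert> {
    match name {
        "Cake" => Some(Dessert::Cake),
        "Pie" => Some(Dessert::Pie),
        _ => None,
    }
}

/// `<x xsi:type="Cake" />`: the kind is an attribute of the outer tag.
/// Assumption: no attribute value contains a `>`.
fn from_attribute(xml: &str) -> Option<Dessert> {
    // The attribute may sit anywhere before the first `>`, so the whole
    // start tag has to be searched before the kind is known.
    let start_tag = &xml[..xml.find('>')?];
    let needle = "xsi:type=\"";
    let value_start = start_tag.find(needle)? + needle.len();
    let value_len = start_tag[value_start..].find('"')?;
    variant_from_name(&start_tag[value_start..value_start + value_len])
}

/// `<x><Cake/></x>`: the kind is the name of the first child tag.
fn from_child_tag(xml: &str) -> Option<Dessert> {
    // Skip the outer start tag, then read the name of the next tag.
    let body = &xml[xml.find('>')? + 1..];
    let child = body.trim_start().strip_prefix('<')?;
    let name_len = child.find(|c: char| c == '/' || c == '>' || c.is_whitespace())?;
    variant_from_name(&child[..name_len])
}
```

If `xsi:type` were guaranteed to be the first attribute, `from_attribute` could stop after reading one attribute instead of scanning the full start tag.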
check all closing tags
Currently closing tags (</foo>) are not compared to their opening tags to see if the names match. This is just a nice-to-have, but isn't possible yet in serde without doing heap allocations.
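For reference, outside of serde the check itself is a plain stack of open tag names. The sketch below is standalone and uses a hypothetical `tags_balanced` function. The `Vec` it uses is exactly the kind of heap allocation mentioned above. It assumes that neither comments nor attribute values contain a `>`.

```rust
/// Returns true if every closing tag matches the most recent unclosed
/// opening tag and nothing is left open at the end.
fn tags_balanced(xml: &str) -> bool {
    // Names of the currently open elements; this stack lives on the heap.
    let mut open: Vec<&str> = Vec::new();
    let mut rest = xml;
    while let Some(lt) = rest.find('<') {
        let after_lt = &rest[lt + 1..];
        let gt = match after_lt.find('>') {
            Some(i) => i,
            None => return false, // a `<` without a matching `>`
        };
        let tag = &after_lt[..gt];
        rest = &after_lt[gt + 1..];
        if tag.starts_with('?') || tag.starts_with('!') {
            continue; // declaration, processing instruction, comment, doctype
        }
        if let Some(name) = tag.strip_prefix('/') {
            // Closing tag: must match the innermost open element.
            if open.pop() != Some(name.trim()) {
                return false;
            }
        } else if !tag.ends_with('/') {
            // Opening tag: the name ends at the first whitespace.
            open.push(tag.split_whitespace().next().unwrap_or(""));
        }
        // Self-closing tags such as `<a/>` need no bookkeeping.
    }
    open.is_empty()
}
```

This accepts `<Outer><inner><a/></inner></Outer>` and rejects `<start_tag>value</end_tag>`.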
serialization
Once the deserializer does its job for all relevant Rust types, it would be nice to also serialize stuff to xml.
mixed content
example xml:
<root>hi <b>you</b><i>!</i></root>
I have no idea what I should do with that. Is that a sequence of enum Mixed<T> { Text(String), Element(T) }? Or should this be the String "hi <b>you</b><i>!</i>"?
Root element
In the case of structs, should the root element be named after the struct? Or do we not really care about the name, since the parser doesn't need it to figure out what it's parsing?
Not deserializable Xml
Can you think of any xml that could not be deserialized to a Rust type but should be?