Parsing multiple str's into various types, am I doing it right?

Please tell me if I'm doing it correctly & elegantly enough or not.

I had this code that extracts XML attributes into a HashMap and then assembles them into an object. Some members must be parsed into their types (i64, u32, bool).

The structs (are the same in all iterations)

pub type ParsedAttrs = HashMap<Arc<str>, Arc<str>>;

#[derive(Debug, Clone)]
pub struct Attrs {
	pub id: Option<i64>,
	pub timestamp: Option<Arc<str>>,
	pub uid: Option<i64>,
	pub user: Option<Arc<str>>,
	pub visible: Option<bool>,
	pub deleted: Option<bool>,
	pub version: Option<u32>,
	pub changeset: Option<u64>,
}

(I need Optional members, because I want to create new Attrs and leave some members empty.)

Conversion code, version 1:

impl TryFrom<&ParsedAttrs> for Attrs {
	type Error = ParseError;
	fn try_from(pa: &ParsedAttrs) -> Result<Self, Self::Error> {
		Ok(Self {
			changeset: match pa.get("changeset") {
				Some(v) => Some(v.parse()?),
				None => None
			},
			deleted: match pa.get("deleted") {
				Some(v) => Some(v.parse()?),
				None => None
			},
			id: match pa.get("id") {
				None => None,
				Some(v) => Some(v.parse()?)
			},
			timestamp: match pa.get("timestamp") {
				Some(v) => Some(v.clone()),
				None => None
			},
			uid: match pa.get("uid") {
				Some(v) => Some(v.parse()?),
				None => None
			},
			user: match pa.get("user") {
				Some(v) => Some(v.clone()),
				None => None
			},
			version: match pa.get("version") {
				Some(v) => Some(v.parse()?),
				None => None
			},
			visible: match pa.get("visible") {
				Some(v) => Some(v.parse()?),
				None => None
			},
		})
	}
}

This is very repetitive, but you can't put it into a loop or a function, because only several members should be parsed.

So I made a generic function that accepts Fn.

Version 2:

impl TryFrom<&ParsedAttrs> for Attrs {
	type Error = ParseError;
	fn try_from(pa: &ParsedAttrs) -> Result<Self, Self::Error> {

		Ok(Self {
			changeset: opt_attr(pa, "changeset", try_parse)?,
			deleted: opt_attr(pa, "deleted", try_parse)?,
			id: opt_attr(pa, "id", try_parse)?,
			timestamp: opt_attr(pa, "timestamp", do_nothing)?,
			uid: opt_attr(pa, "uid", try_parse)?,
			user: opt_attr(pa, "user", do_nothing)?,
			version: opt_attr(pa, "version", try_parse)?,
			visible: opt_attr(pa, "visible", try_parse)?,
		})
	}
}

fn opt_attr<T, F>(pa: &ParsedAttrs, k: &str, postprocess: F) -> Result<Option<T>, ParseError>
	where F: Fn(Arc<str>) -> Result<T, ParseError>
{
	match pa.get(k) {
		Some(v) => Ok(Some(postprocess(v.clone())?)),
		None => Ok(None)
	}
}

fn do_nothing<E>(a: Arc<str>) -> Result<Arc<str>, E> {
	Ok(a)
}

fn try_parse<T>(v: Arc<str>) -> Result<T, ParseError>
where T: FromStr, ParseError: From<<T as FromStr>::Err> {
                                 // ^^^^^^^^^^^^^^^^^^ compiler suggested this
	Ok(v.parse::<T>()?)
}

I thought Option::transpose() was inappropriate, but then saw it would do what I needed.

pa.get() returns Option<Arc<str>>, then .map() turns it into Option<Result<T>>, which can be turned into Result<Option<T>, E>.

Version 3:

impl TryFrom<&ParsedAttrs> for OsmElementAttrs {
	type Error = ParseError;
	fn try_from(pa: &ParsedAttrs) -> Result<Self, Self::Error> {

		Ok(Self {
			changeset: 	pa.get("changeset")	.map(|v| v.parse::<u64>())	.transpose()?,
			deleted: 	pa.get("deleted")	.map(|v| v.parse::<bool>())	.transpose()?,
			id: 		pa.get("id")		.map(|v| v.parse::<i64>())	.transpose()?,
			timestamp: 	pa.get("timestamp")	.map(|v| v.clone()),
			uid: 		pa.get("uid")		.map(|v| v.parse::<i64>())	.transpose()?,
			user: 		pa.get("user")		.map(|v| v.clone()),
			version: 	pa.get("version")	.map(|v| v.parse::<u32>())	.transpose()?,
			visible: 	pa.get("visible")	.map(|v| v.parse::<bool>())	.transpose()?,
		})
	}
}

Can this be compacted further?

I like version 2 the most, but I would inline postprocess since there's only two option anyway. e.g.

fn opt_attr_parse<T>(pa: &ParsedAttrs, k: &str) -> Result<Option<T>, Error> { ... }
fn opt_attr(pa: &ParsedAttrs, k: &str) -> Option<Arc<str>> { ... }
1 Like

You could use the and_then method of Option.

impl TryFrom<&ParsedAttrs> for Attrs {
	type Error = ();
	fn try_from(pa: &ParsedAttrs) -> Result<Self, Self::Error> {
		Ok(Self {
			id: pa.get("id").and_then(|i| i.parse().ok()),
			changeset: pa.get("changeset").and_then(|i| i.parse().ok()),
			deleted: pa.get("deleted").and_then(|i| i.parse().ok()),
			timestamp: pa.get("timestamp").map(Arc::clone),
			uid: pa.get("uid").and_then(|i| i.parse().ok()),
			user: pa.get("user").map(Arc::clone),
			version: pa.get("version").and_then(|i| i.parse().ok()),
			visible: pa.get("visible").and_then(|i| i.parse().ok()),
		})
	}
}

It doesn't report the error, but my brain still isn't fully on. If you want to keep the parse error I believe you should use ok_or_else.

But, then again, could you not deserialize directly into this struct with serde ?

Edit: simply using map and transpose is enough.

5 Likes
  1. I thought of halting the process immediately if there's any error, that's why I tried to propagate it. But it may be a better idea to collect all attribute/value errors and output them in the end, so that user can fix all the errors in XML (e.g. if it's generated by a script).
  2. I wanted to parse huge files, of several hundred megs, and serde will use xml-rs which is slower and unmaintaned.

quick-xml is maintained, supports serde and an events based API and claims pretty good performance, though I've never used XML (hence never used any XML library) in any rust project so I can't say...

3 Likes

Oh, I actually use quick-xml for this, and I didn't notice that it supports serde. Thanks a lot!

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.