Feedback request: yaml-rust: The missing YAML 1.2 parser for pure Rust

yaml-rust

The missing YAML 1.2 implementation for Rust. Actually, as far as I know, it is currently the ONLY working YAML parser for Rust 1.0.0-final.

yaml-rust is a pure Rust YAML 1.2 implementation with no FFI and no crate dependencies, so it benefits from the memory safety and other guarantees of the Rust language. The parser is heavily influenced by libyaml and yaml-cpp.

It also provides a Ruby-like API for common YAML document access.

Links

Homepage

Source code & Document

Specification compliance

The parser can correctly parse almost all examples in the specification; see the tests.

Example

extern crate yaml_rust;
use yaml_rust::{YamlLoader, YamlEmitter};

fn main() {
    let s =
"
foo:
    - list1
    - list2
bar:
    - 1
    - 2.0
";
    let docs = YamlLoader::load_from_str(s).unwrap();

    // Multi document support, doc is a yaml::Yaml
    let doc = &docs[0];

    // Debug support
    println!("{:?}", doc);

    // Index access for map & array
    assert_eq!(doc["foo"][0].as_str().unwrap(), "list1");
    assert_eq!(doc["bar"][1].as_f64().unwrap(), 2.0);

    // Chained key/array access is checked and won't panic;
    // it returns BadValue if the key or index does not exist.
    assert!(doc["INVALID_KEY"][100].is_badvalue());
    
    // Dump the YAML object
    let mut out_str = String::new();
    {
        let mut emitter = YamlEmitter::new(&mut out_str);
        emitter.dump(doc).unwrap(); // dump the YAML object to a String
    }
    println!("{}", out_str);
}
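
As a rough illustration of how checked, chainable index access like `doc["INVALID_KEY"][100]` can avoid panicking, here is a minimal self-contained sketch. The `Node` type, the `BAD_VALUE` static, and `sample_doc` are hypothetical stand-ins written for this example, not the library's actual definitions: every failed lookup returns a reference to one shared sentinel, so further indexing on it simply yields the sentinel again.

```rust
use std::collections::HashMap;
use std::ops::Index;

// Hypothetical stand-in for a parsed YAML node (not yaml-rust's real type).
#[derive(Debug, PartialEq)]
pub enum Node {
    Str(String),
    Array(Vec<Node>),
    Map(HashMap<String, Node>),
    BadValue,
}

// One shared sentinel that every failed lookup points at.
static BAD_VALUE: Node = Node::BadValue;

impl<'a> Index<&'a str> for Node {
    type Output = Node;

    // Key lookup: anything that is not a map, or lacks the key, yields BadValue.
    fn index(&self, key: &'a str) -> &Node {
        match self {
            Node::Map(m) => m.get(key).unwrap_or(&BAD_VALUE),
            _ => &BAD_VALUE,
        }
    }
}

impl Index<usize> for Node {
    type Output = Node;

    // Position lookup: anything that is not an array, or is too short, yields BadValue.
    fn index(&self, idx: usize) -> &Node {
        match self {
            Node::Array(v) => v.get(idx).unwrap_or(&BAD_VALUE),
            _ => &BAD_VALUE,
        }
    }
}

impl Node {
    pub fn is_badvalue(&self) -> bool {
        matches!(self, Node::BadValue)
    }

    pub fn as_str(&self) -> Option<&str> {
        match self {
            Node::Str(s) => Some(s.as_str()),
            _ => None,
        }
    }
}

// Builds a small document by hand: { foo: [list1, list2] }.
pub fn sample_doc() -> Node {
    let mut m = HashMap::new();
    m.insert(
        "foo".to_string(),
        Node::Array(vec![Node::Str("list1".into()), Node::Str("list2".into())]),
    );
    Node::Map(m)
}
```

With this shape, `sample_doc()["foo"][0].as_str()` gives `Some("list1")`, while a missing key followed by an out-of-range index still evaluates to the sentinel instead of panicking.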

Awesome!

I'm wondering what you mean, when you say almost all examples?

Anyway, a word to the wise: the standard way of loading YAML into a programming language's native objects is considered unsafe. My advice is to deal with this before the library reaches version 1.0.

Also, try to avoid PyYAML's mistake of naming the safe loader safe_load and the unsafe one load. Those defaults are terrible. Call them load and unsafe_load instead.

See:

Thanks!

By 'almost all examples', I mean examples in the 1.2 specification except for the following:

test_ex7_10_plain_characters
test_ex7_17_flow_mapping_separate_values
test_ex7_21_single_pair_implicit_entries
test_ex7_2_empty_nodes
test_ex8_2_block_indentation_header

These examples include implicit empty plain scalars in some seldom-used contexts, which seem to pose some ambiguities in the tokenizer. I will fix these problems later.

However, the library definitely needs more tests!

safety

It is a young project, and safety/security is not the most urgent problem I'm dealing with right now. But it definitely needs to be dealt with before the library becomes stable.

But thanks to the memory safety guaranteed by Rust, the parser should just emit an error instead of crashing.

Anyway, I think the library is ready for config file parsing now, but we need more time and collaborators to make it strong enough to accept arbitrary user input.

PS. thanks for the reference!

Great to hear :slight_smile: I know it's probably too early to start bogging down with safety.

I'm not sure Rust's memory guarantees will help here. The issue was that you could send various malformed YAML, which would allow an attacker to turn a string like, say:

 --- !ruby/hash:ClassBuilder
 "foo; end; puts %(I'm in yr system!); def bar": "baz"

into an executable class on the victim's system. What might help is that Rust's type system isn't as extensible as Ruby's or Python's.

Oh, I read your reference. The Rust parser just maps YAML to basic data structures like

  • i64
  • f64
  • bool
  • String
  • Vec<_>
  • HashMap<_, _>

and tags outside the tag:yaml.org,2002 namespace are ignored. So it should be equivalent to safe_load.
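
To make the idea concrete, here is a small sketch of that kind of tag handling. It is only an illustration under my own assumptions: the `Scalar` enum, `resolve`, and `resolve_plain` are hypothetical names, not the library's API, and the plain-scalar rules are simplified (for example, Rust's float parser also accepts spellings such as `inf` that YAML writes differently). Core-schema tags force a type, and any other tag is dropped, so the value is resolved exactly as if it were untagged.

```rust
// Hypothetical result type covering only basic scalar data.
#[derive(Debug, PartialEq)]
pub enum Scalar {
    Null,
    Bool(bool),
    Int(i64),
    Real(f64),
    Str(String),
    Bad, // a core tag was given but the text does not fit it
}

const CORE_PREFIX: &str = "tag:yaml.org,2002:";

// Resolve an untagged scalar with simplified implicit-typing rules.
fn resolve_plain(text: &str) -> Scalar {
    match text {
        "" | "~" | "null" => return Scalar::Null,
        "true" => return Scalar::Bool(true),
        "false" => return Scalar::Bool(false),
        _ => {}
    }
    if let Ok(i) = text.parse::<i64>() {
        return Scalar::Int(i);
    }
    if let Ok(f) = text.parse::<f64>() {
        return Scalar::Real(f);
    }
    Scalar::Str(text.to_string())
}

// Resolve a scalar with an optional tag. Only tags in the core namespace
// influence the result; every other tag is ignored, never executed or
// mapped to a user-defined type.
pub fn resolve(tag: Option<&str>, text: &str) -> Scalar {
    match tag.and_then(|t| t.strip_prefix(CORE_PREFIX)) {
        Some("str") => Scalar::Str(text.to_string()),
        Some("int") => text.parse::<i64>().map(Scalar::Int).unwrap_or(Scalar::Bad),
        Some("float") => text.parse::<f64>().map(Scalar::Real).unwrap_or(Scalar::Bad),
        Some("bool") => text.parse::<bool>().map(Scalar::Bool).unwrap_or(Scalar::Bad),
        _ => resolve_plain(text),
    }
}
```

In this sketch a value tagged `!ruby/hash:ClassBuilder` comes back as an ordinary string or number, which is the property that makes the loader comparable to a safe load.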

We need to implement the RustcDecodable trait to map YAML to a Rust struct. But I think it will be safe.


A great project! Part of me wishes this had been available two weeks ago, when I decided to start on serde-yaml just to take a stab at getting YAML support sometime soon.

It's good to see that yaml-rust is moving forward at a much faster pace, and in any case I believe serde-yaml will benefit greatly from it (especially when it comes to implementing deserialization).

What I like in particular are the decent documentation and the vast number of tests.

yaml-rust also provides a low-level API that emits parsing events, although little documentation is available for this part yet.

So it is easy to integrate it into a deserializer. I have planned to implement the standard RustcEncodable & RustcDecodable traits for basic serialization.
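
To give a feel for what building on an event stream looks like, here is a self-contained sketch. The `Event`, `Value`, `EventReceiver`, and `TreeBuilder` definitions below are simplified, hypothetical versions written for this post; the real event types carry more information (anchors, tags, document boundaries) and their signatures may differ. The receiver keeps a stack of open collections and assembles a tree as events arrive, which is roughly the job a deserializer would do.

```rust
// Simplified, hypothetical event set (the real one has more variants).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Scalar(String),
    SequenceStart,
    SequenceEnd,
    MappingStart,
    MappingEnd,
}

// Minimal tree that the receiver builds.
#[derive(Debug, PartialEq)]
pub enum Value {
    Scalar(String),
    Seq(Vec<Value>),
    Map(Vec<(Value, Value)>),
}

// Anything that wants to consume parsing events implements this.
pub trait EventReceiver {
    fn on_event(&mut self, ev: Event);
}

#[derive(Default)]
pub struct TreeBuilder {
    // One Vec of finished children per collection that is still open.
    stack: Vec<Vec<Value>>,
    pub root: Option<Value>,
}

impl TreeBuilder {
    // Attach a finished value to the innermost open collection, or make it the root.
    fn push(&mut self, v: Value) {
        match self.stack.last_mut() {
            Some(children) => children.push(v),
            None => self.root = Some(v),
        }
    }
}

impl EventReceiver for TreeBuilder {
    fn on_event(&mut self, ev: Event) {
        match ev {
            Event::Scalar(s) => self.push(Value::Scalar(s)),
            Event::SequenceStart | Event::MappingStart => self.stack.push(Vec::new()),
            Event::SequenceEnd => {
                let items = self.stack.pop().unwrap_or_default();
                self.push(Value::Seq(items));
            }
            Event::MappingEnd => {
                // Children of a mapping arrive as key, value, key, value, ...
                let mut items = self.stack.pop().unwrap_or_default().into_iter();
                let mut pairs = Vec::new();
                while let (Some(k), Some(v)) = (items.next(), items.next()) {
                    pairs.push((k, v));
                }
                self.push(Value::Map(pairs));
            }
        }
    }
}

// Feed a complete event list through the builder and return the root, if any.
pub fn build(events: Vec<Event>) -> Option<Value> {
    let mut builder = TreeBuilder::default();
    for ev in events {
        builder.on_event(ev);
    }
    builder.root
}
```

A deserializer would follow the same pattern, except that instead of a generic `Value` it would fill in the fields of the target struct as matching keys arrive.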

@Byron Does a project like serde need any YAML-specific features (e.g. tag namespaces) to deserialize objects?

No, serde is generalized and doesn't know about any of its implementations, e.g. JSON, XML, or YAML. Therefore I see only limited to no support for custom application tags. But this is me, now, and there is a chance I will change my mind as the implementation progresses.