Low-level JSON library 'Struson' version 0.2.0 released

A few days ago I released version 0.2.0 of my low-level JSON library called 'Struson'. It is still experimental, but any feedback and suggestions regarding API and implementation are welcome!

Compared to version 0.1.0 the new version 0.2.0 adds methods for reading values (JSON strings, numbers and member names) as borrowed strings. This now allows reading JSON data without any additional allocations in many cases.

Examples

  • Reading:

    let json = r#"
        {
            "first": 1.23e4,
            "second": [
                true,
                "value"
            ]
        }
    "#;
    
    let mut json_reader = JsonStreamReader::new(json.as_bytes());
    json_reader.begin_object()?;
    
    assert_eq!(json_reader.next_name()?, "first");
    assert_eq!(json_reader.next_number_as_str()?, "1.23e4");
    
    assert_eq!(json_reader.next_name()?, "second");
    json_reader.begin_array()?;
    assert_eq!(json_reader.next_bool()?, true);
    assert_eq!(json_reader.next_str()?, "value");
    json_reader.end_array()?;
    
    json_reader.end_object()?;
    json_reader.consume_trailing_whitespace()?;
    
  • Writing:

    let mut writer = Vec::<u8>::new();
    let mut json_writer = JsonStreamWriter::new_custom(
        &mut writer,
        WriterSettings {
            pretty_print: true,
            ..Default::default()
        },
    );
    
    json_writer.begin_object()?;
    
    json_writer.name("first")?;
    json_writer.number_value(123)?;
    
    json_writer.name("second")?;
    json_writer.begin_array()?;
    json_writer.bool_value(true)?;
    json_writer.string_value("value")?;
    json_writer.end_array()?;
    
    json_writer.end_object()?;
    json_writer.finish_document()?;
    
    assert_eq!(
        String::from_utf8(writer)?,
        "{\n  \"first\": 123,\n  \"second\": [\n    true,\n    \"value\"\n  ]\n}"
    );
    
  • Seeking and transferring from reader to writer:

    let json = r#"
        {
            "first": 1.23e4,
            "second": [
                true,
                "value"
            ]
        }
    "#;
    
    let mut json_reader = JsonStreamReader::new(json.as_bytes());
    
    let mut writer = Vec::<u8>::new();
    let mut json_writer = JsonStreamWriter::new(&mut writer);
    
    // Skip to path in JSON data, here to JSON string `"value"`
    json_reader.seek_to(&json_path!["second", 1])?;
    
    // Optionally write enclosing data for transferred value
    json_writer.begin_object()?;
    json_writer.name("transferred")?;
    
    json_reader.transfer_to(&mut json_writer)?;
    // Finish reading
    json_reader.skip_to_top_level()?;
    json_reader.consume_trailing_whitespace()?;
    
    json_writer.end_object()?;
    json_writer.finish_document()?;
    
    assert_eq!(String::from_utf8(writer)?, "{\"transferred\":\"value\"}");
    
  • Serde interoperability (requires serde feature):

    let json = r#"
        {
            "outer": {
                "a": 1,
                "b": "value"
            }
        }
    "#;
    
    #[derive(serde::Deserialize, PartialEq, Debug)]
    struct MyStruct {
        a: u32,
        b: String,
    }
    
    let mut json_reader = JsonStreamReader::new(json.as_bytes());
    json_reader.seek_to(&json_path!["outer"])?;
    
    let deserialized: MyStruct = json_reader.deserialize_next()?;
    assert_eq!(
        deserialized,
        MyStruct {
            a: 1,
            b: "value".to_owned()
        }
    );
    
    // Finish reading
    json_reader.skip_to_top_level()?;
    json_reader.consume_trailing_whitespace()?;
    

I just (minutes ago!) found your library, and I think it can also be used to parse truncated JSON: Parsing truncated (i.e. invalid) JSON. Is that an expected use case?

I miss something which combines peek() and ValueType into next_value(), i.e. moving this block into the library and adding a Value enum which can contain the String, bool, null, etc.

    match j.peek() {
        Ok(ValueType::String) => { j.next_string(); }
        Ok(ValueType::Number) => { j.next_number_as_str(); }
        Ok(_) => todo!(),
        Err(e) => return Err(e),
    };

This Value would be similar to serde_json::Value, but with no content in the Array and Object/Map cases. Maybe also copy the Number type so the decision between u8, i64 or f64 etc. can be made later. Or store it as next_number_as_str() would.
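A rough sketch of what I have in mind, to make the idea concrete. All names here (`Value`, `ArrayStart`, `ObjectStart`, `parse_number`) are made up for illustration and are not part of Struson. The number is kept as its JSON text so the numeric type can be chosen later:

```rust
// Hypothetical sketch; none of these names exist in Struson.

/// Shallow value: scalars carry their content, containers only mark their start.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Value<'a> {
    Null,
    Bool(bool),
    /// Number kept as its JSON text, so u8 / i64 / f64 can be chosen later
    Number(&'a str),
    String(&'a str),
    /// An array or object has begun; its content is read separately
    ArrayStart,
    ObjectStart,
}

impl<'a> Value<'a> {
    /// Deferred numeric parsing into whichever type the caller picks.
    /// Returns `None` for non-numbers and for text which does not fit `T`.
    fn parse_number<T: std::str::FromStr>(&self) -> Option<T> {
        match self {
            Value::Number(text) => text.parse().ok(),
            _ => None,
        }
    }
}

fn main() {
    let value = Value::Number("1.23e4");
    let as_f64: Option<f64> = value.parse_number();
    let as_u8: Option<u8> = value.parse_number();
    println!("{as_f64:?} {as_u8:?}");
}
```

Whether the string variants should borrow or own their content is an open question; the sketch borrows only to keep it short.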

Also, this library seems to panic too much, and in functions which already return Result. Maybe the panic when peek()-ing when only end_object() is allowed could become a specific "expected ..." error or even ValueType::MustBeEndObject?

I answered it on your other post: Probably not something which will be added directly as feature to Struson, but something you can already achieve by implementing a custom JsonReader on top of JsonStreamReader.

For now I will probably not add such an enum. The use case I had in mind is that you

  • either know in advance the value type you are expecting and then call the corresponding method, such as next_bool (which returns an error if the JSON data does not match)
  • or call peek() and then handle the value in a way which is appropriate for your use case

I guess next_value() would be a bit more convenient in some cases, but it can also be less efficient and can take away the choice from the user. For example I assume it would store string values and names as owned String, but a borrowed string might suffice for some users as well. Or it would fully read a string and return Value::String(String), and then the user code determines that it does not actually expect a string; in that case it would have been more efficient to fail fast, or to use JsonReader::skip_value.

The library should only panic when the API is used in an unintended way. Though possibly the documentation does not make the intended way clear enough or the "intended way" is too inconvenient. Do you have some more examples where you didn't expect a panic?

At the end of an object (or in general when a next object member may follow) you cannot call peek() because it is only for determining the type of the next value. Instead when you are processing JSON object members you would have to:

  1. Call has_next(), and if the result is true:
    1. Consume the member name with next_name() (or a similar method)
    2. Consume the member value (optionally first calling peek()), for example with next_number_as_str()

(Or you omit the has_next() call if you expect that there are more object members, and directly call next_name(), which returns the error UnexpectedStructure if there are unexpectedly no more object members.)

So normally you would write a loop for this:

let mut json_reader = JsonStreamReader::new(r#"{"a": 1, "b": 2}"#.as_bytes());
json_reader.begin_object()?;
// Process all object members
while json_reader.has_next()? {
    // For each member first consume its name
    let name = json_reader.next_name_owned()?;
    // And then its value
    let value = json_reader.next_number_as_str()?;

    println!("{name}: {value}");
}
// After the last member has been reached, close the object
json_reader.end_object()?;

This is done to keep the responsibilities of peek() and has_next() separate. Adding a new ValueType::MustBeEndObject might make the use case you mentioned easier, but it makes other use cases more cumbersome or confusing. For example, you would then also have to handle MustBeEndObject when you call peek() inside a JSON array, even though it should be impossible there.
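To illustrate that last point, here is a small standalone sketch. `PeekedType` and `describe_array_item` are hypothetical names used only for this example; this is not Struson's actual `ValueType`:

```rust
// Hypothetical enum for illustration; not Struson's actual `ValueType`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PeekedType {
    String,
    Number,
    Bool,
    // The extra variant under discussion
    MustBeEndObject,
}

/// Handles a type which was peeked inside a JSON *array*.
fn describe_array_item(peeked: PeekedType) -> Result<&'static str, String> {
    match peeked {
        PeekedType::String => Ok("string"),
        PeekedType::Number => Ok("number"),
        PeekedType::Bool => Ok("bool"),
        // Cannot occur inside an array, yet the exhaustive match still needs an arm
        PeekedType::MustBeEndObject => Err("object end inside array".to_owned()),
    }
}

fn main() {
    println!("{:?}", describe_array_item(PeekedType::Number));
    println!("{:?}", describe_array_item(PeekedType::MustBeEndObject));
}
```

Every `match` on the peeked type would need such an arm (or a wildcard), regardless of whether the code is currently inside an object.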

I hope that explanation makes sense. If you have suggestions for different method names, documentation improvements, or anything else, feel free to share them. I can't promise that I will implement them, but they definitely help me understand how and when users are using the library and for which tasks, where its limitations are, and where usage is not intuitive.

Thank you very much for your feedback!

Without looking too deeply, I wonder if it may be helpful to show the intended use (methods X, Y should be called after method Z) by refactoring some methods into a facade type:
Playground

Effectively, promote the rules of "what should be called when" to the type level, by exposing more types that are really decorations on the base type.
Then the compiler can help people "hold it the right way", without relying too heavily on panics and deeply understanding the docs.

In the linked playground, you would still need to handle the "end of object" somehow, so it's not a complete solution. Rather than some Drop impl on the decorator type, maybe the base struct could track the object as pending and read the object close when subsequent methods are called.
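As a rough illustration of the facade idea, here is a toy sketch over a pre-tokenized input. It is not the playground code and not Struson's API; `Token`, `Reader`, `ObjectReader` and `read_members` are made-up names. Here the "end of object" is folded into `next_member` returning `None`:

```rust
// Toy stand-in for a JSON reader; all names here are hypothetical.
#[derive(Debug)]
enum Token {
    BeginObject,
    Name(String),
    Number(String),
    EndObject,
}

struct Reader {
    tokens: std::vec::IntoIter<Token>,
}

/// Facade which only exposes what is valid inside an object.
struct ObjectReader<'r> {
    reader: &'r mut Reader,
}

impl Reader {
    fn new(tokens: Vec<Token>) -> Self {
        Reader { tokens: tokens.into_iter() }
    }

    /// The base reader stays mutably borrowed while the facade is alive,
    /// so it cannot be used "behind the back" of the object reader.
    fn begin_object(&mut self) -> Result<ObjectReader<'_>, String> {
        match self.tokens.next() {
            Some(Token::BeginObject) => Ok(ObjectReader { reader: self }),
            other => Err(format!("expected object start, got {other:?}")),
        }
    }
}

impl ObjectReader<'_> {
    /// Returns the next member, or `None` once the object end was consumed.
    fn next_member(&mut self) -> Result<Option<(String, String)>, String> {
        match self.reader.tokens.next() {
            Some(Token::EndObject) => Ok(None),
            Some(Token::Name(name)) => match self.reader.tokens.next() {
                Some(Token::Number(value)) => Ok(Some((name, value))),
                other => Err(format!("expected number, got {other:?}")),
            },
            other => Err(format!("expected name or object end, got {other:?}")),
        }
    }
}

fn read_members(tokens: Vec<Token>) -> Result<Vec<(String, String)>, String> {
    let mut reader = Reader::new(tokens);
    let mut object = reader.begin_object()?;
    let mut members = Vec::new();
    while let Some(member) = object.next_member()? {
        members.push(member);
    }
    Ok(members)
}

fn main() {
    let tokens = vec![
        Token::BeginObject,
        Token::Name("a".to_owned()),
        Token::Number("1".to_owned()),
        Token::EndObject,
    ];
    println!("{:?}", read_members(tokens));
}
```

The sketch does not stop a caller from calling `next_member` again after it returned `None`, so it has the same "not a complete solution" caveat.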


I see. This would use Struson as a token generator which just yields the next AST element, but with JSON validation so that { }} fails once it is encountered. This might be too low level. However, regarding efficiency, these nodes could just contain &str, which only get parsed on demand.

The lines of code need to be aware of the state machine which is being implicitly manipulated here. Just switching two lines can lead to a runtime panic. In my opinion, that is not idiomatic Rust, especially when Result is available.

I ran into a few panics when experimenting, which makes exploratory programming a bit harder.

This also breaks the assumption that "when it compiles, it runs". Thus when this code is in a rarely used branch off the "happy path", the only thing that can help is full test coverage. Maybe these methods need to consume self and return new types which only have methods which are actually callable, i.e. encode the state machine in the types.
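A minimal sketch of what I mean, using a toy reader over already-split name/value pairs. The types `AtName`, `AtValue`, `NameOrEnd` and the function `collect` are hypothetical; only the method names are borrowed from the existing API:

```rust
// Hypothetical typestate sketch; not Struson's API.

/// State: positioned before a member name (or the object end).
struct AtName<'a> {
    rest: &'a [(&'a str, &'a str)],
}

/// State: a name was consumed, so only the value can be read next.
struct AtValue<'a> {
    value: &'a str,
    rest: &'a [(&'a str, &'a str)],
}

enum NameOrEnd<'a> {
    Name(&'a str, AtValue<'a>),
    End,
}

impl<'a> AtName<'a> {
    /// Consumes the state; the only way forward is through the returned value.
    fn next_name(self) -> NameOrEnd<'a> {
        match self.rest.split_first() {
            Some(((name, value), rest)) => {
                NameOrEnd::Name(*name, AtValue { value: *value, rest })
            }
            None => NameOrEnd::End,
        }
    }
}

impl<'a> AtValue<'a> {
    /// Consumes the state and hands back the "before name" state.
    fn next_number_as_str(self) -> (&'a str, AtName<'a>) {
        (self.value, AtName { rest: self.rest })
    }
}

fn collect(members: &[(&str, &str)]) -> Vec<String> {
    let mut state = AtName { rest: members };
    let mut out = Vec::new();
    loop {
        match state.next_name() {
            NameOrEnd::Name(name, at_value) => {
                let (value, next) = at_value.next_number_as_str();
                out.push(format!("{name}: {value}"));
                state = next;
            }
            NameOrEnd::End => return out,
        }
    }
}

fn main() {
    println!("{:?}", collect(&[("a", "1"), ("b", "2")]));
}
```

With this shape, calling `next_number_as_str` before `next_name` does not compile, because `AtName` has no such method.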


Thanks a lot for your and @danjl1100's feedback! Those are some good points, and the current API is indeed error-prone, especially (but not only) when you are not that familiar with it yet.

However, I did not want to completely rewrite the API. One reason was probably that it would have been too much work, but also that the current API is as flexible as possible. I assume (but have not investigated that in depth) that any other safer API implicitly imposes some restrictions on the functionality that can be offered.

So I decided to provide an additional "simple API" which internally delegates to the existing API. For JSON arrays and objects it uses function arguments / closures to consume array items and object members so that it is explicit when the array or object is started (call of the method) and when it is ended (when method returns), and it does not use Drop which panics on errors or discards them silently.
The new API enforces correct usage at compile time and should be completely panic-free; if you notice a panic it would be great if you could create a bug report.

Here are some examples:

  • Reading:

    use struson::reader::simple::*;
    
    let json_reader = SimpleJsonReader::new(
        r#"["a", 1, "short", 2, "example"]"#.as_bytes()
    );
    let mut words = Vec::<String>::new();
    json_reader.next_array_items(|mut item_reader| {
        match item_reader.peek()? {
            ValueType::String => {
                let word = item_reader.next_string()?;
                words.push(word);
            }
            _ => {} // ignore
        }
        Ok(())
    })?;
    assert_eq!(words, vec!["a", "short", "example"]);
    
    use struson::reader::simple::*;
    
    let json_reader = SimpleJsonReader::new(
        r#"{"a": 1, "b": 2, "c": 3, "d": 4}"#.as_bytes()
    );
    json_reader.next_object(|name, value_reader| {
        match name.as_str() {
            "a" | "b" => {
                let value: u64 = value_reader.next_number()??;
                println!("{name}: {value}");
            },
            _ => {}, // ignore
        }
        Ok(())
    })?;
    
  • Writing:

    use struson::writer::simple::*;
    
    let mut writer = Vec::<u8>::new();
    let json_writer = SimpleJsonWriter::new(&mut writer);
    json_writer.object_value(|object_writer| {
        object_writer.number_member("a", 1)?;
        object_writer.bool_member("b", true)?;
        Ok(())
    })?;
    
    let json = String::from_utf8(writer)?;
    assert_eq!(json, r#"{"a":1,"b":true}"#);
    

You can find more examples in the documentation and in the tests tests/simple_reader.rs and tests/simple_writer.rs.

This version is not released yet, but it would be great if you could give it a try[1] and provide feedback either here or on this issue. This is only the first prototype, so there is probably room for improvements, but any feedback regarding what you like, what you don't like, what you think could be improved, what is cumbersome, ... is highly appreciated!


  1. For testing you could specify a Git dependency


I have now released version 0.4.0, which includes this 'simple API'. I also created a separate announcement post for it here.
