AJSON - a JSON Parser for Rust - Get JSON values quickly

Important! I had to change the project name to ajson.

Hello, everyone. I wrote a JSON parsing library that can get values quickly and easily. I hope you will give it a try.

project github

playground

Inspiration comes from gjson in golang

Installation

Add it to your Cargo.toml file:

[dependencies]
ajson = "0.1"

Then add it to your code:

extern crate ajson;

A simple example

AJSON gets a JSON value by a specified path, such as project.name or project.version. As soon as the path matches, it returns immediately!

let data = r#"
{
  "project": {
    "name": "ajson",
    "maintainer": "importcjj",
    "version": 0.1,
    "rusts": ["stable", "nightly"]
  }
}
"#;

let name = ajson::get(data, "project.name");
println!("{}", name.as_str()); // ajson

Syntax

JSON example

{
    "name": {"first": "Tom", "last": "Anderson"},
    "age":37,
    "children": ["Sara","Alex","Jack"],
    "fav.movie": "Deer Hunter",
    "friends": [
        {"first": "Dale", "last": "Murphy", "age": 44, "nets": ["ig", "fb", "tw"]},
        {"first": "Roger", "last": "Craig", "age": 68, "nets": ["fb", "tw"]},
        {"first": "Jane", "last": "Murphy", "age": 47, "nets": ["ig", "tw"]}
    ]
}

basic

Below is a quick overview of the path syntax. For more complete information, please check out GJSON Syntax.

A path is a series of keys separated by a dot. A key may contain the special wildcard characters '*' and '?'. To access an array value, use the index as the key. To get the number of elements in an array or to access a child path, use the '#' character. The dot and wildcard characters can be escaped with '\'.

name.last        >> "Anderson"
age              >> 37
children         >> ["Sara","Alex","Jack"]
children.#       >> 3
children.1       >> "Alex"
child*.2         >> "Jack"
c?ildren.0       >> "Sara"
fav\.movie       >> "Deer Hunter"
friends.#.first  >> ["Dale","Roger","Jane"]
friends.1.last   >> "Craig"
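
To make the wildcard rules concrete, here is a minimal sketch of how a key pattern such as child* or c?ildren could be matched against an object key. The function name wildcard_match is hypothetical and this is not ajson's actual implementation; it only assumes that '*' matches any run of characters and '?' matches exactly one.

```rust
/// Returns true when `text` matches `pattern`, where `*` matches any run of
/// characters (including none) and `?` matches exactly one character.
/// Hypothetical helper for illustration; not part of ajson.
fn wildcard_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    // Pattern index of the last `*` seen and the text index it was tried at.
    let mut star: Option<(usize, usize)> = None;

    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            // Remember the star and first try matching it against nothing.
            star = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some((sp, st)) = star {
            // Backtrack: let the last `*` absorb one more character.
            pi = sp + 1;
            ti = st + 1;
            star = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    // Only trailing `*`s may remain in the pattern.
    p[pi..].iter().all(|&c| c == '*')
}
```

With this sketch, wildcard_match("child*", "children") and wildcard_match("c?ildren", "children") are both true, while wildcard_match("child*", "age") is false.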

Escape character

Special purpose characters, such as ., *, and ?, can be escaped with \.

fav\.movie             "Deer Hunter"
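
As an illustration of the escaping rule, the sketch below splits a path into keys on unescaped dots. split_path is a hypothetical name and the code is an assumption about how such splitting could work, not what ajson does internally. Note that it also unescapes \* and \?, so a real implementation would have to remember which characters were escaped before applying wildcards.

```rust
/// Splits a path into keys on unescaped dots. A backslash makes the next
/// character literal, so `fav\.movie` stays a single key `fav.movie`.
/// Hypothetical helper for illustration; not part of ajson.
fn split_path(path: &str) -> Vec<String> {
    let mut keys = Vec::new();
    let mut current = String::new();
    let mut chars = path.chars();

    while let Some(c) = chars.next() {
        match c {
            '\\' => {
                // Keep the escaped character as-is; a trailing backslash is dropped.
                if let Some(escaped) = chars.next() {
                    current.push(escaped);
                }
            }
            // An unescaped dot ends the current key.
            '.' => keys.push(std::mem::take(&mut current)),
            _ => current.push(c),
        }
    }
    keys.push(current);
    keys
}
```

Here split_path(r"fav\.movie") yields the single key fav.movie, while split_path("friends.1.last") yields friends, 1 and last.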

Arrays

The # character allows for digging into JSON arrays. To get the length of an array, you'll just use the # all by itself.

friends.#              3
friends.#.age         [44,68,47]

queries

You can also query an array for the first match by using #(...), or find all matches with #(...)#. Queries support the ==, !=, <, <=, >, >= comparison operators and the simple pattern matching % (like) and !% (not like) operators.

friends.#(last=="Murphy").first   >> "Dale"
friends.#(last=="Murphy")#.first  >> ["Dale","Jane"]
friends.#(age>45)#.last           >> ["Craig","Murphy"]
friends.#(first%"D*").last        >> "Murphy"
friends.#(nets.#(=="fb"))#.first  >> ["Dale","Roger"]
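
For readers who think in iterators, the difference between #(...) and #(...)# roughly corresponds to find versus filter. The sketch below restates four of the queries above over the friends array written out as plain Rust data. The Friend struct and the function names are made up for this illustration and have nothing to do with ajson's internals.

```rust
/// One element of the `friends` array from the JSON example.
struct Friend {
    first: &'static str,
    last: &'static str,
    age: u32,
    nets: &'static [&'static str],
}

/// The `friends` array from the JSON example, as plain Rust data.
fn friends() -> Vec<Friend> {
    vec![
        Friend { first: "Dale", last: "Murphy", age: 44, nets: &["ig", "fb", "tw"] },
        Friend { first: "Roger", last: "Craig", age: 68, nets: &["fb", "tw"] },
        Friend { first: "Jane", last: "Murphy", age: 47, nets: &["ig", "tw"] },
    ]
}

/// friends.#(last=="Murphy").first : stop at the first match.
fn first_murphy(friends: &[Friend]) -> Option<&'static str> {
    friends.iter().find(|f| f.last == "Murphy").map(|f| f.first)
}

/// friends.#(last=="Murphy")#.first : collect every match.
fn all_murphys(friends: &[Friend]) -> Vec<&'static str> {
    friends.iter().filter(|f| f.last == "Murphy").map(|f| f.first).collect()
}

/// friends.#(age>45)#.last
fn older_than_45(friends: &[Friend]) -> Vec<&'static str> {
    friends.iter().filter(|f| f.age > 45).map(|f| f.last).collect()
}

/// friends.#(nets.#(=="fb"))#.first : a nested query on each element's array.
fn on_fb(friends: &[Friend]) -> Vec<&'static str> {
    friends.iter().filter(|f| f.nets.contains(&"fb")).map(|f| f.first).collect()
}
```

Calling these on friends() gives the same results as the query table: "Dale", ["Dale","Jane"], ["Craig","Murphy"] and ["Dale","Roger"].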

construct

Basically, you can use selectors to assemble whatever you want, and of course, the result is still JSON :wink:

{name.first,age,"murphys":friends.#(last="Murphy")#.first}
[name.first,age,children.0]
ajson::get(json, "name.[first,last]").as_array();
ajson::get(json, "name.first"); 
ajson::get(json, "name.last");

io::Read

Besides strings, AJSON can also parse JSON from any io::Read.

use std::fs::File;

let f = File::open("path/to/json").unwrap();
let json = ajson::parse_from_read(f);
let value = json.get("a.b");
println!("{}", value.as_str());
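
Accepting io::Read means the source does not have to be a file. The standard-library-only sketch below shows the idea with an in-memory Cursor standing in for a file; read_json and sample_reader are hypothetical helpers, and nothing here calls ajson itself.

```rust
use std::io::{self, Cursor, Read};

/// Drains any reader into a String. Files, sockets and in-memory buffers all
/// implement `Read`, so the same function serves each of them.
/// Hypothetical helper for illustration; not part of ajson.
fn read_json<R: Read>(mut reader: R) -> io::Result<String> {
    let mut buf = String::new();
    reader.read_to_string(&mut buf)?;
    Ok(buf)
}

/// Stand-in for a file: an in-memory reader over a small JSON document.
fn sample_reader() -> Cursor<&'static [u8]> {
    Cursor::new(&br#"{"a":{"b":"c"}}"#[..])
}
```

A function with the same generic bound can be handed a std::fs::File in production and a Cursor in tests.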

Value

Value types.

enum Value {
    String(String),
    Number(Number),
    Object(String),
    Array(String),
    Boolean(bool),
    Null,
    NotExist,
}

Value has a number of methods that meet your different needs.

value.as_str() -> &str
value.as_u64() -> u64
value.as_i64() -> i64
value.as_f64() -> f64
value.as_bool() -> bool
value.as_array() -> Vec<Value>
value.as_map() -> HashMap<String, Value>
value.get(&str) -> Value
value.exsits() -> bool
value.is_number() -> bool
value.is_string() -> bool
value.is_bool() -> bool
value.is_object() -> bool
value.is_array() -> bool
value.is_null() -> bool

Sometimes you need to check whether a value exists; for that, you can use exsits(). Note that exsits() returns true for null values.


let v = ajson::get(json, "path");
if v.exsits() {
  println!("got it {}", v);
}

get or parse?

parse needs to read a complete JSON element, while get returns as soon as it has found the result, so get is recommended if you simply want to get a value.

Validate

AJSON can help you get the desired value from flawed JSON, but it's worth being more careful because of its looseness.

be careful!!!

Maybe it needs a validate function :thinking:

Performance

$ cargo bench

  • ajson
  • serde_json
  • json-rust
ajson benchmark         time:   [6.7000 us 6.8023 us 6.9081 us]                             
                        change: [-1.8368% -0.4152% +1.0466%] (p = 0.58 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

serde_json benchmark    time:   [48.196 us 48.543 us 48.947 us]                                  
                        change: [+2.9073% +4.4909% +6.3532%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

json-rust benchmark     time:   [24.540 us 24.773 us 25.061 us]                                 
                        change: [+4.8288% +6.0452% +7.4633%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
  • MacBook Pro (13-inch, 2018, Four Thunderbolt 3 Ports)
  • 2.7 GHz Intel Core i7
  • 16 GB 2133 MHz LPDDR3

problems

AJSON has only just been finished, so there may be some bugs and shortcomings. Please feel free to open an issue. Also, Rust is a new language for me, and maybe ajson isn't idiomatic Rust yet, so I would welcome suggestions.

License

MIT License.

While impressive, the benchmarks seem a little unfair to the other two libraries. While it seems the intended way to use this library is to get properties directly from the string of data, the other libraries require the user to explicitly parse the data first.

The benchmarks repeatedly parse the data, despite not being idiomatic for those libraries.

I updated the benchmarks to not do redundant parsing, and my results were 7.7729 μs for gjson, 12.245 μs for serde, and 5.8147 μs for json - so mostly comparable speeds (though because of its structure, gjson is still doing multiple parses.)

I hadn't seen gjson before -- this sort of simple path-string querying can be pretty useful, and the alternatives have often seemed a bit heavy-weight. JSONPath's syntax is apparently too arcane to ever stick in my head. :wink: JSON Pointer has come in handy, but in many ways isn't powerful enough. gjson seems to strike a nice balance between readability and complexity.

Thanks! (also, exists is misspelled as exsits in this post and your repo README!)

Thank you for your advice.

It also supports getting values from a value

let a = ajson::get(data, "name");
let b = a.get("last");

Or more advanced,

let v = ajson::get(data, r#"friends.["1.first", "2.nets"]"#);

Sometimes, we might just want to read a few values in json, and parsing the whole json is wasteful.
Anyway, the project is just starting, I will update benchmark

Sometimes, we might just want to read a few values in json, and parsing the whole json is wasteful.
Anyway, the project is just starting, I will update benchmark

Thanks :slight_smile:

The serde derive handles this by efficiently ignoring data that isn't included in the output type.

For the same benchmark (pull out "widget.window.name", "widget.image.hOffset", "widget.text.onMouseUp", "widget.debug", "widget.text", and "widget.menu.#(sub_item>7)#.title" from the sample object in the benchmark), on my machine I get:

  • 5.90 μs for ajson (parsing part of the input 5 separate times)
  • 3.38 μs for json-rust (parsing all of the input 1 time and extracting values)
  • 1.89 μs for serde_json + serde_derive (parsing only the required parts of the input in 1 pass)

Some criticisms:

  • Of these names, as_str is the only one that makes sense. Things that return stuff like Vec ought to be to_array or to_vec.
  • I don't understand the naming scheme behind as_array and as_map. The rust names would be vec and map. The JSON names would be array and object.
  • Pairs of is_xyz() and as_xyz() functions are unidiomatic in Rust compared to an xyz() method that returns Option<Xyz>.
    • It looks like your implementation doesn't panic, but instead chooses to do some seemingly arbitrary things (like treating null as an empty vec, and putting other non-arrays into a singleton vec). Not only is this undocumented, but it seems most unusual for Rust. Is this some sort of established standard for working with JSON? Is it just what gjson does?
  • get is a very unconventional name for a parsing function. It should be called parse or from_str. Better yet, you could implement the FromStr trait.
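
To illustrate the Option-returning style suggested in the list above, here is a minimal sketch. The Json enum is a simplified stand-in invented for this example and is not ajson's Value. The accessors keep the as_ prefix because they return borrowed views or cheap copies, which is one common reading of the Rust naming conventions.

```rust
/// Simplified stand-in for a JSON value; not ajson's `Value`.
#[derive(Debug, PartialEq)]
enum Json {
    Null,
    Number(f64),
    String(String),
    Array(Vec<Json>),
}

impl Json {
    /// `Some` only when the value really is a string; no silent conversion.
    fn as_str(&self) -> Option<&str> {
        match self {
            Json::String(s) => Some(s.as_str()),
            _ => None,
        }
    }

    /// `Some` only when the value is a number.
    fn as_f64(&self) -> Option<f64> {
        match self {
            Json::Number(n) => Some(*n),
            _ => None,
        }
    }

    /// Borrows the elements instead of building a new Vec; `None` for
    /// anything that is not an array, including null.
    fn as_array(&self) -> Option<&[Json]> {
        match self {
            Json::Array(items) => Some(items.as_slice()),
            _ => None,
        }
    }
}
```

A caller who wants the lenient behaviour can still opt in explicitly, for example with value.as_array().unwrap_or(&[]).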

Furthermore, when I look at the signature of get, I am puzzled by what it must do in the case of an error. Looking at the library in more detail, I see you have a variant Value::NotExist. This should not exist (pun intended)! You should return Result<Value, SomeErrorType> or Option<Value>.

Thanks for your criticisms!
I'm a newcomer to Rust, so some of the functions may be misnamed. I might adjust them, and maybe you can help me.

As for get: first of all, there is also a parse function, which is not quite the same as get. get and NotExist exist because I want to support clean chained calls, such as:

let a = ajson::get(data, "a").get("b").get("c");

I just think it's better than get("a")?.get("b")?.get("c").
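
The two styles being compared here can be put side by side in a small sketch. Node, get, try_get, sample and deep are all hypothetical names for this illustration; the sentinel variant mirrors the idea behind NotExist, while try_get shows the Option-based alternative.

```rust
use std::collections::HashMap;

/// Tiny stand-in for a JSON tree; not ajson's `Value`.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Text(String),
    Object(HashMap<String, Node>),
    NotExist,
}

impl Node {
    /// Sentinel style: a missing key yields `NotExist`, so calls chain freely.
    fn get(&self, key: &str) -> Node {
        match self {
            Node::Object(map) => map.get(key).cloned().unwrap_or(Node::NotExist),
            _ => Node::NotExist,
        }
    }

    /// Option style: a missing key yields `None`.
    fn try_get(&self, key: &str) -> Option<&Node> {
        match self {
            Node::Object(map) => map.get(key),
            _ => None,
        }
    }
}

/// Builds {"a": {"b": {"c": "deep"}}}.
fn sample() -> Node {
    let c = Node::Object(HashMap::from([("c".to_string(), Node::Text("deep".to_string()))]));
    let b = Node::Object(HashMap::from([("b".to_string(), c)]));
    Node::Object(HashMap::from([("a".to_string(), b)]))
}

/// Inside a function returning Option, `?` keeps the chain almost as short.
fn deep(root: &Node) -> Option<&Node> {
    root.try_get("a")?.try_get("b")?.try_get("c")
}
```

With the sentinel, sample().get("x").get("b") quietly evaluates to NotExist. With Option, the caller has to decide what a missing key means, which is the trade-off under discussion.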

Rust API guidelines is a good start for understanding how idiomatic libraries could be designed.

Thanks for the test. What if you only want the value of widget.window.name...? Regular json parsing might require defining complex structs or take more time.

Thanks! I will take a look.

I added a bench that does the same thing as the ajson benchmark using serde_derive (using throwaway structs for each call), and I'm very surprised to report that serde_derive is in fact slower!

You can see my benchmark here.

ajson benchmark         time:   [6.7076 us 6.7177 us 6.7289 us]                             
                        change: [-7.3541% -7.0182% -6.6969%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  8 (8.00%) high mild
  3 (3.00%) high severe

serde derive            time:   [9.3054 us 9.3284 us 9.3534 us]                          
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

Might this be because I need to allocate a Vec<Item> for the last statement? Perhaps it'd be useful to have a benchmark without that bit.

Edit: Nope, even without that part, I get 3.5us for ajson and 6.7us for serde derive.

To clarify: that is when deserializing the same input 5x to different types. Typically with serde_derive you would deserialize just once and get all the fields you need from that.

Right. I did that because my theory was that serde_derive would be faster than ajson even in the use case of extracting a single field, which your benchmark (which put them all in one struct) didn't compare.

Perhaps I could add a get_many function that recognizes the same path prefix, which should improve the efficiency of multi-field queries. Like this:

let values = ajson::get_many("widget.image.src", "widget.text.data");

This would query widget only once.
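
One building block for such a function would be finding the keys that all requested paths share. The sketch below is only an assumption about how that step might look. common_prefix is a hypothetical helper, and it compares whole keys and ignores escaped dots for brevity.

```rust
/// Returns the leading keys shared by every path, e.g. `widget` for
/// `widget.image.src` and `widget.text.data`. Escaped dots are not handled.
/// Hypothetical helper for illustration; not part of ajson.
fn common_prefix(paths: &[&str]) -> String {
    let mut split = paths.iter().map(|p| p.split('.').collect::<Vec<_>>());
    // Start from the keys of the first path; no paths means no prefix.
    let mut prefix = match split.next() {
        Some(first) => first,
        None => return String::new(),
    };
    for keys in split {
        // Count how many leading keys still agree, then cut the prefix there.
        let shared = prefix.iter().zip(&keys).take_while(|(a, b)| a == b).count();
        prefix.truncate(shared);
    }
    prefix.join(".")
}
```

For the two paths above this returns widget, so the shared part would be located once and the remainders image.src and text.data resolved relative to it.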