[Serde-JSON] Deserializing and Transposing Yahoo Finance data

Moderator note: moved from [Serde-YAML] Deserialize inner block only

Of course. Here's the URL, no authentication needed:

https://query1.finance.yahoo.com/v8/finance/chart/^GDAXI?interval=1h&period1=1690549200&period2=1690814838

Inside the JSON are fields timestamp and quote, which should go into a vector of this:

pub struct Candle {
  pub timestamp:  i64,
  pub open:       f32,
  pub close:      f32,
  pub high:       f32,
  pub low:        f32,
  pub volume:     f32,
}

Everything else in this JSON should be ignored.

Don't mind if that's too much for a quick coding, my implementation works fine already. Just not using serde.

P.S.: sometimes, reason unknown, quote fields are null.

OK, so your problem is basically that you want to transpose the data. The API returns it in structure-of-arrays format, and you want it in array-of-structures order instead.

That this isn't supported out of the box doesn't seem to be the fault of Serde. Arbitrary transformation of data (including transposition) is out of scope for a serialization library, because it's a highly domain-specific task. It's very easy to derive the appropriate deserializers, and then write a separate function that goes over the data and performs the transposition in an additional pass: Playground.

The transpose() function is a whopping 24 lines, while the derived deserializers take up 28 lines and they don't even need any sort of #[serde(…)] attributes related to structural changes.

I couldn't tell from your description at exactly which level the nulls occur, but at this point, it would merely be a matter of wrapping the correct fields in Option and using something like unwrap_or_default() in transpose() to turn them into empty arrays instead, or handling missing values in any other way you deem appropriate.
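For readers who can't open the playground, here is a minimal std-only sketch of the transposition idea. It is not the playground code: the Columns struct is a hypothetical stand-in for the derived deserialization target, Candle is trimmed to three fields, and the nulls are assumed to occur at the whole-column level.

```rust
/// Structure-of-arrays shape, roughly as the API delivers it.
/// `Option` models a column that is missing or null as a whole (an assumption).
#[derive(Debug, Default)]
pub struct Columns {
    pub timestamp: Option<Vec<i64>>,
    pub open: Option<Vec<f32>>,
    pub close: Option<Vec<f32>>,
}

/// Array-of-structures element, trimmed to three fields for brevity.
#[derive(Debug, PartialEq)]
pub struct Candle {
    pub timestamp: i64,
    pub open: f32,
    pub close: f32,
}

/// Transpose columns into rows. A missing column becomes an empty vector,
/// and `zip` stops at the shortest column, so the result is then empty too.
pub fn transpose(cols: Columns) -> Vec<Candle> {
    let timestamps = cols.timestamp.unwrap_or_default();
    let opens = cols.open.unwrap_or_default();
    let closes = cols.close.unwrap_or_default();

    timestamps
        .into_iter()
        .zip(opens)
        .zip(closes)
        .map(|((timestamp, open), close)| Candle { timestamp, open, close })
        .collect()
}
```

Note that zip silently truncates to the shortest column; whether that or an error is the right behavior depends on how much you trust the data.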

doh, someone beat me to the punch again. but here's what I came up with: Rust Playground

nearly the same as the other solution, I just wrote a different transposition method.

I'll skip the other points, both because the premise behind your points is flawed (it's based on a broken way of modeling the data, and once that's a fait accompli the system becomes garbage in, garbage out) and because I feel they've been addressed adequately by others, but your last point merits a response.

Maybe not for small projects.
But I used to work on a crypto code base for one of the bigger exchanges. My job consisted of porting over PHP to Rust code.

Writing Rust was the easy part.
The hard part was encountering code like:

$thingamabob[$other_obj['key1']] 

And figuring out what the hell nonsense like that is supposed to do. That's literal reverse engineering because it's impossible to say what each of those intermediate values are, let alone the value of the expression as a whole, without lots of runtime instantiation and debugging of various forms (which wasn't helped by the fact that the code base bungled something as basic as preventing race conditions when writing logs, leading to corrupted logs and some rather infuriating work days).

Now, the urge may rise to make the argument "but that's just bad PHP!".
But that's irrelevant, because it's real-world PHP.

And the fact is, Rust is worlds better than the sort of thing in the example, both because it gives you actual types to work with (leading to a natural ontology, at least within the context of a crate), and because it actively encourages leveraging those types, meaning I don't have to deal with what I'll call Mystery Meat Objects (MMOs).
That's what the entire argument around serde is about: leveraging the type system (and macros for ergonomics).

If I never again have to deal with a dynamic key instantiated from runtime data that's then used to index a hashmap, it'll be too soon.

What's so difficult with this? This code looks up $other_obj['key1'], then uses the result as an index into $thingamabob[]. This isn't even "bad PHP", it simply uses a well defined language feature.

Strongly typed languages have existed for as long as programming languages have; even good old C is one. That's not exactly an argument for Rust, but rather an argument for strongly typed languages in general.

Nevertheless, some 2/3 of all the code out there is written in a less than strongly typed language, so these languages obviously have advantages. Perhaps it isn't even their lack of strong typing, but something else, like code written more along the lines of how humans think and operate. I'd say it's worth an effort to get the best of both approaches.

First, thanks for the code. This confirms that I was right with my assumption above: one has to write a structure matching incoming JSON, then map this into the structure used in Rust code.

I just don't see why one would see Serde at fault of anything. Great tool. If it fits, use it. If something else fits better, use that.

In your playground I count 49 lines of code for the parsing (without defining Candle). Here I count 20 lines of code, doing the same:

let parsed_json = json::parse(JSON).unwrap();
let mut candles: Vec<Candle> = Vec::new();

let result = &parsed_json["chart"]["result"][0];
let timestamps = &result["timestamp"];
let indicators = &result["indicators"]["quote"][0];
let opens = &indicators["open"];
let closes = &indicators["close"];
let highs = &indicators["high"];
let lows = &indicators["low"];
let volumes = &indicators["volume"];

for i in 0 .. timestamps.len() {
  candles.push(Candle {
    timestamp:  timestamps[i].as_i64().unwrap_or(0),
    open:       opens[i].as_f32().unwrap_or(0.),
    close:      closes[i].as_f32().unwrap_or(0.),
    high:       highs[i].as_f32().unwrap_or(0.),
    low:        lows[i].as_f32().unwrap_or(0.),
    volume:     volumes[i].as_f32().unwrap_or(0.),
  });
}

Both define mapping between JSON and Rust struct exactly once (Serde in all the structs, above code in, well, the code), so maintenance effort should be the same. Call it PHP-ish if you wish, I call it easily readable and thus easily maintainable code.

Sorry for not making a playground. Last time I tried, I didn't find out how to make it using the json crate. Above is my working code with plausibility checks removed, to match what your playground does.

playground doesn't support all crates. from the help page:

The playground provides the top 100 most downloaded crates from crates.io, the crates from the Rust Cookbook, and all of their dependencies. ... See the complete list of crates to know what’s available.

Also, for what it's worth, serde_json also allows you to work on loosely typed JSON. This code is nearly identical to what you wrote, with a few minor syntax changes where needed (the Value enum only provides 64-bit numbers): Rust Playground

The reason this is generally avoided is that error handling becomes prohibitive in this case. Every index operation ([]) is a potential panic point. If you use the non-panicking index method (get()), each call returns an Option. In my opinion it's better to know at parse time that the data is unexpected, rather than in the middle of building up your new type. In this version, it isn't quite "one mapping". You index into the JSON with ["chart"]["result"][0], all three of which may fail. Repeat for every line of code.
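To make the "every step returns an Option" point concrete without pulling in a crate, here is a rough sketch. The Value enum and its key()/at() helpers are hypothetical stand-ins written for this post, not serde_json's actual types or methods; only the shape of the problem is the same.

```rust
use std::collections::HashMap;

/// Tiny stand-in for a loosely typed JSON value (hypothetical, not serde_json's type).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Number(f64),
    Array(Vec<Value>),
    Object(HashMap<String, Value>),
}

impl Value {
    /// Non-panicking lookup by object key.
    pub fn key(&self, k: &str) -> Option<&Value> {
        match self {
            Value::Object(map) => map.get(k),
            _ => None,
        }
    }

    /// Non-panicking lookup by array index.
    pub fn at(&self, i: usize) -> Option<&Value> {
        match self {
            Value::Array(items) => items.get(i),
            _ => None,
        }
    }

    /// Extract a number, if this value is one.
    pub fn as_f64(&self) -> Option<f64> {
        match self {
            Value::Number(n) => Some(*n),
            _ => None,
        }
    }
}

/// Fetch the first "open" price. Every step can fail, so every step needs `?`.
pub fn first_open(root: &Value) -> Option<f64> {
    root.key("chart")?
        .key("result")?
        .at(0)?
        .key("indicators")?
        .key("quote")?
        .at(0)?
        .key("open")?
        .at(0)?
        .as_f64()
}
```

That is nine fallible steps for a single number, and a plain None at the end doesn't tell you which step failed.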

Your code also does this: you create a parsed JSON data structure, then you convert that JSON structure into a new struct that better fits your needs. It's just that in this case the intermediate step is loosely typed, instead of statically typed.

This is, in my opinion, much harder to maintain than some struct definitions. You weren't super specific on how some of the fields were null, but I'll imagine it's because sometimes there is no trading data, for example, on weekends or outside of certain hours. Ok, so you change the struct like so:

#[derive(Deserialize)]
struct Quote {
    open: Vec<Option<f64>>,
    close: Vec<Option<f64>>,
    high: Vec<Option<f64>>,
    low: Vec<Option<f64>>,
    volume: Vec<Option<f64>>,
}

Now I get an error:

error[E0308]: mismatched types
  --> src/main.rs:68:33
   |
68 |                         volume: *volume,
   |                                 ^^^^^^^ expected `f64`, found `Option<f64>`
   |
   = note: expected type `f64`
              found enum `std::option::Option<f64>`

Hm... okay, so I can change my struct like so:

#[derive(Debug)]
struct Candle {
    timestamp: i64,
    open: Option<f64>,
    close: Option<f64>,
    high: Option<f64>,
    low: Option<f64>,
    volume: Option<f64>,
}

Operating on the loosely typed data simply doesn't get you the same ergonomics. If the upstream data changes its format, then you need to look through your code carefully to make sure you don't have any runtime panics.
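As a rough sketch of how the Option-carrying structs fit together once the compiler is satisfied: the two-column Quote and the trimmed Candle below are simplified assumptions for illustration, and to_candles() and complete() are hypothetical helper names, not code from the playground.

```rust
#[derive(Debug, PartialEq)]
pub struct Candle {
    pub timestamp: i64,
    pub open: Option<f64>,
    pub close: Option<f64>,
}

/// Hypothetical two-column stand-in for the `Quote` struct above.
pub struct Quote {
    pub open: Vec<Option<f64>>,
    pub close: Vec<Option<f64>>,
}

/// Pair each timestamp with its row; a missing or null entry becomes `None`.
pub fn to_candles(timestamps: &[i64], quote: &Quote) -> Vec<Candle> {
    timestamps
        .iter()
        .enumerate()
        .map(|(i, &timestamp)| Candle {
            timestamp,
            // `get` yields Option<&Option<f64>>; `copied` + `flatten` collapse it.
            open: quote.open.get(i).copied().flatten(),
            close: quote.close.get(i).copied().flatten(),
        })
        .collect()
}

/// Keep only the candles where every price is present.
pub fn complete(candles: &[Candle]) -> Vec<(i64, f64, f64)> {
    candles
        .iter()
        .filter_map(|c| Some((c.timestamp, c.open?, c.close?)))
        .collect()
}
```

Every consumer of Candle now has to say what it does with a None, and the compiler refuses to build until it does.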

I'm not talking about what it does at the language level, that's kid's stuff. The point is, in a code base you've never seen before, what Rust code do you translate this to? What should the types be called? What are their field names and types? What data can the struct contain? What are its failure modes? What kinds of data are valid, and what kinds are invalid? If you can do all that with ease I'd like to have you prove it, because talk is cheap on this one. It used to take me hours just poring over those broken logs I mentioned, just to get a sense of what those boundaries could be, and it involved a lot of runtime debugging to see what was actually being stored in various PHP variables. And even then I couldn't be sure, because there could always be other ways that code was being used, completely changing the semantics, i.e. contrary to using static typing, I could never be 100% sure that the model was complete. That liability is thanks to the dynamicity of such PHP code, and given the fact it had to be reverse engineered yet still had business value, it was obviously the wrong way to model the data in retrospect.

Reverse engineering is one of the hardest skills to learn and use effectively in the Software Engineering world, and those who can actually do it get paid a really pretty penny for it.

Sure. What's your point?
Rust isn't the only language to which this applies, but if you'll indulge me, what were we talking about again precisely?

Easier and more convincing explanation: people use what they know, not what is necessarily better for their use case.
Python for example is easy to pick up, despite the fact that some of its features become liabilities in larger code bases. That obviously hasn't stopped anyone from doing so. This very topic is a small case study of that very fact, with OP opting for something they recognize and thus are able to work with, rather than the alternative, even if that alternative has fewer failure modes and doesn't go "against the grain of the language", so to speak.

That's just plainly untrue, I'm living proof of the fact that there exist people for whom statically typed languages reduce cognitive effort required rather than add to it.

It's not just your opinion, that's pretty much the whole point I've been trying to get across to @Traumflug the entire time: static typing is largely about reducing maintenance, i.e. it's about what happens after the code is initially written. The PHP example I gave fails to account for this fact of life, thereby making life much harder for the company (and for me) than it otherwise would have been. The company knows this, which is why they're phasing out PHP.

But all this has gone pretty far off-topic. As far as I'm concerned this thread can be closed, if anyone is so inclined they can open up a thread with a proper topic on this.
Unfortunately I don't have moderation powers to do that myself, so.... @quinedot?

Oh come on, this can't be true. Serde allowing what was so evil just two posts above? Ha ha.

Anyway, thanks for the playground.

Out of curiosity I changed one of the fields in @H2CO3's playground. What happens? Well, it panics with a custom message, almost identical behavior.

That said, panicking is the right thing to do here. If JSON no longer fits, it's likely because the data provider changed the data format and an overall review of parsing is required anyways.



Chances are good you were the first to think about this. This isn't how PHP developers work. More typical PHP developers write what they think is right, test it, and if it works, they move on to the next task. And yes, I wrote PHP code for a living for several years, so I do have experience with how these folks approach tasks. Only more experienced developers start to introduce some structuring and, since PHP7, "strict" typing.

Perhaps part of the problem is that personal opinions aren't particularly convincing to me. Python developers would defend just as eagerly the exact opposite opinion about strong typing and data structures.

Another part of the problem is that you try to make the argument that a rather complex structure with some 50 lines of code would be easier to recognize again after 5 years than a simple series of 20 lines of code. Quite a number of times I've seen structures so complex that it took me many hours to piece together which is related to what, spread across many files or even dependencies. Similar to your PHP trouble, just the opposite way. Which is why I'm all for strong typing, but still keep structures to a minimum. Works well so far.

If it's literature you wanted, you should have said so. Here you go.

If you think that's my argument then I'm sorry, but you haven't been paying attention.

It's not about the number of lines of code, it never was¹. It's about modeling the world in a way that tooling takes real work out of the developer's hands, leading not just to productivity gains through automation, but also to increased correctness². In particular, consistency and correctness checking where possible. That in turn is about keeping the model (and the developer's mind!) sane over time, as changes are made to the code base.

But you're right in the sense that you (and others who prefer a lack of static typing) probably have a fundamentally different way of relating to code relative to how people oriented towards static type systems are. There is a saying both in linguistics and in programming: your language shapes your thoughts. And that's no less true for a language embedded in another language that exists to express and verify metadata.

¹ The number of lines of code is a pretty crude and dumb way to measure source code. It doesn't really convey any useful information, not even a rough estimate of the size of the code, let alone anything more detailed than that.

² I have yet to meet the human being who, despite their best intentions, has never made a mistake.

True. Nevertheless two issues here.

  • You don't have this "world", you don't know why Yahoo sends what it does. Which means your model is just a guess, as error-prone as any other code.
  • Both code variants do about the same consistency and correctness checking. In case you didn't notice: something like parsed_json["chart"]["result"][0] does check the structure. It's also static, because these index strings are static.

The code in question doesn't panic, it returns an error. Huge difference. As an aside, technically it could still panic in the transposition function. I believe this version is entirely panic free: Rust Playground

A panic crashes the program; a Result allows you to recover. You don't want to take down your API server when encountering malformed data.

P.S. have to share this, my guess about which fields are sometimes null was spot on! this is the output of my playground code (you'll have to run it on your machine, playground disallows network access):

warning: `serde_demo` (bin "serde_demo") generated 2 warnings (run `cargo fix --bin "serde_demo"` to apply 1 suggestion)
    Finished release [optimized] target(s) in 1.05s
     Running `target/release/serde_demo`
[src/main.rs:6] candles = [
    Candle {
        timestamp: 1690549200,
        open: Some(
            16452.75,
        ),
        close: Some(
            16455.44921875,
        ),
        high: Some(
            16465.939453125,
        ),
        low: Some(
            16424.779296875,
        ),
        volume: Some(
            0.0,
        ),
    },

    ...truncated...

    Candle {
        timestamp: 1690812000,
        open: None,
        close: None,
        high: None,
        low: None,
        volume: None,
    },
    Candle {
        timestamp: 1690903800,
        open: Some(
            16240.400390625,
        ),
        close: Some(
            16240.400390625,
        ),
        high: Some(
            16240.400390625,
        ),
        low: Some(
            16240.400390625,
        ),
        volume: Some(
            0.0,
        ),
    },
]

Such projects are unmaintainable when scaling up in the number of developers. It'll work for a personal website, but not for that platform that literal millions of people entrust real money to. And that's not an opinion. If you think otherwise you've never worked in a team consisting of hundreds of people, all working on a single code base.

This is true here, because Yahoo didn't do the work to make that ontology/schema. Which means it's reverse engineering no matter the language used. That's not a problem with static typing or Rust, it's a problem with the data provider.

Now you're either really stretching the meaning of static type checking, or you genuinely don't know what it refers to. This expression checks that field and potentially blows up at runtime, whereas the point of static type checking is catching issues at compile time. With a proper model in place, that is no small difference.

There are problems a static type checker cannot catch; they're not magic, after all.
In particular, a static type system wouldn't have caught such an issue, but I gave the reason why earlier in this very post.

However, that it doesn't catch this doesn't matter; you could keep on going all day creating straw man arguments in favor of dynamic code. My question is, why? If you're so hellbent on that way of working, that's perfectly fine, but then why go to a Rust forum to extol its virtues rather than just using whatever you want to?

Free speech aside, it kind of feels to me like going to a Christian church and then talking to the congregation members about the virtues of Zen Buddhism. Not illegal by any means, but quite strange indeed.

Of course they're hard to maintain. Still, they're used in large projects that people entrust millions to. Think WordPress, think Joomla, think Drupal, think Magento, think PrestaShop, and lots more. At least half the WWW is running PHP. Almost 80% of all web CMS run on PHP.

Somehow you always confuse static typing with building structures. At least for me, we're not discussing static typing here, but how often or how much one should build data structures aka. data models. Static typing is always given; it's Rust, after all.

You can't catch issues with a blob of JSON downloaded at runtime at compile time. Within Rust code, after parsing, both approaches here are exactly identical; both use exactly the same data structure.

Yeah, the transformation part, but not the deserialization part. I was lazy and assumed that the arrays have the same length; the proper solution is to use get() instead.

No; panics are for irrecoverable internal errors, because they kill the thread and there's no reliable way to handle them. Thus, you must never panic on external conditions, and especially not on untrusted input data. Use Result for that.
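A minimal sketch of what "use get() and return a Result" could look like for the transposition step. It is reduced to two columns, and TransposeError is a hypothetical error type made up for this example, not something from the playground.

```rust
use std::fmt;

#[derive(Debug, PartialEq)]
pub struct Candle {
    pub timestamp: i64,
    pub open: f64,
}

/// Hypothetical error type for malformed input.
#[derive(Debug, PartialEq)]
pub enum TransposeError {
    /// A column has no entry at the given row index.
    MissingValue { column: &'static str, row: usize },
}

impl fmt::Display for TransposeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TransposeError::MissingValue { column, row } => {
                write!(f, "column `{column}` has no value at row {row}")
            }
        }
    }
}

impl std::error::Error for TransposeError {}

/// Transpose without indexing: `get` turns an out-of-range row into an `Err`
/// instead of a panic, and `collect` stops at the first error.
pub fn transpose(timestamps: &[i64], opens: &[f64]) -> Result<Vec<Candle>, TransposeError> {
    timestamps
        .iter()
        .enumerate()
        .map(|(row, &timestamp)| {
            opens
                .get(row)
                .map(|&open| Candle { timestamp, open })
                .ok_or(TransposeError::MissingValue { column: "open", row })
        })
        .collect()
}
```

The caller can then log the error and answer with a 4xx/5xx, rather than losing the whole thread to a panic.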

"Possible" != "recommended". serde_json contains a Value enum which you can use as if you were indexing a Python dict.

That's not its primary purpose, though. The point of a loosely-typed object representation is not consumption by the end user or domain modeling. Instead, you should use it when generic operations over arbitrary JSON are required (e.g., you are writing the equivalent of the jq tool), and you don't want to operate on raw JSON strings. That would mean continuously re-parsing and re-emitting JSON for no reason.
