Need help to show difference between statistics obtained as vec of structs

I have JSON data that I obtain from a source and parse into a vec of structs.

The first question I have is: how can I obtain the data again, and show the difference for each MetricValue.value field and for the MetricLatency.total_count and total_sum fields in the Metrics.metric vec, where Metrics.metric_type and id match?

Second question: is there an elegant way to loop over and show the data by Metrics.metric_type, and then by the MetricValue and MetricLatency structs (where value and total_count != 0, and where the difference != 0 too)?

Do you have an example of what the result should look like? It's still not clear to me what e.g. your first question means:

The differences between what? And what should metric_type and id match? Should they match each other within the same NamedMetrics instance? Or across different instances? Or should they both match something else simultaneously? It would be great if you could include e.g. a mock Debug or JSON representation of the output you are seeking.

Sorry about that.

I parsed JSON data, which results in a vec of structs, each of which has a vec of metrics. The metric vec inside the struct contains two struct types: a name-value pair, and a struct with name, total_count and total_sum (the number of occurrences and the total time of these occurrences, along with some extra statistical data; for now I am only interested in total_count and total_sum). Each name-value and name-total_count-total_sum statistic represents that statistic at a certain point in time.

At a later point in time, I request the same JSON structure and get a JSON response that is likely to have an identical structure, but with some altered values for value, total_count and total_sum, depending on server usage. I say "likely" because the JSON structure represents objects in a database, and objects can be removed or added.

Based on the JSON structure, there are three metric_type values: server, which represents the server I want to display data about; table, which represents a table, of which there can be several; and tablet, which contains one or more shards that belong to a table.

What I want to achieve is to collect the JSON data at two different points in time, and then present the metrics as new minus old for value, total_count and total_sum, per metric_type, like so:

Metric_type: server, id: yb.tabletserver
name:                                         value
mem_tracker_Compressed_Read_Buffer_Receive     5555
name:                                         count   sum
handler_latency_outbound_call_time_to_response 5000 666666

Metric_type: table, table_name: benchmark_table
name:                                         value
log_gc_running                                 5555
name:                                         count   sum
rocksdb_write_raw_block_micros                 5000 666666

Metric_type: tablet, table_name: benchmark_table, id: 53c653c79e664b11a61aef1950132c7e
name:                                         value
rocksdb_l0_hit                                 5555
rocksdb_l0_hit                                 5555

Metric_type: tablet, table_name: benchmark_table, id: 327da7d8f53445d5a09f962dec529de5
name:                                         value
rocksdb_l0_hit                                 5555
rocksdb_l0_hit                                 5555

I put a minimal number of metrics in the example; in reality there are a lot more per struct, and there are many more tables and tablets. I only want to see the metrics that have changed and/or are non-zero.


OK so if I understand correctly, you want to perform a 3-level nested grouping by metric type, ID, and metric name, and get the difference of the corresponding leaves, only taking common paths along the tree into account, while also filtering differences of 0 (unchanged metrics).

Here's one possible implementation based on nested maps. Building the maps runs in linear time (proportional to the number of leaves/named metrics), and the same is true for looking up the corresponding metrics by type-id-name.
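
To make the nested-map idea concrete, here is a minimal sketch under a few assumptions. The type names `Metric`, `Metrics` and `Delta`, the helpers `index` and `diff`, and the use of `i64` for the counters are all illustrative guesses at the shape described in the question, not the actual types from the original code. JSON parsing is left out, and the numbers in `main` are made up.

```rust
use std::collections::BTreeMap;

// The two metric shapes described in the question (names are assumptions).
#[derive(Debug)]
enum Metric {
    Value { name: String, value: i64 },
    Latency { name: String, total_count: i64, total_sum: i64 },
}

impl Metric {
    fn name(&self) -> &str {
        match self {
            Metric::Value { name, .. } | Metric::Latency { name, .. } => name.as_str(),
        }
    }
}

// One server, table or tablet together with its metrics.
#[derive(Debug)]
struct Metrics {
    metric_type: String,
    id: String,
    metric: Vec<Metric>,
}

// New-minus-old difference for one metric name.
#[derive(Debug, PartialEq)]
enum Delta {
    Value(i64),
    Latency { count: i64, sum: i64 },
}

// metric_type -> id -> metric name -> metric
type Index<'a> = BTreeMap<&'a str, BTreeMap<&'a str, BTreeMap<&'a str, &'a Metric>>>;

// Build the three-level lookup in one pass over the snapshot.
fn index(snapshot: &[Metrics]) -> Index<'_> {
    let mut map: Index<'_> = BTreeMap::new();
    for entity in snapshot {
        let by_name = map
            .entry(entity.metric_type.as_str())
            .or_default()
            .entry(entity.id.as_str())
            .or_default();
        for metric in &entity.metric {
            by_name.insert(metric.name(), metric);
        }
    }
    map
}

fn diff<'a>(old: &Index<'a>, new: &Index<'a>) -> Vec<(&'a str, &'a str, &'a str, Delta)> {
    let mut rows = Vec::new();
    for (&metric_type, new_ids) in new {
        // Only type/id/name paths present in both snapshots are compared.
        let Some(old_ids) = old.get(metric_type) else { continue };
        for (&id, new_names) in new_ids {
            let Some(old_names) = old_ids.get(id) else { continue };
            for (&name, &new_metric) in new_names {
                let Some(&old_metric) = old_names.get(name) else { continue };
                let delta = match (old_metric, new_metric) {
                    (Metric::Value { value: o, .. }, Metric::Value { value: n, .. }) => Delta::Value(n - o),
                    (
                        Metric::Latency { total_count: oc, total_sum: os, .. },
                        Metric::Latency { total_count: nc, total_sum: ns, .. },
                    ) => Delta::Latency { count: nc - oc, sum: ns - os },
                    _ => continue, // the metric changed shape between snapshots
                };
                // Skip metrics that did not change.
                if !matches!(delta, Delta::Value(0) | Delta::Latency { count: 0, sum: 0 }) {
                    rows.push((metric_type, id, name, delta));
                }
            }
        }
    }
    rows
}

fn main() {
    // Made-up numbers; only the names come from the example output in the question.
    let server = |metric: Vec<Metric>| Metrics {
        metric_type: "server".to_string(),
        id: "yb.tabletserver".to_string(),
        metric,
    };
    let old = [server(vec![
        Metric::Value { name: "log_gc_running".into(), value: 100 },
        Metric::Latency { name: "rocksdb_write_raw_block_micros".into(), total_count: 10, total_sum: 2_000 },
    ])];
    let new = [server(vec![
        Metric::Value { name: "log_gc_running".into(), value: 130 },
        Metric::Latency { name: "rocksdb_write_raw_block_micros".into(), total_count: 15, total_sum: 2_600 },
    ])];
    let (old_index, new_index) = (index(&old), index(&new));
    for (metric_type, id, name, delta) in diff(&old_index, &new_index) {
        match delta {
            Delta::Value(v) => println!("{metric_type} {id} {name:<40} {v}"),
            Delta::Latency { count, sum } => println!("{metric_type} {id} {name:<40} {count} {sum}"),
        }
    }
}
```

Because `BTreeMap` iterates in key order, the rows come out grouped by metric type, then id, then name, which matches the grouped layout asked for. The sketch only filters on a zero difference; an additional check on the new value being non-zero could be added at the same place if that is wanted.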

I was already going to suggest this in my previous post: doing this in memory is much more complicated than it needs to be. I was going to recommend that you use a real relational database with aggregated queries, but as per your description, these data already come from a database. This just reinforces my feeling that instead of doing the whole JSON encoding-decoding-aggregation-computation dance in memory, you would be much better off performing the aggregation in the DB itself, using its own query language, and only serializing the result to JSON. This would obviate the need for reading the entire (supposedly large) structure into memory and serializing/deserializing it at least two or three times. The code would be much simpler to read and write, too, as this is basically a single inner JOIN on three fields, and DB query languages are optimized for expressing relationships succinctly.
