I'm working on reimplementing a relatively simple Python REST API proxy server in Rust. The aim of the rewrite is to save compute resources – the workload is completely CPU bound at the moment. The proxy server translates between two different APIs. It takes JSON requests from clients, transforms the JSON structure, and passes the transformed request on to the backend. The reply from the backend is again transformed and returned to the client. So the main work this proxy is doing is deserializing, restructuring and serializing JSON.
The JSON is deeply nested and contains a lot of strings of vastly different lengths, but usually the same field in each struct will contain a string of similar length for each request. I believe that the proxy server will spend a significant amount of its time doing memory allocations (I haven't measured yet, but I'm prematurely thinking about solutions already anyway).
My main idea to reduce the number of heap allocations is approximately this:
- Create thread-local pools for the incoming JSON structs, so we can reuse them across requests.
- Use
deserialize_in_place()
to deserialize the incoming JSON (both from client and backend). - Use string slices referencing the original JSON for the transformed JSON.
While deserialize_in_place()
works great for String
fields, and will indeed reuse the allocations, I don't think it will help much with Vec<SomeSubStruct>
or even just Option<String>
, which we have a lot of. One top-level structure for JSON replies from the server for example looks like this:
#[derive(Deserialize)]
pub struct DecisionResponse {
pub decisions: HashMap<String, Option<Vec<Decision>>>,
}
The hash map will contain exactly one key most of the time, but the value for that key will sometimes be null
, and sometimes a list of many Decision
s, which themsleves are rather nested objects and contain a lot of String
s and Option<String>
s. Whenever the DecisionResponse
gets overwritten with a resonse having null
as the value, the vector and all Decision
s in it will be deallocated, which means if it gets overwritten again with a long list of Decision
s, all of them and the strings in them will need to be allocated again.
One idea to solve this is to implement my own data structures that keep all allocations around, e.g.
struct KVec<T> {
data: Vec<T>,
length: usize,
}
impl<T> Deref for KVec<T> {
type Target = [T];
fn deref(&self) -> &Self::Target {
&self.data[..self.length]
}
}
Then I could implement deserialize_in_place()
for this in a way to only reduce length
if there are few items then last time, but without actually dropping the unneeded elements. I'd need similar data structures for Option
and HashMap
, and all of them would need custom Serialize
implementations as well.
Overall, I believe the approach I outlined would work, but it's kind of cumbersome to implement. Is there any easier solution to reduce the number of allocations in this scenario? Or is there an easier way to make the deserialize_in_place()
approach work?