How to separately process specific Message types in protobuf-rust

I have a message A that is used everywhere.

message A {
      uint32 define_type = 1;
      bytes sub_message = 2;
}

sub_message is a serialized protobuf message whose type is dynamic and is determined by define_type.

Message A is used as a field in many other messages.

// normal
message B {
      uint32 foo = 1;
      A  a = 2;
}
// nested
message C {
      uint32 foo = 1;
      B b = 2;
}

Every time I use protobuf_json_mapping::print_to_string to convert a rust-protobuf message to a JSON string, I need to get the real message behind sub_message.

In JS, protobuf.wrappers solves my problem.

const protobuf = require('protobufjs')
protobuf.wrappers['.pkg.A'] = {
    toObject(message, options) {
        const originOutput = this.toObject(message, options)
        if (map.has(originOutput.define_type)) {
            const message_type = map.get(originOutput.define_type);
            originOutput.sub_message = message_type.decode(Buffer.from(originOutput.sub_message, 'base64'))
        }
        return originOutput
    },
    fromObject(object) {
        return this.fromObject(object)
    },
}

What is the solution similar to protobuf.wrappers in rust-protobuf?

Exactly the same way as in JavaScript. Get the message type from a mapping of define_type to message type and then decode.
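A minimal sketch of that lookup-then-decode idea, using only the standard library. The decoder functions and the ids 1 and 2 are hypothetical stand-ins; in a real program each decoder would parse the bytes with the matching generated message type and print that message as JSON.

```rust
use std::collections::HashMap;

/// A decoder turns the raw `sub_message` bytes into a printable value.
type Decoder = fn(&[u8]) -> Result<String, String>;

// Hypothetical stand-in: treats the payload as UTF-8 text.
fn decode_text(bytes: &[u8]) -> Result<String, String> {
    std::str::from_utf8(bytes)
        .map(|s| format!("{s:?}"))
        .map_err(|e| e.to_string())
}

// Hypothetical stand-in: prints the payload as a list of numbers.
fn decode_raw(bytes: &[u8]) -> Result<String, String> {
    Ok(format!("{bytes:?}"))
}

/// Builds the `define_type` -> decoder table (the ids are made up).
fn registry() -> HashMap<u32, Decoder> {
    let mut map: HashMap<u32, Decoder> = HashMap::new();
    map.insert(1, decode_text);
    map.insert(2, decode_raw);
    map
}

/// Looks up the decoder for `define_type` and applies it to `sub_message`.
fn render(
    map: &HashMap<u32, Decoder>,
    define_type: u32,
    sub_message: &[u8],
) -> Result<String, String> {
    let decode = map
        .get(&define_type)
        .ok_or_else(|| format!("unknown define_type {define_type}"))?;
    decode(sub_message)
}

fn main() {
    let map = registry();
    println!("{:?}", render(&map, 1, b"hello"));
    println!("{:?}", render(&map, 2, &[1, 2, 3]));
    println!("{:?}", render(&map, 9, &[]));
}
```

An unknown define_type comes back as an error instead of a panic, so the caller can decide whether to fall back to printing the raw bytes.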

But why don't you use oneof instead of the manual tagged union with define_type in your message definitions? This would map nicely to Rust enums. See here for oneof documentation.

Thanks for your reply. This community is very active, and someone will always reply to you, unlike issues on GitHub.

Maybe you didn't get my point. Let me explain.

I have a private protocol based on TCP. The body of each frame is the bytes of a protobuf message such as message B or message C, and the protocol header tells me which message type the body is. You can imagine I have a map:

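// id type assumed to be u32; each value is a prototype instance of one message type
let map: HashMap<u32, Box<dyn protobuf::MessageDyn>> = HashMap::new();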

When I get a protocol frame, I look up the message type by id, then use parse_from_bytes to get an instance of the message.

let t = map.get(&id).unwrap();
let instance = t.descriptor_dyn().parse_from_bytes(bytes).unwrap();

The instance contains message A, whose sub_message is still raw bytes.
Now I want to print the instance as a JSON string, with the message inside A printed as a message too, not as bytes, using
protobuf_json_mapping::print_to_string_with_options

The protocol has been in use for years, so switching to oneof is not an option.

So you don't want to change the .proto, and you want to deserialize the bytes according to an id. In your case that means deserializing sub_message (a Vec<u8> in Rust) according to define_type (an integer in Rust; i32 in the example below).

Here is a more concrete example:

In proto:

syntax = "proto3";

package foo;

message Msg {
    int32 type = 1;
    bytes buffer = 2;
}

In build.rs:

use protobuf_codegen::Codegen;

fn main() {
    Codegen::new()
        .pure()
        .include("proto")
        .input("proto/foo.proto")
        .cargo_out_dir("proto")
        .run()
        .unwrap();
    // The lines below comment out every `#!` and `//!` in the generated code,
    // because they are not allowed in code pulled in with `include!` (see main.rs).
    let path = std::path::PathBuf::from(std::env::var("OUT_DIR").unwrap()).join("proto/foo.rs");
    let gen = std::fs::read_to_string(&path).unwrap();
    let processed = gen.replace("#!", "//").replace("//!", "//");
    std::fs::write(path, processed).unwrap();
    println!("cargo:rerun-if-changed=proto");
    println!("cargo:rerun-if-changed=build.rs");
}
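To make the effect of the two replace calls in build.rs concrete, here is a small standalone sketch. The helper name strip_inner_attributes and the sample input are made up for illustration; only the two replacements are taken from the build script above.

```rust
/// Applies the same two replacements as build.rs to a piece of source text.
fn strip_inner_attributes(source: &str) -> String {
    source.replace("#!", "//").replace("//!", "//")
}

fn main() {
    // Made-up sample of what generated code might start with.
    let generated = "//! Generated file\n#![allow(dead_code)]\npub struct Msg;";
    println!("{}", strip_inner_attributes(generated));
}
```

Both kinds of line end up as ordinary `//` comments, which is why main.rs repeats the lint attributes itself.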

main.rs:

// -- silence warnings from the generated code
#![allow(unknown_lints)]
#![allow(clippy::all)]

#![allow(unused_attributes)]
#![cfg_attr(rustfmt, rustfmt::skip)]

#![allow(box_pointers)]
#![allow(dead_code)]
#![allow(missing_docs)]
#![allow(non_camel_case_types)]
#![allow(non_snake_case)]
#![allow(non_upper_case_globals)]
#![allow(trivial_casts)]
#![allow(unused_results)]
#![allow(unused_mut)]
// -- end of warning suppression

// include generated code
include!(concat!(env!("OUT_DIR"), "/proto/foo.rs"));

use protobuf_json_mapping::PrintOptions;
static OPTION: PrintOptions = PrintOptions {
    enum_values_int: true,
    proto_field_name: true,
    always_output_default_values: true,
    _future_options: (),
};


fn main() {
    let msgs = [
        Msg {
            type_: 1,
            buffer: "hello".into(),
            ..Default::default()
        },
        Msg {
            type_: 2,
            buffer: vec![1, 2, 3],
            ..Default::default()
        },
    ];
    for m in msgs {
        let s = protobuf_json_mapping::print_to_string_with_options(&m, &OPTION).unwrap();
        println!("{}", s);
    }
}

The output is:

{"type": 1, "buffer": "aGVsbG8="}
{"type": 2, "buffer": "AQID"}

This is not what you want. print_to_string_with_options follows the protobuf JSON mapping, which always encodes a bytes field as a base64 string; it has no way to know what the bytes really contain.

So, instead of using protobuf_json_mapping, we can serialize the message ourselves. (protobuf_json_mapping may still be the better tool for deserializing.)
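To see that the two strings above really are just the payloads in base64, here is a small standard-library-only decoder. It is a sketch for checking the output by hand, not something protobuf_json_mapping exposes; in a real project a base64 crate would be the usual choice.

```rust
/// Decodes standard (padded) base64; returns None on an invalid character.
fn base64_decode(input: &str) -> Option<Vec<u8>> {
    const ALPHABET: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = Vec::new();
    let mut buffer: u32 = 0; // accumulates 6 bits per input character
    let mut bits: u32 = 0; // number of valid bits currently held in `buffer`
    for c in input.bytes().filter(|&c| c != b'=') {
        let value = ALPHABET.iter().position(|&a| a == c)? as u32;
        buffer = (buffer << 6) | value;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buffer >> bits) as u8);
            buffer &= (1u32 << bits) - 1; // keep only the bits not yet emitted
        }
    }
    Some(out)
}

fn main() {
    for encoded in ["aGVsbG8=", "AQID"] {
        println!("{encoded} -> {:?}", base64_decode(encoded));
    }
}
```

The first string decodes to the bytes of "hello" and the second to 1, 2, 3, which are exactly the two buffers from main.rs.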

include!(concat!(env!("OUT_DIR"), "/proto/foo.rs"));

use serde::Serialize;
use serde::ser::SerializeMap;

impl Serialize for Msg {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: serde::Serializer,
    {
        let mut map = serializer.serialize_map(Some(2))?;
        map.serialize_entry("type", &self.type_)?;
        match self.type_ {
            // type 1: the buffer holds text; invalid UTF-8 is replaced rather than
            // trusted (from_utf8_unchecked on bad input is undefined behavior)
            1 => map.serialize_entry("buffer", &String::from_utf8_lossy(&self.buffer))?,
            // type 2: the buffer is opaque, so emit the raw bytes
            2 => map.serialize_entry("buffer", &self.buffer)?,
            // unknown types: emit no "buffer" entry
            _ => {}
        }
        map.end()
    }
}

fn main() {
    let msgs = [
        Msg {
            type_: 1,
            buffer: "hello".into(),
            ..Default::default()
        },
        Msg {
            type_: 2,
            buffer: vec![1, 2, 3],
            ..Default::default()
        },
    ];
    for m in msgs {
        let s = serde_json::to_string_pretty(&m).unwrap();
        println!("{}", s);
    }
}

This is the output:

{
  "type": 1,
  "buffer": "hello"
}
{
  "type": 2,
  "buffer": [
    1,
    2,
    3
  ]
}

This may be what you want! Neither of us may be a native English speaker, so I hope I understood your goal.


By the way, suppose you define a wrapper in the proto file like this:

...

message Wrapper {
    Msg inner = 1;
}

Then an implementation like this can help:

impl Serialize for Wrapper {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: serde::Serializer,
    {
        let mut map = serializer.serialize_map(Some(1))?;
        map.serialize_entry("inner", &*self.inner)?;
        map.end()
    }
}

Now the problem is that I have hundreds of wrappers. I can't realistically implement Serialize by hand for every one of them.

I am new to Rust and don't know serde yet, so I'll take a look at it first.

You can do this:

TL;DR

  1. add #[derive(serde::Serialize)] to all wrappers through Codegen::customize_callback;
  2. add fn extract_from_message_fileds<T: Serialize + Default, S: Serializer>(x: &protobuf::MessageField<T>, serializer: S) -> Result<S::Ok, S::Error> to main.rs;
  3. add #[serde(serialize_with = "extract_from_message_fileds")] to every message-typed field;
  4. add #[serde(skip_serializing)] to special_fields so it is left out of the output.

In build.rs:

use protobuf::descriptor::field_descriptor_proto::Type;
use protobuf::reflect::{FieldDescriptor, MessageDescriptor};
use protobuf_codegen::{Codegen, Customize, CustomizeCallback};

fn main() {
    Codegen::new()
        .pure()
        .include("proto")
        .input("proto/foo.proto")
        .cargo_out_dir("proto")
        .customize_callback(MyCustomize)
        .run()
        .unwrap();
    let path = std::path::PathBuf::from(std::env::var("OUT_DIR").unwrap()).join("proto/foo.rs");
    let gen = std::fs::read_to_string(&path).unwrap();
    let processed = gen.replace("#!", "//").replace("//!", "//");
    std::fs::write(path, processed).unwrap();
    println!("cargo:rerun-if-changed=proto");
    println!("cargo:rerun-if-changed=build.rs");
}

struct MyCustomize;

impl CustomizeCallback for MyCustomize {
    fn message(&self, message: &MessageDescriptor) -> Customize {
        let c = Customize::default();
        match message.name() {
            "Msg" => c,
            _ => c.before("#[derive(serde::Serialize)]"),
        }
    }

    fn field(&self, field: &FieldDescriptor) -> Customize {
        let c = Customize::default();
        match field.proto().type_() {
            Type::TYPE_MESSAGE => {
                c.before("#[serde(serialize_with = \"extract_from_message_fileds\")]")
            }
            _ => c,
        }
    }

    fn special_field(&self, msg: &MessageDescriptor, _: &str) -> Customize {
        let c = Customize::default();
        match msg.name() {
            "Msg" => c,
            _ => c.before("#[serde(skip_serializing)]"),
        }
    }
}

In main.rs:

#![allow(unknown_lints)]
#![allow(clippy::all)]

#![allow(unused_attributes)]
#![cfg_attr(rustfmt, rustfmt::skip)]

#![allow(box_pointers)]
#![allow(dead_code)]
#![allow(missing_docs)]
#![allow(non_camel_case_types)]
#![allow(non_snake_case)]
#![allow(non_upper_case_globals)]
#![allow(trivial_casts)]
#![allow(unused_results)]
#![allow(unused_mut)]

include!(concat!(env!("OUT_DIR"), "/proto/foo.rs"));

use serde::ser::SerializeStruct;
use serde::Serialize;

impl Serialize for Msg {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: serde::Serializer,
    {
        let mut state = serializer.serialize_struct("Msg", 2)?;
        state.serialize_field("type", &self.type_)?;
        match self.type_ {
            // type 1: the buffer holds text; invalid UTF-8 is replaced rather than
            // trusted (from_utf8_unchecked on bad input is undefined behavior)
            1 => state.serialize_field("buffer", &String::from_utf8_lossy(&self.buffer))?,
            // type 2: the buffer is opaque, so emit the raw bytes
            2 => state.serialize_field("buffer", &self.buffer)?,
            // unknown types: emit no "buffer" field
            _ => {}
        }
        state.end()
    }
}

fn extract_from_message_fileds<T: Serialize + Default, S>(x: &protobuf::MessageField<T>, serializer: S) -> Result<S::Ok, S::Error> where S: serde::Serializer {
    match x.as_ref() {
        Some(x) => x.serialize(serializer),
        None => T::default().serialize(serializer),
    }
}

fn main() {
    let msgs = [
        Msg {
            type_: 1,
            buffer: "hello".into(),
            ..Default::default()
        },
        Msg {
            type_: 2,
            buffer: vec![1, 2, 3],
            ..Default::default()
        },
    ];
    for m in msgs {
        let s = serde_json::to_string_pretty(&m).unwrap();
        println!("{}", s);
    }
    let wrapper =  Wrapper {
        inner: protobuf::MessageField::some(Msg {
            type_: 1,
            buffer: "hello".into(),
            ..Default::default()
        }),
        ..Default::default()
    };
    let s = serde_json::to_string_pretty(&wrapper).unwrap();
    println!("{}", s);
}

Cargo.toml:

[package]
name = "some_test"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
protobuf = "3.4.0"
protobuf-json-mapping = "3.4.0"
serde = { version = "1", features = ["derive"] }
serde_json = { version = "1" }

[build-dependencies]
protobuf-codegen = "3.4.0"
protobuf = "3.4.0"

In proto:

syntax = "proto3";

package foo;

message Msg {
    int32 type = 1;
    bytes buffer = 2;
}

message Wrapper {
    Msg inner = 1;
}

This generates code like this:

#[derive(serde::Serialize)]
// @@protoc_insertion_point(message:foo.Wrapper)
#[derive(PartialEq,Clone,Default,Debug)]
pub struct Wrapper {
    // message fields
    #[serde(serialize_with = "extract_from_message_fileds")]
    // @@protoc_insertion_point(field:foo.Wrapper.inner)
    pub inner: ::protobuf::MessageField<Msg>,
    // special fields
    #[serde(skip_serializing)]
    // @@protoc_insertion_point(special_field:foo.Wrapper.special_fields)
    pub special_fields: ::protobuf::SpecialFields,
}

Thank you very much, I'll take a look. Thanks a lot.

@kingwingfly, I have a new problem.

As you suggested, I now use serde to serialize messages.

protobuf_json_mapping converts bytes to a base64 string.

serde converts bytes to an array of numbers.

Because of the orphan rule, I can't impl Serialize for Vec<u8>. What can I do?

I got an answer from here.

So what I need to do is add #[serde(with = "base64")] to every bytes field by default. Is that right?
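For reference, #[serde(with = "...")] names a module that provides serialize and deserialize functions, so the "base64" module here is one you would write yourself (or take from a crate). The encoding step such a module's serialize function would perform can be sketched with the standard library alone; the function name base64_encode is hypothetical and the serde glue is left out so the example stays self-contained.

```rust
/// Encodes bytes as standard padded base64, the form seen in the
/// protobuf_json_mapping output earlier in this thread.
fn base64_encode(input: &[u8]) -> String {
    const ALPHABET: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into one 24-bit group, padding with zeros.
        let b = [
            chunk[0],
            *chunk.get(1).unwrap_or(&0),
            *chunk.get(2).unwrap_or(&0),
        ];
        let group = (b[0] as u32) << 16 | (b[1] as u32) << 8 | (b[2] as u32);
        // Emit four 6-bit symbols, replacing the unused ones with '='.
        out.push(ALPHABET[(group >> 18) as usize & 63] as char);
        out.push(ALPHABET[(group >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 {
            ALPHABET[(group >> 6) as usize & 63] as char
        } else {
            '='
        });
        out.push(if chunk.len() > 2 {
            ALPHABET[group as usize & 63] as char
        } else {
            '='
        });
    }
    out
}

fn main() {
    println!("{}", base64_encode(b"hello"));
    println!("{}", base64_encode(&[1, 2, 3]));
}
```

A serialize function in the "base64" module would then pass the resulting string to serializer.serialize_str, which keeps the serde output for bytes fields in line with what protobuf_json_mapping prints.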

Alice is a real expert, she is always right!
