How to handle 64 MiB of encoded JSON sent to a Rust Native Messaging host?

EDIT

For absolute clarity, JSON Array means the form [91,...,44,...,93], where no JSON Object {} is expected; just flat Array format.

An array is an ordered collection of values. An array begins with [ (left bracket) and ends with ] (right bracket). Values are separated by , (comma).

Kindly keep in mind that whitespace is (or can be) counted as bytes in the message length; see Eliminate space counted as message length.
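To make the whitespace point concrete, here is a minimal Rust sketch. The function name `strip_json_whitespace` is hypothetical and not taken from the linked threads. It assumes the payload is a flat JSON Array of numbers, so whitespace never appears inside a string value and can be dropped safely.

```rust
/// Remove JSON whitespace bytes (space, tab, LF, CR) from a flat numeric array.
/// Assumption: the input contains no JSON strings, so whitespace is never
/// significant and every whitespace byte only inflates the message length.
fn strip_json_whitespace(json: &[u8]) -> Vec<u8> {
    json.iter()
        .copied()
        .filter(|&b| !matches!(b, b' ' | b'\t' | b'\n' | b'\r'))
        .collect()
}
```

For example, `[91, 44, 93]` is 12 bytes on the wire, while the compact `[91,44,93]` is 10 bytes, and the 32-bit length prefix has to match whichever form is actually sent.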

I'm working on implementing 64 MiB input support for the Native Messaging hosts I've written, and have gotten a bunch of help writing.

So far I've completed JavaScript that works using node, deno, and bun; Bytecode Alliance's javy which depends on QuickJS (Rust crate) to compile JavaScript source to WASM; and AssemblyScript (see Parsing JSON manually - #11 by guest271314).

This is the protocol: Native messaging | Chrome for Developers

Chrome starts each native messaging host in a separate process and communicates with it using standard input (stdin) and standard output (stdout). The same format is used to send messages in both directions; each message is serialized using JSON, UTF-8 encoded and is preceded with 32-bit message length in native byte order. The maximum size of a single message from the native messaging host is 1 MB, mainly to protect Chrome from misbehaving native applications. The maximum size of the message sent to the native messaging host is 64 MiB.
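The framing described in that quote can be sketched in Rust as a pair of helpers that work on any reader or writer. The names `write_frame` and `read_frame` are hypothetical. This is a sketch of the length-prefix format only; it does not enforce the 1 MB and 64 MiB limits.

```rust
use std::io::{self, Read, Write};

/// Write one frame: a 32-bit length in native byte order, then the JSON bytes.
fn write_frame<W: Write>(out: &mut W, json: &[u8]) -> io::Result<()> {
    if json.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "message too long"));
    }
    out.write_all(&(json.len() as u32).to_ne_bytes())?;
    out.write_all(json)?;
    out.flush()
}

/// Read one frame: a 4-byte native-order length, then exactly that many bytes.
fn read_frame<R: Read>(input: &mut R) -> io::Result<Vec<u8>> {
    let mut header = [0u8; 4];
    input.read_exact(&mut header)?;
    let mut body = vec![0u8; u32::from_ne_bytes(header) as usize];
    input.read_exact(&mut body)?;
    Ok(body)
}
```

Because both helpers are generic, they can be exercised against an in-memory `Vec<u8>` and a byte slice before being pointed at `io::stdout()` and `io::stdin()`.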

What I'm working with right now: NativeMessagingHosts/nm_rust.rs at main · guest271314/NativeMessagingHosts · GitHub.

I am not a Rustacean; I don't write Rust every day. I think getMessage() doesn't have to change; only sendMessage() needs to be modified to parse, extract, and send valid JSON (encoded as u8 in the working code I've got) back to the browser.

pub fn sendMessage(message: &[u8]) -> io::Result<()> {
  let mut stdout = io::stdout();
  let length = message.len() as u32;
  stdout.write_all(&length.to_ne_bytes())?;
  stdout.write_all(message)?;
  stdout.flush()?;
  Ok(())
}

How would you go about doing that?

Related: How to implement a Native Messaging host using only Rust standard library?

Arrays only, because it's simpler to convert string characters to UTF-8 code points and serialize them to a JSON Array.
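As an illustration of that encoding, here is a minimal Rust sketch. It assumes "code points" here means the UTF-8 bytes of the text, which matches the [91,...,44,...,93] form shown earlier; the function name `to_json_byte_array` is hypothetical.

```rust
/// Serialize the UTF-8 bytes of `text` as a compact JSON Array of numbers.
/// Assumption: each array element is one byte (0..=255), not a Unicode scalar.
fn to_json_byte_array(text: &str) -> String {
    let items: Vec<String> = text.bytes().map(|b| b.to_string()).collect();
    format!("[{}]", items.join(","))
}
```

With this encoding the three characters `[,]` become `[91,44,93]`, and a non-ASCII character such as `é` becomes two elements, `[195,169]`.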

Here's the algorithm I have working in the AssemblyScript and QuickJS implementations.

How to convert this algorithm to Rust?

function sendMessage(message) {
  if (message.length > 1024 ** 2) {
    const json = message;
    const data = new Array();
    let fromIndex = 1024 ** 2 - 8;
    let index = 0;
    let i = 0;
    do {
      i = json.indexOf(44, fromIndex);
      const arr = json.subarray(index, i);
      data.push(arr);
      index = i;
      fromIndex += 1024 ** 2 - 8;
    } while (fromIndex < json.length);
    if (index < json.length) {
      data.push(json.subarray(index));
    }
    for (let j = 0; j < data.length; j++) {
      const start = data[j][0];
      const end = data[j][data[j].length - 1];
      if (start === 91 && end !== 44 && end !== 93) {
        const x = new Uint8Array(data[j].length + 1);
        for (let i2 = 0; i2 < data[j].length; i2++) {
          x[i2] = data[j][i2];
        }
        x[x.length - 1] = 93;
        data[j] = x;
      }
      if (start === 44 && end !== 93) {
        const x = new Uint8Array(data[j].length + 1);
        x[0] = 91;
        for (let i2 = 1; i2 < data[j].length; i2++) {
          x[i2] = data[j][i2];
        }
        x[x.length - 1] = 93;
        data[j] = x;
      }
      if (start === 44 && end === 93) {
        const x = new Uint8Array(data[j].length);
        x[0] = 91;
        for (let i2 = 1; i2 < data[j].length; i2++) {
          x[i2] = data[j][i2];
        }
        data[j] = x;
      }
    }
    for (let k = 0; k < data.length; k++) {
      const arr = data[k];
      const header = Uint32Array.from(
        {
          length: 4,
        },
        (_, index) => (arr.length >> (index * 8)) & 0xff,
      );
      const output = new Uint8Array(header.length + arr.length);
      output.set(header, 0);
      output.set(arr, 4);
      std.out.write(output.buffer, 0, output.length);
      std.out.flush();
      std.gc();
    }
  } else {
    const header = Uint32Array.from({
      length: 4,
    }, (_, index) => (message.length >> (index * 8)) & 0xff);
    const output = new Uint8Array(header.length + message.length);
    output.set(header, 0);
    output.set(message, 4);
    std.out.write(output.buffer, 0, output.length);
    std.out.flush();
    std.gc();
  }
}
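One detail of the JavaScript above that matters when porting: the header is built by shifting, `(length >> (index * 8)) & 0xff`, which puts the least significant byte first, so it is little-endian by construction. A minimal Rust sketch of the same computation follows (the name `header_by_shifting` is hypothetical).

```rust
/// Build the 4-byte length header the way the JavaScript does:
/// byte `i` is `(len >> (i * 8)) & 0xff`, least significant byte first.
fn header_by_shifting(len: u32) -> [u8; 4] {
    let mut header = [0u8; 4];
    for (i, byte) in header.iter_mut().enumerate() {
        *byte = ((len >> (i * 8)) & 0xff) as u8;
    }
    header
}
```

This produces the same bytes as `u32::to_le_bytes`. The protocol text asks for native byte order, which `u32::to_ne_bytes` provides; on little-endian targets such as x86_64 the two are identical.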

Interesting results comparing the code I wrote above with what Google Gemini spit out, based on a user on Discord feeding the program not only the code I wrote, but also the conversation we were having.

Actual test results: Gemini

0    'nm_qjs'    4.778599999994039

guest271314

0    'nm_qjs'    37.553

I would prefer to deal with humans. Ironically, humans would prefer to not deal with me. Oh, well... I don't see myself breaking out an "AI" prompt on a regular or even intermittent basis.

guest271314 vs. Gemini

Here's what Google's Gemini 3 spit out, based on my original code written in JavaScript, then that JavaScript optimized by the program, and then that optimization implemented in Rust. It works.

use std::io::{self, Read, Write};

const CHUNK_SIZE: usize = 1024 * 1024; // 1 MiB
const COMMA: u8 = b',';
const OPEN_BRACKET: u8 = b'[';
const CLOSE_BRACKET: u8 = b']';

fn main() -> Result<(), Box<dyn std::error::Error>> {
    loop {
        let msg = read_input()?;
        // eprintln!("{:?}", std::str::from_utf8(&msg));
        send_message(&msg)?;
    }
}

pub fn read_input() -> io::Result<Vec<u8>> {
    let mut instream = io::stdin();
    let mut length = [0; 4];
    instream.read_exact(&mut length)?;
    let mut buffer = vec![0; u32::from_ne_bytes(length) as usize];
    instream.read_exact(&mut buffer)?;
    Ok(buffer)
}

fn send_message(message: &[u8]) -> io::Result<()> {
    let stdout = io::stdout();
    let mut handle = stdout.lock(); // Locking stdout is faster for repeated writes

    if message.len() <= CHUNK_SIZE {
        write_chunk(&mut handle, message)?;
        return Ok(());
    }

    let mut index = 0;
    while index < message.len() {
        // 1. Determine split point
        let search_start = (index + CHUNK_SIZE).saturating_sub(8);
        let split_index = if search_start >= message.len() {
            message.len()
        } else {
            // Find next comma or end of slice
            message[search_start..]
                .iter()
                .position(|&b| b == COMMA)
                .map(|p| search_start + p)
                .unwrap_or(message.len())
        };

        let raw_chunk = &message[index..split_index];
        if raw_chunk.is_empty() { break; }

        let start_byte = raw_chunk[0];
        let end_byte = raw_chunk[raw_chunk.len() - 1];

        // 2. Determine necessary wrapping
        let mut needs_open = false;
        let mut needs_close = false;
        let mut body = raw_chunk;

        if start_byte == OPEN_BRACKET {
            if end_byte != CLOSE_BRACKET {
                needs_close = true;
            }
        } else if start_byte == COMMA {
            needs_open = true;
            body = &raw_chunk[1..]; // Skip the leading comma
            if body.last() != Some(&CLOSE_BRACKET) {
                needs_close = true;
            }
        }

        // 3. Calculate total length and write
        let total_payload_len = (if needs_open { 1 } else { 0 }) 
                              + body.len() 
                              + (if needs_close { 1 } else { 0 });

        // Write the 4-byte header in native byte order, as the protocol specifies
        handle.write_all(&(total_payload_len as u32).to_ne_bytes())?;

        // Write the optional '[', the body, and the optional ']' without copying the body
        if needs_open { handle.write_all(&[OPEN_BRACKET])?; }
        handle.write_all(body)?;
        if needs_close { handle.write_all(&[CLOSE_BRACKET])?; }

        handle.flush()?;
        index = split_index;
    }

    Ok(())
}

fn write_chunk<W: Write>(writer: &mut W, data: &[u8]) -> io::Result<()> {
    writer.write_all(&(data.len() as u32).to_ne_bytes())?;
    writer.write_all(data)?;
    writer.flush()
}

Performance results of QuickJS and Rust implementation of the same algorithm

0	'nm_rust'	5.0467999999821185
1	'nm_qjs'	5.6456999999880795

It sure would be useful to put some human eyes on the Rust code Gemini spit out based on my original JavaScript code.
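One way to review the chunking without a browser is to pull the split-and-wrap logic into a pure function that returns the chunks instead of writing them. The sketch below does that; `split_json_array` is a hypothetical name, the wrapping rules are copied from `send_message` above, and it assumes a flat JSON Array and a `chunk_size` greater than 8.

```rust
/// Split a flat JSON Array into smaller bracketed arrays, cutting only at commas.
/// Uses the same split and wrapping rules as `send_message`, but returns the
/// chunks so they can be inspected. Assumption: `chunk_size` is greater than 8.
fn split_json_array(message: &[u8], chunk_size: usize) -> Vec<Vec<u8>> {
    if message.len() <= chunk_size {
        return vec![message.to_vec()];
    }
    let mut chunks = Vec::new();
    let mut index = 0;
    while index < message.len() {
        // Start looking for a comma 8 bytes before the size limit.
        let search_start = (index + chunk_size).saturating_sub(8);
        let split_index = if search_start >= message.len() {
            message.len()
        } else {
            message[search_start..]
                .iter()
                .position(|&b| b == b',')
                .map(|p| search_start + p)
                .unwrap_or(message.len())
        };
        let raw = &message[index..split_index];
        if raw.is_empty() {
            break;
        }
        // A chunk that starts at a comma loses the comma and gains a '['.
        let (needs_open, body) = if raw[0] == b',' {
            (true, &raw[1..])
        } else {
            (false, raw)
        };
        // A chunk that was cut before the final ']' gains a closing bracket.
        let needs_close =
            (raw[0] == b'[' || raw[0] == b',') && body.last() != Some(&b']');
        let mut chunk = Vec::with_capacity(body.len() + 2);
        if needs_open {
            chunk.push(b'[');
        }
        chunk.extend_from_slice(body);
        if needs_close {
            chunk.push(b']');
        }
        chunks.push(chunk);
        index = split_index;
    }
    chunks
}
```

With a small `chunk_size` such as 16, a check can assert that every chunk starts with `[` and ends with `]`, and that the elements of all chunks, read in order, equal the elements of the original array.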

Same algorithm, echoing 64 MiB of JSON over the Native Messaging protocol: Rust compiled to a native executable (Linux x86_64), JavaScript using the QuickJS engine and CLI runtime, AssemblyScript executed by bun, and Rust compiled to WASM executed by bun (bun is faster than wasmtime). A couple of runs:

0    'nm_rust'    4.28740000000596
1    'nm_wasm'    4.552199999988079
2    'nm_qjs'    4.657
3    'nm_assemblyscript'    4.83559999999404

0    'nm_rust'    3.7882000000178815
1    'nm_assemblyscript'    4.6855
2    'nm_wasm'    4.688
3    'nm_qjs'    4.918