Converting a C Handler interface into a Rust Result style

#1

I am currently trying to put a Rust wrapper around a C library, namely
libraptor:

http://librdf.org/raptor/libraptor.html

This is a parser, and it has a callback handler interface. A parser
has one handler for statements (i.e. the things we are trying to
parse) and a log handler (i.e. errors or warnings). The parser can
parse files, URL contents or strings, which can include pushing data
into the parser as a char buffer
(http://librdf.org/raptor/api/raptor2-section-parser.html#raptor-parser-parse-chunk).

I was thinking of turning this into a more Rust-like interface:
a parser with a method “read_event() -> Result<Statement,Error>”.
But if I do this, I cannot push large quantities of data at the
parser; I could store the events going to the statement handler, but I
can’t necessarily correlate these with those going to the log handler.

So the only solution I have come up with is to send one char at a time
to the parser. This seems to be working (I need to test it more
though!), but appears to be pretty slow (about 10x slower than passing
all the data in a single call). I haven’t worked out yet whether this
slowness is in Rust or the C library, although this is clearly next on
the list.

Am I going about this in the right way?

Also, my current mechanism for passing char at a time to raptor looks
like this:

let rust_rdf = include_str!("../src/test.rdf");
for ch in rust_rdf.chars() {
    let c_ch = CString::new(ch.to_string()).unwrap();
    let _rtn = raptor_parser_parse_chunk(rdf_parser,
                                         c_ch.as_bytes_with_nul().as_ptr(),
                                         c_ch.as_bytes().len(),
                                         0);
}

I am relatively sure that this is not a good way to do things. Any
hints about quicker ways would be good.


#2

Why not something like:

let rust_rdf = include_bytes!("../src/test.rdf");
if raptor_parser_parse_chunk(rdf_parser, rust_rdf.as_ptr(), rust_rdf.len(), 0) != 0 {
    return Err(/* whatever */);
}

#3

Because when I parse the chunk, the statement handler may or may not actually signal any new statements, depending on whether I have got to the end of a statement or not. So, if the chunk results in multiple statements, then what do I return here?


#4

Your method of passing chars is quite slow, because CString::new allocates for every char.
The parser takes a length, so the \0 terminator doesn’t seem to be needed at all.

You could feed it one byte at a time this way:

let rust_rdf = include_str!("../src/test.rdf");
let rust_rdf_bytes = rust_rdf.as_bytes();

for idx in 0..rust_rdf_bytes.len() {
    raptor_parser_parse_chunk(rdf_parser, rust_rdf_bytes[idx..].as_ptr(), 1, 0);
}

but overall I’d recommend approaching the whole problem differently. You’re trying to do two things at the same time:

  1. Expose the parser to Rust
  2. Change the interface of the parser

A better approach may be to do it in two phases: first, expose the parser to Rust with the same parsing method that C uses; second, wrap the parsing in something nicer, working in pure Rust by then.

Other possible Rusty interfaces:

  • Return Result<Vec<Statement>> allowing for 0 or more statements per chunk
  • Make a struct Parser<R: Read> {buffer: Vec<Statement>} that implements Iterator. In next(), keep reading and parsing until the parser returns something. If it returns more than one thing, buffer the extras in the iterator and return only the first.
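
A rough sketch of the iterator idea, in case it helps. None of these names come from raptor or from any existing wrapper: Statement, ParseError, ToyChunkParser and StatementIter are all made up for illustration, and the toy parser (one statement per line, a leading '!' reported as an error) only stands in for a call to raptor_parser_parse_chunk whose statement and log handlers would push into the same queue. Putting both kinds of event into one queue keeps them in the order the parser produced them.

```rust
use std::collections::VecDeque;
use std::io::Read;

/// Hypothetical wrapper types; a real wrapper would hold RDF terms here.
#[derive(Debug, PartialEq)]
pub struct Statement(pub String);

#[derive(Debug, PartialEq)]
pub struct ParseError(pub String);

pub type Event = Result<Statement, ParseError>;

/// Bytes read from the source per call to the chunk parser (arbitrary choice).
const CHUNK_SIZE: usize = 4096;

/// Toy stand-in for the C parser: each complete line is a statement,
/// and a line starting with '!' is reported as an error instead.
#[derive(Default)]
struct ToyChunkParser {
    partial: Vec<u8>,
}

impl ToyChunkParser {
    /// Consume one chunk and append zero or more events, in order.
    fn parse_chunk(&mut self, chunk: &[u8], events: &mut VecDeque<Event>) {
        for &byte in chunk {
            if byte != b'\n' {
                self.partial.push(byte);
                continue;
            }
            let line = String::from_utf8_lossy(&self.partial).into_owned();
            self.partial.clear();
            if line.is_empty() {
                continue;
            }
            if line.starts_with('!') {
                events.push_back(Err(ParseError(line[1..].to_string())));
            } else {
                events.push_back(Ok(Statement(line)));
            }
        }
    }
}

/// Pull-style view over a push-style parser: only the events produced by
/// the most recent chunk are ever buffered.
pub struct StatementIter<R: Read> {
    reader: R,
    parser: ToyChunkParser,
    events: VecDeque<Event>,
    done: bool,
}

impl<R: Read> StatementIter<R> {
    pub fn new(reader: R) -> Self {
        StatementIter {
            reader,
            parser: ToyChunkParser::default(),
            events: VecDeque::new(),
            done: false,
        }
    }
}

impl<R: Read> Iterator for StatementIter<R> {
    type Item = Event;

    fn next(&mut self) -> Option<Event> {
        loop {
            // Hand out anything already buffered before reading more input.
            if let Some(event) = self.events.pop_front() {
                return Some(event);
            }
            if self.done {
                return None;
            }
            let mut buf = [0u8; CHUNK_SIZE];
            match self.reader.read(&mut buf) {
                // End of input; this toy drops an unterminated last line.
                Ok(0) => self.done = true,
                Ok(n) => self.parser.parse_chunk(&buf[..n], &mut self.events),
                Err(e) => {
                    self.done = true;
                    return Some(Err(ParseError(e.to_string())));
                }
            }
        }
    }
}
```

With this shape, something like StatementIter::new(file) can be used in a for loop, and the memory held at any time is bounded by what one chunk can produce rather than by the whole input.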

#5

I’ve already split the two parts: I’ve just used bindgen for the FFI (that’s where the `raptor_parser_parse_chunk` function comes from). It seems to be working well and is in its own -sys crate.

Thanks for pointing out the slice method; it is 2-3 times faster. Overall, though, for a large file, performance using this method is 2-3 orders of magnitude slower than passing all of the data in a single function invocation. My guess is that this stems from the underlying C library, so I think I have to look at a different interface. Both of your suggestions seem reasonable.

Thanks for the advice!


#6

I think I am coming to the conclusion that I should provide a callback interface in Rust as well. Anything else requires me to buffer up the events that I generate from an initial “parse” call. Given that raptor includes methods for parsing entire files, this means I would potentially have to buffer the full file in memory.

I guess a push interface which takes something like a Result would work okay.
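
For the record, here is a minimal sketch of what I mean by a push interface. The names are hypothetical (PushParser, Statement and ParseError are not raptor or bindgen names), and the line-based toy parsing only stands in for the point where the real statement and log handlers would fire. The idea is that a single Rust closure receives every event as a Result, in the order the parser produced it, so nothing has to be buffered.

```rust
/// Hypothetical wrapper types; a real wrapper would hold RDF terms here.
#[derive(Debug, PartialEq)]
pub struct Statement(pub String);

#[derive(Debug, PartialEq)]
pub struct ParseError(pub String);

/// Push-style parser: the caller's closure sees each event as it happens.
pub struct PushParser<F: FnMut(Result<Statement, ParseError>)> {
    handler: F,
    // Bytes of a line that has not been terminated yet.
    partial: Vec<u8>,
}

impl<F: FnMut(Result<Statement, ParseError>)> PushParser<F> {
    pub fn new(handler: F) -> Self {
        PushParser {
            handler,
            partial: Vec::new(),
        }
    }

    /// Feed a chunk of any size; the handler is called zero or more times.
    /// Toy grammar: one statement per line, a leading '!' means an error.
    pub fn parse_chunk(&mut self, chunk: &[u8]) {
        for &byte in chunk {
            if byte != b'\n' {
                self.partial.push(byte);
                continue;
            }
            let line = String::from_utf8_lossy(&self.partial).into_owned();
            self.partial.clear();
            if line.is_empty() {
                continue;
            }
            if line.starts_with('!') {
                (self.handler)(Err(ParseError(line[1..].to_string())));
            } else {
                (self.handler)(Ok(Statement(line)));
            }
        }
    }
}
```

A caller would construct it with a closure, for example PushParser::new(|event| println!("{:?}", event)), and then pass in chunks of whatever size is convenient. In the real wrapper the closure would have to be reached from the C handlers through the user-data pointer, which this sketch does not attempt to show.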
