Computer Vision in Rust?

Hi,

I'm currently working on computer vision algorithms, more specifically direct visual odometry (similar to DSO). As far as I know all these algorithms are implemented in C++ (or Matlab). I have the feeling that this project could be a good opportunity to try out Rust. I love the idea of having a very fast language, with a much better type system than C++.

To my knowledge, the requirements of such a project are:

  • Reading and showing images / videos ( / streams).
  • Manipulation of images as matrices / tensors.
  • Linear algebra with those matrices / tensors:
    • Standard linear algebra operations (additions, multiplications, decompositions, ...).
    • Specific linear algebra of rigid body motions, similarities, projective geometry, ...
  • Linear and non-linear solvers and optimization, like Gauss-Newton optimization.
  • Graph optimization (similar to g2o).
  • Appearance-based large-scale loop closure detection (similar to openFABMAP).
  • Manipulation of point clouds.
  • 3D scene visualization:
    • Visualization of point clouds.
    • Visualization of simple shapes, lines, meshes.
    • Scene navigation (being able to zoom, rotate, ...).

That makes a lot of requirements, and I might have forgotten some. I'd rather do the project entirely in one language, either C++ or Rust, with no mix (or at most one foreign library). So my questions are:

  • Do you think the Rust ecosystem is mature enough for a project like this? (available / missing libraries)
  • Are there hard stoppers, caveats, warnings I should be aware of?
  • Are there other computer vision projects in Rust?

Thanks in advance for any honest answer!

How secure do you need your library to be?

I suspect the ease of development in C++ and the number of mature libraries in the C++ ecosystem heavily outweigh the safety you get in Rust.

C++ libraries (e.g. OpenCV) do have the typical vulnerabilities related to memory errors, leading to arbitrary code execution, DoS, etc., but their development began when security was not a major concern. If you write your algorithms from scratch in C++ with the help of static and dynamic analyzers, you will probably have an easier time developing than in Rust, and achieve roughly the same level of safety, depending on how critical your library/system needs to be.

Safety is not my main concern. What attracts me most is the expressiveness of the type system (ADT) and the ownership model. And I'd like to know how much I can reuse (available libraries) and how much I'd have to recode.

So far, I've identified:

  • PistonDevelopers/image and pcwalton/rust-media for image / video manipulation
  • rust-ndarray and nalgebra for matrix manipulation, projective geometry, linear solvers
  • bluss/petgraph for representing graphs
  • l3ck/rust-3d or aepsil0n/acacia for point clouds
  • For visualization:
    • PistonDevelopers/conrod for GUI visualization
    • sebcrozet/kiss3d for graphics engine (visualize point clouds, cameras, etc.).
    • gfx for low-level graphics

I didn't find any library for:

  • non-linear optimization
  • optimization over graphs
  • loop-closure detection (image appearance algorithm)

Any feedback on the maturity, maintenance, and inter-compatibility of these libraries?
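On the missing non-linear optimization piece: for small problems, Gauss-Newton can be written by hand without any crate. Below is a minimal sketch, assuming a toy two-parameter model y = a * exp(b * x) (chosen only for illustration, it has nothing to do with DSO) and using only the standard library. The function name `gauss_newton_exp_fit` and its parameters are hypothetical.

```rust
/// Fit the toy model y = a * exp(b * x) with plain Gauss-Newton iterations.
/// Returns the estimated (a, b), or None if the normal equations are singular.
/// Hypothetical helper for illustration; not part of any existing crate.
pub fn gauss_newton_exp_fit(
    xs: &[f64],
    ys: &[f64],
    init: (f64, f64),
    max_iters: usize,
    tol: f64,
) -> Option<(f64, f64)> {
    assert_eq!(xs.len(), ys.len(), "xs and ys must have the same length");
    let (mut a, mut b) = init;
    for _ in 0..max_iters {
        // Accumulate the 2x2 normal equations: H = J^T J and g = J^T r.
        let (mut h00, mut h01, mut h11) = (0.0_f64, 0.0_f64, 0.0_f64);
        let (mut g0, mut g1) = (0.0_f64, 0.0_f64);
        for (&x, &y) in xs.iter().zip(ys) {
            let e = (b * x).exp();
            let r = a * e - y; // residual of this sample
            let j0 = e; // d r / d a
            let j1 = a * x * e; // d r / d b
            h00 += j0 * j0;
            h01 += j0 * j1;
            h11 += j1 * j1;
            g0 += j0 * r;
            g1 += j1 * r;
        }
        let det = h00 * h11 - h01 * h01;
        if det.abs() < 1e-12 {
            return None;
        }
        // Solve H * delta = -g with the closed-form 2x2 inverse.
        let da = -(h11 * g0 - h01 * g1) / det;
        let db = -(h00 * g1 - h01 * g0) / det;
        a += da;
        b += db;
        // Stop once the update is negligible.
        if da.abs() + db.abs() < tol {
            break;
        }
    }
    Some((a, b))
}
```

Called on samples generated from a = 2, b = 0.5 with an initial guess of (1.5, 0.4), this should recover the parameters within a handful of iterations. There is no damping or line search, so a poor initial guess can diverge; Levenberg-Marquardt would be the usual next step.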

I think a lib like OpenCV is important for computer vision in Rust. Yes, there is an OpenCV binding for Rust, but it has not been updated for two years, and its documentation is not comparable to that of the original C++ library.

The data I/O libs are also far from usable. For example, a lot of scientific data is stored as HDF5. A Rust HDF5 lib is available, but its documentation is not very friendly, and it has not been updated for two years.

The corresponding reddit post for reference.

pcwalton/rust-media is not maintained. I recommend using a GStreamer wrapper instead.

@kornel I'd really like to avoid C/C++ wrappers, but if there is no choice, thanks for the reference. Are there alternatives in Rust? Are there libraries with high-level abstractions like what you'd do in OpenCV?

import cv2
cap = cv2.VideoCapture(0)  # open the default camera
ok, frame = cap.read()  # read() returns a success flag and the frame

I'm not aware of any production-ready video codecs in Rust, so avoiding C wrappers will be very limiting.

C wrappers usually do add high level abstractions and a level of safety on top of the C APIs.

I just watched this video from FOSDEM 18 about GStreamer and Rust so I'm just adding the reference here:

@mattpiz Did you find a path forward? I am interested in this topic as well.

I've had limited experience so far, probably 2 months of cumulative time (while doing some other stuff). It's been quite positive and encouraging about the potential of Rust, but with some rough edges too, to be honest. I'll be busy for a couple of days, but I'll write some more detailed feedback on my experience this weekend if you're interested.

That would be great!

So "this weekend" turned out to be 17 days later XD.

I think I can split my experiments into 3 categories.

  1. Tried and successfully used Rust to do it:
    • Most of the linear algebra I needed. For this I used nalgebra. It is very well designed and covers all the needs I had, like matrix manipulations, decompositions (Cholesky, SVD, ...), rigid body motions (rotations, translations, ...).
    • Image reading / writing with PistonDevelopers/image. For 16-bit PNG images, though, I had to do it directly with PistonDevelopers/image-png, and I mentioned how I did it in this issue.
    • Metadata file reading / writing with serde.
  2. Tried and would consider it a failed attempt:
    • Graphics visualizations. The GUI landscape in Rust is actually quite complicated. Rust's memory model makes it hard to design GUI APIs that are both efficient and simple. I've experimented with PistonDevelopers/conrod a bit. It felt too complicated for my needs. I wrote up my thoughts in this issue, in which mitchmindtree pointed me to nannou, which could potentially be easier. I haven't dug into it. In any case, I strongly recommend that you read this excellent post about interfaces and Rust.
    • Rapid prototyping with plots. I haven't found a way to produce "quick and dirty" plots to check whether I'm doing things properly. The library milliams/plotlib could be nice, but in the end I chose to "just" export to CSV files and visualize what I want with some vega-lite dataviz code. Not optimal, but it kind of works.
    • Video reading. Compared to just extracting images from the video and reading / writing those images, the overhead of making video reading work did not feel worth it.
  3. Did not try yet:
    • graph manipulation
    • appearance algorithms for image matching

That's basically it. My overall impression is that with nalgebra I can re-implement most of the algorithms I need, but the tooling for exploring approaches and evaluating them quickly is not there yet.
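The CSV export mentioned above needs nothing beyond the standard library. Here is a minimal sketch, assuming two-column data (say, an iteration number and a residual). The `write_csv` helper and the column names are hypothetical and not taken from any crate.

```rust
use std::io::{self, Write};

/// Write (x, y) samples as CSV, with a header row, to any writer
/// (a file, stdout, or an in-memory buffer).
/// Hypothetical helper for illustration only.
pub fn write_csv<W: Write>(
    mut out: W,
    header: (&str, &str),
    rows: &[(f64, f64)],
) -> io::Result<()> {
    writeln!(out, "{},{}", header.0, header.1)?;
    for (x, y) in rows {
        writeln!(out, "{},{}", x, y)?;
    }
    out.flush()
}
```

Passing a `std::fs::File` (ideally wrapped in a `std::io::BufWriter`) as `out` produces a file that vega-lite or any other plotting tool can load directly.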

Rust also has a steep learning curve, in my opinion. I very much like its type system, but I'm often fighting with the compiler over memory management and lifetime annotations, and trying to understand macro / trait issues. The good thing is that most of the time, when it compiles, it works.

Thanks for outlining this @mattpiz! It would be nice if there was even a small rust lib to handle some of these tasks.

Where are you with this? Did you publish a crate?

You could try imgui-rs to show the images

https://github.com/Gekkio/imgui-rs/blob/master/imgui-glium-examples/examples/custom_textures.rs

Actually I did ^^, a few weeks ago, here it is: https://crates.io/crates/visual-odometry-rs

As explained in the readme, I focused on visual odometry, so it isn't generic computer vision. Some parts of the library might still interest others. The published version (0.1) is functional, but I'm still working on a few research improvements, so I have not advertised it yet. A research paper is also in the making, so not everything is explained in the readme; that would be cumbersome and not the right place for it.

But if you find things you like or dislike in there, don't hesitate to let me know anyway.

I've heard a lot of good things about immediate mode GUIs. So far, though, I've managed to keep my code and dependencies pure Rust (no bindings), and I'd like to keep it that way. I haven't had an absolute need for visualizations, so I'll wait for the moment. From the research point of view, I mainly need to evaluate my work, and that requires no visualizations.

Visualizations are great for advertising the work though, so I'll try to keep an eye on Rust GUI experiments.

Not sure whether this suits your needs, but today I released a version of kiss3d (a pure-Rust, WASM-compatible, 2D/3D graphics engine) with support for immediate-mode GUIs based on the conrod crate. I use it for my experiments with nphysics (a pure-Rust physics engine). For a simple GUI example, see: https://www.nphysics.org/demo_all_examples3/.

The GStreamer bindings for Rust do seem to work quite efficiently. I wish there were more documentation and working examples of the GStreamer modules, though.