New week, new rusty stuff. What are you folks up to?
I want to flesh out the scanners and marker types for scan-rules for things like controlling whitespace (for parsing things like entirely space-delimited matrices), plus work on a seekrit evil macro thing.
And last week I “starred” some Rust GitHub repos that I’m planning to start contributing to as soon as possible.
Now that I’m mostly done with the porting of Rust to Illumos, I decided to take a small break and finish some of my old projects.
These include a pair of WebSocket-related crates. The first is mio-websocket, a high-level library based on mio’s event loop for writing high-performance WS clients and servers. The second is websocket-essentials, a base library that implements some very basic common stuff like frame handling. It’s not tied to any particular implementation, so I hope it will be useful for any WebSocket-related project in Rust.
Both are at a very early stage because they were first started for an educational purpose, as illustrations for my blog posts (Rust in Detail). And that’s another project that I want to work on this week: continuing with the stalled third part of the series, which will cover even more low-level implementation details.
So that’s what I’m working on this week. Eager to hear about other projects!
I have an online DFA engine working in the regex crate, and it’s supa-fast! Most of what’s left is polishing and making sure, you know, the DFA doesn’t create an exponential number of states.
After that, I plan on working towards a 1.0 milestone.
As a demonstration of its utility, perhaps you could adapt https://github.com/cyderize/rust-websocket to use websocket-essentials!
I’m working on https://github.com/emoon/dynamic_reload which is a crate for dynamic (re)loading of shared libraries. The intended use is for applications that load DLLs/shared objects and want to reload them without exiting the main application. That means a user can change some code in a shared library, hit compile, and the changes will appear directly in the main application without a restart.
One can see this as a lite version of “live coding” which makes it possible to change running Rust code. I already have this working in https://github.com/emoon/ProDBG but I decided to separate out the code and make it more general. This also works for shared libraries written in any language (such as C/C++), as the code really doesn’t care about that.
Currently this is very much WIP and the new code doesn’t work yet, but I hope to have it up and running this week. That is also why there is very little info in the repo about what it actually does; more info to come later.
I’m starting to dig deep into OpenGL and put together a better rendering system for Gunship. My previous work trying to improve the gameplay layer hit a wall and I got fed up with not being able to make pretty pictures with the engine so I’ve changed focus for the time being.
Working on the FoxBox. I’m currently implementing an “If This Then That”-style decision engine dedicated to scripting SmartDevices in your home (or farm, or shop, or school).
Planning a hackathon for next week in the Bay Area for our Mirama smartglasses. Over the last year we switched to Rust for our API to build a proof-of-concept hardware+software and some demo apps. This experience, plus seeing how the Rust community thrives, gave us enough confidence to feel that this is the right way for us. More information will be posted on meetup.com and here later.
Very cool! I’ve been working a bit on some benchmarks for searching through large texts (currently the complete works of Mark Twain and the Old Testament in Greek), and your new branch does really well, beating RE2 much of the time.
I did notice a couple of things while working on regex-dfa that might be useful:
When you look for prefixes for a DFA, you can ignore any transitions that get you back to the initial (DFA) state. With this optimization, the so-called hard benchmark gets a literal prefix and so it goes at the speed of memchr! I don’t know if this affects any real-world regexes though…
memchr does much worse on texts in non-ASCII alphabets, because lots of codepoints have the same first byte. A better heuristic seems to be to use memchr to search for the last byte of the first codepoint. (Longer term, you could use pcmpestri instructions to search for the whole codepoint.)
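To make the last-byte heuristic concrete, here is a minimal sketch using only the standard library. The names `find_byte` and `find_char_by_last_byte` are made up for illustration and are not taken from regex or regex-dfa; `find_byte` is a plain stand-in for a real memchr call.

```rust
/// Stand-in for memchr (assumption: a real implementation would call an
/// optimized memchr instead of this simple scan).
fn find_byte(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

/// Hypothetical helper: find the first occurrence of `ch` in `text` by
/// scanning for the LAST byte of its UTF-8 encoding, then checking that the
/// bytes just before the hit are the rest of the encoding.
/// Returns the byte offset at which the codepoint starts.
fn find_char_by_last_byte(ch: char, text: &str) -> Option<usize> {
    let mut buf = [0u8; 4];
    let encoded = ch.encode_utf8(&mut buf).as_bytes();
    // `last` is the byte we scan for; `lead` holds the bytes before it.
    let (&last, lead) = encoded.split_last()?;
    let hay = text.as_bytes();
    let mut from = 0;
    while let Some(i) = find_byte(last, &hay[from..]) {
        let end = from + i; // index of the candidate last byte
        if end >= lead.len() && &hay[end - lead.len()..end] == lead {
            return Some(end - lead.len());
        }
        from = end + 1; // false hit: keep scanning after it
    }
    None
}
```

In Greek text most codepoints start with 0xCE or 0xCF, so scanning for the first byte stops almost everywhere, while the last byte is far more selective. Calling `find_char_by_last_byte('γ', "λόγος")` should agree with `"λόγος".find('γ')`.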
Ah! Right! Of course! Excellent observation. I think this also means that the optimization can be applied to the DFA running in reverse as well. (Which I had been flummoxed on until you mentioned this.)
Yup, those are on my TODO list. I think supporting pcmpestri in nightly only would be pretty cool. (The jetscii crate does it.)
This is fantastic. Before I get my DFA merged I plan on completely rewriting the benchmarks w/ analysis, so that it’s easier to track which optimizations are kicking in. (The number of permutations of optimizations has grown considerably.)
Perhaps I can pick your brain some time about adding word boundaries to the DFA, because it’s a thorn in my side.
Do you know why my DFA branch beats your offline DFA on some of those benchmarks? (e.g., twain_06b) Are you still using Aho-Corasick? And if so, are you using
I’ve played with this in the past (when I saw this video of live code patching in Erlang), but I dropped the idea because I came to the conclusion that I couldn’t do it in a simple, safe way…
How do you handle situations like having a pointer into a structure that comes from a shared library and then the structure is modified in the shared library? What happens when you access that data?
TL;DR. I don’t. It is up to the application to handle that.
The way it will work is that before the shared library is reloaded, the application will get a callback telling it that a library is about to be reloaded. It’s up to the application to decide what to do. In my case, in ProDBG, I will call a function in the library which will serialize its state (using an API provided by the application), and then after the reload I will call it again with the serialized data so it can restore the state.
Also, in the ProDBG case plugins are very self-contained. They only talk to the outside by using messages, and the host application only holds one opaque pointer to the instance of each plugin. This pointer is never used by the application itself but only passed in when the instance needs an update.
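The save/reload/restore flow described above could look roughly like this. It is only a sketch under stated assumptions: `Plugin`, `Counter`, `Host` and `new_counter` are hypothetical names, not the dynamic_reload or ProDBG API, and the actual library loading is simulated by swapping a boxed instance.

```rust
/// Hypothetical plugin interface (assumption, not a real API).
trait Plugin {
    /// Serialize the plugin's state before its library is unloaded.
    fn save_state(&self) -> Vec<u8>;
    /// Restore state from bytes produced by an earlier `save_state`.
    fn load_state(&mut self, data: &[u8]);
    /// The only call the host makes on a live instance.
    fn update(&mut self);
}

/// Example plugin that counts how many times it has been updated.
struct Counter {
    ticks: u32,
}

impl Plugin for Counter {
    fn save_state(&self) -> Vec<u8> {
        self.ticks.to_le_bytes().to_vec()
    }
    fn load_state(&mut self, data: &[u8]) {
        let mut raw = [0u8; 4];
        raw.copy_from_slice(&data[..4]);
        self.ticks = u32::from_le_bytes(raw);
    }
    fn update(&mut self) {
        self.ticks += 1;
    }
}

/// Host side: holds exactly one opaque handle per plugin instance.
struct Host {
    instance: Box<dyn Plugin>,
}

impl Host {
    /// Simulated reload: save state, replace the instance with one built by
    /// the "new" library, then hand the saved state back.
    fn reload(&mut self, make_new: fn() -> Box<dyn Plugin>) {
        let saved = self.instance.save_state(); // "about to reload" step
        self.instance = make_new(); // the old instance is dropped here
        self.instance.load_state(&saved); // "reloaded" step
    }
}

/// Stands in for the constructor exported by the shared library.
fn new_counter() -> Box<dyn Plugin> {
    Box::new(Counter { ticks: 0 })
}
```

Because the host never looks inside the instance, nothing on the host side points into memory owned by the old library once `reload` returns; the state only crosses the reload as plain bytes.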
So applications using this will need to think about how data is being accessed. Regarding safety, shared libraries in Rust are right now always considered unsafe as they call into unknown code.
Despite these issues, having ‘live’ code changes this way is nice. It will never reach something like Common Lisp quality, since Common Lisp works very differently from Rust (or C/C++), but it is still much better than nothing.
No, I took out Aho-Corasick because I’m almost as fast (most of the time) without it (and I think I’ll be just as fast if I optimize a bit more).
I thought quite a bit about word boundaries (they’re responsible for a big chunk of complexity in regex-dfa) and I’m happy to share, but I don’t see how to fully support them in a dynamic DFA. I think you can do it when you have ASCII before and after the word boundary instruction though: break the word boundary instruction into two parallel instructions, one that accepts non-word/word and one that accepts word/non-word. (Often you can get rid of one of these, e.g. if \b is preceded or followed by either a word or non-word char.)
Now let’s say i is the index of the non-word/word instruction. When you get an ASCII byte that moves you to instruction i, add i to the next state if and only if the byte was a non-word byte. When you start at a state containing i, pretend that the state also contains i.goto if and only if the current byte is a word byte.
Then there are a bunch of corner cases – if the empty look leads to a match, or if it’s the first instruction, or if there are other look instructions or epsilon instructions leading to or from the empty look…
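As a small illustration of the split itself (not of the DFA construction), the sketch below checks \b on ASCII bytes as the union of the two cases. All function names are hypothetical, and treating the start and end of the text as non-word is an assumption that follows the usual regex convention rather than anything stated above.

```rust
/// ASCII word byte: [0-9A-Za-z_].
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Case 1: non-word (or start of text) followed by a word byte.
fn nonword_then_word(prev: Option<u8>, next: Option<u8>) -> bool {
    !prev.map_or(false, is_word_byte) && next.map_or(false, is_word_byte)
}

/// Case 2: word byte followed by non-word (or end of text).
fn word_then_nonword(prev: Option<u8>, next: Option<u8>) -> bool {
    prev.map_or(false, is_word_byte) && !next.map_or(false, is_word_byte)
}

/// `\b` holds at byte offset `at` exactly when one of the two cases holds.
fn is_word_boundary(text: &[u8], at: usize) -> bool {
    let prev = if at > 0 { text.get(at - 1).copied() } else { None };
    let next = text.get(at).copied();
    nonword_then_word(prev, next) || word_then_nonword(prev, next)
}
```

The two cases are mutually exclusive, and each depends on one byte on either side, which is what makes it possible to decide the first half when entering the instruction and the second half when leaving it.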
Still working on my TCP tunneling library :’( This is my first time dealing with concurrency in programming, and it’s really hard to wrap my head around the concepts. Setting up the local end of the tunnel with mio seems easy enough; the part I’m struggling with is making it non-blocking and not relying on timeouts. I haven’t decided if I should have the library write received data to a socket and have the parent program transparently encrypt it, or if I should just have it write the received data and tunnel headers to a buffer, then have the parent program encrypt the data and write it to a socket.
I’ve also started back at school trying to finally finish my CS degree, so my free time to try and figure this out has dropped to near 0
Working on an Elasticsearch Client for Rust. My focus is on 100% Rust for codegen, fully-featured implementations of Elasticsearch’s core types, and a JSON parsing macro for the REST API, so there’s no client-specific DSL.
It’s both my first serious Rust project and my first open source project, so it’s been great to explore some of GitHub & Friends’ features. The work has also taken me through a bunch of the Rust compiler’s features: working with its ASTs and writing compiler plugins.
I’m working on a little image scraper written in Rust, mainly to test the dependencies and as an exercise in Rust. It uses hyper for the HTTP requests and kuchiki as the CSS selector engine for getting the image sources.
Usage is simple, just run ./scrape URL "#main img" TARGET_DIR to download all the images that are in a container with id main into the TARGET_DIR. TARGET_DIR will be created recursively.
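For the recursive TARGET_DIR creation, a minimal sketch with the standard library might look like the following. `prepare_target` is a hypothetical helper and I’m only assuming the scraper does something along these lines; `std::fs::create_dir_all` is the standard call that creates a directory together with any missing parents.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: make sure `target_dir` exists (creating missing
/// parent directories too) and return the path a downloaded image named
/// `file_name` would be written to.
fn prepare_target(target_dir: &Path, file_name: &str) -> io::Result<PathBuf> {
    // Succeeds without error if the directory already exists.
    fs::create_dir_all(target_dir)?;
    Ok(target_dir.join(file_name))
}
```

Calling it with a nested path such as `out/2016/images` creates all three levels in one go.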