Over-simplification in exchange for convenience, eh? Bit of a rant below - probably too ignorant and not good for my first post here. Feel free to remove, but this thread seemed like an apt place.
If anything, end-users who work with data - in this case text - need to have more low-level knowledge, not less. Unicode is tricky. UTF-8 is tricky. Pre-Unicode (well, pre-UTF-8), I think I had to work with three (?) different local encodings for Japanese circa 2005 (Shift_JIS, EUC-JP and ISO-2022-JP, most likely). I'd take UTF-16 without a BOM over that.
Personal opinion, but I feel we are losing important knowledge in exchange for a bit of convenience far too often with digital data. And in my experience, digital data is the flimsiest, most fragile form of data humanity has ever created. Practical? Absolutely, but cave paintings will probably outlive the recorded history of humanity post-1975 (or wherever the cutoff ends up being) if this kind of knowledge becomes the preserve of a few. Compare "hey, my code flipped a bit in some random place and now my data is unreadable" with "I smudged one corner of this cave painting, but it still looks OK" or "the tape on this open reel broke, but we'll just splice it back together".
I work in academia (humanities), and sometimes it feels as if there's no end to the [unintended] malformed textual data that is used for models, for statistics, and to publish actual results on. This ranges from converting from some legacy encoding to UTF-8 without checking that the output actually looks right to a human brain (mojibake can still be well-formed UTF-8; a computer doesn't care), to a complete lack of understanding of how bytes, code units, code points, glyphs, graphemes and what-have-you differ. As already concluded, it's complex.
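To make that concrete, here's a minimal Python sketch of both failure modes; the strings are purely illustrative:

```python
# 1. Mojibake that is still well-formed UTF-8: decode Shift_JIS bytes
#    with the wrong codec, then re-encode. No step raises an error.
japanese = "日本語"
garbled = japanese.encode("shift_jis").decode("latin-1")  # wrong codec, decodes "fine"
garbled.encode("utf-8")  # perfectly valid UTF-8, semantically garbage
print(garbled)

# 2. Bytes vs code points vs graphemes: one user-perceived character,
#    written as 'e' plus a combining acute accent.
import regex  # third-party module; its \X matches extended grapheme clusters

s = "e\u0301"                        # 'é' as two code points
print(len(s))                        # 2 (code points)
print(len(s.encode("utf-8")))        # 3 (bytes)
print(len(regex.findall(r"\X", s)))  # 1 (grapheme)
```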
Outside of academia, didn't someone exploit GitHub's normalisation-by-lowercasing of user names, using rare characters that "lowercase" into more common ones (e.g. lowercasing isn't really applicable to IPA symbols, but apparently some algorithms do not care)?
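The mechanism is easy to demonstrate in Python (illustrative only; I don't know which characters the actual exploit used):

```python
# KELVIN SIGN (U+212A) looks like 'K' and lowercases to plain ASCII 'k',
# so two distinct strings can collapse to the same "normalised" name.
kelvin = "\u212A"
print(kelvin.lower() == "k")                       # True
print(f"{kelvin}evin".lower() == "Kevin".lower())  # True: distinct names collide
```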
I do agree that string reversal is not that niche, and it absolutely has uses in my sector. But if the researcher does not take care to do it properly, it will produce wrong results (sketch below). Not an unexpected result - an unexpected result may be a perfectly valid one - but a wrong one, because the input was malformed. Perhaps it's just academia that's sometimes behind?
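A minimal sketch of the pitfall, again in Python and again leaning on the third-party regex module's \X for grapheme clusters:

```python
import regex  # third-party; \X matches extended grapheme clusters

s = "man\u0303ana"  # 'mañana' with the tilde as a combining mark

# Naive reversal flips raw code points, detaching the combining tilde
# and attaching it to the wrong letter.
print(s[::-1])  # renders roughly as 'anãnam'

# Grapheme-aware reversal keeps user-perceived characters intact.
print("".join(regex.findall(r"\X", s)[::-1]))  # 'anañam'
```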
The more I get involved with data processing, the more I've come to appreciate Rust's stance on many things, strings included (guaranteed UTF-8, no naive integer indexing, graphemes via an explicit crate). But sure, I too reach for Python sometimes.