Well, it's been a while, but I finally got around to implementing NaN-tagging. Here's a simplified version of what I did for those who may want to know later.
We'll store all the datatypes we want to pack in a Data
enum:
pub enum Data {
Real(f64),
String(String),
True,
False,
// snip ...
}
The IEEE standard for floating pointing numbers describes a 64-bit long data layout:
// S is sign, Q is quiet NaN flag, I is an intel flag
SExponent---QIMantissa------------------------------------------
When a f64 is a quiet Nan, only the following bits are set:
-1111111111111--------------------------------------------------
Look at all that free space! We can pack a pointer and some data in here with room to breathe.
// the first bit is a pointer flag, T is a tag
// more tag bits can be used for more variants - we only have true and false, so only 1 bit is needed
11111111111111---Pointer----------------------------------------
01111111111111-------------------------------------------------T
Let's define some constants. We'll use these to mask out specific data:
const QNAN: u64 = 0x7ffc_0000_0000_0000;
const P_FLAG: u64 = 0x8000_0000_0000_0000;
const P_MASK: u64 = 0x0000_FFFF_FFFF_FFFF;
const F_FLAG: u64 = 0x0000_0000_0000_0000;
const T_FLAG: u64 = 0x0000_0000_0000_0001;
Now, we'll implement the tag. A tag is a u64:
pub struct Tagged(u64);
We just need to write some methods to wrap the data into a tag, and unwrap a tag. We perform some bit-twiddling to put the right bits in the right places.
impl Tagged {
pub fn from(data: Data) -> Self {
match data {
Data::Real(f) => Tagged(f.to_bits()),
Data::False => Tagged(QNAN | F_FLAG),
Data::True => Tagged(QNAN | T_FLAG),
// box anything else, pack the pointer
other => Tagged(QNAN | P_FLAG | (
P_MASK & (Box::into_raw(Box::new(other))) as u64
)),
}
}
// cont ...
If we want to access the data packed into a tag, we need to check we have the right value, then reconstruct the data. Note that this isn't actually dereferencing as per the Deref
trait, because I can't figure out how to make this function return &Data
without the compiler complaining.
// ... cont
pub fn deref(&self) -> Data {
let Tagged(bits) = self;
match *bits {
n if (n & QNAN) != QNAN => Data::Real(f64::from_bits(n)),
f if f == (QNAN | F_FLAG) => Data::False,
t if t == (QNAN | T_FLAG) => Data::True,
// reconstruct box, follow pointer
p if (p & P_FLAG) == P_FLAG => {
let pointer = (bits & P_MASK) as *mut Data;
unsafe { *Box::from_raw(pointer) }
},
_ => unreachable!("Corrupt tag data"),
}
}
}
If we want, we can implement some methods on Tagged
to prevent extra data conversion for simple operations, like and-ing two tagged pieces of data that we know to be booleans.
Of course, this method comes with a few caveats that I haven't addressed in the above implementation here out of brevity, such as:
- Assuring the pointer only uses the bottom 48 bits. Ideally, you'd specify a layout to guarantee this.
- Implementing simple operations on tagged data.
- Dropping memory a tag points to when it is dropped.
- etc.
By the way, here's how you'd use it:
let string_a = Data::String("I lost the game".to_string());
let tagged = Tagged::from(a_string);
if let Data::String(string_b) = tagged.deref() {
assert_eq!(string_a, string_b);
}
That's all. If you have a question, or something's terribly wrong with my implementation*, please reach out to me.
*From the unit tests I've ran, it seems to be working as intended.