What are smart pointers? (Part 2 with definition that compiles)

Continuing the discussion from Deref trait implementation:

@CAD97, sorry for the wrong link. I corrected it; I meant this post of yours:

I'm still not sure if it's a good idea to think in classic terms of "pointer" being a memory pointer (to the heap), even though it may contain extra data. Maybe it's least confusing to people who come from C or C++, but perhaps a generalized understanding has some advantages too (and might match the book and documentation better).

The ExtraMetadata would be consistent with Vec and String then. Yet Cow wouldn't be a smart pointer according to your refined definition, even though documentation claims it is:

Enum std::borrow::Cow

pub enum Cow<'a, B>
    where B: 'a + ToOwned + ?Sized,
{
    Borrowed(&'a B),
    Owned(<B as ToOwned>::Owned),
}

A clone-on-write smart pointer.
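For readers who haven't used Cow, here is a minimal sketch of the clone-on-write behavior the documentation describes. It uses only std's Cow API (Cow::Borrowed, Cow::Owned, Cow::to_mut). The helper to_upper_if_needed is hypothetical. Both variants deref to &str, and to_mut clones the borrowed data only on the first write.

```rust
use std::borrow::Cow;

// Hypothetical helper: upper-cases the text only if it contains lowercase letters.
fn to_upper_if_needed(input: &str) -> Cow<'_, str> {
    let mut text = Cow::Borrowed(input);
    if text.chars().any(|c| c.is_lowercase()) {
        // `to_mut` clones the borrowed `str` into an owned `String` on first write.
        let owned: &mut String = text.to_mut();
        *owned = owned.to_uppercase();
    }
    text
}

fn main() {
    let unchanged = to_upper_if_needed("ALREADY LOUD");
    let changed = to_upper_if_needed("quiet");
    // Both variants deref to `&str`, whether they borrow or own.
    assert_eq!(&*unchanged, "ALREADY LOUD");
    assert_eq!(&*changed, "QUIET");
    // Only the value that had to be modified was cloned.
    assert!(matches!(unchanged, Cow::Borrowed(_)));
    assert!(matches!(changed, Cow::Owned(_)));
}
```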

Of course, the book and documentation (in several places) might be changed too, to reflect a narrower definition of smart pointers (to avoid confusion for people coming from C or C++).

But this topic could likely fill a whole new thread (and/or issue reports like #91004, pull requests, and decisions needed to reach a consistent state), I guess. And even then, opinions and preferences might still differ, since to some extent it's a matter of taste, and which wording ("smart pointer" or some other term) should be used for which concept (the broader or the narrower understanding of it) may depend on the audience.

That said, I like your refined version, and I think it could be a good (and clear) definition to agree on, so that the term "smart pointer" is used consistently.

I decided to make this reply a new thread, in case someone wants to comment further on this issue. I also modified your definition so that it compiles. That turned out to be a harder exercise than I thought :sweat_smile:. (And I'm not sure whether my impls for Arc<T> and Vec<T> are correct.) It was also an opportunity to test the new (unstable) features vec_into_raw_parts (tracking issue #65816) and slice_ptr_len (tracking issue #71146). The former somewhat mismatches your trait definition: your trait requires a *mut [T], but Vec::into_raw_parts gives you a (*mut T, usize, …); likewise, Vec::from_raw_parts expects a *mut T rather than a *mut [T] as its first argument.
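The (*mut T, usize) versus *mut [T] mismatch can at least be bridged with stable APIs. Here is a minimal sketch, assuming all we need is to convert between the two pointer shapes. The helper names to_wide and to_thin are hypothetical. std::ptr::slice_from_raw_parts_mut builds the wide pointer without creating a reference, and an as cast drops the length again.

```rust
use std::ptr;

// Hypothetical helper: thin pointer + length -> wide slice pointer.
fn to_wide<T>(data: *mut T, len: usize) -> *mut [T] {
    // Building a raw slice pointer is safe: nothing is dereferenced here.
    ptr::slice_from_raw_parts_mut(data, len)
}

// Hypothetical helper: wide slice pointer -> thin pointer to the first element.
fn to_thin<T>(wide: *mut [T]) -> *mut T {
    // Casting away the slice metadata keeps only the data address.
    wide as *mut T
}

fn main() {
    let mut buf = [10, 20, 30];
    let len = buf.len();
    let wide = to_wide(buf.as_mut_ptr(), len);
    // Here the length is read back through a shared reference,
    // which requires the pointer to be valid (hence `unsafe`).
    let len_back = unsafe { (&*wide).len() };
    assert_eq!(len_back, 3);
    // Dropping the metadata gives back the original data pointer.
    assert_eq!(to_thin(wide), buf.as_mut_ptr());
}
```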

//#![feature(vec_into_raw_parts)]
//#![feature(slice_ptr_len)]

use std::ops::Deref;

unsafe trait SmartPointer: Deref {
    type ExtraMetadata;
    fn into_raw_parts(this: Self) -> (*mut <Self as Deref>::Target, Self::ExtraMetadata);
    unsafe fn from_raw_parts(ptr: *mut <Self as Deref>::Target, extra_meta: Self::ExtraMetadata) -> Self;
}

use std::sync::Arc;

unsafe impl<T> SmartPointer for Arc<T> {
    type ExtraMetadata = ();
    fn into_raw_parts(this: Self) -> (*mut <Self as Deref>::Target, Self::ExtraMetadata) {
        (Arc::into_raw(this) as *mut <Self as Deref>::Target, ())
    }
    unsafe fn from_raw_parts(ptr: *mut <Self as Deref>::Target, extra_meta: Self::ExtraMetadata) -> Self {
        let () = extra_meta;
        Arc::from_raw(ptr)
    }
}

unsafe impl<T> SmartPointer for Vec<T> {
    type ExtraMetadata = usize;
    fn into_raw_parts(mut this: Self) -> (*mut <Self as Deref>::Target, Self::ExtraMetadata) {
        let capacity: usize = this.capacity();
        let ptr: *mut [T] = &mut *this;
        // Or alternatively using #![feature(vec_into_raw_parts)]:
        //let (ptr, len, capacity) = this.into_raw_parts();
        //// We need to convert `*mut T` into a wide pointer of type `*mut [T]`:
        //let ptr: *mut [T] = unsafe { std::slice::from_raw_parts_mut(ptr, len) };
        (ptr, capacity)
    }
    unsafe fn from_raw_parts(ptr: *mut <Self as Deref>::Target, extra_meta: Self::ExtraMetadata) -> Self {
        let capacity = extra_meta;
        let slice: &mut [T] = &mut *ptr;
        Vec::from_raw_parts(slice.as_mut_ptr(), slice.len(), capacity)
        // Or alternatively using #![feature(slice_ptr_len)]:
        //Vec::from_raw_parts(ptr as *mut T, ptr.len(), capacity)
    }
}

Edit: There is a mistake in the code; see the correction below.

P.S.: I don't think the trait needs to be unsafe, right? Wouldn't it be sufficient to mark from_raw_parts as being unsafe?

Your Vec impl surely isn't correct: it drops the by-value this right before returning its pointer to the caller.

Would this fix it?

         let ptr: *mut [T] = &mut *this;
+        std::mem::forget(this);

Yep, that should suffice.
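For reference, here is a sketch of the corrected Vec impl together with a round trip. It uses std::mem::ManuallyDrop instead of std::mem::forget. That is an alternative I chose, not what the diff above uses: the Vec is wrapped before any pointer is taken, so its destructor can never run. Only stable APIs are used.

```rust
use std::mem::ManuallyDrop;
use std::ops::Deref;

unsafe trait SmartPointer: Deref {
    type ExtraMetadata;
    fn into_raw_parts(this: Self) -> (*mut Self::Target, Self::ExtraMetadata);
    unsafe fn from_raw_parts(ptr: *mut Self::Target, extra_meta: Self::ExtraMetadata) -> Self;
}

unsafe impl<T> SmartPointer for Vec<T> {
    // The capacity is the only information the wide `*mut [T]` doesn't carry.
    type ExtraMetadata = usize;

    fn into_raw_parts(this: Self) -> (*mut [T], usize) {
        // Wrapping in ManuallyDrop ensures the buffer isn't freed here.
        let mut this = ManuallyDrop::new(this);
        let capacity = this.capacity();
        let ptr: *mut [T] = this.as_mut_slice();
        (ptr, capacity)
    }

    unsafe fn from_raw_parts(ptr: *mut [T], capacity: usize) -> Self {
        // The caller must pass parts obtained from `into_raw_parts`.
        let slice: &mut [T] = unsafe { &mut *ptr };
        unsafe { Vec::from_raw_parts(slice.as_mut_ptr(), slice.len(), capacity) }
    }
}

fn main() {
    let v = vec![String::from("a"), String::from("b")];
    let (ptr, cap) = SmartPointer::into_raw_parts(v);
    let restored: Vec<String> = unsafe { SmartPointer::from_raw_parts(ptr, cap) };
    assert_eq!(restored, ["a", "b"]);
    assert!(restored.capacity() >= 2);
}
```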

I really don't buy that definition of "smart pointer". By that definition, pretty much every structure in my programs is a smart pointer; after all, they typically own some data and allow manipulation of it. Being a pointer is not their primary purpose, even though they have pointers internally. The primary purpose of Vec and String is to implement an abstraction of a sequence of things, with various properties. They are "containers". The primary purpose of a smart pointer is to reference some data, not to contain it.

The archetypal smart pointer is a reference-counting pointer that deallocates the thing it points at when its last instance is dropped. There can be thousands of instances (clones) of a reference-counted pointer, all pointing at the same thing. Which one owns the memory?

So owning memory and allowing manipulation are not defining features of many things we call "smart pointers".
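To make the "which one owns the memory?" question concrete, here is a small sketch with std::rc::Rc. The helper all_share_one_allocation is hypothetical. All clones point at one allocation, and the String is freed only when the last strong handle goes away, not when the "original" one does.

```rust
use std::rc::Rc;

// Hypothetical helper: clones `rc` n times and checks that all handles share one allocation.
fn all_share_one_allocation(rc: &Rc<String>, n: usize) -> bool {
    let clones: Vec<Rc<String>> = (0..n).map(|_| Rc::clone(rc)).collect();
    clones.iter().all(|c| Rc::ptr_eq(c, rc)) && Rc::strong_count(rc) == n + 1
}

fn main() {
    let first = Rc::new(String::from("shared"));
    assert!(all_share_one_allocation(&first, 1000));

    // A weak handle lets us observe when the pointee is actually freed.
    let watcher = Rc::downgrade(&first);
    let clones: Vec<Rc<String>> = (0..1000).map(|_| Rc::clone(&first)).collect();
    drop(first); // the "original" handle goes away first...
    assert!(watcher.upgrade().is_some()); // ...but the String is still alive
    drop(clones); // only when the last strong handle is dropped...
    assert!(watcher.upgrade().is_none()); // ...is the String deallocated
}
```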

Maybe the term "smart pointer" has lost its usefulness outside of C++.

This is generally my view, as discussions that involve “smart pointers” tend to devolve into quibbling about the definition instead of the actual pros and cons of any given solution.

As far as the Deref<Target=T> trait goes, there seem to be two distinct situations it gets used for:

  • Proxy types, which transparently provide an &'short T in situations that don’t play nicely with Rust’s static lifetime analysis (cell::Ref, Cow, Rc, etc.)
  • Owned unsized types¹, which provide additional functionality beyond what is possible behind a general reference type (Vec, String, PathBuf, etc…)

¹ All of the examples I can think of are slices, but I suppose the same pattern could be used for trait objects.
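Here is a small sketch of the two situations, using only std types. The function names first_item and exclaim are hypothetical. RefCell::borrow returns a cell::Ref proxy that derefs to the inner value only while the dynamic borrow lasts. String, by contrast, owns its buffer and derefs to str, while also offering growth methods that a plain &str lacks.

```rust
use std::cell::{Ref, RefCell};

// Proxy: `Ref` derefs to the inner Vec, but only while the borrow guard lives.
fn first_item(cell: &RefCell<Vec<i32>>) -> Option<i32> {
    let guard: Ref<'_, Vec<i32>> = cell.borrow();
    guard.first().copied() // `first` is a slice method reached through Deref
}

// Owned unsized: String derefs to `str`, yet can also grow, which `&str` can't.
fn exclaim(mut s: String) -> String {
    s.push('!'); // String-only functionality
    s.make_ascii_uppercase(); // `str` method reached through DerefMut
    s
}

fn main() {
    let cell = RefCell::new(vec![7, 8, 9]);
    assert_eq!(first_item(&cell), Some(7));
    // The `Ref` guard was dropped, so a mutable borrow is possible again.
    cell.borrow_mut().push(10);
    assert_eq!(exclaim(String::from("hi")), "HI!");
}
```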

There's a third (or maybe second-and-a-halveth) category: proxy types whose purpose is not merely lifetime wrangling, but which actually do something useful themselves. Often these are RAII objects that restore some invariant in their destructor, allowing a more straightforward API than callbacks would. Two examples are:

  • database transaction objects;
  • a guard object I recently wrote for a compiler that allows mutating part of the IR and then propagating this mutation to all copies of the IR node within the typing context, ensuring consistency (this is required because for various reasons, I need to keep around copies of some IR nodes, e.g. forward and reverse mappings between a resolved type and its full name/path both contain a copy of the type IR node).
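The guard pattern from the second bullet can be sketched roughly like this. Everything here (Mirrored, SyncGuard, edit) is a hypothetical stand-in, not the compiler code mentioned above. The guard hands out mutable access through Deref/DerefMut and, in its destructor, propagates the edit to a copy so the two can't drift apart.

```rust
use std::ops::{Deref, DerefMut};

// Hypothetical container that keeps a primary value and a copy in sync.
struct Mirrored<T: Clone> {
    primary: T,
    copy: T,
}

// Hypothetical guard: mutable access to `primary`, invariant restored on drop.
struct SyncGuard<'a, T: Clone> {
    owner: &'a mut Mirrored<T>,
}

impl<T: Clone> Mirrored<T> {
    fn new(value: T) -> Self {
        Mirrored { copy: value.clone(), primary: value }
    }
    fn edit(&mut self) -> SyncGuard<'_, T> {
        SyncGuard { owner: self }
    }
}

impl<T: Clone> Deref for SyncGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.owner.primary
    }
}

impl<T: Clone> DerefMut for SyncGuard<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut self.owner.primary
    }
}

impl<T: Clone> Drop for SyncGuard<'_, T> {
    fn drop(&mut self) {
        // Propagate the mutation so `copy` matches `primary` again.
        self.owner.copy = self.owner.primary.clone();
    }
}

fn main() {
    let mut m = Mirrored::new(vec![1, 2]);
    {
        let mut guard = m.edit();
        guard.push(3); // Vec::push reached through DerefMut
    } // guard dropped here: the copy is updated
    assert_eq!(m.primary, vec![1, 2, 3]);
    assert_eq!(m.copy, m.primary);
}
```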

I can see the appeal for defining a "smart pointer" in terms of traits, but I think it's going down the wrong path. The answer to "what are smart pointers?" is a human one and not something that fits into a rigorous, self-consistent system like Rust's type system[1].

Instead, it depends on how people expect it to work and on the circumstances in which they expect it to "make sense". That's why you keep running into inconsistencies, like Arc<T> and Vec<T> fitting your SmartPointer trait while Cow<T> doesn't.

After all, if you look at ExtraMetadata, into_raw_parts(), and from_raw_parts() objectively, they don't say much about what dereferencing and converting to/from raw parts mean. It's only when you add the human that has read thousands of lines of code using "smart pointers" into the loop that you start to get a feel for what they are. Even then, you'll get ambiguities and inconsistencies because humans care about the communication of information and getting things done, not rigorous consistency.

That's why you see things like memory models bolted onto a language after it's already been in use for some time, and those models always have random edge cases or become so convoluted that you need a PhD in type theory to understand them.

Alas, that's probably not the answer you were looking for.


  1. I feel like you could draw some parallels to Gödel's incompleteness theorems here.

OK, I got it.

A "dumb pointer", otherwise known as a "pointer", is nothing more than an address of a memory location. Said memory location may or may not contain anything valid or useful.

A "smart pointer" is anything else that happens to contain within its bowels a pointer.

What about making it even vaguer, by saying it's "something that can give you a reference to something else"?

One example would be string interning. You could imagine an InternedString that is just a newtype around a u32, whose Deref implementation gives you a reference to the corresponding string in a global string cache.

I think most people would call InternedString a smart pointer, but it doesn't "contain" any pointers internally.
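Here is a rough sketch of such an InternedString. It assumes a global string table that leaks each interned string so it can hand out &'static str. InternedString, intern, and table are hypothetical names, and real interners are usually more careful about memory and locking. The handle itself is just a u32, and Deref looks the string up in the table.

```rust
use std::ops::Deref;
use std::sync::{Mutex, OnceLock};

// Hypothetical global table; strings are leaked so `&'static str` can be handed out.
fn table() -> &'static Mutex<Vec<&'static str>> {
    static TABLE: OnceLock<Mutex<Vec<&'static str>>> = OnceLock::new();
    TABLE.get_or_init(|| Mutex::new(Vec::new()))
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct InternedString(u32); // no pointer inside, just an index

// Hypothetical constructor: returns the existing handle or adds the string.
fn intern(s: &str) -> InternedString {
    let mut strings = table().lock().unwrap();
    if let Some(i) = strings.iter().position(|&t| t == s) {
        return InternedString(i as u32);
    }
    strings.push(Box::leak(s.to_owned().into_boxed_str()));
    InternedString((strings.len() - 1) as u32)
}

impl Deref for InternedString {
    type Target = str;
    fn deref(&self) -> &str {
        table().lock().unwrap()[self.0 as usize]
    }
}

fn main() {
    let a = intern("hello");
    let b = intern("hello");
    let c = intern("world");
    assert_eq!(a, b); // same handle for the same text
    assert_ne!(a, c);
    assert_eq!(&*c, "world");
    assert_eq!(a.len(), 5); // `str::len` reached through Deref
}
```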

The way I think of the term "smart pointer" is:

  • stores a pointer to another type T that lives elsewhere
  • operations on T delegate to T
  • uses T's public interface to automate some behavior (typically drop)

So that's things like Box, Rc, Arc.
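Those three properties can be combined in a tiny hypothetical RawBox<T> (not a std type). It stores a raw pointer to a heap-allocated T, delegates to T through Deref, and automates the cleanup in Drop. Box::into_raw and Box::from_raw do the actual allocation work here. This is only a sketch, not a replacement for Box.

```rust
use std::ops::Deref;

// Hypothetical minimal smart pointer: a raw pointer plus automatic cleanup.
struct RawBox<T> {
    ptr: *mut T, // points at a T that lives elsewhere (on the heap)
}

impl<T> RawBox<T> {
    fn new(value: T) -> Self {
        RawBox { ptr: Box::into_raw(Box::new(value)) }
    }
}

impl<T> Deref for RawBox<T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: `ptr` came from `Box::into_raw` and is freed only in `drop`.
        unsafe { &*self.ptr }
    }
}

impl<T> Drop for RawBox<T> {
    fn drop(&mut self) {
        // SAFETY: reconstructing the Box exactly once frees the allocation.
        unsafe { drop(Box::from_raw(self.ptr)) }
    }
}

fn main() {
    let name = RawBox::new(String::from("Ferris"));
    // Operations on the pointee delegate to `String`/`str` through Deref.
    assert_eq!(name.len(), 6);
    assert!(name.starts_with("Fer"));
} // `name` goes out of scope here and the String is freed automatically
```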

In that sense, Vec is not a smart pointer to a slice, because it doesn't just delegate to a slice. It provides more functionality on the collection than what a slice can do. Box<[T]> is a smart pointer to a slice.

That's basically how Wikipedia defines smart pointers.

In C++ world I have never seen anybody refer to std::vector or std::string or std::optional as "smart pointers". Those are "collections" or "wrappers".

Well, I was making my "understanding", in jest, based on the definition implied in Chapter 15 of The Book.

What is the essence of a pointer?

I suggest it is something that refers to another thing "over there", rather than being the thing you want itself. A road sign pointing toward a town is not the town itself.

So Vec and String are not even pointers, let alone smart pointers. Ignoring how they are implemented under the hood, they are the actual things you want, not references to them.

On the contrary, Cow can be a smart pointer (so long as Owned is): set ExtraMetadata = Option<Owned::ExtraMetadata>.

Borrowed(&T) <=> (*mut T, None)
Owned(T::Owned) <=> (*mut T, Some(T::Owned::ExtraMetadata))
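That mapping can be sketched as follows. It assumes the trait from the first post plus a hypothetical SmartPointer impl for String, which is needed so that Cow<str> has an owned side to delegate to. The helper round_trip is also hypothetical. Borrowed values round-trip through None, and owned ones carry the owned type's metadata in Some.

```rust
use std::borrow::Cow;
use std::mem::ManuallyDrop;
use std::ops::Deref;

unsafe trait SmartPointer: Deref {
    type ExtraMetadata;
    fn into_raw_parts(this: Self) -> (*mut Self::Target, Self::ExtraMetadata);
    unsafe fn from_raw_parts(ptr: *mut Self::Target, extra_meta: Self::ExtraMetadata) -> Self;
}

// Assumed impl for String: the capacity is the extra metadata.
unsafe impl SmartPointer for String {
    type ExtraMetadata = usize;
    fn into_raw_parts(this: Self) -> (*mut str, usize) {
        let mut this = ManuallyDrop::new(this); // never free the buffer here
        let capacity = this.capacity();
        (this.as_mut_str() as *mut str, capacity)
    }
    unsafe fn from_raw_parts(ptr: *mut str, capacity: usize) -> Self {
        let s: &mut str = unsafe { &mut *ptr };
        unsafe { String::from_raw_parts(s.as_mut_ptr(), s.len(), capacity) }
    }
}

// Borrowed(&B) <=> (ptr, None); Owned(B::Owned) <=> (ptr, Some(owned metadata)).
unsafe impl<'a, B> SmartPointer for Cow<'a, B>
where
    B: ?Sized + ToOwned + 'a,
    B::Owned: SmartPointer<Target = B>,
{
    type ExtraMetadata = Option<<B::Owned as SmartPointer>::ExtraMetadata>;

    fn into_raw_parts(this: Self) -> (*mut B, Self::ExtraMetadata) {
        match this {
            Cow::Borrowed(r) => (r as *const B as *mut B, None),
            Cow::Owned(o) => {
                let (ptr, meta) = SmartPointer::into_raw_parts(o);
                (ptr, Some(meta))
            }
        }
    }

    unsafe fn from_raw_parts(ptr: *mut B, extra_meta: Self::ExtraMetadata) -> Self {
        match extra_meta {
            // The caller must guarantee the original borrow is still valid for 'a.
            None => Cow::Borrowed(unsafe { &*ptr }),
            Some(meta) => Cow::Owned(unsafe { SmartPointer::from_raw_parts(ptr, meta) }),
        }
    }
}

// Hypothetical helper: splits a Cow<str> into raw parts and restores it.
fn round_trip(c: Cow<'_, str>) -> Cow<'_, str> {
    let (ptr, meta) = SmartPointer::into_raw_parts(c);
    unsafe { SmartPointer::from_raw_parts(ptr, meta) }
}

fn main() {
    let borrowed = round_trip(Cow::Borrowed("hello"));
    assert!(matches!(borrowed, Cow::Borrowed("hello")));
    let owned = round_trip(Cow::Owned(String::from("world")));
    assert!(matches!(&owned, Cow::Owned(s) if s == "world"));
}
```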

I agree that this is an "exposition-only" concept, though. The more general definition means that basically 100% of Deref types which deref an indirection can be considered "smart pointers".

The stricter definition is thus perhaps more useful, as it describes a "pure" smart pointer with no other state.

IMHO, the term "smart pointer" was useful back when RAII was first conceptualized, but for Rust it isn't really a useful concept anymore.

"Smart pointer" at its core is just an owning raw pointer: a raw pointer plus RAII cleanup. That's the distinction that actually matters: is a type "owning" or "borrowing"?

Oh, and the use of "smart pointer" in "only implement deref for smart pointers" isn't talking about indirection (which I think is why it's been removed from the docs, or is at least on a slow track to removal); it's saying that Deref should be implemented for containers/wrappers that behave as if they were a reference to the pointee.

The libs team has agreed in principle to rewriting that section of the docs without the “smart pointer” jargon, but is waiting for someone to propose a concrete draft before making a final decision.

I don't really understand what this means; it seems like a tautology. Doesn't everything that implements Deref automatically behave "as if" it were a reference to the target type?

I think Deref should be implemented for types that want to delegate behavior to some target type. I don't think "reference" is the right wording, because a wrapper around i32 is not really a "reference" to an i32, it just owns the i32.

Not sure what you mean by "which deref an indirection".

I think it's possible to find a compromise here.

Consider that (formally) you can implement SmartPointer for any type T: Deref:

unsafe impl<T> SmartPointer for T
where
    T: Deref,
{
    type ExtraMetadata = Self;
    fn into_raw_parts(this: Self) -> (*mut <Self as Deref>::Target, Self::ExtraMetadata) {
        (&*this as *const _ as *mut _, this)
    }
    unsafe fn from_raw_parts(ptr: *mut <Self as Deref>::Target, extra_meta: Self::ExtraMetadata) -> Self {
        let _ = ptr;
        extra_meta
    }
}

Though this could leave into_raw_parts().0 pointing to invalid memory. I guess a reasonable restriction would be that the returned pointer must remain dereferenceable until the SmartPointer is restored (which couldn't work with Cow).

Furthermore, restoring should always work (which also doesn't work with Cow::Borrowed, because the reference may have a non-'static lifetime).

With these restrictions, I think Cow fails the (new) definition of a "smart pointer".
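Whether the pointer from the blanket impl (ExtraMetadata = Self) can dangle depends on where the target lives. Here is a small sketch; the Inline wrapper and the target_survives_move helper are hypothetical. For a Box or String, the target sits on the heap and stays put when the owner moves. For a wrapper that stores its target inline, moving the wrapper moves the target too. Only addresses are compared; nothing is dereferenced after the move.

```rust
use std::ops::Deref;

// Hypothetical wrapper whose Deref target is stored inline.
struct Inline(i32);

impl Deref for Inline {
    type Target = i32;
    fn deref(&self) -> &i32 {
        &self.0
    }
}

// Hypothetical helper: true if the target's address is unchanged after moving `value`.
fn target_survives_move<P: Deref>(value: P) -> bool {
    let before = &*value as *const P::Target;
    let moved = Box::new(value); // moves `value` into a new heap allocation
    let after = &**moved as *const P::Target;
    std::ptr::eq(before, after)
}

fn main() {
    // Box<i32>: the pointee lives on the heap, so moving the Box doesn't move it.
    assert!(target_survives_move(Box::new(5)));
    // Inline: the pointee moved along with the wrapper.
    assert!(!target_survives_move(Inline(5)));
}
```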

Maybe the term "smart pointer" is a bit misleading (edit: in the context of Deref), and a better term would be "generalized reference" or "reference-like type".

Not totally:

trait Foo {}
impl Foo for &str {}

fn takes_str_ref(_arg: &str) {}
fn foo<T: Foo>(_arg: T) {}

fn main() {
    takes_str_ref("Hello World!");
    foo("Hello World!");
    takes_str_ref(&String::from("Boo!"));
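    // Doesn't compile: &String doesn't implement Foo, and deref coercion
    // isn't applied when the parameter type is generic:
    //foo(&String::from("Boo!"));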
}

Regarding the tautology: I think it's possible to define "smart pointer" (or "reference-like type" to avoid confusion) somehow and say that Deref can or should be implemented for those; or you could simply describe what Deref does and say that anything that implements Deref is a "reference-like" type by definition. You just can't do both as then you'd have a circular definition.

I don't think @CAD97 meant "behave exactly like a reference in every scenario". If he did, then Rc also doesn't satisfy this criterion:

    let a: Rc<str> = "Boo!".into();
    // doesn't compile:
    // foo(a); 

Why not just say "is Deref" instead of using such vague terminology? We say "is Copy" rather than saying "primitive-like" or something.

That is because in the originating thread it was disputed whether deref(erencing) is the right terminology for Deref:

Deref is sort of an abbreviation for "dereference"; its name "Deref" is no accident. Thus I think it's reasonable to call types that implement Deref "reference-like", because we can dereference them (or at least dereference references to them into references to the referent).
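To illustrate "dereference references to them into references to the referent" with std types: &Rc<str> and &String become &str through deref coercion when the parameter type is a concrete &str. A generic parameter, like foo from the example earlier in the thread, needs an explicit &*. Foo, takes_str_ref, and foo are redefined here so the block is self-contained; takes_str_ref returns the length only so there's something to assert on.

```rust
use std::ops::Deref;
use std::rc::Rc;

trait Foo {}
impl Foo for &str {}

// Returns the length just to have something observable.
fn takes_str_ref(s: &str) -> usize {
    s.len()
}

fn foo<T: Foo>(_arg: T) {}

fn main() {
    let rc: Rc<str> = "Boo!".into();
    let string = String::from("Boo!");

    // Deref coercion: &Rc<str> -> &str and &String -> &str at a concrete `&str` parameter.
    assert_eq!(takes_str_ref(&rc), 4);
    assert_eq!(takes_str_ref(&string), 4);

    // The same conversion, spelled out: `&*` dereferences once, then re-borrows.
    let via_star: &str = &*rc;
    let via_trait: &str = Deref::deref(&string);
    assert_eq!(via_star, via_trait);

    // No coercion for a generic parameter, so the explicit `&*` is needed here.
    foo(&*rc);
}
```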

By now, I find the terminology "reference-like" better suited than "pointer-like" or "smart pointer" (unless there really is a pointer involved).

What I mean is this:

/* … */

unsafe trait SmartPointer: Deref {
    type ExtraMetadata;
    // Sound implementations must return a pointer that's dereferenceable until `from_raw_parts` is invoked:
    fn into_raw_parts(this: Self) -> (*mut <Self as Deref>::Target, Self::ExtraMetadata);
    unsafe fn from_raw_parts(ptr: *mut <Self as Deref>::Target, extra_meta: Self::ExtraMetadata) -> Self;
}

/* … */

// This should be considered unsound:
unsafe impl<'a, T> SmartPointer for Cow<'a, T>
where
    T: ?Sized + ToOwned,
{
    type ExtraMetadata = Self;
    fn into_raw_parts(this: Self) -> (*mut <Self as Deref>::Target, Self::ExtraMetadata) {
        (&*this as *const _ as *mut _, this)
    }
    unsafe fn from_raw_parts(ptr: *mut <Self as Deref>::Target, extra_meta: Self::ExtraMetadata) -> Self {
        let _ = ptr;
        extra_meta
    }
}

fn test_value<T>(smart_pointer: T)
where
    T: SmartPointer + Debug,
    <T as Deref>::Target: Debug,
{
    let (ptr, meta) = SmartPointer::into_raw_parts(smart_pointer);
    let inspect: &<T as Deref>::Target = unsafe { &*ptr };
    println!("Pointer dereference: {inspect:?}");
    let restored: T = unsafe { SmartPointer::from_raw_parts(ptr, meta) };
    println!("Restored smart pointer: {restored:?}");
}

fn main() {
    let arc = Arc::new(5);
    test_value(arc);
    let vec = Arc::new(vec![1, 2, 3]);
    test_value(vec);
    // This should be UB (but apparently it's difficult to show?):
    let cow: Cow<str> = Cow::Owned("Hello World!".to_string());
    test_value(cow);
}

Now it's also clear why SmartPointer needs to be an unsafe trait: into_raw_parts must fulfil an extra property [1] that cannot be enforced by the compiler.


  1. i.e. "sound implementations must return a pointer that's dereferenceable until from_raw_parts is invoked"

Um, dereferencing a raw pointer is always unsafe, and from_raw_parts() should be unsafe as well, so it's on the caller to ensure that the pointer they provide or dereference is valid. There's no soundness issue even if SmartPointer is a safe trait.