What are smart pointers? (Part 2 with definition that compiles)

OK, I got it.

A "dumb pointer", otherwise known as a "pointer", is nothing more than an address of a memory location. Said memory location may or may not contain anything valid or useful.

A "smart pointer" is anything else that happens to contain within its bowels a pointer.

What about making it even more vague saying it's "something that can give you a reference to something else"?

One example would be in string interning. You could imagine having an InternedString which is just a newtype around a u32, and in the Deref implementation it'll give you a reference to the corresponding string from a global string cache.

I think most people would call InternedString a smart pointer, but it doesn't "contain" any pointers internally.
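
A minimal sketch of that idea, under some loud assumptions: the cache is a global `Mutex<Vec<&'static str>>`, strings are leaked so the cache can hand out `'static` references, and the names `CACHE` and `InternedString::new` are made up for illustration. A real interner would probably use a hash map instead of a linear scan.

```rust
use std::ops::Deref;
use std::sync::Mutex;

// Hypothetical global string cache; entries are leaked and never freed.
static CACHE: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

/// A newtype around an index. It stores no pointer at all.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct InternedString(u32);

impl InternedString {
    fn new(s: &str) -> Self {
        let mut cache = CACHE.lock().unwrap();
        // Reuse the slot if this string has been interned before.
        if let Some(idx) = cache.iter().position(|&existing| existing == s) {
            return InternedString(idx as u32);
        }
        let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
        cache.push(leaked);
        InternedString((cache.len() - 1) as u32)
    }
}

impl Deref for InternedString {
    type Target = str;

    fn deref(&self) -> &str {
        let cache = CACHE.lock().unwrap();
        // Copy the `&'static str` out of the cache before the lock is released.
        let s: &'static str = cache[self.0 as usize];
        s
    }
}

fn main() {
    let a = InternedString::new("hello");
    let b = InternedString::new("hello");
    // Both handles refer to the same cache slot.
    assert_eq!(a, b);
    // `str` methods are reachable through `Deref`.
    println!("{} has {} bytes", &*a, a.len());
}
```

The handle is four bytes wide and still "gives you a reference to something else", which is the point being made above.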


The way I think of the term "smart pointer" is:

  • stores a pointer to another type T that lives elsewhere
  • operations on the pointer delegate to T
  • uses T's public interface to automate some behavior (typically drop)

So that's things like Box, Rc, Arc.

In that sense, Vec is not a smart pointer to a slice, because it doesn't just delegate to a slice. It provides more functionality on the collection than what a slice can do. Box<[T]> is a smart pointer to a slice.
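
A small sketch of that distinction (the helper `sum_slice` is a made-up name): both types deref to a slice, but only Vec adds collection operations of its own.

```rust
/// Works for anything that derefs to a slice.
fn sum_slice(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

fn main() {
    let mut v: Vec<i32> = vec![1, 2, 3];
    v.push(4); // growing is `Vec`'s own API, not something a slice can do
    let boxed: Box<[i32]> = v.clone().into_boxed_slice();

    // Both coerce to `&[i32]` through `Deref`.
    assert_eq!(sum_slice(&v), 10);
    assert_eq!(sum_slice(&boxed), 10);

    // boxed.push(5); // does not compile: `Box<[i32]>` has no `push`
}
```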

That's basically how wikipedia defines smart pointers.

In C++ world I have never seen anybody refer to std::vector or std::string or std::optional as "smart pointers". Those are "collections" or "wrappers".

Well, I was making my "understanding", in jest, based on the definition implied in Chapter 15 of The Book.

What is the essence of a pointer?

I suggest it is something that refers to another thing "over there". Rather than being the thing you want itself. A road sign pointing toward a town is not the town itself.

So Vec and String are not even pointers let alone smart pointers. They are the actual things you want, not references to them. Ignoring how they are implemented under the hood.


On the contrary, it can be (so long as Owned is); set ExtraMetadata = Option<Owned::ExtraMetadata>.

Borrowed(&T) <=> (*mut T, None)
Owned(T::Owned) <=> (*mut T, Some(T::Owned::ExtraMetadata))
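
To make the mapping concrete, here is a sketch for the single case `Cow<str>`, written as free functions rather than as an impl of the SmartPointer trait. The function names are hypothetical, and treating the String's capacity as its "extra metadata" is my assumption about what that metadata would be.

```rust
use std::borrow::Cow;
use std::mem::ManuallyDrop;

/// Hypothetical helper: splits a `Cow<str>` into a raw pointer plus optional
/// metadata. `None` means borrowed, `Some(capacity)` means owned.
fn cow_into_raw_parts(cow: Cow<'_, str>) -> (*mut str, Option<usize>) {
    match cow {
        Cow::Borrowed(s) => (s as *const str as *mut str, None),
        Cow::Owned(s) => {
            // Keep the buffer alive; ownership is now tracked by the raw parts.
            let mut bytes = ManuallyDrop::new(s.into_bytes());
            let (ptr, len, cap) = (bytes.as_mut_ptr(), bytes.len(), bytes.capacity());
            (std::ptr::slice_from_raw_parts_mut(ptr, len) as *mut str, Some(cap))
        }
    }
}

/// # Safety
/// `ptr` and `meta` must come from one call to `cow_into_raw_parts` and be
/// passed back exactly once. In the borrowed case, `'a` must not outlive the
/// original borrow.
unsafe fn cow_from_raw_parts<'a>(ptr: *mut str, meta: Option<usize>) -> Cow<'a, str> {
    match meta {
        None => Cow::Borrowed(unsafe { &*ptr }),
        Some(cap) => {
            let len = unsafe { (&*ptr).len() };
            Cow::Owned(unsafe { String::from_raw_parts(ptr as *mut u8, len, cap) })
        }
    }
}

fn main() {
    for cow in [Cow::Borrowed("borrowed"), Cow::Owned(String::from("owned"))] {
        let (ptr, meta) = cow_into_raw_parts(cow);
        // SAFETY: the parts come from `cow_into_raw_parts` and are used once;
        // the borrowed string is a `'static` literal.
        let restored = unsafe { cow_from_raw_parts(ptr, meta) };
        println!("{restored} (had extra metadata: {})", meta.is_some());
    }
}
```

Note that the caller has to pick the lifetime for the borrowed case, which is exactly the kind of obligation a safe signature cannot express.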

I agree that this is an "exposition only concept" though. The more general definition means that basically 100% of Deref types which deref an indirection can be considered a "smart pointer".

The stricter definition is thus perhaps more useful, as it's a "pure" smart pointer with no other state.

Imho, the term "smart pointer" was useful back when RAII was first conceptualized, but for Rust the concept isn't really useful anymore.

"Smart pointer" at its core is just an owning raw pointer: a raw pointer plus RAII cleanup. That's the concept that actually matters: is a type "owning" or "borrowing".

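For illustration, "a raw pointer plus RAII cleanup" can be sketched in a few lines. `OwningPtr` and `CountsDrops` are hypothetical names, and the allocation goes through Box only to keep the sketch short.

```rust
use std::cell::Cell;
use std::ops::Deref;
use std::rc::Rc;

/// Hypothetical minimal "smart pointer": a raw pointer that owns its pointee.
struct OwningPtr<T> {
    raw: *mut T,
}

impl<T> OwningPtr<T> {
    fn new(value: T) -> Self {
        // Allocate through `Box`, then keep only the raw pointer.
        OwningPtr { raw: Box::into_raw(Box::new(value)) }
    }
}

impl<T> Deref for OwningPtr<T> {
    type Target = T;

    fn deref(&self) -> &T {
        // SAFETY: `raw` came from `Box::into_raw` and is freed only in `drop`.
        unsafe { &*self.raw }
    }
}

impl<T> Drop for OwningPtr<T> {
    fn drop(&mut self) {
        // SAFETY: `raw` came from `Box::into_raw` and has not been freed yet.
        unsafe { drop(Box::from_raw(self.raw)) }
    }
}

/// Increments a shared counter when dropped, to make the cleanup observable.
struct CountsDrops(Rc<Cell<u32>>);

impl Drop for CountsDrops {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Returns how often the pointee was dropped once the pointer left scope.
fn drops_after_scope() -> u32 {
    let counter = Rc::new(Cell::new(0));
    {
        let _ptr = OwningPtr::new(CountsDrops(Rc::clone(&counter)));
    } // `_ptr` goes out of scope here and the RAII cleanup runs
    counter.get()
}

fn main() {
    let p = OwningPtr::new(41);
    println!("{}", *p + 1);
    println!("drops: {}", drops_after_scope());
}
```
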
Oh, and the use of "smart pointer" in "only implement Deref for smart pointers" isn't talking about indirection (which is why I think it's been removed from the docs, or at least is on the slow track to be removed); it's saying that Deref should be implemented for containers/wrappers that behave as if they were a reference to the pointee.


The libs team has agreed in principle to rewriting that section of the docs without the “smart pointer” jargon, but is waiting for someone to propose a concrete draft before making a final decision.

I don't really understand what this means, seems like a tautology. Doesn't everything that implements Deref automatically behave "as if" it were a reference to the target type?

I think Deref should be implemented for types that want to delegate behavior to some target type. I don't think "reference" is the right wording, because a wrapper around i32 is not really a "reference" to an i32, it just owns the i32.

Not sure what you mean with "which deref an indirection".

I think it's possible to find a compromise here.

Consider that (formally) you can implement SmartPointer for any type T: Deref:

unsafe impl<T> SmartPointer for T
where
    T: Deref,
{
    type ExtraMetadata = Self;
    fn into_raw_parts(this: Self) -> (*mut <Self as Deref>::Target, Self::ExtraMetadata) {
        (&*this as *const _ as *mut _, this)
    }
    unsafe fn from_raw_parts(ptr: *mut <Self as Deref>::Target, extra_meta: Self::ExtraMetadata) -> Self {
        let _ = ptr;
        extra_meta
    }
}


Though this would cause into_raw_parts().0 to point to invalid memory. I guess a reasonable restriction would be that the returned pointer must be dereferenceable as long as the SmartPointer hasn't been restored yet (which couldn't work with Cow).

Furthermore, restoring should always work (which also doesn't work with Cow::Borrowed because the reference may have a non-static lifetime).

With these restrictions, I think Cow fails the (new) definition of a "smart pointer".
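
The invalid-memory concern can be observed without any UB by comparing addresses only. This sketch assumes `Cow<i32>`, where the owned value is stored inline in the Cow; the helper names are made up. With `Cow<str>` the owned data sits on the heap, so the effect would not show up there.

```rust
use std::borrow::Cow;

/// Address of the `Deref` target of a `Cow<i32>`.
fn cow_target_addr(c: &Cow<'_, i32>) -> usize {
    &**c as *const i32 as usize
}

/// Address of the `Deref` target of a `Box<i32>`.
fn box_target_addr(b: &Box<i32>) -> usize {
    &**b as *const i32 as usize
}

/// Moving an owned `Cow<i32>` moves its target along with it.
fn cow_target_moves() -> bool {
    let cow: Cow<'static, i32> = Cow::Owned(5);
    let before = cow_target_addr(&cow);
    let moved = Box::new(cow); // move the `Cow` itself to the heap
    let after = cow_target_addr(&moved);
    before != after
}

/// Moving a `Box<i32>` leaves its target where it was.
fn box_target_stays() -> bool {
    let b = Box::new(5);
    let before = box_target_addr(&b);
    let moved = Box::new(b); // move the `Box` itself to the heap
    let after = box_target_addr(&moved);
    before == after
}

fn main() {
    println!("Cow target moved: {}", cow_target_moves());
    println!("Box target stayed: {}", box_target_stays());
}
```

A pointer taken from the target before the move would therefore dangle for the Cow, but not for the Box.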

Maybe the term "smart pointer" is a bit misleading (edit: for the sake of Deref), and a better term would be "generalized reference" or "reference-like types".

Not totally:

trait Foo {}
impl Foo for &str {}

fn takes_str_ref(_arg: &str) {}
fn foo<T: Foo>(_arg: T) {}

fn main() {
    takes_str_ref("Hello World!");
    foo("Hello World!");
    takes_str_ref(&String::from("Boo!"));
    //foo(&String::from("Boo!"));
}


Regarding the tautology: I think it's possible to define "smart pointer" (or "reference-like type" to avoid confusion) somehow and say that Deref can or should be implemented for those; or you could simply describe what Deref does and say that anything that implements Deref is a "reference-like" type by definition. You just can't do both as then you'd have a circular definition.

I don't think @CAD97 meant "behave exactly like a reference in every scenario". If he did, then Rc also doesn't satisfy this criterion:

    let a: Rc<str> = "Boo!".into();
    // doesn't compile:
    // foo(a); 

Why not just say "is Deref" instead of using such vague terminology? We say "is Copy" rather than saying "primitive-like" or something.


That is because in the originating thread it was disputed whether deref(erencing) is the right terminology for Deref:

Deref is sort of an abbreviation for "dereferencing". It's no accident that it's named "Deref". Thus I think it's reasonable to call types that implement Deref "reference-like", because we can dereference them (or at least dereference references to them to references to the referee).
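
The parenthetical can be spelled out in code: Deref::deref takes a reference to the reference-like value and returns a reference to the referee. `target_len` is a made-up helper.

```rust
use std::ops::Deref;
use std::rc::Rc;

/// Accepts a reference to any "reference-like" type whose target is `str`.
fn target_len<P: Deref<Target = str>>(p: &P) -> usize {
    // `&**p` would be the sugared spelling of the same call.
    let target: &str = Deref::deref(p);
    target.len()
}

fn main() {
    let owned = String::from("abc");
    let shared: Rc<str> = Rc::from("hello");
    let plain: &str = "hi";
    println!(
        "{} {} {}",
        target_len(&owned),
        target_len(&shared),
        target_len(&plain)
    );
}
```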

Meanwhile, I find the term "reference-like" better suited than "pointer-like" or "smart pointer" (unless there is really a pointer involved).

What I mean is this:

/* … */

unsafe trait SmartPointer: Deref {
    type ExtraMetadata;
    // Sound implementations must return a pointer that's dereferenceable until `from_raw_parts` is invoked:
    fn into_raw_parts(this: Self) -> (*mut <Self as Deref>::Target, Self::ExtraMetadata);
    unsafe fn from_raw_parts(ptr: *mut <Self as Deref>::Target, extra_meta: Self::ExtraMetadata) -> Self;
}

/* … */

// This should be considered unsound:
unsafe impl<'a, T> SmartPointer for Cow<'a, T>
where
    T: ?Sized + ToOwned,
{
    type ExtraMetadata = Self;
    fn into_raw_parts(this: Self) -> (*mut <Self as Deref>::Target, Self::ExtraMetadata) {
        (&*this as *const _ as *mut _, this)
    }
    unsafe fn from_raw_parts(ptr: *mut <Self as Deref>::Target, extra_meta: Self::ExtraMetadata) -> Self {
        let _ = ptr;
        extra_meta
    }
}

fn test_value<T>(smart_pointer: T)
where
    T: SmartPointer + Debug,
    <T as Deref>::Target: Debug,
{
    let (ptr, meta) = SmartPointer::into_raw_parts(smart_pointer);
    let inspect: &<T as Deref>::Target = unsafe { &*ptr };
    println!("Pointer dereference: {inspect:?}");
    let restored: T = unsafe { SmartPointer::from_raw_parts(ptr, meta) };
    println!("Restored smart pointer: {restored:?}");
}

fn main() {
    let arc = Arc::new(5);
    test_value(arc);
    let vec = Arc::new(vec![1, 2, 3]);
    test_value(vec);
    // This should be UB (but apparently it's difficult to show?):
    let cow: Cow<str> = Cow::Owned("Hello World!".to_string());
    test_value(cow);
}


Now it's also clear why SmartPointer needs to be an unsafe trait, because into_raw_parts must fulfil some extra property [1] (that cannot be enforced by the compiler).


  1. i.e. "sound implementations must return a pointer that's dereferenceable until from_raw_parts is invoked"

Um, dereferencing a raw pointer is always unsafe, and so should be from_raw_parts(), therefore it's on the caller to ensure the pointer they provide/dereference is valid. There's no soundness issue even if SmartPointer is a safe trait.

Of course SmartPointer::from_raw_parts needs to be unsafe, but that isn't what I meant.

SmartPointer::into_raw_parts is not unsafe (and doesn't need to be). But for SmartPointer to be useful, we must give another guarantee:

    // Sound implementations must return a pointer that's dereferenceable until `from_raw_parts` is invoked:
    fn into_raw_parts(this: Self) -> (*mut <Self as Deref>::Target, Self::ExtraMetadata);

So that this can be sound:

fn test_value<T>(smart_pointer: T)
where
    T: SmartPointer + Debug,
    <T as Deref>::Target: Debug,
{
    let (ptr, meta) = SmartPointer::into_raw_parts(smart_pointer);
    let inspect: &<T as Deref>::Target = unsafe { &*ptr };
    println!("Pointer dereference: {inspect:?}");
    let restored: T = unsafe { SmartPointer::from_raw_parts(ptr, meta) };
    println!("Restored smart pointer: {restored:?}");
}

… as long as all unsafe impl SmartPointer for /* … */ provide a proper implementation of into_raw_parts.

Compare with Arc::into_raw, which is also not unsafe. But Arc is a type and not a trait and thus we know that whatever Arc::into_raw produces can be fed back into Arc::from_raw (given the pointer is still valid and has the correct offset from the beginning of the Arc in memory).
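
For reference, the Arc round trip looks like this (`round_trip` is a made-up helper). Only the dereference and the call to Arc::from_raw need `unsafe`.

```rust
use std::sync::Arc;

/// Turns the `Arc` into a raw pointer, peeks through it, and restores it.
fn round_trip(arc: Arc<String>) -> (usize, Arc<String>) {
    let raw: *const String = Arc::into_raw(arc); // safe to call
    // SAFETY: `into_raw` did not decrement the strong count, so the pointee
    // is still alive.
    let len = unsafe { (&*raw).len() };
    // SAFETY: `raw` came from `Arc::into_raw` and is handed back exactly once.
    let restored = unsafe { Arc::from_raw(raw) };
    (len, restored)
}

fn main() {
    let (len, arc) = round_trip(Arc::new(String::from("hello")));
    println!("{len} bytes, strong count {}", Arc::strong_count(&arc));
}
```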

When we deal with a trait, however, implementors of SmartPointer::into_raw_parts (which is safe!) could provide a broken implementation. By making the trait (not the method) unsafe, we can demand that sound code must provide a correct implementation of into_raw_parts.

Your "so that this can be sound" code, which importantly is the user of the trait, contains unsafe. That is exactly because you are trying to dereference the raw pointer you got from into_raw_parts() and because you are trying to call from_raw_parts(). Since dereferencing a raw pointer and calling from_raw_parts is always unsafe, there is no way to observe a dangling pointer without using unsafe even if the SmartPointer trait is safe. Thus, it doesn't need to be unsafe in order to be sound.

I think you've got your responsibilities backwards. unsafe code isn't allowed to rely on traits being implemented correctly. It must be paranoid and defensive and anticipate that (at least) 3rd-party non-std code will contain wrong implementations.


Consider:

// This implementation is unsound:
unsafe impl<T> SmartPointer for Arc<T> {
    type ExtraMetadata = ();
    fn into_raw_parts(this: Self) -> (*mut <Self as Deref>::Target, Self::ExtraMetadata) {
        let _ = this;
        (std::ptr::null_mut(), ())
    }
    unsafe fn from_raw_parts(ptr: *mut <Self as Deref>::Target, extra_meta: Self::ExtraMetadata) -> Self {
        // But this isn't unsound:
        panic!();
    }
}


If SmartPointer were not an unsafe trait, the above implementation couldn't be considered unsound. When someone implements SmartPointer, we must be able to rely on the returned pointer fulfilling certain properties. This cannot be enforced by the compiler. Thus we should mark the trait as unsafe and lay down rules for how into_raw_parts must behave (we don't have to, but if we don't, our trait isn't very useful).
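
The same division of responsibility can be sketched with a smaller, made-up pair of traits. ReportedLen, TrustedLen, Honest and Liar are all hypothetical names, not part of the SmartPointer trait discussed here. Code that is generic over the safe trait has to stay defensive, while code that is generic over the unsafe trait may rely on the documented contract.

```rust
/// Safe trait: an implementation may return nonsense without writing `unsafe`.
trait ReportedLen {
    fn data(&self) -> &[u8];
    fn reported_len(&self) -> usize;
}

/// # Safety
/// Implementors promise that `reported_len() <= data().len()` on every call.
unsafe trait TrustedLen: ReportedLen {}

struct Honest(Vec<u8>);

impl ReportedLen for Honest {
    fn data(&self) -> &[u8] {
        &self.0
    }
    fn reported_len(&self) -> usize {
        self.0.len()
    }
}

// SAFETY: `reported_len` returns exactly `data().len()`.
unsafe impl TrustedLen for Honest {}

struct Liar;

impl ReportedLen for Liar {
    fn data(&self) -> &[u8] {
        &[]
    }
    fn reported_len(&self) -> usize {
        1_000 // wrong, yet written entirely in safe code
    }
}

/// Generic over the safe trait: must bounds-check.
fn last_defensive<T: ReportedLen>(t: &T) -> Option<u8> {
    let i = t.reported_len().checked_sub(1)?;
    t.data().get(i).copied()
}

/// Generic over the unsafe trait: may skip the check.
fn last_trusting<T: TrustedLen>(t: &T) -> Option<u8> {
    let i = t.reported_len().checked_sub(1)?;
    // SAFETY: `TrustedLen` implementors guarantee `i < data().len()`.
    Some(unsafe { *t.data().get_unchecked(i) })
}

fn main() {
    println!("{:?}", last_defensive(&Liar));
    println!("{:?}", last_trusting(&Honest(vec![1, 2, 3])));
    // last_trusting(&Liar); // does not compile: `Liar` is not `TrustedLen`
}
```

If Liar were given an `unsafe impl TrustedLen`, the blame for any resulting UB would sit with that impl, not with last_trusting.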

And indeed, it isn't – as long as you want to do anything with the returned raw pointer that would invoke UB due to it being null, you would have to write unsafe.

I didn't mean that you can't actually create a dangling pointer (that's trivial in safe code anyway, by casting align_of::<T>() to *const/mut T). The point is you can't do anything non-trivial with it without writing unsafe.
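
A sketch of that cast (`dangling_ptr` is a made-up helper name):

```rust
use std::mem::align_of;

/// Builds a well-aligned, non-null pointer that points at no allocation.
/// Note that there is no `unsafe` anywhere in this function.
fn dangling_ptr<T>() -> *mut T {
    align_of::<T>() as *mut T
}

fn main() {
    let p: *mut u64 = dangling_ptr();
    // Inspecting the pointer value is safe.
    println!("non-null: {}, address: {:#x}", !p.is_null(), p as usize);
    // let x = *p; // does not compile outside `unsafe`, and would be UB inside it
}
```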

Yes, we will always need unsafe to do something with the pointer; but if the trait isn't unsafe too, then it's impossible to soundly use unsafe later.

My idea was that the unsafe should be here:

unsafe trait SmartPointer: Deref {

So that this can be sound:

    let (ptr, meta) = SmartPointer::into_raw_parts(smart_pointer);
    let inspect: &<T as Deref>::Target = unsafe { &*ptr };

If the trait weren't marked unsafe, we would have no way to soundly dereference the returned pointer (unless SmartPointer were a sealed or private trait).

Actually that's wrong. If we can store metadata, it's always possible to temporarily turn something into a raw pointer that's dereferenceable:

use std::ops::Deref;

unsafe trait SmartPointer: Deref {
    type ExtraMetadata;
    fn into_raw_parts(this: Self) -> (*mut <Self as Deref>::Target, Self::ExtraMetadata);
    unsafe fn from_raw_parts(ptr: *mut <Self as Deref>::Target, extra_meta: Self::ExtraMetadata) -> Self;
}

unsafe impl<T> SmartPointer for T
where
    T: Deref,
{
    type ExtraMetadata = Box<Self>;
    fn into_raw_parts(this: Self) -> (*mut <Self as Deref>::Target, Self::ExtraMetadata) {
        let boxed = Box::new(this);
        (&**boxed as *const _ as *mut _, boxed)
    }
    unsafe fn from_raw_parts(ptr: *mut <Self as Deref>::Target, extra_meta: Self::ExtraMetadata) -> Self {
        let _ = ptr;
        *extra_meta
    }
}


I agree now that the strict definition is more useful than the compromise that allows extra metadata. (Edit: Maybe not, see following post.) When using the strict definition, however, it's clear that Deref should also be implemented for some types that are not smart pointers (e.g. String). Following that, the documentation of Deref should (still) be updated.

But isn't there another difference between raw pointers and smart pointers than whether they are "owning" or "borrowing"? Unlike a raw pointer, a smart pointer can be dereferenced in safe code. Thus I feel like many "smart pointers" should rather be named "smart references". And if we do so, we could just relax the overall definition anyway and allow anything that can be "dereferenced" (and which isn't a plain reference) to be a smart reference.

Then the only remaining question is when to implement Deref, i.e. when to make a type dereferenceable. And that is (more or less):

However, I deviate from @tczajka's view insofar as I would then call the type dereferenceable (or a "smart reference", except for the trivial case of &).

Mere wrappers generally shouldn't implement Deref. I think in particular the newtype pattern, for example, shouldn't implement Deref.
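
The post doesn't spell out the reasoning, so the following is only one possible illustration of the concern, with a hypothetical `Meters` newtype: once Deref is implemented, the whole API of the inner type leaks through and the distinction the newtype was meant to enforce gets lost.

```rust
use std::ops::Deref;

/// Hypothetical newtype that is supposed to keep lengths apart from bare numbers.
struct Meters(f64);

// The impl in question.
impl Deref for Meters {
    type Target = f64;

    fn deref(&self) -> &f64 {
        &self.0
    }
}

/// Compiles thanks to `Deref`, but the result is a bare `f64`: the unit is lost.
fn area_bug(width: &Meters, height: &Meters) -> f64 {
    **width * **height
}

fn main() {
    let side = Meters(3.0);
    // Every `f64` method is now callable on `Meters`, meaningful or not.
    println!("{}", side.sqrt());
    println!("{}", area_bug(&side, &side));
}
```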

I see one way to provide a more formal definition of "smart pointer" while allowing extra metadata. It would be to make extra demands regarding the implementation details of SmartPointer::into_raw_parts:

use std::ops::Deref;

// UNSAFE: Implementors must ensure that `SmartPointer::into_raw_parts.0`
// can be dereferenced at least until `SmartPointer::from_raw_parts` is
// used.
// Moreover, it should only be implemented if `into_raw_parts` is a cheap
// operation (i.e. doesn't allocate, for example).
unsafe trait SmartPointer: Deref {
    type ExtraMetadata;
    fn into_raw_parts(this: Self) -> (*mut <Self as Deref>::Target, Self::ExtraMetadata);
    unsafe fn from_raw_parts(ptr: *mut <Self as Deref>::Target, extra_meta: Self::ExtraMetadata) -> Self;
}

You are probably right. It may still help to phrase some parts of something in terms of Rust for a better understanding. The updated definition above contains some "human information" about when to implement the trait.

It's not that uncommon. For example, AsRef can be implemented for costly conversions, but it shouldn't be:

If you need to do a costly conversion it is better to implement From with type &T or write a custom function.

So if we did the following:

// This is sound, but would violate the semantics of `SmartPointer`:
unsafe impl<T> SmartPointer for T
where
    T: Deref,
{
    type ExtraMetadata = Box<Self>;
    fn into_raw_parts(this: Self) -> (*mut <Self as Deref>::Target, Self::ExtraMetadata) {
        let boxed = Box::new(this); // sound but violating semantics of `SmartPointer`
        (&**boxed as *const _ as *mut _, boxed)
    }
    unsafe fn from_raw_parts(ptr: *mut <Self as Deref>::Target, extra_meta: Self::ExtraMetadata) -> Self {
        let _ = ptr;
        *extra_meta
    }
}


It compiles, but would violate the intended semantics.

We could now say that:

  • A "smart pointer" is something that could implement the unsafe trait SmartPointer (as long as into_raw_parts could be made reasonably cheap; and of course the returned raw pointer is usable until the raw parts are converted back into the smart pointer).
  • A reference-like type is anything that implements Deref.
  • A "smart reference" is a reference-like type which isn't & or &mut.
  • Deref should be implemented when a reference to the type should be treated similarly to a reference to a target type (i.e. when/if we want deref-coercion).
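
As a sketch of the last two bullets, here is a made-up "smart reference" that is neither & nor &mut, together with the deref coercion that implementing Deref buys. `NameRef` and `shout` are hypothetical names.

```rust
use std::ops::Deref;

/// Hypothetical "smart reference": an index into a table of names.
struct NameRef<'a> {
    table: &'a [String],
    index: usize,
}

impl Deref for NameRef<'_> {
    type Target = str;

    fn deref(&self) -> &str {
        &self.table[self.index]
    }
}

/// Only knows about `&str`.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let table = vec![String::from("ferris"), String::from("corro")];
    let name = NameRef { table: &table, index: 0 };
    // `&NameRef` is accepted where `&str` is expected: deref coercion.
    println!("{}", shout(&name));
}
```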
