Which one of these should we keep?

We have these:

    /// Strips attributes from text. See [`Strip`] for which attributes can
    /// be stripped.
    ///
    /// # Examples
    ///
    /// ```no_run
    /// use hexchat_unsafe_plugin::{PluginHandle, Strip};
    ///
    /// /// Removes colors
    /// fn strip_colors(ph: &PluginHandle<'_>, s: &str) -> String {
    ///     ph.strip(s, Strip::new().colors(true))
    /// }
    /// ```
    #[cfg(feature = "nul_in_strip")]
    pub fn strip(&self, s: &str, strip: Strip) -> String {
        let ph = self.ph;
        // ironically the single API where we can pass embedded NULs.
        // we also don't need to worry about terminating it with NUL.
        let mut out = Vec::with_capacity(s.len());
        let in_ptr = s.as_ptr() as *const _;
        let in_len = s.len().try_into().unwrap();
        let flags = strip.flags();
        // NOTE: avoid panicking from here.
        let stripped = unsafe {
            ((*ph.ph).hexchat_strip)(ph.ph, in_ptr, in_len, flags)
        };
        // tho annoyingly we don't know the out len, so we need to count NULs.
        let in_nuls = s.as_bytes().into_iter().filter(|&&x| x == 0).count();
        let mut out_len = 0;
        for _ in 0..=in_nuls {
            while unsafe { *stripped.add(out_len) } != 0 {
                out_len += 1;
            }
            out_len += 1;
        }
        out.extend_from_slice(unsafe {
            // take out the extra NUL at the end.
            ::std::slice::from_raw_parts(stripped as *const _, out_len - 1)
        });
        unsafe {
            ((*ph.ph).hexchat_free)(ph.ph, stripped as *const _);
        }
        // we can panic again here, tho this should never panic.
        String::from_utf8(out).unwrap()
    }

    /// Strips attributes from text. See [`Strip`] for which attributes can
    /// be stripped.
    ///
    /// # Examples
    ///
    /// ```no_run
    /// use hexchat_unsafe_plugin::{PluginHandle, Strip};
    ///
    /// /// Removes colors
    /// fn strip_colors(ph: &PluginHandle<'_>, s: &str) -> String {
    ///     ph.strip(s, Strip::new().colors(true))
    /// }
    /// ```
    #[cfg(not(feature = "nul_in_strip"))]
    pub fn strip(&self, s: &str, strip: Strip) -> String {
        let ph = self.ph;
        let s = CString::new(s).unwrap();
        let flags = strip.flags();
        // NOTE: avoid panicking from here.
        let stripped = unsafe {
            ((*ph.ph).hexchat_strip)(ph.ph, s.as_ptr(), -1, flags)
        };
        let mut s = s.into_string().unwrap_or_else(|_| unsafe {
            std::hint::unreachable_unchecked()
        });
        if let Ok(stripped) = unsafe { CStr::from_ptr(stripped) }.to_str() {
            s.clear();
            s.push_str(stripped);
        } else {
            unsafe {
                ((*ph.ph).hexchat_free)(ph.ph, stripped as *const _);
            }
            unreachable!("broken hexchat");
        }
        unsafe {
            ((*ph.ph).hexchat_free)(ph.ph, stripped as *const _);
        }
        s
    }

Which one should we keep?

You have two behaviors:

  1. Handle null bytes by counting them.
  2. Panic if the string has a null byte.

Currently, the above matches your implementations. However, regardless of which behavior you pick, I think you should be using the first implementation rather than the second one, because if you tweak the first version to panic on null bytes, you both avoid cloning the string data, and I'm pretty sure the result will be simpler than your current second function.

I can't tell you which of the two behaviors are correct for your application. If null bytes are important, then you should go for your first implementation of course, but it sounds like they aren't that important.

2 Likes

Yeah we mean, the first implementation is cleaner to us, but we can't really take advantage of any std capabilities like CString or CStr here. On the other hand the second implementation does let us use those, but then we get that extra allocation. At the same time, none of this code is performance-critical, which is why we immediately turn the hexchat-allocated *const c_char into a String (instead of trying to roll our own HexchatBox<'ph, str>).

But it's all just... awkward. If only there was a way to get the output length out of this API function... Also, none of the other API functions handle NULs.

You could use CStr to get the output length like this:

pub fn strip(&self, s: &str, strip: Strip) -> String {
    // check for null bytes in s here

    let ph = self.ph;

    let in_ptr = s.as_ptr() as *const _;
    let in_len = s.len().try_into().unwrap();
    let flags = strip.flags();

    let stripped = unsafe {
        ((*ph.ph).hexchat_strip)(ph.ph, in_ptr, in_len, flags)
    };

    let out = CStr::from_ptr(stripped).to_bytes().to_vec();

    unsafe {
        ((*ph.ph).hexchat_free)(ph.ph, stripped as *const _);
    }

    String::from_utf8(out).unwrap()
}

Oh yeah, we do go out of our way to reuse the CString allocation in the second one. So uh.

... Clearly that's not very clear, huh.

Ah yes, I missed that. Regardless, I still prefer the snippet I just posted.

1 Like