Which one of these should we keep?

Soni · April 15, 2022, 8:39pm

We have these:

    /// Strips attributes from text. See [`Strip`] for which attributes can
    /// be stripped.
    ///
    /// # Examples
    ///
    /// ```no_run
    /// use hexchat_unsafe_plugin::{PluginHandle, Strip};
    ///
    /// /// Removes colors
    /// fn strip_colors(ph: &PluginHandle<'_>, s: &str) -> String {
    ///     ph.strip(s, Strip::new().colors(true))
    /// }
    /// ```
    #[cfg(feature = "nul_in_strip")]
    pub fn strip(&self, s: &str, strip: Strip) -> String {
        let ph = self.ph;
        // ironically the single API where we can pass embedded NULs.
        // we also don't need to worry about terminating it with NUL.
        let mut out = Vec::with_capacity(s.len());
        let in_ptr = s.as_ptr() as *const _;
        let in_len = s.len().try_into().unwrap();
        let flags = strip.flags();
        // NOTE: avoid panicking from here.
        let stripped = unsafe {
            ((*ph.ph).hexchat_strip)(ph.ph, in_ptr, in_len, flags)
        };
        // tho annoyingly we don't know the out len, so we need to count NULs.
        let in_nuls = s.as_bytes().into_iter().filter(|&&x| x == 0).count();
        let mut out_len = 0;
        for _ in 0..=in_nuls {
            while unsafe { *stripped.add(out_len) } != 0 {
                out_len += 1;
            }
            out_len += 1;
        }
        out.extend_from_slice(unsafe {
            // take out the extra NUL at the end.
            ::std::slice::from_raw_parts(stripped as *const _, out_len - 1)
        });
        unsafe {
            ((*ph.ph).hexchat_free)(ph.ph, stripped as *const _);
        }
        // we can panic again here, tho this should never panic.
        String::from_utf8(out).unwrap()
    }

    /// Strips attributes from text. See [`Strip`] for which attributes can
    /// be stripped.
    ///
    /// # Examples
    ///
    /// ```no_run
    /// use hexchat_unsafe_plugin::{PluginHandle, Strip};
    ///
    /// /// Removes colors
    /// fn strip_colors(ph: &PluginHandle<'_>, s: &str) -> String {
    ///     ph.strip(s, Strip::new().colors(true))
    /// }
    /// ```
    #[cfg(not(feature = "nul_in_strip"))]
    pub fn strip(&self, s: &str, strip: Strip) -> String {
        let ph = self.ph;
        let s = CString::new(s).unwrap();
        let flags = strip.flags();
        // NOTE: avoid panicking from here.
        let stripped = unsafe {
            ((*ph.ph).hexchat_strip)(ph.ph, s.as_ptr(), -1, flags)
        };
        let mut s = s.into_string().unwrap_or_else(|_| unsafe {
            std::hint::unreachable_unchecked()
        });
        if let Ok(stripped) = unsafe { CStr::from_ptr(stripped) }.to_str() {
            s.clear();
            s.push_str(stripped);
        } else {
            unsafe {
                ((*ph.ph).hexchat_free)(ph.ph, stripped as *const _);
            }
            unreachable!("broken hexchat");
        }
        unsafe {
            ((*ph.ph).hexchat_free)(ph.ph, stripped as *const _);
        }
        s
    }

Which one should we keep?

alice · April 15, 2022, 9:04pm

You have two behaviors:

Handle null bytes by counting them.
Panic if the string has a null byte.

Currently, the above matches your implementations. However, regardless of which behavior you pick, I think you should be using the first implementation rather than the second one, because if you tweak the first version to panic on null bytes, you both avoid cloning the string data, and I'm pretty sure the result will be simpler than your current second function.

I can't tell you which of the two behaviors are correct for your application. If null bytes are important, then you should go for your first implementation of course, but it sounds like they aren't that important.

Soni · April 15, 2022, 9:47pm

Yeah we mean, the first implementation is cleaner to us, but we can't really take advantage of any std capabilities like CString or CStr here. On the other hand the second implementation does let us use those, but then we get that extra allocation. At the same time, none of this code is performance-critical, which is why we immediately turn the hexchat-allocated *const c_char into a String (instead of trying to roll our own HexchatBox<'ph, str>).

But it's all just... awkward. If only there was a way to get the output length out of this API function... Also, none of the other API functions handle NULs.

alice · April 15, 2022, 9:54pm

You could use CStr to get the output length like this:

pub fn strip(&self, s: &str, strip: Strip) -> String {
    // check for null bytes in s here

    let ph = self.ph;

    let in_ptr = s.as_ptr() as *const _;
    let in_len = s.len().try_into().unwrap();
    let flags = strip.flags();

    let stripped = unsafe {
        ((*ph.ph).hexchat_strip)(ph.ph, in_ptr, in_len, flags)
    };

    let out = CStr::from_ptr(stripped).to_bytes().to_vec();

    unsafe {
        ((*ph.ph).hexchat_free)(ph.ph, stripped as *const _);
    }

    String::from_utf8(out).unwrap()
}

Soni · April 15, 2022, 9:56pm

Oh yeah, we do go out of our way to reuse the CString allocation in the second one. So uh.

... Clearly that's not very clear, huh.

alice · April 15, 2022, 9:58pm

Ah yes, I missed that. Regardless, I still prefer the snippet I just posted.

system · July 14, 2022, 9:59pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
Code review request for hexchat-unsafe-plugin (v2.2.0) code review	20	776	July 30, 2022
Liberal application of #[must_use] help	21	1132	March 4, 2023
My first Rust project: the Ripasso password manager announcements	4	1354	January 12, 2023
Blog: Clippy vs. Rust	5	2253	January 12, 2023
Relevance of `starts_with` and `strip_prefix` help	5	373	August 27, 2022

Which one of these should we keep?

Related Topics