Equivalent of C++ std::string::erase()

Suppose we have a text string and have found the character position of a word (just after a delimiter such as a space) as well as the character length of the word (up to the next delimiter), and we want to remove that word. In C++, given text, beg and len as variables, we could do that with:

text.erase(beg, len);

Looking at Rust's std::string, I came up with the following "equivalent" code:

text.replace_range(beg..(beg + len), "");

However, it looks "hacky" to me. Is there a better approach using just Rust and its std library?

You might be looking for String::drain

I did look at drain but that returns an iterator, which then requires a collect() and some variable to accept the result, which would then have to be ignored. To me that seemed more convoluted than the replace_range solution.

You don't have to collect or even explicitly ignore the returned iterator.
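A minimal sketch of what that looks like, assuming beg and len are byte offsets that land on char boundaries (the erase helper name is only for illustration). The range is removed when the returned Drain is dropped at the end of the statement:

```rust
/// Hypothetical helper: removes `len` bytes starting at byte offset `beg`.
/// Panics if the range is out of bounds or not on char boundaries.
fn erase(text: &mut String, beg: usize, len: usize) {
    // The `Drain` iterator is neither collected nor bound to a variable;
    // it is dropped right here, which completes the removal.
    text.drain(beg..beg + len);
}

fn main() {
    let mut text = String::from("hello big world");
    erase(&mut text, 6, 4); // removes "big "
    println!("{text}");
}
```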

...but if you don't want to use ranges, you could write your own method (via an extension trait) that takes begin, len instead.
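Something along these lines, as a sketch only. The trait name EraseExt and the method name erase are made up here, not part of std, and the method keeps the byte-offset semantics of drain:

```rust
/// Hypothetical extension trait giving `String` a C++-style `erase(beg, len)`.
trait EraseExt {
    fn erase(&mut self, beg: usize, len: usize);
}

impl EraseExt for String {
    fn erase(&mut self, beg: usize, len: usize) {
        // Byte offsets, as with `drain`; panics if the range is out of
        // bounds or does not fall on char boundaries.
        self.drain(beg..beg + len);
    }
}

fn main() {
    let mut text = String::from("one two three");
    text.erase(4, 4); // removes "two "
    println!("{text}");
}
```

Unlike the C++ function, this version does not clamp len to the end of the string; that would be a design choice to make if you want a closer match.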

Nice.
Ranges are ok. I didn't realize that you could ignore the return value. It seemed to me that Rust insisted on "dotting the i's and crossing the t's", since it complains about unused variables or mutable variables that are not mutated. Is there some place where the specifics of this particular "allow to ignore" case are discussed?

To be honest, I had to check if you could ignore it without warning or not. The main mechanism for such warnings is the must_use attribute. I'm not aware of any maintained list. As far as I know it's a case-by-case, organic process.

Allow to ignore is the default. Warnings when a return value is ignored only occur if the function or return type is explicitly annotated with #[must_use].
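To illustrate the difference, here is a small sketch with two made-up functions (plain_len and checked_len are hypothetical names); only the annotated one produces a warning when its result is ignored:

```rust
// No attribute: ignoring the result is silently allowed.
fn plain_len(s: &str) -> usize {
    s.len()
}

// With the attribute, ignoring the result triggers the
// warn-by-default `unused_must_use` lint.
#[must_use]
fn checked_len(s: &str) -> usize {
    s.len()
}

fn main() {
    plain_len("abc"); // compiles without a warning
    // checked_len("abc"); // uncommenting this line would produce a warning
    let _ = checked_len("abc"); // explicitly discarded, so no warning
}
```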

One more thing worth pointing out is that in Rust, strings (the str and String types) must be UTF-8, but all indexing is done in bytes. So you need to be careful not to slice inside a multi-byte character, because that will panic at runtime. For example, the following program:

fn main() {
    let greeting = "cześć";
    println!("{}", &greeting[0..4]);
}

will panic with a message:

thread 'main' panicked at src/main.rs:3:29:
byte index 4 is not a char boundary; it is inside 'ś' (bytes 3..5) of `cześć`
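If the offsets come from somewhere you don't fully control, one way to avoid the panic is to check first. A sketch using str::get and str::is_char_boundary (the safe_prefix helper is a made-up name):

```rust
/// Hypothetical helper: returns the first `end` bytes of `s`,
/// or `None` if `end` is out of bounds or splits a character.
fn safe_prefix(s: &str, end: usize) -> Option<&str> {
    s.get(0..end)
}

fn main() {
    let greeting = "cześć";
    // Byte 4 is in the middle of 'ś', so this yields `None` instead of panicking.
    println!("{:?}", safe_prefix(greeting, 4));
    // Byte 5 is a char boundary.
    println!("{:?}", safe_prefix(greeting, 5));
    println!("{}", greeting.is_char_boundary(4));
}
```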

I see. #[must_use] is the equivalent of [[nodiscard]] in C++.

Yes, I was well aware of the UTF-8 strings. That's why in my original question I used "character position" (and "character length") rather than "byte position" or "position", even though my C++ example assumes C++ char (byte) strings and the erase() would remove characters at byte positions for a given byte length.
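For completeness, a sketch of how character positions could be mapped to byte offsets before erasing. This assumes "character" means a Rust char (a Unicode scalar value), not a grapheme cluster, and the helper names byte_offset and erase_chars are invented for the example:

```rust
/// Byte offset of the char at index `char_idx`, or `text.len()` if the
/// index is past the end.
fn byte_offset(text: &str, char_idx: usize) -> usize {
    text.char_indices()
        .nth(char_idx)
        .map_or(text.len(), |(byte_idx, _)| byte_idx)
}

/// Removes `len` chars starting at char index `beg`.
/// Ranges that run past the end are clamped to the end of the string.
fn erase_chars(text: &mut String, beg: usize, len: usize) {
    let start = byte_offset(text, beg);
    let end = byte_offset(text, beg + len);
    text.replace_range(start..end, "");
}

fn main() {
    let mut text = String::from("cześć miły świecie");
    erase_chars(&mut text, 6, 5); // removes "miły "
    println!("{text}");
}
```

Each call walks the string from the start, so this is linear in the string length; if the byte offsets are already known from the search that found the word, using them directly avoids that cost.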

There are two lints that can get triggered when a value returned from a function is not "used": unused_must_use and unused_results. The former is warn by default and only applies when the called function, or the type it returns, has the #[must_use] attribute. The latter is allow by default and is triggered by almost any function whose return type isn't something like () or ! (and perhaps a few others). I set both to deny because I'm weird like that. My preferred way to suppress them is to assign the value to _ (without a let binding) when it doesn't have a destructor; otherwise I drop it explicitly. You could of course allow/expect them on a function-by-function basis as well.
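A sketch of how that convention might look in practice, with both lints denied on a single function. The function remove_and_record and its parameters are hypothetical, and `_ = ...` needs Rust 1.59 or later:

```rust
use std::collections::HashSet;

// Both lints are denied for this function only, for illustration.
#[deny(unused_must_use, unused_results)]
fn remove_and_record(
    text: &mut String,
    beg: usize,
    len: usize,
    removed: &mut HashSet<String>,
) {
    let word = text[beg..beg + len].to_owned();
    // `HashSet::insert` returns a `bool`, which has no destructor:
    // assign it to `_` to satisfy `unused_results`.
    _ = removed.insert(word);
    // `Drain` has a destructor (dropping it completes the removal),
    // so it is dropped explicitly instead.
    drop(text.drain(beg..beg + len));
}

fn main() {
    let mut text = String::from("hello big world");
    let mut removed = HashSet::new();
    remove_and_record(&mut text, 6, 4, &mut removed);
    println!("{text:?} {removed:?}");
}
```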

Fun fact: _ = ... (without let) was implemented as part of destructuring assignment, and was thus unavailable before Rust 1.59. In case you wondered why let _ = ... is more common.

Ah, that was quite some time ago. I'm sure I used to do that but don't remember. I'm pathological with the rustc and clippy lints I deny though, so lints like clippy::let_underscore_untyped and clippy::let_underscore_must_use would have long ago fired and forced me to change from let _ to _.

In case one is wondering how pathological I am with most of my packages:

[lints.rust]
# Lints that are commented out are only on nightly unless otherwise stated.
# Many lints are not part of groups; thus this should be updated as more
# lints are added that are not part of any group.
ambiguous_negative_literals = { level = "deny", priority = -1 }
closure_returning_async_block = { level = "deny", priority = -1 }
deprecated_safe = { level = "deny", priority = -1 }
deref_into_dyn_supertrait = { level = "deny", priority = -1 }
ffi_unwind_calls = { level = "deny", priority = -1 }
future_incompatible = { level = "deny", priority = -1 }
#fuzzy_provenance_casts = { level = "deny", priority = -1 }
impl_trait_redundant_captures = { level = "deny", priority = -1 }
keyword_idents = { level = "deny", priority = -1 }
let_underscore = { level = "deny", priority = -1 }
# Bug in how `linker_messages` works on macOS when testing is done.
# See [this issue](https://github.com/rust-lang/rust/issues/136096)
# for more info. Once fixed, this should be re-enabled.
#linker_messages = { level = "deny", priority = -1 }
#lossy_provenance_casts = { level = "deny", priority = -1 }
macro_use_extern_crate = { level = "deny", priority = -1 }
meta_variable_misuse = { level = "deny", priority = -1 }
missing_copy_implementations = { level = "deny", priority = -1 }
missing_debug_implementations = { level = "deny", priority = -1 }
missing_docs = { level = "deny", priority = -1 }
#multiple_supertrait_upcastable = { level = "deny", priority = -1 }
#must_not_suspend = { level = "deny", priority = -1 }
non_ascii_idents = { level = "deny", priority = -1 }
#non_exhaustive_omitted_patterns = { level = "deny", priority = -1 }
nonstandard_style = { level = "deny", priority = -1 }
redundant_imports = { level = "deny", priority = -1 }
redundant_lifetimes = { level = "deny", priority = -1 }
refining_impl_trait = { level = "deny", priority = -1 }
rust_2018_compatibility = { level = "deny", priority = -1 }
rust_2018_idioms = { level = "deny", priority = -1 }
rust_2021_compatibility = { level = "deny", priority = -1 }
rust_2024_compatibility = { level = "deny", priority = -1 }
single_use_lifetimes = { level = "deny", priority = -1 }
#supertrait_item_shadowing_definition = { level = "deny", priority = -1 }
trivial_casts = { level = "deny", priority = -1 }
trivial_numeric_casts = { level = "deny", priority = -1 }
unit_bindings = { level = "deny", priority = -1 }
unnameable_types = { level = "deny", priority = -1 }
#unqualified_local_imports = { level = "deny", priority = -1 }
unreachable_pub = { level = "deny", priority = -1 }
unsafe_code = { level = "deny", priority = -1 }
unstable_features = { level = "deny", priority = -1 }
unused = { level = "deny", priority = -1 }
unused_crate_dependencies = { level = "deny", priority = -1 }
unused_import_braces = { level = "deny", priority = -1 }
unused_lifetimes = { level = "deny", priority = -1 }
unused_qualifications = { level = "deny", priority = -1 }
unused_results = { level = "deny", priority = -1 }
variant_size_differences = { level = "deny", priority = -1 }
warnings = { level = "deny", priority = -1 }

[lints.clippy]
all = { level = "deny", priority = -1 }
cargo = { level = "deny", priority = -1 }
complexity = { level = "deny", priority = -1 }
correctness = { level = "deny", priority = -1 }
nursery = { level = "deny", priority = -1 }
pedantic = { level = "deny", priority = -1 }
perf = { level = "deny", priority = -1 }
restriction = { level = "deny", priority = -1 }
style = { level = "deny", priority = -1 }
suspicious = { level = "deny", priority = -1 }
# Noisy, opinionated, and likely don't prevent bugs or improve APIs.
arbitrary_source_item_ordering = "allow"
blanket_clippy_restriction_lints = "allow"
exhaustive_enums = "allow"
exhaustive_structs = "allow"
implicit_return = "allow"
min_ident_chars = "allow"
missing_trait_methods = "allow"
module_name_repetitions = "allow"
option_option = "allow"
pub_use = "allow"
pub_with_shorthand = "allow"
question_mark_used = "allow"
redundant_pub_crate = "allow"
ref_patterns = "allow"
return_and_then = "allow"
self_named_module_files = "allow"
single_call_fn = "allow"
single_char_lifetime_names = "allow"
unseparated_literal_suffix = "allow"

I also believe that inserting let _ = is actively suggested by the compiler, which might help keep it a popular choice.

Indeed; although to be clear, the compiler doesn't suggest that when _ is used. If one wants to avoid silently discarding a value with an "expensive" Drop impl, one must raise the level of the allow-by-default lint let_underscore_drop, which causes the compiler to tell you to drop the value explicitly. If one has done that, or the returned value has no destructor and doesn't cause the deny-by-default lint let_underscore_lock to fire, then one must raise the level of clippy::let_underscore_untyped (allow by default and part of Clippy's restriction group), which instructs one to annotate the let _ binding with an explicit type. I forget how I came to the conclusion that it's cleaner to just assign such a value to _ rather than write let _: T. I'm guessing I read some code that did that and preferred its brevity while still appeasing all my "annoying" lints.

I think I started around 1.3x.. ah, those were the days.