In this example you are sub-slicing, which indeed does not play to Arc's natural advantage:
To detail that owning_ref idea (we'll need pairs of indices rather than &str references, and since there does not seem to be such an iterator readily available, I have hand-rolled my own .split(" ") iterator that yields pairs of indices instead):
use std::sync::Arc;
use owning_ref::ArcRef;
fn yield_strings (s: impl Into<Arc<str>>)
-> Vec<ArcRef<str>>
{
let s: Arc<str> = s.into(); // ownership of the string is _local_ here
let s: ArcRef<str> = s.into();
let split_space_idxs =
s .char_indices()
.filter(|&(_, c)| c == ' ').map(|(i, _)| i)
.chain(Some(s.len()))
.map({
let mut prev = 0;
move |i| (
::core::mem::replace(&mut prev, i + 1),
i,
)
})
;
let mut ret = vec![];
let mut yield_ = |item| ret.push(item);
for (start, end) in split_space_idxs {
if end == start { continue; } // Optional: skip empty strings
let s = s.clone(); // inc the refcount to give ownership to the vec (elements)
let sub = s.map(|it| &it[start .. end]);
yield_(sub);
}
ret // even though the local `s` is dropped here, the vec has ownership
}
fn main ()
{
let strs = yield_strings("foo bar");
assert_eq!(dbg!(&*strs[0]), "foo");
assert_eq!(dbg!(&*strs[1]), "bar");
assert!(strs.get(2).is_none());
}
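For readers who want to try the index-pair splitting on its own, here is a minimal std-only sketch of that step, pulled out of the function above. The free-function form and the name `split_space_idxs` as a function are my own framing for illustration; the logic mirrors the iterator chain shown above and keeps empty pieces, just like `.split(" ")` would.

```rust
/// Yields `(start, end)` byte ranges of the space-separated pieces of `s`,
/// including empty pieces (mirroring what `s.split(' ')` would produce).
fn split_space_idxs(s: &str) -> impl Iterator<Item = (usize, usize)> + '_ {
    // Start of the piece currently being scanned.
    let mut prev = 0;
    s.char_indices()
        // Keep only the byte positions of the spaces.
        .filter(|&(_, c)| c == ' ')
        .map(|(i, _)| i)
        // A virtual separator at the very end closes the last piece.
        .chain(Some(s.len()))
        // Emit (start of piece, position of separator); the next piece
        // starts one byte past the separator (a space is 1 byte in UTF-8).
        .map(move |i| (std::mem::replace(&mut prev, i + 1), i))
}
```

With this, `split_space_idxs("foo bar").collect::<Vec<_>>()` gives `[(0, 3), (4, 7)]`, and a doubled space produces an empty range such as `(2, 2)`, which the caller is free to skip.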
That being said, the above example will lead to up to n + 1 owners, where n is the number of words in the string. That means n + 1 atomic increments and decrements of the reference count, which can have a non-negligible performance impact (technically, using Rc instead of Arc would make sense here, by wrapping the owned RcRef in an Unshared*-kind of wrapper...).
* Unshared
pub use lib::Unshared;
mod lib {
pub
struct Unshared<T> /* = */ (
T,
);
impl<T> From<T> for Unshared<T> { ... }
impl<T> Into<T> for Unshared<T> { ... }
impl<T> Unshared<T> {
pub
fn get_mut (self: &'_ mut Self)
-> &'_ mut T
{
&mut self.0
}
}
unsafe // Safety: no `&Unshared` API whatsoever
impl<T> Sync for Unshared<T>
{}
}
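To make the n + 1 owner count mentioned above concrete, here is a small std-only sketch that hands one `Arc` clone to each word and then reads the counter. It uses plain `Arc<str>` clones rather than `ArcRef`, and the helper name `owners_after_split` is hypothetical; the counting behaves the same way, since each clone is one atomic increment and each drop one atomic decrement.

```rust
use std::sync::Arc;

/// Hands one `Arc` clone to each non-empty word of `text` and reports
/// how many owners the string has at that point.
fn owners_after_split(text: &str) -> usize {
    let s: Arc<str> = Arc::from(text); // 1 owner: the local `s`
    let per_word: Vec<Arc<str>> = text
        .split(' ')
        .filter(|w| !w.is_empty())
        .map(|_| Arc::clone(&s)) // one atomic increment per word
        .collect();
    // n clones held by the vec, plus the local `s`.
    let count = Arc::strong_count(&s);
    drop(per_word); // n atomic decrements when the vec goes away
    count
}
```

For `"foo bar"` this reports 3 owners: two words plus the local handle.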
In that case, now that the iteration yields pairs of indices rather than references, solving the initial problem becomes quite simple, if a tad unergonomic:
- use std::sync::Arc;
- use owning_ref::ArcRef;
-
- fn yield_strings (s: impl Into<Arc<str>>)
- -> Vec<ArcRef<str>>
+ fn yield_strings (s: impl Into<String>)
+ -> (String, Vec<(usize, usize)>)
{
- let s: Arc<str> = s.into();
- let s: ArcRef<str> = s.into();
+ // Reference counting is no longer needed.
+ let s: String = s.into();
let split_space_idxs = ... ;
...
for (start, end) in split_space_idxs {
if end == start { continue; } // Optional: skip empty strings
- let s = s.clone();
- let sub = s.map(|it| &it[start .. end]);
- yield_(sub);
+ yield_((start, end));
}
- ret
+ (s, ret)
}
Usage:
fn main ()
{
let (s, idxs) = yield_strings("foo bar");
let get_str = |i: usize| {
let (start, end) = idxs[i];
&s[start .. end]
};
assert_eq!(dbg!(get_str(0)), "foo");
assert_eq!(dbg!(get_str(1)), "bar");
}
Of course, the issue now is that the usage is a bit ugly and, worse, error-prone (what if they mutate s before indexing?).
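To illustrate that hazard, here is a small sketch of what goes wrong when the string is edited after the indices were computed. The index pairs are hard-coded to the values computed for "foo bar", and the helper names `get_str` (as a free function) and `stale_lookup` are hypothetical.

```rust
/// Reads the `i`-th recorded range out of `s`. Nothing ties `idxs` to the
/// current contents of `s`, which is exactly the problem.
fn get_str<'s>(s: &'s str, idxs: &[(usize, usize)], i: usize) -> &'s str {
    let (start, end) = idxs[i];
    &s[start..end]
}

/// Hypothetical misuse: the string is mutated between computing the
/// indices and using them. Returns the first "word" before and after.
fn stale_lookup() -> (String, String) {
    let mut s = String::from("foo bar");
    let idxs = vec![(0, 3), (4, 7)]; // the ranges computed for "foo bar"
    let before = get_str(&s, &idxs, 0).to_owned();
    s.insert_str(0, "x "); // `s` is now "x foo bar", but `idxs` is unchanged
    let after = get_str(&s, &idxs, 0).to_owned();
    (before, after)
}
```

The first lookup yields "foo", while the second silently yields "x f". With non-ASCII text a stale range could even fall in the middle of a character and panic.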
The trick, then, is to move that indexing operation directly into the called function / returned value:
fn main ()
{
let strs = yield_strings("foo bar");
assert_eq!(strs.len(), 2);
assert_eq!(dbg!(&strs[0]), "foo");
assert_eq!(dbg!(&strs[1]), "bar");
}
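One way to get that usage to compile is to return a small wrapper that owns both the string and the index pairs and implements `Index<usize>`. The following is only a sketch under that assumption: the type name `Strs` and its fields are hypothetical, and the index computation is written as a plain loop rather than the iterator chain used earlier.

```rust
use std::ops::Index;

/// Hypothetical wrapper: owns the string together with the ranges of its words.
/// The fields are private, so the string cannot be mutated from outside
/// and the ranges can never go stale.
pub struct Strs {
    s: String,
    idxs: Vec<(usize, usize)>,
}

impl Strs {
    /// Number of words found.
    pub fn len(&self) -> usize {
        self.idxs.len()
    }
}

impl Index<usize> for Strs {
    type Output = str;

    /// The indexing step that callers previously had to do by hand.
    fn index(&self, i: usize) -> &str {
        let (start, end) = self.idxs[i];
        &self.s[start..end]
    }
}

pub fn yield_strings(s: impl Into<String>) -> Strs {
    let s: String = s.into();
    let mut idxs = vec![];
    let mut start = 0;
    // A virtual trailing space closes the last word.
    for (i, c) in s.char_indices().chain(Some((s.len(), ' '))) {
        if c == ' ' {
            if i > start {
                idxs.push((start, i)); // skip empty strings
            }
            start = i + 1;
        }
    }
    Strs { s, idxs }
}
```

Since `Output = str`, the expression `&strs[0]` is a `&str` that can be compared directly against a string literal, as in the `main` above.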