Rust beginner notes & questions

Please convince yourself that an implementation of Read2 for std::fs::File cannot perform fewer copies than the Read trait does. The type signatures alone already tell me that, and indeed, they directly imply that any implementation of Read2 for std::fs::File that uses standard read calls must necessarily maintain an internal buffer. This is what std::io::BufReader does for any implementation of Read, but Read does not require the use of an internal buffer and is thus more flexible.

You fundamentally misunderstand the purpose of the buffer rolling. It has to do with line-oriented searching, context handling, and limiting the use of heap memory. In ripgrep's implementation, there are exactly as many copies as would be done if it used your Read2 implementation. Moreover, ripgrep's implementation permits the amortization of allocation, which is critical, and it's not obvious to me how that would be done with your Read2 trait.

More generally, your Read2 trait assumes the use of an internal buffer. ripgrep's searching requires not only the use of an internal buffer, but one that can be extended dynamically based on the size of the largest line. It's not obvious to me that an implementation of Read2 would support such a use case.
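To make the buffer rolling concrete, here is a minimal sketch of a caller-owned, line-oriented buffer built on nothing but the Read trait. It is not ripgrep's actual code: the `LineBuffer` type, its fields, and its methods are hypothetical names made up for illustration, and the doubling growth policy and the `max_capacity` limit are assumptions. It reads directly into its own allocation, hands out only complete lines, rolls the trailing partial line to the front on the next fill, and grows only when a single line does not fit.

```rust
use std::io::{self, Read};

/// Hypothetical line-oriented buffer (illustrative names, not ripgrep's API).
pub struct LineBuffer {
    buf: Vec<u8>,        // allocated once and reused across fills
    end: usize,          // end of the valid bytes in `buf`
    line_end: usize,     // end of the last complete line in `buf`
    max_capacity: usize, // assumed upper bound on heap use
}

impl LineBuffer {
    pub fn new(capacity: usize, max_capacity: usize) -> LineBuffer {
        LineBuffer { buf: vec![0; capacity.max(1)], end: 0, line_end: 0, max_capacity }
    }

    /// The complete lines produced by the most recent call to `fill`.
    pub fn lines(&self) -> &[u8] {
        &self.buf[..self.line_end]
    }

    pub fn capacity(&self) -> usize {
        self.buf.len()
    }

    /// Reads until at least one complete line is available.
    /// Returns Ok(false) once the reader is exhausted and nothing is left.
    pub fn fill<R: Read>(&mut self, rdr: &mut R) -> io::Result<bool> {
        // Roll: move the unconsumed partial line to the front of the buffer.
        self.buf.copy_within(self.line_end..self.end, 0);
        self.end -= self.line_end;
        self.line_end = 0;
        loop {
            if self.end == self.buf.len() {
                // The current line does not fit, so grow up to the limit.
                if self.buf.len() >= self.max_capacity {
                    return Err(io::Error::new(
                        io::ErrorKind::Other,
                        "line exceeds buffer limit",
                    ));
                }
                let new_len = (self.buf.len() * 2).min(self.max_capacity);
                self.buf.resize(new_len, 0);
            }
            // The reader writes straight into our buffer; no extra copy.
            let n = rdr.read(&mut self.buf[self.end..])?;
            if n == 0 {
                // EOF: whatever remains is the final (unterminated) line.
                self.line_end = self.end;
                return Ok(self.end > 0);
            }
            // The rolled-over prefix has no newline, so scan only new bytes.
            let last_nl = self.buf[self.end..self.end + n]
                .iter()
                .rposition(|&b| b == b'\n');
            let old_end = self.end;
            self.end += n;
            if let Some(i) = last_nl {
                self.line_end = old_end + i + 1;
                return Ok(true);
            }
        }
    }
}
```

A caller would loop on `fill` and search `lines()` after each successful call. The only copy beyond the read itself is the roll of the partial line, and the allocation is amortized across the whole search. For brevity the sketch propagates every read error, including `ErrorKind::Interrupted`, rather than retrying.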

Finally, that there are two implementations of ripgrep's search is a failing of mine, not of the Read trait. The implementations have been unified in my dev branch as part of factoring more of ripgrep's internals out into libraries, and I didn't need Read2 to do it. Moreover, in my dev branch, the library supports an important new feature: the ability to limit or control the amount of heap allocation being done. If I used Read2, then I don't see how that could be implemented using the interface you've provided. It can be done with the Read trait, however, because the Read trait makes far fewer assumptions than Read2.

This is once again false. I explained why above. This is why I suggested that we stop communicating, because it isn't productive. The regex engine itself requires UTF-8, so regardless of what an implementation of Read2 yields, whether it's u16 or otherwise, some explicit transcoding step is required.
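As a rough illustration of what "explicit transcoding step" means, here is a sketch using only the standard library. The `utf16_to_utf8` helper is a hypothetical name, and replacing unpaired surrogates with U+FFFD is an assumed policy. This is not how ripgrep does it; it only shows that u16 input has to be converted before a UTF-8 regex engine can see it.

```rust
/// Hypothetical helper: transcode UTF-16 code units into a UTF-8 String.
/// Unpaired surrogates are replaced with U+FFFD (an assumed policy).
pub fn utf16_to_utf8(units: &[u16]) -> String {
    std::char::decode_utf16(units.iter().copied())
        .map(|r| r.unwrap_or(std::char::REPLACEMENT_CHARACTER))
        .collect()
}
```

The resulting `String` is guaranteed to be valid UTF-8, so its bytes can be handed to the searcher. Note that the conversion necessarily writes into a new buffer, which is a copy no matter what unit type the reader yields.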
