What should go in the io module documentation?


#1

Hey everyone! Now that the new io has landed, I wanted to request some feedback from yinz.

std::old_io has a bunch of examples at the module level: http://doc.rust-lang.org/nightly/std/old_io/index.html

I’m going to be starting to write API docs for std::io this week. What do you think of these common actions at the module level? Is the list good? Should I use other examples? Are any of these too obscure to go here?

Also, thoughts about what I should document first would be welcome. Thanks!


#2

I think that std::io should provide a lot of high-level documentation along the lines of goals, principles, organization, etc., rather than specific examples per se. I would personally prefer to see examples on the methods themselves rather than in the top-level module docs (although I’m sure that at least one example would be nice!).

Some topics I might expect to be explained are:

  • What Read and Write are
  • Principles of non-lossy methods, cross-platform-ness, and predictability
  • Where to find filesystem and networking functionality
  • Where to find platform-specific functionality
  • Common adapters/types defined in std::io. For example, the primitive impls on &[u8] and Vec<u8> should be explained, along with structures like Cursor, Buf{Reader,Writer}, io::{sink, empty, repeat}, etc.
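
Here is a rough sketch of one possible module-level example covering the adapters in that list. It assumes the std::io API as it was later stabilized, so some names (e.g. read_exact) may not exist on the current nightly. It exercises the &[u8] and Vec<u8> impls, Cursor with Seek, BufReader, and io::{sink, empty, repeat}:

```rust
use std::io::{self, BufRead, BufReader, Cursor, Read, Seek, SeekFrom, Write};

fn main() -> io::Result<()> {
    // &[u8] implements Read: reading consumes bytes from the front of the slice.
    let mut input: &[u8] = b"hello\nworld\n";
    let mut first = [0u8; 5];
    input.read_exact(&mut first)?;
    assert_eq!(&first, b"hello");
    assert_eq!(input, b"\nworld\n");

    // Vec<u8> implements Write: every write appends to the vector.
    let mut out: Vec<u8> = Vec::new();
    out.write_all(b"abc")?;
    write!(out, "{}", 42)?;
    assert_eq!(out, b"abc42");

    // Cursor wraps an in-memory buffer and adds Seek on top of Read/Write.
    let mut cursor = Cursor::new(Vec::new());
    cursor.write_all(b"0123456789")?;
    cursor.seek(SeekFrom::Start(3))?;
    let mut three = [0u8; 3];
    cursor.read_exact(&mut three)?;
    assert_eq!(&three, b"345");

    // BufReader adds buffering plus line-oriented helpers from BufRead.
    let reader = BufReader::new(&b"one\ntwo\nthree"[..]);
    let lines: Vec<String> = reader.lines().collect::<io::Result<_>>()?;
    assert_eq!(lines, ["one", "two", "three"]);

    // io::empty() is always at EOF, io::sink() discards everything written,
    // and io::repeat(b) yields the byte b forever (bounded here with take).
    let mut nothing = Vec::new();
    io::empty().read_to_end(&mut nothing)?;
    assert!(nothing.is_empty());
    let copied = io::copy(&mut io::repeat(b'x').take(4), &mut io::sink())?;
    assert_eq!(copied, 4);
    Ok(())
}
```

Because each step is a self-checking assertion, a block like this could double as a doc test.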

#3

I just skim-read RFC 517 and some of the associated pull requests, and made notes on areas that were unfamiliar to me. If you don’t understand something I’ve interpreted below, then perhaps focus on that area to lower the barrier to entry. I would greatly appreciate comments/feedback to fix anything I’ve interpreted incorrectly:

  • Rust’s IO Story
    • Phase 1 Definition - libuv-style (i.e. pure green-threaded model)
      • Phase 1 Details
        • ‘librustuv’ and ‘libuv’ were designed in a POSIX (Portable Operating System Interface) mold and lacked platform-specific APIs
      • Note: “Green” threads are M:N scheduled: many lightweight tasks are multiplexed onto a pool of OS threads, so a program can use multiple CPUs, although each green thread runs on only one CPU at a time
    • Phase 2 Definition - ???-style (i.e. dual green/native model)
      • Question 2.10: What is the style of this second phase known as? (i.e. neither libuv-style nor native-style?)
      • Phase 2 Details:
        • Includes removing the runtime system crate (librustrt), which is what currently lets Rust’s standard library support both green and native threading, by folding it into libstd. The alternative is to provide tooling and best practices, including a potential log crate to handle logging separately
        • Includes moving the libgreen crate (currently in-tree) out into an external library (a separate Cargo package)
      • Note: “Native” threads are 1:1 OS threads scheduled by the kernel; each one runs on one CPU at a time, but many native threads can run in parallel across multiple CPUs
    • Phase 3 Definition (i.e. Rust 1.0) - libnative - pure native model (the kernel)
      • Question 3.05: Why is Rust 1.0 the result of an evolution from “green” (M:N) threading to “native” (1:1) threading? Why not the other way around?
      • Phase 3 Details:
        • Portable, cross-platform APIs in ‘std’ (with platform-specific functionality in separate modules), exposing only services and configuration that work across platforms
        • Interoperability between APIs with different error types, via a try! macro that converts one error type into another as it propagates the error
        • Pay-as-you-go: where a useful high-level abstraction in ‘std’ imposes extra cost over the underlying system service, it should still be possible to access that service directly and safely (e.g. the Write trait’s write method is now a thin, low-level call that writes to the underlying IO object and reports how many bytes were written)
        • Platform-specific modules, each with an IO ‘prelude’ submodule intended for glob import (importing all of its items with *), which brings the module’s traits, types, and submodules into scope
        • Path Reform:
          - Involves redesigning the std::path module for API stabilisation and cross-platform compatibility, including the design of OsStr and the following new types that represent platform-specific strings: an owned string (OsString) and a string slice (OsStr). For comparison, os::getenv introduces U+FFFD replacement characters when a value is not valid Unicode, and os::getenv_as_bytes yields a Vec<u8>
          - Includes support for “wide” (UTF-16) strings on Windows: OsStrBuf and OsStr in the API, plus to_wide and from_wide methods used at the FFI boundary
          - Supports situations where a developer receives bytes representing a path and needs to interpret them (e.g. interpret UTF-8 as a &str), including non-UTF-8 data on Windows. They may also need to find out which file system a given path is on, which requires system calls (more expensive than simple string comparisons). Steps include:
            - MAX_PATH (a path-length safety check, performed before checking whether the path is absolute or relative)
            - standardise(), which prepends \\?\ to the path, or \\?\UNC\ if it is a UNC path (UNC paths start with \\ rather than \\?\)
            - normalise() by appending to the path
            - Convert \ to / and handle .. and .
            - Characters that are illegal in paths (e.g. embedded NUL bytes, \0) currently cause the API to panic; alternatively, new_opt() can be used to avoid the panic
          • Rust currently exchanges strings with the OS as encoded Unicode on Windows and as uninterpreted, flat u8 byte sequences on Unix (often, but not necessarily, UTF-8). However, Windows actually works with arbitrary u16 sequences (interpreted as UTF-16 but not necessarily valid UTF-16, i.e. closer to UCS-2) and has MAX_PATH limitations. The Rust API is cross-platform through the ‘str’ version, while the [u8] version is not cross-platform
          • Does not cater for legacy Unix systems with non-UTF-8 encodings (when showing or converting paths to ‘str’)
          • The proposed solution is WTF-8, a superset of UTF-8 that can losslessly represent arbitrary u16 sequences, including ill-formed UTF-16 (unpaired surrogates). APIs use u16_slice(), utf16 and to_utf16 methods to overcome the remaining limitations, or, like ‘glob’, ignore files whose names are ill-formed (non-UTF-16)
          • Includes:
            - PathBuf and Path (modelled on String and str respectively)
            - Path (a newtype wrapper for [u8]; a borrowed, unsized struct)
            - A &Path may be created from a &str without allocation or normalisation
            - Used when just a [u8] field is required
            - PathBuf (a newtype wrapper for Vec<u8>; the mutable, owning variant)
          • Question 3.10: How does Path Reform prevent security bugs?
          • Question 3.20: When does inspection occur? Is it performed at the Path level or when crossing into the file system? How does Rust expose ‘normalising’ methods (for inspecting either the file system or paths), (e.g. syntactic normalisation via p.iter().collect())? Is file-system-based canonicalisation (converting data with more than one representation into a standard form) into “realpath” being undertaken (orthogonal to the rest of the design)?
          • Question 3.30: What does DST stand for? What is the DST rollout for ‘libstd’? What is the difference between DST/slices and owned paths (with normalisation of ‘.’)? How is as_slice applicable?
          • Question 3.40: Does Path have no newtype wrapper around u8/str due to BorrowFrom?
          • Question 3.50: When is a Path ‘header’ also required, and what purpose does it serve?
          • Question 3.60: Is Path unsized because its size is not known at compile time, only at runtime?
          • Question 3.70: Why must Path be isomorphic (identical in structure) to pub struct Path([u8])?
          • Question 3.80: What is the difference between &'a Path and Path<'a>?
          • Question 3.90: What is the Boost design (where the iterator yields an element for the root path, if it exists)?
        • Types should be ‘newtyped’ in the ‘std’ API (tuple structs with a single field, which give existing types new, distinct names with no runtime overhead for conversion)
        • std::io Module is welded directly to native threads and system calls
          • Traits including Reader (with a read_to_end method) and Writer (which currently does not indicate how much was written when a failure occurs midway).
          • IO objects (e.g. files, sockets) have a set_timeout method that affects all blocking operations; it is to be replaced by deadlines (set_timeout is to be renamed).
          • Question 3.100: What does Atomicity mean in this context?
          • Question 3.110: What is an example of a Transient problem in this context?
        • POSIX-, libc-, and os- specific code that supports the ‘std’ API
        • core::io split into:
          • ‘libcore’
          • std::io Facade (non-Normative), where std::io is a Facade over core::io
            • std::io::util has Reader/Writer Adapters
        • High-level libraries may be built on top of it; new platforms (e.g. microkernels, unikernels, embedded systems, or exokernel-based Rust) can be supported by forking and modifying libnative
        • Embedded systems trade-off: Rust’s layer of indirection is flexible, but this comes at a cost in performance and binary size (due to the limitation of a fixed abstraction over system capabilities)
        • Blocking API, whereby the calling thread blocks until each operation completes (e.g. a read waits until at least one byte of data is available, or end of file is reached, before returning)
        • Future asynchronous IO models require new research and experimentation
          • EINTR (a POSIX error) is returned when a blocking system call is interrupted by a signal, which matters when interfacing with other code (for example, code that installs signal handlers)
          • Question 3.120: Who will and how will they research and experiment?
        • ‘os’ moves to ‘env’ (a new module to inspect/update environment variables, CPU counts, etc.)
        • The new Cursor struct implements the Seek trait and wraps in-memory data so it can be used with Read/Write, which is useful when composing streams or creating test cases
        • TcpStream is Clone
          • Question 3.130: What is the supporting infrastructure for this?
          • Question 3.140: What is the LLVM (Low-level Virtual Machine) compiler infrastructure of Rust?
          • Question 3.150: What does Splitting Streams and Cancellation mean in this context?
        • The error chaining pattern is supported by the std::io module and used when crossing abstraction boundaries; it enables chaining operations that can fail without writing explicit error handling in the code (i.e. the Reader trait is “lifted” over IoResult)
        • try blocks with a ‘?’ operator to propagate exceptions (i.e. errors) (polymorphic synthesis), and try/catch blocks (syntactic sugar)
        • ST and IO monads (as in Haskell) versus an “ambient” monad (the syntax of checked exceptions), with “do” notation for Results and integration with other control-flow structures
        • Benchmarking with memcpy at N = 10
          - Question 3.160: What is the relevance of being able to perform this benchmark test?
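
A small sketch of how the Path Reform and OS-string notes look in practice. It assumes the API as it was eventually stabilized rather than the RFC draft: there, Path wraps OsStr rather than [u8], OsStrBuf is named OsString, and the Windows to_wide/from_wide methods surfaced as encode_wide (std::os::windows::ffi::OsStrExt) and from_wide (OsStringExt). Treat it as illustrative only:

```rust
use std::ffi::{OsStr, OsString};
use std::path::{Path, PathBuf};

fn main() {
    // Borrowed: Path::new views a &str as a &Path without allocating or
    // normalising; Path is unsized, like str.
    let p: &Path = Path::new("/usr/./lib/../bin");
    assert_eq!(p.to_str(), Some("/usr/./lib/../bin"));

    // Owned: PathBuf is to Path what String is to str.
    let mut buf = PathBuf::from("/tmp");
    buf.push("logs");
    buf.set_extension("txt");
    assert_eq!(buf, Path::new("/tmp/logs.txt"));

    // Purely syntactic normalisation via p.iter().collect(), with no file
    // system access: interior "." components are dropped, but ".." is kept,
    // because resolving it correctly would require following symlinks.
    let cleaned: PathBuf = p.iter().collect();
    assert_eq!(cleaned, Path::new("/usr/lib/../bin"));

    // OsString/OsStr hold platform strings; converting to UTF-8 can fail.
    let owned = OsString::from("config");
    assert_eq!(owned.to_str(), Some("config"));

    #[cfg(unix)]
    {
        use std::os::unix::ffi::OsStrExt;
        // 0xFF is never valid UTF-8 but is a legal byte in a Unix file name.
        let raw = OsStr::from_bytes(b"caf\xFF");
        assert_eq!(raw.to_str(), None);
        // Lossy conversion substitutes U+FFFD, like the old os::getenv.
        assert_eq!(raw.to_string_lossy(), "caf\u{FFFD}");
        // The original bytes remain available, losslessly.
        assert_eq!(raw.as_bytes(), b"caf\xFF");
    }

    #[cfg(windows)]
    {
        use std::os::windows::ffi::{OsStrExt, OsStringExt};
        // At the FFI boundary: OsStr -> UTF-16 code units and back again.
        let wide: Vec<u16> = OsStr::new("hi").encode_wide().collect();
        assert_eq!(OsString::from_wide(&wide), OsString::from("hi"));
    }
}
```

Note that the ".." handling touches Question 3.20: syntactic normalisation can safely drop "." but not "..", which is one reason file-system-based canonicalisation is a separate operation.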

Credit goes to all RFC 517 contributors. I have quoted many of them above without providing direct credit.
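
The error-chaining, try!/‘?’, Write::write, and EINTR notes can be tied together in one sketch. It assumes today’s ‘?’ operator (the stabilized successor to try!), and the names AppError, read_number, Trickle, and write_fully are hypothetical, for illustration only. std’s own Write::write_all already performs the retry loop shown in write_fully:

```rust
use std::fmt;
use std::io::{self, ErrorKind, Read, Write};
use std::num::ParseIntError;

// Hypothetical application error: `?` converts each underlying error
// into it via `From` when crossing the abstraction boundary.
#[derive(Debug)]
enum AppError {
    Io(io::Error),
    Parse(ParseIntError),
}

impl From<io::Error> for AppError {
    fn from(e: io::Error) -> Self { AppError::Io(e) }
}

impl From<ParseIntError> for AppError {
    fn from(e: ParseIntError) -> Self { AppError::Parse(e) }
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::Io(e) => write!(f, "I/O error: {}", e),
            AppError::Parse(e) => write!(f, "parse error: {}", e),
        }
    }
}

// Both the I/O error and the parse error propagate with `?`
// (this would have been spelled try!(...) at the time).
fn read_number<R: Read>(mut r: R) -> Result<i64, AppError> {
    let mut s = String::new();
    r.read_to_string(&mut s)?;
    Ok(s.trim().parse::<i64>()?)
}

// Hypothetical writer: fails with Interrupted (like EINTR) on every odd
// call and accepts at most 3 bytes per successful call (a partial write).
struct Trickle { out: Vec<u8>, calls: u32 }

impl Write for Trickle {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        if self.calls % 2 == 1 {
            return Err(io::Error::new(ErrorKind::Interrupted, "simulated EINTR"));
        }
        let n = buf.len().min(3);
        self.out.extend_from_slice(&buf[..n]);
        Ok(n) // low-level write reports how much was actually written
    }
    fn flush(&mut self) -> io::Result<()> { Ok(()) }
}

// Built on the low-level write: retry on Interrupted, advance past
// partial writes, and treat Ok(0) as an error.
fn write_fully<W: Write>(w: &mut W, mut buf: &[u8]) -> io::Result<()> {
    while !buf.is_empty() {
        match w.write(buf) {
            Ok(0) => return Err(io::Error::new(ErrorKind::WriteZero, "wrote 0 bytes")),
            Ok(n) => buf = &buf[n..],
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

fn main() {
    assert_eq!(read_number(&b" 42\n"[..]).unwrap(), 42);
    assert!(matches!(read_number(&b"forty-two"[..]), Err(AppError::Parse(_))));

    let mut t = Trickle { out: Vec::new(), calls: 0 };
    write_fully(&mut t, b"hello, world").unwrap();
    assert_eq!(t.out, b"hello, world");
}
```
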