diff --git a/text/0000-io-fs-2.1.md b/text/0000-io-fs-2.1.md new file mode 100644 index 00000000000..920564d1d0c --- /dev/null +++ b/text/0000-io-fs-2.1.md @@ -0,0 +1,558 @@ +- Feature Name: `fs2` +- Start Date: 2015-04-04 +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) + +# Summary + +Expand the scope of the `std::fs` module by enhancing existing functionality, +exposing lower-level representations, and adding a few new functions. + +# Motivation + +The current `std::fs` module serves many of the basic needs of interacting with +a filesystem, but is missing a lot of useful functionality. For example, none of +these operations are possible in stable Rust today: + +* Inspecting a file's modification/access times +* Reading low-level information like that contained in `libc::stat` +* Inspecting the unix permission bits on a file +* Blanket setting the unix permission bits on a file +* Leveraging `DirEntry` for the extra metadata it might contain +* Reading the metadata of a symlink (not what it points at) +* Resolving all symlink in a path + +There is some more functionality listed in the [RFC issue][issue], but this RFC +will not attempt to solve the entirety of that issue at this time. This RFC +strives to expose APIs for much of the functionality listed above that is on the +track to becoming `#[stable]` soon. + +[issue]: https://github.com/rust-lang/rfcs/issues/939 + +## Non-goals of this RFC + +There are a few areas of the `std::fs` API surface which are **not** considered +goals for this RFC. It will be left for future RFCs to add new APIs for these +areas: + +* Enhancing `copy` to copy directories recursively or configuring how copying + happens. +* Enhancing or stabilizing `walk` and its functionality. +* Temporary files or directories + +# Detailed design + +First, a vision for how lowering APIs in general will be presented, and then a +number of specific APIs will each be proposed. Many of the proposed APIs are +independent from one another and this RFC may not be implemented all-in-one-go +but instead piecemeal over time, allowing the designs to evolve slightly in the +meantime. + +## Lowering APIs + +### The vision for the `os` module + +One of the principles of [IO reform][io-reform-vision] was to: + +> Provide hooks for integrating with low-level and/or platform-specific APIs. + +The original RFC went into some amount of detail for how this would look, in +particular by use of the `os` module. Part of the goal of this RFC is to flesh +out that vision in more detail. + +Ultimately, the organization of `os` is planned to look something like the +following: + +``` +os + unix applicable to all cfg(unix) platforms; high- and low-level APIs + io extensions to std::io + fs extensions to std::fs + net extensions to std::net + env extensions to std::env + process extensions to std::process + ... + linux applicable to linux only + io, fs, net, env, process, ... + macos ... + windows ... +``` + +APIs whose behavior is platform-specific are provided only within the `std::os` +hierarchy, making it easy to audit for usage of such APIs. Organizing the +platform modules internally in the same way as `std` makes it easy to find +relevant extensions when working with `std`. + +It is emphatically *not* the goal of the `std::os::*` modules to provide +bindings to *all* system APIs for each platform; this work is left to external +crates. The goals are rather to: + +1. Facilitate interop between abstract types like `File` that `std` provides and + the underlying system. This is done via "lowering": extension traits like + [`AsRawFd`][AsRawFd] allow you to extract low-level, platform-specific + representations out of `std` types like `File` and `TcpStream`. + +2. Provide high-level but platform-specific APIs that feel like those in the + rest of `std`. Just as with the rest of `std`, the goal here is not to + include all possible functionality, but rather the most commonly-used or + fundamental. + +Lowering makes it possible for external crates to provide APIs that work +"seamlessly" with `std` abstractions. For example, a crate for Linux might +provide an `epoll` facility that can work directly with `std::fs::File` and +`std::net::TcpStream` values, completely hiding the internal use of file +descriptors. Eventually, such a crate could even be merged into `std::os::unix`, +with minimal disruption -- there is little distinction between `std` and other +crates in this regard. + +Concretely, lowering has two ingredients: + +1. Introducing one or more "raw" types that are generally direct aliases for C + types (more on this in the next section). + +2. Providing an extension trait that makes it possible to extract a raw type + from a `std` type. In some cases, it's possible to go the other way around as + well. The conversion can be by reference or by value, where the latter is + used mainly to avoid the destructor associated with a `std` type (e.g. to + extract a file descriptor from a `File` and eliminate the `File` object, + without closing the file). + +While we do not seek to exhaustively bind types or APIs from the underlying +system, it *is* a goal to provide lowering operations for every high-level type +to a system-level data type, whenever applicable. This RFC proposes several such +lowerings that are currently missing from `std::fs`. + +[io-reform-vision]: https://github.com/rust-lang/rfcs/blob/master/text/0517-io-os-reform.md#vision-for-io +[AsRawFd]: http://static.rust-lang.org/doc/master/std/os/unix/io/trait.AsRawFd.html + +#### `std::os::platform::raw` + +Each of the primitives in the standard library will expose the ability to be +lowered into its component abstraction, facilitating the need to define these +abstractions and organize them in the platform-specific modules. This RFC +proposes the following guidelines for doing so: + +* Each platform will have a `raw` module inside of `std::os` which houses all of + its platform specific definitions. +* Only type definitions will be contained in `raw` modules, no function + bindings, methods, or trait implementations. +* Cross-platform types (e.g. those shared on all `unix` platforms) will be + located in the respective cross-platform module. Types which only differ in + the width of an integer type are considered to be cross-platform. +* Platform-specific types will exist only in the `raw` module for that platform. + A platform-specific type may have different field names, components, or just + not exist on other platforms. + +Differences in integer widths are not considered to be enough of a platform +difference to define in each separate platform's module, meaning that it will be +possible to write code that uses `os::unix` but doesn't compile on all Unix +platforms. It is believed that most consumers of these types will continue to +store the same type (e.g. not assume it's an `i32`) throughout the application +or immediately cast it to a known type. + +To reiterate, it is not planned for each `raw` module to provide *exhaustive* +bindings to each platform. Only those abstractions which the standard library is +lowering into will be defined in each `raw` module. + +### Lowering `Metadata` (all platforms) + +Currently the `Metadata` structure exposes very few pieces of information about +a file. Some of this is because the information is not available across all +platforms, but some of it is also because the standard library does not have the +appropriate abstraction to return at this time (e.g. time stamps). The raw +contents of `Metadata` (a `stat` on Unix), however, should be accessible via +lowering no matter what. + +The following trait hierarchy and new structures will be added to the standard +library. + +```rust +mod os::windows::fs { + pub trait MetadataExt { + fn file_attributes(&self) -> u32; // `dwFileAttributes` field + fn creation_time(&self) -> u64; // `ftCreationTime` field + fn last_access_time(&self) -> u64; // `ftLastAccessTime` field + fn last_write_time(&self) -> u64; // `ftLastWriteTime` field + fn file_size(&self) -> u64; // `nFileSizeHigh`/`nFileSizeLow` fields + } + impl MetadataExt for fs::Metadata { ... } +} + +mod os::unix::fs { + pub trait MetadataExt { + fn as_raw(&self) -> &Metadata; + } + impl MetadataExt for fs::Metadata { ... } + + pub struct Metadata(raw::stat); + impl Metadata { + // Accessors for fields available in `raw::stat` for *all* unix platforms + fn dev(&self) -> raw::dev_t; // st_dev field + fn ino(&self) -> raw::ino_t; // st_ino field + fn mode(&self) -> raw::mode_t; // st_mode field + fn nlink(&self) -> raw::nlink_t; // st_nlink field + fn uid(&self) -> raw::uid_t; // st_uid field + fn gid(&self) -> raw::gid_t; // st_gid field + fn rdev(&self) -> raw::dev_t; // st_rdev field + fn size(&self) -> raw::off_t; // st_size field + fn blksize(&self) -> raw::blksize_t; // st_blksize field + fn blocks(&self) -> raw::blkcnt_t; // st_blocks field + fn atime(&self) -> (i64, i32); // st_atime field, (sec, nsec) + fn mtime(&self) -> (i64, i32); // st_mtime field, (sec, nsec) + fn ctime(&self) -> (i64, i32); // st_ctime field, (sec, nsec) + } +} + +// st_flags, st_gen, st_lspare, st_birthtim, st_qspare +mod os::{linux, macos, freebsd, ...}::fs { + pub mod raw { + pub type dev_t = ...; + pub type ino_t = ...; + // ... + pub struct stat { + // ... same public fields as libc::stat + } + } + pub trait MetadataExt { + fn as_raw_stat(&self) -> &raw::stat; + } + impl MetadataExt for os::unix::fs::RawMetadata { ... } + impl MetadataExt for fs::Metadata { ... } +} +``` + +The goal of this hierarchy is to expose all of the information in the OS-level +metadata in as cross-platform of a method as possible while adhering to the +design principles of the standard library. + +The interesting part about working in a "cross platform" manner here is that the +makeup of `libc::stat` on unix platforms can vary quite a bit between platforms. +For example some platforms have a `st_birthtim` field while others do not. +To enable as much ergonomic usage as possible, the `os::unix` module will expose +the *intersection* of metadata available in `libc::stat` across all unix +platforms. The information is still exposed in a raw fashion (in terms of the +values returned), but methods are required as the raw structure is not exposed. +The unix platforms then leverage the more fine-grained modules in `std::os` +(e.g. `linux` and `macos`) to return the raw `libc::stat` structure. This will +allow full access to the information in `libc::stat` in all platforms with clear +opt-in to when you're using platform-specific information. + +One of the major goals of the `os::unix::fs` design is to enable as much +functionality as possible when programming against "unix in general" while still +allowing applications to choose to only program against macos, for example. + +#### Fate of `Metadata::{accesed, modified}` + +At this time there is no suitable type in the standard library to represent the +return type of these two functions. The type would either have to be some form +of time stamp or moment in time, both of which are difficult abstractions to add +lightly. + +Consequently, both of these functions will be **deprecated** in favor of +requiring platform-specific code to access the modification/access time of +files. This information is all available via the `MetadataExt` traits listed +above. + +Eventually, once a `std` type for cross-platform timestamps is available, these +methods will be re-instated as returning that type. + +### Lowering and setting `Permissions` (Unix) + +> **Note**: this section only describes behavior on unix. + +Currently there is no stable method of inspecting the permission bits on a file, +and it is unclear whether the current unstable methods of doing so, +`PermissionsExt::mode`, should be stabilized. The main question around this +piece of functionality is whether to provide a higher level abstractiong (e.g. +similar to the `bitflags` crate) for the permission bits on unix. + +This RFC proposes considering the methods for stabilization as-is and not +pursuing a higher level abstraction of the unix permission bits. To facilitate +in their inspection and manipulation, however, the following constants will be +added: + +```rust +mod os::unix::fs { + pub const USER_READ: raw::mode_t; + pub const USER_WRITE: raw::mode_t; + pub const USER_EXECUTE: raw::mode_t; + pub const USER_RWX: raw::mode_t; + pub const OTHER_READ: raw::mode_t; + pub const OTHER_WRITE: raw::mode_t; + pub const OTHER_EXECUTE: raw::mode_t; + pub const OTHER_RWX: raw::mode_t; + pub const GROUP_READ: raw::mode_t; + pub const GROUP_WRITE: raw::mode_t; + pub const GROUP_EXECUTE: raw::mode_t; + pub const GROUP_RWX: raw::mode_t; + pub const ALL_READ: raw::mode_t; + pub const ALL_WRITE: raw::mode_t; + pub const ALL_EXECUTE: raw::mode_t; + pub const ALL_RWX: raw::mode_t; + pub const SETUID: raw::mode_t; + pub const SETGID: raw::mode_t; + pub const STICKY_BIT: raw::mode_t; +} +``` + +Finally, the `set_permissions` function of the `std::fs` module is also proposed +to be marked `#[stable]` soon as a method of blanket setting permissions for a +file. + +## Constructing `Permissions` + +Currently there is no method to construct an instance of `Permissions` on any +platform. This RFC proposes adding the following APIs: + +```rust +mod os::unix::fs { + pub trait PermissionsExt { + fn from_mode(mode: raw::mode_t) -> Self; + } + impl PermissionsExt for Permissions { ... } +} +``` + +This RFC does not propose yet adding a cross-platform way to construct a +`Permissions` structure due to the radical differences between how unix and +windows handle permissions. + +## Creating directories with permissions + +Currently the standard library does not expose an API which allows setting the +permission bits on unix or security attributes on Windows. This RFC proposes +adding the following API to `std::fs`: + +```rust +pub struct DirBuilder { ... } + +impl DirBuilder { + /// Creates a new set of options with default mode/security settings for all + /// platforms and also non-recursive. + pub fn new() -> Self; + + /// Indicate that directories create should be created recursively, creating + /// all parent directories if they do not exist with the same security and + /// permissions settings. + pub fn recursive(&mut self, recursive: bool) -> &mut Self; + + /// Create the specified directory with the options configured in this + /// builder. + pub fn create>(&self, path: P) -> io::Result<()>; +} + +mod os::unix::fs { + pub trait DirBuilderExt { + fn mode(&mut self, mode: raw::mode_t) -> &mut Self; + } + impl DirBuilderExt for DirBuilder { ... } +} + +mod os::windows::fs { + // once a `SECURITY_ATTRIBUTES` abstraction exists, this will be added + pub trait DirBuilderExt { + fn security_attributes(&mut self, ...) -> &mut Self; + } + impl DirBuilderExt for DirBuilder { ... } +} +``` + +This sort of builder is also extendable to other flavors of functions in the +future, such as [C++'s template parameter][cpp-dir-template]: + +[cpp-dir-template]: http://en.cppreference.com/w/cpp/experimental/fs/create_directory + +```rust +/// Use the specified directory as a "template" for permissions and security +/// settings of the new directories to be created. +/// +/// On unix this will issue a `stat` of the specified directory and new +/// directories will be created with the same permission bits. On Windows +/// this will trigger the use of the `CreateDirectoryEx` function. +pub fn template>(&mut self, path: P) -> &mut Self; +``` + +At this time, however, it it not proposed to add this method to +`DirBuilder`. + +## Adding `FileType` + +Currently there is no enumeration or newtype representing a list of "file types" +on the local filesystem. This is partly done because the need is not so high +right now. Some situations, however, imply that it is more efficient to learn +the file type at once instead of testing for each individual file type itself. + +For example some platforms' `DirEntry` type can know the `FileType` without an +extra syscall. If code were to test a `DirEntry` separately for whether it's a +file or a directory, it may issue more syscalls necessary than if it instead +learned the type and then tested that if it was a file or directory. + +The full set of file types, however, is not always known nor portable across +platforms, so this RFC proposes the following hierarchy: + +```rust +#[derive(Copy, Clone, PartialEq, Eq, Hash)] +pub struct FileType(..); + +impl FileType { + pub fn is_dir(&self) -> bool; + pub fn is_file(&self) -> bool; + pub fn is_symlink(&self) -> bool; +} +``` + +Extension traits can be added in the future for testing for other more flavorful +kinds of files on various platforms (such as unix sockets on unix platforms). + +#### Dealing with `is_{file,dir}` and `file_type` methods + +Currently the `fs::Metadata` structure exposes stable `is_file` and `is_dir` +accessors. The struct will also grow a `file_type` accessor for this newtype +struct being added. It is proposed that `Metadata` will retain the +`is_{file,dir}` convenience methods, but no other "file type testers" will be +added. + +## Enhancing symlink support + +Currently the `std::fs` module provides a `soft_link` and `read_link` function, +but there is no method of doing other symlink related tasks such as: + +* Testing whether a file is a symlink +* Reading the metadata of a symlink, not what it points to + +The following APIs will be added to `std::fs`: + +```rust +/// Returns the metadata of the file pointed to by `p`, and this function, +/// unlike `metadata` will **not** follow symlinks. +pub fn symlink_metadata>(p: P) -> io::Result; +``` + +## Binding `realpath` + +There's a [long-standing issue][realpath] that the unix function `realpath` is +not bound, and this RFC proposes adding the following API to the `fs` module: + +[realpath]: https://github.com/rust-lang/rust/issues/11857 + +```rust +/// Canonicalizes the given file name to an absolute path with all `..`, `.`, +/// and symlink components resolved. +/// +/// On unix this function corresponds to the return value of the `realpath` +/// function, and on Windows this corresponds to the `GetFullPathName` function. +/// +/// Note that relative paths given to this function will use the current working +/// directory as a base, and the current working directory is not managed in a +/// thread-local fashion, so this function may need to be synchronized with +/// other calls to `env::change_dir`. +pub fn canonicalize>(p: P) -> io::Result; +``` + +## Tweaking `PathExt` + +Currently the `PathExt` trait is unstable, yet it is quite convenient! The main +motivation for its `#[unstable]` tag is that it is unclear how much +functionality should be on `PathExt` versus the `std::fs` module itself. +Currently a small subset of functionality is offered, but it is unclear what the +guiding principle for the contents of this trait are. + +This RFC proposes a few guiding principles for this trait: + +* Only read-only operations in `std::fs` will be exposed on `PathExt`. All + operations which require modifications to the filesystem will require calling + methods through `std::fs` itself. + +* Some inspection methods on `Metadata` will be exposed on `PathExt`, but only + those where it logically makes sense for `Path` to be the `self` receiver. For + example `PathExt::len` will not exist (size of the file), but + `PathExt::is_dir` will exist. + +Concretely, the `PathExt` trait will be expanded to: + +```rust +pub trait PathExt { + fn exists(&self) -> bool; + fn is_dir(&self) -> bool; + fn is_file(&self) -> bool; + fn metadata(&self) -> io::Result; + fn symlink_metadata(&self) -> io::Result; + fn canonicalize(&self) -> io::Result; + fn read_link(&self) -> io::Result; + fn read_dir(&self) -> io::Result; +} + +impl PathExt for Path { ... } +``` + +## Expanding `DirEntry` + +Currently the `DirEntry` API is quite minimalistic, exposing very few of the +underlying attributes. Platforms like Windows actually contain an entire +`Metadata` inside of a `DirEntry`, enabling much more efficient walking of +directories in some situations. + +The following APIs will be added to `DirEntry`: + +```rust +impl DirEntry { + /// This function will return the filesystem metadata for this directory + /// entry. This is equivalent to calling `fs::symlink_metadata` on the + /// path returned. + /// + /// On Windows this function will always return `Ok` and will not issue a + /// system call, but on unix this will always issue a call to `stat` to + /// return metadata. + pub fn metadata(&self) -> io::Result; + + /// Return what file type this `DirEntry` contains. + /// + /// On some platforms this may not require reading the metadata of the + /// underlying file from the filesystem, but on other platforms it may be + /// required to do so. + pub fn file_type(&self) -> io::Result; + + /// Returns the file name for this directory entry. + pub fn file_name(&self) -> OsString; +} + +mod os::unix::fs { + pub trait DirEntryExt { + fn ino(&self) -> raw::ino_t; // read the d_ino field + } + impl DirEntryExt for fs::DirEntry { ... } +} +``` + +# Drawbacks + +* This is quite a bit of surface area being added to the `std::fs` API, and it + may perhaps be best to scale it back and add it in a more incremental fashion + instead of all at once. Most of it, however, is fairly straightforward, so it + seems prudent to schedule many of these features for the 1.1 release. + +* Exposing raw information such as `libc::stat` or `WIN32_FILE_ATTRIBUTE_DATA` + possibly can hamstring altering the implementation in the future. At this + point, however, it seems unlikely that the exposed pieces of information will + be changing much. + +# Alternatives + +* Instead of exposing accessor methods in `MetadataExt` on Windows, the raw + `WIN32_FILE_ATTRIBUTE_DATA` could be returned. We may change, however, to + using `BY_HANDLE_FILE_INFORMATION` one day which would make the return value + from this function more difficult to implement. + +* A `std::os::MetadataExt` trait could be added to access truly common + information such as modification/access times across all platforms. The return + value would likely be a `u64` "something" and would be clearly documented as + being a lossy abstraction and also only having a platform-specific meaning. + +* The `PathExt` trait could perhaps be implemented on `DirEntry`, but it doesn't + necessarily seem appropriate for all the methods and using inherent methods + also seems more logical. + +# Unresolved questions + +* What is the ultimate role of crates like `liblibc`, and how do we draw the + line between them and `std::os` definitions?