Skip to content

Added platform notes to fs pub functions #28613

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from
Closed

Added platform notes to fs pub functions #28613

wants to merge 7 commits into from

Conversation

nathansizemore
Copy link
Contributor

Adds a platform notes section to the fs module's public functions #24795

@rust-highfive
Copy link
Contributor

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way Github handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

///
/// ### Windows
///
/// `DeleteFileW`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure specifying (and therefore stabilising) implementation details is a good thing to do or the way we want to go in.

@steveklabnik
Copy link
Member

Yes, I share @nagisa 's concerns.

@rust-lang/libs what do you think of this PR?

@aturon
Copy link
Member

aturon commented Sep 25, 2015

@nathansizemore \o/ Thanks for taking this on!

@steveklabnik @nagisa This is desired in most cases across the IO API surface (see the original issue). Many of these APIs are intended to map directly to system APIs, and you really care (and need to be able to depend on) which APIs are being called.

That said, it will need a thorough review by multiple members of the libs team since it is a form of additional stabilization!

cc @alexcrichton @sfackler @BurntSushi (I think you all are the most interested on the libs team)

cc @retep998

@nagisa
Copy link
Member

nagisa commented Sep 25, 2015

Can we somehow annotate these as non-normative? This would prevent the issues with stabilisation.

Many of these APIs are intended to map directly to system APIs, and you really care (and need to be able to depend on) which APIs are being called.

Nonsense. What you might rely on is knowing family of API is being called, but not exactly which function.

For example, for process spawning on linux knowing a function of exec family will be called might be useful, but it is not really good to tie the API strongly to one of execl, execlp, execle, execv, execvp, execvpe or even direct syscall implementation of execve or execveat.

There’s a very good reason to do so. Eventually, in the future it might get discovered some execzz API is unsafe someway and a functionally equivalent, but safe alternative gets provided. Suddenly rust cannot move the implementation of process spawning to the newly-introduced (fictional) execxxyy, because it was specified that execzz is used.

EDIT: perhaps a simpler example: For windows you might be interested that DeleteFile is called, but should not rely at all on some exact variant (DeleteFileW or DeleteFileA) of that function being used.

Note, that I’m aware we probably won’t ever use the ANSI variants of functions on Windows, but what if windows suddenly goes utf-8 overnight and introduces *UTF8 variants of all its functions?

EDIT2: s/kind/family/

@aturon
Copy link
Member

aturon commented Sep 25, 2015

@nagisa

Can we somehow annotate these as non-normative? This would prevent the issues with stabilisation.

In my opinion, that would basically defeat the purpose of having these docs.

Many of these APIs are intended to map directly to system APIs, and you really care (and need to be able to depend on) which APIs are being called.

Nonsense. What you might rely on is knowing kind of API is being called, but not exactly which function.

You mention spawning a process as an example, but that's not the kind of API I had in mind when I said "map directly to system APIs" -- quite the opposite, since in Rust spawning involves quite a bit of work and several syscalls. I'm referring to APIs that map directly to a single (or in some cases two) system call.

Some examples that have come up where this kind of detail is important to know:

  • Sandboxing, where you have to whitelist underlying system APIs for use. @pcwalton's work on this front depended on knowing exactly what std was doing, and I'm sure he could elaborate on the need.
  • Precise flags. While std aims to be cross-platform, it's simply not possible to completely abstract away platform details. Knowing which system API is being used and with which flags is sometimes essential for understanding the exact behavior you can expect out of the call. Consider CLOEXEC (and whether it is set atomically) for example.
  • Error information. In some cases the precise error behavior varies by platform and it's important to know what to expect.

Now, only the first use case truly demands knowing the exact call being made. The other use cases could be addressed by giving platform-specific information at a somewhat higher level, which could make it more feasible to change what call we make over time.

@aturon
Copy link
Member

aturon commented Sep 25, 2015

Can we somehow annotate these as non-normative? This would prevent the issues with stabilisation.

In my opinion, that would basically defeat the purpose of having these docs.

Well, OK, I overstated here. It would mean that you couldn't rely on this behavior, but it could still be useful for understanding the behavior you're seeing.

Still, I'd really like to see what kind of normative information we can provide here.

@retep998
Copy link
Member

Note, that I’m aware we probably won’t ever use the ANSI variants of functions on Windows, but what if windows suddenly goes utf-8 overnight and introduces *UTF8 variants of all its functions?

First Windows would have to abandon all backwards compatibility with existing NTFS filesystems that allow lone surrogates in file names. Or they'd have to adopt WTF-8. I don't see either happening.

@aturon
Copy link
Member

aturon commented Sep 25, 2015

BTW, for reference, @pcwalton's sandboxing work can be found at https://github.com/pcwalton/gaol

@nagisa
Copy link
Member

nagisa commented Sep 25, 2015

You mention spawning a process as an example, but that's not the kind of API I had in mind when I said "map directly to system APIs" -- quite the opposite, since in Rust spawning involves quite a bit of work and several syscalls. I'm referring to APIs that map directly to a single (or in some cases two) system call.

My example was intended to be fictional, of course, and generally applies to a wide set of scenarios involving APIs that map to syscalls more directly. I only chosen exec as example, because it is my go to family of functions with a lot of them.

@aturon
Copy link
Member

aturon commented Sep 25, 2015

@nagisa Sure, fair enough! I actually think it's great to be concrete here, but want to focus on realistic concrete cases if we can.

I think fundamentally we want to try to get at what aspects of a call into system APIs you care about. For sandboxing, the identity of the API matters, and any change will break sandbox usages -- but you could perhaps argue that that's a niche case that should be calling std::os::* APIs directly, or something. For most other use cases, it might be possible to specify the behavior in a higher-level way (e.g. "On Unix systems we always open in CLOEXEC and do so atomically whenever possible" or some such). That's certainly a bigger effort to undertake, but it may be worthwhile.

@BurntSushi
Copy link
Member

At least for these specific APIs, the unix notes look pretty innocuous to me. I do think it is a good thing to be able to rely on these kinds of details, since we are explicitly trying to add as few layers as possible.

Also, is it appropriate to call all of these syscalls? Do all of them necessarily map straight to a syscall?

@retep998
Copy link
Member

On Windows none of these functions are syscalls, they are all win32 functions which in turn may perform NT syscalls.

@alexcrichton
Copy link
Member

I agree that we probably don't want to word this documentation in such a way that we're never able to modify the implementation in the future. These functions are likely to be relatively stable moving forward, but new things may pop up over time. For example TcpListener::accept currently only calls accept but we may want to perhaps probe for accept4 on more recent Linux platforms.

I definitely sympathize, however, that applications like gaol need syscall whitelists, but I think that the desire to have an ironclad guarantee about the syscalls being made are opposed to this desire to modify the standard library over time.

Perhaps we could have a section of documentation (in std::io?) explaining the high-level philosophy including things like CLOEXEC as @BurntSushi mentioned as well as "we try to map a function to a syscall where possible". It could then also be explicitly documented that while relatively stable, the exact set of functions/syscalls made may change over time, but there is documentation indicating what's currently happening to help authors who want to explore the implementation.

Now that being said, if you're looking to whitelist syscalls you're probably not that far off from just reading the source of the standard library, so the documentation-per-function may just be a nice-to-have.

@alexcrichton
Copy link
Member

I'm gonna close this out of inactivity for now, but @nathansizemore if you'd like to pick this up again please just let me know! I may be able to help figure out some good wording here that shouldn't commit us too heavily while still being useful.

@nathansizemore
Copy link
Contributor Author

Didn't mean to let fizzle, got busy with work and was hoping discussion would narrow down how pedantic these docs actually need to be by the time I got more free time. I'm still in for working on these, @alexcrichton! Next week, I'll be able to really spend some time on 'em :) I'll add some more thoughts before it fizzles away for good...

IMO, you either provide the underlying syscalls/Win32 calls in the platform notes, or nothing at all. The only real difference in behavior is directly due to what syscall call is made. The OS specific errors returned are also directly related to the underlying calls. I'd hope that knowing anything related to hardware in std::io is using the underlying OS's support, and would be common enough knowledge to avoid the sometimes we use syscalls with some flags blanket note.

I'd imagine the largest pool of users interested in this information would be in the security/sandboxing application space and anyone doing highly performant OS specific stuff. And as, @alexcrichton said, at this point you're digging into source. But also at this point, having the notes in the docs is a great deal more helpful than digging through a few layers of abstraction in source to find your end goal.

Not sure I completely understand the stabilization issue enough? I'd assume semver and release notes/change logs would take care of those worries, unless I'm missing something? Changing the underlying syscalls used is not like changing the syntax of lifetimes. One causes separate compiler versions for the ability to compile, while the other causes separate compiler versions for behavioral compliance. I'd also thing the users most affected by these types of changes would also be the same pool as above, and am assuming they would also be into reading those logs to see exactly what changed in various functions to see if they need to adjust anything.

@alexcrichton
Copy link
Member

How about a section of the std::io docs talking about "Platform behavior" sections which explains that the current documentation is roughly what happens under the hood but it's not guaranteed to stay the same over time? This could also take care of the boilerplate like CLOEXEC and non-inheritable handles on Windows. Each section could then look something along the lines of:

/// # Platform behavior
///
/// This function currently corresponds to the `open` function 
/// on Unix and the `CreateFile` function on Windows. Note that,
/// this may [change in the future][link-here], however.

@nathansizemore
Copy link
Contributor Author

I like it, I'll update the PR and get these updated

@alexcrichton
Copy link
Member

Ok, thanks again @nathansizemore!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants