Skip to content

Flush CodedOutputStream buffer on drop. #147

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

cfallin
Copy link
Contributor

@cfallin cfallin commented Apr 2, 2016

The current behavior -- that some leftover bytes may be silently forgotten in a CodedOutputStream's output buffer on drop -- caught me by surprise.

I'm not sure if the current behavior is intentional. Perhaps you wanted to be able to explicitly return a Result on flush(), whereas drop() cannot do that? But in that case the user who wants to catch the error can now explicitly flush() first to avoid any panic on drop, and they should be doing that anyway to avoid silently forgetting bytes. In any case, the new semantics seem less error-prone to me.

Thanks!


This change is Reviewable

@stepancheg
Copy link
Owner

Well, it is intentional. I think drop must only release resources, and not do anything heavy. The problem arises, for example, if CodedOutputStream is created over a socket, and buffer is not empty, then drop may hang indefinetly.

This behavior however contradics behavior of std::io::BufWriter (which I consider wrong for described reasons).

Need to think about it.

@stepancheg
Copy link
Owner

The problem with current BufWriter design: http://is.gd/ZdgbF9 Error is quietly ignored.

You propose to panic instead of ignore, which is also not good.

@cfallin
Copy link
Contributor Author

cfallin commented Apr 2, 2016

Hmm, I see. It seems there's no perfect solution -- as long as something is buffered, there's potential for a delayed error. We either have to drop bytes, drop the error, or panic. So I understand your reasoning.

I'm fine if you decide to keep things as they are, but one more proposal just came to mind:

It seems to me, in one way of thinking, that dropping without flushing is "indeterminate" in the same way that a race condition or uninitialized memory are: no correct program should do that, and you'll get something but you don't know what. It might be better to demand that the user either explicitly flushes or explicitly clears the buffer before dropping, and panic if not. That is:

  • a stream would have an "unfinished business" flag set on any write and cleared on any flush or explicit clear, and initially false.
  • a stream still has flush() returning a Result.
  • a stream also has clear_buffer() that forgets any unsent bytes, and always succeeds.
  • on drop, if flush() or clear_buffer() has not been called since last write (flag is set), then panic.

The difference is that this panic would be due to incorrect use of API (i.e., always occuring), rather than only occurring on some infrequent error, so at least it would be less error-prone.

What do you think? I'm happy to drop this and write my own RAII wrapper if you disagree. Thanks for considering :-)

@stepancheg
Copy link
Owner

I like your solution, except it has a problem: drop must not panic if it is currently unwinding due to another panic. Otherwize code like this would crash the process, not just panic:

let mut buffered = ...
buffered.write(buf);
assert!(some_cond);
buffered.flush().unwrap();

To solve the problem, drop needs to query unwind reason (normal or panic, something similar to std::uncaught_exception of C++), and panic only if it is normal unwinding.

I think the issue should be raised in rust about BufWriter. And which solution is chosen for BufWriter should be used for CodedOutputStream.

@stepancheg
Copy link
Owner

Found it: panicking. So, yes, your solution should work. I'll open an issue in Rust.

@stepancheg
Copy link
Owner

The issue about BufWriter: rust-lang/rust#32677

@cfallin
Copy link
Contributor Author

cfallin commented Apr 4, 2016

The rust-lang issue is still open, but I had a thought in the meantime: why not have CodedOutputStream actually use a BufWriter in its implementation? I agree with you that the two should have consistent behavior, and it seems this would be the simplest way to achieve that. A few questions:

  1. Does CodedOutputStream roll its own buffer in order to have the drop-without-flush capability, or were there other reasons too (e.g. performance)?
  2. If we refactor to use BufWriter under the covers, would you prefer to get a cancel() method on BufWriter first so that we can expose that in CodedOutputStream's API to support your drop-without-hang use-case?

@stepancheg
Copy link
Owner

why not have CodedOutputStream actually use a BufWriter in its implementation?

Several reasons:

  • CodedOutputStream could be implemented efficiently if it had access to BufWriter buffer (although CodedOutputStream is not optimized for that yet). BufWriter does not expose its buffer.
  • CodedOutputStream can write directly to byte array
  • C++ and Java versions of CodedOutputStream do own buffering
  • Historical reasons: there were not BufWriter when first version of rust-protobuf implemented

If we refactor to use BufWriter under the covers, would you prefer to get a cancel() method on BufWriter first so that we can expose that in CodedOutputStream's API to support your drop-without-hang use-case?

I think we can't use BufWriter under cover for reasons described above.

I'm not sure I understand what do you propose right now. My point is that CodedOutputStream should be to some degree consistent with rust API, and rust team hasn't come with decision yet (for example, they could mark certain functions as "destructors" which has to be called manually).

@cfallin
Copy link
Contributor Author

cfallin commented Apr 4, 2016

OK, thanks. I was trying to figure out if we could make forward progress here by simply delegating to BufWriter, since you had stated you wanted to follow its semantics, but I understand the efficiency reasons for not doing so. Will wait to see how the BufWriter issue concludes.

danburkert added a commit to danburkert/rust-protobuf that referenced this pull request May 14, 2016
This commit removes the internal buffering in CodedOutputStream. The protobuf
library should rely on the application to provide a buffered Write
implementation, if it so desires. By buffering internally, the protobuf library
forces double buffering in situations where the application is already buffering.
Additionally, it forces awkward resource lifetime issues as discussed in stepancheg#147.
Removing the internal buffering simplifies the internals of CodedOutputStream
and improves performance on the rust-protobuf-perftest write tests.

Before:

test1: write: 86 ns per iter
test_repeated_bool: write: 91 ns per iter
test_repeated_packed_int32: write: 127 ns per iter
test_repeated_messages: write: 224 ns per iter
test_optional_messages: write: 202 ns per iter
test_strings: write: 136 ns per iter
test_small_bytearrays: write: 1098 ns per iter
test_large_bytearrays: write: 119221 ns per iter

After:

test1: write: 55 ns per iter
test_repeated_bool: write: 66 ns per iter
test_repeated_packed_int32: write: 122 ns per iter
test_repeated_messages: write: 248 ns per iter
test_optional_messages: write: 207 ns per iter
test_strings: write: 118 ns per iter
test_small_bytearrays: write: 1064 ns per iter
test_large_bytearrays: write: 109567 ns per iter
@stepancheg
Copy link
Owner

This is merged into master long time ago, but it is not in stable though.

@stepancheg stepancheg closed this Apr 15, 2019
jxs pushed a commit to jxs/rust-protobuf that referenced this pull request Oct 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants