Skip to content

Commit 98af75d

Browse files
committed
Fix some cases of concurrent panics and backtraces
This commit adds an awful hack to this crate to attempt to mitigate the issue of concurrently generating a backtrace in the standard library via `RUST_BACKTRACE=1` and also generating a backtrace with this crate. On Windows the system library that we used for doing this is required to be single-threaded but we don't actually provide such a guarantee when using this crate because we can't easily synchronize with the standard library. The hack applied in this crate is to use a *panic hook* to inject ourselves into the panic process and ensure that panics are synchronized with generating a backtrace. The commit itself has a lot more comments about how this is not a complete nor a bullet-proof solution. It's just sort of the best approximation I think we can do with stable Rust. Long-term we're going to want to have some sort of official stable API in libstd to do this coordination.
1 parent 61ba508 commit 98af75d

File tree

3 files changed

+166
-0
lines changed

3 files changed

+166
-0
lines changed

Cargo.toml

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -152,3 +152,8 @@ edition = '2018'
152152
name = "accuracy"
153153
required-features = ["std", "dbghelp", "libbacktrace", "libunwind"]
154154
edition = '2018'
155+
156+
[[test]]
157+
name = "concurrent-panics"
158+
required-features = ['std']
159+
harness = false

src/dbghelp.rs

Lines changed: 94 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -253,13 +253,25 @@ pub struct Init;
253253
/// lock.
254254
#[cfg(all(windows, feature = "dbghelp"))]
255255
pub unsafe fn init() -> Result<Init, ()> {
256+
// See comments below on `configure_synchronize_std_panic_hook` for why this
257+
// is necessary. Also note that we want to initialize this only once, but it
258+
// can fail so we try again for each time this function is called.
259+
#[cfg(feature = "std")]
260+
{
261+
static mut HOOK_CONFIGURED: bool = false;
262+
if !HOOK_CONFIGURED {
263+
HOOK_CONFIGURED = configure_synchronize_std_panic_hook();
264+
}
265+
}
266+
256267
// Calling `SymInitializeW` is quite expensive, so we only do so once per
257268
// process.
258269
static mut INITIALIZED: bool = false;
259270
if INITIALIZED {
260271
return Ok(Init);
261272
}
262273

274+
263275
// Actually load `dbghelp.dll` into the process here, returning an error if
264276
// that fails.
265277
DBGHELP.ensure_open()?;
@@ -286,3 +298,85 @@ pub unsafe fn init() -> Result<Init, ()> {
286298
DBGHELP.SymInitializeW().unwrap()(GetCurrentProcess(), ptr::null_mut(), TRUE);
287299
Ok(Init)
288300
}
301+
302+
/// A horrible, nasty, no-good, very bad hack. Otherwise known as "a best effort
303+
/// attempt to make this crate threadsafe against the standard library".
304+
///
305+
/// The dbghelp functions on Windows are all required to be invoked in a
306+
/// single-threaded manner. They cannot be invoked concurrently in a process.
307+
/// Ever. This provides an interesting problem for us because the standard
308+
/// library is going to be generating backtraces with `RUST_BACKTRACE=1` and
309+
/// this crate is also generating backtraces. This means that it's not safe for
310+
/// this crate to race with the standard library. We, however, don't really have
311+
/// a great way of determining this.
312+
///
313+
/// Hence, we add in an awful approximation of this. Here we configure a panic
314+
/// hook which unconditionally synchronizes with this crate if we think that the
315+
/// standard library will be generating backtraces. Wow this is an abuse of
316+
/// panic hooks.
317+
///
318+
/// The intention here though is that whenever a thread panics and would
319+
/// otherwise generate a backtrace we are careful to synchronize with the global
320+
/// lock in this crate protecting the APIs of this crate. That way either this
321+
/// crate is generating a backtrace or a thread is panicking, but never both at
322+
/// the same time.
323+
///
324+
/// This strategy is horribly fraught with bugs and only fixes some of the
325+
/// problem, not all. In addition to all the "TODO" mentioned below this is just
326+
/// a downright weird strategy that's an abuse of a feature that's not supposed
327+
/// to be used for this purpose. A true solution would be some sort of
328+
/// coordination with the standard library, probably literally exposing the lock
329+
/// used to generate backtraces. We would then acquire that lock instead to
330+
/// synchronize this crate instead of having our own lock.
331+
///
332+
/// I suspect this strategy probably also has:
333+
///
334+
/// * Double-panics. God forbid if this crate itself panics.
335+
/// * Deadlocks. There's a lot of locks in play, are they really grabbed in the
336+
/// right order?
337+
/// * Unsafety. This doesn't solve the problem, it only papers over some cases.
338+
///
339+
/// What I can at least hopefully say is that this does indeed solve some
340+
/// issues. If a program isn't otherwise configuring panic hooks, isn't
341+
/// changing `RUST_BACKTRACE` at runtime, and isn't using unstable features of
342+
/// libstd, then I think that this is correct and will actually protect against
343+
/// segfaults on Windows.
344+
#[cfg(feature = "std")]
345+
fn configure_synchronize_std_panic_hook() -> bool {
346+
use std::panic::{self, PanicInfo};
347+
use std::prelude::v1::*;
348+
349+
// If we don't find the `RUST_BACKTRACE` environment variable let's assume
350+
// that the standard library isn't generating backtraces, so we'll just skip
351+
// this step.
352+
//
353+
// TODO: this is memory unsafe if the value of `RUST_BACKTRACE` changes at
354+
// runtime.
355+
if std::env::var("RUST_BACKTRACE").is_err() {
356+
return true;
357+
}
358+
359+
// If our thread is already panicking, then we would panic again by calling
360+
// `take_hook` and `set_hook` below. We don't want to cause a double panic,
361+
// so fail initialization and retry again later.
362+
//
363+
// TODO: this is memory unsafe because we're not actually synchronizing
364+
// with the standard library. YOLO I guess?
365+
if std::thread::panicking() {
366+
return false;
367+
}
368+
369+
// Configure a panic hook to invoke the previous hook, but in a lock. This
370+
// way we're guaranteed that only one thread is accessing the backtrace lock
371+
// at any one point in time.
372+
//
373+
// TODO: this is racy wrt take_hook/set_hook. We can't really do anything
374+
// about that, this is just an awful strategy to fix this.
375+
let original_hook = panic::take_hook();
376+
let hook = Box::new(move |info: &PanicInfo| {
377+
let _guard = crate::lock::lock();
378+
original_hook(info);
379+
});
380+
panic::set_hook(hook);
381+
return true;
382+
}

tests/concurrent-panics.rs

Lines changed: 67 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,67 @@
1+
use std::cell::Cell;
2+
use std::env;
3+
use std::panic;
4+
use std::process::Command;
5+
use std::sync::atomic::{AtomicBool, Ordering::SeqCst};
6+
use std::sync::Arc;
7+
use std::thread;
8+
9+
std::thread_local!(static IGNORE_PANIC: Cell<bool> = Cell::new(false));
10+
11+
const PANICS: usize = 1_000;
12+
const THREADS: usize = 8;
13+
const VAR: &str = "__THE_TEST_YOU_ARE_LUKE";
14+
15+
fn main() {
16+
if env::var(VAR).is_err() {
17+
parent();
18+
} else {
19+
child();
20+
}
21+
}
22+
23+
fn parent() {
24+
let me = env::current_exe().unwrap();
25+
let result = Command::new(&me)
26+
.env("RUST_BACKTRACE", "1")
27+
.env(VAR, "1")
28+
.output()
29+
.unwrap();
30+
if result.status.success() {
31+
return;
32+
}
33+
println!("stdout:\n{}", String::from_utf8_lossy(&result.stdout));
34+
println!("stderr:\n{}", String::from_utf8_lossy(&result.stderr));
35+
println!("code: {}", result.status);
36+
panic!();
37+
}
38+
39+
fn child() {
40+
let done = Arc::new(AtomicBool::new(false));
41+
let done2 = done.clone();
42+
let a = thread::spawn(move || {
43+
while !done2.load(SeqCst) {
44+
format!("{:?}", backtrace::Backtrace::new());
45+
}
46+
});
47+
48+
let threads = (0..THREADS)
49+
.map(|_| {
50+
thread::spawn(|| {
51+
IGNORE_PANIC.with(|c| c.set(true));
52+
for _ in 0..PANICS {
53+
assert!(panic::catch_unwind(|| {
54+
panic!();
55+
})
56+
.is_err());
57+
}
58+
})
59+
})
60+
.collect::<Vec<_>>();
61+
for thread in threads {
62+
thread.join().unwrap();
63+
}
64+
65+
done.store(true, SeqCst);
66+
a.join().unwrap();
67+
}

0 commit comments

Comments
 (0)