Use SmallVec for SmallCStr #53644

Merged 1 commit on Aug 24, 2018
87 changes: 39 additions & 48 deletions src/librustc_data_structures/small_c_str.rs
@@ -11,69 +11,61 @@
 use std::ffi;
 use std::ops::Deref;

-const SIZE: usize = 38;
+use smallvec::SmallVec;
+
+const SIZE: usize = 36;
Contributor

Any reason for the change?

Contributor Author

Yes: smallvec's Array trait isn't implemented for [T; 38], and with the union optimization the on-stack version can hold more content per byte.
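
A note on the [T; 38] point: before const generics, a trait such as Array could only be implemented for a hand-picked list of array lengths, usually through a macro. The sketch below is a hypothetical, std-only stand-in. The `Array` trait, the `impl_array!` macro, the list of lengths, and `inline_capacity` are all assumptions made for illustration, not smallvec's actual source. It only shows why a length missing from such a list cannot be used as a backing array.

```rust
// Hypothetical stand-in for a pre-const-generics `Array` trait.
// This is NOT smallvec's real code; names and the length list are assumptions.
trait Array {
    type Item;
    fn size() -> usize;
}

// Implement `Array` for each listed length, one impl per length.
macro_rules! impl_array {
    ($($n:expr),+) => {
        $(
            impl<T> Array for [T; $n] {
                type Item = T;
                fn size() -> usize { $n }
            }
        )+
    };
}

// Illustrative list only: 36 is present, 38 is deliberately absent.
impl_array!(0, 1, 2, 4, 8, 16, 32, 36, 64);

// Generic code can only be instantiated with lengths from the list above.
fn inline_capacity<A: Array>() -> usize {
    A::size()
}
```

With this list, `inline_capacity::<[u8; 36]>()` compiles, while `inline_capacity::<[u8; 38]>()` would be rejected at compile time because `[u8; 38]` does not implement the trait.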


 /// Like SmallVec but for C strings.
 #[derive(Clone)]
-pub enum SmallCStr {
-    OnStack {
-        data: [u8; SIZE],
-        len_with_nul: u8,
-    },
-    OnHeap {
-        data: ffi::CString,
-    }
+pub struct SmallCStr {
+    data: SmallVec<[u8; SIZE]>,
Contributor

Ah! Now I see what you're doing. Why is this an improvement over the OnHeap version?

Contributor Author

Absent the union optimization, this should generate the same code. With the union optimization, however, the type is reduced to (usize, [T; _] / *mut T), which saves the discriminant and its padding.

Apart from the memory savings, SmallVec's code has already seen a good deal of perf work.
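
To make the "(usize, [T; _] / *mut T)" shape concrete, here is a minimal std-only sketch of a buffer whose length field doubles as the tag, so no separate enum discriminant is stored. `TinyBuf`, `Storage`, and `N` are hypothetical names, and the heap side uses a `Box<[u8]>` rather than a raw pointer to keep the example short. It is a simplified illustration of the idea, not how smallvec is actually implemented.

```rust
use std::mem::ManuallyDrop;

// Hypothetical inline capacity, mirroring SIZE in the diff.
const N: usize = 36;

// Either the inline bytes or a heap allocation; the union itself stores no tag.
union Storage {
    inline: [u8; N],
    heap: ManuallyDrop<Box<[u8]>>,
}

// Simplified illustration only, not smallvec's real layout.
struct TinyBuf {
    // Doubles as the tag: `len <= N` means inline, `len > N` means heap.
    len: usize,
    data: Storage,
}

impl TinyBuf {
    fn from_slice(bytes: &[u8]) -> TinyBuf {
        if bytes.len() <= N {
            let mut inline = [0u8; N];
            inline[..bytes.len()].copy_from_slice(bytes);
            TinyBuf { len: bytes.len(), data: Storage { inline } }
        } else {
            let heap = ManuallyDrop::new(bytes.to_vec().into_boxed_slice());
            TinyBuf { len: bytes.len(), data: Storage { heap } }
        }
    }

    fn spilled(&self) -> bool {
        self.len > N
    }

    fn as_slice(&self) -> &[u8] {
        if self.spilled() {
            // SAFETY: `len > N` is only ever set together with the `heap` field.
            unsafe { &self.data.heap[..] }
        } else {
            // SAFETY: `len <= N` is only ever set together with the `inline` field.
            unsafe { &self.data.inline[..self.len] }
        }
    }
}

impl Drop for TinyBuf {
    fn drop(&mut self) {
        if self.spilled() {
            // SAFETY: the heap field is active and is dropped exactly once, here.
            unsafe { ManuallyDrop::drop(&mut self.data.heap) }
        }
    }
}
```

The trade-off is that every access has to go through `unsafe` and rely on the invariant tying `len` to the active union field, which is one reason to reuse a well-tested crate instead of hand-rolling it.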

 }

 impl SmallCStr {
     #[inline]
     pub fn new(s: &str) -> SmallCStr {
-        if s.len() < SIZE {
-            let mut data = [0; SIZE];
-            data[.. s.len()].copy_from_slice(s.as_bytes());
-            let len_with_nul = s.len() + 1;
-
-            // Make sure once that this is a valid CStr
-            if let Err(e) = ffi::CStr::from_bytes_with_nul(&data[.. len_with_nul]) {
-                panic!("The string \"{}\" cannot be converted into a CStr: {}", s, e);
-            }
-
-            SmallCStr::OnStack {
-                data,
-                len_with_nul: len_with_nul as u8,
-            }
+        let len = s.len();
+        let len1 = len + 1;
+        let data = if len < SIZE {
+            let mut buf = [0; SIZE];
+            buf[..len].copy_from_slice(s.as_bytes());
+            SmallVec::from_buf_and_len(buf, len1)
         } else {
-            SmallCStr::OnHeap {
-                data: ffi::CString::new(s).unwrap()
-            }
+            let mut data = Vec::with_capacity(len1);
+            data.extend_from_slice(s.as_bytes());
+            data.push(0);
+            SmallVec::from_vec(data)
+        };
+        if let Err(e) = ffi::CStr::from_bytes_with_nul(&data) {
+            panic!("The string \"{}\" cannot be converted into a CStr: {}", s, e);
Contributor

I know you're not introducing this, but should we be returning Result here instead? Probably ok given the usage, but still...

Contributor Author

This was already discussed when the SmallCStr type was introduced; the consensus was that ergonomics matter more, as long as it's possible to get a stack trace.
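
For comparison, the Result-returning alternative raised in this thread could look roughly like the sketch below. It is std-only and hypothetical: `try_nul_terminated` and `nul_terminated_or_panic` are illustrative names that do not exist in the PR, and a plain `Vec<u8>` stands in for the SmallVec-backed storage. It only contrasts the two error-handling styles under those assumptions.

```rust
use std::ffi::{CStr, FromBytesWithNulError};

/// Hypothetical fallible variant: returns the nul-terminated bytes,
/// or the validation error if `s` contains an interior nul byte.
fn try_nul_terminated(s: &str) -> Result<Vec<u8>, FromBytesWithNulError> {
    let mut bytes = Vec::with_capacity(s.len() + 1);
    bytes.extend_from_slice(s.as_bytes());
    bytes.push(0);
    // Validate once; the borrowed CStr is discarded, only the check matters.
    CStr::from_bytes_with_nul(&bytes)?;
    Ok(bytes)
}

/// Hypothetical panicking wrapper, mirroring the message used in the diff.
fn nul_terminated_or_panic(s: &str) -> Vec<u8> {
    match try_nul_terminated(s) {
        Ok(bytes) => bytes,
        Err(e) => panic!("The string \"{}\" cannot be converted into a CStr: {}", s, e),
    }
}
```

The fallible form pushes a `match` or `?` onto every call site, whereas the panicking form keeps call sites short and still yields a stack trace on bad input, which is the ergonomics argument made above.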

         }
+        SmallCStr { data }
     }

+    #[inline]
+    pub fn new_with_nul(s: &str) -> SmallCStr {
+        let b = s.as_bytes();
+        if let Err(e) = ffi::CStr::from_bytes_with_nul(b) {
+            panic!("The string \"{}\" cannot be converted into a CStr: {}", s, e);
+        }
+        SmallCStr { data: SmallVec::from_slice(s.as_bytes()) }
+    }
+
+
     #[inline]
     pub fn as_c_str(&self) -> &ffi::CStr {
-        match *self {
-            SmallCStr::OnStack { ref data, len_with_nul } => {
-                unsafe {
-                    let slice = &data[.. len_with_nul as usize];
-                    ffi::CStr::from_bytes_with_nul_unchecked(slice)
-                }
-            }
-            SmallCStr::OnHeap { ref data } => {
-                data.as_c_str()
-            }
+        unsafe {
+            ffi::CStr::from_bytes_with_nul_unchecked(&self.data[..])
         }
     }

     #[inline]
     pub fn len_with_nul(&self) -> usize {
-        match *self {
-            SmallCStr::OnStack { len_with_nul, .. } => {
-                len_with_nul as usize
-            }
-            SmallCStr::OnHeap { ref data } => {
-                data.as_bytes_with_nul().len()
-            }
-        }
+        self.data.len()
     }
+
+    pub fn spilled(&self) -> bool {
+        self.data.spilled()
+    }
 }

@@ -85,7 +77,6 @@ impl Deref for SmallCStr {
     }
 }

-
 #[test]
 fn short() {
     const TEXT: &str = "abcd";
@@ -95,7 +86,7 @@ fn short() {

     assert_eq!(scs.len_with_nul(), TEXT.len() + 1);
     assert_eq!(scs.as_c_str(), reference.as_c_str());
-    assert!(if let SmallCStr::OnStack { .. } = scs { true } else { false });
+    assert!(!scs.spilled());
 }

 #[test]
@@ -107,7 +98,7 @@ fn empty() {

     assert_eq!(scs.len_with_nul(), TEXT.len() + 1);
     assert_eq!(scs.as_c_str(), reference.as_c_str());
-    assert!(if let SmallCStr::OnStack { .. } = scs { true } else { false });
+    assert!(!scs.spilled());
 }

 #[test]
@@ -121,7 +112,7 @@ fn long() {

     assert_eq!(scs.len_with_nul(), TEXT.len() + 1);
     assert_eq!(scs.as_c_str(), reference.as_c_str());
-    assert!(if let SmallCStr::OnHeap { .. } = scs { true } else { false });
+    assert!(scs.spilled());
 }

 #[test]