safe and unsafe r ust - university of wisconsin–madisonmarkm/unsafe-rust.pdf · safe and unsafe r...
TRANSCRIPT
LECTURE 27Guest Lecture by Mark Mansi (CS 538)
April 30, 2019
Safe and Unsafe Rust
AGENDA
FOUNDATIONSWhat does Rust actually guarantee?Introducing unsafeUnsafety and InvariantsUsing Abstraction
GETTING STARTED WITH UNSAFE RUSTRaw pointersLinks to further reading
WHAT DOES RUSTGUARANTEE?
GOAL: FEW BUGS, FASTER PROGRAMSAvoid doing non-sensical or wrong things…… and nd out when we do.Enable compiler optimizations.
LANGUAGE SPECDenes allowed, disallowed, and unspecied behaviors.
Examples of disallowed:dereference null pointer
have a bool that is not true or falseaccess array out of bounds
Examples of unspecied:In C/C++: a = f(b) + g(c)which is rst: f or g?
UNDEFINED BEHAVIOR (UB)there are on the behavior
of the program.no restrictions
Compilers are not required to diagnoseundened behavior (although many
simple situations are diagnosed),
and the compiled program is not requiredto do anything meaningful.
IMPLICATIONS OF UBCorrect programs don’t invoke UBUB can be hard to debugCompilers can assume no UB when optimizing
EXAMPLE FROM C++
ISO C++ forbids mutating string literals (ISO C++§2.13.4p2)
char *p = "I'm a string literal";p[3] = 'x';
EXAMPLE FROM C++
Deferencing an invalid pointer is forbidden (ISO C§6.5.3.2p4)
char *p = nullptr;p[3] = 'x'; // Program is allowed to eat laundry here
SAFETY IN RUST
Memory safetye.g. accesses are to valid values onlye.g. prohibiting mutable aliasing pointers
Thread safetye.g. mutable aliasing state
by
“Safety” means no UB
Enforced type system
NO UB IN SAFE RUST let x = Vec::new(); // Empty Vec
println!("Out of bounds: ", x[2]); // Panic, not UB!
fn foo() > &usize let x = 3; &x // Return reference to stack variable ﴾allowed in C﴿
// Doesn't compile ﴾borrow checker error﴿, not UB!
Dereferencing null, dangling, or unaligned pointersReading uninitialized memoryBreaking the pointer aliasing rulesProducing invalid primitive values:
dangling/null referencesnull fn pointers
a bool that isn’t true or false
UB IN (UNSAFE) RUST
Producing invalid primitive values:an undened enum discriminanta char outside the ranges [0x0, 0xD7FF] and
[0xE000, 0x10FFFF]A non-utf8 str
Unwinding into another languageCausing a data race
MORE UB IN (UNSAFE) RUST
WHAT DOES RUST NOTGUARANTEE?
EXAMPLE
struct Foo(Option<Arc<Mutex<Foo>>>);
impl Drop for Foo /// Implement a destructor for `Foo` fn drop(&mut self) // <do some clean up>
EXAMPLE (CONTINUED)
fn do_the_foo_thing() let foo1 = Arc::new(Mutex::new(Foo(None))); let foo2 = Arc::new(Mutex::new(Foo(None)));
// Reference cycle foo1.lock().unwrap().0 = Some(Arc::clone(&foo2)); foo2.lock().unwrap().0 = Some(Arc::clone(&foo1));
// `foo1` and `foo2` are never dropped! // Memory never freed. Foo::drop never called. No UB!
SAFE RUST CAN STILL…Panic (“graceful” crashing)Deadlock (two threads both waiting for each other)Leak of memory and other resources (never freedback to the system)Exit without calling destructors (never clean up)Integer overow (MAX_INT + 1)
A DILEMMA
EXAMPLEIn my program (Rust):
In libc (C):
/// Read from file `fd` into buffer `buf`.fn read_file(fd: i32, buf: &mut [u8]) let len = buf.len(); libc::read(fd, buf.as_mut_ptr(), len);
ssize_t read(int fd, void *buf, size_t count) // oops bug accidentally overflows `buf`
RESTORING SAFETY
Ok, but how do we call C libraries or the OS?
Compiler error: no unsafe C from safe Rust!
/// Read from the file descriptor into the buffer.fn read_file(fd: i32, buf: &mut [u8]) let len = buf.len(); libc::read(fd, buf.as_mut_ptr(), len); // Compile error!
unsafeSometimes need to do something potentially unsafe
system callscalls to C/C++ librariesinteracting with hardwarewriting assembly code…
Compiler can’t check these: Be careful!
EXAMPLE
/// Read from the file descriptor into the buffer.fn read_file(fd: i32, buf: &mut [u8]) let len = buf.len(); unsafe libc::read(fd, buf.as_mut_ptr(), len);
Rust compiles, but C code may dosomething bad: Be careful!
WHAT DOES unsafe MEAN?
“AUDIT unsafe BLOCKS”From libstd . Consider :Vec set_len
pub struct Vec<T> buf: RawVec<T>, len: usize,
impl Vec /// Sets the length of the vector to `new_len`. pub fn set_len(&mut self, new_len: usize) self.len = new_len;
“AUDIT unsafe BLOCKS”
Huh?!?
fn main() let mut my_vec = Vec::with_capacity(0); // empty vector my_vec.set_len(100);
my_vec[30] = 0; // UB!
UB in safe Rust? How?
unsafe fn
Can only be called in an unsafe block!
But why is it possible in the rst place?
impl Vec /// Sets the length of the vector to `new_len`. pub unsafe fn set_len(&mut self, new_len: usize) self.len = new_len;
UB AND INVARIANTSLanguage Invariant: something that is always trueaccording to Rust
breaking a language invariant is (by denition) UBe.g. bool is always true or falsee.g. all references are valid to dereference
UB AND INVARIANTS: something that is always true
according to the program spece.g. the server must write results to the log beforeresponding to the client
In the presence of unsafe, breaking
can break a language invariant
Program Invariant
program
invariants which is UB
UB AND INVARIANTS
Language invariant: no accesses to invalid memory
Program invariant: len is no longer than buf
Bad use of Vec::set_len violates program
invariant => access memory out of bounds == UB.
to just look in unsafe blocks!
pub struct Vec<T> buf: RawVec<T>, // `unsafe` in `RawVec` len: usize,
Not sufcient
UB AND INVARIANTSunsafe: someone promises to !
“Promise” is called a proof obligation.
uphold invariants
UB AND INVARIANTS
fn read_file(fd: i32, buf: &mut [u8]) let len = buf.len();
// `read_file` promises to respect buffer length unsafe libc::read(fd, buf.as_mut_ptr(), len);
// Caller of `set_len` promises to uphold `Vec` invariants!pub unsafe fn set_len(&mut self, new_len: usize) self.len = new_len;
unsafe ... blocks
Enclosing function is responsibleunsafe fn
Caller responsible when calling functionImpl. responsible when calling other unsafe
unsafe trait and unsafe implImplementor is responsible
DIFFERENT USES OF unsafeWhose job to check?
HOW TO PLAY WITHFIRE
SAFE ABSTRACTIONSIdea:
Users of the abstraction have no way to cause UBLanguage features make unsafe parts inaccessible
Private struct/enum eldsPrivate modules/types
Use unsafe to expose dangerous interfaces
Can reason about correctness modularly
Abstraction hides unsafe
EXAMPLE: VecUsing only safe methods of Vec, it is impossible to cause
UB, even though Vec uses unsafe internally.
The safe methods of Vec all uphold invariants.
Methods that could violate invariants are unsafe(e.g. set_len)
EXAMPLE: READING FILES
File, BufReader are safe abstractions that uphold
invariants about les, memory, etc.
fn main() > std::io::Result<()> // Open: call libc and OS. Safely! let file = File::open("foo.txt")?; let mut buf_reader = BufReader::new(file); let mut contents = String::new(); // Read: call libc and OS. Safely! buf_reader.read_to_string(&mut contents)?; assert_eq!(contents, "Hello, world!"); Ok(())
// Close: call libc and OS. Safely!
CAUTION: FIRE IS HOT
INVARIANTS YOU DIDN’T KNOW ABOUTVarianceRust ABIMemory layout of types
Zero-sized types, uninhabited types#[repr(C)] and #[repr(packed)]
Type-based optimizationsReordering, memory coherence, and Many more in the
optimizationsRustonomicon
PRACTICAL FIRETWIRLING 101
EXAMPLE: VecCaution: will ignore lots of concernsCan nd real implementation on GitHub
DETOUR: RAW POINTERS*const T and *mut T
Like C pointersNot borrow checked, unsafe to dereference
Utilities in std::ptrHelpful tools in libstdNonNull
impl Vec
pub struct Vec<T> buf: RawVec<T>, len: usize,
pub struct RawVec<T> ptr: *mut T, // ptr to allocated space cap: usize, // amount of allocated space
impl Vec
pub fn new() > Vec<T> Vec buf: RawVec::new(), // initially, no allocation len: 0,
impl RawVec
pub fn new() > Self RawVec ptr: ptr::null_mut(), // null ptr, safe to construct cap: 0,
impl Vec
pub fn pop(&mut self) > Option<T> if self.len == 0 None // empty vector else unsafe self.len = 1; // decrement length
// raw ptr read at index into buffer, return Some(ptr::read(self.buf.ptr.add(self.len())))
impl Vec
pub fn push(&mut self, value: T) // Are we out of space? if self.len == self.buf.cap() self.buf.double(); // alloc more space
// put the element in the `Vec` unsafe // compute address of end of buffer let end = self.buf.ptr.add(self.len); ptr::write(end, value); // write data to raw pointer self.len += 1; // increase length
impl RawVec
pub fn double(&mut self) unsafe let new_cap = self.cap * 2 + 1; // new capacity
// alloc more memory with system heap allocator let res = if self.cap > 0 heap::realloc(NonNull::from(self.ptr).cast(), self.cap, new_cap) else heap::alloc(new_cap) ; // ...
impl RawVec
pub fn double(&mut self) unsafe // ... match res Ok(new_ptr) => // update pointer and capacity self.ptr = res.unwrap().cast().into(); self.cap = new_cap; Err(AllocErr) => // handle out of memory out_of_memory();
OTHER unsafe TOOLSType memory layout: #[repr(...)]Mixed-language projectsextern fn
Strings, variadic fns (e.g. printf), extern types
rust-bindgen
cbindgen
THANK YOU!
EXTRA RESOURCES
Alexis Beingessner
The Rustonomicon
Ralf Jung’s Blog
Notes on Type Layouts and ABIOnly in RustThe Kinds of Implementation-Dened
EXTRA EXTRA RESOURCESVarious IRLO discussions:
UB and uninitialized memoryWhat do “memory safety”/“thread safety” mean?Taming UB in LLVM
Guide to UB