
Page 1:

cs378h

Parallelism at Scale: MPI

Page 3:

Outline for Today

Rust Closures
Project Comments
Scale: MPI

Acknowledgements:

Portions of the lecture slides were adapted from:

Argonne National Laboratory, MPI tutorials.

Lawrence Livermore National Laboratory, MPI tutorials

See online tutorial links on the course webpage

W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message Passing Interface, MIT Press, ISBN 0-262-57133-1, 1999.

W. Gropp, E. Lusk, and R. Thakur, Using MPI-2: Advanced Features of the Message Passing Interface, MIT Press, ISBN 0-262-57132-3, 1999.

Pages 4–9: Sharing State: Arc and Mutex

fn main() {
    let var = Structure::new();
    let var_lock = Mutex::new(var);
    let var_arc = Arc::new(var_lock);
    for i in 0..N {
        thread::spawn(move || {
            let ldata = Arc::clone(&var_arc);
            let vdata = ldata.lock();
            // ok to mutate var (vdata)!
        });
    }
}

Key ideas:
• Use reference counting wrapper to pass refs
• Use scoped lock for mutual exclusion
• Actually compiles → works 1st time!

Pages 10–13: Sharing State: Arc and Mutex, really

fn test() {
    let var = Structure::new();
    let var_lock = Mutex::new(var);
    let var_arc = Arc::new(var_lock);
    for i in 0..N {
        thread::spawn(move || {
            let ldata = Arc::clone(&var_arc);
            let vdata = ldata.lock();
            // ok to mutate var (vdata)!
        });
    }
}

Why doesn’t “&” fix it? (&var_arc, instead of just var_arc)

Would cloning var_arc fix it?

Pages 14–17: Sharing State: Arc and Mutex, really

fn test() {
    let var = Structure::new();
    let var_lock = Mutex::new(var);
    let var_arc = Arc::new(var_lock);
    for i in 0..N {
        thread::spawn(move || {
            let ldata = Arc::clone(&var_arc.clone());
            let vdata = ldata.lock();
            // ok to mutate var (vdata)!
        });
    }
}

Same problem!

What if we just don’t move?

Pages 18–20: Sharing State: Arc and Mutex, really

fn test() {
    let var = Structure::new();
    let var_lock = Mutex::new(var);
    let var_arc = Arc::new(var_lock);
    for i in 0..N {
        thread::spawn(|| {
            let ldata = Arc::clone(&var_arc);
            let vdata = ldata.lock();
            // ok to mutate var (vdata)!
        });
    }
}

What’s the actual fix?

Pages 21–22: Sharing State: Arc and Mutex, really

fn test() {
    let var = Structure::new();
    let var_lock = Mutex::new(var);
    let var_arc = Arc::new(var_lock);
    for i in 0..N {
        let clone_arc = var_arc.clone();
        thread::spawn(move || {
            let ldata = Arc::clone(&clone_arc);
            let vdata = ldata.lock();
            // ok to mutate var (vdata)!
        });
    }
}

Compiles! Yay! Other fixes?
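One other fix, offered here as a hedged sketch rather than something from the slides: scoped threads (std::thread::scope, stable since Rust 1.63) let each spawned closure borrow var_arc instead of owning it, because the scope guarantees every thread is joined before the borrow ends. The Structure body, its count field, and the value of N are stand-ins, since the slides never define them.

use std::sync::{Arc, Mutex};
use std::thread;

const N: usize = 4;

struct Structure { count: u64 }
impl Structure { fn new() -> Self { Structure { count: 0 } } }

fn main() {
    let var = Structure::new();
    let var_lock = Mutex::new(var);
    let var_arc = Arc::new(var_lock);
    // Scoped threads may borrow var_arc; all of them join before scope() returns.
    thread::scope(|s| {
        for _i in 0..N {
            s.spawn(|| {
                let mut vdata = var_arc.lock().unwrap();
                vdata.count += 1; // ok to mutate the Structure through the guard
            });
        }
    });
}

With scoped threads the Arc is not strictly required (a shared &Mutex would do); it is kept here only to stay close to the slide’s code.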

Pages 23–27: Sharing State: Arc and Mutex, really

fn test() {
    let var = Structure::new();
    let var_lock = Mutex::new(var);
    let var_arc = Arc::new(var_lock);
    for i in 0..N {
        thread::spawn(move || {
            let ldata = Arc::clone(&var_arc);
            let vdata = ldata.lock();
            // ok to mutate var (vdata)!
        });
    }
}

Why does this compile?

Could we use a vec of JoinHandle to keep var_arc in scope? (see the sketch below)

for i in 0..N { join(); }
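A hedged sketch of the JoinHandle idea above: collect the handles returned by thread::spawn into a Vec and join them all before test returns. Each iteration still clones the Arc so the spawned closure owns its own handle to the shared data. Structure, its count field, and N are stand-ins, since the slides never define them.

use std::sync::{Arc, Mutex};
use std::thread;

const N: usize = 4;

struct Structure { count: u64 }
impl Structure { fn new() -> Self { Structure { count: 0 } } }

fn test() {
    let var = Structure::new();
    let var_lock = Mutex::new(var);
    let var_arc = Arc::new(var_lock);
    let mut handles = Vec::new();
    for _i in 0..N {
        let clone_arc = Arc::clone(&var_arc); // each thread owns its own Arc
        handles.push(thread::spawn(move || {
            let mut vdata = clone_arc.lock().unwrap();
            vdata.count += 1; // ok to mutate through the guard
        }));
    }
    // var_arc is still alive here; wait for every worker before returning
    for h in handles {
        h.join().unwrap();
    }
}

fn main() { test(); }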

Pages 28–30: Sharing State: Arc and Mutex, really

What if I need my lambda to own some things and borrow others?

Parameters! (see the sketch below)
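A hedged illustration of the “parameters” point, not taken from the slides: make the closure own what it must own (via move) and take what it only needs to borrow as a parameter, so the borrowed value stays usable at the call site.

fn main() {
    let owned = String::from("owned by the closure");
    let shared = vec![1, 2, 3];

    // `move` transfers ownership of `owned` into the closure;
    // `borrowed` arrives as a parameter, so `shared` is never captured.
    let report = move |borrowed: &Vec<i32>| {
        println!("{}: {} elements", owned, borrowed.len());
    };

    report(&shared);
    report(&shared);            // can borrow again for another call
    println!("{:?}", shared);   // `shared` is still usable here
}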

Pages 32–33: Project Proposal

• A very good example

Ideas:
• Heterogeneity
• Transactional Memory
• Julia, X10, Chapel
• Actor Models: Akka
• Dataflow Models
• Race Detection
• Lock-free data structures
• … The sky is the limit

Questions?

Pages 34–36: Scale Out vs Scale Up

Pages 37–44: Parallel Systems Architects Wanted

Hot Startup Idea: www.purchase-a-pooch.biz

1. User Browses Potential Pets
2. Clicks “Purchase Pooch”
3. Web Server, CGI/EJB + Database complete request
4. Pooch delivered (not shown)

How to handle lots and lots of dogs?

Pages 45–54: 3 Tier architecture

[Figure: User request → Web Servers → App Servers → Database Server]

Web Servers (Presentation Tier) and App Servers (Business Tier) scale horizontally
Database Server → scales vertically

Horizontal Scale → “Shared Nothing”
Why is this a good arrangement?

Vertical scale gets you a long way, but there is always a bigger problem size

Page 55:

Horizontal Scale: Goal

Pages 56–61: Design Space

[Figure: a design space spanning throughput vs. latency and Internet vs. private data center, ranging from shared-nothing to shared-something systems]

Pages 62–70: Parallel Architectures and MPI

Distributed Memory Multiprocessor
  Messaging between nodes

Massively Parallel Processor (MPP)
  Many, many processors

Cluster of SMPs
• Shared memory in SMP node
• Messaging → SMP nodes
• Also regarded as MPP if processor # is large

Multicore SMP+GPU Cluster
• Shared mem in SMP node
• Messaging between nodes
• GPU accelerators attached

[Figure: nodes of processors (P) and memory (M) joined by an interconnection network through per-node network interfaces]

What have we left out?
• DSMs
• CMPs
• Non-GPU Accelerators

Pages 71–75: What requires extreme scale?

Simulations: why?
Simulations are sometimes more cost effective than experiments

Why extreme scale?
More compute cycles, more memory, etc., lead to faster and/or more accurate simulations

Examples: climate change, astrophysics, nuclear reactors (image credit: Prabhat, LBNL)

Pages 76–78: How big is “extreme” scale?

Measured in FLOPs: FLoating point Operations Per second
1 GigaFLOP = 1 billion FLOPs (10^9)
1 TeraFLOP = 1000 GigaFLOPs (10^12)
1 PetaFLOP = 1000 TeraFLOPs (10^15): most current supercomputers
1 ExaFLOP = 1000 PetaFLOPs (10^18): arriving in 2018 (supposedly)

Pages 79–85: Distributed Memory Multiprocessors

• Nodes: complete computer
  • Including I/O
• Nodes communicate via network
  • Standard networks (IP)
  • Specialized networks (RDMA, fiber)

Each processor has a local memory
  Physically separated address space

Processors communicate to access non-local data
  Message communication
  Message passing architecture
  Processor interconnection network

Parallel applications partitioned across
  Processors: execution units
  Memory: data partitioning

Scalable architecture
  Incremental cost to add hardware (cost of node)

[Figure: processor + cache + memory nodes attached to a network through network interfaces]

Pages 86–95: Performance: Latency and Bandwidth

Bandwidth
  Need high bandwidth in communication
  Match limits in network, memory, and processor
  Network interface speed vs. network bisection bandwidth

Wait… bisection bandwidth?
  If the network is bisected (cut into two halves), the bisection bandwidth is the bandwidth available between the two partitions

Latency
  Performance affected: processor may have to wait
  Hard to overlap communication and computation
  Overhead to communicate: a problem in many machines

Latency hiding
  Increases programming system burden
  E.g.: communication/computation overlap, prefetch

Is this different from metrics we’ve cared about so far?
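A quick worked example of bisection bandwidth (illustrative, not from the slides): cut the machine into two equal halves and add up the bandwidth of the links crossing the cut. A ring of N nodes with per-link bandwidth B is cut by exactly 2 links, so its bisection bandwidth is 2B no matter how large N grows; a k × k 2D mesh is cut by k links, giving k·B; a full-bisection fat tree provides roughly (N/2)·B. This is why fast network interfaces alone can overstate what the network sustains when all nodes communicate across the machine at once.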

Pages 96–102: Ostensible Advantages of Distributed Memory Architectures

Hardware simpler (especially versus NUMA), more scalable

Communication explicit, simpler to understand

Explicit communication → focuses attention on the costly aspect of parallel computation

Synchronization → naturally associated with sending messages
  Reduces possibility for errors from incorrect synchronization

Easier to use sender-initiated communication → some advantages in performance

Can you think of any disadvantages?

Pages 103–112: Running on Supercomputers

Sometimes 1 job takes the whole machine: these are called “hero runs”…
Sometimes many smaller jobs
Supercomputers are used continuously
  Processors: “scarce resource”; jobs are “plentiful”

• Programmer plans a job; a job ==
  • parallel binary program
  • “input deck” (specifies input data)
• Submit job to a queue
• Scheduler allocates resources when
  • resources are available,
  • (or) the job is deemed “high priority”
• Scheduler runs scripts that initialize the environment
  • Typically done with environment variables
• At the end of initialization, it is possible to infer:
  • What the desired job configuration is (i.e., how many tasks per node)
  • What other nodes are involved
  • How your node’s tasks relate to the overall program
• MPI library interprets this information, hides the details (see the sketch below)
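As a hedged illustration of the environment-variable point (not from the slides): a launched process can inspect scheduler-provided variables to learn its place in the job. The variable names below are launcher-specific assumptions; Slurm exports SLURM_PROCID and SLURM_NTASKS, and Open MPI’s mpirun exports OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE. In practice the MPI library does this lookup for you.

use std::env;

// Return the first of the named environment variables that is set and parses
// as an integer. Which names exist depends on the launcher (assumptions here).
fn lookup(names: &[&str]) -> Option<usize> {
    for &name in names {
        if let Ok(val) = env::var(name) {
            if let Ok(parsed) = val.parse::<usize>() {
                return Some(parsed);
            }
        }
    }
    None
}

fn main() {
    let rank = lookup(&["SLURM_PROCID", "OMPI_COMM_WORLD_RANK", "PMI_RANK"]);
    let size = lookup(&["SLURM_NTASKS", "OMPI_COMM_WORLD_SIZE", "PMI_SIZE"]);
    println!("this task appears to be rank {:?} of {:?}", rank, size);
}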

Pages 113–119: The Message-Passing Model

Process: a program counter and address space
A process may contain multiple threads sharing a single address space

MPI is for communication among processes, not threads

Inter-process communication consists of
  Synchronization
  Data movement

[Figure: processes P1–P4, each with its own address space (memory) and one or more threads]

How does this compare with CSP?

• MPI == Message-Passing Interface specification
  • Extended message-passing model
  • Not a language or compiler specification
  • Not a specific implementation or product
• Specified in C, C++, Fortran 77, F90 (see the sketch below)
• Message Passing Interface (MPI) Forum
  • http://www.mpi-forum.org/
  • http://www.mpi-forum.org/docs/docs.html
• Two flavors for communication
  • Cooperative operations
  • One-sided operations
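Since MPI is a specification with multiple language bindings, here is a hedged hello-world sketch using the third-party rsmpi crate (the `mpi` crate on crates.io) as a Rust binding. The crate, its version, and the exact calls (mpi::initialize, world(), rank(), size()) are assumptions about that binding, not part of the MPI standard; the equivalent C program would use MPI_Init, MPI_Comm_rank, MPI_Comm_size, and MPI_Finalize.

// Cargo.toml (assumed): mpi = "0.8", with an MPI implementation installed.
// Launch with e.g.: mpirun -n 4 ./hello
use mpi::traits::*;

fn main() {
    let universe = mpi::initialize().expect("MPI failed to initialize");
    let world = universe.world(); // communicator spanning all processes in the job
    let rank = world.rank();      // this process's id within the communicator
    let size = world.size();      // number of processes in the communicator
    println!("hello from process {} of {}", rank, size);
}   // MPI is finalized when `universe` is dropped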

Cooperative Operations

Data is cooperatively exchanged in message-passing: it is explicitly sent by one process and received by another.

Advantage: local control of memory. Any change in the receiving process's memory is made with the receiver's explicit participation.

Communication and synchronization are combined.

[Figure: Process 0 calls Send(data), Process 1 calls Receive(data); time flows downward]

Familiar argument?
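As a concrete illustration of the cooperative pattern (a minimal sketch, not from the slides; run with at least two processes), rank 0 sends a value that rank 1 must explicitly receive:

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rank, value = 42;
    MPI_Status status;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    if (rank == 0) {
        /* sender: nothing changes in rank 1's memory until it chooses to receive */
        MPI_Send( &value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD );
    } else if (rank == 1) {
        /* receiver: its memory is updated only by this explicit call */
        MPI_Recv( &value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status );
        printf( "rank 1 received %d\n", value );
    }

    MPI_Finalize();
    return 0;
}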

One-Sided Operations

One-sided operations between processes include remote memory reads and writes.

Only one process needs to explicitly participate; there is still agreement implicit in the SPMD program.

Implication: communication and synchronization are decoupled.

[Figure: Process 0 issues Put(data) into Process 1's memory and Get(data) from it; time flows downward]

Are one-sided operations better for performance?
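A minimal sketch of one-sided communication with the MPI-2 RMA interface (assuming at least two processes; window-plus-fence is one common way to structure it, not the only one):

#include "mpi.h"

int main( int argc, char *argv[] )
{
    int rank, local = 0, value = 7;
    MPI_Win win;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    /* every process exposes one int as a window of remotely accessible memory */
    MPI_Win_create( &local, sizeof(int), sizeof(int),
                    MPI_INFO_NULL, MPI_COMM_WORLD, &win );

    MPI_Win_fence( 0, win );
    if (rank == 0)
        /* write into rank 1's window; rank 1 makes no matching call */
        MPI_Put( &value, 1, MPI_INT, 1, 0, 1, MPI_INT, win );
    MPI_Win_fence( 0, win );   /* the fence ends the epoch: rank 1's 'local' is now 7 */

    MPI_Win_free( &win );
    MPI_Finalize();
    return 0;
}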

A Simple MPI Program

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    MPI_Init( &argc, &argv );
    printf( "Hello, world!\n" );
    MPI_Finalize();
    return 0;
}

MPI_Init

Hardware resources are allocated (the MPI-managed ones, anyway…)

Start processes on different nodes. Where does their executable program come from?

Give processes what they need to know. Wait…what do they need to know?

Configure OS-level resources.

Configure tools that are running with MPI.

…
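MPI_Init must be called before any other MPI routine. A hedged sketch of the MPI-2 variant MPI_Init_thread, which additionally negotiates the level of thread support the library will provide (relevant when MPI is mixed with threads):

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int provided;

    /* ask for full multi-threaded support; the library reports what it actually grants */
    MPI_Init_thread( &argc, &argv, MPI_THREAD_MULTIPLE, &provided );
    if (provided < MPI_THREAD_MULTIPLE)
        printf( "warning: requested MPI_THREAD_MULTIPLE, got level %d\n", provided );

    MPI_Finalize();
    return 0;
}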

MPI_Finalize

Why do we need to finalize MPI? What is necessary for a "graceful" MPI exit?

Can bad things happen otherwise? Suppose one process exits…

How do resources get de-allocated? How is communication shut down? What type of exit protocol might be used?

Executive summary: undo everything init did, and be able to do it on either a success or a failure exit.

• By default, an error causes all processes to abort
• The user can instead cause routines to return (with an error code)
• In C++, exceptions are thrown (MPI-2)
• A user can also write and install custom error handlers
• Libraries may handle errors differently from applications
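A minimal sketch of switching from the default abort-on-error behavior to returned error codes, using the standard MPI_ERRORS_RETURN handler (the deliberately invalid destination rank is just an illustration of the checking pattern):

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    char msg[MPI_MAX_ERROR_STRING];
    int len, err, size, buf = 0;

    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    /* by default errors abort; ask for error codes to be returned instead */
    MPI_Comm_set_errhandler( MPI_COMM_WORLD, MPI_ERRORS_RETURN );

    /* destination rank == size is out of range, so this send fails */
    err = MPI_Send( &buf, 1, MPI_INT, size, 0, MPI_COMM_WORLD );
    if (err != MPI_SUCCESS) {
        MPI_Error_string( err, msg, &len );
        fprintf( stderr, "MPI_Send failed: %s\n", msg );
    }

    MPI_Finalize();
    return 0;
}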

Running MPI Programs

MPI-1 does not specify how to run an MPI program.

Starting an MPI program is implementation-dependent: scripts, program arguments, and/or environment variables.

% mpirun -np <procs> a.out      (for MPICH under Linux)

mpiexec <args> is part of MPI-2, but only as a recommendation:
  mpiexec for MPICH (the distribution from ANL)
  mpirun for SGI's MPI

Finding Out About the Environment

Two important questions arise in message passing: How many processes are being used in the computation? Which one am I?

MPI provides functions to answer them:
  MPI_Comm_size reports the number of processes
  MPI_Comm_rank reports the rank, a number between 0 and size-1 that identifies the calling process

Hello World Revisited

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rank, size;
    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    printf( "I am %d of %d\n", rank, size );
    MPI_Finalize();
    return 0;
}

What does this program do? And what is "Comm"? A "communicator".

Basic Concepts

Processes can be collected into groups.

Each message is sent in a context and must be received in the same context!

A group and a context together form a communicator.

A process is identified by its rank with respect to the group associated with a communicator.

There is a default communicator, MPI_COMM_WORLD, which contains all of the initial processes.
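As an illustration (a sketch, not taken from the slides), a communicator covering only a subset of the processes can be derived from MPI_COMM_WORLD with MPI_Comm_split; ranks are renumbered from 0 within each new group:

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int world_rank, sub_rank;
    MPI_Comm subcomm;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &world_rank );

    /* color 0 = even world ranks, color 1 = odd; key orders ranks within each group */
    MPI_Comm_split( MPI_COMM_WORLD, world_rank % 2, world_rank, &subcomm );
    MPI_Comm_rank( subcomm, &sub_rank );

    printf( "world rank %d has rank %d in its sub-communicator\n", world_rank, sub_rank );

    MPI_Comm_free( &subcomm );
    MPI_Finalize();
    return 0;
}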

MPI Datatypes

Message data (sent or received) is described by a triple: (address, count, datatype).

An MPI datatype is recursively defined as:
  a predefined data type from the language
  a contiguous array of MPI datatypes
  a strided block of datatypes
  an indexed array of blocks of datatypes
  an arbitrary structure of datatypes

There are MPI functions to construct custom datatypes, e.g. an array of (int, float) pairs, or a row of a matrix stored columnwise (see the sketch below).

• Enables heterogeneous communication: processes on machines with different memory representations and lengths of elementary datatypes can communicate, with MPI providing the representation translation if necessary
• Allows application-oriented layout of data in memory: reduces memory-to-memory copies in the implementation and allows use of special hardware (scatter/gather)
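A sketch (assuming a column-major double matrix with nrows rows and ncols columns) of describing one row with MPI_Type_vector so its strided elements can be sent without packing:

#include "mpi.h"

/* send row 'r' of a column-major nrows x ncols matrix 'a' to rank 'dest' */
void send_row( double *a, int nrows, int ncols, int r, int dest, MPI_Comm comm )
{
    MPI_Datatype rowtype;

    /* ncols blocks of 1 double; neighbors in a row sit nrows elements apart */
    MPI_Type_vector( ncols, 1, nrows, MPI_DOUBLE, &rowtype );
    MPI_Type_commit( &rowtype );

    MPI_Send( &a[r], 1, rowtype, dest, 0, comm );

    MPI_Type_free( &rowtype );
}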

MPI Tags

Messages are sent with an accompanying user-defined integer tag, which assists the receiving process in identifying the message.

Messages can be screened at the receiving end by specifying a particular tag; MPI_ANY_TAG matches any tag in a receive.

Tags are sometimes called "message types"; MPI calls them "tags" to avoid confusion with datatypes.
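A sketch of tag screening on the receive side (the tag values 1 and 2 are arbitrary choices for illustration): receive with MPI_ANY_TAG, then branch on what actually arrived via the status object.

#include "mpi.h"
#include <stdio.h>

#define TAG_DATA 1
#define TAG_DONE 2

/* receive from anyone with any tag, then inspect the status to see what came in */
void receive_any( MPI_Comm comm )
{
    int payload;
    MPI_Status status;

    MPI_Recv( &payload, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, comm, &status );

    if (status.MPI_TAG == TAG_DATA)
        printf( "data %d from rank %d\n", payload, status.MPI_SOURCE );
    else if (status.MPI_TAG == TAG_DONE)
        printf( "rank %d is done\n", status.MPI_SOURCE );
}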

MPI Basic (Blocking) Send

MPI_SEND (start, count, datatype, dest, tag, comm)

The message buffer is described by (start, count, datatype).

The target process is specified by dest, its rank in the communicator specified by comm.

The process blocks until the data has been delivered to the system and the buffer can be reused; the message may not yet have been received by the target process!
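A sketch pairing the blocking send with its matching blocking receive (MPI_Recv takes the same buffer description plus a source rank and fills in an MPI_Status; MPI_Get_count recovers how many elements actually arrived). Run with at least two processes.

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rank, n, data[4] = { 1, 2, 3, 4 }, recvbuf[4];
    MPI_Status status;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    if (rank == 0) {
        /* blocks only until 'data' can safely be reused, not until rank 1 has it */
        MPI_Send( data, 4, MPI_INT, 1, 99, MPI_COMM_WORLD );
    } else if (rank == 1) {
        /* the receive count is an upper bound; the status records what arrived */
        MPI_Recv( recvbuf, 4, MPI_INT, 0, 99, MPI_COMM_WORLD, &status );
        MPI_Get_count( &status, MPI_INT, &n );
        printf( "received %d ints, first = %d\n", n, recvbuf[0] );
    }

    MPI_Finalize();
    return 0;
}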

MPI with Only Six Functions

Many parallel programs can be written using only:
  MPI_INIT()
  MPI_FINALIZE()
  MPI_COMM_SIZE()
  MPI_COMM_RANK()
  MPI_SEND()
  MPI_RECV()

Why have any other APIs (e.g. broadcast, reduce, etc.)?

Point-to-point (send/recv) isn't always the most efficient; the other APIs add more support for common communication patterns (a sketch follows).
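For example, a global sum could be coded as size-1 sends into rank 0, but a collective expresses it in one call and lets the library use a better (e.g. tree-based) algorithm. A minimal sketch:

#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rank, local, total;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    local = rank + 1;                       /* each process contributes one value */

    /* every process calls the collective; rank 0 ends up with the sum */
    MPI_Reduce( &local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD );

    if (rank == 0)
        printf( "sum = %d\n", total );

    MPI_Finalize();
    return 0;
}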

Excerpt: Barnes-Hut

[Figure: code excerpt from a Barnes-Hut implementation, shown as images across three slides.]

To use or not use MPI?

USE MPI when:
• You need a portable parallel program
• You are writing a parallel library
• You have irregular or dynamic data relationships
• You care about performance

Do NOT use MPI when:
• You don't need parallelism at all
• You can use libraries (which may themselves be written in MPI) or other tools
• You can use multi-threading in a concurrent environment
• You don't need extreme scale

(Introduction to Parallel Computing, University of Oregon, IPCC)