flexflow installation on aws - stanford university

FlexFlow Installation on AWS

Vinay Ramakrishnaiah

2020-12-11

LA-UR-20-29824

1 Logging in to AWS

Step 1: Visit https://aws.amazon.com and click the sign-in button in the top-right corner.

Step 2: Select root user and sign in to your AWS account.

Step 3: In the AWS Management Console, select EC2.

Step 4: In the EC2 console, select "Launch instance".

Step 5: You'll be redirected to the "Choose AMI" window. There, search for "deep learning" and select "Deep Learning AMI (Ubuntu 18.04) Version 36.0 - ami-0bc87a16c757a7f07". This AMI has most of the dependencies that FlexFlow requires already installed.

Step 6: Under the "Choose Instance Type" tab, set "filter by" to "p2".

Step 7: Depending on your requirements, select p2.xlarge or another instance type and click the "Review and Launch" button at the bottom.

Step 8: You'll be asked to create a new key pair or use an existing key pair for connecting to your instance securely. If you don't have one yet, create a new pair, then click "Launch instance".

Step 9: Select your instance and click "Connect".

Step 10: Follow the instructions to connect using your favorite SSH client.
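For readers who prefer the command line, Steps 4 to 8 can be approximated with the AWS CLI. The following is a minimal sketch, not part of the original instructions. It assumes the AWS CLI is installed and configured, that the AMI ID named in Step 5 exists in your region (AMI IDs are region-specific), and that my-key-pair is a hypothetical key pair name. The helper launch_cmd is also hypothetical, and it only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Sketch: print (do not run) an AWS CLI command mirroring Steps 4-8.
# Assumptions: "my-key-pair" is a hypothetical key pair name; the AMI ID
# is the one named in Step 5 and may not exist in every region.
AMI_ID="ami-0bc87a16c757a7f07"

launch_cmd() {
    # $1 = instance type (default: p2.xlarge), $2 = key pair name
    instance_type="${1:-p2.xlarge}"
    key_name="${2:-my-key-pair}"
    printf 'aws ec2 run-instances --image-id %s --instance-type %s --key-name %s --count 1\n' \
        "$AMI_ID" "$instance_type" "$key_name"
}

# Print the command for a single p2.xlarge instance.
launch_cmd p2.xlarge my-key-pair
```

Printing rather than executing keeps the sketch safe to try: launching an instance incurs charges, so the command is worth inspecting first.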

1.1 Connecting to the Instance

An AWS instance is provided for hands-on exercises. You should have received an email with your username and password (all lowercase letters).

username: <your_first_name>

password: bootcamp

SSH into the AWS instance.

    ssh <username>@ec2-<ip>.<region>.compute.amazonaws.com
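The hostname in the command above combines the instance's public IP address (with dots written as dashes) and its region. As a small sketch under that assumption, the hypothetical helper ec2_hostname below builds the name from the two parts. The hostname shown in the EC2 "Connect" dialog remains authoritative, since some regions use a different suffix.

```shell
#!/bin/sh
# Sketch: build the public DNS name used in the ssh command.
# Assumption: the name has the form
#   ec2-<ip with dashes>.<region>.compute.amazonaws.com
# Always prefer the hostname shown in the EC2 "Connect" dialog.
ec2_hostname() {
    # $1 = public IPv4 address, $2 = region
    ip_dashed=$(printf '%s' "$1" | tr '.' '-')
    printf 'ec2-%s.%s.compute.amazonaws.com\n' "$ip_dashed" "$2"
}

# Example with a documentation-only IP address and an example region.
ec2_hostname 203.0.113.10 us-west-2
# To connect: ssh <username>@"$(ec2_hostname <ip> <region>)"
```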

2 Installing FlexFlow

FlexFlow has the following prerequisites:

• CUDNN

• Legion

• Protocol Buffer

• NCCL (optional)

• GASNet (optional)

Your Deep Learning Ubuntu AMI has the following dependencies installed: CUDA, CUDNN, NCCL, and Protocol Buffer. So, we only need to install Legion to build FlexFlow on your instance.

Step 1: Clone the repository (with submodules).

    git clone --recursive https://github.com/flexflow/FlexFlow.git
    cd FlexFlow/

Step 2: Export environment variables to point FlexFlow to its dependencies (make sure to use your local paths).

    export FF_HOME=<FlexFlow_dir>
    export CUDNN_DIR=/usr/local/cuda
    export CUDA_DIR=/usr/local/cuda
    export LG_RT_DIR=<FlexFlow_dir>/legion/runtime
    export PROTOBUF_DIR=/home/ubuntu/anaconda3/pkgs/libprotobuf-3.13.0.1-h8b12597_0
    export LD_LIBRARY_PATH=$PROTOBUF_DIR/lib:$LD_LIBRARY_PATH
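Because the paths in Step 2 differ between machines, it can help to confirm that each variable is set and names an existing directory before building. The following is a minimal sketch; check_ff_env is a hypothetical helper and not part of FlexFlow.

```shell
#!/bin/sh
# Sketch: verify that each variable from Step 2 is set and names an
# existing directory. check_ff_env is a hypothetical helper, not part
# of FlexFlow.
check_ff_env() {
    status=0
    for name in FF_HOME CUDNN_DIR CUDA_DIR LG_RT_DIR PROTOBUF_DIR; do
        # Look up the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            printf '%s is not set\n' "$name"
            status=1
        elif [ ! -d "$value" ]; then
            printf '%s points to a missing directory: %s\n' "$name" "$value"
            status=1
        fi
    done
    return "$status"
}

if check_ff_env; then
    echo "FlexFlow environment looks complete"
else
    echo "Fix the variables listed above before building"
fi
```

The function returns a non-zero status when any variable is unset or wrong, so it can also guard a build script.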

Step 3: Specify the CUDA architecture of your GPU (the K80 has compute capability 3.7).

    export GPU_ARCH=37
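The value of GPU_ARCH is the compute capability with the dot removed. As a sketch, the hypothetical helper cc_to_arch below performs that conversion; the compute capability itself still has to be looked up for your GPU.

```shell
#!/bin/sh
# Sketch: derive GPU_ARCH from a compute capability string by dropping
# the dot (3.7 -> 37). cc_to_arch is a hypothetical helper.
cc_to_arch() {
    printf '%s\n' "$1" | tr -d '.'
}

GPU_ARCH=$(cc_to_arch 3.7)   # K80, as named in Step 3
export GPU_ARCH
echo "GPU_ARCH=$GPU_ARCH"
```

For a different GPU, pass its compute capability instead; for example, a V100 (compute capability 7.0) would give 70.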

Step 4: Go to the Python directory in FlexFlow and build Python bindings.

    cd python/
    make -j4
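The job count passed to make can be matched to the number of CPUs on the instance. The following sketch, using a hypothetical helper build_jobs, queries the online CPU count and falls back to the 4 used above when it cannot be determined. It only prints the resulting make command.

```shell
#!/bin/sh
# Sketch: pick the make job count from the number of online CPUs,
# falling back to 4 if the count cannot be determined.
# build_jobs is a hypothetical helper.
build_jobs() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=""
    case "$n" in
        ''|*[!0-9]*) n=4 ;;   # empty or non-numeric: use the fallback
    esac
    printf '%s\n' "$n"
}

# Print the build command; run it from the python/ directory.
echo "make -j$(build_jobs)"
```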

Step 5: Verify your installation by running a simple MNIST example. Refer to Section 3 for a description of the flags. Note: You'll need keras-preprocessing to run this example, which can be installed using pip.

    pip install keras-preprocessing

Run the MNIST example:

    ./flexflow_python ../examples/python/keras/seq_mnist_mlp.py -ll:py 1 -ll:gpu 1 -ll:fsize 10000 -ll:zsize 5000

3 Command-Line Flags

In addition to setting runtime configurations in a FlexFlow Python/C++ program, the FlexFlow runtime also accepts command-line arguments for various runtime parameters.

FlexFlow training flags

• -e or --epochs: total number of epochs to run (default: 1)

• -b or --batch-size: global batch size in each iteration (default: 64)

• -p or --print-freq: print frequency (default: 10)

• -d or --dataset: path to training dataset. If not set, synthetic data will be used for training

Legion runtime flags

• -ll:py : number of python processors (default: 0)

• -ll:gpu : number of GPUs to use on each node (default: 0)

• -ll:fsize : size of device memory on each GPU (in MB)

• -ll:zsize : size of zero copy memory (pinned memory with direct GPU access) on each node (in MB). This is used for pre-fetching training images from disk

• -ll:cpu : number of data loading workers (default: 4)

• -ll:util : number of utility threads to create per process (default: 1)

• -ll:bgwork : number of background worker threads to create per process (default: 1)

Performance auto-tuning flags

• --search-budget or --budget: number of iterations of MCMC search (default: 0)

• --search-alpha or --alpha: hyper-parameter for the search procedure (default: 0.05)

• --export-strategy or --export: path to export the best discovered strategy (default: None)

• --import-strategy or --import: path to import previously saved strategy (default: None)
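The flags above can be combined on one command line. The sketch below assembles such a line with a hypothetical helper, ff_run_cmd, using only flags listed in this section and the memory sizes from the MNIST example. It prints the command rather than running it. Whether a given example script honors -e or -b is an assumption here, since a script may set those values itself.

```shell
#!/bin/sh
# Sketch: assemble a flexflow_python command line from the flags above.
# ff_run_cmd is a hypothetical helper; it prints the command instead of
# running it so the flags can be inspected first.
ff_run_cmd() {
    # $1 = script, $2 = epochs, $3 = global batch size, $4 = GPUs per node
    printf './flexflow_python %s -e %s -b %s -ll:py 1 -ll:gpu %s -ll:fsize 10000 -ll:zsize 5000\n' \
        "$1" "$2" "$3" "$4"
}

# Five epochs, batch size 64, one GPU.
ff_run_cmd ../examples/python/keras/seq_mnist_mlp.py 5 64 1
```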

4 Docker Image

A Dockerfile defining the FlexFlow container is available in the folder FlexFlow/docker, which can be used to build an image as shown in the example below.

Build an image using the Dockerfile.

    cd docker
    docker build -t ff:v1 .

Use the NVIDIA Container Toolkit to run the GPU-accelerated Docker container.

    nvidia-docker run -ti --rm ff:v1
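On hosts where the nvidia-docker wrapper is not installed, Docker 19.03 or later with the NVIDIA Container Toolkit accepts the --gpus option instead. The following sketch, which is an addition and not part of the original instructions, uses a hypothetical helper ff_docker_cmd to print whichever form matches the host.

```shell
#!/bin/sh
# Sketch: choose a launch command for the ff:v1 image built above.
# Assumption: without the nvidia-docker wrapper, Docker 19.03+ with the
# NVIDIA Container Toolkit accepts "--gpus all" instead.
# ff_docker_cmd is a hypothetical helper; it only prints the command.
ff_docker_cmd() {
    if command -v nvidia-docker >/dev/null 2>&1; then
        echo "nvidia-docker run -ti --rm ff:v1"
    else
        echo "docker run --gpus all -ti --rm ff:v1"
    fi
}

ff_docker_cmd
```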
