Quiver: An informed storage cache for Deep Learning

Abhishek Vijaya Kumar, Muthian Sivathanu

Microsoft Research India

Deep Learning: Important systems workload

• Already powers many real-world applications

  • Voice assistants

  • Web search

• Compute intensive – expensive hardware, e.g. GPUs

[Figure labels: Storage; 1 V100 = 140 TFLOPS; same setting on cloud]

Example workload

• Resnet50 is a popular vision model

• Process 10,500 images/sec on 8 Nvidia V100s

• Goal: Keep GPUs busy and utilize them efficiently

• Cheap preemptible VMs => job migration

• Large datasets

[Figure: hyper-parameter tuning jobs JOB 1 ... JOB K read from a remote store with several TBs of training data; one job reads 2 GB/s, so the load on storage and on the network is 2 GB/s * K]
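The arithmetic behind the 2 GB/s * K figure can be sketched as follows. This is a minimal illustration that uses only the two numbers quoted on the slide (10,500 images/sec and 2 GB/s per job); the per-image size it prints is merely what those two numbers imply, and the values of K and the function names are hypothetical.

```python
# Back-of-the-envelope I/O demand, using only the figures quoted on the slide.
IMAGES_PER_SEC = 10_500        # Resnet50 on 8 V100s (from the slide)
JOB_BANDWIDTH_GBPS = 2.0       # GB/s read by one such job (from the slide)


def implied_image_size_kb(bandwidth_gbps: float, images_per_sec: int) -> float:
    """Average size per image implied by the two slide figures, in KB."""
    return bandwidth_gbps * 1e9 / images_per_sec / 1e3


def aggregate_bandwidth_gbps(num_jobs: int, per_job_gbps: float = JOB_BANDWIDTH_GBPS) -> float:
    """Load on the remote store when K identical jobs read concurrently."""
    return num_jobs * per_job_gbps


if __name__ == "__main__":
    size_kb = implied_image_size_kb(JOB_BANDWIDTH_GBPS, IMAGES_PER_SEC)
    print(f"implied average image size: {size_kb:.0f} KB")
    for k in (1, 4, 16):       # hypothetical values of K
        print(f"K={k}: {aggregate_bandwidth_gbps(k)} GB/s from the remote store")
```

Without a cache, every additional tuning job adds its full read rate to the shared store and network, which is the load the slide points at.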

Quiver: Key ideas

• Domain-specific intelligence at the caching layer

  • Substitutability – use existing contents of the cache to avoid thrashing

  • Hash-based content addressing for security

  • Co-designed with deep-learning framework (PyTorch)

  • Dynamically manages cache allocation

• Improves cluster throughput by up to 2.3x

Structure

• Introduction & Motivation

• Background

• Design

• Implementation

• Evaluation

Background: Deep Learning

• Learn a model to represent training data

• Iterate over random subsets of input data – Mini batch

• Perform stochastic gradient descent (SGD) on each mini-batch

• Process the entire dataset in random order – Epoch
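The access pattern described in these bullets can be sketched in plain Python. This is an illustrative loop, not PyTorch code: `epoch_batches`, `train`, and the `sgd_step` callback are hypothetical names, and the callback stands in for the real forward/backward pass.

```python
import random


def epoch_batches(num_samples, batch_size, rng):
    """Yield the mini-batches of one epoch: a fresh random order, each index once."""
    order = list(range(num_samples))
    rng.shuffle(order)
    for start in range(0, num_samples, batch_size):
        yield order[start:start + batch_size]


def train(num_samples, batch_size, num_epochs, sgd_step, seed=0):
    """Call `sgd_step` on every mini-batch of every epoch."""
    rng = random.Random(seed)
    for _ in range(num_epochs):
        for batch in epoch_batches(num_samples, batch_size, rng):
            sgd_step(batch)   # placeholder for loading the samples and updating weights
```

From the storage system's point of view, each epoch is one full pass over the dataset in an order that changes every time.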

A cache for deep learning training (DLT) jobs

• DLT datasets are accessed multiple times

  • Within same job: multiple epochs read the entire dataset

  • Across jobs: hyperparameter exploration, popular datasets (e.g. ImageNet)

• Good fit for caching

• Challenges

  • Random access within epoch => partial caching can cause thrashing (e.g. LRU)

  • Job heterogeneity => not all jobs benefit the same from caching

  • Secure inter-job data access

• Quiver: Use domain intelligence to address these challenges
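The thrashing challenge can be made concrete with a small simulation. This is a sketch under stated assumptions, not a measurement from the talk: it models a single job, an LRU cache that holds only part of the dataset, and a fresh random permutation per epoch; `lru_hit_rate` is a hypothetical helper.

```python
import random
from collections import OrderedDict


def lru_hit_rate(num_items, cache_size, num_epochs, seed=0):
    """Hit rate of an LRU cache when every epoch reads all items in a fresh random order."""
    rng = random.Random(seed)
    cache = OrderedDict()          # keys ordered from least to most recently used
    hits = accesses = 0
    for _ in range(num_epochs):
        order = list(range(num_items))
        rng.shuffle(order)
        for item in order:
            accesses += 1
            if item in cache:
                hits += 1
                cache.move_to_end(item)
            else:
                cache[item] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)   # evict the least recently used item
    return hits / accesses


if __name__ == "__main__":
    # Cache holds 20% of a 1000-item dataset (hypothetical sizes).
    print(lru_hit_rate(num_items=1000, cache_size=200, num_epochs=5))
```

Because each item is read once per epoch, only items left over from the previous epoch can hit, so the hit rate can never exceed the cached fraction; LRU tends to fall below even that, since new misses evict cached items before the job gets back to them.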

#1: Thrashing-proof partial caching

• Two I/O properties

  • Each input touched once in an epoch

  • Every mini-batch needs to be randomly sampled

• Substitutable hits

  • I/O is substitutable

  • Mini-batch sample order does not matter, as long as it is random

• Substitutability while sampling

  • Looks up more indices than the mini-batch needs and returns whatever is in the cache (substitutable hits)

[Figure: default sampling (1 hit, 2 misses) vs. Quiver sampling (3 hits, 6 misses)]
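A minimal sketch of substitutable sampling, assuming the sampler keeps a shuffled list of the indices not yet used in the current epoch. The function name, the `oversample` knob, and the fallback to misses are illustrative assumptions, not Quiver's actual code.

```python
import random


def substitutable_batch(pending, cached, batch_size, oversample=10):
    """Pick one mini-batch from `pending`, preferring indices already in the cache.

    pending: list of indices not yet used in this epoch, already shuffled.
    cached: set of indices currently held by the cache.
    oversample: how many times batch_size to look up (an assumed knob).
    Returns (batch, number_of_hits) and removes the chosen indices from `pending`.
    """
    window = pending[:batch_size * oversample]
    hits = [i for i in window if i in cached][:batch_size]
    chosen = set(hits)
    # If the cache cannot supply a full batch, fall back to misses from the window.
    misses = [i for i in window if i not in chosen][:batch_size - len(hits)]
    batch = hits + misses
    taken = set(batch)
    pending[:] = [i for i in pending if i not in taken]
    return batch, len(hits)


if __name__ == "__main__":
    rng = random.Random(0)
    pending = list(range(9))
    rng.shuffle(pending)
    # Nine lookups, three of them cached: the batch of 3 is served entirely from hits.
    batch, hits = substitutable_batch(pending, cached={0, 4, 8}, batch_size=3, oversample=3)
    print(batch, hits, pending)
```

Removing the chosen indices from `pending` keeps the property that each input is touched once per epoch; the six skipped indices stay pending and are used by later mini-batches.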

#2: Job heterogeneity and caching

• Benefit-aware caching to handle job heterogeneity

  • Time per mini-batch is an application-specific metric for performance

  • Allows cheap profiling to measure benefits from cache

• Predictability

  • Measure time per mini-batch with different caching modes

  • Given total space budget, the manager allocates cache per dataset
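One way to picture the allocation step is a greedy pass over profiled datasets. This is only a sketch: the slides do not spell out the manager's policy, so the ranking by time saved per GB, the `Profile` fields, and the numbers in the usage example are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    name: str
    size_gb: float   # dataset size
    t_miss: float    # measured mini-batch time with the cache bypassed (seconds)
    t_hit: float     # measured mini-batch time when served from the cache (seconds)


def allocate(profiles, budget_gb):
    """Greedy allocation by time saved per GB (an assumed policy, not Quiver's)."""
    ranked = sorted(profiles, key=lambda p: (p.t_miss - p.t_hit) / p.size_gb, reverse=True)
    allocation = {}
    for p in ranked:
        share = min(p.size_gb, budget_gb)   # partial allocations are allowed
        allocation[p.name] = share
        budget_gb -= share
    return allocation


if __name__ == "__main__":
    profiles = [                             # hypothetical profiling results
        Profile("A", size_gb=100.0, t_miss=1.0, t_hit=0.2),
        Profile("B", size_gb=400.0, t_miss=1.0, t_hit=0.6),
        Profile("C", size_gb=50.0, t_miss=0.5, t_hit=0.4),
    ]
    print(allocate(profiles, budget_gb=200.0))
```

Giving a dataset only part of its size is reasonable here because substitutable hits let a job benefit from a partially cached dataset.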

#3: Secure Inter-Job Data access

• Multiple jobs and users share cache

• Data needs reuse/sharing while retaining isolation

• Each file is addressed by its hash instead of its name

User1/imagenet/file.jpg -> hash(file.jpg)

User2/imgnt/file.jpg -> hash(file.jpg)
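Content addressing can be sketched with a toy in-memory cache. SHA-256 and the class below are illustrative assumptions; the slides only say that each file is addressed by its hash instead of its name.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Cache key derived from file contents only (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


class ContentAddressedCache:
    """Toy in-memory cache keyed by content hash instead of file path."""

    def __init__(self):
        self._store = {}

    def insert(self, data: bytes) -> str:
        key = content_key(data)
        self._store.setdefault(key, data)   # identical content is stored once
        return key

    def lookup(self, key: str):
        return self._store.get(key)

    def __len__(self):
        return len(self._store)


if __name__ == "__main__":
    cache = ContentAddressedCache()
    k1 = cache.insert(b"jpeg bytes")   # stands in for User1/imagenet/file.jpg
    k2 = cache.insert(b"jpeg bytes")   # stands in for User2/imgnt/file.jpg
    print(k1 == k2, len(cache))
```

Two users who hold the same file under different paths end up sharing one cache entry, while a job that does not know a file's hash has no key with which to request it.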

Architecture of Quiver

• Quiver cache server

• Quiver cache client co-designed with PyTorch

• Quiver cache manager

• Quiver instance types

  1. Entire cluster

  2. Each rack

[Figure: architecture diagram. Each PyTorch job runs a Quiver client that issues hash lookups and inserts to a Quiver server and handles cache misses. The Quiver cache manager performs mini-batch time probing for benefit-aware caching, sets the caching policy for datasets, and handles co-ordinated eviction. VM and container boundaries are marked.]

Cache Access

• Client is integrated with PyTorch data-layer

  • Fetches files from remote on misses

  • Populates the cache servers

  • Works with hash-digest file

  • Incorporates substitutable hits and co-operative miss handling

Hash digest and Partition

• Dataset is represented by a hash-digest

• Major components of an entry in the hash-file

  • <content_hash: file_location>

• Key space is partitioned across servers

[Figure: files F1 to F5 partitioned across cache server 1 and cache server 2]
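Partitioning the key space can be sketched as a pure function of the content hash. Modulo placement and the helper names are assumptions; the slides only state that the key space is partitioned across servers.

```python
def server_for(content_hash: str, num_servers: int) -> int:
    """Map a hex content hash to a server index (modulo placement is an assumption)."""
    return int(content_hash, 16) % num_servers


def partition(digest: dict, num_servers: int) -> dict:
    """Group digest entries <content_hash: file_location> by owning server."""
    shards = {server: {} for server in range(num_servers)}
    for content_hash, location in digest.items():
        shards[server_for(content_hash, num_servers)][content_hash] = location
    return shards


if __name__ == "__main__":
    digest = {"0a": "store/f1", "0b": "store/f2", "10": "store/f3"}   # hypothetical entries
    print(partition(digest, num_servers=2))
```

Because the owner is computed from the hash alone, every client can find the right server for a file without a central directory lookup.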

Co-operative miss handling

• Misses are sharded across jobs using the same dataset

  • Sharding is implicit by randomizing indices

  • Happens naturally in DLT access pattern

• Jobs benefit from other jobs as they progress
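The effect of implicit sharding can be illustrated with a small simulation. It assumes a shared cache large enough to hold the whole dataset and jobs that advance in lockstep; both are simplifications, and `simulate_coop` is a hypothetical helper.

```python
import random


def simulate_coop(num_items, num_jobs, seed=0):
    """Count remote fetches per job when jobs share one cache that fits the dataset."""
    rng = random.Random(seed)
    orders = []
    for _ in range(num_jobs):
        order = list(range(num_items))
        rng.shuffle(order)           # each job walks the dataset in its own random order
        orders.append(order)
    cache = set()
    fetches = [0] * num_jobs
    for step in range(num_items):    # jobs advance in lockstep, one item per step
        for job, order in enumerate(orders):
            item = order[step]
            if item not in cache:
                fetches[job] += 1    # miss: this job fetches from the remote store
                cache.add(item)      # and populates the shared cache for the others
    return fetches


if __name__ == "__main__":
    print(simulate_coop(num_items=1000, num_jobs=4))
```

Under these assumptions the remote fetches across all jobs add up to exactly one pass over the dataset, instead of one pass per job, with the work split among the jobs by their random orders.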

Co-ordinated eviction

• Dataset partition

  • Digest file is partitioned into a given number of chunks

• Double buffering of chunks

  • Chunks allow coordinated access of cache

• Co-ordinated eviction

  • Mark for eviction – no new refs

  • Then evict

  • Similar to UNIX unlink call

[Figure: double buffer of a cache server shared by jobs J1, J2 and J3, holding chunks C2 and C3]
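The mark-then-evict rule can be sketched with a reference-counted chunk. The `Chunk` class and its method names are hypothetical; the sketch only mirrors the unlink-like behaviour named on the slide, where a marked chunk accepts no new references and is freed once its last user lets go.

```python
class Chunk:
    """A cached dataset chunk with unlink-style deferred eviction."""

    def __init__(self, name):
        self.name = name
        self.refs = 0
        self.marked = False     # marked for eviction: no new references allowed
        self.evicted = False

    def acquire(self):
        """Take a reference; refused once the chunk is marked or gone."""
        if self.marked or self.evicted:
            return False
        self.refs += 1
        return True

    def release(self):
        """Drop a reference; the last release of a marked chunk evicts it."""
        if self.refs <= 0:
            raise RuntimeError("release without matching acquire")
        self.refs -= 1
        self._maybe_evict()

    def mark_for_eviction(self):
        self.marked = True
        self._maybe_evict()

    def _maybe_evict(self):
        if self.marked and self.refs == 0:
            self.evicted = True   # a real server would free the chunk's memory here


if __name__ == "__main__":
    c1 = Chunk("C1")
    c1.acquire()                  # job J1 is reading C1
    c1.mark_for_eviction()        # manager wants the space for another chunk
    print(c1.acquire())           # a new job cannot start on C1 and moves to another chunk
    c1.release()                  # J1 finishes; C1 is evicted now
    print(c1.evicted)
```

Jobs already reading a chunk finish undisturbed, while new jobs are steered to a chunk that will stay cached.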

Implementation

• Cache client (900 LOC)

  • Dataloader of PyTorch (v1.1.0)

  • Dataset of PyTorch

  • Sampler of PyTorch

• Cache server (1200 LOC)

  • A C++ key-value store

• Cache manager

  • A Python program

Evaluation Setup

• Cluster (48 GPUs)

  • 6 VMs with 4 NVIDIA P100 GPUs

  • 6 VMs with 4 NVIDIA P40 GPUs

• Workloads

  • Resnet50 on ImageNet dataset (154 GB)

  • Inception_V3 on OpenImages dataset (531 GB)

  • DeepSpeech2 on LibriSpeech dataset (90 GB)

Impact on accuracy

[Figure: Resnet50 on ImageNet; baseline and Quiver sampling give similar curves]

DeepSpeech2 on LibriSpeech

Config             Word Error Rate (WER)
Baseline Sampling  22.29
Quiver Sampling    22.32

Throughput increase with Quiver

[Figure: per-workload plots for Resnet50, InceptionV3 and DeepSpeech2]

Time for 7000 mini-batches (s)

Workload    Baseline  Quiver (HIT)   Quiver (CO-OP)
Resnet50    2505      646 (3.88x)    1064 (2.35x)
Inception   2874      1274 (2.26x)   1817 (1.58x)
DeepSpeech  1614      1234 (1.31x)   1265 (1.28x)
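The speedup factors in parentheses are the baseline time divided by the Quiver time. The short check below recomputes them from the times quoted on the slide; the dictionary layout and the `speedups` helper are illustrative.

```python
# Seconds for 7000 mini-batches, copied from the table on the slide.
TIMES = {
    "Resnet50":   {"baseline": 2505, "hit": 646,  "coop": 1064},
    "Inception":  {"baseline": 2874, "hit": 1274, "coop": 1817},
    "DeepSpeech": {"baseline": 1614, "hit": 1234, "coop": 1265},
}


def speedups(times):
    """Return baseline_time / quiver_time per workload and mode, rounded to 2 places."""
    return {
        workload: {mode: round(t["baseline"] / t[mode], 2) for mode in ("hit", "coop")}
        for workload, t in times.items()
    }


if __name__ == "__main__":
    for workload, result in speedups(TIMES).items():
        print(workload, result)
```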

Co-ordinated eviction in action

• 2 Chunks cached at a time

• New jobs start using 3rd chunk

[Figure: plot with a time axis in seconds]

Benefit aware caching

• Mixed workload – 12 different jobs

• Quiver preferentially allocates cache to different datasets

• Quiver yields sizeable benefits even with a tiny cache (100 GB)

• Improvement in cluster throughput ranges from 1.6x to 2.3x

Summary

• Quiver is a domain-specific storage cache for DLT jobs

• Utilizes I/O behavior of deep learning training jobs

  • Substitutable hits => new thrash-proof partial caching

  • Predictability => benefit-aware caching

• Improves cluster GPU utilization by reducing I/O wait time

• Implemented in PyTorch