CloudFabric 3.0 Hyper-Converged DCN


Post on 06-Apr-2022


Page 1: CloudFabric 3.0 Hyper-Converged DCN

Huawei Confidential

CloudFabric 3.0 Hyper-Converged DCN

Unleashing Computing Power with New Ethernet

Page 2: CloudFabric 3.0 Hyper-Converged DCN

DCN Connects General-Purpose Computing, Storage, and HPC Networks

Figure labels: General-Purpose Computing, Storage, HPC; Service network, Storage network, Computing network; DCN

Page 3: CloudFabric 3.0 Hyper-Converged DCN

Three IT Changes Drive DCN Towards New Ethernet

As-Is -> To-Be:

• IT architecture: Centralized -> Distributed (Ethernet in both). Server interconnection over Ethernet.
• Computing unit: PCIe is replaced (figure labels: PCIe, IB or Ethernet). CPU/GPU interconnection over Ethernet.
• Storage media: HDD -> SSD, SCSI -> NVMe, FC (32G) -> RoCE (400G). All-flash storage interconnection over Ethernet.

Figure callouts: Scale: 100x; Performance: 100x; Capacity: 1000x

Page 4: CloudFabric 3.0 Hyper-Converged DCN

DCN Is Facing Three Challenges to Evolve to New Ethernet

• Zero packet loss required for HPC: The packet loss rate increases exponentially with network nodes on a traditional Ethernet.
• Zero packet loss required for active-active storage: Delays increase during intra-city (long distance) transmission, and it is difficult to perform flow control across DCNs on a traditional Ethernet.
• More complex O&M of large-scale networks: Traditional Ethernet lacks effective O&M methods, and the network is too complex to be handled manually.

Figure labels: Packet loss rate: 0.02%, 0.15%, 0.2-0.3%; Number of nodes; DC A, DC B (>70 km); 1000 nodes, millions of configurations

Page 5: CloudFabric 3.0 Hyper-Converged DCN

Three Features of Next-Generation DCN

Lossless Ethernet
• Zero packet loss for local and long-distance transmission
• Convergence of computing and storage networks

Network-wide intelligent O&M
• Network-wide intelligent O&M of devices, interfaces, optical modules, networks, and services
• Predictive maintenance, ensuring zero service interruption

Full-lifecycle automation
• Automated network planning, construction, maintenance, and optimization
• Intent-driven network, enabling network servitization

Figure labels: Storage Cluster (Storage), Service Cluster (CPU), Computing Cluster (GPU); Ethernet; Finance, Large enterprise, Government

Page 6: CloudFabric 3.0 Hyper-Converged DCN

CloudFabric 3.0 Hyper-Converged Data Center Network Solutions

• Ethernet for HPC: Unleashes 100% of computing power.
• Ethernet for active-active storage: Reduces inter-DC links by 90%.
• Full-lifecycle automation: Reduces TTM by 90%.
• Network-wide intelligent O&M: Proactively predicts 90% of network faults.

Figure labels: Hyper-Converged DCN; General-purpose computing, Storage, HPC; Multi cloud; Planning, Construction, Maintenance, Optimization; Automation, Intelligence

Page 7: CloudFabric 3.0 Hyper-Converged DCN

Ethernet for HPC: Eliminates Ethernet Packet Loss and Unleashes 100% of Computing Power

Challenge: Ethernet Packet Loss Has Persisted for 40 Years

Why does packet loss occur on an Ethernet?
• N:1 traffic, exceeding the receive bandwidth
• More packets lost as the number of nodes increases
• Computing power is reduced by 50% if the packet loss rate is just 0.1%

Huawei: Intelligent Algorithms Introduced to Eliminate Ethernet Packet Loss

Unique iLossless-DCN algorithm achieves precise speed control
• Real-time traffic model
• Tens of millions of random samples
• Scenario auto-adaptation
• Scale auto-adaptation
• Zero packet loss at 100% throughput
• Computing power doubled with the network scale unchanged

Figure labels: Average packet loss rate (0, 0.1%); Network throughput; 50%, 100%, 2X; Huawei Solution; Traditional Ethernet
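The N:1 traffic pattern named on this slide can be illustrated with a toy queue model. The sketch below is not the iLossless-DCN algorithm, whose internals the slides do not describe. It only shows why packets are dropped once several senders converge on one receiver port with a finite buffer. The function name and every parameter value (sender rate, port rate, buffer size) are hypothetical.

```python
def incast_loss_rate(senders, sender_rate, port_rate, buffer_pkts, ticks):
    """Toy N:1 incast model: return the fraction of offered packets dropped.

    Each tick, every sender offers `sender_rate` packets to one egress port
    that drains `port_rate` packets per tick and can queue `buffer_pkts`.
    All parameters are hypothetical illustration values.
    """
    queue = 0
    offered = 0
    dropped = 0
    for _ in range(ticks):
        arriving = senders * sender_rate
        offered += arriving
        # Drain first, then enqueue; anything beyond the buffer is dropped.
        queue = max(0, queue - port_rate)
        free = buffer_pkts - queue
        accepted = min(arriving, free)
        dropped += arriving - accepted
        queue += accepted
    return dropped / offered if offered else 0.0


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        rate = incast_loss_rate(senders=n, sender_rate=10, port_rate=20,
                                buffer_pkts=100, ticks=1000)
        print(f"{n} senders -> loss rate {rate:.3f}")
```

In this model nothing is lost while the combined offered load stays at or below the port rate. Once it exceeds the port rate, the buffer fills and the loss rate rises with the number of senders, which matches the slide's point that more packets are lost as the number of nodes increases.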

Page 8: CloudFabric 3.0 Hyper-Converged DCN

Active-Active Ethernet Storage Network: Makes Breakthrough with Long-Distance Lossless Ethernet Algorithm and Reduces Links by 90%

Why can a traditional Ethernet not be used for cross-DC active-active storage?
• Requirement: Active-active storage, zero packet loss
• Actual: Traditional Ethernet, > 0.2% packet loss over long-distance transmission
• The RTT for 70 km intra-city transmission reaches up to 1 ms. The iLossless-DCN algorithm cannot ensure zero packet loss over such a long-distance transmission.
• Three-dimensional iLossless algorithm becomes invalid in long-distance transmission. One more dimension, 100x difficulty.

Figure labels: Service requirement, Traffic model, Network status, + Spatiotemporal variable (distance/delay/jitter)

vs

Lossless algorithm, achieving zero packet loss for a 70 km long-distance transmission on an Ethernet
• iLossless-DCI algorithm achieves zero packet loss over long-distance transmission
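A rough check on the 1 ms RTT figure quoted for 70 km: assuming light travels through fiber at about 200,000 km/s (roughly 5 µs per km, a common rule of thumb and not a number from the slides), propagation alone accounts for about 0.7 ms of round-trip time, with switching and queuing delay on top. The sketch also computes the bandwidth-delay product, that is, how much data a sender has already put on the wire before any feedback from the far end can arrive. The function names are illustrative only.

```python
FIBER_KM_PER_S = 200_000.0  # assumed propagation speed in fiber (about 2/3 of c)


def propagation_rtt_ms(distance_km):
    """Round-trip propagation delay over fiber, in milliseconds."""
    return 2 * distance_km / FIBER_KM_PER_S * 1000


def in_flight_megabytes(link_gbps, rtt_ms):
    """Bandwidth-delay product: data sent during one RTT, in megabytes."""
    bits = link_gbps * 1e9 * (rtt_ms / 1000)
    return bits / 8 / 1e6


if __name__ == "__main__":
    print(f"Propagation RTT over 70 km: {propagation_rtt_ms(70):.2f} ms")
    print(f"In flight on 100GE at 1 ms RTT: {in_flight_megabytes(100, 1.0):.1f} MB")
```

At a 1 ms RTT, a single 100GE link carries about 12.5 MB before the sender can react to a congestion signal. That delay is what makes flow control over long distances harder than inside one data center.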

Helping Bank C implement cross-DC active-active storage by using 10 100GE lossless Ethernet links to replace 100+ FC links, reducing links by 90%+
• 100+ 8 Gbit/s FC links -> 10 100GE links (figure labels: 8G*128, 100G*10)
• Annual saving: CNY 20+ million

Figure labels: Active DC, Intra-city A-A DC (shown twice); Huawei switch
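The Bank C figures can be checked with simple arithmetic, taking the "8G*128" and "100G*10" labels at face value as link counts and nominal per-link rates. The helper names below are illustrative only.

```python
def aggregate_gbps(links, gbps_per_link):
    """Nominal aggregate bandwidth of a link bundle, in Gbit/s."""
    return links * gbps_per_link


def link_reduction(before, after):
    """Fraction of links removed when going from `before` to `after` links."""
    return (before - after) / before


if __name__ == "__main__":
    fc_total = aggregate_gbps(128, 8)     # 128 FC links at 8 Gbit/s
    eth_total = aggregate_gbps(10, 100)   # 10 Ethernet links at 100 Gbit/s
    print(f"FC: {fc_total} Gbit/s, Ethernet: {eth_total} Gbit/s")
    print(f"Link reduction: {link_reduction(128, 10):.1%}")
```

The two bundles offer nearly the same nominal capacity (1024 Gbit/s against 1000 Gbit/s), while the link count falls by about 92%, consistent with the "90%+" claim.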

Page 9: CloudFabric 3.0 Hyper-Converged DCN

Full-Lifecycle Automation: Implements NaaS and Second-level Service Provisioning

Challenge: efficiency bottleneck of the two-dimensional linear mode

Two-dimensional linear design
• Multiple persons from different departments take 5 days to review design plans.
• Linear design of 10+ metrics, leading to low efficiency

Two-dimensional linear verification
• Verification scripts run for 3 hours for 10,000 nodes
• N² verifications take a long time

Process: Planning and design -> Expert review -> Configuration delivery -> Result verification

Semi-automated network deployment, implementing service changes in 5–8 days

Huawei's three-dimensional modeling: 1000x↑ efficiency, intelligent design and verification

Three-dimensional parallel evaluation
• Multimodal evaluation of 400+ factors
• Network solution generation and recommendation in seconds

Three-dimensional overlay verification
• Verify millions of configurations in seconds
• Computing amount for the three-dimensional mode decreases exponentially

Process: Solution recommendation -> Pre-event simulation -> Configuration delivery -> Post-event verification

E2E fully intelligent orchestration, provisioning services in seconds

Figure labels: Resilience, Reliability, Availability, Resource, Security; Optimal resources, Optimal quality, Optimal reliability; Flow, Device, Topology
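The slide does not define what one of the "N² verifications" is. One plausible reading, assumed here, is a reachability check for every ordered source/destination pair of nodes. Under that assumption, the sketch below shows how quickly the work grows and what rate a script would need to sustain to finish 10,000 nodes in the quoted 3 hours. The function names are hypothetical.

```python
def pairwise_checks(nodes):
    """Ordered source/destination pairs to verify, excluding self-pairs."""
    return nodes * (nodes - 1)


def required_checks_per_second(nodes, hours):
    """Rate needed to finish all pairwise checks within `hours`."""
    return pairwise_checks(nodes) / (hours * 3600)


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(f"{n} nodes -> {pairwise_checks(n):,} checks")
    rate = required_checks_per_second(10_000, 3)
    print(f"10,000 nodes in 3 hours needs about {rate:,.0f} checks per second")
```

Every tenfold increase in nodes multiplies the number of checks by roughly one hundred, which is why a verification approach that is acceptable at hundreds of nodes takes hours at 10,000.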

Page 10: CloudFabric 3.0 Hyper-Converged DCN

Network-Wide Intelligent O&M: Implements Fault Self-Healing and Ensures Always-On Services

The complexity of the network exceeds individual intelligence, leading to passive responses to faults.
• Complex relationships: 1000+ discrete KPIs and 100,000 configurations
• Rapid status changes: Burst of millions of flows in milliseconds
• Slow perception: Data collected every 5 minutes
• Slow locating: 76 minutes required for node-by-node troubleshooting
• Heavy loss: $600,000 every 5 minutes of network interruption

Expert experience-dependent passive O&M, locating faults in 76 minutes

Huawei's knowledge graph increases the fault prediction rate (0 to 90%).

Knowledge Graph (four steps, labeled 01 to 04 in the figure)
• Big data collection: 30+ years of O&M experience; tens of millions of data samples
• Data mining and modeling: Creative propagation relationship learning algorithm; mining of 1000+ relationships and exclusive 5-dimensional modeling
• Knowledge graph training: Graph embedding fuzzy fault inference algorithm, automatically inferring 90% of network faults
• Online self-learning: Incremental training on live-network data; proactive prediction of up to 97% of faults

Results
• Rapid perception: Real-time network health evaluation
• Precise prediction: 90% of risks predicted
• Rapid recovery: A risk removed in 5 minutes

Proactive O&M, predicting 90% of network risks and eliminating a risk in 5 minutes

Page 11: CloudFabric 3.0 Hyper-Converged DCN

Products of CloudFabric 3.0 Hyper-Converged Data Center Network

Switch categories: GE ToR Switch, 10GE ToR Switch, 10GE Big Buffer ToR Switch, 25GE ToR Switch, 100GE Switch, Flexible Card Switch

Models:
• CloudEngine 16800
• CloudEngine 9860-4C-EI
• CloudEngine 8861-4C-EI
• CloudEngine 8850-64CQ-EI
• CloudEngine 8850-SAN
• CloudEngine 6863-48S8CQ-EI
• CloudEngine 6860-SAN
• CloudEngine 6881-48T6CQ
• CloudEngine 6881-48S6CQ
• CloudEngine 6820-48S6CQ
• CloudEngine 6870-48S6CQ-EI
• CloudEngine 6870-48T6CQ-EI
• CloudEngine 5882-48T4S
• CloudEngine 5855-48T4S2Q-EI

Page 12: CloudFabric 3.0 Hyper-Converged DCN

Huawei DCN solution has been widely deployed in 12,000+ DCs

• 42,000+ flagship core switches
• 1400+ iMaster NCE
• No. 1 market share in China for four consecutive years (IDC 2020)
• No. 1 globally in 10GE+25GE shipments (Gartner 2020)
• Fastest growth in the global market (IDC 2020)

Awards and recognition
• CloudEngine 16800: F&S Global Technical Leadership Award with the highest score
• CloudFabric: Gartner ‘Customer choice’ 3 consecutive times
• CloudEngine 16800: Sullivan recommended network products for China's new infrastructure
• CloudEngine 16800: Interop Gold Medal
• CloudEngine 16800: ODCC 400GE Data Center Switch Outstanding Product Award
• Gartner Exclusive Challenger
• Forrester Leader

Page 13: CloudFabric 3.0 Hyper-Converged DCN

Copyright © 2021 Huawei Technologies Co., Ltd. All Rights Reserved.

The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.

Bring digital to every person, home and organization for a fully connected, intelligent world.