The Knights Landing Xeon Phi Rollout Begins

November 16, 2015 Timothy Prickett Morgan

The natural place for Intel to launch the next iteration of its

“Knights” family of parallel X86 processors is at one of the

two major supercomputer conferences that are hosted each

year, which are the ISC conference in Germany and the SC

conference in the United States. Many people had hoped for

the “Knights Landing” Xeon Phi processors, which have been

anticipated for a few years now, at this week’s SC15

conference in Austin, Texas. But Intel is not yet ready, so we

have to wait.

Well, not everybody has to wait. As Charles Wuischpard, general manager of the HPC Platform

Group within Intel’s Data Center Group, explained in a prebriefing ahead of the SC15

conference, some early adopter customers are getting their hands on early versions of the

Knights Landing chips, even though they have not been formally announced. This has been

Intel’s practice for a number of years for both its Xeon and Xeon Phi compute engines, with

hyperscalers or HPC shops (or sometimes both) getting to play with early silicon.

“We are making great progress and meeting some of the commitments that we made,”

explained Wuischpard. “Part of this is just getting silicon out and into the hands of our

partners and our early users.”

Three sites have pre-production, Top500-class supercomputer systems installed. Cray has to

have a large system as a precursor to delivering the “Cori” system at the National Energy

Research Scientific Computing Center (NERSC) and the “Trinity” system being shared by Los

Alamos National Laboratory and Sandia National Laboratories. The Cori and Trinity machines

will be installed in the first quarter of 2016, and according to Wuischpard, a large Xeon

Phi machine is up and running in Cray’s lab testing a suite of applications “at full

functionality.”

Cray was awarded a $174 million contract back in July 2014 to build Trinity, which is a joint

effort between the Alliance for Computing at Extreme Scale (ACES) at Los Alamos National Laboratory and Sandia National Laboratories as part of the National Nuclear Security

Administration’s Advanced Simulation and Computing Program (ASC). Trinity is using a mix

of “Haswell” Xeon E5 v3 and Knights Landing Xeon Phi processors for its compute elements

and will be installed at Los Alamos. The Trinity system is based on Cray’s next-generation

“Shasta” XC40 system, and is expected to have 9,346 dual-socket Xeon E5 v3 nodes and over

9,000 Knights Landing nodes, with more than 2 PB of DDR4 main memory and 42.2 petaflops

of aggregate peak performance across its compute elements. About 30.7 petaflops of that

compute should be coming from Knights Landing, and the remaining 11.5 petaflops come from the Haswell Xeons.
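The split quoted above can be sanity-checked with a little arithmetic. The sketch below is ours, not Cray's or Intel's: it treats "over 9,000" Knights Landing nodes as exactly 9,000 (an assumption), so the per-node figure it prints for Knights Landing is an upper bound on the average rather than a published specification.

```python
# Figures quoted in the article for Trinity. "Over 9,000" Knights Landing
# nodes is treated as exactly 9,000 here, which is an assumption.
KNL_PEAK_PF = 30.7
HASWELL_PEAK_PF = 11.5
KNL_NODES = 9_000        # assumed lower bound on the node count
HASWELL_NODES = 9_346


def per_node_teraflops(peak_petaflops: float, nodes: int) -> float:
    """Average peak teraflops per node (1 petaflops = 1,000 teraflops)."""
    return peak_petaflops * 1_000 / nodes


total_pf = KNL_PEAK_PF + HASWELL_PEAK_PF
knl_tf = per_node_teraflops(KNL_PEAK_PF, KNL_NODES)
haswell_tf = per_node_teraflops(HASWELL_PEAK_PF, HASWELL_NODES)

print(f"Total peak: {total_pf:.1f} PF")
print(f"Knights Landing node: at most about {knl_tf:.2f} TF")
print(f"Haswell node: about {haswell_tf:.2f} TF")
```

The two parts add up to the 42.2 petaflops aggregate, and the implied averages are roughly 3.4 teraflops per Knights Landing node and 1.2 teraflops per two-socket Haswell node.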

Trinity will use the “Aries” interconnect created by Cray and acquired by Intel several years

ago, and will have 82 PB in its parallel file system with 1.6 TB/sec of system bandwidth. The

system has a DataWarp burst buffer from Cray that weighs in at 3.65 PB and that delivers 3.28

TB/sec of sustained bandwidth, and the entire system should consume under 10 megawatts of

juice. Los Alamos started taking delivery of the Xeon E5 nodes back in February. Los Alamos

and Sandia expect Trinity to deliver at least eight times the performance on its applications

compared to the “Cielo” supercomputer currently at Los Alamos, which was installed by Cray

in 2011 and delivers a peak performance of 1.37 petaflops across its 143,104 AMD Opteron cores and “Gemini” interconnect.

Cray won the $70 million contract to build the Cori system at NERSC back in April 2014, and

like Trinity, it is a hybrid Cray XC40 machine that mixes and matches Haswell Xeon E5 nodes

and Knights Landing Xeon Phi nodes. The Haswell Xeon E5 portion of the machine has already

been installed and has 1,630 nodes with a total of 52,160 cores with 203 TB of aggregate main

memory on the nodes and 28 PB of scratch storage with more than 700 GB/sec of peak I/O

bandwidth. The Cori phase 1 machine has a 750 TB burst buffer based on non-volatile

memory that delivers 750 GB/sec of I/O bandwidth. The Aries interconnect linking the

Haswell nodes, which uses a dragonfly topology, delivers 5.6 TB/sec of global bandwidth in

the current Cori phase 1 configuration. By the summer of 2016, Cori will be augmented with

9,304 Knights Landing Xeon Phi processor nodes and have 1,920 Xeon E5 nodes, and 384 burst buffer nodes. The whole thing will fit in 64 cabinets, and oddly enough, we have not seen a


peak number-crunching performance figure for the machine published anywhere, but it

should be somewhere around 34 petaflops based on the performance figures given for Trinity

and the specs given for Cori.
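For readers who want to see how an estimate of this kind can be reached, here is a minimal sketch. It assumes that Cori's nodes deliver the same average peak per node as Trinity's, and it again treats Trinity's "over 9,000" Knights Landing nodes as exactly 9,000. Both are our assumptions for illustration, not published figures, and the function name is hypothetical.

```python
# Trinity figures from the article; 9,000 Knights Landing nodes is an
# assumed stand-in for "over 9,000".
TRINITY = {"knl_pf": 30.7, "knl_nodes": 9_000,
           "haswell_pf": 11.5, "haswell_nodes": 9_346}

# Cori's planned node counts from the article.
CORI = {"knl_nodes": 9_304, "haswell_nodes": 1_920}


def estimate_peak_pf(reference: dict, target: dict) -> float:
    """Scale the reference machine's per-node peak to the target's node counts."""
    knl_per_node = reference["knl_pf"] / reference["knl_nodes"]
    haswell_per_node = reference["haswell_pf"] / reference["haswell_nodes"]
    return (knl_per_node * target["knl_nodes"]
            + haswell_per_node * target["haswell_nodes"])


cori_pf = estimate_peak_pf(TRINITY, CORI)
print(f"Estimated Cori peak: {cori_pf:.1f} PF")
```

Under these assumptions the result lands at about 34 petaflops, nearly all of it from the Knights Landing nodes. A different processor SKU in either machine would shift the number.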

In addition to Cray, the Bull systems unit of French systems integrator Atos has also received

early versions of the Knights Landing Xeon Phi chips from Intel for the foundational work for

the Tera 1000 system that Atos is building for the Commissariat à l’énergie atomique et aux énergies alternatives, or CEA, which is the abbreviation for the French Alternative Energies

and Atomic Energy Commission. The first phase of the Tera 1000 machine was installed last

week ahead of the SC15 supercomputing event, and it includes a mix of Xeon E5 v3 nodes and

pre-production Knights Landing Xeon Phi nodes; this first phase of the Tera 1000 machine is

expected to have about twice the peak performance of the current Tera 100 machine, which

has 4,730 two-socket Xeon E5 v1 nodes linked by QDR InfiniBand running at 40 Gb/sec and

delivering 1.25 petaflops peak. In mid-2016, Atos will roll out its own Bull Exascale

Interconnect for the Tera 1000 machine, and in 2017 phase two will launch with more than

8,000 Knights Landing Xeon Phi processors added to the complex. The final configuration is

expected to have in excess of 25 petaflops of performance.

The third facility that is getting early access to the Knights Landing chips is Sandia National

Laboratories, which has a bunch of machines with earlier generations of Xeon Phi

coprocessors and which is working with Penguin Computing on the machines. This particular

test machine is a Xeon Phi rack based on Intel’s new Omni-Path interconnect,

which Sandia is using to test its codes.

At the ISC15 conference in Germany last summer, Intel had said to expect first commercial

shipments of the Knights Landing chips before the end of the year, but made no promises

about when it would actually do the launch. The company has been doing a rolling thunder

release of features for both the Knights Landing Xeon Phi and Omni-Path for the past year, and never promised to do the Knights Landing launch at the SC15 supercomputing conference,

but that is clearly where many had expected it. With over 8 billion transistors and using Intel’s

latest 14 nanometer technologies, it is fair to guess that Intel is working on getting the yields up

before it commits to general availability, and this is the gating factor to the formal

announcement for the Knights Landing chips.

Back at ISC15, Intel confirmed that the Knights Landing chip would have 72 cores, and that it

would come in a variant that plugged into a socket, another one that plugged into the socket

with dual integrated Omni-Path interfaces, and a third that would be packaged as a

PCI-Express coprocessor card like current Xeon Phi accelerators. Then a month later, at the Hot Chips 27 conference, we learned that the Knights Landing chip actually has 76 cores, with four

being spares, which are there to help with yields and which might eventually be activated for

compute. We are not going to review all of the feeds and speeds of the Knights Landing design

here, but suffice it to say that it is a sophisticated processor and one of the largest circuits that

Intel has ever made, so it is no surprise to us that it is taking a bit longer to get it to market.


“The general availability is still expected in the first half of 2016,” confirmed Wuischpard. “The

thing that we are wrestling with is that we have actually got our production volumes in the

factory right now for all of the first deliveries and we have quite a bit to deliver even pre-GA.

So we are going to have an early ship program and we have already got a number of orders

against that, and we expect the GA with more than 50 system providers. And when we look at

the application suite that really needs to be tuned and optimized, there are about 80 to 100 that

support the majority of the workloads in the HPC segment. We have got active collaborations

there.”

Intel is also going to be creating a single-socket Xeon Phi workstation, with the appropriate

main memory and PCI-Express peripheral slots, that it will make available to developers so

they can port and test their code without trying to gain access on an early adopter machine

like the ones mentioned above and the ones that will no doubt follow in the first part of 2016

ahead of and after the official Knights Landing launch. This development machine will not be a

server with Xeon processors and Xeon Phi coprocessors, but a real workstation with all of the

software and developer tools needed to port and test code.

Given that the Knights Landing implementation is available as a standalone processor as well

as a coprocessor, you might be thinking that Intel expects that a lot of the machinery built using

Knights Landing will be a mix of Xeon and Xeon Phi systems clustered together and working

side-by-side but not with the Xeon Phi being linked as a coprocessor to the Xeons – what the

Trinity, Cori, and Tera 1000 systems above look like to our eye and what the Tianhe-2A

supercomputer does not.

“You have the duality of running in a coprocessor mode through a PCI-Express connected

device or running in a true native bootable mode, and what we have seen is that by far the

larger interest is in native mode processing,” said Wuischpard in regards to the Knights

Landing chip. “If you look at the large supercomputing that have been announced such as

Trinity, Cori, and on and on, they are almost exclusively made up of Xeon Phi processors

running in native mode. If you look at the HPC industry going forward, I think that you are

going to see people that run Xeon because it keeps getting better and better, and people that

will decide to run in a sort of mixed mode with half of the nodes being Xeon and half being

Xeon Phi, and then there will be those who will say based on their workloads that it will be

best to run 100 percent on Xeon Phi. Even within that, I think the need for coprocessors will

diminish over time as you will be able to achieve various levels of performance and compute

density in the various configurations mentioned above.”

We happen to think – and have been saying all along – that the uptake for Xeon Phi could be


stronger than Intel originally anticipated, and that may be another reason that it is coming to

market a little bit later than many had expected. Intel might see strong demand and want to

meet it. (This is a relative measure in a world that consumes roughly 20 million Xeon

processors a year, of course. The three big supers mentioned above have on the order of

27,000 Xeon Phi chips in them.)
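To put those two numbers side by side, a quick tally helps. The sketch below uses the node counts quoted earlier for Trinity, Cori, and Tera 1000. Two of them are lower bounds ("over 9,000" and "more than 8,000"), and one Xeon Phi chip per node is our assumption.

```python
# Knights Landing node counts quoted in the article. Two of the three are
# lower bounds, and one chip per node is an assumption.
xeon_phi_nodes = {"Trinity": 9_000, "Cori": 9_304, "Tera 1000": 8_000}
XEONS_PER_YEAR = 20_000_000  # "roughly 20 million" per the article

floor_total = sum(xeon_phi_nodes.values())
share_of_xeon_volume = 27_000 / XEONS_PER_YEAR

print(f"At least {floor_total:,} Xeon Phi chips across the three machines")
print(f"27,000 chips is {share_of_xeon_volume:.3%} of annual Xeon volume")
```

The floor comes to 26,304 chips, consistent with "on the order of 27,000", and that is a little over a tenth of a percent of a year's Xeon shipments.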

Xeon Phi performance is going to drive demand if Intel gets the price right and can get sufficient yields on this monster chip using its 14 nanometer process.

Back in August, we showed off some performance benchmarks that compared a single Xeon

Phi with 72 cores tested against a two-socket Xeon E5 v3 server using ten-core E5-2697

processors on a variety of raw processor and application workload benchmarks. The single

Xeon Phi has about 2.5X the peak raw double-precision teraflops as a pair of Xeon E5s, and

also can run the AlexNet neural network training algorithm about 2.5X as fast and the

STREAM memory bandwidth test about 3.5X as fast. If you adjust this for performance per

watt, the gap is even larger.

If Intel prices the Knights Landing chip very aggressively – making it less than twice the price of that Xeon E5 mentioned above, for instance – the uptake could be quite large indeed. In fact,

we suspect that there are some hyperscalers that are early testers of the Knights Landing chips

right now, even though Intel did not mention that, and that Intel is taking time to work out

how to price the future “Broadwell” Xeons and Knights Landing Xeon Phis to present their true

value while remaining competitive with GPUs, FPGAs and other acceleration technologies.
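The pricing argument can be made concrete with a small sketch. Intel has not announced prices, so the price ratio below is purely hypothetical. The speedups are the benchmark ratios cited above for one Xeon Phi against a pair of Xeon E5s. Only chip prices are counted, which ignores memory, chassis, and power.

```python
def perf_per_dollar_advantage(speedup: float, price_ratio: float) -> float:
    """Relative performance per dollar of option A over option B.

    speedup: A's performance divided by B's.
    price_ratio: A's price divided by B's.
    """
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    return speedup / price_ratio


# Speedups quoted in the article: one Xeon Phi versus a pair of Xeon E5s.
speedups = {"peak DP flops": 2.5, "AlexNet training": 2.5, "STREAM": 3.5}

# Hypothetical pricing: a Xeon Phi at twice the price of ONE Xeon E5 costs
# the same as the pair it is compared against, so price_ratio is 1.0.
for name, speedup in speedups.items():
    advantage = perf_per_dollar_advantage(speedup, price_ratio=1.0)
    print(f"{name}: {advantage:.2f}x performance per dollar")
```

At that hypothetical price point the Xeon Phi keeps its full 2.5X to 3.5X edge per dollar of processor. Any price below twice that of a single Xeon E5 only widens it.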
