Stephen Dart, LaRDS Service Manager, Monash e-Research Centre
LaRDS Staging Post: Enhancing Workgroup Productivity
Managing User Expectations
In a perfect world
• Dedicated wire
• 1 Gb/s
• 125 MB per second
• 7.5 GB per minute
• 450 GB per hour
• 10 TB per day
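The "perfect world" figures above are straight unit conversion from a 1 Gb/s wire; a quick sketch of the arithmetic (integer shell math, so the 7.5 GB/minute figure rounds down):

```shell
# 1 Gb/s = 1000 Mb/s; 8 bits per byte
mb_per_sec=$((1000 / 8))                          # 125 MB/s
gb_per_min=$((mb_per_sec * 60 / 1000))            # 7 GB/min (7.5 truncated)
gb_per_hour=$((mb_per_sec * 3600 / 1000))         # 450 GB/hour
tb_per_day=$((mb_per_sec * 86400 / 1000 / 1000))  # 10 TB/day
echo "$mb_per_sec MB/s, $gb_per_hour GB/h, $tb_per_day TB/day"
```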
In reality, inconsistency
• Slow speed
• 3~30 MB per second
• Workstation, Server or LaRDS?
• Share Hangs or Disconnects
• Please Explain!
Network at the Edge
Complications at the Core
Current LaRDS Samba service
- LaRDS Samba service for workgroup file sharing
- End-user experience is speed limitations
- Not suited for workstation backup
- Not suited for bulk upload
- Oversubscribed disk is pushed to tape
- Something faster, please
Many factors make things run slowly
• Current situation
– LaRDS Samba based on a virtual server
– Workstations at the edge of the network
– Network bandwidth contention getting to LaRDS
Current ARMI workstation service
- Single network port per workstation
- 1 Gb/s bit rate on port
- Effective throughput peaks below 10%
- Common network switch for the whole floor
- Can handle many point-to-point transfers within the floor
- Must share floor bandwidth to the building switch
- Common network switch for the building
- Must share building bandwidth to the precinct switch
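The sub-10% effective throughput follows from contention: workstations on a floor share one 1 Gb/s uplink (~125 MB/s raw) toward the building and precinct switches. A rough sketch, with illustrative (not measured) user counts:

```shell
# Fair-share upper bound per user when n transfers contend for one
# 1 Gb/s uplink; protocol overhead pushes real figures lower still.
uplink_mb_s=125
for n in 4 10 40; do
  echo "$n concurrent transfers -> ~$((uplink_mb_s / n)) MB/s each"
done
```

The resulting ~3 to ~31 MB/s per user brackets the observed "3~30 MB per second" range.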
What can be done now
- Provide a local data service for workstations
- Install a Staging Post on the same switch as users
  - Bypass VeRA for uploads and backup
- Increase bandwidth between the floor switch and the precinct router
  - Extra floor and building uplinks
  - Faster links between switches
What can be done now
- Offload the big data as quickly as possible
  - To a local cache that can be used as a working share
  - Sync the data on a daily basis with LaRDS
Something still not right
• NAS on same switch and subnet as workstation
• One session ok, but second session kills first!
• Network engineers insist NAS too slow and dropping packets
• Serious detective work starts
Network Engineers in Denial
• Network bandwidth to NDT server
– http://ndt.its.monash.edu.au/toolkit/
• Network bandwidth to Speedtest.net
– http://www.speedtest.net/
• Network Weather Map all clear
– http://cacti.its.monash.edu.au/cacti/weathermap/weathermap.html
– Low utilization and no errors
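Checks like these can also be run point-to-point between workstation and NAS with iperf, which takes the web tools out of the picture (the host name is a placeholder, and iperf is assumed to be installed on both ends):

```shell
iperf -s                                     # on the NAS
iperf -c nas.example.monash.edu -t 30 -P 4   # on the workstation, 4 parallel TCP streams
# Starting a second client while the first runs reproduces the
# "second session kills the first" symptom if QoS is misclassifying traffic.
```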
QoS policy set at default for VoIP
Research networks generate data at the edge for upload to the core
Traditional Corporate Intranet
Research and Instrumentation Intranet
Tackle System Integration
• Rethink QoS
– Trial with QoS off (unmanaged)
– Open call with Cisco
– TCP/IP behaviour
– Get Network Engineers trained in QoS
• Make sure the NAS is connected to AD
– VeRA Samba was not AD-connected
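For reference, the core of an AD-connected Samba configuration is only a few smb.conf lines (the realm and workgroup names below are placeholders); the one-off join is then done with `net ads join` and verified with `net ads testjoin`:

```ini
[global]
    security = ads
    realm = EXAMPLE.MONASH.EDU
    workgroup = MONASH
```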
Updated QoS rolled out to all switches
Five Size Options for Staging Post

Staging Post    Capacity                User load    NIC Speed   Cost
QNAP-509Pro     5 x 1.5TB (6TB RAID5)   ~10 users                $2,500
QNAP-809Pro     8 x 2TB (12TB RAID5)    ~20 users    1Gb/s       $4,300
QNAP-859URP     8 x 3TB (18TB RAID6)    ~30 users    1Gb/s       $4,750
(rackmounted)
QNAP-1279U-RP   12 x 3TB (30TB RAID6)   ~50 users    10Gb/s      $8,000
SGI ISS-3500    24 x 2TB (40TB RAID6)   ~100 users   10Gb/s      $25,000
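The usable-capacity figures in the table follow from the RAID level: RAID5 sacrifices one disk to parity, RAID6 two. A small sketch of the arithmetic (rows whose listed figure comes out lower than this raw calculation, such as the 809Pro's 12TB and the ISS-3500's 40TB, presumably also reserve hot spares):

```shell
raid_usable() {  # raid_usable <disks> <tb_per_disk> <parity_disks>
  awk -v n="$1" -v s="$2" -v p="$3" 'BEGIN { print (n - p) * s }'
}
raid_usable 5 1.5 1    # QNAP-509Pro,   RAID5 -> 6 (TB)
raid_usable 8 3 2      # QNAP-859URP,   RAID6 -> 18
raid_usable 12 3 2     # QNAP-1279U-RP, RAID6 -> 30
```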
Rearrange existing disk usage
• Provide two file systems to match usage
• Working data sets (fast, local disk)
– Online now, used often, interim results
• Archive data sets (deep, NFS to DMF)
– Step or phase completion
– Reference for future work
– Storage object as a group of files
– Publication and citation
Integrate with Grid Access
• Grid users using DMF for home folders
– Grid processes flooding DMF shares
– Many small files gone by the time they hit the front of the migration queue
– DMF recalls stall Grid jobs
• Provide non-DMF Grid scratch
– Don’t back it up
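A non-DMF scratch export might look like this in /etc/exports (the path and host pattern are assumptions); because it sits on plain local disk, there is no migration queue and no recall to stall a job:

```
# /etc/exports: non-DMF Grid scratch, excluded from backup
/scratch  gridnode*.example.monash.edu(rw,async,no_subtree_check)
```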
Outstanding Issues
• Speeding up other VMs without hardware scale out
• Presenting Samba users with an indication of offline status
• User Indoctrination
Questions