FAST: Flexible Automated Synchronization Transfer

Rosa Filgueira (University of Edinburgh)
Iraklis Klampanos (University of Edinburgh)
Yusuke Tanimura (AIST, Tsukuba)
Malcolm Atkinson (University of Edinburgh)




Index

Introduction
◦ Problem description
◦ Hypothesis
◦ Rock Physics laboratory experiments
◦ Objective
◦ Proposal

Related developments
◦ Data transfer protocols
◦ Data transport systems

FAST
◦ Selecting the best data transfer protocol
◦ Data transfer experiments
◦ Implementation and evaluation

Future work and Questions


Problem description

Large number of rock physics (RP) laboratories
◦ Run many experiments (experimentalists)

Large number of rock physicists
◦ Develop computational codes (code builders)

Sharing experimental data among this community is still in its early days
◦ No facilities to transfer experimental data automatically, in real time, together with its associated description (metadata)


Problem description

Several tools provide reliable, high-performance data transfer capabilities
◦ For example, Dropbox or Globus Online

They are not optimized for the RP requirements


Hypothesis

The RP community will benefit from a tool that
◦ Transfers data and metadata in near-real time
◦ Provides a repository and DB accessible from a website

For experimentalists
◦ Collection and comparison of experiments from many labs

For code builders
◦ Find test data for running their models


Laboratory experiments features-I

Laboratory rock property measurements
◦ Properties of the rock sample are studied under different conditions

High-pressure vessels apply pore pressures and stresses to a cylindrical rock sample

Until the sample fails, different features (e.g. stress, porosity, temperature) are recorded at several time intervals

In each interval, data is transferred to a local computer (over a channel, 1 channel per rock)


RP laboratory experiment

[Figures: pressure vessel; UCL RP laboratory; rock samples]


Complex laboratory experiment: Creep-2

Initial target: 30 months
Deployed under the sea (Mediterranean)
8 rock samples with different features
Different time intervals and data sizes


Laboratory experiments features-II

Each experiment can record data differently
◦ Events can be written to a new file or appended to an existing one
◦ Files can be stored in the same directory or not
◦ Intervals for writing data can be short or long
◦ The number of rock samples can be one or several
◦ The duration of an experiment can be short or long

Transferring the data is a data-intensive problem


Objective

To transfer RP experimental data from one location to another
◦ Automated data transfer until the end of the experiment
    Transfer of experimental data: near real time and non-real time
    Synchronization: incremental (file) and directory
◦ Handle possible interruptions and failures
◦ Record and transfer the metadata
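
Incremental (file) synchronization can be illustrated with a minimal sketch, assuming the experiment appends events to a single growing file. The names `read_new_bytes` and `demo`, the file name, and the offset bookkeeping are hypothetical and are not taken from FAST itself.

```python
import os
import tempfile


def read_new_bytes(path, offset):
    """Return (data, new_offset): the bytes appended to `path` since `offset`."""
    size = os.path.getsize(path)
    if size < offset:
        # The file was truncated or replaced: start again from the beginning.
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(size - offset)
    return data, offset + len(data)


def demo():
    """Append two events to a file and pick up only the new bytes each time."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "channel1.dat")  # hypothetical file name
        with open(path, "ab") as f:
            f.write(b"event-1\n")
        first, offset = read_new_bytes(path, 0)
        with open(path, "ab") as f:
            f.write(b"event-2\n")
        second, offset = read_new_bytes(path, offset)
        return first, second, offset
```

Only the bytes returned by each call would need to be sent, so the cost of a transfer follows the size of the new events rather than the size of the whole file.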


Proposal

FAST: Flexible Automated Synchronization Transfer
◦ Data and metadata in real time and non-real time
◦ Incremental (file) and directory sync
◦ Selection of the data-transfer protocol
◦ Compatible with all operating systems
◦ Simple to set up and manage
◦ Monitors the transmission, detects errors and recovers from them
◦ Data collected in a repository, metadata in a DB, and a web site for accessing them

The proposal is triggered by our work
◦ EFFORT project
◦ Using data provided by the Creep-2 project
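
One common way to "detect errors and recover from them" is to retry a failed transfer with an increasing delay. The sketch below shows that general pattern only. The function `transfer_with_retry`, its parameters, and the choice of exponential backoff are assumptions, not a description of how FAST recovers.

```python
import time


def transfer_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `send()` until it succeeds, doubling the wait after each failure.

    `send` is any callable that performs one transfer and raises OSError
    (for example ConnectionError) when the link or the remote side fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except OSError:
            if attempt == max_attempts:
                raise  # give up and report the last error
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real waiting, for example by recording the requested delays in a list.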


Data transfer protocols: TCP

File Transfer Protocol (FTP)
◦ Control and data are unencrypted
◦ Easy to use, but lacks security

FTP security extension (FTPS)
◦ Control channel encrypted (TLS or SSL), but data might not be

Secure Copy (SCP)
◦ Uses SSH for data transfer and authentication (more secure than the previous ones)
◦ File transfer only
◦ Ideal for quick transfer of single files

SSH File Transfer Protocol (SFTP)
◦ Based on SSH-2: best for secure access (packet confirmation)
◦ File transfer, plus creating and deleting remote directories and files
◦ Directory synchronization

Rsync
◦ Incremental file transfer (delta algorithm)
◦ File and directory synchronization
◦ Can provide encrypted transfer by using SSH
◦ On-the-fly compression option
◦ Ideal for backups
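
The idea behind delta-based incremental transfer can be shown with a deliberately simplified sketch. It only matches fixed, aligned blocks, whereas Rsync itself uses a rolling checksum so that matches can be found at any byte offset. The block size and all function names here are illustrative assumptions.

```python
import hashlib

BLOCK = 4  # tiny block size so the example is easy to follow


def block_hashes(old: bytes) -> dict:
    """Map the hash of each fixed-size block of `old` to its block index."""
    return {
        hashlib.sha256(old[i:i + BLOCK]).hexdigest(): i // BLOCK
        for i in range(0, len(old), BLOCK)
    }


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as ('copy', block_index) or ('data', bytes) instructions."""
    known = block_hashes(old)
    delta = []
    for i in range(0, len(new), BLOCK):
        chunk = new[i:i + BLOCK]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest in known:
            delta.append(("copy", known[digest]))  # receiver already has it
        else:
            delta.append(("data", chunk))  # must be sent over the network
    return delta


def apply_delta(old: bytes, delta: list) -> bytes:
    """Rebuild the new file from the receiver's old copy plus the delta."""
    out = bytearray()
    for kind, value in delta:
        if kind == "copy":
            out += old[value * BLOCK:(value + 1) * BLOCK]
        else:
            out += value
    return bytes(out)
```

For a file that only grows by appended events, every existing block becomes a `copy` instruction and only the appended bytes travel as `data`.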


Data transfer protocols: UDP

UDP-based Data Transfer (UDT)
◦ UDP-based protocol for data-intensive applications
◦ UDT can transfer data at a higher speed than TCP-based protocols

UDT Enabled Rsync (UDR)
◦ Uses Rsync for the transport mechanism (delta)
◦ Sends data over the UDT protocol
◦ Ideal for large data over long distances


Data transport systems

GridFTP
◦ High-performance, secure, reliable data transfer over high-bandwidth networks
◦ Many-to-many
◦ Difficult to use

Globus Online
◦ Uses the GridFTP protocol
◦ Automates the management of file transfers: monitoring performance, retrying files, recovering from failures
◦ Does not support file synchronization

Dropbox
◦ Centralized cloud storage, file and directory synchronization
◦ Rsync-delta protocol
◦ Data stored on Amazon S3 (third party)
◦ One-to-one file transfer

BTSync
◦ Decentralized cloud storage, P2P file synchronization (no third party)
◦ Devices communicate over UDP
◦ Many-to-many file transfers

WinSCP
◦ SFTP and FTP client for Windows


Data transport systems

Email from Globus Online Support

We recently noticed that you are creating many CLI sessions to cli.globusonline.org, each with a single blocking transfer. This is a suboptimal way to use Globus Online and in fact is causing us some resource usage issues.


Data transport systems

Previous tools
◦ Different data-transfer protocols
◦ Some automated data synchronization

None of them
◦ Selects the best protocol depending on requirements
◦ Provides methods for tracking metadata and transferring it

Our work automatically
◦ Selects a protocol among FTPS, SFTP, Rsync, and UDR
◦ Injects a minimum of metadata
◦ GridFTP and P2P discarded: communications are 1-to-1
◦ FTPS instead of FTP: minimum security level
◦ SFTP derives from SCP


Selecting the best protocol

FTPS, SFTP, Rsync and UDR


Data transfer experiments: same local network

Two machines located in Edinburgh
◦ VLAN network, 100MB/s

Synthetic program to generate events
Data sizes written to files: 50KB, 500KB, 1MB, 10MB, 100MB, 500MB, 1GB and 10GB
Measures: transfer rate and elapsed time
Repetitions: 10


Data transfer experiments: same local network

SFTP fastest for < 500MB
Rsync fastest for >= 500MB
(without compression)

Elapsed time by file size

File size   Rsync   UDR   SFTP   FTPS   Rsync-c   UDR-c
50KB        0       0     0      0      0.1       0.1
500KB       0.2     0.3   0.1    0.2    0.3       0.2
1MB         0.7     0.5   0.3    0.7    0.8       0.8
50MB        4       4     3      4      7         1.05
500MB       39      42    40     43     78        1.05
1GB         78      79    79     82     147       180
10GB        814     845   850    1012   1495      1712
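
The slides list transfer rate as a second measure but the table only reports elapsed time. As a worked example, the rate can be derived from a few of the entries above. The sketch assumes that the elapsed times are in seconds and that sizes use decimal units (1GB = 1000MB). Neither is stated in the slides, and the function name `transfer_rate` is hypothetical.

```python
# Elapsed times copied from the table above (units assumed to be seconds).
ELAPSED = {
    "Rsync": {"1GB": 78, "10GB": 814},
    "SFTP": {"1GB": 79, "10GB": 850},
    "FTPS": {"1GB": 82, "10GB": 1012},
}

# File sizes in MB, using decimal units (an assumption).
SIZE_MB = {"1GB": 1000, "10GB": 10000}


def transfer_rate(protocol, size):
    """Average transfer rate in MB/s for one table entry."""
    return SIZE_MB[size] / ELAPSED[protocol][size]


if __name__ == "__main__":
    for protocol in ELAPSED:
        for size in SIZE_MB:
            print(f"{protocol:5s} {size:>4s}: {transfer_rate(protocol, size):.1f} MB/s")
```

Under these assumptions, Rsync moves the 10GB file at roughly 12 MB/s and FTPS at roughly 10 MB/s.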


Data transfer experiments: different networks

UDR has been specially designed for
◦ Large data transfers over long distances

UDR vs Rsync using two machines
◦ Located in different local networks
    University of Edinburgh: 1GbE
    AIST-Tsukuba: 10GbE

Generated files: 1MB, 500MB, 1GB, 10GB and 30GB


Data transfer experiments: different networks

UDR fastest (without compression)

Elapsed time by file size

File size   Rsync   UDR    Rsync-c   UDR-c
1MB         0       0      0         0
500MB       365     20     154       56
1GB         730     37     79        120
10GB        6722    364    3000      1140
30GB        1630    1080   7560      3360


Decision tree
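
The decision tree figure itself is not reproduced in this transcript. A minimal sketch of a rule consistent with the two experiment summaries (SFTP fastest below 500MB and Rsync fastest from 500MB on the same local network, UDR fastest across different networks) could look like the following. The function name, its parameters, and the omission of FTPS are assumptions, and the actual FAST tree may use other criteria.

```python
def select_protocol(file_size_mb, same_network):
    """Pick a transfer protocol from file size and network location.

    Encodes only the summaries reported for the two experiments; FTPS is
    left out because the transcript does not say when it would be chosen.
    """
    if not same_network:
        return "UDR"    # fastest between different networks
    if file_size_mb < 500:
        return "SFTP"   # fastest for small files on the same local network
    return "Rsync"      # fastest from 500MB upwards on the same local network
```

For example, `select_protocol(50, same_network=True)` yields `"SFTP"`, while any transfer between different networks yields `"UDR"`.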


Implementation and evaluation

Front-end: GUI using Java Swing
Back-end: decision tree
Data and metadata
◦ Data stored in a remote repository (NAS)
◦ Metadata collected in a remote database (MySQL)

Science gateway (web tool) connected with the repository and database
◦ Searching
◦ Visualizing
◦ Analyzing
◦ Downloading


User interface – New Experiment


Implementation and evaluation

FAST has been evaluated:
◦ By using synthetic programs for generating data
    Real time and non-real time
    For each type of synchronization
    Different data sizes and different types of network locations
    Short- and long-term experiments
    Stop and restart
◦ By transferring data from a real rock physics experiment
    Laboratories: UCL (London) and Edinburgh
    Duration: 45 days
    Interval: every minute
    Rock samples: 1


Future work

Use FAST in the Creep-2 experiment
Implement FAST policies
◦ Data available in the repository for specific users during a reasonable period

Sharing data from many-to-many locations
Decision tree
◦ Automating its generation and maintenance
◦ Keeping it up to date by measuring transfers

Use FAST in more rock physics laboratories
Use FAST in other disciplines


email: [email protected]

Thanks & Questions