Introduction Rdweb system Examples of execution Installing Rdweb Concluding remarks

Web Interface to R for High-Performance Computing

Junji NAKANO† Ei-ji NAKAMA‡

†The Institute of Statistical Mathematics, Japan

‡COM-ONE Ltd., Japan

The R User Conference 2009, July 8-10, Agrocampus-Ouest, Rennes, France

1 Introduction

2 Rdweb system

3 Examples of execution

4 Installing Rdweb

5 Concluding remarks

R and the need for large-scale computation

R is a free software environment for statistical computing and graphics, used by

- statisticians to implement new statistical methods
- practitioners to analyze real data sets in various fields

Recently, both groups of users have come to require large amounts of computation for their own purposes.

Parallel computing

Parallel computing is a practical way to carry out large computations by running them on several computers and/or many CPU cores at the same time.

Parallel computing techniques in R

Parallel BLAS (Basic Linear Algebra Subprograms) using threads

- ATLAS: free, parallel and optimized BLAS
- GotoBLAS: fastest parallel and optimized BLAS
- Intel MKL, AMD ACML: parallel and optimized BLAS provided by vendors

MPI-type libraries for R on clustered computers

- Rpvm: an R interface to PVM (Parallel Virtual Machine)
- Rmpi: an R interface to MPI (Message Passing Interface)

snow (Simple Network of Workstations)

- A package that provides parallel computing through parallel apply functions
- Uses lower-level parallel libraries such as sockets, MPI, PVM and nws to transfer data among processes
- Because it hides the differences among the lower-level libraries, it is easy to use for parallel computing

multicore

- Runs parallel computations in R on machines with multiple cores or CPUs

...

Existing Web environments for R

- Rweb: a Web-based interface to R for submitting code
- Rpad: a workbook-style user interface to R through a Web browser
- rapache: embeds R in the Apache Web server
- Rserve: a TCP/IP server that allows other programs to use the facilities of R
- RWebServices: exposes R functions as Web services through Java/Axis/Apache
- ...

Parallel computing is not the main concern of these programs.

Supercomputers in ISM

We have three supercomputer systems in the Institute of Statistical Mathematics (ISM), Japan. (We will replace them next year.)

Present supercomputers provide parallel computing facilities.

We use R on our supercomputers.

Our problems

Troubles

- Each supercomputer uses a different (Unix-like) environment.
- Unix-like environments are not easy for novices to use.
- Several parameters for parallel computing need to be specified differently for each supercomputer.

Our solution

- R script editing
- file transfer
- job resource management

Approach: Web interface

We have made “Rdweb”, a Web interface to R for using the parallel computing functions in R.

Structure of Rdweb

The Rdweb (R daemon for Web) system consists of three components:

- Web interface (via the Web browser on the user's computer): It is rather simple and written in HTML and JavaScript. JavaScript is used only lightly, to assist the user's input.
- Web server (on the Rdweb gateway computer): It is a CGI program that handles authentication, file transfer, job control (start, stop and check), creation of the JCL (Job Control Language) script, and distribution of the program to remote computers as a client of Rdaemon.
- Rdaemon (on the front-end computer of the cluster system): It checks authentication, transfers the required files, starts and ends jobs, and shows their status.

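[Figure: Rdweb architecture. The browser (client, user interface) connects over HTTP to the Web server (HTTP server with a CGI written in Perl). The Web server passes the R program, data files, authorization, number of snow slaves and number of parallel BLAS threads to Rdaemon on the R machine over TCP port 10024. Rdaemon authenticates through NIS, PAM or crypt and performs job control through the batch system, which runs the R master and R slaves.]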

Characteristics of Rdweb

Rdweb is designed for supercomputers and personal PC cluster systems.

The three components of Rdweb described above and the R slaves can reside on different computers or on the same computer.

Text-based Web browsers can be used (with some limitations).

Rdweb on supercomputers in ISM

[Figure: two Rdweb deployments at ISM. In each, a Web server (Apache + CGI) connects over TCP:10024 to Rdaemon on the front end, and a physical random number server is attached.]

Shared-memory: SGI ALTIX (ALTIX 350 and ALTIX 3700), OpenPBS, LAM-MPI

- Front end: Itanium2, 8 CPUs, 32 GB memory
- Back end: Itanium2, 64 CPUs and 512 GB memory per node, 4 nodes in total
- R master and R slaves 01 to 63

Distributed-memory: HP XC4000 cluster, LSF + SLURM, HP-MPI

- Opteron 252, 2 CPUs per node, 2 GB or 4 GB memory per node, 128 nodes in total
- R master on node 1, R slaves across nodes 1 to 128

Differences between Rweb and Rdweb

From the user's side, Rdweb is similar to Rweb. Rdweb can control system resources such as user, CPU, memory and queue. Rweb does not allow the “system” command for security reasons; Rdweb has no such limitation because it has a rigid authentication mechanism.

Rweb and Rdweb

                          Rweb        Rdweb
Authentication            none        PAM, NIS or Unix password
File upload               one file    many files
Control of parallel BLAS  impossible  each session
Control of snow           impossible  each session

Authentication of Rdweb (1) - Web server

Rdweb uses two authentication stages. The first stage uses the Web server's authentication mechanism when the user connects to the Web server on the gateway computer. This mechanism is provided by mod_auth_pam for Apache.

sites-enabled

<Directory "/www/">
    Options ....
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthPAM_Enabled on
    AuthType Basic
    AuthName "Rdweb User Login"
    Require valid-user
</Directory>

Authentication of Rdweb (2) - Rdaemon

As the second stage of Rdweb authentication, Rdaemon uses an authentication method such as PAM (recommended), NIS or Unix password. One of them is selected when the Rdweb system is compiled.

Cookies must be enabled in the Web browser to use the Web interface of Rdweb.

PAM authentication

PAM (Pluggable Authentication Modules) is the authentication API used in Linux, Solaris, Mac OS X and AIX (5.3 or later).

PAM can use NIS, LDAP or Unix password.

If PAM is not available, NIS or Unix password can be used directly for authentication in Rdaemon.

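[Figure: PAM authentication path. The browser connects over HTTPS to Rdweb on the Web server, which connects over TCP port 10024 to Rdaemon on the cluster system. Applications such as login, telnet and Rdaemon call the PAM API; the PAM library (configured by pam.conf) dispatches to PAM service modules for LDAP, NIS and Unix password.]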

Location of files

An “Rdweb” directory is created in the user's home directory on the front end.

- The directory for execution is ~/Rdweb/
- Uploaded files are also stored in ~/Rdweb/
- Logs and scripts are stored in ~/Rdweb/YYYYMMDD_hhmmss/, where YYYYMMDD_hhmmss gives the year, month, day, hour, minute and second, following the ISO 8601 date format.
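
As a rough illustration of the naming scheme above, the following R sketch builds such a per-job directory path. The underscore separator, the temporary directory standing in for ~/Rdweb/, and the helper name job_dir are assumptions made for illustration; they are not taken from the Rdweb source.

```r
# Hypothetical helper: build a per-job directory name of the form
# YYYYMMDD_hhmmss under a base directory (the underscore is an assumption).
job_dir <- function(base, when = Sys.time()) {
  file.path(base, format(when, "%Y%m%d_%H%M%S"))
}

# A temporary directory stands in for ~/Rdweb/ so that the sketch does not
# touch a real home directory.
base <- file.path(tempdir(), "Rdweb")
path <- job_dir(base, as.POSIXct("2009-07-08 13:45:10", tz = "UTC"))
dir.create(path, recursive = TRUE)
print(basename(path))
```

Because the fields run from most to least significant, sorting these directory names alphabetically also sorts the jobs chronologically.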

Uploading files

To upload data and/or program files, we click the “Choose” button, select a file, and click the “upload” button.

These operations can be repeated without affecting the edited script or other functions.

SCP or SFTP clients such as the FileZilla client are recommended for uploading large files, because an HTTP upload sometimes times out and stops.

Preparing data and program

Using a text editor, we prepare the following data file.

HW.csv

    height,weight
    1.70,65
    1.85,80
    1.75,86

Save this file as “HW.csv”. We also prepare the R program

BMI.R

    BMI <- function(H, W) {
      W / H^2
    }

and save it as “BMI.R”.

Input

Upload the two files “HW.csv” and “BMI.R”. Then enter the following R program

input text area

    HW <- read.csv("HW.csv")
    source("BMI.R")
    HWB <- cbind(HW, BMI = BMI(HW$height, HW$weight))
    HWB
    plot(HWB)

in the editor area of the Web interface, which is connected to the Rdweb gateway.
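
The following self-contained sketch reproduces this example outside Rdweb by first writing both files to a temporary directory. The temporary directory and the rounding are assumptions added for illustration, and plot(HWB) is left out so that the sketch produces no graphics file.

```r
# Work in a temporary directory (an assumption; Rdweb uses ~/Rdweb/).
wd <- tempdir()
csv_file <- file.path(wd, "HW.csv")
r_file <- file.path(wd, "BMI.R")

# Recreate the two uploaded files from the slides.
writeLines(c("height,weight", "1.70,65", "1.85,80", "1.75,86"), csv_file)
writeLines(c("BMI <- function(H, W) {", "  W / H^2", "}"), r_file)

# Same steps as the script typed into the editor area.
HW <- read.csv(csv_file)
source(r_file)
HWB <- cbind(HW, BMI = BMI(HW$height, HW$weight))
print(round(HWB, 2))
```

For the three rows, the BMI column comes out as roughly 22.49, 23.37 and 28.08 (weight in kg divided by squared height in m).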

Execution

The job is started by clicking the “Execute” button. The job status is shown in “JOB Information”. The job information is refreshed by clicking the “Refresh” button or the top title.

The results of the calculation are stored as files with the extensions .Rout (text format) and .pdf (PDF graphics).
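
To make the two kinds of result file concrete, the sketch below sends text output to a file and a plot to an explicit PDF device. The file names and the use of sink() are assumptions for illustration; the slides do not show how Rdweb itself launches R or names its outputs.

```r
# Hypothetical output names in a temporary directory.
out_dir <- tempdir()
pdf_file <- file.path(out_dir, "result.pdf")
txt_file <- file.path(out_dir, "result.Rout")

x <- c(1.70, 1.85, 1.75)  # heights from HW.csv
y <- c(65, 80, 86)        # weights from HW.csv

# Text output: divert printed results to a file, then restore the console.
sink(txt_file)
print(summary(y / x^2))
sink()

# Graphics output: open a PDF device, draw, and close it to flush the file.
pdf(pdf_file)
plot(x, y, xlab = "height", ylab = "weight")
dev.off()
```

When R runs non-interactively without an explicitly opened device, plots go to a PDF file by default, which is consistent with a plain plot() call in a submitted script yielding a .pdf result.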

Use of snow

Usually in R, the number of processes has to be specified differently depending on the cluster type.

makeCluster normal

    # SOCK cluster
    cl <- makeCluster(c("hostname1", "hostname2"))
    # MPI cluster with 2 slave processes
    cl <- makeCluster(2)

We add a new function “setDefaultClusterOptions” so that the parameters given in the Web interface can be used in the same way for all cluster types.

makeCluster Rdweb

cl <- makeCluster(getClusterOption("spec"))
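
A minimal sketch of how a job script might then use such a cluster object follows. It uses the parallel package bundled with current R as a stand-in for snow (an assumption, chosen so that the sketch runs without extra packages), and the worker count n_workers is a hypothetical value that Rdweb would instead take from the Web form.

```r
library(parallel)  # bundled with R; used here as a stand-in for snow

# Hypothetical worker count; under Rdweb this would come from the Web form.
n_workers <- 2

# Start the workers, apply a function to each element in parallel,
# and always shut the workers down afterwards.
cl <- makeCluster(n_workers)
squares <- parSapply(cl, 1:4, function(i) i^2)
stopCluster(cl)

print(squares)
```

Keeping the cluster specification in a single variable means the rest of the script does not change when the number of workers or the cluster type does, which is the convenience the Rdweb call above aims for.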

Selection of parameters for parallel computing

Using pull-down menus, we select the queue, the number of slave processes, the number of threads for parallel BLAS, and the cluster type, in this order.

Execution

The job is started by clicking the “Execute” button.

Newly created result files are shown by clicking the “Refresh” button.

Batch system

Rdweb requires a batch system. Several batch systems are available.

- at, batch: standard batch system of Unix, specified in XPG4 (X/Open Portability Guide, version 4). It has a simple queue mechanism.
- OpenPBS (NASA etc.): queuing and scheduling control system for cluster systems. Development stopped in 1998.
- Torque (Cluster Resources Inc.): free system based on OpenPBS
- LoadLeveler (IBM): batch system provided by a vendor
- LSF (Platform Computing Inc.): commercial job control tool
- SLURM: free resource control utility

Platforms

Rdweb should work on almost all Unix-like OSs. We have checked the following systems at ISM and COM-ONE.

MPI      OS       Batch system
HP-MPI   Linux    LSF + SLURM
LAM-MPI  Linux    Torque
OpenMPI  Linux    Torque
LAM-MPI  Linux    OpenPBS
LAM-MPI  Linux    at
LAM-MPI  Solaris  at
LAM-MPI  AIX      LoadLeveler
LAM-MPI  MacOSX   at

Note: Installation of these batch systems is sometimes complicated.

Installation

We keep the source code of Rdweb at http://prs.ism.ac.jp/~nakama/rdweb/

Required installation procedure

- Prepare the skeleton of the shell file on the front end
- Define the system information on the Web server

These depend heavily on the cluster system. Details of the settings can be found in the “README” file in the Rdweb archive.

We put the required packages for Debian GNU/Linux (Lenny) at http://prs.ism.ac.jp/~nakama/debian/lenny-ism/. They include helper packages for GotoBLAS, Torque, and packages of lam-mpi and openmpi for Torque. (Unfortunately, these are still buggy.)

Examples in ISM (1)

[Figure: two Rdweb deployments at ISM. In each, a Web server (Apache + CGI) connects over TCP:10024 to Rdaemon, and a physical random number server is attached.]

SGI ALTIX (ALTIX 350 and ALTIX 3700), OpenPBS, LAM-MPI

- Front end: Itanium2, 8 CPUs, 32 GB memory
- Back end: Itanium2, 64 CPUs and 512 GB memory per node, 4 nodes in total
- R master and R slaves 01 to 63

Hitachi SR11000, LoadLeveler, LAM-MPI

- Power4+, 16 CPUs and 32 GB memory per node, 4 nodes in total
- Rdaemon on node 1; R master and R slaves under LAM-MPI on the nodes

Examples in ISM (2)

[Figure: two further Rdweb deployments at ISM. In each, a Web server (Apache + CGI) connects over TCP:10024 to Rdaemon, and a physical random number server is attached.]

HP XC4000 cluster, LSF + SLURM, HP-MPI

- Opteron 252, 2 CPUs per node, 2 GB or 4 GB memory per node, 128 nodes in total
- R master on node 1, R slaves across nodes 1 to 128

VXPRO R1400, Torque, LAM-MPI

- Xeon E5430, 8 CPUs per node; 2 nodes with 16 GB memory and 2 nodes with 32 GB memory
- Rdaemon and R master on node 1; R slaves 01 to 32 across nodes 1 to 4

Concluding remarks

Advantages of Rdweb

- Novices can use the parallel execution functions of R with less effort.
- The degree of parallelism can be specified easily for parallel BLAS and snow.
- Secure authentication is available through PAM, which can use LDAP or NIS.

Disadvantages of present Rdweb

- System installation is complicated and completely platform dependent.

Future work

- Encrypting communication between the Web server and Rdaemon
- Porting to various R configurations:
  - R with many BLAS libraries
  - R compiled by several compilers
  - R on many OSs