Introduction Rdweb system Examples of execution Installing Rdweb Concluding remarks
Web Interface to R for High-Performance Computing
Junji NAKANO † Ei-ji NAKAMA ‡
† The Institute of Statistical Mathematics, Japan
‡ COM-ONE Ltd., Japan
The R User Conference 2009, July 8-10, Agrocampus-Ouest, Rennes, France
1 Introduction
2 Rdweb system
3 Examples of execution
4 Installing Rdweb
5 Concluding remarks
R and the requirement for huge calculations
R: a free software environment for statistical computing and graphics for
  statisticians to implement new statistical methods
  practitioners to analyze real data sets in various fields
Recently, both groups of users have come to require huge amounts of calculation for their own purposes.
Parallel computing
  is a practical method for carrying out huge calculations by executing them on several computers and/or many CPU cores at the same time.
Parallel computing techniques on R
Parallel BLAS (Basic Linear Algebra Subprograms) using threads
  ATLAS: Free parallel and optimized BLAS
  GotoBLAS: Fastest parallel and optimized BLAS
  Intel MKL, AMD ACML: Parallel and optimized BLAS provided by vendors
MPI type libraries for R using clustered computers
  Rpvm: an R interface to PVM (Parallel Virtual Machine)
  Rmpi: an R interface to MPI (Message Passing Interface)
snow (Simple Network of Workstations)
  A package for realizing parallel computing by parallel apply functions
  Uses lower level parallel libraries such as Socket, MPI, PVM, nws for transferring data among processes
  As it conceals the differences among the lower level libraries, it is easy to use for parallel computing.
multicore
  Running parallel computations in R on machines with multiple cores or CPUs
...
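The idea of a "parallel apply function" can be illustrated with a minimal sketch. It assumes the parallel package bundled with current R (which builds on the work done for snow and multicore) rather than snow itself, and it starts two worker processes on the local machine; the worker count and the toy function are arbitrary choices for illustration.

```r
library(parallel)  # bundled with R; used here in place of snow

# Start two local worker processes (a socket cluster on this machine).
cl <- makeCluster(2)

# Parallel apply: each element of 1:8 is squared on one of the workers.
# The cluster is always shut down, even if the computation fails.
squares <- tryCatch(
  parSapply(cl, 1:8, function(x) x^2),
  finally = stopCluster(cl)
)

print(squares)
```

The call looks like an ordinary sapply; only the cluster object is extra, which is why this style is easy to adopt.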
Existing Web environments for R
Rweb: A Web-based interface to R for submitting code
Rpad: A workbook-style user interface to R through a Web browser
rapache: Embedding R in the Apache Web server
Rserve: TCP/IP server that allows other programs to use facilities of R
RWebServices: Exposing R functions as Web services through Java/Axis/Apache
...
Parallel computing is not the main concern of these programs.
Supercomputers in ISM
We have three supercomputer systems in the Institute of Statistical Mathematics (ISM), Japan. (We will replace them next year.)
Present supercomputers provide parallel computing facilities.
We use R on our supercomputers.
Our problems
Troubles
Each supercomputer uses a different (Unix-like) environment.
Unix-like environments are not easy to use for novices.
Several parameters for parallel computing need to be specified differently for each supercomputer.
Our solution
R script editing, file transfer, job and resource management
Approach: Web interface
We have made “Rdweb”, a Web interface to R for using the parallel computing functions in R
Structure of Rdweb
Rdweb (R daemon for Web) system consists of three components:
Web interface (via Web browser on user’s computer)
  It is rather simple and written in HTML and JavaScript. JavaScript is used to assist users’ input slightly.
Web server (on Rdweb gateway computer)
  It is a CGI program for authentication, file transfer, job control (start, stop and check), creation of JCL (Job Control Language) script and scattering the program to remote computers as a client of Rdaemon
Rdaemon (on the front-end computer of cluster system)
  It checks authentication, transfers required files, starts and ends jobs, and shows the status.
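[Figure: structure of Rdweb. The browser on the client (user interface: data file, authorization, number of snow slaves, parallel number of BLAS) talks over HTTP to the Web server (HTTP server, CGI made of perl). The Web server sends job control requests and R programs over TCP/10024 to Rdaemon on the R machine, which authenticates by NIS, PAM or CRYPT and uses the batch system to run the R master and R slaves.]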
Characteristics of Rdweb
Rdweb is designed for supercomputers and personal PC cluster systems.
The three components of Rdweb stated above and the R slaves can reside on different computers or on the same computer.
Text-based Web browsers can be used (with minor limitations).
Rdweb on supercomputers in ISM
[Figure: Rdweb on two ISM systems.
Shared-Memory: SGI ALTIX 350 and SGI ALTIX 3700 with OpenPBS and LAM-MPI. Front end: Itanium2, 8 CPUs, 32GB memory. Back end: Itanium2, 64 CPUs / node, 512GB memory / node, 4 nodes in total. The Web server (Apache, CGI) connects over TCP:10024 to Rdaemon on the front end; the R master and R slaves 01 to 63 run on nodes 1 to 4. A physical random number server is attached.
Distributed-Memory: HP XC4000 Cluster with LSF + SLURM and HP-MPI. Opteron 252, 2 CPUs / node, 2GB or 4GB memory / node, 128 nodes in total. The Web server (Apache, CGI) connects over TCP:10024 to Rdaemon on the front end; the R master and R slaves run on nodes 1 to 128. A physical random number server is attached.]
Differences between Rweb and Rdweb
From the user side, Rdweb is similar to Rweb.
Rdweb can control system resources such as user, CPU, memory and queue. Although Rweb does not allow the use of the “system” command for security reasons, Rdweb does not have such a limitation because Rdweb has a strict authentication mechanism.
Rweb and Rdweb
                            Rweb         Rdweb
Authentication              none         PAM, NIS or Unix password
File upload                 one file     many files
Control of parallel BLAS    impossible   each session
Control of snow             impossible   each session
Authentication of Rdweb (1) - Web server
Rdweb adopts two authentication stages. The first stage uses the authentication mechanism of the Web server when the user connects to the Web server on the gateway computer. The mechanism is realized by mod_auth_pam of Apache.
sites-enabled
<Directory "/www/">
    Options ....
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthPAM_Enabled on
    AuthType Basic
    AuthName "Rdweb User Login"
    Require valid-user
</Directory>
Authentication of Rdweb (2) - Rdaemon
As the second stage of Rdweb authentication, Rdaemon uses an authentication method such as PAM (recommended), NIS or Unix password. We can select one of them when we compile the Rdweb system.
Cookies must be enabled in the Web browser to use the Web interface of Rdweb.
PAM authentication
PAM (Pluggable Authentication Modules) is the API for authentication used in Linux, Solaris, MacOSX and AIX (5.3 or later).
PAM uses NIS or LDAP or Unix password.
If PAM is not available, NIS or Unix password can be directly used for authentication in Rdaemon.
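[Figure: PAM authentication in Rdweb. The browser connects to the Web server over HTTPS, and Rdweb on the Web server connects to Rdaemon on the cluster system over TCP 10024. Applications such as login, telnet and Rdaemon call the PAM API; the PAM library (pam.conf) dispatches to PAM service modules for LDAP, NIS and Unix password.]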
Location of files
An “Rdweb” directory is created in the user's home directory on the front-end.
Directory for execution is ~/Rdweb/
Uploaded files are also stored in ~/Rdweb/
Logs and scripts are stored in ~/Rdweb/YYYYMMDD_hhmmss/, where YYYYMMDD_hhmmss shows year, month, day, hour, minute and second, according to the ISO-8601 date format.
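As an illustration of the naming scheme for the log directory, the following sketch builds such a path in R. It is not taken from the Rdweb sources: the helper name job_dir_name is hypothetical, and the separator between date and time is assumed to be an underscore.

```r
# Hypothetical helper: build a per-job directory path under ~/Rdweb/.
# The underscore between date and time is an assumption.
job_dir_name <- function(time = Sys.time(), root = file.path("~", "Rdweb")) {
  stamp <- format(time, "%Y%m%d_%H%M%S")  # year, month, day, hour, minute, second
  file.path(root, stamp)
}

# A fixed time makes the result reproducible.
fixed <- as.POSIXct("2009-07-08 14:30:00", tz = "UTC")
print(job_dir_name(fixed))
```

Because the name sorts chronologically, listing ~/Rdweb/ shows the jobs in the order they were submitted.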
Uploading files
To upload data and/or program files, we click the “Choose” button, select a file, and click the “upload” button.
These operations can be repeated without affecting the edited script and other functions.
SCP or SFTP clients such as the FileZilla client are recommended for uploading large files because HTTP uploads sometimes time out and stop.
Preparing data and program
By using a text editor, we prepare the following data file.
HW.csv
height,weight
1.70,65
1.85,80
1.75,86
Save this file as “HW.csv”.
We also prepare an R program
BMI.R
BMI <- function(H, W) {
  W / H^2
}
and save it as “BMI.R”.
Input
Upload the two files “HW.csv” and “BMI.R”. Then input the following R program
input text area
HW <- read.csv("HW.csv")
source("BMI.R")
HWB <- cbind(HW, BMI = BMI(HW$height, HW$weight))
HWB
plot(HWB)
in the editor area of the Web interface, which is connected to the Rdweb gateway.
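To try the same calculation without Rdweb, the sketch below recreates both files in a temporary directory and runs the script body locally. The temporary directory stands in for ~/Rdweb/ and is an assumption of this sketch; the plot call is left out so that no graphics file is written.

```r
# Recreate the two uploaded files in a scratch directory (stand-in for ~/Rdweb/).
work <- tempfile("rdweb_demo_")
dir.create(work)
writeLines(c("height,weight", "1.70,65", "1.85,80", "1.75,86"),
           file.path(work, "HW.csv"))
writeLines(c("BMI <- function(H, W) {", "  W / H^2", "}"),
           file.path(work, "BMI.R"))

# Same steps as the script typed into the editor area.
HW <- read.csv(file.path(work, "HW.csv"))
source(file.path(work, "BMI.R"))
HWB <- cbind(HW, BMI = BMI(HW$height, HW$weight))

print(HWB)
print(round(HWB$BMI, 1))  # 22.5 23.4 28.1
```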
Execution
A job is started by clicking the “Execute” button. Job status is shown in “JOB Information”.
Job information is refreshed by clicking the “Refresh” button or the top title.
Results of calculation are stored as files with extensions .Rout (text format) and .pdf (PDF graphics).
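How a text result file and a PDF graphics file can come out of one script may be sketched with base R alone. This is not necessarily how Rdweb produces its files (the source does not say); the file names job.Rout and job.pdf and the temporary output directory are assumptions.

```r
# Scratch output directory (assumption; Rdweb uses its own job directory).
out_dir <- tempfile("rdweb_out_")
dir.create(out_dir)
rout_file <- file.path(out_dir, "job.Rout")
pdf_file  <- file.path(out_dir, "job.pdf")

# Text output: divert printed results into the .Rout file.
sink(rout_file)
print(summary(c(65, 80, 86)))
sink()

# Graphics output: draw into a PDF device instead of a screen window.
pdf(pdf_file)
plot(c(1.70, 1.85, 1.75), c(65, 80, 86), xlab = "height", ylab = "weight")
invisible(dev.off())

print(file.exists(c(rout_file, pdf_file)))
```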
Use of snow
Usually in R, we have to specify the number of processes differently according to the cluster type.
makeCluster normal
# SOCK cluster
cl <- makeCluster(c("hostname1","hostname2"))
# MPI cluster with 2 slave processes
cl <- makeCluster(2)
We add a new function “setDefaultClusterOptions” to use the parameters given in the Web interface in the same way for all cluster types.
makeCluster Rdweb
cl <- makeCluster(getClusterOption("spec"))
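The benefit of reading the cluster specification from one place can be sketched as follows. This is only an illustration of the idea, not Rdweb's mechanism: it uses the parallel package bundled with R instead of snow, and it assumes the settings arrive as environment variables named RDWEB_SPEC and RDWEB_TYPE, which are hypothetical names. The helper functions are hypothetical as well.

```r
library(parallel)  # bundled with R; stands in for snow in this sketch

# Hypothetical: collect the settings chosen in a front end.
# RDWEB_SPEC and RDWEB_TYPE are invented names, with local defaults.
cluster_options <- function() {
  list(
    spec = as.integer(Sys.getenv("RDWEB_SPEC", unset = "2")),
    type = Sys.getenv("RDWEB_TYPE", unset = "PSOCK")
  )
}

# Hypothetical: the user script calls this one function on every machine.
make_rdweb_cluster <- function(opts = cluster_options()) {
  makeCluster(opts$spec, type = opts$type)
}

# Simulate a front end that asked for three workers.
Sys.setenv(RDWEB_SPEC = "3", RDWEB_TYPE = "PSOCK")
opts <- cluster_options()
str(opts)
```

A user script would then contain only cl <- make_rdweb_cluster(), followed by stopCluster(cl) when finished, and would not change when the number of workers or the cluster type changes.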
Selection of parameters for parallel computing
We need to select the queue, the number of slave processes, the number of threads of parallel BLAS, and the cluster type, in this order, by using pull-down menus.
Execution
A job is started by clicking the “Execute” button.
Newly created result files are shown by clicking the “Refresh” button.
Batch system
Rdweb requires a batch system. Several batch systems are available.
at, batch
  Standard batch system of Unix specified in XPG4 (X/Open Portability Guide Ver. 4). It has a simple queue mechanism.
OpenPBS (NASA etc.)
  Queuing and scheduling control system for cluster systems. Development stopped in 1998.
Torque (Cluster Resource Inc.)
  Free system based on OpenPBS
LoadLeveler (IBM)
  Batch system provided by a vendor
LSF (Platform Computing Inc.)
  Commercial job controlling tool
SLURM
  Free resource control utility
Platforms
Rdweb should work on almost all Unix-like OSs.
We have checked the following systems in ISM and COM-ONE.
MPI        OS        BATCH SYSTEM
HP-MPI     Linux     LSF + slurm
LAM-MPI    Linux     Torque
OpenMPI    Linux     Torque
LAM-MPI    Linux     OpenPBS
LAM-MPI    Linux     at
LAM-MPI    Solaris   at
LAM-MPI    AIX       LoadLeveler
LAM-MPI    MacOSX    at
Note: Installation of these batch systems is sometimes complicated.
Installation
We keep the source code of Rdweb at http://prs.ism.ac.jp/~nakama/rdweb/
Required installation procedure
  Prepare the skeleton of the shell file on the front-end
  Define the system information on the Web server
  These depend heavily on the cluster system. Details of the settings can be found in the “README” file in the Rdweb archive.
We put the required packages for Debian GNU/Linux (Lenny) at http://prs.ism.ac.jp/~nakama/debian/lenny-ism/. They include helper packages for GotoBLAS, Torque, and packages of lam-mpi and openmpi for Torque. (Unfortunately, these are still buggy.)
Examples in ISM (1)
[Figure: two ISM systems.
SGI ALTIX 350 and SGI ALTIX 3700 with OpenPBS and LAM-MPI. Front end: Itanium2, 8 CPUs, 32GB memory. Back end: Itanium2, 64 CPUs / node, 512GB memory / node, 4 nodes in total. The Web server (Apache, CGI) connects over TCP:10024 to Rdaemon on the front end; the R master and R slaves 01 to 63 run on nodes 1 to 4. A physical random number server is attached.
Hitachi SR11000 with LoadLeveler and LAM-MPI. Power4+, 16 CPUs / node, 32GB memory / node, 4 nodes in total. Node 1 hosts the Web server (Apache, CGI), Rdaemon and an R master; R masters and R slaves run on the nodes, reached over TCP:10024. A physical random number server is attached.]
Examples in ISM (2)
[Figure: two further systems.
HP XC4000 Cluster with LSF + SLURM and HP-MPI. Opteron 252, 2 CPUs / node, 2GB or 4GB memory / node, 128 nodes in total. The Web server (Apache, CGI) connects over TCP:10024 to Rdaemon on the front end; the R master and R slaves run on nodes 1 to 128. A physical random number server is attached.
VXPRO R1400 with Torque and LAM-MPI. XEON E5430, 8 CPUs / node, 16GB memory on 2 nodes and 32GB memory on 2 nodes. Node 1 hosts the Web server (Apache, CGI), Rdaemon and the R master; R slaves 01 to 32 run on nodes 1 to 4, eight per node, reached over TCP:10024. A physical random number server is attached.]
Concluding remarks
Advantages of Rdweb
  Novices can use the parallel execution functions of R with less effort.
  The degree of parallelism can be specified easily for parallel BLAS and snow.
  Secure authentication is available through PAM, which can use LDAP or NIS.
Disadvantages of present Rdweb
  System installation is complicated and completely platform dependent
Future work
  Encrypting communication between the Web server and Rdaemon
  Porting to various R
    R with many BLASs
    R compiled by several compilers
    R on many OSs