Development of a Compact Cluster with Embedded CPUs
Sritrusta Sukaridhoto, Yoshifumi Sasaki,
Koichi Ito and Takafumi Aoki
Introduction
Ubiquitous environment
  Electronic equipment in our surroundings (equipped with embedded CPUs)
  Connected by a network
http://www.kayoo.org/home/mext/joho-kiki/
http://bsc.jp.yamatake.com/products/secu_ftouchm.html
[Figure: a navigation system, mobile devices, and home automation equipment, each with an embedded CPU, connected by a network for distributed cooperation / parallel processing]
Introduction
[Figure: development of the Ubiquitous Computing Cluster (UCC), a low-cost prototyping environment that brings together embedded CPUs, ubiquitous computing, and distributed / parallel computing]
Contents
Introduction
Cluster Computer Structures
Implementations
Performance Evaluations
Application: Fingerprint Verifications
Conclusions and Future Plans
Cluster Computer Structures
Ubiquitous Computing Cluster Hardware
[Figure: Node#0 to Node#3 (embedded devices) and a terminal, connected to a hub by 100 Mbps LAN connections]
Power consumption: 60 W (typ.)
Cluster Computer Structures
Specification of Calculation Node
Embedded Network Attached Storage (NAS)
  Includes: embedded CPU (SH4), memory, USB I/F, network I/F, HDD
  Able to act as a network computer
  Logically functions as a general-purpose computer
  Small footprint, low power consumption
CPU: SH4 (SH7751R, 266 MHz)
Memory: 64 MB SDRAM
HDD: 120 GB, ATA133, 5400 rpm
NIC: 10/100BASE-T (RTL-8139C+)
I/F: USB 2.0 × 2 ports
Power: 14 W (typ.)
Cluster Computer Structures
Ubiquitous Computing Cluster Software
Operating system: Debian GNU/Linux 2.4.21 for SH4
  Stable inter-processor communication
  Compact kernel and daemons
Servers and daemons
  Inter-processor communication (rsh, rexec, rcp)
  Login (telnet), file transfer (FTP)
  Network file sharing (NFS), Network Information Service (NIS)
Development environment
  Compiler (GNU gcc-3.0.4, g++, Fortran77)
  Editor (GNU Emacs, vi)
  Parallel processing interface (MPI, PVM)
Cluster Computer Structures
How it works
[Figure: the UCC. Node#0 to Node#3 are connected to a hub by 100 Mbps Fast Ethernet and use inter-process communication (rsh); the terminal connects for login and file transfer (telnet, FTP)]
Node#0 also works as the administration server (NIS, NFS)
Cluster Computer Structures
Computing nodes are embedded CPUs
  Suitable for prototyping next-generation computers
Uses COTS (Commercial Off-The-Shelf) products
  Low-cost system
Uses Linux as the operating system
  Stable inter-processor communication
  Open source
Features
Size: 390 mm × 280 mm × 150 mm
Power consumption: 60 W (typ.)
Implementation
Login to Node#0
Write a parallel program: hello.c

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int numprocs, procnum;

        /* Initialize MPI */
        MPI_Init(&argc, &argv);
        /* Rank of this process and total number of processes */
        MPI_Comm_rank(MPI_COMM_WORLD, &procnum);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

        printf("Hello world! from %d of %d\n", procnum, numprocs);

        /* Shut down MPI */
        MPI_Finalize();
        return 0;
    }

Compile: mpicc -o hello hello.c
Run: mpirun -np 4 hello
Hello world! from 0 of 4
Hello world! from 1 of 4
Hello world! from 2 of 4
Hello world! from 3 of 4
Advanced Applications
USB ports connect to other devices (fingerprint sensor, USB audio, camera, etc.)
Fingerprint Verification System
Speech / Voice Recognition System
Image Processing System
Performance Evaluations
Pallas MPI Benchmark (PMB)*: performance evaluation of MPI communication
  Ping-Pong: measures the delay time when transferring data between 2 processors
  Broadcast: measures the longest delay time when transferring data from Node#0 to the other nodes
*http://www.pallas.com/e/products/pmb/
Performance Evaluations (cont.)
Ping-pong communication test
  The maximum transfer rate is 70 Mbps
  This is adequate performance for 100 Mbps Ethernet
[Figure: ping-pong test. Time [msec] and bandwidth [Mbytes/sec] versus data size [bytes], for data sizes from 0 to 1048576 bytes]
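The slide quotes the peak rate in Mbps while the chart axis is in Mbytes/sec, so a small conversion helper may make the two easier to compare. The following is a minimal sketch, not code from the presentation: the function names are hypothetical, decimal units are assumed (1 Mbit = 10^6 bits), and the one-way time is assumed to be half of the measured round trip.

```c
#include <assert.h>
#include <stddef.h>

/* Convert a rate in megabits per second to megabytes per second.
 * Assumes decimal units: 1 Mbit = 10^6 bits, 1 Mbyte = 10^6 bytes. */
double mbps_to_mbytes_per_sec(double mbps)
{
    return mbps / 8.0;
}

/* Effective one-way rate in Mbit/s for one ping-pong exchange.
 * round_trip_sec is the time for a message to go out and come back;
 * the one-way time is assumed to be half of that (hypothetical helper). */
double pingpong_mbps(size_t message_bytes, double round_trip_sec)
{
    assert(round_trip_sec > 0.0);
    double one_way_sec = round_trip_sec / 2.0;
    return (double)message_bytes * 8.0 / one_way_sec / 1.0e6;
}
```

Under these assumptions, mbps_to_mbytes_per_sec(70.0) gives 8.75, which falls inside the 0 to 10 Mbytes/sec range of the bandwidth axis.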
Performance Evaluations (cont.)
Broadcast communication test
  The maximum broadcast transfer rate is around 36 Mbps
  This is adequate performance for an ordinary hub
[Figure: broadcast test. Time [msec] and bandwidth [Mbytes/sec] versus data size [bytes], for data sizes from 0 to 1048576 bytes]
Application: Fingerprint Verifications
Distributed processing for verifying a fingerprint against a database
Fingerprint matching algorithm → POC
[Figure: the input fingerprint from the fingerprint sensor is distributed to Node#0 to Node#3; each node matches it against its registered fingerprints using POC]
What is Phase-Only Correlation (POC)?
  Correlation using only the phase component of the images
  Produces a sharp peak according to the degree of similarity between the images
  Algorithm based on signal processing
[Figure: example of Phase-Only Correlation. The standard image, input image 1, and input image 2 are each transformed by DFT into amplitude and phase; the phase of each input image is correlated with the phase of the standard image]
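For reference, the commonly used definition of the POC function can be written out as follows. This is a sketch: the symbols (f, g, F, G, A, θ, N_1, N_2) are notation assumed here and do not appear on the slide.

```latex
% f, g: two N_1 x N_2 images; F, G: their 2D DFTs in amplitude/phase form
F(k_1,k_2) = A_F(k_1,k_2)\, e^{j\theta_F(k_1,k_2)}, \qquad
G(k_1,k_2) = A_G(k_1,k_2)\, e^{j\theta_G(k_1,k_2)}

% Normalized cross-power spectrum: the amplitudes cancel,
% leaving only the phase difference
\hat{R}(k_1,k_2)
  = \frac{F(k_1,k_2)\,\overline{G(k_1,k_2)}}
         {\left|F(k_1,k_2)\,\overline{G(k_1,k_2)}\right|}
  = e^{j\left(\theta_F(k_1,k_2)-\theta_G(k_1,k_2)\right)}

% POC function: inverse 2D DFT of the normalized cross-power spectrum
\hat{r}(n_1,n_2)
  = \frac{1}{N_1 N_2}\sum_{k_1=0}^{N_1-1}\sum_{k_2=0}^{N_2-1}
    \hat{R}(k_1,k_2)\,
    e^{j2\pi\left(\frac{k_1 n_1}{N_1}+\frac{k_2 n_2}{N_2}\right)}
```

When the two images are identical, \hat{R} equals 1 everywhere and \hat{r} is a unit impulse at the origin, which is the sharp peak mentioned above; the peak height drops as the images become less similar.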
Fingerprint Matching Algorithm Using POC Function
[Figure: matching flow. The input fingerprint image goes through a 128×128 FFT and phase extraction ("phasing"). On each of Node#0 to Node#3, the result is multiplied (×) by the phase of a registered fingerprint, followed by IFFT and peak extraction. Node#0 additionally performs the peak comparison and outputs the check result]
Fingerprint Verification Performance Evaluation
  Number of computing nodes: 1, 2, or 4 (can be changed)
  Number of registered fingerprints: 12 (3 images per node)
  Evaluation of the matching time between the input fingerprint and the registered fingerprints
[Figure: the image from the sensor is matched against the registered fingerprints held on Node #0 to Node #3]
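One way to divide the registered fingerprints among the active nodes is a block partition. The sketch below assumes the 12 registered images are split as evenly as possible over however many nodes take part; the slide only states 3 images per node, and block_range is a hypothetical helper, not part of the presented system.

```c
#include <assert.h>

/* Compute which database entries a node handles when `total` entries
 * are split as evenly as possible over `nodes` nodes. On return,
 * *first is the index of the node's first entry and *count is how many
 * entries it handles. Lower-ranked nodes take one extra entry when the
 * split is uneven. */
void block_range(int total, int nodes, int rank, int *first, int *count)
{
    assert(total >= 0 && nodes > 0 && rank >= 0 && rank < nodes);
    int base = total / nodes;   /* entries every node gets */
    int extra = total % nodes;  /* leftover entries, one each to the first nodes */
    *count = base + (rank < extra ? 1 : 0);
    *first = rank * base + (rank < extra ? rank : extra);
}
```

With total = 12 and nodes = 4, every rank gets 3 entries, matching the 3 images per node on the slide; with nodes = 1 the single node handles all 12.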
Result
Processing time: less than 2 seconds with 4 nodes
Adequate performance with embedded CPUs
[Figure: execution time [sec] and speedup factor versus number of processor nodes (1, 2, 4)]
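The speedup factor plotted above is conventionally the single-node execution time divided by the multi-node execution time. A minimal sketch of that calculation follows; the function names are hypothetical, and any times passed in are the caller's own measurements, since the slides do not list exact values.

```c
#include <assert.h>

/* Speedup factor: single-node time divided by multi-node time. */
double speedup(double t_single_sec, double t_multi_sec)
{
    assert(t_single_sec > 0.0 && t_multi_sec > 0.0);
    return t_single_sec / t_multi_sec;
}

/* Parallel efficiency: speedup divided by the number of nodes
 * (1.0 would mean perfect scaling). */
double efficiency(double t_single_sec, double t_multi_sec, int nodes)
{
    assert(nodes > 0);
    return speedup(t_single_sec, t_multi_sec) / nodes;
}
```

For example, placeholder times of 3.0 s on one node and 2.0 s on four nodes (not measurements from this presentation) would give a speedup of 1.5 and an efficiency of 0.375.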
Conclusion
Developed a Ubiquitous Computing Cluster (UCC) with embedded CPUs
  World's smallest cluster computer in size, power consumption, and cost
  Suitable for prototyping next-generation ubiquitous applications
Application: fingerprint verification
  Performance evaluation shows satisfactory results
  The UCC is capable of running advanced applications
Future Plans
[Figure: the Ubiquitous Computing Cluster as a platform for next-generation ubiquitous applications]
  Fingerprint verification
  Computer vision
  Voice / speech recognition
  Cipher
  Fault-tolerant system
  Image tracking
  Redundant server system
  RFID certification
  Robot system
  Face recognition
Thank you