DESCRIPTION
An Introduction to eScience and the Grid by Prof. Richard Sinnott.

TRANSCRIPT
The e-Context of ENROLLER
Prof. Richard O. Sinnott, Technical Director, National e-Science Centre
16th April 2010
e-Science and e-Research
• Goal: to enable better research in all disciplines
• Method: develop collaboration supported by advanced distributed computation
  – to generate, curate and analyse rich data resources
    • from experiments, observations and simulations
    • quality management, preservation and reliable evidence
  – to develop and explore models and simulations
    • computation and data at all scales
    • trustworthy, economic, timely and relevant results
  – to enable dynamic distributed collaboration
    • facilitating collaboration with information and resource sharing
    • security, trust, reliability, accountability, manageability and agility
The challenge is to develop an integrated approach to all three, often realised through Grids and Grid infrastructures.
The Grid Context
• There are many Grids
  – Data Grids, Compute Grids, Information Grids, Enterprise Grids, …
• There are many ways to build Grids
  – Grid middleware (many flavours), Web services, Clouds, Web 2.0, internet computing, …
• There are many moving targets
  – changing middleware, changing standards, changing sciences, changing resources, new questions, new funding streams …
• There has been a lot of hype
• There has been a lot of money invested
• There are lots of projects and big scientific challenges
• There is an urgent need to build user communities
• There needs to be much more research pull than middleware push
  – … there are many more things that could go here!
UK e-Science Core Programme
• Major cross-council initiative
  – AHRC, BBSRC, EPSRC, ESRC, MRC, NERC, PPARC/STFC, …
• Over £250m funding over 7-8 years from 2001
  – Does not include industry monies from
    • Department of Trade and Industry
    • Technology Strategy Board
    • Europe
    • JISC
    • Regional development agencies
    • …
• Programme now completed; reviews and planning for future government spending in this area are on-going
e-Science in the UK
[Map of UK e-Science centres and services, including: the e-Science Institute; Grid Operations Support Centre; NeSC (4th Phase Platform Grant); CeSC (Cambridge); National Institute for Environmental e-Science; Digital Curation Centre; OMII-UK; NERC e-Science Centre; National Centre for Text Mining; National Centre for e-Social Science; Software Sustainability Institute; core NGS nodes plus HECTOR and partners/affiliates (HECTOR investment £113m); National Data Centres plus the UK Federation; plus an international dimension including EGEE/EGI, SuperJanet, training/education and more.]
NeSC Background
• e-Science Hub
  – Externally
    • Glasgow end of NeSC
    • Involved in numerous UK-wide activities/projects
  – Internally
    • Focal point for e-Science research/activities at Glasgow
    • Work closely with foundation departments
      – Department of Computing Science
        » Established first UK Grid Computing course
      – Department of Physics & Astronomy
    • Also working with other groups including
      – Bioinformatics Research Centre, Biostatistics, Electronics and Electrical Engineering, Dept. of Public Health, Dept. of Pathology, Dept. of English, Arts & Humanities, University Services, clinicians & numerous hospitals across Scotland
        » Yorkhill, Royal Infirmary, Western General, Southern General …
• NeSC GU now part of University IT Services
[Team slide: J. Jiang; Chris Bayliss; David Martin (ScotGrid sys-admin); C. Millar; Gordon Stewart; J. Mohammad (PhD); T. Doherty (VPman); S. Hussain (PhD); M. Sarwar (ENROLLER); Nurazian Mior Dahalan (PhD); CameraShy]
NeSC Glasgow Projects
• National e-Science Centre (NeSC-I, NeSC-II, NeSC-III)
• Dynamic Virtual Organisations for e-Science Education (DyVOSE)
• Biomedical Research Informatics Delivered by Grid Enabled Services (BRIDGES)
• Grid Enabled Microarray Expression Profile Search (GEMEPS)
• GridNet
• Glasgow early adoption of Shibboleth (GLASS)
• Joint Data Standards Survey (JDSS)
• ESP-Grid
• GridNet-2
• HPC Compute cluster award
• Sun industrial sponsorship
• OGC Collision
• OMII-Security Portlets
• OMII-RAVE
• Integrating VOMS and PERMIS for Superior Grid Authorization (VPman)
• NCeSS Technical Management
• CESSDA PPP
• Pharming of Therapeutic RNA
• Grid Enabled Occupational Data Environment (GEODE)
• Towards an e-Infrastructure for e-Science Digital Repositories
• Grid enabled Biochemical Pathway Simulator
• Virtual Organisations for Trials and Epidemiological Studies (VOTES)
• Towards a European e-Infrastructure for e-Science Repositories
• Modelling, Inference and Analysis for Biological Systems up to the Cellular Level
• Drug Discovery Portal
• Advanced Grid Authorisation through Semantic Technologies (AGAST)
• ShinTau (Supporting Multiple Shibboleth Attribute Authorities)
• Grid-enabled Virtual Safe Settings – Security & the State of the Nation
• Scottish Bioinformatics Research Network (SBRN)
• Generation Scotland Scottish Family Health Study
• Meeting the Design Challenges of nanoCMOS Electronics (nanoCMOS)
• EU FW7 Avert-IT
• EU FW7 EuroDSD
• Breast Cancer Tissue Biobank
• Data Management through e-Social Science (DAMES)
• NeSC Research Platform (NRP)
• NeSC Information Network (NIN)
• European Network for Study of Adrenal Tumors
• Scottish Health Informatics Platform for Research (SHIP)
• National E-Infrastructure for Social Simulation (NeISS)
• Enhancing Repositories for Language and Literature Researchers (ENROLLER)
• Proxy Credential Auditing Infrastructure for the NGS
• European Network for Study of Adrenal Tumors Cancer Research Platform
• Diagnostic Identification of Parkinsons (DiPAR)
(Slide colour-codes projects as completed or running.)
Data Grids for High Energy Physics
[Diagram of the LHC tiered computing model: the Online System streams ~PBytes/sec from the detector; a ~100 MBytes/sec link feeds the CERN Computer Centre (Tier 0) and its ~20 TIPS Offline Processor Farm; ~622 Mbits/sec links connect Tier 1 regional centres (FermiLab ~4 TIPS, France, Italy and Germany); further ~622 Mbits/sec links reach Tier 2 centres of ~1 TIPS each (e.g. Caltech); institutes of ~0.25 TIPS hold a physics data cache; and physicist workstations (Tier 4) connect at ~1 MBytes/sec. 1 TIPS is approximately 25,000 SpecInt95 equivalents.]
• There is a "bunch crossing" every 25 nsecs.
• There are 100 "triggers" per second.
• Each triggered event is ~1 MByte in size.
• Physicists work on analysis "channels". Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.
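The quoted trigger rate and event size fix the data rate flowing out of the detector's trigger system; a quick back-of-the-envelope check (the annual beam-time figure is an assumption for illustration, not from the slide):

```python
# Back-of-the-envelope check of the LHC trigger data rate quoted above.
TRIGGERS_PER_SECOND = 100        # "100 triggers per second"
EVENT_SIZE_MB = 1                # "each triggered event is ~1 MByte"

rate_mb_per_s = TRIGGERS_PER_SECOND * EVENT_SIZE_MB
print(f"Data rate out of the trigger: ~{rate_mb_per_s} MBytes/sec")

# ~1e7 seconds of beam time per year is a common ballpark figure
# (an assumption here, not stated on the slide):
seconds_per_year = 1e7
print(f"~{rate_mb_per_s * seconds_per_year / 1e9:.1f} PBytes/year")
```

This matches the ~100 MBytes/sec link into the CERN Computer Centre shown in the diagram, and shows why petabyte-scale data grids were needed.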
Next Generation Transistor Design
[Slide illustrates 3D+ and statistical aspects of next-generation transistor design.]
Inter-disciplinary e-Health Example
[Slide shows the spectrum of biological scales to be linked: nucleotide sequences, nucleotide structures, gene expressions, protein structures, protein functions, protein-protein interactions (pathways), cell, cell signalling, tissues, organs, physiology, organisms, populations; plus environmental, social and geographic data. Security is highlighted throughout.]
Involves biologists, bioinformaticians, statisticians, clinicians, pharmacists, physicists, epidemiologists, chemists, geospatial modellers, public health...
Bridges Project
[Diagram: a CFG Virtual Organisation spanning Glasgow, Edinburgh, Leicester, Oxford, London and the Netherlands. Publically curated data sources (Ensembl, MGI, HUGO, OMIM, SWISS-PROT, RGD, …) feed a DATA HUB via an Information Integrator (OGSA-DAI); services include a Synteny Service, a Magna Vista Service and BLAST, all under VO authorisation, alongside private data held at each site.]
Grid Blast Interface
• Allows 'genome scale' blasting
• Transparently uses NGS, ScotGrid, other GU clusters, Condor pools
• Many databases already deployed across nodes
• No user certificates
• Fine-grained security at the back-end
MagnaVista / GeneVista
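The "transparently uses" and "no user certificates" bullets describe a portal that hides back-end selection from the user. A minimal sketch of that kind of dispatch logic, with invented back-end names and no real job submission (this is illustrative, not the BRIDGES implementation):

```python
# Hypothetical sketch of a Grid BLAST portal's dispatch logic: the portal
# authenticates the user once, then fans jobs out to whichever back-end
# cluster is least loaded, using a single server-held credential rather
# than per-user certificates. All names are illustrative.

BACKENDS = {"NGS": 0, "ScotGrid": 0, "GU-cluster": 0, "Condor": 0}  # queued jobs

def submit_blast(sequence: str, database: str) -> str:
    """Pick the least-loaded back-end and record the job there."""
    target = min(BACKENDS, key=BACKENDS.get)
    BACKENDS[target] += 1
    return f"blast {database} [{len(sequence)} bp] -> {target}"

print(submit_blast("ATGGCGT" * 100, "ensembl_human"))
```

Keeping the certificate on the server and enforcing fine-grained security at the back-end is what lets end users avoid handling digital certificates themselves.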
E-Security
• Security
  – The key is that it should support
    • seamless access to a heterogeneous variety of "distributed" compute and data (and other) resources
      – often domain specific, especially data!
    • single sign-on
      – authenticate once and access numerous distributed resources
  – AAAA (+ privacy, confidentiality, integrity …)
    • Authentication (know who "they" are)
    • Authorisation (decide what "they" can do and enforce it)
    • Auditing/accounting (keeping track of who did what and when, for security checks, charging etc.)
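The AAAA chain above can be sketched as a toy sequence of checks; the users, permissions and action names below are invented for illustration, and this is nothing like real Grid middleware:

```python
# Toy illustration of the AAAA chain: authenticate, authorise, audit.
import time

USERS = {"alice": "s3cret"}                       # authentication store
PERMISSIONS = {"alice": {"read:genome-db"}}       # authorisation store
AUDIT_LOG = []                                    # auditing/accounting trail

def access(user, password, action):
    if USERS.get(user) != password:                   # 1. authentication
        outcome = "denied: bad credentials"
    elif action not in PERMISSIONS.get(user, set()):  # 2. authorisation
        outcome = "denied: not permitted"
    else:
        outcome = "granted"
    AUDIT_LOG.append((time.time(), user, action, outcome))  # 3. auditing
    return outcome

print(access("alice", "s3cret", "read:genome-db"))    # granted
print(access("alice", "s3cret", "delete:genome-db"))  # denied: not permitted
```

Note that every attempt is audited, whether or not it succeeds; that is what makes after-the-fact security checks and charging possible.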
Ease of Use
• For Grids/e-Research to be truly successful, they
  – have to be made as seamless to access and use as the internet
    • forget training and education for some (most?) users!
  – have to be based on research pull and not middleware push
• Experiences in various projects and across the whole e-Science programme have shown that users don't like digital certificates
User Oriented Security
• A _ _ _
  – Federated Authentication, e.g. through Shibboleth
[Diagram of the Shibboleth flow between the user, a federation W.A.Y.F. service, the user's home institution (identity provider, with LDAP-backed AuthN) and the service provider (web site/e-Journal):
  1. User points browser at a Grid resource/portal (or non-Grid resource).
  2. Shibboleth redirects the user to the W.A.Y.F. service.
  3. User selects their home institution.
  4. The home site authenticates the user.
  5. The user accesses the resource.]
Log in once and roam.
_ A _ _
• Authorisation
  – Defining what they can do, and defining and enforcing the rules
    • Each site will have different rules/regulations
  – Also known as Virtual Organisations (VOs)
    • A collection of distributed resources shared by a collection of users from one or more organizations, typically to work on a common research goal
    • Provides a conceptual framework for the rules and regulations under which resources are offered/shared between VO institutions/members
    • Different domains place greater/lesser emphasis on the expression and enforcement of rules and regulations (policies)
[Diagram: Org1 … Orgn each contribute {Resources} and {Users} to the VO, which mediates privileges, resources, access control and trust.]
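The VO model, in which each organisation contributes resources and users and the VO holds the shared access policy, might be sketched as follows (all organisation, user and resource names are invented for illustration):

```python
# Toy sketch of a Virtual Organisation: organisations pool resources and
# users, and a VO-level policy governs cross-site access.
from dataclasses import dataclass, field

@dataclass
class Organisation:
    name: str
    resources: set = field(default_factory=set)
    users: set = field(default_factory=set)

@dataclass
class VirtualOrganisation:
    members: list
    # policy: which users may access which resources; each site may
    # contribute different rules to this shared view
    policy: dict = field(default_factory=dict)

    def can_access(self, user, resource):
        return resource in self.policy.get(user, set())

org1 = Organisation("Org1", {"cluster-a"}, {"alice"})
org2 = Organisation("Org2", {"genome-db"}, {"bob"})
vo = VirtualOrganisation([org1, org2],
                         policy={"alice": {"genome-db"}, "bob": {"cluster-a"}})
print(vo.can_access("alice", "genome-db"))  # cross-site sharing via the VO
```

The point of the sketch is that "alice" never holds an account at Org2; her access to its resources exists only through the VO's policy, which is exactly the conceptual framework the slide describes.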
[Diagram extends the Shibboleth flow with authorisation, between the user, the federation W.A.Y.F. service, the home institution (identity provider, LDAP AuthN) and the service provider (Shib front-end, Grid portal, Grid application, LDAP AuthZ):
  1. User points browser at a Grid resource/portal.
  2. Shibboleth redirects the user to the W.A.Y.F. service.
  3. User selects their home institution.
  4. The home site authenticates the user and pushes attributes to the service provider.
  5. The Shib front-end passes the authentication information and attributes to the authorisation function.
  6. The Grid portal makes the final AuthZ decision before the request reaches the Grid application.]
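Steps 5 and 6 of the flow, where the service provider receives pushed attributes and makes the final authorisation decision locally, can be sketched as below. The attribute names and the protected operation are invented for illustration (real deployments typically use standardised attribute schemas):

```python
# Sketch of attribute-based authorisation at the service provider: the
# home institution pushes attributes (step 4); the portal checks them
# against its own policy to make the final AuthZ decision (steps 5-6).

# What the portal requires for each protected operation (illustrative):
REQUIRED_ATTRS = {"run-blast": {"affiliation": "staff", "vo": "CFG"}}

def authorise(pushed_attrs: dict, operation: str) -> bool:
    """Final AuthZ decision from attributes pushed by the home institution."""
    required = REQUIRED_ATTRS.get(operation, {})
    return all(pushed_attrs.get(k) == v for k, v in required.items())

attrs = {"affiliation": "staff", "vo": "CFG"}  # pushed in step 4
print(authorise(attrs, "run-blast"))           # True: access granted
```

The design point is the separation of concerns: the home institution vouches for who the user is and what attributes they hold, while the service provider alone decides what those attributes entitle the user to do.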