research computing @ uqscience and engineering, humanities, and social sciences, through advanced...

10
1 Research Computing @ UQ Research Computing @ UQ 3 February 2014 3 February 2014 Issue 2 Issue 2 The Research Computing Centre supports collaboration to facilitate discoveries in science and engineering, humanities, and social sciences, through advanced computation, data analysis and other digital research tools. Message from the Director Welcome to the first newsletter for 2014. There’s a lot happening in RCC, so this newsletter gives me a chance to highlight a few of the more important activities. Over the past six months I have been working with colleagues, both within RCC and outside, to build a new structure and engagement model. In this issue I will present a high level view of this, and give you the motivation and key features. Of course, I welcome feedback on how you feel it will work for you and your research groups. UQ now has a data management policy, having been ratified by the Academic Board and Council. This policy outlines the responsibilities for stewardship and protection of research data, and is an important step towards meeting our national and international obligations. RCC, together with partners, was successful in an ARC LIEF proposal to fund a new computational platform. The new system, codenamed Flashlite, will support data intensive computing, combining parallel computing facilities with high speed storage. Modelled on the Gordon machine at the University of California, San Diego, FlashLite will allow application to manipulate large amounts of data very rapidly, adding capabilities not previously available. Whilst the specifications are not locked down, we give a little information in the newsletter, and will b defining the system capabilities over the next few months. I am delighted to announce that UQ has chosen an o site data centre for hosting QRIScloud, FlashLite an other significant research computing infrastructure. More details will be announced over coming months The facility will support a substantial expansion ove on campus facilities.

Upload: others

Post on 08-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

1

Research Computing @ UQResearch Computing @ UQ

3 February 20143 February 2014 Issue 2Issue 2

The Research Computing Centre supports collaboration to facilitate discoveriesin science and engineering, humanities, and social sciences, through advancedcomputation, data analysis and other digital research tools.

Message from the Director

Welcome to the first newsletter for 2014. There’s a lothappening in RCC, so this newsletter gives me achance to highlight a few of the more importantactivities.

Over the past six months I have been working withcolleagues, both within RCC and outside, to build anew structure and engagement model. In this issue Iwill present a high level view of this, and give you themotivation and key features. Of course, I welcomefeedback on how you feel it will work for you andyour research groups.

UQ now has a data management policy, having beenratified by the Academic Board and Council. Thispolicy outlines the responsibilities for stewardship andprotection of research data, and is an important steptowards meeting our national and internationalobligations.

RCC, together with partners, was successful in anARC LIEF proposal to fund a new computationalplatform. The new system, codenamed Flashlite, willsupport data intensive computing, combining parallelcomputing facilities with high speed storage.Modelled on the Gordon machine at the University ofCalifornia, San Diego, FlashLite will allow

application to manipulate large amounts of data veryrapidly, adding capabilities not previously available.Whilst the specifications are not locked down, wegive a little information in the newsletter, and will bedefining the system capabilities over the next fewmonths.

I am delighted to announce that UQ has chosen an offsite data centre for hosting QRIScloud, FlashLite andother significant research computing infrastructure.More details will be announced over coming months.The facility will support a substantial expansion overon campus facilities.

2

Finally, I am delighted to welcome a number of newstaff to RCC, and look forward to working with themover coming months to support research at UQ.

Yours Sincerely,

David AbramsonDirector

The Research Data Policy aims to ensure that theresearch data required for validation of researchresults are properly managed according torecommendations made in the Australian Code forthe Responsible Conduct of Research and applicablelegislation. This code states that all individuals andinstitutions engaged in research have aresponsibility to manage research data effectively,by addressing ownership, storage, retention andaccess issues.

The University recognises the significant v alueof the research data generated by its largeinvestment in research. Durable research data areessential to justify, and defend when required, theoutcomes of research. Good stewardship canincrease the efficiency and maintain the integrity ofresearch results.

The potential cumulative value of research datashould also be considered and, where possible, itshould be made available for re-use. The universityrecognises that access to research data can raise theresearch profile of individuals and institutions,increase returns on public investment, promote openinquiry and debate, and enable innovative uses ofdata that may not have been foreseen by researchersat the time of its creation. The university iscommitted to supporting long-term research datamanagement to enable continuing access.

To optimise research outcomes, data must be stored,retained, documented and/or described, madeaccessible for use and reuse, and/or disposed of,according to legal, statutory, ethical and fundingbody requirements. Research data management is ashared responsibility. The university expects allresearchers, academic units, the Library and centraladministrative units to work collaboratively toimplement good research data management practice.

Research Data Descriptions nowlive in UQ eSpace

UQ Library's Data Collections Form allowsresearchers to describe their research data accordingto good practice, and further supports them byoffering either mediated access or open access totheir research data. It aids discovery, disseminationand preservation of research data as a first classresearch output at UQ. It also makes research datavisible via search engines such as Google, as well asthrough national data services including ResearchData Australia, and allows researchers to build anindex of their research data in eSpace, and link theirpublications to the underlying data. For moreinformation email [email protected].

UQ Launches Research Data Policy

3

FlashLite - Data Intensive Sciencecomes to UQData is predicted to transform the 21st century,fuelled by an exponential growth in the amount ofdata captured, generated and archived. Foster predictsa ten-fold increase every five years from 2000 to2015 alone. Microsoft has identified data intensivescience as the fourth scientific paradigm aftercomputation as the third. Not surprisingly, there arenumerous projects targeting the challenges thatunderpin the exploitation and management of thisdata explosion.

Australia has made significant progress towardsaddressing some of the opportunities andinfrastructure challenges posed by managing suchrapid increase in data volumes. Substantialgovernment investment, through the AustralianNational Data Service (ANDS), the Research DataStrategic Infrastructure (RDSI) and the NationaleResearch Collaboration Tools and Resources(NeCTAR) programs will advance our ability tosearch, manage and store large amounts of data.However, whilst important, these investments do notaddress the escalating scientific imperative to exploitand process data.

FlashLite will be designed explicitly for Australianresearch to conduct data intensive science andinnovation. It will support applications that need veryhigh performance secondary memory as well as largeamounts of primary (main) memory, and willoptimise data movement within the machine. Dataintensive applications are neither well served bytraditional supercomputers nor by modern cloud-based data centres. Conventional supercomputersmaximise Floating Point Operations per Second(FLOPS) and inter-processor communication ratesthrough high bandwidth and low latency networks.Conversely, modern Cloud systems minimise the costof ownership through reliance on virtual machinesand shared storage; they thus utilize relatively slowprocessors and networks and, by and large, do notsupport parallel processing.

FlashLite, on the other hand, will maximise InputOutput Operations per Second (IOPS) whileachieving competitive FLOPS ratings and high

performance networking, producing a balancedsystem for applications that exploit parallelism, high-speed arithmetic, and high performance Input/Output(IO). The machine will also incorporate novelsoftware mechanisms that provide seamless access todata regardless of its location, making it easier tobuild new data-intensive applications, whilstsupporting legacy codes using familiar techniques.

FlashLite is inspired by the US National ScienceFoundation (NSF) machine (called Gordon) that isalready achieving impressive performance on dataintensive applications at the San DiegoSupercomputer Centre (SDSC). An early prototype ofGordon machine accelerated real applications by 2-100 times.

The growth of digital data is leading to the so-called“data tsunami”, driven by advances in digitaldetectors, networks and storage systems. Each ofthese technologies are following a trajectory knownas Moore’s Law, which provides improvements indetector resolution (with associated lower costs),network transmission rates and memory densities that

are doubling at least every 18 months. The growth isalso driven by increased resolution of computationalmodels, which leverages improved computationalpower through multi-core devices, again followingMoore’s Law. Critically, the gap between thearithmetic processing speeds and transfers betweenmain and disk memory is increasing, making it moreand more difficult to

4

analyse and process increasingly large volumes of data in a timelyway.

Traditional High Performance Computing systems (HPC) aredesigned to deliver maximum computational performance onscientific codes, but are not typically optimised for maximisingmemory throughput – either primary (main) memory or secondary(disk) memory. In order to enable HPC applications over large datavolumes, it is time to rethink existing computer architectures toredress this imbalance. FlashLite is such a system, andincorporates three key innovations that are not typically embracedby traditional HPC systems:

high throughput solid state disk (instead of spinning disk); large amounts of main memory; andsoftware shared memory.

These innovations mean that applications have increased access to a high performance memory system,spanning high speed main memory (usually supported by DRAM technology) and low latency, highthroughput, solid state disk (also known as Flash memory). A novel (commercial) software abstraction,called Virtual Shared Memory (vSMP) bridges these components, thus simplifying programming.

•••

“Gordon is a unique machine inNSF's AdvancedCyberinfrastructure/XSEDEportfolio," said Barry Schneider, NSFprogram director for advancedcyberinfrastructure”. It was designedto handle scientific problemsinvolving the manipulation of verylarge data. [Using Gordon]researchers identified the hierarchicaltree of coherent gene groups andtranscription-factor networks thatdetermine the patterns of genesexpressed during brain development.

This application comprises 18 CIs from 5 collaborating universities,representatively spanning a wide range of research endeavours. The CIs, all ofwhom have high profile reputations in their respective fields and lead majorcomputational research efforts, are representative of their universities’ researchstrengths/priorities. This list includes current and former ARC Federation Fellows(Burrage), Professorial Fellows (Abramson), Future Fellows (Gu, Zhu), CIs inCentres of Excellence and CRCs (Abramson, Burrage, Tomlinson), and holders ofARC project grants (Discovery and Linkage).

All of the CIs maintain excellent international linkages, and these furtherenhance the research capability. For example, Abramson actively collaborates withresearch staff at SDSC, and regularly attends research meetings and workshopsthere. He is well placed to leverage the SDSC investment in Gordon and to leadFlashLite. Further, Abramson and Burrage collaborate with researchers at OxfordUniversity, bringing a significant international flavour to the cardiac modellingactivity at UQ and QUT. Other CIs maintain similar linkages.

5

The Research ComputingCentreNew techniques and technologies are enabling us to both ask, and answer, boldnew questions. This is causing a significant shift in the way research is conducted,leveraging new resources such as massive data stores, enormous computationalpower and new ways of communicating. This emergent infrastructure is callede-Research.

For researchers trained in recent decades, thisrepresents a substantial challenge. In order to evenremain competitive, let alone lead their field, theyneed to embrace the new ways of working, and mustfeel comfortable and productive in this newenvironment. In response, governments and privateorganisations around the world have invested not onlyin e-Research, but also in support services that allowresearchers to be productive.

The Research Computing Centre (RCC) is aUniversity-level Centre that provides coordinatedmanagement and support of the University’s sustainedand substantial investment in e-Research. RCC is aninnovative and multidisciplinary environment thatsupports collaboration to facilitate discoveries inscience and engineering, humanities, and socialsciences, through advanced computation, dataanalysis and other digital research tools. The centreenhances the University’s e-Research infrastructure,and provides support for interdisciplinary researchand education.

RCC has developed a structure and interaction modelthat matches the way research is conducted at UQ,leveraging expertise in faculties, Research Centres,Institutes and other support groups. This is no meanfeat – the UQ research landscape is complex, diverseand distributed, and employs widely different modelsthat are domain specific. RCC has developed a

unique, multi-tiered structure that both shrinks thegap between researchers and e-Researchinfrastructure, but empowers those who already havethe skills to excel.

RCC leverages investment by government ininitiatives such Queensland Cyber InfrastructureFoundation (QCIF); the National eResearchCollaboration Tools and Resources (NeCTAR); theResearch Data Storage Infrastructure project (RDSI);and the Australian National Data Service (ANDS). Italso builds on key support services in the University,namely Information Technology Services (ITS) andthe Library.

RCC aggregates expertise in core e-Researchtechnologies, such as Cloud Computing, DataManagement, High performance Computing (HPC),Workflow Tools and Visualisation. Over and above

this, it builds an expandable layer of domainexpertise, initially in Bio-informatics and Genomics;Computational Engineering; Environment andEcology; Humanities and Social Sciences; andAdvanced Imaging. These domain layers are theprimary interface to researchers, and are collaborativeventures with existing groups.

RCC uses a collaborative funding model, utilisingmultiple sources, including the Office of the Deputy

Vice-Chancellor (Research), QCIF, NeCTAR, RDSI,

6

The RCC model layers domain specific expertise on top of generic computing tools. Researchers can engageat a variety of levels depending on their own skills and needs. The innermost ‘Core’ layers contain generictechnologies that can be applied across a wide range of research areas. For example, workflow tools can beapplied in everything from digital humanities to molecular science. The outer ‘Theme’ layers are domain-specific, and contain multi-disciplinary teams with both domainknowledge and computing skills.

Agile teams are formed to pull expertise from a varietyof sectors. For example, a project in bio-science mayrequire skills in data management, highperformance computing and workflows.Another project in imaging may requireexpertise in data management alone.

Importantly, RCC will fund all activities inthe inner two layers, and will call for co-investment in the outer, domain-specific,layers.

RCC maintains a close working relationshipwith UQ Library around research datamanagement. Specifically, RCC will focus ontechnical issues associated with data storage andtransport, whilst the library will focus on thedevelopment of data management plans, meta-dataspecification and management and linking topublications. Together, these will underpin UQ’s DataManagement Policy.

6

7

QASMT embracessupercomputingRCC supported three students from the QueenslandAcademy of Science, Mathematics and Technologyto attend the 2013 IEEE Supercomputing conferencein Denver (SC13).

For a quarter of a century, the SupercomputingConference has served as the crossroads for theentire HPC community. From users and programmanagers to colleagues and vendors...fromgovernment to private industry to academia...SC hasprovided unparalleled cooperation, unequaledcollaboration, and unmatched exposure.

Spotlighting the most advanced scientific andtechnical applications in the world, SC13 brought

together theinternationalsupercomputingcommunity foran exceptionalprogram oftechnicalpapers, tutorialsand timelyresearch posters.The SC13

Exhibition Hall featured exhibits of the latest andgreatest technologies from industry, academia andgovernment research organisations; many of thesetechnologies were seen for the first time in Denver.

The conference gave the QASMT students a uniqueopportunity to learn about supercomputing. SteveBlair, science teacher from QASMT whoaccompanied the students, said: “SC13 was amazing.It has spurred the students into taking a more activeinterest in how supercomputing interacts with science.We will certainly be integrating this into ourcurriculum”

International Conference onComputational Science (ICCS)10-12 June 2014 - Cairns, Australia

"Big Data meets Computational Science"

The International Conference on ComputationalScience is an annual conference that bringstogether researchers from various applicationareas who are pioneering computational methodsin sciences such as physics, chemistry, lifesciences, and engineering, as well as in arts andhumanitarian fields, along with researchers andscientists from mathematics and computer scienceas basic computing disciplines. This mixture ofcomputational researchers are able to discussproblems and solutions in their areas,

7

identify new issues, and to shape future directionsfor research.

For more information see: http://iccs2014.ivec.org/

ICCS is well known for its excellent line up ofkeynote speakers. The keynotes for 2014 are:

Professor Vassil Alexandrov, ICREA ResearchProfessor in Computational Science, BarcelonaSupercomputing Centre, Spain

Professor Dr Luis Bettencourt, Santa Fe Institute,New Mexico, USA

Professor Professor Peter T. Cummings,Department of Chemical and BiomolecularEngineering, Vanderbilt University, USA

Professor John Mattick, Garvan Institute ofMedical Research, Sydney, Australia

Professor Bob Pressey, Australian ResearchCouncil Centre of Excellence for Coral ReefStudies, James Cook University, Australia

Professor Mark Ragan, Institute for MolecularBioscience, The University of Queensland,Australia

International Expert comes toRCCRCC recently hosted Professor Bob Panoff, from theSHODOR organisation at UQ. Bob Panoff is founderand Executive Director of The Shodor EducationFoundation, Inc., and has been a consultant at severalnational laboratories. He is also a frequent presenter atNSF- sponsored workshops on visualisation,supercomputing, and networking, and continues toserve as consultant for the education program at theNational Center for Supercomputing Applications atthe University of Illinois at Urbana-Champaign.

In October Bob presented a seminar at UQ on“Transforming learning through ComputationalThinking”. Bob also gave a seminar to school studentsat QASMT.

8

International Research Internships come to RCC

In today’s educational arena, universities must provide studentswith opportunities to work and study abroad to prepare them forglobal citizenship and professional competence in a multi-cultural workplace. Numerous reports have challengeduniversities to develop educational programs that provide anintegrated academic basis for developing students’cultural/global competencies.

Over the past five years, 26 Monash University students havetravelled to international partners at the University of California, San Diego, The National Center forSupercomputing Research in Illinois, The Technion in Israel and the University of Warwick. Students areplaced for a period of eight weeks, allowing them to integrate into the research groups as team members.Students have a local mentor in Australia as well as one in the remote site, and often bridge internationalresearch projects.

This year RCC has launched QURPA, and UQ students will join Monash students for the first time. This notonly allows UQ students to engage in some fascinating advanced computing projects, but opens the door totrans-national student-lead collaboration in undergraduate research. We have also added collaborators at theInstitute for Infocomm Research (I2R) in Singapore in addition to projects at UCSD. Projects span a wide

range of advanced computing technologiesand leading edge applications, and serve asincubators for RCC projects.

MURPA and QURPA involve anadvanced seminar scheme, in whichstudents can attend virtual seminars givenby world leading experts before they leave.These seminars also allow students to"meet" potential UCSD mentors and getsome information about potentialprojects. In 2013 seminars were sourcedfrom Faculty at UCSD, the University of

Indiana, I2R in Singapore, the National Centre forSupercomputing Applications (NCSA) and RensselaerInstitute of Technology. In 2013 seminars werebroadcast simultaneously to Monash (in Melbourne)and UQ (in Brisbane), with audiences able to askquestions from either venue. The seminar infrastructuresupports a wide range of video conference technologies(both open source and commercial), and is displayedon a 20 MPixel OptiPortal.

In addition, RCC recently hosted 2 UCSD students inan NSF funded program called PRIME. PRIME hasbeen running since 2004, and sends students fromUCSD to research partners across the Pacific Rim. This is the first year PRIME students have visited UQ.

Rasellus hendrerit pulvinar nibhLorem ipsum dolor sit amet, ligula suspendisse nulla pretium, rhoncus tempor placerat fermentum, enim

9