learning open source through gsoc

Apache Software Foundation Indiana University Science Gateways, Open Source & Google Summer of Code Suresh Marru

Upload: smarru

Post on 16-Dec-2014

Category:

Education


DESCRIPTION

The goal of this talk is to highlight open source opportunities for students, especially the opportunity to earn $5000 through the Google Summer of Code program. I will discuss tips on how to engage with open source communities and the benefits of contributing. I will provide motivating examples of how students can gain significant experience by contributing to challenging distributed systems problems while impacting scientific research. I will focus on a concrete example, the Apache Airavata software suite for Web-based science gateways. I will list some example GSoC topics of interest and provide some recipes for getting accepted and navigating the program successfully.

TRANSCRIPT

Page 1: Learning Open Source through GSOC

Apache Software Foundation Indiana University

Science Gateways, Open Source & Google Summer of Code

Suresh Marru

Page 2: Learning Open Source through GSOC

Acknowledgements

Apache Software Foundation (ASF)

Extreme Science and Engineering Discovery Environment (XSEDE)

Science Gateways Group, Pervasive Technology Institute, Indiana University (SGG)

Page 3: Learning Open Source through GSOC

Credits to ...

Science Gateways Group @ IU

• Marlon Pierce: Group Lead
• Amila Jayasekara
• Chathuri Wimalasena
• Heshan Suriyaachchi
• Jun Wang
• Lahiru Gunathilake
• Raminder Singh
• Saminda Wijeratne
• Suresh Marru
• Viknes Balasubramanee
• Yu (Marie) Ma

Page 4: Learning Open Source through GSOC

Apache Airavata

What will you hear today?

Science Gateways: Web 2.0, Social Networking, Grid & Cloud Computing, BigData, everything-as-a-service -- churned into real-world scientific research.

Open Source: Hack into Open Source projects, a good way to cherish doing what you like as opposed to what you have to.

Google Summer of Code: Reward yourself with $5000 while making a case for future employment & graduate school admissions.

Page 5: Learning Open Source through GSOC

Outline

Google Summer of Code

Apache Software Foundation

Getting your way in Open Source

What are Science Gateways?

Interested? Next Steps……

Page 6: Learning Open Source through GSOC

www.google-melange.org

www.google-melange.com

Page 7: Learning Open Source through GSOC

What is Google Summer of Code?

Google Summer of Code is a program designed to encourage college student participation in open source software development.

Page 8: Learning Open Source through GSOC

Key Goals of GSOC

• Inspire young developers to begin participating in open source development

• Provide students in computer science and related fields the opportunity to do work related to their academic pursuits during the summer

• Give students more exposure to real-world software development scenarios (e.g. distributed development, software licensing questions, mailing list etiquette, etc.)

• Get more open source code created and released for the benefit of all

• Help open source projects identify and bring in new developers and committers

Page 9: Learning Open Source through GSOC

GSoC in numbers: Countries

Page 10: Learning Open Source through GSOC

GSoC Top Schools

Page 11: Learning Open Source through GSOC

GSoC in numbers: Students

The number of students has maxed out and stabilized around 1200.

This is not expected to grow in the near future, which is understandable. Still, thank you Google!!

Page 12: Learning Open Source through GSOC

GSoC Win-Win Perspective

• Project Perspective:
  o Paid software developer for the summer.
  o Attracting a new member into the project community.

• Student Perspective:
  o Opportunity to gain (open source) software development experience.
  o Good payment for rewarding work.
  o Ability to network and become known within a structured, distributed setting.

Page 13: Learning Open Source through GSOC

What to look for in a project?

Can you engage with the project (not just the mentor)? Can they guide you with tutorials and hand-hold early on?

For instance, will you get to experience the “Apache Way”?

Is the project welcoming and appreciative?

Is there mileage for your extra effort through long-term commitment?

Page 14: Learning Open Source through GSOC

Key Success: Integrated Cross Apache Projects

• Whirr API

Success Story from Apache Airavata Student: Milinda Pathirage

Page 15: Learning Open Source through GSOC

Core Contributions beyond GSOC

Milinda realized he could execute his GSoC project, but he also had great ideas on how to fundamentally improve the Airavata architecture to make future extensions easy.

The developer community agreed to the new architecture: simple, easy extensibility.

Airavata has adopted his proposed new architecture.

Page 16: Learning Open Source through GSOC

Enhanced Airavata Architecture

[Figure: handler-based architecture. Labels: Global InHandlers, Application specific InHandlers, Provider specific InHandlers, Provider Logic, Provider specific OutHandlers, Application specific OutHandlers, Global OutHandlers, Job Execution Context]
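The handler labels in the figure above suggest a chain in which a shared job execution context passes through in-handlers, then the provider logic, then out-handlers. The following is a minimal sketch of that pattern only. The names `run_job`, `trace` and `JobContext`, and the exact ordering of global, application and provider handlers, are assumptions made for illustration and are not taken from Airavata's code.

```python
from typing import Callable, Dict, List

# Hypothetical types: a plain dict stands in for the job execution context.
JobContext = Dict[str, List[str]]
Handler = Callable[[JobContext], None]


def run_job(ctx: JobContext,
            in_handlers: List[Handler],
            provider: Handler,
            out_handlers: List[Handler]) -> JobContext:
    """Pass one shared context through in-handlers, provider logic, out-handlers."""
    for handler in in_handlers:
        handler(ctx)
    provider(ctx)
    for handler in out_handlers:
        handler(ctx)
    return ctx


def trace(name: str) -> Handler:
    """Build a handler that only records that it ran."""
    def handler(ctx: JobContext) -> None:
        ctx.setdefault("trace", []).append(name)
    return handler


# Assumed ordering: most general handlers first on the way in, last on the way out.
ctx = run_job(
    {},
    in_handlers=[trace("global-in"), trace("app-in"), trace("provider-in")],
    provider=trace("provider-logic"),
    out_handlers=[trace("provider-out"), trace("app-out"), trace("global-out")],
)
print(ctx["trace"])
```

Under this sketch, adding a new capability means appending one more handler to a list, which is one way to read the "easy extensibility" goal mentioned on the previous slide.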

Page 17: Learning Open Source through GSOC

Pick what motivates you

• Harness your skills and interests

• If possible pick a project relevant and “required” by aligning with your academic curriculum
  o As a final year (research) project
  o As a Masters-level research project

• Create an interesting and challenging research problem

• Sense of satisfaction and achievements
  o Research publications
  o Presentations at ApacheCon and similar conferences
  o Committership

Page 18: Learning Open Source through GSOC

What does a good mentor look for?

• Free & Paid Contributions – the reality

• Long term participant in the project (not a software developer for ~3 months)

• Accomplish meaningful research-oriented goals either within the project or cross-cutting projects.

• Teach open source/community participation to the next generation workforce

Page 19: Learning Open Source through GSOC

Apache Airavata

What will you hear today?

Science Gateways: Web 2.0, Social Networking, Grid & Cloud Computing, BigData, everything-as-a-service -- churned into real-world scientific research.

Open Source: Hack into Open Source projects, a good way to cherish doing what you like as opposed to what you have to.

Google Summer of Code: Reward yourself with $5000 while making a case for future employment & graduate school admissions.

Page 20: Learning Open Source through GSOC

What Is Cyberinfrastructure?

“Cyberinfrastructure consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high performance networks to improve research productivity and enable breakthroughs not otherwise possible.” –Craig Stewart, Indiana University

Page 21: Learning Open Source through GSOC

Knowledge and Expertise

Computational Resources

Scientific Instruments

Algorithms and Models

Archived Data and Metadata

Advanced Science Tools

Science Gateways: Enabling & Democratizing Scientific Research

Page 22: Learning Open Source through GSOC

Dynamic Adaptive Cyberinfrastructure - Reacting to real-time weather

[Figure: adaptive forecasting loop. Labels: Streaming Observations, Storms Forming, On-Demand Grid Computing, Forecast Model, Data Mining, Refine forecast, Instrument Steering]

Envisioned by a multi-disciplinary team from OU, IU, NCSA, Unidata, UAH, Howard, Millersville, Colorado State, RENCI

Page 23: Learning Open Source through GSOC

Anatomy of a Science Gateway

Gateway User Interface
• Web Portals
• Desktop Clients
• Social/Collaboration Capabilities

Security Infrastructure

Analyses & Visualization Capabilities

Workflow Execution Framework

Application Abstraction

Workflow construction & Enactment

Compute Resource Management

Scheduling

Messaging System

Data Management

Provenance Collection

Page 24: Learning Open Source through GSOC

Knowledge and Expertise

Computational Resources

Scientific Instruments

Algorithms and Models

Archived Data and Metadata

Advanced Science Tools

Science Gateways: Enabling & Democratizing Scientific Research

Science Gateways enable and support communities of users associated with a scientific discipline to use cyberinfrastructure through a common interface that is configured for optimal use.

Page 25: Learning Open Source through GSOC

Page 26: Learning Open Source through GSOC

XSEDE Vision

The eXtreme Science and Engineering Discovery Environment (XSEDE):

enhances the productivity of scientists and engineers by providing them with new and innovative capabilities

and thus facilitates scientific discovery while enabling transformational science/engineering and innovative educational programs

Page 27: Learning Open Source through GSOC

https://www.xsede.org/gateways-overview

Page 28: Learning Open Source through GSOC

Today, there are approximately 35 gateways using XSEDE

Page 29: Learning Open Source through GSOC

Apache Airavata

What will you hear today?

Science Gateways: Web 2.0, Social Networking, Grid & Cloud Computing, BigData, everything-as-a-service -- churned into real-world scientific research.

Open Source: Hack into Open Source projects, a good way to cherish doing what you like as opposed to what you have to.

Google Summer of Code: Reward yourself with $5000 while making a case for future employment & graduate school admissions.

Page 30: Learning Open Source through GSOC

The Apache Software Foundation

Apache software powers 65% of web sites worldwide

501(c)3 non-profit foundation

Reasons for creating ASF
• Create legal entity
• Protect contributors from liability
• Protect Apache assets

Membership: individual

Apache Incubator

Governance and Staffing
• Board of Directors
• Project Management Committees
• ASF Members
• Committers
• Contributors

Funding
• All-volunteer staffing/development resources
• Donations
• Corporate investment

Page 31: Learning Open Source through GSOC

Apache Way: Beyond Open Source, Open Community

Transparency
• Decision-making and actions are observable
• Events of interest are published and recorded
• Transparency invites collaboration

Meritocratic Governance
• Influence on decisions is based on merit
• Merit is earned in public
• Community based governance

Community
• Common interest, Community interest, Common experience
• “Community before code”

Collaboration
• Systems supporting communication and coordination: repositories, trackers, forums, build tools
• You can reuse what you can see and influence
• More eyeballs means better quality

Page 32: Learning Open Source through GSOC

Apache Organization

• Apache is a meritocratic organization
  – Merit does not expire. You earn your keep and your credentials

• Start out as Contributor
  – Patches, mailing list comments, testing, documentation, etc.
  – No commit access

• Move on to Committer
  – Commit access, evolve the code

• PMC Members
  – Have binding VOTEs on releases/personnel

• Officer (VP, Project)
  – PMC Chair

• ASF Member
  – Have binding VOTE in the state of the foundation
  – Elect Board of Directors

• Director
  – Oversight of projects, foundation activities

Page 33: Learning Open Source through GSOC

Our experience with Apache ..

• Give up control and get back contributions.

• Being in Apache by itself doesn’t guarantee sustainability but opens doors for sustainability.

• Google Summer of Code has brought in students, increased documentation, identified confined projects.

• Do not have to worry about getting sued by Oracle for using Java APIs. Standing behind a shield of expert lawyers.

• Companies make in-kind contributions; some have concrete plans, some are just evangelizing. Both are good.

• Today’s cyberinfrastructure ecosystem is not in a funding situation to work on parallel independent implementations.

• Shared implementation is hard to achieve, but well thought-out architectures can achieve it.

• Also encourage multiple implementations and let the communities sort it out. The winner sustains. Example: Apache Axis2, Apache CXF

Page 34: Learning Open Source through GSOC

Apache Contributions Aren’t Just Software

• Apache committers and PMC members aren’t just code writers.

• Successful communities also include
  – Important users
  – Project evangelists
  – Content providers: documentation, tutorials
  – Testers, requirements providers, and constructive complainers
    • Using Jira and mailing lists
  – Anything else that needs doing.

Page 35: Learning Open Source through GSOC

Apache Airavata

http://airavata.apache.org

Page 36: Learning Open Source through GSOC

Science Gateways with Airavata

Page 37: Learning Open Source through GSOC

[Figure: Apache Airavata architecture. Components: Workflow Interpreter, Application Factory, Message Box, Registry, Apache Airavata API. Surrounding labels: End Users, Gateway Developer, Scientific Application, Core Developer, Computational Resources]

Page 38: Learning Open Source through GSOC

Apache Airavata Components

Component: Description

XBaya: Workflow graphical composition tool.

Registry Service: Insert and access application, host machine, workflow, and provenance data.

Workflow Interpreter Service: Execute the workflow on one or more resources.

Application Factory Service (GFAC): Handles the execution and management of an application in a workflow.

Messaging System: WS-Notification and WS-Eventing compliant publish/subscribe messaging system for workflow events.

Airavata API: Single wrapping client to provide higher level programming interfaces.
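To make the publish/subscribe idea behind the Messaging System concrete, here is a minimal in-memory sketch. It illustrates the general pattern only: it does not implement WS-Notification or WS-Eventing, and `EventBus`, the topic names and the messages are hypothetical names chosen for this example, not Airavata's API.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str], None]


class EventBus:
    """Hypothetical in-memory broker; not Airavata's messaging API."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Register a callback for every future message on a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Deliver a message to each subscriber of the topic; return the count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


bus = EventBus()
monitor_log: List[str] = []

# A monitoring client listens for events of one workflow run.
bus.subscribe("workflow-42", monitor_log.append)

delivered = bus.publish("workflow-42", "node A finished")
ignored = bus.publish("workflow-99", "nobody is listening")
print(monitor_log, delivered, ignored)
```

The publisher never needs to know who is listening, which is what lets monitoring tools and user interfaces follow workflow events independently of the services that emit them.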

Page 39: Learning Open Source through GSOC

Key Airavata Features

Graphical user interface to construct, execute, control, manage and reuse scientific workflows.

Desktop tools and browser-based web interface components to manage applications, workflows and generated data.

Sophisticated server-side tools to register, schedule and manage scientific applications on high performance computational resources.

Ability to interface and interoperate with various external (third party) data, workflow and provenance management tools.

Page 40: Learning Open Source through GSOC

A Classic Scientific Workflow

Workflows are composite applications built out of independent parts.

Parts are executables wrapped as network accessible services.

The classic example is that codes A, B, and C need to be executed in a specific sequence.
• A, B, C: parallel codes compiled and executable on a cluster, supercomputer, etc. by schedulers.
• A, B, and C do not need to be co-located
• A, B, and C may be sequential or parallel
• A, B and C may have data or control dependencies
• Data may need to be staged in and out

Some variations on ABC:
• Conditional execution branches
• Dynamic execution resource binding
• Iterations (Do-while, For-Each) over all or parts of the sequence
• Triggers, events, data streams
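The A, B, C example above can be sketched as a small dependency graph executed in topological order. This is an illustration only, assuming Python 3.9+ for the standard-library `graphlib` module. The function `run_workflow` and the toy task bodies are hypothetical stand-ins for real scientific codes, and everything here runs locally rather than on a cluster.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# Each task name maps to the tasks whose outputs it consumes (data dependencies).
dependencies: Dict[str, List[str]] = {"A": [], "B": ["A"], "C": ["B"]}

# Toy stand-ins for the executables A, B and C.
tasks: Dict[str, Callable[..., int]] = {
    "A": lambda: 2,
    "B": lambda a: a + 1,
    "C": lambda b: b * 10,
}


def run_workflow(deps: Dict[str, List[str]],
                 funcs: Dict[str, Callable[..., int]]) -> Dict[str, int]:
    """Run every task after its dependencies, feeding it their outputs."""
    results: Dict[str, int] = {}
    for name in TopologicalSorter(deps).static_order():
        upstream = [results[dep] for dep in deps[name]]
        results[name] = funcs[name](*upstream)
    return results


results = run_workflow(dependencies, tasks)
print(results)  # {'A': 2, 'B': 3, 'C': 30}
```

The variations listed on the slide (conditional branches, iteration, dynamic resource binding) are what make a real workflow engine considerably more involved than this fixed-order loop.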

Page 41: Learning Open Source through GSOC

Challenges in Scientific Workflows

Accommodating wide range of execution patterns
• Iterations: for-each, do-while, dot and Cartesian products
• Interactivity, adaptivity, non-determinism

Accommodating error and uncertainties
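The difference between dot-product and Cartesian-product iteration is easy to see on two small input lists. The sketch below is illustrative: the parameter names `temperatures` and `pressures` are made up for the example.

```python
from itertools import product
from typing import List, Tuple

temperatures: List[int] = [280, 300]
pressures: List[int] = [1, 2]

# Dot product: pair the i-th temperature with the i-th pressure (2 runs).
dot_runs: List[Tuple[int, int]] = list(zip(temperatures, pressures))

# Cartesian product: every temperature with every pressure (2 x 2 = 4 runs).
cartesian_runs: List[Tuple[int, int]] = list(product(temperatures, pressures))

print(dot_runs)        # [(280, 1), (300, 2)]
print(cartesian_runs)  # [(280, 1), (280, 2), (300, 1), (300, 2)]
```

A for-each construct in a workflow would launch one application run per tuple, so the choice between the two patterns determines how quickly a parameter study grows.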

Page 42: Learning Open Source through GSOC

NextGen Workflow Systems: Need for Interactivity Across Layers

Scientific workflow systems and compiled workflow languages have focused on modeling, scheduling, data movement, dynamic service creation and monitoring of workflows.

Building on these foundations, Airavata extends to an interactive and flexible workflow system.

Airavata Workflow Features include:
• interactive ways of interfering and steering the workflow execution
• interpreted workflow execution model
• high level instruction set
• flexibility to execute individual workflow activity and wait for further analysis.

Page 43: Learning Open Source through GSOC

Interactivity Contd.

Derivations during workflow execution that do not affect the structure of the workflow:
• Dynamic change of workflow inputs, workflow rerun.
• Interpreted workflow execution model.
• Dynamic change in point of execution, workflow smart rerun.
• Fault handling and exception models.

Derivations that change the workflow DAG during runtime:
• Reconfiguration of an activity.
• Dynamic addition of activities to the workflow.
• Dynamic removal or replacement of an activity in the workflow.
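One way to picture an interpreted execution model is an interpreter that runs a single activity, pauses so the user can inspect the result, and accepts changes to the remaining workflow before continuing. The sketch below assumes a simple linear workflow. `SteppingInterpreter` and its methods are hypothetical names for this illustration and do not reflect how Airavata's Workflow Interpreter is implemented.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[int], int]]


class SteppingInterpreter:
    """Hypothetical interpreter that executes one activity at a time."""

    def __init__(self, steps: List[Step], value: int) -> None:
        self.steps = list(steps)
        self.value = value
        self.position = 0
        self.history: List[Tuple[str, int]] = []

    def done(self) -> bool:
        return self.position >= len(self.steps)

    def step(self) -> int:
        """Run the next activity, record its output, then pause."""
        name, func = self.steps[self.position]
        self.value = func(self.value)
        self.history.append((name, self.value))
        self.position += 1
        return self.value

    def insert_step(self, index: int, step: Step) -> None:
        """Add an activity at runtime; only the not-yet-run part may change."""
        if index < self.position:
            raise ValueError("cannot insert before an activity that already ran")
        self.steps.insert(index, step)


interp = SteppingInterpreter(
    [("double", lambda x: x * 2), ("add_one", lambda x: x + 1)], 5
)
interp.step()  # run "double", then pause for inspection

# After looking at the intermediate value, the user adds a new activity.
interp.insert_step(1, ("square", lambda x: x * x))
while not interp.done():
    interp.step()
print(interp.history)  # [('double', 10), ('square', 100), ('add_one', 101)]
```

Because the workflow is interpreted step by step rather than compiled up front, the structure can still change after execution has started.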

Page 44: Learning Open Source through GSOC

Interactivity

Mathematical uncertainty:
• PDEs from domain problems often have no analytical solution, so numerical methods are used to find solutions.
• These solvers may not converge, depending on the method, PDE system, initial conditions and expected output tolerances.
• Statistical techniques lead to nondeterministic results.
• Closer observation of computational output ensures acceptability of results.

Domain uncertainty:
• Scenarios of running against a range of parameter values in an attempt to find the most appropriate input set.
• Initial execution provides an estimate of the accuracy of the inputs and facilitates further refinement.
• Outputs are diverse and nondeterministic.

Resource uncertainty:
• Failures in distributed systems are the norm rather than the exception.
• Transient failures can be retried if the computation is side-effect free/idempotent.
• Persistent failures require migration.

Real-time model refinement:
• Real-time event processing systems do not have data available prior to initialization of the model.
• Models evolve over time and can take advantage of more and more events as they become available.
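The resource-uncertainty points above (retry transient failures of idempotent work, migrate on persistent failures) can be sketched as a small recovery loop. This is a minimal illustration under stated assumptions: the exception classes, `run_with_recovery`, and the resource names `cluster-a` and `cluster-b` are all hypothetical, and a real system would also need timeouts and back-off.

```python
from typing import Callable, List


class TransientError(Exception):
    """A failure that may succeed on retry (assumed classification)."""


class PersistentError(Exception):
    """A failure that will not go away on this resource."""


def run_with_recovery(job: Callable[[str], str],
                      resources: List[str],
                      max_retries: int = 3) -> str:
    """Retry transient failures in place; migrate on persistent ones."""
    for resource in resources:
        for _ in range(max_retries):
            try:
                return job(resource)
            except TransientError:
                continue  # safe only because the job is assumed idempotent
            except PersistentError:
                break  # give up on this resource and migrate to the next
    raise RuntimeError("job failed on every resource")


attempts: List[str] = []


def flaky_job(resource: str) -> str:
    """Toy job: cluster-a is down, cluster-b fails once and then works."""
    attempts.append(resource)
    if resource == "cluster-a":
        raise PersistentError("resource is down")
    if attempts.count(resource) < 2:
        raise TransientError("temporary network glitch")
    return f"done on {resource}"


result = run_with_recovery(flaky_job, ["cluster-a", "cluster-b"])
print(result, attempts)
```

The retry branch is only correct when repeating the job has no side effects, which is why the slide ties retries to idempotent computations.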

Page 45: Learning Open Source through GSOC

Illustrating Interactivity

Page 46: Learning Open Source through GSOC

Apache Airavata in Action

Domain: Description

Astronomy: Image processing pipeline for One Degree Imager instrument on XSEDE

Astrophysics: Supporting workflow of Dark Energy Survey simulations working group on XSEDE

Bioinformatics: Supported workflow executions on Amazon EC2 for BioVLAB project

Biophysics: Manage large scale data analysis of analytical ultracentrifugation experiments on XSEDE and campus resources

Computational Chemistry: Manage workflows to support computational chemistry parameter studies for ParamChem.org on XSEDE

Nuclear Physics: Workflows for nuclear structure calculations using Leadership Class Configuration Interaction (LCCI) computations on DOE resources

Page 47: Learning Open Source through GSOC

Apache Airavata

What will you hear today?

Science Gateways: Web 2.0, Social Networking, Grid & Cloud Computing, BigData, everything-as-a-service -- churned into real-world scientific research.

Open Source: Hack into Open Source projects, a good way to cherish doing what you like as opposed to what you have to.

Google Summer of Code: Reward yourself with $5000 while making a case for future employment & graduate school admissions.

Page 48: Learning Open Source through GSOC

Apache Airavata

How to crack GSoC?

1. Engage Early

2. Familiarize yourself with projects

3. Propose Ideas

4. Win, Code, Earn… Cherish !!!

Page 49: Learning Open Source through GSOC

Be Part of the project Community

• Play with different popular open source software ..

• Experiment with the emerging technologies …

• Learn & Engage with a multidisciplinary community..

Page 50: Learning Open Source through GSOC

Be pro-active instead of being reactive:

come up with your own ideas

Page 51: Learning Open Source through GSOC

GSoC Win-Win Perspective

• Project Perspective:
  o Paid software developer for the summer.
  o Attracting a new member into the project community.

• Student Perspective:
  o Opportunity to gain (open source) software development experience.
  o Good payment for rewarding work.
  o Ability to network and become known within a structured, distributed setting.

Page 52: Learning Open Source through GSOC

What to look for in a project?

Engage with the project (not just the mentor). Can they guide you with tutorials and hand-hold early on?

For instance, will you get to experience the “Apache Way”?

Is the project welcoming and appreciative?

Is there mileage for your extra effort through long-term commitment?

Page 53: Learning Open Source through GSOC

Pick what motivates you

• Harness your skills and interests

• If possible pick a project relevant and “required” by aligning with your academic curriculum
  o As a final year (research) project
  o As a Masters-level research project

• Create an interesting and challenging research problem

• Sense of satisfaction and achievements
  o Research publications
  o Presentations at ApacheCon and similar conferences
  o Committership

Page 54: Learning Open Source through GSOC

What does a good mentor look for?

• Free & Paid Contributions – the reality

• Long term participant in the project (not a software developer for ~3 months)

• Accomplish meaningful research-oriented goals either within the project or cross-cutting projects.

• Teach open source/community participation to the next generation workforce

Page 55: Learning Open Source through GSOC

Join the mailing list

Google Group - sgw-gsoc-discuss: https://groups.google.com/d/forum/sgw-gsoc-discuss

Need more info: [email protected]

Apache Airavata