An Open Source Framework for Teaching Bioinformatics
Author: Kam Dahlquist
BOSC
Vienna, Austria
July 20, 2007
Kam D. Dahlquist, Department of Biology
John David N. Dionisio, Department of Electrical Engineering & Computer Science
Loyola Marymount University
An Open Source Framework for Teaching Bioinformatics
Outline
• Motivation
• Open source culture
• Implementation
-- Computer science curriculum
-- Bioinformatics course
Scientific Computing and the Digital Divide
Wilson GV (2006) Where’s the real bottleneck in scientific computing? American Scientist 94:5–6.
Scientists who come to computer science after being trained in a different primary discipline often have to rediscover, relearn, or keep up with work in the computer science and software development realms in order to get the most out of their work.
This causes unnecessary and unknowing repetition of past discoveries and errors.
Tools or paradigms that are out-of-date in computer science and software engineering remain in place.
At worst, software flaws slow or impede research.
Baxter SM, Day SW, Fetrow JS, and Reisinger SJ (2006) Scientific software development is not an oxymoron. PLoS Computational Biology 2:e87.
The Disconnect Between Undergraduate Computer Science Training and Expectations and Skill Sets Required for Industry and Research

| Undergraduate Training | Industry Expectation |
| --- | --- |
| Work alone | Work in a team |
| “Toy” programs and algorithms | Large, modular project |
| Throwaway code | Code longevity (for better or worse) |
inroads – The SIGCSE Bulletin, Volume 39, Number 2, 2007 June, pp. 70-74
http://recourse.cs.lmu.edu/
Official Open Source Definition (version 1.9)
Free redistribution
Source code
Derived works
Integrity of the author’s source code
No discrimination against persons or groups
No discrimination against fields of endeavor
Distribution of license
License must not be specific to a product
License must not restrict other software
License must be technology-neutral
Open Source Values
• Source code is available, modifiable, and long-lived
• Accountability implies community
-- explicit “paper trail” for how a particular program changes over time
-- communication and answerability
• Responsibilities accompany rights
-- explicit consideration of the license
-- giving credit where it is due
-- appropriate access to software, documentation, and communities
-- acknowledging security, privacy, legal issues
Open Source Teaching Framework
Source Code:
• All code resides in a centralized, public repository
• As much as possible, everyone’s code is visible to everyone else for code review or team fixing
• No code is thrown away, it remains available to future “generations”
Quality & Community:
• Documentation, inline and online
• Automated tests
• Constructive code review, beyond “does it work?”
• Long-term projects release early, release often
• Form collaborative communities among faculty, students, classes, and projects
Progression through Computer Science Curriculum
• Sample code bazaar
-- the creation and maintenance of live, organized, searchable, student accessible sample code libraries
-- students check out, modify, and check in code
• Test infection (Erich Gamma)
-- test suite vs. implementation matrix
• Life cycle of code
-- as juniors and seniors, students revisit code written the first year
-- direct experience with need for documentation
• Release early, release often
-- applies to both students and faculty
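One way to read the "test suite vs. implementation matrix" idea is that every student test suite is run against every student implementation, and the pass/fail results are tabulated. The following is only a minimal sketch under that assumption, not the authors' actual tooling. The reverse-complement assignment, the function names, and the two sample suites are all hypothetical and chosen purely for illustration.

```python
from typing import Callable, Dict

Implementation = Callable[[str], str]
TestSuite = Callable[[Implementation], None]

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def impl_correct(seq: str) -> str:
    """Hypothetical student solution: reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def impl_buggy(seq: str) -> str:
    """Hypothetical student solution with a bug: complements but never reverses."""
    return "".join(COMPLEMENT[base] for base in seq)


def suite_weak(impl: Implementation) -> None:
    """Hypothetical suite that only uses inputs where the order of bases cannot matter."""
    assert impl("") == ""
    assert impl("AAA") == "TTT"


def suite_strong(impl: Implementation) -> None:
    """Hypothetical suite that includes an input where reversal changes the answer."""
    assert impl("") == ""
    assert impl("AAC") == "GTT"


IMPLEMENTATIONS: Dict[str, Implementation] = {
    "impl_correct": impl_correct,
    "impl_buggy": impl_buggy,
}
SUITES: Dict[str, TestSuite] = {
    "suite_weak": suite_weak,
    "suite_strong": suite_strong,
}


def build_matrix(
    suites: Dict[str, TestSuite],
    implementations: Dict[str, Implementation],
) -> Dict[str, Dict[str, bool]]:
    """Run every suite against every implementation; True means the suite passed."""
    matrix: Dict[str, Dict[str, bool]] = {}
    for suite_name, suite in suites.items():
        row: Dict[str, bool] = {}
        for impl_name, impl in implementations.items():
            try:
                suite(impl)
                row[impl_name] = True
            except AssertionError:
                row[impl_name] = False
        matrix[suite_name] = row
    return matrix


if __name__ == "__main__":
    for suite_name, row in build_matrix(SUITES, IMPLEMENTATIONS).items():
        cells = ", ".join(
            f"{name}={'pass' if ok else 'FAIL'}" for name, ok in row.items()
        )
        print(f"{suite_name}: {cells}")
```

Read by row, such a matrix shows which suites are too weak to catch a known bug. Read by column, it shows which implementations survive everyone's tests.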
“CourseForge”: A Hardware + Software Infrastructure for Supporting the Teaching Framework
• Certain teaching elements are impractical without some degree of automation
• Support platform for management of student work
-- revision control
-- test frameworks
-- communication
-- issue tracking
• Derived from open source software, delivered as open source software — the system will interoperate with existing open source tools
CMSI 698: Special Studies in Bioinformatics
• Team-taught by a biologist and a computer scientist
• Enrollment in Spring 2006:
-- eight students from Master’s degree program in Computer Science
-- several coming from aerospace industry
-- none with more than college-level introductory biology
• Project-based class began development of XMLPipeDB
-- authentic bioinformatics problem to solve
• XMLPipeDB development continued by four students in summer session course entitled Open Source Software Development Workshop
• One student continued development for Master’s thesis
XMLPipeDB Project Management: Lessons Learned
• Students on the project had varying levels of maturity, knowledge, and skill coming into the project
-- some naturally took on a leadership role
-- some hung back or did the minimum required to get by
• Needed to increase communication and sense of team
-- students preferred to interact with faculty for questions, rather than each other
-- bug trackers and developer’s forum used only sporadically
-- implemented weekly reports on Wiki to increase accountability
• [SourceForge servers were frequently down during class]
• 6 months from conception to product
• Even the weakest student contributed useable code
Take-home Messages
• Catch them early
• Making open source values and software development best practices explicit will produce better software and better computer scientists
XSD-to-DB: Adam Carasso, Jeffrey Nicholas, Scott Spicer
XMLPipeDBUtils: David Hoffman, Babak Naffas, Jeffrey Nicholas, Ryan Nakamoto
UniProtDB: Joe Boyle, Joey Barrett
GODB: Scott Spicer, Roberto Ruiz
GenMAPP Builder: Joey Barrett, Jeffrey Nicholas, Scott Spicer
Special Thanks
• GenMAPP.org Development Group
• Caskey L. Dickson, Wesley T. Citti
• NSF CCLI Program (http://recourse.cs.lmu.edu)
http://xmlpipedb.cs.lmu.edu
LMU Bioinformatics Group
Kam D. Dahlquist: http://myweb.lmu.edu/[email protected]
John David N. Dionisio: http://myweb.lmu.edu/[email protected]