TRANSCRIPT
day 3 - hands-on session
4th June 2020
Welcome!
It is great that you took the time to go through the tutorials and engage with the material.
In this hands-on session you can engage with other participants and a mentor.
Important: This group is for you to get help / provide help. Everyone (mentors included) can and should contribute with their niche of expert knowledge.
How to use the next 2 hours for your benefit
First, let’s do two round-tables to get to know each other and your projects:
1. Let’s introduce ourselves [10 min]:
a. What is your name?
b. On what experiment do you work? What is your main research interest?
c. Say three things about yourself: 2 truths and one lie
2. Let’s discuss your experience with pipelines [20 min]:
a. How was your experience with the course material?
b. What were the most valuable insights from yesterday?
c. What is confusing / challenging you?
d. How do you plan to apply the knowledge you gained in your projects?
Then, we can either discuss your questions, work more on your specific use-cases, or work on the challenges [90 min].
Let’s introduce ourselves
Please turn on your cameras (if you are comfortable with showing your face).
Introduce yourself:
a. What is your name?
b. On what experiment do you work? What is your main research interest?
c. Say three things about yourself: 2 truths and one lie
Let’s introduce our projects
How was your experience with the tutorial?
1. How was your experience with the course material?
2. What were the most valuable insights from yesterday?
3. What is confusing / challenging you?
Let’s get more hands-on!
1. Which of your projects could benefit from CI/CD?
2. How do you plan to implement CI/CD elements in your project?
Additional material
Challenge 0: rinse/repeat
Write a CI test for each one of the samples in the payload. All of these should be separate CI jobs that run in parallel.
References:
● https://hsf-training.github.io/hsf-training-cms-analysis-webpage
● https://hsf-training.github.io/hsf-training-cicd
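One job per sample could look roughly like the sketch below. The sample names, the skim.sh script, and the ROOT image are placeholders, not part of the lesson payload; adapt them to your own repository.

```yaml
stages:
  - skim

# YAML anchor so the per-sample jobs share their definition.
.skim_template: &skim
  stage: skim
  image: rootproject/root:latest   # placeholder image with ROOT installed
  script:
    - bash skim.sh "$SAMPLE"       # hypothetical skimming script

# Jobs in the same stage run in parallel.
skim_data:
  <<: *skim
  variables:
    SAMPLE: "data"

skim_signal:
  <<: *skim
  variables:
    SAMPLE: "signal"

skim_background:
  <<: *skim
  variables:
    SAMPLE: "background"
```

Because all three jobs sit in the same stage, GitLab starts them in parallel as soon as runners are available.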
Challenge 1: “only” keyword
Sometimes you have multiple collaborators on a project and you may want a certain test or pipeline to run only on a given branch. One example may be that you only want to build and update documentation pages on the master branch. Or perhaps you know you need to test a suite of backgrounds, but only on the “bkg_dev” branch. In this challenge, use the “only” keyword (go look for it in the documentation linked below) to ensure that one of your tests is only executed when you create a “bkg_dev” branch and push a change to the repo on that branch.
References:● https://docs.gitlab.com/ee/ci/yaml/#onlyexcept-advanced
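A minimal sketch of the idea: restricting a job to the “bkg_dev” branch with the “only” keyword. The job name and script are placeholders.

```yaml
test_backgrounds:
  stage: test
  script:
    - ./run_background_tests.sh   # hypothetical test script
  only:
    - bkg_dev                     # job runs only on this branch
```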
Challenge 2: scheduled pipelines
Perhaps your repo is updating software used by others and you want to ensure that it is *always* working and robust. Instead of just running your tests when a change is made, go discover how you can schedule automatic runs of your pipeline tests on a recurring basis, like once per day. (If you aren’t sure where to start, try googling … and then asking on the Mattermost.)
References:
● https://docs.gitlab.com/ce/ci/pipelines/schedules.html
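Note that the schedule itself is created in the GitLab web interface (CI/CD → Schedules), not in the YAML. In the YAML you can, however, restrict a job so it runs only during scheduled pipelines, as in this sketch (job name and script are placeholders):

```yaml
nightly_test:
  stage: test
  script:
    - ./run_full_test_suite.sh   # hypothetical long-running test
  only:
    - schedules                  # run only in scheduled pipelines
```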
Challenge 3: Doxygen documentation
Documentation for code is mission critical if you want to move beyond “But how do I get this working?” Would you be able to use any large codebase (e.g. Python, ROOT) without it? Your own code should be documented as well, and this can be done by automatically building Doxygen documentation of your code in a CI job. This can then be pushed to an EOS space (this is CERN specific) that is linked to a webpage and is therefore automatically viewable by your collaborators. This one is a bit advanced because it may require a bit of an appreciation for Docker images, but that’s what the challenges are for, right? So what you should do is put some Doxygen documentation comments in your code, then write a CI job that uses the image gitlab-registry.cern.ch/cholm/docker-ubuntu-doxygen:latest and pushes the output to your personal EOS space. If you already have a CERN webpage set up, pushing it there will make it viewable!
References:
● https://gitlab.com/pages/doxygen/-/blob/master/README.md
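The build step could look like this sketch. It assumes a Doxyfile at the repository root configured to write HTML output into html/; adjust the paths to your own configuration.

```yaml
build_docs:
  stage: build
  image: gitlab-registry.cern.ch/cholm/docker-ubuntu-doxygen:latest
  script:
    - doxygen Doxyfile        # reads the Doxyfile in the repo root
  artifacts:
    paths:
      - html/                 # assumed Doxygen OUTPUT_DIRECTORY
```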
To deploy the documentation, you need a Docker image which runs Doxygen. You can either take one of the many available on Docker Hub or build your own (shown here). This custom image extends the CERN IT Docker image for deploying content to your CERN website.
source: https://gitlab.cern.ch/xmonoh/sphinx
Use your custom Docker image for building and deploying the documentation. In your project (here: the ATLAS analysis code framework “XAMPPmonoS”), you need to build the Doxygen documentation (here: also Python Sphinx documentation).
After you have built the documentation, deploy it to your CERN IT website. You must set your CERN website username and password as hidden environment variables in the project’s CI/CD settings, so that they never appear in the repository.
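A deployment sketch based on the CERN IT ci-web-deployer image; the variable names below are taken from that image’s README at the time of writing, so check it for the current conventions. The EOS path is a placeholder, and the account credentials must be set as hidden CI/CD variables in the project settings, never in the YAML.

```yaml
deploy_docs:
  stage: deploy
  image: gitlab-registry.cern.ch/ci-tools/ci-web-deployer:latest
  variables:
    CI_OUTPUT_DIR: "html/"                      # where the built docs live
    EOS_PATH: "/eos/user/y/yourname/www/docs"   # placeholder EOS path
  script:
    - deploy-eos   # copies CI_OUTPUT_DIR to EOS_PATH using the hidden credentials
```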
Challenge 4: caches
Caches are powerful tools that allow GitLab to save files and reuse them between different pipelines. This can be very powerful when doing automatic regression tests: you can cache the results (e.g. a cutflow output) from *only* the master branch, and then, while working on a dev branch, retrieve those results that were cached from master. When your developments are finished and you push to master, the pipeline automatically caches the new results, and future dev-branch tests will refer to the new “gold standard”. In this way, we avoid having to hardcode cutflow comparisons into our repository, and could also cache things like histograms for our regression tests. Try doing this for the CMS OpenData payload.
References:
● https://docs.gitlab.com/ee/ci/yaml/README.html#cache
● https://docs.gitlab.com/ee/ci/caching/index.html
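The master/dev split described above might be sketched as follows. The cutflow script and file names are placeholders; the key point is the shared cache key and the `policy: pull`, which lets dev branches read the cache without overwriting it.

```yaml
make_reference:
  stage: build
  script:
    - ./run_cutflow.sh > cutflow_reference.txt   # hypothetical cutflow script
  cache:
    key: cutflow-reference
    paths:
      - cutflow_reference.txt
  only:
    - master          # only master updates the "gold standard"

compare_cutflow:
  stage: test
  script:
    - ./run_cutflow.sh > cutflow_new.txt
    - diff cutflow_reference.txt cutflow_new.txt   # fail on any regression
  cache:
    key: cutflow-reference
    paths:
      - cutflow_reference.txt
    policy: pull      # dev branches only read the cache, never write it
  except:
    - master
```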
Challenge 5: static code checks
You already learned about setting up CI to build your code and run jobs. This is great: it lets you check that your code compiles and even that your analysis code is doing what you expect it to do. Another way to check e.g. some Python code is static code checks. Introducing additional, automatic static code analysis in your code repository is a very simple way to ensure that your code is free of known security issues and bad practices. A nice introduction to static code checks is provided by CERN IT.
Resources:
● https://gitlab.cern.ch/gitlabci-examples/static_code_analysis
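A minimal sketch of such a job, using flake8 as an example Python checker (the tool choice is an assumption; pylint, bandit, etc. slot in the same way):

```yaml
static_checks:
  stage: test
  image: python:3.8
  script:
    - pip install flake8
    - flake8 .        # fails the job if any check is violated
```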
Challenge 6: typeset LaTeX with Docker
GitLab CI jobs are extremely flexible, as they can use Docker containers to run almost any payload. In the CI/CD tutorial, you noticed the “image” keyword in the CI jobs and were told not to worry about Docker. In fact, Docker containers are extremely useful and allow you to run almost any code in your CI/CD pipeline. Suppose you are working on a LaTeX document (a paper, a thesis, ...) and want GitLab to typeset a PDF as an artifact every time you push a new commit to your project. All you have to do is add a .gitlab-ci.yml file to your project. Try to find out, using Google, how to use a LaTeX Docker container to automatically typeset your document.
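One possible shape of the answer, assuming your main file is main.tex; the container name is one of several TeX Live images on Docker Hub, so substitute whichever you prefer:

```yaml
build_pdf:
  image: texlive/texlive:latest   # assumed TeX Live container
  script:
    - pdflatex main.tex
    - pdflatex main.tex           # second pass resolves cross-references
  artifacts:
    paths:
      - main.pdf                  # the typeset PDF, downloadable from GitLab
```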
Challenge 7: enforced code style
You may be collaborating with many people on a project, everyone with their own preference on how to format code. Wouldn’t it be great if an automated script went over every addition to your common code base and formatted it nicely, so that the code base has a consistent layout? You can use tools like “clang-format” (for C/C++) or “black” (for Python) to automatically format your code. A possible implementation for clang-format is linked here (and the code used to create the Docker container is linked here; credits go to Andreas Hoenle).
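For the Python case, a sketch of a job that fails the pipeline whenever a file is not black-formatted (running black in check-only mode, so nothing is modified):

```yaml
check_style:
  stage: test
  image: python:3.8
  script:
    - pip install black
    - black --check .   # exits non-zero if any file would be reformatted
```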
Issues
YAML isn’t formatting correctly
If your YAML file is poorly formatted, you can use this tool to spot missing spaces/tabs etc.:
http://www.yamllint.com
or use the GitLab internal solution:
https://gitlab.cern.ch/pgadow/awesome-analysis-eventselection/-/ci/lint
(change “pgadow” to your username and “awesome-analysis-eventselection” to your project to make it work)
I want to learn about GitLab CI YAML commands
Visit the (very good) documentation here:
https://docs.gitlab.com/ee/ci/yaml/
I want my code to become a docker container
build_docker:
  stage: build
  variables:
    TO: $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_NAME
    # all submodules will be cloned recursively upon start of CI job
    GIT_SUBMODULE_STRATEGY: recursive
    GIT_SSL_NO_VERIFY: "true"
  tags:
    - docker-image-build
  script:
    - ignore
Add a Dockerfile to your repository and add this job to your CI. Now you have built a Docker container of your analysis code, stored in your GitLab container registry!
You can even use it downstream in your pipeline! (Learn more about the environment variable CI_REGISTRY_IMAGE here.)
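For example, a later job could pull the image built above straight from the project registry; the job name and entry-point script are placeholders:

```yaml
run_analysis:
  stage: run
  image: $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_NAME   # the image built by build_docker
  script:
    - ./run_analysis.sh   # hypothetical entry point inside your image
```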