How to make your scientific code accessible and reproducible?

Using Git and GitHub

Steve Vissault

inSileco

April 22, 2025

Who am I?

  • Steve Vissault, M.Sc. in forest ecology
    • How climate change will shape forest distributions by the end of this century
  • Research professional at UdS for 3 years
  • Then I did software development for Omnimed, an Electronic Medical Record (EMR), for 3 years
  • I founded a consulting company, inSileco, 4 years ago with 2 amazing ecologists
  • Now taking a position at INRS with Valérie Langlois for the next 2 years

Workshop 1
What is Git, why and how to use it?

What is Git?

  • Git is a version control system: software that lets you track additions and modifications to a set of files within a folder, called a repository
  • Its original purpose was to help groups of developers work collaboratively on big software projects.
  • Think of Git as the “Track Changes” feature of Microsoft Word on steroids.

Git - Context

Compiled software is a collection of files. With each improvement, developers want the software to remain stable, with no regressions. To achieve that, they need to know which files have changed between snapshots.

Why is this relevant to a scientist?

Git foundations

Git foundations: add & commit

Git foundations: branches

  • A branch (main by default) is a series of commits.
  • The latest commit is called the head of the branch (HEAD); it contains the most up-to-date version of the files.
  • Each commit has a version of the files attached to it.
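
For those who prefer the terminal, the three points above can be checked with a minimal sketch like the one below. It assumes Git is installed and works in a throwaway folder; the file name, the commit messages and the "Demo User" identity are illustrative and not part of the workshop material.

```shell
set -e
repo=$(mktemp -d)                        # throwaway folder for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main" whatever the local default is
git config user.name "Demo User"         # illustrative identity, stored for this repo only
git config user.email "demo@example.com"

echo "x <- 1" > script_1.R
git add script_1.R
git commit -q -m "Add script_1.R"

echo "y <- 2" >> script_1.R
git add script_1.R
git commit -q -m "Define y"

git log --oneline                        # the branch: a series of commits, newest first
head_commit=$(git rev-parse HEAD)        # commit that HEAD points to
branch_tip=$(git rev-parse main)         # latest commit on main
echo "HEAD: $head_commit"
echo "main: $branch_tip"
git show HEAD~1:script_1.R               # version of the file attached to the previous commit
```

HEAD and main resolve to the same commit here, and the previous commit still holds the earlier version of the file.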

Git foundations: local vs remote

How Git helps scientific reproducibility

  • Git is a coding notebook: it documents not only the code but also the coding process
  • It makes your code findable and reusable by publishing it to a remote repository on GitHub
  • All the source code, data and figures are accessible from one location

In summary

  1. Initiate the repository (git init)
  2. Edit your files
  3. Stage the modifications to be committed (git add)
  4. Create a new commit object (git commit)
  5. Go back to step 2.
  6. Publish your code on the remote server (git fetch & git push)
  7. Come back from vacation
  8. Check whether another developer has released a new version of the code on the remote repository (git fetch & git pull)
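
In terminal form, this cycle might look like the sketch below. To keep it runnable offline, GitHub is replaced by a local bare repository (remote.git); that stand-in, the file name and the "Demo User" identity are assumptions made for the example.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git            # local stand-in for the remote server

git init -q project                      # step 1: initiate the repository
cd project
git symbolic-ref HEAD refs/heads/main    # name the first branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo 'print("hello")' > script_1.R       # step 2: edit your files
git add script_1.R                       # step 3: stage the modifications
git commit -q -m "Add first script"      # step 4: create a commit

echo 'print("again")' >> script_1.R      # step 5: back to step 2
git add script_1.R
git commit -q -m "Print a second message"

git remote add origin "$work/remote.git"
git fetch -q origin                      # step 6: publish on the remote
git push -q -u origin main

git fetch -q origin                      # step 8: look for new commits on the remote
git pull -q --ff-only origin main        # nothing new here, so the branch is unchanged
git log --oneline
```

With a real GitHub repository, only the URL given to git remote add would change.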

How to use Git in practice?

Prerequisites

  1. Make sure R version ≥ 4.0 is installed on your computer
  2. Install Git:
  • Windows: install Git for Windows (also known as msysgit or Git Bash) to get Git and tools like the Bash shell. Download the executable here
  • macOS: open a terminal and type:
    • git --version
    • git config
    • If Git isn’t installed, macOS will prompt you to install it.
  • Linux: nothing to do; Git is likely pre-installed.

For more details: https://happygitwithr.com/install-git

What we’re going to do

  1. Create a folder
  2. Initiate a git repo for this new folder
  3. Create a new R script (file 1) and edit it
  4. Stage this new R script in the git repo
  5. Add a commit to document the added file
  6. Create a second R script (file 2) in the git repo and edit it
  7. Repeat steps 4 and 5 for this second script
  8. Check the git log
  9. Go back to the first commit
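
The last two steps have a terminal equivalent, sketched below under the assumption that Git is installed. The script names follow the plan above; the commit messages and the "Demo User" identity are made up for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo 'x <- 1' > script_1.R
git add script_1.R
git commit -q -m "Add script_1.R"

echo 'y <- 2' > script_2.R
git add script_2.R
git commit -q -m "Add script_2.R"

git log --oneline                              # step 8: check the git log

first=$(git rev-list --max-parents=0 HEAD)     # the commit that has no parent
git checkout -q "$first"                       # step 9: go back to the first commit
files_at_first=$(ls)
echo "Files at the first commit: $files_at_first"

git checkout -q main                           # return to the latest commit
ls
```

At the first commit the folder only contains script_1.R; switching back to main restores script_2.R.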

Initiate a Git repo

To reduce the learning curve, we will learn the RStudio way of interacting with Git.

There are many other ways to interact with Git:

  • Using R with the gert package
  • Using the terminal
  • Using a GUI such as GitHub Desktop

The key concepts and process stay the same!

To use Git in RStudio, you need to create a new Project. Let’s start by creating our project directory.

Open RStudio and, in the top right corner, create a new project:

  • Select New Directory. Note that Version Control is reserved for when you have already created a repo on GitHub.
  • Select New Project
  • Choose a name for your repo and a location. Make sure that Create a git repository is checked

You can now look under the Git tab; you should see this:

Create a new R script

Create a new file: go to the menu File => New file => R Script. Make sure to save it with the name script_1.R.

What happens in the git tab?

Understand file status

  • Files that are untracked are represented by a yellow question mark.
  • Files that have been added (see next section) are represented by a green A
  • Files that have been tracked and modified are represented by a blue M.
  • Files that are tracked but not modified do not show.
  • Files that have been deleted are shown with a red D.
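
In the terminal, git status --short reports the same states with two-letter codes instead of icons. The sketch below is an approximate mapping that assumes Git is installed; the file names and the "Demo User" identity are illustrative.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo 'a <- 1' > tracked.R
echo 'b <- 2' > to_delete.R
git add tracked.R to_delete.R
git commit -q -m "Add two scripts"

echo 'a <- 10' >> tracked.R      # tracked and modified  -> " M"
rm to_delete.R                   # deleted               -> " D"
echo 'c <- 3' > staged_new.R
git add staged_new.R             # added (staged)        -> "A "
echo 'd <- 4' > untracked.R      # untracked             -> "??"

status=$(git status --short)
echo "$status"
```

Tracked files without changes are not listed, as in the RStudio Git tab.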

Stage a new R script

This is as simple as “checking off” the file in the Git tab. This is called Staging the file.

What is a gitignore? A .gitignore is a file that lists the files you never want Git to track. It can match file name patterns (for instance, all .csv or .tif files). This is useful when you need to make sure certain files (like data files or large files) do not get added.
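
A minimal sketch of a .gitignore in action, assuming Git is installed. The two patterns mirror the .csv and .tif example; the file names are made up.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

printf '%s\n' '*.csv' '*.tif' > .gitignore   # one pattern per line
echo '1,2,3' > data.csv                      # matches *.csv, so Git ignores it
echo 'x <- 1' > script_1.R

status=$(git status --short)
echo "$status"                               # data.csv is not listed

ignored=$(git check-ignore data.csv)         # prints the path when a pattern matches
echo "Ignored: $ignored"
```

The .gitignore file itself shows up as untracked; it is usually committed so that collaborators share the same rules.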

Create a commit

Click on commit on the top of the file list. This window should appear:

Before committing anything you need to add a commit message. It is important to add a useful message to your commit, a bit like a journal entry, so that you can remember what you committed.

Click on commit in order to commit the changes!

Edit your script

  • Once you have staged a file, you can still make more changes, but you need to re-run git add to add them to the index. Until you commit, staged changes are like a draft: Git has not yet fully recorded them.
  • When you want to take a snapshot of a file, you are ready to commit the staged change.
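
The draft behaviour can be seen from the terminal with the sketch below, which assumes Git is installed. The file content and the "Demo User" identity are illustrative.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo 'x <- 1' > script_1.R
git add script_1.R                       # stage the first version
echo 'y <- 2' >> script_1.R              # keep editing after staging

status_before=$(git status --short)      # "AM": added, then modified again
echo "$status_before"

git commit -q -m "Add script_1.R"        # only the staged version is committed
committed=$(git show HEAD:script_1.R)
echo "$committed"

status_after=$(git status --short)       # " M": the later edit is still pending
echo "$status_after"

git add script_1.R                       # re-run git add, then commit the rest
git commit -q -m "Define y"
```

The first commit contains only the line that was staged; the second edit needs its own git add.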

Workshop 2
What is GitHub and how to use it to collaborate?

Before we start

Let’s make sure you have the following packages installed on your computer:

  • usethis to create a GitHub token
  • gitcreds to store the token in your local system
install.packages(c("usethis", "gitcreds"))

Review of Workshop 1

In the first workshop, we learned the basics of Git. We covered:

  1. How to create a Git repository from an RStudio project
  2. How to stage a file or a group of files
  3. How to make a commit
  4. How to view the history of all commits
  5. How to tell Git to ignore files with a .gitignore file (relevant for sensitive data)

git Workflow Overview Reminder

The typical workflow:

  • Step 1: You modify files in your working directory and save them as usual.
  • Step 2: You stage files to mark your intention to “commit” them (in RStudio this is done by checking the box next to the file in the “Git” tab).
  • Step 3: You commit the staged files, which permanently stores them as snapshots to your git directory.

Tip

We can make an analogy with taking a family picture, where each family member would represent a file.

  • Staging files is like deciding which family member(s) are going to be in your next picture
  • Committing is like taking the picture

Git Glossary from Workshop 1

  • Git repository: A project folder that Git tracks. It contains all the files and a hidden .git folder where Git stores the complete history of changes.
  • Staged file(s): Files that are ready to be committed
  • Unstaged file(s): Tracked files with changes that are not yet ready to be committed
  • Untracked file(s): Files that are not tracked by Git
  • Commit: A saved version of your project at a specific point in time (also called snapshot), including a message that explains what changed.

In this Workshop

  • How Git and GitHub work together
  • How to synchronize your local repository:
    • Push your commits to GitHub
    • Pull your collaborators’ commits from GitHub
  • How conflicts are generated and how to resolve them when applying your collaborators’ commits

What is GitHub?

  • GitHub is a web-based platform.
  • GitHub stores your Git repository on a remote server.
  • GitHub allows you to back up your work and make it findable and accessible to others.

Connect your local repository to GitHub

Step 1. Create a GitHub account

For those who don’t have an account, sign up on GitHub using this link

Step 2. Set up your credentials so GitHub recognizes you

We have to store a token in our local system. This token is used to authenticate your computer with GitHub.

Tip

Token = password

Run the following command in RStudio

usethis::create_github_token()
  1. This command opens a GitHub page in your browser; log in
  2. Give the token an explicit name (e.g. based on your current R project), then copy the token starting with ghp_

Run the following command in RStudio

gitcreds::gitcreds_set()

This command will open a prompt in RStudio

Paste the token you copied in the previous step

Step 3. Connect your local repository to GitHub

Check that everything is fine and that the credentials are OK

Run the following command in RStudio

usethis::git_sitrep()

Note

This way, you will not have to enter your credentials every time you push to or pull from GitHub. If you want to know more about this authentication process, have a look at the usethis documentation

Run the following command in RStudio

usethis::use_github()

Note

Use usethis::use_github(private = TRUE) if you want your repository to be private, for example because you are working with sensitive data

The configuration is done and the local repository is now in sync! Let’s have a quick look at the GitHub repo on the website.

Quick overview of the GitHub repo website

  • Red: List of files
  • Blue: Latest commit
  • Green: when the files and directories were last modified

How to synchronize with the remote (GitHub)?

Let’s go back to RStudio and check the Git tab. Two new buttons, Pull and Push, are now available.

  • Step 1: Edit your files
  • Step 2: Commit, record your staged changes as a snapshot in your local repository
  • Step 3: Pull, get the latest remote version and try to merge it with your local version
  • Step 4: Push, send your local version to the remote version of the repository (in our case GitHub)

Repeat steps 2 to 4 every time you edit, add, or delete a file if you want to keep your local and remote repositories in sync.

Collaborate on an existing repository

If I’m a collaborator on a project, I need to be able to get the latest version of the code from the remote repository.

To accomplish this, I clone the remote repository to my local computer. Clone means to copy the entire contents of a GitHub repository to your local computer.

Navigate to the repository on GitHub and click on Code. Select HTTPS and copy the link.

  1. Finally, in RStudio, go to File > New Project > Version Control > Git
  2. Paste the link you copied from GitHub.
  3. Select a local directory where you want to save the repository.

Summary of the new Git vocabulary

  • Clone, copy the entire contents of a GitHub repository to your local computer (done once per computer)
  • Commit, record your staged changes as a snapshot in your local repository
  • Pull, get file(s) from the remote to your local computer – opposite of a “push”
  • Push, move file(s) to the remote from your local computer – opposite of a “pull”
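
The four verbs can be tried out in the terminal without a GitHub account. In the sketch below a local bare repository (remote.git) plays the role of GitHub and two folders, alice and bob, play the two collaborators; these names and the file contents are assumptions made for the example.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for GitHub: a bare repository on the local disk
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/main

# Alice creates the project and publishes it
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/main
git config user.name "Alice"
git config user.email "alice@example.com"
echo 'print("file 1")' > script_1.R
git add script_1.R
git commit -q -m "Add script_1.R"
git remote add origin "$work/remote.git"
git push -q -u origin main

# Clone: Bob copies the whole repository (done once)
cd "$work"
git clone -q remote.git bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.com"

# Commit, then push: Bob records a new file and sends it to the remote
echo 'print("file 2")' > script_2.R
git add script_2.R
git commit -q -m "Add script_2.R"
git push -q origin main

# Pull: Alice gets Bob's commit from the remote
cd "$work/alice"
git pull -q --ff-only origin main
ls
```

After the pull, Alice's folder contains both scripts and the two histories are identical.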

Let’s practice

  • Create a new R script (file 2) and edit it
  • Stage this new R script in the git repo
  • Add a commit to document the added file
  • Create a second R script (file 3) in the git repo and edit it
  • Repeat the staging and commit steps for this second script
  • Check the git log
  • Publish the new files on the remote server (GitHub)
Time: 20 minutes

Tip

Reminder on git file status:

Collaborating on GitHub
can generate conflicts…

Before we start

Invite a collaborator to your repository on GitHub.

  • Go to your repository on GitHub
  • Click on the Settings tab
  • Click on Manage access on the left
  • Click on Invite a collaborator
  • Enter the GitHub username of your collaborator
  • Click on Add to send the invitation

Before we start

  • Accept the invitation from your collaborator
  • Clone the repository on your local computer
  • Edit an existing file
  • Commit the changes
  • Push the changes to the remote repository

git conflicts

  • A conflict occurs when two or more people edit the same line of code in a file and then try to push their changes to the remote repository.
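
A conflict can be reproduced and resolved entirely on one machine. The sketch below assumes Git is installed and uses a local bare repository in place of GitHub; the collaborators (alice, bob), the file content and the choice to keep Bob's value are illustrative.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/main

git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/main
git config user.name "Alice"
git config user.email "alice@example.com"
echo 'threshold <- 1' > script_1.R
git add script_1.R
git commit -q -m "Add script_1.R"
git remote add origin "$work/remote.git"
git push -q -u origin main

cd "$work"
git clone -q remote.git bob

# Alice edits the line and pushes first
cd "$work/alice"
echo 'threshold <- 2' > script_1.R
git commit -q -am "Set threshold to 2"
git push -q origin main

# Bob edits the same line; his push is rejected because the remote moved on
cd "$work/bob"
git config user.name "Bob"
git config user.email "bob@example.com"
echo 'threshold <- 3' > script_1.R
git commit -q -am "Set threshold to 3"
if git push -q origin main 2>/dev/null; then pushed=yes; else pushed=no; fi

# Pulling merges Alice's commit and stops on the conflicting line
git pull -q --no-rebase origin main || true
markers=$(grep -c '^<<<<<<< ' script_1.R)    # conflict markers written by Git
cat script_1.R

# Resolve: keep one version, stage it, commit the merge, push
echo 'threshold <- 3' > script_1.R
git add script_1.R
git commit -q -m "Merge remote changes, keep threshold 3"
git push -q origin main
```

The file between the pull and the resolution contains both versions, separated by the <<<<<<<, ======= and >>>>>>> markers.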

Resolve a Git conflict in RStudio

Strategies to avoid git conflicts

  1. Communicate with your team members/collaborators often to avoid working on the same files at the same time! Let each other know who is working on what
  2. Commit often in small chunks, to avoid complex merges
  3. Pull often, especially before starting to edit and before pushing your changes

Let’s practice

In teams of 2, edit the same file at the same time in the same repository. Then, try to push your changes to the remote repository. A conflict should occur; let’s try to resolve it.

Time: 15 minutes

Complementary resources

Much of the content of this workshop comes from, or is based on, the following resources:

Workshop 3
Create and deploy a Quarto document on GitHub

What we will learn in this workshop

  1. What is a Quarto document?
  2. How is a Quarto document structured?
  3. How to deploy it on GitHub Pages?

What is Quarto?

Quarto is a tool for creating documents, presentations, websites, and more from plain text files.

Illustration by Allison Horst

  • Quarto is a next-generation R Markdown. It is an open-source scientific and technical publishing system built on Pandoc.
  • Pandoc is a universal document converter that can convert between many different file formats.
  • In Quarto, you can write in Markdown and include code chunks in R, Python, Julia, or Observable JavaScript.

Illustration by Allison Horst

Quarto formats

  • HTML documents for web and dynamic reporting: users can interact with the document in a web browser.
  • PDF and Word documents are static and are a more conventional way to share documents.

Create a Quarto document

Let’s practice a bit

  1. Create a new Quarto document in RStudio: File > New File > Quarto Document
  2. Choose a name for your document and save it in the same folder as your Git repository
  3. Edit the YAML header to set the document author and date
  4. Click on the Render button in the same panel as the Quarto document
Time: 5 minutes

Tip

In RStudio, you can use the Visual mode to edit your Quarto document. This mode allows you to see the document as it will be rendered, while still being able to edit the source code.

Tip

You can render the document in an RStudio viewer panel instead of opening a new window in the browser.

Quarto document structure

Metadata: YAML syntax

Header: YAML Metadata

The YAML header is the first part of a Quarto document

  • Starts and ends with three hyphens (---)
  • Contains the metadata of the document
  • Comes first in the .qmd file
  • Sets Pandoc document options with key: value (YAML syntax)
  • Available options depend on the output format
---
title: "My document title"
date: "01-03-2021"
author: Me, you, them
format: pdf
---

Document options

Document options depend on the output format. They are listed in the Quarto documentation:

  1. HTML Options
  2. PDF Options
  3. Word Options

Body: Markdown syntax

Headers

  • # Heading 1
  • ## Heading 2
  • ### Heading 3
  • #### Heading 4
  • ##### Heading 5
  • ###### Heading 6

Text formatting

  • Normal text
  • _Italic_ *text*
  • __Bold__ **text**
  • ***Bold italic*** **_text_**
  • ~~Strikethrough text~~

  • Blockquote: > This is a blockquote
  • Unordered list:
- item 1
- item 2
  • Ordered list:
1. first
2. second
  • Equation: $e^{\pi i} + 1 = 0$

Markdown syntax

Insert an image using the following syntax:

![caption](link)

Insert a link using the following syntax:

[text](link)

Let’s practice a bit

  1. Insert a header
  2. Insert text in a blockquote
  3. Insert an image
Time: 5 minutes

Code: code chunk syntax

Code chunk syntax

How to insert a code chunk

Question: Where is the code chunk and where is the markdown text?

Lorem ipsum dolor sit amet, consectetur adipiscing elit.

```{r}
data(iris)
head(iris)
```

Code chunk options

```{r}
#| label: chunk-options
#| fig-cap: "Comparison of Sepal Length and Sepal Width in the iris dataset"
#| echo: true

library(ggplot2)
data(iris)

# Scatter plot of sepal dimensions, with colour and shape mapped to species
ggplot(
  data = iris,
  aes(x = Sepal.Length, y = Sepal.Width)
) +
  geom_point(aes(color = Species, shape = Species)) +
  xlab("Sepal Length") +
  ylab("Sepal Width") +
  ggtitle("Sepal Length-Width")
```

| Option | Description |
|---|---|
| eval | Evaluate the code chunk (if false, just echoes the code into the output). |
| echo | Include the source code in output. |
| output | Include the results of executing the code in the output (true, false, or asis to indicate that the output is raw markdown and should not have any of Quarto’s standard enclosing markdown). |
| warning | Include warnings in the output. |
| error | Include errors in the output (note that this implies that errors executing code will not halt processing of the document). |
| include | Catch-all for preventing any output (code or results) from being included. |

Inline code

It is also possible to include inline code in the text.

In the iris dataset, the sample size is 
`{r} nrow(iris)` individuals across all species.

Which gives:

In the iris dataset, the sample size is 150 individuals across all species.

Let’s practice a bit

  1. Create a figure using the ggplot2 package
  2. Write a short text that gives the mean and standard deviation of the Sepal.Length variable for each species in the iris dataset.
  3. Write a short text that describes the figure and refers to it
Time: 15 minutes

Deployment of the HTML report on GitHub

Deployment on GitHub

  • Quarto documents can be deployed on GitHub Pages
  • GitHub Pages is a static site hosting service that takes files from a repository on GitHub and publishes them as a website

Step 1. Create a Quarto project

To do so, we have to transform our Quarto document into a Quarto project. A Quarto project is a directory that contains a _quarto.yml file.

project:
  type: website
  output-dir: docs

Step 2. Render the Quarto document

We have to render the Quarto document to HTML. This will create a docs folder in the project directory, as specified in the _quarto.yml file.

This folder will contain the HTML files of the Quarto document. Then we commit and push the changes to the remote repository.

Step 3. Configure GitHub Pages

Finally, configure your GitHub repository to publish from the docs directory of your main branch: