A GitHub action to build data science environment images with repo2docker and push them to registries.

jupyterhub/repo2docker-action

repo2docker GitHub Action

Trigger repo2docker to build a Jupyter-enabled Docker image from your GitHub repository and push this image to a Docker registry of your choice. This will automatically attempt to build an environment from the configuration files found in your repository, in the manner described in the repo2docker documentation.

Read the full docs on repo2docker for more information: https://repo2docker.readthedocs.io

Images generated by this action are automatically tagged with both latest and <SHA>, where <SHA> is the relevant commit SHA on GitHub. Both tags are pushed to the Docker registry specified by the user. If an image tagged latest already exists in your registry, this Action attempts to pull it as a cache to avoid unnecessary build steps.

What Can I Do With This Action?

  • Use repo2docker to pre-cache images for your own BinderHub cluster, or for mybinder.org.
    • You can use this Action to pre-cache Docker images in a Docker registry that you can reference in your repo. For example, if your repository contains a Dockerfile in the binder/ directory with the following contents, Binder can start quickly by pulling an image you have already built:

      # This is the image that is built and pushed by this Action (replace this with your image name)
      FROM myorg/myimage:latest
      ...
  • Provide a way to Dockerize data science repositories, with a Jupyter server enabled, that you can deploy to VMs, serverless computing platforms, or other services that run Docker containers as a service.
  • Maximize reproducibility by allowing authors, without any prior knowledge of Docker, to build and share containers.
  • Run tests after the image has been built, to make sure package changes don't break your code.

API Reference

The Examples section below is very helpful for understanding the inputs and outputs of this Action.

Optional Inputs

  • DOCKER_USERNAME: Docker registry username. If not supplied, credentials must be set up ahead of time.
  • DOCKER_PASSWORD: Docker registry password or access token (recommended). If not supplied, credentials must be set up ahead of time.
  • DOCKER_REGISTRY: domain name of the Docker registry. If not supplied, this defaults to DockerHub.
  • IMAGE_NAME: name of the image, for example myusername/mycontainer (Docker image names must be lowercase). If not supplied, this defaults to <DOCKER_USERNAME>/<GITHUB_REPOSITORY_NAME> or <GITHUB_ACTOR>/<GITHUB_REPOSITORY_NAME>.
  • NOTEBOOK_USER: username of the primary user in the image. If this is not specified, it is set to jovyan. NOTE: This value is also overridden with jovyan if the parameters BINDER_CACHE or MYBINDERORG_TAG are provided.
  • REPO_DIR: path inside the image where the repository's contents are copied to, and where all the build operations (such as postBuild) happen. Defaults to /home/<NOTEBOOK_USER> if not set.
  • APPENDIX_FILE: Path to file containing Dockerfile commands to run at the end of the build. Can be used to customize the resulting image after all standard build steps finish.
  • LATEST_TAG_OFF: Setting this variable to any value will prevent your image from being tagged with latest. Note that your image is always tagged with the GitHub commit SHA.
  • ADDITIONAL_TAG: An optional string that specifies the name of an additional tag you would like to apply to the image. Images are already tagged with the relevant GitHub commit SHA.
  • NO_PUSH: If "true", this variable prevents any images from being pushed to a registry. Verbose logging is also enabled in this mode. Default value is "false".
  • BINDER_CACHE: Setting this variable to any value will add the file binder/Dockerfile that references the docker image that was pushed to the registry by this Action. You cannot use this option if the parameter NO_PUSH is set. This is disabled by default.
    • Note: This Action assumes you are not explicitly using Binder to build your dependencies (you are using this Action to build them instead). If a binder/ directory containing files other than Dockerfile, or a directory named .binder/, is detected, this step will be aborted. This Action does not support caching images for Binder where dependencies are defined in binder/Dockerfile (if you are defining your dependencies this way, you probably don't need this Action).

      When this parameter is supplied, this Action will add/override binder/Dockerfile in the branch checked out in the Actions runner:

      ### DO NOT EDIT THIS FILE! This Is Automatically Generated And Will Be Overwritten ###
      FROM <IMAGE_NAME>
  • COMMIT_MSG: The commit message used when the BINDER_CACHE flag is specified. If no value is specified, the default commit message "Update image tag" is used.
  • MYBINDERORG_TAG: This is the Git branch, tag, or commit that you want mybinder.org to proactively build from your repo. This is useful if you wish to reduce startup time on mybinder.org. Your repository must be public for this to work, as mybinder.org only works with public repositories.
  • PUBLIC_REGISTRY_CHECK: Setting this variable to any value will validate that the image pushed to the registry is publicly visible.
  • REPO2DOCKER_EXTRA_ARGS: Any extra command-line parameters to be passed to the repo2docker build command.
  • FORCE_REPO2DOCKER_VERSION: Force a specific version of repo2docker to be installed, either in the form repo2docker==<version> to install from PyPI, or git+https://<github-url>@<commit-hash> to install from a GitHub repository / branch.
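The following workflow is a minimal sketch of how several of these optional inputs can be combined in one step. The values are hypothetical: v1.0 is an example tag, appendix.txt is an assumed file of Dockerfile commands in your repository, and <version> is a placeholder for the repo2docker release you want to pin.

```yaml
name: Build Notebook Container
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:

    - name: checkout files in repo
      uses: actions/checkout@main

    - name: update jupyter dependencies with repo2docker
      uses: jupyterhub/repo2docker-action@master
      with:
        DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
        DOCKER_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}
        # Hypothetical values, shown only to illustrate the inputs:
        LATEST_TAG_OFF: true          # do not tag "latest"; the SHA tag is still applied
        ADDITIONAL_TAG: v1.0          # extra tag applied next to the SHA tag
        APPENDIX_FILE: appendix.txt   # assumed file of Dockerfile commands, run at the end of the build
        FORCE_REPO2DOCKER_VERSION: "repo2docker==<version>"
```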

Outputs

  • IMAGE_SHA_NAME: The name of the Docker image, tagged with the SHA.
  • IMAGE_SHA_TAG: The short tag used as the 'tag' part (after the :) of the Docker image name.
  • PUSH_STATUS: This is false if NO_PUSH is provided, or true otherwise.
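To use these outputs in later steps, you can give the Action's step an id and read them with the standard steps.<id>.outputs.<name> expression syntax of GitHub Actions. In this sketch the id repo2docker is an arbitrary choice, and the last step only prints the values.

```yaml
name: Build Notebook Container
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:

    - name: checkout files in repo
      uses: actions/checkout@main

    - name: update jupyter dependencies with repo2docker
      id: repo2docker   # arbitrary id, used below to reference the outputs
      uses: jupyterhub/repo2docker-action@master
      with:
        DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
        DOCKER_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}

    - name: print image details
      run: |
        echo "Image: ${{ steps.repo2docker.outputs.IMAGE_SHA_NAME }}"
        echo "Tag: ${{ steps.repo2docker.outputs.IMAGE_SHA_TAG }}"
        echo "Pushed: ${{ steps.repo2docker.outputs.PUSH_STATUS }}"
```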

Testing the built image

You can automatically test your built image to make sure package additions or removals do not break your code, allowing you to make changes with confidence. pytest is used to run the tests, and pytest-notebook is used to run any Jupyter Notebooks as tests.

This works with any Jupyter kernel: the action uses the kernel defined in each notebook you put in image-tests/, so it can execute and test notebooks written in any language.

Warning

If you are using a Dockerfile to configure your environment, then include COPY --chown=${NB_USER}:${NB_USER} image-tests ${REPO_DIR}/image-tests in your Dockerfile to ensure the correct ownership of this folder within the image.

To use automatic image testing, follow these steps:

  1. Create a directory named image-tests/ in your GitHub repository.
  2. Any .py files you add inside this directory will be discovered and run with pytest inside the built image after the image has successfully built.
  3. Any Jupyter Notebook (.ipynb) files inside this directory will be run with pytest-notebook, and the notebook is considered to have failed if the outputs of the code execution do not match the outputs already in the notebook. A nice diff of the outputs is shown if they differ. See the pytest-notebook docs for more information.
  4. Optionally, a requirements.txt file inside the image-tests/ directory can list additional libraries installed just for the test.

For example, look at the following image environment repository structure:

my-image/
├── environment.yml
└── image-tests
    ├── mytestnotebook.ipynb
    └── mytest.py

This defines three things:

  • environment.yml is a repo2docker environment file, which defines the packages for the user image
  • image-tests/mytestnotebook.ipynb is a Jupyter notebook that is already executed so its outputs are included in the ipynb file. When the image is built, this notebook will be re-executed, and the outputs compared against the version stored with the repository.
  • image-tests/mytest.py is a Python file that will be run with pytest, and any failures will be reported.
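Since the image tests run after the image is built, one way to exercise them on pull requests that touch the environment or the tests is a workflow like the sketch below. It assumes the tests also run when NO_PUSH is set. The paths filter is standard GitHub Actions syntax; the listed paths assume the example layout above, and the image name is a placeholder.

```yaml
name: Test image
on:
  pull_request:
    paths:              # only run when the environment or the tests change
      - "environment.yml"
      - "image-tests/**"

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:

    - name: checkout files in repo
      uses: actions/checkout@main

    - name: build image and run image-tests/ without pushing
      uses: jupyterhub/repo2docker-action@master
      with:
        NO_PUSH: 'true'
        IMAGE_NAME: "<username>/<image-name>"   # placeholder
```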

Examples

mybinder.org

A very popular use case for this Action is to cache builds for mybinder.org. To do so, you must specify the MYBINDERORG_TAG argument. Some examples are below:

Cache builds on mybinder.org

Proactively build your environment on mybinder.org for any branch. Alternatively, you can use GitHub Actions to build an image for BinderHub generally, including mybinder.org.

name: Binder
on: [push]

jobs:
  Create-MyBinderOrg-Cache:
    runs-on: ubuntu-latest
    steps:
    - name: cache binder build on mybinder.org
      uses: jupyterhub/repo2docker-action@master
      with:
        NO_PUSH: true
        MYBINDERORG_TAG: ${{ github.event.ref }} # This builds the container on mybinder.org with the branch that was pushed on.

Cache Builds On mybinder.org And Provide A Link

Same example as above, but this workflow also comments on the PR with a link to the Binder environment. Commenting on the PR is optional and is included here for informational purposes only. In this example the image is only cached when the pull request is opened or reopened, not when it is updated with subsequent commits.

name: Binder
on:
  pull_request:
    types: [opened, reopened]

jobs:
  Create-Binder-Badge:
    runs-on: ubuntu-latest
    steps:
    - name: cache binder build on mybinder.org
      uses: jupyterhub/repo2docker-action@master
      with:
        NO_PUSH: true
        MYBINDERORG_TAG: ${{ github.event.pull_request.head.ref }}

    - name: comment on PR with Binder link
      uses: actions/github-script@v1
      with:
        github-token: ${{secrets.GITHUB_TOKEN}}
        script: |
          var BRANCH_NAME = process.env.BRANCH_NAME;
          github.issues.createComment({
            issue_number: context.issue.number,
            owner: context.repo.owner,
            repo: context.repo.repo,
            body: `[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/${context.repo.owner}/${context.repo.repo}/${BRANCH_NAME}) :point_left: Launch a binder notebook on this branch`
          })
      env:
        BRANCH_NAME: ${{ github.event.pull_request.head.ref }}
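If you also want the cache refreshed whenever new commits are pushed to the pull request, one option is to add the synchronize activity type to the trigger of the example above. Keep in mind that the comment step would then post a new comment on every push.

```yaml
on:
  pull_request:
    # "synchronize" fires when new commits are pushed to the PR branch
    types: [opened, reopened, synchronize]
```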

Use GitHub Actions To Cache The Build For BinderHub

Instead of forcing mybinder.org to cache your builds, you can build a Docker image with GitHub Actions and push it to a Docker registry, so that any BinderHub instance, including mybinder.org, only has to pull the image. This might give you more control than triggering a build directly on mybinder.org as in the method illustrated above. In this example, you must supply the secrets DOCKER_USERNAME and DOCKER_PASSWORD so that Actions can push to DockerHub. Note that instead of your actual password you can use an access token, which may be a more secure option.

In this case, we set BINDER_CACHE to true to enable this option. See the documentation for the parameter BINDER_CACHE in the Optional Inputs section for more information.

name: Test
on: push

jobs:
  binder:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout Code
      uses: actions/checkout@v2

    - name: update jupyter dependencies with repo2docker
      uses: jupyterhub/repo2docker-action@master
      with:
        DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
        DOCKER_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}
        BINDER_CACHE: true
        PUBLIC_REGISTRY_CHECK: true

Push Repo2Docker Image To DockerHub

We recommend creating a personal access token and using that as DOCKER_PASSWORD instead of your DockerHub password.

name: Build Notebook Container
on: [push] # You may want to trigger this Action on other things than a push.
jobs:
  build:
    runs-on: ubuntu-latest
    steps:

    - name: checkout files in repo
      uses: actions/checkout@main

    - name: update jupyter dependencies with repo2docker
      uses: jupyterhub/repo2docker-action@master
      with:
        DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
        DOCKER_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}

Push Repo2Docker Image To quay.io

DockerHub imposes fairly strict rate limits, so you might want to push to a different Docker registry. quay.io is a popular choice, and isn't tied to any particular cloud vendor.

  1. Log in to quay.io

  2. Create a new repository. This will determine the name of your image, and you will push / pull from it. Your image name will be quay.io/<username>/<repository-name>.

  3. Go to your account settings (under your name in the top right), and select the 'Robot Accounts' option on the left menu.

  4. Click 'Create Robot account', give it a memorable name (such as <hub-name>_image_builder), and click 'Create'.

  5. In the next screen, select the repository you just created in step (2), and give the robot account Write permission to the repository.

  6. Once done, click the name of the robot account again. This will give you its username and password.

  7. Create these GitHub secrets for your repository with the credentials from the robot account:

    1. QUAY_USERNAME: user name of the robot account
    2. QUAY_PASSWORD: password of the robot account
  8. Use the following config for your GitHub Action.

    name: Build container image
    
    on: [push]
    
    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
    
        - name: checkout files in repo
          uses: actions/checkout@main
    
        - name: update jupyter dependencies with repo2docker
          uses: jupyterhub/repo2docker-action@master
          with: # make sure username & password/token matches your registry
            DOCKER_USERNAME: ${{ secrets.QUAY_USERNAME }}
            DOCKER_PASSWORD: ${{ secrets.QUAY_PASSWORD }}
            DOCKER_REGISTRY: "quay.io"
            IMAGE_NAME: "<quay-username>/<repository-name>"
    

Push Repo2Docker Image To Amazon ECR

  1. Log in to the Amazon AWS Console

  2. Create an individual IAM user whose access key will be used by GitHub Actions. Make sure the user has permissions to make calls to the Amazon ECR APIs and to push/pull images to the repositories you need. Check out and follow Amazon IAM best practices for the AWS credentials used in GitHub Actions workflows.

  3. Create a new private repository. This will determine the name of your image, and you will push / pull from it. Your image name will be <aws-account-id>.dkr.ecr.<aws-region>.amazonaws.com/<username>/<repository-name>.

  4. Go to the 'Users' section of the IAM dashboard and click on the username created in step 2. Click on the 'Security credentials' tab, right below the 'Summary' section. In the 'Access keys' section, click the 'Create access key' button. Once done, it will give you an 'Access key ID' and a 'Secret access key'.

  5. Create these GitHub secrets for your repository with the credentials of the IAM user:

    1. AWS_ACCESS_KEY_ID: access key id of the IAM user
    2. AWS_SECRET_ACCESS_KEY: secret access key of the IAM user
  6. Use the following config for your GitHub Action.

    name: Build container image
    
    on: [push]
    
    jobs:
      build:
        runs-on: ubuntu-latest
        env:
          DOCKER_CONFIG: $HOME/.docker
        steps:
        - name: checkout files in repo
          uses: actions/checkout@main
    
        - name: Configure AWS Credentials
          uses: aws-actions/configure-aws-credentials@v1
          with:
            aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
            aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
            aws-region: <region>
    
        - name: Login to Amazon ECR
          id: login-ecr
          uses: aws-actions/amazon-ecr-login@v1
    
    
        - name: Update jupyter dependencies with repo2docker
          uses: jupyterhub/repo2docker-action@master
          with:
            DOCKER_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
            IMAGE_NAME: "<aws-username>/<repository-name>"
    

Push Repo2Docker Image To Google Container Registry

  1. Log in to the Google Cloud Console

  2. Create (or use an existing) Google Cloud Project with billing enabled. This is where the registry hosting the repo2docker image will live.

  3. Make sure Container Registry API is enabled for this project.

  4. The repository will be created automatically once the first image is pushed. Your image name will be gcr.io/<gcp-project-id>/<repository-name>.

  5. Create a Service Account to authenticate the calls made by GitHub Actions to your GCP project:

    • In the Cloud Console, go to the Service Accounts page.
    • Make sure the right project is selected in the drop-down menu above.
    • Click on Create Service Account
    • Enter a service account name — give it a memorable name (such as <hub-name>_image_builder).
    • Grant this service account access to the project. As a best practice, grant it only the minimum permissions: Cloud Run Admin, Service Account User, and Storage Admin.
  6. Click on the service account's name you just created and select the Keys tab. Click on the ADD KEY button, select Create new key, then create a JSON key type. The private key will be saved to your computer. Make sure to store it somewhere secure!

  7. Create these GitHub secrets for your repository with the credentials of the service account:

    1. GCP_SA_KEY: the private key of the service account created in the previous step
    2. GCP_PROJECT_ID: the id of the Google Cloud Project
  8. Use the following config for your GitHub Action.

    name: Build container image
    
    on: [push]
    
    jobs:
      build:
        runs-on: ubuntu-latest
        env:
          DOCKER_CONFIG: $HOME/.docker
    
        steps:
        - name: checkout files in repo
          uses: actions/checkout@main
    
        - name: Login to GCR
          uses: docker/login-action@v1
          with:
            registry: gcr.io
            username: _json_key
            password: ${{ secrets.GCP_SA_KEY }}
    
        - name: Update jupyter dependencies with repo2docker
          uses: jupyterhub/repo2docker-action@master
          with:
            DOCKER_REGISTRY: gcr.io
            IMAGE_NAME: ${{ secrets.GCP_PROJECT_ID }}/<repository-name>

Push Repo2Docker Image To Google Artifact Registry

  1. Log in to the Google Cloud Console

  2. Create (or use an existing) Google Cloud Project with billing enabled. This is where the registry hosting the repo2docker image will live.

  3. Make sure Artifact Registry API is enabled for this project.

  4. Create a new Docker repository in Artifact Registry. This will determine the name and location of your image. Your image name will be <location>-docker.pkg.dev/<gcp-project-id>/<repository-name>.

  5. Create a Service Account to authenticate the calls made by GitHub Actions to your GCP project:

    • In the Cloud Console, go to the Service Accounts page.
    • Make sure the right project is selected in the drop-down menu above.
    • Click on Create Service Account
    • Enter a service account name — give it a memorable name (such as <hub-name>_image_builder).
    • Grant this service account access to the project. As a best practice, grant it only the minimum permissions: Cloud Run Admin, Service Account User, Storage Admin, and Artifact Registry Repository Administrator.
  6. Click on the service account's name you just created and select the Keys tab. Click on the ADD KEY button, select Create new key, then create a JSON key type. The private key will be saved to your computer. Make sure to store it somewhere secure!

  7. Create these GitHub secrets for your repository with the credentials of the service account:

    1. GCP_SA_KEY: the private key of the service account created in the previous step
    2. GCP_PROJECT_ID: the id of the Google Cloud Project
  8. Use the following config for your GitHub Action.

    name: Build container image
    
    on: [push]
    
    jobs:
      build:
        runs-on: ubuntu-latest
        env:
          DOCKER_CONFIG: $HOME/.docker
    
        steps:
        - name: checkout files in repo
          uses: actions/checkout@main
    
        - name: Login to GAR
          uses: docker/login-action@v1
          with:
            registry: <location>-docker.pkg.dev
            username: _json_key
            password: ${{ secrets.GCP_SA_KEY }}
    
        - name: Update jupyter dependencies with repo2docker
          uses: jupyterhub/repo2docker-action@master
          with:
            DOCKER_REGISTRY: <location>-docker.pkg.dev
            IMAGE_NAME: ${{ secrets.GCP_PROJECT_ID }}/<repository-name>
    

Push Repo2Docker Image To Azure Container Registry

  1. Log in to the Azure Portal

  2. Create a new container registry. This will determine the name of your image, and you will push / pull from it. Your image name will be <container-registry-name>.azurecr.io/<repository-name>.

  3. Select the Access keys option in the left menu.

  4. Enable Admin user, so that you can use the registry name as the username and the admin user's access key as the password to docker login to your container registry.

  5. Create these GitHub secrets for your repository with the credentials of the admin user:

    1. ACR_USERNAME: the registry name
    2. ACR_PASSWORD: the access key of the admin user
  6. Use the following config for your GitHub Action.

    name: Build container image
    
    on: [push]
    
    jobs:
      build:
        runs-on: ubuntu-latest
    
        steps:
        - name: checkout files in repo
          uses: actions/checkout@main
    
        - name: Update jupyter dependencies with repo2docker
          uses: jupyterhub/repo2docker-action@master
          with:
            DOCKER_USERNAME: ${{ secrets.ACR_USERNAME }}
            DOCKER_PASSWORD: ${{ secrets.ACR_PASSWORD }}
            DOCKER_REGISTRY: <container-registry-name>.azurecr.io
            IMAGE_NAME: <repository-name>
    

Push Repo2Docker Image To GitHub Package Registry

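You can push the image to the package registry associated with the GitHub repository that contains the repo2docker files. No separate credentials are needed, since the workflow's built-in GITHUB_TOKEN is used; the token does, however, need permission to write packages, which the permissions block below grants.

  1. Use the following config for your GitHub Action.

    name: Build container image
    
    on: [push]
    
    jobs:
      build:
        runs-on: ubuntu-latest
        permissions:
          contents: read    # needed to check out the repository
          packages: write   # needed to push to ghcr.io with GITHUB_TOKEN
        steps:
    
        - name: checkout files in repo
          uses: actions/checkout@main
    
        - name: update jupyter dependencies with repo2docker
          uses: jupyterhub/repo2docker-action@master
          with:
            DOCKER_USERNAME: ${{github.actor}}
            DOCKER_PASSWORD: ${{secrets.GITHUB_TOKEN}}
            DOCKER_REGISTRY: "ghcr.io"
            IMAGE_NAME: "<github-username or orgname>/<repository-name>"  # must be lowercase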

The image URL will be ghcr.io/<github-username or orgname>/<repository-name>, and the image will appear under "Packages" on the right side of the repository page.

Push Repo2Docker Image To Other Registries

If your Docker registry accepts credentials passed as a username and password string, you can do it like this:

name: Build Notebook Container
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:

    - name: checkout files in repo
      uses: actions/checkout@main

    - name: update jupyter dependencies with repo2docker
      uses: jupyterhub/repo2docker-action@master
      with: # make sure username & password/token matches your registry
        DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
        DOCKER_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}
        DOCKER_REGISTRY: "gcr.io"

If your Docker registry doesn't accept credentials passed as a username and password string, or if you want to authenticate another way, you can configure credentials for the Docker registry ahead of time instead. Below is an incomplete example of doing that.

name: Build Notebook Container
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:

    - name: checkout files in repo
      uses: actions/checkout@main

    # TODO: add a step here to setup credentials to push to your
    #       docker registry before running the repo2docker-action

    - name: update jupyter dependencies with repo2docker
      uses: jupyterhub/repo2docker-action@master
      with:
        DOCKER_REGISTRY: your-registry.example.org
        IMAGE_NAME: your-image-name
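One way to fill in the TODO above, sketched under the assumption that your registry supports docker login, is to log in first with docker/login-action, the same approach the Google Cloud examples above use. The secret names REGISTRY_USERNAME and REGISTRY_PASSWORD are hypothetical, and the DOCKER_CONFIG setting is copied from the ECR and Google Cloud examples rather than being a requirement confirmed for every registry.

```yaml
name: Build Notebook Container
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    env:
      # Copied from the ECR / GCR / GAR examples above
      DOCKER_CONFIG: $HOME/.docker
    steps:

    - name: checkout files in repo
      uses: actions/checkout@main

    - name: log in to the registry ahead of time
      uses: docker/login-action@v1
      with:
        registry: your-registry.example.org
        username: ${{ secrets.REGISTRY_USERNAME }}   # hypothetical secret name
        password: ${{ secrets.REGISTRY_PASSWORD }}   # hypothetical secret name

    - name: update jupyter dependencies with repo2docker
      uses: jupyterhub/repo2docker-action@master
      with:
        DOCKER_REGISTRY: your-registry.example.org
        IMAGE_NAME: your-image-name
```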

Change Image Name

When you do not provide an image name, it defaults to DOCKER_USERNAME/GITHUB_REPOSITORY_NAME. For example, if the user hamelsmu ran this Action from this repo, the image would be named hamelsmu/repo2docker-action. If you want a different image name, you can provide the IMAGE_NAME parameter as illustrated below:

name: Build Notebook Container
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:

    - name: checkout files in repo
      uses: actions/checkout@main

    - name: update jupyter dependencies with repo2docker
      uses: jupyterhub/repo2docker-action@master
      with:
        DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
        DOCKER_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}
        IMAGE_NAME: "hamelsmu/my-awesome-image" # this overrides the image name

Test Image Build

You might want to only test the image build without pushing to a registry, for example to test a pull request. You can do this by setting the NO_PUSH parameter to 'true':

name: Build Notebook Container
on: [pull_request]
jobs:
  build-image-without-pushing:
    runs-on: ubuntu-latest
    steps:  
    - name: Checkout PR
      uses: actions/checkout@v2
      with:
        ref: ${{ github.event.pull_request.head.sha }}

    - name: test build
      uses: jupyterhub/repo2docker-action@master
      with:
        NO_PUSH: 'true'
        IMAGE_NAME: "hamelsmu/repo2docker-test"

When NO_PUSH is set to 'true', you can omit the DOCKER_USERNAME and DOCKER_PASSWORD parameters, which are otherwise needed to push to a registry unless credentials are set up ahead of time.

Contributing To repo2docker-action

See the Contributing Guide.