Refactoring your GitHub Actions workflow into a script

Is your GitHub Actions workflow file getting a bit large and perhaps a little difficult to follow? Or maybe you want to share logic between multiple workflows in the same repo and keep things 'DRY'? In this article we'll compare different scripting approaches for extracting code out of a workflow.

The problem

Here's a workflow which deploys files to an Amazon S3 bucket and then invalidates a CloudFront distribution using the AWS CLI. This is a common setup for anyone deploying a static website to AWS.

name: Deployment Workflow

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
    ...
    - name: Deploy
      run: |
        aws s3 sync ./public/ s3://my-test-bucket --delete
        aws cloudfront create-invalidation --distribution-id MYCLOUDFRONTID --paths "/*"

This example has only two commands, and neither has many command line options. However, real-world deployments are often more complex.

The best practice is to create your own GitHub Action (see the official GitHub guide for more details). However, there's quite a lot of overhead with that approach, and if you're only working on a single repo it's probably easiest just to move the logic into a script within the same repo.

So how should our script work?

Ideally the script should tick the following boxes:

  • Fast - CI/CD is all about quick feedback, so we don't want to make things noticeably slower. Also, time is money with GitHub Actions!
  • Named parameters - we're trying to make our workflow file easier to read by extracting out implementation details. Let's build on this by making sure the parameters we're sending to the script are properly labelled in the workflow file.
  • Clean and maintainable - we want to make things easier to work with so let's make sure the script we build doesn't end up adding more complexity.

1. A Bash script

deploy.sh

#!/bin/bash

# Read named command line arguments into an args variable

declare -A args

while (( "$#" )); do
    if  [[ $1 == --* ]] && [ "$2" ]; then
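        # ${1:2} strips the leading "--" from $1 to give the option name; $2 is its value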
        args[${1:2}]=$2
        shift
    fi
    shift
done

# Do the deployment

aws s3 sync ./build/ "s3://${args[bucketname]}" --delete

if [ -n "${args[cloudfrontid]}" ]; then
    aws cloudfront create-invalidation --distribution-id "${args[cloudfrontid]}" --paths "/*"
fi

And inside the workflow yml file:

...
    steps:
    ...
    - name: Deploy
      run: ./scripts/deploy.sh --bucketname my-test-bucket --cloudfrontid MYCLOUDFRONTID
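One thing to watch out for: because the workflow step calls the script directly, the file's executable bit needs to be set in the repo (alternatively, invoke it as bash ./scripts/deploy.sh). A quick sketch of setting the bit locally:

chmod +x scripts/deploy.sh
git update-index --chmod=+x scripts/deploy.sh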

Pros:

  • It's fast
  • One file and no dependencies
  • The AWS commands inside the bash file are nice and clear

Cons:

  • It's a Bash script with logic in it, which isn't for everyone. The code to parse the command line is pretty impenetrable to anyone who isn't a Bash ninja...

2. A Python script

deploy.py

import argparse
import os

parser = argparse.ArgumentParser(description='Deploy')
parser.add_argument('--bucketname', dest='bucketname', required=True)
parser.add_argument('--cloudfrontid', dest='cloudfrontid')

args = parser.parse_args()

os.system(f'aws s3 sync ./build/ s3://{args.bucketname} --delete')

if args.cloudfrontid:
    os.system(f'aws cloudfront create-invalidation --distribution-id {args.cloudfrontid} --paths "/*"')

And inside the workflow yml file:

...
    - name: Deploy
      run: python3 ./scripts/deploy.py --bucketname my-test-bucket --cloudfrontid MYCLOUDFRONTID
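A nice side effect of using argparse is that the script documents itself - running it locally with --help prints a usage message along these lines (exact wording varies by Python version):

python3 ./scripts/deploy.py --help
# usage: deploy.py [-h] --bucketname BUCKETNAME [--cloudfrontid CLOUDFRONTID]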

Pros:

  • It's fast
  • One file and no dependencies
  • The code to parse the command line arguments is nice and explicit

Cons:

  • The shell commands aren't as clear as in the Bash script (because of the explicit os.system calls), but overall it's pretty readable.

3. A PowerShell script

deploy.ps1

param (
    [Parameter(Mandatory)]
    [string]$BucketName,
    [string]$CloudfrontID
)

iex "aws s3 sync ./build/ s3://$BucketName --delete"

if ($CloudfrontID) {
    iex "aws cloudfront create-invalidation  --distribution-id $CloudfrontID --paths /*"
}

And the entry in the workflow yml file:

...
    - name: Deploy
      run: pwsh ./scripts/deploy.ps1 -BucketName my-test-bucket -CloudfrontID MYCLOUDFRONTID
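The [Parameter(Mandatory)] attribute also gives some safety for free: if -BucketName is omitted, PowerShell won't run the script body with an empty value - running the script locally, it prompts for the missing parameter. For example:

pwsh ./scripts/deploy.ps1 -CloudfrontID MYCLOUDFRONTID
# PowerShell asks for the missing mandatory BucketName parameter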

Pros:

  • One file and no dependencies
  • The command line argument expectations are clear

Cons:

  • It's a little slow to get going - 15s of startup latency on an Ubuntu runner (probably much faster on a Windows runner!)
  • Like the Python script, issuing the AWS commands isn't as clear as it is in the Bash script.

4. A Node script

deploy.js

const { execSync } = require('child_process');
const yargs = require('yargs');

const args = yargs.options({
  'bucketname': { demandOption: true },
  'cloudfrontid': {},
}).argv;

execSync(
  `aws s3 sync ./build/ s3://${args.bucketname} --delete`,
  {stdio: 'inherit'}
);

if (args.cloudfrontid) {
  execSync(
    `aws cloudfront create-invalidation --distribution-id ${args.cloudfrontid} --paths "/*"`,
    {stdio: 'inherit'}
  );
}

And the workflow yml file:

...
    - name: Deploy
      run: node ./scripts/deploy.js --bucketname my-test-bucket --cloudfrontid MYCLOUDFRONTID

Pros:

  • Fast, no latency
  • The command line argument expectations are clear

Cons:

  • Relies on another package ('yargs') to do the argument parsing. If your project is node-based then just include this in your package.json file and you're done. Otherwise you'll have to set up a mini node project within your repo - not impossible but not that nice either (see the sketch after this list).
  • execSync adds some visual overhead versus just calling the command. In addition, an extra parameter is needed to make sure we see the command's output in the logs.
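For reference, the mini node project mentioned above only takes a couple of commands to set up (a sketch - the generated package.json and package-lock.json would need committing, and node_modules would need to be either committed too or restored with npm ci in the workflow before the deploy step):

npm init -y
npm install yargs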

5. A Node based GitHub Action in the same repo

For a step-by-step guide to using this method, see the article 'Create your own local js GitHub Action with just two files'.

deploy.js

const core = require("@actions/core");
const { exec } = require("@actions/exec");

async function deploy() {
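  // inputs are declared in action.yml and supplied via the 'with:' block of the workflow step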
  const bucketName = core.getInput('bucket-name');
  await exec(`aws s3 sync ./build/ s3://${bucketName} --delete`);

  const cloudfrontID = core.getInput('cloudfront-id');
  if (cloudfrontID) {
    await exec(`aws cloudfront create-invalidation  --distribution-id ${cloudfrontID} --paths /*`);
  }
}

deploy()
  .catch(error => core.setFailed(error.message));

action.yml

name: "Deploy"
description: "Deploy action"
inputs:
  bucket-name:
    description: "S3 Bucket"
    required: true
  cloudfront-id:
    description: "Cloudfront ID"
runs:
  using: "node12"
  main: "deploy.js"

And the workflow yml file:

...
    - name: Deploy
      uses: ./actions/deploy
      with:
        bucket-name: my-test-bucket
        cloudfront-id: MYCLOUDFRONTID
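Note that the step points at the action's directory rather than running a script - the runner reads action.yml from ./actions/deploy and invokes deploy.js with node. If the repo isn't already a node project, setting that directory up might look something like this (a sketch - the toolkit packages then need to end up in the repo alongside the action, either by committing node_modules or by bundling with a tool such as @vercel/ncc; see the article mentioned above):

mkdir -p actions/deploy
cd actions/deploy
npm init -y
npm install @actions/core @actions/exec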

Pros:

  • Fast, no latency
  • Parameters are nicely documented by the action.yml file.
  • The entry in the workflow file is clearer than cramming everything onto one line (as we do with the other scripts).
  • Can be made into a standalone GitHub Action in the future.

Cons:

  • Dependent on the GitHub toolkit packages. If you're working on a node project then just add these packages to your package.json file. Otherwise you can include the package files in the repo - see the GitHub Actions documentation for an example.
  • The async/await handling adds some extra complexity.
  • Requires an additional action.yml file and a certain directory structure - but on the other hand this provides a nice separation of concerns, keeping the parameter definitions separate from the code.

TL;DR

  • The Python script offers a good balance of clarity and maintainability.

  • Go with Bash or PowerShell if that's your thing!

  • If you're working on a node project then creating a local GitHub Action is a nice option. It integrates well with the workflow yml file, and in the future you've got the option to upgrade to a fully-fledged action in its own repo.

    See this tutorial for a step-by-step guide to creating a local JavaScript GitHub Action.

