I've seen teams spend weeks setting up elaborate CI/CD pipelines that break on day one of actual use. The problem isn't GitHub Actions—it's that most engineers treat workflows like magic incantations copied from Stack Overflow. After maintaining CI/CD for teams ranging from 5 to 50+ engineers, I've learned that good automation isn't about having the fanciest YAML. It's about understanding the trade-offs and building pipelines that fail fast, recover gracefully, and don't become a bottleneck.
The Baseline: Fast Feedback Loops
Your CI pipeline has one job: tell engineers if they broke something, as fast as possible. Every minute your pipeline takes is a minute an engineer is context-switching or waiting. I aim for sub-5-minute feedback on PRs. Here's a production workflow that prioritizes speed through parallelization and caching:
name: CI

on:
  pull_request:
  push:
    branches: [main]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci --prefer-offline
      - name: Lint
        run: npm run lint

  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [18, 20]
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
          cache: 'npm'
      - name: Install dependencies
        run: npm ci --prefer-offline
      - name: Run tests
        run: npm test -- --coverage --maxWorkers=2
      - name: Upload coverage
        if: matrix.node-version == 20
        uses: codecov/codecov-action@v3
        with:
          fail_ci_if_error: true

  build:
    runs-on: ubuntu-latest
    needs: [lint, test]
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci --prefer-offline
      - name: Build
        run: npm run build
      - name: Upload build artifacts
        uses: actions/upload-artifact@v4
        with:
          name: build
          path: dist/
          retention-days: 7
Note the --prefer-offline flag and cache: 'npm'. These small optimizations cut 30-60 seconds from every run. Also, --maxWorkers=2 stops Jest from spawning too many workers in CI; oversubscribing the runner paradoxically makes the tests slower.
Conditional Workflows: Don't Run What You Don't Need
One of the biggest mistakes I see is running the entire pipeline for every change. If someone updates documentation, you don't need to run integration tests. GitHub Actions supports path filtering, but here's the pattern I actually use in production—it's more explicit and easier to debug:
name: Smart CI

on:
  pull_request:
    paths:
      - 'src/**'
      - 'tests/**'
      - 'package*.json'
      - '.github/workflows/**'

jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      backend: ${{ steps.filter.outputs.backend }}
      frontend: ${{ steps.filter.outputs.frontend }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v2
        id: filter
        with:
          filters: |
            backend:
              - 'src/api/**'
              - 'src/db/**'
            frontend:
              - 'src/components/**'
              - 'src/pages/**'

  test-backend:
    needs: changes
    if: needs.changes.outputs.backend == 'true'
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: test  # create the database that DATABASE_URL points at
        ports:
          - 5432:5432  # map the port so the runner can reach it on localhost
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci --prefer-offline
      - name: Run backend tests
        run: npm run test:api
        env:
          DATABASE_URL: postgresql://postgres:postgres@localhost:5432/test

  test-frontend:
    needs: changes
    if: needs.changes.outputs.frontend == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci --prefer-offline
      - name: Run frontend tests
        run: npm run test:ui
This approach saves significant CI minutes on large repos. I find the dorny/paths-filter action more reliable than GitHub's built-in path filtering, which can only include or skip an entire workflow, and the explicit outputs make it clear what's running and why.
Deployment: Progressive Rollouts with Confidence
Deployment workflows need guardrails. I use environments with protection rules and required reviewers for production, but here's the part most tutorials skip: integration with your actual infrastructure. This example deploys to AWS but shows the pattern for any platform:
name: Deploy

on:
  push:
    branches: [main]

# Required so configure-aws-credentials can request an OIDC token
permissions:
  id-token: write
  contents: read

jobs:
  deploy-staging:
    runs-on: ubuntu-latest
    environment:
      name: staging
      url: https://staging.example.com
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - name: Install and build
        run: |
          npm ci --prefer-offline
          npm run build
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1
      - name: Deploy to staging
        run: |
          aws s3 sync dist/ s3://staging-bucket/ --delete
          aws cloudfront create-invalidation --distribution-id ${{ secrets.STAGING_CF_ID }} --paths "/*"
      - name: Run smoke tests
        run: npm run test:smoke -- --env=staging
      # Hand the exact build that passed staging to the production job
      - name: Upload build for production
        uses: actions/upload-artifact@v4
        with:
          name: build
          path: dist/

  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment:
      name: production
      url: https://example.com
    steps:
      - uses: actions/checkout@v4
      - name: Download the build verified on staging
        uses: actions/download-artifact@v4
        with:
          name: build
          path: dist/
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1
      - name: Deploy to production
        run: |
          aws s3 sync dist/ s3://prod-bucket/ --delete
          aws cloudfront create-invalidation --distribution-id ${{ secrets.PROD_CF_ID }} --paths "/*"
      - name: Notify deployment
        if: always()
        uses: 8398a7/action-slack@v3
        with:
          status: ${{ job.status }}
          text: 'Production deployment ${{ job.status }}'
        env:
          # action-slack reads the webhook from this environment variable
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
The role-to-assume pattern uses GitHub's OIDC provider to get temporary AWS credentials, so there are no long-lived secrets in your repo. It requires the workflow to have the id-token: write permission. This is significantly more secure and is how you should authenticate to cloud providers in 2024.
Reusable Workflows: DRY for YAML
Once you have multiple repos, you'll want to share workflow logic. GitHub supports reusable workflows, which I use extensively. Here's how I set up a reusable workflow for Node.js testing that I call from multiple repositories:
- Create a .github/workflows/reusable-node-test.yml in a central repo with on: workflow_call
- Define inputs for customization (Node version, test command, etc.)
- Call it from other repos using uses: org/repo/.github/workflows/reusable-node-test.yml@main
- Version your reusable workflows with tags, not @main, once they're stable
- Keep secrets at the caller level. Reusable workflows do not receive the caller's secrets automatically; pass them explicitly or use secrets: inherit
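As a rough sketch of the first two steps, the shared workflow file might look something like this. The input names (node-version, test-command) and their defaults are illustrative assumptions, not taken from a real repository.

```yaml
# .github/workflows/reusable-node-test.yml (lives in the central repo)
name: Reusable Node test

on:
  workflow_call:
    inputs:
      node-version:          # hypothetical input name
        type: string
        default: '20'
      test-command:          # hypothetical input name
        type: string
        default: npm test

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      # Checks out the calling repository, not the central one
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ inputs.node-version }}
          cache: 'npm'
      - name: Install dependencies
        run: npm ci --prefer-offline
      - name: Run tests
        run: ${{ inputs.test-command }}
```

Two inputs with sensible defaults are deliberately all this sketch exposes; most callers should not need to set either.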
The key insight: reusable workflows are for process, not configuration. If you find yourself passing 15 inputs to customize behavior, you're doing it wrong. Instead, encode your team's standards (run these checks, in this order, with these quality gates) and let individual repos provide minimal configuration.
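To make "minimal configuration" concrete, a calling workflow could be about this small. It reuses the org/repo path from the list above; the node-version input is a hypothetical name that the reusable workflow would have to declare.

```yaml
# .github/workflows/ci.yml (lives in a consuming repo)
name: CI

on:
  pull_request:

jobs:
  test:
    # Pin to a tag instead of @main once the shared workflow is stable
    uses: org/repo/.github/workflows/reusable-node-test.yml@main
    with:
      node-version: '20'     # hypothetical input declared by the reusable workflow
    secrets: inherit         # pass the caller's secrets through explicitly
```

Note that secrets: inherit only works when the caller and the reusable workflow belong to the same organization or enterprise; otherwise each secret has to be passed by name.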
Debugging and Observability
When workflows fail at 2 AM, you need visibility. Enable debug logging by setting the ACTIONS_STEP_DEBUG and ACTIONS_RUNNER_DEBUG secrets (or repository variables) to true. Use job summaries (anything a step writes to $GITHUB_STEP_SUMMARY) to surface important information directly in the Actions UI. Most importantly, make your failure messages actionable. Don't just say 'tests failed'; tell engineers which test failed and link to the logs. I add custom annotations using echo "::error file=app.js,line=10::Something broke here" to make failures scannable.
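A minimal sketch of how an annotation and a job summary might be combined in a failure step follows. The workflow name, step names, and message wording are placeholders; the ::error command, $GITHUB_STEP_SUMMARY, and the GITHUB_* environment variables are standard on GitHub-hosted runners.

```yaml
name: Tests with summary

on:
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: npm ci
      - name: Run tests
        run: npm test
      - name: Summarize failure
        # Runs only when an earlier step in this job has failed
        if: failure()
        run: |
          # Annotation shown at the top of the run and on the PR checks tab
          echo "::error title=Tests failed::See the 'Run tests' step log for the failing test names"
          # Markdown appended here is rendered on the run's summary page
          {
            echo "### Test run failed"
            echo ""
            echo "[Open the logs for this run](${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID})"
          } >> "$GITHUB_STEP_SUMMARY"
```

In a real pipeline the summary step would ideally parse the test reporter's output and list the failing test names rather than only linking to the logs.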
GitHub Actions isn't perfect—the YAML can get verbose, the minute limits on free tiers are restrictive, and debugging can be painful. But it's good enough for most teams, and the integration with GitHub's ecosystem is unmatched. Focus on fast feedback, clear failures, and progressive rollouts. Your future self (and your team) will thank you.