Issue
Feature · Closed · Swamp Club
Assignees: None

Add allowFailure flag on workflow steps

Opened by swampadmin · 1/25/2025

Problem

When building test or diagnostic workflows, some steps may fail due to external constraints (e.g., billing plan limitations, optional features not configured) rather than actual errors. Currently, any failed step marks the entire job and workflow as "failed", even when the failure is expected and acceptable.

There is no way to mark a step as "optional" or "allowed to fail." The existing dependency condition system (completed, always, etc.) controls whether downstream steps run, but doesn't prevent the failed step from counting toward the overall workflow status.

Proposed Solution

Add an allowFailure (or continueOnError) boolean flag to the step schema:

steps:
  - name: check-log-streaming
    description: Check log streaming config (may 403 on free plans)
    allowFailure: true
    task:
      type: model_method
      modelIdOrName: my-log-config
      methodName: get
      inputs:
        logType: configuration

When allowFailure: true:

  • If the step succeeds, it reports as succeeded as normal
  • If the step fails, it reports as something like failed_allowed or warning instead of failed
  • The step's failure does NOT propagate to the job or workflow status
  • Downstream steps with dependsOn: succeeded are still skipped (the step did fail), but steps with dependsOn: completed still run
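
The rules above could be sketched roughly as follows. This is an illustrative Python model only, not the project's implementation: StepResult, step_status, workflow_status, and should_run are hypothetical names, failed_allowed is one of the two status names floated in this proposal, and only the succeeded and completed conditions are modelled.

```python
from dataclasses import dataclass
from enum import Enum


class StepStatus(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    FAILED_ALLOWED = "failed_allowed"  # hypothetical status name from the proposal


@dataclass
class StepResult:
    name: str
    ok: bool                     # did the underlying task succeed?
    allow_failure: bool = False  # mirrors the proposed allowFailure flag


def step_status(result: StepResult) -> StepStatus:
    """Map a raw result to the status the step would report."""
    if result.ok:
        return StepStatus.SUCCEEDED
    if result.allow_failure:
        return StepStatus.FAILED_ALLOWED
    return StepStatus.FAILED


def workflow_status(results: list[StepResult]) -> str:
    """Only hard failures propagate to the overall status."""
    statuses = [step_status(r) for r in results]
    return "failed" if StepStatus.FAILED in statuses else "succeeded"


def should_run(condition: str, upstream: StepStatus) -> bool:
    """Decide whether a downstream step runs, given its dependsOn condition."""
    if condition == "succeeded":
        # An allowed failure is still a failure for this condition.
        return upstream is StepStatus.SUCCEEDED
    if condition == "completed":
        # Assumed to mean "upstream finished, whatever the outcome".
        return True
    raise ValueError(f"condition not modelled in this sketch: {condition}")
```

Under these assumptions, a failing check-log-streaming step with allowFailure: true would report failed_allowed, leave the workflow status as succeeded, and still cause dependsOn: succeeded steps to be skipped.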

Use Case

We have a monolithic test workflow (test-all-models) that exercises all 10 Tailscale model types in parallel. Two of the steps call logConfig.get, which returns HTTP 403 ("feature not available on current billing plan") on free-tier tailnets. The API call and error handling work correctly; the feature simply isn't available on that plan.

Without allowFailure, the workflow reports as "failed" even though 14 of 16 steps pass and the 2 failures are expected. This makes the workflow unusable as an automated health check because the exit code is always non-zero.
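
To make the exit-code effect concrete, here is a small hedged sketch of the 16-step case described above. The summarize helper and the (ok, allow_failure) tuple layout are assumptions for illustration, not part of the existing tooling.

```python
from collections import Counter


def summarize(steps):
    """steps: iterable of (ok, allow_failure) pairs. Returns (counts, exit_code)."""
    counts = Counter()
    for ok, allow_failure in steps:
        if ok:
            counts["succeeded"] += 1
        elif allow_failure:
            counts["failed_allowed"] += 1  # visible in the report, but not fatal
        else:
            counts["failed"] += 1
    # Only hard failures produce a non-zero exit code.
    return counts, (1 if counts["failed"] else 0)


# 14 passing steps plus the 2 logConfig.get steps that return 403 on free plans.
without_flag = [(True, False)] * 14 + [(False, False)] * 2
with_flag = [(True, False)] * 14 + [(False, True)] * 2

print(summarize(without_flag)[1])  # 1: today's behaviour, the health check always fails
print(summarize(with_flag)[1])     # 0: the expected 403s no longer fail the run
```

The two expected failures remain countable in the summary, which preserves the visibility that removing the steps would lose.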

Alternatives Considered

  • Remove the steps: Works but loses visibility into which features are available
  • Restructure into separate workflows: Adds complexity without solving the underlying problem
  • Use dependency conditions: completed/always let downstream steps run, but the overall workflow status is still "failed"