env var overrides for globalArguments silently change which environment a model targets

When building test or diagnostic workflows, some steps may fail due to external constraints (e.g., billing plan limitations, optional features not configured) rather than actual errors. Currently, any failed step marks the entire job and workflow as "failed", even when the failure is expected and acceptable.

There is no way to mark a step as "optional" or "allowed to fail." The existing dependency condition system (completed, always, etc.) controls whether downstream steps run, but doesn't prevent the failed step from counting toward the overall workflow status.

Proposed Solution

Add an allowFailure (or continueOnError) boolean flag to the step schema:

steps:
  - name: check-log-streaming
    description: Check log streaming config (may 403 on free plans)
    allowFailure: true
    task:
      type: model_method
      modelIdOrName: my-log-config
      methodName: get
      inputs:
        logType: configuration

When allowFailure: true:

If the step succeeds, it reports as succeeded as normal
If the step fails, it reports as something like failed_allowed or warning instead of failed
The step's failure does NOT propagate to the job or workflow status
Downstream steps with dependsOn: succeeded still skip (the step did fail), but dependsOn: completed would fire

Use Case

We have a monolith test workflow (test-all-models) that exercises all 10 Tailscale model types in parallel. Two of the steps call logConfig.get, which returns HTTP 403 ("feature not available on current billing plan") on free-tier tailnets. The API call and error handling work correctly — it's just that the feature isn't available.

Without allowFailure, the workflow reports as "failed" even though 14 of 16 steps pass and the 2 failures are expected. This makes the workflow unusable as an automated health check because the exit code is always non-zero.

Alternatives Considered

Remove the steps: Works but loses visibility into which features are available
Restructure into separate workflows: Adds complexity without solving the underlying problem
Use dependency conditions: completed/always let downstream steps run, but the overall workflow status is still "failed"

02Bog Flow

Closed

No activity in this phase yet.

03Sludge Pulse

Quality harness issue 1 by qualityfounder

Quality harness issue 11 by qualitymaven

Quality harness issue 10 by qualitymaven

Quality harness issue 9 by qualitymaven

Quality harness issue 8 by qualitymaven

Quality harness issue 7 by qualitymaven

Quality harness issue 6 by qualitymaven

Quality harness issue 5 by qualitymaven

Quality harness issue 4 by qualitymaven

Quality harness issue 3 by qualitymaven

Quality harness issue 2 by qualitymaven

Quality harness issue 1 by qualitymaven

Quality harness issue 1 by qualityrook

env var overrides for globalArguments silently change which environment a model targets

deno lint no-import-prefix conflicts with swamp's required npm:/jsr: inline specifiers

AI agents should search community extensions before offering to create custom models

feat: add --content-type filter to extension search

Add a 'reports' feature

Port remaining CLI commands to libswamp + renderer pattern

Persistent runner / server mode to eliminate per-invocation CLI startup overhead

AI agent should prefer fan-out model methods over shell loops for fleet operations

Support unbundled helper scripts in extension packages

Add progress indicator for long-running model methods

Automatic trace context propagation across workflow steps

Native OpenTelemetry tracing for swamp CLI internals

Claude Code should prefer extending swamp extensions over using external CLI tools

Framework log lines written to stdout pollute model method output

Swamp skill should guide AI agents to create extension models instead of inline shell scripts

Shared team collectives for collaborative extension publishing

Delete methods should return last known state data

Add smoke-test skill for extension models

Workflow run output is unreadable — resource data and errors are buried in log noise

Recommend or manage lockfile strategy for extension model npm dependencies

Add macOS Keychain vault type

bug: Vault secret shell escaping incomplete — $(), semicolons, and pipes are interpreted

Swamp should support "Scheduled"-type status for execution

feat: add support for Nix via a flake

ModelType.normalize replaces all dots with slashes, corrupting dotted type names

fix: collective validator error message omits drivers and datastores

Skill docs: document configSchema inline requirement and .describe() constraints for vault extensions

It takes 42 seconds to run swamp --help

After today upgrade

Auto-resolver fails when extensions/models/ directory does not exist for vault-only extensions

Azure Key Vault extension vault bundle fails to load in compiled binary

RangeError: Maximum call stack size exceeded when loading extension models with npm dependencies after upgrade to 20260317

Extension Zod schema incompatibility after upgrade to 20260312

Data garbage collection never fully removes expired data from disk

extension push --dry-run requires authentication unnecessarily

Typed context APIs to eliminate no-explicit-any in extension models

AWS extensions fail to load: missing _lib/aws.ts shared dependency

getContent() should resolve vault expressions for sensitive fields

getContent() should return parsed objects, not raw Uint8Array

Add context.readResource() to MethodContext for user extension models

Datastore lock: allow concurrent reads across different models

extension install should be an alias of extension pull

feat: Execution Driver Abstraction — Pluggable Isolation for Model Execution

Support --global-arg on model create

consider a libswamp

Running workflow should not block read-only operations

Update from 20260227 to 20260311 breaks existing model definitions (symlink + path-based type)

loadUserModels ignores --repo-dir, uses cwd instead

datastore lock release --force deadlocks on the stale lock it's trying to release

Datastore lock not released when interactive command fails in non-TTY context

vault put to 1Password stores empty password field

Leaderboard shows 0-event users whose profiles 404

Data instance name uniqueness is global across specs instead of per-spec

Audit timeline misses today's entries in UTC+ timezones

AI agent discoverability: reduce trial-and-error for CLI syntax

GlobalArguments CEL expressions fail when factory inputs are omitted — selective evaluation not working

Document factory pattern for model reuse in skills

Delete method uses identifier as data name instead of instance name

CLI method run ignores required arguments without .default()

Workflow run output mixes data, errors, and logs without clear separation — hard to find what actually executed

data.latest() in workflow step inputs resolved at workflow start, not at step execution time

Update method fails when globalArguments contain cross-model CEL expressions

fix: extension npm packages fail to resolve in compiled binary

swamp repo upgrade replaces entire CLAUDE.md with generic template instead of merging

Add drift detection to reconcile stored state against live cloud resources

Model inputs schema validated on all methods — delete fails requiring create-time inputs

Support wait-until-ready semantics for async resource provisioning