01 Issue
Feature · Open · Swamp Club
Assignees: None

Add smoke-test skill for extension models

Opened by swampadmin · 1/17/2026

Problem statement

When developing extension models, AI agents often conclude that a model is correct once unit tests pass, yet subtle corner-case bugs survive into pushes and extension publishes. These bugs are typically discovered only during manual smoke testing against live APIs, after the code has already been committed or published. Examples from real development sessions:

  • Content-Type mismatches: The v2 API required application/vnd.api+json but the model sent application/json; unit tests with a stubbed fetch did not catch this
  • Stale bundle caching: Source fixes were not reflected at runtime because .swamp/bundles/*.js had not been cleared; agents did not know about this caching layer
  • API validation quirks: Honeycomb boards require type: "flexible" in the request body, not just a name; this was discovered only during a live create
  • delete_protected defaults: Honeycomb creates environments with delete_protected: true, so delete fails unless update is called first
  • Read-only resource guards: create/update/delete on read-only resources such as dataset-definitions or auth should be rejected before any API call is made
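
To make the Content-Type example concrete, here is a minimal sketch of why a lenient stub hides the bug. It is written in Python purely for illustration (the extension itself is presumably JavaScript, given the .swamp/bundles/*.js path). The names create_resource, lenient_stub, and strict_stub are hypothetical, and the 415 status returned by the strict stub is an assumption, not documented API behavior.

```python
from typing import Any, Callable, Dict

JSON_API = "application/vnd.api+json"

Request = Dict[str, Any]
Transport = Callable[[Request], int]  # returns an HTTP status code


def create_resource(send: Transport, content_type: str) -> int:
    """Hypothetical model method: POST a resource and return the status."""
    request = {
        "method": "POST",
        "headers": {"Content-Type": content_type},
        "body": {"name": "example"},
    }
    return send(request)


def lenient_stub(request: Request) -> int:
    # Typical unit-test stub: ignores headers and always reports success.
    return 201


def strict_stub(request: Request) -> int:
    # Stand-in for the live API: rejects the wrong media type.
    # The 415 status is an assumption made for this illustration.
    if request["headers"]["Content-Type"] == JSON_API:
        return 201
    return 415


# The buggy header passes against the lenient stub but not the strict one.
print(create_resource(lenient_stub, "application/json"))
print(create_resource(strict_stub, "application/json"))
```

A stub that validates the request as the live service would is one way to narrow the gap, but it only covers quirks someone already knows about, which is the argument for a live smoke test.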

These are the kinds of issues that unit tests with mocked responses can't catch, but a structured smoke-test protocol would.

Proposed solution

A swamp-smoke-test skill that agents can invoke (or that hooks trigger automatically) before git push, swamp extension push, or similar publish actions. The skill would:

  1. Discover the extension's method surface: Parse the model to enumerate all methods × resource types × argument combinations
  2. Generate a smoke-test plan: For each method, identify:
    • Safe read-only operations (GET/list) that can run against live APIs without side effects
    • CRUD cycle candidates: resources that can be safely created, updated, and deleted (with unique test names to avoid collisions)
    • Error-path tests: missing required args, read-only resource rejection, invalid auth
    • Corner cases specific to the API: required fields beyond name, default flags that block deletion, etc.
  3. Execute the plan: Run each test via swamp model method run and verify that each success or failure matches the expectation
  4. Report results: Produce a structured summary table (method × resource × result) suitable for PR descriptions
  5. Clean up: Ensure all created test resources are deleted, even if intermediate steps fail
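
The five steps above could be sketched roughly as follows. This is an illustrative outline in Python, not the skill's actual design: Step, build_plan, run_plan, and format_report are hypothetical names, and the run callable is a placeholder for whatever wraps swamp model method run (its arguments are not specified in this issue, so it is not invoked here).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

MUTATING = {"create", "update", "delete"}


@dataclass
class Step:
    method: str
    resource: str
    expect_ok: bool  # False for calls that should be rejected


def build_plan(methods: List[str], resources: Dict[str, bool]) -> List[Step]:
    """Cross methods with resources. `resources` maps name -> read_only flag.
    Mutating methods on read-only resources are expected to be rejected."""
    return [
        Step(method, resource, expect_ok=not (read_only and method in MUTATING))
        for resource, read_only in resources.items()
        for method in methods
    ]


def run_plan(
    plan: List[Step],
    run: Callable[[str, str], bool],  # placeholder for the real runner
    cleanup: Callable[[], None],
) -> List[Tuple[str, str, str]]:
    """Run every step, then always call cleanup, even if a step blows up."""
    results = []
    try:
        for step in plan:
            try:
                ok = run(step.method, step.resource)
            except Exception:
                ok = False  # a raised error counts as a failed call
            verdict = "pass" if ok == step.expect_ok else "FAIL"
            results.append((step.method, step.resource, verdict))
    finally:
        cleanup()
    return results


def format_report(results: List[Tuple[str, str, str]]) -> str:
    """Render a method x resource x result table as plain text."""
    lines = ["method | resource | result"]
    lines += [f"{m} | {r} | {v}" for m, r, v in results]
    return "\n".join(lines)
```

Placing cleanup in a finally block is what lets step 5 hold even when an intermediate step fails; a real implementation would also need to track which resources were actually created.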

Key design considerations

  • The skill should be API-aware but not API-specific: it reads the model's method schemas and resource registry to generate tests, rather than hard-coding per-service knowledge
  • It should never touch pre-existing resources: all created resources use unique names (e.g. smoke-test-{resource}-{timestamp})
  • It should handle permission errors gracefully: a 401 on slos caused by a key that lacks permission is an expected constraint, not a test failure
  • Bundle cache clearing (.swamp/bundles/) should be part of the pre-test setup
  • The skill could optionally integrate with git hooks to block pushes when smoke tests fail
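
Three of these considerations translate into small helpers. The sketch below is in Python and rests on stated assumptions: the function names are hypothetical, the timestamp is taken to be Unix seconds, 403 is treated like 401 (the issue only mentions 401), and the cache is assumed to consist of *.js files directly under .swamp/bundles/.

```python
import time
from pathlib import Path
from typing import Optional

PREFIX = "smoke-test-"
# 401 comes from the issue text; 403 is added here as an assumption.
PERMISSION_STATUSES = {401, 403}


def unique_name(resource: str, timestamp: Optional[int] = None) -> str:
    """Build a name following smoke-test-{resource}-{timestamp}."""
    ts = int(time.time()) if timestamp is None else timestamp
    return f"{PREFIX}{resource}-{ts}"


def is_smoke_test_resource(name: str) -> bool:
    """Guard used before update/delete so pre-existing resources are left alone."""
    return name.startswith(PREFIX)


def classify(status: int, expect_ok: bool) -> str:
    """Map an HTTP status to pass/fail/skipped.
    A permission error on a call expected to succeed is skipped, not failed."""
    if expect_ok and status in PERMISSION_STATUSES:
        return "skipped"
    ok = 200 <= status < 300
    return "pass" if ok == expect_ok else "fail"


def clear_bundle_cache(project_root: Path) -> int:
    """Delete cached *.js bundles under .swamp/bundles; return how many were removed."""
    bundle_dir = project_root / ".swamp" / "bundles"
    removed = 0
    if bundle_dir.is_dir():
        for bundle in bundle_dir.glob("*.js"):
            bundle.unlink()
            removed += 1
    return removed
```

Reporting permission errors as "skipped" rather than "pass" keeps them visible in the summary table without blocking a push.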

Alternatives considered

  • Manual smoke testing: The current approach. It works but is tedious, error-prone, and depends on the agent remembering to do it
  • Enhanced unit tests: Better mocks could catch some issues, but can't catch Content-Type mismatches, bundle caching, or API validation quirks that only surface with real HTTP calls
  • CI-based integration tests: Would require live API credentials in CI, which adds secret management complexity

Additional context

This was motivated by developing the @bixu/honeycomb extension, where multiple bugs survived unit tests and were only caught during manual smoke testing sessions. The pattern of "agent thinks it's done → smoke test reveals bugs → fix → re-test" repeated across several sessions. A skill that codifies this testing protocol would catch these issues earlier and more consistently.

02 Bog Flow
OPEN · TRIAGED · IN PROGRESS · SHIPPED

Open

1/17/2026, 5:48:32 PM

No activity in this phase yet.