Add smoke-test skill for extension models
Opened by swampadmin · 1/17/2026
Problem statement
When developing extension models, AI agents frequently believe the model is correct after unit tests pass, but subtle corner-case bugs survive into pushes and extension publishes. These bugs are typically discovered only during manual smoke testing against live APIs — after the code has already been committed or published. Examples from real development sessions:
- Content-Type mismatches: v2 API required `application/vnd.api+json` but the model used `application/json` — unit tests with stubbed fetch didn't catch this
- Stale bundle caching: Source fixes weren't reflected at runtime because `.swamp/bundles/*.js` wasn't cleared — agents didn't know about this caching layer
- API validation quirks: Honeycomb boards require `type: "flexible"` in the body, not just a `name` — only discovered during live create
- `delete_protected` defaults: Honeycomb creates environments with `delete_protected: true`, making delete fail unless update is called first
- Read-only resource guards: Attempting create/update/delete on read-only resources like `dataset-definitions` or `auth` should be rejected before making API calls
These are the kinds of issues that unit tests with mocked responses can't catch, but a structured smoke-test protocol would.
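The Content-Type example above shows why mocked tests pass while live calls fail: a stub that never inspects request headers can't flag a wrong media type. A minimal TypeScript sketch of that failure mode — the endpoint URL and function names here are hypothetical, not from the actual extension:

```typescript
// A fetch-like signature reduced to what this illustration needs.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ status: number }>;

// Unit-test stub: returns 200 no matter what headers were sent.
const stubbedFetch: FetchLike = async () => ({ status: 200 });

// Stand-in for live v2 behavior: rejects anything but the JSON:API media type.
const liveLikeFetch: FetchLike = async (_url, init) =>
  init.headers["Content-Type"] === "application/vnd.api+json"
    ? { status: 200 }
    : { status: 415 }; // Unsupported Media Type

// Hypothetical model method containing the bug: wrong Content-Type.
async function createBoard(fetchImpl: FetchLike): Promise<number> {
  const res = await fetchImpl("https://api.example.com/2/boards", {
    headers: { "Content-Type": "application/json" },
  });
  return res.status;
}
```

Against `stubbedFetch` the call returns 200 and the unit test passes; against `liveLikeFetch` the same code gets a 415, which is exactly the class of bug only a live smoke test surfaces.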
Proposed solution
A `swamp-smoke-test` skill that agents can invoke (or that hooks trigger automatically) before git push, swamp extension push, or similar publish actions. The skill would:
- Discover the extension's method surface: Parse the model to enumerate all methods × resource types × argument combinations
- Generate a smoke-test plan: For each method, identify:
  - Safe read-only operations (GET/list) that can run against live APIs without side effects
  - CRUD cycle candidates: resources that can be safely created, updated, and deleted (with unique test names to avoid collisions)
  - Error-path tests: missing required args, read-only resource rejection, invalid auth
  - Corner cases specific to the API: required fields beyond `name`, default flags that block deletion, etc.
- Execute the plan: Run each test via `swamp model method run`, verify success/failure matches expectations
- Report results: Produce a structured summary table (method × resource × result) suitable for PR descriptions
- Clean up: Ensure all created test resources are deleted, even if intermediate steps fail
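The discovery and planning steps above could be sketched roughly as follows. The `ModelSurface` shape and method names are assumptions for illustration; the real skill would derive them from the extension's method schemas and resource registry:

```typescript
type Method = "list" | "get" | "create" | "update" | "delete";

// Hypothetical shape of what parsing the model would yield.
interface ModelSurface {
  resources: Record<string, { methods: Method[]; readOnly: boolean }>;
}

interface PlanEntry {
  resource: string;
  method: Method;
  // "rejected" means the model itself should refuse, before any HTTP call
  // (e.g. a mutation on a read-only resource).
  expect: "success" | "rejected";
}

function buildPlan(surface: ModelSurface): PlanEntry[] {
  const plan: PlanEntry[] = [];
  for (const [resource, info] of Object.entries(surface.resources)) {
    for (const method of info.methods) {
      const isMutation =
        method === "create" || method === "update" || method === "delete";
      plan.push({
        resource,
        method,
        expect: info.readOnly && isMutation ? "rejected" : "success",
      });
    }
  }
  // Order safe reads first, then CRUD steps, so failures surface
  // before any side effects are attempted.
  const order: Record<Method, number> = { list: 0, get: 1, create: 2, update: 3, delete: 4 };
  return plan.sort((a, b) => order[a.method] - order[b.method]);
}
```

Each `PlanEntry` would then map to one `swamp model method run` invocation, with the runner comparing the actual outcome against `expect`.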
Key design considerations
- The skill should be API-aware but not API-specific — it reads the model's method schemas and resource registry to generate tests, rather than hard-coding per-service knowledge
- It should never touch pre-existing resources — all created resources use unique names (e.g. `smoke-test-{resource}-{timestamp}`)
- It should handle permission errors gracefully — a 401 on `slos` because the key lacks permission is not a test failure, it's an expected constraint
- Bundle cache clearing (`.swamp/bundles/`) should be part of the pre-test setup
- The skill could optionally integrate with git hooks to block pushes when smoke tests fail
Alternatives considered
- Manual smoke testing: Current approach — works but is tedious, error-prone, and depends on the agent remembering to do it
- Enhanced unit tests: Better mocks could catch some issues, but can't catch Content-Type mismatches, bundle caching, or API validation quirks that only surface with real HTTP calls
- CI-based integration tests: Would require live API credentials in CI, which adds secret management complexity
Additional context
This was motivated by developing the @bixu/honeycomb extension, where multiple bugs survived unit tests and were only caught during manual smoke testing sessions. The pattern of "agent thinks it's done → smoke test reveals bugs → fix → re-test" repeated across several sessions. A skill that codifies this testing protocol would catch these issues earlier and more consistently.