TDD Policy - Atlantis Core Docs

AtlantisV2 treats tests as part of the architecture. The tests are not just regression checks. They are how the core boundary is kept honest while legacy Atlantis behavior is extracted.

Required Loop

For core behavior, migrations, and bug fixes:

failing test or parity fixture
  -> smallest implementation
  -> refactor
  -> verify

Documentation-only edits, mechanical formatting, generated files, and pure renames are exempt. Adapter wiring can be pragmatic, but adapter boundaries still need tests.

What Must Be Tested

Every core slice should test at least:

deterministic output
reference-time cutoff
missing, stale, invalid, and withheld handling
reason codes for rejects and defers
lineage and schema version propagation
old/new parity when replacing legacy behavior
no venue, broker, database, or UI leakage into core objects

Current Coverage Shape

The current suite covers:

Test group	Protects
`test_time_model.py`	reference time, clocks, cycle/session/freshness scaffolds
`test_market_snapshot.py`	typed capability blocks and strict capability lookup
`test_snapshot_builder.py`	event dedupe, ordering, invalid payloads, freshness, future cutoff
`test_feature_kernel_*`	feature registry execution, lookback windows, future safety
`test_pattern_engine.py`	availability, staged matching, definition hashes, evidence
`test_scoring_engine.py`	confluence, conflict, quality penalties, mixed-underlying guard
`test_signal_intent.py`	thesis identity, preferred exposure, invalidation, target, evidence
`test_permission_decision.py`	status precedence, kill switch, duplicate suppression, reason codes
`test_lifecycle_*`	admission, opening, marks, closes, outcomes, idempotency
`test_adapter_boundary.py`	raw adapter payloads stop at `EventEnvelope`
`test_legacy_outcome_parity.py`	legacy path-vs-economic outcome semantics

Agent Rules

Codex and Claude Code should follow the same policy:

Read the relevant core docs and current implementation.
Identify the legacy behavior if the slice migrates old Atlantis logic.
Write the failing test or parity fixture first.
Implement the smallest contract or behavior needed.
Keep adapters and persistence out of core modules.
Run python -m pytest, ruff check ., and ruff format --check ..
Update tasks/todo.md.
Add durable lessons to tasks/lessons.md after corrections.

Stop Conditions

Stop and re-plan if:

core code starts importing Delta, Zerodha, OpenAlgo, FastAPI, TimescaleDB, or React
a feature reads hidden latest data instead of reference-time-bounded input
a missing value is collapsed into 0.0
a broker order leaks into signal intent, permission, lifecycle, or outcome records
replay and live would require different object shapes

Those are architectural failures, not implementation details.