SDK Dogfooding (Self-Observability)

Use z0 primitives to observe z0 itself.

Prerequisites: core-concepts.md, SystemLedger

Updated for v0.10.0: All SDK components now use the Config pattern and emit Facts for state changes.


The z0 SDK uses its own primitives to track internal behavior. This is called “dogfooding” - the SDK observes itself using the same patterns developers use for their domains.

v0.10.0 Enhancements:

  • All components use Config - ProjectionEngine, MeterEngine, CircuitBreaker, RateLimiter, ThresholdMonitor, RetryConfig, WebhookConfig
  • Fact emission everywhere - State changes emit structured Facts with FACT_TYPES constants
  • Zod schema exports - Runtime validation for all config types
  • Config versioning - Track which config version produced which behavior

When an observabilityStub is provided, SDK components automatically emit Facts about their own behavior:

  • CachedStateManager: Cache hit/miss rates
  • ProjectionEngine: Migration events (storageVersion changes)
  • MeterEngine: Usage tracking, budget checks
  • CircuitBreaker: State transitions (CLOSED→OPEN→HALF_OPEN)
  • RateLimiter: Request denial and recovery
  • ThresholdMonitor: Threshold crossings and recovery
  • SchemaManager: Schema initialization events (future)
  • HydrationManager: Replay completion events (future)

This provides platform-wide observability without special-case code. Observability is enabled automatically when you wire the stub - no mode toggle required.


                    SystemLedger (id: 'system')
                    ├── Receives: SDK observability facts
                    └── Uses: appendFact(), same as any ledger
      ┌─────────────┼─────────────┐
      ▼             ▼             ▼
 EntityLedger   EntityLedger  EntityLedger
(your domain) (your domain) (your domain)
      └── emitSdkObservability() ──→ SystemLedger

Each EntityLedger can emit observability facts upstream to SystemLedger. This uses the same ParentDOClient pattern as config inheritance - fire-and-forget with circuit breaker protection.

For multi-tenant deployments using Cloudflare Workers for Platforms, the architecture extends to support isolated customer namespaces:

┌─────────────────────────────────────────────────────────────────┐
│  Platform Provider Account                                      │
│                                                                 │
│                    SystemLedger (platform-wide observability)   │
│                    │                                            │
│                    ├── Aggregates SDK facts from all tenants    │
│                    └── Platform-level dashboards & alerts       │
│                    │                                            │
├────────────────────┼────────────────────────────────────────────┤
│                    │ Workers for Platforms (Dispatch Namespace) │
│                    ▼                                            │
│   ┌────────────────────────────────────────────────────────┐    │
│   │  Customer A Namespace                                  │    │
│   │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐     │    │
│   │  │EntityLedger │  │EntityLedger │  │EntityLedger │     │    │
│   │  │  (account)  │  │  (project)  │  │  (invoice)  │     │    │
│   │  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘     │    │
│   │         └────────────────┴────────────────┘            │    │
│   │                          │                             │    │
│   │                          ▼                             │    │
│   │                 TenantSystemLedger                     │    │
│   │          (customer's observability facts)              │    │
│   └────────────────────────────────────────────────────────┘    │
│                                                                 │
│   ┌────────────────────────────────────────────────────────┐    │
│   │  Customer B Namespace                                  │    │
│   │  ... (same structure)                                  │    │
│   └────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘

Key patterns:

  • Each customer gets their own Durable Object namespace (isolation)
  • TenantSystemLedger aggregates observability within a customer
  • Platform SystemLedger aggregates across all customers
  • Facts flow up the hierarchy: Entity → Tenant → Platform

TenantSystemLedger extends SystemLedger with tenant-specific aggregation:

import { TenantSystemLedger, TenantSystemLedgerEnv } from '@z0-app/sdk';

export class MyTenantSystemLedger extends TenantSystemLedger {
  constructor(ctx: DurableObjectState, env: TenantSystemLedgerEnv) {
    super(ctx, env, {
      aggregationIntervalMs: 60_000 // Flush every minute (default)
    });
  }
}

Features:

  • Receives SDK observability facts from tenant’s EntityLedgers
  • Aggregates stats over time (reduces write amplification)
  • Flushes to platform via PLATFORM_METRICS_QUEUE or PLATFORM_SYSTEM_LEDGER
  • Uses DO alarms for periodic flushing
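
The aggregation step is easier to picture with a small in-memory accumulator. The following is a minimal sketch, not the SDK's implementation: `TenantStatsAccumulator`, `CacheStatsInput`, `record`, and `flush` are hypothetical names, and timestamps are passed in explicitly so the example stays deterministic. Only the payload fields follow the `TenantStatsPayload` shape listed in this page's type reference.

```typescript
// Hypothetical sketch of tenant-level aggregation; none of these names are SDK exports.
interface CacheStatsInput {
  entity_id: string;
  hits: number;
  misses: number;
}

interface TenantStatsPayload {
  tenant_id: string;
  period_start: number;
  period_end: number;
  cache_hits: number;
  cache_misses: number;
  facts_count: number;
  entities_count: number;
}

class TenantStatsAccumulator {
  private tenantId: string;
  private periodStart: number;
  private cacheHits = 0;
  private cacheMisses = 0;
  private factsCount = 0;
  private entities = new Set<string>();

  constructor(tenantId: string, periodStart: number) {
    this.tenantId = tenantId;
    this.periodStart = periodStart;
  }

  // Fold one incoming fact into in-memory totals; nothing is written yet.
  record(fact: CacheStatsInput): void {
    this.cacheHits += fact.hits;
    this.cacheMisses += fact.misses;
    this.factsCount += 1;
    this.entities.add(fact.entity_id);
  }

  // Emit a single payload for the whole period, then start a new period.
  flush(now: number): TenantStatsPayload {
    const payload: TenantStatsPayload = {
      tenant_id: this.tenantId,
      period_start: this.periodStart,
      period_end: now,
      cache_hits: this.cacheHits,
      cache_misses: this.cacheMisses,
      facts_count: this.factsCount,
      entities_count: this.entities.size,
    };
    this.periodStart = now;
    this.cacheHits = 0;
    this.cacheMisses = 0;
    this.factsCount = 0;
    this.entities.clear();
    return payload;
  }
}

const accumulator = new TenantStatsAccumulator('tnt_acme', 1_700_000_000_000);
accumulator.record({ entity_id: 'account_123', hits: 42, misses: 8 });
accumulator.record({ entity_id: 'account_456', hits: 10, misses: 5 });
accumulator.record({ entity_id: 'account_123', hits: 3, misses: 1 });
const tenantPayload = accumulator.flush(1_700_000_060_000);
console.log(tenantPayload);
```

Three incoming facts become one outgoing payload per interval, which is the reduction in writes that periodic flushing is meant to achieve.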

Entities can contain child entities, forming a recursive tree structure. Observability facts propagate up this hierarchy:

Platform (entity)
├── Org (entity)
│ ├── Tenant (entity)
│ │ ├── Project (entity)
│ │ │ ├── Asset (entity)
│ │ │ └── Asset (entity)
│ │ └── Project (entity)
│ └── Tenant (entity)
└── Org (entity)

Each entity can:

  1. Emit facts about its own state changes
  2. Receive facts from child entities (rollup aggregation)
  3. Forward facts to parent entities (bubbling)

// Example: Project entity receives Asset facts and aggregates
class ProjectLedger extends EntityLedger<Env> {
  protected async onChildFact(childId: string, fact: Fact): Promise<void> {
    if (fact.type === 'asset' && fact.subtype === 'processed') {
      // Aggregate asset processing into project metrics
      await this.appendFact({
        type: 'project',
        subtype: 'asset_rollup',
        data: {
          asset_id: childId,
          processing_time_ms: fact.data.duration_ms
        }
      });
    }
  }
}

Hierarchy benefits:

  • Natural isolation boundaries (tenant can’t see other tenant’s facts)
  • Automatic aggregation (platform sees total, tenant sees their total)
  • Consistent patterns at every level (same SDK, same primitives)
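
How totals differ by level can be shown with a plain in-memory tree. This is a sketch under stated assumptions rather than SDK code: `EntityNode`, `addNode`, and `bubble` are hypothetical helpers, and a simple counter stands in for real rollup facts.

```typescript
// Hypothetical in-memory model of the entity tree; not an SDK API.
interface EntityNode {
  id: string;
  parentId: string | null;
  processedAssets: number; // rollup counter kept at every level
}

const nodes = new Map<string, EntityNode>();

function addNode(id: string, parentId: string | null): void {
  nodes.set(id, { id, parentId, processedAssets: 0 });
}

// Walk from the emitting entity up to the root, incrementing each ancestor.
function bubble(fromId: string): string[] {
  const visited: string[] = [];
  let current = nodes.get(fromId);
  while (current) {
    current.processedAssets += 1;
    visited.push(current.id);
    current = current.parentId ? nodes.get(current.parentId) : undefined;
  }
  return visited;
}

addNode('platform', null);
addNode('org_1', 'platform');
addNode('tenant_a', 'org_1');
addNode('tenant_b', 'org_1');
addNode('project_1', 'tenant_a');
addNode('project_2', 'tenant_b');

const visitedPath = bubble('project_1'); // project_1 -> tenant_a -> org_1 -> platform
bubble('project_1');
bubble('project_2');
console.log(visitedPath, nodes.get('tenant_a'), nodes.get('platform'));
```

After these three events, `tenant_a` holds only its own two, `tenant_b` holds one, and `platform` holds all three, which mirrors the isolation and aggregation properties listed above.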

Observability is automatically enabled when you provide an observabilityStub. No mode toggle required.

Pass the stub when creating EntityLedger instances:

import { EntityLedger, BootstrapConfig } from '@z0-app/sdk';

// Optional: Configure fallback behavior
const bootstrapConfig: BootstrapConfig = {
  fallback_on_failure: true, // Continue if observability target unavailable (default: true)
  warn_on_fallback: true // Log warnings on fallback (default: true)
};

// In your DO class
export class AccountLedger extends EntityLedger<Env> {
  constructor(ctx: DurableObjectState, env: Env) {
    super(ctx, env, {
      // Wire the observability target - this enables SDK observability automatically
      observabilityStub: env.SYSTEM_LEDGER.get(env.SYSTEM_LEDGER.idFromName('system')),
      bootstrapConfig // Optional
    });
  }
}

# wrangler.toml
[[durable_objects.bindings]]
name = "SYSTEM_LEDGER"
class_name = "SystemLedger"

[[durable_objects.bindings]]
name = "ACCOUNT_LEDGER"
class_name = "AccountLedger"

// worker.ts
export { SystemLedger, AccountLedger } from './ledgers';

export default {
  async fetch(request: Request, env: Env) {
    // Your routing logic...
  }
};

Simply omit the observabilityStub option. The SDK operates normally without emitting observability facts.


All SDK components support the z0 Config pattern for versioned, auditable configuration.

Traditional configuration has problems:

  • No audit trail of changes
  • No rollback capability
  • Can’t track which config version produced which behavior
  • No validation at runtime

The Config pattern solves these:

interface Config<T> {
  id: string; // Unique config identifier
  type: string; // Config category
  scope: 'platform' | 'tenant' | 'entity';
  version: number; // Auto-incrementing version
  settings: T; // Actual configuration
  effective_at: number; // When this version became active
  superseded_at?: number; // When replaced by newer version
  tenant_id?: string; // For tenant-scoped configs
}

| Component        | Settings Interface           | Zod Schema Export                  |
|------------------|------------------------------|------------------------------------|
| ProjectionEngine | ProjectionConfigSettings     | ProjectionConfigSettingsSchema     |
| MeterEngine      | MeterConfigSettings          | MeterConfigSettingsSchema          |
| CircuitBreaker   | CircuitBreakerConfigSettings | CircuitBreakerConfigSettingsSchema |
| RateLimiter      | RateLimitConfig              | RateLimitConfigSchema              |
| ThresholdMonitor | N/A*                         | N/A*                               |
| RetryConfig      | RetryConfig                  | N/A*                               |
| WebhookConfig    | WebhookConfigSettings        | WebhookConfigSettingsSchema        |

*ThresholdMonitor and RetryConfig accept a config_version parameter but don’t require the full Config wrapper.

For backward compatibility, components accept either a Config<T> wrapper or a raw config object:

import { CircuitBreaker, CircuitBreakerConfigSettings } from '@z0-app/sdk';
import type { Config } from '@z0-app/sdk';

// Option 1: Raw config (backward compatible)
const cb1 = new CircuitBreaker({
  failureThreshold: 5,
  resetTimeoutMs: 30000,
  halfOpenSuccesses: 2,
});

// Option 2: Config<T> wrapper (recommended for production)
const config: Config<CircuitBreakerConfigSettings> = {
  id: 'cb_api_gateway',
  type: 'circuit_breaker',
  scope: 'platform',
  version: 3,
  tenant_id: 'system',
  settings: {
    failureThreshold: 5,
    resetTimeoutMs: 30000,
    halfOpenSuccesses: 2,
  },
  effective_at: Date.now(),
};
const cb2 = new CircuitBreaker(config, factManager);

// Config version tracked in state and facts
const state = cb2.getState();
console.log(state.config_version); // 3
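
One way a component could accept both input shapes is a type guard that normalizes them to a single internal form. The sketch below is an assumption about how such dual input might be handled, not the SDK's actual constructor logic: `VersionedConfig`, `isVersionedConfig`, and `resolveBreakerInput` are hypothetical names, and leaving `config_version` undefined for raw configs is a guess.

```typescript
// Local stand-ins for the SDK types so the sketch is self-contained.
interface BreakerSettings {
  failureThreshold: number;
  resetTimeoutMs: number;
  halfOpenSuccesses: number;
}

interface VersionedConfig<T> {
  id: string;
  type: string;
  scope: 'platform' | 'tenant' | 'entity';
  version: number;
  settings: T;
  effective_at: number;
}

type BreakerInput = BreakerSettings | VersionedConfig<BreakerSettings>;

// A wrapper is recognised by the presence of both `settings` and `version`.
function isVersionedConfig(input: BreakerInput): input is VersionedConfig<BreakerSettings> {
  return 'settings' in input && 'version' in input;
}

// Normalize either shape to plain settings plus an optional version.
// Assumption: raw configs carry no version, so config_version stays undefined.
function resolveBreakerInput(
  input: BreakerInput,
): { settings: BreakerSettings; config_version?: number } {
  if (isVersionedConfig(input)) {
    return { settings: input.settings, config_version: input.version };
  }
  return { settings: input };
}

const rawResolved = resolveBreakerInput({
  failureThreshold: 5,
  resetTimeoutMs: 30000,
  halfOpenSuccesses: 2,
});

const wrappedResolved = resolveBreakerInput({
  id: 'cb_api_gateway',
  type: 'circuit_breaker',
  scope: 'platform',
  version: 3,
  settings: { failureThreshold: 5, resetTimeoutMs: 30000, halfOpenSuccesses: 2 },
  effective_at: 1_700_000_000_000,
});

console.log(rawResolved.config_version, wrappedResolved.config_version);
```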

All config types have exported Zod schemas for runtime validation:

import {
  ProjectionEngine,
  ProjectionConfigSettingsSchema,
  MeterConfigSettingsSchema,
  CircuitBreakerConfigSettingsSchema,
  RateLimitConfigSchema,
  WebhookConfigSettingsSchema,
} from '@z0-app/sdk';

// Validate config before using
const result = ProjectionConfigSettingsSchema.safeParse(userInput);
if (!result.success) {
  console.error('Invalid projection config:', result.error.issues);
  return;
}
const engine = new ProjectionEngine(result.data);

import { ProjectionEngine, ProjectionConfigSettingsSchema } from '@z0-app/sdk';
import type { Config, ProjectionConfigSettings } from '@z0-app/sdk';

// Define versioned config
const projectionConfig: Config<ProjectionConfigSettings> = {
  id: 'proj_daily_api_usage',
  type: 'projection',
  scope: 'platform',
  version: 2, // Incremented when settings change
  tenant_id: 'system',
  settings: {
    source: 'api_request',
    factTypes: ['completed'],
    timeWindow: 'day',
    groupBy: ['data.endpoint'],
    aggregations: [
      { field: 'count', function: 'count' },
      { field: 'data.duration_ms', function: 'avg' },
    ],
    materialize: true,
    storageVersion: 2, // Sparse storage
  },
  effective_at: Date.now(),
};

// Validate before use
const validation = ProjectionConfigSettingsSchema.safeParse(projectionConfig.settings);
if (!validation.success) {
  throw new Error('Invalid projection config');
}

// Create engine with Config
const engine = new ProjectionEngine(projectionConfig, {
  factManager,
  entityId: 'proj_daily_api_usage',
  tenantId: 'system',
});

// Process facts - config_version included in metadata
const results = engine.process(facts, options);
console.log(results._meta.config_version); // 2

// Migration facts include config_version
// {
//   type: 'projection.migration_started',
//   data: {
//     projection_id: 'proj_daily_api_usage',
//     from_version: 1,
//     to_version: 2,
//     config_version: 2
//   }
// }

import { MeterEngine, MeterConfigSettingsSchema } from '@z0-app/sdk';
import type { Config, MeterConfigSettings } from '@z0-app/sdk';

const meterConfig: Config<MeterConfigSettings> = {
  id: 'meter_api_calls',
  type: 'meter',
  scope: 'tenant',
  version: 1,
  tenant_id: 'tnt_acme',
  settings: {
    name: 'api_calls',
    windows: ['hour', 'day', 'month'],
    budget: {
      hour: 1000,
      day: 10000,
      month: 250000,
    },
  },
  effective_at: Date.now(),
};
const meter = new MeterEngine(meterConfig, sqlStorage, factManager);

// Usage facts include config_version
await meter.incrementUsage('entity_123', 5);
// Emits: {
//   type: 'meter.usage',
//   data: {
//     meter_id: 'meter_api_calls',
//     count: 5,
//     config_version: 1
//   }
// }

Config versioning provides the following benefits:

  1. Audit trail - Know exactly which config produced which results
  2. Debugging - “Why did this behave differently last week?” → check config_version
  3. A/B testing - Run two config versions side-by-side
  4. Rollback - Revert to previous version by superseding current
  5. Compliance - Regulatory requirement to track configuration changes
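
The audit, debugging, and rollback points all rest on keeping superseded versions around. The sketch below illustrates that idea with a trimmed-down config shape; `VersionedConfig`, `supersede`, and `activeAt` are hypothetical helpers written for this example, and the SDK may store and resolve versions differently.

```typescript
// Trimmed, hypothetical version of the Config<T> shape for illustration.
interface VersionedConfig<T> {
  id: string;
  version: number;
  settings: T;
  effective_at: number;
  superseded_at?: number;
}

// Close out the current version and append a new one; the input array is not mutated.
function supersede<T>(
  versions: VersionedConfig<T>[],
  settings: T,
  now: number,
): VersionedConfig<T>[] {
  const current = versions[versions.length - 1];
  if (!current) throw new Error('version list must contain at least one entry');
  const closed: VersionedConfig<T> = { ...current, superseded_at: now };
  const next: VersionedConfig<T> = {
    id: current.id,
    version: current.version + 1,
    settings,
    effective_at: now,
  };
  return [...versions.slice(0, -1), closed, next];
}

// Answer "which config was live at time `at`?"
function activeAt<T>(versions: VersionedConfig<T>[], at: number): VersionedConfig<T> | undefined {
  return versions.find(
    (v) => v.effective_at <= at && (v.superseded_at === undefined || at < v.superseded_at),
  );
}

type BreakerThreshold = { failureThreshold: number };

const versionsV1: VersionedConfig<BreakerThreshold>[] = [
  { id: 'cb_api_gateway', version: 1, settings: { failureThreshold: 5 }, effective_at: 1000 },
];
const versionsV2 = supersede(versionsV1, { failureThreshold: 10 }, 2000);
// Rollback: supersede the current version with the earlier settings.
const versionsV3 = supersede(versionsV2, { failureThreshold: 5 }, 3000);

console.log(activeAt(versionsV3, 2500)?.version, activeAt(versionsV3, 3500)?.settings);
```

Note that in this sketch a rollback produces version 3 with version 1's settings rather than reactivating version 1, so the version number on each fact still identifies a unique point in the history.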

Facts emitted by SDK components use typed constants exported as *_FACT_TYPES objects. This provides:

  • Type safety - String literals checked at compile time
  • Discoverability - IDE autocomplete for all fact types
  • Consistency - Single source of truth for fact patterns

import {
  PROJECTION_FACT_TYPES,
  METER_FACT_TYPES,
  CB_FACT_TYPES,
  RL_FACT_TYPES,
  THRESHOLD_MONITOR_FACT_TYPES,
} from '@z0-app/sdk';

// ProjectionEngine
PROJECTION_FACT_TYPES.MIGRATION_STARTED // 'projection.migration_started'
PROJECTION_FACT_TYPES.MIGRATION_COMPLETED // 'projection.migration_completed'
PROJECTION_FACT_TYPES.MIGRATION_FAILED // 'projection.migration_failed'

// MeterEngine
METER_FACT_TYPES.USAGE // 'meter.usage'
METER_FACT_TYPES.BUDGET_CHECK // 'meter.budget_check'

// CircuitBreaker
CB_FACT_TYPES.STATE_CHANGED // 'circuit.state_changed'
CB_FACT_TYPES.CONFIG_UPDATED // 'circuit.config_updated'

// RateLimiter
RL_FACT_TYPES.TRIGGERED // 'rate_limit.triggered'
RL_FACT_TYPES.RECOVERED // 'rate_limit.recovered'

// ThresholdMonitor
THRESHOLD_MONITOR_FACT_TYPES.CROSSED // 'threshold.crossed'
THRESHOLD_MONITOR_FACT_TYPES.RECOVERED // 'threshold.recovered'

All SDK facts follow the {component}.{action} pattern:

  • type: Component name (e.g., ‘projection’, ‘meter’, ‘circuit’)
  • subtype: Action or event (e.g., ‘migration_started’, ‘state_changed’)

This matches the convention used in domain-specific facts and enables unified querying.
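
To make the relationship between a constant such as 'circuit.state_changed' and the type/subtype pair concrete, here is a small sketch. The constant object is redefined locally with the documented string values so the example is self-contained, and `splitFactType` is a hypothetical helper, not an SDK function.

```typescript
// Local copy of two documented values; `as const` keeps them as string literal types.
const CB_FACT_TYPES = {
  STATE_CHANGED: 'circuit.state_changed',
  CONFIG_UPDATED: 'circuit.config_updated',
} as const;

// Union of the literal values: 'circuit.state_changed' | 'circuit.config_updated'
type CbFactType = (typeof CB_FACT_TYPES)[keyof typeof CB_FACT_TYPES];

// Split a {component}.{action} name at the first dot.
function splitFactType(factType: string): { type: string; subtype: string } {
  const dot = factType.indexOf('.');
  if (dot <= 0 || dot === factType.length - 1) {
    throw new Error(`Not a {component}.{action} name: ${factType}`);
  }
  return { type: factType.slice(0, dot), subtype: factType.slice(dot + 1) };
}

// A typo such as 'circuit.state_chnaged' would not type-check against CbFactType.
const stateChanged: CbFactType = CB_FACT_TYPES.STATE_CHANGED;
const stateChangedParts = splitFactType(stateChanged);
const rateLimitParts = splitFactType('rate_limit.triggered');

console.log(stateChangedParts, rateLimitParts);
```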

Emitted by CachedStateManager on interval (default: 60s):

{
  type: 'sdk',
  subtype: 'cache_stats',
  tenant_id: 'entity_tenant_id',
  data: {
    entity_id: 'account_123',
    entity_type: 'account',
    period_start: 1700000000000,
    period_end: 1700000060000,
    hits: 42,
    misses: 8
  }
}
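
A hit rate can be derived from the `hits` and `misses` counters in this fact. The helper below is a hypothetical sketch (`hitRate` is not an SDK function) that also guards against an empty period.

```typescript
interface CacheStatsData {
  period_start: number;
  period_end: number;
  hits: number;
  misses: number;
}

// Fraction of lookups served from cache; 0 when the period saw no lookups.
function hitRate(stats: CacheStatsData): number {
  const total = stats.hits + stats.misses;
  return total === 0 ? 0 : stats.hits / total;
}

const sampleRate = hitRate({
  period_start: 1700000000000,
  period_end: 1700000060000,
  hits: 42,
  misses: 8,
});
console.log(sampleRate); // 42 / (42 + 8) = 0.84
```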

Emitted when ProjectionEngine storageVersion changes:

{
  type: 'projection',
  subtype: 'migration_started',
  entity_id: 'proj_daily_usage',
  tenant_id: 'system',
  data: {
    projection_id: 'proj_daily_usage',
    from_version: 1,
    to_version: 2,
    config_version: 3,
    bucket_count: 365
  }
}

Emitted after successful migration:

{
  type: 'projection',
  subtype: 'migration_completed',
  entity_id: 'proj_daily_usage',
  tenant_id: 'system',
  data: {
    projection_id: 'proj_daily_usage',
    from_version: 1,
    to_version: 2,
    config_version: 3,
    duration_ms: 450,
    buckets_migrated: 365
  }
}

Emitted if migration fails (original data intact):

{
  type: 'projection',
  subtype: 'migration_failed',
  entity_id: 'proj_daily_usage',
  tenant_id: 'system',
  data: {
    projection_id: 'proj_daily_usage',
    from_version: 1,
    to_version: 2,
    error_message: 'Storage quota exceeded',
    config_version: 3
  }
}

Emitted by MeterEngine on usage increment:

{
  type: 'meter',
  subtype: 'usage',
  entity_id: 'entity_123',
  tenant_id: 'tnt_acme',
  data: {
    meter_id: 'meter_api_calls',
    count: 5,
    windows: ['hour', 'day', 'month'],
    totals: {
      hour: 42,
      day: 523,
      month: 12450
    },
    config_version: 1
  }
}

Emitted when budget check occurs (warning or denial):

{
  type: 'meter',
  subtype: 'budget_check',
  entity_id: 'entity_123',
  tenant_id: 'tnt_acme',
  data: {
    meter_id: 'meter_api_calls',
    window: 'day',
    current: 9500,
    budget: 10000,
    allowed: true, // or false if denied
    utilization: 0.95,
    config_version: 1
  }
}
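
The `allowed` and `utilization` fields relate to `current` and `budget` in a straightforward way. The sketch below shows one plausible derivation; `checkBudget` is a hypothetical function, and treating usage exactly at the budget as still allowed is an assumption, since the boundary behavior is not stated here.

```typescript
interface BudgetCheckData {
  window: string;
  current: number;
  budget: number;
  allowed: boolean;
  utilization: number;
}

// Assumption: usage exactly at the budget is still allowed (current <= budget).
function checkBudget(windowName: string, current: number, budget: number): BudgetCheckData {
  if (budget <= 0) throw new RangeError('budget must be positive');
  return {
    window: windowName,
    current,
    budget,
    allowed: current <= budget,
    utilization: current / budget,
  };
}

const dayCheck = checkBudget('day', 9500, 10000); // within budget
const hourCheck = checkBudget('hour', 1001, 1000); // over budget
console.log(dayCheck, hourCheck);
```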

Emitted on circuit breaker state transitions:

// CLOSED → OPEN
{
  type: 'circuit',
  subtype: 'state_changed',
  tenant_id: 'system',
  data: {
    circuit_id: 'cb_api_gateway',
    from_state: 'closed',
    to_state: 'open',
    failures_at_transition: 5,
    last_failure_at: 1700000123000,
    config_version: 2
  }
}

// OPEN → HALF_OPEN
{
  type: 'circuit',
  subtype: 'state_changed',
  tenant_id: 'system',
  data: {
    circuit_id: 'cb_api_gateway',
    from_state: 'open',
    to_state: 'half_open',
    opened_at: 1700000123000,
    elapsed_ms: 30000,
    config_version: 2
  }
}

// HALF_OPEN → CLOSED (recovery)
{
  type: 'circuit',
  subtype: 'state_changed',
  tenant_id: 'system',
  data: {
    circuit_id: 'cb_api_gateway',
    from_state: 'half_open',
    to_state: 'closed',
    successes_at_recovery: 2,
    config_version: 2
  }
}

// HALF_OPEN → OPEN (probe failure)
{
  type: 'circuit',
  subtype: 'state_changed',
  tenant_id: 'system',
  data: {
    circuit_id: 'cb_api_gateway',
    from_state: 'half_open',
    to_state: 'open',
    probe_failure: true,
    config_version: 2
  }
}
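
The four transitions above can be traced with a toy state machine that records one fact per state change. This is an illustrative sketch only: `SketchBreaker` and its methods are hypothetical and simplified (the recorded facts carry just `from_state` and `to_state`), and the SDK's CircuitBreaker may differ in API and detail.

```typescript
type BreakerState = 'closed' | 'open' | 'half_open';

interface TransitionFact {
  type: 'circuit';
  subtype: 'state_changed';
  data: { from_state: BreakerState; to_state: BreakerState };
}

class SketchBreaker {
  readonly facts: TransitionFact[] = [];
  private state: BreakerState = 'closed';
  private failures = 0;
  private successes = 0;
  private openedAt = 0;
  private failureThreshold: number;
  private resetTimeoutMs: number;
  private halfOpenSuccesses: number;

  constructor(failureThreshold: number, resetTimeoutMs: number, halfOpenSuccesses: number) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.halfOpenSuccesses = halfOpenSuccesses;
  }

  getState(): BreakerState {
    return this.state;
  }

  // Every state change records exactly one fact and resets the counters.
  private transition(to: BreakerState): void {
    this.facts.push({
      type: 'circuit',
      subtype: 'state_changed',
      data: { from_state: this.state, to_state: to },
    });
    this.state = to;
    this.failures = 0;
    this.successes = 0;
  }

  // Call before each request; OPEN becomes HALF_OPEN once the timeout has elapsed.
  allow(now: number): boolean {
    if (this.state === 'open' && now - this.openedAt >= this.resetTimeoutMs) {
      this.transition('half_open');
    }
    return this.state !== 'open';
  }

  recordFailure(now: number): void {
    if (this.state === 'half_open') {
      this.openedAt = now; // probe failed: reopen immediately
      this.transition('open');
      return;
    }
    if (this.state === 'closed' && ++this.failures >= this.failureThreshold) {
      this.openedAt = now;
      this.transition('open');
    }
  }

  recordSuccess(): void {
    if (this.state === 'half_open' && ++this.successes >= this.halfOpenSuccesses) {
      this.transition('closed');
    } else if (this.state === 'closed') {
      this.failures = 0;
    }
  }
}

const breaker = new SketchBreaker(2, 30_000, 2);
breaker.recordFailure(0);
breaker.recordFailure(10); // CLOSED -> OPEN
const blocked = breaker.allow(20); // still open
breaker.allow(30_010); // OPEN -> HALF_OPEN
breaker.recordFailure(30_020); // HALF_OPEN -> OPEN (probe failure)
breaker.allow(60_020); // OPEN -> HALF_OPEN
breaker.recordSuccess();
breaker.recordSuccess(); // HALF_OPEN -> CLOSED
const transitions = breaker.facts.map((f) => `${f.data.from_state}>${f.data.to_state}`);
console.log(blocked, transitions);
```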

Emitted when circuit breaker config changes dynamically:

{
  type: 'circuit',
  subtype: 'config_updated',
  tenant_id: 'system',
  data: {
    circuit_id: 'cb_api_gateway',
    old_config_version: 2,
    new_config_version: 3,
    changes: {
      failureThreshold: { from: 5, to: 10 },
      resetTimeoutMs: { from: 30000, to: 60000 }
    }
  }
}
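
The `changes` map lists only the settings whose values differ between versions. A diff of that shape could be computed as in the sketch below; `diffSettings` is a hypothetical helper that assumes flat, numeric settings.

```typescript
type NumericSettings = Record<string, number>;

interface SettingChange {
  from: number | undefined;
  to: number | undefined;
}

// Compare two flat settings objects and keep only the keys whose values differ.
function diffSettings(
  previous: NumericSettings,
  next: NumericSettings,
): Record<string, SettingChange> {
  const changes: Record<string, SettingChange> = {};
  for (const key of Object.keys({ ...previous, ...next })) {
    if (previous[key] !== next[key]) {
      changes[key] = { from: previous[key], to: next[key] };
    }
  }
  return changes;
}

const settingChanges = diffSettings(
  { failureThreshold: 5, resetTimeoutMs: 30000, halfOpenSuccesses: 2 },
  { failureThreshold: 10, resetTimeoutMs: 60000, halfOpenSuccesses: 2 },
);
console.log(settingChanges);
```

Unchanged keys such as `halfOpenSuccesses` are omitted, which keeps the fact small when only one or two settings move.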

Emitted when rate limit denies request:

{
  type: 'rate_limit',
  subtype: 'triggered',
  entity_id: 'entity_123',
  tenant_id: 'tnt_acme',
  data: {
    key: 'api:/v1/users:entity_123',
    requests: 101,
    max_requests: 100,
    window_ms: 60000,
    config_version: 1
  }
}

Emitted when entity recovers from rate limiting:

{
  type: 'rate_limit',
  subtype: 'recovered',
  entity_id: 'entity_123',
  tenant_id: 'tnt_acme',
  data: {
    key: 'api:/v1/users:entity_123',
    previous_utilization: 1.01, // Was over limit
    current_utilization: 0.85, // Now under
    config_version: 1
  }
}
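
The pairing of one `triggered` fact with one later `recovered` fact can be sketched with a simplified limiter. The example below uses a fixed window that is reset manually, which is an assumption made for brevity; the SDK's RateLimiter may use a different windowing strategy, and `SketchRateLimiter`, `hit`, and `resetWindow` are hypothetical names.

```typescript
interface RateLimitFact {
  type: 'rate_limit';
  subtype: 'triggered' | 'recovered';
  data: Record<string, string | number>;
}

class SketchRateLimiter {
  readonly facts: RateLimitFact[] = [];
  private maxRequests: number;
  private counts: Record<string, number> = {};
  // key -> utilization at the most recent denial, present only while limited
  private overLimit: Record<string, number> = {};

  constructor(maxRequests: number) {
    this.maxRequests = maxRequests;
  }

  // Count one request against the current window; returns whether it is allowed.
  hit(key: string): boolean {
    const requests = (this.counts[key] ?? 0) + 1;
    this.counts[key] = requests;
    const utilization = requests / this.maxRequests;
    const previous = this.overLimit[key];

    if (requests > this.maxRequests) {
      if (previous === undefined) {
        // Only the first denial in a limited streak emits a fact.
        this.facts.push({
          type: 'rate_limit',
          subtype: 'triggered',
          data: { key, requests, max_requests: this.maxRequests },
        });
      }
      this.overLimit[key] = utilization;
      return false;
    }

    if (previous !== undefined) {
      this.facts.push({
        type: 'rate_limit',
        subtype: 'recovered',
        data: { key, previous_utilization: previous, current_utilization: utilization },
      });
      delete this.overLimit[key];
    }
    return true;
  }

  // Start a new fixed window by clearing the counters.
  resetWindow(): void {
    this.counts = {};
  }
}

const limiter = new SketchRateLimiter(2);
const limitKey = 'api:/v1/users:entity_123';
const outcomes = [limiter.hit(limitKey), limiter.hit(limitKey), limiter.hit(limitKey), limiter.hit(limitKey)];
limiter.resetWindow();
outcomes.push(limiter.hit(limitKey));
console.log(outcomes, limiter.facts.map((f) => f.subtype));
```

Emitting only on the first denial and on the first allowed request afterwards keeps the fact volume proportional to state changes rather than to request volume.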

Emitted when value drops below threshold:

{
  type: 'threshold',
  subtype: 'crossed',
  entity_id: 'account_123',
  tenant_id: 'tnt_acme',
  data: {
    monitor_id: 'balance_monitor', // optional
    threshold_value: 10000, // $100 in cents
    old_value: 12000,
    new_value: 8000,
    config_version: 1 // if configured via Config
  }
}

Emitted when value rises back above threshold:

{
  type: 'threshold',
  subtype: 'recovered',
  entity_id: 'account_123',
  tenant_id: 'tnt_acme',
  data: {
    monitor_id: 'balance_monitor',
    threshold_value: 10000,
    old_value: 8000,
    new_value: 15000,
    config_version: 1
  }
}
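
Both facts describe an edge, not a level: they fire only when a value moves from one side of the threshold to the other. The sketch below shows that edge detection for a low-water-mark monitor; `detectThresholdEvent` is a hypothetical function, and treating a value exactly equal to the threshold as "not below" is an assumption.

```typescript
type ThresholdEvent = 'crossed' | 'recovered' | null;

// Assumption: a value equal to the threshold counts as not below it.
function detectThresholdEvent(
  threshold: number,
  oldValue: number,
  newValue: number,
): ThresholdEvent {
  const wasBelow = oldValue < threshold;
  const isBelow = newValue < threshold;
  if (!wasBelow && isBelow) return 'crossed';
  if (wasBelow && !isBelow) return 'recovered';
  return null; // stayed on the same side: no fact
}

console.log(detectThresholdEvent(10000, 12000, 8000)); // balance dropped below $100
console.log(detectThresholdEvent(10000, 8000, 15000)); // balance rose back above $100
console.log(detectThresholdEvent(10000, 8000, 9000)); // still below: nothing to emit
```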

Emitted when SchemaManager initializes tables:

{
  type: 'sdk',
  subtype: 'schema_init',
  data: {
    entity_id: 'account_123',
    entity_type: 'account',
    tables_created: ['facts', 'entities', 'configs'],
    duration_ms: 15
  }
}

Emitted when HydrationManager completes replay:

{
  type: 'sdk',
  subtype: 'hydration_done',
  data: {
    entity_id: 'account_123',
    entity_type: 'account',
    facts_replayed: 1523,
    duration_ms: 450,
    source: 'r2'
  }
}

Use SystemLedger methods to query observability data:

const systemLedger = await getSystemLedger();
// Get all SDK facts for an entity
const facts = systemLedger.getSdkFacts('account_123');
// Get cache stats specifically
const cacheStats = systemLedger.getCacheStats('account_123');
// Returns: CacheStatsData[]
// Get platform-wide stats
const stats = systemLedger.getStats();
// Returns: { tenant_count, entity_count, sdk_facts_count }

When fallback_on_failure: true (default):

  • If the observability target is unavailable, SDK continues operating
  • Observability facts are dropped silently (or with warning if warn_on_fallback: true)
  • No impact on primary functionality

When fallback_on_failure: false:

  • Errors propagate if observability target fails
  • Use only when observability is critical
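
The two modes can be summarized in a few lines. The sketch below is an illustration of the described behavior, not SDK code: `emitWithFallback` is a hypothetical function, and `send` is a synchronous stand-in for the call to the observability target so the control flow is easy to follow.

```typescript
interface BootstrapConfig {
  fallback_on_failure: boolean;
  warn_on_fallback?: boolean;
}

// Returns true if the fact was delivered, false if it was dropped.
// Rethrows the delivery error when fallback is disabled.
function emitWithFallback(
  send: () => void,
  config: BootstrapConfig,
  warn: (message: string) => void,
): boolean {
  try {
    send();
    return true;
  } catch (err) {
    if (!config.fallback_on_failure) throw err;
    // warn_on_fallback defaults to true when omitted.
    if (config.warn_on_fallback ?? true) {
      warn(`observability fact dropped: ${String(err)}`);
    }
    return false;
  }
}

const warnings: string[] = [];
const collectWarning = (message: string): void => {
  warnings.push(message);
};
const failingSend = (): void => {
  throw new Error('SystemLedger unavailable');
};

const delivered = emitWithFallback(() => {}, { fallback_on_failure: true }, collectWarning);
const droppedWithWarning = emitWithFallback(failingSend, { fallback_on_failure: true }, collectWarning);
const droppedSilently = emitWithFallback(
  failingSend,
  { fallback_on_failure: true, warn_on_fallback: false },
  collectWarning,
);
console.log(delivered, droppedWithWarning, droppedSilently, warnings.length);
```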

To prevent write amplification, CachedStateManager aggregates stats:

| Config          | Default       | Description                  |
|-----------------|---------------|------------------------------|
| min_interval_ms | 60000 (1 min) | Minimum time between flushes |

Stats accumulate in memory and flush when:

  1. Interval expires (checked on each cache operation)
  2. flushStats() called explicitly
  3. Stats tracking disabled

// Custom interval
this.cachedStateManager.enableStatsTracking((stats) => {
  this.emitSdkObservability('cache_stats', stats);
}, { min_interval_ms: 30_000 }); // 30 seconds
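
The "checked on each cache operation" behavior means no timer is needed: the interval test piggybacks on normal traffic. The sketch below illustrates that pattern with an injectable clock; `SketchStatsTracker` and its methods are hypothetical and only mirror the described flush rules, not the CachedStateManager internals.

```typescript
interface CacheStatsData {
  period_start: number;
  period_end: number;
  hits: number;
  misses: number;
}

class SketchStatsTracker {
  private hits = 0;
  private misses = 0;
  private periodStart: number;
  private onFlush: (stats: CacheStatsData) => void;
  private minIntervalMs: number;
  private now: () => number;

  constructor(
    onFlush: (stats: CacheStatsData) => void,
    minIntervalMs: number,
    now: () => number,
  ) {
    this.onFlush = onFlush;
    this.minIntervalMs = minIntervalMs;
    this.now = now;
    this.periodStart = now();
  }

  recordHit(): void {
    this.hits += 1;
    this.maybeFlush();
  }

  recordMiss(): void {
    this.misses += 1;
    this.maybeFlush();
  }

  // Explicit flush: emit the current period and start a new one.
  flushStats(): void {
    const end = this.now();
    this.onFlush({
      period_start: this.periodStart,
      period_end: end,
      hits: this.hits,
      misses: this.misses,
    });
    this.periodStart = end;
    this.hits = 0;
    this.misses = 0;
  }

  // Interval check runs on every cache operation instead of on a timer.
  private maybeFlush(): void {
    if (this.now() - this.periodStart >= this.minIntervalMs) {
      this.flushStats();
    }
  }
}

let fakeNow = 0;
const flushed: CacheStatsData[] = [];
const tracker = new SketchStatsTracker((stats) => flushed.push(stats), 60_000, () => fakeNow);

tracker.recordHit();
fakeNow = 1_000;
tracker.recordMiss();
fakeNow = 59_000;
tracker.recordHit(); // interval not yet reached: nothing flushed
fakeNow = 60_000;
tracker.recordHit(); // interval reached: first flush
fakeNow = 61_000;
tracker.recordMiss();
tracker.flushStats(); // explicit flush
console.log(flushed);
```

One consequence of this design is that an idle entity never flushes on its own, so the final partial period is only emitted by an explicit flush.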

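| Operation       | Overhead                                             |
|-----------------|------------------------------------------------------|
| Cache get/set   | ~1 counter increment (in-memory)                     |
| Stats flush     | 1 HTTP call to SystemLedger (async, fire-and-forget) |
| Circuit breaker | Prevents thundering herd on SystemLedger failure     |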

SystemLedger receives facts from all EntityLedgers. At scale:

  1. Aggregation - Stats batched to 1 fact per minute per entity
  2. Fire-and-forget - EntityLedgers don’t wait for response
  3. Circuit breaker - Protects SystemLedger from overload

For very high scale, consider:

  • Longer aggregation intervals
  • Sampling (emit stats for subset of entities)
  • Sharding SystemLedger by tenant (future)
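
For the sampling option, a deterministic hash of the entity ID keeps each entity consistently inside or outside the sample, so its time series has no gaps. The sketch below is one possible approach, not an SDK feature: `fnv1a` and `isSampled` are hypothetical helpers, and the hash is FNV-1a (32-bit) applied to UTF-16 code units.

```typescript
// FNV-1a 32-bit hash over UTF-16 code units; returns an unsigned 32-bit integer.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return hash;
}

// Map the hash to [0, 1) and compare against the sample rate.
function isSampled(entityId: string, sampleRate: number): boolean {
  return fnv1a(entityId) / 0x100000000 < sampleRate;
}

const sampledIds = ['account_123', 'account_456', 'account_789'].filter((id) =>
  isSampled(id, 0.1),
);
console.log(sampledIds);
```

Because the decision depends only on the ID and the rate, every isolate reaches the same answer without coordination, and raising the rate only adds entities to the sample.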

// ledgers/system.ts
import { SystemLedger } from '@z0-app/sdk';
export { SystemLedger };

// ledgers/account.ts
import { EntityLedger, LedgerOptions } from '@z0-app/sdk';

interface AccountEnv {
  SYSTEM_LEDGER: DurableObjectNamespace;
}

export class AccountLedger extends EntityLedger<AccountEnv> {
  constructor(ctx: DurableObjectState, env: AccountEnv) {
    const options: LedgerOptions = {
      // This enables SDK observability automatically
      observabilityStub: env.SYSTEM_LEDGER.get(
        env.SYSTEM_LEDGER.idFromName('system')
      ),
      bootstrapConfig: {
        fallback_on_failure: true,
        warn_on_fallback: true
      }
    };
    super(ctx, env, options);
  }

  protected async updateCachedState(fact: Fact): Promise<void> {
    // Your domain logic...
    // CachedStateManager automatically tracks hits/misses
  }
}

Multi-Tenant Setup (Workers for Platforms)

For multi-tenant deployments, use TenantSystemLedger to aggregate per-tenant:

// ledgers/tenant-system.ts
import { TenantSystemLedger, TenantSystemLedgerEnv } from '@z0-app/sdk';
export { TenantSystemLedger };

// ledgers/account.ts
import { EntityLedger, LedgerOptions } from '@z0-app/sdk';

interface TenantEnv extends TenantSystemLedgerEnv {
  TENANT_SYSTEM_LEDGER: DurableObjectNamespace;
  TENANT_ID: string;
}

export class AccountLedger extends EntityLedger<TenantEnv> {
  constructor(ctx: DurableObjectState, env: TenantEnv) {
    const options: LedgerOptions = {
      // Point to TenantSystemLedger instead of SystemLedger
      observabilityStub: env.TENANT_SYSTEM_LEDGER.get(
        env.TENANT_SYSTEM_LEDGER.idFromName(env.TENANT_ID)
      ),
    };
    super(ctx, env, options);
  }
}

# wrangler.toml (tenant worker)
[[durable_objects.bindings]]
name = "TENANT_SYSTEM_LEDGER"
class_name = "TenantSystemLedger"

[[durable_objects.bindings]]
name = "ACCOUNT_LEDGER"
class_name = "AccountLedger"

# Environment variable set per-tenant by Workers for Platforms
[vars]
TENANT_ID = "" # Set dynamically via dispatch binding

// Bootstrap configuration (optional)
interface BootstrapConfig {
  fallback_on_failure: boolean; // Default: true
  warn_on_fallback?: boolean; // Default: true
}

// SDK observability fact
interface SdkObservabilityFact {
  type: 'sdk';
  subtype: SdkObservabilitySubtype;
  entity_id: string;
  entity_type: string;
  tenant_id: string;
  data: SdkObservabilityData;
}

type SdkObservabilitySubtype =
  | 'cache_stats'
  | 'fact_appended'
  | 'schema_init'
  | 'hydration_done';

interface CacheStatsData {
  period_start: number;
  period_end: number;
  hits: number;
  misses: number;
}

// Aggregated tenant stats (from TenantSystemLedger)
interface TenantStatsPayload {
  tenant_id: string;
  period_start: number;
  period_end: number;
  cache_hits: number;
  cache_misses: number;
  facts_count: number;
  entities_count: number;
}

| Concept         | Implementation                               |
|-----------------|----------------------------------------------|
| What it is      | SDK observing itself via z0 primitives       |
| Why             | Battle-test the SDK, platform observability  |
| How to enable   | Pass observabilityStub option (auto-enables) |
| Current support | CachedStateManager hit/miss tracking         |
| Future support  | SchemaManager, HydrationManager, FactManager |
| Default         | Disabled (no stub = no observability)        |

The SDK dogfooding architecture demonstrates a key principle: the same patterns that work for your domain work for the platform itself.