Telegram bot

The pipeline runs unattended for long stretches: a /jkz:pipeline invocation can spend an hour cycling through plan, build, and QA without anyone watching the terminal. The Telegram bot is the layer that watches for you. It runs a fixed set of health checks on a timer, surfaces anything that has gone wrong, and proposes work that is waiting to start, but it never acts on those proposals itself. Every suggestion lands as a button you press. The bot observes and recommends; you decide.

Two loops drive it: a monitoring loop that samples system health, and a task discovery loop that scans GitHub for actionable work. Both report to the project’s Telegram chat.

The monitoring loop

Every ten minutes the bot runs eleven health checks concurrently and folds their results into one report. Each check returns a status — ok, warn, or error — and a short detail string. The checks are independent and isolated: a check that throws is caught and downgraded to a warn rather than taking the whole loop down.

Check | What it watches
active_agents | Agents that have gone idle, via lastActivity tracking
ci_status | The health of the latest CI runs
stale_worktrees | Worktrees lingering past their expected lifetime
deliberation_errors | Errors recorded in agent deliberation logs
github_api | Remaining headroom on the GitHub API rate limit
git_drift | Whether local HEAD has fallen behind origin/main (a sign git-sync failed)
cli_versions | Outdated CLIs and breaking changes (read from cache, never the network)
slo_compliance | Pipeline SLOs evaluated over a rolling window
worktree_cleanup | Removes worktrees for issues that have reached a terminal state
quota_restoration | Probes whether an exhausted Codex quota has recovered
stale_locks | Recovers worktree locks whose owning issue is already closed
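
The isolation rule described above (a check that throws becomes a warn) can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/monitoring-checks.js: the registry shape, the function names runIsolated and runMonitoringCycle, and the overall field on the report are all assumptions made for the example.

```javascript
// Hypothetical check registry: names mirror the table above, bodies are stand-ins.
const checks = {
  github_api: async () => ({ status: 'ok', detail: 'rate limit headroom fine' }),
  git_drift: async () => ({ status: 'warn', detail: 'HEAD is behind origin/main' }),
  ci_status: async () => { throw new Error('CI lookup timed out'); },
};

// Run one check; a thrown error becomes a warn result instead of propagating.
async function runIsolated(name, fn) {
  try {
    const { status, detail } = await fn();
    return { name, status, detail };
  } catch (err) {
    return { name, status: 'warn', detail: `check threw: ${err.message}` };
  }
}

const SEVERITY = { ok: 0, warn: 1, error: 2 };

// Start every check at once and fold the results into a single report.
// The `overall` field (worst status seen) is an assumption, not a documented field.
async function runMonitoringCycle(registry) {
  const results = await Promise.all(
    Object.entries(registry).map(([name, fn]) => runIsolated(name, fn)),
  );
  const overall = results.reduce(
    (worst, r) => (SEVERITY[r.status] > SEVERITY[worst] ? r.status : worst),
    'ok',
  );
  return { generatedAt: new Date().toISOString(), overall, results };
}
```

Because each check is wrapped before it reaches Promise.all, one rejected promise cannot reject the whole batch; the loop always gets one result per check.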

The report is persisted to state/bot-monitor-report.json after every cycle, so the most recent system snapshot is always available on disk even between Telegram messages.

The bot is Docker-aware: when the project runs inside a container it issues shell commands through the container and reads state files natively from the mounted workspace.
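
One way to picture the Docker-aware split is a helper that decides how a shell command is launched. The sketch below is an assumption about the approach, not the bot's actual code: buildShellCommand and its container parameter are hypothetical names, and how the bot detects that it is running against a container is not covered here.

```javascript
// Build the argv for running a shell command, optionally inside a container.
// `container` is a hypothetical config value: a container name, or null when
// the project runs directly on the host.
function buildShellCommand(command, container = null) {
  if (container) {
    // Route the command through the container with `docker exec`.
    return ['docker', 'exec', container, 'sh', '-c', command];
  }
  return ['sh', '-c', command];
}
```

The resulting array can be handed to Node's child_process.execFile as (argv[0], argv.slice(1)). State files need no such wrapper, since the workspace is mounted and they can be read directly from disk.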

The on-demand /health command in the bot runs a smaller subset of these checks (the eight that are cheap and side-effect-free) for a quick status readout. The full eleven-check sweep — including the cleanup and lock-recovery checks that mutate state — only runs on the timed loop.

Alerts and debouncing

Health findings do not flood the chat. Each distinct alert is debounced with a thirty-minute cooldown: once an alert fires, the same alert stays silent for thirty minutes even if the underlying condition persists across loop cycles. This keeps a single ongoing problem (a rate-limit warning, a stuck agent) from posting every ten minutes.
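
The cooldown can be sketched as a map from alert key to the time it last fired. This is an illustrative version only: the AlertDebouncer class and the "<check>:<status>" key format are assumptions, and the bot may identify a "distinct alert" differently.

```javascript
const COOLDOWN_MS = 30 * 60 * 1000; // thirty-minute cooldown from the text

// Tracks when each alert key last fired. The key format is an assumption;
// the usage below uses "<check name>:<status>".
class AlertDebouncer {
  constructor(cooldownMs = COOLDOWN_MS) {
    this.cooldownMs = cooldownMs;
    this.lastFired = new Map();
  }

  // Returns true if the alert should be sent now, and records the send time.
  // Suppressed alerts do not reset the timer, so the cooldown is measured
  // from the last message that was actually posted.
  shouldFire(key, now = Date.now()) {
    const last = this.lastFired.get(key);
    if (last !== undefined && now - last < this.cooldownMs) return false;
    this.lastFired.set(key, now);
    return true;
  }
}

// A persistent warning seen on four consecutive ten-minute cycles:
const debouncer = new AlertDebouncer();
const minute = 60 * 1000;
const decisions = [0, 10, 20, 30].map((m) =>
  debouncer.shouldFire('github_api:warn', m * minute),
);
// decisions is [true, false, false, true]
```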

Background refreshes

A few maintenance tasks ride the same ten-minute clock but run on longer intervals, counted in cycles:

Task | Cadence | Interval
Monitoring loop | every cycle | 10 min
Task discovery | every 3rd cycle | 30 min
Changelog cache refresh | every 6th cycle | 60 min

The changelog refresh is fire-and-forget: it kicks off changelog-review.js in the background to keep the CLI-version cache warm, so the cli_versions check stays fast (it only ever reads the cache).
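
Cycle-counted cadences and the fire-and-forget launch can be sketched as below. The task names, the tasksDue and fireAndForget helpers, and the choice of a 1-based cycle counter (so discovery first runs on cycle 3, not cycle 0) are assumptions for illustration.

```javascript
// Cadences from the table above, expressed in ten-minute cycles.
const TASKS = [
  { name: 'monitoring', everyNthCycle: 1 },
  { name: 'task_discovery', everyNthCycle: 3 },
  { name: 'changelog_refresh', everyNthCycle: 6 },
];

// Return the names of the tasks due on a given 1-based cycle number.
function tasksDue(cycle) {
  return TASKS.filter((t) => cycle % t.everyNthCycle === 0).map((t) => t.name);
}

// Fire-and-forget: start the work without awaiting it, and route any failure
// to onError so a failed refresh never interrupts the monitoring loop.
function fireAndForget(startTask, onError = () => {}) {
  Promise.resolve().then(startTask).catch(onError);
}

// tasksDue(1) returns ['monitoring']
// tasksDue(3) returns ['monitoring', 'task_discovery']
// tasksDue(6) returns ['monitoring', 'task_discovery', 'changelog_refresh']
```

Counting cycles rather than keeping three separate timers means every task is aligned to the same ten-minute tick, so the slower tasks can never drift relative to the monitoring loop.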

Proactive task discovery

Every thirty minutes — every third monitoring cycle — the bot scans GitHub for work that is ready but idle. It looks for four conditions:

  • jkz:ready issues with no active pipeline — work that is queued but unstarted.
  • Stale PRs open longer than 24 hours — surfaced as informational, not as a problem to fix.
  • Blocked issues — issues waiting on an unresolved dependency.
  • Stale pipelines older than two hours — a run that has stalled mid-phase.

Each finding is offered as a Telegram inline keyboard: a message with buttons the human can tap to act. The system suggests; the human acts. Nothing starts a pipeline, merges a PR, or unblocks an issue automatically.
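
The four conditions can be expressed as a pure function over already-fetched GitHub data. This is a sketch under stated assumptions, not the logic in scripts/task-discovery.js: the input shapes (labels, blocked, hasActivePipeline, openedAt, startedAt) and the finding kinds are hypothetical.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const STALE_PR_MS = 24 * HOUR_MS;      // PRs open longer than 24 hours
const STALE_PIPELINE_MS = 2 * HOUR_MS; // pipelines older than two hours

// Input shapes are hypothetical: issues carry `labels`, `blocked`, and
// `hasActivePipeline`; PRs carry `openedAt`; pipelines carry `startedAt`.
// All timestamps are milliseconds since the epoch.
function discoverTasks({ issues, pullRequests, pipelines }, now = Date.now()) {
  const findings = [];
  for (const issue of issues) {
    if (issue.labels.includes('jkz:ready') && !issue.hasActivePipeline) {
      findings.push({ kind: 'ready_idle', ref: issue.number });
    }
    if (issue.blocked) {
      findings.push({ kind: 'blocked_issue', ref: issue.number });
    }
  }
  for (const pr of pullRequests) {
    if (now - pr.openedAt > STALE_PR_MS) {
      // Stale PRs are informational only, per the list above.
      findings.push({ kind: 'stale_pr', ref: pr.number, informational: true });
    }
  }
  for (const pipeline of pipelines) {
    if (now - pipeline.startedAt > STALE_PIPELINE_MS) {
      findings.push({ kind: 'stale_pipeline', ref: pipeline.issue });
    }
  }
  return findings;
}
```

The function only returns findings; turning a finding into an action is left to whoever presses the button.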

Staying under the callback limit

Telegram caps callback_data — the payload attached to an inline button — at 64 bytes. A jkz command with an issue number and context easily exceeds that. The bot works around the limit with a pendingCommands pattern: the full command is stored server-side and the button carries only a short key that points to it. When the button is pressed, the bot looks up the real command by its key. This keeps every button well under the 64-byte ceiling regardless of how long the underlying command is.
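
A minimal sketch of the pendingCommands idea follows. The helper names (registerCommand, buildKeyboard, resolveCallback), the key format, the sample commands, and the choice to make each key single-use are assumptions; only the inline_keyboard, text, and callback_data field names come from the Telegram Bot API.

```javascript
const pendingCommands = new Map(); // short key -> full command text
let nextId = 0;

// Store the full command and return a short key to use as callback_data.
function registerCommand(command) {
  const key = `cmd:${(nextId++).toString(36)}`;
  pendingCommands.set(key, command);
  return key;
}

// Build a one-row inline keyboard in the shape Telegram expects for reply_markup.
function buildKeyboard(buttons) {
  return {
    inline_keyboard: [
      buttons.map(({ label, command }) => ({
        text: label,
        callback_data: registerCommand(command),
      })),
    ],
  };
}

// Resolve a pressed button back to its command. Removing the entry makes each
// button single-use, which is an assumption of this sketch.
function resolveCallback(key) {
  const command = pendingCommands.get(key);
  pendingCommands.delete(key);
  return command; // undefined if the key is unknown or already used
}

// Hypothetical command text, deliberately longer than 64 bytes.
const longCommand =
  '/jkz:pipeline 1234 --context "resume after QA failure on the payments module"';
const keyboard = buildKeyboard([
  { label: 'Start pipeline', command: longCommand },
  { label: 'Dismiss', command: 'dismiss 1234' },
]);
```

The key length depends only on the counter, never on the command, which is what keeps the payload under the limit. An in-memory map is lost on restart, so buttons from before a restart would resolve to undefined in this version.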

Where this lives

The checks themselves are defined in scripts/monitoring-checks.js; task discovery lives in scripts/task-discovery.js; and the bot that schedules both loops, debounces alerts, and renders the inline keyboards is scripts/telegram-bot.js.