Dead Man's Switch: The Safety Net Every Trader Needs

Every automated trading system has a failure mode that no amount of code can prevent: the operator becoming unavailable. The cause might be a medical emergency, a lost phone, a natural disaster, or simply forgetting that bots are running while on vacation. The bots continue trading, the market moves against them, and there is no human available to intervene. The losses compound because the system is doing exactly what it was programmed to do, in conditions where a human would have stopped it.

A dead man's switch is the solution. If the operator does not check in within a defined window, the system assumes something is wrong and takes protective action. The concept is borrowed from railway and heavy-equipment safety: a train's dead man's switch stops the train if the operator releases the handle, on the assumption that a conscious operator keeps the handle pressed.

The 24-Hour Window

Our dead man's switch requires a check-in within 24 hours. The timer starts when the server launches and resets on every check-in. The check-in mechanism is a Telegram command: the operator sends a message to the bot, confirming they are aware the system is running and available to intervene if needed.
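The timer rules are small enough to sketch. The following is a minimal Python illustration, assuming a single-process service and an injectable clock so the logic can be tested without waiting a day. `CheckInTimer`, `check_in`, and `remaining` are hypothetical names, not the platform's actual API.

```python
import time
from typing import Callable


class CheckInTimer:
    """Hypothetical check-in timer: starts at launch, resets on every check-in."""

    def __init__(self, window_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_seconds = window_seconds
        self._clock = clock
        # The timer starts when the server launches (here: when the object is built).
        self.last_check_in = clock()

    def check_in(self) -> None:
        # Called when the operator's Telegram check-in command arrives.
        self.last_check_in = self._clock()

    def elapsed(self) -> float:
        return self._clock() - self.last_check_in

    def remaining(self) -> float:
        # Never report negative time; zero means the window has expired.
        return max(0.0, self.window_seconds - self.elapsed())
```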

Twenty-four hours is a considered choice. Shorter windows (4 to 8 hours) would trigger false alarms during sleep, travel, or busy days. Longer windows (48 to 72 hours) defeat the purpose because a lot can happen in two or three days of unmonitored trading. Twenty-four hours provides enough slack for normal life while ensuring that a genuinely unavailable operator is detected within a day.

The Four-State Machine

The dead man's switch operates as a four-state machine, not a simple timer. Each state has defined behavior and a clear transition condition.

OK state. The operator has checked in within the last 24 hours. The timer shows how much time remains. All bots trade normally. The dashboard shows a green indicator. This is the steady state for normal operation.

Warning state. The timer has reached 83 percent of the 24-hour window (approximately 20 hours elapsed, 4 hours remaining). The system sends a Telegram alert: "Dead man's switch warning. Check in within 4 hours." The dashboard indicator turns yellow. All bots continue trading. This is a reminder, not an action.

Critical state. The timer has reached 96 percent of the window (approximately 23 hours elapsed, 1 hour remaining). A second Telegram alert fires with higher urgency: "Dead man's switch critical. Check in within 1 hour or all bots will be stopped." The dashboard indicator turns red. Bots still trade, but the operator has very limited time to respond.

Triggered state. The 24-hour window has fully elapsed with no check-in. The system executes protective actions: all running bots are paused, open positions are not forcibly closed (to avoid selling into a crash), and a final Telegram alert fires confirming the switch has triggered. The dashboard shows a red banner indicating the dead man's switch is active.
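The mapping from elapsed time to state can be written as a pure function of the fraction of the window used. A sketch in Python, with hypothetical names (`SwitchState`, `classify`) and the 83 and 96 percent thresholds from the state descriptions above:

```python
from enum import IntEnum


class SwitchState(IntEnum):
    # Ordered so that a higher value means closer to triggering.
    OK = 0
    WARNING = 1
    CRITICAL = 2
    TRIGGERED = 3


# Thresholds as fractions of the window.
WARNING_FRACTION = 0.83
CRITICAL_FRACTION = 0.96


def classify(elapsed_s: float, window_s: float) -> SwitchState:
    """Map time elapsed since the last check-in to a switch state."""
    fraction = elapsed_s / window_s
    if fraction >= 1.0:
        return SwitchState.TRIGGERED
    if fraction >= CRITICAL_FRACTION:
        return SwitchState.CRITICAL
    if fraction >= WARNING_FRACTION:
        return SwitchState.WARNING
    return SwitchState.OK
```

With these thresholds the critical state actually begins at 23.04 hours, so at exactly 23 hours elapsed the switch is still in Warning. The text's "approximately 23 hours" is a rounding of that boundary.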

Between check-ins, transitions are monotonic: the timer only moves forward, so the state can only advance from OK to Warning to Critical to Triggered. A check-in is the only event that moves the state backward, and only from Warning or Critical to OK. Once the switch enters Triggered, a check-in no longer helps; only an explicit reset can return it to OK.
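These transition rules can be sketched as three small functions, one per kind of event: the passage of time, a regular check-in, and the explicit reset. This is a hypothetical illustration (`on_tick`, `on_check_in`, and `on_reset` are made-up names). Using an `IntEnum` lets `max` encode the rule that time alone never moves the state backward.

```python
from enum import IntEnum


class SwitchState(IntEnum):
    # Ordered so that a higher value means closer to triggering.
    OK = 0
    WARNING = 1
    CRITICAL = 2
    TRIGGERED = 3


def on_tick(current: SwitchState, computed: SwitchState) -> SwitchState:
    """Time alone can only advance the state, never move it backward."""
    return max(current, computed)


def on_check_in(current: SwitchState) -> SwitchState:
    """A regular check-in returns Warning or Critical to OK, but is ignored once Triggered."""
    if current is SwitchState.TRIGGERED:
        return current  # latched: only an explicit reset clears it
    return SwitchState.OK


def on_reset() -> SwitchState:
    """The explicit reset command is the only way out of Triggered."""
    return SwitchState.OK
```

Taking the maximum also protects the latch. Even if a check-in were to reset the underlying timer while the switch is Triggered, the next tick keeps the state at Triggered.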

The Latch Mechanism

The most important design decision in our dead man's switch is the latch. Once the switch enters the Triggered state, a regular check-in command does not reset it. The operator must issue a separate, explicit reset command to acknowledge the trigger and return to OK state.

This latch exists to prevent a specific attack on the safety mechanism: automated check-ins. If a simple Telegram message reset the switch, you could write a cron job that sends the check-in message every 12 hours. The dead man's switch would never trigger, and the safety mechanism would be completely bypassed. If the operator is actually unavailable (in a hospital, on a plane without connectivity, or worse), the automated check-in keeps the bots trading without any human oversight.

The latch makes automated bypass much harder. A script that sends check-in messages on schedule can postpone the trigger, but it cannot undo one. The first time the script misses the window, the switch latches, and from then on scheduled check-ins have no effect, because the reset command requires a different interaction pattern. The reset is not a simple message but a command with confirmation, designed to require conscious human input.
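The text describes the reset as "a command with confirmation" without specifying the exchange. One plausible shape, sketched here under that assumption, is a two-step flow: the reset command replies with a short one-time code, and the switch clears only when the operator sends that code back before it expires. `LatchedSwitch`, the 5-minute expiry, and the code format are all hypothetical.

```python
import secrets
import time
from typing import Callable, Optional


class LatchedSwitch:
    """Hypothetical latch: check-ins are ignored once triggered; reset needs confirmation."""

    CONFIRM_TTL_S = 300.0  # assumed expiry for the confirmation code (5 minutes)

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self.triggered = False
        self._pending_code: Optional[str] = None
        self._pending_expires = 0.0

    def check_in(self) -> bool:
        """Return True if a regular check-in is accepted (timer reset handled elsewhere)."""
        return not self.triggered

    def request_reset(self) -> str:
        """Step 1: issue a one-time code that the operator must send back."""
        self._pending_code = secrets.token_hex(3)  # six hex characters
        self._pending_expires = self._clock() + self.CONFIRM_TTL_S
        return self._pending_code

    def confirm_reset(self, code: str) -> bool:
        """Step 2: clear the latch only for a matching, unexpired code."""
        pending = self._pending_code
        self._pending_code = None  # single use, whether or not it matches
        if pending is None or self._clock() > self._pending_expires:
            return False
        # Constant-time comparison on bytes, so non-ASCII input cannot raise.
        if not secrets.compare_digest(code.encode(), pending.encode()):
            return False
        self.triggered = False
        return True
```

A script could still automate both steps. The confirmation only ensures that doing so is a deliberate act rather than a side effect of a scheduled check-in.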

To be clear: the latch does not make automated bypass impossible. A sufficiently motivated person could automate the full reset flow. But the latch raises the bar from trivially bypassable to deliberately circumvented. If someone deliberately writes code to defeat their own safety mechanism, the dead man's switch has done its job: it forced a conscious decision to override the safety, rather than allowing passive drift into an unmonitored state.

What Happens When It Triggers

The triggered state pauses all bots but does not close positions. This is a deliberate choice. Closing all positions during a trigger could mean selling into a crash (if the trigger happened during a market drawdown) or exiting profitable positions unnecessarily (if the trigger happened during a quiet market).

Pausing means no new trades are opened and no existing positions are modified by the bot logic. Stop-losses remain active at the exchange level (for live trading) because they are exchange-side orders that execute regardless of the bot's state. Take-profit orders similarly remain active. The only thing that stops is the bot's ability to enter new positions or modify existing exit orders.

This approach minimizes the damage from a false trigger. If the operator is simply busy and checks in 26 hours after the last check-in, they reset the switch, unpause the bots, and everything resumes. No positions were liquidated, and no exits were forced at bad prices. The only cost of the false trigger is the 2-hour pause in new trade entries.

For genuine operator unavailability, the paused state buys time. Positions held with stop-losses will be protected by those stops. Positions without stops (which should be rare given our risk framework requires stops) will sit until the operator returns or a designated emergency contact intervenes.

Integration with the Risk Framework

The dead man's switch integrates with the broader risk hierarchy but operates independently. It is not overridden by the per-bot risk manager, the portfolio risk manager, or the AI service. It is a top-level safety mechanism that supersedes all other systems.

When the switch triggers, it publishes a CIRCUIT_BREAKER_FIRED event on the event bus with the trigger source identified as "dead_man_switch." This event is persisted to the risk_events table, visible in the Risk Log tab, and sent to Telegram. The event includes the last check-in timestamp and the elapsed time, providing a clear audit trail.
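A sketch of the trigger event flowing through an event bus to two subscribers: one persists it to a risk_events table, and one stands in for the Telegram notifier. The `EventBus` class, the table schema, and the payload field names are assumptions for illustration. Only the event name CIRCUIT_BREAKER_FIRED, the source "dead_man_switch", and the risk_events table name come from the text. An in-memory SQLite database keeps the example self-contained.

```python
import json
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List


class EventBus:
    """Minimal in-process publish/subscribe bus (a stand-in, not the platform's bus)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], object]]] = {}

    def subscribe(self, event_type: str, handler: Callable[[dict], object]) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        for handler in self._handlers.get(event_type, []):
            handler(payload)


def fire_dead_man_switch(bus: EventBus, last_check_in: datetime, now: datetime) -> dict:
    """Publish the trigger with the audit fields the text describes."""
    payload = {
        "source": "dead_man_switch",
        "last_check_in": last_check_in.isoformat(),
        "elapsed_seconds": (now - last_check_in).total_seconds(),
    }
    bus.publish("CIRCUIT_BREAKER_FIRED", payload)
    return payload


# Wiring (schema assumed): persist every event and record what would go to Telegram.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE risk_events (event_type TEXT, source TEXT, payload TEXT)")
telegram_outbox: List[dict] = []  # stand-in for a real Telegram notifier

bus = EventBus()
bus.subscribe("CIRCUIT_BREAKER_FIRED", lambda p: db.execute(
    "INSERT INTO risk_events VALUES (?, ?, ?)",
    ("CIRCUIT_BREAKER_FIRED", p["source"], json.dumps(p)),
))
bus.subscribe("CIRCUIT_BREAKER_FIRED", telegram_outbox.append)

t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
fire_dead_man_switch(bus, t0, t0 + timedelta(hours=24, minutes=5))
print(db.execute("SELECT source FROM risk_events").fetchall())  # [('dead_man_switch',)]
```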

The dead man's switch check runs on every tick cycle for every bot. Before the strategy's analyze method is called, the tick loop checks the switch state. If the state is Triggered, the tick exits early without generating a signal. This means the switch does not need to interact with the scheduler or the bot manager. It operates at the individual bot level, making it effective even if the bot manager or scheduler has issues.
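The per-bot gate can be sketched as an early return at the top of the tick function. `Bot`, `tick`, and the `switch_is_triggered` callable are hypothetical. Passing the state check in as a callable reflects the point above: each bot reads the switch directly instead of relying on the scheduler or bot manager.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Bot:
    """Hypothetical bot: analyze() returns a signal such as "BUY", or None."""
    name: str
    analyze: Callable[[], Optional[str]]
    signals: List[str] = field(default_factory=list)


def tick(bot: Bot, switch_is_triggered: Callable[[], bool]) -> Optional[str]:
    """Run one tick for one bot, checking the dead man's switch first."""
    # Early exit before analyze(): no new entries and no changes to exit orders.
    # Nothing here cancels orders, so exchange-side stop-loss and take-profit
    # orders keep working while the bot is paused.
    if switch_is_triggered():
        return None
    signal = bot.analyze()
    if signal is not None:
        bot.signals.append(signal)
    return signal
```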

Configuring the Window

The 24-hour window is configurable via the environment file. Some operators may want a shorter window (12 hours) for higher-risk strategies or a longer window (48 hours) for conservative strategies that trade infrequently.
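A minimal loader, assuming the environment file has already been loaded into the process environment and assuming a variable named DEAD_MAN_SWITCH_WINDOW_HOURS (the actual variable name is not given in the text). It falls back to 24 hours and rejects non-positive or non-finite values.

```python
import math
import os
from typing import Mapping

DEFAULT_WINDOW_HOURS = 24.0
ENV_VAR = "DEAD_MAN_SWITCH_WINDOW_HOURS"  # hypothetical variable name


def load_window_seconds(env: Mapping[str, str] = os.environ) -> float:
    """Return the check-in window in seconds, defaulting to 24 hours."""
    raw = env.get(ENV_VAR, "").strip()
    if not raw:
        return DEFAULT_WINDOW_HOURS * 3600
    hours = float(raw)  # raises ValueError on non-numeric input
    if not math.isfinite(hours) or hours <= 0:
        raise ValueError(f"{ENV_VAR} must be a positive number of hours, got {raw!r}")
    return hours * 3600
```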

The warning and critical thresholds scale with the window. At 83 percent of the window, the warning fires. At 96 percent, the critical warning fires. These percentages are fixed: changing the window duration automatically adjusts the absolute times of the warnings.

For a 12-hour window, the warning fires at approximately 10 hours and the critical warning at approximately 11 hours 30 minutes. For a 48-hour window, the warning fires at approximately 40 hours and the critical warning at approximately 46 hours.
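The scaling is plain multiplication, and a few lines of Python make it concrete. The function names are hypothetical; the fractions are the 83 and 96 percent thresholds described above.

```python
from typing import Tuple

WARNING_FRACTION = 0.83
CRITICAL_FRACTION = 0.96


def alert_offsets_hours(window_hours: float) -> Tuple[float, float]:
    """Hours after the last check-in at which the warning and critical alerts fire."""
    return window_hours * WARNING_FRACTION, window_hours * CRITICAL_FRACTION


def fmt(hours: float) -> str:
    """Format fractional hours as 'Hh MMm', rounded to the nearest minute."""
    total_minutes = round(hours * 60)
    return f"{total_minutes // 60}h {total_minutes % 60:02d}m"


for window in (12, 24, 48):
    warn, crit = alert_offsets_hours(window)
    print(f"{window}h window: warning at {fmt(warn)}, critical at {fmt(crit)}")
# 12h window: warning at 9h 58m, critical at 11h 31m
# 24h window: warning at 19h 55m, critical at 23h 02m
# 48h window: warning at 39h 50m, critical at 46h 05m
```

The exact figures (for example, 11 hours 31 minutes rather than 11 hours 30 minutes for a 12-hour window) are why the text says "approximately".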

The Broader Lesson

The dead man's switch embodies a principle that applies beyond trading: every automated system that can cause harm needs a mechanism to detect the absence of human oversight. The trading bots are not dangerous in the traditional sense. They cannot transfer funds off the exchange. But they can open positions, accumulate losses, and consume capital in ways that compound over hours and days.

The dead man's switch is our acknowledgment that automation is a tool, not an autonomous agent. It operates within a framework designed by a human, monitored by a human, and subject to a human's judgment about when to continue and when to stop. When the human is absent, the right default is to stop. The latch ensures this default cannot be circumvented by accident or by poorly considered automation. The 24-hour window provides the operational slack to make the mechanism livable in daily use rather than a constant nuisance that gets disabled permanently.
