Dead Man's Switch: Automated Safety When You Walk Away

QuantForge Team · April 3, 2026 · 7 min read

Automated trading systems are designed to run without human intervention. That is the entire point. But there is a difference between a system running autonomously because everything is working and a system running autonomously because nobody is watching and everything is broken.

The dead man's switch addresses the second scenario. It requires periodic human check-in and progressively escalates when the operator is unavailable, ultimately shutting down all trading if nobody confirms they are watching. It is the mechanism that lets us sleep while 45 bots trade, knowing that if something happens to us, the system will stop itself.

Why Automated Trading Needs a Kill Switch

Consider the scenarios that a dead man's switch protects against. A network outage at home disconnects you from the monitoring dashboard for 12 hours during a market crash. A medical emergency takes you offline for two days while your bots continue trading through a black swan event. A power failure takes down your monitoring but not your trading server, which continues executing trades without human oversight.

In each case, the bots will continue operating within their individual risk parameters. The per-bot circuit breaker will fire if any bot exceeds 20 percent drawdown. The portfolio halt will trigger at 15 percent aggregate decline. But these are reactive mechanisms that require losses to occur before they activate.

The dead man's switch is proactive. It does not wait for losses. It asks a simpler question: is anyone watching? If the answer is no for long enough, it stops everything regardless of current performance. A system that is profitable but unmonitored is a system one black swan away from catastrophe.

The Four-State Machine

Our dead man's switch operates as a state machine with four states tied to the elapsed time since the last operator check-in. The default check-in window is 24 hours.

During the first 20 hours (0 to 83 percent of the window), the status is OK. All systems operate normally. No alerts are sent. This is the normal operating state that covers most of a day including sleep.

At 83 percent of the window (approximately 20 hours), the status transitions to Warning. The system logs a warning-level event and sends a Telegram notification. This is a gentle reminder: you have about 4 hours before the system shuts down. Under normal circumstances, this alert is received during morning hours and the operator checks in as part of their daily routine.

At 96 percent of the window (approximately 23 hours), the status transitions to Critical. The system logs an error-level event and sends an urgent Telegram alert. This is the final warning: you have approximately one hour before all trading stops. If the operator is available at all, this alert should trigger an immediate check-in.

At 100 percent of the window (24 hours), the switch triggers. All bots are forced to stop. Open positions remain but no new trades are placed. This is the failsafe that assumes the operator is genuinely unavailable and the system should not continue trading without human oversight.
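The four states above can be sketched as a pure function of elapsed time. This is a minimal illustration, not the production code: the names `SwitchStatus` and `status_for` are hypothetical, and only the 83 percent, 96 percent, and 24-hour figures come from the description above.

```python
from enum import Enum


class SwitchStatus(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"
    TRIGGERED = "triggered"


# Thresholds as fractions of the check-in window (values from the article).
WARNING_FRACTION = 0.83
CRITICAL_FRACTION = 0.96


def status_for(elapsed_hours: float, window_hours: float = 24.0) -> SwitchStatus:
    """Map hours since the last check-in to one of the four states."""
    fraction = elapsed_hours / window_hours
    if fraction >= 1.0:
        return SwitchStatus.TRIGGERED
    if fraction >= CRITICAL_FRACTION:
        return SwitchStatus.CRITICAL
    if fraction >= WARNING_FRACTION:
        return SwitchStatus.WARNING
    return SwitchStatus.OK
```

Checking the most severe state first keeps the ordering unambiguous. Note that 96 percent of 24 hours is 23.04 hours, so in this sketch an elapsed time of exactly 23 hours still reports Warning.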

The Latch: Why Simple Check-In Is Not Enough

The most important design decision in our dead man's switch is the latch mechanism. Once the switch triggers, a simple check-in does not restart the system. The operator must send an explicit reset command to un-latch and resume trading.

This seems like an annoying design choice until you consider why it exists. Without a latch, any check-in that arrives after the trigger would silently resume trading, whether it came from a human, a delayed message, or an automated script. A cron job that pings the check-in endpoint every 12 hours would keep the dead man's switch permanently happy while no human ever looks at the bots, and no latch can prevent that before a trigger. What the latch guarantees is that once the switch has fired, no automated or reflexive check-in can restart trading.

The latch ensures that after a trigger, a human must actively decide to resume trading. The reset process is intentionally separate from the check-in process. You cannot accidentally reset while doing something else. You must specifically issue the reset command, which means you have opened Telegram, seen the trigger notification, reviewed the state of the bots, and decided that resuming is appropriate.

If the operator sends the reset without reviewing, that is their choice. But the system will not resume automatically, and the friction of the reset process provides a natural pause for assessment.

The Check-In Routine

In practice, the check-in integrates into a daily monitoring routine. The operator reviews the dashboard once per day, checks bot performance, reviews any risk events, and sends a check-in via Telegram. This takes approximately five minutes.

The 24-hour window is calibrated for this routine. With a check-in at approximately the same time each day (say, 8 AM), the warning fires at approximately 4 AM the next morning (20 hours later), the critical alert at approximately 7 AM, and the switch triggers at 8 AM. The warning is set at 83 percent (20 hours) specifically to provide a 4-hour buffer before shutdown. That buffer runs up to the usual check-in time, not past it: if one morning you are busy until 10 AM, the switch has already triggered at 8 AM and a reset is required. Checking in slightly ahead of the 24-hour mark leaves margin for schedule variation.
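The timing arithmetic is easy to get wrong by an hour, so here is a small sketch that derives the alert times from a check-in timestamp. The function name `alert_schedule` and the example date are illustrative assumptions. The thresholds are the 83 percent and 96 percent figures described earlier.

```python
from datetime import datetime, timedelta


def alert_schedule(last_check_in: datetime, window_hours: float = 24.0) -> dict[str, datetime]:
    """Return when each escalation step fires after a given check-in."""
    window = timedelta(hours=window_hours)
    return {
        "warning": last_check_in + 0.83 * window,   # 83 percent of the window
        "critical": last_check_in + 0.96 * window,  # 96 percent of the window
        "trigger": last_check_in + window,          # 100 percent: all bots stop
    }


# Example: a check-in at 8:00 AM (the date is arbitrary).
schedule = alert_schedule(datetime(2026, 4, 3, 8, 0))
for step, when in schedule.items():
    print(f"{step:>8}: {when:%Y-%m-%d %H:%M}")
```

For an 8:00 AM check-in this puts the warning at about 3:55 AM the next day, the critical alert at about 7:02 AM, and the trigger at 8:00 AM.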

If you are traveling or have a planned absence, the appropriate action is to either reduce bot allocations before leaving, pause bots manually, or arrange for someone else to check in. The dead man's switch is not designed for planned absences. It is designed for unplanned ones.

How It Interacts with Other Risk Mechanisms

The dead man's switch is the last layer in a multi-layered safety system, and it is deliberately the most aggressive. The per-bot drawdown breaker at 20 percent stops individual bots that are losing. The portfolio halt at 15 percent stops all bots when there is correlated stress. The decay detector pauses bots whose strategies have degraded. The daily loss limit stops bots that are having a bad day.

All of these are loss-based triggers. They require something to go wrong before they activate. The dead man's switch is the only operator-based trigger. It activates based on the absence of human oversight, not the presence of losses. This means it catches scenarios that the other mechanisms miss: a market environment where bots are not losing but are making trades that a human would want to review, or a situation where the system is functioning correctly but an external risk (exchange regulatory action, API deprecation) requires human judgment.

Implementation Details

The switch is implemented as a standalone service with three core methods. The check-in method resets the internal timer to the current time. It is called when the operator sends the check-in command via Telegram. If the switch is already triggered (latched), the check-in is ignored and a warning is logged explaining that reset is required instead.

The check-status method returns the current state and the elapsed hours since the last check-in. The bot manager calls this before processing any new signals. If the status is triggered, the bot manager skips all trading activity. This check happens at the beginning of every tick loop for every bot, so the shutdown takes effect on each bot's next tick after the switch fires.
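A rough sketch of that gate in the bot manager follows. The function `run_tick` and its callable parameters are hypothetical stand-ins, not the actual bot manager interface. The only behavior taken from the description is that the status is checked before any signal processing and that a triggered status skips the bot.

```python
from typing import Callable, Iterable


def run_tick(
    bot_ids: Iterable[str],
    get_status: Callable[[], str],
    process_signals: Callable[[str], None],
) -> list[str]:
    """Run one tick and return the ids of bots that were allowed to trade."""
    traded = []
    for bot_id in bot_ids:
        # Gate first: a triggered switch means no trading activity at all.
        if get_status() == "triggered":
            continue
        process_signals(bot_id)
        traded.append(bot_id)
    return traded
```

Polling the status per bot rather than once per tick means a switch that fires partway through a tick still stops the remaining bots.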

The reset method un-latches the switch, resets the timer, and allows bots to resume. It can only be called through the Telegram command interface, ensuring a human is in the loop. After reset, the operator must also manually restart bots through the dashboard or API, providing a second confirmation that they intend to resume.
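Putting the three methods together, a minimal sketch of such a service might look like the following. The class name, method names, string status values, and injectable `clock` are assumptions made for illustration. The Telegram wiring and bot restarts are left out. One further assumption: `check_in` evaluates the status first, so a check-in that arrives after the window has expired is rejected even if nothing polled the status in between.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("dead_mans_switch")


class DeadMansSwitch:
    """Latching dead man's switch (illustrative sketch, not production code)."""

    def __init__(self, window_hours: float = 24.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._window_hours = window_hours
        self._clock = clock                 # injectable for testing
        self._last_check_in = clock()
        self._latched = False

    def check_status(self) -> tuple[str, float]:
        """Return (status, elapsed hours); latch once the window expires."""
        elapsed = (self._clock() - self._last_check_in) / 3600.0
        fraction = elapsed / self._window_hours
        if self._latched or fraction >= 1.0:
            self._latched = True            # stays set until reset()
            return "triggered", elapsed
        if fraction >= 0.96:
            return "critical", elapsed
        if fraction >= 0.83:
            return "warning", elapsed
        return "ok", elapsed

    def check_in(self) -> bool:
        """Restart the timer unless the switch is latched."""
        self.check_status()                 # latch first if already expired
        if self._latched:
            logger.warning("Check-in ignored: switch is triggered, reset required")
            return False
        self._last_check_in = self._clock()
        return True

    def reset(self) -> None:
        """Explicit operator action: un-latch and restart the timer."""
        self._latched = False
        self._last_check_in = self._clock()
```

Keeping `reset` separate from `check_in` is what makes the latch meaningful: in this sketch, nothing except an explicit `reset()` call clears the `_latched` flag.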

What We Have Learned

Running 45 bots for weeks has taught us that the dead man's switch is more about discipline than safety. The daily check-in routine forces engagement with the system. Without it, the natural tendency is to set up bots and stop paying attention until something goes wrong. By the time something goes wrong visibly, the damage is often already done.

The check-in routine surfaces small issues early. A bot that has been declining for five days is easy to spot during a daily review. Without the daily review, that bot might decline for three weeks before the decay detector triggers at a Sharpe of 0.5. The dead man's switch does not directly detect this issue, but the daily routine it enforces does.

The switch has triggered and latched once during our paper trading period, when a network change disrupted Telegram delivery for 26 hours. No bots were actually at risk (paper trading, simulated fills), but the exercise was valuable. The reset process confirmed that all bots were in expected states, no risk events had occurred, and the system was functioning correctly. The latch forced a manual review that we would not have done otherwise.

For a system managing real capital, this kind of forced review after any gap in monitoring is exactly the right behavior. The inconvenience of the latch is the cost of the safety it provides. We consider it one of the most important components in our risk framework, precisely because it protects against the one risk that no algorithm can assess: whether a human is paying attention.