Running one trading bot is straightforward. Running 45 simultaneously on a single machine without them interfering with each other, overwhelming the exchange API, or creating unmanaged portfolio risk is an engineering problem that requires careful architecture.
Our platform runs 45 paper trading bots on a Mac, executing across 6 strategy types and 13 symbols. The entire system (bots, API server, React dashboard, database, and local AI model) runs as a single Python process with an async event bus for internal communication. Here is how it works.
The Single-Process Design
All 45 bots run inside the FastAPI server process. This is a deliberate architectural choice, not a limitation. Running bots in-process eliminates inter-process communication overhead, simplifies state management, and allows the event bus to operate as a zero-copy in-memory queue.
The FastAPI lifespan creates the full pipeline on startup: DataFetcher (ccxt exchange connection), PaperTrader (simulated execution engine), BotManager (lifecycle management for all 45 bots), and SchedulerService (APScheduler for timing bot ticks). These are stored in the application state and injected into API routes via dependency injection.
When a bot is started via the API, BotManager creates a BotInstance with the strategy class, symbol, timeframe, and configuration parameters. The SchedulerService registers an APScheduler job that calls the bot's tick method at the strategy's timeframe interval: every 900 seconds for 15-minute strategies, every 3,600 seconds for 1-hour strategies, and every 14,400 seconds for 4-hour strategies.
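The production system uses APScheduler interval triggers for this; the timing logic can be sketched with plain asyncio (bot IDs and the tick callback here are illustrative, not the real API):

```python
import asyncio

# Timeframe-to-seconds mapping used when registering tick jobs.
INTERVALS = {"15m": 900, "1h": 3_600, "4h": 14_400}

async def run_bot(bot_id: str, timeframe: str, tick) -> None:
    period = INTERVALS[timeframe]
    while True:
        await tick(bot_id)           # one full execution cycle
        await asyncio.sleep(period)  # wait out the strategy's timeframe

async def main() -> list[str]:
    ticked: list[str] = []

    async def tick(bot_id: str) -> None:
        ticked.append(bot_id)

    # One coroutine per bot; all of them share the single event loop.
    tasks = [asyncio.create_task(run_bot(f"bot-{i}", "15m", tick)) for i in range(3)]
    await asyncio.sleep(0.05)        # let each bot fire its first tick
    for t in tasks:
        t.cancel()
    return ticked

print(asyncio.run(main()))  # ['bot-0', 'bot-1', 'bot-2']
```

Each bot is just a long-lived coroutine, which is why 45 of them cost almost nothing while idle between ticks.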
The Tick Loop
Each bot's tick is the core execution cycle. It runs as an async coroutine within the event loop, so 45 bots can have overlapping ticks without blocking each other.
The tick sequence is: fetch candles from the exchange; update unrealized PnL on open positions; check whether a position is already open (and skip new entries if so); run strategy analysis to generate a signal; apply AI enrichment if available; pass the signal through the three-level risk hierarchy (per-bot, portfolio, AI); and, if all checks pass, execute the order through the paper trader.
Each tick is independent. An error in one bot's tick stops that bot only, publishes a BOT_ERROR event, and does not affect any other bot. This isolation is critical when running 45 bots. A bug in one strategy configuration, a malformed candle response for one symbol, or a transient exchange API error should not cascade to the other 44 bots.
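The isolation boundary amounts to a try/except around each tick. A minimal sketch, with a hypothetical Bot class and event payloads modeled on the article's names:

```python
import asyncio

class Bot:
    def __init__(self, name: str, fail: bool = False):
        self.name, self.fail, self.running = name, fail, True

    async def tick(self) -> None:
        if self.fail:
            raise RuntimeError("malformed candle response")

async def safe_tick(bot: Bot, bus: asyncio.Queue) -> None:
    # An exception stops only this bot and publishes a BOT_ERROR event;
    # it never propagates to the other bots' coroutines.
    try:
        await bot.tick()
    except Exception as exc:
        bot.running = False
        bus.put_nowait(("BOT_ERROR", bot.name, str(exc)))

async def main():
    bus: asyncio.Queue = asyncio.Queue()
    bots = [Bot("momentum-sol"), Bot("broken", fail=True), Bot("meanrev-btc")]
    await asyncio.gather(*(safe_tick(b, bus) for b in bots))
    return [b.running for b in bots], bus.qsize()

running, errors = asyncio.run(main())
print(running, errors)  # [True, False, True] 1
```

The failing bot ends up stopped with one BOT_ERROR event on the bus, while its neighbors keep running.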
The Event Bus
Internal communication uses an async event bus built on asyncio queues. When a bot generates a signal, it publishes a SIGNAL_GENERATED event. When an order fills, ORDER_FILLED. When a circuit breaker fires, CIRCUIT_BREAKER_FIRED. Each event type has subscribers that react independently.
The event bus uses non-blocking publish (put_nowait) so event publication never delays the tick loop. Subscribers process events at their own pace. If a subscriber falls behind, events queue up to a maximum of 1,000 per subscriber before new events are dropped. This backpressure policy prevents a slow subscriber (such as the Telegram notifier) from affecting bot execution.
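The pattern above can be sketched with asyncio queues; the class shape here is illustrative (a tiny backlog of 2 is used just to make the drop behavior visible):

```python
import asyncio

class EventBus:
    """Bounded per-subscriber queues; publish never blocks the tick loop."""

    def __init__(self, max_backlog: int = 1_000):
        self.max_backlog = max_backlog
        self.subscribers: dict[str, list[asyncio.Queue]] = {}

    def subscribe(self, event_type: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue(maxsize=self.max_backlog)
        self.subscribers.setdefault(event_type, []).append(queue)
        return queue

    def publish(self, event_type: str, payload) -> None:
        for queue in self.subscribers.get(event_type, []):
            try:
                queue.put_nowait((event_type, payload))  # non-blocking publish
            except asyncio.QueueFull:
                pass  # slow subscriber: drop rather than delay the bots

async def main() -> int:
    bus = EventBus(max_backlog=2)  # tiny backlog to demonstrate dropping
    inbox = bus.subscribe("SIGNAL_GENERATED")
    for i in range(5):
        bus.publish("SIGNAL_GENERATED", {"bot": "momentum-sol", "n": i})
    return inbox.qsize()  # capped at 2; the other 3 events were dropped

print(asyncio.run(main()))  # 2
```

Because publish is synchronous and non-blocking, a bot can emit events mid-tick without ever awaiting a subscriber.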
Key subscribers include: the SSE endpoint (pushes events to the React dashboard in real time), the Telegram notifier (sends alerts for significant events), the risk event persister (writes circuit breaker and risk rejection events to the database), and the equity snapshot service (records hourly equity readings for drawdown tracking).
The Database: SQLite WAL
All state is persisted in SQLite with WAL (Write-Ahead Logging) mode. WAL allows concurrent readers and a single writer without blocking, which is essential when 45 bots might update their state simultaneously while the API serves dashboard queries.
Key pragmas: journal_mode=WAL, busy_timeout=5000 milliseconds (wait up to 5 seconds for a write lock rather than failing immediately), foreign_keys=ON, synchronous=NORMAL (fsync at checkpoint, not every transaction, balancing durability with performance).
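A connection helper applying those pragmas looks like this (the function name and the demo path are illustrative):

```python
import os
import sqlite3
import tempfile

def open_db(path: str) -> sqlite3.Connection:
    # Pragmas as described above; the connection-level timeout mirrors
    # busy_timeout as a second line of defense on the Python side.
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL")    # concurrent readers, one writer
    conn.execute("PRAGMA busy_timeout=5000")   # wait up to 5 s for the write lock
    conn.execute("PRAGMA foreign_keys=ON")
    conn.execute("PRAGMA synchronous=NORMAL")  # fsync at checkpoint, not per txn
    return conn

path = os.path.join(tempfile.mkdtemp(), "bots.db")  # WAL needs an on-disk file
conn = open_db(path)
print(conn.execute("PRAGMA journal_mode").fetchone()[0])  # wal
print(conn.execute("PRAGMA busy_timeout").fetchone()[0])  # 5000
```

Note that journal_mode is persistent per database file, while busy_timeout, foreign_keys, and synchronous must be set on every new connection.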
The database holds 19 tables covering bots, positions, orders, trades, candles, equity snapshots, risk events, AI analyses, backtest results, and job tracking. With 45 active bots and months of historical data, the database is approximately 5 GB, primarily candle data across 25 symbols and 11 timeframes.
Rate Limit Management
Binance imposes API rate limits that become significant when 45 bots fetch candles simultaneously. The DataFetcher manages this through request queuing and deduplication.
Multiple bots may trade the same symbol (mean reversion and momentum both trade SOL/USDT). The DataFetcher deduplicates candle requests so the same symbol-timeframe combination is only fetched once per tick cycle, regardless of how many bots request it. The fetched data is cached in memory and shared across all requesting bots.
For the current deployment of 13 unique symbols across 3 timeframes (15-minute, 1-hour, 4-hour), each tick cycle requires approximately 39 API calls. With careful timing (staggering bot starts by 2 seconds), these spread across the tick interval rather than firing simultaneously. Binance allows 1,200 request weight per minute for spot market data, so 39 requests per cycle is well within limits.
The Risk Hierarchy
The risk system operates at three levels, all evaluated synchronously within the tick loop before any order reaches the executor.
Per-bot risk runs five checks in sequence: stop-loss enforcement, maximum position size (25 percent of bot capital), drawdown circuit breaker (20 percent), daily loss limit (5 percent), and consecutive-loss cooldown (position sizing halves after 3 consecutive losses, with a 10 percent floor). These checks use only the individual bot's state: its own trades, its own equity, its own positions.
Portfolio risk runs three checks using aggregate state across all 45 bots: total exposure cap (50 percent of aggregate capital), single asset concentration (25 percent), and portfolio drawdown halt (15 percent). These checks query all open positions across all bots to calculate current exposure.
The correlation sizer adjusts position size based on portfolio overlap, scaling positions between 25 percent and 100 percent of their intended size depending on how correlated the new position is with existing exposure.
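The thresholds in the per-bot and portfolio gates can be sketched as pure functions. The percentages come from the article; the state shapes and function names are illustrative (the stop-loss, cooldown, and correlation checks are omitted for brevity):

```python
from dataclasses import dataclass

@dataclass
class BotState:
    capital: float      # this bot's allocated capital
    equity: float       # current equity
    equity_peak: float  # high-water mark for drawdown
    daily_pnl: float    # realized PnL today

def per_bot_allows(bot: BotState, order_value: float) -> bool:
    if order_value > 0.25 * bot.capital:     # max position size: 25%
        return False
    if bot.equity < 0.80 * bot.equity_peak:  # drawdown circuit breaker: 20%
        return False
    if bot.daily_pnl < -0.05 * bot.capital:  # daily loss limit: 5%
        return False
    return True

def portfolio_allows(total_capital: float, total_exposure: float,
                     asset_exposure: float, order_value: float) -> bool:
    if total_exposure + order_value > 0.50 * total_capital:  # exposure cap: 50%
        return False
    if asset_exposure + order_value > 0.25 * total_capital:  # concentration: 25%
        return False
    return True

bot = BotState(capital=1_000, equity=950, equity_peak=1_000, daily_pnl=-20)
print(per_bot_allows(bot, 200))                       # True
print(portfolio_allows(45_000, 20_000, 11_200, 200))  # False: concentration breach
```

An order must clear both functions (and the checks omitted here) before the correlation sizer scales it and the paper trader executes it.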
Scheduling 45 Jobs
APScheduler manages the timing for all 45 bot ticks. Each bot is registered as an interval trigger job with the period matching its strategy's timeframe. The scheduler handles time drift, missed ticks (if a previous tick ran long), and graceful shutdown.
Bots are staggered at startup with 2-second intervals to prevent thundering herd effects on the exchange API. The first bot starts immediately, the second 2 seconds later, the third 4 seconds later, and so on; the 45th bot starts 88 seconds in, so full startup takes just under 90 seconds. After initial startup, the bots naturally desynchronize further as tick execution times vary slightly.
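The stagger schedule is just an arithmetic offset per bot; a hypothetical helper makes the arithmetic explicit:

```python
def startup_offsets(n_bots: int, stagger_s: float = 2.0) -> list[float]:
    # Bot i starts i * stagger_s seconds after process launch;
    # for 45 bots the last one starts at 88 s.
    return [i * stagger_s for i in range(n_bots)]

offsets = startup_offsets(45)
print(offsets[:3], offsets[-1])  # [0.0, 2.0, 4.0] 88.0
```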
The scheduler runs within the same async event loop as the FastAPI server. This means bot ticks and API request handling share the same event loop but do not block each other because both are async. A long-running API query (fetching backtest results with pagination) does not delay bot execution, and a bot tick (fetching candles and running strategy analysis) does not delay API responses.
Memory and CPU Profile
The full system with 45 active bots, the API server, the React dev server, and Ollama running Llama 3.1 at 8 billion parameters uses approximately 4 GB of RAM and minimal CPU during normal operation. Bot ticks are short (typically under 2 seconds each) and CPU-intensive only during strategy analysis. Between ticks, the process is mostly idle, waiting for the next scheduled execution.
The memory profile is dominated by cached candle data (DataFrames for each symbol-timeframe combination) and the Ollama model weights. The bot instances themselves are lightweight: each holds a reference to its strategy class, its configuration, and a connection to the shared database and exchange client.
During backtest jobs (which run in separate ProcessPoolExecutor workers), CPU usage spikes significantly. This is why we offload large backtest jobs to distributed workers rather than running them on the same machine as the live bots. The live bot process needs consistent, low-latency access to the exchange API, which is impaired when CPU-intensive backtests compete for resources.
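Offloading to a worker pool from the async side follows a standard pattern; this sketch uses a placeholder workload in place of a real backtest, and the parameter dict is illustrative:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def run_backtest(params: dict) -> dict:
    # Stand-in for a CPU-heavy backtest; in production this runs in a
    # separate worker process so it never blocks the live bots' event loop.
    score = sum(i * i for i in range(100_000))
    return {"params": params, "score": score}

async def main() -> dict:
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=2) as pool:
        # The event loop stays free for bot ticks and API requests
        # while the worker process does the heavy lifting.
        return await loop.run_in_executor(pool, run_backtest, {"symbol": "SOL/USDT"})

if __name__ == "__main__":
    result = asyncio.run(main())
    print(result["params"]["symbol"])  # SOL/USDT
```

run_in_executor returns an awaitable future, so from the bot engine's perspective a backtest job looks like any other async call, just one whose CPU cost lands on a different core (or, as noted above, a different machine).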
What We Would Change
If we were rebuilding from scratch, the main change would be separating the bot engine from the API server into distinct processes communicating via the database and a message queue. The current single-process design is simpler and performs well, but it means an API server restart also restarts all bots. With process separation, bots could continue running during API deployments.
The SQLite WAL approach works well up to our current scale but would need revisiting above approximately 100 bots or if we added real-time order book streaming. At that scale, the single-writer limitation of SQLite becomes a bottleneck during concurrent position updates. PostgreSQL with connection pooling would be the natural next step.
For now, 45 bots on a single Mac is well within the architecture's capacity. The system runs continuously, handles exchange API errors gracefully, recovers from crashes automatically, and provides real-time monitoring through the dashboard. The engineering is not glamorous, but it is reliable, and reliability is the first requirement for any system managing capital.