Building a Crypto Trading Bot from Scratch - 5: Building a Reliable Data Pipeline
So, yes, now we’ve reached the most crucial step, where one poor decision can cause serious failures deep inside your bot. After getting the exchange adapter working, I faced a new challenge: data management. How do you store thousands of candles efficiently? How do you avoid fetching the same data repeatedly? How do you cache real-time data without eating up all your memory?
These might seem like boring infrastructure questions, but as I said, they’re actually critical. A trading bot that can’t access data quickly and reliably is worse than no bot at all. And trust me, I learned this lesson the hard way.
The Problem: Data Is Everywhere
When your bot is running, data flows from multiple sources:
Exchange API: Real-time price updates, order status, balance checks
Local storage: Historical data you’ve downloaded for backtesting
In-memory cache: Recent candles that strategies need frequently
Signal and trade logs: Everything your bot has done
Without a centralized data management layer, you end up with every strategy calling the exchange API directly, reading files on its own, and maintaining its own cache. It’s a mess. It’s slow. And you hit rate limits constantly.
The solution? A DataManager that sits between everything and provides a unified interface for all data access.
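To make the "single entry point" idea concrete, here is a minimal sketch of how a shared layer cuts exchange traffic and keeps memory bounded. Everything in it is hypothetical and not taken from the bot's code: `Candle`, `FakeExchange` (a stub that only counts calls), `DataHub`, `get_recent_candles`, and the `max_candles=500` cap. The memory bound comes from `collections.deque(maxlen=...)`, which silently discards the oldest entries.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Tuple


@dataclass(frozen=True)
class Candle:
    timestamp: int  # candle open time in milliseconds (assumed convention)
    open: float
    high: float
    low: float
    close: float
    volume: float


class FakeExchange:
    """Hypothetical stand-in for the exchange adapter that counts API calls."""

    def __init__(self) -> None:
        self.calls = 0

    def fetch_candles(self, symbol: str, timeframe: str, limit: int) -> List[Candle]:
        self.calls += 1
        # Flat dummy candles one minute apart, for illustration only
        return [Candle(i * 60_000, 100.0, 101.0, 99.0, 100.5, 1.0) for i in range(limit)]


class DataHub:
    """Hypothetical single entry point that all strategies ask for recent candles."""

    def __init__(self, exchange: FakeExchange, max_candles: int = 500) -> None:
        self.exchange = exchange
        self.max_candles = max_candles
        # deque(maxlen=...) drops the oldest candles automatically, so memory
        # per (symbol, timeframe) key never exceeds max_candles entries
        self._cache: Dict[Tuple[str, str], Deque[Candle]] = {}

    def get_recent_candles(self, symbol: str, timeframe: str, limit: int) -> List[Candle]:
        limit = min(limit, self.max_candles)
        key = (symbol, timeframe)
        cached = self._cache.get(key)
        if cached is None or len(cached) < limit:
            # Cache miss (or not enough history): one exchange call refills the cache
            fresh = self.exchange.fetch_candles(symbol, timeframe, limit)
            cached = deque(fresh, maxlen=self.max_candles)
            self._cache[key] = cached
        # Slice explicitly so limit=0 returns an empty list
        return list(cached)[max(len(cached) - limit, 0):]


exchange = FakeExchange()
hub = DataHub(exchange)
hub.get_recent_candles("BTC/USDT", "1m", 100)  # first request goes to the exchange
hub.get_recent_candles("BTC/USDT", "1m", 50)   # a second strategy is served from memory
print(exchange.calls)  # 1
```

This sketch never refreshes a cached series; a live bot would also append newly closed candles (the deque keeps that append cheap and bounded) or expire entries after some time.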
The DataManager: Your Data Hub
Here’s the core idea: every component that needs data talks to the DataManager, not to the exchange or storage directly. The DataManager handles:
Fetching data from exchanges
Loading data from local storage
Caching recent data in memory
Validating data integrity
Saving signals and trades
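"Validating data integrity" is the responsibility in that list that is easiest to leave vague. One possible reading, sketched below as an assumption rather than the bot's actual rules, is to reject candles whose OHLC values contradict each other, negative volume, duplicate or out-of-order timestamps, and gaps in the series. `validate_candles` and the millisecond `interval_ms` convention are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candle:
    timestamp: int  # open time in milliseconds (assumed convention)
    open: float
    high: float
    low: float
    close: float
    volume: float


def validate_candles(candles: List[Candle], interval_ms: int) -> List[str]:
    """Return human-readable problems; an empty list means the series looks sane."""
    problems: List[str] = []
    for i, c in enumerate(candles):
        # The high must be at or above open/close and the low at or below them
        if c.high < max(c.open, c.close) or c.low > min(c.open, c.close):
            problems.append(f"candle {i}: OHLC values are inconsistent")
        if c.volume < 0:
            problems.append(f"candle {i}: negative volume")
        if i > 0:
            gap = c.timestamp - candles[i - 1].timestamp
            if gap <= 0:
                problems.append(f"candle {i}: duplicate or out-of-order timestamp")
            elif gap != interval_ms:
                # A larger step means candles are missing from the series
                problems.append(f"candle {i}: expected a {interval_ms} ms step, got {gap} ms")
    return problems


good = [Candle(0, 10.0, 11.0, 9.0, 10.5, 1.0), Candle(60_000, 10.5, 12.0, 10.0, 11.0, 2.0)]
print(validate_candles(good, interval_ms=60_000))  # []

# Third candle has high < close and arrives two minutes after the previous one
bad = good + [Candle(180_000, 11.0, 10.0, 12.0, 11.0, 1.0)]
print(validate_candles(bad, interval_ms=60_000))
```

Returning a list of problems instead of raising on the first one makes it easy to log everything wrong with a downloaded batch before deciding whether to repair or refetch it.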
Here’s the basic structure:
```python
from __future__ import annotations  # lets the annotations below reference project types lazily

from typing import Dict, List, Optional

# DataStore, ExchangeBase and Candle come from earlier parts of this series


class DataManager:
    """
    Centralized data access layer for the trading bot.

    Responsibilities:
    - Fetch data from exchange and cache locally
    - Provide unified interface for historical and real-time data
    - Manage data storage (CSV, database, etc.)
    - Validate and clean data before use
    """

    def __init__(self, data_store: DataStore, exchange: Optional[ExchangeBase] = None):
        self.data_store = data_store
        self.exchange = exchange
        self._cache: Dict[str, List[Candle]] = {}  # In-memory cache
```
Notice it takes two dependencies: a DataStore (for persistence) and an ExchangeBase (for live data). This separation is key—the DataManager doesn’t care if data comes from CSV files or a database, and it doesn’t care if you’re using Binance or Kraken.
Smart Data Fetching: Check Local First
The most important optimization in the DataManager is this: always try local storage before hitting the exchange.
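Before the real method, here is a self-contained sketch of the "local first" rule under simple assumptions: timestamps are in milliseconds, `start_ms` is aligned to the candle interval, and `LocalStore`, `FakeExchange`, and `load_or_fetch_candles` are hypothetical stand-ins rather than the bot’s actual classes. Local storage is consulted first, only the missing candles trigger an exchange request, and whatever is fetched gets saved so the next run finds it locally.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Candle:
    timestamp: int  # open time in milliseconds (assumed convention)
    open: float
    high: float
    low: float
    close: float
    volume: float


class LocalStore:
    """Hypothetical local storage keyed by (symbol, timeframe), then by timestamp."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], Dict[int, Candle]] = {}

    def load(self, symbol: str, timeframe: str) -> Dict[int, Candle]:
        return dict(self._data.get((symbol, timeframe), {}))

    def save(self, symbol: str, timeframe: str, candles: List[Candle]) -> None:
        self._data.setdefault((symbol, timeframe), {}).update({c.timestamp: c for c in candles})


class FakeExchange:
    """Hypothetical exchange stub that records how often, and for what range, it is called."""

    def __init__(self, interval_ms: int) -> None:
        self.interval_ms = interval_ms
        self.calls = 0
        self.last_range: Tuple[int, int] = (0, 0)

    def fetch(self, symbol: str, timeframe: str, start_ms: int, end_ms: int) -> List[Candle]:
        self.calls += 1
        self.last_range = (start_ms, end_ms)
        return [Candle(t, 100.0, 101.0, 99.0, 100.5, 1.0)
                for t in range(start_ms, end_ms + 1, self.interval_ms)]


def load_or_fetch_candles(store: LocalStore, exchange: FakeExchange, symbol: str,
                          timeframe: str, start_ms: int, end_ms: int,
                          interval_ms: int) -> List[Candle]:
    """Return candles in [start_ms, end_ms], calling the exchange only for what is missing."""
    local = store.load(symbol, timeframe)
    expected = range(start_ms, end_ms + 1, interval_ms)
    missing = [t for t in expected if t not in local]
    if missing:
        # One request spanning the first to the last missing candle
        fetched = exchange.fetch(symbol, timeframe, missing[0], missing[-1])
        store.save(symbol, timeframe, fetched)  # persist so the next call finds it locally
        local.update({c.timestamp: c for c in fetched})
    return [local[t] for t in expected if t in local]


store, exchange = LocalStore(), FakeExchange(interval_ms=60_000)
load_or_fetch_candles(store, exchange, "BTC/USDT", "1m", 0, 540_000, 60_000)    # 10 candles, fetched
load_or_fetch_candles(store, exchange, "BTC/USDT", "1m", 0, 540_000, 60_000)    # same range, local only
load_or_fetch_candles(store, exchange, "BTC/USDT", "1m", 0, 1_140_000, 60_000)  # only the new tail is fetched
print(exchange.calls, exchange.last_range)  # 2 (600000, 1140000)
```

Fetching one span from the first to the last missing candle keeps the request count low, at the cost of re-downloading candles that already sit inside a scattered gap; splitting the missing timestamps into contiguous runs would avoid that.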
Here’s how I implemented fetch_historical_data:


