Quantitative Finance Insights

Building a Crypto Trading Bot from Scratch - 5: Building a Reliable Data Pipeline

QPY
Dec 10, 2025

Photo by Claudio Schwarz on Unsplash

So here we are at the most crucial step, where a poor decision can quietly lead your bot to big failures under the hood. After getting the exchange adapter working, I faced a new challenge: data management. How do you store thousands of candles efficiently? How do you avoid fetching the same data repeatedly? How do you cache real-time data without eating up all your memory?

These might seem like boring infrastructure questions, but as I said, they’re actually critical. A trading bot that can’t access data quickly and reliably is worse than no bot at all. And trust me, I learned this lesson the hard way.

The Problem: Data Is Everywhere

When your bot is running, data flows from multiple sources:

  • Exchange API: Real-time price updates, order status, balance checks

  • Local storage: Historical data you’ve downloaded for backtesting

  • In-memory cache: Recent candles that strategies need frequently

  • Signal and trade logs: Everything your bot has done

Without a centralized data management layer, you end up with strategies hitting the exchange API directly, reading files on their own, and each maintaining its own cache. It’s a mess. And it’s slow. And you hit rate limits constantly.

The solution? A DataManager that sits between everything and provides a unified interface for all data access.

The DataManager: Your Data Hub

Here’s the core idea: every component that needs data talks to the DataManager, not to the exchange or storage directly. The DataManager handles:

  • Fetching data from exchanges

  • Loading data from local storage

  • Caching recent data in memory

  • Validating data integrity

  • Saving signals and trades

Here’s the basic structure:

from typing import Dict, List, Optional

# DataStore, ExchangeBase, and Candle are the project's own types:
# the storage layer, the exchange adapter from the previous post, and the OHLCV model.

class DataManager:
    """
    Centralized data access layer for the trading bot.

    Responsibilities:
    - Fetch data from exchange and cache locally
    - Provide unified interface for historical and real-time data
    - Manage data storage (CSV, database, etc.)
    - Validate and clean data before use
    """

    def __init__(self, data_store: DataStore, exchange: Optional[ExchangeBase] = None):
        self.data_store = data_store
        self.exchange = exchange
        self._cache: Dict[str, List[Candle]] = {}  # In-memory cache of recent candles

Notice it takes two dependencies: a DataStore (for persistence) and an ExchangeBase (for live data). This separation is key—the DataManager doesn’t care if data comes from CSV files or a database, and it doesn’t care if you’re using Binance or Kraken.
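To make that separation concrete, here is a minimal sketch of what the two dependencies might look like. The exact fields and method names (load_candles, save_candles, fetch_ohlcv) are assumptions for illustration, not the project's actual definitions:

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candle:
    # One OHLCV bar; timestamp is the bar's open time in milliseconds.
    timestamp: int
    open: float
    high: float
    low: float
    close: float
    volume: float


class DataStore(ABC):
    # Persistence layer: could be backed by CSV files, SQLite, etc.
    @abstractmethod
    def load_candles(self, symbol: str, timeframe: str) -> List[Candle]: ...

    @abstractmethod
    def save_candles(self, symbol: str, timeframe: str, candles: List[Candle]) -> None: ...


class ExchangeBase(ABC):
    # Live-data layer: the exchange adapter from the previous post.
    @abstractmethod
    def fetch_ohlcv(self, symbol: str, timeframe: str, since: Optional[int] = None,
                    limit: int = 500) -> List[Candle]: ...

Because the DataManager only depends on these interfaces, swapping CSV files for a database, or Binance for Kraken, means writing a new implementation of one class rather than touching the strategies.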

Smart Data Fetching: Check Local First

The most important optimization in the DataManager is this: always try local storage before hitting the exchange.

Here’s how I implemented fetch_historical_data:
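In rough outline, the check-local-first flow looks like the sketch below. The method body and helper names follow the interface sketch above and are illustrative, not the exact implementation:

    # Continuing the DataManager class from above.
    def fetch_historical_data(self, symbol: str, timeframe: str, limit: int = 500) -> List[Candle]:
        """Return the most recent `limit` candles, preferring local data over the exchange."""
        key = f"{symbol}:{timeframe}"

        # 1. In-memory cache: cheapest, already validated.
        cached = self._cache.get(key)
        if cached is not None and len(cached) >= limit:
            return cached[-limit:]

        # 2. Local storage: avoids an API call if the data was already downloaded.
        stored = self.data_store.load_candles(symbol, timeframe)
        if len(stored) >= limit:
            self._cache[key] = stored
            return stored[-limit:]

        # 3. Exchange: last resort, counts against rate limits.
        if self.exchange is None:
            return stored  # Offline mode: return whatever exists locally.

        fresh = self.exchange.fetch_ohlcv(symbol, timeframe, limit=limit)
        self.data_store.save_candles(symbol, timeframe, fresh)  # Persist for next time
        self._cache[key] = fresh
        return fresh

Each tier is cheaper than the one below it, so under normal operation the exchange only gets hit once per symbol and timeframe.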
