Rosa Del Mar

Daily Brief

Issue 58 2026-02-27

Systematization Path: Strip Discretion, Backtest, Then Forward-Test With Reconciliation

General
Sources: 1 • Confidence: Medium • Updated: 2026-04-11 18:21

Key takeaways

  • Mabe reports that his early trading community often frowned on backtesting, both because backtests were believed not to reflect reality and because traders believed intuition could not be modeled.
  • Mabe asserts that even with fully automated execution, trading remains emotionally difficult because the discretionary pressure shifts to how to respond to drawdowns.
  • Mabe says that for short-selling strategies he prefers a 'pristine' backtest that includes commissions but does not explicitly model slippage or locate costs, treating those as post-backtest degradations.
  • Mabe describes his core traded setup as a gapping-stock breakout that enters on a breakout from a narrowing post-open range and places a stop on the opposite side of that tightening range, with an intraday holding period.
  • Mabe reports that, in his backtests, taking partial profits and moving stops to breakeven materially reduces performance versus holding full size to the strategy's natural exit.

Sections

Systematization Path: Strip Discretion, Backtest, Then Forward-Test With Reconciliation

  • Mabe reports that his early trading community often frowned on backtesting, both because backtests were believed not to reflect reality and because traders believed intuition could not be modeled.
  • Mabe asserts that the work of validating a strategy effectively begins after the backtest, because live trading (even at tiny size) reveals important differences between simulation and reality.
  • Mabe says his first attempt to backtest his discretionary process produced better results than his manual trading, changing his view of his discretion.
  • Mabe says live automated results were close to but not identical to backtest results because backtests assume fills that are not always achievable in real markets.
  • Mabe says he created an automated reconciliation loop by logging trades (including slippage) to an online journal and generating daily reports of missed backtest trades to diagnose and improve capture.
  • Mabe says converting a discretionary approach into a backtest can start by stripping discretion and applying the underlying rules as a purely systematic strategy.
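
The reconciliation loop described above can be sketched in a few lines. This is a minimal illustration, not Mabe's actual tooling: the `Trade` record, the `reconcile` function, the ticker symbols, and the assumption of at most one trade per symbol per day are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Trade:
    """Hypothetical trade record; the field names are assumptions."""
    day: date
    symbol: str
    entry_price: float


def reconcile(backtest, live, side=1):
    """Compare one day's backtest trades with live fills.

    Assumes at most one trade per (day, symbol).
    side=+1 for long entries, -1 for short entries, so that a positive
    slippage value always means a worse fill than the backtest assumed.
    """
    live_by_key = {(t.day, t.symbol): t for t in live}
    backtest_keys = {(t.day, t.symbol) for t in backtest}
    missed = []    # trades the backtest took but live trading did not
    slippage = {}  # per-trade entry price difference, live vs. backtest
    for bt in backtest:
        key = (bt.day, bt.symbol)
        fill = live_by_key.get(key)
        if fill is None:
            missed.append(key)
        else:
            slippage[key] = side * (fill.entry_price - bt.entry_price)
    # Live trades with no backtest counterpart point to a logic or data divergence.
    unexpected = [k for k in live_by_key if k not in backtest_keys]
    return missed, slippage, unexpected


day = date(2024, 1, 2)
backtest = [Trade(day, "AAA", 10.00), Trade(day, "BBB", 25.00)]
live = [Trade(day, "AAA", 10.05)]
missed, slippage, unexpected = reconcile(backtest, live)
print("missed:", missed)
print("slippage:", slippage)
print("unexpected:", unexpected)
```

Run daily, a report like this separates two different problems: trades that were never captured, and trades that were captured at a worse price than the simulation assumed.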

Automation Shifts The Human Bottleneck To Governance And Drawdown Decisions

  • Mabe asserts that even with fully automated execution, trading remains emotionally difficult because the discretionary pressure shifts to how to respond to drawdowns.
  • A speaker in the episode asserts that predefining drawdown thresholds and actions reduces emotional decision-making when a system enters a drawdown.
  • Mabe says he increased automation in stages rather than enabling fully automated execution immediately: first sizing, then exit orders, then computer-generated entry orders that he transmitted manually.
  • Mabe asserts that scaling becomes harder as trade size increases because the larger numbers change psychology and can reintroduce errors even with automation.
  • Mabe recommends approaching automation as an additive side project rather than trying to immediately replace a working discretionary approach, because full cutovers create excessive pressure and take longer than expected.
  • Mabe claims the only two ways to build confidence in a trading system are long-term repetition of live trading and backtesting, with backtesting acting as a shortcut to the confidence needed to scale size.
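
One way to predefine drawdown thresholds and actions is to write them down as a lookup table that is consulted mechanically. The sketch below is only an illustration: the thresholds (10R and 20R), the action names, and the choice to measure drawdown in R from the equity peak are assumptions, not values given in the episode.

```python
from itertools import accumulate

# Hypothetical policy, listed from the deepest threshold to the shallowest.
# Each entry is (drawdown in R from the equity peak, action to take).
DRAWDOWN_POLICY = [
    (20.0, "halt_trading"),
    (10.0, "halve_risk"),
]


def current_drawdown_r(trade_results_r):
    """Drawdown from the equity peak, in R, given per-trade results in R."""
    equity = [0.0, *accumulate(trade_results_r)]  # cumulative R, starting flat
    return max(equity) - equity[-1]


def action_for(drawdown_r, policy=DRAWDOWN_POLICY):
    """Return the first action whose threshold the drawdown has reached."""
    for threshold, action in policy:
        if drawdown_r >= threshold:
            return action
    return "trade_normally"


results = [3.0, 2.0, -4.0, -4.0, -3.0]  # made-up sequence of trade results in R
drawdown = current_drawdown_r(results)
print(drawdown, action_for(drawdown))
```

Because the table is fixed before the drawdown begins, the decision made under stress reduces to reading off a row rather than improvising.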

Backtest Realism Limits: Fills, Slippage Modeling Tradeoffs, And Stop-Related Bias

  • Mabe says that for short-selling strategies he prefers a 'pristine' backtest that includes commissions but does not explicitly model slippage or locate costs, treating those as post-backtest degradations.
  • Mabe asserts that very tight stops can create overly optimistic backtests due to bar-resolution and entry-bar assumptions about whether a stop could be hit immediately after entry.
  • Mabe says live automated results were close to but not identical to backtest results because backtests assume fills that are not always achievable in real markets.
  • Mabe asserts that trying to model slippage and real-world execution perfectly inside a backtest is generally futile.
  • Mabe says tick-by-tick backtesting can address some precision issues but is costly and resource-intensive, so it requires cost-benefit judgment.
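
The tight-stop bias can be made concrete with a check on the entry bar. In the sketch below, which assumes a long breakout and simple OHLC bars, an entry bar is flagged when its range covers both the entry price and the stop price, because bar data alone cannot show whether the low came before or after the entry. The `Bar` type, the function name, and the prices are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bar:
    """Hypothetical OHLC bar."""
    open: float
    high: float
    low: float
    close: float


def entry_bar_is_ambiguous(bar, entry, stop):
    """Long breakout: True if the bar triggers the entry and also trades at or
    below the stop, so the intrabar order of events is unknown."""
    triggered = bar.high >= entry
    stop_in_range = bar.low <= stop
    return triggered and stop_in_range


bar = Bar(open=10.15, high=10.30, low=10.05, close=10.25)
tight = entry_bar_is_ambiguous(bar, entry=10.20, stop=10.10)
wide = entry_bar_is_ambiguous(bar, entry=10.20, stop=10.00)
print("tight stop ambiguous:", tight)
print("wide stop ambiguous:", wide)
```

A backtest that assumes the stop cannot be hit on the entry bar scores every flagged bar as a surviving trade, which is where the optimism comes from. Counting flagged bars is one inexpensive way to judge whether finer-grained data is worth its cost for a given strategy.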

Risk Framework: R-Multiples, Fixed-Dollar Risk Sizing, And Stop-Distance-Driven Position Size

  • Mabe describes his core traded setup as a gapping-stock breakout that enters on a breakout from a narrowing post-open range and places a stop on the opposite side of that tightening range, with an intraday holding period.
  • Mabe asserts that tightening ranges enable larger share size for the same fixed dollar risk because the stop distance is smaller.
  • Mabe says he evaluates performance using expectancy and R-multiples, and that he treated adopting these measures as a non-negotiable prerequisite before making his first day trade.
  • Mabe says his process sets stop distance from the setup and sizes positions by risking a fixed dollar amount per trade, which he increased gradually as confidence grew.
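
The arithmetic behind these points is short enough to write out. The sketch below uses the standard definitions (shares = dollar risk / stop distance; R-multiple = profit or loss divided by initial risk; expectancy = mean R-multiple). The function names, the $100 risk figure, and the prices are illustrative assumptions, and commissions and slippage are ignored.

```python
import math


def position_size(risk_dollars, entry, stop):
    """Whole shares such that a stop-out loses about risk_dollars."""
    stop_distance = abs(entry - stop)
    if stop_distance == 0:
        raise ValueError("entry and stop must differ")
    return math.floor(risk_dollars / stop_distance)


def r_multiple(entry, exit_price, stop, side=1):
    """Trade result in units of initial risk (side=+1 long, -1 short)."""
    return side * (exit_price - entry) / abs(entry - stop)


def expectancy(r_multiples):
    """Average result per trade, in R."""
    return sum(r_multiples) / len(r_multiples)


# Same $100 risk, two stop distances: the tighter range allows more shares.
print(position_size(100, entry=20.00, stop=19.50))
print(position_size(100, entry=20.00, stop=19.75))
print(r_multiple(entry=20.00, exit_price=21.00, stop=19.50))
print(expectancy([2.0, -1.0, -1.0, 3.0]))
```

Halving the stop distance doubles the share count while the dollar loss at the stop stays the same, which is the effect of a tightening range on size.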

Trade Management Claims: Partial Exits And Stops Often Reduce Backtested Performance

  • Mabe reports that, in his backtests, taking partial profits and moving stops to breakeven materially reduces performance versus holding full size to the strategy's natural exit.
  • Mabe reports that, in his backtests, stops generally worsen strategy performance, but he also states that stops remain necessary for practical live-trading risk control.
  • Mabe recommends a backtest sanity check: the core strategy should still work without stops or targets before adding them.
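
To see how the two management styles diverge on a single trade, the sketch below scores one long trade under each rule. It does not reproduce Mabe's backtests: the price paths are made up, fills are assumed to occur exactly at the stop and target prices, and the partial rule (sell half at +1R, then move the stop on the rest to the entry price) is one hypothetical variant among many.

```python
def hold_full(path, entry, stop):
    """Full size until the stop is hit or the path ends (the natural exit)."""
    risk = entry - stop
    for price in path:
        if price <= stop:
            return -1.0
    return (path[-1] - entry) / risk


def partial_and_breakeven(path, entry, stop, target_r=1.0):
    """Sell half at target_r, then move the stop on the rest to the entry price."""
    risk = entry - stop
    target = entry + target_r * risk
    scaled = False
    for price in path:
        if not scaled:
            if price <= stop:
                return -1.0
            if price >= target:
                scaled = True
        elif price <= entry:
            # Remaining half exits at breakeven and contributes 0R.
            return 0.5 * target_r
    if scaled:
        return 0.5 * target_r + 0.5 * (path[-1] - entry) / risk
    return (path[-1] - entry) / risk


entry, stop = 10.0, 9.5
runner = [10.2, 10.5, 10.0, 10.8, 11.5]  # pulls back to entry, then trends
fader = [10.5, 10.0, 9.4]                # reaches +1R, then fails
for name, path in [("runner", runner), ("fader", fader)]:
    print(name, hold_full(path, entry, stop), partial_and_breakeven(path, entry, stop))
```

On the first path the partial rule gives up most of the trend, and on the second it rescues a loser. Which effect dominates depends on the mix of paths a strategy actually produces, so only a backtest across many trades can answer the question for a given strategy.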

Unknowns

  • What were the actual pre- and post-automation performance statistics (returns, drawdowns, volatility, and risk-adjusted measures) supporting the reported fivefold profit increase?
  • What exact backtest assumptions were used for fills, latency, spreads, and commissions, and how sensitive were results to those assumptions?
  • How were universes selected (e.g., which gappers, which liquidity thresholds), and were survivorship and corporate action handling addressed in the backtests?
  • What were the concrete, codified filters (the trade-skipping rules) that Mabe found valuable, and how stable were they out-of-sample?
  • What were the operational controls for automation failures (kill-switch logic, max-loss limits, order validation, and monitoring), and how often were they triggered?

Investor overlay

Read-throughs

  • Greater emphasis on automation governance and reconciliation implies demand for tooling that compares live fills against backtest expectations and surfaces missed trades, logic divergences, and data issues as systems scale from small size to production.
  • Acknowledged limits of bar-based backtests and stop-related bias suggest incremental adoption of higher-fidelity simulation and tick-level data to validate tight-stop strategies, balanced against engineering and cost constraints.
  • Risk-first sizing using fixed-dollar risk and R-multiples points to systematic risk controls becoming core product requirements in automated execution stacks, especially for intraday breakout systems where stop distance drives position size.

What would confirm

  • Teams report measurable reductions in live-versus-backtest divergence through reconciliation workflows, such as fewer missed trades, smaller fill slippage gaps, and quicker root-cause diagnosis of data, logic, and execution issues.
  • More disclosures or user behavior showing migration from bar-based backtests to tick-level or higher-fidelity simulation for intraday strategies, alongside increased spending on market data and compute to support it.
  • Standardization of risk budgets and R-based reporting in automated strategies, including position sizing derived from stop distance, and staged rollout rules for pausing or resizing during drawdowns.

What would kill

  • Live forward tests show persistent, unexplained gaps versus backtest results even after reconciliation, indicating that the assumed fill model and unmodeled costs like slippage or locates dominate realized performance.
  • High-fidelity simulations fail to materially change outcomes or still cannot address key realism issues such as latency, spread dynamics, and stop execution, reducing the value of additional data and compute investment.
  • Operational incidents in automation, such as frequent kill-switch triggers, order validation failures, or monitoring gaps, make governance overhead outweigh any performance or scalability benefits from systematization.

Sources