Strategy Performance Metrics That Actually Matter
Most traders obsess over the wrong numbers. They screen-grab a backtest showing a 78% win rate and think they’ve found the holy grail. Six weeks of live trading later, they’re staring at a margin call wondering what happened.
The problem isn’t that metrics are useless. The problem is that most traders look at the wrong ones and ignore the ones that would have actually saved them. At FXVPS, we see thousands of algorithmic strategies running on our infrastructure. We know what separates the accounts that compound from the ones that blow up. It almost never comes down to win rate.
Beyond Win Rate
Win rate is the most overrated number in trading. Full stop.
A strategy that wins 30% of its trades can be wildly profitable if the average winner is four times the size of the average loser. A strategy that wins 90% can wipe you out if the 10% that lose are catastrophic (hello, every martingale system ever built).
Here’s a simple example:
- Strategy A: 35% win rate, average win $400, average loss $120
- Strategy B: 88% win rate, average win $50, average loss $380
Strategy A looks terrible on paper. Strategy B looks like a money printer. Let’s do the math. Over 100 trades:
- Strategy A: (35 x $400) - (65 x $120) = $14,000 - $7,800 = +$6,200
- Strategy B: (88 x $50) - (12 x $380) = $4,400 - $4,560 = -$160
The “bad” strategy makes money. The “good” strategy slowly bleeds out. This is why win rate by itself is a vanity metric. It tells you nothing about profitability without context.
Trend-following strategies routinely operate at 30-45% win rates and print money for decades. Mean-reversion strategies often run 60-75% but carry tail risk. What matters is the relationship between how often you win and how much you win.
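The 100-trade comparison for Strategies A and B takes only a few lines of Python. This is a minimal sketch that assumes every win and every loss equals its average. net_result is a hypothetical helper name, not a library function.

```python
def net_result(win_rate: float, avg_win: float, avg_loss: float, n_trades: int) -> float:
    """Net P&L over n_trades, assuming every win and loss equals its average."""
    wins = round(win_rate * n_trades)   # number of winning trades
    losses = n_trades - wins            # every other trade is a loser
    return wins * avg_win - losses * avg_loss


print(net_result(0.35, 400, 120, 100))  # 6200  (Strategy A)
print(net_result(0.88, 50, 380, 100))   # -160  (Strategy B)
```

Real results scatter around those averages, so treat this as arithmetic rather than a simulation.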
Expectancy: The Only Number That Truly Matters
If you could know only one metric about your strategy, it should be expectancy. Expectancy tells you how much you can expect to make (or lose) on every single trade, on average.
The formula:
Expectancy = (Win% x Average Win) - (Loss% x Average Loss)
Using Strategy A from above:
Expectancy = (0.35 x $400) - (0.65 x $120)
= $140 - $78
= $62 per trade
Every time Strategy A enters a trade, it expects to make $62 on average. Over 1,000 trades, that’s $62,000 in expected profit.
For Strategy B:
Expectancy = (0.88 x $50) - (0.12 x $380)
= $44 - $45.60
= -$1.60 per trade
Negative expectancy. No amount of position sizing or money management fixes this. You are paying the market to take your money.
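Here is the same calculation as a minimal Python sketch. The function names are hypothetical. The second helper estimates expectancy directly from a trade log by averaging closed-trade P&L. When every win and loss matches its average, that gives the same result as the formula.

```python
def expectancy(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """(Win% x Average Win) - (Loss% x Average Loss); avg_loss is a positive number."""
    loss_rate = 1.0 - win_rate
    return win_rate * avg_win - loss_rate * avg_loss


def expectancy_from_trades(pnls: list[float]) -> float:
    """Average P&L per closed trade, estimated directly from a trade log."""
    return sum(pnls) / len(pnls)


print(round(expectancy(0.35, 400, 120), 2))   # 62.0  (Strategy A)
print(round(expectancy(0.88, 50, 380), 2))    # -1.6  (Strategy B)

# A trade log matching Strategy A's averages gives the same answer.
strategy_a_log = [400.0] * 35 + [-120.0] * 65
print(expectancy_from_trades(strategy_a_log))  # 62.0
```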
What “good” looks like:
- Below $0: You don’t have a strategy, you have a donation program.
- $0 - $5 per trade: Marginal. Transaction costs and slippage will likely eat this alive in live trading.
- $5 - $30 per trade: Solid. This is where most professional systematic strategies live.
- $30+ per trade: Excellent, but verify your sample size. If this is based on 40 trades, it might be noise.
We often express expectancy as a multiple of risk (R-multiple). If you risk $100 per trade, an expectancy of 0.3R means you expect to make $30 per trade. Anything above 0.2R with sufficient sample size is worth running.
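To express expectancy in R, divide by the amount risked per trade. This minimal sketch assumes the same fixed dollar risk on every trade. The trade log is invented for illustration.

```python
def expectancy_in_r(pnls: list[float], risk_per_trade: float) -> float:
    """Average P&L per trade divided by the fixed amount risked per trade."""
    average_pnl = sum(pnls) / len(pnls)
    return average_pnl / risk_per_trade


# Hypothetical 10-trade log with $100 risked on each trade
trades = [250, -100, -100, 180, -100, 320, -100, -100, 150, -100]
print(expectancy_in_r(trades, 100))  # 0.3, i.e. $30 per trade at $100 risk
```

If position size varies, divide each trade by its own initial risk before averaging.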
Profit Factor
Profit factor is beautifully simple:
Profit Factor = Gross Profit / Gross Loss
If your strategy made $50,000 in winners and lost $30,000 in losers, your profit factor is 1.67.
How to interpret it:
- Below 1.0: You’re losing money. The strategy is net negative.
- 1.0 - 1.2: Barely profitable. Slippage, commissions, and spread widening in live trading will likely push this underwater.
- 1.2 - 1.5: Decent. Survivable with tight execution and low-latency infrastructure. This is where the value of a properly tuned VPS starts paying for itself in real dollars.
- 1.5 - 2.0: Strong. This is the sweet spot for most robust, long-running strategies.
- 2.0 - 3.0: Very strong. Make sure the sample size justifies it.
- Above 3.0: Be suspicious. Seriously.
A profit factor above 3.0 from a backtest almost always means one of three things: too few trades, a cherry-picked period, or overfitting. We’ve seen plenty of backtests showing profit factors of 5.0+. We’ve never seen one maintain that live for more than a few months.
The real power of profit factor is in monitoring live performance. If your strategy historically runs at 1.8 and it drops to 1.1 over the last 200 trades, something has changed. That’s an actionable signal.
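Both uses of profit factor fit in a short Python sketch: the headline number and the live-monitoring check. Returning infinity when there are no losing trades is a convention chosen here. The 200-trade window follows the example above. The 25% tolerated drop (max_drop) is a hypothetical setting, not a recommended value.

```python
def profit_factor(pnls: list[float]) -> float:
    """Gross profit divided by gross loss (losses taken as a positive amount)."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    if gross_loss == 0:
        return float("inf")  # no losing trades: the ratio is unbounded
    return gross_profit / gross_loss


def profit_factor_degraded(pnls: list[float], baseline_pf: float,
                           window: int = 200, max_drop: float = 0.25) -> bool:
    """True if the last `window` trades have a profit factor more than
    `max_drop` (as a fraction) below the historical baseline."""
    if len(pnls) < window:
        return False  # not enough recent trades to judge
    return profit_factor(pnls[-window:]) < baseline_pf * (1 - max_drop)


# Winners totalling $50,000 and losers totalling $30,000, as in the example
print(round(profit_factor([20_000, 30_000, -10_000, -20_000]), 2))  # 1.67

# Hypothetical history running at 1.8, followed by 200 trades running at 1.1
trades = [180, -100] * 300 + [110, -100] * 100
print(profit_factor_degraded(trades, baseline_pf=1.8))  # True
```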
Sharpe and Sortino Ratios
The Sharpe ratio measures risk-adjusted return. It answers: “For every unit of risk I’m taking, how much am I getting paid?”
Sharpe Ratio = (Average Return - Risk-Free Rate) / Standard Deviation of Returns
A Sharpe of 1.0 means you’re earning one unit of excess return (return above the risk-free rate) for each unit of volatility. A Sharpe of 2.0 means you’re earning two units per unit of volatility. Higher is better, and above 2.0 is excellent for any strategy running longer than a year.
The problem with Sharpe: it punishes upside volatility the same as downside volatility. If your strategy has occasional windfall winners, the Sharpe ratio penalizes you for it. Nobody complains about upside variance.
Enter the Sortino ratio:
Sortino Ratio = (Average Return - Risk-Free Rate) / Downside Deviation
Sortino only counts returns below a target (usually zero or the risk-free rate) when calculating volatility. It penalizes what actually hurts you (losses) and ignores what helps you (outsized winners).
Benchmark values for the Sortino ratio:
- Below 1.0: Mediocre risk-adjusted returns.
- 1.0 - 2.0: Good. Acceptable for most strategies.
- 2.0 - 3.0: Very good. You’re getting paid well for the risk.
- Above 3.0: Exceptional. Again, verify sample size.
Annualization matters. A Sharpe calculated from daily returns gets multiplied by sqrt(252) to annualize. Monthly returns use sqrt(12). Comparing a daily Sharpe to a monthly Sharpe without adjusting is comparing apples to chainsaws. Always confirm the basis.
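Both annualized ratios fit in a minimal Python sketch that uses only the standard library. It assumes that returns are simple per-period returns and that risk_free is the per-period rate, not the annual one. Downside deviation here is the root mean square of below-zero excess returns taken over all periods. That is one common convention. Other sources use a different target or divisor, so check the basis before comparing numbers.

```python
import math
import statistics


def sharpe_ratio(returns: list[float], risk_free: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualized Sharpe: mean excess return / standard deviation of returns."""
    excess = [r - risk_free for r in returns]
    per_period = statistics.mean(excess) / statistics.stdev(excess)
    return per_period * math.sqrt(periods_per_year)


def sortino_ratio(returns: list[float], risk_free: float = 0.0,
                  periods_per_year: int = 252) -> float:
    """Annualized Sortino: mean excess return / downside deviation."""
    excess = [r - risk_free for r in returns]
    # Only below-target periods contribute; above-target periods count as zero.
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in excess) / len(excess))
    if downside == 0:
        return float("inf")  # no losing periods in the sample
    per_period = statistics.mean(excess) / downside
    return per_period * math.sqrt(periods_per_year)


monthly = [0.01, -0.005, 0.02, -0.01, 0.015]   # hypothetical monthly returns
print(sharpe_ratio(monthly, periods_per_year=12))
print(sortino_ratio(monthly, periods_per_year=12))
```

Passing periods_per_year=252 for daily data or 12 for monthly data applies the sqrt(252) versus sqrt(12) adjustment.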
Maximum Drawdown and Recovery
Maximum drawdown (MDD) is the largest peak-to-trough decline in your equity curve, expressed as a percentage.
Max Drawdown % = (Peak Equity - Trough Equity) / Peak Equity x 100
If your account grew to $50,000 and then dropped to $38,000 before recovering, your max drawdown is 24%.
This number determines whether you’ll psychologically survive your strategy. Most traders can stomach a 15% drawdown. Very few can sit through 40% without intervening. Know your drawdown tolerance before you go live, not during.
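Finding the deepest decline from a running peak takes one pass over the equity curve. This is a minimal sketch. The equity values are hypothetical, chosen to reproduce the $50,000 to $38,000 example.

```python
def max_drawdown_pct(equity: list[float]) -> float:
    """Largest peak-to-trough decline, as a percentage of the preceding peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                  # highest equity seen so far
        drawdown = (peak - value) * 100 / peak   # % below that peak right now
        worst = max(worst, drawdown)
    return worst


curve = [30_000, 50_000, 44_000, 38_000, 47_000, 55_000]
print(max_drawdown_pct(curve))  # 24.0
```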
But depth isn’t everything. Duration matters just as much, sometimes more.
A 20% drawdown that recovers in two weeks is annoying. A 12% drawdown that lasts nine months is soul-crushing. The longer you sit in drawdown, the more likely you are to abandon the strategy right before it recovers.
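Duration can be measured on the same equity curve by counting consecutive periods below the previous high. This is a minimal sketch. The unit is whatever spacing the curve uses (days, weeks). A drawdown still open at the end of the data is counted as it stands.

```python
def max_drawdown_duration(equity: list[float]) -> int:
    """Longest run of consecutive periods spent below a previous equity peak."""
    peak = equity[0]
    longest = current = 0
    for value in equity:
        if value >= peak:
            peak = value
            current = 0          # back at (or above) the high: drawdown is over
        else:
            current += 1         # still underwater
            longest = max(longest, current)
    return longest


# Hypothetical daily equity: one 3-day drawdown, then a 1-day dip
daily = [100, 120, 110, 105, 115, 125, 124, 130]
print(max_drawdown_duration(daily))  # 3
```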
Recovery factor ties it together:
Recovery Factor = Net Profit / Maximum Drawdown
If your strategy made $30,000 with a max drawdown of $10,000, your recovery factor is 3.0: three dollars of gain for every dollar of pain. Above 3.0 over a meaningful sample is strong. Below 1.0, the returns barely justify the drawdown.
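Recovery factor needs the drawdown in currency rather than percent. This minimal sketch assumes net profit is final equity minus starting equity. The hypothetical curve reproduces the $30,000 / $10,000 example.

```python
def recovery_factor(equity: list[float]) -> float:
    """Net profit divided by the largest peak-to-trough drop, both in currency."""
    net_profit = equity[-1] - equity[0]
    peak = equity[0]
    max_dd = 0.0
    for value in equity:
        peak = max(peak, value)
        max_dd = max(max_dd, peak - value)   # deepest drop so far, in dollars
    if max_dd == 0:
        return float("inf")  # equity never fell below a prior peak
    return net_profit / max_dd


curve = [100_000, 115_000, 105_000, 130_000]
print(recovery_factor(curve))  # 3.0
```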
When evaluating a strategy, ask three questions about drawdown:
- How deep does it go? (Max drawdown %)
- How long does it last? (Max drawdown duration in days/weeks)
- How quickly does it recover? (Recovery factor)
If the answer to any of those makes you uncomfortable, size down. No strategy is worth running at a size that makes you override it.
Trade Frequency and Sample Size
Here’s an uncomfortable truth: a backtest with 50 trades is statistically meaningless.
With 50 observations, the confidence interval around any estimate is enormous. You could be looking at pure luck and not know it.
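To see how wide that interval is, you can build a rough 95% confidence interval for expectancy using the normal approximation: mean plus or minus 1.96 x s / sqrt(n). This minimal sketch rests on three assumptions. Trades are independent. The sample mean is roughly normal. The trades are simulated from Strategy A’s averages (35% winners of $400, 65% losers of $120). The output illustrates the shape of the problem, not any real system.

```python
import math
import random
import statistics


def expectancy_ci(pnls: list[float], z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for the mean P&L per trade.

    Normal approximation: mean +/- z * s / sqrt(n). Assumes independent trades."""
    n = len(pnls)
    mean = statistics.mean(pnls)
    half_width = z * statistics.stdev(pnls) / math.sqrt(n)
    return mean - half_width, mean + half_width


# Simulated trades with Strategy A's averages (true expectancy: $62 per trade)
random.seed(1)
population = [400 if random.random() < 0.35 else -120 for _ in range(1_000)]
for n in (50, 300, 1_000):
    low, high = expectancy_ci(population[:n])
    print(f"{n:>5} trades: {low:,.0f} to {high:,.0f} dollars per trade")
```

The interval shrinks with sqrt(n), so quadrupling the number of trades only halves its width.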
Rough guidelines for statistical confidence:
- Under 100 trades: Don’t draw conclusions. Treat it as preliminary.
- 100 - 300 trades: Starting to be useful. Wide confidence intervals, but patterns may be real.
- 300 - 1,000 trades: Solid sample. You can start to trust the metrics.
- Above 1,000 trades: Strong statistical basis. If the metrics hold here, you likely have something real.
This matters for strategy design. A strategy that trades once a week gives you 52 trades per year, so it takes almost six years to reach 300. A strategy that trades five times a day gets there in about three months (roughly 60 trading days). More trades means faster validation and more statistical confidence.
This is also why higher-frequency strategies benefit most from low-latency VPS hosting. When you’re taking five or ten trades a day, execution delay compounds across thousands of trades per year. A $2 slippage difference per trade across 1,200 annual trades is $2,400 of drag.
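The arithmetic in the last two paragraphs is a few divisions and one multiplication, sketched below. The helper names are hypothetical, and the inputs are the figures used in the text.

```python
def years_to_sample(target_trades: int, trades_per_year: float) -> float:
    """Years of live trading needed to accumulate target_trades."""
    return target_trades / trades_per_year


def trading_days_to_sample(target_trades: int, trades_per_day: float) -> float:
    """Trading days needed at a given daily trade rate."""
    return target_trades / trades_per_day


def annual_slippage_drag(slippage_per_trade: float, trades_per_year: int) -> float:
    """Total yearly cost of a fixed per-trade slippage difference."""
    return slippage_per_trade * trades_per_year


print(round(years_to_sample(300, 52), 1))   # 5.8 years at one trade a week
print(trading_days_to_sample(300, 5))       # 60.0 trading days at five a day
print(annual_slippage_drag(2.0, 1_200))     # 2400.0 dollars of drag
```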
Equity Curve Analysis
Numbers can lie. The equity curve doesn’t.
A healthy equity curve rises from lower-left to upper-right with relatively smooth progress. Drawdowns happen, but they should be proportional to gains and recover in a reasonable timeframe.
Warning signs in the equity curve:
- A flat line followed by a sudden spike. Your strategy made all its money on one or two trades. That’s not a strategy; that’s a lottery ticket.
- A smooth upward slope that suddenly breaks. Possible regime change. The market conditions your strategy exploited may no longer exist.
- Step-function gains with long flat periods. The strategy only works in specific conditions. Fine if you know that, dangerous if you don’t.
- An upward slope that gradually flattens. Alpha decay. Your edge is eroding, likely because other participants have found and arbitraged it.
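You can put a number on the lottery-ticket pattern by asking how much of net profit came from the very best trades. This is a minimal sketch. top_trade_share is a hypothetical helper, and there is no standard cutoff. A result near or above 1.0 means the remaining trades, taken together, made little or nothing.

```python
from typing import Optional


def top_trade_share(pnls: list[float], top_n: int = 2) -> Optional[float]:
    """Fraction of net profit contributed by the top_n best trades.

    Returns None when there is no net profit to attribute."""
    net_profit = sum(pnls)
    if net_profit <= 0:
        return None
    best = sorted(pnls, reverse=True)[:top_n]
    return sum(best) / net_profit


# Hypothetical log: two outsized winners, then 200 small trades that net -$1,000
log = [5_000, 3_000] + [-50, 40] * 100
print(round(top_trade_share(log), 2))  # 1.14
```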
The most useful exercise is comparing the live equity curve to the backtest portion. If the live curve looks materially different, something has changed and “it just needs more time” is rarely the answer.
We recommend plotting a 50-trade rolling expectancy alongside your equity curve. If rolling expectancy turns negative and stays there for 50+ trades, your strategy has likely stopped working. Don’t wait for a drawdown to confirm what the numbers already told you.
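That rule translates into a minimal Python sketch. The 50-trade window and the 50-reading persistence requirement mirror the text. The function names are hypothetical, and the sketch assumes trades are listed in the order they closed.

```python
def rolling_expectancy(pnls: list[float], window: int = 50) -> list[float]:
    """Average P&L of each trailing window of trades (empty if too few trades)."""
    return [sum(pnls[i - window:i]) / window for i in range(window, len(pnls) + 1)]


def edge_looks_gone(pnls: list[float], window: int = 50, persist: int = 50) -> bool:
    """True if rolling expectancy has been negative for the last `persist` readings."""
    rolling = rolling_expectancy(pnls, window)
    return len(rolling) >= persist and all(x < 0 for x in rolling[-persist:])


# Hypothetical log: 200 trades averaging +$25, then 120 trades averaging -$10
trades = [100, -50] * 100 + [-60, 40] * 60
print(edge_looks_gone(trades))  # True
```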
Red Flags: Spotting Overfitting and Curve Fitting
Overfitting is the silent killer of algorithmic trading. The strategy looks incredible in backtests because it was optimized to fit historical noise. Then it meets live markets and falls apart because it memorized the past instead of learning patterns.
Be suspicious when you see:
- Too many parameters. If your strategy has 12 tunable parameters optimized over a 2-year backtest, you’ve almost certainly curve-fit. A robust strategy should have as few free parameters as possible. Two to four is a good target. Above six, justify every single one.
- Spectacular backtest results. A profit factor above 4.0, a Sharpe above 3.0, or a win rate above 80% combined with large average winners. These numbers almost never survive live trading. The more impressive the backtest, the more skeptical you should be.
- Metrics that degrade sharply out of sample. Profit factor of 2.5 in-sample and 1.1 out-of-sample? It’s overfit. A robust strategy shows similar (not identical) performance across both periods.
- Sensitivity to small parameter changes. Change your moving average from 21 to 23 periods and the strategy falls apart? That’s fragility, not edge. A real edge is robust across a range of nearby parameter values.
- No losing months. Real strategies have losing months. If your backtest shows 36 consecutive profitable months, either it’s overfit or you’re not accounting for realistic execution costs.
- Suspiciously low drawdowns. A strategy returning 80% annually with a 3% max drawdown doesn’t exist outside of backtests.
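Two of these checks are easy to express as numbers. This minimal sketch uses two hypothetical helpers. oos_retention divides the out-of-sample metric by the in-sample one. neighborhood_spread is the worst-to-best ratio of a metric across nearby parameter values. Both assume a positive metric such as profit factor, and neither has a universally accepted cutoff. The parameter results below are invented to contrast a fragile spike with a plateau.

```python
def oos_retention(in_sample_metric: float, out_of_sample_metric: float) -> float:
    """Share of the in-sample metric that survives out of sample (1.0 = no decay)."""
    return out_of_sample_metric / in_sample_metric


def neighborhood_spread(metric_by_param: dict[int, float]) -> float:
    """Worst-to-best ratio of a positive metric across nearby parameter values.

    Near 1.0 suggests a stable plateau; near 0 suggests a fragile spike."""
    values = list(metric_by_param.values())
    return min(values) / max(values)


print(round(oos_retention(2.5, 1.1), 2))  # 0.44: PF 2.5 in-sample, 1.1 out

# Hypothetical profit factors for moving-average periods around 21
fragile = {19: 0.9, 21: 2.4, 23: 1.0, 25: 0.8}
robust = {19: 1.6, 21: 1.8, 23: 1.7, 25: 1.6}
print(round(neighborhood_spread(fragile), 2), round(neighborhood_spread(robust), 2))  # 0.33 0.89
```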
The best defense against overfitting is simplicity. Fewer parameters, out-of-sample validation, testing across multiple instruments, and always assuming your live performance will be 30-50% worse than your backtest. If the strategy is still worth trading after that haircut, you might have something real.
What We Actually Look At
When traders ask us what metrics to focus on, here’s our short list:
- Expectancy per trade. Positive and large enough to survive execution costs.
- Profit factor. Above 1.5, ideally above 1.8.
- Maximum drawdown duration. Can you psychologically survive the worst drought?
- Sortino ratio. Above 2.0 annualized is strong.
- Sample size. At least 300 trades before you trust anything.
- Out-of-sample consistency. Similar metrics on data the optimizer never saw.
Win rate isn’t on that list. Neither is total return (meaningless without context about risk taken). Focus on these six, and you’ll avoid most of the traps that blow up retail algo traders.
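The short list can become a simple pass/fail review. In this minimal sketch, the profit factor, Sortino, and sample-size thresholds are the ones stated above. Three inputs are personal or hypothetical and yours to set: cost_per_trade, the drawdown-duration tolerance, and the 0.7 out-of-sample retention floor.

```python
from dataclasses import dataclass


@dataclass
class StrategyStats:
    expectancy: float          # average P&L per trade before execution costs
    profit_factor: float
    max_dd_duration_days: int  # longest drawdown, in days
    sortino_annualized: float
    n_trades: int
    oos_retention: float       # out-of-sample metric / in-sample metric


def review(s: StrategyStats, cost_per_trade: float, max_tolerable_dd_days: int,
           min_oos_retention: float = 0.7) -> dict[str, bool]:
    """Pass/fail for each item on the short list."""
    return {
        "expectancy": s.expectancy > cost_per_trade,   # survives execution costs
        "profit_factor": s.profit_factor > 1.5,
        "drawdown_duration": s.max_dd_duration_days <= max_tolerable_dd_days,
        "sortino": s.sortino_annualized > 2.0,
        "sample_size": s.n_trades >= 300,
        "out_of_sample": s.oos_retention >= min_oos_retention,
    }


# Hypothetical strategy that fails only on sample size
stats = StrategyStats(18.0, 1.7, 60, 2.3, 120, 0.8)
checks = review(stats, cost_per_trade=4.0, max_tolerable_dd_days=90)
print([name for name, ok in checks.items() if not ok])  # ['sample_size']
```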
Your strategy is only as good as its weakest metric. Run the numbers honestly, and make sure your infrastructure can execute the edge your metrics promise.