# Walk-Forward Analysis Flow
## Target Platform
- **CPU**: i9-13900K — 8 P-Cores @ 5.5GHz (E-Cores off, HT off)
- **RAM**: 32GB DDR4-3200
- **SSD**: NVMe Gen4 (ADATA LEGEND 850)
- **Workers**: `--workers 7` (reserves 1 core for the OS and I/O)
## Data: MXF_1m_20200223.csv
- Range: 2020-03-02 ~ 2026-03-31 (1,814 trading days, ~6.1 years)
- Size: 1,683,477 rows (1-min bars)
- All commands run from `..\2Axis\`
## Memory Budget (7 workers)
| Component | Per Worker | 7 Workers |
|-----------|-----------|-----------|
| 1-min OHLCV cache (5 arrays × 1.68M × 8B) | 67 MB | 470 MB |
| K-min data cache (worst: K=1, same as 1-min) | 67 MB | 470 MB |
| Numba trade buffer (200K × 8 cols × 8B) | 12.8 MB | 90 MB |
| date/flag arrays + overhead | ~20 MB | ~140 MB |
| **Total** | **~167 MB** | **~1.2 GB** |

32 GB of RAM is more than enough. The Python main process, plus Pandas reading the seed CSV, takes roughly another 200-500 MB.
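
The budget above can be reproduced with a few lines of arithmetic. This is only a sketch: the array counts and buffer shape are taken from the table, while the 20 MB overhead figure is the table's own rough estimate, not a measurement.

```python
# Rough per-worker memory estimate; all sizes come from the table above.
BYTES_PER_FLOAT64 = 8
ROWS_1M = 1_683_477   # 1-min bars in the data file
WORKERS = 7


def mb(n_bytes: float) -> float:
    """Convert bytes to decimal megabytes."""
    return n_bytes / 1e6


ohlcv_cache = mb(5 * ROWS_1M * BYTES_PER_FLOAT64)    # 5 float64 arrays
kmin_cache = ohlcv_cache                             # worst case: K=1
trade_buffer = mb(200_000 * 8 * BYTES_PER_FLOAT64)   # 200K rows x 8 cols
overhead = 20.0                                      # date/flag arrays (rough)

per_worker = ohlcv_cache + kmin_cache + trade_buffer + overhead
total_gb = per_worker * WORKERS / 1000

print(f"per worker: {per_worker:.0f} MB, {WORKERS} workers: {total_gb:.2f} GB")
```

Changing `WORKERS` or `ROWS_1M` shows how the budget scales if the data file grows or the worker count changes.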
---
## Pre-run: Numba JIT Warm-up
The V2 function signatures changed (`trade_date_start/end` were added), so the first run has to JIT-compile again.
```bash
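# Clear the cache manually only if Numba raises a compile error (V2 module names differ, so conflicts are unlikely).
# POSIX shell syntax; in cmd.exe the equivalent is: del /q __pycache__\bt_bb_unpruned_v2.* __pycache__\bt_bb_2nd_unpruned_v2.*
rm -rf __pycache__/bt_bb_unpruned_v2.* __pycache__/bt_bb_2nd_unpruned_v2.*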
```
The first JIT compilation takes about 3-5 minutes (once each for Stage 1 and Stage 2). After that, `cache=True` takes effect and later launches start in seconds.
---
## Fold 1: IS 4yr (2020/03-2024/03) + OOS 1yr (2024/04-2025/03)
### Step 1: Stage 1 Signal Optimization (IS)
```bash
python bt_bb_unpruned_v2.py --mode opt ^
--start-date 2020-03-02 --end-date 2024-03-31 ^
--output data/fold1_s1_unpruned.csv ^
--workers 7
```
- 1,166,400 combos (5 K-bar x 3 axis x 2 neutral x 21 BB x 16 ROC x 8 Flat x POC combos)
- Includes the Plateau 26-grid neighborhood test
- Output: `data/fold1_s1_unpruned.csv`
- **Estimate: 30-45 min** (add ~5 min for JIT on the first run)
### Step 2: Filter Seeds
```bash
python filter_seeds.py data/fold1_s1_unpruned.csv ^
--output data/fold1_seeds.csv ^
--years 4.0
```
- Plateau=Pass, PF>=1.5, Trades>=150, RF>=3, Calmar>2
- `--years 4.0` matches the IS period so that Calmar/Recovery are computed correctly
- **Estimate: < 5 sec**
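
The thresholds above can be expressed as a single predicate. The sketch below is not the code in `filter_seeds.py`: the column names (`Plateau`, `PF`, `Trades`, `RF`, `NetProfit`, `MaxDD`) are hypothetical, and Calmar is assumed to be annualized net profit divided by maximum drawdown, which is one common definition.

```python
def calmar(net_profit: float, max_drawdown: float, years: float) -> float:
    """Annualized net profit over max drawdown (assumed definition)."""
    return (net_profit / years) / max_drawdown


def passes_seed_filter(row: dict, years: float) -> bool:
    """Apply the Step 2 thresholds to one Stage 1 result row."""
    return (
        row["Plateau"] == "Pass"
        and row["PF"] >= 1.5
        and row["Trades"] >= 150
        and row["RF"] >= 3
        and calmar(row["NetProfit"], row["MaxDD"], years) > 2
    )


# Made-up rows, for illustration only.
rows = [
    {"Plateau": "Pass", "PF": 1.8, "Trades": 320, "RF": 4.2,
     "NetProfit": 1200.0, "MaxDD": 100.0},
    {"Plateau": "Pass", "PF": 1.6, "Trades": 200, "RF": 3.5,
     "NetProfit": 800.0, "MaxDD": 100.0},   # Calmar is exactly 2.0: rejected
    {"Plateau": "Fail", "PF": 2.1, "Trades": 400, "RF": 5.0,
     "NetProfit": 2000.0, "MaxDD": 100.0},  # fails the Plateau test
]

seeds = [r for r in rows if passes_seed_filter(r, years=4.0)]
print(len(seeds), "seed(s) kept")
```

Under this definition the first row has a Calmar of 3.0 with `years=4.0` but would drop below 2 with `years=6.1`, which is why `--years` has to match the IS window.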
### Step 3: Stage 2 Risk Optimization (IS)
```bash
python bt_bb_2nd_unpruned_v2.py data/fold1_seeds.csv ^
--output data/fold1_s2_IS.csv ^
--start-date 2020-03-02 --end-date 2024-03-31 ^
--workers 7
```
- Risk grid: ~432 combos/seed (SL × TP × Trail × TimeStop, after pruning)
- Numba scans bar by bar at 1-min resolution: 1.68M bars × 432 combos/seed
- Output: `data/fold1_s2_IS.csv` (31 cols, including S1 performance)
- **Estimate: 60-90 min** (depends on the seed count, typically ~2,000-3,000)
### Step 4: Stage 2 OOS Validation
```bash
python bt_bb_2nd_unpruned_v2.py data/fold1_seeds.csv ^
--output data/fold1_s2_OOS.csv ^
--start-date 2024-04-01 --end-date 2025-03-31 ^
--workers 7
```
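- Same seeds and same risk grid, different date window
- Used for the IS vs OOS comparison to detect overfitting
- **Estimate: 60-90 min** (same as IS, because the full dataset must be loaded either way)

**Fold 1 subtotal: ~2.5-4 hours**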
---
## Fold 2: IS 5yr (2020/03-2025/03) + OOS 1yr (2025/04-2026/03)
### Step 1: Stage 1 Signal Optimization (IS)
```bash
python bt_bb_unpruned_v2.py --mode opt ^
--start-date 2020-03-02 --end-date 2025-03-31 ^
--output data/fold2_s1_unpruned.csv ^
--workers 7
```
- **Estimate: 30-45 min** (Numba cache already exists, so no JIT delay)
### Step 2: Filter Seeds
```bash
python filter_seeds.py data/fold2_s1_unpruned.csv ^
--output data/fold2_seeds.csv ^
--years 5.0
```
- **Estimate: < 5 sec**
### Step 3: Stage 2 Risk Optimization (IS)
```bash
python bt_bb_2nd_unpruned_v2.py data/fold2_seeds.csv ^
--output data/fold2_s2_IS.csv ^
--start-date 2020-03-02 --end-date 2025-03-31 ^
--workers 7
```
- **Estimate: 60-90 min**
### Step 4: Stage 2 OOS Validation
```bash
python bt_bb_2nd_unpruned_v2.py data/fold2_seeds.csv ^
--output data/fold2_s2_OOS.csv ^
--start-date 2025-04-01 --end-date 2026-03-31 ^
--workers 7
```
- **Estimate: 60-90 min**

**Fold 2 subtotal: ~2.5-3.75 hours** (no JIT overhead)
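
Both folds run the same four commands with different date windows, so the command lines can be generated rather than retyped. The helper below is a hypothetical convenience sketch, not part of the project; it only uses the script names, flags, and paths shown in the steps above.

```python
# Date windows and --years values copied from the Fold 1 / Fold 2 sections.
FOLDS = {
    "fold1": {"is": ("2020-03-02", "2024-03-31"),
              "oos": ("2024-04-01", "2025-03-31"), "years": 4.0},
    "fold2": {"is": ("2020-03-02", "2025-03-31"),
              "oos": ("2025-04-01", "2026-03-31"), "years": 5.0},
}


def fold_commands(name: str, workers: int = 7) -> list[str]:
    """Return the four command lines for one fold (hypothetical helper)."""
    fold = FOLDS[name]
    seeds = f"data/{name}_seeds.csv"

    def stage2(tag: str, window: tuple[str, str]) -> str:
        start, end = window
        return (f"python bt_bb_2nd_unpruned_v2.py {seeds} "
                f"--output data/{name}_s2_{tag}.csv "
                f"--start-date {start} --end-date {end} --workers {workers}")

    is_start, is_end = fold["is"]
    return [
        (f"python bt_bb_unpruned_v2.py --mode opt "
         f"--start-date {is_start} --end-date {is_end} "
         f"--output data/{name}_s1_unpruned.csv --workers {workers}"),
        (f"python filter_seeds.py data/{name}_s1_unpruned.csv "
         f"--output {seeds} --years {fold['years']}"),
        stage2("IS", fold["is"]),
        stage2("OOS", fold["oos"]),
    ]


for cmd in fold_commands("fold2"):
    print(cmd)
```

Note that the IS window is anchored at 2020-03-02 and grows by one year per fold, while each OOS window is the year that follows it.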
---
## Full-Period Baseline (Optional)
No date limits, the full 6.1 years:
```bash
python bt_bb_unpruned_v2.py --mode opt ^
--output data/full_s1_unpruned.csv --workers 7
python filter_seeds.py data/full_s1_unpruned.csv ^
--output data/full_seeds.csv --years 6.1
python bt_bb_2nd_unpruned_v2.py data/full_seeds.csv ^
--output data/full_s2.csv --workers 7
```
- **Estimate: ~2 hours** (Stage 1 ~35 min + Stage 2 ~90 min; there is no separate OOS pass)
---
## Runtime Summary
| Step | Per Fold | Bottleneck |
|------|----------|------------|
| Stage 1 (~1.17M combos) | 30-45 min | Numba Tier-3 OMS loop × K-bar bars |
| Filter seeds | < 5 sec | Pandas I/O |
| Stage 2 IS (~2.5K seeds × 432 RC) | 60-90 min | 1.68M bars at 1-min resolution × 432 combos/seed |
| Stage 2 OOS (same) | 60-90 min | Same as above |
| **Per Fold Total** | **2.5-4 hr** | |

| Scenario | Estimated Total |
|----------|----------------|
| **2-Fold WFA (recommended)** | **5-8 hr** |
| + Full-Period Baseline | +~2 hr |
| **Complete (2 folds + baseline)** | **~7-10 hr** |
### Performance Notes
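- **Stage 1 is bottlenecked by K=1**: the 1.68M 1-min bars account for ~40% of total K-bar time; K=10 needs only 168K bars and finishes in under 2 minutes
- **Stage 2 is bottlenecked by the seed count**: each seed needs 432 full 1-min scans (~14 sec/seed @ 5.5GHz)
- **7 workers is the sweet spot**: 8 workers compete with the OS for resources, 6 workers leave one P-Core idle
- **Numba caching is a one-time cost**: the first JIT pass takes about 3-5 min; later runs load directly thanks to `cache=True`
- **Peak memory ~2 GB**: 32GB DDR4-3200 will not be a bottleneck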
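
The Stage 2 estimate follows from the per-seed cost quoted above. A back-of-the-envelope sketch, assuming work divides evenly across workers with no scheduling or I/O overhead (a simplification):

```python
def stage2_minutes(n_seeds: int, sec_per_seed: float = 14.0,
                   workers: int = 7) -> float:
    """Wall-clock minutes for Stage 2, assuming perfect parallel scaling."""
    return n_seeds * sec_per_seed / workers / 60.0


for n in (2000, 2500, 3000):
    print(f"{n} seeds: ~{stage2_minutes(n):.0f} min")
```

For 2,000 to 3,000 seeds this gives roughly 67 to 100 minutes, in the same range as the 60-90 min figure in the table; the seed count left after filtering is the main source of variation.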
### Optimization Tips
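- For overnight runs, consider temporarily removing the core binding in Process Lasso (frees all 8 cores for Python)
- `--workers 8` squeezes out the last core; the risk is occasional OS stutter (backtest correctness is not affected)
- Stage 2 seeds are already sorted by K_Bar (maximizes the worker cache hit rate)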
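
To see why sorting seeds by K_Bar helps, consider a worker that keeps only the most recently built K-min dataset in memory. That single-slot cache is an assumption made for illustration; the actual caching scheme in the Stage 2 script may differ.

```python
def count_cache_rebuilds(k_bars: list[int]) -> int:
    """Count how often a single-slot K-min cache must be rebuilt."""
    rebuilds = 0
    current = None
    for k in k_bars:
        if k != current:      # cache miss: resample 1-min bars to K-min
            rebuilds += 1
            current = k
    return rebuilds


seed_k_bars = [5, 1, 5, 10, 1, 10, 5]   # made-up K_Bar values, arrival order
print(count_cache_rebuilds(seed_k_bars))           # 7 rebuilds
print(count_cache_rebuilds(sorted(seed_k_bars)))   # 3 rebuilds
```

With sorted input the rebuild count equals the number of distinct K_Bar values, which is the minimum possible.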
---
## WFA Validation Criteria
A strategy passes WFA if:
1. **OOS PF >= 1.2** (80% of the IS threshold of 1.5)
2. **OOS Win Rate >= 45%**
3. **IS→OOS PF decay < 40%** (PF_OOS / PF_IS > 0.6)
4. **OOS Net Profit > 0**
5. **It appears in the top N of both folds** (cross-fold stability)
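
The five criteria translate directly into a boolean check. This is a minimal sketch: the function and parameter names are hypothetical, the win rate is assumed to be a fraction between 0 and 1, and the top-N membership is passed in as a precomputed flag.

```python
def passes_wfa(*, is_pf: float, oos_pf: float, oos_win_rate: float,
               oos_net_profit: float, in_top_n_both_folds: bool) -> bool:
    """Check one strategy against the five WFA criteria listed above."""
    return (
        oos_pf >= 1.2                  # 1. OOS profit factor floor
        and oos_win_rate >= 0.45       # 2. OOS win rate floor
        and oos_pf / is_pf > 0.6       # 3. PF decay below 40%
        and oos_net_profit > 0         # 4. profitable out of sample
        and in_top_n_both_folds        # 5. cross-fold stability
    )


ok = passes_wfa(is_pf=2.0, oos_pf=1.3, oos_win_rate=0.48,
                oos_net_profit=1000.0, in_top_n_both_folds=True)
print(ok)
```

A strategy with the same OOS figures but an IS PF of 2.5 would fail criterion 3, because 1.3 / 2.5 = 0.52 is below the 0.6 floor.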
## Analysis After All Runs Complete
Provide the following CSVs for analysis:
- `fold1_s2_IS.csv`, `fold1_s2_OOS.csv`
- `fold2_s2_IS.csv`, `fold2_s2_OOS.csv`
- (Optional) `full_s2.csv`

I can produce:
1. Top 5 WFA-validated strategies (cross-fold intersection)
2. IS vs OOS performance comparison in Excel (including PF decay and Calmar decay)
3. Overfitting Risk Score per strategy
4. Final agent config recommendation
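
The cross-fold intersection and the PF decay figure could be computed along these lines. The sketch assumes each OOS row carries a hypothetical `key` field that identifies a strategy and that ranking is by OOS PF; the real CSV columns and ranking metric may differ.

```python
def top_n_keys(rows: list[dict], n: int, metric: str = "PF") -> set[str]:
    """Return the strategy keys of the n best rows by the given metric."""
    ranked = sorted(rows, key=lambda r: r[metric], reverse=True)
    return {r["key"] for r in ranked[:n]}


def pf_decay(pf_is: float, pf_oos: float) -> float:
    """Fractional drop in profit factor from IS to OOS."""
    return 1.0 - pf_oos / pf_is


# Made-up OOS results for two folds.
fold1_oos = [{"key": "A", "PF": 1.6}, {"key": "B", "PF": 1.4},
             {"key": "C", "PF": 1.1}]
fold2_oos = [{"key": "B", "PF": 1.5}, {"key": "D", "PF": 1.3},
             {"key": "A", "PF": 1.25}]

stable = top_n_keys(fold1_oos, 2) & top_n_keys(fold2_oos, 2)
print(sorted(stable))        # ['B']
print(pf_decay(2.0, 1.5))    # 0.25
```

Only strategy `B` ranks in the top 2 of both folds here, so it is the only one that survives the intersection.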
## Date Format
Both stages accept `YYYY-MM-DD` or `YYYYMMDD` (dashes are stripped automatically).
`filter_seeds.py` takes no date arguments; use `--years` to calibrate the Calmar/Recovery calculation.
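
A parser that accepts both formats might look like the following. This is an illustrative sketch, not the scripts' actual argument handling; `parse_date_arg` is a hypothetical name.

```python
from datetime import date, datetime


def parse_date_arg(text: str) -> date:
    """Accept YYYY-MM-DD or YYYYMMDD by stripping dashes before parsing."""
    compact = text.replace("-", "")
    return datetime.strptime(compact, "%Y%m%d").date()


print(parse_date_arg("2024-03-31"))
print(parse_date_arg("20240331"))
```

Both calls return the same `date`, and an invalid value such as `2024-02-30` raises `ValueError` instead of silently selecting the wrong window.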
---
## V2 Changes Summary
| Fix | Stage 1 | Stage 2 |
|-----|---------|---------|
| Entry price: next-bar Open (non-repainting) | Fixed | Already correct |
| Signal from `oms_position_arr[i-1]` | Fixed | Already correct |
| WFA date range (`--start-date`, `--end-date`) | Added | Added |
| Data file override (`--data-file`) | Added | Added |
| Worker initializer (Windows spawn safe) | Added | Added |
| S1 columns carried through | N/A | Yes (from Phase 3) |
| 6-year data default | Yes | Yes |
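
The first two fixes in the table amount to acting on bar `i` using only information from bar `i-1`. The sketch below illustrates that idea in plain Python. It assumes `position[i]` holds the target position decided at the close of bar `i`; the actual meaning of `oms_position_arr` in the V2 code may differ.

```python
def next_bar_entries(position: list[int],
                     opens: list[float]) -> list[tuple[int, int, float]]:
    """Return (bar index, direction, fill price) for each new entry.

    A signal seen on bar i-1 is filled at the Open of bar i, so a fill
    never uses a price that was unknown when the signal formed.
    """
    fills = []
    for i in range(1, len(opens)):
        before = position[i - 2] if i >= 2 else 0
        signal = position[i - 1]
        if signal != 0 and signal != before:
            fills.append((i, signal, opens[i]))
    return fills


position = [0, 1, 1, 0, -1, -1]             # made-up target positions
opens = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0]
print(next_bar_entries(position, opens))
# [(2, 1, 102.0), (5, -1, 105.0)]
```

The long signal appears on bar 1 and is filled at the Open of bar 2; the short signal appears on bar 4 and is filled at the Open of bar 5.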


