DÉCODE
the option patterns
A multi-exam decoder for multiple-choice option distribution — 60 sets / 11 ML models / cross-era strict audit.
The truth about "folklore" rules: "3-long-1-short, pick the shortest" is the only rule validated on both exams, and its effect is weak (+2 to 5 percentage points over random).
College English Test Band 4
30 sets · 6 ML models
expected hit rate
Three Eras
2017-2018 · 2019-2020 · 2021-2023
v4.1 → v4.6
3-long-1-short heuristic wins
Folklore audit
5 eliminated / 8 kept
6 audits
99.7% "A" rate traced to a parser bug · 3-long-1-short effect is real
Basic stats visualization
letter dist · length bias · position bias
Note: the CET-4 site has no figures; the 27 analysis charts belong to CET-6.
Complete code & data
5 Python scripts / 4 reports / 1 source list
OptionDecoder/
├── README.md # Repo overview
├── data/cet4/ # 6 JSON (rules/models/eras/audit/stats/folklore)
├── src/cet4/ # 5 Python scripts
│ ├── utils/ # fetcher / parser / basic_stats / folklore_audit / final_audit
│ └── v4.1_baseline/ # v4_ml_loo.py — 5-fold cross-era
├── analysis/cet4/ # ml_v4_loo / ml_v4_summary / final_audit
├── reports/cet4/ # README / FINAL_STRATEGY_v46 / AUDIT_REPORT_v46 / ELIMINATED_METHODS
└── plans/cet4/sources.md # 30 set URLs
College English Test Band 6
36 sets · 5 ML models
expected hit rate
Four Eras
2010-2013 · 2014-2016 · 2017-2019 · 2020-2025
v3.1 → v3.5
Stacking cross-era AUC 0.66
Folklore audit
19 eliminated / 8 kept
6 audits
38.27% over-estimate corrected · Bayesian strict
Full visualization
letter dist / length bias / ROC / 3-long-1-short
Complete code & data
17 Python scripts / 8 reports / 4 Era analyses
OptionDecoder/
├── README.md # Repo overview
├── data/cet6/ # 4 JSON (rules/models/eras/audit) + basic_stats + folklore
├── src/cet6/ # 17 Python scripts
│ ├── utils/ # audit / distractor / era / folklore / viz / quick_wins
│ ├── v3.1_baseline/ # ml_train (XGBoost AUC 0.606)
│ ├── v3.2_ner/ # jieba NER + 100-fold + parse_v4/v5 + stats_v32
│ ├── v3.3_strict/ # 8 models + 11 rules + merge_v6 + ml_v33
│ ├── v3.4_accurate/ # Bayesian + Group K-Fold + ml_v34
│ └── v3.5_ensemble/ # Stacking + cross-era + audit_v35 + final_strategy
├── analysis/cet6/ # era_comparison / distractor / folklore / v3.2-v3.5 results
├── analysis/figures/ # 27 analysis charts (1.2 MB)
├── reports/cet6/ # README / FINAL_STRATEGY_v35 / AUDIT / ELIMINATED / 5 more
└── plans/cet6/ # 3 RESEARCH_PLAN versions
CET-4 vs CET-6
Same methodology, different conclusions
| Dimension | CET-4 (Mint) | CET-6 (Amber) | Gap |
|---|---|---|---|
One methodology
Two independent pipelines
Bulk-download exam HTML, filter copyrighted content, keep only question stems and answer keys.
Handles multiple answer-key formats for A/B/C/D options, including full-width parentheses and Roman numerals.
Letter distribution, length bias, position bias, and consecutive-answer runs: 30+ folklore rules audited one by one.
From baseline to Stacking, with cross-era validation plus LOO as a double check to strip out over-fitting.
5-fold CV reports the real, reproducible hit rate; inflated numbers caused by parser bugs are flagged.
GitHub Pages static site, JSON-driven rendering, fully open source under the MIT license.
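The parsing and flagging steps above can be sketched roughly as follows. This is a minimal illustration, not the project's parser: `parse_answer_key`, `dominant_letter`, the regular expression, and the 0.5 threshold are all assumptions made for the example. It relies on Unicode NFKC normalization to fold full-width letters and parentheses to ASCII, then checks whether a single letter dominates the parsed key, which is the kind of skew a parser bug tends to produce.

```python
import re
import unicodedata
from collections import Counter
from typing import Dict, Optional, Tuple

# Hypothetical pattern: question number, optional separator, optional "(",
# then one option letter that is not the start of a longer word.
ANSWER_RE = re.compile(r"(\d+)\s*[.)、:]?\s*\(?\s*([A-Da-d])(?![A-Za-z])")


def parse_answer_key(text: str) -> Dict[int, str]:
    """Extract {question number: letter} pairs from a raw answer-key line."""
    # NFKC folds full-width forms such as "（Ａ）" into ASCII "(A)".
    normalized = unicodedata.normalize("NFKC", text)
    return {int(num): letter.upper() for num, letter in ANSWER_RE.findall(normalized)}


def dominant_letter(
    answers: Dict[int, str], threshold: float = 0.5
) -> Optional[Tuple[str, float]]:
    """Return (letter, share) if one letter exceeds `threshold` of all answers."""
    if not answers:
        return None
    letter, count = Counter(answers.values()).most_common(1)[0]
    share = count / len(answers)
    return (letter, share) if share > threshold else None


# Mixed full-width and ASCII input, as might appear in scraped answer keys.
key = parse_answer_key("26.（Ａ） 27. B 28．(c) 29、D")
print(key)
print(dominant_letter(key))
```

A key whose answers are almost all one letter would be returned by `dominant_letter` and could be held back for manual review before any hit rate is computed from it.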
The truth about folklore rules
regardless of CET-4 or CET-6
"3-long-1-short, pick the shortest" is the only rule that holds up on both the CET-4 and CET-6 datasets, and it is weak: its real hit rate is 2 to 5 percentage points above random, far below the +10 to 20% that folklore claims.
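To make the claim concrete, here is one way the rule and its lift over random guessing could be measured. It is a sketch under stated assumptions: `pick_shortest`, `lift_over_random`, and the 0.6 length ratio are hypothetical, the source does not say how the audit defines "short", and the sample options are made up for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

Question = Tuple[Dict[str, str], str]  # (option texts by letter, correct letter)


def pick_shortest(options: Dict[str, str], ratio: float = 0.6) -> Optional[str]:
    """Return the shortest option's letter if it is clearly shorter than the rest.

    Assumption: "clearly shorter" means at most `ratio` times the length of
    the second-shortest option. Otherwise the rule does not fire.
    """
    ranked = sorted(options, key=lambda letter: len(options[letter]))
    shortest, runner_up = ranked[0], ranked[1]
    if len(options[shortest]) <= ratio * len(options[runner_up]):
        return shortest
    return None


def lift_over_random(questions: Iterable[Question], n_options: int = 4) -> Optional[float]:
    """Hit rate on questions where the rule fires, minus the 1/n random baseline."""
    fired = hits = 0
    for options, answer in questions:
        guess = pick_shortest(options)
        if guess is None:
            continue  # rule does not apply to this question
        fired += 1
        hits += guess == answer
    if fired == 0:
        return None
    return hits / fired - 1 / n_options


# Made-up example: three long options and one short one.
sample = {
    "A": "He missed the early train to work",
    "B": "At home",
    "C": "She forgot to bring the documents",
    "D": "They decided to cancel the meeting",
}
print(pick_shortest(sample))
```

On four-option questions the random baseline is 25%, so a lift between 0.02 and 0.05 from `lift_over_random` would correspond to the 2 to 5 percentage points stated above.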
What actually moves your score is working through the paper, not using "folklore rules" as a substitute for doing the work.