RESEARCH · 2026 · 60 SETS · 11 ML MODELS · 1 TRUTH

DÉCODE
the option patterns

A multi-exam decoder for multiple-choice option distributions: 60 sets, 11 ML models, and a strict cross-era audit.
The truth about "folklore" rules: "3-long-1-short, pick the shortest" is the only rule validated on both exams, and its effect is weak (+2-5%).
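
To make the "3-long-1-short" rule concrete, here is a minimal sketch. The page does not define how "long" and "short" are measured, so the word-count measure, the `ratio` cut-off, and the function name are all assumptions for illustration, not the project's implementation.

```python
from typing import Optional, Sequence


def pick_shortest_if_3_long_1_short(
    options: Sequence[str], ratio: float = 0.6
) -> Optional[int]:
    """Return the index of the short option when one option is clearly
    shorter than the other three, else None.

    `ratio` is a hypothetical cut-off: the shortest option must be at most
    `ratio` times the length of the second-shortest option.
    """
    if len(options) != 4:
        return None
    lengths = [len(opt.split()) for opt in options]  # length in words (assumed)
    order = sorted(range(4), key=lambda i: lengths[i])
    shortest, runner_up = order[0], order[1]
    if lengths[runner_up] == 0:
        return None
    # The other two options are at least as long as the runner-up, so
    # comparing against the runner-up covers all three "long" options.
    if lengths[shortest] <= ratio * lengths[runner_up]:
        return shortest
    return None


# Made-up options, not taken from any real exam.
options = [
    "He missed the last bus to the airport.",
    "He forgot to book a hotel room in advance.",
    "He was ill.",
    "He could not find his passport at home.",
]
choice = pick_shortest_if_3_long_1_short(options)
print("ABCD"[choice] if choice is not None else "no pattern")  # prints "C"
```

Returning `None` when the pattern is absent keeps the rule from firing on every question, which matters when a rule's hit rate is measured only on the questions where it applies.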

CET-4 · 30 sets · 20.95%

CET-6 · 36 sets · 27-30%

ML models · 11

Audits · 12

License · MIT

SECTION 01 · CET-4 SITE

College English Test Band 4
30 sets · 6 ML models

Realistic expected hit rate: 20.95%

vs random 18.77% · net +2.18%
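
The "net" figure reads as a difference in percentage points between the 20.95% CET-4 hit rate quoted in the page header and the 18.77% random baseline. A short sketch of that arithmetic follows; the helper name is hypothetical.

```python
def net_lift(hit_rate: float, baseline: float) -> float:
    """Percentage-point gain of a strategy over the random baseline."""
    return round(hit_rate - baseline, 2)


# Figures copied from the page: CET-4 hit rate and random baseline.
cet4_lift = net_lift(20.95, 18.77)
print(f"CET-4 net lift: +{cet4_lift:.2f} points")  # prints "+2.18 points"
```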

Sets

Questions

Options

Models

Effective rules

Eliminated

Eras

L+R 3-long-1-short

Data · Eras

Three Eras
2017-2018 · 2019-2020 · 2021-2023
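
One way to run a cross-era check on these three eras is to hold out one era at a time and train on the rest. The sketch below assumes that reading. The era boundaries come from the line above, while the function names and the sample years are hypothetical and do not describe the project's scripts.

```python
from typing import Dict, Iterator, List, Tuple

# Era boundaries copied from the page; the label strings are illustrative.
CET4_ERAS: Dict[str, Tuple[int, int]] = {
    "2017-2018": (2017, 2018),
    "2019-2020": (2019, 2020),
    "2021-2023": (2021, 2023),
}


def era_of(year: int) -> str:
    """Map an exam year to its era label."""
    for label, (start, end) in CET4_ERAS.items():
        if start <= year <= end:
            return label
    raise ValueError(f"year {year} is outside the listed eras")


def leave_one_era_out(
    years: List[int],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_era, train_indices, test_indices) for each era."""
    eras = [era_of(y) for y in years]
    for held_out in CET4_ERAS:
        test = [i for i, e in enumerate(eras) if e == held_out]
        train = [i for i, e in enumerate(eras) if e != held_out]
        if test:  # skip eras with no data
            yield held_out, train, test


# Hypothetical exam years, one entry per set.
set_years = [2017, 2018, 2019, 2020, 2021, 2022, 2023]
splits = list(leave_one_era_out(set_years))
for era, train, test in splits:
    print(era, "train:", train, "test:", test)
```

Holding out a whole era, rather than random questions, tests whether a pattern learned on older papers still holds on newer ones.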

Models · 6 ML models

v4.1 → v4.6
3-long-1-short heuristic wins

Rules · 8 effective

Folklore audit
5 eliminated / 8 kept
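
The page does not state the criterion used to keep or eliminate a rule. As an illustration only, a rule could be kept when its hit rate beats the baseline under a one-sided exact binomial test. The threshold, the default baseline, and the hit counts below are assumptions.

```python
from math import comb


def binomial_tail(hits: int, trials: int, p0: float) -> float:
    """One-sided p-value: P(X >= hits) for X ~ Binomial(trials, p0)."""
    return sum(
        comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(hits, trials + 1)
    )


def audit_rule(hits: int, trials: int, p0: float = 0.25, alpha: float = 0.05) -> str:
    """Label a rule 'kept' if it beats the baseline p0 at level alpha.

    p0 and alpha are hypothetical defaults, not values from the project.
    """
    return "kept" if binomial_tail(hits, trials, p0) < alpha else "eliminated"


# Made-up counts: hits on the questions where each rule applies.
print(audit_rule(hits=45, trials=120))  # well above 25%
print(audit_rule(hits=32, trials=120))  # barely above 25%
```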

Audit · Strict

6 audits
99.7% "A" answers traced to a parser bug · 3-long-1-short effect is real

Figures

Basic stats visualization
letter dist · length bias · position bias

Note: the CET-4 site has no figures; all 27 charts belong to CET-6.
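
The basic statistics named above (letter distribution, length bias) can be computed with a few lines of standard-library Python. The sketch below assumes a hypothetical question schema with an `options` list and an `answer` letter; the repository's JSON layout may differ.

```python
from collections import Counter
from typing import Dict, List


def letter_distribution(answers: List[str]) -> Dict[str, float]:
    """Share of correct answers falling on each letter A-D."""
    counts = Counter(answers)
    total = len(answers)
    return {letter: counts.get(letter, 0) / total for letter in "ABCD"}


def longest_is_correct_rate(questions: List[dict]) -> float:
    """Fraction of questions whose correct option is the longest one.

    Length is measured in words; ties resolve to the earliest option.
    """
    hits = 0
    for q in questions:
        lengths = [len(opt.split()) for opt in q["options"]]
        longest = max(range(len(lengths)), key=lengths.__getitem__)
        if "ABCD"[longest] == q["answer"]:
            hits += 1
    return hits / len(questions)


# Made-up questions in the assumed schema.
questions = [
    {"options": ["a b", "a b c d", "a", "a b c"], "answer": "B"},
    {"options": ["a b c", "a", "a b", "a b"], "answer": "D"},
    {"options": ["a", "a b", "a b c d e", "a b"], "answer": "C"},
    {"options": ["a b c", "a b", "a", "a b c d"], "answer": "A"},
]
dist = letter_distribution([q["answer"] for q in questions])
bias = longest_is_correct_rate(questions)
print(dist, bias)
```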

Repo

Complete code & data
5 Python scripts / 4 reports / 1 source list

OptionDecoder/
├── README.md              # Repo overview
├── data/cet4/             # 6 JSON (rules/models/eras/audit/stats/folklore)
├── src/cet4/              # 5 Python scripts
│   ├── utils/                # fetcher / parser / basic_stats / folklore_audit / final_audit
│   └── v4.1_baseline/       # v4_ml_loo.py — 5-fold cross-era
├── analysis/cet4/         # ml_v4_loo / ml_v4_summary / final_audit
├── reports/cet4/          # README / FINAL_STRATEGY_v46 / AUDIT_REPORT_v46 / ELIMINATED_METHODS
└── plans/cet4/sources.md   # 30 set URLs

SECTION 02 · CET-6 SITE

College English Test Band 6
36 sets · 5 ML models

Realistic expected hit rate: 27-30%

vs random 25% · net +2-5%

Sets

Questions

Options

Models

Effective rules

Eliminated

Eras

L+R 3-long-1-short

Data · Eras

Four Eras
2010-2013 · 2014-2016 · 2017-2019 · 2020-2025

Models · 5 ML models

v3.1 → v3.5
Stacking cross-era AUC 0.66
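
For readers unfamiliar with AUC: it is the probability that a randomly chosen correct option receives a higher score than a randomly chosen incorrect one, so 0.5 is chance and 1.0 is perfect. The dependency-free sketch below computes it by pairwise comparison. Treating each option as one scored row is an assumption about the setup, and the labels and scores are made up.

```python
from typing import List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a win. O(P * N), fine for small samples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Two made-up questions, four options each; label 1 marks the correct option.
labels = [1, 0, 0, 0, 0, 1, 0, 0]
scores = [0.4, 0.3, 0.2, 0.1, 0.35, 0.3, 0.3, 0.05]
auc = roc_auc(labels, scores)
print(round(auc, 4))  # prints 0.8333
```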

Rules · 8 effective

Folklore audit
19 eliminated / 8 kept

Audit · Strict

6 audits
38.27% over-estimate corrected · Bayesian strict
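
The page does not describe the Bayesian correction itself. As a generic illustration of how a Bayesian estimate pulls an optimistic hit rate back toward the baseline, here is a Beta-Binomial posterior mean. The prior settings and the counts are hypothetical and unrelated to the 38.27% figure.

```python
def posterior_mean_hit_rate(
    hits: int,
    trials: int,
    prior_rate: float = 0.25,
    prior_strength: float = 40.0,
) -> float:
    """Posterior mean of a hit rate under a Beta prior.

    The prior is Beta(prior_rate * prior_strength,
    (1 - prior_rate) * prior_strength); both settings are assumed values.
    """
    alpha = prior_rate * prior_strength + hits
    beta = (1.0 - prior_rate) * prior_strength + (trials - hits)
    return alpha / (alpha + beta)


# Made-up sample: 18 hits in 40 questions looks like 45% before shrinkage.
raw = 18 / 40
shrunk = posterior_mean_hit_rate(hits=18, trials=40)
print(raw, shrunk)  # prints 0.45 0.35
```

With few observations the prior dominates and the estimate stays near the baseline. As the sample grows, the posterior mean approaches the raw rate.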

Figures · 27 analysis charts

Full visualization
letter dist / length bias / ROC / 3-long-1-short

Repo

Complete code & data
17 Python scripts / 8 reports / 4 Era analyses

OptionDecoder/
├── README.md               # Repo overview
├── data/cet6/              # 4 JSON (rules/models/eras/audit) + basic_stats + folklore
├── src/cet6/               # 17 Python scripts
│   ├── utils/                 # audit / distractor / era / folklore / viz / quick_wins
│   ├── v3.1_baseline/        # ml_train (XGBoost AUC 0.606)
│   ├── v3.2_ner/             # jieba NER + 100-fold + parse_v4/v5 + stats_v32
│   ├── v3.3_strict/          # 8 models + 11 rules + merge_v6 + ml_v33
│   ├── v3.4_accurate/        # Bayesian + Group K-Fold + ml_v34
│   └── v3.5_ensemble/        # Stacking + cross-era + audit_v35 + final_strategy
├── analysis/cet6/          # era_comparison / distractor / folklore / v3.2-v3.5 results
├── analysis/figures/      # 27 analysis charts (1.2 MB)
├── reports/cet6/           # README / FINAL_STRATEGY_v35 / AUDIT / ELIMINATED / 5 more
└── plans/cet6/             # 3 RESEARCH_PLAN versions
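
The `v3.4_accurate/` entry mentions Group K-Fold. The idea is to keep all questions from one exam set on the same side of each split, so a model is never tested on a set it partly saw in training. scikit-learn provides `GroupKFold` for this; the sketch below is a dependency-free stand-in with round-robin fold assignment and hypothetical set IDs, not the project's code.

```python
from typing import Iterator, List, Tuple


def group_k_fold(
    groups: List[str], n_splits: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) so that no group appears on both sides.

    Groups are assigned to folds round-robin in sorted order, which is
    simpler than the size-balanced assignment used by scikit-learn.
    """
    unique = sorted(set(groups))
    if n_splits < 2 or n_splits > len(unique):
        raise ValueError("n_splits must be between 2 and the number of groups")
    fold_of = {g: i % n_splits for i, g in enumerate(unique)}
    for fold in range(n_splits):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test


# One entry per question; the value is the (hypothetical) exam set ID.
groups = ["s1", "s1", "s2", "s2", "s3", "s3", "s4"]
folds = list(group_k_fold(groups, n_splits=2))
for train, test in folds:
    print("train:", train, "test:", test)
```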

SECTION 03 · CET-4 vs CET-6

CET-4 vs CET-6
Same methodology, different conclusions

Dimension	CET-4 (Mint)	CET-6 (Amber)	Gap

One methodology
Two independent pipelines

The truth about folklore rules
regardless of CET-4 or CET-6
