Architecture

How the pieces fit together. Not a sales pitch.

Build pipeline

papers/  +  docs/  +  templates/
        │
        ▼
   validate_papers.py         (CI: every PR)
   generate_index.py          (consolidated references.bib)
   generate_rss.py            (RSS feed)
   generate_citation_graph.py (graph data + page)
   build_sidebar.py           (rewrites sidebar block in config.ts)
        │
        ▼
   vitepress build docs
        │
        ▼
   postbuild_a11y.py          (marks decorative <svg> aria-hidden)
        │
        ▼
   docs/.vitepress/dist/      (artifact → GitHub Pages)
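The stage order above can be sketched as a small dry-run driver. This only illustrates ordering and fail-fast behavior. It assumes each script takes no arguments, that everything except build_sidebar.py and postbuild_a11y.py sits under scripts/, and that VitePress is invoked through npx. The real package.json and Makefile targets may differ.

```python
"""Hypothetical pipeline driver mirroring the stage order in the diagram."""
import subprocess
import sys

# Assumed paths: scripts/ for most stages, repo root for the two exceptions.
PRE_BUILD = [
    "scripts/validate_papers.py",
    "scripts/generate_index.py",
    "scripts/generate_rss.py",
    "scripts/generate_citation_graph.py",
    "build_sidebar.py",  # repo root
]
POST_BUILD = ["postbuild_a11y.py"]  # repo root, runs after vitepress build


def pipeline_commands() -> list[list[str]]:
    """Return every stage as an argv list, in execution order."""
    cmds = [[sys.executable, script] for script in PRE_BUILD]
    cmds.append(["npx", "vitepress", "build", "docs"])  # assumed invocation
    cmds += [[sys.executable, script] for script in POST_BUILD]
    return cmds


def run(dry_run: bool = True) -> None:
    for cmd in pipeline_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing stage, so a broken
            # paper never reaches the vitepress build.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(dry_run="--run" not in sys.argv)
```

By default it only prints the commands; pass --run to execute them.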

All Python scripts in scripts/ are plain CLI tools with no framework, called either from the scripts block in package.json or from the Makefile. build_sidebar.py lives at the repo root because it runs before the docs are even built; postbuild_a11y.py also sits at the root.
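A minimal sketch of the kind of in-place rewrite build_sidebar.py performs on config.ts, assuming the generated sidebar sits between two marker comments. The marker strings and the rewrite_sidebar helper are hypothetical; the script's actual format isn't described here.

```python
import json
import re

# Hypothetical marker comments delimiting the generated block in config.ts.
START = "// <sidebar:auto>"
END = "// </sidebar:auto>"
BLOCK = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)


def rewrite_sidebar(config_src: str, sidebar: list[dict]) -> str:
    """Replace only the text between the markers with a fresh sidebar literal."""
    if not BLOCK.search(config_src):
        raise ValueError("sidebar markers not found in config.ts")
    # JSON arrays/objects are valid JS literals, so they paste into TS as-is.
    body = "sidebar: " + json.dumps(sidebar, indent=2, ensure_ascii=False) + ","
    replacement = f"{START}\n{body}\n{END}"
    # A callable replacement avoids re treating backslashes as escapes.
    return BLOCK.sub(lambda _m: replacement, config_src, count=1)
```

Touching only the marked region leaves the hand-written parts of config.ts (nav, locales, theme options) alone, so the script can run on every build without clobbering manual edits.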

Why these choices

Static site, not SPA. The site is a content viewer. SPA routing, hydration, and runtime fetching would all be overhead.
Per-paper folder, not a single papers.json. Means I can move a topic out into its own repo without rewriting an index. The validation script is the one place that has to know the structure.
Python for build scripts. Same as the validation logic in the rest of the repo. Could have been Node, but pulling in JS just to read JSON feels wrong.
Service worker only in prod. HMR fights the SW cache. The import.meta.env.DEV short-circuit was the difference between a usable dev loop and a frustrating one.
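A sketch of what "the validation script is the one place that knows the structure" can look like, assuming a hypothetical layout of papers/<slug>/metadata.json with title, authors, and year as required fields. validate_papers.py defines the real layout and schema.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = {"title", "authors", "year"}  # hypothetical schema


def validate_papers(root: Path) -> list[str]:
    """Return human-readable errors; an empty list means the tree is valid."""
    errors: list[str] = []
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        meta_path = folder / "metadata.json"  # assumed file name
        if not meta_path.is_file():
            errors.append(f"{folder.name}: missing metadata.json")
            continue
        try:
            meta = json.loads(meta_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            errors.append(f"{folder.name}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(meta, dict):
            errors.append(f"{folder.name}: metadata must be a JSON object")
            continue
        missing = REQUIRED_FIELDS - set(meta)
        if missing:
            errors.append(f"{folder.name}: missing fields {sorted(missing)}")
    return errors
```

Because each folder is self-contained, moving a topic into its own repo means moving its folders; there is no central index to rewrite.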

Internationalization

VitePress's built-in locales config, no plugin. English is at the root, zh-CN is under /zh/. The paper content itself isn't translated — only the site chrome. Translating abstracts is a separate, much larger project.

Accessibility

WCAG 2.1 AA. Most of it comes for free from VitePress 1.6's default theme (the <main id="VPContent"> landmark, the skip-link, aria-labels on icon-only controls, focus-visible). The one thing we still patch in is decorative-SVG hygiene: postbuild_a11y.py at the repo root runs after vitepress build and marks purely decorative <svg> elements aria-hidden="true" focusable="false". It deliberately does NOT touch SVGs that already carry a role or aria-label (those are meaningful icons).
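The SVG rule above can be sketched with a regex over opening <svg> tags. This is an illustration, not postbuild_a11y.py itself: it assumes no ">" appears inside attribute values, and it also skips aria-labelledby and existing aria-hidden so that re-running it is a no-op. A real pass over dist/ may prefer a proper HTML parser.

```python
import re

# Opening <svg ...> tag; assumes no ">" inside attribute values.
SVG_OPEN = re.compile(r"<svg\b([^>]*)>", re.IGNORECASE)

# Any of these means the SVG is meaningful or already handled: leave it alone.
SKIP_ATTRS = re.compile(
    r"\s(?:role|aria-label|aria-labelledby|aria-hidden)\s*=", re.IGNORECASE
)

HIDDEN = ' aria-hidden="true" focusable="false"'


def hide_decorative_svgs(html: str) -> str:
    """Add aria-hidden/focusable to every <svg> without a role or label."""

    def patch(match: re.Match) -> str:
        attrs = match.group(1)
        if SKIP_ATTRS.search(attrs):
            return match.group(0)
        stripped = attrs.rstrip()
        if stripped.endswith("/"):  # self-closing <svg ... />
            return f"<svg{stripped[:-1].rstrip()}{HIDDEN}/>"
        return f"<svg{attrs}{HIDDEN}>"

    return SVG_OPEN.sub(patch, html)
```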

If a regression sneaks in, the build still succeeds; only a manual axe or Lighthouse pass will catch it. There's no automated a11y check in CI yet.

Security

Static site, no server. Attack surface is the build-time scripts and the paper metadata.
recover_pdfs.py has a domain allowlist. Without it, a malicious metadata entry could make the script request an internal address (SSRF).
The CI security workflow runs gitleaks and npm audit weekly.
See SECURITY.md for the disclosure contact.
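A minimal sketch of the allowlist idea, assuming it is checked on the parsed hostname. The hosts in ALLOWED_HOSTS are placeholders, not recover_pdfs.py's actual list.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; recover_pdfs.py keeps its own list of hosts.
ALLOWED_HOSTS = {"arxiv.org", "openreview.net"}


def is_allowed(url: str) -> bool:
    """True only for https URLs whose host is (a subdomain of) an allowed host."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").rstrip(".")
    except ValueError:  # e.g. malformed IPv6 literal
        return False
    if parts.scheme != "https" or not host:
        return False
    # hostname is lowercased and excludes any "user@" prefix, so
    # "https://arxiv.org@10.0.0.1/" yields host "10.0.0.1" and is rejected.
    return any(host == h or host.endswith("." + h) for h in ALLOWED_HOSTS)
```

Matching the parsed hostname rather than a substring of the URL is what keeps lookalikes such as evilarxiv.org and userinfo tricks out. If the downloader follows redirects, each redirect target would need the same check.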

Things that are deliberately not in here

Search server / Algolia. The default local provider scales fine to a few thousand docs.
Comments. GitHub issues are enough.
Analytics. Nothing on the site needs them.
Comments on the architecture document. (See also: this document.)
