
The transformer paper-recommendation advantage is real at the head of the impact distribution and decays to null across the broad literature: a 4-checkpoint, all-26-field convergence study of SPECTER vs TF-IDF

Bucket Foundation · research-atlas working group · Bucket Foundation preprint · 2.1 (final cross-field preprint) · 2026-06-23 · CC-BY-4.0

Corpus: research-atlas v0.3.0 — OpenAlex, all 26 fields, impact-ranked 2015–2024, 78,000 → 361,800 works over 4 checkpoints (concept DOI 10.5281/zenodo.20774322)


Abstract

A companion single-subfield study showed that SPECTER (a transformer pre-trained on the scientific-paper citation graph) beats a TF-IDF baseline at held-out citation prediction in High-Energy Physics (+15.4% relative MAP, p = 0.0005). That is a large win, but it was measured on the citation-dense, top-cited slice of a single subfield. The natural question, and the one a practitioner faces when reaching for a neural paper recommender, is whether that advantage generalizes.
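Held-out citation prediction is scored with mean average precision (MAP), and the HEP result is quoted as a relative MAP gain. The sketch below shows one standard way to compute per-query average precision, MAP, and relative gain. It assumes each query paper comes with a ranked candidate list and a set of held-out cited papers; the function names and paper IDs are hypothetical and are not taken from scripts/crossfield_run.py.

```python
from typing import Iterable, Sequence, Set, Tuple

# Hypothetical helpers for illustration; not the pipeline's actual API.


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """AP of one ranked candidate list against the held-out cited papers."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at the rank of each hit
    # Held-out citations that are never retrieved contribute zero precision.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP over (ranked list, held-out citation set) pairs, one per query paper."""
    aps = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(aps) / len(aps) if aps else 0.0


def relative_map_gain(map_model: float, map_baseline: float) -> float:
    """Relative improvement; 0.154 reads as +15.4% relative MAP."""
    return (map_model - map_baseline) / map_baseline


# Toy example with hypothetical paper IDs: one query, two held-out citations.
held_out = {"p7", "p2"}
neural = ["p7", "p5", "p2", "p9"]    # hits at ranks 1 and 3
lexical = ["p5", "p7", "p9", "p2"]   # hits at ranks 2 and 4
print(mean_average_precision([(neural, held_out)]))    # (1/1 + 2/3) / 2
print(mean_average_precision([(lexical, held_out)]))   # (1/2 + 2/4) / 2
```

Relative MAP divides the difference by the baseline, so the same absolute ΔMAP reads as a larger relative gain in fields where the baseline MAP is low.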

We answer it with a checkpointed, resumable producer/consumer pipeline. It pulls an impact-ranked corpus (most-cited papers first) across all 26 OpenAlex top-level fields, builds a complete in-corpus citation graph and PageRank per field, embeds title+abstract on a local AMD GPU (ROCm) at a measured 9.1 docs/s, and runs the identical held-out citation-prediction evaluation in every field: SPECTER vs TF-IDF vs word2vec vs a text-free graph recommender, with bootstrap CIs and a paired test. It then grows the corpus and re-measures.

The result is a clean convergence finding. At checkpoint 1 (top ~3k works per field, 78,000 works), SPECTER beats TF-IDF in 16 of 26 fields and the across-field edge is large and nearly significant: combined mean ΔMAP +0.0095 (95% CI [−0.0005, +0.0195], bootstrap p = 0.062). As the impact-ranked corpus broadens to checkpoint 2 (130,000 works) and checkpoint 3 (361,800 works), the edge decays monotonically toward null: 12/26 and then 11/26 wins, with combined ΔMAP −0.0008 (p = 0.80) and then −0.0019 (95% CI [−0.0075, +0.0034], p = 0.49). At checkpoint 4 the impact-ranked corpus had plateaued at 361,800 works and the run reproduced checkpoint 3 exactly, so the result is converged.
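The per-field PageRank step can be illustrated with plain power iteration over the in-corpus citation graph. This is a minimal sketch under assumed settings (edges run from citing to cited work, damping 0.85, mass from works with no in-corpus references spread uniformly); the pipeline's actual graph construction and parameters are not described here, and `pagerank` is a hypothetical helper.

```python
from typing import Dict, List


def pagerank(cites: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Hypothetical power-iteration PageRank; cites[u] lists in-corpus works u cites."""
    nodes = set(cites)
    for targets in cites.values():
        nodes.update(targets)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Works with no in-corpus references (dangling nodes) spread their mass uniformly.
        dangling = sum(rank[v] for v in nodes if not cites.get(v))
        new = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        for src, targets in cites.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share  # the citing work passes rank to the cited work
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical four-paper field: "a" and "b" cite "c", and "c" cites "d".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["d"]})
for work, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(work, round(score, 4))
```

Because rank flows from citing to cited work, heavily cited papers accumulate score; in the toy graph "d" ends up highest because it inherits the mass that "c" collects from two citers.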

The headline: neural paper recommendation's edge is concentrated in the head of the impact distribution; across the broad literature it is not a general win. The advantage survives where fine-grained phrase meaning carries relevance (Computer Science, Social Sciences, Neuroscience, Biochemistry) and reverses in physical-science and pharmacology fields, where exact-term matching wins (Pharmacology −0.040, p < 0.001; Chemistry −0.025, p < 0.001; Earth & Planetary −0.020, p < 0.001). Citation concentration (Gini 0.243–0.501) and interdisciplinarity (cross-field reference fraction 0.169–0.558) vary by field but do not predict the split.
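The two field-level covariates can be sketched as follows. The definitions are assumptions: Gini is taken over per-work citation counts within a field, and the cross-field reference fraction is the share of a field's references, among those whose target field is known, that point to a work in a different field. The helpers `gini` and `cross_field_fraction` are hypothetical and may not match the pipeline's exact definitions.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical helpers for illustration; definitions are assumed, not the paper's code.


def gini(values: Iterable[float]) -> float:
    """Gini coefficient of non-negative counts: 0 = perfectly even."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Closed form over ascending-sorted values with 1-based ranks i.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def cross_field_fraction(refs: List[Tuple[str, str]],
                         field_of: Dict[str, str], field: str) -> float:
    """Share of a field's field-resolved references that land in another field."""
    inside = outside = 0
    for citing, cited in refs:
        if field_of.get(citing) != field or cited not in field_of:
            continue  # other fields' papers, or references with no known field
        if field_of[cited] == field:
            inside += 1
        else:
            outside += 1
    total = inside + outside
    return outside / total if total else 0.0


# Hypothetical toy data, not taken from the corpus.
print(gini([0, 0, 0, 12]))   # 0.75: the maximum (n - 1) / n for four works
print(gini([3, 3, 3, 3]))    # 0.0: citations spread perfectly evenly
field_of = {"p1": "CS", "p2": "CS", "p3": "Physics"}
refs = [("p1", "p2"), ("p1", "p3"), ("p2", "p3"), ("p3", "p1")]
print(cross_field_fraction(refs, field_of, "CS"))   # 2 of 3 CS references leave CS
```

Both statistics are computed once per field, so they can be correlated against per-field ΔMAP; the paper reports that neither predicts where SPECTER wins.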

Key findings

SPECTER's across-field edge over TF-IDF is large and nearly significant on the most-cited core (+0.0095 ΔMAP, 16/26 wins, p = 0.062 at checkpoint 1) and decays monotonically to null as the corpus broadens (−0.0019, p = 0.49 at checkpoint 3).
The result is converged: checkpoint 4 hit the ceiling set by corpus availability and rate limits at 361,800 works and reproduced checkpoint 3 to the digit (11/26, −0.0019, p = 0.49).
SPECTER wins where fine-grained meaning carries relevance (Social Sciences +0.022, Computer Science +0.021 p < 0.001) and loses in lexical/physical-science fields (Pharmacology −0.040, Chemistry −0.025, both p < 0.001).
The +15.4% HEP win did not survive aggregation: Physics & Astronomy flips to a significant loss (−0.0098, p = 0.034) at the converged checkpoint.
Every number is emitted by scripts/crossfield_run.py and pinned by tests/test_crossfield.py — fully reproducible.
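The combined ΔMAP, its 95% CI, and the bootstrap p quoted in the findings can be illustrated with a percentile bootstrap over per-field deltas. This is a sketch under stated assumptions (fields resampled with replacement, 10,000 draws, two-sided p from the share of resampled means on either side of zero); the resampling unit and p-value convention used in scripts/crossfield_run.py may differ, `bootstrap_mean_delta` is a hypothetical helper, and the deltas in the usage lines are toy values.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def bootstrap_mean_delta(deltas: Sequence[float], n_boot: int = 10_000,
                         alpha: float = 0.05, seed: int = 0
                         ) -> Tuple[float, float, float, float]:
    """Hypothetical helper: mean per-field ΔMAP, percentile CI, two-sided bootstrap p."""
    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    n = len(deltas)
    # Resample whole fields with replacement and record each resample's mean.
    boots = sorted(mean(rng.choices(deltas, k=n)) for _ in range(n_boot))
    k = int(n_boot * alpha / 2)
    ci_low, ci_high = boots[k], boots[n_boot - 1 - k]
    # Two-sided p: twice the smaller share of resampled means on either side of zero.
    below = sum(b <= 0 for b in boots) / n_boot
    above = sum(b >= 0 for b in boots) / n_boot
    p = min(1.0, 2.0 * min(below, above))
    return mean(deltas), ci_low, ci_high, p


# Toy per-field deltas (SPECTER minus TF-IDF MAP); not the paper's values.
toy = [0.012, -0.004, 0.007, 0.015, -0.002, 0.009]
m, lo, hi, p = bootstrap_mean_delta(toy)
print(f"mean {m:+.4f}  95% CI [{lo:+.4f}, {hi:+.4f}]  p = {p:.2f}")
```

A CI that straddles zero, as at checkpoint 1 ([−0.0005, +0.0195]), corresponds to a p just above 0.05 under this kind of convention.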

Figures

Cross-field convergence across 4 checkpoints (78k to 362k works): the combined mean ΔMAP and its 95% CI decay to null, and the win fraction crosses below the 0.5 coin-flip line, then plateaus at checkpoint 4.
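The coin-flip line compares each checkpoint's win fraction (16/26, 12/26, 11/26) with 0.5. As an illustration only, and not a test the paper reports, an exact two-sided binomial sign test on field wins can be sketched as below. It treats the 26 fields as independent trials, which is itself an assumption, and `sign_test_p` is a hypothetical helper.

```python
from math import comb


def sign_test_p(wins: int, n: int) -> float:
    """Hypothetical helper: exact two-sided binomial test of wins out of n vs. p = 0.5."""
    probs = [comb(n, k) / 2 ** n for k in range(n + 1)]
    observed = probs[wins]
    # Add up every outcome at most as likely as the observed one
    # (the small relative slack absorbs floating-point ties).
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


# Win counts per checkpoint as reported above (26 fields each).
for checkpoint, wins in [(1, 16), (2, 12), (3, 11)]:
    print(f"checkpoint {checkpoint}: win fraction {wins / 26:.3f}, "
          f"sign-test p = {sign_test_p(wins, 26):.3f}")
```

A win count alone ignores effect sizes, which is why the ΔMAP bootstrap remains the primary across-field statistic.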

Cite this paper

DOI: 10.5281/zenodo.20808201

@misc{bucket2026paperranking,
  title        = {The transformer paper-recommendation advantage is real at the
                  head of the impact distribution and decays to null across the
                  broad literature: a 4-checkpoint, all-26-field convergence
                  study of SPECTER vs TF-IDF},
  author       = {{Bucket Foundation research-atlas working group}},
  year         = {2026},
  howpublished = {Bucket Foundation preprint},
  doi          = {10.5281/zenodo.20808201},
  url          = {https://doi.org/10.5281/zenodo.20808201},
  note         = {research-atlas v0.3.0}
}
