Bucket Foundation · education-atlas working paper · DOI: pending · CC-BY-4.0

03 — Expanding the Knowledge-Access Map: Cost, Field, Time, Continuity, Latency

Five new dimensions over the same Age × Knowledge-depth grid

education-atlas landscape analysis. Generated by `analysis/landscape/build_expansion.py` → `results_expansion.json`; figures by `make_figures_expansion.py`; headline numbers pinned by `test_expansion.py`. It reuses the L0–L5 depth ladder and age bins from `scale.py` and the same access surface from `build_access.py` (docs 01/02), so the expansion is consistent with the base map rather than a parallel re-derivation. Every modeled cell is flagged real / anchor / estimated below.


0. What this adds

Docs 01 (solution landscape) and 02 (access measurement) built the base map: who reaches each knowledge depth, by income, and where solutions cluster. This brief expands the map along the five dimensions it did not yet build, each with a real graph:

| # | Dimension | Question | Figure |
|---|---|---|---|
| 1 | Cost-to-access | What does it cost to reach each depth? | fig_cost_surface.png |
| 2 | Field / breadth (3rd axis) | Is depth served evenly across disciplines? | fig_depth_field.png |
| 3 | Temporal trend | Has frontier access risen over time? | fig_temporal_trend.png |
| 4 | Continuity / pipeline-leak | Of 100 at L0, how many survive to L5? | fig_continuity_funnel.png |
| 5 | Latency / gatekeepers | How many gates between a learner and L4/L5? | fig_gatekeepers.png |

The depth ladder is unchanged: L0 literacy · L1 K-12 · L2 undergrad · L3 graduate/professional · L4 frontier (read primary research) · L5 producing new knowledge. Age bins: 0-5 · 5-18 · 18-22 · 22-65 · 65+.


1. Dimension 1 — Cost-to-access per depth

Figure: cost surface (analysis/landscape/figures/fig_cost_surface.png)

A USD cost surface over age × depth (midpoint of a cited low/high range, log-colored, gated by the same structural reachability mask as the access surface). The cost to reach and operate at each rung, per person-year:

| Depth | Cheapest legit path | Typical paid path | Free $0 path exists? |
|---|---|---|---|
| L0 literacy | $0 (public primary, Wikipedia, free apps) | $0 | Yes |
| L1 K-12 | $0 (public secondary, OER) | ~$15k/yr (US private K-12) | Yes |
| L2 undergrad | $0 (free-tuition systems, MIT OCW, OpenStax) | ~$40k/yr (US private; ~$11k public in-state) | Yes (content; not credential) |
| L3 grad/prof | ~$12k/yr (public grad) | ~$60k/yr (US law/med/MBA) | No |
| L4 frontier | $35–50 per paywalled article; $0 to read OA (arXiv/PMC/PLOS/Unpaywall) | APC ~$2k avg, Nature 2026 = $12,850 | Yes, to read |
| L5 produce | ~$50k–150k/yr (one funded researcher-year direct cost) | | No |

Headline (cost): The cost curve is bimodal, not monotone. Reaching the floor (L0–L2) and reading the frontier (L4) both have a genuine $0 path: public schooling, OER, and open access. But there is no free path to produce knowledge: L3 (graduate credential) starts at ~$12k/yr, and L5 (doing research) costs ~$50k–150k per researcher-year, the most expensive rung on the grid. The economic cliff is not "the frontier is expensive to read" (reading is increasingly free); it is "producing knowledge has no free on-ramp." The single sharpest paywall number: Nature's 2026 APC is $12,850 to publish one open-access article.

Real vs estimated: USD anchors are REAL cited 2024–2026 figures (Nature APC $12,850 per STAT 2026; US tuition per College Board; per-article paywalls per publisher sites; researcher-year direct cost). The age×depth midpoint surface is a derived summary (real anchors, estimated midpoint), and the figures are US/OECD-leaning — the shape (free floor, no free production path) holds globally, the exact dollars do not.
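The midpoint summary can be sketched as below. This is a minimal illustration, not the code in build_expansion.py. It assumes the low end of each range is the "cheapest legit path" figure and the high end is the "typical paid path" figure from the table (for L5, the single quoted range), and it leaves out L4, whose anchors are per article rather than per person-year. The names `COST_RANGE_USD`, `midpoint`, and `has_free_path` are hypothetical.

```python
# Hypothetical low/high USD-per-person-year ranges, read off the cost table.
# Assumption: low = cheapest legitimate path, high = typical paid path.
# L5 uses its single quoted range; L4 is omitted (per-article anchors).
COST_RANGE_USD = {
    "L0": (0, 0),
    "L1": (0, 15_000),
    "L2": (0, 40_000),
    "L3": (12_000, 60_000),
    "L5": (50_000, 150_000),
}


def midpoint(low, high):
    """Arithmetic midpoint of a low/high cost range."""
    return (low + high) / 2


# Midpoint per depth, plus a flag for whether a $0 path exists.
midpoints = {d: midpoint(lo, hi) for d, (lo, hi) in COST_RANGE_USD.items()}
has_free_path = {d: lo == 0 for d, (lo, _hi) in COST_RANGE_USD.items()}

for depth in COST_RANGE_USD:
    print(f"{depth}: midpoint ${midpoints[depth]:,.0f}/yr, "
          f"free path: {has_free_path[depth]}")
```

Under these assumptions the flag is true for L0–L2 and false for L3 and L5, which matches the "free floor, no free production path" shape described above.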


2. Dimension 2 — The 3rd axis: depth × FIELD

Figure: depth × field (analysis/landscape/figures/fig_depth_field.png)

Coverage is not uniform across disciplines. Using the research-atlas researcher-per-field counts (real) as field-size, the served-score (0–1) heatmap shows two gradients at once: rich-vs-thin across fields, and the universal frontier cliff within every field.

Real field sizes (researchers in the corpus):

| Field | Researchers | Real? |
|---|---|---|
| Biomedicine / Biology | 689,684 | real |
| Earth / Climate | 116,840 | real |
| Physics / Astronomy | 108,466 | real |
| Engineering | 93,027 | real |
| Computer Science / ML | 44,999 | real |
| Materials Science | 44,536 | real |
| Economics / Social Sci | 42,276 | real |
| Chemistry | 20,531 | real |
| Mathematics | 5,198 | real |
| Humanities (hist/phil/lit) | 1,500 | placeholder |
| Law / Civics | 1,200 | placeholder |
| Arts / Performing arts | 500 | placeholder |

Headline (field): Depth coverage is wildly uneven across fields. Biomedicine has ~133× as many researchers as mathematics (689,684 vs 5,198) and ~460× as many as the humanities placeholder (1,500). Even within well-served STEM, the frontier thins everywhere: the L4 served-score is below L2 in every field. The humanities, arts, and law are structurally absent from the STEM-built research corpus. Their near-zero presence is not a data gap to apologize for; it is the finding. The frontier-access infrastructure (preprints, discovery, AI research tools) is overwhelmingly built for biomedical and physical science, leaving the humanities frontier the thinnest cell of all.

Real vs estimated: field sizes for the 9 STEM/social fields are REAL (research-atlas researcher_segment). Humanities/arts/law sizes are documented placeholders (the corpus does not index them; their near-absence is the point). The served-score is (field size)^0.5 × a per-depth attenuation (estimated, shaped on the base-map frontier cliff).
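A small sketch of the field-size ratios and the square-root served-score follows. The researcher counts are copied from the table; everything else is an assumption made for illustration. In particular, the `ATTENUATION` values are invented, and normalizing by the largest field (so that scores fall in 0–1) is a guess at how the score is scaled, not a description of build_expansion.py.

```python
import math

# Researcher counts from the field-size table (last three are placeholders).
FIELD_SIZE = {
    "Biomedicine / Biology": 689_684,
    "Earth / Climate": 116_840,
    "Physics / Astronomy": 108_466,
    "Engineering": 93_027,
    "Computer Science / ML": 44_999,
    "Materials Science": 44_536,
    "Economics / Social Sci": 42_276,
    "Chemistry": 20_531,
    "Mathematics": 5_198,
    "Humanities (hist/phil/lit)": 1_500,
    "Law / Civics": 1_200,
    "Arts / Performing arts": 500,
}

# HYPOTHETICAL per-depth attenuation, chosen only to fall with depth.
ATTENUATION = {"L0": 1.0, "L1": 0.9, "L2": 0.7, "L3": 0.4, "L4": 0.15, "L5": 0.05}

LARGEST = max(FIELD_SIZE.values())


def served_score(field, depth):
    """sqrt(field size), normalized by the largest field, times attenuation."""
    return math.sqrt(FIELD_SIZE[field] / LARGEST) * ATTENUATION[depth]


biomed = FIELD_SIZE["Biomedicine / Biology"]
ratio_vs_mathematics = biomed / FIELD_SIZE["Mathematics"]
ratio_vs_humanities = biomed / FIELD_SIZE["Humanities (hist/phil/lit)"]
ratio_vs_arts = biomed / FIELD_SIZE["Arts / Performing arts"]

print(f"biomed / mathematics: {ratio_vs_mathematics:.1f}x")
print(f"biomed / humanities placeholder: {ratio_vs_humanities:.1f}x")
print(f"biomed / arts placeholder: {ratio_vs_arts:.1f}x")
for field in FIELD_SIZE:
    print(f"{field}: L2={served_score(field, 'L2'):.3f}, "
          f"L4={served_score(field, 'L4'):.3f}")
```

Because the attenuation multiplies every field by the same per-depth factor, L4 falls below L2 in every field for any attenuation that decreases with depth. The square root compresses the size gap between fields but does not remove it.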


3. Dimension 3 — Temporal trend: is access democratizing?

Figure: temporal trend (analysis/landscape/figures/fig_temporal_trend.png)

Three real series plus tool-arrival milestones, 2000→2024:

| Series | 2000 | 2024 | Change |
|---|---|---|---|
| Global OA share of new papers | 12% | 54% | ×4.5 |
| Global internet penetration | 6.7% | 68% | ×10.1 |
| Local corpus OA% (biased high) | | ~93% | corroboration |

Milestones layered on the timeline: arXiv (1991), Google Scholar (2004), PMC mandate (2008), bioRxiv (2013), Unpaywall/Semantic Scholar (2015), OpenAlex + scaled LLM research tools (2022), autonomous science agents (2024).

Headline (temporal): Reading the frontier is genuinely democratizing; producing it is not. The open-access share of new papers more than quadrupled (12%→54%) and crossed the 50% line in the early 2020s, while internet penetration grew ~10× (6.7%→68%) and a stack of free discovery and AI tools arrived. The read-access cliff from doc 02 is eroding fast. But this trend touches L4 (reading) only: it does not move the L4→L5 frontier-production rate, which doc 02 pins at ~0.136% of humanity and which has no comparable democratizing curve. The honest answer: access to consume the frontier is democratizing; the barriers to producing it persist.

Real vs estimated: global OA-by-year and internet-by-year are REAL anchors (published OA bibliometrics — OpenAlex/Unpaywall/Curtin COKI; World Bank/ITU IT.NET.USER.ZS). Tool milestones are real dates. The local corpus OA series is real but OA-selected (it sits ~90%+ because the corpus is built from OA-discoverable works) — used only as directional corroboration, never as the headline.
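The "Change" multipliers are plain end-over-start ratios. A short check, using only the 2000 and 2024 anchors quoted in the table (the names `SERIES` and `fold_change` are hypothetical):

```python
# 2000 and 2024 anchor values in percent, as quoted in the table.
SERIES = {
    "Global OA share of new papers": (12.0, 54.0),
    "Global internet penetration": (6.7, 68.0),
}


def fold_change(start, end):
    """Multiplicative change from the start value to the end value."""
    return end / start


changes = {name: fold_change(start, end) for name, (start, end) in SERIES.items()}

for name, (start, end) in SERIES.items():
    print(f"{name}: x{changes[name]:.1f} (+{end - start:.1f} percentage points)")
```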


4. Dimension 4 — Continuity / the pipeline leak

Figure: continuity funnel (analysis/landscape/figures/fig_continuity_funnel.png)

Of 100 people present at L0, how many survive to each deeper rung (the base-map world-average access normalized to the L0 cohort):

| Depth | Survivors per 100 at L0 | Drop from previous |
|---|---|---|
| L0 literacy | 100.0 | |
| L1 K-12 | 75.6 | −24.4 |
| L2 undergrad | 45.3 | −30.3 |
| L3 grad/prof | 9.76 | −35.5 (biggest leak) |
| L4 frontier | 0.17 | −9.6 |
| L5 produce | 0.077 | −0.09 |

Headline (continuity): The biggest single leak is L2→L3, the undergrad-to-graduate gap, which loses ~36 of every 100 who entered L0. Of 100 people, ~76 reach secondary, ~45 reach undergrad, but only ~9.8 reach graduate depth, ~0.17 reach the frontier, and ~0.08 ever produce new knowledge. The leak is not at the bottom (the L0→L1 step retains ~76%). It is in the middle-to-upper transitions: the undergrad-to-graduate step (L2→L3, the largest absolute drop) and the graduate-to-frontier step (L3→L4, conditional advance ~1.7%) are where the pipeline empties out. This is the quantified version of doc 01's "the L3→L4 bridge is unbuilt."

Real vs estimated: survival is the base-map world_access (REAL L0–L2, anchored L3–L5) normalized to the L0 cohort. It is a survival/presence curve, not a measured longitudinal transition — it shows how many of a representative population sit at each depth, not tracked individuals over time.
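The funnel arithmetic separates two readings of "leak": the absolute drop in survivors per 100, and the conditional advance rate from one rung to the next. A minimal sketch, using only the survivor figures from the table (variable names are hypothetical):

```python
DEPTHS = ["L0", "L1", "L2", "L3", "L4", "L5"]
# Survivors per 100 people present at L0, copied from the funnel table.
SURVIVORS = [100.0, 75.6, 45.3, 9.76, 0.17, 0.077]

steps = []
for i in range(1, len(DEPTHS)):
    prev, cur = SURVIVORS[i - 1], SURVIVORS[i]
    steps.append({
        "step": f"{DEPTHS[i - 1]}->{DEPTHS[i]}",
        "drop": prev - cur,      # absolute loss, in people per 100 at L0
        "advance": cur / prev,   # share of the previous rung that advances
    })

# The largest absolute drop and the lowest conditional advance can differ.
biggest_leak = max(steps, key=lambda s: s["drop"])
lowest_advance = min(steps, key=lambda s: s["advance"])

for s in steps:
    print(f"{s['step']}: drop {s['drop']:.2f}, advance {s['advance']:.1%}")
print("largest absolute drop:", biggest_leak["step"])
print("lowest conditional advance:", lowest_advance["step"])
```

The two measures point at different steps: the largest absolute drop is L2→L3, while the lowest conditional advance is L3→L4. Both are cross-sectional ratios of the presence curve, not tracked transitions.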


5. Dimension 5 — Latency to frontier / gatekeeper count

Figure: gatekeepers (analysis/landscape/figures/fig_gatekeepers.png)

A structured enumeration of the discrete gates between a motivated learner and each depth (cumulative, from a standing start):

| Depth | Gates added | Cumulative | Example new gate |
|---|---|---|---|
| L0 | 1 | 1 | literacy instruction |
| L1 | 2 | 3 | enrollment + years of attendance |
| L2 | 3 | 6 | diploma + admission + tuition |
| L3 | 3 | 9 | bachelor's + grad admission + grad tuition |
| L4 | 4 | 13 | grad training + institutional affiliation + paywall/APC + domain fluency |
| L5 | 5 | 18 | research position + funding + ethics approval + peer review + APC up to $12,850 |

Headline (latency): A learner must clear 13 gates to read the frontier (L4) and 18 to produce (L5), and 11 of the 18 are structural gates that effort alone cannot pass: tuition, admissions, institutional affiliation, paywalls/APCs, a funded research position, ethics approval, and peer-review acceptance. The latency to the frontier is not mainly a knowledge barrier (which open access and AI tools now lower); it is a stack of institutional and financial gates. They concentrate at the L3→L4→L5 transitions, which include the step where Dimension 4's conditional advance rate is lowest (L3→L4, ~1.7%).

Real vs estimated: the gate inventory is a structured enumeration of real, named barriers (enrollment, tuition, prerequisites, affiliation, paywall/APC, peer review). Counts are exact for the enumerated list; the list is representative, not exhaustive.
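The cumulative column is a running sum of the gates added at each depth. A minimal check, using the per-depth counts from the table and the structural count (11) from the headline; the variable names are hypothetical:

```python
from itertools import accumulate

# Gates added at each depth, copied from the gate table.
GATES_ADDED = {"L0": 1, "L1": 2, "L2": 3, "L3": 3, "L4": 4, "L5": 5}

# Running total from a standing start.
cumulative = dict(zip(GATES_ADDED, accumulate(GATES_ADDED.values())))

# Structural-gate count as stated in the headline; this sketch does not
# classify individual gates, it only computes the share.
STRUCTURAL_GATES = 11
structural_share = STRUCTURAL_GATES / cumulative["L5"]

print(cumulative)
print(f"structural share of gates to L5: {structural_share:.0%}")
```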


6. What's real vs estimated (the honesty ledger)

| Dimension | Status | Anchor / assumption |
|---|---|---|
| Cost: USD per depth | REAL anchors | Nature APC $12,850 (STAT 2026), College Board tuition, publisher paywalls, researcher-year cost |
| Cost: age×depth midpoint surface | derived | midpoint of real low/high; US/OECD-leaning |
| Field sizes (9 STEM fields) | REAL | research-atlas researcher_segment |
| Field sizes (humanities/arts/law) | PLACEHOLDER | corpus is STEM-built; near-absence is the finding |
| Field served-score | estimated | size^0.5 × per-depth attenuation shaped on base cliff |
| Global OA share by year | REAL anchor | OA bibliometrics (OpenAlex/Unpaywall/COKI) |
| Internet penetration by year | REAL anchor | World Bank / ITU IT.NET.USER.ZS |
| Tool milestones | REAL dates | arXiv→FutureHouse timeline |
| Local corpus OA% by year | REAL but biased | corpus OA-selected; corroboration only |
| Continuity funnel | REAL (L0–L2) + anchored (L3–L5) | base-map world_access normalized; presence not longitudinal |
| Gatekeeper inventory | structured enumeration | real named barriers; counts exact for the list |
| Depth ladder L0–L5, age bins | CONSTRUCTED | reused from scale.py |

Known limitations. (1) Costs are US/OECD-leaning; the shape generalizes, the dollars do not. (2) The field corpus is STEM/biomed-built, so the humanities frontier is under-counted (the placeholder makes the gap visible but is not a measured census). (3) The continuity funnel is a cross-sectional survival curve, not tracked individuals. (4) The gate list is representative, not exhaustive. (5) The OA temporal trend measures reading access only — it does not democratize production.


7. The expansion headline

The base map said depth is the binding constraint and income buys it. The expansion says the binding constraint is production, not consumption. Reaching the floor and reading the frontier both now have a $0 path and are democratizing over time (OA ×4.5 to 54%, internet ×10 to 68%). But producing knowledge has no free path (L5 ~$50k–150k per researcher-year), is wildly uneven across fields (biomedicine has ~133× as many researchers as mathematics; the humanities barely exist in the frontier infrastructure), leaks worst at the L2→L3 step (−36 of every 100), and sits behind 18 gates, 11 of them structural (affiliation, tuition, paywalls/APCs up to $12,850, peer review). Consuming the frontier is getting free and easy; producing it remains gated, expensive, and STEM-concentrated.

8. Reproduce

cd analysis/landscape
python3 build_expansion.py            # -> results_expansion.json
python3 make_figures_expansion.py     # -> figures/fig_{cost_surface,depth_field,temporal_trend,continuity_funnel,gatekeepers}.png
python3 -m pytest test_expansion.py -q # pins the new headline numbers
python3 -m pytest -q                   # full suite (existing + expansion) stays green

Files: analysis/landscape/scale.py (constructed axes, reused), build_access.py (base surface, reused), build_expansion.py (this analysis), make_figures_expansion.py (the 5 figures), results_expansion.json (output), test_expansion.py (regression guard). Sibling docs: 01-solution-landscape.md, 02-access-data-science.md.
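For readers unfamiliar with "pinning" headline numbers, the idea can be sketched as a few plain-assert tests. This is not the content of test_expansion.py: the schema of results_expansion.json is not shown in this brief, so the dictionary below is a hypothetical stand-in with invented key names, populated with values quoted in the sections above.

```python
import math

# HYPOTHETICAL stand-in for the parsed results_expansion.json.
# Key names are invented for illustration; values are quoted from this brief.
results = {
    "oa_share_2000": 12.0,
    "oa_share_2024": 54.0,
    "gates_cumulative": {"L4": 13, "L5": 18},
    "survivors_per_100": {"L3": 9.76, "L4": 0.17, "L5": 0.077},
}


def test_oa_fold_change():
    # Headline: OA share of new papers grew x4.5 between 2000 and 2024.
    fold = results["oa_share_2024"] / results["oa_share_2000"]
    assert math.isclose(fold, 4.5, rel_tol=1e-9)


def test_gate_counts():
    # Headline: 13 gates to read the frontier, 18 to produce.
    assert results["gates_cumulative"] == {"L4": 13, "L5": 18}


def test_l3_to_l4_advance():
    # Headline: conditional advance from L3 to L4 is about 1.7%.
    s = results["survivors_per_100"]
    assert math.isclose(s["L4"] / s["L3"], 0.017, abs_tol=0.001)


if __name__ == "__main__":
    for check in (test_oa_fold_change, test_gate_counts, test_l3_to_l4_advance):
        check()
    print("all headline pins hold")
```

A test of this shape fails as soon as a rebuilt result drifts from the number quoted in the prose, which is what keeps the written headlines and the generated JSON in agreement.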