bucket foundation — inverse omegabucket.foundation
§ education research · Landscape

Access: the data science

Bucket Foundation · education-atlas working paperDOI: pending · CC-BY-4.0source on github ↗

02 — Quantifying Knowledge Access on the Age × Knowledge-Depth Grid

A data-science measurement of who can reach how deep into knowledge, and where the access cliff is

education-atlas landscape analysis. Generated by `analysis/landscape/buildaccess.pyresults.json; figures by makefigures.py`; headline numbers pinned by `testaccess.py`. Real anchors are World Bank EdStats (enrollment, literacy), UNESCO UIS (researchers-per-million), ITU (internet by age), and the research-atlas corpus. Constructed scales and every modeled cell are flagged explicitly below._


0. The question

Education debates usually measure enrollment — are children in a classroom? This analysis asks a harder question: for a person of a given age, how deep into a body of knowledge can they actually go? And the inverse: what fraction of humanity ever reaches the frontier — the place where new knowledge is read, let alone produced?

We lay the question on a grid:

  • X = Age / life-stage: 0-5 · 5-18 · 18-22 · 22-65 · 65+
  • Y = Knowledge depth: a constructed 6-rung ladder
  • L0 basic literacy / numeracy
  • L1 K-12 / secondary
  • L2 undergraduate
  • L3 graduate / professional
  • L4 frontier — reading primary research
  • L5 producing new knowledge

The deliverable is a measured surface over that grid, split by income tier, overlaid with solution density, and reduced to four headline findings.


1. The constructed scale (read this before trusting any cell)

The two axes are not measured by any authoritative body — they are an analytical frame this project defines in analysis/landscape/scale.py. What is real is the access proxy mapped to each rung:

DepthMeaningISCEDAccess proxyReal / Estimated
L0basic literacy/numeracyAdult literacy rate (SE.ADT.LITR.ZS)Real (World Bank)
L1K-12 / secondary1–3Secondary net enrollment (SE.SEC.NENR)Real (World Bank)
L2undergraduate6Tertiary gross enrollment (SE.TER.ENRR)Real (World Bank)
L3graduate / professional7–8Tertiary GER × graduate-entry shareReal base × estimated multiplier
L4frontier (read primary research)Researchers per million (SP.POP.SCIE.RD.P6)Real anchor (UNESCO UIS)
L5producing new knowledgeL4 × active-publishing shareReal anchor × estimated multiplier

A second constructed object is the age×depth structural mask (AGE_DEPTH_OPEN): you cannot be at undergraduate depth at age 3, so cells outside a level's reachable life-stage are marked structurally N/A (greyed, not zero). This is a modeling decision, documented in code.

Why split L4 and L5. L4 = reaching the frontier (being a researcher, able to read primary literature). L5 = adding to it (publishing). The data shows them as almost the same sliver — which is itself the finding.


2. Method

  1. Real surface (L0–L2). For each World Bank indicator, take the

latest-available value per country, then the mean within each income group (High / Upper-mid / Lower-mid / Low). Pure measurement, no modeling. (build_access.load_income_surface)

  1. L3 (graduate). Multiply the real tertiary GER by a documented

graduate-entry share per tier (0.30 HIC → 0.06 LIC), anchored on OECD Education at a Glance graduate-entry rates. Real base × estimated factor.

  1. L4 (frontier). Take UNESCO UIS researchers per million inhabitants by

income group (documented group means), divide by 1e6, ×100 → a % of population. Real anchor.

  1. L5 (production). L4 × 0.45 (share of researchers who actively publish).

Estimated multiplier on the real L4 anchor.

  1. World-average row. Unweighted mean of the four tiers (chosen for

transparency over a population weighting whose weights would themselves be modeled).

  1. Solution density. Read data/landscape/solutions.csv if the sibling

solution-landscape exists; otherwise use a documented density prior that encodes the known EdTech market shape (dense at K-12 / consumer-undergrad, sparse at the frontier). Flagged as estimated when the CSV is absent.

  1. Corroboration. The research-atlas slim DB supplies the real global

researcher population as an independent cross-check on the frontier slice.

Everything is deterministic and idempotent: re-running reproduces results.json byte-for-byte from the same inputs.


3. The four findings (with graphs)

Finding 1 — The access cliff is down the depth axis, not across age

access vs age
access vs age

analysis/landscape/figures/fig_access_vs_age.png

World-average access by depth (real for L0–L2):

DepthWorld access
L0 basic literacy82.5%
L1 K-12 / secondary62.4%
L2 undergraduate37.4%
L3 graduate / professional8.1% (est)
L4 frontier0.14% (real anchor)
L5 producing new knowledge0.06% (est)

The y-axis is logarithmic because the drop is otherwise invisible: from undergraduate (37%) to frontier (0.14%) is a ~270× fall. Within a life-stage, access barely changes with age (lifelong access keeps the lines flat); the cliff is the vertical distance between rungs. Depth, not age, is the binding constraint.

Finding 2 — The equity gradient: income decides depth

income surface
income surface

analysis/landscape/figures/fig_income_surface.png

Real World Bank values, latest per country, by income tier:

DepthHighUpper-midLower-midLowHIC÷LIC
L0 literacy97.2%94.4%79.7%58.7%1.7×
L1 secondary88.8%76.8%54.7%29.2%3.0×
L2 undergrad68.8%49.1%21.9%9.7%7.1×
L3 graduate (est)20.6%8.8%2.2%0.6%~34×
L4 frontier (real)0.42%0.12%0.027%0.0055%~75×

The gradient widens with depth. At L0 the rich-poor gap is under 2×; by the frontier it is ~75×. Tertiary access (L2) alone runs 80% in high-income vs single digits in low-income systems — the textbook equity gap, confirmed in this dataset at 68.8% vs 9.7%. (Learning poverty corroborates from the other direction: 14.2% HIC vs 89.8% LIC — nine in ten low-income children can't read at 10, so the L0 line is itself optimistic about real depth.)

Finding 3 — Coverage white-space: the top-right is empty

coverage heatmap
coverage heatmap

analysis/landscape/figures/fig_coverage_heatmap.png

Two panels over the same age×depth grid:

  • (c1) Access — % who reach each cell (world-average; structurally-N/A cells

hatched grey). The surface is hot (high access) along the bottom and cold (≪1%) along the top two rows at every age.

  • (c2) Solution density — where education solutions cluster. Dense at the

K-12 cell (L1 × 5-18) and consumer-undergrad (L2 × 18-22); the L4/L5 × working-adult corner is nearly empty.

The two panels rhyme: the cells with the worst access (frontier) also have the fewest solutions. Solution density here is a **documented estimated prior** — no `data/landscape/solutions.csv` was present at generation time; the script picks up the real CSV automatically once the sibling solution-landscape ships.

Finding 4 — Consume vs. create: ~0.1% of humanity ever produces new knowledge

frontier participation
frontier participation

analysis/landscape/figures/fig_frontier_bar.png

The single most dramatic number in the analysis:

World researchers per million ≈ 1,360 → ~0.136% of people are at the knowledge frontier. ~99.86% only ever consume.

And it is steeply unequal: 0.42% (High income) vs 0.0055% (Low income) — a ~75× gap in who even reaches L4. L5 (actually publishing) is smaller again.

Real cross-check (research-atlas): the research corpus holds 1,438,636 distinct researchers, of whom 320,879 are currently active publishers — an independent, bottom-up confirmation that the population producing knowledge is ~10⁶, i.e. a fraction of a percent of the ~8×10⁹ humans. The two methods (UNESCO per-capita anchor and the OpenAlex corpus headcount) agree on the order of magnitude.


4. Dimension 5 — the depth ceiling of a typical person

The median depth a typical member of each tier actually reaches (highest rung with ≥50% access):

Income tierDepth ceiling of the median person
High incomeL2 (undergraduate)
Upper-middleL1 (secondary)
Lower-middleL1 (secondary)
Low incomeL0 (basic literacy)

No income tier's typical person reaches graduate depth (L3) or beyond. Even in the richest countries, the median adult's ceiling is "some undergraduate exposure"; the frontier (L4/L5) is not a ceiling anyone's median approaches — it is reached by a sub-1% tail everywhere.


5. The access channel — internet by life-stage

Depth is gated not only by schooling but by the channel to reach it. ITU 2023 global online shares by age band (real anchor, coarse bins):

AgeOnline share
0-50% (mediated by caregiver)
5-1860%
18-2275%
22-6565%
65+40%

The largest age digital divide is at 65+ — exactly the life-stage with the most time for lifelong, self-directed depth. The channel narrows where the opportunity is widest.


6. What's real vs. estimated (the honesty ledger)

ComponentStatusAnchor / assumption
L0 / L1 / L2 access, all tiersREALWorld Bank EdStats, latest per country
Learning poverty, enrollment contextREALWorld Bank EdStats
Researchers per million (L4)REAL anchorUNESCO UIS SP.POP.SCIE.RD.P6 group means
Researcher corpus headcountREALresearch-atlas research_atlas_slim.duckdb
Internet by ageREAL anchorITU Facts & Figures 2023, coarse global bins
L3 graduate accessReal base × estimatedtertiary GER × OECD graduate-entry share (0.30→0.06)
L5 productionReal anchor × estimatedL4 × 0.45 active-publishing share
Solution densityESTIMATED priordocumented EdTech market-shape prior (no CSV yet)
Age×depth structural maskCONSTRUCTEDAGE_DEPTH_OPEN reachability rules
Depth ladder L0–L5, age binsCONSTRUCTEDscale.py analytical frame

Known limitations. (1) The depth ladder is an analytical construct, not an ISCED-exact mapping. (2) World-average rows are unweighted tier means, not population-weighted. (3) SP.POP.SCIE.RD.P6 group means are documented anchors, not pulled live from the cache (the education-atlas WB cache does not include the R&D series); they are conservative relative to widely-cited figures (world ~1,360/M). (4) Solution density is illustrative until the sibling solution-landscape CSV exists. (5) Data sparsity is worst exactly where access is worst — every low-income number is a lower bound (see EDUCATION_PROBLEMS.md §5).


7. Headline

Education's binding constraint is depth, not age, and it is bought by income. Access falls ~270× from undergraduate (37%) to the research frontier (0.14%); the rich-poor gap widens from under 2× at literacy to ~75× at the frontier; no income tier's typical person reaches graduate depth; and ~0.136% of humanity ever reaches the frontier where new knowledge is produced — ~99.86% only ever consume it. The age×depth grid's entire top-right corner — deep knowledge for ordinary adults — is the white space, in both access and solutions.

8. Reproduce

cd analysis/landscape
python3 build_access.py        # -> results.json
python3 make_figures.py        # -> figures/*.png
python3 -m pytest test_access.py -q   # pins the headline numbers

Files: analysis/landscape/scale.py (constructed axes), build_access.py (analysis), make_figures.py (figures), results.json (output), test_access.py (regression guard).