bucket foundation — inverse omegabucket.foundation
§ education research · Landscape

Geographic access to the frontier

Bucket Foundation · education-atlas working paperDOI: pending · CC-BY-4.0source on github ↗

05 — The Geographic & Demographic Map of Frontier Access

A per-country world map of who reaches the frontier — and who is shut out entirely

education-atlas landscape analysis. Generated by `analysis/landscape/buildgeographic.pyresultsgeographic.json`; figures by `makefiguresgeographic.py`; headline numbers pinned by `testgeographic.py. Extends 02-access-data-science.md and 03-map-expansion.md on the **same L0–L5 depth ladder** from scale.py. Real anchors are World Bank (SP.POP.SCIE.RD.P6, SE.TER.ENRR, IT.NET.USER.ZS, SE.TER.CMPL.ZS, SE.TER.ENRR.FE/.MA, SP.POP.TOTL — all per-country, cached in data/raw/worldbank/`), UNESCO UIS (women in science), and the research-atlas corpus. Every modeled cell is flagged below._


0. The question this layer answers

Doc 02 measured the access cliff down the depth axis and split it by income tier. It could not answer where. The frontier (L4 = reaching primary research, L5 = producing it) is the rung where ~0.136% of humanity lives — but that sliver is not spread evenly across the map. This layer asks:

Which countries hold the world's frontier capacity? Who within them — by gender, by wealth, by where they live — is shut out? And how many countries don't even appear in the data?

The frontier anchor is the same one doc 02 used as a documented group mean: UNESCO/World Bank researchers per million (SP.POP.SCIE.RD.P6). Here we pull it per country (latest available), turning the tier-level anchor into a real 217-country map.


1. The composite frontier-access index

For each country we build a transparent 0–100 index from four real World Bank components, weighted by how directly each gates the frontier:

ComponentIndicatorWeightRole
Researchers per millionSP.POP.SCIE.RD.P60.40the L4/L5 anchor (log-scaled, cap 9000/M)
Tertiary enrollment (GER)SE.TER.ENRR0.25the L2 pipeline into the frontier
Tertiary completionSE.TER.CMPL.ZS0.15the L3 finish rate
Internet users (%)IT.NET.USER.ZS0.20the digital access channel

Researchers/M is log-compressed before normalizing (it is heavy-tailed — Liechtenstein ~18,130/M, Korea ~9,470/M, the poorest single digits) so the index is not a one-country spike. Weights are documented, not tuned to any target. A country enters the index only if it has the frontier anchor; the others are the coverage finding (§4). Non-anchor gaps are mean-imputed within the country's income group and flagged per-country under .imputed (tertiary completion is imputed for 23 countries, tertiary GER for 6, internet for 1 — the anchor is never imputed).

This is a composite of real series with documented weights — REAL inputs, ESTIMATED combination.


2. The world map — the brutal gradient

frontier-access world map
frontier-access world map

analysis/landscape/figures/fig_geo_choropleth.png

geopandas/Natural Earth was not available at generation time, so the map degrades gracefully (as designed) to a real-geography country-bubble map: each country plotted at its (longitude, latitude) from country.parquet, sized by researchers/million, colored by the index — plus a regional bar fallback panel. The script auto-upgrades to a true filled choropleth the moment geopandas is installed; it never fails on the missing dependency.

The picture is stark: a dense band of dark-green high-index countries across Western Europe, North America, and East Asia (Korea/Japan/Singapore), fading to red across Sub-Saharan Africa and South Asia, with 75 grey ×'s — countries that have no researcher datapoint at all.

Top 15 vs bottom 15:

country ranking
country ranking

analysis/landscape/figures/fig_geo_country_rank.png

Top of the indexIndexresearchers/M
Korea, Rep.929,472
Finland928,315
Singapore918,782
Australia914,569
Norway917,451
Bottom (countries that do have a datapoint)Indexresearchers/M
Congo, Dem. Rep.1710
Burundi1822
Malawi1925
Niger1927
… Nigeria2622

The index runs 92 (Korea) → 17 (DR Congo) among countries with data — and the genuinely shut-out countries don't appear on the chart at all, because they have no number.

Regional means (real-data countries only)

RegionMean indexResearcher-data coverage
North America81.967% (2 of 3)
Europe & Central Asia77.184% (49 of 58)
East Asia & Pacific64.655% (21 of 38)
Middle East, N. Africa, Afghanistan & Pakistan62.174% (17 of 23)
Latin America & Caribbean56.844% (18 of 41)
South Asia42.467% (4 of 6)
Sub-Saharan Africa31.265% (31 of 48)

Sub-Saharan Africa's mean index (31) is less than half Europe's (77), and its researcher intensity averages ~130/M against North America's ~5,280/M.


3. Concentration — a handful of countries hold almost everything

concentration
concentration

analysis/landscape/figures/fig_geo_concentration.png

We estimate each country's absolute frontier capacity as researchers/M × population (SP.POP.SCIE.RD.P6 × SP.POP.TOTL, both real series; the product is an estimated researcher headcount). Across the ~142 countries with the anchor, total estimated capacity ≈ 11.5 million researchers, and it is hoarded:

GroupShare of world's estimated researcher capacity
Top 1 (China)25.7%
Top 5 (China, US, Japan, Germany, Korea)54.9%
Top 1069.3%
Top 2587.3%

China + the United States alone hold ~40%. The top-10 countries hold roughly seven-tenths of the world's frontier capacity; the top-25 hold almost nine- tenths. The long tail of ~117 remaining countries with data splits the last ~13% — and 75 countries split nothing measured.

The Lorenz curve of researchers/M gives a Gini of 0.646 across 142 countries — research intensity is more unequally distributed than income is in most countries. The span is 18,130/M (max) vs 1.4/M (min) — ~13,000×; the median country sits at 707/M, barely half the world average of ~1,360/M, because the mean is dragged up by a rich handful.

Cross-check (research-atlas): the org-level finding mirrors the country-level one — all top-25 funded research organizations are US/elite institutions, and the corpus holds ~1.44M distinct researchers. The concentration is fractal: it holds at the country level and again at the institution level.


4. The coverage finding — 75 countries are effectively off the map

The single most honest result in this layer is what's missing:

Of 217 real (non-aggregate) countries, only 142 have any researcher-per- million datapoint at all. 75 countries — 35% of all countries — have NONE.

These are overwhelmingly low- and lower-middle-income, concentrated in Sub-Saharan Africa, small island states, and conflict-affected regions. The absence is not noise — it is the finding. A country with no measured research capacity is not a country with a low-but-known frontier; it is a country whose frontier participation is so thin (or whose statistical capacity is so weak) that the world's flagship R&D indicator simply has no value for it. Data sparsity is worst exactly where access is worst, so every low-income number in this analysis is an upper bound on a darker reality (the same caveat doc 02 §6 raised).

This is why the bottom-15 chart is labeled "the shut-out who have a datapoint": the truly shut-out countries can't be ranked, because they don't have a number to rank.


5. The gender cut

gender in research
gender in research

analysis/landscape/figures/fig_geo_gender.png

Three panels, two of them real World Bank data, one documented UNESCO anchor:

(a) Women hold ~⅓ of research posts globally — UNESCO (anchor)

RegionWomen as % of researchers
Central Asia48.2%
Latin America & Caribbean45.8%
Arab States41.5%
Central & Eastern Europe39.6%
World33.3%
North America & W. Europe32.9%
Sub-Saharan Africa31.5%
East Asia & Pacific23.9%
South & West Asia18.5%

The global figure is ~33% — and the rich research regions (North America/W. Europe, East Asia) sit at or below the world average, while Central Asia and Latin America are near parity. The frontier's gender gap is not a "developing-world" story; it's worst at 18.5% in South & West Asia but stubbornly stuck around a third in the regions that produce most of the world's research.

(b) The leak is post-degree — World Bank (REAL)

Real tertiary gross enrollment by sex shows women now out-enroll men: world mean female 52.4% vs male 39.3% (+13.1 points). The female advantage widens with income (+21.9 pts high-income, +15.8 upper-mid, +3.3 lower-mid) and flips negative in low-income countries (−2.2 pts) — the one tier where women still trail men into university at all.

So women enter higher education in equal or greater numbers nearly everywhere, yet hold only a third of research posts. The pipeline doesn't leak at the classroom door — it leaks between the degree and the lab.

(c) Horizontal segregation — UNESCO (anchor)

Women's share of graduates by field: Health & Welfare 69%, Education 67%, Arts & Humanities 64% — but ICT/computing 21%, Engineering 28%. Even where women reach the frontier, they are routed away from the fields (computing, engineering) that dominate the research corpus measured in doc 03.


6. The rural / wealth cut (documented anchors)

Globally-comparable rural-urban and wealth-quintile depth data lives in DHS microdata, which is not in this repo's cache, so these are cited UNESCO GEM/WIDE anchors (flagged estimated):

  • In low/lower-middle-income countries, tertiary completion is essentially a

top-wealth-quintile phenomenon: ~9% of the richest quintile completes tertiary vs ~0.5% of the poorest — an ~18× gap, inside a single country.

  • Primary completion in low-income countries: ~70% urban vs ~45% rural — a

25-point gap that compounds up every depth rung, so by the frontier the rural poor are absent entirely.

The geographic gradient (§2–4) is therefore the outer shell; inside each shut-out country sits the same gradient again by wealth and location.


7. What's real vs. estimated (the honesty ledger)

ComponentStatusAnchor / assumption
Researchers/M, per countryREALWorld Bank SP.POP.SCIE.RD.P6, latest per country
Tertiary GER / completion / internet, per countryREALSE.TER.ENRR, SE.TER.CMPL.ZS, IT.NET.USER.ZS
Female/male tertiary GERREALSE.TER.ENRR.FE / .MA, latest per country
Population (for absolute capacity)REALSP.POP.TOTL
Composite index (0–100)Real inputs × documented weights0.40/0.25/0.15/0.20; researchers/M log-scaled
Non-anchor gaps in the indexIMPUTED (flagged)income-group mean; completion 23, GER 6, internet 1
Absolute frontier capacityESTIMATEDresearchers/M × population = estimated headcount
Top-share / Ginicomputed on the estimateover 142 data-carrying countries
Women researchers % by regionDOCUMENTED ANCHORUNESCO UIS Women in Science (no clean WB series)
Women graduate share by fieldDOCUMENTED ANCHORUNESCO "Cracking the code" / UIS
Rural-urban / wealth-quintile gapsDOCUMENTED ANCHORUNESCO GEM / WIDE (DHS-backed; not in cache)
Depth ladder L0–L5CONSTRUCTEDscale.py analytical frame

Known limitations. (1) Absolute capacity uses the latest available year per country, which differs across countries (most 2018–2023). (2) Researchers/M itself under-measures countries with weak statistical systems — the 75 missing countries are the extreme of this. (3) The composite weights are a defensible choice, not a derived optimum; the ranking is robust to reasonable reweighting, the absolute scores less so. (4) Women-in-research and rural/wealth numbers are cited regional/global anchors, not pulled per-country from the cache.


8. Headline

The frontier has a geography, and it is a near-monopoly. Of 217 countries, only 142 have any measured researcher capacity at all — 75 (35%) have none. Among those that do, the composite frontier-access index runs 92 (Korea) to 17 (DR Congo), and Sub-Saharan Africa averages 31 against Europe's 77. Absolute frontier capacity is hoarded: the top-10 countries hold ~69% and the top-25 hold ~87% of the world's ~11.5M estimated researchers, with China + the US alone at ~40% — a Gini of 0.646 and a ~13,000× span from the most to least research-intensive country. The org-level mirror (research-atlas: all top-25 funded orgs US/elite) confirms the concentration is fractal. And the gender cut shows the gap is post-degree, not pre-degree: women now out-enroll men into university worldwide (+13 points), yet hold only ~33% of research posts (as low as 18.5% in South & West Asia and 21% in computing). The frontier is reached by a few countries, and within them, unevenly by gender, wealth, and place.

9. Reproduce

cd analysis/landscape
python3 build_geographic.py            # -> results_geographic.json (fetches+caches 6 WB series)
python3 make_figures_geographic.py     # -> figures/fig_geo_*.png
python3 -m pytest test_geographic.py -q # pins the headline numbers

Files: analysis/landscape/build_geographic.py (analysis), make_figures_geographic.py (figures), results_geographic.json (output), test_geographic.py (regression guard). World Bank series cached in data/raw/worldbank/ (same key format as edu/connectors/worldbank.py, so a full atlas refresh reuses them).