What Each Dimension Measures
Causal identifiability atlas · verified by intervention
For each of the twelve cognitive dimensions, the markers that drive it are removed from thirty canonical texts — Plato through Du Bois — and the fingerprint is recomputed. The table reports four numbers per dimension: how far the dimension itself collapses, how far the full 12-dim vector moves, how many of the five public labels flip on average, and how often the reasoning_mode label flips specifically. Strong rows are dimensions that genuinely measure what their name claims; weak rows are dimensions whose score barely moves when their signal is taken away.
| Dimension · Intervention | \|Δ dim\| | cos shift | labels flipped | rm flip |
|---|---|---|---|---|
| 02 Epistemic Diversity · remove all hedge and assertion markers | 0.77 | 0.122 | 1.03 / 5 | 7% |
| 03 Temporal Orientation · remove past and future temporal markers | 0.61 | 0.015 | 0.90 / 5 | 0% |
| 06 Authority Reference · remove authority-citation markers | 0.39 | 0.032 | 0.57 / 5 | 57% |
| 08 Experiential Reference · remove first-person experiential markers | 0.33 | 0.021 | 0.43 / 5 | 43% |
| 10 Dialectical Complexity · remove contrast and synthesis connectives | 0.33 | 0.019 | 0.70 / 5 | 0% |
| 11 Abstraction Level · remove abstract and concrete vocabulary | 0.29 | 0.014 | 0.73 / 5 | 0% |
| 04 Argument Density · remove claim connectives (is, must, therefore, …) | 0.26 | 0.012 | 0.23 / 5 | 3% |
| 07 First-Principles Reasoning · remove first-principles markers | 0.17 | 0.003 (max 0.16) | 0.13 / 5 | 10% |
| 12 Intellectual Tempo · collapse sentence boundaries (no length variation) | 0.13 | 0.079 | 0.00 / 5 | 0% |
| 05 Conceptual Leap · collapse paragraph breaks into a single block | 0.13 | 0.057 | 0.93 / 5 | 0% |
| 01 Epistemic Confidence · remove all hedge and assertion markers | 0.11 | 0.122 | 1.03 / 5 | 7% |
| 09 Evidential Reference · remove data-and-evidence markers | 0.08 | 0.000 (max 0.06) | 0.07 / 5 | 7% |
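As a rough illustration of how the four columns could be derived, the sketch below compares a fingerprint before and after an intervention and averages the result over a corpus. It is not the engine's code: the `Fingerprint` shape, the function names, and the reading of "cos shift" as one minus cosine similarity are all assumptions made for this example.

```typescript
// Hypothetical fingerprint shape; the real engine's types are not documented here.
type Fingerprint = {
  dims: number[];                  // the 12-dimensional score vector
  labels: Record<string, string>;  // the five public labels, including reasoning_mode
};

type RowMetrics = {
  deltaDim: number;
  cosShift: number;
  labelsFlipped: number;
  rmFlip: boolean;
};

// Cosine similarity of two equal-length vectors (0 when either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    na += x * x;
    nb += y * y;
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Per-text metrics for an intervention targeting dimension `dimIndex`.
// Assumption: "cos shift" means 1 - cosine similarity of the two vectors.
function compare(before: Fingerprint, after: Fingerprint, dimIndex: number): RowMetrics {
  const keys = Object.keys(before.labels);
  return {
    deltaDim: Math.abs((before.dims[dimIndex] ?? 0) - (after.dims[dimIndex] ?? 0)),
    cosShift: 1 - cosine(before.dims, after.dims),
    labelsFlipped: keys.filter((k) => before.labels[k] !== after.labels[k]).length,
    rmFlip: before.labels["reasoning_mode"] !== after.labels["reasoning_mode"],
  };
}

// Average the per-text metrics into one table row; rmFlipRate is a fraction of texts.
function summarize(rows: RowMetrics[]) {
  const mean = (f: (r: RowMetrics) => number) =>
    rows.reduce((sum, r) => sum + f(r), 0) / rows.length;
  return {
    deltaDim: mean((r) => r.deltaDim),
    cosShift: mean((r) => r.cosShift),
    maxCosShift: Math.max(...rows.map((r) => r.cosShift)),
    labelsFlipped: mean((r) => r.labelsFlipped),
    rmFlipRate: mean((r) => (r.rmFlip ? 1 : 0)),
  };
}
```

Under these assumptions, a row such as "0.003 (max 0.16)" would correspond to `cosShift` and `maxCosShift`, and "0.57 / 5" to the mean of `labelsFlipped` across the thirty texts.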
Reading the table
The rows with the largest |Δ dim| (Epistemic Diversity, Temporal Orientation, Authority Reference, Experiential Reference, Dialectical Complexity) are dimensions whose score is genuinely driven by their own lexicon: take the markers away and the targeted score moves an average of 0.33 to 0.77 toward neutral, between 0.43 and 1.03 of the five public labels flip per text, and the full vector shifts measurably. These are well-identified instruments on canon prose. Their numbers mean what they say.
Authority Reference is the most striking row. Its cosine shift is small, but its reasoning-mode flip rate is the highest in the table: 57% of canon texts change reasoning mode when authority-citation markers are removed. This is the corpus speaking honestly: Aquinas cites Aristotle, Hume cites Locke, Du Bois cites the abolitionists. Strip those citations and the engine genuinely re-classifies the writer’s reasoning mode. The dimension is doing what its name claims.
Evidential Reference sits at the bottom on canon prose: its lexicon (data, evidence, percent, n=) rarely fires on philosophy, theology, or pre-twentieth-century essay. This is a property of the corpus, not a flaw in the dimension. A modern empirical-essay corpus (Cowen, Alexander, blog-era prose) would show a different row. The atlas reads what the canon contains.
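One way to see why a lexicon "rarely fires" is to count marker hits per thousand words. The sketch below does that for the four evidential markers named above; the regular expressions and the `markerRatePer1k` helper are illustrative assumptions, not the engine's actual lexicon or matching rules.

```typescript
// Illustrative lexicon built only from the four examples named in the text.
const EVIDENTIAL_MARKERS: RegExp[] = [
  /\bdata\b/gi,
  /\bevidence\b/gi,
  /\bpercent\b/gi,
  /\bn\s*=/gi,
];

// Marker hits per 1,000 whitespace-separated words (0 for empty text).
function markerRatePer1k(text: string, markers: RegExp[]): number {
  const words = text.split(/\s+/).filter((w) => w.length > 0).length;
  if (words === 0) return 0;
  let hits = 0;
  for (const re of markers) {
    // With the global flag, match() returns every hit, or null when there are none.
    hits += (text.match(re) ?? []).length;
  }
  return (hits / words) * 1000;
}

const canonLike = "Virtue is a state of character concerned with choice";
const empiricalLike = "The data show a 12 percent drop (n = 40), strong evidence of decay";

console.log(markerRatePer1k(canonLike, EVIDENTIAL_MARKERS));
console.log(markerRatePer1k(empiricalLike, EVIDENTIAL_MARKERS));
```

A text where this rate is near zero gives the intervention nothing to remove, so a small |Δ dim| there says more about the corpus than about the dimension.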
Intellectual Tempo shifts the geometry (cosine 0.08) but flips zero labels: collapsing sentence boundaries reshapes the vector without driving any single label rule, because tempo lives in the variance, not in any threshold. The dimension is real and measurable; the labels are coarser than the underlying topology.
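The point about variance can be made concrete with a small sketch. It assumes, purely for illustration, that tempo is read from the variance of sentence lengths; the engine's actual tempo formula is not given here, and the segmenter below is deliberately naive.

```typescript
// Word counts per sentence, splitting on runs of sentence-final punctuation.
function sentenceLengths(text: string): number[] {
  return text
    .split(/[.!?]+/)
    .map((s) => s.trim().split(/\s+/).filter((w) => w.length > 0).length)
    .filter((n) => n > 0);
}

// Population variance of the lengths (0 when there are fewer than two sentences).
function lengthVariance(lengths: number[]): number {
  if (lengths.length < 2) return 0;
  const mean = lengths.reduce((sum, x) => sum + x, 0) / lengths.length;
  return lengths.reduce((sum, x) => sum + (x - mean) ** 2, 0) / lengths.length;
}

// The intervention: drop sentence boundaries so the text becomes one long sentence.
function collapseSentences(text: string): string {
  return text.replace(/[.!?]+/g, " ").replace(/\s+/g, " ").trim();
}

const sample = "I came. I saw the whole city burning. I left.";
const varianceBefore = lengthVariance(sentenceLengths(sample));
const varianceAfter = lengthVariance(sentenceLengths(collapseSentences(sample)));
console.log(varianceBefore, varianceAfter);
```

Collapsing the boundaries removes all length variation, so a variance-based score changes continuously while a label that depends on a fixed threshold elsewhere in the vector need not move at all.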
First-Principles Reasoning moves modestly on canon prose (10% reasoning-mode flip rate) where it would move sharply on, say, internet-rationalist essays. Canon writers reason from first principles structurally rather than verbally — Spinoza builds more geometrico, Descartes proceeds methodically, but neither leans on the explicit "by definition / it follows that" markers the lexicon was tuned for. A useful row to read alongside the substrate boundary: the lexicon is calibrated to a specific analytic register.
What this is — and isn’t
What it is. A causal-identification check: it tests whether each dimension’s score, and the labels downstream of it, move in proportion to the presence of the markers the engine claims to read. Where the shifts are large, the instrument is doing what it says. Where they are small, either the instrument has no signal to read on this corpus, or the dimension is measuring something that survives lexicon removal, which is itself a finding.
What it isn’t. A claim that the lexicons are exhaustive or universal. The substrate boundary still applies: the markers are English-analytic by construction. The atlas tells you the dimension reads what its lexicon reads; the substrate section tells you whose prose that lexicon was built for.
Corpus. Thirty public-domain canonical texts — Plato (Republic), Aristotle (Nicomachean Ethics), Marcus Aurelius (Meditations), Augustine (Confessions), Aquinas (Summa Theologica), Montaigne (Essays), Descartes (Discourse), Bacon (Novum Organum), Spinoza (Ethics), Hobbes (Leviathan), Locke (Second Treatise), Hume (Enquiry), Kant (Critique of Pure Reason), Smith (Wealth of Nations), Tocqueville (Democracy in America), Mill (On Liberty), Darwin (Origin of Species), Marx (Manifesto), Nietzsche (Beyond Good and Evil), Emerson (Essays), Thoreau (Walden), James (Pragmatism), Russell (Problems of Philosophy), Wollstonecraft (Vindication), Du Bois (Souls of Black Folk), Woolf (A Room of One’s Own), Confucius (Analects), Lao Tzu (Tao Te Ching), Douglass (Narrative), and Tagore (Sadhana). Each text is a 30,000–45,000 character excerpt fetched from Project Gutenberg via the same pipeline that seeds Rodin’s production canon archive (scripts/seed-archive.ts). Human prose, not synthetic; auditable by Gutenberg ID. Validation against documented author-pair relationships is reported separately on the validation harness page.
Generated 2026-04-28 · corpus n = 30