The Grammar Inside Us — and What It Means for Machines
Language · Genetics · Artificial Intelligence
How children transform broken contact-speech into living language, what ancient genes make it possible, and why Silicon Valley should be paying very close attention.
From Broken Words to Grammar
Imagine being dropped into a sugar cane plantation in 19th-century Hawaii. Around you are workers from Japan, China, Korea, Portugal, and the Philippines — each speaking a mother tongue no one else understands. The plantation bosses speak English. What do you do?
You improvise. You borrow the most common words from whatever language is loudest, flatten all the grammar, strip out the tenses, the articles, the subordinate clauses — everything that takes years to learn — and speak in short, jagged bursts. "Me work tomorrow. You go field." This improvised contact-speech is called a pidgin. It communicates. It survives. But linguists would hesitate to call it a language, because it lacks the deep structural architecture — the rules that let speakers say anything, not just a fixed repertoire of simple ideas.
"No child has ever been born into a pidgin-speaking household and remained a pidgin speaker. Every time, within a generation, the pidgin becomes a creole."
Here is the astonishing part: when children grow up hearing a pidgin, they do not learn it. They transform it. Spontaneously, without instruction, they pour in grammatical structure that was never there before — tense markers, aspect markers, relative clauses, the full architecture of a natural human language. The result is called a creole, and it is every bit as expressive and complex as English or Mandarin. This transformation has happened independently on multiple continents, across dozens of language groups, every single time the right conditions arise.
The question that has captivated linguists and cognitive scientists for decades is simple and terrifying: where does the grammar come from? The input the children received was impoverished — chaotic, ungrammatical, structurally empty. But the output is rich, systematic, and fully generative. The children did not learn grammar. They grew it.
Steven Pinker and the Language Instinct
In 1994, cognitive scientist Steven Pinker gave this phenomenon its most compelling name: the language instinct. His argument, built on decades of fieldwork by linguists like Derek Bickerton and Noam Chomsky's earlier theoretical framework of Universal Grammar, was bold: language is not a cultural invention that humans stumbled into, like writing or the wheel. It is a biological adaptation, as species-specific as echolocation in bats or web-spinning in spiders.
Derek Bickerton spent years studying creole languages from Hawaii to Suriname to Trinidad and noticed something startling: despite zero historical contact, creoles from different parts of the world share the same deep grammatical features. They independently develop tense systems built around "before now / after now / same time as now." They develop the same structure for making questions. They mark aspect — whether an action is complete or ongoing — in nearly identical ways. Bickerton argued this was not coincidence. It was the child's internal grammar template — what he called the Language Bioprogram — asserting itself onto the raw material of the pidgin.
Pinker extended this argument by surveying the full range of evidence: the universality of language-acquisition milestones across cultures; the existence of specific language impairments that leave the rest of cognition intact (and, conversely, of cognitive impairments that spare language); and the presence of dedicated left-hemisphere language circuitry that responds to grammar the way the visual cortex responds to edges. Language, he concluded, is what evolution built us to do.
The First Generation, the Next Generation
The creolisation process is not a single event — it unfolds across generations in a precise, observable sequence. Think of it as a relay race in which each generation picks up the baton exactly where the biology directs them to.
Generation 0 (G0). Adults past the critical period arrive speaking their native languages. They cobble together a shared pidgin. It works for trade and basic coordination, but grammar is minimal and highly variable from speaker to speaker. There is no consistent word order, no systematic tense, no way to embed one clause inside another.
Generation 1 (G1). Children born into this environment undergo something extraordinary. They hear the pidgin, but they do not reproduce it. Their brains, within the critical window of language acquisition (roughly birth to puberty), activate the language bioprogram. They impose consistent word order. They invent tense markers. They build complex sentences. By the time these children are adults, they speak a fully-fledged creole — one that is remarkably consistent among all peers who grew up in the same community.
Generation 2 (G2). Children of G1 acquire the creole as a full native language. This generation's role is crucial: they stabilise the system, iron out the last inconsistencies, and begin developing the rich stylistic variation — slang, register, dialect — that marks any mature language. The creole is now fully grammaticalised and self-sustaining.
What makes this sequence so scientifically important is the directionality. Grammar flows from child to adult, not the other way. The children are not imitating their parents — they are correcting them, injecting structure into structural chaos, driven by something the parents no longer have: an open critical window and an active biological grammar module.
Nicaragua, 1977: A Language Is Born
The most dramatic natural experiment arrived a century after Hawaii, in a school rather than a plantation. In 1977, Nicaragua opened its first large school for deaf children in Managua. The pupils arrived with no shared language, only the idiosyncratic home signs each had improvised with their own families. Within a few years the older children had pooled those gestures into a rough, pidgin-like signing system; the younger children who followed did to it exactly what the plantation children did to Hawaiian pidgin. They regularised it, added grammatical markers and embedded clauses, and turned it into Nicaraguan Sign Language (NSL), a full natural language whose birth linguists were able to document in real time.
The Genes That Build Language
If language is a biological instinct, it must have a genetic architecture. Over the past twenty-five years, molecular genetics has begun to map this architecture — and what it has found is both specific and ancient. The best-known piece of the map is FOXP2, a regulatory gene first identified in a family with a severe inherited speech and language disorder. FOXP2 does not encode grammar; it orchestrates the activity of many downstream genes (among them CNTNAP2, itself linked to specific language impairment) while the brain's speech and language circuits are being wired, and the human version of the gene is ancient enough to have been shared with the Neanderthals.
These genes do not simply switch on at birth and stay on forever. They operate within developmental windows — periods when the brain is maximally plastic and the language circuits are being actively sculpted. The critical period for phonology closes earliest (around age six). The window for syntax closes around puberty. This is why a child who arrives in a new country at age four typically grows up speaking without an accent, while one who arrives at fifteen usually does not. The genes write the program, but they set an expiry date on the most powerful version of it.
The mechanism involves epigenetic regulation — the progressive methylation of gene promoter regions that silences the most aggressive plasticity genes as the brain matures. It is not that adult brains cannot learn language; they clearly can. But they learn it differently, effortfully, without the unconscious generative power that turns a pidgin into a creole in a single childhood.
The key insight is that these genes are not building "vocabulary" or "grammar rules" in any simple sense. They are building the computational architecture that makes language-like reasoning possible: recursive structure, hierarchical embedding, the ability to build infinite sentences from a finite set of elements. This architecture appears to be conserved across all human populations — which is precisely why Nicaraguan deaf children and Hawaiian plantation children, separated by a century and an ocean, produce the same grammatical innovations independently.
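That last idea, infinite sentences from a finite set of elements, is easy to make concrete. The sketch below is a purely illustrative toy in Python (the grammar, vocabulary, and function names are invented for this post, not drawn from any creole or from the studies above): a handful of rewrite rules, one of which lets a noun phrase embed a relative clause containing another noun phrase, is already enough to generate an unbounded set of hierarchically structured sentences.

```python
import random

# Toy, purely illustrative context-free grammar (invented for this post).
# The single recursive option NP -> Det N RC is what makes the set of
# producible sentences unbounded.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"], ["Det", "N", "RC"]],   # an NP may embed a relative clause...
    "RC":  [["that", "VP"]],                     # ...whose VP can contain another NP
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N":   [["child"], ["pidgin"], ["language"]],
    "V":   [["hears"], ["transforms"], ["grows"]],
}

def expand(symbol, depth=0, max_depth=5):
    """Recursively rewrite a symbol into words; the depth cap keeps the toy finite."""
    if symbol not in GRAMMAR:
        return [symbol]                           # terminal word
    rules = GRAMMAR[symbol]
    if depth >= max_depth:                        # past the cap, drop recursive options
        rules = [r for r in rules if "RC" not in r and "NP" not in r] or rules
    rule = random.choice(rules)
    return [word for sym in rule for word in expand(sym, depth + 1, max_depth)]

if __name__ == "__main__":
    for _ in range(3):
        print(" ".join(expand("S")))
```

Delete the RC rule and the system collapses to a small, fixed repertoire of sentence shapes; that collapse is, in miniature, the difference between a pidgin and a creole.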
Can We Keep the Window Open Longer?
This is the question that is increasingly animating neuroscientists, pharmaceutical researchers, and educationalists. If the critical period closes due to epigenetic silencing of plasticity genes, could we delay or partially reopen that closure — enabling adults to learn new languages with something approaching childhood fluency?
The answer, emerging carefully from animal models and early human studies, is: possibly yes, but with significant nuance.
Valproic acid (VPA) — a histone deacetylase inhibitor — was shown in a 2013 study to partially reopen the critical period for absolute pitch in adults, suggesting that epigenetic tools can reset some aspects of auditory plasticity. Language researchers are watching this finding closely.
BDNF (brain-derived neurotrophic factor) is a key mediator of synaptic plasticity, including in the brain's language areas. Exercise, sleep quality, and certain dietary factors modulate BDNF expression — giving us non-pharmacological levers that partially mimic the heightened plasticity of childhood.
PNN (Perineuronal Net) degradation — perineuronal nets are molecular lattices that form around neurons as the critical period closes, physically constraining synaptic remodelling. In mice, degrading PNNs with the enzyme chondroitinase reopens plasticity windows. Whether this is safe or feasible in human language circuits remains under investigation.
Immersive bilingual environments — there is strong epidemiological evidence that sustained, high-density immersive exposure, combined with emotional engagement (limbic circuits strongly modulate how well new connections are consolidated), keeps language learning meaningfully efficient well into early adulthood. The genes may not reopen, but their downstream effects can be partially recapitulated through the right input conditions.
The pragmatic takeaway for multilingual learning today is that the biology, while not infinitely malleable, is far more responsive than the "it's too late after age 12" folk wisdom suggests. The critical period is not a cliff — it is a slope, and its steepness varies by subsystem. Phonology closes earliest and hardest. Vocabulary never really closes. Syntax falls somewhere in between, remaining substantially learnable well into the twenties given sufficient input density and motivation.
What This Means for Artificial Intelligence
The pidgin-to-creole story is not merely a curiosity of human developmental biology. It is a blueprint — and a challenge — for anyone building language AI systems.
Today's large language models learn language the way G0 adults learn a pidgin: through sheer statistical exposure to vast corpora, pattern-matching surface regularities without access to the generative, hierarchical, recursive grammar engine that children grow biologically. They are extraordinarily powerful pattern matchers. But they sometimes fail in precisely the ways pidgin speakers fail — when asked to handle deeply embedded clauses, novel compositional structures, or the kind of grammatical innovation that G1 children perform effortlessly.
The NSL experiment showed that language does not need a teacher — it needs the right social conditions: a community of agents with a shared communicative need, sufficient exposure time, and the right internal architecture. Multi-agent reinforcement learning researchers have begun running exactly this kind of experiment, observing emergent communication protocols between AI agents that share some, but not all, features of human language structure. The genetics angle maps onto architecture: just as FOXP2 builds the neural substrate that makes creolisation possible, the right architectural inductive biases may be what allows AI systems to develop genuinely compositional, generative language — rather than sophisticated interpolation.
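As a deliberately simplified illustration of what those experiments look like, here is a minimal Lewis-style signalling game in Python. Everything in it (the states, the signals, the reinforcement scheme, even the number of rounds) is an assumption chosen for brevity rather than the protocol of any published study: a sender and a receiver begin with no shared code and converge on one purely by reinforcing the rounds that happen to succeed.

```python
import random

# Minimal Lewis-style signalling game (illustrative assumptions throughout):
# a sender observes a state and emits a signal; a receiver sees only the
# signal and guesses the state; both reinforce their choices on success.
STATES  = ["A", "B", "C"]
SIGNALS = ["x", "y", "z"]

# Each agent keeps a weight ("urn") for every action in every context.
sender   = {state: {sig: 1.0 for sig in SIGNALS} for state in STATES}
receiver = {sig: {state: 1.0 for state in STATES} for sig in SIGNALS}

def sample(weights):
    """Pick a key with probability proportional to its weight."""
    return random.choices(list(weights), weights=list(weights.values()))[0]

def play_round():
    state  = random.choice(STATES)
    signal = sample(sender[state])        # sender encodes the state
    guess  = sample(receiver[signal])     # receiver decodes the signal
    if guess == state:                    # reinforce only on communicative success
        sender[state][signal]   += 1.0
        receiver[signal][state] += 1.0
    return guess == state

if __name__ == "__main__":
    for block in range(10):
        wins = sum(play_round() for _ in range(1000))
        print(f"block {block}: accuracy = {wins / 1000:.2f}")
```

Run long enough, the two agents reliably settle on a shared mapping, but the code they converge on is a lookup table, not a grammar: nothing in this setup yields the compositional, recursive structure that G1 children inject into a pidgin, which is exactly the gap the argument about inductive biases is pointing at.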
The deepest implication may be this: human language did not emerge because humans were exposed to language. It emerged because evolution built, in the human genome, an engine for generating grammatical structure from impoverished input. The most powerful path to language AI may not be more data — it may be the right innate structure. Biology discovered this four hundred thousand years ago. Computer science is only beginning to catch up.
"Every child who has ever turned a pidgin into a creole has been doing something that no AI system has yet fully replicated — growing grammar, from nothing, on schedule, out of chaos. The genome had two hundred millennia to figure out how to do it. We have had about sixty years. We are not behind. We are just beginning."