The colonial internet (it's not a metaphor)

Written by Nhiyc
Editor: Lucija Ajvazoski

Colonialism didn't end. It got a software update.

That sounds like a provocative opener, but it's actually just a description of what happened. The logic, the geography, and the power structures of colonial resource extraction have mapped almost exactly onto the digital economy. Same regions being extracted from. Same centres accumulating the profit. Different infrastructure, same design.

This isn't a metaphor or a rhetorical flourish. It's a pattern that shows up in the data, if you'll excuse the pun.

How resource extraction became data extraction

European colonialism ran on a simple model: identify valuable resources in Africa, Asia, the Americas, and the Pacific, extract them with minimal cost, and move the profit to colonial centres. The infrastructure built to enable that extraction (ports, railways, telegraph lines) was owned and controlled by the same powers doing the extracting.

The digital economy runs on a remarkably similar model. Data is identified as valuable in communities across the Global South, extracted at minimal cost, processed and monetised in tech hubs concentrated in the Global North and China, and the profit moves upward and stays there. The infrastructure enabling that extraction (undersea cables, satellite networks, data centres) is owned and controlled by the same powers doing the extracting. Submarine cable networks connecting continents are predominantly owned and operated by US, European, and Chinese corporations, meaning the physical infrastructure of the global internet replicates the physical infrastructure of colonial trade.

The legal justification is also structurally similar. Colonial land theft was often formalised through the doctrine of terra nullius, the idea that land was "empty" or "unused" if it wasn't being used in ways European legal frameworks recognised. The equivalent in data is the treatment of data as public domain: if it's findable, it's fair game to scrape. If a community's knowledge exists in a form that can be digitised, it can be extracted without consent, credit, or compensation. The fact that it belongs to someone, that it was developed over generations, that its use is governed by community protocols, is simply not recognised by the frameworks doing the extracting.

Whose knowledge? A pattern, not a coincidence

The following are not isolated incidents. They are examples of a consistent pattern documented across regions and knowledge systems, in which Indigenous and community knowledge is incorporated into commercial and scientific systems without consent, credit, or compensation. The UN Human Rights Office and Cultural Survival have both documented this dynamic in the context of AI development.

In the Amazon basin, Indigenous communities have accumulated generations of detailed knowledge about forest management, plant properties, and ecological relationships. Across multiple documented cases, this type of knowledge has been incorporated into conservation AI systems and biodiversity databases with credit going to the researchers who digitised it, while the communities who developed it over centuries are not named, not compensated, and not consulted on how it gets used.

In the Pacific Islands, traditional navigation systems and localised weather prediction practices developed over millennia have been datafied and incorporated into climate adaptation models, often presented as innovation. Meanwhile, the communities who built that knowledge base are simultaneously studied as victims of climate vulnerability rather than recognised as experts in climate adaptation.

Across sub-Saharan Africa, farmers' accumulated expertise in crop selection, soil management, and seasonal patterns has been incorporated into precision agriculture platforms. Those platforms are then sold back to farming communities as optimisation tools, at a price, by corporations that did not develop the underlying knowledge.

In the Arctic, Inuit observations of ice patterns, seasonal changes, and ecosystem shifts represent an irreplaceable climate monitoring system built over thousands of years. That knowledge feeds into climate models which are then used to inform Arctic policy from which Inuit communities are largely excluded.

In Southeast Asia, traditional fishing practices and community-governed coastal management systems have been datafied and "optimised" by platforms that in a number of cases undermined the community governance structures that made those practices work in the first place.

The pattern is consistent: take the knowledge, strip the cultural context and community governance that gave it meaning, repackage it as data or innovation, and then sell it back to the communities it came from, at a price.

This is also happening at the level of AI

Many major AI models currently in commercial deployment, including those from OpenAI, Google, Anthropic, Alibaba, and others, were trained on scraped data. That scraped data may include Indigenous knowledge systems, Global South cultural expressions, traditional ecological practices, and community-generated content, taken without consent, without credit, and without compensation. As the UN Human Rights Office notes, because researchers, governments, and institutions have collected data on Indigenous peoples without consent for generations, there is now a large volume of Indigenous data publicly accessible and vulnerable to being scraped into AI training sets.

The pattern is clear: communities produce knowledge and data, which are collected, centralised, and monetised far from their source. It mirrors the extractive logic of colonialism.

This is not a side issue or an edge case. It is the foundational business model of the AI industry. The same companies now positioning themselves as essential infrastructure for climate solutions built that infrastructure on extracted knowledge from the communities most affected by the climate crisis.

Anthropic offers a useful illustration of how circular this logic has become. In February 2026, Anthropic publicly accused three Chinese AI companies, DeepSeek, Moonshot AI, and MiniMax, of generating over 16 million exchanges with Claude through approximately 24,000 fraudulent accounts in order to train their own models, a technique called distillation. Anthropic called it theft. Yet Anthropic itself had settled a landmark copyright lawsuit just months earlier, in September 2025, paying $1.5 billion to authors whose books had been downloaded from pirate websites to train Claude. The objection is not to extraction itself. The objection is to being on the receiving end of it.

The Rooibos case

In 2010, Nestlé's subsidiary Nestec S.A. applied for five patents covering uses of Rooibos and Honeybush, plants traditionally farmed and used medicinally by South Africa's Indigenous Khoi and San populations for generations. The patent applications made no reference to that history. They treated the knowledge as discovered rather than inherited, and were filed without the permits required under South African law or the consent required under the Convention on Biological Diversity.

The South African government formally recognised Khoi and San rights over Rooibos in 2019, establishing a benefit-sharing agreement that now generates revenue for those communities. It took nine years of legal challenge, sustained advocacy, and a government willing to act.

This is called biopiracy, and it has a digital equivalent in every AI training dataset that contains traditional ecological knowledge, every climate model built on Indigenous monitoring data, every carbon offset scheme that commodifies forest stewardship practices developed by communities who never consented to that use.

The Rooibos case is notable partly because it resulted in a concrete legal win. Those wins are rare. The extractive practice is not.

Why infrastructure matters

You cannot separate data sovereignty from infrastructure sovereignty. Data sovereignty means communities having meaningful control over data about them, data they generate, and knowledge they've developed. But if that data lives on servers owned by Amazon Web Services, subject to US law, accessible to US government subpoenas, the sovereignty is largely symbolic.

As noted above, the undersea cables connecting continents are predominantly owned by US, European, and Chinese corporations. Data centres are heavily concentrated in the Global North and increasingly in China. The "cloud" is not a neutral, borderless space. It is someone else's computer, in a specific jurisdiction, governed by specific laws, owned by specific entities with specific interests.

If your community's climate monitoring data sits on Alibaba Cloud, it is subject to Chinese data law. If it sits on AWS, it is subject to US law and corporate terms of service that can change without notice. If the political winds shift, and they do, access to that data is vulnerable.

This is not a hypothetical concern. The current US administration has already demonstrated that climate data held by federal agencies can be restricted or deleted. While data doesn't disappear entirely, access disappears, and inaccessible data is, for practical purposes, gone.

Building genuinely sovereign data infrastructure (community-owned servers, federated networks, regional cooperatives) is expensive, technically demanding, and slow. That's a real constraint. It's also not an argument against trying. It's an argument for resourcing it.

Let’s connect everything

Colonial resource extraction required communities to be dispossessed of land they had governed. Digital extraction requires communities to be dispossessed of knowledge they generated and data about their own lives and territories.

The argument that data scraping is "just how the internet works" is structurally identical to the argument that resource extraction was "just how trade worked." It describes a norm. It does not justify one.

Data sovereignty movements, Indigenous data governance frameworks, legal challenges to biopiracy, benefit-sharing agreements, and cooperative data infrastructure are far from niche technical debates. They are the contemporary form of the same sovereignty arguments that dismantled formal colonial structures over the last century. They are slow, contested, and unfinished. They are also making concrete gains in certain communities through sustained organised effort.

The next article in this series will examine who has actually accumulated power over data infrastructure and what that means for the rest of us.

This article is part of a series expanding on the conversations from Panic with a Purpose, a podcast exploring data, AI, and justice for people and the planet. Episode 1 on Data Sovereignty, Democracy, and Ownership is out now.
