The contemporary discourse on data governance has been compromised by Data Idealism, which treats data primarily as a social and legal artifact. Data idealism comes in many variants, such as data ethics, "Free Flow with Trust", data decolonisation, data feminism, ethical AI development and others, all of which suggest that the social, political, and economic consequences of our digital age can be managed through transparency, fairness, and ethical alignment. This is a case of structural blindness. Here, I propose Data Realism as a necessary corrective: a shift in focus from ideas of computational equity to the concrete realities of infrastructural ownership, standardization leverage, and the strategic capacity to manage data as a non-fungible asset.
This requires recognising the following five key tenets of Data Realism:
1) The world exists and data are our only contact with it.
To deny this is to abandon the epistemic project: data may be imperfect, shaped, and mediated, but they are contact with the world nonetheless. Once we accept the fundamental role of data as interfaces to reality, Data Realism demands that broad, cost-effective access to data be made possible. The goal is to create data commons and maximize their utility while minimising systemic risk. Today, AI companies and developers need clarity on what data they can use and how. A facilitative sourcing framework would remove the constant threat of litigation, allowing development teams to focus on the quality and performance of their models rather than on legal risk management. The current public-data bottleneck stifles competition in AI.
This focus on "making things work" means Data Realism advocates for policies that legalise the collection of publicly available data. The current legal ambiguity, compounded by ethicist shaming, cripples startups. There must, of course, be clear technical standards for scraping (rate limits, robots.txt adherence, mandatory anonymisation, exclusion of sensitive and non-essential data, and so on), but by lowering the cost of basic data access and creating data commons, states can force AI companies to compete on superior modeling, contextual application, and algorithmic innovation rather than on who is the biggest and baddest proprietary data hoarder.
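To make such standards concrete, here is a minimal sketch of a "compliant collector" in Python, assuming a JSON endpoint; the user agent, rate limit, and sensitive-field list are illustrative assumptions rather than prescribed values.

```python
# A sketch of a compliant collector: robots.txt adherence, a fixed rate
# limit, and exclusion of sensitive fields at collection time.
# User agent, rate limit, field list, and JSON endpoint are assumptions.
import time
import urllib.robotparser
from urllib.parse import urlparse

import requests  # any HTTP client would do

RATE_LIMIT_SECONDS = 2.0                      # e.g. at most one request every 2 s per host
SENSITIVE_FIELDS = {"email", "phone", "national_id"}

def allowed_by_robots(url: str, user_agent: str = "data-commons-bot") -> bool:
    """Consult the host's robots.txt before fetching anything."""
    parts = urlparse(url)
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()
    return rp.can_fetch(user_agent, url)

def collect(url: str) -> dict | None:
    """Fetch one public record, respecting the rate limit and dropping sensitive fields."""
    if not allowed_by_robots(url):
        return None                           # disallowed by the publisher: do not collect
    time.sleep(RATE_LIMIT_SECONDS)            # crude per-request rate limiting
    record = requests.get(url, timeout=10).json()
    return {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
```

The point of codifying these rules is that compliance becomes a checkable property of the collector rather than a matter of post-hoc moral judgment.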
Alongside such access policies, more public and private investment in curated, contextual public datasets is needed. These vetted datasets can lower the initial data-sourcing cost for startups and create a standardized benchmark for model development, replacing expensive, ad-hoc, and legally risky scraping efforts. Regulatory policy here must mandate data sharing or standardized APIs for essential public-interest data held by natural monopolies, and incentivise the voluntary contribution of anonymised, high-quality datasets to open-source commons. Further, if data are our contact with the world, an over-reliance on specific metrics can warp signals, so Data Realism also demands holistic reality capture that incorporates qualitative insights and a plurality of indicators.
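As a rough sketch of what "standardized" could mean for such a commons, consider a uniform descriptor that every contributed dataset must carry so it can be discovered, benchmarked against, and kept fresh; the field names below are illustrative assumptions, not an existing standard.

```python
# A sketch of a standardized commons entry (illustrative fields only).
from dataclasses import dataclass

@dataclass(frozen=True)
class CommonsDataset:
    dataset_id: str            # stable identifier within the commons
    domain: str                # e.g. "mobility", "energy", "health"
    provenance: str            # who collected the data and how
    anonymisation: str         # method applied before contribution
    licence: str               # reuse terms, e.g. "CC-BY-4.0"
    refresh_cadence_days: int  # how often the contributing steward must update it

def is_publishable(d: CommonsDataset) -> bool:
    """Basic gate before a dataset enters the commons catalogue."""
    return bool(d.provenance) and bool(d.anonymisation) and d.refresh_cadence_days > 0
```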
2) Data exist with the world.
This implies that data production is situated and filtered through the environment: data are not an abstraction but a critical resource; they come from somewhere, are made by someone, and are shaped by instruments, protocols, and power. Data not only represent but also enact realities, especially as more and more information systems are automated: data shape public discourse, inform policy, and modulate real human and machine behavior. The materiality of data infrastructures is far from ephemeral; data are stored, circulated, and maintained by physical systems that leave a significant ecological and geopolitical footprint. Since every interaction leaves a trace, Data Realism demands we acknowledge that data are not simply "collected"; their genesis and production are infrastructured. When schools of data idealism focus on moral arguments about data, they also accept the infrastructural dominance of incumbent hegemons (and their ethical priorities) as an unchangeable premise, seeking to ameliorate the prevailing system of power rather than challenging its foundations.
Therefore, a successful Data Realist state must foster the "permanent view of politics" needed to integrate the trajectory of global technological developments into its own strategic calculus, prioritize the development of sovereign technical and management standards, and explicitly link digital industrial geographies to national security goals. Just as the world has winners and losers, the digital society has data powers and data provinces.
3) The world leaks through.
Data realism is not a defense of surveillance, dashboards, spreadsheets, or technocratic governance. It is a defense of reality as something external to human discourse and design — something that can resist, surprise, and falsify our models. To that end, Data Realism rejects two dominant trends:
Naive Empiricism — the idea that data “speak for themselves,” that numbers are neutral, measurement is innocent. This view fails to account for context, bias, or interpretation.
Radical Constructivism — the view that data are nothing but power-laden constructs, shaped entirely by ideology, narrative, or positionality. This view erases the world and collapses epistemology into politics, often for its own sake.
A realist stance rejects both the blind faith in datafication and the nihilism of pure relativism. Data realism does not deny context, ideology, or structure. It insists that, even through those, the world leaks through. A temperature reading. A mortality rate. A vote count. These are not just narratives. They constrain us. To treat data as real is to take them seriously, not as final truth, but as our provisional contacts with the world. It is to ask what all this shows and means, not just who made it and why. Data must therefore be analyzed without idealization: a commitment to the hard facts of data, even when they are inconvenient or ugly (e.g., showing inequality or corruption), is necessary, and statistical gaslighting is civilisationally poisonous. Data can be manipulated. But manipulation presupposes a baseline that can be distorted. Falsifying a vote count still depends on the idea of a real vote count. Censoring mortality rates still implies that there were deaths. To lie with data is to admit that truth matters, because at some level data are a non-negotiable reality that exists and operates independently of our political beliefs and moral aspirations. This means we must apply the highest scrutiny to the data used to train and test our systems, human or artificial.
4) Data drive agency in the world.
Data are not an end but an index of industrial, military, and academic capacity. A state's true data capacity is measured not by the size of its population's data footprint, but by its independent ability to standardize, store, and compute that data without reliance on external supply chains or governance frameworks. This requires a systemic integration of military, academic, and industrial objectives: a union of science with industry that treats digital technical standards as global public goods to be wielded strategically, not just consumed passively. Contrary to idealistic claims, national security is the ultimate policy engine driving data governance decisions at the level of states, with privacy and ethics serving as secondary and often negotiable constraints. The data policies of the hegemonic and emergent powers are fundamentally rooted in securing technological advantage. The task, therefore, is not to eliminate dependency through isolation, but to gain the leverage within data systems needed to shape the rules of the game.
Idealistic data policies are politically naïve because they assume consent and cooperation in an anarchic system. Consider the G7's DFFT narrative, which is projected as a universal good but is mostly elegant rhetoric: "free flow with trust" uses an abstract legal promise to mask the concrete realities of global political control that it leaves unacknowledged. A Data Realist state must therefore subject all policies to a rigorous test of falsifiability: does this policy measurably increase sovereign capacity and reduce structural dependency, or does it merely achieve moral compliance with the dominant or external powers that be? Data Realism thus demands a meaningful shift from the judicial-police state (focused on making and enforcing laws) to the structural state (focused on building and owning digital capability and metapolicy spaces).
5) Navigating the world with data requires pragmatism.
Data realism is a commitment to practical ethics, not idealisms. It eschews notions of diametrically opposed left/right systems. It is a philosophy of effective technological acceleration, not of technological pessimism. Practical ethics require direct and immediate confrontation with the ethical necessities of data flows: make the methods of collection, cleaning, modeling, and interpretation as transparent as possible to those affected by the resulting decisions. Beyond a right to audit and redress, however, data stewards should not have to project desires of how the world ought to be into their data pipelines. Data ownership is the ownership of Truth, and thus carries a responsibility to protect and de-risk the data in one's care and, if required, to transfer that ownership for systemic continuity. The primary task of governance here is thus not to make data and systems ethical, but to confront and master the measurable structural facts of computation, ownership, and capability.
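As an illustration of what such transparency might look like in practice, here is a minimal sketch of a pipeline provenance record in Python; the types, stage names, and fields are illustrative assumptions, not an established schema.

```python
# A sketch of pipeline transparency as a provenance record: each stage of
# collection, cleaning, modeling, and interpretation is logged in a form an
# affected party or auditor can inspect.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class PipelineStep:
    stage: str          # "collection" | "cleaning" | "modeling" | "interpretation"
    method: str         # what was done, stated in plain terms
    performed_by: str   # the accountable steward
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

@dataclass
class ProvenanceRecord:
    dataset_id: str
    steps: list[PipelineStep] = field(default_factory=list)

    def log(self, stage: str, method: str, performed_by: str) -> None:
        self.steps.append(PipelineStep(stage, method, performed_by))

    def audit_trail(self) -> list[str]:
        """Human-readable trail supporting the right to audit and redress."""
        return [f"{s.timestamp} [{s.stage}] {s.method} ({s.performed_by})" for s in self.steps]

# Example with a hypothetical dataset and steward
record = ProvenanceRecord("census-2024")
record.log("cleaning", "dropped rows with missing age", "national-stats-office")
print(record.audit_trail())
```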
As almost everyone knows, the world and its governments are secretly run by accountants. This implies that data should undergo continuous assessment for depreciation or appreciation. Once a number is put on the decay or change in data's subjective value due to context shifts and the development of bias, organisations will be better incentivised to direct resources, appropriately and in time, toward updating and responsibly addressing the state of their data pipelines, as is necessary to hold on to the realism in data. The financial incentives for better data governance and lifecycle management, and for reducing long-term infrastructural and technical debt, should thus be made explicit and immediate to the accounting elites.
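One way to put such a number on depreciation, offered only as a minimal sketch: assume exponential decay by age combined with a penalty for measured drift between the data and its current context. The half-life, drift score, and function name below are illustrative assumptions, not an accounting standard.

```python
# A sketch of data depreciation: exponential decay by age plus a drift penalty.
def data_value(initial_value: float,
               age_days: float,
               half_life_days: float = 365.0,
               drift_score: float = 0.0) -> float:
    """Depreciated book value of a dataset.

    initial_value  -- value assigned at acquisition
    age_days       -- time elapsed since collection
    half_life_days -- assumed period over which relevance halves
    drift_score    -- 0.0 (no context shift) to 1.0 (fully stale or biased)
    """
    age_decay = 0.5 ** (age_days / half_life_days)   # exponential depreciation
    return initial_value * age_decay * (1.0 - drift_score)

# Example: a dataset booked at 100,000 units, 18 months old, with moderate drift
print(round(data_value(100_000, age_days=540, drift_score=0.3), 2))
```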
To conclude, as AI and automated systems gain ever more leeway in human affairs, this manifesto calls for embracing Data Realism, a philosophy anchored in the undeniable existence of the world and in data's fundamental role as our provisional contact with it. It is a mandate to recognise that, in a geopolitically volatile world, reliance on external, proprietary, or geographically constrained data sources is a major systemic vulnerability, and an argument for a strategic shift toward data resilience, reliability, and sovereignty to guarantee the uninterrupted operational continuity of our digital lives regardless of external regulatory or political pressures.