The Future of Truth
Combining new data primitives with validity models to create AI engines that fight for us
This is an essay about truth. Not the truth itself, like a dramatic exposé or manifesto, although I have done that elsewhere. This is about the dry part: the concept of truth, how we measure it, and the consequences. Because there is nothing dry about the consequences. Think literal blood, sweat and tears. Human lives. To understand why the truth is so important, consider the age-old struggle between truth and power.
Why is the truth a threat to power?
It isn’t always. If you ran for class president and won, the truth about how many votes everyone got wouldn’t be a threat at all. But let’s say, hypothetically, that you spread a vicious lie about your opponent the day before the election. Now the truth is a threat. If everyone found out, you could lose your title, and certainly your reputation. And therein lies the answer: truth is a threat because power is often gained by deceit. And the more power one has, the easier it becomes to deceive. Thus a dangerous cycle forms, often consolidating into totalitarian regimes that keep a vise grip on information, as we currently see in China and Russia.
In our class president example, the truth represents an ethical dimension of right and wrong, and it carries power because ultimately it’s the consent of your class that gives you power. People can be bullied or manipulated into consent. But as long as they know the truth, they will naturally seek ways to resist, because most people believe in what is fair. In many ways our ethical nature, and our compassion, are what make us distinctly human.
This may strike you as well-established political science, but I have covered it again here because the age-old struggle between truth and power may be entering its final chapter. We are in the nascent stages of a digital revolution, and our lives are moving quickly into the virtual realm. But we aren’t the only ones changing; our machines are changing too. The rise of artificial intelligence over the last year has been, quite frankly, frightening in scope. For the first time we are witnessing AIs that challenge and even exceed human capabilities in traditionally human domains like art and writing. For the first time it feels like the machines are thinking.
Such advanced capability means we are about to be surrounded by these things. They will soon be answering our phone calls, serving us food, driving our cars, and assisting with pretty much any task you can imagine. Think teachers. Lawyers. Judges. Politicians.
Cops.
How will these machines view the world? What will be the truth as they see it? AI systems run on behalf of authoritarians could give them a level of power and control never seen before. Conversely, AI systems capable of validating information could be the key to dismantling long-standing systems of oppression. As the power of our technology grows, so do the stakes. It’s no less than a battle for the future, dystopia versus utopia, with machines deciding the outcome.
So let’s talk about truth.
The 3 Types of Truth
The concept of truth is a deep rabbit hole, so for simplicity I will define three categories:
1) Logical Truth (math)
2) Real Event Truth (facts)
3) Hypothetical Truth (opinions)
Logical Truth
Math is probably the best example of logical truth, where, for instance, the arithmetic expression 1 + 1 = 2 evaluates as true. This is generally a type of truth that we all agree on, and even those who don’t are surrounded by airplanes, computers, and myriad operations of daily life that testify to the rigid laws of math and physics. There are caveats here, such as quantum physics, but for the purposes of this essay and the goal of building a better future for humanity, such caveats are irrelevant. Put differently, if anyone questions the existence of reality, the next question should be: what are they trying to hide?
Real Event Truth (facts)
As any criminal lawyer will attest, there is a big difference between knowing something happened and proving it. For example, say you went down to the corner store and bought a six-pack of beer. How would you prove that you did that? The receipt isn’t enough, because someone else could have used your card. The clerk isn’t quite enough, because people are notoriously bad at remembering things. Even video footage can be manipulated. Ultimately the truth of any real-world event cannot be known with absolute certainty. As a solution, humans have adopted a probabilistic framework for establishing event truth. For example, if the clerk remembers you at a certain time, which matches both your credit card receipt and the CCTV footage, the likelihood that you were there is very high. So we consider it true.
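To make that probabilistic framework concrete, here is a minimal Python sketch that combines pieces of evidence using likelihood ratios, i.e. how much more likely each piece of evidence is if you were at the store than if you weren’t. The likelihood ratios and the 1% prior are hypothetical numbers chosen for illustration, and the sketch assumes the pieces of evidence are independent of one another, which real evidence often isn’t (a forged receipt and doctored footage could come from the same forger).

```python
import math
from typing import Sequence


def combine_evidence(prior: float, likelihood_ratios: Sequence[float]) -> float:
    """Posterior probability after applying independent likelihood ratios to a prior.

    Each ratio is P(evidence | you were there) / P(evidence | you weren't).
    Working in log-odds means independent pieces of evidence simply add up.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical likelihood ratios, for illustration only.
evidence = {
    "clerk remembers you at that time": 5.0,
    "card receipt with matching timestamp": 20.0,
    "CCTV footage": 50.0,
}
print(f"{combine_evidence(0.01, list(evidence.values())):.3f}")  # 0.981
```

No single piece of evidence is decisive here, yet together they push a 1% starting belief above 98%, which is the “so we consider it true” step described above.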
Outside of a courtroom, this probabilistic truth framework gets shakier. When we consume news, for example, nobody has the time to fact-check sources, so we just trust the authority of major media corporations or institutions. And it’s a bad system. To give just one example, in 2002 the entire US population was told that Iraq was stockpiling weapons of mass destruction, which turned out to be completely false. By some estimates more than a million Iraqis were subsequently killed because of this falsehood, but for much of the military-industrial complex and their political counterparts, the war was a bonanza of cash profit and geopolitical power.
The divergence between our event information and the event truth has gotten so bad that our current period has been labeled the post-truth era. Perhaps the quintessential quote describing the ongoing manipulation of public perception by those in power came from a White House aide during the Bush administration, rumored to be Karl Rove, as recounted by journalist Ron Suskind in The New York Times Magazine in 2004:
The aide said that guys like me were ‘in what we call the reality-based community,’ which he defined as people who ‘believe that solutions emerge from your judicious study of discernible reality… That’s not the way the world really works anymore,’ he continued. ‘We’re an empire now, and when we act, we create our own reality. And while you’re studying that reality — judiciously, as you will — we’ll act again, creating other new realities, which you can study too, and that’s how things will sort out. We’re history’s actors…and you, all of you, will be left to just study what we do’
Nearly two decades later, this type of gloating about the irrelevance of discernible event truth has evolved into a contentious public dialogue over “fake news” and a general erosion of trust in American media and institutions. The rise of social media has ratcheted up the distrust even more, because suddenly ordinary people have been given a voice, and they are using it to rightly decry our untrustworthy and unaccountable institutions. At the same time, the inclusion of ordinary people has ushered in a flood of alternative viewpoints, many of which are bogus too.
In short, the landscape of event truth is a chaotic mess right now. We consume more information than ever, by orders of magnitude, and our probabilistic truth frameworks have been mangled into nihilistic doubt engines where nothing can be believed. It’s an improvement over accepting lies with no reservation, but overall not a good situation, as evidenced by our many broken systems.
Hypothetical Truth (opinions)
This last one combines logical truth and event truth. For example, in theory, if I eat, I will no longer be hungry. This is a hypothetical statement that evaluates as likely true, but the variables are messy real-world concepts like eating and hunger, so it’s not a clean logical expression. And it hasn’t actually happened, so it’s not a real event truth either. Another example: if we remove money-sucking insurance companies from the American healthcare system, our healthcare will improve and cost less. This is almost certainly true, as many other countries have shown, but structurally speaking, it’s a hypothetical statement that combines abstract logic with real-world events. Another common word for this is an opinion, where some opinions have a strong basis (probable facts and sound logic) and others have none whatsoever.
To pull this all together, note the following passage from political philosopher Hannah Arendt where she establishes the distinction between event based truth (facts) and hypothetical truth (opinions):
But do facts, independent of opinion and interpretation, exist at all? Have not generations of historians and philosophers of history demonstrated the impossibility of ascertaining facts without interpretation, since they must first be picked out of a chaos of sheer happenings (and the principles of choice are surely not factual data) and then be fitted into a story that can be told only in a certain perspective, which has nothing to do with the original occurrence? No doubt these and a great many more perplexities inherent in the historical sciences are real, but they are no argument against the existence of factual matter, nor can they serve as a justification for blurring the dividing lines between fact, opinion, and interpretation, or as an excuse for the historian to manipulate facts as he pleases. Even if we admit that every generation has the right to write its own history, we admit no more than that it has the right to rearrange the facts in accordance with its own perspective; we don’t admit the right to touch the factual matter itself.
Bottom line: although reality is difficult to measure and interpret, that doesn’t make it any less real. So let’s talk about measurement.
Current Methods of Truth Measurement
The current state of event truth measurement in modern society can be boiled down to one thing:
Trust.
That’s right, basically all of our current systems rely on trust. For example, what recourse would you have if your bank decided to steal all your money? What proof would you have? All the information about your money is stored in a database that the bank completely controls. Deleting your records or creating a fake withdrawal entry would be trivial. So right now, all your money is secured by trust. And it’s not just your bank: the entire financial system is run by central bankers who we trust to create appropriate amounts of new money and distribute it fairly (side note: neither is true). The pseudonymous creator of Bitcoin introduced his invention with this exact point:
The root problem with conventional currency is all the trust that’s required to make it work. The central bank must be trusted not to debase the currency, but the history of fiat currencies is full of breaches of that trust. Banks must be trusted to hold our money and transfer it electronically, but they lend it out in waves of credit bubbles with barely a fraction in reserve. We have to trust them with our privacy, trust them not to let identity thieves drain our accounts. -Satoshi
Except it’s not just banking, it’s everything. Large media corporations are currently the most “trusted” source for real event truth, but only because of their primacy, not their accuracy. Even so-called fact-checking websites are merely forming their own opinions using sources they personally consider reputable, which requires us to trust them, and trust their sources. Wikipedia usually leans on official state sources on controversial matters, which is problematic of course, and when readers raise objections, its solution is to bring in a council of wise men.
On Twitter, users can question a post’s veracity via Community Notes, but even there, the sourcing is simply whoever the author of a particular note trusts. One key difference is that users are allowed to vote on the validity of the notes, which introduces an element of democratic rule in truth mediation. Generally speaking, I support this, because a democratic system is preferable to just letting those in power decide. But it’s still not a great solution, because the majority is often wrong, and those in power have profound influence on what a majority of people think. For example, back in 2003, if you had asked Americans whether Iraq had weapons of mass destruction, a majority would have said yes.
Event truth, as it currently stands, is a giant web of trust, with varying levels depending on your personal belief system. Take, for example, the notion of “experts”, where a jury might consider the opinion of a domain expert more trustworthy than that of a non-expert. This approach sort of works, but again the issue is that people are prone to lying, and many expert witnesses in court trials have later been revealed as either flat wrong or entirely compromised. The same goes for institutions. In academia, funding dollars will often dictate or manipulate research output. Does tobacco cause cancer? Here are a million reasons why not. Ask yourself: can any institution closely tied to the Chinese government actually be trusted? Similarly, NIST, the CDC, the NSA, the FBI, the CIA, etc. are often cited by ChatGPT. But what would ChatGPT have said about WMDs in Iraq in 2003? Regardless of one’s opinion on our institutions, the fact that they could be compromised means that we have to treat them as a potential vector of untruth.
In any trust model, human beings are probably the worst source you can find. As an example of alternative sourcing, imagine a police report that attributes the death of a suspect in custody to a drug overdose, but later a video clip surfaces showing the officer suffocating the victim. Between the competing pieces of information, the video would be considered more trustworthy, because video is harder to fake and much closer to physical evidence than the police officer’s testimony. That said, physical evidence is also far from perfect, because of how much it relies on the custody of untrustworthy people: first the person who collected it, then the person who analyzed it, the person who reported on it, the people who reported on that, and so on. Again, here is Hannah Arendt to tie this all together:
In other words, factual truth is no more self-evident than opinion, and this may be among the reasons that opinion-holders find it relatively easy to discredit factual truth as just another opinion. Factual evidence, moreover, is established through testimony by eyewitnesses — notoriously unreliable — and by records, documents, and monuments, all of which can be suspected as forgeries. In the event of a dispute, only other witnesses but no third and higher instance can be invoked, and settlement is usually arrived at by way of a majority; that is, in the same way as the settlement of opinion disputes — a wholly unsatisfactory procedure, since there is nothing to prevent a majority of witnesses from being false witnesses.
If only there was a system that didn’t rely on trust.
The Invention of Trustless Systems
Many people see “crypto” as just a slow database and dismiss it as effectively useless, or worse, a Ponzi scheme. But from what I can tell, blockchain systems have solved the puzzle of creating a trustless system for the first time in human history. And in the context of what I wrote above, such systems may be critical to the future of humanity.
So let’s unpack that. First, what is a trustless system?
Blockchains are software applications that work by distributing many copies of a shared ledger, ideally across thousands of computers around the world. For example, if I send you some Bitcoin, the transaction is broadcast to everyone in the network, verified as valid, and then recorded as an update to the shared ledger. Notably, once the transaction has been written to the shared ledger, it is effectively immutable, meaning it can never be changed. This critical quality of immutability is achieved using cryptographic hashes: each new batch of entries is bundled into a “block” whose hash, a short digital fingerprint of its contents, incorporates the previous block’s hash, forming a long chain. This is why it’s called a blockchain, and why it’s so hard to change the history. Changing just a single penny in a transaction from 10 years ago would trigger a giant cascade of mismatched hashes, and the network would easily identify and trash your version.
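A toy hash chain in Python shows why rewriting old history is detectable. This is a sketch only: it omits the proof-of-work, digital signatures, and peer-to-peer consensus that real blockchains like Bitcoin depend on, and the block layout and function names are hypothetical, chosen for illustration.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, transactions, prev_hash):
    """SHA-256 over a block's contents, including the previous block's hash."""
    payload = json.dumps(
        {"index": index, "transactions": transactions, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(batches):
    """Turn a list of transaction batches into hash-linked blocks."""
    chain, prev = [], GENESIS_PREV
    for i, txs in enumerate(batches):
        h = block_hash(i, txs, prev)
        chain.append({"index": i, "transactions": txs, "prev_hash": prev, "hash": h})
        prev = h
    return chain


def first_invalid_block(chain):
    """Index of the first block whose hash or back-link doesn't check out, else None."""
    prev = GENESIS_PREV
    for block in chain:
        recomputed = block_hash(block["index"], block["transactions"], block["prev_hash"])
        if block["prev_hash"] != prev or recomputed != block["hash"]:
            return block["index"]
        prev = block["hash"]
    return None


chain = build_chain([
    [{"from": "alice", "to": "bob", "amount": 5}],
    [{"from": "bob", "to": "carol", "amount": 2}],
    [{"from": "carol", "to": "dave", "amount": 1}],
])
print(first_invalid_block(chain))  # None: untouched history verifies

chain[0]["transactions"][0]["amount"] = 6  # quietly rewrite old history
print(first_invalid_block(chain))  # 0: block 0's stored hash no longer matches

chain[0]["hash"] = block_hash(0, chain[0]["transactions"], chain[0]["prev_hash"])
print(first_invalid_block(chain))  # 1: block 1 still points at the old hash
```

“Repairing” block 0 just moves the mismatch to block 1, and so on down the chain, which is the cascade described above. On a real network a forger would also have to redo the work for every later block faster than the rest of the network.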
Note that it’s not an entirely trustless system, in the sense that it does depend on the (ideally) many thousands of people running it. It can also be attacked if someone gains control of more than 50% of the network’s validating power (for Bitcoin, its mining hash rate). But the difficulty of achieving 50% control of a well-distributed system is massive, especially if there is a financial element tied in like a cryptocurrency (please try!). Thus in practice these systems are considered effectively trustless, because they don’t require you to trust a single entity the way all our other systems do.
Consider the example I gave above where your bank can simply delete the money in your account. If your money were on a blockchain, not only could the bank not do that, nobody could: not criminals, not the government, nobody except you. In a blockchain system, only the person who holds the private key associated with funds can transact with them. When it comes to money, this characteristic of personal autonomy is incredibly important, but for the purpose of this essay I am going to focus on the characteristic of immutability. Because ultimately blockchains aren’t limited to tracking money; that was just Satoshi’s first, wildly successful experiment. Blockchains can also be used to store any type of information.
To give just a few examples of information that would be useful to have stored immutably:
• Votes
• Ownership Records (your home, car, etc.)
• Basic data about the world (weather, price indexes, census data, etc.)
• Custody/Supply Chain Records
• Health Records
• Academic Records
In the US we take the reliability of these records for granted, because our bookkeeping systems tend to function. But in many other countries, records like these are far from dependable. For example, when the Taliban regained control of Afghanistan, what do you think happened to the academic records of women? How about their ownership records? Ultimately a correctly implemented blockchain system introduces a brand new truth primitive: a data structure that is more reliable than a human being and doesn’t rely on a series of custodial, trust-based entities the way physical evidence does. Of course, just because something is on a blockchain doesn’t mean it’s correct. Lies can be encoded onto the chain just the same. And it doesn’t mean the Taliban won’t take your house. But it does mean that what was put there will remain there, immutably, and in the context of the battle of truth versus power, that’s a big deal.
I’m not sure anyone has described this idea as succinctly as Balaji Srinivasan, who wonderfully labels all our current data “fiat information,” meaning it’s true because someone powerful said so. In a presentation on the subject, Balaji explains that our information supply chain is fundamentally broken, and how this new truth primitive can be used to form a new, credible base layer of real event truth. Granted, that only addresses part of the problem: the event truth layer. What about hypothetical truth? Can this new truth primitive help there too? I believe the answer is yes.
Man vs Machine
Before I cover this final point, it’s important to reiterate that humans are really bad at remembering things. Our short-term memory is commonly estimated to hold only around seven items (some researchers put it closer to four), and our long-term memories are notoriously distorted and patchy. As a workaround, our brains take all sorts of shortcuts, which is why we’re so subjective in our opinions. The impact of these limitations cannot be overstated: every single conversation we have is brutally uninformed and restricted in scope, and it’s even worse when the dialogue is spoken aloud.
Consider the broad implications of this in combination with what I covered earlier about poor event truth measurement:
• We’re never quite sure of what is happening or has happened
• We forget everything almost immediately
• We trust those given authority to sort it all out
Our entire society is shaped by these “qualities”, but one consequence in particular that I want to highlight is the complete lack of accountability that we currently consider normal. For example, after lying us into the Iraq war, what consequences did the New York Times face? After legalizing torture and then overseeing the waterboarding of detainees, what consequences has George W. Bush faced? Last I checked, even liberals were fawning over Bush and yearning for the “decorum” of his era, because everyone has forgotten how awful he really was. This is colloquially known as the memory hole, but it applies outside our brains too, in that throughout history very little data was ever truly “saved”. For example, we only have a record of certain events because one woman, Marion Stokes, decided to record more than 30 years of continuous television onto VHS tapes.
In contrast, a machine literally never forgets anything. Ever. And a machine can reference trillions of historical data points in seconds. When it comes to quantitative processing, there really is no comparison, and the machines are only getting started. So consider what the implications for society might be if the following were true:
• Almost everything is recorded
• Nothing is forgotten
• Nothing is assumed to be honest
Now we are ready to discuss the validity model.
The Validity Model
A validity model is something I am proposing, and would basically be a way for machines to assess the validity of any statement or idea. Machine learning is already very much about assigning “weights” to data, so this would be very similar in function. The difference is that most models today are trained toward a different goal. For example, a large language model like GPT uses its weights to predict the next most likely word (more precisely, token) in a written response. In a validity model, the weights would be based on the credibility of an individual assertion, meaning a prediction of how true it is.
The credibility or validity of any assertion is entirely dependent on a series of related ideas. So the core concept of such a model would be to determine the validity of any particular statement by quantifying the validity of its dependencies.
For example, if I posit that a circle is round, the dependent ideas would be:
• What a circle is [Geometry, etc.]
• What round means [Geometry, etc.]
• What “is” means [Logic]
Validity probability: 100%
That’s a pretty easy one because I chose a logical truth, but by combining the validity of the underlying premises, you can create a validity score for any statement. Next, let’s make it harder and examine the assertion that Iraq currently has weapons of mass destruction. The related ideas would be:
• What Iraq is
• What WMDs are
• What the physical evidence is, including:
-Who collected it
-Who maintained it
• What the circumstantial evidence is, including:
-Who said it
-The history of Iraq’s capability in this area
-The current state of Iraq’s scientific capability in this area
Validity probability: 7%
The entirety of human history and current affairs could be graphed like this, forming a map of all recorded events and hypotheses. Remember, this is a task for a machine, not a human, so the quantitative complexity is not an issue. If we assume the existence of such a mechanism, there are a few notable characteristics:
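One way to sketch the dependency idea in Python, assuming each claim’s score is the weighted average of the scores of the claims it depends on, with leaf claims scored directly. The aggregation rule, the function name `validity`, and the weights are all hypothetical; a stricter rule would multiply dependency scores so that a single weak premise sinks the whole claim.

```python
from typing import Dict, List, Optional, Set, Tuple

# claim -> list of (dependency, weight); a weight says how much that dependency matters
Dependencies = Dict[str, List[Tuple[str, float]]]


def validity(claim: str, deps: Dependencies, leaves: Dict[str, float],
             cache: Optional[Dict[str, float]] = None,
             visiting: Optional[Set[str]] = None) -> float:
    """Score in [0, 1]: leaf claims have given scores; others average their dependencies."""
    cache = {} if cache is None else cache
    visiting = set() if visiting is None else visiting
    if claim in leaves:
        return leaves[claim]
    if claim in cache:  # sub-claims shared across the graph are scored only once
        return cache[claim]
    if claim in visiting:  # A depends on B, which depends on A
        raise ValueError(f"circular reasoning at {claim!r}")
    children = deps.get(claim)
    if not children:
        raise KeyError(f"no score or dependencies for {claim!r}")
    total = sum(weight for _, weight in children)
    if total <= 0:
        raise ValueError(f"weights for {claim!r} must sum to a positive number")
    visiting.add(claim)
    score = sum(weight * validity(dep, deps, leaves, cache, visiting)
                for dep, weight in children) / total
    visiting.remove(claim)
    cache[claim] = score
    return score


deps = {
    "a circle is round": [("what a circle is", 1.0), ("what round means", 1.0),
                          ("what 'is' means", 1.0)],
}
leaves = {"what a circle is": 1.0, "what round means": 1.0, "what 'is' means": 1.0}
print(validity("a circle is round", deps, leaves))  # 1.0
```

The memo cache matters at scale: in a graph of all recorded events, the same sub-claim (say, who collected a piece of evidence) feeds many conclusions, and it only needs to be scored once.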
Liveness
The graph would be constantly updating based on real-time data collection. This type of liveness would be especially fascinating because of the predictive nature of the system, where many of the probabilities would be constantly shifting in response to incoming data from reality. The constant feedback would also continually improve the accuracy of the model.
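For liveness, a simple illustration is Bayesian updating. The Python class below is a hypothetical sketch, not a proposed design: it tracks how often a recurring yes/no claim comes true using a Beta distribution, the standard conjugate prior for yes/no outcomes. Each new observation shifts the estimate, and the shrinking variance reflects the model growing more confident as feedback accumulates.

```python
from dataclasses import dataclass


@dataclass
class BetaEstimate:
    """Running estimate of how often a recurring yes/no claim turns out true."""
    alpha: float = 1.0  # prior pseudo-count of "true" outcomes; Beta(1, 1) is uniform
    beta: float = 1.0   # prior pseudo-count of "false" outcomes

    def update(self, outcome: bool) -> None:
        """Fold in one new observation from the live feed."""
        if outcome:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        """Current probability estimate."""
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        """Uncertainty of the estimate; shrinks as observations accumulate."""
        n = self.alpha + self.beta
        return self.alpha * self.beta / (n * n * (n + 1))


estimate = BetaEstimate()
print(f"{estimate.mean:.3f} {estimate.variance:.4f}")  # 0.500 0.0833
for outcome in [True, True, False, True]:  # hypothetical incoming data
    estimate.update(outcome)
print(f"{estimate.mean:.3f} {estimate.variance:.4f}")  # 0.667 0.0317
```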
Data Collection and Data Integrity
The input data to such a system would come from all sorts of different sources or feeds, such as the New York Times, the FDA, or your friend’s blog. But among all the feeds, any information sourced from systems that are difficult to manipulate (e.g., blockchains) would be scored much higher than, say, something someone important said. Note that a blockchain-based mechanism that provides information about the real world (e.g., sports scores) is referred to as an “oracle”, and oracles often incorporate a series of redundant checks and balances to improve accuracy and stability.
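Redundant feeds are commonly combined with something like a median, which a minority of faulty or manipulated reporters cannot drag far from the truth. The Python sketch below assumes a numeric feed such as a price; the function name, the 5% deviation threshold, and the three-source minimum are illustrative choices, not any particular oracle network’s rules.

```python
from statistics import median
from typing import Dict, Optional


def aggregate_reports(reports: Dict[str, float],
                      max_rel_deviation: float = 0.05,
                      min_sources: int = 3) -> Optional[float]:
    """Combine redundant feed values into one, ignoring outliers.

    Returns None (publish nothing) when too few sources agree.
    """
    if len(reports) < min_sources:
        return None
    center = median(reports.values())
    tolerance = max_rel_deviation * abs(center)
    agreeing = [v for v in reports.values() if abs(v - center) <= tolerance]
    if len(agreeing) < min_sources:
        return None
    return median(agreeing)


reports = {"feed_a": 101.0, "feed_b": 100.0, "feed_c": 99.5, "feed_d": 250.0}
print(aggregate_reports(reports))  # 100.0: feed_d is discarded as an outlier
```

Refusing to publish when sources disagree is a deliberate choice here: for a truth feed, no answer is better than a confidently wrong one.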
Accountability
All hypothetical truths are actually predictions. So the accountability property of such a system within everyday discourse would be absolutely beautiful. The entire public life of an individual would be treated no differently than any other entity on the graph, meaning all their actions would be tracked and assessed. Note this is not the same as the dystopian “social credit score”, which is entirely subjective. Rather, this would be an objective measure of an individual’s actions, such as the validity of their public statements and recorded public actions. The same applies to publications and institutions.
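Because hypothetical truths are predictions, one established way to score a public track record is the Brier score: the mean squared difference between the probability someone stated and what actually happened (1 if it occurred, 0 if not). A minimal Python sketch, with hypothetical forecasts standing in for a real record:

```python
from typing import Iterable, Tuple


def brier_score(forecasts: Iterable[Tuple[float, bool]]) -> float:
    """Mean squared error between stated probabilities and outcomes (0.0 is perfect)."""
    pairs = list(forecasts)
    if not pairs:
        raise ValueError("no resolved predictions to score")
    for p, _ in pairs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return sum((p - (1.0 if happened else 0.0)) ** 2 for p, happened in pairs) / len(pairs)


# Hypothetical public predictions: (stated probability, did it happen?)
pundit = [(0.95, False), (0.8, True), (0.6, True)]
hedger = [(0.5, False), (0.5, True), (0.5, True)]
print(f"{brier_score(pundit):.4f}")  # 0.3675
print(f"{brier_score(hedger):.4f}")  # 0.2500
```

Lower is better. The pundit’s single confident miss at 95% outweighs two correct calls, which is exactly the kind of accountability the proposed system would apply to people, publications, and institutions alike.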
Objectivity & Controversy
The system would not be particularly kind to religion, because to this day there is no tangible evidence for the existence of god beyond an emotional insistence. But that would actually be a great thing, because throughout history religion has wrought incredible damage and suffering. The ability to enforce objectivity in matters of deep importance to human suffering would be an incredible bit of progress. Even so, the machine would no doubt be deeply controversial, but in a good way. It’s a good thing that we are debating fake news. Those conversations elevate us, and the algorithmic nature of the system will insulate it.
Security
If such a system were to exist, its natural habitat would likely be a blockchain, to ensure that its own data cannot be manipulated. And by “its own data” I am referring not just to the large set of real-world data being collected and processed, but also to the code used to run it. Without a system that provides security guarantees, a supposed validity model will just be a dressed-up version of the legacy trust-based system.
Incremental Construction
Obviously mapping the entirety of human history and current affairs isn’t currently feasible. But the concept of validity still matters at a smaller scale, so the path toward the generalized model will likely run through various smaller applications. The predictive nature of the system, for instance, would lend itself nicely to gambling, stock trading, political races, or really anything that people aren’t sure about before it happens. You could say, “I predict X will happen,” and the mechanism would trawl millions of experts and hard data points and then spit out a mathematically calculated chance that you’re right.
Conclusion
Something like the validity model I have described will be necessary to give our AI systems credibility. By combining blockchain-based real event truth feeds with an open source validity model, our machines could become orders of magnitude more trustworthy than anything that currently exists. And although such a powerful system may scare you, it doesn’t scare me nearly as much as allowing AIs to exist without one. Consider that right now your identity is already under constant surveillance, and the man who acted to expose the truth about that (Snowden) is currently living in exile. Right now, those in power can see your texts, your emails, and your Google searches, listen to your phone calls, monitor your location, etc. You already have a digital profile being maintained across both federal and private entities, and most of it is stuff you would consider private.
The validity model I’m envisioning would strictly use public information, and importantly, operate as an open system. Every single Bitcoin transaction can be viewed by everyone, and the code too. Similarly, this would be open source and fully transparent, because anything less would add a layer of trust antithetical to its entire purpose. Think of it like a big accountability force field that nobody can hide from, including those in power. It’s a system that we should build, and then demand, because authoritarians everywhere will certainly resist such an indiscriminate truth telling machine. But that’s exactly why it’s so important.
The wonderful part of this entire story is that the truth is on our side. The coming technology will be enormously powerful, and to the extent that we can leverage the machines to amplify truth and not power, they represent a unique opportunity to extricate people across the world from oppression and suffering.
And breaking the chains of poverty and exploitation will just be the start. From extending the human lifespan to exploring new planes of existence, the marriage of man and machine has the potential to elevate our species to a level of utopian existence beyond your wildest imagination.
Validity probability: 62%