April 11, 2024 | OPINION | By Clay Arnold

As artificial intelligence advances, there is real concern about the potential consequences of releasing these systems into the world. While AI has the potential to revolutionize our lives, we may also want to take a second to consider some of its more pervasive effects on our information space. One of the issues with current AI models is their reliance on statistical likelihood: these models are trained to predict the most probable continuation of their input, which leads to both unpredictable behavior and bias.
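To make that point concrete, here is a minimal sketch of likelihood-driven generation. The tiny table of probabilities below is invented for illustration and does not come from any real model, but it shows the core mechanic: the system emits whatever continuation is most probable, and truth plays no direct role in the choice.

```python
# Toy illustration of likelihood-driven generation.
# These probabilities are invented for the example, not taken from a real model.
next_token_probs = {
    "Paris": 0.62,      # plausible and correct
    "Lyon": 0.21,       # plausible but wrong
    "Marseille": 0.12,  # plausible but wrong
    "42": 0.05,         # implausible
}

prompt = "The capital of France is"

# Greedy decoding: pick whichever continuation the model scores as most probable.
choice = max(next_token_probs, key=next_token_probs.get)
print(prompt, choice)  # -> The capital of France is Paris

# The selection criterion is probability, not truth: if skewed training data
# pushed "Lyon" to the top, the model would assert it just as fluently.
```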

While techniques like ‘reinforcement learning from human feedback’ (RLHF) mitigate these issues, they are only partially effective at ensuring the overall accuracy of AI systems. Because RLHF is a fundamentally resource-limited process, it cannot remove every bias across a sufficiently large space of possible inputs. RLHF training aimed at enhancing one aspect of an AI model’s performance may also inadvertently compromise other desirable characteristics. For instance, efforts to improve adherence to increasingly specific alignment rules might come at the cost of reduced accuracy on coding tests, a trade-off that users and stakeholders may deem undesirable.

While RLHF can certainly guide AI systems toward more favorable behaviors, it is not a panacea. At the risk of sounding repetitive, it is worth covering another point of concern: AI models are not inherently designed to prioritize accuracy above all else. Instead, they are trained to produce outputs that human raters approve of, which means outputs that are either correct or, more troublingly, convincingly incorrect. This issue arises because RLHF relies on human feedback, and humans can inadvertently reward plausible but false information.

“Convincingly wrong” is in some ways worse than just plain wrong, humorously enough, because we are training our models to effectively deceive us. Despite efforts to align AI models, it has been demonstrated that each of the current top-performing models, including Claude, GPT and Gemini, has been “jailbroken” to some extent. Jailbreaking refers to the process of bypassing the safety constraints imposed on AI models, allowing them to generate content deemed offensive or unintended.

In a strange twist of fate, it seems that the models’ ability to learn from examples in their prompt, together with their much larger context windows, is a significant reason these newest models can be jailbroken. This is notable because it suggests we cannot build models that are both as capable as we can make them and free of this newer type of vulnerability.

Another signal that consumers will end up with large models unencumbered by alignment constraints is that releasing open-source models is a dominant market strategy for the second-place model maker. Open-source models are essentially already jailbroken, so I believe we will inevitably end up with models as powerful as the most potent ones we have, minus the alignment constraints.

The fact that even our best models can be manipulated is concerning. If AI systems can be compromised, their outputs are hard to trust, and their capacity to create false information makes them useful tools for anyone wanting to spread misinformation.

The ease with which AI models can generate convincing yet factually incorrect content drastically reduces the cost and effort required to create and disseminate misinformation. Suppose that a human can produce a thousand tokens for ten dollars, a rate that is purely coincidental and not at all related to the author’s remuneration for this piece.

Under this assumption, large language models (LLMs) are already between four and seven orders of magnitude less expensive than their meat-based counterparts on a per-token basis. This low barrier to entry could lead to an unprecedented surge in the spread of false information, causing significant harm to individuals and society as a whole.
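For readers who want to sanity-check that range, here is a quick back-of-the-envelope sketch. The human rate is the ten-dollars-per-thousand-tokens assumption above; the per-million-token LLM prices are illustrative stand-ins, roughly spanning a cheap self-hosted model to a commercial API, not quotes from any particular provider.

```python
import math

# Assumption from the text: a human produces 1,000 tokens for $10.
human_cost_per_token = 10 / 1_000  # $0.01 per token

# Illustrative LLM prices in dollars per million tokens (assumed, not quoted
# from any provider), spanning a cheap self-hosted model to a commercial API.
llm_prices_per_million_tokens = [0.001, 0.1, 1.0]

for price in llm_prices_per_million_tokens:
    llm_cost_per_token = price / 1_000_000
    orders_of_magnitude = math.log10(human_cost_per_token / llm_cost_per_token)
    print(f"${price}/1M tokens -> about {orders_of_magnitude:.0f} orders of magnitude cheaper")

# Prints 7, 5 and 4 orders of magnitude for the three assumed price points.
```

Under those assumed prices the gap lands squarely in the four-to-seven range; shift the price assumptions and the number shifts with them, but the conclusion that machine-generated text is radically cheaper does not.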

We have already witnessed the impact of AI-generated misinformation in various domains. For example, during political campaigns, AI has been used to create fake news articles and social media posts, potentially influencing public opinion and swaying election outcomes. 

In the realm of public health, AI-generated misinformation about vaccines and treatments has contributed to the spread of conspiracy theories and mistrust in scientific authorities. The proliferation of AI-generated misinformation also dilutes the overall pool of information available to the public. As more false information is injected into the information ecosystem, it becomes more difficult for people to distinguish credible content from misleading content.

As I discussed in “The Attack on Expertise,” this erosion of trust in information can have far-reaching consequences, from undermining democratic processes to hampering public decision-making. While we should embrace new technology, it would be prudent to remain vigilant in addressing its limitations and risks. Only with that balance can we ensure that AI serves as a positive force rather than a downward pressure on our ability to get at the truth.