Recently there’s been an explosion of content about OpenAI’s ChatGPT and related technologies. With it comes the worry that AI-generated content will increasingly fill the vast sea of the internet. One of our Managing Partners, Brittany Hunter, even created a blog post using only ChatGPT.
Detecting AI-Generated Content
To help detect whether a sample of text was written by a human or generated by a machine, a group of researchers created a tool called the Giant Language Model Test Room (GLTR). It’s simple to use: you paste in some text, and it generates a set of charts showing how predictable that text is. GLTR does this by using a language model to estimate, for each word, how likely it was to be chosen given the words that precede it.
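To make the idea of word ranks concrete, here is a minimal sketch in Python. It is not GLTR’s code. GLTR ranks words with a large pretrained language model (GPT-2) that conditions on all of the preceding text, whereas this sketch swaps in a toy bigram model, trained on a single made-up sentence, that conditions only on the previous word. The function names `train_bigram` and `token_ranks` are hypothetical and only for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(corpus_tokens):
    """Count how often each word follows each previous word (toy stand-in for GPT-2)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus_tokens, corpus_tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def token_ranks(tokens, counts, vocab):
    """For each word after the first, return its rank (1 = model's top guess)."""
    ranks = []
    for prev, actual in zip(tokens, tokens[1:]):
        following = counts.get(prev, Counter())
        # Order the vocabulary by how often each word followed `prev`,
        # breaking ties alphabetically so the ranking is deterministic.
        ordered = sorted(vocab, key=lambda w: (-following[w], w))
        if actual in ordered:
            ranks.append(ordered.index(actual) + 1)
        else:
            # A word the model has never seen ranks below everything it knows.
            ranks.append(len(ordered) + 1)
    return ranks


corpus = "the croissant is flaky and the croissant is buttery and the pastry is flaky".split()
counts = train_bigram(corpus)
vocab = sorted(set(corpus))
print(token_ranks("the croissant is flaky".split(), counts, vocab))  # [1, 1, 1]
print(token_ranks("the pastry is buttery".split(), counts, vocab))   # [2, 1, 2]
```

The first sentence follows the toy model’s favorite continuations exactly, so every word ranks 1; the second makes less expected choices and ranks lower. A real model has a vocabulary of tens of thousands of tokens, which is why GLTR’s color thresholds go up to 1,000.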
I wanted to see what this software could do, so I asked some of my fellow Atoms to tell me about their favorite food. Their responses were artfully written and great examples of what “human-generated” content looks like. I then gave ChatGPT a prompt that I thought would produce a response close to the ones I received from the Atoms. The difference between the AI-generated and Atom-generated content was staggering, and the charts agreed.
Figure 1: AI-Generated Content
Figure 2: Atom-Generated Content
Figure 1 shows the GLTR analysis of the AI-generated content, and Figure 2 shows the analysis of the Atom-generated content. GLTR colors each word by where it falls in the model’s ranked list of predictions: green means the word was among the model’s top 10 predictions, yellow the top 100, red the top 1,000, and purple means it fell outside the top 1,000. Lots of green suggests predictable, machine-like text, while more purple suggests the less predictable word choices typical of human writing. As the figures show, the AI-generated content contains fewer than 10 of these “rare” purple words, while the Atom-generated content contains over 30.
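Once each word has a rank, the color key is just a set of thresholds. The sketch below buckets ranks into GLTR’s four colors and counts them, similar to the counts the charts summarize. The function names are hypothetical, and the example ranks are made up; they are not the ranks behind Figures 1 and 2.

```python
from collections import Counter

COLORS = ("green", "yellow", "red", "purple")


def gltr_bucket(rank):
    """Map a word's rank (1 = model's top guess) to a GLTR-style color."""
    if rank <= 10:
        return "green"
    if rank <= 100:
        return "yellow"
    if rank <= 1000:
        return "red"
    return "purple"  # outside the top 1,000: a "rare" word


def bucket_counts(ranks):
    """Count how many words fall into each color bucket."""
    counts = Counter(gltr_bucket(r) for r in ranks)
    return {color: counts[color] for color in COLORS}


print(bucket_counts([1, 3, 15, 250, 4000, 2]))
# {'green': 3, 'yellow': 1, 'red': 1, 'purple': 1}
```

A high purple count is a hint, not proof: a human writing plainly can produce mostly green text, which is one reason GLTR is best treated as an aid to judgment.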
Content by Real Humans
Here’s how one of the Atoms, Rachael McQuater, responded when asked about their favorite food:
“A perfect croissant is a feat of human engineering. Every time I eat one, I think on how our most important developments as a species have brought this little architectural marvel to my hands. I savor the rich smell of deeply toasted wheat, leavened with yeast that we’ve learned to culture over thousands of years, fluffed into delicate honeycombs by the evaporation of water in the fattiest butter the baker could get their hands on. I have a whole sacred process: peeling off the highest, crispiest arch to save for last; crunching through the corners that have fried in the butter that melted off the laminated edges; pulling apart stretchy layers from the soft insides one by one.”
The prompt that I used for ChatGPT was “write a short paragraph about how good a croissant is.” The response that the AI software generated was:
“A croissant is a type of French pastry made from laminated dough, which is a dough made by repeatedly rolling and folding butter into flour. The result is a pastry that is flaky and buttery, with a crisp crust and a delicate, airy interior. Croissants are often served for breakfast and can be filled with various sweet or savory ingredients, such as chocolate or ham and cheese. They are considered to be a delicious and indulgent treat, and are enjoyed by many people around the world.”
Anyone who reads these two passages can tell that Rachael’s is far more passionate than ChatGPT’s.
Detect AI-Generated Content with GLTR
While GLTR can’t tell you for certain whether a piece of text is AI-generated, it can certainly help. If a website’s prose ever strikes you as suspiciously bland, copy and paste a chunk of it into GLTR and do a quick analysis.