Challenges of Detecting AI-Generated Text | by Dhruv Matani | Sep, 2023

We now have all the ingredients we need to check if a piece of text is AI-generated. Here’s everything we need:

  1. The text (sentence or paragraph) we wish to check.
  2. The tokenized version of this text, tokenized using the tokenizer that was used to tokenize the training dataset for this model.
  3. The trained language model.

Using 1, 2, and 3 above, we can compute the following:

  1. Per-token probability as predicted by the model.
  2. Per-token perplexity using the per-token probability.
  3. Total perplexity for the entire sentence.
  4. The perplexity of the model on the training dataset.
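The quantities listed above can be sketched in a few lines of Python. This is a minimal illustration, not the article's actual implementation: it assumes per-token perplexity is 1/p and that sentence perplexity is the exponential of the mean negative log-probability (a common definition). The function names and the values in `probs` are made up for illustration and stand in for real model output.

```python
import math


def token_perplexities(token_probs):
    """Per-token perplexity, taken here as 1 / p (an assumed definition)."""
    return [1.0 / p for p in token_probs]


def sentence_perplexity(token_probs):
    """exp of the mean negative log-probability over the scored tokens."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))


# Hypothetical per-token probabilities, as a trained model might predict them.
probs = [0.5, 0.25, 0.125]

print(token_perplexities(probs))
print(sentence_perplexity(probs))
```

The perplexity of the model on the training dataset would be computed the same way, over the tokens of the training text instead of a single sentence.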

To check if a text is AI-generated, we need to compare the sentence perplexity with the model’s perplexity scaled by a fudge-factor, alpha. If the sentence perplexity is more than the model’s perplexity with scaling, then it’s probably human-written text (i.e. not AI-generated). Otherwise, it’s probably AI-generated. The reason for this is that we expect the model to not be perplexed by text it would generate itself, so if it encounters some text that it itself would not generate, then there’s reason to believe that the text isn’t AI-generated. If the perplexity of the sentence is less than or equal to the model’s training perplexity with scaling, then it’s likely that it was generated using this language model, but we can’t be very sure. This is because it’s possible for a human to have written that text, and it just happens to be something that the model could also have generated. After all, the model was trained on a lot of human-written text, so in some sense, the model represents an “average human’s writing”.
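The comparison described above can be written as a small function. This is a sketch under stated assumptions: `classify_text` and its return labels are hypothetical names, and the default `alpha` of 1.1 mirrors the `model_ppx * 1.1` factor used in the article's token-coloring code.

```python
def classify_text(sentence_ppx, model_ppx, alpha=1.1):
    """Compare sentence perplexity against the scaled model perplexity.

    alpha is the fudge-factor; 1.1 is an assumed default.
    """
    threshold = alpha * model_ppx
    if sentence_ppx > threshold:
        # The model is "perplexed" by this text, so it likely did not write it.
        return "probably human-written"
    # At or below the threshold: consistent with model output, but a human
    # could have written it too, so this label is deliberately weaker.
    return "possibly AI-generated"


print(classify_text(sentence_ppx=50.0, model_ppx=20.0))
print(classify_text(sentence_ppx=15.0, model_ppx=20.0))
```

The two labels are asymmetric on purpose: a high perplexity is strong evidence against this model being the author, while a low perplexity is only weak evidence for it.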

ppx(x) in the formula above means the perplexity of the input “x”.

Next, let’s take a look at examples of human-written v/s AI-generated text.

Examples of AI-generated v/s human-written text

We’ve written some Python code that colors each token in a sentence based on its perplexity relative to the model’s perplexity. The first token is always colored black because we don’t consider its perplexity. Tokens that have a perplexity that is less than or equal to the model’s perplexity with scaling are colored red, indicating that they may be AI-generated, whereas the tokens with higher perplexity are colored green, indicating that they were definitely not AI-generated.

The numbers in the square brackets before the sentence indicate the perplexity of the sentence as computed using the language model. Note that some words are part red and part green. This is due to the fact that we used a subword tokenizer.

Here’s the code that generates the HTML above.

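def get_html_for_token_perplexity(tok, sentence, tok_ppx, model_ppx):
    # Tokenize the sentence with the same tokenizer the model was trained with.
    tokens = tok.encode(sentence).tokens
    cleaned_tokens = []
    for word in tokens:
        # chr(288) is 'Ġ', the marker this tokenizer uses for a leading space;
        # map it back to a regular space for display.
        m = list(map(ord, word))
        m = list(map(lambda x: x if x != 288 else ord(' '), m))
        m = list(map(chr, m))
        m = ''.join(m)
        cleaned_tokens.append(m)

    # The first token has no perplexity, so it keeps the default (black) color.
    html = [
        f"<span>{cleaned_tokens[0]}</span>",
    ]
    for ct, ppx in zip(cleaned_tokens[1:], tok_ppx):
        color = "black"
        if ppx.item() >= 0:
            if ppx.item() <= model_ppx * 1.1:
                # At or below the scaled model perplexity: may be AI-generated.
                color = "red"
            else:
                color = "green"
        html.append(f"<span style='color:{color};'>{ct}</span>")
    return "".join(html)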

As we can see from the examples above, if a model detects some text as human-generated, it’s definitely human-generated, but if it detects the text as AI-generated, there’s a chance that it’s not AI-generated. So why does this happen? Let’s take a look next!

False positives

Our language model is trained on a LOT of text written by humans. It’s generally hard to detect if something was written (digitally) by a specific person. The model’s training inputs comprise many, many different styles of writing, likely written by a large number of people. This causes the model to learn many different writing styles and content. It’s very likely that your writing style closely matches the writing style of some text the model was trained on. This is what causes false positives, and it is why the model can’t be sure that some text is AI-generated. However, the model can be sure that some text was human-generated.

OpenAI: OpenAI recently announced that it would discontinue its tools for detecting AI-generated text, citing a low accuracy rate (Source: Hindustan Times).

The original version of the AI classifier tool had certain limitations and inaccuracies from the outset. Users were required to input at least 1,000 characters of text manually, which OpenAI then analyzed to classify as either AI or human-written. Unfortunately, the tool’s performance fell short, as it correctly identified only 26 percent of AI-generated content and mistakenly labeled human-written text as AI about 9 percent of the time.

Here’s the blog post from OpenAI. It seems like they used a different approach compared to the one mentioned in this article.

Our classifier is a language model fine-tuned on a dataset of pairs of human-written text and AI-written text on the same topic. We collected this dataset from a variety of sources that we believe to be written by humans, such as the pretraining data and human demonstrations on prompts submitted to InstructGPT. We divided each text into a prompt and a response. On these prompts, we generated responses from a variety of different language models trained by us and other organizations. For our web app, we adjust the confidence threshold to keep the false positive rate low; in other words, we only mark text as likely AI-written if the classifier is very confident.

GPTZero: Another popular AI-generated text detection tool is GPTZero. It seems like GPTZero uses perplexity and burstiness to detect AI-generated text. “Burstiness refers to the phenomenon where certain words or phrases appear in bursts within a text. In other words, if a word appears once in a text, it’s likely to appear again in close proximity” (source).

GPTZero claims to have a very high success rate. According to the GPTZero FAQ, “At a threshold of 0.88, 85% of AI documents are classified as AI, and 99% of human documents are classified as human.”
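To put the quoted detection rates in perspective, the short calculation below applies Bayes' rule to estimate how often a text flagged as AI really is AI-written. It is a worked illustration, not a measurement: the 50% share of AI-written texts (`ai_share`) is a hypothetical assumption, the function name is made up, and only the detection rates come from the figures quoted in this section.

```python
def flagged_is_ai_probability(true_positive_rate, false_positive_rate, ai_share):
    """P(text is AI | detector flags it as AI), by Bayes' rule."""
    flagged_ai = true_positive_rate * ai_share
    flagged_human = false_positive_rate * (1.0 - ai_share)
    return flagged_ai / (flagged_ai + flagged_human)


# ai_share = 0.5 is an assumed base rate, chosen only for illustration.
# OpenAI classifier figures quoted above: 26% of AI text caught, 9% of human text misflagged.
print(flagged_is_ai_probability(0.26, 0.09, ai_share=0.5))
# GPTZero FAQ figures quoted above: 85% of AI text caught, 1% of human text misflagged.
print(flagged_is_ai_probability(0.85, 0.01, ai_share=0.5))
```

The result depends heavily on the assumed base rate: the smaller the share of AI-written text among the inputs, the more of the flagged texts are false positives.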

The generality of this strategy

The approach mentioned in this article doesn’t generalize well. What we mean by this is that if you have 3 language models, for example, GPT3, GPT3.5, and GPT4, then you must run the input text through all 3 models and check perplexity on each of them to see if the text was generated by any one of them. This is because each model generates text slightly differently, and each one needs to independently evaluate the text to see if it may have generated it.
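The per-model check described above might look like the following sketch. Everything here is hypothetical: `detect_across_models`, the model names, and the stub scorers are placeholders, and in practice each scorer would run the text through a real model with that model's own tokenizer.

```python
def detect_across_models(text, scorers, model_ppx, alpha=1.1):
    """Return the names of models that may have generated `text`.

    scorers:   maps a model name to a function returning the text's perplexity
               under that model (stand-ins for real model calls).
    model_ppx: maps a model name to that model's training perplexity.
    alpha:     fudge-factor; 1.1 is an assumed default.
    """
    candidates = []
    for name, score in scorers.items():
        # Every model must evaluate the text independently.
        if score(text) <= alpha * model_ppx[name]:
            candidates.append(name)
    return candidates


# Stub scorers with made-up perplexities, for illustration only.
scorers = {"model_a": lambda text: 12.0, "model_b": lambda text: 80.0}
model_ppx = {"model_a": 20.0, "model_b": 30.0}
print(detect_across_models("some input text", scorers, model_ppx))
```

The cost grows linearly with the number of models to check, which is part of why the approach does not scale to every language model in existence.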

With the proliferation of large language models in the world as of August 2023, it seems unlikely that one can check any piece of text as having originated from any of the language models in the world.

In fact, new models are being trained every day, and trying to keep up with this rapid progress seems hard at best.

The example below shows the result of asking our model to predict if the sentences generated by ChatGPT are AI-generated or not. As you can see, the results are mixed.

The sentences in the purple box are correctly identified as AI-generated by our model, whereas the rest are incorrectly identified as human-written.

There are many reasons why this may happen.

  1. Train corpus size: Our model is trained on very little text, whereas ChatGPT was trained on terabytes of text.
  2. Data distribution: Our model is trained on a different data distribution as compared to ChatGPT.
  3. Fine-tuning: Our model is just a GPT model, whereas ChatGPT was fine-tuned for chat-like responses, making it generate text in a slightly different tone. If you had a model that generates legal text or medical advice, then our model would perform poorly on text generated by those models as well.
  4. Model size: Our model is very small (less than 100M parameters compared to > 200B parameters for ChatGPT-like models).

It’s clear that we need a better approach if we hope to provide a reasonably high-quality result when checking if any text is AI-generated.

Next, let’s take a look at some misinformation about this topic circulating around the internet.
