AI text watermarks: can they help detect AI-written content?

Summary:

  • AI text watermarking involves embedding subtle linguistic patterns or signals into AI-generated text, allowing automated detection to verify whether content was machine-written.
  • Major AI companies like OpenAI and Google are actively researching watermarking solutions to help educators and platforms detect AI-generated content, potentially curbing academic dishonesty and online misinformation.
  • Despite its promise, watermarking faces challenges including easy evasion through text editing, the need for widespread adoption, and concerns about fairness and unintended consequences, limiting its current effectiveness as a foolproof detection method.

Generative AI has advanced to the point where machine-written text can closely mimic human writing. This has sparked concerns in education and beyond. People are now seeking reliable ways to tell if an essay, article or social media post was produced by an AI. One proposed solution is to have AI developers embed watermarks or hidden signals in AI-generated text. Essentially, these watermarks are subtle patterns in wording or punctuation that are imperceptible to a human reader. However, they can be detected by algorithms, often with the help of a special key. The idea is that an AI model could mark its own output invisibly, allowing later detection if needed. For instance, OpenAI has experimented with a method to statistically watermark the outputs of its models. It works by inserting an “unnoticeable secret signal” into the text to indicate that it was machine-generated. In theory, this approach could make it much easier to trace content back to AI. It would help educators, businesses and platforms identify AI-written material more readily.

How AI text watermarks work

AI text watermarking draws on concepts from cryptography and linguistics. Unlike a visible watermark on an image, a textual watermark involves tweaking the AI’s word choices or patterns. These tweaks are done in a way that doesn’t change the meaning or readability of the output. One approach, developed by researchers at OpenAI, uses a cryptographic function to pseudo-randomly influence the selection of words as the model generates text. The output still reads naturally. However, it carries an underlying statistical signature that can be recognised by anyone possessing the secret key used to generate it. In practical terms, a detector tool using that key could scan a piece of writing for the AI’s hidden signature. It would determine if the pattern of word usage matches an AI-generated watermark embedded in the text.
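To make this concrete, below is a minimal, hypothetical sketch of one widely discussed academic approach (a so-called “green list” scheme), in which a secret key and the preceding token pseudo-randomly partition the vocabulary and the favoured half receives a small boost before sampling. The toy vocabulary handling, hashing choice, bias strength and function names are illustrative assumptions – this is a sketch of the general technique, not OpenAI’s actual method.

```python
# Minimal sketch of a "green list" text watermark, assuming a toy vocabulary
# and a simple logit-boost scheme. Illustrative only; not OpenAI's exact method.
import hashlib
import math
import random

SECRET_KEY = b"example-secret-key"  # held by the model provider (assumed name)
GREEN_FRACTION = 0.5                # share of the vocabulary favoured at each step
BIAS = 2.0                          # logit boost given to "green" tokens

def green_list(prev_token: str, vocab: list[str]) -> set[str]:
    """Pseudo-randomly split the vocabulary, seeded by the secret key and the
    previous token, so the same split can be recomputed at detection time."""
    seed = hashlib.sha256(SECRET_KEY + prev_token.encode()).digest()
    rng = random.Random(seed)
    shuffled = sorted(vocab)
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(shuffled) * GREEN_FRACTION)])

def sample_next(prev_token: str, vocab: list[str], logits: dict[str, float]) -> str:
    """Nudge generation toward green-listed tokens by boosting their logits,
    then sample as usual -- the output stays fluent but carries a hidden bias."""
    greens = green_list(prev_token, vocab)
    boosted = {t: l + (BIAS if t in greens else 0.0) for t, l in logits.items()}
    total = sum(math.exp(l) for l in boosted.values())
    tokens = list(boosted)
    weights = [math.exp(boosted[t]) / total for t in tokens]
    return random.choices(tokens, weights=weights)[0]
```

Because the partition depends on both the secret key and the local context, the resulting bias looks like ordinary sampling variation to a reader, yet it can be recomputed exactly by anyone who holds the key.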

Other watermarking techniques have also been explored. Some methods act like a form of linguistic steganography. They work by adjusting phrasing, synonym choices or punctuation frequency to encode information in the text. Crucially, these changes must be subtle enough to avoid altering the text’s clarity or style noticeably. Developers have to balance embedding a detectable pattern with preserving fluency. Too obvious a pattern might result in odd word choices that tip off readers or degrade quality. On the other hand, a pattern that’s too subtle might be unreliable to detect. The ideal watermark would be invisible to human readers but reliably picked up by automated checks. Researchers have found that using a few hundred words of output is often enough for a detection algorithm to confidently identify an AI watermark. This means even moderately long passages could be checked for AI origin with a reasonable level of confidence.
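Detection in such a scheme amounts to a statistical test: recompute the green list at each position using the secret key and check whether favoured tokens appear more often than chance would allow. The hedged sketch below reuses green_list() and GREEN_FRACTION from the example above; the z-score threshold is an illustrative assumption. Since statistical confidence grows with the number of tokens examined, this is also why a few hundred words are usually enough for a confident call.

```python
# Sketch of the matching detector: a one-sided z-test for an excess of
# green-listed tokens. Reuses green_list() and GREEN_FRACTION from above;
# the threshold of 4.0 is an illustrative assumption.
import math

def looks_watermarked(tokens: list[str], vocab: list[str],
                      z_threshold: float = 4.0) -> bool:
    """Return True if green tokens appear far more often than chance allows."""
    n = len(tokens) - 1
    if n < 1:
        return False
    hits = sum(
        1 for prev, tok in zip(tokens, tokens[1:])
        if tok in green_list(prev, vocab)
    )
    # Unwatermarked text should land in the green list about GREEN_FRACTION of
    # the time purely by chance; measure how far above that the text sits.
    expected = GREEN_FRACTION * n
    std_dev = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    z_score = (hits - expected) / std_dev
    return z_score > z_threshold
```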

Industry research and commitments

Progress by OpenAI and Google

The concept of watermarking AI-generated text has moved rapidly from theory to active research in the last couple of years. OpenAI’s guest researcher Scott Aaronson first revealed in late 2022 that the company had a working prototype for watermarking GPT outputs. The plan was to integrate this feature into future systems once it proved viable. OpenAI has continued to develop this technique, though it has not yet been rolled out to the public. OpenAI has been cautious. A spokesperson noted that while their text watermarking method is “technically promising,” the company is taking a deliberate approach because of “the complexities involved”. This includes the risk of circumvention by bad actors and potential unintended impacts on certain user groups. Notably, OpenAI decided not to widely release its earlier AI-written text detector due to its limited accuracy. Instead, the company has focused on watermarking as a more reliable in-house solution.

Other major AI players are also investing in watermarking and related provenance solutions. Google, for example, has developed a tool called SynthID Text to watermark and detect AI-generated text. Google even made this technology available open-source, encouraging adoption by developers and businesses. SynthID works by modulating the probability distribution of the AI model’s word choices to insert a hidden pattern. Google claims this method does not compromise the quality or speed of text generation. However, even Google acknowledges limitations in its approach. SynthID is less effective on very short passages or on text that has been heavily rewritten or translated. It also struggles with content where factual accuracy leaves little room for variation. These are exactly the scenarios where adding or detecting a watermark becomes tricky.

Broader industry commitments

Beyond individual products, the industry as a whole has signalled commitment to developing watermarking standards. In July 2023, a group of leading AI companies – including OpenAI, Google’s parent Alphabet, Meta, Amazon and others – pledged at the White House to develop “robust technical mechanisms” to ensure users know when content is AI-generated. They specifically cited watermarking as one such solution in their commitments. This voluntary commitment was hailed as a key step for safer AI, and it means that in principle many big tech firms agree on the importance of provenance tools. Likewise, at global forums like the UK’s 2023 AI Safety Summit, top AI firms announced they are developing identifiers for AI-generated material. The momentum suggests that watermarking (or similar provenance tags) could become an industry norm. However, it’s not there yet. So far, no widely accepted or foolproof text watermark standard exists. Any methods in use remain nascent and easy to circumvent.

Our AI detector uses one of the world’s most accurate APIs for detecting text generated by Large Language Models (LLMs). Try it here!

Potential benefits of watermarking AI content

Upholding academic integrity

If AI text watermarks can be implemented widely and effectively, they could be a game-changer in several domains. Education is a prime example. Teachers and examiners worry about students handing in essays written by ChatGPT or similar tools. A reliable watermark would allow a quick scan to verify authorship – potentially catching academic dishonesty before it becomes rampant. OpenAI’s team explicitly had academic plagiarism in mind as a use case. The goal of their watermark prototype was to make it “much harder” for someone to pass off AI-generated homework as their own work. Institutions could use watermark detectors as a backstop, dissuading students from cheating because the AI’s signature would give them away.

Moreover, watermarks might reduce our reliance on unreliable AI-detection tools. Currently, many people turn to third-party “AI detector” software to guess whether text is AI-written by analysing its style or complexity. However, these tools are often inaccurate and have famously flagged innocent human writers as AI. If major AI models themselves watermarked their output, detection would become far more accurate than these guesswork methods. Researchers even suggest that widespread watermarking could “turn the tide” on false positives from AI detectors. These detectors have been known to wrongly accuse students who simply write in a plain or generic style. A robust watermark system could instead provide definitive evidence of AI origin when it’s present, while human writing would register as unwatermarked – avoiding false alarms.

Transparency in media and curbing misuse

The benefits extend beyond academia. In content creation and journalism, watermarking could help readers and editors identify when an article or a section of text was machine-written. This transparency might build trust, signalling to audiences when they are reading human words versus AI-generated prose. In the realm of online misinformation and spam, watermarks offer a potential tool for curbing abuse. For instance, consider a scenario of a political campaign flooding social media with hundreds of AI-generated posts or fake comments. Platforms armed with watermark detectors could algorithmically flag or down-rank such content, since the hidden signal would reveal it as mass-produced by AI. This could make it harder for malicious actors to use AI for propaganda or impersonation at scale, as their campaigns would be more easily detected and traced.

Challenges and limitations

Evasion through editing

Despite its promise, watermarking is not a silver bullet – at least not in its current form. A fundamental limitation is that watermarks can be deliberately removed or obscured. Even a subtle linguistic pattern can be broken if someone paraphrases an AI-generated passage – for instance by translating it to another language and back, or by swapping out enough words for synonyms – which can wash out the hidden signal. Early experiments confirm that this is a serious concern. OpenAI found that its watermark remained detectable after minor “localized” edits, but it became much less robust once the text was globally modified – for example, when run through a translation system or rephrased by another AI model (techcrunch.com). In fact, OpenAI acknowledged that determined users could trivially evade the watermark by such means. External experts have echoed this point, noting it would be “fairly easy to get around it by rewording, using synonyms, etc.” Watermarking versus rewriting thus becomes a cat-and-mouse game between defenders and adversaries. In short, any watermark-based detection tool must assume that truly malicious actors will try to defeat it.

Adoption hurdles

Another challenge is the need for widespread adoption. A watermark is only useful if the majority of AI-generated text people encounter actually contains one. If only a few companies implement watermarks while others do not, someone intent on hiding AI origins will simply choose a model that doesn’t mark its output. Similarly, if open-source AI models don’t use watermarks, a determined user could just opt for those tools. One AI researcher pointed out that if it becomes a “free-for-all” with many uncooperative model providers, then safety measures like watermarking become much less effective. Without broad adoption – or government regulation – the impact of watermarks would remain limited.

This reliance on collective action means watermarking might work well in a relatively controlled environment – say, within a large company that requires its internal AI tools to watermark everything. But on the open internet, its effectiveness could be much more hit-or-miss. In high-stakes settings such as court evidence or scientific literature, the absence of a watermark wouldn’t prove human authorship. It might simply mean the AI didn’t use one, or that the signal was lost. In practice, watermarking may end up most useful for flagging the bulk of AI content from major services. Determined cheaters and propagandists, however, are likely to find ways around it.

Unintended consequences

There are also concerns about fairness and unintended consequences. For example, if only English-language models implement a watermark initially, non-English AI content might slip under the radar. Conversely, non-native English speakers who use AI tools could be unfairly scrutinised. OpenAI themselves noted the risk that watermarking could “stigmatise use of AI as a writing tool for non-native English speakers”. This could happen if detection tools are not carefully implemented and lead to undue suspicion towards certain users. Additionally, the strongest watermarking approaches often rely on secret keys and proprietary detection software. This concentrates a lot of power in the hands of AI firms. An ideal system would allow third-party or public verification. However, companies worry that making the watermark too public would enable others to decode or tamper with it. This tension between security and openness means that early watermark systems might only be verifiable by the company that produced the text. That scenario would require users to trust those companies’ assessments when checking content.

Wrapping up…

AI text watermarks offer a promising, albeit partial, solution to the challenge of detecting AI-generated content. The idea of an invisible signature in machine-written text could transform our ability to trace and trust written material in education, media and online discourse. Indeed, if every AI writing model embedded a robust watermark, it would significantly boost academic honesty and transparency across the board. However, the technology must clear several hurdles before it can live up to that promise. Any watermark scheme needs to be both resilient – able to survive reasonable edits or paraphrasing – and ubiquitous, adopted by most major AI providers. Only then can it truly make a dent in cheating or misinformation. At present, no watermarking method meets those criteria without caveats. All known techniques can be circumvented with enough effort, and none has been universally embraced.

In the meantime, watermarks should be seen as one useful tool among many. They could help tip the balance against casual misuse of AI – because a watermark makes AI output easier to spot. But they won’t catch a determined adversary who is set on avoiding detection. For educators, businesses and regulators, the emergence of watermarking technology is encouraging and will likely play a role in future AI governance. Yet its effectiveness will ultimately depend on widespread implementation and continual refinement to stay ahead of evasion tactics. In summary, AI text watermarks can help detect AI-written content. However, that holds true only if nearly everyone agrees to use them – and to use them well.
