The Watermark War: how AI hides its tracks, and whether it can be caught

UnMarker strips watermarks from AI-generated images, calling the reliability of copyright protection into question

With the rise of generative models, including large language models (LLMs), identifying machine-made content has become critically important, from scientific publications to news reports. Today even experts often cannot distinguish human-made photographs from computer-synthesized ones. This raises a pressing question: how can the intellectual property of the creators and users of these technologies be protected? One approach is to embed special digital tags, watermarks, that make it possible to determine an image's origin. But how effective is this protection?

Watermarks have long been considered a reliable tool for labeling AI content, but recent research, such as the work described in IEEE Spectrum, shows that attackers have already learned to remove them.

1. How watermarks work and why they are vulnerable

Watermarking in LLMs means covertly embedding patterns into text, for example through the deliberate overuse of rare words or specific syntactic structures. As a new study shows, however, the effectiveness of this measure is questionable. Researchers have developed a tool called UnMarker that can remove most existing types of digital watermarks, making it extremely difficult to identify an image's origin.
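
The rare-word pattern described above can be illustrated with a toy "green list" scheme. This is a deliberate simplification of published text-watermarking proposals, not any specific deployed system; the secret key and the word-hashing rule are my own assumptions. Generation is biased toward a secret subset of words, and a detector then measures how often those words occur.

```python
import hashlib

def is_green(word: str, key: str = "secret-key") -> bool:
    """Hash the word together with a secret key; roughly half
    the vocabulary ends up on the 'green list'."""
    digest = hashlib.sha256((key + word.lower()).encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(text: str, key: str = "secret-key") -> float:
    """Detector statistic: the share of green words in the text.
    Unwatermarked text should hover near 0.5; text whose sampler
    favored green words will score noticeably higher."""
    words = [w.strip(".,!?;:") for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(is_green(w, key) for w in words) / len(words)

score = green_fraction("the quick brown fox jumps over the lazy dog")
```

A real scheme works at the token level inside the sampler and uses a statistical test rather than a raw fraction, but the detection principle is the same: without the key, the bias is invisible; with it, it is measurable.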

• Vulnerability example: AINL-Eval 2025 notes that models like GigaChat-Lite generate texts with unnaturally short sentences. This could serve as a marker, but adaptive algorithms quickly learn to mimic human rhythm.
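
The short-sentence marker above is easy to sketch as a statistic. The cutoff of 8 words per sentence is an illustrative assumption, not a value from the cited evaluation:

```python
import re

def avg_sentence_length(text: str) -> float:
    """Average number of words per sentence, splitting on . ! ?"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)

# Hypothetical detector rule: flag choppy, unnaturally short sentences.
choppy = "It works. It is fast. Use it now. Very good."
flagged = avg_sentence_length(choppy) < 8  # threshold is an assumption
```

A single threshold like this is exactly the kind of marker an adaptive generator can learn to evade, which is the point the example makes.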

Watermarks resemble the "digital signatures" of the early Internet: effective only until tools for forging them appeared.

How does UnMarker work?

To understand how UnMarker works, first consider how traditional digital image watermarking operates. Modern methods rely on spectral analysis: the image is decomposed into high-frequency and low-frequency components, each corresponding to different regions of the picture. Human hair, for example, shows rapid pixel-value changes (high frequencies), while smooth facial skin changes slowly (low frequencies). Watermarks invisible to humans are embedded in the spectral domain, because human vision perceives such differences poorly.
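
The frequency split described above can be sketched with a 2-D FFT. This is a minimal illustration of the general idea, not any particular watermarking scheme; the random "image" and the cutoff radius of 8 are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((64, 64))  # stand-in for a grayscale image

# Transform to the frequency domain, centering low frequencies.
spectrum = np.fft.fftshift(np.fft.fft2(img))

# Circular low-pass mask around the spectrum's centre.
h, w = img.shape
y, x = np.ogrid[:h, :w]
dist = np.sqrt((y - h / 2) ** 2 + (x - w / 2) ** 2)
low_mask = dist <= 8  # radius chosen for illustration only

# Reconstruct the two bands separately.
low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
```

The two bands sum back to the original image, which is what makes the spectral domain a convenient hiding place: a watermark can be tucked into frequencies the eye barely registers.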

What makes UnMarker's approach unique is that it bypasses traditional removal of visible image elements and acts directly on the spectral characteristics of the entire image. This lets it disrupt the integrity of watermarks while remaining almost imperceptible to the human eye; small visual artifacts may occasionally appear, but they are hard to notice without careful analysis.

Studies have shown that UnMarker removes roughly 57-100% of the watermark types tested, including common approaches such as HiDDeN and Yu2. Even the newest and most robust schemes, such as StegaStamp and Tree-Ring Watermarks, proved vulnerable to the UnMarker attack, losing approximately 60% of their effectiveness.

2. Alternative detection methods: statistics vs. machine learning

While watermarking is failing, researchers are betting on a combination of approaches:

• Statistical analysis: AINL-Eval 2025 found that human-written scientific abstracts contain about 10 times more numbers than AI-generated versions, and are also more complexly structured.

• Neural network detectors: models such as RoBERTa and DeBERTa, fine-tuned on 52,000 English-language abstracts, detect stylistic artifacts even in texts from new generators.

Problem: these methods require huge labeled datasets and do not generalize across languages.
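
The digit-density signal from the statistical approach above is simple enough to sketch directly. The two sample sentences and the statistic itself are illustrative assumptions, not material from the cited evaluation:

```python
def digit_density(text: str) -> float:
    """Fraction of non-whitespace characters that are digits."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c.isdigit() for c in chars) / len(chars)

# A number-dense human-style abstract vs. a vaguer paraphrase.
human = "We observed 412 cells at 37.5 C; 95% CI [0.81, 0.93], n = 128."
model = "We observed many cells at body temperature with high confidence."

d_human = digit_density(human)
d_model = digit_density(model)
```

Such hand-crafted statistics are cheap and language-agnostic in form, but calibrating their thresholds still requires exactly the kind of labeled data the paragraph above identifies as the bottleneck.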

3. The future: "cat and mouse" or symbiosis?

• Conflict scenario: each new AI (for example, DeepSeek V3) spawns new detectors, which in turn stimulate more sophisticated generators.

• Cooperation scenario: the introduction of "ethical watermarking," in which models voluntarily label their content, as proposed by some SemEval-2024 participants.

How serious is the threat?

The study's authors emphasize that using UnMarker requires some technical skill, but the software itself is published on GitHub, making it accessible to a wide range of interested parties. The hardware cost of the attack is also low: cloud servers with powerful NVIDIA A100 GPUs can be rented at reasonable prices. Moreover, the code includes tools to verify whether the attack succeeded, which facilitates mass removal of watermarks.

Thus, the potential for mass removal of digital watermarks casts doubt on the very idea of using them for copyright protection and for fighting counterfeits. Organizations seeking to protect their artworks and intellectual property need alternative strategies that can reliably confirm the originality and authenticity of their material.

Detection technologies are lagging behind the generative capabilities of AI. The solution may lie not in the technical plane but in the institutional one, for example through mandatory verification of scientific articles by platforms such as arXiv.

The arms race between the creators and the detectors of AI texts is only beginning. As long as watermarking remains a "paper tiger," a combination of linguistics, cryptography, and regulatory policy may be the key to transparency in digital content.

