Exploring Plagiarism Within the Digital Neural Network

Artificial intelligence has made its way into virtually every sphere, yet it still raises numerous open questions. With so many digital systems and AI tools in circulation, it is natural to ask how independent they really are from one another. If people are tempted to plagiarize and borrow the efforts of others instead of creating something from scratch, why wouldn't AI? Modern AI systems are increasingly trained on what other AI tools have produced, and this creates a new challenge: the more AI content there is online, the harder it becomes to shield new systems from the outputs of previous ones. By ingesting the results of AI use across the web, AI systems effectively start plagiarizing. Will it ever be possible to trace a piece of code or text back to its original source? Who's to say?

AI Self-Plagiarism: Core of the Matter

The story of Jax Winterbone, who was working with the new AI system Grok and noticed that it appeared to draw on an OpenAI code base, went viral. Igor Babuschkin, an xAI engineer, explained that ChatGPT outputs are scattered across the web, which creates the potential for duplication and plagiarism: Grok, for example, was trained on code publicly available online. Language can therefore be copied accidentally, without any specific intention to plagiarize. Such incidents are not frequent, and developers do their best to prevent them from happening again. Still, one can never be fully sure that an AI tool is operating independently and on up-to-date data.

What is the essence of the problem?

The only way to improve an AI system is to keep training it, and training requires a huge volume of existing content. Courts have already received many lawsuits and complaints from human creators whose work, even copyright-protected work, AI continuously ingests. Now the focus of the problem is shifting: AI has started consuming not only large amounts of human work but also the outputs of previous AI systems.

So, technically the process is the following:

  1. AI systems are trained on the outputs of human work.
  2. AI systems produce their own outputs, which become available online.
  3. New AI systems start training on data produced by other AI systems.
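The loop above can be sketched as a toy simulation. None of the numbers below come from real measurements; the corpus size and publishing rate are made-up parameters, chosen only to show how the AI-generated share of a training pool compounds from one generation to the next.

```python
# Toy simulation of the feedback loop described above (illustrative only):
# each "generation", models train on the current web corpus and then
# publish new content, so the AI-generated share of the corpus grows.

def simulate_ai_share(generations: int, publish_rate: float = 0.2) -> list[float]:
    """Track the fraction of AI-generated text in a hypothetical corpus.

    publish_rate is the share of new AI output added per generation,
    relative to the current corpus size -- an assumed parameter.
    """
    human_docs = 1000.0   # initial, fully human-written corpus (assumed)
    ai_docs = 0.0
    shares = []
    for _ in range(generations):
        corpus = human_docs + ai_docs
        # New AI output is published and joins the future training pool.
        ai_docs += corpus * publish_rate
        shares.append(ai_docs / (human_docs + ai_docs))
    return shares

print([round(s, 2) for s in simulate_ai_share(5)])
# The share only grows: even a modest publish rate makes AI output
# an ever-larger slice of what the next generation trains on.
```

The point of the sketch is not the exact curve but its direction: without a way to label or filter AI output, each generation of models inherits a corpus that is less human than the last.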

Why was this problem not so critical before? All AI systems used to be trained exclusively on human-created content, because that was the only content available on the web. With the help of PlagiarismSearch.com and other plagiarism detection tools, it was possible to distinguish authentic texts from copied ones and even to trace the use of AI tools in content creation. But now that AI-generated content of every kind is everywhere, it has become extremely challenging to determine whether AI products are copied from the work of competitors or even from their own earlier outputs.
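The detection side deserves a concrete illustration. PlagiarismSearch's actual methods are not public, so the sketch below shows only the textbook idea behind overlap-based plagiarism checks: break two texts into word n-grams ("shingles") and measure how many they share. The sample sentences are invented for the example.

```python
# Minimal sketch of overlap-based text comparison, the basic idea behind
# many plagiarism checks: shared word 3-grams scored with Jaccard similarity.

def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of n-word sequences in the text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Fraction of n-grams the two texts share: |A ∩ B| / |A ∪ B|."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

original = "AI systems are trained on large volumes of online text"
copied = "AI systems are trained on large volumes of scraped web text"
unrelated = "the quick brown fox jumps over the lazy dog"

print(jaccard_similarity(original, copied))     # high overlap
print(jaccard_similarity(original, unrelated))  # no shared 3-grams -> 0.0
```

This kind of surface comparison works when one text literally reuses another's wording; the article's point is precisely that AI-to-AI copying often paraphrases, which is why filtering it out has become so hard.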

Digital Plagiarism: Are There Any Solutions?

What can be done to prevent using AI outputs for training of new AI tools?

  1. Limit the dataset used for AI training to content published before a certain date, for example November 2022, when ChatGPT was released to the public as a free chatbot. What happens next? Without access to new information, the answers of AI tools will soon become outdated.
  2. Develop a new approach to AI training based on reliably distinguishing human-made content from AI-produced content.
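Option 1 is simple to express in code, which is part of its appeal. The sketch below assumes each document carries a trustworthy publication date (the field names and sample records are invented); in practice, obtaining reliable dates for scraped web content is itself a hard problem.

```python
# Sketch of option 1: keep only documents published before a cutoff date.
# Record structure and dates are illustrative assumptions.
from datetime import date

CUTOFF = date(2022, 11, 30)  # ChatGPT's public release date

documents = [
    {"text": "pre-LLM web page", "published": date(2021, 5, 1)},
    {"text": "possibly AI-generated post", "published": date(2023, 3, 15)},
    {"text": "archived news article", "published": date(2020, 8, 9)},
]

# Everything dated after the cutoff is excluded from the training set.
training_set = [d for d in documents if d["published"] < CUTOFF]
print(len(training_set))  # -> 2
```

The trade-off the article names shows up immediately: the filter guarantees a human-only corpus but also freezes the model's knowledge at the cutoff.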

Is either option possible now? Unfortunately not, and that is the bad news. Still, the technologies keep developing at a breathtaking pace.

Plagiarism within AI Systems: Is It a Fair Game?

Let’s start with classification of AI systems. AI can be of four types:

  • Reactive
  • Generative or limited memory
  • Theory of mind
  • Self-aware

The first two, reactive and generative AI, are programmed to perform specific tasks, while the other two are meant to perceive, learn, and operate the way human beings do. Currently, only the first two types of AI are in use, and they do rely on previous experience to make decisions, absorbing training data and storing it to improve their problem-solving skills.

Plagiarism is an offense condemned from both ethical and legal standpoints, but what if what is copied is the output of artificial intelligence rather than of people? In the United States, AI-generated works are not eligible for copyright protection. If ChatGPT or any other AI tool generates images or text, the results may look creative and new, but they do not qualify as original, protectable works. In China, however, copyright was granted to an AI-generated image, placing it on a par with human-made works; notably, the ownership rights went not to the AI system but to the person who first generated the image.

Even when an AI system plagiarizes the outputs of another, the ethics of the situation remain murky. The question is whether it happens accidentally or on purpose, to make training faster and more effective. There have not yet been enough such cases and precedents to draw conclusions about ethical norms and legal standards. Still, it is a matter that requires close attention and prompt action to prevent dramatic consequences.

Given the numerous benefits of AI tools, it is hard to imagine that they will ever fall out of use. Still, it is extremely important to proceed with caution, as these systems carry serious privacy, data security, and plagiarism risks. The technical, legal, and ethical implications of using AI outputs to train further AI tools should be discussed and analyzed, since the practice may lower the quality of new outputs and amplify existing bias and errors.

To minimize and mitigate the risks, it is crucial to take a proactive approach: set clear policies and invest in cybersecurity, encryption, and new strategies for training AI systems. These issues should be addressed without delay, because the problem can only grow more serious and the consequences more severe. Unfortunately, it is not an easy problem to solve: AI requires continuously updated datasets, and that pushes it to use all available information. So it is time to answer the question: are we ready to accept plagiarism as a norm?

Kelsey Ayton
Born in Warsaw. Studied Psychology at SWPS University of Social Sciences and Humanities; took part in several inspiring Erasmus programs.
Former Practical Psychologist | Blogger for Various Mass Media | Currently PlagiarismSearch Content Writer | Mother-Freelancer