Artificial intelligence is becoming increasingly present in numerous activities of our daily lives, so much so that interacting with it long ago ceased to be the privilege of specialists in the field. As is often the case in legal matters, new factors in public life raise numerous new questions that legislation has been slow to answer. This is understandable, given that laws govern existing social relations and should be drafted with precision, thoroughness and durability.


The general public and the legal community have long commented on the increasingly apparent entrance of artificial intelligence (“AI”) into our daily lives. The intersections between the various manifestations of AI and copyright law have also long been of interest to legal circles. Rather than saving the question of whether there is a clash between copyright and artificial intelligence for the end of this article, I prefer to state up front that, at this stage, there is no clash, nor is one imminent. Instead, modern copyright law is facing yet another new challenge, alongside online piracy and works of art in the metaverse.


This article does not delve deeply into the legal nature and effects of artificial intelligence, as the topic has already been widely debated by legal scholars internationally. Nevertheless, that academic debate does not provide substantive answers to businesses that must decide whether to use AI services in their operations, or to the question of how such use is regulated by law.


The emergence of the easily accessible ChatGPT service has allowed anyone curious to tap into content created by artificial intelligence. Various fields of art and commerce must adapt to this new situation and the possibilities it creates in the absence of clear legal regulation. For the purposes of this article, the author will focus on the intersections between copyright and the ChatGPT phenomenon, as it is currently the most popular manifestation of artificial intelligence. Before analysing the points of contact between AI and copyright law, it is worth mentioning some of the other pressing legal questions that ChatGPT has raised.


One major issue that has already caught the attention of regulators in several EU countries is the protection of personal data and the vast amount of information that users freely provide to the service for its "training". A leak of chat data affecting multiple ChatGPT users led OpenAI (the company administering ChatGPT) to apologise to the public and assure them that the technical errors had been corrected(1). The negative effects of freely sharing data with the chat service in order to train it and make it more useful became apparent shortly after launch. The media published evidence of Samsung employees using ChatGPT to check for errors in, and optimise, software code they had written(2). Some of the shared information constituted company trade secrets, but once shared with the chat service it became part of its library, compromising its confidential nature. The lack of clear limitations and remedies has thus led to a ban of the service in Italy and has sparked public debate in Germany, France and Ireland(3). In Australia, a dispute has arisen over the platform's liability for disseminating false and defamatory statements about the Mayor of Hepburn Shire(4). Mr Brian Hood has announced his intention to initiate legal proceedings against OpenAI over false information in which ChatGPT linked the Mayor's name to a corruption scandal from the recent past.


Perhaps the best description of ChatGPT would come from the Chat itself. When asked "What are you?” it provides the following response: “I am ChatGPT, a language model developed by OpenAI. My main purpose is to assist users by answering their questions and providing information on various topics. I have been trained on a wide range of subjects, including science, history, geography, sports, art, technology, and more. My training data includes a diverse collection of text from books, articles, websites, and other sources. However, please note that my knowledge is based on information available up until September 2021. I utilize natural language processing (NLP) algorithms to analyze questions and generate meaningful responses in real-time. The GPT (Generative Pre-trained Transformer) model I am based on has been trained on vast amounts of internet text, optimizing its ability to predict the next word in a sentence. This enables the model to learn various grammatical and semantic patterns and generate meaningful and grammatically correct expressions. Instead of being programmed with specific rules, I have been trained to "understand" and analyze human natural language. When you ask me a question, I analyze its meaning, search for relevant information in my database, and generate a response that is meaningful and useful. While I strive to provide accurate and helpful answers, please keep in mind that I cannot guarantee 100% accuracy, and it's always a good idea to verify the information you receive from me.”
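Purely by way of illustration, and without any claim to reflect OpenAI's actual implementation, the "predict the next word" idea mentioned in the answer above can be reduced to a toy example: a few lines of Python that count, in a short made-up training text, which word most often follows a given word, and then "predict" accordingly. The sample text and all names are invented for the example.

from collections import Counter, defaultdict

# Toy illustration of next-word prediction (not OpenAI's code):
# count which word most often follows each word in a tiny, made-up text.
training_text = (
    "copyright protects original works of authorship "
    "copyright protects literary works"
)

follow_counts = defaultdict(Counter)
words = training_text.split()
for current_word, next_word in zip(words, words[1:]):
    follow_counts[current_word][next_word] += 1

def predict_next(word):
    """Return the word most frequently seen after `word` in the training text."""
    candidates = follow_counts.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("copyright"))  # prints "protects"

Real systems such as GPT do this with billions of parameters and statistical patterns learned from vast collections of text rather than simple counts, which is precisely why the provenance of the training material matters for the copyright analysis that follows.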


The main intersections between ChatGPT as a manifestation of artificial intelligence and copyright can be divided into the following categories: 

 

1. Artificial intelligence as an object of copyright

This issue is clearly settled by current legislation and does not require detailed analysis. The software code underlying the algorithms of artificial intelligence services such as ChatGPT is subject to copyright. Pursuant to the Bulgarian Copyright and Neighbouring Rights Act (“CNRA”), software code is treated as a literary work and enjoys the same legal protection. As ChatGPT itself confirms, "The copyright on ChatGPT belongs to OpenAI, the company that developed me. OpenAI has the rights to the model and the information it generates. This includes the copyrights on the programming code, the machine learning model and other related aspects."


2. Artificial intelligence’s creations as objects of copyright

Whether AI-generated text, programming code or even an image qualifies as a work covered by copyright protection is debatable. Of course, the answer depends heavily on the national legislation of the jurisdiction in which the analysis is made. It is this author’s opinion that text created by ChatGPT cannot fall within the definition of Article 3, paragraph 1 of the CNRA, according to which "any literary, artistic and scientific work resulting from a creative endeavor and expressed by any mode and in any objective form shall be the object of copyright". The fact that the content created by ChatGPT is based entirely on the application of an algorithm by an inanimate entity automatically rules out the possibility of creative activity. Further, Article 5, first sentence of the CNRA specifies that “an author is a natural person whose creative endeavors have resulted in the creation of a literary, artistic or scientific work”. A legal entity may, of course, also own copyright, subject to certain legal requirements, but under Bulgarian law original authorship always belongs to the natural person who created the work.

The strict application of the current text of the CNRA excludes the possibility that texts generated by ChatGPT in accordance with the algorithm set by OpenAI can qualify as copyright works. Most academic commentary on the subject, written by specialists in various jurisdictions, reaches a similar conclusion. At this stage the issue is most often discussed in relation to images created by artificial intelligence, but the conclusions would be analogous with respect to literary texts.


3. Artificial intelligence training data as copyright works

ChatGPT's answer to the question of whether it creates original texts is as follows:

“As a language model, I generate original texts rather than copying from existing ones. When generating my responses, I utilise the knowledge I have acquired during my training from a large volume of texts from various sources. My training procedure is based on predicting the next word in the text using contextual information from preceding words. This enables me to generate new and original texts in response to a given question. However, based on my training, I can employ common grammatical structures and phrases that are found in many texts. This may result in similarity or resemblance in expressions to other texts, but it does not mean that I simply copy or repeat existing texts.”

There is no clear evidence as to what part of the information used in the "training" of the system is subject to third-party copyright, but it is beyond dispute that some of it is. Moreover, according to OpenAI's terms and conditions(5), content provided by ChatGPT users, as well as content generated by the service in response to user queries, may be used by the system for the purposes of providing its services. In short, some of the "knowledge" of artificial intelligence constitutes someone else's copyright. And while users, by using the service, agree to the applicable terms and conditions and thus validly license their rights, the legality of the use of pre-existing copyright material in the initial "training" of the system remains questionable. Insofar as the service is provided for commercial purposes and ChatGPT also has a paid version, it does not fall within the scope of free use under Bulgarian legislation.


Copyright risks in using content created by ChatGPT

Now that it is clear that content created by ChatGPT cannot qualify as a copyright work, the following questions remain open.

Who owns the rights to the generated texts? OpenAI's terms and conditions state that both the data provided by users of the system and the results obtained in response to a query belong to the respective user.

What are the risks if one uses ChatGPT-generated content in one's business? The most significant risk is ending up using content that is identical, or similar to a high degree, to content used by someone else. According to ChatGPT: “As a language model, I can generate different responses to the same question due to several factors: (1) Model Variation: With each use of the model, the input parameters and initial state can differ, leading to different outputs. This means that even with identical questions, the answers can vary. (2) Generation Variation: During the process of generating responses, the model incorporates randomness, which can result in different choices when formulating the answers. Small changes in the model's internal processes can lead to different texts. (3) Lack of Context: The model does not remember previous answers or interactions with users as it lacks session-to-session memory. This means that when questions are asked again, the model doesn't have a predefined context from previous responses and can generate different answers. However, in certain questions or scenarios where the answer is clearly defined and logically follows, the model may provide more similar or matching responses to identical questions.” In other words, the possibility that two identical texts will be generated cannot be ruled out.
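Again purely as an illustration, and assuming nothing about OpenAI's actual code, the short Python sketch below mimics the "generation variation" described in the answer above: the next word is drawn at random from a probability distribution (the word scores here are invented for the example), so the same prompt can produce different continuations on different runs, while identical outputs remain perfectly possible.

import math
import random

# Illustrative sketch only: a toy "language model" that picks the next word
# from a fixed, invented score table. It shows why sampling with an element
# of randomness can return different texts for the same prompt, while still
# occasionally repeating itself.
next_word_scores = {"valid": 2.0, "void": 1.5, "binding": 1.2, "unusual": 0.3}

def sample_next_word(scores, temperature=0.8):
    """Convert scores to probabilities (softmax) and draw one word at random."""
    words = list(scores)
    weights = [math.exp(scores[w] / temperature) for w in words]
    return random.choices(words, weights=weights, k=1)[0]

# The same "question" asked five times can produce different continuations.
for _ in range(5):
    print("The contract is", sample_next_word(next_word_scores))

Run a handful of times, the snippet will sometimes repeat itself and sometimes not, which is the practical point for business users: there is no guarantee that a generated text is unique, and no guarantee that it is not.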


Another possible risk arises from using texts created by ChatGPT and passing them off as one's own. Leaving aside the moral aspect of such an act, several other factors must be considered. Artificial intelligence bases its texts on all the information available to it at the time, but it is not trained to cite that information systematically or to produce texts with clear references to the sources used. Users therefore expose themselves to the risk of plagiarism. The lack of uniqueness of a given text is also a significant problem. The publishers of thousands of scientific journals have banned or restricted contributors' use of AI-powered chatbot services, amid concerns that they could clutter the academic literature with flawed and even fabricated research. The widespread use of ChatGPT in preparing papers has already led to it being listed as a co-author on a number of them. In response, the leading US journal Science announced an updated editorial policy banning the use of text from ChatGPT and clarifying that the program cannot be listed as a co-author(6). At the same time, some of Hollywood's major movie studios have announced their intention to use artificial intelligence to draft scripts. This provoked a sharp reaction from the Writers Guild of America, which warned that ChatGPT would become a "plagiarism machine" because the software is incapable of creating independently(7).


Last but not least, the risk of AI providing outdated, incorrect, misleading or self-contradictory information should not be underestimated. As a lawyer, the author finds comfort in the clarification provided by ChatGPT itself: "It is always important to note that the information I provide should be verified and confirmed by reliable sources, especially when it comes to specific facts, current events, or legal matters." The complexity and importance of legal issues are highlighted by their explicit mention in this disclaimer. And while artificial intelligence software dedicated to legal work is already available, I believe the day when programming code can replace thoughtful, experienced and skilled lawyers is a long way off. The same undoubtedly applies to the creative industries.


In lieu of a conclusion, I would like to mention that a few days ago the CEO of OpenAI, Sam Altman, told the US Congress that the future development of artificial intelligence is unthinkable without a clear legislative framework. He called for the creation of a US or international agency to license the activities of the most powerful AI systems(8). A few years ago, the EU also initiated a debate on regulating artificial intelligence. The draft at this stage is entitled Proposal for a Regulation of the European Parliament and of the Council Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act) and Amending Certain Union Legislative Acts(9). For the sake of completeness, reference should also be made to the European Parliament resolution of 20 October 2020 on intellectual property rights for the development of artificial intelligence technologies(10). The discussions on the proposed regulation clearly show how many questions the emergence of artificial intelligence has raised, and how widely views differ on how those questions should be addressed.

The explanatory memorandum to the draft states that: “Against this political context, the Commission puts forward the proposed regulatory framework on Artificial Intelligence with the following specific objectives: (1) ensure that AI systems placed on the Union market and used are safe and respect existing law on fundamental rights and Union values; (2) ensure legal certainty to facilitate investment and innovation in AI; (3) enhance governance and effective enforcement of existing law on fundamental rights and safety requirements applicable to AI systems; (4) facilitate the development of a single market for lawful, safe and trustworthy AI applications and prevent market fragmentation.” The main points of the proposal are "prohibited AI practices"; "classification rules and main categories of high-risk AI systems"; "transparency obligations for certain AI systems"; "measures to support innovation"; "governance systems at Union and national level"; and "codes of conduct". Also of interest is the definition of an "artificial intelligence system" contained in the proposal: “software that is developed with one or more of the techniques and approaches listed in Annex I(11) and can, for a given set of human-defined objectives, generate outputs such as content, predictions, recommendations, or decisions influencing the environments they interact with”.

The main focus of the future regulation is data security and the regulated use and development of the technology. Copyright aspects are currently not at the centre of the debate, and the questions facing legislators remain open.


(1) https://openai.com/blog/march-20-chatgpt-outage
(2) https://www.forbes.com/sites/siladityaray/2023/05/02/samsung-bans-chatgpt-and-other-chatbots-for-employees-after-sensitive-code-leak/
(3) https://www.bbc.com/news/technology-65139406
(4) https://www.theguardian.com/technology/2023/apr/06/australian-mayor-prepares-worlds-first-defamation-lawsuit-over-chatgpt-content
(5) https://openai.com/policies/terms-of-use
(6) https://www.science.org/doi/10.1126/science.adg7879
(7) https://en.as.com/entertainment/hollywood-writers-call-chat-gpt-plagiarism-machine-n/
(8) https://www.bbc.com/news/world-us-canada-65616866
(9) https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:52021PC0206
(10) https://www.europarl.europa.eu/doceo/document/A-9-2020-0176_EN.html
(11) (a) Machine learning approaches, including supervised, unsupervised and reinforcement learning, using a wide variety of methods including deep learning;
(b) Logic- and knowledge-based approaches, including knowledge representation, inductive (logic) programming, knowledge bases, inference and deductive engines, (symbolic) reasoning and expert systems;
(c) Statistical approaches, Bayesian estimation, search and optimization methods.