Free and Open Source Software Licensing Requirements and Copyright Infringement Involving Artificial Intelligence Technologies
Authors
Novobilská, Linda
Department
Faculty of Law (Juristische Fakultät)
Abstract
Much has been said about the copyright protection of AI output and of AI programming. This thesis aims to show how important it is to consider the copyright status of the data on which large language model AIs (so-called LLMs) are trained. The threat that AI poses to copyright is demonstrated by the Copilot class action filed in the US in 2022.
Through an analysis of the EU and US copyright frameworks, this thesis discusses the problem that arises when AI is trained on publicly available FOSS-licensed code. Its central premise is that FOSS is an essential part of software development and, as such, must be protected from possible exploitation by Big Tech. The thesis finds that Copilot violates the attribution obligations set out in the licences, which undermines the principles of the FOSS movement.
The legal analysis found, however, that in the EU the text and data mining (TDM) exception in Articles 3 and 4 of the Directive on Copyright in the Digital Single Market will most likely apply to Copilot and many other LLMs. Although the exception is subject to the rightholder's reservation of use under Art. 4(3), this means that Copilot will not be regarded as infringing copyright in the EU. In the US, the fair use exception is unlikely to apply, since a more holistic assessment is permitted.
The normative debates on this topic reflect the difficulty of striking a balance between competing policy interests. This thesis seeks to show that legislators should take into account the importance of FOSS and the promotion of software quality and accessibility.
There has been much discussion about the copyright protection of AI output and of AI programming itself. This thesis seeks to demonstrate the importance of considering the copyright status of the data on which large language model AIs (LLMs) are trained. The Copilot class action lawsuit filed in the US in 2022 illustrates this dilemma, and the thesis uses it as its guiding case. By analysing the EU and US copyright frameworks, this thesis discusses the problem posed by training AI models on publicly available software code protected by free and open-source software (FOSS) licences. The cornerstone of this thesis is that FOSS is integral to software development and, as such, requires protection from potential exploitation by Big Tech. The thesis analyses the thirteen licences found in the data on which Copilot is trained and concludes that eleven of them stipulate attribution as a condition of use. Yet the way LLMs are programmed makes it impossible to trace how a given output arises, creating a paradox that undermines the principles on which the FOSS movement is centred. Despite this, the legal analysis shows that in the EU the text and data mining (TDM) exception in Articles 3 and 4 of the Directive on Copyright in the Digital Single Market will most likely apply to Copilot and many other LLMs. Although the exception is subject to the author's opt-out right under Article 4(3), it effectively means that in the EU Copilot will not be considered to infringe copyright. In the US, the fair use exception is unlikely to apply because a more holistic evaluation of factors is permitted. The final determination, however, remains to be made by the courts. The normative debates surrounding this topic reflect the difficulty of balancing competing policy interests. Nevertheless, this thesis seeks to demonstrate that policymakers should bear in mind the importance of FOSS and the effort to promote the quality and accessibility of software.
Keywords
AI, machine learning, copyright infringement, text and data mining, licence violation, right of attribution, free and open source software (FOSS), large language models (LLMs)
Dewey Decimal Classification
346 Private law, 340 Law, 005 Computer programming, programs and data
Citation
Novobilská, Linda. (2023). Free and Open Source Software Licensing Requirements and Copyright Infringement Involving Artificial Intelligence Technologies. doi:10.18452/27658