Do as I Say, Not as I Code: GitHub's Copilot Prompts IP Litigation with International Implications

The rapid proliferation of Large Language Models (LLMs) in contemporary technological ecosystems has sparked significant legal debates, particularly over intellectual property (IP) rights. One notable yet under-reported case is Doe v. GitHub, currently stayed in the Northern District of California after the court certified an order for interlocutory appeal on September 27, 2024. The lawsuit names OpenAI, GitHub, and GitHub's parent company, Microsoft, and focuses on the use of open-source software (OSS) code to train LLMs. Specifically, it concerns GitHub's Copilot, a programming assistance tool currently powered by OpenAI's GPT-4 model and previously by Codex, a fine-tuned version of GPT-3 additionally trained on gigabytes of publicly available source code. Although Copilot has been praised for its potential to enhance programming productivity, open-source developers and communities have raised concerns about its tendency to reproduce material from public repositories without properly attributing authorship or adhering to the terms and conditions of the original open-source licenses. The Ninth Circuit's upcoming decision on whether claims under DMCA § 1202(b)(1) or (b)(3) must meet an "identicality" requirement carries significant implications for the AI industry, particularly in shaping standards for copyright compliance in models trained on open-source data.