Disruptive Competition Project


Generating Nonsense: A Deep Dive on the Copyright Office AI Report


The U.S. Copyright Office (USCO) recently released a pre-publication version of its long-awaited Report on copyright and AI. While the Report correctly concludes that no legislation is necessary at this time, it also makes several points that deserve reexamination.

The Report dives deeply into describing the various forms of AI, yet later focuses only on a few specific uses of foundation models while ignoring the vast number of other uses they enable. This discussion of other types of models and their uses complicates the issue, and it is particularly problematic given that the Report was supposed to focus on the potential copyright issues involved in training models, not on their outputs. USCO's over-focus on outputs infects its substantive analysis, producing fundamental errors that in turn yield a motivated analysis lacking support in statute or case law.

Warhol

As one example, because the Report keeps examining the possible infringement of outputs, it fails to grapple with the Supreme Court's decision in Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith (2023). The fourth fair use factor examines the effect of the use upon the potential market for, or value of, the copyrighted work. But Warhol emphasized the importance of identifying the correct use: what is fair use for one purpose may not be fair use for another. The use in question here is not the output; it is training that leads to the creation of a multi-purpose AI model.

Analyzing fair use on the basis of a model's potential outputs is like analyzing the fair use of a photo of an artist based solely on one commercial use of it, allowing that use to override critical or transformative ones. Even to the extent that outputs are relevant (and in some circumstances they may be), safeguards that restrict the model's ability to produce such works, or that filter out such outputs, would be important to the analysis. And if the model itself is not tuned to produce infringing outputs, the analysis need not reach that issue at all, because the use at issue is the creation of a multi-purpose AI model.

Licensing

The Report also dedicates a full sub-section to the possible loss of licensing opportunities. By assuming that a licensing market exists and that it is directed at the copyright in the works, USCO finds that a licensing market has been lost. But those licensing opportunities exist only if training is not fair use. As the Supreme Court notes, “it is a given in every fair use case that plaintiff suffers a loss of a potential market if that potential is defined as the theoretical market for licensing the very use at bar.”

And while the Report discusses a number of training data deals that have already been struck, it characterizes them as fundamentally about copyright and therefore concludes that a market for licensing that data exists. Copyright is almost certainly part of these deals; AI companies often license material they may not need to license in order to avoid the cost of litigation. But USCO ignores what is likely the more important part of these deals. AI companies particularly want high-quality training data, not only for the quality of its prose or the information it contains, but just as importantly because it is accurately identified and labeled and delivered in a form that is convenient to access. Finally, there is also value in a clear, convenient, and straightforward process for obtaining that data.

Market Harm

Finally, and most harmfully, the intense focus on outputs leads USCO to embrace a theory of market harm that even they “acknowledge [] is uncharted territory.” Many copyright scholars and commentators have identified this as the most problematic of USCO's errors. Market harm is part of the fair use analysis, but it does not extend to every potential harm to the sales of a work. As the Supreme Court noted in Google v. Oracle, when examining market harm, “a potential loss of revenue is not the whole story”: courts “must consider not just the amount but also the source of the loss.” Not all market harms are cognizable under the fourth factor.

Yet USCO discusses “significant potential harm” to the “value of copyrighted works” that can lead to “lost sales,” even where “outputs are not substantially similar” to “any specific copyrighted work.” Such outputs, the Report claims, “can dilute the market for works similar to those found in its training data, including by generating material stylistically similar to those works.” The Report extends this harm to existing licensing options as well as to those “likely to be feasible.”

In other words, USCO argues that because a model can create admittedly non-infringing outputs that might compete with the works it was trained on, the training is an unfair use. But the mere possibility that a work might compete in the general market is not the inquiry the fourth factor requires. The fact that the author of a work used to create a competing work might suffer “some economic loss [] as a result of this competition does not compel a finding of no fair use,” particularly where the competing use is transformative (as it is in the case of AI). A contrary holding would reverse decades of case law on topics like reverse engineering and compatibility.

Technology

There are also smaller errors in the Report. For example, it appears to assume that in retrieval-augmented generation (RAG), the model is trained on the corpus of documents from which it retrieves information. This fundamentally misunderstands how RAG works: models do not train on their retrieval corpus; they access and interpret the documents held in that external database when a query is made. The AI model developer never has access to those documents and could not possibly have trained on them. Instead, the model is trained to help end users better access, interpret, and analyze their own documents, a highly transformative use (and one that often will not involve copyright infringement at all, since many users apply RAG to their own documents, to which they hold the rights).
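The distinction drawn above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular product works: the "model" is a hypothetical stand-in (`FROZEN_WEIGHTS` and `generate` are invented for this example), and retrieval uses simple word overlap rather than the embedding search that real RAG systems commonly use. What it shows is the structure: the user's documents are searched and inserted into the prompt at query time, and the model's parameters are never updated along the way.

```python
import re

# HYPOTHETICAL stand-in for a trained model's parameters. In a real system these
# would be learned weights, fixed once training has finished.
FROZEN_WEIGHTS = {"contract": 0.7, "renewal": 0.5, "invoice": 0.3}


def tokenize(text: str) -> set[str]:
    """Lowercase set of alphabetic words; a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Return the k documents that share the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        corpus, key=lambda doc: len(query_words & tokenize(doc)), reverse=True
    )
    return ranked[:k]


def generate(prompt: str, weights: dict[str, float]) -> str:
    """HYPOTHETICAL 'model': reads the prompt and its weights, never modifies them."""
    score = sum(weights.get(word, 0.0) for word in tokenize(prompt))
    return f"answer based on prompt (relevance score {score:.1f})"


def rag_answer(query: str, user_corpus: list[str]) -> str:
    # 1. Retrieval: search the user's own documents at query time.
    context = retrieve(query, user_corpus)
    # 2. Augmentation: place the retrieved text into the prompt.
    prompt = "\n".join(context) + "\n\nQuestion: " + query
    # 3. Generation: run the already-trained model; no training step occurs.
    return generate(prompt, FROZEN_WEIGHTS)


if __name__ == "__main__":
    user_docs = [
        "The contract renewal date is March 1.",
        "Invoice 42 was paid in full.",
    ]
    weights_before = dict(FROZEN_WEIGHTS)
    print(rag_answer("When is the contract renewal?", user_docs))
    # The model's parameters are identical before and after answering.
    assert FROZEN_WEIGHTS == weights_before
```

Swapping in a different set of user documents changes what the sketch retrieves and answers without any retraining, which is the practical sense in which the retrieval corpus sits outside the model.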

Whether looking at the Report’s analysis of licensing opportunities and market harm via dilution, or at its understanding of AI, there’s only one real response:
