Ingesting the U.S. Copyright Office’s AI Report
Earlier this month, the United States Copyright Office (USCO) released a pre-publication version of part three of its report on copyright and artificial intelligence (AI), focusing on the use of copyrighted works in training generative AI models. The report centers on fair use, offering recommendations to address tensions between copyright holders and AI developers. While the Office admits legal uncertainties remain, it currently endorses a market-driven, voluntary licensing approach, reserving statutory intervention for cases in which such markets fail.
The USCO draws five conclusions regarding generative AI:
- While the first and fourth factors in fair use analysis are most important, fair use is context-specific and determined on a case-by-case basis;
- Licensing markets will be critical in fair use determinations;
- Government intervention in licensing markets is premature;
- Regular monitoring is needed to ensure laws keep pace with AI advancements; and
- A balanced approach is essential to protect both copyright and innovation.
The USCO’s Understanding of AI
The report begins with an overview of AI training methods and copyright concerns. The USCO maintains that memorization is a key issue: a model can reproduce its training data (often protected works) verbatim, or at least in a substantially similar manner, thereby embodying the expressive works it was trained on. Copyright holders criticize this, preferring generalization, where models extract broader patterns to apply in new contexts.
Still, the USCO argues that generalization isn’t free from concern. The report states that AI learns at multiple levels of abstraction, including themes and direct phrasing. The USCO claims this is especially relevant for retrieval-augmented generation (RAG), which it says involves storing and retrieving parts of training data, raising serious fair use questions.
Fair Use
Fair use, an affirmative defense, is assessed under a flexible four-factor test in which the weight given to each factor depends on the situation. In its fair use discussion, the USCO focuses mainly on the first and fourth factors, which the Supreme Court has also noted are the most important. The four factors, codified in 17 U.S.C. § 107, are:
- The purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
- The nature of the copyrighted work;
- The amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
- The effect of the use upon the potential market for or value of the copyrighted work.
Courts weigh these four factors, not the USCO. Within fair use analysis for AI, there are two main factions: those who believe widespread, unauthorized use of protected materials will cause great harm, and those who say regulating AI and related technologies will stifle technological progress. To the USCO, fair use must be evaluated in the context of how the new work is ultimately being used.
Factor 1
Within the first factor, courts review transformativeness, commerciality, and how access was gained. A use is more likely transformative if it adds new meaning or expression, rather than merely substituting for the original. Citing Campbell v. Acuff-Rose and Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, the report suggests that training AI models is generally not transformative, especially when the outputs reflect the “essence” of the training data. The Office says new expression is not always transformative but can be relevant to determining the purpose and character of the use. Following this logic, the USCO argues RAG is less likely to be transformative because individual works are recalled in order to tailor responses to user input. In the eyes of the USCO, training is not inherently transformative because models are trained by extracting meaning from works; the most transformative uses are those for research or for non-substitutive tasks. Generating outputs that are substantially similar to copyrighted works is considered less transformative.
Commerciality turns on whether the specific use is for a commercial purpose or advances a commercial purpose. For lawful access, the USCO defers to courts on how heavily bad faith weighs against fair use.
Factor 2
The nature of the copyrighted work is straightforward: training datasets often contain expressive works, and the use of more creative or expressive works is less likely to be fair use than the use of functional or factual works.
Factor 3
On the amount and substantiality of the portion used, the USCO’s view is that “To the extent there is a transformative purpose, the use of entire works on that scale should be reasonable.” This factor may weigh less heavily against generative AI training when there are limits placed on outputs so that copyrighted training material is not available to the public.
Factor 4
Lastly, the effect of the use on the potential market can be vast, given that AI poses a real risk of market dilution due to its speed and scale. Still, the USCO urges courts not to assess harm narrowly. It acknowledges valid public interest arguments on both sides. Courts consider lost licensing opportunities as market harm under fair use only when the licensing market is traditional, reasonable, or likely to be developed.
Licensing
The report discusses compulsory, voluntary, collective, and extended collective licensing (ECL). Compulsory licensing is viewed with skepticism because these licenses become deeply embedded in the industry and are difficult to undo, even when the rationale behind their implementation no longer exists. Voluntary licensing, while preferred, is currently untested at the scale needed for AI. Collective licensing and ECL could help fill the gap, though ECL’s opt-out structure raises concerns since U.S. copyright law is traditionally opt-in. Nonetheless, the opt-out can be used to block unwanted use. The USCO maintains that such licensing schemes should only be considered if voluntary markets fail.
USCO’s Final Takeaways
As with previous reports in this series, the USCO refrains from calling for immediate legislative action. It believes the courts should first clarify key legal questions. The USCO opposes government intervention for now, particularly in licensing markets, and favors a voluntary licensing approach. If, however, those markets fail, then specific and targeted intervention, such as extended collective licensing, should be considered. For the USCO, it is simply too early for broad legislative or government action.