looking back at authors guild, inc v google, inc
who remember
Here’s a legal battle over digital systems and copyright I had completely forgotten about: Authors Guild, Inc. v. Google, Inc. The short of it is that Google Books, involving as it did the scanning and digitizing of many books under copyright, was sued for copyright infringement. Google argued—successfully—that their replication of the books was fair use.
I was aware of this litigation when it was happening but, to be completely honest, did not care about it (and thus forgot about it).1 Google Books was obviously useful. I used it. I didn’t understand why I’d be mad about it and I didn’t see that anything was at stake. One vociferous critic of Google Books, though, was Ursula K. Le Guin, which is why I was reminded of it recently. When the Authors Guild attempted to settle with Google, Le Guin resigned from the Guild. She also wrote two posts (one, two) on her personal blog discussing her issues with the settlement.2 I was reading these posts yesterday and found this passage, from her first blog post on the matter, familiar:
All the time the Settlement has been in the courts, Google has been blithely going ahead digitalizing any book it wanted without obtaining permission, let alone contractual terms. (I can attest to this, since they have thus pirated several of my books, with no attempt whatever to contact the publishers, my agent, or myself — none of whom are exactly hard to locate.)
First time as farce… second time, also as farce? Anyway, it sparked some thoughts. They are below.
As a practical matter, I do use Google Books—as well as its much more legally dubious cousin, the Internet Archive. However, Le Guin’s resistance here is not explicitly anti-digital. Her position is that instead of a private company scanning materials, there should be an internet equivalent of the public library. Such a library could only be formed through patient political work and—probably—copyright reform that would involve rolling back the Mickey Mouse copyright extension and coming up with a practical approach to orphaned works. You’re no doubt shocked to learn that nobody wanted to undertake that task… so… it did not happen.
At the present time, in the States, there is a legal form of the digital library: it’s called Libby. Libby is very convenient if you are an ereader user (as I am) or read on your phone (I do not do this except under duress because it hurts my eyes). It’s also, for libraries, incredibly expensive: one library estimated in 2025 that Libby costs them $8,000 a week. Unlike authors in the United Kingdom, Canada, and (I think?) Australia, authors in the United States do not receive royalties from library checkouts. While authors do receive a small cut whenever a library is forced to repurchase a license—just as they do when libraries repurchase physical books that have worn out—the steep cost of Libby does not benefit authors.
Libby’s cost to libraries, much like credit card transaction fees, exists for me in the category “real problems that are not exactly reasonable to expect people to care about on a day-to-day level.” That is: people should not agonize over whether or not they really need a Libby book. They should not feel bad if they check one out and return it unread. People shouldn’t feel bad for paying with a credit card at a small business either. However, credit card fees are bad for the business and will either be passed back to the consumer or eat away at the business’s profit margin. The system that exists is not sustainable. Sooner or later, something will give.
Thus a question the Google Books contretemps raises for me now is: where would writers be if the idea that creating a free digital public library was a necessary project had really been taken up? If that had been built, rather than passively anticipating that a private company would eventually do something, what would have happened? How much of the sheer stuff that trains LLMs would actually not exist in the same way if Google had not scanned thousands and thousands of books? How would the entire LLM project be different? I don’t know the answers to these questions. The answer might be: we’d be in exactly the same place we are now.
To pivot slightly. As I see it, there are four main types of objections to LLMs. These are:
1. Environmental: they use too much water and they are bad for the local ecosystem.
2. Practical: they are bad at what people want them to do.
3. Copyright-based: they have been trained on an enormous amount of stolen material.
4. Moral (“Butlerian Jihad”): even if they were clean, effective, and licensed all their training material, it would be wrong to use them for almost all purposes.
I think it would be fair to say most people I know occupy positions one or four. While they might be bothered by copyright issues, if those were solved, it wouldn’t change their negative view of the technology. Personally, as I have said before, I think that LLMs can be useful assistants for people who know what they’re doing but they seem to be lethal to developing skills and knowledge if you don’t know what you’re doing.3 I would, as it were, caucus with the Butlerian Jihadists on issues of LLM usage, but it’s not exactly where I stand.
The details of my own position are irrelevant most of the time, though, because there are also not many applications of LLMs that would be useful to me in my own work. They don’t pose some huge temptation that I am virtuously resisting.4 Recently, however, I read about VLMs, which absolutely could be useful to me:
Traditional OCR tools work through pattern recognition: detect where text appears on a page and then identify individual characters and words. VLMs take a different approach. Rather than processing characters and words in isolation, they integrate visual perception alongside a sophisticated understanding of language and the relationships between words. This allows VLMs to recognize that the above source is a table of cities organized alphabetically by state, that lontpeller falls under the Vermont heading, and that the intended city is therefore likely to be Montpelier.
Apple’s native OCR has become pretty good—good enough that it’s been unclear to me if there’s any reason to purchase a dedicated OCR program. Still, it’s not perfect. There are blocks of clear and typewritten text it can’t read. It is very bad at handwriting. A computer program that could reliably read, transcribe, and search handwritten material would be extremely helpful to me. Obviously I would still be the person reading and annotating and analyzing and so on. But, to pick an example that has really happened, one time I had written down a paraphrase of a fact without carefully noting where the source text was. I have been doing my citations in chunks as I go, and when I got to the paraphrase I realized I needed the exact letter. However, the letters in question were written in awful handwriting—I couldn’t even skim the pages myself, much less use Apple to search them—and I had to reread a lot, very slowly, before I could find it.
One takeaway from that story is that I made a costly mistake and I won’t make it again. (True! Except for the times preceding this event when I had already made it.) Still, using a VLM would be great… except that I can’t. If I wanted to use a VLM, I would, as far as I can tell, have to put my material into something like Google Gemini. And that is something that, ethically and legally speaking, I cannot do with the handwritten material I have, which is not only under copyright but private. (And wouldn’t do, to be very clear!) It would violate both the trust of the archives that have let me make copies and the rights of the copyright holders. And frankly, even if Google promised security and privacy, I would not really believe them (or any company developing LLMs) and so I would not entrust them with sensitive material.5
So again I wonder where we might be if people had taken the Google Books lawsuit as a wake-up call for more ambitious ways of thinking about how to preserve both public access to knowledge and private claims. As I said above, maybe we’d be nowhere. In retrospect, though, what happened with Google Books looks like a lost opportunity.
Also, should I buy a dedicated OCR program? I’m still thinking no.
1. I do remember some project called something like “the hand of Google” that collected instances when you could see somebody’s hand in the scans on Google Books.
2. Her resignation post itself seems to have been removed; I don’t know why.
3. For this reason, I think keeping them out of education is probably more important than any other particular case of using or not using LLMs. I would support some state of affairs where you don’t get your LLM license until you’re 21, or something. Don’t ask me how that would work. I would just support it. Probably. Not if using it before 21 was I don’t know punishable by death. But otherwise.
4. My non-use of LLMs is more like my non-ownership of a mega-yacht in that respect. Would I own a mega-yacht if I were a billionaire? Probably not. I would own a bunch of horses. That’s my money pit of choice. Honesty demands however that I admit I have not been put to the test on this issue and probably will never be unless something very strange happens.
5. I would still say I’m not virtuously resisting anything here because I am just honoring agreements I’ve already made.

I occupy positions 1-4 lol. LLMs are civilization-destroying semi-functional trash.
not really the point at all but i’ve been thinking a lot this week that given the state of the job market and the way “adults” behave i can’t blame children for cheating with llms. if they are honest they will not be rewarded, and a lot of the time they won’t even learn. paying teachers more is literally the only way out of it. incentivizes them to engage in conversations instead and build community that builds social currency around engagement. and paying teachers more would do nothing to llm capitalists - we could all continue on. alas