Making the Plagiarism Machine Write Code Still Counts as Plagiarism

27 May 2026

A stylized halftone image of a meat mincer, based on a photograph from Ebay seller retrometropolous66 named "VINTAGE RETRO SPONG CAST ALLOY HAND MEAT GRINDER MINCER MADE IN ENGLAND".

I have been struggling to put into words my frustration with the open source community's failure to meet the present moment.

In case you've missed the last three years: AI labs have strip-mined the internet, robbing creators of their work and selling it back to the public as a machine designed to crush labour. Thanks to their relentless lobbying of businesses and governments, there is immense pressure on workers to offload their cognition onto AI products backed by a Large Language Model, on the false pretense that the model is a reliable source of knowledge.

(For avoidance of doubt, a reliable source of knowledge doesn't habitually make fraudulent citations, and doesn't invent answers then spout them off with 100% confidence. These behaviours are referred to as hallucination, a problem which researchers from OpenAI and Anthropic admit is intrinsic to LLMs and essentially not fixable.)

It is no surprise that AI's biggest success story to date has been in computer programming, a field where a lot of people have no professional standards and are allergic to solidarity. A lot of attention has been paid to new projects that have embraced vibe coding, usually for their oafish levels of complexity, repetition and serious design flaws. Less attention has been paid to established open-source projects where, in an attempt to avoid corporate pressure or alienating key contributors, some level of LLM-driven development has been accepted.

Everywhere I look, there are big projects opening the door to LLM-generated code submissions. The Linux kernel. systemd. Kubernetes. Firefox. Python. Rust. Even ScummVM, a project I contribute to, has added such a policy. All perhaps motivated by different reasons, but collectively they provide a broad social license to ignore all ethical concerns and normalise the presence of LLM code generation.

I wish I were shocked at the lack of a visible resistance movement in this space, of the kind there has been in writing and visual arts, but that would be to deny the extreme free-market libertarianism that has dominated software and the tech sector for the last 50 years. There has always been a level of dry rot visible from space, and now we get to watch part of the house collapse under its own weight.

There are plenty of reasons to despise and oppose the frontier AI labs. For example, the cartoonish level of waste found at every stage of LLM technology. From the billions of investor cash set alight for training and inference and warehouses of unused GPUs, all the way to unhinged multi-agent cack factories like Steve Yegge's Gas Town, a project which openly boasts "you won't like [it] if you ever have to think, even for a moment, about where money comes from".

Other good reasons: the constant AI lab doom-mongering about sentience and mass unemployment to goose their valuation and land more marks, the worldwide breakdown of trust and consensus reality as their slop permeates everything, the end of discoverability as the internet drowns, and their willingness to sell their tech for accelerating mass surveillance and genocide.

In addition, the AI labs have been so good at subsidising demand that the major cloud providers have committed hundreds of billions toward new datacenter spending. These datacenter projects are best known for siphoning massive amounts of water and energy away from communities, creating atmospheric and noise pollution, inflating the worldwide cost of essential parts like memory chips, and adding a destabilising amount of construction debt to the global financial system.

And of course, the legal question is still open on whether training an LLM on copyright-infringing works is itself an act of infringement, or whether LLM-generated works are copyrightable at all. Early rulings by the US Copyright Office have stated that works where "the expressive elements are determined by a machine" are not eligible for copyright, and lobbyists are desperately trying to get new laws to firm up their position before the wave of lawsuits fully breaks.

All of these are important and justified grounds for opposing the onslaught of AI companies into every facet of life, and have been written about by people more eloquent and better-read than myself.

To keep things brief, let's talk about the ethics of using any large language model trained on all published open-source code to generate new code. Just that case! Not provide advice on punctuation and spelling, not work as a version of American Fuzzy Lop that costs 20 grand a throw, not the mostly-transformative use cases that are wheeled out to justify all LLM usage being good. New code.

We know on a fundamental level that frontier models are a laundromat for existing content, and AI labs have focused on randomising the output enough to make direct plagiarism claims harder to prove. We know that the frontier model companies go to extreme lengths to take everything, including material clearly licensed or marked by the author as not for use in LLM training. Even the so-called "open weight" LLMs, the golden boy of bikeshedded ethical LLM theory, have opaque training sets stuffed to the gunnels with scraped content that fails the same ethical test. We know that just the stuff you wrote is physically not enough data to train a personal model to a workable result, yet plenty of charlatans will sell you this false promise and fake it with a frontier model.

My view is that generating code wholesale with a frontier LLM, then sticking your own name and copyright notice on it, is a massive slap in the face and should be treated with contempt, regardless of how easy or expected it is. Hundreds of thousands of developers contribute to the commons, often on their own dime, in exchange for attribution and making the world a better place. Working for exposure! And you can't even do that all of a sudden? That's now too hard? Spare me.

Having the LLM generate a merge request wholesale for an existing project combines the above with a second slap in the face. There's plenty of debate about how many lines of unvetted changes still count as polite to throw at a maintainer. I would start at 0. Why should a maintainer bother to read a changeset that you couldn't be bothered to write, and most likely won't be able to answer follow-up questions on?

As a maintainer of ScummVM's Macromedia Director support, where the aim is to recreate the precise ordering and edge cases of the original engine, the number of lines changed doesn't even matter. It could be one line! The work is in the bench-testing that has been done to confirm how the original engine does it, and in incorporating it in a way that doesn't break other games or other versions of Director. In this context, LLM generation would be a pretty strong signifier that no thought was put into the manner or consequence of these changes outside of getting a specific game to work, which is against the project's goal of universal compatibility.

For what it's worth, it is possible to both loathe LLM-driven development, and not get mad at non-developers using them in a personal capacity to solve a code problem. They are a tiny fraction of the overall usage. My ire is firmly directed at the self-described software engineers talking up their love of scabwork.

But here we are. A lot of prominent developers decided they want to slam on the treat button ad nauseam and still call it engineering, and the united front is dead before it ever had a chance to form. That's life. My fault for expecting better.

I feel for every single person who has had their future ripped out from under them, and I want no part of it. The overwhelming urge has been to wind up all my coding projects and move to a hobby which is harder to ruin with slop.

Still, we've been here before. This isn't the first time a technology has been hyped to the world with the promise of improved productivity, then proved to be a disappointing (and now load-bearing!) albatross that drinks money. Nor will it be the last! Just look at the number of businesses still welded to Oracle Forms; management paid top dollar for that asbestos and by God it's going to stay in those walls until they collapse.

Maybe this is a flash in the pan and developers will regain their senses once LLM pricing is dragged kicking and screaming toward what it costs to run. Maybe the trauma of the oil crisis smashing into the impending debt collapse will be enough to spark a new renaissance, and we will collectively rediscover the joy of what is only possible through human labour. But until then, all we can do is look at the comments.

"I have never been so productive" they say, drunk on an indecipherable colossus they've spent $5000 to build, marveling at all the time they used to waste writing their own words of intimacy or comfort. "Claude writes much better code than I ever did" they say, face still notionally human but with a deadness behind the eyes, long ago having smothered any dreams of self-improvement or professional responsibility. "You obviously haven't tried the latest model" they say, turgid with excitement for the heat death of everything, celebrating as the sky is blotted out by their poisonous exhaust.

Posted 2026/05/27 - open source, llm, plagiarism