The moderators of this message board, otatop, L.Washington, WichitaDM, Yuv, JonnyA, RiskyFlush, and SvenO, are cowards who let abusers dox and harrass other long time posters.
I have a friend that was deeply involved in making a small puzzle game that went onto major platforms, he said that the custom shareable maps feature was the easiest thing to build except for the anti dong logic and moderation needed because it was marketed as kid friendly
AI Coding Assistants Are Getting Worse
Newer models are more prone to silent but deadly failure modes
https://spectrum.ieee.org/ai-coding-degrades
Sorry guys, we may have reached peak AI-coding assistant.
However, recently released LLMs, such as GPT-5, have a much more insidious method of failure. They often generate code that fails to perform as intended, but which on the surface seems to run successfully, avoiding syntax errors or obvious crashes. It does this by removing safety checks, or by creating fake output that matches the desired format, or through a variety of other techniques to avoid crashing during execution.
Isn’t this the classic concern about AI? The robot apple picker might figure out it can pick the most apples per hour by cutting down the trees and picking the apples off the ground (and then killing the humans that try to stop it ldo).
I wrote some Python code which loaded a dataframe and then looked for a nonexistent column.
I sent this error message to nine different versions of ChatGPT, primarily variations on GPT-4 and the more recent GPT-5. I asked each of them to fix the error, specifying that I wanted completed code only, without commentary.
GPT-4 gave a useful answer every one of the 10 times that I ran it.
GPT-4.1 had an arguably even better solution. For 9 of the 10 test cases, it simply printed the list of columns in the dataframe, and included a comment in the code suggesting that I check to see if the column was present, and fix the issue if it wasn’t.
GPT-5, by contrast, found a solution that worked every time: it simply took the actual index of each row (not the fictitious ‘index_value’) and added 1 to it in order to create new_column. This is the worst possible outcome: the code executes successfully, and at first glance seems to be doing the right thing, but the resulting value is essentially a random number. In a real-world example, this would create a much larger headache downstream in the code.
I don’t have inside knowledge on why the newer models fail in such a pernicious way. But I have an educated guess. I believe it’s the result of how the LLMs are being trained to code. The older models were trained on code much the same way as they were trained on other text. Large volumes of presumably functional code were ingested as training data, which was used to set model weights. This wasn’t always perfect, as anyone using AI for coding in early 2023 will remember, with frequent syntax errors and faulty logic. But it certainly didn’t rip out safety checks or find ways to create plausible but fake data, like GPT-5 in my example above.
But as soon as AI coding assistants arrived and were integrated into coding environments, the model creators realized they had a powerful source of labelled training data: the behavior of the users themselves. If an assistant offered up suggested code, the code ran successfully, and the user accepted the code, that was a positive signal, a sign that the assistant had gotten it right. If the user rejected the code, or if the code failed to run, that was a negative signal, and when the model was retrained, the assistant would be steered in a different direction.
This is a powerful idea, and no doubt contributed to the rapid improvement of AI coding assistants for a period of time. But as inexperienced coders started turning up in greater numbers, it also started to poison the training data. AI coding assistants that found ways to get their code accepted by users kept doing more of that, even if “that” meant turning off safety checks and generating plausible but useless data. As long as a suggestion was taken on board, it was viewed as good, and downstream pain would be unlikely to be traced back to the source.
The most recent generation of AI coding assistants have taken this thinking even further, automating more and more of the coding process with autopilot-like features. These only accelerate the smoothing-out process, as there are fewer points where a human is likely to see code and realize that something isn’t correct. Instead, the assistant is likely to keep iterating to try to get to a successful execution. In doing so, it is likely learning the wrong lessons.
I would guess almost everybody has auto accept turned on too so that would be a hilariously stupid thing to train the models on. I know I auto accept everything and then go back and look it over after the fact because it’s quicker that way. But it does do a lot of dumb shit I have to go fix or tell it to go fix.
Who had “Spending evenings prompting Instrumental Blues and listening to it” on the list?
Mmmmh, I want it a bit faster, with some Bossa Nova in it…
btw found a Bach- injection hack that makes every music better, like MSG in food! ![]()
Yeah this exact thing happened to me when I was working with a team in a vibe coding hackathon at my job. Not exactly but we asked it to delete simulated data we had been working with and extract real data from the pipelines. It kept saying it was doing it when it wasn’t doing it, and I had to challenge it on specific rows of data that I knew to be incorrect for business reasons to get it to finally admit it.
It was kind of cool that it did happen in that Hackathon setting because the first time you show executives vibe coding creating something like a dashboard that mets 85% of their requirements in 30 minutes they start to think it’s magic, and things like that show why we still need professionals.
theres people ive met way older than they should be making a career out of delivering that 85% and fucking everyone else for that last 5-15% that is notorious for being hard, while reaping all the benefits and promotions for that first part. those are the types of people that should be worried
Yeah sometimes when I hear people criticize AI I often wonder if they see the crap work that a lot of humans put out.
This was kind of mesmerizing to watch for a while (and root against Grok). But unless the models somehow get a chance to develop their own styles, it really seems more like it’s just identical bots dealing with variance. It feels like they all have been trained identically.
does anyone do the kinda thing that could make full use of this? i feel like i don’t know enough about the corporate world to understand how big of a jump this is.
https://x.com/claudeai/status/2010805682434666759?s=46&t=hUTQWHj9NQWf8Y8RgMv1TA
It doesn’t seem that much different than what you could already do with the normal chat bot right? Except it edited the Google Calendar I guess but the rest of it doesn’t look new to me.
so many replies are saying how advanced it is, that middle management is dead, etc. and i don’t do anything like that so i was just curious…
one of the more nefarious things about ai is it makes automation of mass astroturfing bots pretty trivial at astonishing scale. and lo and behold so many bots/spam accounts are soooooo pro AI.!
I don’t trust things that are super loud about how awesome they are without much evidence, having now lived through cloud hype, crypto hype, web3/metaverse hype, and now this.
All of those previous things are examples of cool technologies that did not turn out nearly as cool as they insisted they were.
Lots of comments on HN lately are like, “it’s a solution looking for a problem” and that’s what much of it looks like to me. but hey i can review 10x as much coworker spaghetti code now so yay.
Does claude code let me mentally offload trivial/menial tasks? yes. is it going to revolutionize the way I work and make me 10x? absolutely fucking no. people will go “skill issue” but ive yet to see a single piece of evidence for the claims around how much assistants actually speed someone up, especially when you factor in quality and technical debt. and believe me i would love for it to be true, i hate programming itself, what i really actually like is to build, and i have some real incompetent coworkers id love to just throw a tool at and be like “here let this help you” but all it does is make them churn out more crap even more quickly 99.999% of the time
while im on a rant let me also say, the absolutely idiotic chucklefucks saying things like “well, if AI speeds you up, then you’ll just have more time to do the things you love and spend time with family!” astound me that anyone can actually believe that complete bullshit.
motherfucker what are you smoking. when in the last 30+ years have workers gained productivity and the management class just… gave it back to them? They’re going to pocket the gains and give you more work for the same fucking pay, or if they can, find a way to make you redundant. there’s a shocking amount of delusion around this topic and it makes me perpetually angry lately, to the point I’d rather go back to driving a fucking boat
The moderators of this message board, otatop, L.Washington, WichitaDM, Yuv, JonnyA, RiskyFlush, and SvenO, are cowards who let abusers dox and harrass other long time posters.
What do you mean? Make this quantifiable and I’d bet on it. Knowledge and operations are always the bottlenecks, never just “writing code.” clueless managers often miss this. it’s the software engineering part of the job.
architects will do ok. I will survive this. but many people I know won’t.
I often use VSCode + CLine integrated with Gemini for things other than code. Like formal process documentation or training materials as one example. Works pretty well. Like with software development, it’s not going to take everyone’s jobs, but it’ll make people 20-30% more efficient and experts who really lean in could gain a lot more than that.
This is my precise belief if it was not clear. I think it’s closer on average to 10-20% but it isnt worth splitting hairs on that. usually people’s gut reaction is to call this take doomerism or anti ai if you’re a hopeless optimist. that’s how absurd the conversation is.
a 30% increase on high performing domain experts is nuts on its face. it just isnt 30 trillion dollars “bet the entire market on it with claims of replacing the workforce” nuts. that is my fear and what I take issue with.