Indeed. It occurs to me that asking an AI to write both the code to implement what you ask and the tests for it is asking for trouble. I tried that as an experiment, and sure enough, it generated tests that did not really test what they claimed to. I have also seen that when asked to fix a failing test it changed the test to make it pass rather than fixing the code.
But again, this is something our AI has in common with humans. When I worked in safety-critical software (avionics, military stuff), the process was that one team wrote the code from the specifications while a different team wrote the tests from the same specifications. The idea being that if the code developers had misunderstood the spec, they could not then write the tests with the same misunderstanding and make everything look OK when it is not. It also prevents deliberate cheating.
Or do you mean that non-mammals generally cannot be trained? That would also be incorrect. Think of birds, but fish and bees can also be trained. (I am omitting links; search the internet if interested.)
So there is no logic, but the non-existent logic should be flawed? Or there is logic, but you can't detect it? Or you can't detect flawed logic? Or human-written logic is detectable, but AI-written logic is not? This sounds weird...
They cannot be trained, in the same sense that AI cannot be taught: there is no reliability. That's why "unfortunate incidents" happen with non-mammals much more often, despite them making up a much smaller share of "domesticated" animals.
Birds are much closer to mammals than to alligators in that regard. Both are sophisticated enough not to survive if their thermal regulation fails, for one thing.
It's been a while, but I just noticed this. The first assertion there is demonstrably wrong. I think by now we have all gotten AI to generate small programs that do what we want and are basically reasonable code. I have had AI make perfectly reasonable and working translations from C to Rust or Bash to Python and so on. Not large scale of course, nothing novel, but basically doing a lot of tedious work.
That leads to the question of when, at what scale, at what complexity, or at what degree of novelty the AI generation breaks down. I'm inclined to agree that by the time that happens it's too late: one then has a pile of unfathomable code that is unfixable. Of course that happens on large human-created projects as well.
My only defence against that just now is that I don't care. There is no intention to develop what I have had AI generate further. It works, it's done. And none of it is critical to our enterprise.
Yes. And it already contains small pieces that are subtly wrong, even if it's only 500 or 1000 lines of code (maybe 100 lines of code wouldn't be problematic; 3 lines are genuinely good).
It's the same with human-written code, that part is not different for AI!
The part that's different is that when your code is maybe 500 or 1000 lines, you would go over it, comb through it in an attempt to make it less problematic and more solid, and then you may create something bigger that still works. That requires System 2 thinking that current AI doesn't have.
Sure, if you add a human to the loop then things are not as bleak… but then it only works when you are dealing with something the human knows very deeply, meaning a situation where AI help is not really needed at all.
When you go outside of human competence, the situation is bleak: the AI introduces subtle bugs, the human can't fix them because they are outside the human's expertise, and the LLM can't fix them because the LLM can't even play games without violating the game's rules!
On the other hand, the other day I was idly wondering about rendering with WGPU directly to the hardware on embedded Linux systems that are headless (no X11, Wayland, etc.). So I asked my AI friend about it; he came up with a plan, then proceeded to implement it step by step. Two hours later I had 900 lines of Rust and shader code that sets up everything required and draws the famous triangle in three colours. Not only that, it has different "fall back" code paths for the differences in hardware support between the Raspberry Pi, Nvidia Jetson and other devices we have.
Now, from my perspective, this is way outside my human competence. The WGPU API is huge and complex, and I know almost nothing about it. Similarly, bolting that onto the hardware driver facilities available is something I know nothing about. I could have started to research and try to learn all that. Except that it would take me a year and I had no idea where to even start.
Looking at the code I understand very little of it, not because it's AI generated but because I don't know the APIs it's using. I can see some issues with it, like a very long method, half of which sets up a particular hardware path but is unused if another hardware path is available, diving off into some other "fall back" path method instead. Certainly room for some nice refactoring in there. Oh, and it duplicated the exact same WGSL shader code in both those paths, crazy!
The situation is not bleak. I have something that works. Something I can now bolt egui or whatever onto. I have a serious working example of all this WGPU/DRM/KMS stuff that I can use as a starting point for studying all those things. It's a huge win in that respect.
I do grant you that if I continue blindly adding more and more to this I'm sure it would all collapse at some point.
After about a hundred repetitions, the "this time it works for sure, honest, trust us" spiel becomes old.
It's true that at some point these things may work, absolutely… but right now it reminds me of that salesman from Japan who rented out CRT TVs in the 1980s and published ads, every few weeks, claiming that flat TVs would soon replace them (to boost his business).
As we all know, nothing like that happened back then, even if we all use flat TVs these days. They needed a few decades of development before they were ready to replace the good old CRT ones.
Similarly here: even if we may all program with the use of "coding agents" in the year 2050, that doesn't mean we will see them doing a good job in 2030, let alone 2026…
My experience is that AIs mess up even simple programs. I had a log file with occasional entries that spanned multiple lines describing a single higher-level event, and I wanted to extract some key values from such a group of lines and generate a single CSV line from it. Relatively simple, and a one-off analysis script, so worth trying AI. It managed to screw up the case of other unrelated log messages in between; then, when it fixed that, it broke two back-to-back log messages; and then it didn't handle a log message at the end of the file correctly.
I spent more time trying to get the AI to do it than it would have taken to just write it myself. And I resorted to just rewriting it properly myself in the end.
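For what it's worth, the core of such a script is just a small state machine over the lines. Here is a minimal sketch in Rust, assuming hypothetical "BEGIN EVENT"/"END EVENT" markers, since the real log format isn't shown here:

```rust
// Minimal sketch of grouping multi-line log events into CSV rows.
// The "BEGIN EVENT"/"END EVENT" markers are hypothetical stand-ins
// for whatever delimits a real multi-line entry.
fn events_to_csv(log: &str) -> Vec<String> {
    let mut rows = Vec::new();
    let mut current: Option<Vec<&str>> = None;
    for line in log.lines() {
        if line.contains("BEGIN EVENT") {
            current = Some(vec![line.trim()]); // start a new group
        } else if line.contains("END EVENT") {
            if let Some(group) = current.take() {
                rows.push(group.join(",")); // emit one CSV row per group
            }
        } else if let Some(group) = current.as_mut() {
            group.push(line.trim()); // continuation line of the open group
        }
        // Lines outside any group (the "unrelated log messages in
        // between") are simply skipped.
    }
    // A group still left open at end-of-file is silently dropped here;
    // that is exactly the EOF case the AI-generated script fumbled.
    rows
}
```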
Now, what AIs are good for is coming up with jq or ffmpeg command lines (probably well covered by their training data). Or simple refactoring (e.g. "split this large GUI function into 4, one for each tab" or "convert this series of if-else into a match"). Anything much beyond that has been a massive disappointment in my experience.
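The "if-else into a match" case really is mechanical, which is probably why it works. A trivial illustration of the shape of that refactor (my example, not from any real codebase):

```rust
// Before: a chain of if-else branches on the same value.
fn describe(n: i32) -> &'static str {
    if n == 0 {
        "zero"
    } else if n < 0 {
        "negative"
    } else {
        "positive"
    }
}

// After: the same logic as a match; purely mechanical, no behaviour change.
fn describe_matched(n: i32) -> &'static str {
    match n {
        0 => "zero",
        n if n < 0 => "negative",
        _ => "positive",
    }
}
```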
Are you actually using a paid subscription? I currently have these $20 monthly subscriptions for OpenAI, Google Gemini, and Anthropic Claude, and all work mostly well for medium tasks. Currently I am using mostly Claude Code in the terminal.
I have sometimes read that free unpaid subscriptions might work much worse, which makes sense, as the compute is expensive, and quality of responses is not so important for people asking for the nearest restaurant or a recipe for a chocolate cake.
And yes, I do exactly that. I typically ask for some help using the word "please", and try to write mostly correct text with correct capitalization, correct spelling and mostly correct grammar. And I try to avoid asking too-trivial questions, or too many questions -- typically not more than one question per week, or in rare cases a few per day, followed by a break of a few weeks, so that the LLM can recover a bit.
Do you mean it spat back the wgpu hello_triangle example to you? Claiming that it would take a year to learn wgpu is exactly the sort of learned helplessness that is really the only reason to consider what AI is doing to be amazing.
Is it useful? Sometimes, sure. Is it doing anything you couldn't do? No, never. Does that mean you should never use it? Of course not.
WGPU really just isn't that hard. It's extremely boilerplate-heavy; it might not be useful for you to learn it, but let's not get crazy here. You could have easily gone through any of dozens of tutorials and copy-pasted for exactly the same result, and probably about as fast. But yes, getting it to spit out a single-use small internal tool is otherwise a pretty ideal use case.
Really, the problem with AI is just that the hard bit of programming is debugging a weird bug in a big unfamiliar codebase, and AI is very, very close to completely useless there; it is far worse than you at reading a bunch of code, keeping it in memory, and understanding intent.
Yes, this was for my day job, and they have the paid Copilot offering. I even selected one of the fancy models (Claude something or other). AI is just pretty bad.
Then I assume that you are working in an area for which very little up-to-date training data exists. For example, with the Xilem GUI or the latest Bevy versions, my experience with AI was not that great. Because Bevy changes drastically from version to version, and because for Xilem we have not much documentation and the internal Xilem structure is "experimental" and not covered by standard text books.
Isn't that the common case unless you are working with the Web, React and Tailwind? (I think those are the names? I honestly really have no clue about modern web dev.) Most things for my day job will be almost entirely internally written code in the whole stack above the OS, and I expect the same is true for most people who work outside of web. (In my case: hard realtime industrial machine control.) The code base (millions of lines of C++, some dating back to the late 90s) is also too large for an AI to understand.
But AI seems (as described in an earlier post of mine in this thread) to fail pretty badly even on simple scripts that aren't in that main code base. If you are doing a task outside the training set it sucks. Which is probably why it works OK at churning out yet another website much like all the others, or figuring out ffmpeg command lines.
I've had a pretty mediocre-at-best experience even with the just-released Claude Opus 4.6 in the Copilot CLI (because the WebStorm plug-in is spectacularly useless past the autocomplete) in "think hard" mode, with bog-standard TypeScript React code.
Asking it to convert the repo off the obsolete node10 moduleResolution leads to it not finding all the node10-affected projects because it doesn't properly understand how extends works. Once you get it past that, it runs tsc to typecheck, assumes that it succeeded because it didn't see any output within 5 seconds, and reports everything succeeded.
It also refused to update to the typescript 6 beta "because that's not a real version". And even trying to get it to add a test setup to a package, based on the convention of the sibling packages, required manual intervention for literally every single line it suggested, and I had to manually add the test library dependency it failed to add.
Asking it to diagnose bugs has had an approximately 10% hit rate, and has been overall a wash in terms of time saved vs time wasted. My best results so far have been using it as a lazy structural search-and-replace, where I make one edit then tell it to do the same to every other instance. If I learned how the actual structural search-and-replace feature of WebStorm worked I could probably do that just as easily and much faster.
The difference as far as I can tell is just this is a big complicated repo with lots of weird inconsistent projects.
AI is ok (if you're lucky) at writing greenfield code up to about 10-30kloc, but so is just about any intern. That doesn't impress me.
Yes. In the same way I, and many others, have been anthropomorphizing computers and software for decades. Especially when they don't do what we want and we start cussing at them. Or the same way many anthropomorphize cars and many other objects. Also, my father was Czech; in the Czech language everything has a gender. I have been saying "my AI friend" as a kind of joke. I guess it does not come over that way in text on a forum post.
Ha! If the only thing AI could do was come up with working ffmpeg command lines that would be worth every dollar invested in it.
No. If that is what I wanted I could have cut and pasted it for myself. I have already looked at those WGPU examples and a good part of "Learn Wgpu". What I wanted was to be able to use WGPU on embedded Linux machines (Nvidia Jetson, Raspberry Pi, etc.) running headless: no X11 or Wayland, no window manager, etc. That means no winit crate, as is normally used to get a surface to draw on in the browser or desktop. Directly to the hardware.
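For the curious, the "headless" part of that boils down to asking wgpu for an adapter with no compatible surface at all. A rough sketch of the setup (descriptor fields and signatures vary between wgpu versions, so treat this as illustrative, not as the generated code):

```rust
// Rough sketch of headless wgpu setup: no winit, no window, no surface.
// Follows the general shape of the 0.19-era API; details differ by version.
fn main() {
    let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
        backends: wgpu::Backends::VULKAN, // Vulkan backend preferred, as in the post
        ..Default::default()
    });
    let adapter = pollster::block_on(instance.request_adapter(&wgpu::RequestAdapterOptions {
        power_preference: wgpu::PowerPreference::HighPerformance,
        compatible_surface: None, // headless: no surface to be compatible with
        force_fallback_adapter: false,
    }))
    .expect("no suitable GPU adapter");
    let (_device, _queue) = pollster::block_on(
        adapter.request_device(&wgpu::DeviceDescriptor::default(), None),
    )
    .expect("device request failed");
    // From here you render into an offscreen texture and hand the pixels
    // to DRM/KMS yourself; the surface-less setup is the whole trick.
}
```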
What I got was something that:
Draws an animated triangle with wgpu (Vulkan backend preferred).
Attempts to create a DRM/KMS surface via wgpu’s Vulkan path.
If that surface cannot be created (common on Jetson with split GPU/display DRM devices), it automatically switches to a CPU scanout fallback:
Render offscreen with wgpu to an RGBA8 texture
Read back to CPU
Convert RGBA → XRGB8888 (sketched below)
Blit into DRM "dumb" buffers
Page-flip using KMS with EVENT flag and wait for vblank (double-buffered, tear-free)
Plus a bunch of keyboard/mouse event handling. All in all, much bigger than just "hello triangle": almost 1000 lines of code.
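That RGBA → XRGB8888 step in the fallback is a good illustration of how mundane most of those lines are. Roughly what it amounts to, as a hypothetical helper (not the actual generated code):

```rust
// Sketch of the RGBA8 -> XRGB8888 step in the CPU scanout fallback.
// DRM's XRGB8888 is a little-endian 32-bit word: bits [31:24] unused,
// [23:16] R, [15:8] G, [7:0] B. Assumes a tightly packed source; a real
// wgpu readback may need its row padding stripped first.
fn rgba_to_xrgb8888(src: &[u8], dst: &mut [u32]) {
    assert_eq!(src.len() / 4, dst.len());
    for (px, out) in src.chunks_exact(4).zip(dst.iter_mut()) {
        let (r, g, b) = (px[0] as u32, px[1] as u32, px[2] as u32);
        *out = (r << 16) | (g << 8) | b; // alpha byte dropped (X = don't care)
    }
}
```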
Given the complexity of what I have got, I stand by my claim that it would take a year for me to get that working, especially given that I have real work filling my time. So I think that is amazing.
Is it good code? Well, it works, but looking at it I can see room for some nice refactoring and some repetition that is not required. It has two copies of the shader code, one in each arm of the pipeline, for example.