Wow. It's probably nice to live in some alternate universe where effective correctness provers exist.
Can you tell for us, who live in this universe, where automated theorem provers are extremely limited and pitiful how is it works for you over there? Have the wormhole you have used to reach our universe was designed with these tools or not?
</ sarcasm >
Practically speaking theorem provers are extremely limited because they hit, very quickly, not the Rice's theorem but combinatorial explosion.
In practice what people are using are prover assistants, prover verifiers and other such things.
In theory it should be possible to force ChatGPT to produce guidance for Coq or other such tool but in practice it's hard to convince ChatGPT to even produce correct piece of code if it's more than half-dozen of lines, making something that Coq would accept is a pipe-dream.
Whether the usual AI-style approach (add 10x more processing power to that model and if that wouldn't help do 100x more) would solve that issue is unclear, but today… it's almost impossible to prompt ChatGPT to write something verifyable unless problem you are discussing is trivial.
It's pretty nice dictionary tool for the APIs, but that's it. Program have to be written by human after ChatGPT would show said human some functions which can be useful for the task. And even then it can be quite non-trivial to make it produce the functions you may want to actually use.
Nah, it doesn't. Instead of “but halting problem” you now have “sorry, I couldn't do that” problem. IOW: instead of succeeding to produce pile of garbrage with some useful results in that pile ChatGPT would now start producing excuses with explanation about why it couldn't do anything.
Hardly an improvement.
If you would try that right now, today, they you would find out that there are no exploration it all: after you force ChatGPT to fix few compiler errors an attempt to push further would just produce programs which would have some checks omitted to fit the new bugs you asked about! You go in circles without ever producing satisfactory result.
I recommend you to actually try ChatGPT before you would build all these dream castles in your mind. Maybe one day GPT-10 would be useful for your plan, but GPT-4 as it exists today... nope.
And all these levels are way, way, WAY beyond what ChatGPT may produce today.
What it produces today is “random pile of code which looks vaguely similar to what your professor wants to see on exam to fool such professor into giving you good grade if he doesn't have enough time to try to understand what you wrote”.
Even bringing it to the first step “code actually compiles and thus can be tested” is very often a very frustrating battle.