Rust code generation and Chat AIs

This is an FYI rather than a question. Not sure if that is allowed here?

I've been working through the Rustlings exercises. I'm currently in the errors section. Just finished errors4.

The Rustlings solution uses if-else to fill in the missing code. My own solution used match.
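
For anyone who hasn't done errors4, the if-else version looks roughly like this (my reconstruction of the idea, not necessarily the exact Rustlings solution; the full exercise code and a `match` version appear further down the thread):

```
#[derive(PartialEq, Debug)]
struct PositiveNonzeroInteger(u64);

#[derive(PartialEq, Debug)]
enum CreationError {
    Negative,
    Zero,
}

impl PositiveNonzeroInteger {
    fn new(value: i64) -> Result<PositiveNonzeroInteger, CreationError> {
        // Three cases: negative input, zero, and a genuinely positive value.
        if value < 0 {
            Err(CreationError::Negative)
        } else if value == 0 {
            Err(CreationError::Zero)
        } else {
            Ok(PositiveNonzeroInteger(value as u64))
        }
    }
}
```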

I thought I'd run this through the various Chats to see what they would come up with. So I pasted in the code, leaving a TODO comment in the `new()` function, and asked them to use `match` to fill it in. To my surprise, ChatGPT got it totally wrong at the first attempt and also didn't format or syntax-highlight the output. After some prompted corrections it was able to get it right. Anyway, without going into too much detail, the best results overall were in this order...

1. Phind
2. Perplexity
3. Bard
4. Bing Chat
5. YouChat
6. Open Assistant (correct in matching the conditions but produced weird non-compiling extra code)
7. ChatGPT (totally wrong until corrected)

Phind specialises in developer and techie queries. The top three Chats explained what they did. Bing Chat did so only with a follow-up prompt. YouChat was correct but didn't explain itself.

Last week I did a similar exercise with the football scores HashMap example. This was very surprising in that all of the Chats failed badly even though, from their explanations, it seemed like they understood what I was asking them to do.

I was able to get Bing Chat to get there in the end but I had to give it several successive prompts. I think it succeeded on about the 7th prompt. The others I couldn't make any progress on.

In general, Phind seems to be the best at code-related tasks but, as seen, not perfect.

Conclusion? Developers are not going to be out of work for a while yet! :smiley:

I'm not sure why this is surprising. AI is not magic [1].

It is rather humorous that getting accurate results from AI still requires a human operator. But this is not exactly a groundbreaking conclusion, either.


  1. Arthur C. Clarke was right: "any sufficiently advanced technology is indistinguishable from magic." ↩︎

3 Likes

The surprise was partly because I tried Phind first and it got it beautifully right with a nice explanation. So I expected ChatGPT to be at least comparable. Plus most of the others got it right.

But, in general, outside of dev tasks they all have their fair share of errors. They're often counter-intuitive: some queries you expect to be trivial are somehow very hard for the AIs to get right, while others you expect to be hard they get right. And there are cases where a standard search gets what you want accurately.

In my experiments of the past few weeks I'd say that Bing Chat is the current best overall. Though with one query it started out correct and then gave wrong answers a week or so later!

From my limited experience playing around with ChatGPT, it is certainly not good at generating Rust code. However, it had no problems generating correct Python code after the first prompt for a few simple functions. The ChatGPT documentation explicitly mentions Python as the language for which you can expect the best results.

3 Likes

Modern chat AIs can produce good responses if they have seen your exact problem or an essentially identical problem before. Otherwise, they tend to produce very questionable results, which might look convincing to someone with little to no knowledge of the topic, but which are wrong nonetheless. While these results can be a good starting point for further research, they should be treated with as much caution as any search result on the Internet.

3 Likes

With these models, “first attempt” really doesn’t matter. Whether or not your first attempt succeeds is up to chance. To evaluate how well it handles the prompt, you’ll need to ask it several times; then you can get a feeling for how often, statistically, the answer is correct.

That said, I just tested it about 10 times and it gave me a correct answer every single time. Out of curiosity, what was your precise prompt and the “totally wrong” answer? Mine was the following (I tried with and without the ```s; without them it sometimes also returned code without code fences and thus displayed without formatting)

Use `match` while completing the code:

```
#[derive(PartialEq, Debug)]
struct PositiveNonzeroInteger(u64);

#[derive(PartialEq, Debug)]
enum CreationError {
    Negative,
    Zero,
}

impl PositiveNonzeroInteger {
    fn new(value: i64) -> Result<PositiveNonzeroInteger, CreationError> {
        // TODO
    }
}

#[test]
fn test_creation() {
    assert!(PositiveNonzeroInteger::new(10).is_ok());
    assert_eq!(
        Err(CreationError::Negative),
        PositiveNonzeroInteger::new(-10)
    );
    assert_eq!(Err(CreationError::Zero), PositiveNonzeroInteger::new(0));
}
```

The code it gave back most commonly contained something like:

```
impl PositiveNonzeroInteger {
    fn new(value: i64) -> Result<PositiveNonzeroInteger, CreationError> {
        match value {
            x if x > 0 => Ok(PositiveNonzeroInteger(x as u64)),
            x if x == 0 => Err(CreationError::Zero),
            _ => Err(CreationError::Negative),
        }
    }
}
```

or

```
impl PositiveNonzeroInteger {
    fn new(value: i64) -> Result<PositiveNonzeroInteger, CreationError> {
        match value {
            v if v < 0 => Err(CreationError::Negative),
            0 => Err(CreationError::Zero),
            v => Ok(PositiveNonzeroInteger(v as u64)),
        }
    }
}
```

From my limited experience playing around with ChatGPT, it is certainly not good at generating Rust code.

In this case, I was able to get ChatGPT to generate it correctly after giving it one hint.

Given this Rust code

```
#[derive(PartialEq, Debug)]
struct PositiveNonzeroInteger(u64);

#[derive(PartialEq, Debug)]
enum CreationError {
    Negative,
    Zero,
}

impl PositiveNonzeroInteger {
    fn new(value: i64) -> Result<PositiveNonzeroInteger, CreationError> {
        // TODO
    }
}
```

Fill in the missing code after the // TODO comment using pattern matching syntax that matches on the value parameter.

Initial answer (unformatted)

Here's the Rust code with the missing code added:

#[derive(PartialEq, Debug)] struct PositiveNonzeroInteger(u64);

#[derive(PartialEq, Debug)] enum CreationError { Negative, Zero, }

impl PositiveNonzeroInteger { fn new(value: i64) -> Result<PositiveNonzeroInteger, CreationError> { match value { v if v < 1 => Err(CreationError::Zero), v if v > u64::MAX as i64 => Err(CreationError::Negative), v => Ok(PositiveNonzeroInteger(v as u64)), } } }

In the new function, we use pattern matching syntax to match on the value parameter. If the value is less than 1, we return an Err variant with CreationError::Zero. If the value is greater than the maximum value of a u64 converted to an i64, we return an Err variant with CreationError::Negative. Otherwise, we create a new PositiveNonzeroInteger with the value and return it as an Ok variant.

Then I gave it a hint

This is not right. When value is less than 0 it should return Err(CreationError::Negative). When value is 0 it should return Err(CreationError::Zero). When value is greater than 0 it should return Ok(PositiveNonzeroInteger(v as u64)). Please can you redo?

Next answer

```
#[derive(PartialEq, Debug)]
struct PositiveNonzeroInteger(u64);

#[derive(PartialEq, Debug)]
enum CreationError {
    Negative,
    Zero,
}

impl PositiveNonzeroInteger {
    fn new(value: i64) -> Result<PositiveNonzeroInteger, CreationError> {
        match value {
            v if v < 0 => Err(CreationError::Negative),
            0 => Err(CreationError::Zero),
            v => Ok(PositiveNonzeroInteger(v as u64)),
        }
    }
}
```

The surprise was not so much that it got it wrong but that it got it badly wrong while all the others got it either right or approximately right.

Another exercise I tried, which I admittedly expected would be trickier, was to feed it the scenario from the Rustlings HashMap soccer scores example. I didn't actually ask it to write any code but just to work out the results: total goals scored and total goals conceded for each team. I ran this through several AIs. Some did actually generate Python code for me, but all got it wrong. Even with coaxing they still got it wrong. Though I didn't try this exhaustively.

My initial expectation was that they would not understand my wording. But they all seemed to from their explanations, yet still produced the wrong answers. That's what was weird.

1 Like

Ah, you didn’t give it the test. Even so: I tried your prompt a handful of times without getting any bad results. Though I’m not 100% sure whether the default ChatGPT for me as a Plus user is different from the free one. To be sure, I made a free account and tried it, too, a few (i.e. 3) times, with successful/correct answers each time. Looks like you might have just been really unlucky :slight_smile:

As I mentioned above, since the responses are always a bit random, and different each time you ask the same question, you really need to test more than a single time to get a better picture of how reliably the AI is giving a good answer ^^

To be fair to the AI, humans do this a lot too. :grinning:

Remember that the current generation of chat AIs is mostly just re-mixing text data it has seen before, based on statistical data about what looks good and what doesn't. These AIs don't have an accurate causal model of how programming languages or anything in the real world works. Until they do, it would be too much to expect them to reason about any topic as accurately as an experienced human.

At best, chat AIs help us trawl through a large amount of human-generated data, much like a search engine. This is useful in itself, but doesn't guarantee that the answers it gives are right.

3 Likes

My expectation is that they will continue to do so. They don't possess an inner dialogue. Their answers are probabilistic. The advantage is that they are fast, precisely because they are not that advanced.

Maybe it got it right for you because it had absorbed my corrected answer?

I don't think it works like that, though of course we can never be sure what kind of training they actually do.

FWIW, I've been occasionally leaning on them in working through Rustlings and most of the time, in small bits of code, they've been right. My original example seemed to be straightforward, so I wasn't surprised that most of them got it right, only that GPT initially didn't. If it had been hit and miss it would have been more understandable.

The soccer scores one was at least intuitively more complicated in that I was asking them to do more interpretation. The odd thing here is that in one respect they did more than I expected but in another they did less.

This was my prompt for the HashMap soccer scores sample.

Consider these lines.

England,France,4,2
France,Italy,3,1
Poland,Spain,2,0
Germany,England,2,1

Interpret "England,France,4,2" to mean England beat France by 4 goals to 2.

Calculate the total goals scored and total goals conceded for each team and display the results as a table.

This was partly an exercise to see how loose I could be in my description and have the AIs end up with the right answer.
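
For reference, here's a minimal Rust sketch of the tally I was asking them to compute (my own illustration; I never gave the AIs any code):

```
use std::collections::HashMap;

fn main() {
    let results = "England,France,4,2
France,Italy,3,1
Poland,Spain,2,0
Germany,England,2,1";

    // Per-team totals: (goals scored, goals conceded).
    let mut table: HashMap<&str, (u32, u32)> = HashMap::new();

    for line in results.lines() {
        let fields: Vec<&str> = line.split(',').collect();
        let (team_a, team_b) = (fields[0], fields[1]);
        let a_goals: u32 = fields[2].parse().unwrap();
        let b_goals: u32 = fields[3].parse().unwrap();

        // Each match updates both teams: one side's goals scored
        // are the other side's goals conceded.
        let a = table.entry(team_a).or_insert((0, 0));
        a.0 += a_goals;
        a.1 += b_goals;
        let b = table.entry(team_b).or_insert((0, 0));
        b.0 += b_goals;
        b.1 += a_goals;
    }

    for (team, (scored, conceded)) in &table {
        println!("{team}: scored {scored}, conceded {conceded}");
    }
    // Expected totals from the four lines above: England 5/4, France 5/5,
    // Italy 1/3, Poland 2/0, Spain 0/2, Germany 2/1.
}
```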

Another question which stumps all the AIs (but requires up-to-date data, so excludes ChatGPT) is asking for a list of declared Republican and Democratic Presidential candidates for 2024. Bing Chat was best at this about a month ago but went backwards. In general, though, the AIs tended to list all those who the media think are going to run, rather than the declared candidates.

The first time I tried this you could just do a regular search and get the correct information from Wikipedia.
