2 Versions of the pipeline pattern: Why is one version faster than the other?

I want to use channels to implement a parallel pipeline pattern in Rust. I now have two different implementations and noticed that one of them is much faster than the other. I would like to know why.

Version 1:

// set up the pipes
let (main_out, sa_in) = channel();
let (sa_out, sb_in) = channel();
let (sb_out, sc_in) = channel();

// start the stages 
thread::spawn(move || {stage_a(sa_in, sa_out, list1)});
thread::spawn(move || {stage_b(sb_in, sb_out, list2)});
let last_stage = thread::spawn(move || {stage_c(sc_in, missing_data_list)});
// pipeline is now ready

// send the ids to stage A
for id in costumer_ids {
    main_out.send(id).unwrap();
} 

Complete code: Rust Playground
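Since the stage functions themselves only appear in the linked Playground, here is a minimal, self-contained sketch of the shape Version 1 assumes: each stage receives both ends it works with, and the caller spawns the threads. The names `stage_double`, `stage_collect` and `run_pipeline` are hypothetical and only stand in for the real stages; the doubling logic is a placeholder.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

// Hypothetical middle stage: reads ids, doubles them, forwards the result.
fn stage_double(input: Receiver<u32>, output: Sender<u32>) {
    for id in input {
        // Stop early if the downstream receiver has gone away.
        if output.send(id * 2).is_err() {
            break;
        }
    }
    // `output` is dropped here, which closes the channel to the next stage.
}

// Hypothetical last stage: collects everything it receives.
fn stage_collect(input: Receiver<u32>) -> Vec<u32> {
    input.iter().collect()
}

fn run_pipeline(ids: Vec<u32>) -> Vec<u32> {
    // Set up all pipes in the calling thread, as in Version 1.
    let (main_out, sa_in) = channel();
    let (sa_out, sb_in) = channel();

    // The caller spawns one thread per stage.
    thread::spawn(move || stage_double(sa_in, sa_out));
    let last_stage = thread::spawn(move || stage_collect(sb_in));

    for id in ids {
        main_out.send(id).unwrap();
    }
    // Dropping the sender lets the stages run to completion.
    drop(main_out);

    last_stage.join().unwrap()
}

fn main() {
    println!("{:?}", run_pipeline(vec![1, 2, 3]));
}
```

Unlike the snippet above, this sketch also drops `main_out` and joins the last stage so that it terminates cleanly.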

Version 2:

// set up the channel to the first stage
let (main_out, sa_in) = channel();

// start the pipe
let main_in = stage_c(stage_b(stage_a(sa_in, list1), list2), missing_data_list);

// send the ids to stage A
for id in costumer_ids {
    main_out.send(id).unwrap();
} 

In this version, every stage creates its own output channel and returns that channel's Receiver, which is then passed to the next stage.
Complete Code: Rust Playground
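To make the Version 2 shape concrete, here is a minimal sketch of a stage that creates its own output channel, spawns its own worker thread, and hands back the Receiver. The names `stage_add` and `run_pipeline` and the add-an-offset logic are hypothetical placeholders, not the code from the Playground.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;

// Hypothetical stage: adds `offset` to every id and returns the
// receiving end of the channel it writes to.
fn stage_add(input: Receiver<u32>, offset: u32) -> Receiver<u32> {
    // The stage owns the creation of its output channel.
    let (out, next_in) = channel();

    thread::spawn(move || {
        for id in input {
            // Stop early if the downstream receiver has gone away.
            if out.send(id + offset).is_err() {
                break;
            }
        }
        // `out` is dropped here, which closes the channel downstream.
    });

    next_in
}

fn run_pipeline(ids: Vec<u32>) -> Vec<u32> {
    // Only the channel into the first stage is created by the caller.
    let (main_out, first_in) = channel();

    // Stages are chained by passing each returned Receiver along.
    let main_in = stage_add(stage_add(first_in, 1), 10);

    for id in ids {
        main_out.send(id).unwrap();
    }
    // Dropping the sender lets every stage finish its loop.
    drop(main_out);

    main_in.iter().collect()
}

fn main() {
    println!("{:?}", run_pipeline(vec![1, 2, 3]));
}
```

Because each stage here is a single thread reading from a FIFO channel, the ids come out in the order they were sent.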

On my MacBook with 4 cores, Version 2 runs three times faster than Version 1. I suspect it has something to do with the Receivers being created in the main thread in Version 1.

Can someone explain this to me? Thanks in advance!

(Semantically, the code makes no sense in some ways; it's just an example!)

Every time you send a value to a channel, it allocates some memory to hold the value.