Rust CI builds get stuck forever in GitLab CI

We use custom GitLab runners on AWS spot instances, and our CI/CD builds Rust projects on them. Five projects share four runners (two m2.4xlarge and two c1.xlarge). Sooner or later, one of the five projects gets stuck and times out after an hour:

```
#7 98.38 Compiling aws-sdk-sso v0.28.0
#7 98.38 Compiling aws-sdk-ec2 v0.26.0
#7 110.6 Compiling vaultrs v0.7.0
#7 121.8 Compiling aws-config v0.55.3
#7 188.2 Compiling vaultrs-login v0.1.7
```

The job eventually times out, and this kills a lot of developer productivity.

We use Docker-in-Docker (DinD) in GitLab, and the Docker build uses the rust:bookworm image.

We have also tried sccache with S3 as the backend, but the projects still get stuck from time to time.
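For reference, the sccache wiring in the Dockerfile looks roughly like this (bucket name, region, and the credentials mount are placeholders, not our exact setup):

```dockerfile
# syntax=docker/dockerfile:1
FROM rust:bookworm AS builder
# Installing from crates.io for simplicity; a prebuilt release binary is faster.
RUN cargo install sccache --locked
# Route every rustc invocation through sccache and point it at the S3 bucket.
ENV RUSTC_WRAPPER=sccache \
    SCCACHE_BUCKET=my-ci-sccache-bucket \
    SCCACHE_REGION=eu-west-1
WORKDIR /app
COPY . .
# sccache needs AWS credentials inside the build; one option is a BuildKit secret:
#   docker build --secret id=aws,src=$HOME/.aws/credentials .
RUN --mount=type=secret,id=aws,target=/root/.aws/credentials \
    cargo build --release && sccache --show-stats
```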

Any ideas would be greatly appreciated. Building these projects locally takes barely 5 minutes, while building them in CI takes at least 10 minutes, or never finishes when a job gets stuck and times out at 60 minutes.

Are you building Docker images that contain your Rust code? Would cargo-chef help?
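If you are, a minimal cargo-chef setup looks roughly like this (the binary name my-app is a placeholder). The idea is that dependencies get built in their own layer, so only your own crates rebuild when the source changes:

```dockerfile
FROM rust:bookworm AS chef
RUN cargo install cargo-chef --locked
WORKDIR /app

FROM chef AS planner
COPY . .
# Produce a dependency "recipe" from Cargo.toml / Cargo.lock
RUN cargo chef prepare --recipe-path recipe.json

FROM chef AS builder
COPY --from=planner /app/recipe.json recipe.json
# Build only the dependencies; this layer is reused until the recipe changes
RUN cargo chef cook --release --recipe-path recipe.json
COPY . .
RUN cargo build --release

FROM debian:bookworm-slim AS runtime
# my-app is a placeholder binary name
COPY --from=builder /app/target/release/my-app /usr/local/bin/my-app
ENTRYPOINT ["/usr/local/bin/my-app"]
```

Note this only pays off if the Docker layer cache survives between jobs, e.g. via BuildKit's registry cache (`--cache-from` / `--cache-to type=registry`).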


I don't think caching would help, because our GitLab runners are spot instances that come and go, so every job starts with a fresh Docker runtime. Appreciate the comment, though.

Hrm, can't say I have any obvious answers then. A minute to compile aws-config seems fishy. How long does it take to compile locally if you start from scratch, i.e. no Docker cache, no cargo cache?

Do you know which runners hit the timeouts? The internet tells me that m2.4xlarge instances have 68 GB of RAM and c1.xlarge only 7 GB. Running DinD on 7 GB of RAM seems like it might get tough.
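If the stuck jobs land on the c1.xlarge machines, one thing worth trying (purely a guess on my part) is capping cargo's parallelism so several large AWS SDK crates don't compile at once and push the DinD container into swap:

```dockerfile
FROM rust:bookworm AS builder
WORKDIR /app
COPY . .
# ~7 GB of RAM is not much for 8 parallel rustc processes on big crates;
# capping the job count trades speed for staying out of swap (2 is a guess, tune it).
ENV CARGO_BUILD_JOBS=2
RUN cargo build --release
```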

Otherwise I would start monitoring the runners to see what's limiting them: network, CPU, or memory. I'm sorry I don't have better answers for you.

It takes at most 5 minutes without the cargo cache and the Docker cache.