Tokio stack overflow crash when using fmt::Display implementation for Error

Hi, I have this weird problem where Tokio will crash with the error thread 'tokio-runtime-worker' has overflowed its stack when an error happens, and you use ? to propagate it up, AND the Error implement std::fmt::Display

First, the setup. Axum as the framework(hence the tokio usage), using GraphQL(async-graphql) and MongoDB. Async-graphql requires that an Error must implement std::fmt::Display, so I added this to my error definition:

pub enum ZataError {
    DBError(String),
    AxumError(String),
    NotFound(String),
    ValidationError(String),
    WrongCredentials,
    MissingCredentials,
    TokenCreation,
    InvalidToken,
}

impl fmt::Display for ZataError {
    fn fmt(&self, f: &mut fmt::Formatter) -> Result<(), fmt::Error> {
        write!(f, "{}", self)
    }
}

The query that failed looks like this(but it's not really relevant, I put it here just for information):

#[async_graphql::Object]
impl QueryRoot {
    /// Returns all available courses starting today and forward.  Does not return old courses
    async fn course(&self, ctx: &Context<'_>) -> async_graphql::Result<Vec<Course>> {
        let db = &ctx.data::<DataSource>()?.db;
        let tutors = graphql_get_all_current_course_db(&db).await?;
        Ok(tutors)
    }
}

The root of my problem was in my Course struct definition, there was a disparity between the field definition and what the database returned, so when I did the query, it returned an error but I never saw that error since tokio itself crashed with a stack overflow error. Upon searching a bit, I stumbled on someone who had a similar problem and it was because of a reference to "self" in a Display implementation, and it caused some kind of infinite Borg-space-time-distortion loop in tokio while trying to propagate the error using '?'. This strangely looked like the change I had just made to comply with async-graphql requirement. And indeed, if I use 'unwrap()' to make it panic right away, instead of '?' I could see the real error that was field related.

But my question is this, how to prevent tokio crashing when trying to propagate the error up? What is wrong with the Display implementation? I searched online, but all std::fmt::Display implementation I saw used self in it. Am I stuck using unwrap here?

This implementation directly and unconditionally calls itself (write! uses the implementation of Display for the passed value, which is self), so it will trigger stack overflow without any async code - playground. You either need to derive Debug and use it, or implement Display manually (or use something like thiserror to derive it).

1 Like

Searching around a little, I've found the reason for the error to pop out on question mark operator:

So, to use the question mark, you have to either:

  • implement Display correctly (this is advised anyway);
  • use explicit error type in async_graphql::Result instead of default, so that the conversion doesn't take place;
  • convert your error to async_graphql::Error manually with map_err, before propagating it up.
1 Like

Woaw, thanks for sharing that, especially the links to the various part of the docs(my Achilles talon is trying to make sense of the API docs.).

I thought it was implemented the right way :stuck_out_tongue: (one reference I used is Rust by example here

I was about to ask "What is the right way" but I was also trying your tips about thiserror at the same time, and it worked, no crash and I got the custom error back. So I went into that crate to see how Display is implemented. 8-o It's a tad more than the 3 usual lines... I now need to dissect it and try to understand the Display thing a bit better..

Btw, interesting that it crashes without tokio involved too. I never thought of trying that.

There is a big difference. That piece of code says write!(f, "{}", self.0), and not write!(f, "{}", self), i.e. it defers the implementation to the wrapped value, not to itself.

1 Like

In general, when implementing something for enum, it almost always would somehow end in a match. In particular, thiserror explicitly matches on the enum value and then processes every branch using the corresponding attribute by feeding it into write! macro. Manual implementation would probably be very similar, like this:

impl fmt::Display for ZataError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            ZataError::DBError(data) => write!(f, "Database error: {}", data),
            // skipped other branches
            ZataError::InvalidToken {} => write!(f, "Invalid token"),
        }
    }
}

Ah, so basically, it's about the same as implementing axum IntoResponse for that enum. I am keeping a note about this.

So I removed thiserror and redid the Display implementation

impl fmt::Display for ZataError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            ZataError::AxumError(msg) => write!(f, "Axum error: {}", msg),
            ZataError::DBError(msg) => write!(f, "Database error: {}", msg),
            ZataError::NotFound(msg) => write!(f, "Not Found: {}", msg),
            ZataError::ValidationError(msg) => write!(f, "Validation error: {}", msg),
            ZataError::WrongCredentials => write!(f, "Wrong credentials"),
            ZataError::MissingCredentials => write!(f, "Missing credentials"),
            ZataError::TokenCreation => write!(f, "Token creation error"),
            ZataError::InvalidToken {} => write!(f, "Invalid token"),
        }
    }
}

I also reproduced my struct mistake, and relaunch the GraphQl query that should now fail and...

{
  "data": null,
  "extensions": {},
  "errors": [
    {
      "message": "Database error: The server returned an invalid reply to a database operation: invalid type: map, expected a formatted date and time string or a unix timestamp",
      "locations": [
        {
          "line": 2,
          "column": 3
        }
      ],
      "path": [
        "getCourseForTutor"
      ]
    }
  ]
}

Bingo!!! I got the custom error AND the more descriptive system error.

Big thanks here, this is perfect. And as a bonus, I got a much better understanding of implementing Display now. :+1:

1 Like