Enhancing and adding to a request root span for tracing\logging

I'm using tower to build a web service - well it's tower wrapped inside smithy-rs. I'm using tokio::tracing for the logging inside the web service. I have multiple services "sandwiched" some which are kind of standard, some which are specific. I'm trying to achieve something on the lines of a RootSpan, that is a span that wraps the entirety of my request through all the layers and services that I have, and that I can reference and enhance with information as it passes through my layers\services.

E.g: I have and authentication layer and if authentication is successful I would like to enhance the span with the information for the caller ID, that is add that field to the RootSpan that will be logged in each event. Like that every subsequent log entry will also contain the identification of the caller from the point of the successful authentication on. I have multiple such fields that I would like to add along - that is beside each even't specific log message and fields.

I could not find a direct way in the tracing create to access that which I call the request root span - basically at the entry point of my request I add a span that I would like to be this RootSpan . I have found no way to retrieve that from the tracing crate down the line, directly in the Service\Layer implementations that form my request path.

So I have been exploring how to achieve the above. I have a couple of ideas but none seems to be ideal. I would like some opinion about the problem in general and the possible solutions that I have concoted so far. Maybe anyone has found a way to have the type of behaviour that I'm talking about. Disclaimer: I'm a Rust language novice, I have 6 months of experience with it, so what I'm saying might have some rough edges.

I have tried to use the task_local mechanism that I think it's also used internally by the tracing crate to move the spans across contexts and async tasks. It's a bit of a hack, as what I need is not really to have the same variable with possible different values in different scopes, but mostly to move along my request path and along task contexts a shared variable.

// this is my request span structure
#[derive(Debug, Clone)]
pub struct RequestSpan {
    pub root_span: Span, // ! not sure if I need to wrap this up in an Arc<Mutex<>> also
    pub props: HashMap<String, String>,
}

// this is my task local, which I feel is a bit of a hack
#[derive(Debug, Clone)]
pub struct RequestSpanWrapper(pub Arc<Mutex<RequestSpan>>);

tokio::task_local! {
    pub static REQUEST_SPAN: RequestSpanWrapper;
}

// here is the service implementation;
// this will be the first executed service and layer in my
// tower "sandwich\lasagna"
impl<S> Service<Request<Body>> for RequestSpanService<S>
where
    S: Service<Request<Body>, Response = Response<BoxBody>> + 'static + Send + Clone,
    <S as Service<Request<Body>>>::Future: Send + 'static,
{
    type Response = S::Response;
    type Error = S::Error;
    type Future = Pin<Box<dyn Send + Future<Output = Result<Self::Response, Self::Error>>>>;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        self.inner.poll_ready(cx)
    }

    fn call(&mut self, req: Request<Body>) -> Self::Future {
        let clone = self.inner.clone();
        let mut service = mem::replace(&mut self.inner, clone);

				// here I create the `RootSpan`
        let span = self.make_span.make_span(&req);

        Box::pin(async move {
            // initialize the request span
            REQUEST_SPAN
                .scope(
                    RequestSpanWrapper(Arc::new(Mutex::new(RequestSpan::new(span)))),
                    async move {
                    		service.call(req).await
                    },
                )
                .await
        })
    }
}

My span is created similar to the below:

	...
	 tracing::span!(
                $level,
                REQUEST_SPAN_NAME,
                http.url = %request.uri(),
                http.method = %request.method(),
                hostname = %span.hostname(),
                process_id = %span.process_id().to_string(),
                // the properties field is here in order to be able to add fields to the span along a request flow
                // that is necessary because: fields associated with a span are part of its Metadata.
                // The Metadata describing a particular span is constructed statically when the span is created
                // and cannot be extended later to add new fields.
                properties = $crate::common::make_span::private::tracing::field::Empty
            )
	...

So in every other service\layer I would have to access this task's local variable - REQUEST_SPAN - and modify it. I add some (key, value) pair to the props map and then I inject that modified map into the span's empty field properties from above . Below I have my functions doing that (simplified a bit to illustrate the example):

pub fn try_record_span_prop(key: &str, value: &str) -> Result<(), ()> {
    if let Err(e) = REQUEST_SPAN.try_with(|ctx| {
		let mut req_span = ctx.0.lock().unwrap();
        req_span.props.insert(key.to_owned(), value.to_owned());
		req_span.root_span.record("properties", req_span.props.clone());
    }) {
        return Err(());
    }

    Ok(())
}

I'm not a very big fan of the result for several reasons:

  • I am not sure how that interacts with the tokio::tracing implementation which is based on the same concept of task_local to manage its spans(more in-depth testing is needed, not sure if it's worth investing the time)
  • it occurred to me that it is required to re-implement the spawning of tasks in the fashion of the tokio crate, in order to be able to automatically inject my task local variable in the async tasks that are created from the point from where the RootSpan scope is opened.
  • It is not clear at this moment( and it would require more analysis to understand) how that interacts with the default implementation inside tokio and what happens at the intersection with the tokio::tracing based logging in my dependent crates
  • to some extent it makes it easier from the CX point of view, but it also requires to always remember to inject the context in the current scope, make sure the scope lives long enough for me to be able to inject the logging information by using the custom implementation
  • it feels like I'm bending the concept of task_local as I'm using it not to have a local independent copy of some variable, but to actually share data, which is hidden behind a Arc<Mutex<>> construct, in between tasks/threads; that muddles the semantics of the code using those concepts and it brings into play the need to deal with synchronisation issues
  • Seems that context is lost for logs from the service’s dependencies, since those dependency logs will only be associated with context stored in tokio::tracing's thread local, and not my custom task local.
  • At least with how I'm currently doing it, I need shared mutability, which requires me to wrap it in an Arc<Mutex<..>> or Arc<RwLock<..>>. That means each request will have to pay the performance cost of locking. It also means developers that will maintain that code need to understand the entire system of code to avoid the potential of deadlocks, as well as cyclical references causing memory leaks. The performance of locking a Mutex or a RwLock is fairly small, so it isn't what I'd consider a huge downside, but I'm still uncomfortable using the whole Arc<Mutex<..>> pattern. As I'm not a Rust expert myself, I might also miss some of the semantics of using that whole construct at this time ...

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.