HTTP parser optimization

I have an HTTP parser and here is its code. What do you think could be improved, changed or added to it?

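use crate::*;
// Explicit imports for the items used below, in case the crate root does not bring them into scope.
use std::collections::HashMap;
use std::net::{SocketAddr, ToSocketAddrs};
use std::str::FromStr;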

#[derive(Debug, Clone, PartialEq, Eq)]
/// Request.
pub struct Request {
    /// Request method.
    pub method: HttpMethod,
    /// Request URL.
    pub url: String,

    /// Host.
    pub host: Option<SocketAddr>,

    /// Cookies.
    pub cookie: HashMap<String, String>,
    /// Additional Content.
    pub add_content: HashMap<String, String>,
    /// Last line of request.
    pub last_line: String,
}

/// Parses an HTTP request into a [Request].
impl FromStr for Request {
    type Err = bool;

    #[inline]
    /// Function for parsing a request
    /// * data = HTTP request.
    /// # Examples
    /// ```
    /// const DATA: &str = "GET /response HTTP/1.1 \r\nHost: 127.0.0.1:443 \r\nCookie: net=qwe";
    /// DATA.parse::<Request>().unwrap();
    /// ```
    fn from_str(data: &str) -> Result<Request, bool> {
        let mut split_line: Vec<&str> = data.lines().collect();

        let muh: Vec<&str> = split_line.first().ok_or(false)?.split_whitespace().collect();
        let (method, mut url, last_line) = (
            muh.first().ok_or(false)?,
            muh.get(1).ok_or(false)?.to_string(),
            split_line.pop().ok_or(false)?.to_string(),
        );

        let host = split_line.iter()
            .find(|line| line.starts_with("Host: "))
            .map(|host_line| host_line.trim_start_matches("Host: ").to_socket_addrs())
            .and_then(|addr| addr.ok().and_then(|mut addrs| addrs.next()));

        let cookie = split_line.iter()
            .find(|line| line.starts_with("Cookie: "))
            .map(|cookie_line| Self::get_data(cookie_line.trim_start_matches("Cookie: "), "; "))
            .unwrap_or_default();

        let add_content = if !last_line.contains(": ") {
            Self::get_data(&last_line, "&")
        } else if let Some(index) = url.find('?') {
            Self::get_data(&url.split_off(index + 1), "&")
        } else {
            HashMap::new()
        };

        Ok(Request {
            method: method.parse()?,
            url,

            host,

            cookie,
            add_content,
            last_line,

        })
    }
}

impl Request {
    #[inline]
    /// Function for parsing a string into a [HashMap].
    /// * data = String to parse.
    /// * char_split = Separator between key=value pairs.
    /// # Examples
    /// ```
    /// const DATA: &str = "net=qwe&qwe=qwe&asd=asd";
    /// // `get_data` returns the map directly, so there is nothing to unwrap.
    /// let map = Request::get_data(DATA, "&");
    /// ```
    pub fn get_data(data: &str, char_split: &str) -> HashMap<String, String> {
        data.split(char_split)
            .filter_map(|part| {
                let mut split = part.splitn(2, '=');

                if let (Some(key), Some(value)) = (split.next(), split.next()) {
                    Some((String::from(key.trim()), String::from(value.trim())))
                } else {
                    None
                }
            })
            .collect()
    }
}

//

#[derive(Debug, Clone, PartialEq, Eq)]
/// HTTP method. Information taken from [the site](https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods).
pub enum HttpMethod {
    /// The GET method requests a representation of the specified resource. 
    /// Requests using GET should only retrieve data and should not contain a request content.
    Get,
    /// The HEAD method asks for a response identical to a GET request, but without a response body.
    Head,
    /// The POST method submits an entity to the specified resource, 
    /// often causing a change in state or side effects on the server.
    Post,
    /// The PUT method replaces all current representations of the target resource with the request content.
    Put,
    /// The DELETE method deletes the specified resource.
    Delete,
    /// The CONNECT method establishes a tunnel to the server identified by the target resource.
    Connect,
    /// The OPTIONS method describes the communication options for the target resource.
    Options,
    /// The TRACE method performs a message loop-back test along the path to the target resource.
    Trace,
    /// The PATCH method applies partial modifications to a resource.
    Patch,
}

impl FromStr for HttpMethod {
    type Err = bool;

    #[inline]
    /// Function for parsing an HTTP method.
    /// * data = HTTP method name, e.g. "GET".
    /// # Examples
    /// ```
    /// const DATA: &str = "GET";
    /// DATA.parse::<HttpMethod>().unwrap();
    /// ```
    fn from_str(data: &str) -> Result<HttpMethod, bool> {
        match data {
            "GET" => Ok(HttpMethod::Get),
            "HEAD" => Ok(HttpMethod::Head),
            "POST" => Ok(HttpMethod::Post),
            "PUT" => Ok(HttpMethod::Put),
            "DELETE" => Ok(HttpMethod::Delete),
            "CONNECT" => Ok(HttpMethod::Connect),
            "OPTIONS" => Ok(HttpMethod::Options),
            "TRACE" => Ok(HttpMethod::Trace),
            "PATCH" => Ok(HttpMethod::Patch),
            _ => Err(false)
        }
    }
}

Your Request type is stringly typed, which is mostly considered an antipattern.

What exactly is the problem? I don't see any problems with this approach.

No? Too bad. HTTP defines a finite set of valid methods while your data structure allows any string to pass as an HTTP method.

Also "¶ŧ←←" is not a valid URL. Though your Request would accept it as one.

And so on, and so forth...

Also parse_to_self() could be implemented more idiomatically by implementing FromStr, which actually would allow for let request: Request = string.parse()?;.
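To make the `string.parse()?` suggestion concrete, here is a minimal sketch. The names `Method`, `UnknownMethod` and `method_of` are made up for illustration and are not taken from the code above; the enum is cut down to three variants, and the error type is only one possible alternative to a bare `bool`.

```rust
use std::str::FromStr;

/// Hypothetical, reduced method enum (three variants for brevity).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Method {
    Get,
    Head,
    Post,
}

/// Hypothetical error type that remembers the rejected token.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UnknownMethod(pub String);

impl FromStr for Method {
    type Err = UnknownMethod;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "GET" => Ok(Method::Get),
            "HEAD" => Ok(Method::Head),
            "POST" => Ok(Method::Post),
            other => Err(UnknownMethod(other.to_string())),
        }
    }
}

/// Extracts the method from a request line such as "GET / HTTP/1.1".
pub fn method_of(request_line: &str) -> Result<Method, UnknownMethod> {
    // An empty line yields an empty token, which is rejected below.
    let token = request_line.split_whitespace().next().unwrap_or("");
    // Because `Method` implements `FromStr`, `parse` infers the target type
    // and `?` propagates the error to the caller.
    let method: Method = token.parse()?;
    Ok(method)
}
```

With this shape, anything that is not a known method never makes it into the data structure, and the caller decides what to do with the error.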

This assumes that the HTTP request will be well-formed.

Sometimes you have to sacrifice one thing for the sake of another; in this case, strong typing for the sake of performance.

After all, adding strict typing to the parser may reduce performance, which is unacceptable given my requirements.

And thanks for this idea, I never thought about it.

May...

What benchmarks have you run? What measurements have you taken?
I am inclined to assume that a plain enum may outperform a heap-allocated string.
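A rough sketch of why that assumption is plausible, without claiming any measured numbers: a fieldless enum is a small `Copy` value, while a `String` is a pointer, length and capacity plus a heap buffer. The `Method` enum and the helper names below are illustrative stand-ins, not the code from the first post.

```rust
use std::mem::size_of;

/// Illustrative fieldless enum with the same nine variants as `HttpMethod`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Method {
    Get,
    Head,
    Post,
    Put,
    Delete,
    Connect,
    Options,
    Trace,
    Patch,
}

/// Returns (size of the enum, size of a `String` handle) in bytes.
/// The `String` figure does not include its heap allocation.
pub fn sizes() -> (usize, usize) {
    (size_of::<Method>(), size_of::<String>())
}

/// Comparing enum values compares small discriminants.
pub fn is_get_enum(method: Method) -> bool {
    method == Method::Get
}

/// Comparing strings has to compare lengths and bytes.
pub fn is_get_str(method: &str) -> bool {
    method == "GET"
}
```

Whether this difference shows up in an end-to-end measurement is a separate question, which only a benchmark can answer.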

First, 10_000 TcpStreams are connected to the server; then, for each TcpStream, a request is sent and the response is read. That whole period of time is what I measure, and I repeat it for 10 laps.

My configuration:

  • Intel Pentium N3540, 2.16 GHz
  • 4 GB of RAM
  • Windows 7

This gives between 0.8 and 3 seconds per lap.
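The measurement described above covers connecting, sending and reading as well as parsing. To time the parsing step on its own, something like the following sketch could be used. `parse_method` is a hypothetical stand-in for the real parser and `time_parsing` is an invented helper; in real code the call would be replaced by `data.parse::<Request>()`.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Stand-in for the real parser: returns the method token of the request line.
/// (Hypothetical; swap in the actual parsing call to time the real code.)
fn parse_method(data: &str) -> Option<&str> {
    data.lines().next()?.split_whitespace().next()
}

/// Runs the parser `iterations` times and returns the elapsed wall-clock time.
pub fn time_parsing(data: &str, iterations: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        // `black_box` discourages the optimizer from removing the loop body.
        black_box(parse_method(black_box(data)));
    }
    start.elapsed()
}
```

Running this once with the `String` field and once with the enum, in a release build, would give numbers that compare only the parser variants.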

Currently you are copying the strings you receive, including allocating memory for each new string. That is going to at least require iterating through the string anyway, so it seems unlikely to be faster than the code that you would get just matching on the string, and future uses of the parsed value would benefit from the fact it has been reduced to a single small value.
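As a sketch of that point, the request line can be parsed into an enum plus a borrowed slice, so nothing is allocated at all. `Method`, `RequestLine` and `parse_request_line` are illustrative names, and the enum is reduced to two variants to keep the example short.

```rust
/// Hypothetical, reduced method enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Method {
    Get,
    Post,
}

/// Request line that borrows from the input instead of owning `String`s.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RequestLine<'a> {
    pub method: Method,
    /// Slice of the original input; no copy is made.
    pub target: &'a str,
}

/// Parses "METHOD target ..." without allocating.
pub fn parse_request_line(line: &str) -> Option<RequestLine<'_>> {
    let mut parts = line.split_whitespace();
    // Matching on the token reads it once and reduces it to a small value.
    let method = match parts.next()? {
        "GET" => Method::Get,
        "POST" => Method::Post,
        _ => return None,
    };
    let target = parts.next()?;
    Some(RequestLine { method, target })
}
```

The trade-off is the lifetime parameter: the parsed value cannot outlive the buffer it was parsed from.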

And for which fields do I need to do this, then?

In fact, storing the HTTP version does not matter, since the parser is tailored to HTTP/1.1 anyway.

What about the idea of using SocketAddr instead of String for host? I think this would be much better than passing a String.
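One thing to keep in mind with SocketAddr: a Host header may carry a host name rather than an IP address, and calling `to_socket_addrs` on such a string resolves the name, which can block. A minimal sketch that only accepts a literal `ip:port` and never touches the resolver could look like this; `host_as_socket_addr` is an invented helper name.

```rust
use std::net::SocketAddr;

/// Returns the address from a "Host: ip:port" line, or `None` if the line is
/// not a Host header or does not contain a literal IP address and port.
pub fn host_as_socket_addr(host_line: &str) -> Option<SocketAddr> {
    host_line
        .strip_prefix("Host: ")?
        // Trailing spaces would otherwise make the parse fail.
        .trim()
        // `parse` only accepts literal addresses; it performs no DNS lookup.
        .parse::<SocketAddr>()
        .ok()
}
```

With this approach a request for `Host: example.com:80` ends up with `None`, so whether SocketAddr is the right type depends on whether host names need to be supported.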