after hitting esc, the cursor is "ungrabbed", but the cursor is still over the window
This sounds like you don't want an actual grab: a cursor grab either confines the cursor to the window region (CursorGrabMode::Confined, not supported on web) or locks it to a single position within the window (CursorGrabMode::Locked). I'm not certain what you actually want; is it just to hide the cursor and keep the active window focused?
@zeroexcuses is not describing a desired behavior, but the inherent and mandatory behavior of web browsers; they allow pointer lock to be canceled by the user unconditionally, so that a page cannot prevent the user from using the mouse elsewhere.
This should be possible just with the winit focus events, I would think. If the game canvas is focused, (attempt to lock and hide the cursor and) allow the game to be controlled, and when the game canvas is unfocused, (potentially pause the game and) don't attempt to capture any inputs (other than to refocus).
As a user, if focus and input capture get out of sync, it's typically surprising and prompts a "why did they do it this way?" reaction. Though whether my user idea of focus matches the DOM focus model (and what winit reports), I don't know... I would need to test both the typical flow and some edge cases before being properly confident in how the various state events interact with user intent.
Focus is a separate state from pointer lock. I think you're imagining a world where the application can be fully in charge of (in winit terminology) the cursor grab mode, but that's not true on the web, because the platform has to provide users ways to recover from malicious code trying to hog the user input.
when the game canvas is unfocused, (potentially pause the game and) don't attempt to capture any inputs (other than to refocus).
This is not sufficient, because pointer lock can be canceled at any time by user action (escape key), and that action is not also a focus-changing action; the application must deal with that, usually by exiting the mode where it wants pointer lock (which might include pausing).
Canceling pointer lock isn't a focus loss because if it were, where should the focus be instead? The user intent is unspecified, so focus stays where it is until moved by click or tab.
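To make the distinction concrete, here is a minimal sketch of an application that tracks focus and pointer lock as two independent pieces of state. It assumes the platform layer delivers focus changes and pointer-lock changes as separate events; `InputEvent`, `Mode`, and `Game` are hypothetical names for illustration, not winit or Web API types.

```rust
/// Hypothetical events; these names are assumptions, not winit's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InputEvent {
    /// The canvas gained (true) or lost (false) focus.
    Focused(bool),
    /// The platform reports that pointer lock was acquired (true) or released (false).
    PointerLockChanged(bool),
    /// The user asked to resume, e.g. by clicking the canvas.
    ResumeRequested,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Paused,
    Playing,
}

#[derive(Debug)]
struct Game {
    focused: bool,
    pointer_locked: bool,
    /// Set when the app should ask the platform for pointer lock.
    lock_requested: bool,
    mode: Mode,
}

impl Game {
    fn new() -> Self {
        Game { focused: false, pointer_locked: false, lock_requested: false, mode: Mode::Paused }
    }

    fn handle(&mut self, event: InputEvent) {
        match event {
            InputEvent::Focused(focused) => {
                self.focused = focused;
                if !focused {
                    // Losing focus pauses the game and drops any pending lock request.
                    self.lock_requested = false;
                    self.mode = Mode::Paused;
                }
            }
            InputEvent::PointerLockChanged(locked) => {
                // The platform, not the app, has the final say on pointer lock.
                self.pointer_locked = locked;
                self.lock_requested = false;
                self.mode = if locked && self.focused { Mode::Playing } else { Mode::Paused };
            }
            InputEvent::ResumeRequested => {
                // Only ask for the lock in response to a user gesture while focused.
                if self.focused {
                    self.lock_requested = true;
                }
            }
        }
    }
}

fn main() {
    let mut game = Game::new();
    game.handle(InputEvent::Focused(true));
    game.handle(InputEvent::ResumeRequested);
    game.handle(InputEvent::PointerLockChanged(true));
    println!("after lock: {:?}", game.mode);

    // The user presses escape: the lock is released, but no focus event arrives.
    game.handle(InputEvent::PointerLockChanged(false));
    println!("after escape: {:?}, focused = {}", game.mode, game.focused);
}
```

The point of the sketch is that the transition back to `Paused` is driven by the pointer-lock event alone; waiting for a focus event in that situation would leave the game running without the lock.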
Maybe I'm misremembering how tab-focus works, but I'm expecting ESC to defocus the focused element, e.g. swapping my arrow keys from manipulating the input (if relevant) back to manipulating the page and removing the element focus styling, with a following TAB still selecting the next element from wherever the navigation focus was prior.
Thinking more about it, though, there is a difference between "navigation focus" and "edit focus" for e.g. text areas, and moving from the latter to the former doesn't remove focus styling.
I do use keyboard navigation some, but not extensively, obviously. Maybe it's a bit naive, but I'd personally expect winit's focus event to be the "edit focus" layer, since that's the better analogue for OS window focus than "navigation focus" and better matches the behavior of native elements, IMHO.
The analogue to "navigation focus" on the desktop, inasmuch as it exists, would be selecting a window in ALT-TAB but before releasing ALT and giving the window "edit focus". I don't know if any desktop window manager gives an event for that, tbh.
Browsers and platforms do have various approaches to keyboard navigation of the page, and I can't say what all of them are, but as a fact about “Web APIs”: only a single concept of “focus” is exposed to the page scripting. If in some cases there are two kinds as you describe, the page can't tell.
And, as far as I know, the pointer lock cancel action (which also cancels fullscreen, for similar reasons) doesn't change that single focus; at least, it doesn't in my current environment (Chrome, macOS), so that's an existence proof that pages cannot assume that focus events will fire in that case.