Possible to create a full screen overlay that catches clicks, but does not come into focus?

Hi, conceptual question here whether this can be pulled off in Rust.

Background

I'm interested in creating assistive technology for language learning. For this purpose I want to create an overlay of the size of the whole screen that draws something on top, like an overlay on top of your monitor.

Coming from a Python background and beginner in Rust, but interested in diving deeper into Rust through a project of interest.

Technical requirements

  • Mouse click coordinates are picked up and possibly trigger some action depending on where the click lands.
  • The mouse clicks are still forwarded to the underlying application (e.g. browser, text editor, etc), so your normal computer usage is not impacted as the overlay will never be in focus (my Rust program won't interact with applications below the overlay).
  • Works on either Linux (Ubuntu) or Windows (potentially Android in the future)

Why Rust

I want to use a Deep Neural Network trained in Python for some analysis. Python is hard to package, so this model can be exported to libtorch (C++). However, I'd rather dive into Rust than C++ out of personal interest. tch-rs allows for such exported models to be used in Rust.

Is a screen overlay possible?

  • I was wondering if this is doable in Rust, either natively compiled for Windows/Linux with a UI library like egui, or via the WASM road?
  • If so, any pointers where to get started are highly appreciated!

I can't speak to other platforms, but I can talk about how I'd approach this on Windows, which is likely to be similar on other platforms.

The parts you need are:

  • Creating a transparent window for the overlay
  • Drawing transparent content into said window (most apis will fill the entire background as a side effect!)
  • Overriding the window hit-test to report the overlay as being entirely hit-transparent, so underlying windows get mouse events
  • Hooking raw mouse events, to be able to still handle mouse events when your window is entirely hit-transparent.
  • Dispatching said hooked window events into whatever widget/UI library is drawing your UI.
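
As a rough illustration of the hit-test override in the list above, here is a minimal sketch of the decision a window procedure would make. The constant values are copied from the Windows headers; `overlay_hit_test` is a hypothetical helper name, not part of any crate. One caveat worth knowing: as far as the Win32 documentation goes, HTTRANSPARENT only forwards the hit test to other windows on the same thread, so on its own it is unlikely to pass clicks to windows owned by other processes.

```rust
// Win32 constants, values copied from the Windows headers.
const WM_NCHITTEST: u32 = 0x0084;
const HTTRANSPARENT: isize = -1;

/// Hypothetical helper for a window procedure: returns `Some(result)` when
/// the overlay answers the message itself, or `None` when the caller should
/// fall back to `DefWindowProcW`.
fn overlay_hit_test(msg: u32) -> Option<isize> {
    match msg {
        // Report every point of the overlay as "not mine".
        WM_NCHITTEST => Some(HTTRANSPARENT),
        // Everything else gets the default handling.
        _ => None,
    }
}
```

A real window procedure would call something like this first and only defer to the default handler when it returns `None`.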

So maybe you can see that some of this depends on why you need an overlay. For example, if you just need to track events over the whole screen, but only draw regular opaque controls that block the mouse like usual, then you don't actually need the transparent window at all, just the hook and a regular window!

In practice, actually implementing this is quite tricky, using Rust or not. It's going to be a ton of fiddly code and most libraries out there (windowing, rendering or gui) will not work, so you might as well use a nice language like Rust.

If you want to draw to the overlay and still click through, you will need to use WS_EX_LAYERED | WS_EX_TRANSPARENT - the former alone lets you create partially transparent windows "officially", and either set a color key to be transparent or use an alpha channel, depending on what you call SetLayeredWindowAttributes() with. Either way, this lets opaque parts of the window (or partially transparent with alpha) receive mouse events, and everything else falls through to the window behind. With WS_EX_TRANSPARENT as well, all mouse events fall through to the window behind, and you never get the event yourself!
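
To make the flag combination concrete, here is a small sketch of how the extended style could be assembled before passing it as the `dwExStyle` argument of `CreateWindowExW`. The constant values come from the Windows headers. Adding `WS_EX_TOPMOST` and `WS_EX_NOACTIVATE` is my assumption based on the "always on top, never in focus" requirement in the question, and the function names are hypothetical.

```rust
// Extended window styles, values copied from the Windows headers.
const WS_EX_TOPMOST: u32 = 0x0000_0008;
const WS_EX_TRANSPARENT: u32 = 0x0000_0020;
const WS_EX_LAYERED: u32 = 0x0008_0000;
const WS_EX_NOACTIVATE: u32 = 0x0800_0000;

// Flags for SetLayeredWindowAttributes.
const LWA_COLORKEY: u32 = 0x1;
const LWA_ALPHA: u32 = 0x2;

/// Hypothetical helper: builds the extended style for the overlay.
/// TOPMOST and NOACTIVATE are assumptions, not something the post requires.
fn overlay_ex_style(click_through: bool) -> u32 {
    let mut style = WS_EX_LAYERED | WS_EX_TOPMOST | WS_EX_NOACTIVATE;
    if click_through {
        // With this bit set as well, all mouse events fall through.
        style |= WS_EX_TRANSPARENT;
    }
    style
}

/// True when both bits needed for full click-through are present.
fn is_click_through(ex_style: u32) -> bool {
    let both = WS_EX_LAYERED | WS_EX_TRANSPARENT;
    (ex_style & both) == both
}

/// Picks the SetLayeredWindowAttributes flag: color key or constant alpha.
fn layered_flag(use_color_key: bool) -> u32 {
    if use_color_key { LWA_COLORKEY } else { LWA_ALPHA }
}
```

Keeping `click_through` as a parameter makes it easy to compare both behaviours while experimenting.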

However this only gets you a transparent window. You still need to receive mouse input and draw transparently to the window, ideally with a windowing library.

To hook mouse input even for other windows, you have two options:

  • Use SetWindowsHookExW() with WH_MOUSE_LL, which intercepts mouse events before they reach the GetMessage() call of whichever process is about to receive them: the system switches context to your app, calls the hook, then returns to the other app. You should be very careful to process the event as quickly as possible: ideally post it to a queue to handle later.
  • RegisterRawInputDevices() with usUsagePage: HID_USAGE_PAGE_GENERIC, usUsage: HID_USAGE_GENERIC_MOUSE, dwFlags: RIDEV_INPUTSINK, hwndTarget: hwnd, which will deliver WM_INPUT messages to hwnd that you will need to decode - this is great if you want to know exactly what happened, as the events are completely raw, but terrible if you want to know where the cursor is, as the events are completely raw. You get the location as either a delta from the last event, or a normalized 0..65535 x and y on the screen (e.g. for pen/touch input?). It also doesn't apply mouse acceleration. This is terrible for a UI trying to match the cursor, but it's great for game input!
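
To show why raw input is awkward for tracking the cursor, here is a minimal sketch of decoding the motion part of a WM_INPUT mouse event. `RawMotion` is a hypothetical struct mirroring only the `usFlags`, `lLastX` and `lLastY` fields of `RAWMOUSE`; `MOUSE_MOVE_ABSOLUTE` has its value from the Windows headers. The screen size is passed in as an assumption rather than queried from the system.

```rust
// Flag from the Windows headers: set when lLastX/lLastY are absolute,
// normalized 0..=65535 coordinates instead of deltas.
const MOUSE_MOVE_ABSOLUTE: u16 = 0x01;

/// Hypothetical mirror of the motion-related fields of RAWMOUSE.
struct RawMotion {
    flags: u16,
    last_x: i32,
    last_y: i32,
}

#[derive(Debug, PartialEq)]
struct Cursor {
    x: i32,
    y: i32,
}

/// Updates our own estimate of the cursor position from one raw event.
fn apply_motion(cursor: &mut Cursor, m: &RawMotion, screen_w: i32, screen_h: i32) {
    if (m.flags & MOUSE_MOVE_ABSOLUTE) != 0 {
        // Absolute: scale the normalized value to pixels.
        cursor.x = (m.last_x as i64 * screen_w as i64 / 65535) as i32;
        cursor.y = (m.last_y as i64 * screen_h as i64 / 65535) as i32;
    } else {
        // Relative: raw deltas, with no pointer acceleration applied.
        cursor.x += m.last_x;
        cursor.y += m.last_y;
    }
    // Keep the estimate on screen.
    cursor.x = cursor.x.clamp(0, screen_w - 1);
    cursor.y = cursor.y.clamp(0, screen_h - 1);
}
```

Because the relative branch ignores acceleration, the estimate will drift away from the real cursor over time, which is the problem described above.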

To draw transparently to the window, in my experience, you have to use Direct3D or a layering API like Direct2D; GDI, OpenGL and Vulkan all can't write the alpha channel (fairly arbitrarily: they all understand alpha).
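
One detail that tends to bite when writing the alpha channel yourself: composited surfaces commonly expect premultiplied alpha (Direct2D calls this D2D1_ALPHA_MODE_PREMULTIPLIED). Assuming that is the mode in use, and assuming a BGRA byte order, a sketch of converting one straight-alpha pixel could look like this. `premultiply_bgra` is a hypothetical helper, not an API of any library.

```rust
/// Converts a straight-alpha RGBA colour into premultiplied BGRA bytes.
/// Byte order and alpha mode are assumptions about the target surface.
fn premultiply_bgra(r: u8, g: u8, b: u8, a: u8) -> [u8; 4] {
    // Scale a channel by alpha/255, rounding to nearest.
    let mul = |c: u8| ((c as u16 * a as u16 + 127) / 255) as u8;
    [mul(b), mul(g), mul(r), a]
}
```

A fully transparent pixel becomes all zeroes, which is what leaves the rest of the overlay invisible.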

egui would in theory work, but you'd have to do something like this and replace the winit platform to adapt the backend to use your custom window: egui_example/main.rs at master · hasenbanck/egui_example · GitHub

I've got some of this going, but no actual rendered output just yet. Hopefully this gets you at least part way there for now, and I'll probably keep poking at this over the weekend.


@simonbuchan Thank you so much for this detailed breakdown!
I wanted to get an impression of how easy/difficult this was going to be, and now I have a better grasp of that. Will dive a bit more into Rust first before attempting this.

I wanted a reason to really learn Rust, so glad to know it's a similar effort for many languages.

This is good to know. Would see myself struggling here to find out why it's not working.

This looks tricky in case I'm catching the mouse event of a full screen app (e.g. game). My guess is that it might minimize the app because the context is switched?

Where does the number 65535 come from? I could understand it being normalized 0..1.

:upside_down_face:

It was really helpful! I'm not in a hurry and it's more like, I want to do this someday, so no need to rush for me. If you figure something out though, please share it back here!

No, just thread context. Windows calls that foreground/focus switching "window activation". What I meant here was that any code you write in the hook is going to slow down every application. Try to make sure you capture what you need and return as quickly as possible, e.g. push to a local event queue and request an update, rather than redrawing immediately.
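
A minimal sketch of that "capture and return" pattern, using a standard library channel as the local event queue. `on_hooked_click` stands in for the body of the hook callback and `drain_clicks` for the UI side; both names, and the `Click` type, are hypothetical. How the sender reaches the real callback (for example through a static) is left out here.

```rust
use std::sync::mpsc::{Receiver, Sender};

#[derive(Debug, Clone, Copy, PartialEq)]
struct Click {
    x: i32,
    y: i32,
}

/// Stand-in for the hook callback body: record the event and return at once.
fn on_hooked_click(tx: &Sender<Click>, x: i32, y: i32) {
    // Ignore the error if the receiving side has already shut down.
    let _ = tx.send(Click { x, y });
}

/// Called from the UI loop: takes everything queued so far without blocking.
fn drain_clicks(rx: &Receiver<Click>) -> Vec<Click> {
    rx.try_iter().collect()
}
```

The pair would be created once with `std::sync::mpsc::channel()`, and the heavier work (hit-testing your own widgets, redrawing) then happens on the draining side.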

Keep in mind this is a low-level API from the late nineties or so, so they were trying to keep things cheap, and floating point was far slower then. 65535 is the largest unsigned 16-bit integer, 0xFFFF, so the range is u16::MIN to u16::MAX.
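
If you would rather work with 0..1 on your side, the conversion is a single division. A tiny sketch, with `to_unit` as a hypothetical helper name:

```rust
/// Maps a normalized 16-bit coordinate onto the 0.0..=1.0 range.
fn to_unit(v: u16) -> f32 {
    f32::from(v) / f32::from(u16::MAX)
}
```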