Data prefetching algorithms

Say I have two "threads"/tasks/jobs/whatever.

An IO bound (gets data from a DB) and a CPU bound (crunches the data).

I want the IO thread to prefetch data for the CPU thread to work on.

Ideally it fetches no more than the CPU is capable of crunching, but always has enough in the reserve so that the CPU never has to wait for an IO fetch to complete.

I'm sure this must be a common problem, with theoretical solutions/algorithms, but I don't know the name of it, or couldn't find any information about it.

One strategy is to use a channel with backpressue.

Which will work... the IO task will fill up some statically determined buffer limit and then wait before progressing. But it can not adjust in realtime to the demands of the CPU task.

I wonder if there is an known dynamic algorithm (or library) to solve this problem, or if there is an optimal mathematical algorithm for determining the buffer size... I.e. i'm sure it has to do something with the average amount of time it takes the CPU task to crunch 1 unit, and the average amount of time it takes the IO to fetch 1 unit... but i'm a little too dumb to see the answer :smirk:

Any advice?

Thanks!

It sounds like you want a feedback control algorithm that does something like this:

  1. Prefetcher guesses how much database rows the worker can handle per second, and fetches this amount (say X rows / sec), repeatedly putting this many rows in a variable-size buffer.

  2. Worker starts processing data.

  3. Prefetcher monitors the average size of the task buffer and the idle time of the worker thread (idle time might have to be communicated by the worker thread over a message channel). Ideally, both numbers should be as close to 0 as possible.

  4. If Idle Time >> 0, i.e. Task Buffer queue length is often 0, then increase X, e.g. by adding a fixed amount, X = X + 1.

  5. If Idle Time == 0 and Task Buffer queue length >> 0, then decrease X, e.g. X = X / 2.

This is similar in principle to the AIMD algorithm used in TCP congestion control to control data transfer speeds over the Internet.

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.