In the code below I try to use shift(1).over("id")
to shift (lag) the values of a series by one row. Without over it seems to work fine (see lag_val column), but when I add over (which I need because each column contains more than one id), I get 2 lags instead of 1 (see lag_1_val). If I use shift(2), I get 3 lags instead of 2 (lag_2_val). Could someone please point out what I am doing wrong?
df: shape: (12, 5)
┌─────┬─────┬─────────┬───────────┬───────────┐
│ id ┆ val ┆ lag_val ┆ lag_1_val ┆ lag_2_val │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i32 ┆ i32 ┆ i32 ┆ i32 │
╞═════╪═════╪═════════╪═══════════╪═══════════╡
│ a ┆ 1 ┆ null ┆ null ┆ null │
│ a ┆ 2 ┆ 1 ┆ null ┆ null │
│ a ┆ 3 ┆ 2 ┆ 2 ┆ null │
│ a ┆ 4 ┆ 3 ┆ 3 ┆ 2 │
│ … ┆ … ┆ … ┆ … ┆ … │
│ c ┆ 1 ┆ 14 ┆ 13 ┆ 11 │
│ c ┆ 2 ┆ 1 ┆ 2 ┆ 13 │
│ c ┆ 3 ┆ 2 ┆ null ┆ null │
│ c ┆ 4 ┆ 3 ┆ null ┆ null │
└─────┴─────┴─────────┴───────────┴───────────┘
/*
[dependencies]
polars = { version = "*", features = ["lazy"] }
*/
use polars::prelude::*;
fn main() {
let df = df!(
"id" => &["a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "c"],
"val" => &[1, 2, 3, 4, 11, 12, 13, 14, 1, 2, 3, 4]);
let d = df.unwrap().clone().lazy()
.with_columns([
col("val").shift(1).prefix("lag_"),
col("val").shift(1).over("id").prefix("lag_1_"),
col("val").shift(2).over("id").prefix("lag_2_"),
])
.collect()
.unwrap();
}