# Aggregating windows into arrays in pandas DataFrame

I have a DataFrame like so:

```
df = pd.DataFrame({"a": [1,2,3], "b": [4,5,6], "c": [7,8,9]})
a | b | c
1 | 4 | 7
2 | 5 | 8
3 | 6 | 9
```

And I would like to get one like so:

```
a | b | c
[1,2] | [4,5] | [7,8]
[2,3] | [5,6] | [8,9]
```

So I tried the most obvious thing: `df.rolling(2).apply(lambda values: np.array(values))`, which unfortunately does not work because `rolling().apply` strictly expects a scalar (float) as the return type.

So I played around with comprehensions:

```
window = 2
df = pd.DataFrame({"a": [1,2,3], "b": [4,5,6], "c": [7,8,9]})
df = pd.DataFrame({column:[df[column].iloc[i-window:i].values for i in range(window, len(df)+1)] for column in df})
```

This gives the correct result, but it looks ugly and is really slow. It also loses the index type, which used to be a date (now int). Is there a better, cleaner way?


### 1 Answer

Using the `get_sliding_window` function defined below, you could do something like this:

```
import pandas as pd
from numpy.lib.stride_tricks import as_strided as strided


def get_sliding_window(df, W, return2D=0):
    a = df.values
    s0, s1 = a.strides
    m, n = a.shape
    # Each window is a strided view of W consecutive rows of `a`.
    out = strided(a, shape=(m - W + 1, W, n), strides=(s0, s0, s1))
    if return2D == 1:
        # Flatten each window into a single row.
        return out.reshape(a.shape[0] - W + 1, -1)
    else:
        return out


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
# Transpose each (W, n) window so every column gets its own tuple of W values.
result = pd.DataFrame(data=[list(zip(*r)) for r in get_sliding_window(df, 2)], columns=df.columns.values)
print(result)
```

**Output**

```
        a       b       c
0  (1, 2)  (4, 5)  (7, 8)
1  (2, 3)  (5, 6)  (8, 9)
```
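To make the intermediate step easier to follow, here is a minimal sketch of what the strided view looks like before it is turned into tuples. It rebuilds the same values as a plain NumPy array instead of going through the DataFrame, and it passes `writeable=False` (an addition of mine, not part of the function above) so the overlapping view cannot be modified by accident.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Same values as df.values for the example frame (rows are observations).
a = np.array([[1, 4, 7],
              [2, 5, 8],
              [3, 6, 9]])
W = 2  # window length

s0, s1 = a.strides
m, n = a.shape

# Axis 0 walks over windows, axis 1 over rows inside a window, axis 2 over columns.
# Reusing s0 for the first two axes is what makes consecutive windows overlap.
windows = as_strided(a, shape=(m - W + 1, W, n), strides=(s0, s0, s1), writeable=False)

print(windows.shape)          # (2, 2, 3)
print(windows[0].T.tolist())  # [[1, 2], [4, 5], [7, 8]]
```

Each `windows[i]` has shape `(W, n)`, so transposing it yields one length-`W` sequence per column, which is exactly what ends up in each cell of the result.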

If the output must be a list, you could do the following:

```
def row(r):
    # Transpose the window and turn each column's values into a list.
    return list(map(list, zip(*r)))
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
result = pd.DataFrame(data=[row(r) for r in get_sliding_window(df, 2)], columns=df.columns.values)
print(result)
```

**Output**

```
        a       b       c
0  [1, 2]  [4, 5]  [7, 8]
1  [2, 3]  [5, 6]  [8, 9]
```

**UPDATE**

You could drop the usage of list, map and zip by directly leveraging numpy, like this:

```
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
result = pd.DataFrame(data=[r.T.tolist() for r in get_sliding_window(df, 2)], columns=df.columns.values)
print(result)
```
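The question also mentions that the date index gets lost. As a rough sketch, assuming NumPy 1.20 or later (where `sliding_window_view` is available), the windows can be built without hand-written strides, and each window can be labelled with the index value of its last row. The helper name `rolling_arrays` and the sample dates are made up for illustration.

```python
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def rolling_arrays(df, window):
    """Hypothetical helper: each cell holds the last `window` values of its column."""
    # Resulting shape is (rows - window + 1, n_columns, window);
    # sliding_window_view appends the window axis at the end.
    windows = sliding_window_view(df.to_numpy(), window, axis=0)
    data = {col: windows[:, j, :].tolist() for j, col in enumerate(df.columns)}
    # Keep the original index type by reusing the label of each window's last row.
    return pd.DataFrame(data, index=df.index[window - 1:])


# Sample dates are an assumption; the question only says the index was a date.
dates = pd.date_range("2021-01-01", periods=3, freq="D")
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}, index=dates)
result = rolling_arrays(df, 2)
print(result)
```

Labelling a window by its last row mirrors how `rolling()` aligns its results. If you would rather label by the first row, `df.index[:len(df) - window + 1]` should work instead.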

Daniel Mesejo
posted this
