# Map a function with non-overlapping window

I have a certain container in Python (say a `list`) with `N` elements: `x_1, x_2, ... x_N`. I want to map a certain function `func()` to the elements in group of `k`, i.e. using a window of size `k`. I want the window to be non-overlapping:

```
x_1, x_2, x_3, x_4
```

becomes:

```
func(x_1, x_2), func(x_3, x_4)
```

More concretely:

```python
l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

def func(values):
    return sum(values)

grouped_map(func, l, 2)
# [1, 5, 9, 13, 17]
```

Eventually, I would like to do this also with NumPy arrays (with NumPy-aware functions).

10

The problem can be solved for any iterable by first defining a `grouped()` function that just does the splitting:

```python
def grouped(items, size):
    iterators = [iter(items)] * size
    return zip(*iterators)

l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list(grouped(l, 2))
# [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
```
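Since `grouped()` only relies on the iterator protocol, it also works on lazy iterables such as generators, not just lists. A quick sketch (the generator of squares is just an illustrative example):

```python
def grouped(items, size):
    # zip() pulls from the same underlying iterator `size` times per tuple
    iterators = [iter(items)] * size
    return zip(*iterators)

# works on a generator, consuming it lazily
pairs = grouped((x * x for x in range(6)), 2)
list(pairs)
# [(0, 1), (4, 9), (16, 25)]
```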

Then, the regular `map()` can be used, e.g.:

```python
def avg(values):
    return sum(values) / len(values)

list(map(sum, grouped(l, 2)))
# [1, 5, 9, 13, 17]

list(map(avg, grouped(l, 2)))
# [0.5, 2.5, 4.5, 6.5, 8.5]
```

For NumPy arrays, one can still use the machinery from above, but there may be faster alternatives based on the actual function to be used.

If the function is vectorized or can otherwise accept `np.ndarray` inputs, one can use:

```python
import numpy as np

def grouped_map(func, arr, size):
    return func(*(arr[i::size] for i in range(size)))

arr = np.array(l)
grouped_map(lambda *x: avg(x), arr, 2)
# [0.5 2.5 4.5 6.5 8.5]
```

or, if the function supports an `axis` parameter:

```python
def grouped_map_axis(func, arr, size):
    return func(arr.reshape(-1, size), axis=1)

# `np.mean()` is the equivalent of `avg()`
grouped_map_axis(np.mean, arr, 2)
# [0.5 2.5 4.5 6.5 8.5]
```
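For completeness, the same helper also reproduces the sum-based example from the question (assuming `arr = np.array(l)` as above):

```python
import numpy as np

def grouped_map_axis(func, arr, size):
    # reshape into (n_groups, size) rows, then reduce along each row
    return func(arr.reshape(-1, size), axis=1)

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
grouped_map_axis(np.sum, arr, 2)
# [ 1  5  9 13 17]
```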

Timewise, the first approach is only competitive for a very small number of items, regardless of `size`. For larger inputs, the NumPy-based methods are much faster; if `size` is small, `grouped_map()` is fastest, while for larger `size`, `grouped_map_axis()` becomes faster.

For example, for `size == 5`:

*(benchmark plot not shown)*

while for `size == 100`:

*(benchmark plot not shown)*

The above graphs were produced with the code from here using:

```python
SIZE = 2  # or: SIZE = 100

def gen_input(n):
    return np.random.random(n * SIZE)

def equal_output(a, b):
    return np.all(np.isclose(a, b))

def my_grouped(arr):
    return np.array(list(map(avg, grouped(arr, SIZE))))

def my_grouped_map(arr):
    return grouped_map(lambda *x: avg(x), arr, SIZE)

def my_grouped_map_axis(arr):
    return grouped_map_axis(np.mean, arr, SIZE)

input_sizes = (5, 10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000)
funcs = my_grouped, my_grouped_map, my_grouped_map_axis

runtimes, input_sizes, labels, results = benchmark(
    funcs, gen_input=gen_input, equal_output=equal_output,
    input_sizes=input_sizes)

plot_benchmarks(runtimes, input_sizes, labels)
```

Obviously, one may want to get a more robust solution for when the number of items is not a multiple of `size`, e.g.:

```python
import itertools

import numpy as np

def grouped(items, size, truncate=False, fill=None):
    iterators = [iter(items)] * size
    if truncate:
        return zip(*iterators)
    else:
        return itertools.zip_longest(*iterators, fillvalue=fill)

def align_to_size(arr, size, truncate=False, fill=0):
    if len(arr) % size != 0:
        if truncate:
            arr = arr[:len(arr) // size * size]
        else:
            fill_arr = np.full(size - len(arr) % size, fill, dtype=arr.dtype)
            arr = np.concatenate([arr, fill_arr])
    return arr

def grouped_map(func, arr, size, truncate=False, fill=0):
    arr = align_to_size(arr, size, truncate, fill)
    return func(*(arr[i::size] for i in range(size)))

def grouped_map_axis(func, arr, size, truncate=False, fill=0):
    arr = align_to_size(arr, size, truncate, fill)
    return func(arr.reshape(-1, size), axis=1)
```
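As a quick check of the two leftover-handling modes, the robust `grouped()` either pads the last group with `fill` or drops it entirely when `truncate=True`:

```python
import itertools

def grouped(items, size, truncate=False, fill=None):
    # same as above: pad the last group, or drop it with truncate=True
    iterators = [iter(items)] * size
    if truncate:
        return zip(*iterators)
    return itertools.zip_longest(*iterators, fillvalue=fill)

l = [0, 1, 2, 3, 4]
list(grouped(l, 2))
# [(0, 1), (2, 3), (4, None)]
list(grouped(l, 2, truncate=True))
# [(0, 1), (2, 3)]
```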

Timewise, for `size = 100` and making sure to generate non-aligned inputs:

```python
def gen_input(n):
    return np.random.random(n * SIZE + 1)
```

*(benchmark plot not shown)*

For the NumPy-based solutions, one could also explore supporting an `axis` parameter, but this goes beyond the scope of this question / answer.
