Map a function with non-overlapping window

1134 views python
3

I have a certain container in Python (say a list) with N elements: x_1, x_2, ... x_N. I want to map a certain function func() to the elements in group of k, i.e. using a window of size k. I want the window to be non-overlapping:

x_1, x_2, x_3, x_4

becomes:

func(x_1, x_2), func(x_3, x_4)

More concretely:

l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


def func(values):
    return sum(values)


grouped_map(func, l, 2)
# [1, 5, 9, 13, 17]

Eventually, I would like to do this also with NumPy arrays (with NumPy-aware functions).

answered question
Add a Comment

1 Answer

10
norok2 0 Comments

The problem can be solved for any iterable by first defining a grouped() function that just does the splitting:

def grouped(items, size):
    iterators = [iter(items)] * size
    return zip(*iterators)


l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list(grouped(l, 2))
# [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]

Then, the regular map() can be used, e.g.:

def avg(values):
    return sum(values) / len(values)


list(map(sum, grouped(l, 2)))
# [1, 5, 9, 13, 17]

list(map(avg, grouped(l, 2)))
# [0.5, 2.5, 4.5, 6.5, 8.5]

For NumPy arrays, one can still use the machinery from above, but there may be faster alternatives based on the actual function to be used.

If the function is vectorized or can otherwise accept np.ndarray inputs, one can use:

def grouped_map(func, arr, size):
    return func(*(arr[i::size] for i in range(size))


arr = np.array(l)
grouped_map(lambda *x: avg(x), arr, 2)
# [0.5 2.5 4.5 6.5 8.5]

or, if the function supports an axis parameter:

def grouped_map_axis(func, arr, size):
    return func(arr.reshape(-1, size), axis=1)


# `np.mean()` is the equivalent of `avg()`
grouped_map_axis(np.mean, arr, 2)
# [0.5 2.5 4.5 6.5 8.5]

Timewise, rhe first approach is only competitive for a very small number of items, regardless of size. For larger inputs, the NumPy based methods are much faster, and if size is small, grouped_map() is fastest, while for larger size, grouped_map_axis() gets faster.

For example, for size == 5:

benchmark2

while for size == 100:

benchmark100


The above graphs were produce with the code from here using:

SIZE = 2  # or: SIZE = 100


def gen_input(n):
    return np.random.random(n * SIZE)


def equal_output(a, b):
    return np.all(np.isclose(a, b))


def my_grouped(arr):
    return np.array(list(map(avg, grouped(arr, SIZE))))


def my_grouped_map(arr):
    return grouped_map(lambda *x: avg(x), arr, SIZE)


def my_grouped_map_axis(arr):
    return grouped_map_axis(np.mean, arr, SIZE)


input_sizes = (5, 10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000)  
funcs = my_grouped, my_grouped_map, my_grouped_map_axis


runtimes, input_sizes, labels, results = benchmark(
    funcs, gen_input=gen_input, equal_output=equal_output,
    input_sizes=input_sizes)


plot_benchmarks(runtimes, input_sizes, labels)

Obviously, one may want to get a more robust solution for when the number of items is not a multiple of size, e.g.:

def grouped(items, size, truncate=False, fill=None):
    iterators = [iter(items)] * size
    if truncate:
        return zip(*iterators)
    else:
        return itertools.zip_longest(*iterators, fillvalue=fill)


def align_to_size(arr, size, truncate=False, fill=0):
    if len(arr) % size != 0:
        if truncate:
            arr = arr[:len(arr) // size * size]
        else:
            fill_arr = np.full(size - len(arr) % size, fill, dtype=arr.dtype)
            arr = np.concatenate([arr, fill_arr])
    return arr


def grouped_map(func, arr, size, truncate=False, fill=0):
    arr = align_to_size(arr, size, truncate, fill)
    return func(*(arr[i::size] for i in range(size)))


def grouped_map_axis(func, arr, size, truncate=False, fill=0):
    arr = align_to_size(arr, size, truncate, fill)
    return func(arr.reshape(-1, size), axis=1)

Timewise, for size = 100 and making sure to generate non-aligned inputs:

def gen_input(n):
    return np.random.random(n * SIZE + 1)

benchmark3


One could possibly explore, for NumPy solutions, to support an axis parameter, but this goes beyond the scope of this question / answer.

posted this

Please login first before posting an answer.