# Python Pandas Group Recursion

2244 views
-1

The question I have is closely related to this post. Assume I have the following dataset:

``````df = pd.DataFrame({"A":range(1,10), "B":range(5,14), "Group":
[1,1,2,2,2,2,3,3,3],"C":[0,0,10,0,0,16,0,0,22], "last":[0,1,0,0,0,1,0,0,1],
"Want": [19.25,8,91.6,71.05,45.85,16,104.95,65.8,22]})
``````

The last observation for the group is straight forward. This is how the code looks like:

``````def calculate(df):
if (df.last == 1):
value = df.loc["A"] + df.loc["B"]
else:
``````

for all other observation PER GROUP, the row value is calculated as follows:

``````value = (df.loc[i-1, "C"] + 3 * df.loc[i, "A"] + 1.65 * df.loc[i, "B"])
return value
``````

To further clarify, these are the formulas for calculating the Want column for Group 2 using excel: F4="F5+(3*A4)+(1.65*B4)", F5="F6+(3*A5)+(1.65*B5)", F6="F7+(3*A6)+(1.65*B6)", F7="A7+B7". There's some kind of "recursive" nature to it, which is why I thought of the "for loop"

I would really appreciate a solution where it's consistent with the first if statement. That is

``````value = something
``````

rather than the function returning a data frame or something like that, so that I can call the function using the following

``````df["value"] = df.apply(calculate, axis=1)
``````

12

You don't need `apply` here. Usually, `apply` is very slow and you'll want to avoid that.

Problems with this recursive characteristic, however, are usually hard to vectorize. Thankfully, yours can be solved using `cumsum` and `np.where`

``````df['Want'] = np.where(df['last'] == 1, df['A'] + df['B'], 3*df['A'] + 1.65*df['B'])
df['Want'] = df[::-1].groupby('Group')['Want'].cumsum()
``````

posted this