How to calculate averages in a faster way

5010 views python
3

Here is a snippet of my dataset: Data snippet

I tried to embed a picture of my data here, but I think I'm not allowed to do that yet.

Each movieId has several rows, with different ratings given by different userIds. I want to get the average rating for each movieId.

Here is the approach I tried:

import pandas as pd

rat_1 = pd.DataFrame()

for i in range(0, len(k)):  # k is a list containing all the unique movieIds
    # Take the subset of the original dataframe containing only the rows
    # of the specified movieId (copied so the assignment below is safe)
    rat_2 = rating[rating['movieId'] == k[i]].copy()

    rat_2['rating'] = sum(rat_2['rating']) / len(rat_2)  # Calculating average rating

    rat_1 = pd.concat([rat_1, rat_2])  # Appending the subset dataframe to a new dataframe

However, the file is fairly big (about 660 MB), so the code takes too long to execute. Is there a faster way to do this?
Thank you in advance!
P.S. This is the first time I'm posting a question here, so I apologize if my question is not clear enough.

Welcome to Stack Overflow. Please post your code directly into your question rather than using images.

@Chris, okay, I'll keep that in mind for next time.

1 Answer

9

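You should use groupby and mean. This computes the average for every movieId in one grouped operation instead of filtering and concatenating inside a Python loop (df here is your ratings dataframe, called rating in the question).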

df.groupby("movieId")['rating'].mean()
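As a minimal sketch of how this could look end to end: the toy dataframe below is made up for illustration and only assumes the column names from the question (userId, movieId, rating). The first result has one row per movieId. The second uses transform, which is one way to get the same kind of output as the loop in the question, where every original row carries its movie's average.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's `rating` dataframe
rating = pd.DataFrame({
    "userId":  [1, 2, 3, 1, 2],
    "movieId": [10, 10, 10, 20, 20],
    "rating":  [4.0, 5.0, 3.0, 2.0, 3.0],
})

# One row per movieId: the mean rating of each movie
avg_by_movie = rating.groupby("movieId")["rating"].mean()

# Same number of rows as the original frame: each row's rating is replaced
# by the mean rating of its movie, similar to what the loop builds in rat_1
rat_1 = rating.copy()
rat_1["rating"] = rating.groupby("movieId")["rating"].transform("mean")

print(avg_by_movie)
print(rat_1)
```

Note that transform keeps the original row order, whereas the loop in the question orders rows by the sequence of movieIds in k, so the two results may differ in row order even though the values match.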
