Here is a snippet of my dataset: Data snippet
I tried to embed a picture of my data here, but I don't think I'm allowed to do that yet.
There are several rows for each movieId, with different ratings given by different userIds. I want to get the average rating for each movieId.
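In pandas, "average rating per movieId" maps directly onto a grouped mean. Below is a minimal sketch. It assumes pandas is installed and uses a tiny hypothetical sample (the values are made up) with the `userId`, `movieId` and `rating` columns described above. It produces one row per movie:

```python
import pandas as pd

# Hypothetical sample in the same shape as the ratings file (values are made up)
rating = pd.DataFrame({
    'userId':  [1, 2, 3, 1, 2],
    'movieId': [10, 10, 10, 20, 20],
    'rating':  [4.0, 3.0, 5.0, 2.0, 3.0],
})

# One row per movieId holding the mean of all its ratings
avg_rating = rating.groupby('movieId', as_index=False)['rating'].mean()
print(avg_rating)
```

`groupby` splits the table once and aggregates every group in compiled code, so it avoids filtering the whole table again for each movie.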
Here is the approach I tried:
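```python
import pandas as pd

# `rating` is the DataFrame loaded from the ratings file
k = rating['movieId'].unique()  # all the unique movieIds

rat_1 = pd.DataFrame()
for i in range(0, len(k)):
    # Take a subset of the original DataFrame containing rows only of the specified movieId;
    # .copy() avoids pandas' SettingWithCopyWarning when the subset is modified below
    rat_2 = rating[rating['movieId'] == k[i]].copy()
    # Calculate the average rating and write it into every row of the subset
    rat_2['rating'] = sum(rat_2['rating']) / len(rat_2)
    # Append the subset DataFrame to the new DataFrame
    rat_1 = pd.concat([rat_1, rat_2])
```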
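For comparison, the loop's exact output (every row kept, with its rating replaced by that movie's average) can be sketched with `groupby(...).transform('mean')`. This is a minimal sketch using the same hypothetical sample data as an assumption. The loop rescans the full table once per movie and re-copies the growing `rat_1` on every `concat`. The grouped version does a single pass instead:

```python
import pandas as pd

# Hypothetical sample in the same shape as the ratings file (values are made up)
rating = pd.DataFrame({
    'userId':  [1, 2, 3, 1, 2],
    'movieId': [10, 10, 10, 20, 20],
    'rating':  [4.0, 3.0, 5.0, 2.0, 3.0],
})

# transform('mean') returns a Series aligned to the original rows,
# so each row gets the average rating of its own movieId
rat_1 = rating.copy()
rat_1['rating'] = rating.groupby('movieId')['rating'].transform('mean')
print(rat_1)
```

One difference: `transform` keeps the original row order, whereas the loop groups rows by movieId in the order the movies first appear.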
However, the file is fairly big (about 660 MB), so the code takes too long to execute.
Is there a faster way to do this?
Thank you in advance!
P.S. This is the first time I'm posting a question here, so I apologize if my question is not clear enough.