There is a Pandas dataframe:

   userId  movieId  rating   timestamp
0       1       31     2.5  1260759144
1       1     1029     3.0  1260759179
2       1     1061     3.0  1260759182
3       1     1129     2.0  1260759185
4       1     1172     4.0  1260759205

Calculate the average lifetime of users. The lifetime is the difference between the maximum and minimum values of the timestamp column for a given userId.
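import pandas as pd

ratings = pd.read_csv('ratings.csv')

def aver_lifetime(data):
    # Convert Unix seconds to datetimes. Full precision is kept (no .dt.date),
    # so lifetimes shorter than a day are not rounded down to zero, and the
    # input frame is not modified.
    dates = pd.to_datetime(data['timestamp'], unit='s')
    # First and last rating time for each user
    lifetimes = dates.groupby(data['userId']).agg(['min', 'max'])
    # Lifetime = last rating time minus first rating time (a Timedelta)
    lifetimes['lifetime'] = lifetimes['max'] - lifetimes['min']
    # Average lifetime across all users
    return lifetimes['lifetime'].mean()

aver_lifetime(ratings)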
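
To sanity-check the definition without ratings.csv, here is a minimal sketch that works directly on the raw timestamp seconds. It uses the five sample rows from the question plus two made-up rows for a hypothetical second user (userId 2), so the average covers more than one user. The names sample and lifetime_seconds are illustrative only and do not come from the original code.

```python
import pandas as pd

# The five sample rows from the question, plus two hypothetical rows for
# userId 2 (invented here purely for illustration).
sample = pd.DataFrame({
    'userId':    [1, 1, 1, 1, 1, 2, 2],
    'movieId':   [31, 1029, 1061, 1129, 1172, 10, 17],
    'rating':    [2.5, 3.0, 3.0, 2.0, 4.0, 4.0, 5.0],
    'timestamp': [1260759144, 1260759179, 1260759182, 1260759185,
                  1260759205, 1260759300, 1260759400],
})

def lifetime_seconds(data):
    """Return each user's lifetime in seconds (max timestamp - min timestamp)."""
    per_user = data.groupby('userId')['timestamp'].agg(['min', 'max'])
    return per_user['max'] - per_user['min']

lifetimes = lifetime_seconds(sample)
print(lifetimes.loc[1])   # 61  (1260759205 - 1260759144)
print(lifetimes.loc[2])   # 100 (1260759400 - 1260759300)

# Average lifetime across users, in seconds
mean_seconds = lifetimes.mean()
print(mean_seconds)       # 80.5

# Optional: express the average as a Timedelta for readability
print(pd.to_timedelta(mean_seconds, unit='s'))
```

Working in plain seconds avoids any dtype surprises; converting to a Timedelta at the end is only for display. If the average is wanted in days, dividing mean_seconds by 86400 gives the same figure in that unit.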