There is a Pandas dataframe:

   userId  movieId  rating   timestamp
0       1       31     2.5  1260759144
1       1     1029     3.0  1260759179
2       1     1061     3.0  1260759182
3       1     1129     2.0  1260759185
4       1     1172     4.0  1260759205

Add an "av_ltv" column to the dataframe that holds the average lifetime of the users. A user's lifetime is the difference between the maximum and minimum values of the timestamp column for that userId.
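Below is a minimal sketch of one way to add the column. It assumes that "average lifetime" means the mean, taken over users, of each user's lifetime, so a user with many ratings is not weighted more heavily than one with few. That single value is then repeated on every row. The function name `add_av_ltv` is hypothetical. `groupby`, `max`, `min` and `mean` are standard pandas calls.

```python
import pandas as pd

# Sample data from the question (a single user, userId 1).
df = pd.DataFrame(
    data=[[1, 31, 2.5, 1260759144],
          [1, 1029, 3.0, 1260759179],
          [1, 1061, 3.0, 1260759182],
          [1, 1129, 2.0, 1260759185],
          [1, 1172, 4.0, 1260759205]],
    columns=['userId', 'movieId', 'rating', 'timestamp'],
)


def add_av_ltv(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with an 'av_ltv' column.

    Assumption: 'average lifetime' is the mean over users of each user's
    lifetime (max timestamp minus min timestamp), repeated on every row.
    """
    grouped = df.groupby('userId')['timestamp']
    # One lifetime per user, indexed by userId.
    lifetimes = grouped.max() - grouped.min()
    out = df.copy()
    # Average across users (not across rows), broadcast to every row.
    out['av_ltv'] = lifetimes.mean()
    return out


result = add_av_ltv(df)
print(result)
```

If each row should instead carry its own user's lifetime, `grouped.transform('max') - grouped.transform('min')` returns a Series aligned with the original rows.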
import pandas as pd

df = pd.DataFrame(data=[[1, 31, 2.5, 1260759144],
                        [1, 1029, 3.0, 1260759179],
                        [1, 1061, 3.0, 1260759182],
                        [1, 1129, 2.0, 1260759185],
                        [1, 1172, 4.0, 1260759205]],
                  columns=['userId', 'movieId', 'rating', 'timestamp'])

def average_lifetime(df):
    '''
    df: input dataframe
    '''
    # Per-user maximum and minimum of every column
    df_max = df.groupby(['userId']).max()
    df_min = df.groupby(['userId']).min()
    df_final = pd.merge(df_max, df_min, on=['userId'], suffixes=('_max', '_min'))
    # Lifetime of each user: latest minus earliest timestamp
    df_final['average_lifetime'] = df_final['timestamp_max'] - df_final['timestamp_min']
    return df_final

df = average
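The attempt above already computes a lifetime per user, but it returns the grouped min/max frame instead of adding a column to `df`. Here is a sketch that keeps the same groupby-then-merge idea and makes the same assumption about "average" as before: the mean over users. `add_lifetime_columns` and the intermediate `ltv` column are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[1, 31, 2.5, 1260759144],
          [1, 1029, 3.0, 1260759179],
          [1, 1061, 3.0, 1260759182],
          [1, 1129, 2.0, 1260759185],
          [1, 1172, 4.0, 1260759205]],
    columns=['userId', 'movieId', 'rating', 'timestamp'],
)


def add_lifetime_columns(df: pd.DataFrame) -> pd.DataFrame:
    # One row per user with the earliest and latest timestamp.
    bounds = df.groupby('userId')['timestamp'].agg(['min', 'max'])
    bounds['ltv'] = bounds['max'] - bounds['min']
    bounds = bounds.reset_index()  # make userId an ordinary column again
    # Attach each user's lifetime to that user's original rows.
    out = df.merge(bounds[['userId', 'ltv']], on='userId', how='left')
    # Mean lifetime over users (one value per user), repeated on every row.
    out['av_ltv'] = bounds['ltv'].mean()
    return out


result = add_lifetime_columns(df)
print(result)
```

With `how='left'`, every original row is kept in its original order. However, `merge` returns a fresh default index, so a non-default index on `df` would be lost. Mapping `userId` onto the per-user lifetimes with `Series.map` avoids that.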