I have a pandas DataFrame, df2:

        surgery  age  rectal_temp  pulse  respiratory_rate  extremities_temp  pain  outcome
    4       2.0    1          NaN    NaN               NaN               2.0   2.0      1.0
    6       1.0    1          NaN   60.0               NaN               3.0   NaN      2.0
    7       2.0    1          NaN   80.0              36.0               3.0   4.0      3.0
    15      1.0    9          NaN  128.0              36.0               3.0   4.0      2.0
    27      1.0    1          NaN    NaN               NaN               NaN   NaN      2.0
    33      1.0    1          NaN  100.0              30.0               3.0   5.0      1.0
    34      2.0    1          NaN  104.0              24.0               4.0   4.0      3.0
    39      1.0    1          NaN   88.0               NaN               3.0   5.0      2.0

Write a function that:
1) takes the column name as a parameter (default = 'rectal_temp')
2) for each row where this column is NaN, finds a row where it is not NaN and whose pulse or respiratory_rate value is similar (within 20%)
3) replaces the NaN with the value from the matching row
    import pandas as pd

    def my_f(df, col_name='rectal_temp', tol=0.2):
        # work on a copy so the caller's DataFrame is not modified
        out = df.copy()
        # candidate source rows: those where the column is not NaN
        donors = df[df[col_name].notna()]
        # visit each row where the column is NaN
        for idx in df.index[df[col_name].isna()]:
            similar = pd.Series(False, index=donors.index)
            for key in ('pulse', 'respiratory_rate'):
                # difference within tol of the NaN row's value; comparisons with NaN are False
                similar |= (donors[key] - df.at[idx, key]).abs() <= tol * abs(df.at[idx, key])
            # copy the value from the first similar row, if there is one
            if similar.any():
                out.at[idx, col_name] = donors.loc[similar, col_name].iloc[0]
        # return the frame with the NaNs filled where a match was found
        return out
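For larger frames, the matching rule from the question can also be written with NumPy broadcasting, comparing every NaN row against every non-NaN row in one step. The following is only a sketch under stated assumptions: the names `fill_from_similar`, `tol` and `demo` are hypothetical, the 20% is measured relative to the value in the row being filled, the first matching row in index order is used, and the index is assumed to be unique. The sample values are made up for illustration and are not taken from df2.

```python
import numpy as np
import pandas as pd


def fill_from_similar(df, col_name="rectal_temp", tol=0.2):
    """Fill NaNs in col_name from the first row whose pulse or
    respiratory_rate is within tol (relative) of the NaN row's value."""
    out = df.copy()
    missing = df[col_name].isna().to_numpy()
    targets = df.loc[missing]   # rows to fill
    donors = df.loc[~missing]   # rows that can supply a value
    if targets.empty or donors.empty:
        return out

    # similar[i, j] is True when donor j is close enough to target i
    similar = np.zeros((len(targets), len(donors)), dtype=bool)
    for key in ("pulse", "respiratory_rate"):
        t = targets[key].to_numpy(dtype=float)[:, None]  # shape (n_targets, 1)
        d = donors[key].to_numpy(dtype=float)[None, :]   # shape (1, n_donors)
        with np.errstate(invalid="ignore"):
            # any comparison involving NaN evaluates to False
            similar |= np.abs(d - t) <= tol * np.abs(t)

    has_match = similar.any(axis=1)
    first_match = similar.argmax(axis=1)  # position of the first True per target
    if has_match.any():
        donor_values = donors[col_name].to_numpy()
        out.loc[targets.index[has_match], col_name] = donor_values[first_match[has_match]]
    return out


# Hypothetical data: rows 0 and 1 have a known rectal_temp, rows 2-4 do not.
demo = pd.DataFrame(
    {
        "rectal_temp": [38.5, 37.2, np.nan, np.nan, np.nan],
        "pulse": [60.0, 100.0, 66.0, np.nan, 200.0],
        "respiratory_rate": [20.0, 40.0, np.nan, 36.0, np.nan],
    }
)
filled = fill_from_similar(demo)
print(filled["rectal_temp"].tolist())  # [38.5, 37.2, 38.5, 37.2, nan]
```

Row 2 is filled from row 0 (pulse 66 vs 60), row 3 from row 1 (respiratory_rate 36 vs 40), and row 4 stays NaN because no row has a pulse within 20% of 200. If the closest row is preferred over the first one, the boolean matrix could be replaced by the relative differences and reduced with an argmin instead.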