There is a pandas DataFrame:

```
       Last Updated   Installs
7479     2010-05-21     100000
7430     2011-01-30      50000
10282    2011-03-16     100000
8418     2011-04-11    5000000
8084     2011-04-16      50000
9067     2011-04-18      50000
5144     2011-05-12        100
7237     2011-06-23       1000
10460    2011-06-26       5000
1544     2011-06-29    1000000
7080     2011-07-10    5000000
8200     2011-09-20      50000
5561     2011-09-22    1000000
```

Write a function that adds a 'year' column containing only the year taken from the 'Last Updated' column, whose values are pandas `Timestamp` objects.
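```python
import pandas as pd

# Sample data; 'Last Updated' is stored as strings here,
# so each value is converted with pd.to_datetime below
data = [['2010-05-21', 100000], ['2011-01-30', 50000], ['2011-03-16', 100000],
        ['2011-04-11', 5000000], ['2011-04-16', 50000], ['2011-04-18', 50000],
        ['2011-05-12', 100], ['2011-06-23', 1000], ['2011-06-26', 5000],
        ['2011-06-29', 1000000], ['2011-07-10', 5000000], ['2011-09-20', 50000],
        ['2011-09-22', 1000000]]
df = pd.DataFrame(data, columns=['Last Updated', 'Installs'])

def date_to_year(a):
    # pd.to_datetime accepts both strings and Timestamp objects
    return pd.to_datetime(a).year

# Apply the function to every value of the 'Last Updated' column
df['year'] = df['Last Updated'].apply(date_to_year)
```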
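A vectorized alternative is to convert the whole column once and read the year through the `.dt` accessor instead of calling a Python function per row. This is a minimal sketch; the function name `add_year_column` and its parameters are hypothetical, and it assumes every value in the date column can be parsed by `pd.to_datetime` (values that are already `Timestamp` objects pass through unchanged).

```python
import pandas as pd

def add_year_column(df, date_col='Last Updated', year_col='year'):
    """Hypothetical helper: return a copy of df with a year column from date_col."""
    out = df.copy()
    # Convert the entire column in one call, then extract the year vectorized
    out[year_col] = pd.to_datetime(out[date_col]).dt.year
    return out

# Works whether 'Last Updated' holds strings or Timestamp objects
df = pd.DataFrame(
    [['2010-05-21', 100000], ['2011-01-30', 50000], ['2011-09-22', 1000000]],
    columns=['Last Updated', 'Installs'],
)
df = add_year_column(df)
print(df['year'].tolist())  # [2010, 2011, 2011]
```

Returning a copy keeps the input DataFrame unmodified; assign the result back (`df = add_year_column(df)`) to keep the new column.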