Get the current directory.
Read a dataframe called "global_dataframe" from a CSV file called "Sum Data.csv".
Group by the index "Volt" and get the mean of the indexes "NEVs" and "MWhT".
Generate a table of 1000 samples, each a sub-list composed of 10% of the rows of "global_dataframe" chosen randomly, and store the rows chosen.
Group each sample by the index "Volt" and get the mean of the indexes "NEVs" and "MWhT", as done for "global_dataframe".
Calculate the gap between each sample's grouped information and the grouped information of "global_dataframe", sorting the results from minimum to maximum absolute gap.
Report to a text file, delimited by ",": the absolute gap, the detailed gap, and the rows chosen for each sample.
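The gap step in the description is the least obvious one, so here is a minimal sketch of one way it could work. The task does not define "absolute gap", so this sketch assumes it is the sum of the absolute per-group, per-column differences. The helper names `grouped_means` and `sample_gap` and the small in-memory table are hypothetical stand-ins, not part of the original script or data.

```python
import numpy as np
import pandas as pd

# Tiny made-up table standing in for "Sum Data.csv" (illustrative values only).
global_dataframe = pd.DataFrame({
    "Volt": [110, 110, 220, 220],
    "NEVs": [1.0, 3.0, 10.0, 30.0],
    "MWhT": [2.0, 4.0, 20.0, 40.0],
})


def grouped_means(dataframe):
    # Mean of "NEVs" and "MWhT" for each "Volt" group.
    return dataframe.groupby("Volt")[["NEVs", "MWhT"]].mean()


def sample_gap(global_means, sample):
    # Detailed gap: sample mean minus global mean, per "Volt" group and column.
    # reindex() aligns the sample to the global groups; a group missing from
    # the sample becomes NaN and is skipped by the sums below.
    detailed = grouped_means(sample).reindex(global_means.index) - global_means
    # Absolute gap (assumed definition): sum of all absolute detailed gaps.
    absolute = float(detailed.abs().sum().sum())
    return absolute, detailed


global_means = grouped_means(global_dataframe)
rng = np.random.default_rng(0)

results = []
for _ in range(3):
    # Draw 2 of the 4 rows without replacement and remember which ones.
    rows = rng.choice(len(global_dataframe), size=2, replace=False)
    absolute, detailed = sample_gap(global_means, global_dataframe.iloc[rows])
    results.append((absolute, detailed, sorted(rows.tolist())))

# Order the samples from minimum to maximum absolute gap.
results.sort(key=lambda item: item[0])
```

Because NaN entries are skipped, a sample that misses a whole "Volt" group is not penalized for it under this definition. With 10% samples of a real file that case may need explicit handling.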
import os

import numpy as np
import pandas as pd


# get current directory (the folder that contains this script)
def get_current_dir():
    return os.path.dirname(os.path.abspath(__file__))


# read dataframe
def read_dataframe(file_name):
    return pd.read_csv(file_name)


# group by and get information
def group_by_get_info(dataframe, by_index, get_indexes):
    return dataframe.groupby(by_index)[get_indexes].mean()


# get random samples from dataframe function
def get_random_samples(dataframe, num_samples, sample_size):
    samples = []
    indexes_choosen = []
    for i in range(num_samples):
        # get random indexes (replace=False so no row is drawn twice in one sample)
        indexes = np.random.choice(dataframe.shape[0], size=sample_size, replace=False)
        # get random sample
        sample = dataframe.iloc[indexes, :]
        # store random sample
        samples.append(sample)
        #store