Profile | Programming Helper

All generations.

generate function | Tue, 21 Mar 2023

Add to this the code for generating a grid on the chart, as well as specifying the percentage difference directly on the chart bars plt.figure(figsize = (12, 4)) plt.bar(data.index, data['nutri'], color = 'darkblue', label = 'nutri') plt.bar(data.index, data['mineral'], color = 'brown', label = 'mineral')

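plt.figure(figsize=(12, 4))
plt.bar(data.index, data['nutri'], color='darkblue', label='nutri')
plt.bar(data.index, data['mineral'], color='brown', label='mineral')
# Percentage difference of 'mineral' relative to 'nutri', written above each pair of bars
diff_pct = (data['nutri'] - data['mineral']) / data['nutri'] * 100
tops = data[['nutri', 'mineral']].max(axis=1)
for x, y, pct in zip(data.index, tops, diff_pct):
    plt.text(x, y, f'{pct:.1f}%', ha='center', va='bottom')
plt.grid(axis='y')
plt.title('Comparison of Nutrient and Mineral Content')
plt.xlabel('Nutrient|Mineral')
plt.ylabel('Amount')
plt.legend()
plt.show()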

Python

generate function | Wed, 21 Dec 2022

Change the class so that self.seniority defaults to 1 and self.intlawards = 2 class Designer(Employee): def __init__(self, name, seniority, awards): super().__init__(name, seniority) self.intlawards = awards def check_if_it_is_time_for_upgrade(self): # for each accreditation, increase the counter by 1 # for now we assume that all designers are accredited self.seniority += 1 # condition for promoting an employee from the presentation if (self.seniority) + (self.intlawards*2) % 7 == 0: self.grade_up() # publication of the results return self.publish_grade()

class Designer(Employee):
    def __init__(self, name, seniority=1, awards=2):
        super().__init__(name, seniority)
        self.intlawards = awards

    def check_if_it_is_time_for_upgrade(self):
        # for each accreditation, increase the counter by 1
        # for now we assume that all designers are accredited
        self.seniority += 1
        # condition for promoting an employee from the presentation;
        # the sum is parenthesized so that % applies to the whole expression
        if (self.seniority + self.intlawards * 2) % 7 == 0:
            self.grade_up()
        # publication of the results
        return self.publish_grade()

Python

generate function | Mon, 26 Dec 2022

Write code that looks for a repeating student number and replaces it with "9090". lst = [ ["0001", "Antonov", "Anton", "Igorevich", "08/20/2009", "BST161"], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"], ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"], ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"], ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ]

def find_duplicates(lst):
    seen = set()
    for row in lst:
        if row[0] in seen:
            row[0] = "9090"  # repeated student number
        else:
            seen.add(row[0])
    return lst

find_duplicates(lst)

Python

generate function | Fri, 03 Feb 2023

There is a df_new Pandas dataframe: id route operator group_name district injury_result incident_type victim_category victim_age 2 3.0 1 London General Go-Ahead Southwark NaN Onboard Injuries 7 10 3 4.0 4 Metroline Metroline Islington NaN Onboard Injuries 7 2 4 5.0 5 East London Stagecoach Havering NaN Onboard Injuries 7 8 5 6.0 5 East London Stagecoach None London Borough NaN Onboard Injuries 7 8 Count which routes had the most incident_types with a value >= 5. Sort them in descending order by incident_type column

df_new['incident_type'].value_counts()[df_new['incident_type'].value_counts() >= 5].sort_values(ascending = False)

Python

generate function | Tue, 13 Dec 2022

A string with words and numbers separated by spaces (one space between words and/or numbers). The words consist only of letters. You need to check if there are three words in a row in the original string. For example, the line "start 5 one two three 7 end" has three words in a row. Input: A string with words (str). Output: Answer as a boolean expression (bool), True or False. Examples: checkio("Hello World hello") == True checkio("He is 123 man") == False checkio("1 2 3 4") == False checkio("bla bla bla bla") == True

def checkio(text):
    cnt = 0  # length of the current run of consecutive words
    for token in text.split():
        cnt = cnt + 1 if token.isalpha() else 0
        if cnt >= 3:
            return True
    return False

Python

generate function | Thu, 19 Jan 2023

There is a Pandas dataframe: place decade perc_of_5star 0 1 2010-2020 2.300 1 2 1900-1910 1.379 2 3 1970-1980 1.179 3 4 2000-2010 1.176 4 5 1960-1970 1.133 build a horizontal barchart with perc_of_5star columns in descending order and decade values on the y-axis

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')  # placeholder file name from the generated answer
df = df.sort_values('perc_of_5star', ascending=False)
plt.barh(df['decade'], df['perc_of_5star'], color='skyblue')
plt.gca().invert_yaxis()  # largest value at the top
plt.xlabel('perc_of_5star')
plt.ylabel('decade')
plt.show()

Python

generate function | Sat, 25 Feb 2023

Write code that finds all rows in the rate_group column of Pandas with "%" at the end and removes it with regex. Rows without "%" must be replaced with np.nan.

import pandas as pd

df = pd.DataFrame(data={'rate_group': ['A%', 'B%', 'C']})
# Capture everything before a trailing "%"; rows that do not match become NaN
df['rate_group'] = df['rate_group'].str.extract(r'^(.*)%$', expand=False)
print(df['rate_group'])

Python

generate function | Fri, 23 Dec 2022

Write a simple regex to validate a username. Allowed characters are: lowercase letters, numbers, underscore Length should be between 4 and 16 characters (both included).

import re

def validate_usr(username):
    # fullmatch anchors both ends (no trailing newline slips through),
    # and {4,16} puts the length bound into the pattern itself
    return bool(re.fullmatch(r"[a-z0-9_]{4,16}", username))

Python

generate function | Wed, 15 Mar 2023

There is the "sl" list and there is the "sw" list. Write code that will create a "data" dataframe with the values from "sl" in the first column and "sw" in the second column. The values in the lists are contained in the following format: [[-0.90068117] [-1.14301691] [-1.38535265] [-1.50652052] [-1.02184904] [-0.53717756] [-1.50652052] [-1.02184904] [-1.74885626] [-1.14301691]]

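import numpy as np
import pandas as pd

# sl and sw hold one value per inner list, e.g. [[-0.90068117], [-1.14301691], ...];
# np.ravel flattens them to one dimension before the frame is built
data = pd.DataFrame({"sl": np.ravel(sl), "sw": np.ravel(sw)})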

Python

generate function | Mon, 26 Dec 2022

Write a function that takes a group number and outputs a numbered and alphabetically ordered list (surnames, first names, patronymics) of all students in that group from the dictionary dct. dct = {'0001': ['Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], '1102': ['Bogov', 'Artem', 'Igorvich', '25.01.2010', 'BST162'], '0333': ['Glagoleva', 'Anastasia', 'Nikolaevna', '11.07.2009', 'BST163'], '4004': ['Potapov', 'Dmitry', 'Stanislavovich', '14.02.2012', 'BST161'], '0045': ['Bokov', 'Igor', 'Kharitonovich', '02.06.2009', 'BST161'], '0096': ['Vasilkov', 'Valentin', 'Sergeevich', '20.03.2009', 'BST164'], '0607': ['Siropova', 'Violetta', 'Eduardovna', '28.05.2010', 'BST162']}

def name_list(group, dct):
    lst = []
    for key, value in dct.items():
        if value[-1] == group:
            lst.append(value[0] + ' ' + value[1] + ' ' + value[2])
    lst.sort()
    for i in range(len(lst)):
        print('{}. {}'.format(i + 1, lst[i]))

name_list('BST161', dct)

Python

generate function | Mon, 26 Dec 2022

There are two lists. Write a function that looks for matching values of these two lists and replaces them with replacing_num (but without touching the first one) stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"], ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"], ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"], ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ] replacing_num = '9090'

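def replace(students, stnums, replacing_num):
    seen = set()
    for student in students:
        if student[0] in stnums:
            if student[0] in seen:
                student[0] = replacing_num  # second and later matches only
            else:
                seen.add(student[0])  # the first match stays untouched
    return students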

Python

generate function | Fri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). In your answer, specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers.

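from math import sqrt
from scipy.stats import norm

def conf_interval(n, mean, sig, conf):
    # two-sided z critical value for the given confidence level
    z_value = norm.ppf((1 + conf) / 2)
    half_width = z_value * sig / sqrt(n)
    # the length does not depend on the mean: (mean + h) - (mean - h) = 2h
    return round(2 * half_width)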

Python
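For a quick cross-check without SciPy, the same interval length can be sketched with the standard library alone. This assumes the standard deviation is known, so a z critical value applies; the name `interval_length` and the sample numbers are illustrative and not part of the original task.

```python
from math import sqrt
from statistics import NormalDist


def interval_length(n, sig, conf):
    """Length of a two-sided z confidence interval for the mean, rounded."""
    # Critical value z such that P(-z < Z < z) = conf for a standard normal Z.
    z = NormalDist().inv_cdf((1 + conf) / 2)
    # The half-width is z * sig / sqrt(n); the sample mean cancels out of the length.
    return round(2 * z * sig / sqrt(n))


# Illustrative numbers: n = 100, sig = 10, conf = 0.95 -> 2 * 1.96 * 10 / 10 = 3.92
print(interval_length(100, 10, 0.95))  # 4
```

The sample mean is not a parameter here because it only shifts the interval and does not change its length.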

generate function | Tue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 leave in this dataframe only users who have rated 100 or more in the rating column

# Reading the task as "users with at least 100 ratings":
# count the ratings per user, then keep only the rows of users that reach 100
counts = df.groupby('userId')['rating'].transform('count')
df = df[counts >= 100]

Python

generate function | Tue, 13 Dec 2022

Write a function that outputs True if the letters in it are one after the other in the string, and False otherwise. def goes_after(word: str, first: str, second: str) -> bool: s = '' for i, let in enumerate(word): print (let[i] + let[i+1]) print(first + second) if let[i] + let[i-1] == first + second: return True else: return False Examples: goes_after("world", "o", "r") Result: True goes_after("world", "w", "r") Result: False

def goes_after(word, first, second):
    for i in range(len(word) - 1):
        if word[i] == first and word[i+1] == second:
            return True
    return False

Python

generate function | Tue, 10 Jan 2023

Solve the system of equations with numpy and linalg: 4x + 2y + z = 4 x + 3y = 12 5y + 4z = -3

import numpy as np

def system_solver(a, b):
    # a: coefficient matrix, b: right-hand side
    return np.linalg.solve(a, b)

a = np.array([[4, 2, 1], [1, 3, 0], [0, 5, 4]])
b = np.array([4, 12, -3])
system_solver(a, b)

Python
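The NumPy result can be cross-checked exactly with a small Gauss-Jordan elimination over fractions. This is only a sketch for the 3×3 system in the prompt; `solve_exact` is a hypothetical helper, and it assumes the matrix is square and non-singular.

```python
from fractions import Fraction


def solve_exact(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination using exact fractions."""
    n = len(a)
    # Augmented matrix [a | b] with Fraction entries.
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick a row at or below the diagonal with a non-zero pivot.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so the pivot becomes 1.
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[-1] for row in m]


a = [[4, 2, 1], [1, 3, 0], [0, 5, 4]]
b = [4, 12, -3]
x, y, z = solve_exact(a, b)
print(x, y, z)  # 7/15 173/45 -50/9
```

Substituting back confirms the second equation, for example: 7/15 + 3 · 173/45 = 180/15 = 12.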

generate function | Mon, 13 Feb 2023

There is df2 Pandas dataframe: surgery age rectal_temp pulse respiratory_rate extremities_temp pain outcome 4 2.0 1 NaN NaN NaN 2.0 2.0 1.0 6 1.0 1 NaN 60.0 NaN 3.0 NaN 2.0 7 2.0 1 NaN 80.0 36.0 3.0 4.0 3.0 15 1.0 9 NaN 128.0 36.0 3.0 4.0 2.0 27 1.0 1 NaN NaN NaN NaN NaN 2.0 33 1.0 1 NaN 100.0 30.0 3.0 5.0 1.0 34 2.0 1 NaN 104.0 24.0 4.0 4.0 3.0 39 1.0 1 NaN 88.0 NaN 3.0 5.0 2.0 Write a function that: 1) takes as a parameter the column name (default = 'rectal_temp') 2) for each NaN line in this column, find a non-NaN line which has similar values in the pulse or respiratory_rate columns (the difference range is 20%) 3) replace NaN with values from the corresponding rows

def fill_na(df, col_name='rectal_temp', range_thresh=0.2): col_vals = df[col_name] col_vals_notnull_ind = col_vals.notnull() col_vals_notnull_ind = col_vals_notnull_ind.values[:, np.newaxis] col_vals_notnull = col_vals.loc[col_vals_notnull_ind].values col_vals_notnull_idx = col_vals_notnull_ind.nonzero()[0] for i in range(0, col_vals.shape[0]): if not col_vals_notnull_ind[i]: for j in range(0, col_vals_notnull.shape[0]): if (col_vals_notnull[j] - col_vals_notnull[j] * range_thresh) < col_vals[i] < (col_vals_notnull[j] + col_vals_notnull[j] * range_

Python

generate function | Mon, 26 Dec 2022

Convert the data so that it is represented as a dictionary in which the keys are student numbers and the values are a list of other student data. students = [ ["0001", "Antonov", "Anton", "Igorevich", "20.08.2009", "BST161"], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"], ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"], ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"], ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ]

def convert_to_dict(students):
    students_dict = {}
    for student in students:
        students_dict[student[0]] = student[1:]
    return students_dict

convert_to_dict(students)

Python
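One caveat is worth illustrating: the sample data contains the number "4004" twice, and a plain dictionary keeps only the last row stored under a repeated key. The sketch below uses a shortened copy of the list and a hypothetical `convert_to_dict_checked` variant that reports repeated numbers instead of silently overwriting earlier rows.

```python
def convert_to_dict(students):
    """Plain conversion: a later row overwrites an earlier one with the same number."""
    return {student[0]: student[1:] for student in students}


def convert_to_dict_checked(students):
    """Same conversion, but keep the first row and collect repeated student numbers."""
    result, repeated = {}, []
    for number, *rest in students:
        if number in result:
            repeated.append(number)  # keep the first row, note the clash
        else:
            result[number] = rest
    return result, repeated


students = [
    ["0001", "Antonov", "Anton", "Igorevich", "20.08.2009", "BST161"],
    ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"],
    ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"],
]

print(convert_to_dict(students)["4004"][0])  # Potapov
checked, repeated = convert_to_dict_checked(students)
print(checked["4004"][0], repeated)  # Stepanova ['4004']
```

Whether the first or the last row should win is a design choice that the original task does not specify.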

generate function | Thu, 19 Jan 2023

There is a Pandas dataframe: index title rating year decade 0 0 "BLOW THE NIGHT!" Let's Spend the Night Togeth... 3.000000 1983 1980-1990 1 1 "Great Performances" Cats (1998) 2.896648 1998 1990-2000 2 2 #1 Cheerleader Camp (2010) 2.111111 2010 2010-2020 3 3 #Captured (2017) 3.750000 2017 2010-2020 4 4 #Female Pleasure (2018) 3.666667 2018 2010-2020 ... ... ... ... ... ... 58953 58953 …And the Fifth Horseman Is Fear (1965) 3.250000 1965 1960-1970 58954 58954 キサラギ (2007) 3.250000 2007 2000-2010 58955 58955 チェブラーシカ (2010) 3.166667 2010 2010-2020 58956 58956 貞子3D (2012) 2.375000 2012 2010-2020 58957 58957 줄탁동시 (2012) 4.000000 2012 2010-2020 Write a function that groups titles by decade and calculates the percentage of titles with a rating of 5.0 relative to all titles in that decade

def group_by_rating(df):
    grouped = df.groupby('decade')
    result = {}
    for decade, group in grouped:
        # share of titles rated exactly 5.0 in the decade, as a percentage
        result[decade] = (group.rating == 5.0).sum() / len(group) * 100
    return result

Python

generate function | Thu, 23 Mar 2023

This code gives an error AttributeError: 'MultiComparison' object has no attribute 'plot_simultaneous' from statsmodels.stats.multicomp import pairwise_tukeyhsd tukey.plot_simultaneous(comparison_name="C") Fix it!

import pandas as pd import numpy as np import matplotlib.pyplot as plt import seaborn as sns import statsmodels.api as sm from statsmodels.formula.api import ols import statsmodels.stats.multicomp as multi %matplotlib inline data = pd.read_csv('/Users/josephfrasca/Coding/Python/Python-Practice/Stats/ToothGrowth.csv') data.head() data.head()

Python

generate function | Wed, 21 Dec 2022

Rewrite this function without lambda max_key = max(dct, key=lambda key: dct[key]['Value'])

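def max_value(dct):
    max_key = None
    for k, v in dct.items():
        # starting from None also handles dictionaries whose values are all <= 0
        if max_key is None or v["Value"] > dct[max_key]["Value"]:
            max_key = k
    return max_key

max_value(dct)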

Python

generate function | Thu, 16 Feb 2023

There is a Pandas Series column: 'loan_amnt' 10 78 54 GOOD 64 23 Write a function that finds all rows with letters in the column and replace them with NaN. Example: find_non_numbers(df, 'loan_amnt') df Result: 10 78 54 NaN 64 23

import pandas as pd

def find_non_numbers(data, column):
    # values that cannot be parsed as numbers (e.g. 'GOOD') become NaN in place
    data[column] = pd.to_numeric(data[column], errors='coerce')
    return data

Python

generate function | Mon, 26 Dec 2022

There are two lists. Write a function that looks for matching values of these two lists. If a match is found, replace with replacing_num (but only the 2nd value) stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"], ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"], ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ] replacing_num = '9090' Example: In the list of stnums we see the value '4004'. It occurs twice in the list of students. We need to change the second one to replacing_num. The result should be this: [ ... ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ... ["9090", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ]

def replace_sec(stnums, students, replacing_num):
    for stnum in stnums:
        num = 0
        for student in students:
            if stnum == student[0]:
                num += 1
                if num > 1:
                    student[0] = replacing_num
    return students

replace_sec(stnums, students, replacing_num)

Python
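The example from the prompt can be checked with a small self-contained run. The list below is a shortened copy of the sample data, and `replace_repeats` is an illustrative name for a variant that works on copies, so the caller's list is left unmodified.

```python
def replace_repeats(stnums, students, replacing_num):
    """Return a copy of students in which every occurrence after the first
    of a number listed in stnums is replaced with replacing_num."""
    seen = set()
    result = []
    for student in students:
        row = list(student)  # copy, so the input rows are not modified
        if row[0] in stnums:
            if row[0] in seen:
                row[0] = replacing_num
            else:
                seen.add(row[0])
        result.append(row)
    return result


stnums = ['4004']
students = [
    ["0001", "Antonov", "Anton", "Igorevich", "20.08.2009", "BST161"],
    ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"],
    ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"],
    ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"],
]

updated = replace_repeats(stnums, students, '9090')
print([row[0] for row in updated])  # ['0001', '4004', '0607', '9090']
```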

generate function | Fri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Filter this dataframe and leave only the rows where the values in the name column are duplicated. Merge the rows with duplicates and sort the name column in ascending order

import pandas as pd

def merge_df(df):
    # keep=False marks every row whose name occurs more than once
    duplicated = df[df.duplicated(subset='name', keep=False)]
    return duplicated.sort_values(by='name')

df = pd.read_csv('test_data.csv')
merge_df(df)

Python

generate function | Thu, 22 Dec 2022

The Collatz conjecture (also known as 3n+1 conjecture) is a conjecture that applying the following algorithm to any number we will always eventually reach one: [This is written in pseudocode] if(number is even) number = number / 2 if(number is odd) number = 3*number + 1 #Task Your task is to make a function hotpo that takes a positive n as input and returns the number of times you need to perform this algorithm to get n = 1. #Examples hotpo(1) returns 0 (1 is already 1) hotpo(5) returns 5 5 -> 16 -> 8 -> 4 -> 2 -> 1 hotpo(6) returns 8 6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1 hotpo(23) returns 15 23 -> 70 -> 35 -> 106 -> 53 -> 160 -> 80 -> 40 -> 20 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1

def hotpo(n):
    steps = 0
    while n > 1:
        if n % 2 == 0:
            n //= 2  # integer division keeps n an int
        else:
            n = 3 * n + 1
        steps += 1
    return steps

Python
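The examples from the task statement can double as a quick check. The sketch below uses integer division so that `n` stays an `int` throughout; the assertions simply restate the step counts given in the task.

```python
def hotpo(n):
    """Number of Collatz steps needed to bring a positive integer n down to 1."""
    steps = 0
    while n > 1:
        # Halve even numbers, map odd numbers to 3n + 1.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


# Expected values taken from the examples in the task statement.
for start, expected in [(1, 0), (5, 5), (6, 8), (23, 15)]:
    assert hotpo(start) == expected
print("all examples pass")
```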

generate function | Wed, 22 Mar 2023

There are two variables p.speed and boosted_p.speed. Plot two boxplots with their values next to each other

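import matplotlib.pyplot as plt

def plot_boxplots(p1, p2):
    fig, ax = plt.subplots()
    # one call with both samples places the boxes side by side
    ax.boxplot([p1, p2])
    ax.set_xticklabels(['p.speed', 'boosted_p.speed'])
    plt.show()

plot_boxplots(p.speed, boosted_p.speed)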

Python

generate function | Wed, 08 Mar 2023

Add to the code below winning $1 on "000", "007" combinations for i in range(trials): num = np.random.randint(1, 1000) if num == 777: money += 200; successes += 1 elif num == 999: money += 100; successes += 1 elif num == 555: money += 50; successes += 1 elif num == 333: money += 15; successes += 1 elif num == 111: money += 10; successes += 1 elif ends77(num) == True: money += 5; successes += 1 elif ends7(num) == True: money += 3; successes += 1 elif ends00(num) == True: money += 2; successes += 1 elif ends0(num) == True: money += 1; successes += 1 else: money -= 1

import numpy as np

def payout(a):
    if a in (0, 7):  # "000" and "007" win $1; checked first so 007 is not paid as ends7
        return 1
    elif a == 777: return 200
    elif a == 999: return 100
    elif a == 555: return 50
    elif a == 333: return 15
    elif a == 111: return 10
    elif a % 100 == 77: return 5   # ends in 77
    elif a % 10 == 7: return 3     # ends in 7
    elif a % 100 == 0: return 2    # ends in 00
    elif a % 10 == 0: return 1     # ends in 0
    else: return -1                # losing roll costs $1, as in the original loop

def find_score(num_trials):
    money = 0
    successes = 0
    for i in range(num_trials):
        win = payout(np.random.randint(0, 1000))  # 0..999, so "000" can occur
        money += win
        if win > 0:
            successes += 1
    return money, successes

Python

generate function | Mon, 12 Dec 2022

Complete function saleHotdogs/SaleHotDogs/sale_hotdogs, function accepts 1 parameter:n, n is the number of hotdogs a customer will buy, different numbers have different prices (refer to the following table), return how much money will the customer spend to buy that number of hotdogs. Use the only ternary operator number of hotdogs price per unit (cents) if n < 5 then 100 if n >= 5 and n < 10 then 95 if n >= 10 90

def sale_hotdogs(n):
    # chained ternary: unit price depends on the quantity bought
    return n * (100 if n < 5 else 95 if n < 10 else 90)

Python

generate function | Wed, 08 Mar 2023

Upgrade the code so that a roll of 000, 007, 017, 027, 037, 047, 057, 067, 077, 087, 097, 010, 020, 030, 040, 050, 060, 070, 080, 090 will drop 1 dollar for i in range(trials): num = np.random.randint(1, 1000) if num == 777: money += 200; successes += 1 elif num == 999: money += 100; successes += 1 elif num == 555: money += 50; successes += 1 elif num == 333: money += 15; successes += 1 elif num == 111: money += 10; successes += 1 elif ends77(num) == True: money += 5; successes += 1 elif ends7(num) == True: money += 3; successes += 1 elif ends00(num) == True: money += 2; successes += 1 elif ends0(num) == True: money += 1; successes += 1 else: money -= 1

def add(a, b): return a + b add(1, 2)

Python

generate function | Tue, 21 Mar 2023

Add to this the code for generating a grid on the chart, as well as specifying the percentage difference directly on the chart bars plt.figure(figsize = (12, 4)) plt.bar(data.index, data['nutri'], color = 'darkblue', label = 'nutri') plt.bar(data.index, data['mineral'], color = 'brown', label = 'mineral')

def graph_difference(data, title):
    difference = data['nutri'] - data['mineral']
    percentage = (difference / data['nutri']) * 100
    plt.figure(figsize=(12, 4))
    plt.bar(data.index, data['nutri'], color='darkblue', label='nutri')
    plt.bar(data.index, data['mineral'], color='brown', label='mineral')
    plt.title(title, fontsize=20)
    plt.grid(axis='y')
    for i in range(len(data)):
        # .iloc is positional, so this also works when the index is not 0..n-1
        plt.text(x=data.index[i], y=data['nutri'].iloc[i] + 0.1,
                 s=str(round(percentage.iloc[i], 1)) + '%', size=12, ha='center')
    plt.show()

Python

generate function | Fri, 23 Dec 2022

You have a bunch of red and blue marbles. To start the game you grab a handful of marbles of each color and put them into the bag, keeping track of how many of each color go in. You take turns reaching into the bag, guessing a color, and then pulling one marble out. You get a point if you guessed correctly. You've decided to write a function, guessBlue() to help automatically calculate whether you should guess "blue" or "red". The function should take four arguments: the number of blue marbles you put in the bag to start the number of red marbles you put in the bag to start the number of blue marbles pulled out so far (always lower than the starting number of blue marbles) the number of red marbles pulled out so far (always lower than the starting number of red marbles) guessBlue() should return the probability of drawing a blue marble, expressed as a float. For example, guessBlue(5, 5, 2, 3) should return 0.6.

def guess_blue(blue_start, red_start, blue_pulled, red_pulled):
    blue_left = blue_start - blue_pulled
    red_left = red_start - red_pulled
    return blue_left / (blue_left + red_left)

Python

generate function | Mon, 27 Mar 2023

There are variables X_train, X_test, y_train, y_test created through sklearn train_test_split. They contain text data in string format. Convert them to float numbers.

import numpy as np

# astype(float) parses numeric strings such as '3.14'; it raises ValueError on non-numeric text
X_train = np.asarray(X_train).astype(float)
X_test = np.asarray(X_test).astype(float)
y_train = np.asarray(y_train).astype(float)
y_test = np.asarray(y_test).astype(float)

Python

generate function | Wed, 08 Feb 2023

def get_needed_posts(query): site = pd.DataFrame() for q in query: URL = parseurl+'search/' params = { 'q': q } req = requests.get(URL, params=params) soup = BeautifulSoup(req.text) articles = soup.find_all('article', class_='tm-articles') for article in articles: try: title = article.find('h2', class_='tm-article').text date = article.find('span', class_='tm-article').text.strip() link = article.find('h2', class_='tm-article').find('a').get('href') except: pass row = {'date': date, 'title': title, 'link': 'https://habr.com'+link} site = pd.concat([site, pd.DataFrame([row])]) return site.reset_index(drop=True) Modify the Python function above so that if it finds a repeated title or link, it will not add it to the return

def get_needed_posts(query): site = pd.DataFrame(columns=['date', 'title', 'link']) for q in query: URL = parseurl+'search/' params = { 'q': q } req = requests.get(URL, params=params) soup = BeautifulSoup(req.text) articles = soup.find_all('article', class_='tm-articles') for article in articles: try: title = article.find('h2', class_='tm-article').text date = article.find('span', class_='tm-article').text.strip() link = article.find('h2', class_='tm-article').find('a').get('href') except: pass if link not in site.link.values: row = {'date': date, 'title': title, 'link': 'https://habr.com'+link} site = pd.concat([site, pd

Python

generate function | Tue, 14 Mar 2023

Write code that imports the minmax scaler library from sklearn and initializes it into the scaler variable

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

Python

generate function | Sat, 04 Mar 2023

Add the x-axis value captions to the code of this graph and change its color to purple gdpdiff[['Country or region', 'GDP per capita']].plot(kind='bar')

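gdpdiff = gdpdiff.dropna(subset=['GDP per capita'])
gdpdiff = gdpdiff[gdpdiff['GDP per capita'] > 0]
gdpdiff = gdpdiff.sort_values(by='GDP per capita')
# replace() returns a new Series, so the result has to be assigned back
gdpdiff['Country or region'] = gdpdiff['Country or region'].replace({'United States': 'USA', 'United Kingdom': 'UK'})
# x= puts the country names on the x-axis; color= makes the bars purple
gdpdiff.plot(x='Country or region', y='GDP per capita', kind='bar', color='purple')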

Python

generate function | Wed, 22 Feb 2023

question5 = "How likely would you work for a company whose mission is not bringing social impact ?" question5 = df[question5].value_counts() label = question5.index counts = question5.values Write code that builds a vertical bar graph plotly.express in which along the axis: y - label x - counts

import plotly.express as px

fig = px.bar(y=label, x=counts)
fig.show()

Python

generate function | Tue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 Calculate the average lifetime of users. The lifetime is the difference between the maximum and minimum values of the "timestamp" column for a given "userId".

import pandas as pd

ratings = pd.read_csv('ratings.csv')
print(ratings.head())

# Create a function which returns the lifetime (max - min) within a group
def lifetime(group):
    return group.max() - group.min()

lifetime_users = ratings.groupby('userId').agg(lifetime)
print(lifetime_users)
average_lifetime = lifetime_users['timestamp'].mean()
print(average_lifetime)

# output:
#    userId  movieId  rating   timestamp
# 0       1       31     2.5  1260759144
# 1       1     1029     3.0  1260759179
# 2       1     1061     3.0  1260759182
# 3       1     1129     2.0  1260759185
# 4       1     1172     4.0  1260759205
#         timestamp
# userId
# 1          203560
# 2          866607
# 3 8

Python

generate function | Wed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Plot a bar chart with number_of_hits on the x-axis and performer on the y-axis.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv(filename)
# barh puts performer on the y-axis; the column in the dataframe is named num_of_hits
df.plot(kind='barh', x='performer', y='num_of_hits')
plt.show()

Python

generate function | Wed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 450 (Can't Live Without Your) Love And Affection Nelson 1990-07-07 1 93 15 14.0 607 (Everything I Do) I Do It For You Bryan Adams 1991-06-29 1 53 9 8.0 748 (Hey Won't You Play) Another Somebody Done Som... B.J. Thomas 1975-02-01 1 99 17 16.0 852 (I Can't Get No) Satisfaction The Rolling Stones 1965-06-12 1 67 13 12.0 951 (I Just) Died In Your Arms Cutting Crew 1987-03-07 1 80 14 13.0 Create a new dfs dataframe, where the data format of the chart_debut column is changed from 1991-06-29 to 1991

import pandas as pd

df = pd.read_csv('artist_song_chart_debut.csv')
# work on a copy so that the new dataframe dfs is created and df stays unchanged
dfs = df.copy()
dfs['chart_debut'] = dfs['chart_debut'].astype(str).str[:4]
dfs

Python

generate function | Mon, 06 Feb 2023

There is a df_ru Pandas dataframe: date av_temp deviations country year decade 432393 1988-07-01 16.105 0.158 Russia 1988 1980-1990 432429 1991-07-01 16.385 0.243 Russia 1991 1990-2000 432489 1996-07-01 16.032 0.213 Russia 1996 1990-2000 432513 1998-07-01 16.677 0.183 Russia 1998 1990-2000 432525 1999-07-01 16.412 0.228 Russia 1999 1990-2000 432549 2001-07-01 16.539 0.190 Russia 2001 2000-2010 Write a function that will create a new dataframe hot_years and leave in it only those lines where av_temp > 12

def get_hot_years(df):
    # keep whole rows (all columns) where av_temp is above 12
    return df[df["av_temp"] > 12]

hot_years = get_hot_years(df_ru)
print(hot_years)

Python

generate function | Tue, 13 Dec 2022

[2, 1, 10] --> 9 In descending order: [10, 2, 1] Sum: (10 - 2) + (2 - 1) = 8 + 1 = 9 If the array is empty or the array has only one element the result should be 0 (Nothing in Haskell, None in Rust).

def sum_of_differences(arr):
    if len(arr) <= 1:
        return 0
    ordered = sorted(arr, reverse=True)  # sorted() leaves the caller's list unchanged
    return sum(ordered[i] - ordered[i + 1] for i in range(len(ordered) - 1))

Python
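Because the consecutive differences of a sorted list telescope, (a1 - a2) + (a2 - a3) + ... + (a(n-1) - an) = a1 - an, the whole sum collapses to the maximum minus the minimum. A minimal sketch of that shortcut, with `sum_of_differences_fast` as an illustrative name:

```python
def sum_of_differences_fast(arr):
    """Sum of consecutive differences of arr sorted in descending order."""
    if len(arr) <= 1:
        return 0
    # All intermediate terms cancel, leaving only the largest and smallest values.
    return max(arr) - min(arr)


print(sum_of_differences_fast([2, 1, 10]))  # 9
```

This avoids the sort entirely, so it runs in linear time and does not modify the input.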

generate function | Tue, 28 Mar 2023

There is a dataframe "df" containing data about texts in the following format: Category Message Length Words 0 ham Go until jurong point, crazy... Available only ... 111 [go, jurong, point, crazy, available, bugis, g... 1 ham Ok lar... Joking wif u oni... 29 [ok, lar, joking, wif, oni] The texts in it are in the df['Message'] variable. And the classification categories are in df['Category']. The programmer has written code which should contain all the source messages classified incorrectly (specifying the actual and predicted ones). However, this code does not work and generates an error: KeyError. Here is this code: misclassified = [] for index, label_predicted in enumerate(predicted): if label_predicted != y_test[index]: misclassified.append({'message': df.iloc[index]['Message'], 'actual': df.iloc[index]['Category'], 'predicted': label_predicted}) misclassification_df = pd.DataFrame(misclassified) Fix it!

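# Assumes y_test is a pandas Series produced by train_test_split from df['Category'],
# so it keeps the shuffled index labels of df; y_test[index] then raises KeyError.
misclassified = []
for index, label_predicted in enumerate(predicted):
    actual = y_test.iloc[index]  # positional lookup
    if label_predicted != actual:
        misclassified.append({'message': df.loc[y_test.index[index], 'Message'],
                              'actual': actual,
                              'predicted': label_predicted})
misclassification_df = pd.DataFrame(misclassified)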

Python

generate function | Fri, 17 Mar 2023

Determine the sample size needed to study the average coffee bill in city N, given the following parameters: margin of error (delta); variance (sigsqr); confidence level (conf). Round the answer to an integer.

from scipy.stats import norm

def check_sample_size(conf=0.95, delta=0.05, sigsqr=225):
    """
    conf - confidence level
    delta - margin of error
    sigsqr - variance
    """
    p = 1 - ((1 - conf) / 2)
    z = norm.ppf(p)
    n = (z ** 2) * sigsqr / delta ** 2
    return round(n)

print(check_sample_size())

Python

generate function | Sat, 25 Feb 2023

This code gives an error "expected string or bytes-like object" Refactor it: import re def rate_group(int_rate): int_rate = re.sub('%', '', int_rate) int_rate = float(int_rate) if int_rate > 15.0: return '>15' elif 10.0 < int_rate <= 15.0: return '10-15' else: return '<10' df['rate_group'] = df['int_rate'].apply(rate_group)

def rate_group(int_rate):
    # re.sub fails on non-string values (e.g. floats); str() accepts them all
    int_rate = float(str(int_rate).rstrip('%'))
    if int_rate > 15.0:
        return '>15'
    elif 10.0 < int_rate <= 15.0:
        return '10-15'
    else:
        return '<10'

df['rate_group'] = df['int_rate'].apply(rate_group)

Python

generate function | Fri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). Import necessary libraries. In function return specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers. def interval(n, mean, sig, conf): h = return

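from math import sqrt
from scipy.stats import norm

def interval(n, mean, sig, conf):
    z = norm.ppf((1 + conf) / 2)  # two-sided z critical value
    h = sig / sqrt(n) * z
    return round(h * 2)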

Python

generate functionSat, 18 Feb 2023

There are 2 charts in Python: df.plot(kind='scatter', x='User_Score', y='Global_Sales', plt.ylim(0, 40) df.plot(kind='scatter', x='Critic_Score', y='Global_Sales', plt.ylim(0, 40) Rewrite the code so that the two charts are not separate but are subplots.

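# one row, two columns: each scatter plot gets its own axes
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
df.plot(kind='scatter', x='User_Score', y='Global_Sales', ylim=(0, 40), ax=axes[0])
df.plot(kind='scatter', x='Critic_Score', y='Global_Sales', ylim=(0, 40), ax=axes[1])
plt.show()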

Python

generate functionFri, 03 Feb 2023

generate a unique id for each line of the pandas dataframe

# sequential id, unique for every row
df['id'] = range(1, len(df) + 1)

Python

generate functionWed, 22 Feb 2023

Change the code below so that the columns have values on the x-axis question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='Какая рабочая среда для вас наиболее предпочтительна?') fig.show()

def function(x, y, z): return x + y function(1, 2)

Python

generate functionTue, 21 Mar 2023

Rewrite this code to calculate Wilcoxon statistics stat, p = mannwhitneyu(data[data['version'] == 'gate_30']['sum_gamerounds'], data[data['version'] == 'gate_40']['sum_gamerounds']) print('Mann-Whitney Statistics=%.3f, p=%.3f' % (stat, p))

from scipy.stats import ranksums

def my_ranksums(data, col, by, group1, group2):
    # select data for the two conditions
    data1 = data[data[by] == group1][col]
    data2 = data[data[by] == group2][col]
    # Wilcoxon rank-sum test for two independent samples
    # (scipy.stats.wilcoxon is the signed-rank test and needs paired samples)
    stat, p = ranksums(data1, data2)
    print('Wilcoxon rank-sum Statistics=%.3f, p=%.3f' % (stat, p))
    return stat, p

my_ranksums(data, 'sum_gamerounds', 'version', 'gate_30', 'gate_40')

Python

generate functionFri, 23 Dec 2022

There is a list lst = '[0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1]' Glue it on one line without commas and spaces to get '01011110001100111'

def glue_list(lst): glue_string = '' for i in lst: glue_string += str(i) return glue_string print(glue_list([0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1]))
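The same result can be sketched more compactly with `str.join`. Since the prompt shows `lst` as a string rather than a list, this version accepts both; parsing the string with `json.loads` is an assumption about the intended input, not part of the original answer.

```python
import json

def glue_list(values):
    # Assumption: a string input is a JSON-style list such as '[0, 1, 0]'
    if isinstance(values, str):
        values = json.loads(values)
    # Convert every element to text and join without a separator
    return ''.join(str(v) for v in values)

bits = [0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1]
glued = glue_list(bits)
glued_from_text = glue_list('[0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1]')
print(glued)
```

The result has exactly one character per list element, so for this 14-element list it cannot match the 17-character string quoted in the prompt.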

Python

generate functionWed, 08 Mar 2023

Write code that adds money +=1 on any number from the list of winnums money = 0 winnums = [777, 999, 555, 333, 111, 177, 277, 377, 477, 577, 677, 877, 977, 100, 200, 300, 400, 500, 600, 700, 800, 900, 110, 120, 130, 140, 150, 160, 170, 180, 190, 210, 220, 230, 240, 250, 260, 270, 280, 290, 310, 320, 330, 340, 350, 360, 370, 380, 390, 410, 420, 430, 440, 450, 460, 470, 480, 490, 510, 520, 530, 540, 550, 560, 570, 580, 590, 610, 620, 630, 640, 650, 660, 670, 680, 690, 710, 720, 730, 740, 750, 760, 770, 780, 790, 810, 820, 830, 840, 850, 860, 870, 880, 890, 910, 920, 930, 940, 950, 960, 970, 980, 990, 107, 117, 127, 137, 147, 157, 167, 177, 187, 197, 207, 217, 227, 237, 247, 257, 267, 277, 287, 297, 307, 317, 327, 337, 347, 357, 367, 377, 387, 397, 407, 417, 427, 437, 447, 457, 467, 477, 487, 497, 507, 517, 527, 537, 547, 557, 567, 577, 587, 597, 607, 617, 627, 637, 647, 657, 667, 677, 687, 697, 707, 717, 727, 737, 747, 757, 767, 787, 797, 807, 817, 827, 837, 847, 857, 867, 877, 887, 897, 907, 917, 927, 937, 947, 957, 967, 977, 987, 997, 000, 007, 017, 027, 037, 047, 057, 067, 077, 087, 097, 010, 020, 030, 040, 050, 060, 070, 080, 090]

def money_add(number, money=0):
    # add 1 to money when number is one of the winning numbers
    if number in winnums:
        money += 1
    return money

Python

generate functionTue, 20 Dec 2022

Change the date_range function to return a list of all dates between start and end in list def date_range(start_date, end_date): lst = [] if start_date > end_date: return [] else: try: start = (dt.strptime(start_date, '%Y-%m-%d')) end = (dt.strptime(end_date, '%Y-%m-%d')) lst.append(start.strftime('%Y-%m-%d')) start += td(days=1) return lst except: return [] Examples: Input: date_range('2022-12-01', '2022-12-05') Output: ['2022-12-01', '2022-12-02', '2022-12-03', '2022-12-04', '2022-12-05']

def date_range(start_date, end_date): lst = [] if start_date > end_date: return [] else: try: start = (dt.strptime(start_date, '%Y-%m-%d')) end = (dt.strptime(end_date, '%Y-%m-%d')) while start <= end: lst.append(start.strftime('%Y-%m-%d')) start += td(days=1) return lst except: return [] date_range('2022-12-01', '2022-12-05')

Python

generate functionMon, 26 Dec 2022

Add to this code the numbering of all group_list values from 1 dct = convert_to_dict(students) def group_list(dct, gr): group_list = [] for k, v in dct.items(): if gr in v: group_list.append(' '.join(dct[k][0:3])) group_list.sort() return group_list group_list(dct, 'BST161')

def group_list(dct, gr):
    names = []
    for k, v in dct.items():
        if gr in v:
            names.append(' '.join(v[0:3]))
    names.sort()
    # number the sorted entries starting from 1
    return [f'{i}. {name}' for i, name in enumerate(names, start=1)]

group_list(dct, 'BST161')

Python

generate functionWed, 22 Feb 2023

Modify the code below to have column captions for the x-axis values question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='Какая рабочая среда для вас наиболее предпочтительна?') fig.show()

def bar_plot(question_text, title, y_title, name): question = df[question_text].value_counts() label = question.index counts = question.values fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text=title) fig.update_yaxes(title_text=y_title) fig.update_xaxes(title_text=name) fig.show() question_text = 'What is the most preferred working environment for you.' title = 'Какая рабочая среда для вас наиболее предпочтительна?' y_title = 'Ответы' name = 'Количество ответов' bar_plot(question_

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a function year_leaders that will build a new dataframe and leave 1 line of performer and hits (having a maximum number of num_of_hits) in each chart_debut value

def year_leaders(df):
    # index label of the row with the most hits in each chart_debut year
    leaders = df.groupby('chart_debut')['num_of_hits'].idxmax()
    return df.loc[leaders, ['chart_debut', 'performer', 'hits']].reset_index(drop=True)

Python

generate functionMon, 27 Mar 2023

The code below gives a ValueError: Input variables with an inconsistent number of samples were found: [8082, 5572]. Fix it! from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(tfidf, cats, test_size=0.3, random_state=42)

from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(tfidf, cats, test_size=0.3, random_state=42)

Python

generate functionWed, 08 Mar 2023

Write code that adds money +=1 on any number from the list of winnums winnums = [777, 999, 555, 333, 111, 177, 277, 377, 477, 577, 677, 877, 977, 100, 200, 300, 400, 500, 600, 700, 800, 900, 110, 120, 130, 140, 150, 160, 170, 180, 190, 210, 220, 230, 240, 250, 260, 270, 280, 290, 310, 320, 330, 340, 350, 360, 370, 380, 390, 410, 420, 430, 440, 450, 460, 470, 480, 490, 510, 520, 530, 540, 550, 560, 570, 580, 590, 610, 620, 630, 640, 650, 660, 670, 680, 690, 710, 720, 730, 740, 750, 760, 770, 780, 790, 810, 820, 830, 840, 850, 860, 870, 880, 890, 910, 920, 930, 940, 950, 960, 970, 980, 990, 107, 117, 127, 137, 147, 157, 167, 177, 187, 197, 207, 217, 227, 237, 247, 257, 267, 277, 287, 297, 307, 317, 327, 337, 347, 357, 367, 377, 387, 397, 407, 417, 427, 437, 447, 457, 467, 477, 487, 497, 507, 517, 527, 537, 547, 557, 567, 577, 587, 597, 607, 617, 627, 637, 647, 657, 667, 677, 687, 697, 707, 717, 727, 737, 747, 757, 767, 787, 797, 807, 817, 827, 837, 847, 857, 867, 877, 887, 897, 907, 917, 927, 937, 947, 957, 967, 977, 987, 997, 000, 007, 017, 027, 037, 047, 057, 067, 077, 087, 097, 010, 020, 030, 040, 050, 060, 070, 080, 090]

money = 0
for i in winnums:
    print(i)
    money += 1
print(money)

Python

generate functionTue, 21 Feb 2023

paid['App'].groupby(paid['Type']).sum()\ .plot(kind='pie', figsize=(5, 6), autopct='%1.1f%%', startangle=90, title='Процент бесплатных и платных приложений') Add to this chart a display of the legend

paid['App'].groupby(paid['Type']).sum()\ .plot(kind='pie', figsize=(5, 6), autopct='%1.1f%%', startangle=90, title='Процент бесплатных и платных приложений', legend=True)

Python

generate functionTue, 21 Mar 2023

There is a "grass" Pandas dataframe: pid Name Class 1 Class 2 HP Attack Defense Sp. Atk Sp. Def Speed Legendary 0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 False 1 2 Ivysaur Grass Poison 60 62 63 80 80 60 False 2 3 Venusaur Grass Poison 80 82 83 100 100 80 False 3 4 Mega Venusaur Grass Poison 80 100 123 122 120 80 False 48 49 Oddish Grass Poison 45 50 55 75 65 30 False Write a function that will search for 'Rock' values in the Class 1 and Class 2 columns. And if it finds such values it will delete these rows.

def delete_rock_rows(df): ...
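The function body above is left as a stub. A minimal sketch of one way to fill it in, assuming the column names `Class 1` and `Class 2` from the prompt and an exact match on the string 'Rock'; the sample rows other than Bulbasaur are made up for illustration.

```python
import pandas as pd

def delete_rock_rows(df):
    # True for rows where either class column equals 'Rock'
    is_rock = (df['Class 1'] == 'Rock') | (df['Class 2'] == 'Rock')
    # Keep everything else and renumber the remaining rows
    return df[~is_rock].reset_index(drop=True)

# Hypothetical sample data, not the real "grass" dataframe
sample = pd.DataFrame({
    'Name': ['Bulbasaur', 'example_a', 'example_b'],
    'Class 1': ['Grass', 'Rock', 'Grass'],
    'Class 2': ['Poison', 'Ground', 'Rock'],
})
result = delete_rock_rows(sample)
print(result)
```

The function returns a new dataframe and leaves the input unchanged, so the call site decides whether to overwrite the original.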

Python

generate functionWed, 01 Feb 2023

There is a Python and Pandas function. def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy Add it to output all 66 lines of the dataframe

def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy

Python

generate functionWed, 15 Mar 2023

Write code that brings all the values of the data frame to the same scale using the StandardScaler library from the sklearn.preprocessing library: sepal_length sepal_width 0 5.1 3.5 1 4.9 3.0 2 4.7 3.2

from sklearn.preprocessing import StandardScaler std_scaler = StandardScaler() df_scaled = std_scaler.fit_transform(df) print(df_scaled)

Python

generate functionSun, 11 Dec 2022

A recipe book is given with information about how many ingredients are needed to prepare a dish per serving. Write a program that will multiply the number of servings to prepare these dishes and display the information about the total amount of ingredients required in the given form. Note: The same ingredients with different dimensions must be counted separately! The result should be: Cheese: 250 g Tomatoes: 10 pcs. Cucumbers: 60g def total_ingredients(cook_book): dish = 'salad' portions = 5 grams = cook_book['quantity'] * portions if key == 'salad': grams = 'quantity' * portions print(grams) cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'}, {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]} total_ingredients(cook_book)

def total_ingredients(cook_book, dish='salad', portions=5):
    # multiply every ingredient of the dish by the number of portions
    for item in cook_book[dish]:
        total = item['quantity'] * portions
        print(f"{item['ingridient_name'].capitalize()}: {total} {item['measure']}")

cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'},
                       {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'},
                       {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]}
total_ingredients(cook_book)

Python

generate functionTue, 20 Dec 2022

Change the date_range function to return a list of all dates between start and end instead of returning the interval: from datetime import datetime as dt from datetime import timedelta as td def date_range(start_date, end_date): lst = [] if start_date > end_date: return [] else: try: start = (dt.strptime(start_date, '%Y-%m-%d')) end = (dt.strptime(end_date, '%Y-%m-%d')) interval = end - start return interval except: return 'Incorrect date format' date_range('2022-12-01', '2022-12-20')

from datetime import datetime as dt from datetime import timedelta as td def date_range(start_date, end_date): lst = [] if start_date > end_date: return [] else: try: start = (dt.strptime(start_date, '%Y-%m-%d')) end = (dt.strptime(end_date, '%Y-%m-%d')) interval = end - start for i in range(interval.days + 1): lst.append(start.strftime('%Y-%m-%d')) start = start + td(days=1) return lst except: return 'Incorrect date format' date_range('2022-12-01', '2022-12-20')

Python

generate functionSat, 25 Feb 2023

There is a Pandas dataframe: loan_amnt int_rate 0 5000 10.65% 1 2500 15.27% 2 2400 15.96% 3 10000 13.49% 4 3000 12.69% 5 5000 7.90% 6 7000 15.96% You need to split the values of the int_rate column (in object data format) into categories. Write a function that creates a rate_group column and adds values to it as follows: 1) if int_rate > 15.00%, rate_group should be '>15' 2) if int_rate is between 10.00% and 15.00%, then rate_group should be '10-15'

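def rate_group(value):
    # int_rate is stored as text such as '10.65%', so convert it before comparing
    value = float(str(value).rstrip('%'))
    if value > 15.00:
        return '>15'
    elif 10.00 <= value <= 15.00:
        return '10-15'
    else:
        return '<10'

df['rate_group'] = df['int_rate'].apply(rate_group)
df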

Python

generate functionWed, 21 Dec 2022

Rewrite this function without lambda max_key = max(dct, key=lambda key: dct[key]['Value'])

def get_value(key):
    # named function used as the sort key instead of a lambda
    return dct[key]['Value']

dct = {
    'value1': {'Value': 3},
    'value2': {'Value': 1},
    'value3': {'Value': 2},
}
max_key = max(dct, key=get_value)  # returns 'value1'

Python

generate functionFri, 17 Mar 2023

Write a function that will determine the size of the necessary sample for the study if the following parameters are known: - error (delta); - variance (sigsqr); - Confidence level (conf). Round up the answer to a whole number.

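import math
from statistics import NormalDist

def sample_size(delta, sigsqr, conf):
    # n = z^2 * variance / delta^2, rounded up
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # two-sided critical value
    return math.ceil(z ** 2 * sigsqr / delta ** 2)

sample_size(0.05, 0.02, 0.95)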

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a function year_leaders that will build a new dataframe and leave 1 line of performer and hits (having a maximum number of num_of_hits) in each chart_debut value

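def year_leaders(df):
    # keep the row with the most hits for every chart_debut year
    df = df.sort_values('num_of_hits', ascending=False).drop_duplicates(subset='chart_debut')
    return df[['chart_debut', 'performer', 'hits']].sort_values('chart_debut').reset_index(drop=True)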

Python

generate functionWed, 15 Mar 2023

Write code that brings all the values of the data frame to the same scale using the StandardScaler library from the sklearn.preprocessing library: sepal_length sepal_width 0 5.1 3.5 1 4.9 3.0 2 4.7 3.2

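import pandas as pd
from sklearn.preprocessing import StandardScaler

def standardize(df):
    # StandardScaler expects 2-D input, so scale the whole frame and rebuild it
    scaled = StandardScaler().fit_transform(df)
    return pd.DataFrame(scaled, columns=df.columns, index=df.index)

standardize(df)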

Python

generate functionWed, 18 Jan 2023

In the variable 'years' write a list of all the years from 1950 to 2010.

years = list(range(1950, 2011))

Python

generate functionSun, 11 Dec 2022

Write a function that multiplies each dictionary value by portions portions = 5 grams = 0 key = 'salad' cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'gr'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pct'}, {'ingridient_name': 'pepper', 'quantity': 20, 'measure': 'гр'}]} if key == 'salad': grams = 'quantity' * portions print(grams)

def multiply_dict(cook_book, key, portions):
    # multiply the quantity of every ingredient of the chosen dish
    for ingredient in cook_book[key]:
        ingredient['quantity'] *= portions
    return cook_book

cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'gr'},
                       {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pct'},
                       {'ingridient_name': 'pepper', 'quantity': 20, 'measure': 'гр'}]}
print(multiply_dict(cook_book, 'salad', 5))

Python

generate functionMon, 26 Dec 2022

There are two lists. Write a function that looks for matching values of these two lists. If a match is found, only the 2nd value must be replaced by replacing_num stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ] replacing_num = '9090'

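def replace_stnum(stnums, students, replacing_num):
    seen = set()
    for student in students:
        number = student[0]
        if number in stnums:
            if number in seen:
                # second and later occurrences get the replacement number
                student[0] = replacing_num
            else:
                seen.add(number)
    return students

print(replace_stnum(stnums, students, replacing_num))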

Python

generate functionTue, 21 Mar 2023

Add to this the code for generating a grid on the chart, as well as specifying the percentage difference directly on the chart bars plt.figure(figsize = (12, 4)) plt.bar(data.index, data['nutri'], color = 'darkblue', label = 'nutri') plt.bar(data.index, data['mineral'], color = 'brown', label = 'mineral')

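plt.figure(figsize=(12, 4))
plt.bar(data.index, data['nutri'], color='darkblue', label='nutri')
plt.bar(data.index, data['mineral'], color='brown', label='mineral')
plt.grid(axis='y', linestyle='--', alpha=0.5)
# assumption: percentage difference means nutri relative to mineral
diff_pct = (data['nutri'] - data['mineral']) / data['mineral'] * 100
tops = data[['nutri', 'mineral']].max(axis=1)
for x, y, pct in zip(data.index, tops, diff_pct):
    plt.text(x, y, f'{pct:.1f}%', ha='center', va='bottom')
plt.legend()
plt.show()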

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 450 (Can't Live Without Your) Love And Affection Nelson 1990-07-07 1 93 15 14.0 607 (Everything I Do) I Do It For You Bryan Adams 1991-06-29 1 53 9 8.0 748 (Hey Won't You Play) Another Somebody Done Som... B.J. Thomas 1975-02-01 1 99 17 16.0 852 (I Can't Get No) Satisfaction The Rolling Stones 1965-06-12 1 67 13 12.0 951 (I Just) Died In Your Arms Cutting Crew 1987-03-07 1 80 14 13.0 Create a new dfs dataframe, where the data format of the chart_debut column is changed from 1991-06-29 to 1991

import pandas as pd

# create a new dataframe so the original df is left unchanged
dfs = df.copy()
# keep only the year of chart_debut, e.g. 1991-06-29 -> 1991
dfs['chart_debut'] = pd.to_datetime(dfs['chart_debut']).dt.year
# show the top 5 rows
print(dfs.head())

Python

generate functionMon, 13 Feb 2023

There is Pandas dataframe: surgery age rectal_temp pulse respiratory_rate extremities_temp pain outcome 0 38.50 66 3 3 ? 2 ? 0 1 39.2 88 ? ? 4 1 ? 0 2 38.30 40 1 1 3 1 ? 0 3 39.10 164 4 1 6 2 1 0 4 37.30 104 ? ? 6 2 ? 0 Write a function that changes the parameters of all columns to float64 with try. If this fails, then except pass

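def df_to_float(df):
    # convert column by column so one bad column does not block the rest
    for col in df.columns:
        try:
            df[col] = df[col].astype('float64')
        except (ValueError, TypeError):
            # columns with values such as '?' are left as they are
            pass
    return df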

Python

generate functionSun, 11 Dec 2022

Write a function that multiplies each dictionary value by portions portions = 5 grams = 0 key = 'salad' cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'gr'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pct'}, {'ingridient_name': 'pepper', 'quantity': 20, 'measure': 'гр'}]} if key == 'salad': grams = 'quantity' * portions print(grams)

def multiply_quantities(cook_book, portions=5):
    # renamed so the function no longer shadows the cook_book dictionary
    for ingredients in cook_book.values():
        for ingredient in ingredients:
            ingredient['quantity'] *= portions
    return cook_book

Python

generate functionSat, 04 Mar 2023

Construct a line graph using the data from this dataframe, where the x-axis is years and the y-axis is the columns 2015 2016 2017 2018 2019 0 5.375734 5.382185 5.354019 5.375917 5.407096

def line_graph(dataframe): fig = plt.figure() ax = fig.add_axes([0,0,1,1]) ax.plot(dataframe.columns, dataframe.iloc[0]) ax.set_xlabel('years') ax.set_ylabel('columns') ax.set_title('line graph')

Python

generate functionSat, 28 Jan 2023

There is a Pandas dataframe: date city hotel total_rooms ocup_rooms revenue oper_costs adv_costs profit 0 2021-07-01 Yalta Rapsodia 33 24 78936 25641 19305 33990 1 2021-07-01 Yalta AquaMania 28 14 51156 21756 16380 13020 2 2021-07-01 Yalta Skyline 15 12 38016 11655 8775 17586 3 2021-07-01 Alushta Breeze 36 25 75475 27972 21060 26443 Calculate the scalar value as a percentage for all columns using the following formula: profit / revenue

df['profit_perc'] = df['profit'] / df['revenue'] * 100

Python

generate functionThu, 15 Dec 2022

Refine the function below to handle errors 1) if shelf variable is not found in directories, it will show ERROR NO SUCH KEY 2) if doc variable is not found in directories, it will show ERROR NO SUCH VALUE def change_shelf(data): docnum = '11-2' shelf = '3' for key, val in data.items(): for doc in val: if doc == docnum: data[shelf].append(doc) data[key].remove(doc) print('OK') return data directories = { '1': ['2207 876234', '11-2'], '2': ['10006'], '3': [] }

def change_shelf(data, docnum, shelf):
    if shelf not in data:
        return 'ERROR NO SUCH KEY'
    for docs in data.values():
        if docnum in docs:
            # move the document from its current shelf to the target shelf
            docs.remove(docnum)
            data[shelf].append(docnum)
            print('OK')
            return data
    return 'ERROR NO SUCH VALUE'

directories = { '1': ['2207 876234', '11-2'], '2': ['10006'], '3': [] }

Python

generate functionSat, 04 Mar 2023

Correct the code to correctly output the dataframe with this data: gdpdiff = pd.DataFrame({'Top1': df19[['Country or region', 'GDP per capita']]\ .sort_values(by='GDP per capita', ascending=False).head(20)[0:1], 'Top20': df19[['Country or region', 'GDP per capita']]\ .sort_values(by='GDP per capita', ascending=False).head(20)[19:20]}) gdpdiff

def compare_gdp_and_happiness(df1, df2): df_merge = df1.merge(df2, how='inner', left_on='Country or region', right_on='entity') df_merge = df_merge[['Country or region', 'GDP per capita', 'happinessScore']] df_merge['GDP per capita'] = df_merge['GDP per capita'].apply(pd.to_numeric) df_merge['happinessScore'] = df_merge['happinessScore'].apply(pd.to_numeric) df_merge_sort = df_merge.sort_values(by='happinessScore', ascending=False) #df_merge_sort df_merge_top_1 = df_merge_sort.head(20)[0:1] df_merge_top_1.rename(columns={'Country or region': 'Top1', 'GDP per capita': 'Top1 GDP', 'happinessScore

Python

generate functionFri, 23 Dec 2022

You have a bunch of red and blue marbles. To start the game you grab a handful of marbles of each color and put them into the bag, keeping track of how many of each color go in. You take turns reaching into the bag, guessing a color, and then pulling one marble out. You get a point if you guessed correctly. You've decided to write a function, guessBlue() to help automatically calculate whether you should guess "blue" or "red". The function should take four arguments: the number of blue marbles you put in the bag to start the number of red marbles you put in the bag to start the number of blue marbles pulled out so far (always lower than the starting number of blue marbles) the number of red marbles pulled out so far (always lower than the starting number of red marbles) guessBlue() should return the probability of drawing a blue marble, expressed as a float. For example, guessBlue(5, 5, 2, 3) should return 0.6.

def guessBlue(blue_start, red_start, blue_pulled, red_pulled): return (blue_start - blue_pulled)/(blue_start - blue_pulled + red_start - red_pulled)

Python

generate functionMon, 06 Feb 2023

There is a df Pandas dataframe: date av_temp deviations country 1 1743-12-01 NaN NaN Åland 2 1744-01-01 NaN NaN Åland 3 1744-02-01 NaN NaN Åland 4 1744-03-01 NaN NaN Åland 9 1744-08-01 NaN NaN Åland Create a new column year from date in which you convert all date values to years. Example: 1744-01-01 Date column: 1744

def get_year(date): return int(date[:4]) def get_month(date): return int(date[5:7]) def get_day(date): return int(date[8:]) df['year'] = df['date'].apply(get_year) df['month'] = df['date'].apply(get_month) df['day'] = df['date'].apply(get_day)

Python

generate functionWed, 22 Mar 2023

This code gives an error TypeError: '>' not supported between instances of 'str' and 'float' fvalue, pvalue = stats.f_oneway(p.Speed, boosted_p.Speed) Fix it!

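# Speed holds strings in at least one frame; convert to numbers before the test
fvalue, pvalue = stats.f_oneway(pd.to_numeric(p.Speed, errors='coerce').dropna(),
                                pd.to_numeric(boosted_p.Speed, errors='coerce').dropna())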

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Leave only 1 unique row in each 'song' column in case the 'peak_position' column has a value of 1

df[df['peak_position'] == 1].drop_duplicates(subset = 'song', keep = 'first')

Python

generate functionThu, 16 Feb 2023

There is a df Pandas dataframe: id member_id loan_amnt funded_amnt funded_amnt_inv term int_rate installment grade sub_grade 0 1077501 1296599 5000 5000 4975 36 months 10.65% 162.87 B B2 ... Computer 27.65 0.0 735.0 739.0 1.0 3.0 13648.0 83.7% 9.0 1 1077430 1314167 2500 2500 2500 60 months 15.27% 59.83 C C4 ... bike 1.00 0.0 740.0 744.0 5.0 3.0 1687.0 9.4% 4.0 Write a function that will loop through all the column names in this dataframe and apply try .astype('Int64') to them. In case of an error the function should just execute pass

def try_to_int(df): for col in df.columns: try: df[col] = df[col].astype('Int64') except: pass

Python

generate functionMon, 27 Mar 2023

The code below gives a ValueError: Input variables with an inconsistent number of samples were found: [8082, 5572]. Fix it! from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(tfidf, cats, test_size=0.3, random_state=42)

import pandas as pd
from sklearn.model_selection import train_test_split

def split_data(data):
    # data is a list of tuples
    # return a tuple of four lists: X_train, X_test, y_train, y_test
    df = pd.DataFrame(data, columns=['X', 'y'])
    # splitting one dataframe keeps features and labels the same length
    train, test = train_test_split(df, test_size=0.3, random_state=42)
    return (train['X'].tolist(), test['X'].tolist(), train['y'].tolist(), test['y'].tolist())

split_data([(1, 2), (3, 4), (5, 6)])

Python

generate functionThu, 19 Jan 2023

There is a Pandas dataframe: place decade perc_of_5star 0 1 2010-2020 2.300 1 2 1900-1910 1.379 2 3 1970-1980 1.179 3 4 2000-2010 1.176 4 5 1960-1970 1.133 build a horizontal barchart with perc_of_5star columns in descending order and decade values on the y-axis

import pandas as pd import matplotlib.pyplot as plt df = pd.DataFrame( {'place': [1, 2, 3, 4, 5], 'decade': ['2010-2020', '1900-1910', '1970-1980', '2000-2010', '1960-1970'], 'perc_of_5star': [2.3, 1.379, 1.179, 1.176, 1.133]}) df.sort_values(by='perc_of_5star', ascending=True).plot(kind='barh', x='decade', y='perc_of_5star', legend=False) plt.show()

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 3819 Imagine Dragons Radioactive, Natural, Believer, Thunder, On To... 2012 11101 9304.0 2674 Ed Sheeran Thinking Out Loud, Photograph, Sing, Don't, On... 2014 10685 10039.0 9590 The Weeknd Starry Eyes, Save Your Tears, Acquainted, Blin... 2022 9429 6713.0 5812 Mariah Carey Without You/Never Forget You, Don't Forget Abo... 1994 8735 6616.0 5749 Madonna Like A Prayer, Erotica, Vogue, Lucky Star, Cau... 1989 8391 7534.0 Write a function that groups the Performer lines by decade (for example, 1996 is 1990-2000)

import pandas as pd

df = pd.DataFrame({'name': ['Taylor Swift', 'Imagine Dragons', 'Ed Sheeran'], 'year': [2008, 2012, 2014]})

def decade_equals(year):
    return year // 10 * 10

# group by the decade derived from the year column, not from the index
df.groupby(df['year'].apply(decade_equals))['name'].apply(list)

Python

generate functionMon, 26 Dec 2022

There are two lists. The first (stnums) contains rows that are considered duplicates. The second list is a list of students. Write a function that replaces the values in the second list that match the first (but only the 2nd, 3rd value, and so on, not the first one). stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ]

stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004
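One way to do what the prompt asks, as a minimal sketch: keep the first occurrence of each listed number and replace only the later ones. The name `replace_repeated_numbers` and the default replacement '9090' are assumptions for illustration, and the sample list is shortened and written with the commas the prompt's list is missing.

```python
def replace_repeated_numbers(stnums, students, replacing_num='9090'):
    seen = set()
    result = []
    for student in students:
        row = list(student)  # copy so the input list is not modified
        number = row[0]
        if number in stnums:
            if number in seen:
                # second, third, ... occurrence: replace the number
                row[0] = replacing_num
            else:
                # first occurrence: keep it and remember it
                seen.add(number)
        result.append(row)
    return result

stnums = ['4004']
students = [
    ['0001', 'Antonov', 'Anton'],
    ['4004', 'Stepanova', 'Natalia'],
    ['0045', 'Bokov', 'Igor'],
    ['4004', 'Potapov', 'Dmitry'],
]
updated = replace_repeated_numbers(stnums, students)
print(updated)
```

Returning a new list instead of editing in place keeps the original records available for comparison.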

Python

generate functionSun, 18 Dec 2022

Replace re.split with regular split to get the same result (you can use the json library and the loads function) The code: f = open('purchase_log.txt', encoding='utf-8') import re purchases = {} for i, line in enumerate(f): line = re.split(r",|:", line.strip()) keys = line[1] values = line[3] purchases[keys] = values

import json

purchases = {}
with open('purchase_log.txt', encoding='utf-8') as f:
    for line in f:
        # assumes every line is a JSON object with two fields: a key and its value
        record = list(json.loads(line.strip()).values())
        purchases[record[0]] = record[1]

Python

generate functionThu, 22 Dec 2022

Return a new array consisting of elements which are multiple of their own index in input array (length > 1). Some cases: [22, -6, 32, 82, 9, 25] => [-6, 32, 25] [68, -1, 1, -7, 10, 10] => [-1, 10] [-56,-85,72,-26,-14,76,-27,72,35,-21,-67,87,0,21,59,27,-92,68] => [-85, 72, 0, 68] [22] should equal [-6, 32, 25] [68] should equal [-1, 10] [11] should equal [-11] [-56] should equal [-85, 72, 0, 68] [28] should equal [38, -44, -99] [-1] should equal [-49, 8, -60, 35]

def multiple_of_index(arr):
    # Check i != 0 first, otherwise num % 0 raises ZeroDivisionError
    return [num for i, num in enumerate(arr) if i != 0 and num % i == 0]

Python

generate functionWed, 18 Jan 2023

Write a function that calculates the value_counts in the decade pandas dataframe column if the rows in the rating column = 5.0 if av_rating[av_rating.rating == 5.0]: av_rating['decade'].value_counts() else: pass

def value_counts(df):
    # A DataFrame cannot be used directly in an if statement,
    # so filter the 5.0-rated rows and count them per decade.
    top_rated = df[df['rating'] == 5.0]
    return top_rated['decade'].value_counts()

Python

generate functionWed, 25 Jan 2023

There is a Pandas dataframe news_title 0 /world/ 1 /latest/ 2 /?updated=top 3 /politics/36188461-s-marta-zhizn-rossiyan-susc... 4 /world/36007585-tramp-pridumal-kak-reshit-ukra... 5 /science/36157853-nasa-sobiraet-ekstrennuyu-pr... 6 /video/36001498-poyavilis-pervye-podrobnosti-g... 7 /world/36007585-tramp-pridumal-kak-reshit-ukra... 8 /science/ 9 /sport/ Filter this dataframe and leave only the URLs with the news structure (containing 8 digits and a heading), using the str.contains method

import pandas as pd news = pd.read_csv('https://raw.githubusercontent.com/ml-mipt/ml-mipt/basic/homeworks/Lab1_python_basics/news_sample.csv') news[news['news_title'].str.contains('[0-9]{8}-[a-z]+')]

Python

generate functionMon, 26 Dec 2022

A matrix matrix is given. matrix = [ [0,1,2,4,8], [6,2,2,1,9], [3,3,3,3,3], [4,6,7,1,2], [5,7,3,4,0] ] Write code (in pure Python, no third-party libraries, no list comprehensions) that will transpose it (that is, make columns out of rows) into matrix_t: matrix_t = [ [0, 6, 3, 4, 5], [1, 2, 3, 6, 7], [2, 2, 3, 7, 3], [4, 1, 3, 1, 4], [8, 9, 3, 2, 0] ]

# First solution
def transpose(matrix):
    new_matrix = []
    for i in range(len(matrix[0])):
        # One new row per column of the input, built without a list comprehension
        new_matrix.append([])
        for j in range(len(matrix)):
            new_matrix[i].append(matrix[j][i])
    return new_matrix

Python
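
As an alternative sketch, the built-in zip can pair up the i-th elements of every row, which still avoids third-party libraries and list comprehensions. The name transpose_zip is hypothetical; the matrix is the one from the prompt.

```python
matrix = [
    [0, 1, 2, 4, 8],
    [6, 2, 2, 1, 9],
    [3, 3, 3, 3, 3],
    [4, 6, 7, 1, 2],
    [5, 7, 3, 4, 0],
]


def transpose_zip(matrix):
    matrix_t = []
    # zip(*matrix) yields one tuple per column of the input
    for column in zip(*matrix):
        matrix_t.append(list(column))
    return matrix_t


matrix_t = transpose_zip(matrix)
print(matrix_t)
```

Note that zip stops at the shortest row, so this assumes all rows have the same length.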

generate functionThu, 09 Feb 2023

There is 'actors' Pandas dataframe: actorid name date_of_birth birth_city birth_country height_inches biography gender ethnicity networth 1002 1155 Jackie Chan 1954-04-07 Victoria Peak Hong Kong 68.0 Hong Kong's cheeky, lovable and best known fil... Male NaN 400000000.0 1387 69 Keanu Reeves 1964-09-02 Beirut Lebanon 73.0 Keanu Charles Reeves, whose first name means "... Male Lebanese 360000000.0 2252 141 Sean Connery 1930-08-25 Edinburgh UK 74.0 The tall, handsome and muscular Scottish actor... Male White 350000000.0 291 6 Bruce Willis 1955-03-19 Idar-Oberstein West Germany 72.0 Actor and musician Bruce Willis is well known ... Male White 250000000.0 Write a function that creates a dictionary with all the countries of North America, Western Europe. Then apply this function so that only the rows NOT corresponding to this dictionary are left in the dataframe.

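def NAWE(country):
    # Dictionary of regions to exclude; extend the lists so the spellings
    # match the values actually used in birth_country.
    regions = {
        'North America': ['Canada', 'United States'],
        'Western Europe': ['United Kingdom', 'Germany', 'Netherlands'],
    }
    for countries in regions.values():
        if country in countries:
            return False
    return True

actors = actors[actors['birth_country'].apply(NAWE)]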

Python

generate functionTue, 21 Mar 2023

There are 3 lists: water = [1,2,3,4,2,4,2,4,5,2,3,4,2,1,3,4,3,2,5,1] nutri = [1,2,4,6,5,6,7,5,4,5,6,7,4,3,5,5,6,5,4,3,5] mineral =[2,1,1,3,2,4,2,4,5,4,3,2,3,2,3,1,3,4,5,1,4] add them to df Pandas dataframe in the following format: index treatments value 0 0 A 25 1 1 A 30 2 2 A 28 3 3 A 36 4 4 A 29 5 0 B 45 6 1 B 55 7 2 B 29 8 3 B 56 9 4 B 40 10 0 C 30 11 1 C 29 12 2 C 33

import pandas as pd import numpy as np def func(a, b, c): a = pd.DataFrame(a) b = pd.DataFrame(b) c = pd.DataFrame(c) a['treatment'] = 'A' b['treatment'] = 'B' c['treatment'] = 'C' a['index'] = np.arange(len(a)) b['index'] = np.arange(len(b)) c['index'] = np.arange(len(c)) a = a.rename(columns={0: "value"}) b = b.rename(columns={0: "value"}) c = c.rename(columns={0: "value"}) a = a[['index', 'treatment', 'value']] b = b[['index', 'treatment', 'value']] c = c[['index', 'treatment', 'value']] d =

Python

generate functionTue, 21 Mar 2023

Add to this the code for generating a grid on the chart, as well as specifying the percentage difference directly on the chart bars plt.figure(figsize = (12, 4)) plt.bar(data.index, data['nutri'], color = 'darkblue', label = 'nutri') plt.bar(data.index, data['mineral'], color = 'brown', label = 'mineral')

#plot the grid plt.grid(axis='y', alpha=0.75)

Python
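
The answer above adds the grid but not the percentage labels. A rough sketch of the missing part follows. It assumes "percentage difference" means the change of mineral relative to nutri for each bar, which the prompt does not define, and the names percent_difference and plot_with_labels are hypothetical.

```python
def percent_difference(a, b):
    """Percentage difference of b relative to a, one value per bar (a must be non-zero)."""
    return [(y - x) / x * 100 for x, y in zip(a, b)]


def plot_with_labels(index, nutri, mineral):
    # Imported here so the helper above has no plotting dependency
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(12, 4))
    ax.bar(index, nutri, color="darkblue", label="nutri")
    ax.bar(index, mineral, color="brown", label="mineral")
    ax.grid(axis="y", alpha=0.75)
    diffs = percent_difference(nutri, mineral)
    for x, n, m, diff in zip(index, nutri, mineral, diffs):
        # Place each label just above the taller of the two overlapping bars
        ax.text(x, max(n, m), f"{diff:+.0f}%", ha="center", va="bottom")
    ax.legend()
    return fig


diffs = percent_difference([2, 4, 5], [1, 4, 6])
print([round(d) for d in diffs])
```

With the dataframe from the prompt, the call would be plot_with_labels(data.index, data['nutri'], data['mineral']) followed by plt.show().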

generate functionFri, 23 Dec 2022

You have a bunch of red and blue marbles. To start the game you grab a handful of marbles of each color and put them into the bag, keeping track of how many of each color go in. You take turns reaching into the bag, guessing a color, and then pulling one marble out. You get a point if you guessed correctly. You've decided to write a function, guessBlue() to help automatically calculate whether you should guess "blue" or "red". The function should take four arguments: the number of blue marbles you put in the bag to start the number of red marbles you put in the bag to start the number of blue marbles pulled out so far (always lower than the starting number of blue marbles) the number of red marbles pulled out so far (always lower than the starting number of red marbles) guessBlue() should return the probability of drawing a blue marble, expressed as a float. For example, guessBlue(5, 5, 2, 3) should return 0.6.

def guessBlue(blueStart, redStart, bluePulled, redPulled):
    return (blueStart - bluePulled) / (blueStart - bluePulled + redStart - redPulled)

guessBlue(5, 5, 2, 3)

Python

generate functionSun, 18 Dec 2022

Replace re.split with regular split to get the same result (you can use the json library and the loads function) The code: f = open('purchase_log.txt', encoding='utf-8') import re purchases = {} for i, line in enumerate(f): line = re.split(r",|:", line.strip()) keys = line[1] values = line[3] purchases[keys] = values

import json
from json.decoder import JSONDecodeError

def load_json(line):
    try:
        return json.loads(line)
    except JSONDecodeError:
        return {}

purchases = {}
with open('purchase_log.txt', encoding='utf-8') as f:
    for line in f:
        # Assumes each line is a JSON object with two fields; lines that fail to parse are skipped
        values = list(load_json(line.strip()).values())
        if len(values) >= 2:
            purchases[values[0]] = values[1]

Python

generate functionTue, 21 Mar 2023

You are given test results for two drugs: A (results for the group of people getting drug A) and B (results for the other group of people getting drug B). 1) You need to assess whether there is a statistically significant difference in the effect in these patients. Use the null hypothesis, the inverse hypothesis, and Student's t-test for calculations. 2) construct a boxplot and histogram to visually confirm the result

import numpy as np import pandas as pd import scipy.stats as st import matplotlib.pyplot as plt %matplotlib inline np.random.seed(1) A = np.random.normal(0, 1, 50) B = np.random.normal(1, 1, 50)

Python
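
The answer above only generates the two samples. A minimal sketch of the test itself, reusing the same simulated data (the real drug results are not given) and assuming equal variances in both groups, as the classic Student's t-test does:

```python
import numpy as np
from scipy import stats

# Simulated results, generated the same way as in the answer above
np.random.seed(1)
A = np.random.normal(0, 1, 50)
B = np.random.normal(1, 1, 50)

# H0: both groups have the same mean effect; H1: the means differ.
t_stat, p_value = stats.ttest_ind(A, B)  # Student's t-test, equal variances assumed

alpha = 0.05  # significance level, chosen here as an assumption
reject_h0 = p_value < alpha
print(f"t = {t_stat:.3f}, p = {p_value:.4g}, reject H0: {reject_h0}")
```

If the equal-variance assumption is doubtful, passing equal_var=False to ttest_ind switches to Welch's test.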

generate functionSat, 25 Feb 2023

There is a Pandas dataframe: loan_amnt int_rate 0 5000 10.65% 1 2500 15.27% 2 2400 15.96% 3 10000 13.49% 4 3000 12.69% 5 5000 7.90% 6 7000 15.96% You need to split the values of the int_rate column into categories. Important: The int_rate column is in str data format. Write a function that will create the rate_group column and add values to it as follows: 1) if int_rate > '15.00%', rate_group should be '>15' 2) if int_rate is between '10.00%' and '15.00%', then rate_group should be '10-15' You can use regex

def rate_group(x):
    # Compare numbers, not strings: as text, '7.90%' sorts above '15.00%'
    rate = float(x.rstrip('%'))
    if rate > 15:
        return '>15'
    if 10 <= rate <= 15:
        return '10-15'
    return None

df['rate_group'] = df['int_rate'].apply(rate_group)

Python

generate functionMon, 12 Dec 2022

Create a program that will take two lists of integers, a and b. Each list will consist of 3 positive integers above 0, representing the dimensions of cuboids a and b. You must find the difference of the cuboids' volumes regardless of which is bigger. For example, if the parameters passed are ([2, 2, 3], [5, 4, 1]), the volume of a is 12 and the volume of b is 20. Therefore, the function should return 8. Write solution in one line of code.

from functools import reduce

def find_difference(a, b):
    return abs(reduce(lambda x, y: x * y, a) - reduce(lambda x, y: x * y, b))

Python
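
On Python 3.8 or newer, math.prod multiplies the elements directly, so a shorter variant of the same one-liner could look like this:

```python
from math import prod  # available since Python 3.8


def find_difference(a, b):
    # Volume of each cuboid is the product of its three dimensions
    return abs(prod(a) - prod(b))


print(find_difference([2, 2, 3], [5, 4, 1]))  # 8, as in the prompt's example
```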

generate functionWed, 22 Mar 2023

This code gives an error TypeError: '>' not supported between instances of 'str' and 'float' fvalue, pvalue = stats.f_oneway(p.Speed, boosted_p.Speed) Fix it!

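import pandas as pd
from scipy import stats

# The error indicates that Speed contains strings, so convert both columns
# to numbers first; values that cannot be parsed become NaN and are dropped.
speed = pd.to_numeric(p.Speed, errors='coerce').dropna()
boosted_speed = pd.to_numeric(boosted_p.Speed, errors='coerce').dropna()

fvalue, pvalue = stats.f_oneway(speed, boosted_speed)
print(pvalue)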

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe named 'dfp': performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a function year_leaders that will build a new dataframe and leave 1 line of performer and hits (having a maximum number of num_of_hits) in each chart_debut value

def year_leaders(dfp):
    dfp_copy = dfp.copy()
    dfp_copy.sort_values(by=['chart_debut', 'num_of_hits'], ascending=False, inplace=True)
    dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True)
    dfp_copy.set_index('chart_debut', inplace=True)
    return dfp_copy

Python

generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: id route operator group_name district injury_result incident_type victim_category victim_age 2 3.0 1 London General Go-Ahead Southwark NaN Onboard Injuries 7 10 3 4.0 4 Metroline Metroline Islington NaN Onboard Injuries 7 2 4 5.0 5 East London Stagecoach Havering NaN Onboard Injuries 7 8 5 6.0 5 East London Stagecoach None London Borough NaN Onboard Injuries 7 8 Count which routes had the most incidents_type. Sort them in descending order by incident_type column

df_new.groupby(['route', 'incident_type']).size().sort_values(ascending=False).head()

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Pandas dataframe is available: song artist chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Write a function that will group all rows by the performer column (so that no duplicates are left), and all corresponding (unique) values to that column are written through the "," sign into a new hits column.

def group_by_performer(df):
    # One row per performer, with that performer's unique songs joined by ","
    hits = df.groupby('performer')['song'].agg(lambda songs: ','.join(songs.unique()))
    return hits.reset_index(name='hits')

group_by_performer(df)

Python

generate functionSun, 11 Dec 2022

A recipe book is given with information about how many ingredients are needed to prepare a dish per serving. Write a program that will multiply the number of servings to prepare these dishes and display the information about the total amount of ingredients required in the given form. Note: The same ingredients with different dimensions must be counted separately! The result should be: Cheese: 250 g Tomatoes: 10 pcs. Cucumbers: 60g def total_ingredients(cook_book): dish = 'salad' portions = 5 grams = cook_book['quantity'] * portions if key == 'salad': grams = 'quantity' * portions print(grams) cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'}, {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]} total_ingredients(cook_book)

def total_ingredients(cook_book, dish='salad', portions=5):
    # Multiply every ingredient of the dish by the number of portions
    for ingredient in cook_book[dish]:
        total = ingredient['quantity'] * portions
        print(f"{ingredient['ingridient_name'].capitalize()}: {total} {ingredient['measure']}")

cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'},
                       {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'},
                       {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]}
total_ingredients(cook_book)

Python

generate functionTue, 10 Jan 2023

import numpy as np def sum_matrix(N): my_matrix = * return * print(sum_matrix(5)) print(sum_matrix(10)) print(sum_matrix(15)) Create in Numpy a diagonal matrix with elements from N to 0. Count the sum of its values on the diagonal.

import numpy as np

def sum_matrix(N):
    # Diagonal holds N, N-1, ..., 0
    my_matrix = np.diag(np.arange(N, -1, -1))
    return np.trace(my_matrix)

print(sum_matrix(5))
print(sum_matrix(10))
print(sum_matrix(15))

Python

generate functionWed, 22 Mar 2023

This code gives an error TypeError: '>' not supported between instances of 'str' and 'float' fvalue, pvalue = stats.f_oneway(p.Speed, boosted_p.Speed) Fix it!

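import pandas as pd
from scipy import stats

def anova_1way(a, b):
    # Convert to numbers first: string values in either column cause the TypeError
    a = pd.to_numeric(a, errors='coerce').dropna()
    b = pd.to_numeric(b, errors='coerce').dropna()
    fvalue, pvalue = stats.f_oneway(a, b)
    if pvalue < 0.05:
        print("Reject Null Hypothesis - Significant differences exist between group means")
    else:
        print("Fail to Reject Null Hypothesis - No significant difference between group means")

anova_1way(p.Speed, boosted_p.Speed)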

Python

generate functionWed, 15 Mar 2023

There is the "sl" list and there is the "sw" list. Write code that will create a scaled_data dataframe with the values from "sl" in the first column and "sw" in the second column

import pandas as pd

def create_scaled_data(sl, sw):
    # First column from sl, second from sw; both lists must have the same length.
    # Subtracting a number from a plain Python list raises TypeError, and the
    # prompt only asks to place the values in columns, so no rescaling is done.
    return pd.DataFrame({'sl': sl, 'sw': sw})

scaled_data = create_scaled_data(sl, sw)

Python

generate functionWed, 22 Feb 2023

Change the code below so that the columns have values on the x-axis question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='Какая рабочая среда для вас наиболее предпочтительна?') fig.show()

question6 = 'What is the most preferred working environment for you.'
question6 = df[question6].value_counts()
label = question6.index
counts = question6.values
# Categories go on the x-axis, so the bars must be vertical: drop orientation='h'
fig = px.bar(x=label, y=counts)
fig.update_layout(title_text='Какая рабочая среда для вас наиболее предпочтительна?')
fig.show()

Python

generate functionTue, 21 Mar 2023

There are 3 lists: water, nutri and mineral. They differ in length: print(len(water)) - 20 print(len(nutri)) -21 print(len(mineral)) -21 add them to df Pandas dataframe in the following format: index treatments value 0 0 water 25 1 1 water 30 2 2 water 28 3 3 nutri 36 4 4 nutri 29 5 0 nutri 45 6 1 mineral 55 7 2 mineral 29 8 3 mineral 56

# Your code here

Python
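
No code was generated for this prompt. One possible sketch builds a small frame per list and stacks them, which works regardless of the lists' lengths. The helper name to_long is hypothetical, and the lists are shortened to their first few values for illustration.

```python
import pandas as pd

# Shortened illustrative lists of unequal length
water = [1, 2, 3]
nutri = [1, 2, 4, 6]
mineral = [2, 1, 1, 3]


def to_long(**groups):
    frames = []
    for name, values in groups.items():
        frames.append(pd.DataFrame({
            "index": list(range(len(values))),  # position within its own list
            "treatments": name,                 # scalar is repeated for every row
            "value": values,
        }))
    # Stack the per-list frames and renumber the rows 0..n-1
    return pd.concat(frames, ignore_index=True)


df = to_long(water=water, nutri=nutri, mineral=mineral)
print(df)
```

Because each list gets its own frame before stacking, no padding is needed for the shorter list.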

generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order Example: name Ksenia Rodionova Artur Petrov Ivan Sidorov Ksenia Rodionova Result: name Ksenia Rodionova Ksenia Rodionova

# keep=False marks every occurrence of a repeated name, including the first
df[df['name'].duplicated(keep=False)].sort_values('name')

Python

generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating decade 4728 Bad Education (2019) 0.5 2010-2020 35638 35638 Palooka (1934) 0.5 1934 1930-1940 21445 21445 High Moon (2019) 0.5 2019 2010-2020 40291 40291 Saint Maud (2019) 0.5 2019 2010-2020 29462 29462 Mad at the Moon (1992) 0.5 1992 1990-2000 46978 46978 The Butterfly Ball (1977) 0.5 1977 1970-1980 42931 42931 Snowboarďáci (2004) 0.5 2004 2000-2010 38670 38670 Recon 2020: The Caprini Massacre (2004) 0.5 2004 2000-2010 25506 25506 Karagoez catalogo 9,5 (1983) 0.5 1983 1980-1990 43989 43989 Starting a Skyscraper (1902) 0.5 1902 1900-1910 Write code with value_counts and sorting by decade column and rating

def sort_df(df): return df.groupby('decade').rating.value_counts().sort_index()

Python

generate functionTue, 13 Dec 2022

a list is given arr = [2, 1, 10, 5] Write a function with a for list that sums all the numbers in the list with each other The result should be this: res = 54

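def add(arr):
    # Sum every unordered pair of elements: (2+1) + (2+10) + ... + (10+5) = 54
    res = 0
    for i in range(len(arr)):
        for j in range(i + 1, len(arr)):
            res += arr[i] + arr[j]
    return res

add([2, 1, 10, 5])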

Python
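
The expected result of 54 for [2, 1, 10, 5] matches the sum over all unordered pairs of elements. Under that reading (an interpretation, since the prompt is terse), each element takes part in len(arr) - 1 pairs, so the total reduces to a closed form. The name pairwise_sum is hypothetical; the brute-force check uses itertools.combinations.

```python
from itertools import combinations


def pairwise_sum(arr):
    # Every element appears in (n - 1) pairs, so the total is (n - 1) * sum(arr)
    return (len(arr) - 1) * sum(arr)


def pairwise_sum_brute(arr):
    # Direct definition: add up a + b for every unordered pair
    return sum(a + b for a, b in combinations(arr, 2))


print(pairwise_sum([2, 1, 10, 5]))        # 54
print(pairwise_sum_brute([2, 1, 10, 5]))  # 54
```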

generate functionTue, 13 Dec 2022

Given an array of numbers, check if any of the numbers are the character codes for lower case vowels (a, e, i, o, u). If they are, change the array value to a string of that vowel. Return the resulting array IN THE SAME ORDER! Example: Input: inp = [118, "u",120,121,"u",98,122,"a",120,106,104,116,113,114,113,120,106 ] Output: [118,117,120,121,117,98,122,97,120,106,104,116,113,114,113,120,106 ]

def vowel_2_index(inp):
    # Character codes of the lowercase vowels (u is 117; 118 is v)
    vowels = {97: "a", 101: "e", 105: "i", 111: "o", 117: "u"}
    for i in range(len(inp)):
        if type(inp[i]) == int and inp[i] in vowels:
            inp[i] = vowels[inp[i]]
    return inp

inp = [118, "u", 120, 121, "u", 98, 122, "a", 120, 106, 104, 116, 113, 114, 113, 120, 106]
vowel_2_index(inp)

Python

generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: id route operator group_name district injury_result incident_type victim_category victim_age 2 3.0 1 London General Go-Ahead Southwark NaN Onboard Injuries 7 10 3 4.0 4 Metroline Metroline Islington NaN Onboard Injuries 7 2 4 5.0 5 East London Stagecoach Havering NaN Onboard Injuries 7 8 5 6.0 5 East London Stagecoach None London Borough NaN Onboard Injuries 7 8 Count which routes had the most incidents_type

#df_new.route.value_counts() def df_count(dataframe, column): return dataframe[column].value_counts() df_count(df_new, 'route')

Python

generate functionSun, 11 Dec 2022

A recipe book is given with information about how many ingredients are needed to prepare a dish per serving. Write a program that will multiply the number of servings to prepare these dishes and display the information about the total amount of ingredients required in the given form. Note: The same ingredients with different dimensions must be counted separately! Example: dish = 'salad' portions = 5 The result should be: Cheese: 250 g Tomatoes: 10 pcs. Cucumbers: 60g cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'}, {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]}

def get_ingredients(dish, portions):
    recipes = cook_book.get(dish, [])
    if not recipes:
        print('No such dish')
        return
    print(f'You need to cook {dish} for {portions} portions')
    for recipe in recipes:
        print(f'{recipe["ingridient_name"]}: {recipe["quantity"] * portions} {recipe["measure"]}')

Python
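
The note about counting the same ingredient separately per measure matters once several dishes are combined. A sketch of that case, keying the totals by the (name, measure) pair; the portions_by_dish argument is a hypothetical way to pass the order, and cook_book is the one from the prompt.

```python
from collections import defaultdict

cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'},
                       {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'},
                       {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]}


def total_ingredients(cook_book, portions_by_dish):
    totals = defaultdict(int)
    for dish, portions in portions_by_dish.items():
        for item in cook_book.get(dish, []):
            # Same name with a different measure lands under a different key
            key = (item['ingridient_name'], item['measure'])
            totals[key] += item['quantity'] * portions
    return dict(totals)


totals = total_ingredients(cook_book, {'salad': 5})
for (name, measure), quantity in totals.items():
    print(f'{name.capitalize()}: {quantity} {measure}')
```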

generate functionSat, 25 Feb 2023

Write code that finds all rows in the rate_group column of Pandas with "%" at the end and removes it with regex. Rows without "%" must be replaced with np.nan.

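import re
import numpy as np

def remove_percent(row):
    # Strip a trailing "%" with a regex; anything without it becomes NaN
    if isinstance(row, str) and row.endswith("%"):
        return float(re.sub(r"%$", "", row))
    return np.nan

df.rate_group = df.rate_group.map(remove_percent)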

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer time_on_chart min max "Groove" Holmes 1 11 "Little" Jimmy Dickens 1 10 "Pookie" Hudson 1 1 "Weird Al" Yankovic 1 20 Sort these rows by the time_on_chart and max columns, in descending order, and display the first 20 rows

import pandas df.sort_values(by=['time_on_chart', 'max'], ascending=False).head(20)

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 3819 Imagine Dragons Radioactive, Natural, Believer, Thunder, On To... 2012 11101 9304.0 2674 Ed Sheeran Thinking Out Loud, Photograph, Sing, Don't, On... 2014 10685 10039.0 9590 The Weeknd Starry Eyes, Save Your Tears, Acquainted, Blin... 2022 9429 6713.0 5812 Mariah Carey Without You/Never Forget You, Don't Forget Abo... 1994 8735 6616.0 5749 Madonna Like A Prayer, Erotica, Vogue, Lucky Star, Cau... 1989 8391 7534.0 Write a function divide_decade(df, decade) which, when a decade is specified, will perform the appropriate filter of the chart_debut column on the grouped Performer column. for example divide_decade(df, '1980-1990') 5749 Madonna Like A Prayer, Erotica, Vogue, Lucky Star, Cau... 1989 8391 7534.0

def divide_decade(df, decade):
    # '1980-1990' covers chart_debut years 1980 through 1989
    start, end = (int(year) for year in decade.split('-'))
    return df[(df['chart_debut'] >= start) & (df['chart_debut'] < end)]

divide_decade(df, '1980-1990')

Python

generate functionTue, 13 Dec 2022

Write a function that outputs True if the letters in it are one after the other in the string, and False otherwise. def goes_after(word: str, first: str, second: str) -> bool: s = '' for i, let in enumerate(word): print (let[i] + let[i+1]) print(first + second) if let[i] + let[i-1] == first + second: return True else: return False Examples: goes_after("world", "o", "r") Result: True goes_after("world", "w", "r") Result: False

def goes_after(word, first, second):
    # Stop one short of the end so word[i + 1] is always valid,
    # and only give up after every position has been checked.
    for i in range(len(word) - 1):
        if word[i] == first and word[i + 1] == second:
            return True
    return False

Python

generate functionTue, 21 Mar 2023

Write code that draws two histograms A and B superimposed on each other

import matplotlib.pyplot as plt

a = [1, 3, 3, 4, 4, 4]
b = [2, 3, 4, 4, 4, 4, 4]
plt.hist(a, alpha=0.5, color='b')
plt.hist(b, alpha=0.5, color='r')
plt.show()

Python

generate functionTue, 10 Jan 2023

Create a numpy array with elements from number N to 0. For example, for N = 10 it would be array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

import numpy as np

def create_array(N):
    return np.arange(N - 1, -1, -1)

create_array(10)

Python

generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singin' in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 Seventh Seal, The (Sjunde inseglet, Det) (1957) 5.0 9 Bridge on the River Kwai, The (1957) 4.0 both titles and ratings in this table are not unique, they are duplicated many times Write a function that groups unique movie titles in one column and displays their average rating in the second column

def get_movie_rating(df):
    # One row per unique title with its average rating, rounded to one decimal
    return df.groupby('title')['rating'].mean().round(1).reset_index()

get_movie_rating(df)

Python

generate functionMon, 26 Dec 2022

Write a code that looks for a repeating student number and replaces it with "9090". lst = [ ["0001", "Antonov", "Anton", "Igorevich", "08/20/2009", "BST161"] ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ]

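def search_and_replace(lst, replacement='9090'):
    # Remember the numbers already seen; any later repeat gets replaced
    seen = set()
    for student in lst:
        if student[0] in seen:
            student[0] = replacement
        else:
            seen.add(student[0])
    return lst

search_and_replace(lst)
print(lst)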

Python

generate functionMon, 27 Mar 2023

The code below gives a ValueError: Input variables with an inconsistent number of samples were found: [8082, 5572]. Fix it! from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(tfidf, cats, test_size=0.3, random_state=42)

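from sklearn.model_selection import train_test_split

# train_test_split needs X and y with the same number of samples. Here tfidf has
# 8082 rows and cats has 5572, so they were built from different data and must
# be rebuilt from the same rows before splitting.
n_samples = tfidf.shape[0]
if n_samples != len(cats):
    raise ValueError(f"tfidf has {n_samples} rows but cats has {len(cats)}")

X_train, X_test, y_train, y_test = train_test_split(tfidf, cats, test_size=0.3, random_state=42)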

Python

generate functionSun, 25 Dec 2022

Point objects have x and y attributes (X and Y in C#) attributes. Write a function calculating distance between Point a and Point b. Tests round answers to 6 decimal places.

from math import sqrt

def distance(a, b):
    return sqrt((a.x - b.x)**2 + (a.y - b.y)**2)

Python

generate functionMon, 27 Mar 2023

The code below gives an error TypeError: unhashable type: 'list'. Fix it from gensim import similarities cos_sim = similarities.MatrixSimilarity(tfidf[bows])

from gensim import similarities cos_sim = similarities.MatrixSimilarity(tfidf[bows])

Python

generate functionSun, 25 Dec 2022

Point objects have x and y attributes (X and Y in C#) attributes. Write a function calculating distance between Point a and Point b. Tests round answers to 6 decimal places.

import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

def distance(a, b):
    return round(math.sqrt((b.x - a.x)**2 + (b.y - a.y)**2), 6)

Python

generate functionMon, 27 Mar 2023

There is a dataframe df with two columns: Category and Message in the following format: Category Message 0 ham Go until jurong point, crazy... Available only ... 1 ham Ok lar... Joking wif u oni... 2 spam Free entry in 2 a wkly comp to win FA Cup fina... 3 ham U dun say so early hor... U c already then say... 4 ham Nah I don't think he goes to usf, he lives aro... Split it into samples X = df['Message'] y = df['Category'] so that there are float numbers instead of object text everywhere.

from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

Python

generate functionWed, 08 Mar 2023

Upgrade the code so that a roll of 000, 007, 017, 027, 037, 047, 057, 067, 077, 087, 097, 010, 020, 030, 040, 050, 060, 070, 080, 090 will drop 1 dollar for i in range(trials): num = np.random.randint(1, 1000) if num == 777: money += 200; successes += 1 elif num == 999: money += 100; successes += 1 elif num == 555: money += 50; successes += 1 elif num == 333: money += 15; successes += 1 elif num == 111: money += 10; successes += 1 elif ends77(num) == True: money += 5; successes += 1 elif ends7(num) == True: money += 3; successes += 1 elif ends00(num) == True: money += 2; successes += 1 elif ends0(num) == True: money += 1; successes += 1 else: money -= 1

def upgrade(num, money, successes):
    # money and successes are passed in and returned; updating them as
    # undefined locals would raise UnboundLocalError.
    if num == 777: money += 200; successes += 1
    elif num == 999: money += 100; successes += 1
    elif num == 555: money += 50; successes += 1
    elif num == 333: money += 15; successes += 1
    elif num == 111: money += 10; successes += 1
    elif ends77(num) == True: money += 5; successes += 1
    elif ends7(num) == True: money += 3; successes += 1
    elif ends00(num) == True: money += 2; successes += 1
    elif ends0(num) == True: money += 1; successes += 1
    else: money -= 1
    return money, successes

Python

generate functionFri, 27 Jan 2023

Write a Pandas function that shows all str.contains = 'yandex' values

def show_all_yandex(df, column):
    # Rows whose value in `column` contains 'yandex', ignoring case and missing values
    return df[df[column].str.contains('yandex', case=False, na=False)]

Python

generate functionFri, 17 Mar 2023

Determine the sample size needed to study the average coffee bill in city N, given the following parameters: margin of error (delta); variance (sigsqr); confidence level (conf). Round the answer to an integer.

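from statistics import NormalDist

def func(delta, sigsqr, conf):
    # conf is a confidence level such as 0.95; convert it to the
    # two-sided standard normal quantile before applying the formula.
    z = NormalDist().inv_cdf((1 + conf) / 2)
    return round(sigsqr * z ** 2 / delta ** 2)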

Python
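
For reference, the usual sample-size formula for estimating a mean, assuming a normally distributed sample mean and a two-sided confidence interval (the prompt does not state these assumptions). Here \gamma is the confidence level conf, \sigma^2 is sigsqr, and \Delta is delta:

```latex
n = \frac{z_{(1+\gamma)/2}^{2}\,\sigma^{2}}{\Delta^{2}},
\qquad
z_{(1+\gamma)/2} = \Phi^{-1}\!\left(\frac{1+\gamma}{2}\right)
```

With hypothetical values conf = 0.95 (so z is about 1.96), sigsqr = 225 and delta = 3, this gives n = 1.96² × 225 / 9 ≈ 96.04, which rounds to 96.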

generate functionThu, 22 Dec 2022

Define a function that removes duplicates from an array of numbers and returns it as a result. The order of the sequence has to stay the same.

def distinct(seq):
    # dict keys keep insertion order, unlike set, so the sequence order is preserved
    return list(dict.fromkeys(seq))

Python

generate functionThu, 23 Mar 2023

This code gives an error AttributeError: 'MultiComparison' object has no attribute 'plot_simultaneous' from statsmodels.stats.multicomp import pairwise_tukeyhsd tukey.plot_simultaneous(comparison_name="C") Fix it!

from statsmodels.stats.multicomp import pairwise_tukeyhsd

tukey = pairwise_tukeyhsd(endog=df['A'],   # Data
                          groups=df['B'],  # Groups
                          alpha=0.05)      # Significance level
tukey.summary()
fig = tukey.plot_simultaneous(comparison_name="C")
fig.show()

Python

generate functionSun, 11 Dec 2022

A list of arbitrary length is given. It is necessary to write code, which on the basis of the original list will compose a nesting level dictionary of the same length as the original list. Examples of how the program works: my_list = ['2018-01-01', 'yandex', 'cpc', 100] Result: {'2018-01-01': {'yandex': {'cpc': 100}}} my_list = ['a', 'b', 'c', 'd', 'e', 'f'] Result: {'a': {'b': {'c': {'d': {'e': 'f'}}}}}

def make_dict(my_list):
    result = {}
    if len(my_list) > 2:
        result[my_list[0]] = make_dict(my_list[1:])
    else:
        result[my_list[0]] = my_list[1]
    return result

my_list = ['a', 'b', 'c']
print(make_dict(my_list))

Python
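
The same nesting can also be built without recursion by starting from the last element and wrapping it one key at a time. A small sketch, with the hypothetical name make_nested:

```python
def make_nested(items):
    """Fold a flat list into nested dicts; the last element is the innermost value."""
    result = items[-1]
    # Walk the remaining elements from right to left, wrapping as we go
    for key in reversed(items[:-1]):
        result = {key: result}
    return result


print(make_nested(['2018-01-01', 'yandex', 'cpc', 100]))
print(make_nested(['a', 'b', 'c', 'd', 'e', 'f']))
```

This avoids hitting the recursion limit on very long lists; a one-element list is returned as the bare value, and an empty list raises IndexError.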

generate functionSun, 25 Dec 2022

Point objects have x and y attributes (X and Y in C#) attributes. Write a function calculating distance between Point a and Point b. Tests round answers to 6 decimal places.

import math

def distance(a, b):
    return math.sqrt((a.x - b.x) ** 2 + (a.y - b.y) ** 2)

Python

generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating decade 4728 Bad Education (2019) 0.5 2010-2020 35638 35638 Palooka (1934) 0.5 1934 1930-1940 21445 21445 High Moon (2019) 0.5 2019 2010-2020 40291 40291 Saint Maud (2019) 0.5 2019 2010-2020 29462 29462 Mad at the Moon (1992) 0.5 1992 1990-2000 46978 46978 The Butterfly Ball (1977) 0.5 1977 1970-1980 42931 42931 Snowboarďáci (2004) 0.5 2004 2000-2010 38670 38670 Recon 2020: The Caprini Massacre (2004) 0.5 2004 2000-2010 25506 25506 Karagoez catalogo 9,5 (1983) 0.5 1983 1980-1990 43989 43989 Starting a Skyscraper (1902) 0.5 1902 1900-1910 Write a function that shows which decade had the most movies with a high rating Write a function that creates a "Decade of Release" column and assigns a decade to each movie based on the values in the "Year" column Example: 1889 Result: 1880-1890 Example 2: 1954 Result 2: 1950-1960 Example 3: 2011 Result 3: 2010-2020 Example 4: raovyavas Result: NaN

import numpy as np

def decade_of_the_movie(year):
    # 1954 -> '1950-1960'; anything that is not a year (e.g. 'raovyavas') -> NaN
    try:
        start = int(year) // 10 * 10
    except (TypeError, ValueError):
        return np.nan
    return f'{start}-{start + 10}'

df['Decade of Release'] = df['Year'].apply(decade_of_the_movie)

Python

generate functionWed, 15 Mar 2023

There is the "sl" list and there is the "sw" list. Write code that will create a "data" dataframe with the values from "sl" in the first column and "sw" in the second column. The values in the lists are contained in the following format: [[-0.90068117] [-1.14301691] [-1.38535265] [-1.50652052] [-1.02184904] [-0.53717756] [-1.50652052] [-1.02184904] [-1.74885626] [-1.14301691]]

import numpy as np
import pandas as pd

sl = [[-0.90068117], [-1.14301691], [-1.38535265], [-1.50652052], [-1.02184904], [-0.53717756], [-1.50652052], [-1.02184904], [-1.74885626], [-1.14301691]]
sw = [[1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.], [1.]]
# Each value is wrapped in its own one-element list, so flatten before building the columns
data = pd.DataFrame({'sl': np.ravel(sl), 'sw': np.ravel(sw)})
print(data)

Python

generate functionFri, 24 Feb 2023

Write a function for the Pandas dataframe that will delete the rows with the names spi_rank and country

def del_col(data_frame): data_frame.drop(['spi_rank', 'country'], axis=1, inplace=True) return data_frame

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order Example: name Ksenia Rodionova Artur Petrov Ivan Sidorov Ksenia Rodionova Result: name Ksenia Rodionova Ksenia Rodionova

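def duplicates(data):
    # keep=False marks every occurrence of a repeated name, including the first one
    return data[data['name'].duplicated(keep=False)].sort_values('name')

result = duplicates(data)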

Python

generate functionMon, 26 Dec 2022

Write a function that takes a group number and outputs a numbered and alphabetically ordered list (surnames, first names, patronymics) of all students in that group from the dictionary dct. dct = {'0001': ['Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], '1102': ['Bogov', 'Artem', 'Igorvich', '25.01.2010', 'BST162'], '0333': ['Glagoleva', 'Anastasia', 'Nikolaevna', '11.07.2009', 'BST163'], '4004': ['Potapov', 'Dmitry', 'Stanislavovich', '14.02.2012', 'BST161'], '0045': ['Bokov', 'Igor', 'Kharitonovich', '02.06.2009', 'BST161'], '0096': ['Vasilkov', 'Valentin', 'Sergeevich', '20.03.2009', 'BST164'], '0607': ['Siropova', 'Violetta', 'Eduardovna', '28.05.2010', 'BST162']}

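def group_list(grnum):
    # The group code is the last item of each student's record, not part of the key
    students = sorted(' '.join(info[:3]) for info in dct.values() if info[4] == grnum)
    for number, student in enumerate(students, start=1):
        print(f'{number}. {student}')

group_list('BST161')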

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe date city hotel total_rooms ocup_rooms revenue oper_costs adv_costs profit 0 2021-07-01 Yalta Rapsodia 33 24 78936 25641 19305 33990 1 2021-07-01 Yalta AquaMania 28 14 51156 21756 16380 13020 3 2021-07-01 Alushta Breeze 36 25 75475 27972 21060 26443 4 2021-07-01 Alushta Moreon 20 17 51091 15540 11700 23851 6 2021-07-02 Yalta Rapsodia 33 27 75276 25641 19305 30330 Write a lambda function with loc method that filters out all data and leaves only revenue > 40000 and total_rooms < 30

df.loc[lambda x: (x["revenue"] > 40000) & (x["total_rooms"] < 30)]

Python

generate functionMon, 26 Dec 2022

Write a code that looks for a repeating student number and replaces it with "9090". lst = [ ["0001", "Antonov", "Anton", "Igorevich", "08/20/2009", "BST161"] ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ]

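def replace_repeated_ids(lst, placeholder="9090"):
    # The student number is the first item of each record; later repeats get the placeholder
    seen = set()
    for record in lst:
        if record[0] in seen:
            record[0] = placeholder
        else:
            seen.add(record[0])
    return lst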

Python

generate functionMon, 13 Feb 2023

There is Matplotlib chart code: df.plot(kind='box', subplots=True, layout=(4,2), sharex=False, sharey=False) plt.show() Improve this code so that each subplot is at least 400x400 pixels in size

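# At Matplotlib's default 100 dpi, a 12 x 24 inch figure gives each subplot in the 4 x 2 layout roughly 400 x 400 pixels or more
df.plot(kind='box', subplots=True, layout=(4,2), sharex=False, sharey=False, figsize=(12, 24))
plt.show()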

Python

generate functionMon, 13 Feb 2023

There is Pandas dataframe: surgery age rectal_temp pulse respiratory_rate extremities_temp pain outcome 0 38.50 66 3 3 ? 2 ? 0 1 39.2 88 ? ? 4 1 ? 0 2 38.30 40 1 1 3 1 ? 0 3 39.10 164 4 1 6 2 1 0 4 37.30 104 ? ? 6 2 ? 0 Write a function that finds all lines with the value "?" and creates a new df2 dataframe with them

def find_questions(df):
    # Keep every row that contains "?" in at least one column
    df2 = df[(df == '?').any(axis=1)]
    return df2

find_questions(data)

Python

generate functionTue, 21 Feb 2023

There is a Pandas dataframe: Last Updated Installs 7479 2010-05-21 100000 7430 2011-01-30 50000 10282 2011-03-16 100000 8418 2011-04-11 5000000 8084 2011-04-16 50000 9067 2011-04-18 50000 5144 2011-05-12 100 7237 2011-06-23 1000 10460 2011-06-26 5000 1544 2011-06-29 1000000 7080 2011-07-10 5000000 8200 2011-09-20 50000 5561 2011-09-22 1000000 Write a function that creates a 'year' column with only the year from the 'Last Updated' column (which contains dates in 'Timestamp' object format) added to it

data = [['2010-05-21',100000], ['2011-01-30',50000], ['2011-03-16',100000], ['2011-04-11',5000000], ['2011-04-16',50000], ['2011-04-18',50000], ['2011-05-12',100], ['2011-06-23',1000], ['2011-06-26',5000], ['2011-06-29',1000000], ['2011-07-10',5000000], ['2011-09-20',50000], ['2011-09-22',1000000]] df = pd.DataFrame(data, columns = ['Last Updated', 'Installs']) def date_to_year(a): return pd.to_datetime(a).year df['year'] = df.apply(lambda x: date_to_year(x['Last Updated']), axis=1)

Python

generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: year date route operator group_name bus_garage district injury_result incident_type victim_category victim_sex victim_age 0 2015 01.01.2015 1 London General Go-Ahead Garage Not Available Southwark Injuries treated on scene Onboard Injuries Passenger Male Child 1 2015 01.01.2015 4 Metroline Metroline Garage Not Available Islington Injuries treated on scene Onboard Injuries Passenger Male Unknown 2 2015 01.01.2015 5 East London Stagecoach Garage Not Available Havering Taken to Hospital – Reported Serious Injury or... Onboard Injuries Passenger Male Elderly Calculate the average number of "date" string values for all "operator"

import pandas as pd

df_new = pd.read_csv('file.csv')
# "date" holds strings, so mean() is not defined for it; count the dates per operator, then average the counts
df_new.groupby('operator')['date'].count().mean()

Python

generate functionSat, 04 Mar 2023

Add the x-axis value captions to the code of this graph and change its color to purple gdpdiff[['Country or region', 'GDP per capita']].plot(kind='bar')

gdpdiff[['Country or region', 'GDP per capita']].plot(kind='bar', x='Country or region', y='GDP per capita', color='purple')

Python

generate functionThu, 02 Feb 2023

There is a Pandas dataframe: year date route operator group_name bus_garage bus_park injury_result incident_type victim_category victim_sex victim_age 0 2015 01.01.2015 1 London General Go-Ahead Garage Not Available Southwark Injuries treated on scene Onboard Injuries Passenger Male Child 1 2015 01.01.2015 4 Metroline Metroline Garage Not Available Islington Injuries treated on scene Onboard Injuries Passenger Male Unknown 2 2015 01.01.2015 5 East London Stagecoach Garage Not Available Havering Taken to Hospital – Reported Serious Injury or... Onboard Injuries Passenger Male Elderly Write a grouped_operators function that groups all values by operator column and sums all dates for each unique operator value

def grouped_operators(df):
    # "date" holds strings, so sum() would concatenate them; count the dates per operator instead
    return df.groupby("operator")["date"].count()

Python

generate functionSat, 25 Feb 2023

Write code that finds all lines in the Pandas dataframe without the "%" sign and replace all their content with NaN

import pandas as pd df = pd.read_csv('studentscores.csv') def clean_data(df): for index, row in df.iterrows(): for i, item in enumerate(row): if type(item) == str: if '%' in item: df.iloc[index, i] = item.replace('%', '').replace(',', '') else: df.iloc[index, i] = float('NaN') return df clean_data(df)

Python

generate functionSun, 25 Dec 2022

My washing machine uses water amount of water to wash load (in JavaScript and Python) or max_load (in Ruby) amount of clothes. You are given a clothes amount of clothes to wash. For each single item of clothes above the load, the washing machine will use 10% more water (multiplicative) to clean. For example, if the load is 10, the amount of water it requires is 5 and the amount of clothes to wash is 14, then you need 5 * 1.1 ^ (14 - 10) amount of water. Write a function def how_much_water(L,X,N) to work out how much water is needed if you have a clothes amount of clothes. The function will accept 3 arguments: - water, load (or max_loadin Ruby) and clothes.

def how_much_water(water, load, clothes): return water * 1.1 ** (clothes - load)

Python

generate functionTue, 13 Dec 2022

a list is given arr = [2, 1, 10, 5] Write a function with list comprehension that sums all the numbers in the list with each other ((2+1)+(2+10)+(2+5)+(1+10)+(1+5)+(10+5) ) The result should be this: res = 54

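def sum_all(arr):
    # Pair elements by position so that lists with repeated values are handled too
    return sum([arr[i] + arr[j] for i in range(len(arr)) for j in range(i + 1, len(arr))])

sum_all(arr)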

Python

generate functionThu, 30 Mar 2023

There are two vectors: film_2 = np.array([0,1,0,1,0,0]) film_4 = np.array([1,0,0,1,0,1]) We know that the angle between them is 65.9 degrees. Construct a graph with a visual representation of the vectors and the angle between them.

def GraphMaker(vec1, vec2, angle): #Implement me return ax GraphMaker(film_2, film_4, 65.9)

Python

generate functionThu, 30 Mar 2023

There are two vectors: film_2 = np.array([0,1,0,1,0,0]) film_4 = np.array([1,0,0,1,0,1]) We know that the angle between them is 65.9 degrees. Construct a graph with a visual representation of the vectors and the angle between them.

import matplotlib.pyplot as plt import numpy as np # we need 2 vectors film_2 = np.array([0,1,0,1,0,0]) film_4 = np.array([1,0,0,1,0,1]) # We need to know how to plot vectors # We need to know how to get the angle between them # We need to know how to make a graph
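Before drawing anything, the stated angle can be checked from the dot product. The sketch below uses only the standard library; the helper name `angle_degrees` is an assumption, and the vectors are written as plain lists rather than NumPy arrays.

```python
import math

def angle_degrees(u, v):
    """Angle between two equal-length vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / (norm_u * norm_v)))

film_2 = [0, 1, 0, 1, 0, 0]
film_4 = [1, 0, 0, 1, 0, 1]
print(round(angle_degrees(film_2, film_4), 1))  # 65.9
```

Since the vectors have six components, a plot has to show them in the plane they span; one option is to draw the first along the x-axis and the second rotated by this angle, each scaled to its own length.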

Python

generate functionMon, 12 Dec 2022

In this simple exercise, you will build a program that takes a value, integer , and returns a list of its multiples up to another value, limit . If limit is a multiple of integer, it should be included as well. There will only ever be positive integers passed into the function, not consisting of 0. The limit will always be higher than the base. For example, if the parameters passed are (2, 6), the function should return [2, 4, 6] as 2, 4, and 6 are the multiples of 2 up to 6.

def find_multiples(integer, limit):
    # Step through the multiples directly instead of testing every candidate
    return list(range(integer, limit + 1, integer))

Python

generate functionTue, 07 Feb 2023

Write a Python function that will query words to form the 'query' list via 'input' until the user writes 'end'. If the user writes 'end', the function will store all the words in comma-separated list format in the 'query' variable and terminate the function.

def words():
    query = []
    while True:
        word = input("Enter word: ")
        if word == "end":
            break
        query.append(word)
    # Store the collected words in comma-separated form
    query = ", ".join(query)
    return query

words()

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe date city hotel total_rooms ocup_rooms revenue oper_costs adv_costs profit 0 2021-07-01 Yalta Rapsodia 33 24 78936 25641 19305 33990 1 2021-07-01 Yalta AquaMania 28 14 51156 21756 16380 13020 3 2021-07-01 Alushta Breeze 36 25 75475 27972 21060 26443 4 2021-07-01 Alushta Moreon 20 17 51091 15540 11700 23851 6 2021-07-02 Yalta Rapsodia 33 27 75276 25641 19305 30330 Write a lambda function with loc method that filters out all data and leaves only revenue > 40000 and total_rooms < 30

df.loc[lambda x: (x['revenue'] > 40000) & (x['total_rooms'] < 30)]

Python

generate functionThu, 16 Feb 2023

There is a Pandas Series column: 10 78 54 GOOD 64 23 Write a function that finds all rows with letters in the Series column. Example: find_non_numbers(df, col1) Result: GOOD

import pandas as pd

s = pd.Series(['10','78','54','GOOD','64','23'])

def find_non_numbers(s):
    # Keep the rows that contain at least one letter
    return s[s.str.contains(r'[A-Za-z]')]

find_non_numbers(s)

Python

generate functionMon, 12 Dec 2022

Complete function saleHotdogs/SaleHotDogs/sale_hotdogs, function accepts 1 parameter:n, n is the number of hotdogs a customer will buy, different numbers have different prices (refer to the following table), return how much money will the customer spend to buy that number of hotdogs. number of hotdogs price per unit (cents) n < 5 100 n >= 5 and n < 10 95 n >= 10 90

def sale_hotdogs(n): if n < 5: return n * 100 elif n < 10: return n * 95 else: return n * 90

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order

df = pd.read_csv('customers.csv') df.sort_values(by='name', inplace=True) #df[(df['name'].duplicated(keep=False))].sort_values(by='name', inplace=True) df[(df['name'].duplicated(keep=False))].sort_values(by='name').head(10)

Python

generate functionMon, 13 Feb 2023

There is Pandas dataframe: surgery age rectal_temp pulse respiratory_rate extremities_temp pain outcome 0 38.50 66 3 3 ? 2 ? 0 1 39.2 88 ? ? 4 1 ? 0 2 38.30 40 1 1 3 1 ? 0 3 39.10 164 4 1 6 2 1 0 4 37.30 104 ? ? 6 2 ? 0 Write a function that finds all lines with the value "?" and creates a new df2 dataframe with them

df = pd.read_clipboard()
# Keep every row that contains "?" in at least one column
df2 = df[(df == '?').any(axis=1)]
print("the original dataframe:")
print(df)
print("the new dataframe:")
print(df2)

Python

generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: year date route operator group_name bus_garage district injury_result incident_type victim_category victim_sex victim_age 0 2015 01.01.2015 1 London General Go-Ahead Garage Not Available Southwark Injuries treated on scene Onboard Injuries Passenger Male Child 1 2015 01.01.2015 4 Metroline Metroline Garage Not Available Islington Injuries treated on scene Onboard Injuries Passenger Male Unknown 2 2015 01.01.2015 5 East London Stagecoach Garage Not Available Havering Taken to Hospital – Reported Serious Injury or... Onboard Injuries Passenger Male Elderly Calculate the average number of "date" string values for all "operator"

def mean_str_len(data, column1, column2): # data - a dataframe # column1 - a string, the name of a column # column2 - a string, the name of another column # YOUR CODE HERE ...

Python

generate functionWed, 08 Feb 2023

def get_needed_posts(query): site = pd.DataFrame() for q in query: URL = parseurl+'search/' params = { 'q': q } req = requests.get(URL, params=params) time.sleep(0.3) soup = BeautifulSoup(req.text) articles = soup.find_all('article', class_='tm-articles-list__item') for article in articles: try: title = article.find('h2', class_='tm-article').text date = article.find('span', class_='tm-article').text.strip() link = article.find('h2', class_='tm-article').find('a').get('href') except: pass row = {'date': date, 'title': title, 'link': 'https://habr.com'+link} site = pd.concat([site, pd.DataFrame([row])]) return site.reset_index(drop=True) res = get_needed_posts(query) res Modify the Python function above so that if it finds a repeated title or link, it will not add it to the return

def get_needed_posts(query): site = pd.DataFrame() for q in query: URL = parseurl+'search/' params = { 'q': q } req = requests.get(URL, params=params) time.sleep(0.3) soup = BeautifulSoup(req.text) articles = soup.find_all('article', class_='tm-articles-list__item') for article in articles: try: title = article.find('h2', class_='tm-article').text date = article.find('span', class_='tm-article').text.strip() link = article.find('h2', class_='tm-article').find('a').get('href') if title not in site.title.values and link not in site.link.values: row = {'date': date, 'title': title, 'link': 'https://habr.com'+link} site = pd.concat([site

Python

generate functionFri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). In your answer, specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers.

from scipy.stats import norm

def confidence_interval(n, mean, sig, conf):
    # Half-width: z-quantile times the standard error of the mean
    h = norm.ppf((1 + conf) / 2) * sig / n ** 0.5
    # Round the length itself rather than the two ends separately
    return round(2 * h)

Python

generate functionFri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). Import necessary libraries. In function return specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers. def interval(n, mean, sig, conf): h = return The result of the function should be three values: 12, 28, 21

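from scipy.stats import t

def interval(n, mean, sig, conf):
    # Half-width from Student's t with n - 1 degrees of freedom and the standard error sig / sqrt(n)
    h = sig / n ** 0.5 * t.ppf((1 + conf) / 2, n - 1)
    # Length of the interval: right end minus left end
    return round((mean + h) - (mean - h))

interval(n=20, mean=3.3, sig=2.4, conf=0.95)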
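The same calculation can be sketched without SciPy, assuming the standard deviation is known so that the normal quantile applies. The name `interval_length` is an assumption; the sample mean is left out of the signature because it cancels when the left end is subtracted from the right end.

```python
from math import sqrt
from statistics import NormalDist

def interval_length(n, sig, conf):
    """Length of the normal-theory confidence interval for the mean, rounded to an integer."""
    z = NormalDist().inv_cdf((1 + conf) / 2)   # two-sided critical value
    half_width = z * sig / sqrt(n)             # z times the standard error
    return round(2 * half_width)

print(interval_length(n=100, sig=10, conf=0.95))
```

For n=100, sig=10 and conf=0.95 the standard error is 1, so the length is about 2 × 1.96 = 3.92, which rounds to 4.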

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: time_on_chart min max performer "Groove" Holmes 1 11 "Little" Jimmy Dickens 1 10 "Pookie" Hudson 1 1 "Weird Al" Yankovic 1 20 Sort these rows by the time_on_chart and max columns, in descending order, and display the first 20 rows

import pandas as pd

df = pd.DataFrame({'time_on_chart': [1,1,1,1], 'max': [11,10,1,20]}, index=['"Groove" Holmes', '"Little" Jimmy Dickens', '"Pookie" Hudson', '"Weird Al" Yankovic'])
# Sort by both columns in descending order and show the first 20 rows
df.sort_values(['time_on_chart', 'max'], ascending=False).head(20)

Python

generate functionTue, 10 Jan 2023

import numpy as np def sum_matrix(N): my_matrix = * return * print(sum_matrix(5)) print(sum_matrix(10)) print(sum_matrix(15)) Create in Numpy a diagonal matrix with elements from N to 0. Count the sum of its values on the diagonal.

import numpy as np

def sum_matrix(N):
    # Diagonal matrix whose main diagonal runs from N down to 0
    my_matrix = np.diag(np.arange(N, -1, -1))
    return np.trace(my_matrix)

print(sum_matrix(5))
print(sum_matrix(10))
print(sum_matrix(15))

Python

generate functionMon, 13 Feb 2023

There is df2 Pandas dataframe: surgery age rectal_temp pulse respiratory_rate extremities_temp pain outcome 4 2.0 1 NaN NaN NaN 2.0 2.0 1.0 6 1.0 1 NaN 60.0 NaN 3.0 NaN 2.0 7 2.0 1 NaN 80.0 36.0 3.0 4.0 3.0 15 1.0 9 NaN 128.0 36.0 3.0 4.0 2.0 27 1.0 1 NaN NaN NaN NaN NaN 2.0 33 1.0 1 NaN 100.0 30.0 3.0 5.0 1.0 34 2.0 1 NaN 104.0 24.0 4.0 4.0 3.0 39 1.0 1 NaN 88.0 NaN 3.0 5.0 2.0 Write a function that: 1) takes as a parameter the column name (default = 'rectal_temp') 2) calculate the correlation of this column with other columns: pulse, respiratory_rate, extremities_temp, pain_outcome 3) fill all NaNs in the 'rectal_temp' column with the average values multiplied by the correlation weight

def fill_na_by_corr(df, col_name='rectal_temp'): df_temp = df[df.columns[df.isna().any()].tolist()].drop(['outcome'], axis = 1) df_temp = df_temp.dropna(subset = [col_name]) cor = df_temp.corr()[[col_name]].drop([col_name]) for i in cor.index: if cor.loc[i, col_name] > 0: df[i] = df[i].fillna(value = df[i].mean() * cor.loc[i, col_name]) else: df[i] = df[i].fillna(value = df[i].mean() / cor.loc[i, col_name]) return df

Python

generate functionTue, 21 Feb 2023

paid['App'].groupby(paid['Type']).sum()\ .plot(kind='pie', figsize=(5, 6), autopct='%1.1f%%', startangle=90, title='Percentage of free and paid apps') Add to this chart a display of the legend

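# legend=True asks pandas to draw the legend for the pie chart
paid['App'].groupby(paid['Type']).sum().plot(kind='pie', figsize=(5, 6), autopct='%1.1f%%', startangle=90, legend=True, title='Percentage of free and paid apps')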

Python

generate functionTue, 21 Mar 2023

You are given test results for two drugs: A (results for the group of people getting drug A) and B (results for the other group of people getting drug B). 1) You need to assess whether there is a statistically significant difference in the effect in these patients. Use the null hypothesis, the inverse hypothesis, and Student's t-test for calculations. 2) construct a boxplot and histogram to visually confirm the result

from scipy.stats import ttest_ind import numpy as np a = np.array([1,2,3,4,5]) b = np.array([5,6,7,8,9]) ttest_ind(a, b)

Python

generate functionFri, 03 Feb 2023

generate a unique id for each line of the pandas dataframe

import uuid

# One random UUID per row; this does not rely on any existing column
df['id'] = [str(uuid.uuid4()) for _ in range(len(df))]

Python

generate functionMon, 13 Feb 2023

Modify the formula below so that it rounds to the nearest integer, but not less than 1 and not more than 4: df2['respiratory_rate'] = df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median'))

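# Fill the gaps first, then round and clamp to the range 1..4; values that are still NaN stay NaN
df2['respiratory_rate'] = df2['respiratory_rate'].fillna(
    df2.groupby('pulse')['respiratory_rate'].transform('median')
).round().clip(lower=1, upper=4)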

Python

generate functionWed, 22 Feb 2023

Write a function that gives True (except for 777) if the number ends with 77, and False in other cases

def is77(x): str_x = str(x) if str_x[-2:] == '77' and str_x != '777': return True else: return False is77(77)

Python

generate functionThu, 16 Feb 2023

There is a Pandas Series column: 'loan_amnt' 10 78 54 GOOD 64 23 Write a function that replaces all found strings with letters in the column with np.nan. Example: replace_non_numbers(df, 'loan_amnt') df Result: 10 78 54 NaN 64 23

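import numpy as np

def replace_non_numbers(df, col):
    # Replace every value that contains a letter with NaN
    has_letters = df[col].astype(str).str.contains(r'[A-Za-z]')
    df[col] = df[col].mask(has_letters, np.nan)
    return df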

Python

generate functionThu, 22 Dec 2022

The Collatz conjecture (also known as 3n+1 conjecture) is a conjecture that applying the following algorithm to any number we will always eventually reach one: [This is written in pseudocode] if(number is even) number = number / 2 if(number is odd) number = 3*number + 1 #Task Your task is to make a function hotpo that takes a positive n as input and returns the number of times you need to perform this algorithm to get n = 1. #Examples hotpo(1) returns 0 (1 is already 1) hotpo(5) returns 5 5 -> 16 -> 8 -> 4 -> 2 -> 1 hotpo(6) returns 8 6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1 hotpo(23) returns 15 23 -> 70 -> 35 -> 106 -> 53 -> 160 -> 80 -> 40 -> 20 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1

def hotpo(n):
    # Count how many Collatz steps it takes to reach 1
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

hotpo(6)

Python

generate functionMon, 20 Feb 2023

The 'Size' column of the Pandas dataframe contains data in the following format: Size 19M 14M 8.7M 25M 250k 429k 728k 3M Write a function that uses regex to convert strings: - that end in 'M' to a number with 5 zeros (e.g. 2.8M should be changed to 2800000) - that end in 'k' to a number with 3 zeros (for example, 429k should be changed to 429000)

import re def convert_size(size): if size[-1] == 'M': return int(float(size[:-1]) * 1000000) elif size[-1] == 'k': return int(float(size[:-1]) * 1000) else: return int(size) replaces = lambda x: convert_size(x) data['Size'] = data['Size'].apply(convert_size)
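The request asks for a regex, so here is a hedged sketch of the same conversion built on `re`. The names `MULTIPLIERS` and `SIZE_PATTERN` are assumptions, as is the choice to return None for strings that match neither suffix.

```python
import re

# Multipliers for the two suffixes that appear in the column
MULTIPLIERS = {"M": 1_000_000, "k": 1_000}
SIZE_PATTERN = re.compile(r"^(\d+(?:\.\d+)?)([Mk])$")

def convert_size(size):
    """Convert strings such as '2.8M' or '429k' to an integer; return None if the format is unknown."""
    match = SIZE_PATTERN.match(size.strip())
    if match is None:
        return None
    number, suffix = match.groups()
    # round() guards against floating-point results such as 8699999.999999999
    return round(float(number) * MULTIPLIERS[suffix])

print([convert_size(s) for s in ["19M", "8.7M", "250k", "3M"]])
```

On a dataframe this would presumably be applied as `data['Size'].apply(convert_size)`, in the same way as the function above.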

Python

generate functionWed, 25 Jan 2023

There is a list with lists inside [['/world/'], ['/latest/'], ['/?updated=top'], ['/politics/36188461-s-marta-zhizn-rossiyan-suschestvenno-izmenitsya-iz-za-novyh-zakonov/'] ['/world/36007585-tramp-pridumal-kak-reshit-ukrainskiy-vopros/'], ['/science/36157853-nasa-sobiraet-ekstrennuyu-press-konferentsiyu-na-temu-vnezemnoy-zhizni/'], ['/video/36001498-poyavilis-pervye-podrobnosti-gibeli-natali-melamed/'], ['/world/36007585-tramp-pridumal-kak-reshit-ukrainskiy-vopros/?smi2=1'] ['/science/'], ['/sport/'], ['/middleeast/36131117-divizion-s-400-ne-zametil-ataki-f-35-pod-damaskom/'], ['/economics/36065674-rossiyane-vozmutilis-minimalnymi-zarplatami-v-stranah-es/']] 1) Modify the list to the Pandas dataframe 2) Filter out and leave only the url's with the news sctructure (containing 8 digits and heading) in it, using the str.contains method

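import pandas as pd

# `urls` stands for the list of one-element lists given above (the name is an assumption)
df = pd.DataFrame(urls, columns=['text'])
# News URLs contain an 8-digit id followed by a hyphen and the heading
df[df['text'].str.contains(r'\d{8}-')]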

Python

generate functionTue, 24 Jan 2023

Write a pandas function that groups matching columns traffic_source and region and outputs the values of the third column source_type for them

def my_fun(df, var0, var1, var2): return df.groupby([var0, var1])[var2].max().reset_index() my_fun(df, 'traffic_source', 'region', 'source_type')

Python

generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 Add an "av_ltv" column to the dataframe, in which you calculate the average lifetime of the users. Lifetime refers to the difference between the maximum and minimum values of the timestamp column for a given userId value.

df = pd.DataFrame(data = [[1, 31, 2.5, 1260759144], [1, 1029, 3.0, 1260759179], [1, 1061, 3.0, 1260759182], [1, 1129, 2.0, 1260759185], [1, 1172, 4.0, 1260759205]], columns = ['userId', 'movieId', 'rating', 'timestamp']) def average_lifetime(df): ''' df: input dataframe ''' df_max = df.groupby(['userId']).max() df_min = df.groupby(['userId']).min() df_final = pd.merge(df_max, df_min, on = ['userId'], suffixes = ('_max', '_min')) df_final['average_lifetime'] = df_final['timestamp_max'] - df_final['timestamp_min'] return df_final df = average

Python

generate functionMon, 13 Feb 2023

Modify the formula below so that it rounds to the nearest integer: df2['respiratory_rate'] = df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median'))

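# Fill the gaps with the per-pulse median, then round the column to the nearest integer
df2['respiratory_rate'] = df2['respiratory_rate'].fillna(
    df2.groupby('pulse')['respiratory_rate'].transform('median')).round()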

Python

generate functionSat, 28 Jan 2023

There is seaborn graph code sns.factorplot(x='date', y='rating', col='hotel', data=hotels_rating, col_wrap=3, kind='bar') plt.suptitle('hotel rating change',size=16) plt.subplots_adjust(top=.925) Limit the height of the columns on the y-axis to 3

# Import packages import numpy as np import pandas as pd import matplotlib.pyplot as plt import seaborn as sns # Initialize the figure f, ax = plt.subplots(figsize=(6.5, 6.5)) # Load the example car crash dataset crashes = sns.load_dataset("car_crashes").sort_values("total", ascending=False) # Plot the total crashes sns.set_color_codes("pastel") sns.barplot(x="total", y="abbrev", data=crashes, label="Total", color="b") # Plot the crashes where alcohol was involved sns.set_color_codes("muted") sns.barplot(x="alcohol", y="abbrev", data=crashes, label="Alcohol-involved", color="b") # Add a legend and informative axis label ax.legend(ncol=2, loc="lower right", frameon=True) ax.set(x

Python

generate functionFri, 24 Feb 2023

There is a Pandas dataframe: spi_rank country spi_score basic_human_needs foundations_of_wellbeing opportunity basic_nutri_med_care water_sanitation shelter personal_safety access_to_knowledge access_to_communications health_wellness environmental_quality personal_rights personal_freedom inclusiveness access_to_advanced_education 61 62.0 Russia 73.45 83.0 79.4 57.94 95.49 97.04 88.56 50.92 94.34 77.42 64.7 81.16 51.79 70.26 30.31 79.4 Write a function that converts dataframe columns to strings

def columns_to_str(df):
    # Convert every column of the dataframe to strings
    return df.astype(str)

Python

generate functionWed, 15 Mar 2023

There is the "sl" list and there is the "sw" list. Write code that will create a "data" dataframe with the values from "sl" in the first column and "sw" in the second column

def get_dataframe(): sl = [1, 2, 3, 4, 5] sw = [6, 7, 8, 9, 10] import pandas as pd return pd.DataFrame(data = [sl, sw], index = ['sl', 'sw']).transpose() get_dataframe()

Python

generate functionMon, 12 Dec 2022

Create a program that will take two lists of integers, a and b. Each list will consist of 3 positive integers above 0, representing the dimensions of cuboids a and b. You must find the difference of the cuboids' volumes regardless of which is bigger. For example, if the parameters passed are ([2, 2, 3], [5, 4, 1]), the volume of a is 12 and the volume of b is 20. Therefore, the function should return 8.

from math import prod

def find_difference(a, b):
    # Absolute difference between the two volumes
    return abs(prod(a) - prod(b))

find_difference([2, 2, 3], [5, 4, 1])

Python

generate functionSat, 28 Jan 2023

There is seaborn graph code sns.factorplot(x='date', y='rating', col='hotel', data=hotels_rating, col_wrap=3, kind='bar') plt.suptitle('hotel rating change',size=16) plt.subplots_adjust(top=.925) Limit the height of the columns on the y-axis to 3

plt.ylim(0, 3)

Python

generate functionWed, 01 Feb 2023

There is a Python and Pandas function. def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy Add it to output all 66 lines of the dataframe

def year_leaders(dfp):
    dfp_copy = dfp.copy()
    dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True)
    dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True)
    dfp_copy.set_index('chart_debut', inplace=True)
    return dfp_copy

# Raise the display limit so that all 66 rows are printed instead of a truncated view
pd.set_option('display.max_rows', 66)
year_leaders(dfp)

Python

generate functionMon, 13 Feb 2023

There is df2 Pandas dataframe: surgery age rectal_temp pulse respiratory_rate extremities_temp pain outcome 4 2.0 1 NaN NaN NaN 2.0 2.0 1.0 6 1.0 1 NaN 60.0 NaN 3.0 NaN 2.0 7 2.0 1 NaN 80.0 36.0 3.0 4.0 3.0 15 1.0 9 NaN 128.0 36.0 3.0 4.0 2.0 27 1.0 1 NaN NaN NaN NaN NaN 2.0 33 1.0 1 NaN 100.0 30.0 3.0 5.0 1.0 34 2.0 1 NaN 104.0 24.0 4.0 4.0 3.0 39 1.0 1 NaN 88.0 NaN 3.0 5.0 2.0 Write a function that will replace all NaN values in the rectal_temp column with the median of that column

# df2 Pandas dataframe is given def fillna_median(df2): df2['rectal_temp'] = df2['rectal_temp'].fillna(df2['rectal_temp'].median()) return df2 fillna_median(df2)

Python

generate functionFri, 24 Feb 2023

There is a Pandas dataframe: spi_rank country spi_score basic_human_needs foundations_of_wellbeing opportunity basic_nutri_med_care water_sanitation shelter personal_safety access_to_knowledge access_to_communications health_wellness environmental_quality personal_rights personal_freedom inclusiveness access_to_advanced_education 61 62.0 Russia 73.45 83.0 79.4 57.94 95.49 97.04 88.56 50.92 94.34 77.42 64.7 81.16 51.79 70.26 30.31 79.4 Write a function that finds the maximum values of each column and transposes them to rows

def colmax(df): maxvalues = [] for col in df.columns: maxvalues.append(df[col].max()) return pd.DataFrame(maxvalues, index=df.columns).T

Python

generate functionTue, 20 Dec 2022

Change the date_range function to return a list of all dates between start and end instead of return somelist from datetime import datetime as dt from datetime import timedelta as td def date_range(start_date, end_date): lst = [] If start_date > end_date: return [] else: try: start = (dt.strptime(start_date, '%Y-%m-%d')) end = (dt.strptime(end_date, '%Y-%m-%d')) return somelist except: return 'Incorrect date format' date_range('2022-12-01', '2022-12-20')

from datetime import datetime as dt
from datetime import timedelta as td

def date_range(start_date, end_date):
    # Parse the strings first; arithmetic and strftime only work on datetime objects
    try:
        start = dt.strptime(start_date, '%Y-%m-%d')
        end = dt.strptime(end_date, '%Y-%m-%d')
    except ValueError:
        return 'Incorrect date format'
    result = []
    while start <= end:
        result.append(start.strftime('%Y-%m-%d'))
        start += td(days=1)
    return result

date_range('1992-09-01', '1992-09-04') # ['1992-09-01', '1992-09-02', '1992-09-03', '1992-09-04']

Python

generate functionThu, 22 Dec 2022

Wilson primes satisfy the following condition. Let P represent a prime number. Then, ((P-1)! + 1) / (P * P) should give a whole number. Your task is to create a function that returns true if the given number is a Wilson prime. If n < 0 the function should return False

from math import factorial
def wilson_primes(p):
    # Check p > 1 first: factorial() raises ValueError for negative arguments
    return p > 1 and (factorial(p - 1) + 1) % (p * p) == 0

Python

generate functionTue, 21 Mar 2023

changing the paired argument in the wilcox.test function to 'FALSE'

wilcox.test(d$Sepal.Length, d$Petal.Length, paired = FALSE)

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Leave only 1 unique row in each 'song' column in case the 'peak_position' column has a value of 1

df.groupby('song').first().reset_index()

Python

generate functionSat, 25 Feb 2023

There is a Pandas dataframe: loan_amnt int_rate 0 5000 10.65% 1 2500 15.27% 2 2400 15.96% 3 10000 13.49% 4 3000 12.69% 5 5000 7.90% 6 7000 15.96% You need to split the values of the int_rate column into categories. Important: The int_rate column is in str data format. Write a function that will create the rate_group column and add values to it as follows: 1) if int_rate > 15.00%, rate_group should be '>15' 2) if int_rate is between 10.00% and 15.00%, then rate_group should be '10-15' You can use regex

import pandas as pd import re df = pd.DataFrame({'loan_amnt': [5000, 2500, 2400, 10000, 3000, 5000, 7000], 'int_rate': ['10.65%', '15.27%', '15.96%', '13.49%', '12.69%', '7.90%', '15.96%']}) df['rate_group'] = None def rate_group(a): if a > 15.00: return '>15' elif a >= 10.00 and a <= 15.00: return '10-15' else: return '<10' df['rate_group'] = df['int_rate'].apply(lambda x: (re.sub('%', '', x))) df['rate_group'] = df['rate_group'].apply(lambda x: float(x)) df['rate_group'] = df['rate_group'].apply(rate_group) display(df)

Python

generate functionSat, 25 Feb 2023

Write code to check if there is a % sign at the end of the string

my_string = "How much is this going to cost?"
# endswith() checks only the last character; `in` would match a % anywhere
my_string.endswith("%") # returns False
"Price: $100%".endswith("%") # returns True

Python

generate functionMon, 12 Dec 2022

Inspired by the development team at Vooza, write the function that accepts the name of a programmer, and returns the number of lightsabers owned by that person. The only person who owns lightsabers is Zach, by the way. He owns 18, which is an awesome number of lightsabers. Anyone else owns 0. Note: your function should have a default parameter. For example(Input --> Output): "anyone else" --> 0 "Zach" --> 18

def howManyLightSabersDoYouOwn(name="anyone else"): if name == "Zach": return 18 else: return 0

Python

generate functionMon, 06 Feb 2023

There is a df Pandas dataframe: date av_temp deviations country year decade 577457 2013-05-01 19.059 1.022 Zimbabwe 2013 2010-2020 577458 2013-06-01 17.613 0.473 Zimbabwe 2013 2010-2020 577459 2013-07-01 17.000 0.453 Zimbabwe 2013 2010-2020 577460 2013-08-01 19.759 0.717 Zimbabwe 2013 2010-2020 Write a function that: 1) calculates the average temperature in countries (average av_temp by decade column grouped by country) 2) build a list of the 20 coldest countries in ascending av_temp order

def average_temperature(df):
    avg_temp_by_country = df.groupby(['country', 'decade'])['av_temp'].mean().reset_index()
    coldest_countries = avg_temp_by_country.sort_values('av_temp', ascending = True).head(20)
    coldest_countries_list = list(coldest_countries['country'])
    return avg_temp_by_country, coldest_countries_list

Python

generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: year date route operator group_name district injury_result incident_type victim_category victim_age 0 2015 01.01.2015 1 London General Go-Ahead Southwark 1 Onboard Injuries 7 10 1 2015 01.01.2015 4 Metroline Metroline Islington 1 Onboard Injuries 7 2 2 2015 01.01.2015 5 East London Stagecoach Havering 7 Onboard Injuries 7 8 3 2015 01.01.2015 5 East London Stagecoach None London Borough 7 Onboard Injuries 7 8 4 2015 01.01.2015 6 Metroline Metroline Westminster 5 Onboard Injuries 10 8 generate a unique id for each line of the pandas dataframe

import uuid df_new['id'] = df_new.apply(lambda row: uuid.uuid4(), axis=1)

Python

generate functionWed, 21 Dec 2022

Rewrite this function without lambda max_key = max(dct, key=lambda key: dct[key]['Value'])

def max_key(dct, key): #todo
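The stub above leaves the body open. One way to drop the lambda, sketched here under the assumption that dct maps each key to a dict with a 'Value' entry (the sample data and the helper name value_of are made up for illustration), is a named inner function:

```python
def max_key(dct):
    """Return the key whose nested 'Value' entry is largest."""
    def value_of(key):
        # Same job the lambda did: the value used to compare keys
        return dct[key]['Value']
    return max(dct, key=value_of)

# Hypothetical sample data, for illustration only
sample = {'a': {'Value': 3}, 'b': {'Value': 10}, 'c': {'Value': 7}}
print(max_key(sample))
```

The helper is defined inside max_key so that it can read dct without a global variable.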

Python

generate functionFri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). Import necessary libraries. In function return specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers. def interval(n, mean, sig, conf): h = return

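from math import sqrt
from scipy.stats import norm
def interval(n, mean, sig, conf):
    # Two-sided z critical value for the requested confidence level (sig treated as known)
    z = norm.ppf((1 + conf) / 2)
    h = z * sig / sqrt(n) # half-width of the interval
    return round(2 * h) # full length: right end minus left end
interval(100, 6, 2, 0.95)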

Python

generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 leave in this dataframe only users who have rated 100 or more in the rating column

import pandas as pd df = pd.read_csv('ratings.csv') grouped = df.groupby('userId') top_users = grouped.filter(lambda x: len(x) >= 100)

Python

generate functionSun, 11 Dec 2022

Write a function that multiplies each dictionary value by portions portions = 5 grams = 0 key = 'salad' cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'gr'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pct'}, {'ingridient_name': 'pepper', 'quantity': 20, 'measure': 'гр'}]} if key == 'salad': grams = 'quantity' * portions print(grams) The result should be: Cheese: 210 g Tomatoes: 6 pcs. Cucumbers: 60g Olives: 30 g Olive oil: 60 ml Lettuce: 30 gr Pepper: 60 gr

def multiply_cook_book(portions): new_book = {} for key, value in cook_book.items(): new_book[key] = [] for d in value: new_book[key].append({'ingridient_name': d['ingridient_name'], 'quantity': d['quantity'] * portions, 'measure': d['measure']}) return new_book print(multiply_cook_book(2))

Python

generate functionWed, 18 Jan 2023

There is a Pandas dataframe in this format: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singing in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 The Seventh Seal (1957) 5.0 9 Bridge Over the River Kwai (1957) 4.0 Write a function that will pull the year of the movie from its title and add it to the "year" column

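def get_year(df):
    # Take the four digits in the trailing parentheses, e.g. '(1994)'; NaN if absent
    df['year'] = df['title'].str.extract(r'\((\d{4})\)\s*$', expand=False).astype(float)
    return df
get_year(df)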

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe: hotel how_find_us Alpina agg_101hotels.com 2 agg_airbnb.com 4 agg_booking.com 3 agg_bronevik.com 1 agg_hotellook.ru 4 agg_level.travel 2 agg_onetwotrip.com 2 agg_onlinetours.ru 2 agg_ostrovok.ru 1 agg_other_sites 1 agg_roomguru.ru 2 agg_sutochno.ru 6 agg_travelata.ru 3 agg_tripadvisor.ru 3 agg_trivago.com 6 agg_tvil.ru 3 agg_yandex_travel 5 by_recommendation 8 facebook_adv 1 google_adv 1 instagram_adv 6 outdoor 6 regular_customer 1 seo 5 social 3 telegram_adv 2 tour_agents 3 unknown 4 vk_adv 3 yandex_adv 10 Write a function that creates a new dataframe by grouping the values in the "how_find_us" column according to the following rules: 1) if there are "agg" in the row, replace all row values with "aggregators" 2) if there are words "facebook" or "vk" or "instagram" or "telegram" in the line, replace these values with "social"

import pandas as pd def find_us(val): if val.startswith('agg'): val = 'aggregators' elif val.startswith('vk_adv') or val.startswith('facebook_adv') or val.startswith('instagram_adv') or val.startswith('telegram_adv'): val = 'social' return val df = pd.read_csv('/datasets/how_find_us.csv') df['how_find_us'] = df['how_find_us'].apply(find_us) df['how_find_us'].value_counts()

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a year_leaders function that will build a new data frame and leave only the rows that have the maximum value of num_of_hits in it when grouped by the chart_debut column

def year_leaders(df):
    # Highest num_of_hits for each chart_debut year
    leaders = df.groupby(by=['chart_debut']).agg({'num_of_hits': 'max'}).reset_index()
    # The inner merge keeps only the rows that match that maximum
    return df.merge(leaders, on=['chart_debut', 'num_of_hits'], how='inner')
year_leaders(df)

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Pandas dataframe is available: song artist chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Write a function that will group all rows by the performer column (so that no duplicates are left), and all corresponding (unique) values to that column are written through the "," sign into a new hits column. All values of both time_on_chart and consecutive_weeks columns must be summed if the performer value matches.

def group_by_performer(df): pass
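The generated function above is an empty stub. A minimal sketch of one way to fill it in, assuming the columns are named song, performer, time_on_chart and consecutive_weeks as in the first table of the prompt (the sample frame is made up for illustration):

```python
import pandas as pd

def group_by_performer(df):
    """One row per performer: unique songs joined by ', ', numeric columns summed."""
    return (
        df.groupby('performer', as_index=False)
          .agg(
              hits=('song', lambda s: ', '.join(s.unique())),
              time_on_chart=('time_on_chart', 'sum'),
              consecutive_weeks=('consecutive_weeks', 'sum'),
          )
    )

# Small sample modelled on the rows in the prompt
sample = pd.DataFrame({
    'song': ['Stupid Cupid', 'Chantilly Lace', 'Chantilly Lace'],
    'performer': ['Connie Francis', 'Big Bopper', 'Big Bopper'],
    'time_on_chart': [12, 18, 19],
    'consecutive_weeks': [11.0, 17.0, 18.0],
})
result = group_by_performer(sample)
print(result)
```

Named aggregation keeps the output column names explicit. If the real frame uses artist instead of performer, as in the second table of the prompt, the grouping key would need to change accordingly.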

Python

generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 Calculate the average lifetime of users who gave more than 100 ratings. The lifetime is the difference between the maximum and minimum values of the timestamp column for a given userId.

data = loadmovies() pdata = pd.DataFrame(data) grouped = pdata.groupby('userId')['timestamp'].agg(['max', 'min']) grouped['diff'] = grouped['max'] - grouped['min'] grouped[pdata.groupby('userId')['rating'].count() > 100].mean()

Python

generate functionWed, 22 Feb 2023

Add the x-axis value captions to the columns fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='Какая рабочая среда для вас наиболее предпочтительна?') fig.show()

def add_x_axis_labels(fig, x, labels): fig.update_layout( xaxis = dict( tickmode = 'array', tickvals = x, ticktext = labels ) ) return fig add_x_axis_labels(fig, counts, label)

Python

generate functionWed, 22 Feb 2023

Add the x-axis value captions to the columns fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='Какая рабочая среда для вас наиболее предпочтительна?') fig.show()

def show_bar_plot(x, y, title): fig = px.bar(x=x, y=y, orientation='h') fig.update_layout(title_text=title) fig.show() show_bar_plot(counts, label, 'Your title')

Python

generate functionFri, 03 Feb 2023

generate a unique id for each line of the pandas dataframe

def get_id(x): return x['user_id'] + '_' + x['item_id'] df['id'] = df.apply(get_id, axis=1)

Python

generate functionWed, 18 Jan 2023

Write a function that pulls the year number from the string: Example: string = 'Pulp Fiction (1994)' result 1994

def get_year(string):
    # The last token looks like '(1994)'; strip both parentheses and convert to int
    return int(string.split()[-1].strip('()'))
get_year('Pulp Fiction (1994)') # 1994

Python

generate functionMon, 20 Feb 2023

Write a function that finds all non-numeric rows in the 'Reviews' column of dataframe 'df1' using regex

import re
import pandas as pd
def is_numeric(s):
    # True only when the whole string consists of digits
    return bool(re.search(r'^(\d+)$', s))
df1 = pd.DataFrame({"Reviews": ["1", "2", "3", "4", "5", "apple", "orange"]})
df1[~df1["Reviews"].apply(is_numeric)]

Python

generate functionTue, 28 Mar 2023

There is a dataframe "df" containing data about texts in the following format: Category Message Length Words 0 ham Go until jurong point, crazy... Available only ... 111 [go, jurong, point, crazy, available, bugis, g... 1 ham Ok lar... Joking wif u oni... 29 [ok, lar, joking, wif, oni] The texts in it are in the df['Message'] variable. And the classification categories are in df['Category']. The programmer has written code which should contain all the source messages classified incorrectly (specifying the actual and predicted ones). However, this code does not work and generates an error: KeyError. Here is this code: misclassified = [] for index, label_predicted in enumerate(predicted): if label_predicted != y_test[index]: misclassified.append({'message': df.iloc[index]['Message'], 'actual': df.iloc[index]['Category'], 'predicted': label_predicted}) misclassification_df = pd.DataFrame(misclassified) Fix it!

misclassified = [] for index, label_predicted in enumerate(predicted): if label_predicted != y_test[index]: misclassified.append({'message': X_test[index], 'actual': y_test[index], 'predicted': label_predicted}) misclassification_df = pd.DataFrame(misclassified)

Python

generate functionSun, 25 Dec 2022

Point objects have x and y attributes (X and Y in C#) attributes. Write a function calculating distance between Point a and Point b. Tests round answers to 6 decimal places.

import math class Point: def __init__(self, x, y): self.x = x self.y = y def distance(a, b): return math.sqrt((a.x - b.x)**2 + (a.y - b.y)**2) test.assert_approx_equals(distance(Point(0,0), Point(0,1)), 1) test.assert_approx_equals(distance(Point(0,0), Point(1,0)), 1) test.assert_approx_equals(distance(Point(0,0), Point(3,4)), 5)

Python

generate functionSun, 25 Dec 2022

My washing machine uses water amount of water to wash load (in JavaScript and Python) or max_load (in Ruby) amount of clothes. You are given a clothes amount of clothes to wash. For each single item of clothes above the load, the washing machine will use 10% more water (multiplicative) to clean. For example, if the load is 10, the amount of water it requires is 5 and the amount of clothes to wash is 14, then you need 5 * 1.1 ^ (14 - 10) amount of water. Write a function howMuchWater (JS)/how_much_water (Python and Ruby) to work out how much water is needed if you have a clothes amount of clothes. The function will accept 3 arguments: - water, load (or max_loadin Ruby) and clothes.

def how_much_water(water, load, clothes): if clothes <= load: return water else: return water * 1.1 ** (clothes - load)

Python

generate functionMon, 26 Dec 2022

There are 2 functions. One looks for duplicate values and makes a dictionary. The second changes the duplicates to the desired value. The first function (find_duplicates) works correctly. But the second function (change_duplicates) is incomplete. Finish the code so that the duplicate student number changes to the one in brackets in the change_duplicates function. def find_duplicates(lst): stnums = [] cnt = {} for el in lst: stnums.append(el[0]) for i in stnums: if i in cnt: cnt[i] += 1 else: cnt[i] = 1 return cnt def change_duplicates(lst, replacing_num): stnums = [] duplicates = find_duplicates(lst) for k, v in duplicates.items(): if v > 1: stnums = [k] for student in lst: if stnums in lst: #need right code#

def find_duplicates(lst):
    stnums = []
    cnt = {}
    for el in lst:
        stnums.append(el[0])
    for i in stnums:
        if i in cnt:
            cnt[i] += 1
        else:
            cnt[i] = 1
    return cnt
def change_duplicates(lst, replacing_num):
    stnums = []
    duplicates = find_duplicates(lst)
    # Collect every student number that occurs more than once
    for k, v in duplicates.items():
        if v > 1:
            stnums.append(k)
    # Assumption: every row carrying a duplicated number gets the replacement
    for student in lst:
        if student[0] in stnums:
            student[0] = replacing_num
    return lst

Python

generate functionFri, 23 Dec 2022

You have a bunch of red and blue marbles. To start the game you grab a handful of marbles of each color and put them into the bag, keeping track of how many of each color go in. You take turns reaching into the bag, guessing a color, and then pulling one marble out. You get a point if you guessed correctly. You've decided to write a function, guessBlue() to help automatically calculate whether you should guess "blue" or "red". The function should take four arguments: the number of blue marbles you put in the bag to start the number of red marbles you put in the bag to start the number of blue marbles pulled out so far (always lower than the starting number of blue marbles) the number of red marbles pulled out so far (always lower than the starting number of red marbles) guessBlue() should return the probability of drawing a blue marble, expressed as a float. For example, guessBlue(5, 5, 2, 3) should return 0.6.

def guess_blue(blue_start, red_start, blue_pulled, red_pulled): total_start = blue_start + red_start total_pulled = blue_pulled + red_pulled return (blue_start - blue_pulled) / (total_start - total_pulled)

Python

generate functionTue, 13 Dec 2022

Write a function to get the first element(s) of a sequence. Passing a parameter n (default=1) will return the first n element(s) of the sequence. If n == 0 return an empty sequence [] Examples arr = ['a', 'b', 'c', 'd', 'e'] first(arr) # --> ['a'] first(arr, 2) # --> ['a', 'b'] first(arr, 3) # --> ['a', 'b', 'c'] first(arr, 0) # --> []

def first(seq, n=1): if n == 0: return [] else: return seq[:n]

Python

generate functionMon, 26 Dec 2022

Convert the code below into a dict comprehension dct = {} for el in lst: dct[el[0]] = el[1:] dct

lst = [['a', 1, 3], ['b', 2, 4], ['c', 5, 6]] {el[0]: el[1:] for el in lst}

Python

generate functionWed, 22 Feb 2023

Add the x-axis value captions to the columns fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='Какая рабочая среда для вас наиболее предпочтительна?') fig.show()

fig = px.bar(x=counts, y=label, orientation='h', text=counts) # text= writes each value on its bar
fig.update_layout(title_text='Какая рабочая среда для вас наиболее предпочтительна?')
fig.show()

Python

generate functionMon, 27 Mar 2023

There is a dataframe df with two columns: Category and Message in the following format: Category Message 0 ham Go until jurong point, crazy... Available only ... 1 ham Ok lar... Joking wif u oni... 2 spam Free entry in 2 a wkly comp to win FA Cup fina... 3 ham U dun say so early hor... U c already then say... 4 ham Nah I don't think he goes to usf, he lives aro... Split it into samples X = df['Message'] y = df['Category'] so that there are float numbers instead of object text everywhere.

import pandas as pd import numpy as np df = pd.DataFrame({'Category':['ham','ham','spam','ham','ham'], 'Message':['Go until jurong point, crazy... Available only ...','Ok lar... Joking wif u oni...','Free entry in 2 a wkly comp to win FA Cup fina...','U dun say so early hor... U c already then say...','Nah I don\'t think he goes to usf, he lives aro...']}) df['Category'] = df['Category'].map(lambda x : 1 if x == 'ham' else 0)

Python

generate functionThu, 23 Mar 2023

Write code in Python that downloads all the names of the text files in the folder: http://stup.pro/wp-content/uploads/2023/03/*.txt into one list book_files

import requests
from bs4 import BeautifulSoup
def get_all_files_names(url):
    # Assumes the server returns an HTML directory listing for this URL
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    links = [link.get('href') for link in soup.find_all('a')]
    # Skip anchors without an href and keep only the .txt files
    return [href for href in links if href and href.endswith('.txt')]
book_files = get_all_files_names('http://stup.pro/wp-content/uploads/2023/03/')

Python

generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: id route operator group_name district injury_result incident_type victim_category victim_age 2 3.0 1 London General Go-Ahead Southwark NaN Onboard Injuries 7 10 3 4.0 4 Metroline Metroline Islington NaN Onboard Injuries 7 2 4 5.0 5 East London Stagecoach Havering NaN Onboard Injuries 7 8 5 6.0 5 East London Stagecoach None London Borough NaN Onboard Injuries 7 8 Count which routes had the most incidents_type

df_new.groupby(['route', 'incident_type']).size()

Python

generate functionMon, 06 Feb 2023

There is a df Pandas dataframe: date av_temp deviations country year decade 577457 2013-05-01 19.059 1.022 Zimbabwe 2013 2010-2020 577458 2013-06-01 17.613 0.473 Zimbabwe 2013 2010-2020 577459 2013-07-01 17.000 0.453 Zimbabwe 2013 2010-2020 577460 2013-08-01 19.759 0.717 Zimbabwe 2013 2010-2020 Write a function that: 1) leave only the rows in the year column with a value higher than 1980 2) calculates the average temperature of the countries (the average of the av_temp column) 3) builds a list of the 20 coldest countries in ascending av_temp order

def filter_1980(df):
    # Keep only the rows recorded after 1980
    return df[df["year"] > 1980]
def avg_temp(df):
    # Mean av_temp per country
    return df.groupby("country")["av_temp"].mean()
def list_coldest(df, n=20):
    # Countries sorted from coldest to warmest; take the first n
    return list(avg_temp(df).sort_values().head(n).index)
new_df = filter_1980(df)
avg_temp(new_df)
list_coldest(new_df)

Python

generate functionTue, 21 Mar 2023

Add to this the code for generating a grid on the chart, as well as specifying the percentage difference directly on the chart bars plt.figure(figsize = (12, 4)) plt.bar(data.index, data['nutri'], color = 'darkblue', label = 'nutri') plt.bar(data.index, data['mineral'], color = 'brown', label = 'mineral')

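plt.figure(figsize = (12, 4))
bars = plt.bar(data.index, data['nutri'], color = 'darkblue', label = 'nutri')
plt.bar(data.index, data['mineral'], color = 'brown', label = 'mineral')
plt.grid(axis = 'y') # horizontal grid lines
# Percentage difference of nutri relative to mineral, written above each nutri bar
diff = (data['nutri'] - data['mineral']) / data['mineral'] * 100
plt.bar_label(bars, labels = [f'{d:.1f}%' for d in diff])
plt.legend()
plt.show()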

Python

generate functionTue, 21 Mar 2023

You are given test results for two drugs: A (results for the group of people getting drug A) and B (results for the other group of people getting drug B). 1) You need to assess whether there is a statistically significant difference in the effect in these patients. Use the null hypothesis, the inverse hypothesis, and Student's t-test for calculations. 2) construct a boxplot and histogram to visually confirm the result

def test_result(drugA, drugB): return
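The stub above returns nothing. As a rough sketch of the calculation the prompt asks for, the pooled-variance Student's t statistic can be computed with the standard library alone. Here the null hypothesis is that both groups have the same mean, and the alternative is that the means differ. The function name student_t and the sample data are hypothetical, and turning t into a p-value needs the t distribution (for example from a statistics package), which is left out here.

```python
from math import sqrt
from statistics import mean, variance

def student_t(drug_a, drug_b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    n_a, n_b = len(drug_a), len(drug_b)
    dof = n_a + n_b - 2
    # Pooled variance: sample variances weighted by their degrees of freedom
    pooled = ((n_a - 1) * variance(drug_a) + (n_b - 1) * variance(drug_b)) / dof
    t = (mean(drug_a) - mean(drug_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, dof

# Made-up measurements, for illustration only
a = [1, 2, 3, 4, 5]
b = [2, 3, 4, 5, 6]
t, dof = student_t(a, b)
print(t, dof)
```

The absolute value of t is then compared with the critical value for dof degrees of freedom at the chosen significance level. The boxplot and histogram from the second part of the prompt would need a plotting library and are not covered by this sketch.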

Python

generate functionFri, 24 Feb 2023

There is a Pandas dataframe with missing column names 0 0 0 61 spi_score 32.5 67.434320 92.63 73.45 basic_human_needs 29.91 76.142959 96.85 83.0 foundations_of_wellbeing 34.17 67.774379 93.8 79.4 opportunity 23.37 58.381657 89.3 57.94 Write a function that adds the names of columns A, B, C, D to this dataframe

def add_Columns(df): df.columns = ['A', 'B', 'C', 'D'] return df add_Columns(df)

Python

generate functionSat, 18 Feb 2023

There are 2 charts in Python: df.plot(kind='scatter', x='User_Score', y='Global_Sales', plt.ylim(0, 40) df.plot(kind='scatter', x='Critic_Score', y='Global_Sales', plt.ylim(0, 40) Rewrite the code so that the two charts are not separate but are subplots.

import matplotlib.pyplot as plt
fig, axes = plt.subplots(nrows=1, ncols=2)
# ylim=(0, 40) keeps the y-axis limit from the original charts
df.plot(kind='scatter', x='User_Score', y='Global_Sales', ylim=(0, 40), ax=axes[0])
df.plot(kind='scatter', x='Critic_Score', y='Global_Sales', ylim=(0, 40), ax=axes[1])
plt.show()

Python

generate functionThu, 16 Feb 2023

There is a Pandas Series column: 'loan_amnt' 10 78 54 GOOD 64 23 Write a function that replace all rows with letters in the column to NaN. Example: replace_non_numbers(df, 'loan_amnt') df Result: 10 78 54 NaN 64 23

import pandas as pd
def replace_non_numbers(df, column):
    # Values that cannot be parsed as numbers (e.g. 'GOOD') become NaN
    df[column] = pd.to_numeric(df[column], errors='coerce')
    return df
#replace_non_numbers(df, 'loan_amnt')

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 We need to find customers who have contacted us several times and make a separate table with them. That is, we need to find duplicate values in the name column and leave only them, all the rest removed. Example: name Ksenia Rodionova Artur Petrov Ivan Sidorov Ksenia Rodionova Result: name Ksenia Rodionova Ksenia Rodionova

# pandas dataframe df = pd.DataFrame({'name': ['Ksenia Rodionova', 'Ulyana Selezneva', 'Konstantin Prokhorov', 'Petrov Vladimir', 'Arina Selivanova', 'Artur Petrov', 'Ivan Sidorov', 'Ksenia Rodionova', 'Ksenia Rodionova'], 'date': ['2021-07-01', '2021-07-01', '2021-07-01', '2021-07-01', '2021-07-01', '2021-07-01', '2021-07-01', '2021-07-01', '2021-07-01']}) df.head(10) df_new = df.loc[df.duplicated(subset=['name'], keep=False), :] df_new

Python

generate functionWed, 22 Mar 2023

This code gives an error TypeError: '>' not supported between instances of 'str' and 'float' fvalue, pvalue = stats.f_oneway(p.Speed, boosted_p.Speed) Fix it!

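import pandas as pd
from scipy import stats
def compare_two_groups(x, y):
    # Assumption: the Speed columns contain strings; convert to numbers and drop unparseable values
    x_speed = pd.to_numeric(x.Speed, errors='coerce').dropna()
    y_speed = pd.to_numeric(y.Speed, errors='coerce').dropna()
    fvalue, pvalue = stats.f_oneway(x_speed, y_speed)
    return pvalue < 0.05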

Python

generate functionThu, 22 Dec 2022

Return a new array consisting of elements which are multiple of their own index in input array (length > 1). Some cases: [22, -6, 32, 82, 9, 25] => [-6, 32, 25] [68, -1, 1, -7, 10, 10] => [-1, 10] [-56,-85,72,-26,-14,76,-27,72,35,-21,-67,87,0,21,59,27,-92,68] => [-85, 72, 0, 68]

def multiple_of_index(arr):
    # Index 0 is skipped to avoid dividing by zero; x % i == 0 means x is a multiple of i
    return [x for i, x in enumerate(arr) if i > 0 and x % i == 0]

Python

generate functionWed, 22 Feb 2023

Modify this code to make a vertical bar graph instead of a pie chart (plotly.express library) question6 = "How likely would you work for a company whose mission is not bringing social impact ?" question6 = data[question6].value_counts() label = question6.index counts = question6.values colors = ['gold','lightgreen'] fig = go.Figure(data=[go.Pie(labels=label, values=counts)]) fig.update_layout(title_text='How likely would you work for a company whose mission is not bringing social impact?') fig.update_traces(hoverinfo='label+value', textinfo='percent', textfont_size=30, marker=dict(colors=colors, line=dict(color='black', width=3))) fig.show()

def vertical_bar_chart(question):
    counts = data[question].value_counts() # keep `question` itself for the title
    colors = ['gold', 'lightgreen']
    fig = go.Figure(data=[go.Bar(x=counts.index, y=counts.values, marker_color=colors)])
    fig.update_layout(title_text=question)
    fig.show()
vertical_bar_chart(question6)

Python

generate functionSun, 18 Dec 2022

Replace re.split with regular split to get the same result (you can use the json library and the loads function) The code: f = open('purchase_log.txt', encoding='utf-8') import re purchases = {} for i, line in enumerate(f): line = re.split(r",|:", line.strip()) keys = line[1] values = line[3] purchases[keys] = values

import json def normal_split(data): return data.split(',') purchases = {} for i, line in enumerate(f): line = json.loads(line.strip()) keys = line['user_id'] values = line['category'] purchases[keys] = values
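The answer above switches to json.loads, which works only if every line of the log is valid JSON. If the goal is a literal drop-in for re.split(r",|:", ...), one sketch is to turn one delimiter into the other before calling str.split. The sample line is an assumption, since the real format of purchase_log.txt is not shown, and split_on_comma_or_colon is a hypothetical helper name.

```python
import re

def split_on_comma_or_colon(line):
    """Plain-string equivalent of re.split(r",|:", line)."""
    # Map ':' to ',' so that a single str.split covers both delimiters
    return line.replace(':', ',').split(',')

# Assumed line format, for illustration only
line = '{"user_id": "1840e0b9d4", "category": "Food"}'
parts = split_on_comma_or_colon(line.strip())
print(parts[1], parts[3])
```

As with the original re.split version, the pieces keep their surrounding quotes and spaces, so parts[1] and parts[3] may still need cleaning.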

Python

generate functionMon, 26 Dec 2022

Write a code that looks for a repeating student number and replaces it with "9090". lst = [ ["0001", "Antonov", "Anton", "Igorevich", "08/20/2009", "BST161"] ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ]

def replace_number(lst): for row in lst: if row[0] == "4004": row[0] = "9090" return lst
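The function above hardcodes "4004", so it only works for this particular list. A more general sketch, assuming the student number is always the first element of each row, counts the numbers first and then replaces every one that occurs more than once (replace_duplicates and the shortened sample rows are illustrative):

```python
from collections import Counter

def replace_duplicates(rows, replacement="9090"):
    """Replace every student number that occurs more than once."""
    counts = Counter(row[0] for row in rows)
    for row in rows:
        if counts[row[0]] > 1:
            row[0] = replacement
    return rows

# Shortened rows modelled on the prompt
students = [
    ["0001", "Antonov"],
    ["4004", "Stepanova"],
    ["0045", "Bokov"],
    ["4004", "Potapov"],
]
print(replace_duplicates(students))
```

Like the generated answer, this replaces all occurrences of a repeated number, so the replacement value itself ends up repeated. Whether only the later occurrences should change is not clear from the prompt.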

Python

generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating decade 4728 Bad Education (2019) 0.5 2010-2020 35638 35638 Palooka (1934) 0.5 1934 1930-1940 21445 21445 High Moon (2019) 0.5 2019 2010-2020 40291 40291 Saint Maud (2019) 0.5 2019 2010-2020 29462 29462 Mad at the Moon (1992) 0.5 1992 1990-2000 46978 46978 The Butterfly Ball (1977) 0.5 1977 1970-1980 42931 42931 Snowboarďáci (2004) 0.5 2004 2000-2010 38670 38670 Recon 2020: The Caprini Massacre (2004) 0.5 2004 2000-2010 25506 25506 Karagoez catalogo 9,5 (1983) 0.5 1983 1980-1990 43989 43989 Starting a Skyscraper (1902) 0.5 1902 1900-1910 Write a function that shows which decade had the most movies with a high rating Write a function that creates a "Decade of Release" column and assigns a decade to each movie based on the values in the "Year" column Example: 1889 Result: 1880-1890 Example 2: 1954 Result 2: 1950-1960 Example 3: 2011 Result 3: 2010-2020 Example 4: raovyavas Result: NaN

def decade(year):
    # Non-numeric input such as 'raovyavas' (or NaN) yields NaN
    try:
        start = int(year) // 10 * 10
    except (TypeError, ValueError):
        return float('nan')
    return f"{start}-{start + 10}" # 1954 -> '1950-1960'

Python

generate functionMon, 13 Feb 2023

Write a formula that rounds all non-integer values to the nearest integer in the Pandas column

def round_to_nearest(a): return round(a) round_to_nearest(2.5)
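One caveat with the answer above: Python's built-in round() rounds halves to the nearest even integer, so round(2.5) gives 2. If "nearest integer" is meant in the schoolbook sense where halves round away from zero, a sketch with the decimal module looks like this (round_half_up is a hypothetical helper, not part of pandas):

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value):
    """Round to the nearest integer, with .5 cases going away from zero."""
    return int(Decimal(str(value)).quantize(Decimal('1'), rounding=ROUND_HALF_UP))

print(round(2.5), round_half_up(2.5))
print(round(3.5), round_half_up(3.5))
```

For a pandas column the helper could be applied element by element with Series.apply, assuming the column has no missing values.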

Python

generate functionMon, 27 Mar 2023

The code below gives a ValueError: Input variables with an inconsistent number of samples were found: [8082, 5572]. Fix it! from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(tfidf, cats, test_size=0.3, random_state=42)

def split_data(df, y, test_size=0.2, random_state=42): X = df y = y X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=random_state) return X_train, X_test, y_train, y_test split_data(tfidf, cats)

Python

generate functionSat, 25 Feb 2023

This code gives an error "expected string or bytes-like object" Refactor it: import re def rate_group(int_rate): int_rate = re.sub('%', '', int_rate) int_rate = float(int_rate) if int_rate > 15.0: return '>15' elif 10.0 < int_rate <= 15.0: return '10-15' else: return '<10' df['rate_group'] = df['int_rate'].apply(rate_group)

import re
def rate_group(int_rate):
    # re.sub raises "expected string or bytes-like object" for NaN or numeric cells
    if not isinstance(int_rate, str):
        return None
    int_rate = float(re.sub('%', '', int_rate))
    if int_rate > 15.0:
        return '>15'
    elif 10.0 < int_rate <= 15.0:
        return '10-15'
    else:
        return '<10'
df['rate_group'] = df['int_rate'].apply(rate_group)

Python

generate functionSun, 25 Dec 2022

Task Give you two strings: s1 and s2. If they are opposite, return true; otherwise, return false. Note: The result should be a boolean value, instead of a string. The opposite means: All letters of the two strings are the same, but the case is opposite. you can assume that the string only contains letters or it's a empty string. Also take note of the edge case - if both strings are empty then you should return false/False. Examples (input -> output) "ab","AB" -> true "aB","Ab" -> true "aBcd","AbCD" -> true "AB","Ab" -> false "","" -> false

def is_opposite(s1, s2): if not s1 and not s2: return False if s1 and s2: return s1.swapcase() == s2 return False

Python

generate functionMon, 26 Dec 2022

Write a function that takes length as an argument and generates a random sequence of digits of length, with at least one digit 3 and one digit 7 in the sequence. The positions of 3 and 7 must be determined randomly. For example: "456379" or "033456127".

def rand_seq(length): import random seq = "" while True: if "3" in seq and "7" in seq: break seq = "".join([str(random.randint(0, 9)) for num in range(length)]) return seq rand_seq(15)

Python
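The loop in the answer above redraws the whole sequence until both digits happen to appear, so it never finishes for a length below 2. A minimal sketch of another approach, assuming it is acceptable to fill the sequence randomly and then overwrite two randomly chosen positions with 3 and 7 (the name `rand_seq_placed` and the `rng` parameter are illustrative, not from the original):

```python
import random

def rand_seq_placed(length, rng=random):
    """Return a random digit string of `length` with at least one 3 and one 7."""
    if length < 2:
        raise ValueError("length must be at least 2 to hold both a 3 and a 7")
    # Start from fully random digits.
    digits = [str(rng.randint(0, 9)) for _ in range(length)]
    # Pick two distinct random positions and force the required digits there.
    pos3, pos7 = rng.sample(range(length), 2)
    digits[pos3] = "3"
    digits[pos7] = "7"
    return "".join(digits)

print(rand_seq_placed(9))
```

This always terminates, at the cost of making 3 and 7 slightly more frequent than the other digits.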

generate functionSat, 28 Jan 2023

There is a Pandas dataframe: hotels_rev = df1[['date', 'hotel', 'revenue', 'av_revenue', 'difference', 'in_percent']].sort_values(by=['hotel', 'date']) Construct a sns factorplot plot with x scale = 'date', y scale = 'difference', col = 'hotel'. You should get 6 graphs (one for each 'hotel')

import seaborn as sns

def plot_difference_by_hotel(df):
    # factorplot was renamed to catplot in seaborn 0.9
    sns.catplot(x='date', y='difference', col='hotel', data=df, kind='bar')

plot_difference_by_hotel(hotels_rev)

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 3819 Imagine Dragons Radioactive, Natural, Believer, Thunder, On To... 2012 11101 9304.0 2674 Ed Sheeran Thinking Out Loud, Photograph, Sing, Don't, On... 2014 10685 10039.0 9590 The Weeknd Starry Eyes, Save Your Tears, Acquainted, Blin... 2022 9429 6713.0 5812 Mariah Carey Without You/Never Forget You, Don't Forget Abo... 1994 8735 6616.0 5749 Madonna Like A Prayer, Erotica, Vogue, Lucky Star, Cau... 1989 8391 7534.0 Write a function num_of_hits that will sort the dataframe by the number of values (separated by commas) in the hits column

def num_of_hits(df): df['num_of_hits'] = df.hits.apply(lambda x: len(x.split(', '))) return df.sort_values(by='num_of_hits', ascending=False) num_of_hits(df)

Python

generate functionWed, 21 Dec 2022

Create a function that finds the key with the maximum value in 'Value' and displays it in key - value format: {'AUD': {'ID': 'R01010', {'NumCode': '036', {'CharCode': 'AUD', {'Nominal': 1, 'Name': 'Australian Dollar', 'Value': '46.9983, 'Previous': 45.9496}, 'AZN': {'ID': 'R01020A', 'NumCode': '944', 'CharCode': 'AZN', 'Nominal': 1, 'Name': 'AZN', Value: 41.4856, Previous': 40.5904}, 'GBP': {'ID': 'R01035', 'NumCode': '826', 'CharCode': 'GBP', 'Nominal': 1, 'Name': 'Pound Sterling United Kingdom', 'Value': 85.611, 'Previous': 83.7015},

def find_max(dictionary): """ Dictionary -> String :param dictionary: example {'AUD': {'ID': 'R01010', 'NumCode': '036', 'CharCode': 'AUD', 'Nominal': 1, 'Name': 'Australian Dollar', 'Value': '46.9983, 'Previous': 45.9496}, 'AZN': {'ID': 'R01020A', 'NumCode': '944', 'CharCode': 'AZN', 'Nominal': 1, 'Name': 'AZN', 'Value': 41.4856, 'Previous': 40.5904}, 'GBP': {'ID': 'R01035', 'NumCode': '826', 'CharCode': 'GBP', 'Nominal': 1, 'Name': 'Pound Sterling United Kingdom', 'Value': 85.

Python
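The generated function above is cut off before its body. A minimal sketch of the lookup the task describes, assuming every entry carries a `'Value'` field that is either a number or a numeric string (the name `find_max_value` and the shortened sample data are illustrative):

```python
def find_max_value(rates):
    """Return 'KEY - value' for the entry whose 'Value' field is largest."""
    if not rates:
        return None
    # float() tolerates values stored either as numbers or as numeric strings.
    best = max(rates, key=lambda code: float(rates[code]["Value"]))
    return f"{best} - {rates[best]['Value']}"

sample = {
    "AUD": {"Name": "Australian Dollar", "Value": 46.9983, "Previous": 45.9496},
    "AZN": {"Name": "AZN", "Value": 41.4856, "Previous": 40.5904},
    "GBP": {"Name": "Pound Sterling United Kingdom", "Value": 85.611, "Previous": 83.7015},
}
print(find_max_value(sample))  # GBP - 85.611
```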

generate functionSun, 11 Dec 2022

Change the code in this program so that the value of 'quantity' is multiplied by the variable 'portions' and the result is displayed: cook_book = { 'пицца': [ {'ingridient_name': 'сыр', 'quantity': 20, 'measure': 'гр'}, {'ingridient_name': 'колбаса', 'quantity': 30, 'measure': 'гр'}, {'ingridient_name': 'бекон', 'quantity': 30, 'measure': 'гр'}, {'ingridient_name': 'оливки', 'quantity': 10, 'measure': 'гр'}, ], 'лимонад': [ {'ingridient_name': 'лимон', 'quantity': 1, 'measure': 'шт'}, {'ingridient_name': 'вода', 'quantity': 200, 'measure': 'мл'}, {'ingridient_name': 'сахар', 'quantity': 10, 'measure': 'гр'}, {'ingridient_name': 'лайм', 'quantity': 20, 'measure': 'гр'}, ] } dish = input('Выберите блюдо: cалат, пицца, лимонад ') portions = int(input('Введите количество порций: ')) ingredients = cook_book[dish] for i in ingredients: for value in i.values(): print(value)

def dish(dish, portions): cook_book = { 'пицца': [ {'ingridient_name': 'сыр', 'quantity': 20, 'measure': 'гр'}, {'ingridient_name': 'колбаса', 'quantity': 30, 'measure': 'гр'}, {'ingridient_name': 'бекон', 'quantity': 30, 'measure': 'гр'}, {'ingridient_name': 'оливки', 'quantity': 10, 'measure': 'гр'}, ], 'лимонад': [ {'ingridient_name': 'лимон', 'quantity': 1, 'measure': 'шт'}, {'ingridient_name': 'вода', 'quantity': 200, 'measure': 'мл'}, {'ing

Python

generate functionThu, 16 Feb 2023

Write a function that finds all object, string values in a column of numbers

def find_obj_str(df, column):
    # rows of the column whose value is a string rather than a number
    return df[df[column].apply(lambda v: isinstance(v, str))]

Python

generate functionSat, 18 Feb 2023

There are 2 charts in Python: df.plot(kind='scatter', x='User_Score', y='Global_Sales', plt.ylim(0, 40) df.plot(kind='scatter', x='Critic_Score', y='Global_Sales', plt.ylim(0, 40) Rewrite the code so that the two charts are not separate but are subplots.

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(10, 5))
# draw each scatter chart on its own subplot axis
df.plot(kind='scatter', x='User_Score', y='Global_Sales', ylim=(0, 40), ax=ax[0])
df.plot(kind='scatter', x='Critic_Score', y='Global_Sales', ylim=(0, 40), ax=ax[1])
plt.show()

Python

generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating decade 4728 Bad Education (2019) 0.5 2010-2020 35638 35638 Palooka (1934) 0.5 1934 1930-1940 21445 21445 High Moon (2019) 0.5 2019 2010-2020 40291 40291 Saint Maud (2019) 0.5 2019 2010-2020 29462 29462 Mad at the Moon (1992) 0.5 1992 1990-2000 46978 46978 The Butterfly Ball (1977) 0.5 1977 1970-1980 42931 42931 Snowboarďáci (2004) 0.5 2004 2000-2010 38670 38670 Recon 2020: The Caprini Massacre (2004) 0.5 2004 2000-2010 25506 25506 Karagoez catalogo 9,5 (1983) 0.5 1983 1980-1990 43989 43989 Starting a Skyscraper (1902) 0.5 1902 1900-1910 Write code that shows the best decade with a maximum rating of 5.0

import pandas as pd df = pd.DataFrame( {'id': [4728, 35638, 21445, 40291, 29462, 46978, 42931, 38670, 25506, 43989], 'title': ['Bad Education (2019)', 'Palooka (1934)', 'High Moon (2019)', 'Saint Maud (2019)', 'Mad at the Moon (1992)', 'The Butterfly Ball (1977)', 'Snowboarďáci (2004)', 'Recon 2020: The Caprini Massacre (2004)', 'Karagoez catalogo 9,5 (1983)', 'Starting a Skyscraper (1902)'], 'rating': [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5], 'decade': ['2010-2020', '1934', '2010-2020', '2010-2020', '1990-2000', '1970-1980', '

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Write a function that will group all rows by the performer column (so that no duplicates are left), and all corresponding values of this column are written through the "," sign into a new column hits

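def group(df):
    # one row per performer; unique song titles joined into the new hits column
    return df.groupby('performer')['song'].agg(lambda s: ', '.join(s.unique())).reset_index(name='hits')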

Python

generate functionTue, 24 Jan 2023

Write a pandas function that groups matching columns traffic_source and region and outputs the values of the third column source_type for them

def get_source_type(traffic_source, region): return df[(df.traffic_source == traffic_source) & (df.region == region)].source_type.values get_source_type('twitter', 'east')

Python

generate functionMon, 13 Feb 2023

There is Pandas dataframe: surgery age rectal_temp pulse respiratory_rate extremities_temp pain outcome 0 38.50 66 3 3 ? 2 ? 0 1 39.2 88 ? ? 4 1 ? 0 2 38.30 40 1 1 3 1 ? 0 3 39.10 164 4 1 6 2 1 0 4 37.30 104 ? ? 6 2 ? 0 Write a function that will try to make pd.to_numeric for all columns in the dataframe with parameter errors='coerce'.

def make_numeric(df): for col in df.columns: df[col] = pd.to_numeric(df[col], errors='coerce') return df

Python

generate functionWed, 01 Feb 2023

There is a Pandas function: df[['performer', 'time_on_chart']].groupby('performer').agg({'min', 'max'}) Sort it by 'max' column

def sort_by(df, column, ascending=True): return df.sort_values(column, ascending=ascending)

Python

generate functionSun, 11 Dec 2022

Write a function that multiplies each dictionary value by portions portions = 5 grams = 0 key = 'salad' cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'}, {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]} if key == 'salad': grams = 'quantity' * portions print(grams) The result should be: Cheese: 250 g Tomatoes: 10 pcs. Cucumbers: 60g

def multiply(dictionary): for i in dictionary[key]: i['quantity'] = i['quantity'] * portions return dictionary multiply(cook_book)

Python
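The answer above depends on the global names `key` and `portions` and overwrites the recipe in place, so calling it twice multiplies twice. A minimal sketch that leaves `cook_book` untouched and builds the display lines the task asks for (the name `scale_recipe` is illustrative):

```python
def scale_recipe(cook_book, dish, portions):
    """Return printable lines with each quantity multiplied by `portions`."""
    lines = []
    for item in cook_book[dish]:
        # Compute the scaled amount without modifying the stored recipe.
        total = item["quantity"] * portions
        name = item["ingridient_name"].capitalize()
        lines.append(f"{name}: {total} {item['measure']}")
    return lines

cook_book = {
    "salad": [
        {"ingridient_name": "cheese", "quantity": 50, "measure": "g"},
        {"ingridient_name": "tomatoes", "quantity": 2, "measure": "pcs"},
        {"ingridient_name": "cucumbers", "quantity": 100, "measure": "g"},
    ]
}
for line in scale_recipe(cook_book, "salad", 5):
    print(line)
```

With 5 portions the cucumbers line reads 500 g; the 60g shown in the task's expected output does not match the 100 g given in the data.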

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a year_leaders function that will build a new dataframe and leave only the performer and hits lines that have the maximum number of num_of_hits when grouped by the chart_debut column

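def year_leaders(df):
    # keep rows whose num_of_hits equals the maximum within their chart_debut year
    is_leader = df['num_of_hits'] == df.groupby('chart_debut')['num_of_hits'].transform('max')
    return df.loc[is_leader, ['performer', 'hits']]

year_leaders(df)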

Python

generate functionMon, 27 Mar 2023

The code below gives a ValueError: Input variables with an inconsistent number of samples were found: [8082, 5572]. Fix it! from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(tfidf, cats, test_size=0.3, random_state=42)

def fix_data(X_train, X_test, y_train, y_test): return X_train[:y_train.shape[0]], X_test[:y_test.shape[0]], y_train, y_test

Python

generate functionMon, 27 Mar 2023

The code below gives a ValueError: Input variables with an inconsistent number of samples were found: [8082, 5572]. Fix it! from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(tfidf, cats, test_size=0.3, random_state=42)

print(X_train.shape) print(y_train.shape) print(X_test.shape) print(y_test.shape)

Python

generate functionThu, 22 Dec 2022

Given a non-negative integer n, write a function to_binary/ToBinary which returns that number in a binary format. to_binary(1) # should return 1 to_binary(5) # should return 101 to_binary(11) # should return 1011

def to_binary(n): if n < 2: return str(n) else: return to_binary(n // 2) + str(n % 2)

Python

generate functionTue, 21 Mar 2023

Rewrite this code to calculate statistics by scipy.stats wilcoxon stat, p = mannwhitneyu(data[data['version'] == 'gate_30']['sum_gamerounds'], data[data['version'] == 'gate_40']['sum_gamerounds']) print('Mann-Whitney Statistics=%.3f, p=%.3f' % (stat, p))

import scipy.stats as stats def mann_whitney_test(data1, data2, alpha=0.05): stat, p = stats.mannwhitneyu(data1, data2) print('Statistics=%.3f, p=%.3f' % (stat, p)) if p > alpha: print('Same distribution (fail to reject H0)') else: print('Different distribution (reject H0)') mann_whitney_test(data[data['version'] == 'gate_30']['sum_gamerounds'], data[data['version'] == 'gate_40']['sum_gamerounds'])

Python

generate functionMon, 06 Feb 2023

There is a df_ru Pandas dataframe: date av_temp deviations country year decade 432393 1988-07-01 16.105 0.158 Russia 1988 1980-1990 432429 1991-07-01 16.385 0.243 Russia 1991 1990-2000 432489 1996-07-01 16.032 0.213 Russia 1996 1990-2000 432513 1998-07-01 16.677 0.183 Russia 1998 1990-2000 432525 1999-07-01 16.412 0.228 Russia 1999 1990-2000 432549 2001-07-01 16.539 0.190 Russia 2001 2000-2010 Write a function that 1) create a new dataframe hot_years 2) group data from df_ru by "year" column and take average values in "av_temp" column 3) leave in the hot_years dataframe only those lines where av_temp > 15

import pandas as pd df_ru = pd.read_csv('../data/ft_ru.csv', parse_dates=[0], index_col=[0], dayfirst=True) def hot_years(df): df['year'] = df.index.year hot_years = df.resample('A').mean() hot_years = hot_years[hot_years['av_temp'] > 15] return hot_years hot_years(df_ru)

Python

generate functionWed, 15 Mar 2023

Write code that brings all the values of sl and sw variables to the same scale using the StandardScaler library from the sklearn.preprocessing library: sl = data['sepal_length'] sw = data['sepal_width']

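import pandas as pd
from sklearn.preprocessing import StandardScaler

def scale(sl, sw):
    scaler = StandardScaler()
    # StandardScaler expects a 2D input: one column per feature
    return scaler.fit_transform(pd.concat([sl, sw], axis=1))

scale(sl, sw)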

Python

generate functionTue, 21 Mar 2023

Add to this the code for generating a grid on the chart, as well as specifying the percentage difference directly on the chart bars plt.figure(figsize = (12, 4)) plt.bar(data.index, data['nutri'], color = 'darkblue', label = 'nutri') plt.bar(data.index, data['mineral'], color = 'brown', label = 'mineral')

plt.xticks(rotation = 0) plt.xlabel('nutrient') plt.ylabel('%') plt.title('nutrient vs mineral') plt.grid(axis = 'y') plt.legend(loc = 'best')

Python

generate functionTue, 21 Mar 2023

You are given test results for two drugs: A (results for the group of people getting drug A) and B (results for the other group of people getting drug B). 1) You need to assess whether there is a statistically significant difference in the effect in these patients. Use the null hypothesis, the inverse hypothesis, and Student's t-test for calculations. 2) construct a boxplot and histogram to visually confirm the result

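import matplotlib.pyplot as plt
from scipy import stats

def test_and_plot(A, B, alpha=0.05):
    # H0: the mean effects of A and B are equal; H1: they differ
    stat, p = stats.ttest_ind(A, B)
    print('t=%.3f, p=%.3f' % (stat, p))
    print('reject H0' if p < alpha else 'fail to reject H0')
    plt.boxplot([A, B])
    plt.show()
    plt.hist([A, B])
    plt.show()

A = [1, 2, 3]
B = [4, 3, 2]
test_and_plot(A, B)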

Python

generate functionTue, 21 Mar 2023

There is a Pandas dataframe: water nutri mineral 0 1 1 2 1 2 2 1 2 3 4 1 3 4 6 3 4 2 5 2 5 4 6 4 6 2 7 2 7 4 5 4 8 5 4 5 9 2 5 4 10 3 6 3 11 4 7 2 12 2 4 3 13 1 3 2 14 3 5 3 15 4 5 1 16 3 6 3 17 2 5 4 18 5 4 5 19 1 3 1 Construct a vertical bar graph comparing these values, where the x-axis is index and the y-axis is the value

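import matplotlib.pyplot as plt

# one group of bars per index value, one bar per column
df.plot(kind='bar', figsize=(12, 4))
plt.xlabel('index')
plt.ylabel('value')
plt.show()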

Python

generate functionMon, 26 Dec 2022

There are two lists. Write a function that looks for matching values in these two lists and replaces the second, third and all other subsequent values with replacing_num stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ] replacing_num = '9090'

def replace(stnum, students, replacing_num): result = students.copy() for elem in students: if elem[0] == stnum: for i in range(1, len(elem)): elem[i] = replacing_num return result stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161

Python
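The answer above is cut off, and the part that survives overwrites every other field of a matching row. The task wording is ambiguous; a minimal sketch under one reading, assuming that the first row carrying a listed number keeps it and every later row with the same number gets `replacing_num` instead (the name `replace_repeats` and the shortened sample are illustrative):

```python
def replace_repeats(stnums, students, replacing_num):
    """Return a copy of `students` where numbers listed in `stnums` are
    replaced by `replacing_num` from their second occurrence onward."""
    seen = set()
    result = []
    for row in students:
        number = row[0]
        if number in stnums and number in seen:
            # Repeated number: swap only the first field, keep the rest.
            result.append([replacing_num] + row[1:])
        else:
            result.append(list(row))
        seen.add(number)
    return result

students = [
    ["0001", "Antonov", "Anton", "Igorevich", "20.08.2009", "BST161"],
    ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"],
    ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"],
]
for row in replace_repeats(["4004"], students, "9090"):
    print(row)
```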

generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singin' in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 Seventh Seal, The (Sjunde inseglet, Det) (1957) 5.0 9 Bridge on the River Kwai, The (1957) 4.0 in this dataframe the names of films are not unique, because the same film may have been given different ratings Your task is to write a function that calculates the average rating of each movie and removes non-unique strings

def avgrating(df):
    # one row per title with the mean of its ratings
    return df.groupby('title', as_index=False)['rating'].mean()

Python

generate functionSun, 18 Dec 2022

Write a function that will check if two given characters are the same case. If either of the characters is not a letter, return -1 If both characters are the same case, return 1 If both characters are letters, but not the same case, return 0 Examples 'a' and 'g' returns 1 'A' and 'C' returns 1 'b' and 'G' returns 0 'B' and 'g' returns 0 '0' and '?' returns -1

def same_case(a, b): if a.isalpha() and b.isalpha(): if a.islower() and b.islower(): return 1 elif a.isupper() and b.isupper(): return 1 else: return 0 else: return -1 same_case('a', 'g') # 1 same_case('A', 'C') # 1 same_case('b', 'G') # 0 same_case('B', 'g') # 0 same_case('0', '?') # -1

Python

generate functionMon, 06 Feb 2023

There is a df_ru Pandas dataframe: date av_temp deviations country year decade 432393 1988-07-01 16.105 0.158 Russia 1988 1980-1990 432429 1991-07-01 16.385 0.243 Russia 1991 1990-2000 432489 1996-07-01 16.032 0.213 Russia 1996 1990-2000 432513 1998-07-01 16.677 0.183 Russia 1998 1990-2000 432525 1999-07-01 16.412 0.228 Russia 1999 1990-2000 432549 2001-07-01 16.539 0.190 Russia 2001 2000-2010 Write a function that 1) create a new dataframe hot_years 2) group data from df_ru by "year" column and take average values in "av_temp" column 3) leave in the hot_years dataframe only those lines where av_temp > 15

def hot_years(df):
    # average only av_temp; the text columns (country, decade) cannot be averaged
    result = df.groupby('year', as_index=False)['av_temp'].mean()
    return result[result['av_temp'] > 15]

hot_years(df_ru)

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order

def duplicates(df): return df[df.name.duplicated(keep=False)].sort_values('name') duplicates(df)

Python

generate functionThu, 15 Dec 2022

Create a function that outputs directory numbers (dictionary keys) by the selected value. directories = { '1': ['2207 876234', '11-2'], '2': ['10006'], '3': [] } For example: input = 2207 876234 output = 1 input = 11-2 output = 1 input = 10006 output = 2 input = 1434 output = 'Nothing found'

def search_goods(): global directories input_number = input('Input directory number: ') for key, value in directories.items(): for item in value: if input_number == item: return key return 'Nothing found'

Python
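The answer above reads its input with `input()` and relies on a global, which makes the examples from the task hard to check. A minimal sketch of the same lookup as a pure function (the name `find_shelf` is illustrative):

```python
def find_shelf(directories, doc_number):
    """Return the key whose list contains `doc_number`, or 'Nothing found'."""
    for shelf, docs in directories.items():
        if doc_number in docs:
            return shelf
    return "Nothing found"

directories = {"1": ["2207 876234", "11-2"], "2": ["10006"], "3": []}
for query in ["2207 876234", "11-2", "10006", "1434"]:
    print(query, "->", find_shelf(directories, query))
```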

generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 leave in this dataframe only users who have 100 or more values in the "rating" column

import pandas as pd

r_cols = ['userId', 'movieId', 'rating', 'timestamp']
ratings = pd.read_csv('ml-latest-small/ratings.csv', usecols=r_cols)
# keep only users with at least 100 ratings
ratings = ratings[ratings.groupby('userId')['rating'].transform('count') >= 100]
ratings

Python

generate functionMon, 06 Feb 2023

There is a df Pandas dataframe: date av_temp deviations country 1 1743-12-01 NaN NaN Åland 2 1744-01-01 NaN NaN Åland 3 1744-02-01 NaN NaN Åland 4 1744-03-01 NaN NaN Åland 9 1744-08-01 NaN NaN Åland Create a new column year from date in which you convert all date values to years. Example: 1744-01-01 Date column: 1744

# unparseable dates become NaT, so their year is left missing
df['year'] = pd.to_datetime(df['date'], errors='coerce').dt.year

Python

generate functionThu, 22 Dec 2022

Given a non-negative integer n, write a function to_binary/ToBinary which returns that number in a binary format. to_binary(1) # should return 1 to_binary(5) # should return 101 to_binary(11) # should return 1011

def to_binary(n):
    # bin(n) would add a '0b' prefix; format() returns only the digits
    return format(n, 'b')

Python

generate functionMon, 26 Dec 2022

There is a list lst = ['BST161', 'BST162', 'BST163', 'BST161', 'BST161', 'BST164', 'BST162', 'BST161'] Write a function that counts the number of repetitions of the values in this list and displays them on the screen

def count_list(values):
    counts = {}
    for item in values:
        counts[item] = counts.get(item, 0) + 1
    return counts

lst = ['BST161', 'BST162', 'BST163', 'BST161', 'BST161', 'BST164', 'BST162', 'BST161']
print(count_list(lst))

Python

generate functionMon, 27 Mar 2023

The code below gives an error TypeError: unhashable type: 'list'. Fix it from gensim import similarities cos_sim = similarities.MatrixSimilarity(tfidf[bows])

def tokenize(text): return tokens_re.findall(text) def preprocess(text): tokens = tokenize(text) tokens = [token for token in tokens if not token in stop_words] tokens = [wordnet_lemmatizer.lemmatize(token) for token in tokens] return tokens

Python

generate functionTue, 21 Mar 2023

Rewrite this code to calculate Wilcoxon statistics stat, p = mannwhitneyu(data[data['version'] == 'gate_30']['sum_gamerounds'], data[data['version'] == 'gate_40']['sum_gamerounds']) print('Mann-Whitney Statistics=%.3f, p=%.3f' % (stat, p))

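from scipy.stats import ranksums

def wilcoxon(a, b):
    # ranksums is the Wilcoxon rank-sum test for two independent samples;
    # scipy.stats.wilcoxon is the signed-rank test and needs paired samples
    stat, p = ranksums(data[data['version'] == a]['sum_gamerounds'], data[data['version'] == b]['sum_gamerounds'])
    print('Wilcoxon rank-sum Statistics=%.3f, p=%.3f' % (stat, p))

wilcoxon('gate_30', 'gate_40')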

Python

generate functionMon, 26 Dec 2022

A matrix matrix is given. matrix = [ [0,1,2,4,8], [6,2,2,1,9], [3,3,3,3,3], [4,6,7,1,2], [5,7,3,4,0] ] Write code (in pure Python, no third-party libraries, no list comprehensions) that will transpose it (that is, make columns out of rows) into matrix_t: matrix_t = [ [0, 6, 3, 4, 5], [1, 2, 3, 6, 7], [2, 2, 3, 7, 3], [4, 1, 3, 1, 4], [8, 9, 3, 2, 0] ]

def transpose(matrix): ''' Transpose a matrix ''' matrix_t = [] for i in range(len(matrix[0])): row = [] for j in range(len(matrix)): row.append(matrix[j][i]) matrix_t.append(row) return matrix_t

Python

generate functionSun, 11 Dec 2022

Change the code in this program so that the value of 'quantity' is multiplied by the variable 'portions' and the result is displayed: cook_book = { 'pizza': [ {'ingridient_name': 'cheese', 'quantity': 20, 'measure': 'g'}, {'ingridient_name': 'sausage', 'quantity': 30, 'measure': 'gr'}, {'ingridient_name': 'bacon', 'quantity': 30, 'measure': 'gr'} {'ingridient_name': 'olives', 'quantity': 10, 'measure': 'gr'} ]} dish = input('choose a dish: salad, pizza, lemonade ') portions = int(input('Enter number of portions: ')) ingredients = cook_book[dish] for i in ingredients: for value in i.values(): print(value)

def change_quantity(ingridient, portions): return ingridient['quantity'] * portions change_quantity(ingridient, portions)

Python

generate functionWed, 22 Feb 2023

Modify this code to make a horizontal bar graph instead of a vertical (plotly.express library) question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = go.Figure(data=[go.Bar(x=label, y=counts)]) fig.update_layout(title_text='Какая рабочая среда для вас наиболее предпочтительна?') fig.show()

import plotly.express as px

def horizontal_bar(question, title):
    question = df[question].value_counts()
    label = question.index
    counts = question.values
    # for a horizontal chart the counts go on x and the categories on y
    fig = px.bar(x=counts, y=label, orientation='h')
    fig.update_layout(title_text=title)
    fig.show()

horizontal_bar('What is the most preferred working environment for you.', 'Какая рабочая среда для вас наиболее предпочтительна?')

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer time_on_chart min max "Groove" Holmes 1 11 "Little" Jimmy Dickens 1 10 "Pookie" Hudson 1 1 "Weird Al" Yankovic 1 20 Sort these rows by max columns, in descending order, and display the first 20 rows

data = data.sort_values(by = ['max'], ascending = False).head(20) data

Python

generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 leave in this dataframe only users who have rated 100 or more in the rating column

import pandas as pd r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp'] ratings = pd.read_csv('u.data', sep='\t', names=r_cols,encoding='latin-1') ratings = ratings.drop('unix_timestamp', axis=1) ratings.head()

Python

generate functionSat, 25 Feb 2023

This code gives an error "expected string or bytes-like object" Refactor it: import re def rate_group(int_rate): int_rate = re.sub('%', '', int_rate) int_rate = float(int_rate) if int_rate > 15.0: return '>15' elif 10.0 < int_rate <= 15.0: return '10-15' else: return '<10' df['rate_group'] = df['int_rate'].apply(rate_group)

import re def rate_group(int_rate): int_rate = re.sub('%', '', int_rate) int_rate = float(int_rate) if int_rate > 15.0: return '>15' elif 10.0 < int_rate <= 15.0: return '10-15' else: return '<10' df['rate_group'] = df['int_rate'].apply(rate_group)

Python
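The error "expected string or bytes-like object" is what `re.sub` raises when it receives a non-string, which happens when `int_rate` holds floats or missing values. A minimal sketch that accepts strings, numbers and missing values alike, assuming missing rates should stay missing (that choice is an assumption, not part of the original task):

```python
import math

def rate_group(int_rate):
    """Bucket an interest rate given as '13.5%', 13.5, or a missing value."""
    if int_rate is None or (isinstance(int_rate, float) and math.isnan(int_rate)):
        return None  # assumption: a missing rate gets no group
    # str() makes the cleanup safe for both numeric and string inputs.
    rate = float(str(int_rate).replace("%", "").strip())
    if rate > 15.0:
        return ">15"
    if rate > 10.0:
        return "10-15"
    return "<10"

print([rate_group(v) for v in ["16.2%", 12.0, " 9.9% ", float("nan")]])
```

It can be applied the same way as before, with `df['int_rate'].apply(rate_group)`.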

generate functionThu, 15 Dec 2022

Refine the function below to handle errors 1) if shelf variable is not found in directories, it will show ERROR NO SUCH KEY 2) if doc variable is not found in directories, it will show ERROR NO SUCH VALUE def change_shelf(data): docnum = '11-2' shelf = '3' if list(docnum) in data.values(): print('Error') for key, val in data.items(): for doc in val: if doc == docnum: data[shelf].append(doc) data[key].remove(doc) print('OK') return data

def change_shelf(data):
    docnum = '11-2'
    shelf = '3'
    if shelf not in data:
        print('ERROR NO SUCH KEY')
        return data
    for key, val in data.items():
        if docnum in val:
            # move the document from its current shelf to the target shelf
            val.remove(docnum)
            data[shelf].append(docnum)
            print('OK')
            return data
    print('ERROR NO SUCH VALUE')
    return data

Python

generate functionFri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). Import necessary libraries. In function return specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers. def interval(n, mean, sig, conf): h = return

from scipy import stats
from math import sqrt

def interval(n, mean, sig, conf):
    h = sig * stats.norm.ppf((1 + conf) / 2) / sqrt(n)
    # the task asks for rounding; a bare int() would truncate instead
    return int(round(2 * h))

Python
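When the standard deviation is known, the interval is mean ± z·sig/√n, so its length is 2·z·sig/√n and does not depend on the mean. A minimal cross-check using only the standard library, assuming the normal (z) interval is the intended one (the name `interval_length` is illustrative):

```python
from math import sqrt
from statistics import NormalDist

def interval_length(n, mean, sig, conf):
    """Length of the two-sided z-interval for the mean, rounded to an integer."""
    z = NormalDist().inv_cdf((1 + conf) / 2)  # two-sided critical value
    h = z * sig / sqrt(n)                      # half-width of the interval
    left, right = mean - h, mean + h
    return round(right - left)

print(interval_length(100, 50, 15, 0.95))  # 6
```

For these inputs the unrounded length is about 5.88, so `round` gives 6 where truncating with `int` would give 5.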

generate functionTue, 13 Dec 2022

a list is given arr = [2, 1, 10, 5] Write a function with a for list that sums all the numbers in the list with each other ((2+1)+(2+10)+(2+5)+(1+10)+(1+5)+(10+5) ) The result should be this: res = 54

def sum_list(lst):
    total = 0
    for i in range(len(lst)):
        # start at i + 1 so each pair is counted once, not twice
        for j in range(i + 1, len(lst)):
            total += lst[i] + lst[j]
    return total

sum_list([2, 1, 10, 5])

Python
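The pairwise total in this task also has a closed form: each element takes part in exactly len(arr) - 1 pairs, so the result equals (len(arr) - 1) · sum(arr), here 3 · 18 = 54. A short sketch that checks the pair enumeration against that formula (both function names are illustrative):

```python
from itertools import combinations

def pair_sum(values):
    """Add up a + b over every unordered pair of positions in `values`."""
    return sum(a + b for a, b in combinations(values, 2))

def pair_sum_closed_form(values):
    """Same total: each element takes part in len(values) - 1 pairs."""
    return (len(values) - 1) * sum(values)

arr = [2, 1, 10, 5]
print(pair_sum(arr), pair_sum_closed_form(arr))  # 54 54
```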

generate functionThu, 16 Feb 2023

There is a Pandas Series column: 'loan_amnt' 10 78 54 GOOD 64 23 Write a function that finds all rows with letters in the column and replace them with NaN. Example: find_non_numbers(df, 'loan_amnt') df Result: 10 78 54 NaN 64 23

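import numpy as np

def find_non_numbers(df, column):
    # mark the values that contain letters and replace them with NaN
    has_letters = df[column].astype(str).str.contains('[A-Za-z]', regex=True)
    df.loc[has_letters, column] = np.nan
    return df

find_non_numbers(df, 'loan_amnt')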

Python

generate functionWed, 01 Feb 2023

There is a Pandas function: df[['performer', 'time_on_chart']].groupby('performer').agg({'min', 'max'}) Sort it by 'max' column

# selecting the column first and passing a list gives flat 'min' and 'max' columns to sort by
df.groupby('performer')['time_on_chart'].agg(['min', 'max']).sort_values('max', ascending=False)

Python

generate functionSun, 25 Dec 2022

Point objects have x and y attributes (X and Y in C#) attributes. Write a function calculating distance between Point a and Point b. Tests round answers to 6 decimal places.

import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

def distance(a, b):
    # Euclidean distance between the two points
    return math.sqrt((a.x - b.x)**2 + (a.y - b.y)**2)

Python

generate functionMon, 13 Feb 2023

There is Matplotlib chart code: df.plot(kind='box', subplots=True, layout=(4,2), sharex=False, sharey=False) plt.show() Improve this code to make the fonts size 14 and the boxplot lines bold 3

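# rcParams only affect plots drawn after they are set, so update them first
plt.rcParams.update({'font.size': 14, 'boxplot.boxprops.linewidth': 3, 'boxplot.whiskerprops.linewidth': 3, 'boxplot.capprops.linewidth': 3, 'boxplot.medianprops.linewidth': 3})
df.plot(kind='box', subplots=True, layout=(4,2), sharex=False, sharey=False)
plt.show()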

Python

generate functionSat, 04 Mar 2023

Construct a line graph using the data from this dataframe, where the x-axis is years and the y-axis is the columns 2015 2016 2017 2018 2019 0 5.375734 5.382185 5.354019 5.375917 5.407096

import matplotlib.pyplot as plt

def linegraph(df):
    # years (the column names) on the x-axis, the row's values on the y-axis
    plt.plot(df.columns, df.iloc[0], marker='o')
    plt.xlabel('year')
    plt.show()

linegraph(df)

Python

generate functionMon, 13 Mar 2023

There is a logistic linear discriminant model trained using the following formula: from sklearn.discriminant_analysis import LinearDiscriminantAnalysis lda.fit(X_train, y_train) Write code to evaluate its quality with F1 measure

from sklearn.metrics import f1_score

# X_test and y_test are assumed to be the held-out split matching X_train and y_train
y_pred = lda.predict(X_test)
f1_score(y_test, y_pred)

Python

generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 Calculate the average lifetime of users. The lifetime is the difference between the maximum and minimum values of the "timestamp" column for a given "userId".

import pandas as pd import datetime import numpy as np df_ratings = pd.read_csv('C:/Users/User/Downloads/ml-latest-small/ratings.csv') def diff_pd(x): return x.max() - x.min() df_ratings.groupby('userId')['timestamp'].agg([diff_pd]).mean()

Python

generate functionTue, 21 Mar 2023

There is a "grass" Pandas dataframe: pid Name Class 1 Class 2 HP Attack Defense Sp. Atk Sp. Def Speed Legendary 0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 False 1 2 Ivysaur Grass Poison 60 62 63 80 80 60 False 2 3 Venusaur Grass Poison 80 82 83 100 100 80 False 3 4 Mega Venusaur Grass Poison 80 100 123 122 120 80 False 48 49 Oddish Grass Poison 45 50 55 75 65 30 False Write a function that will search for 'Rock' values in the Class 1 and Class 2 columns. And if it finds such values it will delete these rows.

def rock(dataframe):
    # keep only rows where neither class column is 'Rock'
    is_rock = (dataframe['Class 1'] == 'Rock') | (dataframe['Class 2'] == 'Rock')
    return dataframe[~is_rock]

rock(grass)

Python

generate functionWed, 18 Jan 2023

Write a function that calculates the value_counts in the decade pandas column of the dataview if the rows in the rating column = 5.0

def count5_decade(df):
    # Keep the 5.0 ratings, then count them per decade
    df = df[df.rating == 5.0]
    return df.decade.value_counts()

count5_decade(ratings)

Python

generate functionWed, 22 Mar 2023

Write code that correctly compares two different dataframes: p['Speed'] and boosted_p['Speed'] using the stats.f_oneway method in Python

from scipy import stats

# One-way ANOVA on the two Speed columns; missing values are dropped first
fvalue, pvalue = stats.f_oneway(p['Speed'].dropna(), boosted_p['Speed'].dropna())
print(fvalue, pvalue)

Python

generate functionThu, 15 Dec 2022

Create a function that allows you to move values between directories keys with nested lists inside. It should check for: 1) if the specified key is in the dictionary and show ERROR NO SUCH KEY if it does not exist 2) if the value is present in the dictionary and show ERROR NO SUCH VALUE if it doesn't exist 3) if both the key and the value are in the dictionary, the function should move the value v to another k and return a new dictionary. Note that dictionary values are lists! Example: Input: directories = { '1': ['2207 876234', '11-2'], '2': ['10006'], '3': [] } v = '11-2' k = '3' Output: directories = { '1': ['2207 876234'], '2': ['10006'], '3': ['11-2'] }

def move(directories, v, k):
    # The target key must already exist
    if k not in directories:
        return 'ERROR NO SUCH KEY'
    # Find the list that currently holds v and move v to key k
    for values in directories.values():
        if v in values:
            values.remove(v)
            directories[k].append(v)
            return directories
    return 'ERROR NO SUCH VALUE'

print(move(directories, '11-2', '3'))

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 450 (Can't Live Without Your) Love And Affection Nelson 1990-07-07 1 93 15 14.0 607 (Everything I Do) I Do It For You Bryan Adams 1991-06-29 1 53 9 8.0 748 (Hey Won't You Play) Another Somebody Done Som... B.J. Thomas 1975-02-01 1 99 17 16.0 852 (I Can't Get No) Satisfaction The Rolling Stones 1965-06-12 1 67 13 12.0 951 (I Just) Died In Your Arms Cutting Crew 1987-03-07 1 80 14 13.0 Create a new dfs dataframe, where the data format of the chart_debut column is changed from 1991-06-29 to 1991

import pandas as pd

df = pd.read_csv('data/charts.csv')
dfs = df.copy()

def chart_debut_format(chart_debut):
    # '1991-06-29' -> '1991' (the column is read from CSV as text)
    return chart_debut[:4]

dfs['chart_debut'] = dfs['chart_debut'].apply(chart_debut_format)
dfs.head()

Python

generate functionThu, 22 Dec 2022

Return a new array consisting of elements which are multiple of their own index in input array (length > 1). Some cases: [22, -6, 32, 82, 9, 25] => [-6, 32, 25] [68, -1, 1, -7, 10, 10] => [-1, 10] [-56,-85,72,-26,-14,76,-27,72,35,-21,-67,87,0,21,59,27,-92,68] => [-85, 72, 0, 68]

def multiple_of_index(arr):
    # Test i != 0 first so that num % i never divides by zero
    return [num for i, num in enumerate(arr) if i != 0 and num % i == 0]

multiple_of_index([22, -6, 32, 82, 9, 25])

Python

generate functionTue, 24 Jan 2023

there are two Pandas dataframes: rzd client_id rzd_revenue 0 111 1093 1 112 2810 2 113 10283 3 114 5774 4 115 981 auto client_id auto_revenue 0 113 57483 1 114 83 2 115 912 3 116 4834 4 117 98 Create a new dataframe joined and merge these 2 dataframes inside it

def merge_df(df1, df2):
    # Combine the dataframes on client_id (inner join: only clients present in both)
    return df1.merge(df2, on='client_id')

joined = merge_df(rzd, auto)

Python

generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: id route operator group_name district injury_result incident_type victim_category victim_age 2 3.0 1 London General Go-Ahead Southwark NaN Onboard Injuries 7 10 3 4.0 4 Metroline Metroline Islington NaN Onboard Injuries 7 2 4 5.0 5 East London Stagecoach Havering NaN Onboard Injuries 7 8 5 6.0 5 East London Stagecoach None London Borough NaN Onboard Injuries 7 8 convert the id column from float format to int

df_new.id = df_new.id.astype(int)

Python

generate functionTue, 20 Dec 2022

Change the date_range function to return a list of all dates between start and end in list def date_range(start_date, end_date): lst = [] if start_date > end_date: return [] else: try: start = (dt.strptime(start_date, '%Y-%m-%d')) end = (dt.strptime(end_date, '%Y-%m-%d')) lst.append(start.strftime('%Y-%m-%d')) start += td(days=1) return lst except: return [] Examples: Input: date_range('2022-12-01', '2022-12-05') Output: ['2022-12-01', '2022-12-02', '2022-12-03', '2022-12-04', '2022-12-05']

from datetime import datetime as dt, timedelta as td

def date_range(start_date, end_date):
    lst = []
    try:
        start = dt.strptime(start_date, '%Y-%m-%d')
        end = dt.strptime(end_date, '%Y-%m-%d')
    except ValueError:
        return []
    # Walk forward one day at a time, including both endpoints
    while start <= end:
        lst.append(start.strftime('%Y-%m-%d'))
        start += td(days=1)
    return lst

Python

generate functionSun, 25 Dec 2022

Point objects have x and y attributes (X and Y in C#). Write a function calculating distance between Point a and Point b. Tests round answers to 6 decimal places.

import math

class Point(object):
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def distance(self, other):
        # Euclidean distance between this point and another point
        return math.sqrt(math.pow((other.x - self.x), 2) + math.pow((other.y - self.y), 2))

Python

generate functionTue, 21 Feb 2023

Change the code to have a logarithmic scale on the y-axis (scale) installs.plot(kind='bar', title='Top 10 publishers by video game sales', xlabel='Publisher', ylabel='Number of games')

installs.plot(kind='bar', title='Top 10 publishers by video game sales', xlabel='Publisher', ylabel='Number of games', logy=True)

Python

generate functionMon, 13 Feb 2023

There is Pandas dataframe: surgery age rectal_temp pulse respiratory_rate extremities_temp pain outcome 0 38.50 66 3 3 ? 2 ? 0 1 39.2 88 ? ? 4 1 ? 0 2 38.30 40 1 1 3 1 ? 0 3 39.10 164 4 1 6 2 1 0 4 37.30 104 ? ? 6 2 ? 0 Write a function that finds all lines with the value "?" and counts them

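def count(df):
    # Rows that contain "?" in at least one column, and how many there are
    rows_with_question = df[(df == '?').any(axis=1)]
    return rows_with_question, len(rows_with_question)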

Python

generate functionTue, 13 Dec 2022

a list is given arr = [2, 1, 10, 5] Write a function with list comprehension that sums all the numbers in the list with each other ((2+1)+(2+10)+(2+5)+(1+10)+(1+5)+(10+5) ) The result should be this: res = 54

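def sum_all(arr):
    # Pair each element only with the elements after it, so every pair is counted once
    return sum([arr[i] + arr[j] for i in range(len(arr)) for j in range(i + 1, len(arr))])

sum_all([2, 1, 10, 5])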

Python

generate functionWed, 18 Jan 2023

There is a Pandas dataframe in this format: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singing in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 The Seventh Seal (1957) 5.0 9 Bridge Over the River Kwai (1957) 4.0 Write a function that will pull the year of the movie from its title and add it to the "year" column

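import re

def extract_year(title):
    # Take the four-digit number in the last pair of parentheses, e.g. 'Ran (1985)' -> '1985'
    match = re.search(r'\((\d{4})\)\s*$', title)
    return match.group(1) if match else None

df['year'] = df['title'].apply(extract_year)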

Python

generate functionThu, 15 Dec 2022

There is a dictionary: directories = { '1': ['2207 876234', '11-2'], '2': ['10006'], '3': [], '10': ['abc'] } Make a function that outputs all dictionary keys in the format: 1, 2, 3, 10

def make_list(direct):
    # Join the keys with a comma and a space: '1, 2, 3, 10'
    return ', '.join(direct.keys())

make_list(directories)

Python

generate functionTue, 13 Dec 2022

create a password verification function. The verification conditions are: the length should be bigger than 6; should contain at least one digit, but it cannot consist of just digits; if the password is longer than 9 - previous rule (about one digit), is not required. Input: A string. Output: A bool. Examples: assert is_acceptable_password("short") == False assert is_acceptable_password("short54") == True assert is_acceptable_password("muchlonger") == True assert is_acceptable_password("ashort") == False

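def is_acceptable_password(password: str) -> bool:
    # Longer than 9 characters: the digit rules no longer apply
    if len(password) > 9:
        return True
    # Otherwise: more than 6 characters, at least one digit, but not digits only
    return len(password) > 6 and any(c.isdigit() for c in password) and not password.isdigit()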

Python

generate functionTue, 21 Mar 2023

You are given test results for two drugs: A (results for the group of people getting drug A) and B (results for the other group of people getting drug B). 1) You need to assess whether there is a statistically significant difference in the effect in these patients. Use the null hypothesis, the inverse hypothesis, and Student's t-test for calculations. 2) construct a boxplot and histogram to visually confirm the result

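import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

A = np.array([6, 8, 8, 10, 12, 12, 12, 12, 12, 14, 14, 14, 14, 16, 16, 16, 18, 18, 18, 18, 22])
B = np.array([10, 10, 10, 10, 10, 10, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20])

# H0: the mean effect of A and B is the same; H1: the means differ
t_stat, p_value = stats.ttest_ind(A, B)
print(t_stat, p_value)  # reject H0 at the 5% level if p_value < 0.05

# Boxplot and histogram to compare the two groups visually
fig, (ax_box, ax_hist) = plt.subplots(1, 2, figsize=(12, 4))
ax_box.boxplot([A, B])
ax_box.set_xticks([1, 2])
ax_box.set_xticklabels(['A', 'B'])
ax_hist.hist(A, alpha=0.5, label='A')
ax_hist.hist(B, alpha=0.5, label='B')
ax_hist.legend()
plt.show()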

Python
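
For reference, the statistic behind Student's two-sample t-test can also be computed by hand. The sketch below is a standard-library illustration that assumes both groups share the same variance; the function name student_t is hypothetical and the small samples are made up for the demonstration, not taken from the task.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Return (t statistic, degrees of freedom) for two independent samples.

    Assumes both groups have the same variance (classic Student's t-test).
    """
    n_a, n_b = len(a), len(b)
    # Pooled sample variance: weighted average of the two sample variances
    pooled = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    # Standard error of the difference between the two means
    std_err = sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / std_err, n_a + n_b - 2


# Illustrative samples only
t_stat, dof = student_t([1, 2, 3], [2, 3, 4])
print(t_stat, dof)
```

The p-value still has to come from the t distribution with dof degrees of freedom, which the standard library does not provide; a statistics package or a printed table is needed for that step.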

generate functionWed, 22 Mar 2023

There are two variables p.speed and boosted_p.speed. Plot two boxplots with their values next to each other

import matplotlib.pyplot as plt

def plot_two_boxplots(speed, boosted_speed):
    # Passing a list of two samples draws the boxes side by side
    plt.boxplot([speed, boosted_speed])
    plt.show()

Python

generate functionSat, 18 Feb 2023

regions = ['North America', 'Europe', 'Japan', 'Other'] na_sales = df['NA_Sales'].sum() eu_sales = df['EU_Sales'].sum() jp_sales = df['JP_Sales'].sum() oth_sales = df['Other_Sales'].sum() Write a function that create a Pandas dataframe with this data. Example output: North America 1111 Europe 1234 Japan 3456 Other 5678

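import pandas as pd

def sales_df(regions, na_sales, eu_sales, jp_sales, oth_sales):
    # One row per region, with the region name as the index
    return pd.DataFrame({'sales': [na_sales, eu_sales, jp_sales, oth_sales]}, index=regions)

sales_df(regions, na_sales, eu_sales, jp_sales, oth_sales)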

Python

generate functionWed, 22 Mar 2023

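Write code that will correctly compare two different dataframes (excluding NaN and string values): p['Speed'] and boosted_p['Speed'] using the stats.f_oneway method in Python

import pandas as pd
from scipy import stats

# Strings become NaN, then every NaN is dropped before the test
speed = pd.to_numeric(p['Speed'], errors='coerce').dropna()
boosted_speed = pd.to_numeric(boosted_p['Speed'], errors='coerce').dropna()
stats.f_oneway(speed, boosted_speed)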

Python

generate functionTue, 21 Mar 2023

There are 3 lists: water = [1,2,3,4,2,4,2,4,5,2,3,4,2,1,3,4,3,2,5,1] nutri = [1,2,4,6,5,6,7,5,4,5,6,7,4,3,5,5,6,5,4,3,5] mineral =[2,1,1,3,2,4,2,4,5,4,3,2,3,2,3,1,3,4,5,1,4] add them to 3 separate columns of dataframe df, name the columns A, B, C

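import pandas as pd

def add_columns(a, b, c):
    # Wrapping each list in a Series pads the shorter list (water has 20 values, the others 21) with NaN
    return pd.DataFrame({'A': pd.Series(a), 'B': pd.Series(b), 'C': pd.Series(c)})

df = add_columns(water, nutri, mineral)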

Python

generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 leave in this dataframe only users who have rated 100 or more in the rating column in the rating column put the average rating which the user gave

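# Keep only users with at least 100 ratings
active = df.groupby('userId').filter(lambda x: len(x) >= 100).copy()
# Replace each rating with the average rating given by that user
active['rating'] = active.groupby('userId')['rating'].transform('mean')
active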

Python

generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating decade 4728 Bad Education (2019) 0.5 2010-2020 35638 35638 Palooka (1934) 0.5 1934 1930-1940 21445 21445 High Moon (2019) 0.5 2019 2010-2020 40291 40291 Saint Maud (2019) 0.5 2019 2010-2020 29462 29462 Mad at the Moon (1992) 0.5 1992 1990-2000 46978 46978 The Butterfly Ball (1977) 0.5 1977 1970-1980 42931 42931 Snowboarďáci (2004) 0.5 2004 2000-2010 38670 38670 Recon 2020: The Caprini Massacre (2004) 0.5 2004 2000-2010 25506 25506 Karagoez catalogo 9,5 (1983) 0.5 1983 1980-1990 43989 43989 Starting a Skyscraper (1902) 0.5 1902 1900-1910 Write code that shows the best decade with a maximum rating of 5.0

# import pandas
import pandas as pd

# import the data
data = pd.read_csv('https://s3.amazonaws.com/assets.datacamp.com/production/course_2023/datasets/imdb_1000.csv')

# check the data and its shape
data
data.shape

# drop the rows with null values
data.dropna(inplace=True)

# check the shape of the data
data.shape

# find the best decade: mean rating per decade, highest first
data.groupby('decade')['rating'].mean().sort_values(ascending=False)

Python

generate functionFri, 17 Mar 2023

Write a function that will determine the size of the necessary sample for the study if the following parameters are known: - error (delta); - variance (sigsqr); - Confidence level (conf). Round up the answer to a whole number by round function.

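from statistics import NormalDist

def sample_size(delta, sigsqr, conf):
    # z is the two-sided standard normal quantile for the confidence level
    z = NormalDist().inv_cdf((1 + conf) / 2)
    # n = z^2 * variance / delta^2, rounded to a whole number
    return round(z ** 2 * sigsqr / delta ** 2)

sample_size(0.1, 0.3, 0.95)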

Python

generate functionWed, 18 Jan 2023

There is a Pandas dataframe: 35929 Passage de Venus (1874) 3.027778 1874 40331 Sallie Gardner at a Gallop (1878) 2.909091 1878 4195 Athlete Swinging a Pick (1880) 2.666667 1880 8085 Buffalo Running (1883) 2.636364 1883 29860 Man Walking Around a Corner (1887) 1.750000 1887 53932 Traffic Crossing Leeds Bridge (1888) 2.375000 1888 36445 Pferd und Reiter Springen Über ein Hindernis (... 2.583333 1888 1778 Accordion Player (1888) 1.928571 1888 39900 Roundhay Garden Scene (1888) 2.605263 1888 Write a function that creates a "Decade of Release" column and assigns a decade to each movie based on the values in the "Year" column Example: 1889 Result: 1880-1890 Example 2: 1954 Result 2: 1950-1960 Example 3: 2011 Result 3: 2010-2020 Example 4: raovyavas Result: NaN

def decade(x):
    # Non-numeric years (e.g. 'raovyavas') become NaN; str() also covers integer years
    if not str(x).isdigit():
        return float('nan')
    start = int(x) // 10 * 10
    return f'{start}-{start + 10}'

df['Decade of Release'] = df['Year'].apply(decade)
df

Python

generate functionMon, 27 Mar 2023

The code below gives a ValueError: Input variables with an inconsistent number of samples were found: [8082, 5572]. Fix it! from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(tfidf, cats, test_size=0.3, random_state=42)

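from sklearn.model_selection import train_test_split

# train_test_split needs one row per sample in both inputs. The error lists the
# lengths in argument order, so tfidf has 8082 rows and cats has 5572. Rebuild
# tfidf from the same 5572 documents that cats labels (that step is not shown
# here) before splitting.
X_train, X_test, y_train, y_test = train_test_split(tfidf, cats, test_size=0.3, random_state=42)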

Python

generate functionWed, 01 Feb 2023

There is a Python and Pandas function. def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy year_leaders(dfp) Modify it to output all 66 lines of the dataframe

def year_leaders(dfp):
    dfp_copy = dfp.copy()
    dfp_copy.sort_values(by=['chart_debut', 'num_of_hits'], ascending=False, inplace=True)
    dfp_copy.set_index('chart_debut', inplace=True)
    return dfp_copy

year_leaders(dfp)

Python

generate functionThu, 16 Feb 2023

Write a function that finds all 'object' values in a Pandas dataframe column 'values'

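def object_finder(dataframe, column):
    # In an 'object' column the non-numeric entries are usually Python strings; keep the rows that hold one
    return dataframe[dataframe[column].apply(lambda x: isinstance(x, str))]

object_finder(data, 'values')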

Python

generate functionWed, 22 Feb 2023

Change the code below so that the columns have values on the x-axis question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='Which working environment do you prefer most?') fig.show()

import plotly.express as px

# Vertical bars: categories on the x-axis, counts on the y-axis
fig = px.bar(x=label, y=counts, orientation='v')
fig.update_layout(title_text='Which working environment do you prefer most?', showlegend=False)
fig.show()

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a year_leaders function that will build a new dataframe and leave only the performer and hits lines that have the maximum number of num_of_hits when grouped by the chart_debut column

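def year_leaders(df):
    # Row label of the performer with the most hits in each chart_debut year
    idx = df.groupby('chart_debut')['num_of_hits'].idxmax()
    return df.loc[idx, ['chart_debut', 'performer', 'hits']].set_index('chart_debut')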

Python

generate functionTue, 13 Dec 2022

a list is given arr = [2, 1, 10, 5] Write a function with a for list that sums all the numbers in the list with each other ((2+1)+(2+10)+(2+5)+(1+10)+(1+5)+(10+5) ) The result should be this: res = 54

def magic(arr):
    result = 0
    # Visit every pair (i, j) with j after i exactly once
    for i in range(len(arr)):
        for j in range(i + 1, len(arr)):
            result = result + (arr[i] + arr[j])
    return result

magic(arr)

Python
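
The expected result of 54 can be checked with a short counting argument: every element is paired with each of the other n - 1 elements exactly once, so the total equals (n - 1) * sum(arr) = 3 * 18 = 54. A minimal sketch of that check follows; the function names are illustrative and not part of the original answer.

```python
from itertools import combinations


def pair_sum(arr):
    # Sum a + b over every unordered pair of positions in arr
    return sum(a + b for a, b in combinations(arr, 2))


def pair_sum_closed_form(arr):
    # Each element is paired with the other len(arr) - 1 elements exactly once
    return (len(arr) - 1) * sum(arr)


arr = [2, 1, 10, 5]
print(pair_sum(arr), pair_sum_closed_form(arr))  # both give 54
```

The closed form runs in linear time, while enumerating the pairs grows quadratically with the length of the list.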

generate functionTue, 21 Mar 2023

Add to this the code for generating a grid on the chart, as well as specifying the percentage difference directly on the chart bars plt.figure(figsize = (12, 4)) plt.bar(data.index, data['nutri'], color = 'darkblue', label = 'nutri') plt.bar(data.index, data['mineral'], color = 'brown', label = 'mineral')

def add(a, b):
    return a - b

add(1, 2)

Python
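
The answer above does not address the chart. A minimal sketch of one way to add the grid and the percentage labels is given below. It assumes the difference is measured relative to the mineral values (the request does not say which series is the baseline), and the names percent_difference and plot_with_labels are hypothetical.

```python
def percent_difference(first, second):
    """Percentage difference of first relative to second (None when second is 0)."""
    if second == 0:
        return None
    return (first - second) / second * 100


def plot_with_labels(index, nutri, mineral):
    # matplotlib is imported here so the helper above stays usable without it
    import matplotlib.pyplot as plt

    plt.figure(figsize=(12, 4))
    plt.bar(index, nutri, color='darkblue', label='nutri')
    plt.bar(index, mineral, color='brown', label='mineral')
    plt.grid(axis='y', linestyle='--', alpha=0.5)  # horizontal grid lines
    for x, n, m in zip(index, nutri, mineral):
        diff = percent_difference(n, m)
        if diff is not None:
            # Write the label just above the taller of the two bars
            plt.text(x, max(n, m), f'{diff:+.1f}%', ha='center', va='bottom')
    plt.legend()
    plt.show()
```

With the dataframe from the request, the call would be plot_with_labels(data.index, data['nutri'], data['mineral']).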

generate functionWed, 01 Feb 2023

There is a Python and Pandas function. def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy year_leaders(dfp) Modify it to output all 66 lines of the dataframe

import pandas as pd

# Show every row instead of the truncated preview
pd.set_option('display.max_rows', None)

def year_leaders(dfp):
    dfp_copy = dfp.copy()
    dfp_copy.sort_values(by=['chart_debut', 'num_of_hits'], ascending=False, inplace=True)
    dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True)
    dfp_copy.set_index('chart_debut', inplace=True)
    return dfp_copy

year_leaders(dfp)

Python

generate functionMon, 13 Feb 2023

Modify the formula below so that it rounds to the nearest integer, but not less than 1 and not more than 4: df2['respiratory_rate'] = df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median'))

# Fill gaps with the per-pulse median, rounded to the nearest integer and limited to the range 1..4
df2['respiratory_rate'] = df2['respiratory_rate'].fillna(
    df2.groupby('pulse')['respiratory_rate'].transform('median').round().clip(1, 4))

Python

generate functionWed, 22 Feb 2023

Add the x-axis value captions to the columns fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='Which working environment do you prefer most?') fig.show()

# Print each bar's count next to the bar
fig = px.bar(x=counts, y=label, orientation='h', text=counts)
fig.update_traces(textposition='outside')
fig.update_layout(title_text='Which working environment do you prefer most?')
# Axis titles
fig.update_xaxes(title_text='Number of responses')
fig.update_yaxes(title_text='Your choice')
fig.show()

Python

generate functionWed, 01 Feb 2023

There is a Python and Pandas function. def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy Add it to output all 66 lines of the dataframe

def year_leaders(dfp):
    dfp_copy = dfp.copy()
    dfp_copy.sort_values(by=['chart_debut', 'num_of_hits'], ascending=False, inplace=True)
    dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True)
    dfp_copy.set_index('chart_debut', inplace=True)
    # to_string() prints every row instead of the truncated preview
    print(dfp_copy.to_string())
    return dfp_copy

Python

generate functionMon, 27 Mar 2023

Write code that converts this Series object to an array object. data1 = df['Critic_Score'] The result should output data1 in the following format: array([43.48164727, 53.46433221, 45.08411794, 61.7768157 , 37.78951321, 48.271562 , 42.16219046, 46.88876532, 50.92428925, 43.03283088...])

data1 = df['Critic_Score']
data1.to_numpy()

Python

generate functionMon, 26 Dec 2022

There are two lists. Write a function that looks for matching values of these two lists. If a match is found, only the 2nd value must be replaced by replacing_num stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ] replacing_num = '9090'

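def replace_numbers(stnums, stus, repnum):
    for stnum in stnums:
        seen = 0
        for stu in stus:
            if stu[0] == stnum:
                seen += 1
                # Only the second occurrence of a number is replaced
                if seen == 2:
                    stu[0] = repnum
                    break
    return stus

replace_numbers(stnums, students, replacing_num)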

Python

generate functionTue, 13 Dec 2022

Write a function that checks if the word "password" is in a string (any part of it)

def check_password(string):
    # The in operator already returns True or False
    return "password" in string

check_password("asdfasdfpasswordasdfasdf")

Python

generate functionWed, 22 Mar 2023

This code gives an error TypeError: '>' not supported between instances of 'str' and 'float' fvalue, pvalue = stats.f_oneway(p.Speed, boosted_p.Speed) Fix it!

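import pandas as pd
from scipy import stats

def anova(a, b):
    # Strings in the Speed columns cause the TypeError; coerce them to NaN and drop them
    a = pd.to_numeric(a, errors='coerce').dropna()
    b = pd.to_numeric(b, errors='coerce').dropna()
    return stats.f_oneway(a, b)

fvalue, pvalue = anova(p.Speed, boosted_p.Speed)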

Python

generate functionFri, 17 Mar 2023

Determine the required sample size for a study of the average coffee check in city N, given the following parameters: margin of error (delta); variance (sigsqr); confidence level (conf). Round the answer to a whole number.

import scipy.stats as st

def checkio(delta, sigsqr, conf):
    # Main function used by the auto-tests; it must return the result
    # n = z^2 * variance / delta^2, with z the two-sided normal quantile
    z = st.norm.ppf((1 + conf) / 2)
    n = z ** 2 * sigsqr / delta ** 2
    return int(round(n))

# These asserts are only for self-checking and are not needed for auto-testing
if __name__ == '__main__':
    assert checkio(0.02, 0.04, 0.95) == 384
    assert checkio(0.02, 0.005, 0.95) == 48
    print("Coding complete? Click 'Check' to earn cool rewards!")

Python
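
The calculation rests on the usual margin-of-error relation for a mean. The derivation below is a sketch that assumes a normally distributed estimator and a known variance, which the task text does not state explicitly; sigma squared stands for sigsqr and z for the standard normal quantile.

```latex
\delta = z_{(1+\mathrm{conf})/2}\,\sqrt{\frac{\sigma^{2}}{n}}
\quad\Longrightarrow\quad
n = \frac{z_{(1+\mathrm{conf})/2}^{2}\,\sigma^{2}}{\delta^{2}}
```

For conf = 0.95 the quantile is about 1.96, so with delta = 0.02 and sigsqr = 0.04 the formula gives n of roughly 3.8415 * 0.04 / 0.0004, which is about 384.1.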

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a year_leaders function that will build a new dataframe and leave only the performer and hits lines that have the maximum number of num_of_hits when grouped by the chart_debut column

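def year_leaders(df):
    # Mark the rows whose num_of_hits equals the maximum for their chart_debut year (ties are kept)
    is_leader = df['num_of_hits'] == df.groupby('chart_debut')['num_of_hits'].transform('max')
    return df.loc[is_leader, ['chart_debut', 'performer', 'hits']]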

Python

generate functionSun, 11 Dec 2022

A recipe book is given with information about how many ingredients are needed to prepare a dish per serving. Write a program that will multiply the number of servings to prepare these dishes and display the information about the total amount of ingredients required in the given form. Note: The same ingredients with different dimensions must be counted separately! Example: dish = 'salad' portions = 5 The result should be: Cheese: 250 g Tomatoes: 10 pcs. Cucumbers: 60g cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'}, {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]}

cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'},
                       {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'},
                       {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]}

def total_ingridients(cook_book, dish, portions):
    # Multiply every quantity by the number of servings and print one ingredient per line
    for ing in cook_book[dish]:
        print(f'{ing["ingridient_name"].capitalize()}: {ing["quantity"] * portions} {ing["measure"]}')

total_ingridients(cook_book, 'salad', 5)

Python

generate functionTue, 24 Jan 2023

Write a pandas function that groups matching columns traffic_source and region and outputs the values of the third column source_type for them

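def group_by_two_columns(dataframe, col1, col2, col3):
    # Use the function arguments, not a global df or the literal strings 'col1', 'col2', 'col3'
    return dataframe.groupby([col1, col2])[col3].value_counts()

group_by_two_columns(df, 'traffic_source', 'region', 'source_type')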

Python

generate functionWed, 22 Mar 2023

Write code that correctly compares two different dataframes: p['Speed'] and boosted_p['Speed'] using the stats.f_oneway method in Python

from scipy import stats

# Pass the two Speed columns as the two groups of the one-way ANOVA
stats.f_oneway(p['Speed'], boosted_p['Speed'])

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a year_leaders function that will build a new dataframe and leave only the performer and hits lines that have the maximum number of num_of_hits when grouped by the chart_debut column

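def year_leaders(df):
    # Sort by hits so the leader comes first, then keep one row per chart_debut year
    leaders = df.sort_values('num_of_hits', ascending=False).drop_duplicates(subset='chart_debut', keep='first')
    return leaders.set_index('chart_debut')[['performer', 'hits']].sort_index()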

Python

generate functionTue, 13 Dec 2022

create a password verification function. The verification conditions are: the length should be bigger than 6; should contain at least one digit, but it cannot consist of just digits; if the password is longer than 9 - previous rule (about one digit), is not required. Input: A string. Output: A bool. Examples: assert is_acceptable_password("short") == False assert is_acceptable_password("short54") == True assert is_acceptable_password("muchlonger") == True assert is_acceptable_password("ashort") == False

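def is_acceptable_password(password: str) -> bool:
    if len(password) <= 6:
        return False
    # Longer than 9 characters: the digit rules no longer apply
    if len(password) > 9:
        return True
    # At least one digit, but not digits only
    has_digit = any(c.isdigit() for c in password)
    return has_digit and not password.isdigit()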

Python

generate functionThu, 30 Mar 2023

There are two vectors: film_2 = np.array([0,1,0,1,0,0]) film_4 = np.array([1,0,0,1,0,1]) We know that the angle between them is 65.9 degrees. Construct a graph with a visual representation of the vectors and the angle between them.

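import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches

film_2 = np.array([0, 1, 0, 1, 0, 0])
film_4 = np.array([1, 0, 0, 1, 0, 1])

# Angle between the two 6-D vectors, in degrees
len_2, len_4 = np.linalg.norm(film_2), np.linalg.norm(film_4)
angle = np.degrees(np.arccos(np.dot(film_2, film_4) / (len_2 * len_4)))

# Draw the vectors in the plane they span: film_2 along the x-axis, film_4 rotated by the angle
vec_1 = np.array([len_2, 0])
vec_2 = len_4 * np.array([np.cos(np.radians(angle)), np.sin(np.radians(angle))])

fig = plt.figure()
ax = fig.add_subplot(111, aspect='equal')
ax.arrow(0, 0, vec_1[0], vec_1[1], head_width=0.1, head_length=0.1, color='black', length_includes_head=True)
ax.arrow(0, 0, vec_2[0], vec_2[1], head_width=0.1, head_length=0.1, color='black', length_includes_head=True)
# Arc and label for the angle between the arrows
ax.add_patch(patches.Arc((0, 0), 1, 1, theta1=0, theta2=angle, color='red'))
ax.text(0.6, 0.3, f'{angle:.1f}°')
ax.set_xlim(-0.5, 2)
ax.set_ylim(-0.5, 2)
plt.show()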

Python
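
The stated angle of 65.9 degrees can be confirmed from the cosine formula: the dot product of the two vectors is 1 and their lengths are the square roots of 2 and 3, so the cosine is 1 divided by the square root of 6. A small standard-library sketch of that check follows; the helper name angle_degrees is illustrative.

```python
import math

film_2 = [0, 1, 0, 1, 0, 0]
film_4 = [1, 0, 0, 1, 0, 1]


def angle_degrees(u, v):
    # cos(theta) = (u . v) / (|u| * |v|)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / (norm_u * norm_v)))


print(round(angle_degrees(film_2, film_4), 1))  # 65.9
```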

generate functionTue, 10 Jan 2023

import numpy as np def sum_matrix(N): my_matrix = np.diag(np.arange(N-1, -1, -1), k=0) return *** print(sum_matrix(5)) print(sum_matrix(10)) print(sum_matrix(15)) Create in Numpy a diagonal matrix with elements from N to 0. Count the sum of its values on the diagonal.

import numpy as np

def sum_matrix(N):
    # Diagonal matrix with N-1, N-2, ..., 0 on the main diagonal
    my_matrix = np.diag(np.arange(N-1, -1, -1), k=0)
    return sum(my_matrix.diagonal())

print(sum_matrix(5))
print(sum_matrix(10))
print(sum_matrix(15))

Python
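
Because the diagonal built by np.arange(N-1, -1, -1) holds the integers N-1 down to 0, its sum is the arithmetic series 0 + 1 + ... + (N-1) = N(N-1)/2. The sketch below checks that closed form without NumPy; the name diagonal_sum is illustrative. If the diagonal is meant to start at N itself, as the task wording suggests, the range would be np.arange(N, -1, -1) and the sum N(N+1)/2.

```python
def diagonal_sum(n):
    # Diagonal holds n-1, n-2, ..., 0, so its sum is 0 + 1 + ... + (n-1)
    return n * (n - 1) // 2


for n in (5, 10, 15):
    # Cross-check the closed form against a direct sum
    assert diagonal_sum(n) == sum(range(n))
    print(n, diagonal_sum(n))  # 10, 45 and 105 respectively
```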

generate functionMon, 20 Feb 2023

There is a Pandas code: df2.style.highlight_max(axis=1, props='color:white;\ font-weight:bold;\ background-color:green;') df2.style.highlight_min(axis=1, props='color:white;\ font-weight:bold;\ background-color:brown;') Write a function that combines these two actions into one and returns the changed dataframe

def highlight(df):
    # Styler methods return the Styler, so both highlights can be chained in one expression
    return (df.style
            .highlight_max(axis=1, props='color:white; font-weight:bold; background-color:green;')
            .highlight_min(axis=1, props='color:white; font-weight:bold; background-color:brown;'))

Python

generate functionThu, 15 Dec 2022

Create a function that adds the following variables docnum = '42' doctype = 'multipassport' docowner = 'R2D2' shelf = 3 into lists and dictionaries with the corresponding structures documents = [ {'type': doctype, 'number': docnum, 'name': docowner}, ] directories = { shelf: [] }

def new_doc_add(documents, directories, docnum, doctype, docowner, shelf):
    # Register the document, then put its number on the requested shelf
    documents.append({'type': doctype, 'number': docnum, 'name': docowner})
    directories.setdefault(shelf, []).append(docnum)
    return documents, directories

new_doc_add(documents, directories, docnum, doctype, docowner, shelf)

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Pandas dataframe is available: song artist chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Write a function that will group all rows by the performer column (so that no duplicates are left), and all corresponding (unique) values to that column are written through the "," sign into a new hits column. All values of both time_on_chart and consecutive_weeks columns must be summed if the performer value matches.

def group_performers(df):
    # One row per performer: unique songs joined with ", ", both week counters summed
    return df.groupby('performer', as_index=False).agg(
        hits=('song', lambda s: ', '.join(s.unique())),
        time_on_chart=('time_on_chart', 'sum'),
        consecutive_weeks=('consecutive_weeks', 'sum'))

Python

generate functionTue, 21 Mar 2023

Calculate mean, median values of this list water = [1,2,3,4,2,4,2,4,5,2,3,4,2,1,3,4,3,2,5,1]

def mean(numbers):
    return sum(numbers) / len(numbers)

def median(numbers):
    ordered = sorted(numbers)  # sort a copy so the caller's list is unchanged
    count = len(ordered)
    mid = count // 2
    if count % 2 == 0:
        # Even length: average the two numbers in the middle of the list
        return mean([ordered[mid - 1], ordered[mid]])
    return ordered[mid]

water = [1,2,3,4,2,4,2,4,5,2,3,4,2,1,3,4,3,2,5,1]
print(mean(water), median(water))

Python

generate functionThu, 30 Mar 2023

There is a matrix A1 = array([[0, 0, 0, 0, 1, 0], [0, 1, 0, 1, 0, 0], [0, 0, 1, 1, 1, 0], [1, 0, 0, 1, 0, 1]]) Multiply the vector vector = [1,2,3,4] by the matrix A1

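import numpy as np
A1 = np.array([[0, 0, 0, 0, 1, 0], [0, 1, 0, 1, 0, 0], [0, 0, 1, 1, 1, 0], [1, 0, 0, 1, 0, 1]])
vector = np.array([1, 2, 3, 4])
# A1 has shape (4, 6), so the length-4 vector has to go on the left
vector.dot(A1)  # array([4, 2, 3, 9, 4, 4])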

Python

generate functionSat, 18 Feb 2023

regions = ['North America', 'Europe', 'Japan', 'Other'] na_sales = df['NA_Sales'].sum() eu_sales = df['EU_Sales'].sum() jp_sales = df['JP_Sales'].sum() oth_sales = df['Other_Sales'].sum() Write a function that create a pd.DataFrame with this data. Example output: North America 1111 Europe 1234 Japan 3456 Other 5678

def create_sales_df(df): return pd.DataFrame({'regions': ['North America', 'Europe', 'Japan', 'Other'], 'sales': [df['NA_Sales'].sum(), df['EU_Sales'].sum(), df['JP_Sales'].sum(), df['Other_Sales'].sum()]}) create_sales_df(df)

Python

generate functionTue, 21 Mar 2023

There is a "grass" Pandas dataframe: pid Name Class 1 Class 2 HP Attack Defense Sp. Atk Sp. Def Speed Legendary 0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 False 1 2 Ivysaur Grass Poison 60 62 63 80 80 60 False 2 3 Venusaur Grass Poison 80 82 83 100 100 80 False 3 4 Mega Venusaur Grass Poison 80 100 123 122 120 80 False 48 49 Oddish Grass Poison 45 50 55 75 65 30 False Write a function that will search for 'Rock' values in the Class 1 and Class 2 columns. And if it finds such values it will delete these rows.

def delete_row_with_rock(df): # your code here
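The stub above has no body. A minimal sketch of one way to fill it in is shown below, assuming the columns are named exactly `Class 1` and `Class 2` as in the task. The sample rows are illustrative only.

```python
import pandas as pd

def delete_row_with_rock(df):
    # Flag rows where either class column equals 'Rock', then keep the rest.
    has_rock = (df['Class 1'] == 'Rock') | (df['Class 2'] == 'Rock')
    return df[~has_rock].reset_index(drop=True)

# Illustrative data only.
sample = pd.DataFrame({
    'Name': ['Bulbasaur', 'Geodude', 'Aron'],
    'Class 1': ['Grass', 'Rock', 'Steel'],
    'Class 2': ['Poison', 'Ground', 'Rock'],
})
cleaned = delete_row_with_rock(sample)
```

The function returns a new frame instead of modifying `df` in place, so the original data stays available for comparison.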

Python

generate functionSun, 18 Dec 2022

Rewrite the code with a command that removes unnecessary quotation marks (' ') in the dictionary: Here's the problem: Output: {' "user_id"': ' "category"}', ' "1840e0b9d4"': ' "Products"}', How it should be: Output: {"user_id": "category"}, "1840e0b9d4": "Products"}, The code: f = open('purchase_log.txt', encoding='utf-8') import re import json purchases = {} for i, line in enumerate(f): line = re.split(r",|:", line.strip()) keys = line[1] values = line[3] purchases[keys] = values

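def remove_quotes(d):
    # Build a new dict: adding keys to d while iterating over it raises RuntimeError.
    # strip() removes the stray spaces, double quotes and braces from both ends.
    return {k.strip(' "{}'): v.strip(' "{}') for k, v in d.items()}

remove_quotes({' "user_id"': ' "category"}', ' "1840e0b9d4"': ' "Products"}'})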

Python

generate functionTue, 20 Dec 2022

Change the date_range function to count the difference between end and start and then loop a list of all dates in that range def date_range(start_date, end_date): lst = [] if start_date > end_date: return [] else: try: start = (dt.strptime(start_date, '%Y-%m-%d')) end = (dt.strptime(end_date, '%Y-%m-%d')) lst.append(start.strftime('%Y-%m-%d')) start += td(days=1) return lst except: return [] Examples: Input: date_range('2022-12-01', '2022-12-05') Output: ['2022-12-01', '2022-12-02', '2022-12-03', '2022-12-04', '2022-12-05']

from datetime import datetime as dt, timedelta as td

def date_range(start_date, end_date):
    try:
        start = dt.strptime(start_date, '%Y-%m-%d')
        end = dt.strptime(end_date, '%Y-%m-%d')
    except ValueError:
        return []
    # number of days between the two dates; a negative value gives an empty range
    days = (end - start).days
    return [(start + td(days=offset)).strftime('%Y-%m-%d') for offset in range(days + 1)]

Python

generate functionSat, 25 Feb 2023

rich = df[df.annual_inc >= 1000000] Rewrite the code above so that it takes into account the range from 100000 to 1000000

def millionaire(df):
    # keep incomes from 100000 to 1000000, both bounds included
    rich = df[df.annual_inc.between(100000, 1000000)]
    return rich

millionaire(df)

Python

generate functionSat, 28 Jan 2023

There is a Pandas dataframe: date city hotel total_rooms ocup_rooms revenue oper_costs adv_costs profit 0 2021-07-01 Yalta Rapsodia 33 24 78936 25641 19305 33990 1 2021-07-01 Yalta AquaMania 28 14 51156 21756 16380 13020 2 2021-07-01 Yalta Skyline 15 12 38016 11655 8775 17586 3 2021-07-01 Alushta Breeze 36 25 75475 27972 21060 26443 4 2021-07-01 Alushta Moreon 20 17 51091 15540 11700 23851 5 2021-07-01 Gurzuf Alpina 14 12 38736 10878 8190 19668 Write a function divide_hotels that creates new columns big_hotels, medium_hotels, small_hotels, and adds values according to the following conditions: 1) if df['total_rooms'] > 30, then profit value is substituted into big_hotels column 2) if df['total_rooms'] > 20, then the profit value is substituted in the medium_hotels column 3) if df['total_rooms'] > 10, then profit value is substituted for small_hotels column

def divide_hotels(df): df['big_hotels'] = df.apply(lambda x: x['profit'] if x['total_rooms'] > 30 else 0, axis=1) df['medium_hotels'] = df.apply(lambda x: x['profit'] if x['total_rooms'] <= 30 and x['total_rooms'] > 20 else 0, axis=1) df['small_hotels'] = df.apply(lambda x: x['profit'] if x['total_rooms'] <= 20 and x['total_rooms'] > 10 else 0, axis=1) return df df = divide_hotels(df) df.head()

Python

generate functionWed, 18 Jan 2023

there is a dictionary: geo_data = {'Center': ['Moscow', 'Tula', 'Yaroslavl'], {'Northwest': ['Petersburg', 'Pskov', 'Murmansk'], 'Far East': ['Vladivostok', 'Sakhalin', 'Khabarovsk']} Write a function that will search the strings for the value of the dictionary, and if it exists, it will return the key of the dictionary. For example: geo_class('Pskov') Result: 'Northwest'

geo_data = {'Center': ['Moscow', 'Tula', 'Yaroslavl'], 'Northwest': ['Petersburg', 'Pskov', 'Murmansk'], 'Far East': ['Vladivostok', 'Sakhalin', 'Khabarovsk']} def geo_class(city): for region in geo_data: if city in geo_data[region]: return region geo_class('Pskov')

Python

generate functionMon, 13 Mar 2023

There is a logistic linear discriminant model trained using the following formula: from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(data, data_class, random_state=42) from sklearn.discriminant_analysis import LinearDiscriminantAnalysis lda.fit(X_train, y_train) Write code to evaluate its quality with F1 measure

def my_evaluation(x_train, y_train, x_test, y_test): # your code return f1, precision, recall
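The stub above leaves the metric computation empty. With scikit-learn the usual route is `sklearn.metrics.f1_score(y_test, lda.predict(X_test))`. The sketch below spells out what F1, precision and recall mean for binary labels, without any dependencies. The function name `binary_f1` and the `positive` parameter are illustrative assumptions, not part of the original.

```python
def binary_f1(y_true, y_pred, positive=1):
    # Count true positives, false positives and false negatives.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f1, precision, recall

# Illustrative labels only: 2 true positives, 1 false positive, 1 false negative.
f1, precision, recall = binary_f1([1, 0, 1, 1, 0], [1, 1, 1, 0, 0])
```

F1 is the harmonic mean of precision and recall, so it is high only when both are high.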

Python

generate functionSat, 04 Mar 2023

Add the x-axis value captions to the code of this graph and change its color to purple gdpdiff[['Country or region', 'GDP per capita']].plot(kind='bar')

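# use the country names as x-axis labels and draw the bars in purple
gdpdiff.plot(kind='bar', x='Country or region', y='GDP per capita', color='purple')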

Python

generate functionTue, 24 Jan 2023

there is a Pandas dataframe: timestamp visit_id url region user_id traffic_source 0 1549980692 e3b0c44298 https://host.ru/3c19b4ef7371864fa3 Russia b1613cc09f yandex 1 1549980704 6e340b9cff https://host.ru/c8d9213a31839f9a3a Germany 4c3ec14bee direct 2 1549980715 96a296d224 https://host.ru/b8b58337d272ee7b15 USA a8c40697fb yandex 3 1549980725 709e80c884 https://host.ru/b8b58337d272ee7b15 Italy 521ac1d6a0 yandex 4 1549980736 df3f619804 https://host.ru/b8b58337d272ee7b15 Russia d7323c571c yandex Create a new dataframe summary in which count how many traffic_source values relate to each region

def count_sources_per_region(df): return df.groupby(['region'])['traffic_source'].count()

Python

generate functionMon, 13 Feb 2023

Modify the formula below so that it rounds to the nearest integer: df2['respiratory_rate'] = df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median'))

# fill the gaps with the per-pulse median, then round the whole column
df2['respiratory_rate'] = df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median')).round()

Python

generate functionMon, 26 Dec 2022

There are two lists. Write a function that looks for matching values of these two lists. If a match is found, only the 2nd value must be replaced by replacing_num stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ] replacing_num = '9090'

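def replace_num(stnums, students, replacing_num):
    seen = set()
    for st in students:
        if st[0] in stnums:
            if st[0] in seen:
                st[0] = replacing_num  # repeated number: replace it
            else:
                seen.add(st[0])  # first occurrence stays as it is
    return students

replace_num(stnums, students, replacing_num)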

Python

generate functionFri, 23 Dec 2022

There is a list lst = '[0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1]' Glue it on one line without commas and spaces to get '01011110001100111'

def glue(x):
    # lst is a string, so keep only its digit characters
    return ''.join(ch for ch in str(x) if ch.isdigit())

glue(lst)

Python

generate functionMon, 26 Dec 2022

A matrix matrix is given. matrix = [ [0,1,2,4,8], [6,2,2,1,9], [3,3,3,3,3], [4,6,7,1,2], [5,7,3,4,0] ] Write code (in pure Python, no third-party libraries) that will transpose it (that is, make columns out of rows) into matrix_t: matrix_t = [ [0, 6, 3, 4, 5], [1, 2, 3, 6, 7], [2, 2, 3, 7, 3], [4, 1, 3, 1, 4], [8, 9, 3, 2, 0] ]

def transpose_matrix(A): rows = len(A) cols = len(A[0]) B = [[0 for i in range(0, rows)] for j in range(0, cols)] for i in range(0, rows): for j in range(0, cols): B[j][i] = A[i][j] return B

Python

generate functionTue, 21 Mar 2023

Add to this the code for generating a grid on the chart, as well as specifying the percentage difference directly on the chart bars plt.figure(figsize = (12, 4)) plt.bar(data.index, data['nutri'], color = 'darkblue', label = 'nutri') plt.bar(data.index, data['mineral'], color = 'brown', label = 'mineral')

plt.plot(x, y, 'r--')

Python

generate functionMon, 27 Mar 2023

There are variables X_train, X_test, y_train, y_test created through sklearn train_test_split. They contain text data in string format. Convert them to float numbers.

def cond_to_float(X_train, X_test, y_train, y_test):
    # np.float was removed in NumPy 1.24; the built-in float works instead
    y_train = y_train.astype(float)
    y_test = y_test.astype(float)
    X_train = X_train.astype(float)
    X_test = X_test.astype(float)
    return (X_train, X_test, y_train, y_test)

Python

generate functionMon, 12 Dec 2022

Create a function that returns the CSV representation of a two-dimensional numeric array. Example: input: [[ 0, 1, 2, 3, 4 ], [ 10,11,12,13,14 ], [ 20,21,22,23,24 ], [ 30,31,32,33,34 ]] output: '0,1,2,3,4\n' +'10,11,12,13,14\n' +'20,21,22,23,24\n' +'30,31,32,33,34' Array's length > 2.

def to_csv_string(array): return '\n'.join(','.join(map(str, x)) for x in array)

Python

generate functionWed, 21 Dec 2022

Change the class so that when self.seniority is added, the designer is assigned 1 point and when self.intlawards is added, the designer is assigned 2 points. Correct the condition so that these points add up correctly: if (self.seniority + self.intlawards) % 7 == 0: self.grade_up() Example class: class Designer(Employee): def init(self, name, seniority, awards=2): super().init(name, seniority) self.intlawards = awards def check_if_it_is_time_for_upgrade(self): # for each accreditation, increase the counter by 1 # for now we assume that all designers are accredited self.seniority += 1 # condition for promoting an employee from the presentation if (self.seniority + self.intlawards) % 7 == 0: self.grade_up() # publication of the results return self.publish_grade()

class Designer(Employee):
    def __init__(self, name, seniority, awards=2):
        super().__init__(name, seniority)
        self.intlawards = awards

    def check_if_it_is_time_for_upgrade(self):
        # for each accreditation, increase the counter by 1
        # for now we assume that all designers are accredited
        self.seniority += 1
        # seniority counts 1 point, each international award counts 2 points
        if (self.seniority + self.intlawards * 2) % 7 == 0:
            self.grade_up()
        # publication of the results
        return self.publish_grade()

Python

generate functionMon, 27 Mar 2023

There is a dataframe df with two columns: Category and Message in the following format: Category Message 0 ham Go until jurong point, crazy... Available only ... 1 ham Ok lar... Joking wif u oni... 2 spam Free entry in 2 a wkly comp to win FA Cup fina... 3 ham U dun say so early hor... U c already then say... 4 ham Nah I don't think he goes to usf, he lives aro... Split it into samples X = df['Message'] y = df['Category'] so that there are float numbers instead of object text everywhere.

from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

Python

generate functionWed, 22 Mar 2023

Write code that will correctly compare two different dataframes (excluding NaN and string values): p['Speed']] boosted_p['Speed'] using the stats.f_oneway method in Python

import pandas as pd
from scipy import stats

def anova(a, b):
    # coerce strings to NaN, then drop NaN before running the test
    a = pd.to_numeric(a, errors='coerce').dropna()
    b = pd.to_numeric(b, errors='coerce').dropna()
    return stats.f_oneway(a, b)

f, p_value = anova(p['Speed'], boosted_p['Speed'])

Python

generate functionThu, 16 Feb 2023

Write a function that finds all object, string values in a Pandas dataframe column 'values'

def find_values(df):
    # keep the rows whose 'values' entry is a Python string
    return df[df['values'].apply(lambda v: isinstance(v, str))]

Python

generate functionSat, 28 Jan 2023

Construct 4 seaborn bar graphs from these data: 1) df1['city'].value_counts() 2) df2['hotel'].value_counts() 3) df2['date'].value_counts() 4) df2['rating'].value_counts()

import matplotlib.pyplot as plt
import seaborn as sns

def bar_graphs(df1, df2):
    # the four (dataframe, column) pairs listed in the task
    plots = [(df1, 'city'), (df2, 'hotel'), (df2, 'date'), (df2, 'rating')]
    plt.figure(figsize=(15, 7))
    for i, (frame, col) in enumerate(plots, start=1):
        counts = frame[col].value_counts()
        plt.subplot(2, 2, i)
        # recent seaborn versions expect x and y as keyword arguments
        sns.barplot(x=counts.index, y=counts.values)
        plt.xticks(rotation=40)
        plt.title(col.capitalize())
    plt.tight_layout()
    plt.show()

Python

generate functionTue, 21 Mar 2023

Rewrite this code to calculate stats instead of mannwhitneyu by scipy.stats wilcoxon stat, p = mannwhitneyu(data[data['version'] == 'gate_30']['sum_gamerounds'], data[data['version'] == 'gate_40']['sum_gamerounds']) print('Mann-Whitney Statistics=%.3f, p=%.3f' % (stat, p))

from scipy.stats import wilcoxon
# wilcoxon is a paired test, so both samples must have the same length
stat, p = wilcoxon(data[data['version'] == 'gate_30']['sum_gamerounds'], data[data['version'] == 'gate_40']['sum_gamerounds'])
print('Wilcoxon Statistics=%.3f, p=%.3f' % (stat, p))

Python

generate functionMon, 13 Feb 2023

Modify the formula below so that it rounds to the nearest integer, but not less than 1 and not more than 4: df2['respiratory_rate'] = df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median'))

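# fill the gaps, round to the nearest integer, then limit the result to the range 1..4
df2['respiratory_rate'] = df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median')).round().clip(1, 4)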

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a function year_leaders that will build a new dataframe and leave 1 line of performer and hits (having a maximum number of num_of_hits) in each chart_debut value

def year_leaders(df): df=df.groupby(df.chart_debut).apply(lambda x: x.sort_values('num_of_hits',ascending=False)).reset_index(drop=True) df=df.groupby('chart_debut').head(1) return df year_leaders(df)

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order

# Select every row whose name occurs more than once (keep=False also keeps the first occurrence)
duplicates = df[df['name'].duplicated(keep=False)]
# Sort by name in ascending order and display the result
duplicates.sort_values('name')

Python

generate functionMon, 27 Mar 2023

There is a dataframe df with two columns: Category and Message in the following format: Category Message 0 ham Go until jurong point, crazy... Available only ... 1 ham Ok lar... Joking wif u oni... 2 spam Free entry in 2 a wkly comp to win FA Cup fina... 3 ham U dun say so early hor... U c already then say... 4 ham Nah I don't think he goes to usf, he lives aro... Split it into samples X = df['Message'] y = df['Category'] so that there are float numbers instead of object text everywhere.

def split_dataframe(df): X = df['Message'] y = df['Category'] return X, y split_dataframe(df)

Python

generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singin' in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 Seventh Seal, The (Sjunde inseglet, Det) (1957) 5.0 9 Bridge on the River Kwai, The (1957) 4.0 both titles and ratings in this table are not unique, they are duplicated many times Write a function that groups unique movie titles in one column and displays their average rating in the second column

def group_by_title(df): return df.groupby('title')['rating'].mean()

Python

generate functionSun, 18 Dec 2022

Complete the function that takes a non-negative integer n as input, and returns a list of all the powers of 2 with the exponent ranging from 0 to n ( inclusive ). Examples n = 0 ==> [1] # [2^0] n = 1 ==> [1, 2] # [2^0, 2^1] n = 2 ==> [1, 2, 4] # [2^0, 2^1, 2^2]

def powers_of_two(n): return [2**x for x in range(n+1)]

Python

generate functionMon, 13 Feb 2023

There is Pandas dataframe: surgery age rectal_temp pulse respiratory_rate extremities_temp pain outcome 0 38.50 66 3 3 ? 2 ? 0 1 39.2 88 ? ? 4 1 ? 0 2 38.30 40 1 1 3 1 ? 0 3 39.10 164 4 1 6 2 1 0 4 37.30 104 ? ? 6 2 ? 0 Write a function that will try to change the Dtype of all columns to float. If this fails, then except pass

def change_col_type(df, col_type): try: return df.astype(col_type) except: pass df2 = change_col_type(df, 'float')

Python

generate functionWed, 18 Jan 2023

Fix this function so that when it finds a value in a Pandas dataframe string, it returns its key: def geo_class(row): geo_data = {'center': ['Moscow', 'Tula', 'Yaroslavl'], 'Northwest': ['petersburg', 'pskov', 'murmansk'], 'Far East': ['vladivostok', 'sakhalin', 'khabarovsk']} for key in geo_data.keys(): for value in geo_data.values(): if value in row: return key return 'undefined'

def geo_class(row):
    geo_data = {'center': ['Moscow', 'Tula', 'Yaroslavl'], 'Northwest': ['petersburg', 'pskov', 'murmansk'], 'Far East': ['vladivostok', 'sakhalin', 'khabarovsk']}
    # walk through each region together with its own list of cities
    for key, cities in geo_data.items():
        for city in cities:
            # case-insensitive substring match against the row text
            if city.lower() in row.lower():
                return key
    return 'undefined'

Python

generate functionFri, 27 Jan 2023

There is a code for Pandas df4['loyal_profit'] = df4.apply(lambda x: (x.profit / x.ocup_rooms) if 'regular_customer' in x.how_find_us else None, axis=1) This code gives out an error. Rewrite it so that it searches not only for 'regular_customer' values but also for 'by_recommendation' values

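loyal = ('regular_customer', 'by_recommendation')
# comparing the whole value against the tuple also works when how_find_us is NaN
df4['loyal_profit'] = df4.apply(lambda x: (x.profit / x.ocup_rooms) if x.how_find_us in loyal else None, axis=1)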

Python

generate functionSat, 25 Feb 2023

There is a Pandas dataframe: id grade rate_group 0 1077501 B 10-11% 1 1077430 C 15-17% 2 1077175 C 15-17% 3 1076863 C 13-14% 4 1075358 B 12-13% 5 1075269 A 7-8% 6 1069639 C 15-17% 7 1072053 E 17-25% 8 1071795 F 17-25% 9 1071570 B 12-13% Modify the dataframe so that grades are in columns and rate_group in rows.

# rate_group values become rows, grades become columns, cells hold the number of ids
pd.crosstab(df['rate_group'], df['grade'])

Python

generate functionSun, 11 Dec 2022

A recipe book is given with information about how many ingredients are needed to prepare a dish per serving. Write a program that will multiply the number of servings to prepare these dishes and display the information about the total amount of ingredients required in the given form. Note: The same ingredients with different dimensions must be counted separately! Example of program operation: portions = 5 grams = 0 key = 'salad' cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'}, {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]} if key == 'salad': grams = 'quantity' * portions print(grams) The result should be: Cheese: 250 g Tomatoes: 10 pcs. Cucumbers: 60g

def get_quantity(cook_book, key, portions):
    # loop over every ingredient of the dish instead of hard-coding three positions
    for ingredient in cook_book[key]:
        total = ingredient['quantity'] * portions
        print(f"{ingredient['ingridient_name'].capitalize()}: {total} {ingredient['measure']}")

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order

import pandas as pd df = pd.DataFrame({ 'user_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'name': ['Ksenia Rodionova', 'Ulyana Selezneva', 'Konstantin Prokhorov', 'Petrov Vladimir', 'Arina Selivanova', 'Svetlana Kuznecova', 'Evgeniy Laptev', 'Ivan Ryzhkov', 'Sidorov Nikolay', 'Nikolay Ivanov', 'Natalya Volkova', 'Maksim Petrov', 'Maksim Petrov', 'Viktor Fomichev', 'Ulyana Selezneva'], 'date': ['2021-07-01', '2021-07-01', '2021-07-01', '2021-07-01', '2021-07-01', '2021-07-02', '2021-07-02', '2021-07-02', '

Python

generate functionMon, 13 Feb 2023

Modify the formula below so that it rounds to the nearest integer: df2['respiratory_rate'] = df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median'))

import numpy as np

median = df2.groupby('pulse')['respiratory_rate'].transform('median')
# floor(x + 0.5) rounds halves up, unlike round(), which rounds halves to the nearest even number
df2['respiratory_rate'] = df2['respiratory_rate'].fillna(np.floor(median + 0.5))

Python

generate functionWed, 18 Jan 2023

There is a Pandas dataframe in this format: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singing in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 The Seventh Seal (1957) 5.0 9 Bridge Over the River Kwai (1957) 4.0 Write a function that will pull the year of the movie from its title and add it to the "year" column

def get_year(title): return int(title.split()[-1][1:-1]) df['year'] = df['title'].apply(get_year)

Python

generate functionMon, 27 Mar 2023

There is a variable tfidf that contains numbers in this format: [(0, -0.34657359027997264), (1, -0.34657359027997264), (2, -0.34657359027997264), (3, -0.34657359027997264), (4, -0.34657359027997264), (5, -1.0986122886681098)...] When loading the cosine similarities code from gensim import similarities cos_sim = similarities.MatrixSimilarity(tfidf[bows]) The following error appears: TypeError: unhashable type: 'list'. Fix this error

import numpy as np tfidf = np.array(tfidf) from gensim import similarities cos_sim = similarities.MatrixSimilarity(tfidf[bows])

Python

generate functionMon, 12 Dec 2022

Create a program that will take two lists of integers, a and b. Each list will consist of 3 positive integers above 0, representing the dimensions of cuboids a and b. You must find the difference of the cuboids' volumes regardless of which is bigger. For example, if the parameters passed are ([2, 2, 3], [5, 4, 1]), the volume of a is 12 and the volume of b is 20. Therefore, the function should return 8. Write solution in one line of code.

from functools import reduce  # reduce is not a built-in in Python 3

def find_difference(a, b): return abs(reduce(lambda x, y: x*y, a) - reduce(lambda x, y: x*y, b))

Python

generate functionThu, 22 Dec 2022

create function that find nearest square number, nearest_sq(n), of a positive integer n

import math def nearest_sq(n): return round(math.sqrt(n)) ** 2

Python

generate functionWed, 21 Dec 2022

Change this function so that the employee's awards are also taken into account (add this variable to the class) and the employee's promotion condition is taken into account by the formula if (self.seniority)+(self.intlawards*2) % 7 == 0: self.grade_up() class Developer(Employee): def init(self, name, seniority): super().init(name, seniority) def check_if_it_is_time_for_upgrade(self): # for each accreditation, increase the counter by 1 # for now we assume that all of the developers pass the accreditation self.seniority += 1 # condition of promoting an employee from the presentation if self.seniority % 5 == 0: self.grade_up() # publication of the results return self.publish_grade()
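No answer was recorded for this request. A minimal sketch of the requested change follows. The `Employee` base class is not shown in the task, so the version here is a stand-in with assumed `grade_up` and `publish_grade` methods, and the `awards=0` default is also an assumption. Note that the formula as written in the task, `(self.seniority)+(self.intlawards*2) % 7 == 0`, applies `%` before `+`, so the sketch wraps the whole sum in parentheses.

```python
class Employee:
    # Stand-in base class (assumption): the real Employee is not shown in the task.
    def __init__(self, name, seniority):
        self.name = name
        self.seniority = seniority
        self.grade = 1

    def grade_up(self):
        self.grade += 1

    def publish_grade(self):
        return f'{self.name} {self.grade}'


class Developer(Employee):
    def __init__(self, name, seniority, awards=0):
        super().__init__(name, seniority)
        self.intlawards = awards  # number of international awards

    def check_if_it_is_time_for_upgrade(self):
        # each accreditation adds one point of seniority
        self.seniority += 1
        # parenthesise the whole sum: without it, % binds tighter than +
        if (self.seniority + self.intlawards * 2) % 7 == 0:
            self.grade_up()
        return self.publish_grade()


dev = Developer('Ann', seniority=2, awards=2)
report = dev.check_if_it_is_time_for_upgrade()
```

In this example seniority becomes 3 and the two awards add 4 points, so the total of 7 triggers a promotion.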

Python

generate functionSat, 04 Mar 2023

Correct the code to correctly output the dataframe with this data: gdpdiff = pd.DataFrame({'Top1': df19[['Country or region', 'GDP per capita']]\ .sort_values(by='GDP per capita', ascending=False).head(20)[0:1], 'Top20': df19[['Country or region', 'GDP per capita']]\ .sort_values(by='GDP per capita', ascending=False).head(20)[19:20]}) gdpdiff

def top20(df): gdpdiff = pd.DataFrame({'Top1': df[['Country or region', 'GDP per capita']]\ .sort_values(by='GDP per capita', ascending=False).head(20)[0:1], 'Top20': df[['Country or region', 'GDP per capita']]\ .sort_values(by='GDP per capita', ascending=False).head(20)[19:20]}) return gdpdiff top20(df19)

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 450 (Can't Live Without Your) Love And Affection Nelson 1990-07-07 1 93 15 14.0 607 (Everything I Do) I Do It For You Bryan Adams 1991-06-29 1 53 9 8.0 748 (Hey Won't You Play) Another Somebody Done Som... B.J. Thomas 1975-02-01 1 99 17 16.0 852 (I Can't Get No) Satisfaction The Rolling Stones 1965-06-12 1 67 13 12.0 951 (I Just) Died In Your Arms Cutting Crew 1987-03-07 1 80 14 13.0 Create a new dfs dataframe, where the data format of the chart_debut column is changed from 1991-06-29 to 1991

def data_clean(df): df['chart_debut'] = df['chart_debut'].str.split('-').str[0] return df data_clean(df)

Python

generate functionFri, 24 Feb 2023

There is a Pandas series named mean_c: country_code AFG 38.682 AGO 44.554 ALB 74.068 AND NaN ARE 68.607 ARG 78.796 ARM 72.590 ATG NaN AUS 90.716 AUT 88.413 Write a function that sorts them in descending numerical order

mean_c.sort_values(ascending=False)

Python

generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 Calculate the average lifetime of users. The lifetime is the difference between the maximum and minimum values of the timestamp column for a given userId.

import pandas as pd ratings = pd.read_csv('ratings.csv') def aver_lifetime(data): data['date'] = pd.to_datetime(data['timestamp'], unit='s').dt.date lifetimes = data.groupby('userId')['date'].agg(['min', 'max']) lifetimes['lifetime'] = lifetimes['max'] - lifetimes['min'] return lifetimes['lifetime'].mean() aver_lifetime(ratings)

Python

generate functionWed, 18 Jan 2023

There is a Pandas dataframe: 35929 Passage de Venus (1874) 3.027778 1874 40331 Sallie Gardner at a Gallop (1878) 2.909091 1878 4195 Athlete Swinging a Pick (1880) 2.666667 1880 8085 Buffalo Running (1883) 2.636364 1883 29860 Man Walking Around a Corner (1887) 1.750000 1887 53932 Traffic Crossing Leeds Bridge (1888) 2.375000 1888 36445 Pferd und Reiter Springen Über ein Hindernis (... 2.583333 1888 1778 Accordion Player (1888) 1.928571 1888 39900 Roundhay Garden Scene (1888) 2.605263 1888 Write a function that creates a "Decade of Release" column and assigns a decade to each movie based on the values in the "Year" column Example: 1889 Result: 1880-1890 Example 2: 1954 Result 2: 1950-1960 Example 3: 2011 Result 3: 2010-2020 Example 4: raovyavas Result: NaN

import numpy as np

def assignDecade(x):
    try:
        decade = int(x) // 10 * 10  # 1889 -> 1880
    except (TypeError, ValueError):
        return np.nan  # non-numeric values such as 'raovyavas'
    return f'{decade}-{decade + 10}'

df['Decade of Release'] = df['Year'].map(assignDecade)
df

Python

generate functionMon, 13 Feb 2023

Modify the formula below so that it rounds to the nearest integer: df2['respiratory_rate'] = df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median'))

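# round the filled column to the nearest integer
df2['respiratory_rate'] = df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median')).round()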

Python

generate functionFri, 17 Mar 2023

Write a function that will determine the size of the necessary sample for the study if the following parameters are known: - error (delta); - variance (sigsqr); - Confidence level (conf). Round up the answer to a whole number by round function.

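from statistics import NormalDist

def sample_size(error, sigsqr, conf):
    # two-sided critical value of the standard normal; conf is a fraction such as 0.95
    z = NormalDist().inv_cdf((1 + conf) / 2)
    # n = z^2 * sigma^2 / delta^2
    return round(z ** 2 * sigsqr / error ** 2)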

Python

generate functionMon, 20 Feb 2023

There is a Pandas code: df2.style.highlight_max(axis=1, props='color:white;\ font-weight:bold;\ background-color:green;') df2.style.highlight_min(axis=1, props='color:white;\ font-weight:bold;\ background-color:brown;') Write a function that combines these two actions into one and returns the changed dataframe

def highlight_min_max(df1):
    props = 'color:white; font-weight:bold; background-color:{};'
    # both highlight methods return a Styler, so the calls can be chained
    return df1.style.highlight_max(axis=1, props=props.format('green')).highlight_min(axis=1, props=props.format('brown'))

highlight_min_max(df2)

Python

generate functionMon, 26 Dec 2022

There are two lists. Write a function that looks for matching values of these two lists and replaces them with replacing_num (but without touching the first one) stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ] replacing_num = '9090'

def replace_nums(students, stnums, replacing_num):
    seen = set()
    for st in students:
        if st[0] in stnums:
            if st[0] in seen:
                st[0] = replacing_num  # later match: replace in place
            else:
                seen.add(st[0])  # first match is left untouched
    return students

print(replace_nums(students, stnums, replacing_num))

Python

generate functionSat, 04 Mar 2023

Construct a line graph using the data from this dataframe, where the x-axis is years and the y-axis is the columns 2015 2016 2017 2018 2019 0 5.375734 5.382185 5.354019 5.375917 5.407096

import matplotlib.pyplot as plt

def f(dframe):
    # the years are column labels, so transpose to put them on the x-axis
    ax = dframe.T.plot(kind='line', marker='o', legend=False)
    ax.set_xlabel('year')
    plt.show()

f(df)

Python

generate functionFri, 24 Feb 2023

Write a function for the Pandas dataframe that will delete the columns named spi_rank and country

import pandas as pd # Create DataFrame df = pd.DataFrame({'spi_rank' : [21,19,20], 'country': ['Australia', 'Canada', 'Chile']}) def function(df): del df['spi_rank'] del df['country'] return df function(df)

Python

generate functionSat, 18 Feb 2023

There is a Pandas dataframe: regions sales 0 North America 4402.62 1 Europe 2424.67 2 Japan 1297.43 3 Other 791.34 Create a piechart with this data using Matplotlib

regions = ['North America', 'Europe', 'Japan', 'Other'] sales = [4402.62, 2424.67, 1297.43, 791.34] plt.pie(sales, labels=regions, autopct='%1.1f%%') plt.show()

Python

generate functionWed, 15 Mar 2023

load_iris from the sklearn library

from sklearn import datasets import matplotlib.pyplot as plt import pandas as pd import numpy as np def load_iris(): iris = datasets.load_iris() return iris iris = load_iris()

Python

generate functionTue, 13 Dec 2022

a list is given arr = [2, 1, 10, 5] Write a function with a for loop that sums all the numbers in the list with each other The result should be this: (2+1)+(2+10)+(2+5)+(1+10)+(1+5)+(10+5) = 54

# create a list
arr = [2, 1, 10, 5]

def sum_from_list(arr):
    result = 0
    for idx, num in enumerate(arr):
        for num_2 in arr[idx + 1:]:
            result += num + num_2  # add each unordered pair once
    return result

sum_from_list(arr)  # 54

Python

generate functionFri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). In function return specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers. def interval(n, mean, sig, conf): h = return

from scipy import stats

def interval(n, mean, sig, conf):
    # half-width of the interval: z * sig / sqrt(n)
    h = sig * stats.norm.ppf((1 + conf) / 2) / (n ** 0.5)
    return int(round(h * 2))

Python

generate functionSat, 25 Feb 2023

Write code that finds all rows in the rate_group column of Pandas with "%" at the end and removes it with regex. Rows without "%" must be replaced with np.nan.

import pandas as pd import numpy as np df = pd.DataFrame({'rate_group': ['new', 'new%', 'old', 'old%', 'new%']}) print(df)

Python
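The answer above only builds the sample frame. A minimal sketch of the requested cleanup, assuming the same sample data; the helper name `has_percent` is introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({'rate_group': ['new', 'new%', 'old', 'old%', 'new%']})

# True for rows whose value ends with '%'
has_percent = df['rate_group'].str.endswith('%')

# strip the trailing '%' with a regex, then set rows that never had one to NaN
df['rate_group'] = df['rate_group'].str.replace(r'%$', '', regex=True).where(has_percent)
print(df)
```

`Series.where` keeps values where the mask is True and fills the rest with NaN, which matches the np.nan requirement in the prompt.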

generate functionMon, 13 Feb 2023

There is Pandas dataframe: surgery age rectal_temp pulse respiratory_rate extremities_temp pain outcome 0 38.50 66 3 3 ? 2 ? 0 1 39.2 88 ? ? 4 1 ? 0 2 38.30 40 1 1 3 1 ? 0 3 39.10 164 4 1 6 2 1 0 4 37.30 104 ? ? 6 2 ? 0 Write a function that will try to change the Dtype of all columns to float. If this fails, then except pass

def change_dtype(df): for col in df.columns: try: df[col] = df[col].astype(float) except: pass return df df = change_dtype(df)

Python

generate functionSat, 04 Mar 2023

Construct a line graph using the data from this dataframe, where the x-axis is years and the y-axis is the columns 2015 2016 2017 2018 2019 0 5.375734 5.382185 5.354019 5.375917 5.407096

import matplotlib.pyplot as plt

# there is no 'year' column; the years are column labels, so transpose first
df.T.plot(legend=False)
plt.show()

Python

generate functionWed, 01 Feb 2023

There is a Pandas function: df[['performer', 'time_on_chart']].groupby('performer').agg({'min', 'max'}) Sort it by 'max' column

import pandas as pd

df = pd.DataFrame({'performer': ['a', 'a', 'b'], 'time_on_chart': [1, 5, 3]})

def sort_by_max(df, groupby_column, value_column):
    # a list (not a set) keeps the column order stable: min, then max
    return df.groupby(groupby_column)[value_column].agg(['min', 'max']).sort_values(by='max')

sort_by_max(df, 'performer', 'time_on_chart')

Python

generate functionSun, 11 Dec 2022

Write a function that multiplies each dictionary value by portions portions = 5 grams = 0 key = 'salad' cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'gr'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pct'}, {'ingridient_name': 'pepper', 'quantity': 20, 'measure': 'гр'}]} if key == 'salad': grams = 'quantity' * portions print(grams) The result should be: Cheese: 210 g Tomatoes: 6 pcs. Cucumbers: 60g Olives: 30 g Olive oil: 60 ml Lettuce: 30 gr Pepper: 60 gr

def get_shop_list_by_dishes(dishes, person_count): cook_book = { 'salad': [ {'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'gr'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pct'}, {'ingridient_name': 'pepper', 'quantity': 20, 'measure': 'gr'}], 'cucumbers': [ {'ingridient_name': 'cucumbers', 'quantity': 20, 'measure': 'gr'}, {'ingridient_name': 'pepper', 'quantity': 8, 'measure': 'gr'}, {'ingridient_name': 'olives', 'quantity': 8, 'measure': 'gr'}, {'ingridient_name': 'olive oil', 'quantity': 30, 'measure': 'ml'}], 'olives': [

Python
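The answer above is cut off. As a rough sketch of the multiplication step only, using the three salad ingredients from the prompt (the expected result quoted in the prompt lists other ingredients and amounts, so the output here will not match it); `scale_dish` is a hypothetical helper name:

```python
cook_book = {
    'salad': [
        {'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'gr'},
        {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pct'},
        {'ingridient_name': 'pepper', 'quantity': 20, 'measure': 'gr'},
    ]
}

def scale_dish(cook_book, dish, portions):
    """Return 'Name: amount measure' lines for the given number of portions."""
    lines = []
    for item in cook_book[dish]:
        total = item['quantity'] * portions
        lines.append(f"{item['ingridient_name'].capitalize()}: {total} {item['measure']}")
    return lines

for line in scale_dish(cook_book, 'salad', 5):
    print(line)
```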

generate functionFri, 27 Jan 2023

There is a Pandas dataframe: date hotel revenue av_revenue difference in_percent 5 2021-07-01 Alpina 38736 NaN NaN NaN 11 2021-07-02 Moreon 40766 NaN NaN NaN 17 2021-07-03 Alpina 30330 NaN NaN NaN 23 2021-07-04 Galaxia 50180 NaN NaN NaN 29 2021-07-05 Optima 27006 NaN NaN NaN 1) Write a function that calculates the average revenue for each hotel and adds these values to the "av_revenue" column 2) Write a function that calculates the difference between the average revenue and the values in the "revenue" column and adds these calculations to the "difference" column 3) Write a function that calculates the difference between the average revenue and the values in the "revenue" column as a percentage and adds those calculations to the "in_percent" column

def av_revenue(df): df['av_revenue'] = df.groupby('hotel')['revenue'].transform('mean') return df def difference(df): df['difference'] = df['av_revenue'] - df['revenue'] return df def in_percent(df): df['in_percent'] = 100 * df['difference'] / df['av_revenue'] return df

Python

generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating decade 4728 Bad Education (2019) 0.5 2010-2020 35638 35638 Palooka (1934) 0.5 1934 1930-1940 21445 21445 High Moon (2019) 0.5 2019 2010-2020 40291 40291 Saint Maud (2019) 0.5 2019 2010-2020 29462 29462 Mad at the Moon (1992) 0.5 1992 1990-2000 46978 46978 The Butterfly Ball (1977) 0.5 1977 1970-1980 42931 42931 Snowboarďáci (2004) 0.5 2004 2000-2010 38670 38670 Recon 2020: The Caprini Massacre (2004) 0.5 2004 2000-2010 25506 25506 Karagoez catalogo 9,5 (1983) 0.5 1983 1980-1990 43989 43989 Starting a Skyscraper (1902) 0.5 1902 1900-1910 Write a function that shows which decade had the most movies with a high rating Write a function that creates a "Decade of Release" column and assigns a decade to each movie based on the values in the "Year" column Example: 1889 Result: 1880-1890 Example 2: 1954 Result 2: 1950-1960 Example 3: 2011 Result 3: 2010-2020 Example 4: raovyavas Result: NaN

import numpy as np
import pandas as pd

def decade(year):
    try:
        start = int(year) // 10 * 10
    except (TypeError, ValueError):
        return np.nan  # non-numeric input such as 'raovyavas'
    return f'{start}-{start + 10}'

df['Decade of Release'] = df['Year'].map(decade)

Python

generate functionWed, 25 Jan 2023

There is a list with lists inside [['/world/'], ['/latest/'], ['/?updated=top'], ['/politics/36188461-s-marta-zhizn-rossiyan-suschestvenno-izmenitsya-iz-za-novyh-zakonov/'] ['/world/36007585-tramp-pridumal-kak-reshit-ukrainskiy-vopros/'], ['/science/36157853-nasa-sobiraet-ekstrennuyu-press-konferentsiyu-na-temu-vnezemnoy-zhizni/'], ['/video/36001498-poyavilis-pervye-podrobnosti-gibeli-natali-melamed/'], ['/world/36007585-tramp-pridumal-kak-reshit-ukrainskiy-vopros/?smi2=1'] ['/science/'], ['/sport/'], ['/middleeast/36131117-divizion-s-400-ne-zametil-ataki-f-35-pod-damaskom/'], ['/economics/36065674-rossiyane-vozmutilis-minimalnymi-zarplatami-v-stranah-es/']] 1) Modify the list to the Pandas dataframe 2) Filter out and leave only the url's with the news sctructure (containing 8 digits and heading) in it, using the str.contains method

import pandas as pd list = [['/world/'], ['/latest/'], ['/?updated=top'], ['/politics/36188461-s-marta-zhizn-rossiyan-suschestvenno-izmenitsya-iz-za-novyh-zakonov/'] ['/world/36007585-tramp-pridumal-kak-reshit-ukrainskiy-vopros/'], ['/science/36157853-nasa-sobiraet-ekstrennuyu-press-konferentsiyu-na-temu-vnezemnoy-zhizni/'], ['/video/36001498-poyavilis-pervye-podrobnosti-gibeli-natali-melamed/'], ['/world/36007585-tramp-pridumal-kak-reshit-ukrainskiy-vopros/?smi2=1'] ['/science/

Python
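The answer above is cut off before the filtering step. A minimal sketch of both steps on a shortened version of the list; the column name `url` and the exact regular expression are assumptions, not part of the original answer:

```python
import pandas as pd

# a shortened version of the list from the prompt
urls = [['/world/'], ['/latest/'], ['/?updated=top'],
        ['/world/36007585-tramp-pridumal-kak-reshit-ukrainskiy-vopros/'],
        ['/science/'],
        ['/economics/36065674-rossiyane-vozmutilis-minimalnymi-zarplatami-v-stranah-es/']]

# 1) each inner list becomes one row
df = pd.DataFrame(urls, columns=['url'])

# 2) keep rows that look like /section/<8 digits>-<heading>
news = df[df['url'].str.contains(r'^/\w+/\d{8}-\w', regex=True)]
print(news)
```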

generate functionWed, 01 Feb 2023

There is a Python and Pandas function. def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy year_leaders(dfp) Modify it to output all 66 lines of the dataframe

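import pandas as pd

def year_leaders(dfp):
    dfp_copy = dfp.copy()
    dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True)
    dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True)
    dfp_copy.set_index('chart_debut', inplace=True)
    return dfp_copy

# pandas truncates long frames by default; lift the row limit for this output
with pd.option_context('display.max_rows', None):
    print(year_leaders(dfp))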

Python

generate functionTue, 21 Mar 2023

Add to this the code for generating a grid on the chart, as well as specifying the percentage difference directly on the chart bars plt.figure(figsize = (12, 4)) plt.bar(data.index, data['nutri'], color = 'darkblue', label = 'nutri') plt.bar(data.index, data['mineral'], color = 'brown', label = 'mineral')

plt.title('Percentage difference between nutrient and mineral content for each vegetable') plt.xlabel('Vegetable') plt.ylabel('Percentage difference') plt.legend() plt.show()

Python
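A minimal sketch of the grid and the percentage labels requested in the prompt. The helper names `percent_diff` and `plot_with_labels` are hypothetical, and the difference is assumed to be measured relative to the 'nutri' values:

```python
def percent_diff(base, other):
    """Percentage difference of each value in `other` relative to `base`."""
    return [(o - b) / b * 100 for b, o in zip(base, other)]

def plot_with_labels(index, nutri, mineral):
    # imported here so that percent_diff can be used without matplotlib installed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(12, 4))
    ax.bar(index, nutri, color='darkblue', label='nutri')
    ax.bar(index, mineral, color='brown', label='mineral')
    ax.grid(True, axis='y', linestyle='--', alpha=0.5)  # horizontal grid lines
    ax.set_axisbelow(True)                               # keep the grid behind the bars
    for x, n, m, d in zip(index, nutri, mineral, percent_diff(nutri, mineral)):
        # write the difference just above the taller of the two bars
        ax.text(x, max(n, m), f'{d:+.1f}%', ha='center', va='bottom')
    ax.legend()
    plt.show()
```

With the dataframe from the prompt, the call would be `plot_with_labels(data.index, data['nutri'], data['mineral'])`.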

generate functionMon, 27 Mar 2023

The code below gives a ValueError: Input variables with an inconsistent number of samples were found: [8082, 5572]. Fix it! from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(tfidf, cats, test_size=0.3, random_state=42)

from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(tfidf, cats, test_size=0.3, random_state=42)

Python

generate functionSat, 25 Feb 2023

You need to split the values of the int_rate column (in str data format) into categories. Write a function that creates a rate_group column and adds values to it as follows: 1) if int_rate > 15.00%, rate_group should be '>15' 2) if int_rate is between 10.00% and 15.00%, then rate_group should be '10-15'

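def rate_group(rate):
    # int_rate is stored as a string such as '13.56%'
    rate = float(str(rate).strip().rstrip('%'))
    if rate > 15.00:
        return '>15'
    elif rate >= 10.00:
        return '10-15'

df['rate_group'] = df['int_rate'].apply(rate_group)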

Python

generate functionWed, 08 Feb 2023

def get_needed_posts(query): for article in articles: try: title = article.find('h2', class_='tm-article').text date = article.find('span', class_='tm-article').text.strip() link = article.find('h2', class_='tm-article').find('a').get('href') except: pass row = {'date': date, 'title': title, 'link': 'https://habr.com'+link} site = pd.concat([site, pd.DataFrame([row])]) return site.reset_index(drop=True) Modify the Python function above so that if it finds a repeated title or link, it will not add it to the return

def search(query): query = query.lower() site = pd.DataFrame() for page in range(0, 10): if page == 0: url = 'https://habr.com/ru/all/' else: url = 'https://habr.com/ru/all/page' + str(page) + '/' req = requests.get(url) soup = BeautifulSoup(req.text, 'html.parser') articles = soup.find_all('li', class_='content-list__item_post') site = pd.concat([site, get_needed_posts(query, articles)]) return site.reset_index(drop=True) def get_needed_posts(query, articles): site = pd.DataFrame() for article in articles: title, date, link = get_article_info(article) if not title in list(site['title']) and not link in list(site['link']): row = {'

Python
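The answer above is cut off. The core idea of skipping repeated titles or links can be sketched without pandas; `unique_posts` is a hypothetical helper and the rows are invented sample data, not real articles:

```python
def unique_posts(rows):
    """Keep only rows whose title and link have not been seen before."""
    seen_titles, seen_links = set(), set()
    unique = []
    for row in rows:
        if row['title'] in seen_titles or row['link'] in seen_links:
            continue  # repeated title or link: skip it
        seen_titles.add(row['title'])
        seen_links.add(row['link'])
        unique.append(row)
    return unique

rows = [
    {'date': '2023-02-08', 'title': 'A', 'link': '/a'},
    {'date': '2023-02-08', 'title': 'A', 'link': '/a2'},  # same title
    {'date': '2023-02-08', 'title': 'B', 'link': '/a'},   # same link
    {'date': '2023-02-08', 'title': 'C', 'link': '/c'},
]
print(unique_posts(rows))
```

The filtered rows can then be passed to `pd.DataFrame(...)` once, which also avoids calling `pd.concat` inside the loop.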

generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singin' in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 Seventh Seal, The (Sjunde inseglet, Det) (1957) 5.0 9 Bridge on the River Kwai, The (1957) 4.0 Both titles and ratings in this table are not unique; they are duplicated many times. Write a function that creates a new dataframe with two columns: - In the first column it groups the unique titles of the movies - In the second column it displays their average rating

import pandas as pd ratings = pd.read_csv('ratings.csv') filtered_ratings = ratings[['title', 'rating']] filtered_ratings = filtered_ratings.groupby('title').mean() filtered_ratings

Python

generate functionTue, 21 Mar 2023

Calculate mean, median values of this list water = [1,2,3,4,2,4,2,4,5,2,3,4,2,1,3,4,3,2,5,1]

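def mean(x):
    return sum(x) / len(x)

def median(x):
    s = sorted(x)
    mid = len(s) // 2
    # average the two middle values when the length is even
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

water = [1,2,3,4,2,4,2,4,5,2,3,4,2,1,3,4,3,2,5,1]
print(mean(water), median(water))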

Python

generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 Group the data by "userId" column and output in a separate column the number of values in the "rating" column

import pandas as pd df = pd.read_csv('ratings.csv') df.groupby('userId').size().to_frame('size').reset_index()

Python

generate functionTue, 10 Jan 2023

import numpy as np def sum_matrix(N): my_matrix = np.diag(np.arange(N-1, -1, -1), k=0) return np.trace(my_matrix) print(sum_matrix(5)) print(sum_matrix(10)) print(sum_matrix(15)) Create in Numpy a diagonal matrix with elements from N to 0. Count the sum of its values on the diagonal.

import numpy as np

def sum_matrix(N):
    # diagonal elements run from N down to 0 inclusive
    my_matrix = np.diag(np.arange(N, -1, -1), k=0)
    return np.trace(my_matrix)

print(sum_matrix(5))
print(sum_matrix(10))
print(sum_matrix(15))

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order Example: name Ksenia Rodionova Artur Petrov Ivan Sidorov Ksenia Rodionova Result: name Ksenia Rodionova Ksenia Rodionova

df = df[df.duplicated(subset=["name"], keep=False)].sort_values("name")

Python

generate functionTue, 21 Mar 2023

Write code that calculates the standard deviation of list A

import numpy as np def stdev(A): return np.std(A) B = [1, 2, 3, 4, 5] print(stdev(B))

Python

generate functionWed, 22 Feb 2023

Modify the code below to have column captions for the x-axis values question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='What working environment do you prefer most?') fig.show()

import plotly.express as px

def create_plot(question, title, figure_template):
    # count how often each answer to the question occurs
    answers = df[question].value_counts()
    label = answers.index
    counts = answers.values
    fig = figure_template(x=label, y=counts)
    fig.update_layout(title_text=title)
    fig.show()

create_plot('What is the most preferred working environment for you.', 'What working environment do you prefer most?', px.bar)

Python

generate functionMon, 06 Feb 2023

There is a df Pandas dataframe: date av_temp deviations country year decade 577457 2013-05-01 19.059 1.022 Zimbabwe 2013 2010-2020 577458 2013-06-01 17.613 0.473 Zimbabwe 2013 2010-2020 577459 2013-07-01 17.000 0.453 Zimbabwe 2013 2010-2020 577460 2013-08-01 19.759 0.717 Zimbabwe 2013 2010-2020 Write a function that shows the top 20 rows from the country column sorted by av_temp values in descending order.

def top(df, column_name): return df.sort_values(by=column_name, ascending=False)[:20] top(df, 'av_temp')

Python

generate functionTue, 14 Feb 2023

Rewrite the function below so it is called 'is_month_end' instead of 'is_quarter_end' and output 1 if the 'Date' column shows the last day of the month df['is_quarter_end'] = np.where(df['month']%3==0,1,0) df.head()

import pandas as pd

# 1 if 'Date' is the last day of its month, otherwise 0
df['is_month_end'] = pd.to_datetime(df['Date']).dt.is_month_end.astype(int)
df.head()

Python

generate functionWed, 25 Jan 2023

There is a Pandas dataframe news_title 0 /world/ 1 /latest/ 2 /?updated=top 3 /politics/36188461-s-marta-zhizn-rossiyan-susc... 4 /world/36007585-tramp-pridumal-kak-reshit-ukra... 5 /science/36157853-nasa-sobiraet-ekstrennuyu-pr... 6 /video/36001498-poyavilis-pervye-podrobnosti-g... 7 /world/36007585-tramp-pridumal-kak-reshit-ukra... 8 /science/ 9 /sport/ Filter this dataframe and keep only the URLs with the news structure (containing 8 digits and a heading), using the str.contains method

import pandas as pd

df = pd.read_csv('news.csv', delimiter='\t')

# a news URL has a section, an 8-digit id and a heading: /section/12345678-heading/
news = df[df['news_title'].str.contains(r'^/\w+/\d{8}-', regex=True)]
news

Python

generate functionTue, 13 Dec 2022

a list is given arr = [2, 1, 10, 5] Write a function with one line of code that sums all the numbers in the list with each other ((2+1)+(2+10)+(2+5)+(1+10)+(1+5)+(10+5) ) The result should be this: res = 54

Python
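No answer was recorded for this prompt. One possible one-line body, sketched with `itertools.combinations`; the function name `pair_sum` is made up for this example:

```python
from itertools import combinations

arr = [2, 1, 10, 5]

def pair_sum(arr):
    # sum every unordered pair (a, b) exactly once
    return sum(a + b for a, b in combinations(arr, 2))

res = pair_sum(arr)
print(res)
```

Each element appears in `len(arr) - 1` pairs, so `(len(arr) - 1) * sum(arr)` gives the same value without enumerating the pairs.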

generate functionTue, 21 Mar 2023

There are 3 lists: water, nutri and mineral. They differ in length: print(len(water)) - 20 print(len(nutri)) -21 print(len(mineral)) -21 add them to df Pandas dataframe in the following format: index treatments value 0 0 water 25 1 1 water 30 2 2 water 28 3 3 nutri 36 4 4 nutri 29 5 0 nutri 45 6 1 mineral 55 7 2 mineral 29 8 3 mineral 56

import pandas as pd

# build one frame per treatment; reset_index turns the 0..n-1 positions into an 'index' column
def df_maker(values, name):
    df = pd.DataFrame({'treatments': name, 'value': values}).reset_index()
    return df[['index', 'treatments', 'value']]

# the lists may differ in length because each frame is built separately
combined_df = pd.concat([df_maker(water, 'water'), df_maker(nutri, 'nutri'), df_maker(mineral, 'mineral')], ignore_index=True)
combined_df

Python

generate functionFri, 27 Jan 2023

There is a code for Pandas df4['loyal_profit'] = df4.apply(lambda x: (x.profit / x.ocup_rooms) if ['regular_customer', 'by_recommendation'] in x.how_find_us else None, axis=1) This code gives out an error. Rewrite it so that it works

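def panda_function(df):
    loyal = ['regular_customer', 'by_recommendation']
    # test whether the string is one of the loyal channels, not whether the list is inside the string
    df['loyal_profit'] = df.apply(lambda x: (x.profit / x.ocup_rooms) if x.how_find_us in loyal else None, axis=1)
    return df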

Python

generate functionWed, 22 Feb 2023

Change the code below so that the columns have values on the x-axis question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='What working environment do you prefer most?') fig.show()

import plotly.express as px

question6 = 'What is the most preferred working environment for you.'
question6 = df[question6].value_counts()
label = question6.index
counts = question6.values
# vertical bars: the answer labels go on the x-axis
fig = px.bar(x=label, y=counts)
fig.update_layout(title_text='What working environment do you prefer most?')
fig.show()

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe: hotel how_find_us Alpina agg_101hotels.com 2 agg_airbnb.com 4 agg_booking.com 3 agg_bronevik.com 1 agg_hotellook.ru 4 agg_level.travel 2 agg_onetwotrip.com 2 agg_onlinetours.ru 2 agg_ostrovok.ru 1 agg_other_sites 1 agg_roomguru.ru 2 agg_sutochno.ru 6 agg_travelata.ru 3 agg_tripadvisor.ru 3 agg_trivago.com 6 agg_tvil.ru 3 agg_yandex_travel 5 by_recommendation 8 facebook_adv 1 google_adv 1 instagram_adv 6 outdoor 6 regular_customer 1 seo 5 social 3 telegram_adv 2 tour_agents 3 unknown 4 vk_adv 3 yandex_adv 10 Write a function that creates a new dataframe by grouping the values in the "how_find_us" column according to the following rules: 1) if there are "agg" in the row, replace all row values with "aggregators" 2) if there are words "facebook" or "vk" or "instagram" or "telegram" in the line, replace these values with "social"

def find_us(df):
    def group(source):
        if 'agg' in source:
            return 'aggregators'
        if any(word in source for word in ('facebook', 'vk', 'instagram', 'telegram')):
            return 'social'
        return source

    new_df = df.copy()  # leave the original dataframe unchanged
    new_df['how_find_us'] = new_df['how_find_us'].map(group)
    return new_df

Python

generate functionTue, 21 Mar 2023

Add to this the code for generating a grid on the chart, as well as specifying the percentage difference directly on the chart bars plt.figure(figsize = (12, 4)) plt.bar(data.index, data['nutri'], color = 'darkblue', label = 'nutri') plt.bar(data.index, data['mineral'], color = 'brown', label = 'mineral')

def func(x, pos): # formatter function takes tick label and tick position s = '{:0,d}'.format(int(x)) return s import matplotlib.ticker as ticker formatter = ticker.FuncFormatter(func) # make formatter plt.gca().yaxis.set_major_formatter(formatter) # set formatter to needed axis

Python

generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: id route operator group_name district injury_result incident_type victim_category victim_age 2 3.0 1 London General Go-Ahead Southwark NaN Onboard Injuries 7 10 3 4.0 4 Metroline Metroline Islington NaN Onboard Injuries 7 2 4 5.0 5 East London Stagecoach Havering NaN Onboard Injuries 7 8 5 6.0 5 East London Stagecoach None London Borough NaN Onboard Injuries 7 8 Count which routes had the most incident_types with a value >= 5. Sort them in descending order by incident_type column

df_new.groupby(['route', 'incident_type']).count()['id'] df_new.groupby(['route', 'incident_type'])['id'].count() df_new.groupby(['route', 'incident_type'])['id'].size() df_new[df_new['incident_type'] >= 5].groupby(['route', 'incident_type']).count()['id'] df_new[df_new['incident_type'] >= 5].groupby(['route', 'incident_type']).size() df_new[df_new['incident_type'] >= 5].groupby(['route', 'incident_type'])['id'].size() df_new[df_new['incident_type'] >= 5].groupby(['route', 'incident_type'])['id'].size().sort_values(ascending=False)

Python

generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating decade 4728 Bad Education (2019) 0.5 2010-2020 35638 35638 Palooka (1934) 0.5 1934 1930-1940 21445 21445 High Moon (2019) 0.5 2019 2010-2020 40291 40291 Saint Maud (2019) 0.5 2019 2010-2020 29462 29462 Mad at the Moon (1992) 0.5 1992 1990-2000 46978 46978 The Butterfly Ball (1977) 0.5 1977 1970-1980 42931 42931 Snowboarďáci (2004) 0.5 2004 2000-2010 38670 38670 Recon 2020: The Caprini Massacre (2004) 0.5 2004 2000-2010 25506 25506 Karagoez catalogo 9,5 (1983) 0.5 1983 1980-1990 43989 43989 Starting a Skyscraper (1902) 0.5 1902 1900-1910 Write a function that shows which decade had the most movies with a high rating Write a function that creates a "Decade of Release" column and assigns a decade to each movie based on the values in the "Year" column Example: 1889 Result: 1880-1890 Example 2: 1954 Result 2: 1950-1960 Example 3: 2011 Result 3: 2010-2020 Example 4: raovyavas Result: NaN

def decade_of_release(year): if type(year) == int: if year < 1900: return "1800-1900" if year >= 1900 and year < 1910: return "1900-1910" if year >= 1910 and year < 1920: return "1910-1920" if year >= 1920 and year < 1930: return "1920-1930" if year >= 1930 and year < 1940: return "1930-1940" if year >= 1940 and year < 1950: return "1940-1950" if year >= 1950 and year < 1960: return "1950-1960" if year >= 1960 and year < 1970: return "1960-1970" if year >= 1970 and year < 1980: return "1970-1980" if year >= 1980 and year < 1990: return "1980-1990" if year >= 1990 and year < 2000: return "1990-2000" if year >= 2000 and year < 2010: return "2000-2010"

Python

generate functionSat, 04 Mar 2023

Construct a line graph using the data from this dataframe, where the x-axis is years and the y-axis is the columns 2015 2016 2017 2018 2019 0 5.375734 5.382185 5.354019 5.375917 5.407096

import matplotlib.pyplot as plt

# the years are column labels, so transpose to put them on the x-axis
df.T.plot(legend=False)
plt.show()

Python

generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 Calculate the average lifetime of users. The lifetime is the difference between the maximum and minimum values of the "timestamp" column for a given "userId".

df=pd.DataFrame({'userId':[1,1,1,1,1], 'movieId':[31,1029,1061,1129,1172], 'rating':[2.5,3,3,2,4], 'timestamp':[1260759144,1260759179,1260759182,1260759185,1260759205] }) df

Python
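The answer above only rebuilds the sample frame. A minimal sketch of the lifetime calculation follows; the userId 1 rows come from the prompt, while the userId 2 rows are made up so that the mean covers more than one user, and `average_lifetime` is a hypothetical name:

```python
import pandas as pd

df = pd.DataFrame({
    'userId': [1, 1, 1, 1, 1, 2, 2],
    'timestamp': [1260759144, 1260759179, 1260759182, 1260759185, 1260759205, 100, 139],
})

def average_lifetime(df):
    # earliest and latest timestamp per user
    per_user = df.groupby('userId')['timestamp'].agg(['min', 'max'])
    # lifetime of each user, in the same units as the timestamp column
    lifetime = per_user['max'] - per_user['min']
    return lifetime.mean()

print(average_lifetime(df))
```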

generate functionThu, 30 Mar 2023

There is a matrix A1 = array([[0, 0, 0, 0, 1, 0], [0, 1, 0, 1, 0, 0], [0, 0, 1, 1, 1, 0], [1, 0, 0, 1, 0, 1]]) Multiply the vector vector = [1,2,3,4] by the matrix A1

from numpy import array

def matrix_multiplication(matrix, vector):
    # row vector times matrix: result[j] = sum over i of vector[i] * matrix[i][j]
    result = []
    for j in range(len(matrix[0])):
        total = 0
        for i in range(len(matrix)):
            total += vector[i] * matrix[i][j]
        result.append(total)
    return result

vector = [1,2,3,4]
A1 = array([[0, 0, 0, 0, 1, 0], [0, 1, 0, 1, 0, 0], [0, 0, 1, 1, 1, 0], [1, 0, 0, 1, 0, 1]])
matrix_multiplication(A1, vector)

Python

generate functionTue, 21 Mar 2023

Add to this the code for generating a grid on the chart, as well as specifying the percentage difference directly on the chart bars plt.figure(figsize = (12, 4)) plt.bar(data.index, data['nutri'], color = 'darkblue', label = 'nutri') plt.bar(data.index, data['mineral'], color = 'brown', label = 'mineral')

plt.plot(data.index, data['nutri'], label = 'nutri') plt.plot(data.index, data['mineral'], label = 'mineral')

Python

generate functionSat, 25 Feb 2023

x1 = (round(g1['rate_group'].value_counts() / g1['rate_group'].value_counts().sum() * 100, 2)) x2 = (round(g2['rate_group'].value_counts() / g2['rate_group'].value_counts().sum() * 100, 2)) x1.plot(kind='hist', alpha=0.5, bins=6, density=True) x2.plot(kind='hist', alpha=0.5, bins=6, density=True) plt.legend(loc='upper left') plt.title('Comparison of distributions with and without own housing') Add to the code above the caption of the legends on the chart: - for x1 = 'column1' - for x2 = 'column2'

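# the label passed to each plot call is what plt.legend() displays
x1.plot(kind='hist', alpha=0.5, bins=6, density=True, label='column1')
x2.plot(kind='hist', alpha=0.5, bins=6, density=True, label='column2')
plt.legend(loc='upper left')
plt.title('Comparison of distributions with and without own housing')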

Python

generate functionThu, 30 Mar 2023

There are two vectors: film_2 = np.array([0,1,0,1,0,0]) film_4 = np.array([1,0,0,1,0,1]) We know that the angle between them is 65.9 degrees. Construct a graph with a visual representation of the vectors and the angle between them.

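import numpy as np
import matplotlib.pyplot as plt

film_2 = np.array([0,1,0,1,0,0])
film_4 = np.array([1,0,0,1,0,1])

# 6-dimensional vectors cannot be drawn directly, so draw them in the plane
# they span, keeping their lengths and the given angle between them
angle = np.radians(65.9)
a = np.array([np.linalg.norm(film_2), 0.0])
b = np.linalg.norm(film_4) * np.array([np.cos(angle), np.sin(angle)])

ax = plt.axes()
arrowprops = dict(arrowstyle='->', linewidth=2, shrinkA=0, shrinkB=0)
for v, name in ((a, 'film_2'), (b, 'film_4')):
    ax.annotate('', xy=v, xytext=(0, 0), arrowprops=arrowprops)
    ax.text(v[0], v[1], name)
ax.set_xlim(0, 2)
ax.set_ylim(0, 2)
ax.set_aspect('equal')
ax.set_title('Angle between film_2 and film_4: 65.9 degrees')
plt.show()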

Python
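The 65.9 degrees quoted in the prompt can be checked from the dot product and the vector norms. A small sketch using only the standard library; `angle_degrees` is a name chosen for this example:

```python
import math

film_2 = [0, 1, 0, 1, 0, 0]
film_4 = [1, 0, 0, 1, 0, 1]

def angle_degrees(u, v):
    # cos(angle) = (u . v) / (|u| * |v|)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / (norm_u * norm_v)))

print(round(angle_degrees(film_2, film_4), 1))
```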

generate functionTue, 21 Mar 2023

You are given test results for two drugs: A (results for the group of people getting drug A) and B (results for the other group of people getting drug B). 1) You need to assess whether there is a statistically significant difference in the effect in these patients. Use the null hypothesis, the inverse hypothesis, and Student's t-test for calculations. 2) construct a boxplot and histogram to visually confirm the result

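from scipy import stats

def t_test(A, B, alpha=0.05):
    # H0: both groups have the same mean effect; H1: the means differ
    t_stat, p_value = stats.ttest_ind(A, B)
    # True means H0 is rejected at significance level alpha (0.05 is an assumed default)
    return p_value < alpha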

Python

generate functionWed, 21 Dec 2022

Change this function so that the employee's awards are also taken into account (add this variable to the class) and the employee's promotion condition is taken into account by the formula if (self.seniority)+(self.intlawards*2) % 7 == 0: self.grade_up() class Developer(Employee): def init(self, name, seniority): super().init(name, seniority) def check_if_it_is_time_for_upgrade(self): # for each accreditation, increase the counter by 1 # for now we assume that all of the developers pass the accreditation self.seniority += 1 # condition of promoting an employee from the presentation if self.seniority % 5 == 0: self.grade_up() # publication of the results return self.publish_grade()

class Developer(Employee):
    def __init__(self, name, seniority, awards):
        super().__init__(name, seniority)
        self.intlawards = awards

    def check_if_it_is_time_for_upgrade(self):
        # for each accreditation, increase the counter by 1
        # for now we assume that all of the developers pass the accreditation
        self.seniority += 1
        # condition of promoting an employee from the presentation;
        # the sum is parenthesized because % binds tighter than +
        if (self.seniority + self.intlawards * 2) % 7 == 0:
            self.grade_up()
        # publication of the results
        return self.publish_grade()

Python
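The promotion formula as written in the prompt is sensitive to operator precedence: `%` binds tighter than `+`, so only the awards term is reduced modulo 7. A short illustration with made-up values:

```python
seniority, intlawards = 3, 2

# as written in the prompt: % applies to (intlawards * 2) only, then seniority is added
as_written = (seniority) + (intlawards * 2) % 7 == 0

# with the whole sum parenthesized, the points are added first and then tested
grouped = (seniority + intlawards * 2) % 7 == 0

print(as_written, grouped)
```

With these values the first expression evaluates 3 + (4 % 7) = 7, which is not 0, while the second evaluates (3 + 4) % 7 = 0.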

generate functionFri, 23 Dec 2022

You are given two sorted arrays that both only contain integers. Your task is to find a way to merge them into a single one, sorted in asc order. Complete the function mergeArrays(arr1, arr2), where arr1 and arr2 are the original sorted arrays. You don't need to worry about validation, since arr1 and arr2 must be arrays with 0 or more Integers. If both arr1 and arr2 are empty, then just return an empty array. Note: arr1 and arr2 may be sorted in different orders. Also arr1 and arr2 may have same integers. Remove duplicated in the returned result. Examples (input -> output) * [1, 2, 3, 4, 5], [6, 7, 8, 9, 10] -> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] * [1, 3, 5, 7, 9], [10, 8, 6, 4, 2] -> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] * [1, 3, 5, 7, 9, 11, 12], [1, 2, 3, 4, 5, 10, 12] -> [1, 2, 3, 4, 5, 7, 9, 10, 11, 12]

def merge_arrays(arr1, arr2):
    # the set union removes duplicates; sorted() returns them in ascending order
    return sorted(set(arr1) | set(arr2))

Python

generate functionThu, 22 Dec 2022

Given a non-negative integer n, write a function to_binary/ToBinary which returns that number in a binary format. to_binary(1) # should return 1 to_binary(5) # should return 101 to_binary(11) # should return 1011

def to_binary(n): return bin(n).replace("0b", "")

Python

generate functionMon, 13 Feb 2023

Modify the formula below so that it rounds to the nearest integer: df2['respiratory_rate'] = df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median'))

df2['respiratory_rate'] = df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median').round())

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer time_on_chart min max "Groove" Holmes 1 11 "Little" Jimmy Dickens 1 10 "Pookie" Hudson 1 1 "Weird Al" Yankovic 1 20 Sort these rows by the time_on_chart and max columns, in descending order, and display the first 20 rows

import pandas as pd df = pd.read_csv("https://www.dropbox.com/s/jr9c7rwhi8hvuk7/performers.csv?dl=1") df.sort_values(by=['time_on_chart','max'], ascending=False).head(20) performer min max time_on_chart 9 "Weird Al" Yankovic 1 20 21 0 "Groove" Holmes 1 11 12 1 "Little" Jimmy Dickens 1 10 11 2 "Pookie" Hudson 1 1 2

Python

generate functionSat, 28 Jan 2023

Construct 4 seaborn bar graphs from these data: 1) df1['city'].value_counts() 2) df2['hotel'].value_counts() 3) df2['date'].value_counts() 4) df2['rating'].value_counts()

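import seaborn as sns
import matplotlib.pyplot as plt

def bar_graph(data):
    # data is the Series returned by value_counts()
    plt.figure(figsize=(18, 6))
    sns.barplot(x=data.index, y=data.values, alpha=0.8)
    plt.title(str(data.name))
    plt.ylabel('Count', fontsize=12)
    plt.xlabel('Name', fontsize=12)
    plt.show()

# One chart per series from the prompt
for counts in (df1['city'].value_counts(), df2['hotel'].value_counts(),
               df2['date'].value_counts(), df2['rating'].value_counts()):
    bar_graph(counts)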

Python

generate functionWed, 21 Dec 2022

Change the class so that when self.seniority is added, the designer is assigned 1 point and when self.intlawards is added, the designer is assigned 2 points. Correct the condition so that these points add up correctly: if (self.seniority + self.intlawards) % 7 == 0: self.grade_up() Example class: class Designer(Employee): def init(self, name, seniority, awards=2): super().init(name, seniority) self.intlawards = awards def check_if_it_is_time_for_upgrade(self): # for each accreditation, increase the counter by 1 # for now we assume that all designers are accredited self.seniority += 1 # condition for promoting an employee from the presentation if (self.seniority + self.intlawards) % 7 == 0: self.grade_up() # publication of the results return self.publish_grade()

class Designer(Employee):
    def __init__(self, name, seniority, awards=2):
        super().__init__(name, seniority)
        self.intlawards = awards

    def check_if_it_is_time_for_upgrade(self):
        self.seniority += 1
        # Seniority counts 1 point each, international awards 2 points each
        if (self.seniority + self.intlawards * 2) % 7 == 0:
            self.grade_up()
        return self.publish_grade()

Python

generate functionMon, 13 Feb 2023

There is df2 Pandas dataframe: surgery age rectal_temp pulse respiratory_rate extremities_temp pain outcome 4 2.0 1 NaN NaN NaN 2.0 2.0 1.0 6 1.0 1 NaN 60.0 NaN 3.0 NaN 2.0 7 2.0 1 NaN 80.0 36.0 3.0 4.0 3.0 15 1.0 9 NaN 128.0 36.0 3.0 4.0 2.0 27 1.0 1 NaN NaN NaN NaN NaN 2.0 33 1.0 1 NaN 100.0 30.0 3.0 5.0 1.0 34 2.0 1 NaN 104.0 24.0 4.0 4.0 3.0 39 1.0 1 NaN 88.0 NaN 3.0 5.0 2.0 Write a function that: 1) takes as a parameter the column name (default = 'rectal_temp') 2) calculate the correlation of this column with other columns: pulse, respiratory_rate, extremities_temp, pain_outcome 3) fill all NaNs in the 'rectal_temp' column with the average values multiplied by the correlation weight

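def fill_na(df, column_name='rectal_temp'):
    # Correlation of the target column with the other numeric columns
    corr = df.corr(numeric_only=True)[column_name].drop(column_name)
    new_value = corr.mean() * df[column_name].mean()
    df[column_name] = df[column_name].fillna(new_value)
    return df

fill_na(df2, column_name='rectal_temp')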

Python

generate functionSat, 04 Mar 2023

Correct the code to correctly output the dataframe with this data: gdpdiff = pd.DataFrame({'Top1': df19[['Country or region', 'GDP per capita']]\ .sort_values(by='GDP per capita', ascending=False).head(20)[0:1], 'Top20': df19[['Country or region', 'GDP per capita']]\ .sort_values(by='GDP per capita', ascending=False).head(20)[19:20]}) gdpdiff

def df_gdp_diff(df19):
    top20 = (df19[['Country or region', 'GDP per capita']]
             .sort_values(by='GDP per capita', ascending=False)
             .head(20))
    # Take the first and the twentieth row and label them
    gdpdiff = top20.iloc[[0, 19]].reset_index(drop=True)
    gdpdiff.index = ['Top1', 'Top20']
    return gdpdiff

Python

generate functionMon, 27 Mar 2023

The code below gives a ValueError: Input variables with an inconsistent number of samples were found: [8082, 5572]. Fix it! from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(tfidf, cats, test_size=0.3, random_state=42)

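# train_test_split needs X and y with the same number of rows;
# here tfidf has 8082 rows and cats has 5572, so they come from different data.
# Build both from the same rows before splitting. How tfidf and cats were
# created is not shown, so the check below only makes the mismatch explicit.
from sklearn.model_selection import train_test_split

assert tfidf.shape[0] == len(cats), "tfidf and cats must describe the same rows"
X_train, X_test, y_train, y_test = train_test_split(tfidf, cats, test_size=0.3, random_state=42)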

Python

generate functionFri, 03 Feb 2023

There is a df Pandas dataframe: year date route operator group_name bus_garage district injury_result incident_type victim_category victim_sex victim_age 0 2015 01.01.2015 1 London General Go-Ahead Garage Not Available Southwark Injuries treated on scene Onboard Injuries Passenger Male Child 1 2015 01.01.2015 4 Metroline Metroline Garage Not Available Islington Injuries treated on scene Onboard Injuries Passenger Male Unknown 2 2015 01.01.2015 5 East London Stagecoach Garage Not Available Havering Taken to Hospital – Reported Serious Injury or... Onboard Injuries Passenger Male Elderly Calculate the average number of strings for all "operator"

# Average number of rows (incidents) per operator
df["operator"].value_counts().mean()

Python

generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: id route operator group_name district injury_result incident_type victim_category victim_age 2 3.0 1 London General Go-Ahead Southwark NaN Onboard Injuries 7 10 3 4.0 4 Metroline Metroline Islington NaN Onboard Injuries 7 2 4 5.0 5 East London Stagecoach Havering NaN Onboard Injuries 7 8 5 6.0 5 East London Stagecoach None London Borough NaN Onboard Injuries 7 8 Count which routes had the most incident_types with a value >= 5. Sort them in descending order by incident_type column

def df_incident_type(df):
    # size() with a named result column avoids the name clash that
    # reset_index() hits when 'incident_type' is both a key and a value
    df = (df.groupby(['route', 'operator', 'group_name', 'incident_type'])
            .size()
            .reset_index(name='incident_count'))
    df = df.sort_values('incident_count', ascending=False)
    df = df.drop_duplicates(subset=['route'], keep='first')
    return df

Python

generate functionWed, 21 Dec 2022

Change the class so that when self.seniority is added, the designer is assigned 1 point and when self.intlawards is added, the designer is assigned 2 points. Correct the condition so that these points add up correctly: if (self.seniority + self.intlawards) % 7 == 0: self.grade_up() Example class: class Designer(Employee): def init(self, name, seniority, awards=2): super().init(name, seniority) self.intlawards = awards def check_if_it_is_time_for_upgrade(self): # for each accreditation, increase the counter by 1 # for now we assume that all designers are accredited self.seniority += 1 # condition for promoting an employee from the presentation if (self.seniority + self.intlawards) % 7 == 0: self.grade_up() # publication of the results return self.publish_grade()

class Designer(Employee):
    def __init__(self, name, seniority, awards=2):
        super().__init__(name, seniority)
        self.intlawards = awards

    def check_if_it_is_time_for_upgrade(self):
        # for each accreditation, increase the counter by 1
        # for now we assume that all designers are accredited
        self.seniority += 1
        # condition for promoting an employee from the presentation:
        # seniority is worth 1 point, each international award 2 points
        if (self.seniority + self.intlawards * 2) % 7 == 0:
            self.grade_up()
        # publication of the results
        return self.publish_grade()

Python

generate functionWed, 08 Mar 2023

Write code that adds money +=1 on any number from the list of winnums money = 0 winnums = [777, 999, 555, 333, 111, 177, 277, 377, 477, 577, 677, 877, 977, 100, 200, 300, 400, 500, 600, 700, 800, 900, 110, 120, 130, 140, 150, 160, 170, 180, 190, 210, 220, 230, 240, 250, 260, 270, 280, 290, 310, 320, 330, 340, 350, 360, 370, 380, 390, 410, 420, 430, 440, 450, 460, 470, 480, 490, 510, 520, 530, 540, 550, 560, 570, 580, 590, 610, 620, 630, 640, 650, 660, 670, 680, 690, 710, 720, 730, 740, 750, 760, 770, 780, 790, 810, 820, 830, 840, 850, 860, 870, 880, 890, 910, 920, 930, 940, 950, 960, 970, 980, 990, 107, 117, 127, 137, 147, 157, 167, 177, 187, 197, 207, 217, 227, 237, 247, 257, 267, 277, 287, 297, 307, 317, 327, 337, 347, 357, 367, 377, 387, 397, 407, 417, 427, 437, 447, 457, 467, 477, 487, 497, 507, 517, 527, 537, 547, 557, 567, 577, 587, 597, 607, 617, 627, 637, 647, 657, 667, 677, 687, 697, 707, 717, 727, 737, 747, 757, 767, 787, 797, 807, 817, 827, 837, 847, 857, 867, 877, 887, 897, 907, 917, 927, 937, 947, 957, 967, 977, 987, 997, 000, 007, 017, 027, 037, 047, 057, 067, 077, 087, 097, 010, 020, 030, 040, 050, 060, 070, 080, 090]

# `number` is the value being checked (assumed to be defined elsewhere)
if number in winnums:
    money += 1
print(money)

Python

generate functionSat, 28 Jan 2023

There is a Pandas dataframe: hotels_rev = df1[['date', 'hotel', 'revenue', 'av_revenue', 'difference', 'in_percent']].sort_values(by=['hotel', 'date']) Construct a bar graph with 'hotel' values in the x axis and change values in the 'date' column and 'av_revenue' values in the y axis

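import seaborn as sns
import matplotlib.pyplot as plt

# 'av_revenue' on the y axis, one bar per date within each hotel
sns.barplot(x='hotel', y='av_revenue', hue='date', data=hotels_rev)
plt.show()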

Python

generate functionThu, 22 Dec 2022

Task Given three integers a ,b ,c, return the largest number obtained after inserting the following operators and brackets: +, , () In other words , try every combination of a,b,c with [+()] , and return the Maximum Obtained (Read the notes for more detail about it) Example With the numbers are 1, 2 and 3 , here are some ways of placing signs and brackets: 1 * (2 + 3) = 5 1 * 2 * 3 = 6 1 + 2 * 3 = 7 (1 + 2) * 3 = 9 So the maximum value that you can obtain is 9. Notes The numbers are always positive. The numbers are in the range (1 ≤ a, b, c ≤ 10). You can use the same operation more than once. It's not necessary to place all the signs and brackets. Repetition in numbers may occur . You cannot swap the operands. For instance, in the given example you cannot get expression (1 + 3) * 2 = 8.

def expression_matter(a, b, c):
    return max([a * b * c, a * (b + c), (a + b) * c, a + b + c])
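
Only four candidate expressions are compared, although six distinct placements exist once `a + b * c` and `a * b + c` are counted. For positive integers, `a + b * c` never exceeds `(a + b) * c` and `a * b + c` never exceeds `a * (b + c)`, so the shorter list is enough. The sketch below checks this by brute force over the stated range 1 to 10; `all_expressions` is an illustrative helper.

```python
from itertools import product


def expression_matter(a, b, c):
    return max(a * b * c, a * (b + c), (a + b) * c, a + b + c)


def all_expressions(a, b, c):
    """Every value reachable with +, * and brackets, keeping the operand order."""
    return [a + b + c, a + b * c, a * b + c, a * b * c, (a + b) * c, a * (b + c)]


# Collect every triple where the four-candidate shortcut disagrees
# with the maximum over all six expressions
mismatches = [
    (a, b, c)
    for a, b, c in product(range(1, 11), repeat=3)
    if expression_matter(a, b, c) != max(all_expressions(a, b, c))
]
print(mismatches)  # prints []
```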

Python

generate functionMon, 12 Dec 2022

Create a class Ball. Ball objects should accept one argument for "ball type" when instantiated. If no arguments are given, ball objects should instantiate with a "ball type" of "regular." ball1 = Ball() ball2 = Ball("super") ball1.ball_type #=> "regular" ball2.ball_type #=> "super"

class Ball:
    def __init__(self, ball_type="regular"):
        self.ball_type = ball_type

Python

generate functionSat, 25 Feb 2023

Write code that finds all rows in the rate_group column of Pandas with the values "1:" and "2:" and replaces them with np.nan.

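import numpy as np

def replace_in_column(df, column, old_values, new_value):
    # Assign the result back instead of relying on inplace=True on a column
    df[column] = df[column].replace(old_values, new_value)
    return df

replace_in_column(df, 'rate_group', ['1:', '2:'], np.nan)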

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Filter this dataframe and leave only the rows where the values in the name column are duplicated. Sort the name values in ascending order

import pandas as pd df = pd.DataFrame([ [1, "Ksenia Rodionova", "2021-07-01", "Alpina", 1639.000000, "by_recommendation", 48, 3.0], [2, "Ulyana Selezneva", "2021-07-01", "AquaMania", 930.000000, "by_airbnb.com", 97, 4.0], [3, "Konstantin Prokhorov", "2021-07-01", "Breeze", 1057.720000, "agg_trivago.com", 173, 4.0], [4, "Petrov Vladimir", "2021-07-01", "Moreon", 1403.000000, "agg_onlinetours.ru", 229, 4.0], [5, "Arina Selivanova", "2021-07-01", "Alpina", 1639.000000, "agg_sutochno.ru", 63, 4.0], [6

Python

generate functionTue, 21 Mar 2023

Add to this the code for generating a grid on the chart, as well as specifying the percentage difference directly on the chart bars plt.figure(figsize = (12, 4)) plt.bar(data.index, data['nutri'], color = 'darkblue', label = 'nutri') plt.bar(data.index, data['mineral'], color = 'brown', label = 'mineral')

plt.grid(axis='y')
# Label each pair of bars with the difference between the two values
for x, nutri, mineral in zip(data.index, data['nutri'], data['mineral']):
    plt.text(x, max(nutri, mineral), '{:.2f}%'.format(mineral - nutri),
             color='black', ha='center', va='bottom')
plt.legend()
plt.title('Nutri vs Mineral', size=20)
plt.xlabel('Country', size=20)
plt.ylabel('Percentage', size=20)
plt.show()

Python

generate functionFri, 27 Jan 2023

There is a code for Pandas df4['loyal_profit'] = df4.apply(lambda x: (x.profit / x.ocup_rooms) if 'regular_customer' in x.how_find_us else None, axis=1) Rewrite it so that it searches not only for 'regular_customer' values but also for 'by_recommendation' values

df4['loyal_profit'] = df4.apply(
    lambda x: (x.profit / x.ocup_rooms)
    if any(s in x.how_find_us for s in ('regular_customer', 'by_recommendation'))
    else None,
    axis=1)
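
When a string is tested for several substrings, operator precedence matters: `'a' or 'b' in s` is parsed as `'a' or ('b' in s)`, and a non-empty string literal is always truthy. The short sketch below illustrates the difference; `mentions_any` is a hypothetical helper and the sample value is taken from the `how_find_us` column shown elsewhere on this page.

```python
source = "agg_trivago.com"

# Parsed as: 'regular_customer' or ('by_recommendation' in source).
# The left operand is a non-empty string, so the result is always truthy.
always_truthy = 'regular_customer' or 'by_recommendation' in source


def mentions_any(text, needles=('regular_customer', 'by_recommendation')):
    """Return True if any of the needles occurs in text."""
    return any(needle in text for needle in needles)


print(bool(always_truthy))   # prints True, although neither substring is present
print(mentions_any(source))  # prints False
```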

Python

generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating decade 4728 Bad Education (2019) 0.5 2010-2020 35638 35638 Palooka (1934) 0.5 1934 1930-1940 21445 21445 High Moon (2019) 0.5 2019 2010-2020 40291 40291 Saint Maud (2019) 0.5 2019 2010-2020 29462 29462 Mad at the Moon (1992) 0.5 1992 1990-2000 46978 46978 The Butterfly Ball (1977) 0.5 1977 1970-1980 42931 42931 Snowboarďáci (2004) 0.5 2004 2000-2010 38670 38670 Recon 2020: The Caprini Massacre (2004) 0.5 2004 2000-2010 25506 25506 Karagoez catalogo 9,5 (1983) 0.5 1983 1980-1990 43989 43989 Starting a Skyscraper (1902) 0.5 1902 1900-1910 Write a function that shows which decade had the most movies with a high rating Write a function that creates a "Decade of Release" column and assigns a decade to each movie based on the values in the "Year" column Example: 1889 Result: 1880-1890 Example 2: 1954 Result 2: 1950-1960 Example 3: 2011 Result 3: 2010-2020 Example 4: raovyavas Result: NaN

def high_rating(df, threshold=4.5):
    # threshold is an assumed cut-off for a "high" rating
    top = df[df['rating'] >= threshold]
    # Decade with the largest number of highly rated movies
    return top['decade'].value_counts().idxmax()

Python

generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 Group the data by "userId" column and output in a separate column the number of values in the "rating" column

def groupby_cnt(df):
    df = pd.DataFrame(df.groupby("userId")["rating"].count())
    df.columns = ['ratings_cnt']
    return df

Python

generate functionFri, 23 Dec 2022

Write a method, that will get an integer array as parameter and will process every number from this array. Return a new array with processing every number of the input-array like this: If the number has an integer square root, take this, otherwise square the number. Example [4,3,9,7,2,1] -> [2,9,3,49,4,1] Notes The input array will always contain only positive numbers, and will never be empty or null.

def square_or_square_root(arr):
    new_arr = []
    for i in arr:
        if int(i**(1/2)) == i**(1/2):
            new_arr.append(int(i**(1/2)))
        else:
            new_arr.append(i**2)
    return new_arr
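
A floating-point square root can lose precision for very large inputs. As a hedged alternative, the sketch below uses `math.isqrt` (Python 3.8+), which works on integers only; the function name is kept from the task, the rest is an illustration.

```python
import math


def square_or_square_root(arr):
    """Replace perfect squares by their root and square every other number."""
    result = []
    for n in arr:
        root = math.isqrt(n)  # exact integer floor of the square root
        result.append(root if root * root == n else n * n)
    return result


print(square_or_square_root([4, 3, 9, 7, 2, 1]))  # prints [2, 9, 3, 49, 4, 1]
```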

Python

generate functionMon, 26 Dec 2022

There are two lists. Write a function that looks for matching values of these two lists and replaces them with replacing_num (but without touching the first one) stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ] replacing_num = '9090'

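def replace(stnums, students, replacing_num):
    seen = set()
    for student in students:
        number = student[0]
        if number in stnums:
            if number in seen:
                # Repeat occurrence: replace it, the first one stays untouched
                student[0] = replacing_num
            seen.add(number)
    return students

replace(stnums, students, replacing_num)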

Python

generate functionTue, 21 Mar 2023

There is a "grass" Pandas dataframe: pid Name Class 1 Class 2 HP Attack Defense Sp. Atk Sp. Def Speed Legendary 0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 False 1 2 Ivysaur Grass Poison 60 62 63 80 80 60 False 2 3 Venusaur Grass Poison 80 82 83 100 100 80 False 3 4 Mega Venusaur Grass Poison 80 100 123 122 120 80 False 48 49 Oddish Grass Poison 45 50 55 75 65 30 False Write a function that will search for 'Rock' values in the Class 1 and Class 2 columns. And if it finds such values it will delete these rows.

def delete(df):
    # na=False treats a missing class as "no match"
    is_rock = (df['Class 1'].str.contains('Rock', na=False)
               | df['Class 2'].str.contains('Rock', na=False))
    return df[~is_rock]

delete(grass)

Python

generate functionThu, 19 Jan 2023

There is a Pandas dataframe: place decade perc_of_5star 0 1 2010-2020 2.300 1 2 1900-1910 1.379 2 3 1970-1980 1.179 3 4 2000-2010 1.176 4 5 1960-1970 1.133 build a horizontal barchart with perc_of_5star columns in descending order and decade values on the y-axis

def plot_barchart(df):
    df = df.sort_values('perc_of_5star', ascending=False).head(10)
    return df.plot.barh(x='decade', y='perc_of_5star', title='% 5-star ratings by decade')

plot_barchart(df)

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a function year_leaders that will build a new dataframe and leave 1 line of performer and hits (having a maximum number of num_of_hits) in each chart_debut value

import pandas as pd performer = ['Glee Cast', 'Taylor Swift', 'Drake', 'YoungBoy Never Broke Again', 'Aretha Franklin', 'The Beatles'] hits = ['Somebody To Love', 'Friday', 'Loser Like Me', 'Baby', 'I Want You Back', 'Kacey Talk', 'Put It On Me', 'Dirty Iyanna', 'Lil Top', 'London Boy', 'Teardrops On My Guitar', 'Fifteen', 'Summer Sixteen', 'The Language', 'Weston Road Flow', 'Sgt. Pepper\'s Lonely Hearts Club Band/With A Little Help From My Friends'] chart_debut = [2009, 2008, 2016, 2020, 1967, 1978] time_on_chart = [290, 14299, 7449, 1012, 3490, 3548] consecutive_weeks = [47.0, 11880.0, 6441.0, 625.0, 2921.0, 2798.0] decade = ['2000-2010', '2000-2010', '2010-2020', '2020-2030

Python

generate functionSat, 25 Feb 2023

There is a Pandas dataframe: id grade rate_group 0 1077501 B 10-11% 1 1077430 C 15-17% 2 1077175 C 15-17% 3 1076863 C 13-14% 4 1075358 B 12-13% 5 1075269 A 7-8% 6 1069639 C 15-17% 7 1072053 E 17-25% 8 1071795 F 17-25% 9 1071570 B 12-13% Modify the dataframe so that grades are column names and rate_group are row names. Inside the rows should be the sum of the id by grouping grades and rate_group.

import pandas as pd

def group(df):
    # aggfunc='sum' avoids the reference to np, which was never imported
    return df.pivot_table(index='rate_group', columns='grade', values='id', aggfunc='sum')

Python

generate functionTue, 21 Mar 2023

There is a "grass" Pandas dataframe: pid Name Class 1 Class 2 HP Attack Defense Sp. Atk Sp. Def Speed Legendary 0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 False 1 2 Ivysaur Grass Poison 60 62 63 80 80 60 False 2 3 Venusaur Grass Poison 80 82 83 100 100 80 False 3 4 Mega Venusaur Grass Poison 80 100 123 122 120 80 False 48 49 Oddish Grass Poison 45 50 55 75 65 30 False Write a function that will search for 'Rock' values in the Class 1 and Class 2 columns. And if it finds such values it will delete these rows.

def delete_grass(pokemon):
    for i in pokemon.index:
        # Drop the row if either class column holds 'Rock'
        if 'Rock' in pokemon.loc[i, ['Class 1', 'Class 2']].values:
            pokemon.drop(i, inplace=True)
    return pokemon

Python

generate functionTue, 21 Mar 2023

Rewrite this code to calculate statistics by scipy.stats wilcoxon stat, p = mannwhitneyu(data[data['version'] == 'gate_30']['sum_gamerounds'], data[data['version'] == 'gate_40']['sum_gamerounds']) print('Mann-Whitney Statistics=%.3f, p=%.3f' % (stat, p))

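from scipy.stats import ranksums

# scipy.stats.wilcoxon is the signed-rank test for paired samples of equal
# length; for two independent groups the Wilcoxon rank-sum test is used here.
stat, p = ranksums(data[data['version'] == 'gate_30']['sum_gamerounds'],
                   data[data['version'] == 'gate_40']['sum_gamerounds'])
print('Wilcoxon rank-sum Statistics=%.3f, p=%.3f' % (stat, p))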

Python

generate functionMon, 13 Feb 2023

There is Matplotlib chart code: df.plot(kind='box', subplots=True, layout=(4,2), sharex=False, sharey=False) plt.show() Improve this code to make the fonts size 14 and the boxplot lines bold 3

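# Line width is set per boxplot element; boxplot has no single linewidth argument
line_props = dict(linewidth=3)
df.plot(kind='box', subplots=True, layout=(4, 2), sharex=False, sharey=False,
        fontsize=14, boxprops=line_props, whiskerprops=line_props,
        capprops=line_props, medianprops=line_props)
plt.show()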

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe date city hotel total_rooms ocup_rooms revenue oper_costs adv_costs profit 0 2021-07-01 Yalta Rapsodia 33 24 78936 25641 19305 33990 1 2021-07-01 Yalta AquaMania 28 14 51156 21756 16380 13020 2 2021-07-01 Yalta Skyline 15 12 38016 11655 8775 17586 Create a new occupancy_rate column and add a function that will read the values from the formula: ocup_rooms / total_rooms

df['occupancy_rate'] = df.ocup_rooms / df.total_rooms

Python

generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singin' in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 Seventh Seal, The (Sjunde inseglet, Det) (1957) 5.0 9 Bridge on the River Kwai, The (1957) 4.0 in this dataframe the names of films are not unique, because the same film may have been given different ratings Your task is to write a function that calculates the average rating of each movie and removes non-unique strings

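def unique(data: pd.DataFrame) -> pd.DataFrame:
    # One row per title with the mean of all its ratings
    return data.groupby('title', as_index=False)['rating'].mean()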

Python

generate functionWed, 22 Feb 2023

Modify this code to make a vertical bar graph instead of a pie chart (plotly.express library) question6 = "How likely would you work for a company whose mission is not bringing social impact ?" question6 = data[question6].value_counts() label = question6.index counts = question6.values colors = ['gold','lightgreen'] fig = go.Figure(data=[go.Pie(labels=label, values=counts)]) fig.update_layout(title_text='How likely would you work for a company whose mission is not bringing social impact?') fig.update_traces(hoverinfo='label+value', textinfo='percent', textfont_size=30, marker=dict(colors=colors, line=dict(color='black', width=3))) fig.show()

import plotly.express as px

question6 = "How likely would you work for a company whose mission is not bringing social impact ?"
question6 = data[question6].value_counts()
label = question6.index
counts = question6.values
colors = ['gold', 'lightgreen']

# Vertical bars: categories on the x axis, counts on the y axis
fig = px.bar(x=label, y=counts, color=label, color_discrete_sequence=colors, text=counts)
fig.update_layout(title_text='How likely would you work for a company whose mission is not bringing social impact?')
fig.update_traces(textfont_size=30, marker=dict(line=dict(color='black', width=3)))
fig.show()

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Leave only 1 unique row in each 'song' column in case the 'peak_position' column has a value of 1

def chart_peak(df):
    # Keep only songs that reached number 1, one row per song
    top = df[df['peak_position'] == 1]
    return top.drop_duplicates('song', keep='first')

Python

generate functionWed, 21 Dec 2022

Rewrite this function without lambda max_key = max(dct, key=lambda key: dct[key]['Value'])

def max_key(dct):
    # A named inner function replaces the lambda
    def value_of(key):
        return dct[key]['Value']
    return max(dct, key=value_of)

max_key(dct)

Python

generate functionThu, 22 Dec 2022

Define a function that removes duplicates from an array of numbers and returns it as a result. The order of the sequence has to stay the same.

def distinct(seq):
    return list(dict.fromkeys(seq))

distinct([1, 2, 2, 3, 4, 4, 5])
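
The one-liner relies on dictionaries preserving insertion order (guaranteed since Python 3.7). For illustration, here is an explicit loop that should behave the same way for hashable items; `distinct_loop` is a hypothetical name used only for this comparison.

```python
def distinct_loop(seq):
    """Remove duplicates while keeping the first occurrence of each item."""
    seen = set()
    result = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


sample = [3, 1, 3, 2, 1]
print(distinct_loop(sample))              # prints [3, 1, 2]
print(list(dict.fromkeys(sample)))        # prints [3, 1, 2]
```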

Python

generate functionFri, 17 Mar 2023

Determine the sample size required to study the average coffee bill in city N, given the following parameters: margin of error (delta); variance (sigsqr); confidence level (conf). Round the answer to an integer.

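from scipy.stats import norm

def size(delta, sigsqr, conf):
    # Two-sided critical value for the requested confidence level,
    # instead of the hard-coded 1.645
    z = norm.ppf((1 + conf) / 2)
    return round(sigsqr * z ** 2 / delta ** 2)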

Python

generate functionTue, 13 Dec 2022

Create the function to change the element positions with the same exact logics: Input: ["tail", "body", "head"] Output: ["head", "body", "tail"] Input: ["tails", "body", "heads"] Output: ["heads", "body", "tails"] Input: ["bottom", "middle", "top"] Output: ["top", "middle", "bottom"] Input: ["lower legs", "torso", "upper legs"] Output: ["upper legs", "torso", "lower legs"] Input: ["ground", "rainbow", "sky"] Output: ["sky", "rainbow", "ground"]

def switch_elements(arr):
    return [arr[-1]] + arr[1:-1] + [arr[0]]

Python

generate functionTue, 13 Dec 2022

Create the function that find the coefficients of quadratic equation of the given two roots (x1 and x2). Equation will be the form of ax^2 + bx + c = 0 Return type is a Vector containing coefficients of the equations in the order (a, b, c). Since there are infinitely many solutions to this problem, we fix a = 1. Remember, the roots can be written like (x-x1) * (x-x2) = 0 Example quadratic(1,2) = (1, -3, 2) This means (x-1) * (x-2) = 0; when we do the multiplication this becomes x^2 - 3x + 2 = 0 Example 2 quadratic(0,1) = (1, -1, 0) This means (x-0) * (x-1) = 0; when we do the multiplication this becomes x^2 - x + 0 = 0 Notes Inputs will be integers. When x1 == x2, this means the root has the multiplicity of two

def quadratic(x1, x2):
    return (1, -x1 - x2, x1 * x2)
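
The coefficients follow from expanding (x - x1)(x - x2) = x^2 - (x1 + x2)x + x1*x2. As a quick sanity check, the sketch below evaluates the resulting polynomial at both roots; `evaluate` is an illustrative helper that is not part of the task.

```python
def quadratic(x1, x2):
    return (1, -(x1 + x2), x1 * x2)


def evaluate(coeffs, x):
    """Evaluate a*x^2 + b*x + c for coeffs = (a, b, c)."""
    a, b, c = coeffs
    return a * x * x + b * x + c


coeffs = quadratic(1, 2)
print(coeffs)                                   # prints (1, -3, 2)
print(evaluate(coeffs, 1), evaluate(coeffs, 2)) # prints 0 0
```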

Python

generate functionMon, 27 Mar 2023

The code below gives an error TypeError: unhashable type: 'list'. Fix it from gensim import similarities cos_sim = similarities.MatrixSimilarity(tfidf[bows])

cos_sim = similarities.MatrixSimilarity(tfidf[bows].values)

Python

generate functionMon, 26 Dec 2022

Add to this code the numbering of all group_list values from 1 dct = convert_to_dict(students) def group_list(dct, gr): group_list = [] for k, v in dct.items(): if gr in v: group_list.append(' '.join(dct[k][0:3])) group_list.sort() return group_list group_list(dct, 'BST161')

def group_list(dct, gr):
    group_list = []
    for k, v in dct.items():
        if gr in v:
            group_list.append(' '.join(dct[k][0:3]))
    group_list.sort()
    # Number the sorted names starting from 1
    return ['{}. {}'.format(i, name) for i, name in enumerate(group_list, start=1)]

group_list(dct, 'BST161')

Python

generate functionMon, 20 Feb 2023

There is a Pandas code: df2.style.highlight_max(axis=1, props='color:white;\ font-weight:bold;\ background-color:green;') df2.style.highlight_min(axis=1, props='color:white;\ font-weight:bold;\ background-color:brown;') Write a function that combines these two actions into one and returns the changed dataframe

def highlight_min_max(df):
    # Styler methods return the Styler, so the two calls can be chained;
    # the result is a Styler object wrapping the dataframe
    return (df.style
              .highlight_max(axis=1, props='color:white; font-weight:bold; background-color:green;')
              .highlight_min(axis=1, props='color:white; font-weight:bold; background-color:brown;'))

Python

generate functionFri, 27 Jan 2023

There is a code for Pandas df3[df3.how_find_us.str.contains('yandex')] Rewrite it so that it outputs all unique rows with str.contains = 'yandex' and does not output the rest

# Rows mentioning 'yandex', with exact duplicate rows removed
df3[df3.how_find_us.str.contains('yandex', na=False)].drop_duplicates()

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe date city hotel total_rooms ocup_rooms revenue oper_costs adv_costs profit 0 2021-07-01 Yalta Rapsodia 33 24 78936 25641 19305 33990 1 2021-07-01 Yalta AquaMania 28 14 51156 21756 16380 13020 2 2021-07-01 Yalta Skyline 15 12 38016 11655 8775 17586 Create a new occupancy_rate column and add a function that will read the values from the formula: ocup_rooms / total_rooms

def occupancy_rate(total_rooms, ocup_rooms):
    return ocup_rooms / total_rooms

df['occupancy_rate'] = occupancy_rate(df['total_rooms'], df['ocup_rooms'])

Python

generate functionWed, 22 Feb 2023

Modify the code below to have column captions for the x-axis values question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='Какая рабочая среда для вас наиболее предпочтительна?') fig.show()

def create_bar_chart(x, y, title):
    # text=x prints each bar's value next to the bar
    fig = px.bar(x=x, y=y, orientation='h', text=x)
    fig.update_layout(title_text=title)
    fig.show()

Python

generate functionWed, 18 Jan 2023

Write a function that calculates the value_counts in the decade pandas column of the dataview if the rows in the rating column = 5.0

def five_star_decade_value_counts(df):
    df = df.loc[df.rating == 5.0]
    return df.decade.value_counts()

Python

generate functionMon, 13 Feb 2023

Modify the formula below so that it rounds to the nearest integer, but not less than 1 and not more than 4: df2['respiratory_rate'] = df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median'))

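df2['respiratory_rate'] = df2['respiratory_rate'].fillna(
    df2.groupby('pulse')['respiratory_rate'].transform('median').round().clip(lower=1, upper=4))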

Python

generate functionFri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). In function return specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers. def interval(n, mean, sig, conf): h = return

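import math
from scipy.stats import norm

def interval(n, mean, sig, conf):
    # Half-width is z * sig / sqrt(n); the interval length is twice that
    h = 2 * norm.ppf((1 + conf) / 2) * sig / math.sqrt(n)
    return round(h)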
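
As a worked example of the interval length, the sketch below uses only the standard library (`statistics.NormalDist`) and assumes a two-sided interval with known standard deviation. The sample values n=100, sig=10, conf=0.95 are made up for illustration, and `interval_length` is a hypothetical name. The sample mean shifts the interval but does not affect its length, so it is left out.

```python
from math import sqrt
from statistics import NormalDist


def interval_length(n, sig, conf):
    """Length of a two-sided confidence interval for the mean with known sigma."""
    # Critical value that leaves (1 - conf) / 2 in each tail
    z = NormalDist().inv_cdf((1 + conf) / 2)
    return 2 * z * sig / sqrt(n)


length = interval_length(n=100, sig=10, conf=0.95)
print(round(length, 2))  # prints 3.92
print(round(length))     # prints 4
```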

Python

generate functionFri, 24 Feb 2023

There is a Pandas series named mean_c: country_code AFG 38.682 AGO 44.554 ALB 74.068 AND NaN ARE 68.607 ARG 78.796 ARM 72.590 ATG NaN AUS 90.716 AUT 88.413 Write a function that sorts them in descending numerical order

def sort_df(df):
    return df.sort_values(ascending=False)

sort_df(mean_c)

Python

generate functionMon, 13 Feb 2023

Write a formula that rounds all non-integer values to the nearest integer in the Pandas column

def round_floats(df, col):
    # Series.round() needs no extra import
    return df[col].round()

Python

generate functionTue, 21 Mar 2023

Add to this the code for generating a grid on the chart, as well as specifying the percentage difference directly on the chart bars plt.figure(figsize = (12, 4)) plt.bar(data.index, data['nutri'], color = 'darkblue', label = 'nutri') plt.bar(data.index, data['mineral'], color = 'brown', label = 'mineral')

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 4))
plt.bar(data.index, data['nutri'], color='darkblue', label='nutri')
plt.bar(data.index, data['mineral'], color='brown', label='mineral')
plt.grid(axis='y')
# Percentage difference of mineral relative to nutri, written above each bar pair
for x, nutri, mineral in zip(data.index, data['nutri'], data['mineral']):
    diff = (mineral - nutri) / nutri * 100
    plt.text(x, max(nutri, mineral), '{:.1f}%'.format(diff), ha='center', va='bottom')
plt.title('Percentage difference of nutrients and minerals')
plt.legend()
plt.show()

Python

generate functionWed, 22 Feb 2023

Change the code below so that the columns have values on the x-axis question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='Какая рабочая среда для вас наиболее предпочтительна?') fig.show()

import plotly.express as px

def bar_x_axis(df, column_name, title):
    # Prepare the data
    question = df[column_name].value_counts()
    label = question.index
    counts = question.values
    # Create the figure; text=counts writes each bar's value on the bar
    fig = px.bar(x=counts, y=label, orientation='h', text=counts)
    fig.update_layout(title_text=title)
    fig.show()

bar_x_axis(df, 'What is the most preferred working environment for you.', 'Which working environment do you prefer most?')

Python

generate functionFri, 23 Dec 2022

Prolog: This kata series was created for friends of mine who just started to learn programming. Wish you all the best and keep your mind open and sharp! Task: Write a function that will accept two parameters: variable and type and check if type of variable is matching type. Return true if types match or false if not. Examples: 42, "int" --> True "42", "int" --> False

def type_checker(variable, type_name):
    # Compare the name of the variable's type with the expected name, e.g. "int"
    return type(variable).__name__ == type_name

Python

generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 leave in this dataframe only users who have rated 100 or more in the rating column

ratings = pd.read_csv("ratings.csv") ratings.head()

Python
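
The answer above only loads the file. A minimal sketch of the filtering step, assuming the prompt means users who have at least 100 ratings; the helper name `keep_active_users` and the `min_ratings` parameter are illustrative and not part of the original:

```python
import pandas as pd

def keep_active_users(ratings, min_ratings=100):
    # Count each user's ratings and broadcast the count back to every row
    counts = ratings.groupby('userId')['rating'].transform('count')
    return ratings[counts >= min_ratings]

# Tiny demo with a threshold of 2 instead of 100
demo = pd.DataFrame({
    'userId': [1, 1, 2],
    'movieId': [31, 1029, 31],
    'rating': [2.5, 3.0, 4.0],
})
filtered = keep_active_users(demo, min_ratings=2)
```

Using `transform('count')` keeps the result aligned with the original rows, so it can serve directly as a boolean filter.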

generate functionTue, 21 Mar 2023

Rewrite this code to calculate statistics by scipy.stats wilcoxon stat, p = mannwhitneyu(data[data['version'] == 'gate_30']['sum_gamerounds'], data[data['version'] == 'gate_40']['sum_gamerounds']) print('Mann-Whitney Statistics=%.3f, p=%.3f' % (stat, p))

import pandas as pd
from scipy.stats import ranksums

data = pd.read_csv("cookie_cats.csv")

def wilcoxon_rank_sum_test(data):
    """Print the result of a Wilcoxon rank-sum test for the two versions."""
    # scipy.stats.wilcoxon needs paired samples of equal length, so the two
    # independent groups are compared with the rank-sum variant instead
    stat, p = ranksums(data[data['version'] == 'gate_30']['sum_gamerounds'], data[data['version'] == 'gate_40']['sum_gamerounds'])
    print('Wilcoxon rank-sum statistic=%.3f, p=%.3f' % (stat, p))

wilcoxon_rank_sum_test(data)

Python

generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: year date route operator group_name district injury_result incident_type victim_category victim_age 0 2015 01.01.2015 1 London General Go-Ahead Southwark 1 Onboard Injuries 7 10 1 2015 01.01.2015 4 Metroline Metroline Islington 1 Onboard Injuries 7 2 2 2015 01.01.2015 5 East London Stagecoach Havering 7 Onboard Injuries 7 8 3 2015 01.01.2015 5 East London Stagecoach None London Borough 7 Onboard Injuries 7 8 4 2015 01.01.2015 6 Metroline Metroline Westminster 5 Onboard Injuries 10 8 generate a unique id from 1 to 23158 for each line of the pandas dataframe

def generate_unique_id(df):
    # Number the rows from 1 to len(df)
    df['unique_id'] = range(1, len(df) + 1)
    return df

Python

generate functionThu, 16 Feb 2023

Write a function that finds all rows with letters in the Series column

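def is_letter_in_Series(row):
    # NaN is the only value that is not equal to itself
    if row["Series"] != row["Series"]:
        return False
    return any(x.isalpha() for x in str(row["Series"]))

# Rows whose Series value contains at least one letter
df[df.apply(is_letter_in_Series, axis=1)]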

Python

generate functionTue, 21 Mar 2023

There are two samples A and B. Draw boxplots comparing these two samples with a light blue filler and a caption for the names of the samples

import matplotlib.pyplot as plt

def boxplot_compare_distributions(array1, array2, names=("Sample 1", "Sample 2")):
    # Create a figure and an axes instance
    fig, ax = plt.subplots(figsize=(9, 6))
    # patch_artist=True is required for the boxes to accept a fill color
    bp = ax.boxplot([array1, array2], patch_artist=True)
    for box in bp['boxes']:
        # Outline color and line width
        box.set(color='#7570b3', linewidth=2)
        # Light blue fill
        box.set(facecolor='lightblue')
    # Color and line width of the whiskers
    for whisker in bp['whiskers']:
        whisker.set(color='#7570b3', linewidth=2)
    # Color and line width of the caps
    for cap in bp['caps']:
        cap.set(color='#7570b3', linewidth=2)
    # Caption each box with its sample name
    ax.set_xticklabels(names)
    plt.show()

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 450 (Can't Live Without Your) Love And Affection Nelson 1990-07-07 1 93 15 14.0 607 (Everything I Do) I Do It For You Bryan Adams 1991-06-29 1 53 9 8.0 748 (Hey Won't You Play) Another Somebody Done Som... B.J. Thomas 1975-02-01 1 99 17 16.0 852 (I Can't Get No) Satisfaction The Rolling Stones 1965-06-12 1 67 13 12.0 951 (I Just) Died In Your Arms Cutting Crew 1987-03-07 1 80 14 13.0 Create a new dfs dataframe, where the data format of the chart_debut column is changed from 1991-06-29 to 1991

import pandas as pd dfs = pd.read_csv('https://raw.githubusercontent.com/coding-blocks-archives/ML-Noida-2019-June-Two/master/datasets/hot-100.csv', parse_dates=['chart_debut']) dfs['chart_debut'] = dfs['chart_debut'].dt.year dfs.head(10)

Python

generate functionMon, 27 Mar 2023

This code removes stopwords in lists incorrectly: without_sw = [[word for word in text if word not in stopwords_set] for text in only_words_text] It turns out like this: [[g], [u, n, l], [j, u, r, n, g], [p, n, ] It should be like this: [go, until, jurong, point Correct it.

from nltk.corpus import stopwords texts = [['go', 'until', 'jurong', 'point'], ['crazy', 'available', 'only', 'in', 'bugis', 'n', 'great', 'world', 'la', 'e', 'buffet', 'cine', 'there', 'got', 'amore', 'wat']] stopwords_set = set(stopwords.words('english')) without_sw = [word for text in texts for word in text if word not in stopwords_set]

Python

generate functionTue, 24 Jan 2023

there are two Pandas dataframes: rzd client_id rzd_revenue 0 111 1093 1 112 2810 2 113 10283 3 114 5774 4 115 981 auto client_id auto_revenue 0 113 57483 1 114 83 2 115 912 3 116 4834 4 117 98 Create a new dataframe joined and merge these 2 dataframes inside it

def join(rzd, auto):
    # Match rows on client_id; an outer merge keeps clients present in only one dataframe
    joined = rzd.merge(auto, on='client_id', how='outer')
    return joined

Python

generate functionFri, 03 Feb 2023

generate a unique id for each line of the pandas dataframe

def unique_id(df):
    # Number the rows from 1 to len(df) in a new line_id column
    df['line_id'] = range(1, len(df) + 1)
    return df

unique_id(df)

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Plot a bar chart with performer on the x-axis and number_of_hits on the y-axis.

def plot_num_of_hits(df): df = df.sort_values("num_of_hits", ascending=False) # make sure you have a matplotlib # import matplotlib.pyplot as plt plt.bar(df.performer, df.num_of_hits) plt.show()

Python

generate functionSat, 28 Jan 2023

There is a Pandas dataframe: hotels_rev = df1[['date', 'hotel', 'revenue', 'av_revenue', 'difference', 'in_percent']].sort_values(by=['hotel', 'date']) Construct a bar graph with 'hotel' values in the x axis and change values in the 'date' column and 'av_revenue' values in the y axis

hotels_rev = df1[['date', 'hotel', 'revenue', 'av_revenue', 'difference', 'in_percent']].sort_values(by=['hotel', 'date'])
# Hotels go on the x-axis, average revenue on the y-axis
plt.bar(hotels_rev['hotel'], hotels_rev['av_revenue'], color='red')
plt.xlabel('Hotel')
plt.ylabel('Average revenues')
plt.title('Average revenues per hotel')
plt.xticks(rotation=90)
plt.show()

Python

generate functionThu, 15 Dec 2022

Create a function that allows you to move doc to another shelf directories. It should check for: 1) if both doc and shelf exist in the dictionary, the function should move doc value to the specified shelf and make return updated directories. 2) if the specified shelf exists in the directories and return ERROR NO SUCH KEY if it doesn't exist 3) if there is a specified doc in the dictionary and return ERROR NO SUCH VALUE if it doesn't exist Note that directories is a dictionary and doc values are in lists! Example: Input: directories = { '1': ['2207 876234', '11-2'], '2': ['10006'], '3': [] } doc = '11-2' shelf = '3' Output: directories = { '1': ['2207 876234'], '2': ['10006'], '3': ['11-2'] }

def move(directories, doc, shelf):
    if shelf not in directories:
        return "ERROR NO SUCH KEY"
    # Find the shelf that currently holds the document
    source = next((key for key, docs in directories.items() if doc in docs), None)
    if source is None:
        return "ERROR NO SUCH VALUE"
    directories[source].remove(doc)
    directories[shelf].append(doc)
    return directories

directories = { '1': ['2207 876234', '11-2'], '2': ['10006'], '3': [] }
doc = '11-2'
shelf = '3'
move(directories, doc, shelf)

Python

generate functionFri, 03 Feb 2023

There is a Pandas dataframe: year date route operator group_name district injury_result incident_type victim_category victim_age 0 2015 01.01.2015 1 London General Go-Ahead Southwark 1 Onboard Injuries 7 10 1 2015 01.01.2015 4 Metroline Metroline Islington 1 Onboard Injuries 7 2 2 2015 01.01.2015 5 East London Stagecoach Havering 7 Onboard Injuries 7 8 3 2015 01.01.2015 5 East London Stagecoach None London Borough 7 Onboard Injuries 7 8 4 2015 01.01.2015 6 Metroline Metroline Westminster 5 Onboard Injuries 10 8 generate a unique id for each line of the pandas dataframe

# Use the index as a unique identifier
df.index
# Or store a unique id in its own column (the year column is not unique)
df['unique_id'] = range(1, len(df) + 1)

Python

generate functionSun, 11 Dec 2022

Write a program that will ask the user for the number of servings to prepare these dishes and display the information about the total amount of ingredients required in the given form. Note: The same ingredients with different dimensions must be counted separately! cook_book = { 'salad': [ {'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'} {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'} {'ingridient_name': 'cucumbers', 'quantity': 20, 'measure': 'g'} ], 'pizza': [ {'ingridient_name': 'cheese', 'quantity': 20, 'measure': 'gr'} {'ingridient_name': 'sausage', 'quantity': 30, 'measure': 'gr'} ], 'lemonade': [ {'ingridient_name': 'lemon', 'quantity': 1, 'measure': 'pcs'} {'ingridient_name': 'water', 'quantity': 200, 'measure': 'ml'} {'ingridient_name': 'sugar', 'quantity': 10, 'measure': 'g'} ] } Enter the number of servings: 3 Result: Cheese: 210 gr. Tomatoes: 6 pcs. Cucumbers: 60g

def total_ingridients(cook_book): dish = input("Enter dish: ") ingridients = cook_book[dish] for i in ingridients: for value in i.values(): print(value) total_ingridients(cook_book)

Python
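
The answer above prints the raw fields of a single dish and never asks for the number of servings. A minimal sketch of the aggregation, following the note that the same ingredient with a different measure is counted separately; the function name `total_ingredients` and the shortened `cook_book` are illustrative assumptions:

```python
def total_ingredients(cook_book, servings):
    totals = {}
    for ingredients in cook_book.values():
        for item in ingredients:
            # The same name with a different measure is a separate entry
            key = (item['ingridient_name'], item['measure'])
            totals[key] = totals.get(key, 0) + item['quantity'] * servings
    return totals

# Shortened cook book, with the commas between dictionaries restored
cook_book = {
    'salad': [
        {'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'},
        {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'},
    ],
    'pizza': [
        {'ingridient_name': 'cheese', 'quantity': 20, 'measure': 'gr'},
    ],
}
totals = total_ingredients(cook_book, 3)
for (name, measure), quantity in totals.items():
    print(f"{name.capitalize()}: {quantity} {measure}")
```

In an interactive program, `servings` would come from `int(input("Enter the number of servings: "))`; it is a parameter here so the function can be checked without user input.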

generate functionMon, 26 Dec 2022

Write a function that takes a group number and outputs a numbered and alphabetically ordered list (surnames, first names, patronymics) of all students in that group from the dictionary dct. dct = {'0001': ['Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], '1102': ['Bogov', 'Artem', 'Igorvich', '25.01.2010', 'BST162'], '0333': ['Glagoleva', 'Anastasia', 'Nikolaevna', '11.07.2009', 'BST163'], '4004': ['Potapov', 'Dmitry', 'Stanislavovich', '14.02.2012', 'BST161'], '0045': ['Bokov', 'Igor', 'Kharitonovich', '02.06.2009', 'BST161'], '0096': ['Vasilkov', 'Valentin', 'Sergeevich', '20.03.2009', 'BST164'], '0607': ['Siropova', 'Violetta', 'Eduardovna', '28.05.2010', 'BST162']}

def func_name(group_number, dct):
    students = []
    for key, value in dct.items():
        if value[4] == group_number:
            students.append(value)
    # Sort by surname, then first name, then patronymic
    students.sort(key=lambda x: x[:3])
    # Number the list starting from 1
    for index, student in enumerate(students, start=1):
        print(index, student[0], student[1], student[2])

func_name('BST162', dct)

Python

generate functionTue, 24 Jan 2023

Write a pandas function that groups matching columns traffic_source and region and outputs the values of the third column source_type for them

def merge_cols(df, col_list):
    # Select source_type as a Series so that tuple() collects its values, not the column name
    return df.groupby(col_list)['source_type'].apply(tuple).reset_index(name='source_type')

merge_cols(df, ['traffic_source', 'region'])

Python

generate functionFri, 17 Mar 2023

Determine the required sample size for a study of the average coffee bill in city N, given the following parameters: margin of error (delta); variance (sigsqr); confidence level (conf). Round the answer to an integer.

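import math
from scipy.stats import norm

def get_sample_size(delta, sigsqr, conf):
    # Derive the two-sided critical value from conf instead of hard-coding 1.96
    z = norm.ppf((1 + conf) / 2)
    # Round up so the margin of error does not exceed delta
    return math.ceil(z * z * sigsqr / (delta * delta))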

Python
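
The sample-size formula used for this prompt follows from the margin of error of a normal confidence interval for the mean. A short derivation, assuming a known variance `sigsqr` = σ² and a two-sided interval:

```latex
\delta = z_{(1+\mathrm{conf})/2}\,\frac{\sigma}{\sqrt{n}}
\quad\Longrightarrow\quad
n = \left\lceil \frac{z_{(1+\mathrm{conf})/2}^{2}\,\sigma^{2}}{\delta^{2}} \right\rceil
```

For example, with conf = 0.95 (z ≈ 1.96), σ² = 1 and δ = 0.5, the ratio is about 3.8416 / 0.25 ≈ 15.37, so n = 16.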

generate functionMon, 27 Mar 2023

This code removes stopwords in lists incorrectly: without_sw = [[word for word in text if word not in stopwords_set] for text in only_words_text] It turns out like this: [[g], [u, n, l], [j, u, r, n, g], [p, n, ] It should be like this: [go, until, jurong, point Correct it.

from nltk.corpus import stopwords

stopwords_set = set(stopwords.words('english'))
# Assuming each text is a string: split it into words first, because iterating over a string yields single characters
without_sw = [[word for word in text.split() if word not in stopwords_set] for text in only_words_text]

Python

generate functionSat, 28 Jan 2023

There is a Pandas dataframe: hotels_rev = df1[['date', 'hotel', 'revenue', 'av_revenue', 'difference', 'in_percent']].sort_values(by=['hotel', 'date']) Draw a boxplot graph with the x scale as 'hotel' and the y scale as 'av_revenue'

hotels_rev.boxplot(column='av_revenue', by='hotel')

Python

generate functionSun, 25 Dec 2022

My washing machine uses water amount of water to wash load (in JavaScript and Python) or max_load (in Ruby) amount of clothes. You are given a clothes amount of clothes to wash. For each single item of clothes above the load, the washing machine will use 10% more water (multiplicative) to clean. For example, if the load is 10, the amount of water it requires is 5 and the amount of clothes to wash is 14, then you need 5 * 1.1 ^ (14 - 10) amount of water. Write a function howMuchWater (JS)/how_much_water (Python and Ruby) to work out how much water is needed if you have a clothes amount of clothes. The function will accept 3 arguments: - water, load (or max_loadin Ruby) and clothes.

def how_much_water(water, load, clothes): if clothes == load: return water if clothes < load: return water return how_much_water(water, load, clothes - 1) * 1.1

Python

generate functionWed, 22 Feb 2023

Add the x-axis value captions to the columns fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='Which working environment do you prefer most?') fig.show()

import plotly.express as px

def plot_counts(counts, label):
    # text=counts writes each bar's x-axis value on the bar
    fig = px.bar(x=counts, y=label, orientation='h', text=counts)
    fig.update_layout(title_text='Which working environment do you prefer most?')
    fig.show()

Python

generate functionTue, 21 Mar 2023

There are 3 lists: water = [1,2,3,4,2,4,2,4,5,2,3,4,2,1,3,4,3,2,5,1] nutri = [1,2,4,6,5,6,7,5,4,5,6,7,4,3,5,5,6,5,4,3,5] mineral =[2,1,1,3,2,4,2,4,5,4,3,2,3,2,3,1,3,4,5,1,4] add them to df Pandas dataframe in the following format: index treatments value 0 0 water 25 1 1 water 30 2 2 water 28 3 3 nutri 36 4 4 nutri 29 5 0 nutri 45 6 1 mineral 55 7 2 mineral 29 8 3 mineral 56

import pandas as pd

water = [1,2,3,4,2,4,2,4,5,2,3,4,2,1,3,4,3,2,5,1]
nutri = [1,2,4,6,5,6,7,5,4,5,6,7,4,3,5,5,6,5,4,3,5]
mineral = [2,1,1,3,2,4,2,4,5,4,3,2,3,2,3,1,3,4,5,1,4]
# The lists differ in length (20, 21 and 21 values), so build one long-format frame per list and stack them
samples = {"water": water, "nutri": nutri, "mineral": mineral}
df = pd.concat([pd.DataFrame({"treatments": name, "value": values}) for name, values in samples.items()]).reset_index()

Python

generate functionTue, 21 Mar 2023

Rewrite this code to calculate statistics by scipy.stats wilcoxon stat, p = mannwhitneyu(data[data['version'] == 'gate_30']['sum_gamerounds'], data[data['version'] == 'gate_40']['sum_gamerounds']) print('Mann-Whitney Statistics=%.3f, p=%.3f' % (stat, p))

from scipy.stats import mannwhitneyu import numpy as np def mann_whitney_plus_means(turnstile_weather): ''' This function will consume the turnstile_weather dataframe containing our final turnstile weather data. You will want to take the means and run the Mann Whitney U test on the ENTRIESn_hourly column in the turnstile_weather dataframe. This function should return: 1) the mean of entries with rain 2) the mean of entries without rain 3) the Mann-Whitney U statistic and p-value comparing the number of entries with rain and the number of entries without rain You should feel free to use scipy's Mann-Whitney implementation, and you might also find it useful to use numpy's mean function. Here are the functions' documentation: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.

Python

generate functionFri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). Import necessary libraries. In function return specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers. def interval(n, mean, sig, conf): h = return

import numpy as np
from scipy import stats

def interval(n, mean, sig, conf):
    # Two-sided normal interval around the sample mean
    left, right = stats.norm.interval(conf, loc=mean, scale=sig / np.sqrt(n))
    h = right - left
    return round(h)

Python
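
The same interval length can be checked without SciPy. A minimal sketch using only the standard library, assuming a two-sided normal interval; the name `interval_length` is illustrative, and `mean` is omitted because it shifts the interval without changing its length:

```python
from math import sqrt
from statistics import NormalDist

def interval_length(n, sig, conf):
    # Two-sided critical value: the (1 + conf) / 2 quantile of the standard normal
    z = NormalDist().inv_cdf((1 + conf) / 2)
    # Length is twice the margin of error z * sig / sqrt(n)
    return round(2 * z * sig / sqrt(n))

length = interval_length(n=100, sig=10, conf=0.95)
```

With n = 100, sig = 10 and conf = 0.95 the unrounded length is about 3.92, so `length` is 4.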

generate functionThu, 15 Dec 2022

Create a function that allows you to move values between directories keys. It should check for: 1) if the specified key is in the dictionary and show ERROR NO SUCH KEY if it does not exist 2) if the value is present in the dictionary and show ERROR NO SUCH VALUE if it doesn't exist 3) if both key and value are in the dictionary, then the function should move value v to another k and return a new dictionary Example: Input: directories = { '1': ['2207 876234', '11-2'], '2': ['10006'], '3': [] } v = '11-2' k = '3' Output: directories = { '1': ['2207 876234'], '2': ['10006'], '3': ['11-2'] }

def move(directories, v, k):
    if k not in directories:
        print('ERROR NO SUCH KEY')
        return directories
    # Find the key whose list currently holds the value
    source = next((key for key, values in directories.items() if v in values), None)
    if source is None:
        print('ERROR NO SUCH VALUE')
        return directories
    directories[source].remove(v)
    directories[k].append(v)
    return directories

directories = { '1': ['2207 876234', '11-2'], '2': ['10006'], '3': [] }
move(directories, '11-2', '3')

Python

generate functionTue, 21 Mar 2023

Calculate mean and median values of this list with numpy functions water = [1,2,3,4,2,4,2,4,5,2,3,4,2,1,3,4,3,2,5,1]

import numpy as np

def mean_median(x):
    return np.mean(x), np.median(x)

mean_median(water)

Python

generate functionTue, 28 Mar 2023

There is a dataframe "df" containing data about texts in the following format: Category Message Length Words 0 ham Go until jurong point, crazy... Available only ... 111 [go, jurong, point, crazy, available, bugis, g... 1 ham Ok lar... Joking wif u oni... 29 [ok, lar, joking, wif, oni] The texts in it are in the df['Message'] variable. And the classification categories are in df['Category']. The programmer has written code which should contain all the source messages classified incorrectly (specifying the actual and predicted ones). However, this code does not work and generates an error: KeyError. Here is this code: misclassified = [] for index, label_predicted in enumerate(predicted): if label_predicted != y_test[index]: misclassified.append({'message': df.iloc[index]['Message'], 'actual': df.iloc[index]['Category'], 'predicted': label_predicted}) misclassification_df = pd.DataFrame(misclassified) Fix it!

misclassified = []
for index, label_predicted in enumerate(predicted):
    # Assuming y_test is a pandas Series that kept df's row labels: y_test[index] looks up a label, so use .iloc for positions
    actual = y_test.iloc[index]
    if label_predicted != actual:
        misclassified.append({'message': df.loc[y_test.index[index], 'Message'], 'actual': actual, 'predicted': label_predicted})
misclassification_df = pd.DataFrame(misclassified)

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Filter this dataframe and leave only the rows where the values in the name column are duplicated. Sort the name values in ascending order

# keep=False marks every occurrence of a repeated name; then sort the result by name
df[df['name'].duplicated(keep=False)].sort_values('name')

Python

generate functionMon, 12 Dec 2022

Your mission is to implement a function that converts the following potentially harmful characters: < --> &lt; > --> &gt; " --> &quot; & --> &amp; Good luck :D

def html_escape(text):
    # Escape & first so the entities added below are not escaped again
    text = text.replace('&', '&amp;')
    text = text.replace('<', '&lt;')
    text = text.replace('>', '&gt;')
    text = text.replace('"', '&quot;')
    return text

Python
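
Python's standard library already covers this conversion. A minimal sketch with `html.escape`; note that with `quote=True` it also escapes the single quote, which goes slightly beyond the four characters listed in the prompt:

```python
import html

def html_escape(text):
    # html.escape replaces &, < and >; quote=True also replaces " and '
    return html.escape(text, quote=True)

escaped = html_escape('<a href="x">Tom & Jerry</a>')
print(escaped)
```

The library handles the ampersand before the other characters, so the entities it inserts are never escaped a second time.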

generate functionMon, 26 Dec 2022

Write a function that takes a group number and outputs a numbered and alphabetically ordered list (surnames, first names, patronymics) of all students in that group from the dictionary dct. dct = {'0001': ['Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], '1102': ['Bogov', 'Artem', 'Igorvich', '25.01.2010', 'BST162'], '0333': ['Glagoleva', 'Anastasia', 'Nikolaevna', '11.07.2009', 'BST163'], '4004': ['Potapov', 'Dmitry', 'Stanislavovich', '14.02.2012', 'BST161'], '0045': ['Bokov', 'Igor', 'Kharitonovich', '02.06.2009', 'BST161'], '0096': ['Vasilkov', 'Valentin', 'Sergeevich', '20.03.2009', 'BST164'], '0607': ['Siropova', 'Violetta', 'Eduardovna', '28.05.2010', 'BST162']}

def group_lst(num): for i in dct[num]: print(i) return

Python
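
The answer above indexes `dct` by the group number, but the keys of `dct` are student numbers, so the lookup fails. A minimal sketch that filters by the group field instead; the name `group_list`, the extra `dct` parameter and the shortened dictionary are illustrative assumptions:

```python
def group_list(group_number, dct):
    # Keep (surname, first name, patronymic) of the students in the group, sorted alphabetically
    students = sorted(tuple(v[:3]) for v in dct.values() if v[4] == group_number)
    # Number the entries starting from 1
    return [f"{i}. {' '.join(s)}" for i, s in enumerate(students, start=1)]

dct = {
    '0001': ['Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'],
    '4004': ['Potapov', 'Dmitry', 'Stanislavovich', '14.02.2012', 'BST161'],
    '0045': ['Bokov', 'Igor', 'Kharitonovich', '02.06.2009', 'BST161'],
    '1102': ['Bogov', 'Artem', 'Igorvich', '25.01.2010', 'BST162'],
}
lines = group_list('BST161', dct)
print('\n'.join(lines))
```

Returning the lines instead of printing inside the function keeps the formatting testable.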

generate functionFri, 17 Mar 2023

Determine the required sample size for a study of the average coffee bill in city N, given the following parameters: margin of error (delta); variance (sigsqr); confidence level (conf). Round the answer to an integer.

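import math
from scipy import stats

def sample_size(delta, sigsqr, conf):
    # Two-sided critical value of the standard normal distribution
    z = stats.norm.ppf((1 + conf) / 2)
    # n = z^2 * sigma^2 / delta^2, rounded up so the margin of error does not exceed delta
    n = math.ceil((z * z * sigsqr) / (delta ** 2))
    return n

sample_size(10, 100, 0.95)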

Python

generate functionWed, 22 Feb 2023

Add the x-axis value captions to the columns fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='Which working environment do you prefer most?') fig.show()

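# text=counts writes each bar's x-axis value on the bar
fig = px.bar(x=counts, y=label, orientation='h', text=counts)
fig.update_layout(title_text='Which working environment do you prefer most?')
fig.show()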

Python

generate functionSun, 25 Dec 2022

Point objects have x and y attributes (X and Y in C#) attributes. Write a function calculating distance between Point a and Point b. Tests round answers to 6 decimal places.

import math class Point(object): def __init__(self, x=0, y=0): self.x = x self.y = y # TODO Write a function calculating distance between Point a and Point b. def distance(a, b): c = math.sqrt((a.x - b.x)**2 + (a.y - b.y)**2) return c a = Point(1,1) b = Point(1,2) print(distance(a,b))

Python

generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: year date route operator group_name district injury_result incident_type victim_category victim_age 0 2015 01.01.2015 1 London General Go-Ahead Southwark 1 Onboard Injuries 7 10 1 2015 01.01.2015 4 Metroline Metroline Islington 1 Onboard Injuries 7 2 2 2015 01.01.2015 5 East London Stagecoach Havering 7 Onboard Injuries 7 8 3 2015 01.01.2015 5 East London Stagecoach None London Borough 7 Onboard Injuries 7 8 4 2015 01.01.2015 6 Metroline Metroline Westminster 5 Onboard Injuries 10 8 generate a unique id from 1 to 23158 for each line of the pandas dataframe

# Assign a plain range so the ids do not depend on how df_new is indexed
df_new['unique_id'] = range(1, df_new.shape[0] + 1)
df_new.head()

Python

generate functionWed, 15 Mar 2023

There is the "sl" list and there is the "sw" list. Write code that will create a scaled_data dataframe with the values from "sl" in the first column and "sw" in the second column. The values in the lists are contained in the following format: [[-0.90068117] [-1.14301691] [-1.38535265] [-1.50652052] [-1.02184904] [-0.53717756] [-1.50652052] [-1.02184904] [-1.74885626] [-1.14301691]]

import numpy as np
import pandas as pd

# sl and sw hold one value per row, e.g. [[-0.90068117], [-1.14301691], ...]; ravel flattens them to 1-D
scaled_data = pd.DataFrame({'sl': np.ravel(sl), 'sw': np.ravel(sw)})

Python

generate functionWed, 18 Jan 2023

Write a function that calculates the value_counts in the decade pandas column of the dataview if the rows in the rating column = 5.0

def decade(df): df = df[df.rating == 5.0] return df.decade.value_counts()

Python

generate functionMon, 13 Mar 2023

There is a logistic linear discriminant model trained using the following formula: from sklearn.discriminant_analysis import LinearDiscriminantAnalysis lda.fit(X_train, y_train) Write code to evaluate its quality with F1 measure

from sklearn.metrics import f1_score y_pred = lda.predict(X_test) f1_score(y_test, y_pred)

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a year_leaders function that will build a new dataframe and leave only the performer and hits lines that have the maximum number of num_of_hits when grouped by the chart_debut column

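def year_leaders(df):
    # Row label of the largest num_of_hits within each chart_debut year
    leaders = df.groupby('chart_debut')['num_of_hits'].idxmax()
    return df.loc[leaders, ['performer', 'hits']]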

Python

generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singin' in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 Seventh Seal, The (Sjunde inseglet, Det) (1957) 5.0 9 Bridge on the River Kwai, The (1957) 4.0 both titles and ratings in this table are not unique, they are duplicated many times Write a function that groups unique movie titles in one column and displays their average rating in the second column

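def group_movies(df):
    # One row per unique title with its mean rating
    return df.groupby('title', as_index=False)['rating'].mean()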

Python

generate functionTue, 14 Feb 2023

There is a df Pandas dataframe: Date Open High Low Close Adj Close Volume 0 2022-02-14 133.365494 136.166504 133.302002 135.300003 135.300003 26792000 1 2022-02-15 137.471497 137.899994 135.539505 136.425507 136.425507 26578000 2 2022-02-16 136.430496 137.945999 134.823654 137.487503 137.487503 25610000 3 2022-02-17 136.149994 136.839493 132.201996 132.308502 132.308502 30968000 Write a function that creates an is_month_end column and calculates values there as follows: 1) if the 'Date' column contains the last day of the month, the 'is_month_end' column will contain number 1 2) otherwise 0 is written

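import pandas as pd

def is_month_end(df):
    # dt.is_month_end is True on the last calendar day of any month, not only on the 31st
    df['is_month_end'] = pd.to_datetime(df['Date']).dt.is_month_end.astype(int)
    return df

is_month_end(df)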

Python
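
For a single date string, the month-end check can also be sketched with the standard library alone. This assumes ISO-formatted dates such as those in the `Date` column; the sample dates are illustrative:

```python
import calendar
from datetime import date

def is_month_end(date_string):
    # Parse an ISO date such as '2022-02-28'
    d = date.fromisoformat(date_string)
    # monthrange returns (weekday of the 1st, number of days in the month)
    last_day = calendar.monthrange(d.year, d.month)[1]
    return 1 if d.day == last_day else 0

flags = [is_month_end(s) for s in ['2022-02-14', '2022-02-28', '2024-02-28', '2024-02-29']]
```

Comparing against `monthrange` handles months of 28, 29, 30 and 31 days, which a check for the string `'31'` does not.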

generate functionMon, 13 Mar 2023

This code does not work and gives the following error: ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted']. Fix it! from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(data, data_class, random_state=42) from sklearn.discriminant_analysis import LinearDiscriminantAnalysis lda.fit(X_train, y_train) from sklearn.metrics import f1_score y_pred = lda.predict(X_test) f1_score(y_test, y_pred) Write code to evaluate its quality with F1 measure

from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import f1_score
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

# Encode the class labels as integers
le = LabelEncoder()
le.fit(data_class)
data_class = le.transform(data_class)
X_train, X_test, y_train, y_test = train_test_split(data, data_class, random_state=42)
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)
y_pred = lda.predict(X_test)
# The target is multiclass, so an averaging mode other than 'binary' is required
f1 = f1_score(y_test, y_pred, average='macro')
print('F1:', f1)
#confusion_matrix(y_test, y_pred)
print(classification_report(y_test, y_pred))

Python

generate functionSat, 25 Feb 2023

x1 = (round(g1['rate_group'].value_counts() / g1['rate_group'].value_counts().sum() * 100, 2)) x2 = (round(g2['rate_group'].value_counts() / g2['rate_group'].value_counts().sum() * 100, 2)) x1.plot(kind='hist', alpha=0.5, bins=6, density=True) x2.plot(kind='hist', alpha=0.5, bins=6, density=True) plt.legend(loc='upper left') plt.title('Comparison of distributions with and without own housing') Add to the code above the caption of the legends on the chart: - for x1 = 'x1' - for x2 = 'x2'

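# Pass label= to each plot call so plt.legend() has captions to show
x1.plot(kind='hist', alpha=0.5, bins=6, density=True, label='x1')
x2.plot(kind='hist', alpha=0.5, bins=6, density=True, label='x2')
plt.legend(loc='upper left')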

Python

generate functionSun, 25 Dec 2022

My washing machine uses water amount of water to wash load (in JavaScript and Python) or max_load (in Ruby) amount of clothes. You are given a clothes amount of clothes to wash. For each single item of clothes above the load, the washing machine will use 10% more water (multiplicative) to clean. For example, if the load is 10, the amount of water it requires is 5 and the amount of clothes to wash is 14, then you need 5 * 1.1 ^ (14 - 10) amount of water. Write a function howMuchWater (JS)/how_much_water (Python and Ruby) to work out how much water is needed if you have a clothes amount of clothes. The function will accept 3 arguments: - water, load (or max_loadin Ruby) and clothes.

def how_much_water(water, load, clothes): return water * (1.1 ** (clothes - load)) how_much_water(5, 10, 14)

Python

generate functionTue, 24 Jan 2023

there is a Pandas dataframe: timestamp visit_id url region user_id traffic_source 0 1549980692 e3b0c44298 https://host.ru/3c19b4ef7371864fa3 Russia b1613cc09f yandex 1 1549980704 6e340b9cff https://host.ru/c8d9213a31839f9a3a Germany 4c3ec14bee direct 2 1549980715 96a296d224 https://host.ru/b8b58337d272ee7b15 USA a8c40697fb yandex 3 1549980725 709e80c884 https://host.ru/b8b58337d272ee7b15 Italy 521ac1d6a0 yandex 4 1549980736 df3f619804 https://host.ru/b8b58337d272ee7b15 Russia d7323c571c yandex Create a new dataframe summary in which count how many traffic_source values relate to each region

region direct yandex google 0 Russia 1 4 0 1 Germany 0 1 0 2 USA 0 0 1 3 Italy 0 1 0

Python
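
The answer above is a table rather than code, and its counts do not come from the sample rows. A minimal sketch of one way to build `summary`, using `pd.crosstab` on a small frame reconstructed from the five rows in the prompt (only the two relevant columns are kept):

```python
import pandas as pd

df = pd.DataFrame({
    'region': ['Russia', 'Germany', 'USA', 'Italy', 'Russia'],
    'traffic_source': ['yandex', 'direct', 'yandex', 'yandex', 'yandex'],
})
# One row per region, one column per traffic source, cells hold row counts
summary = pd.crosstab(df['region'], df['traffic_source']).reset_index()
print(summary)
```

If a single total per region is enough, `df.groupby('region')['traffic_source'].count()` is a shorter alternative.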

generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order Example: name Ksenia Rodionova Artur Petrov Ivan Sidorov Ksenia Rodionova Result: name Ksenia Rodionova Ksenia Rodionova

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 'year': [2012, 2012, 2013, 2014, 2014], 'reports': [4, 24, 31, 2, 3]} df = pd.DataFrame(data, index = ['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma']) df

Python
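
The answer above builds an unrelated dataframe and never looks for duplicates. A minimal sketch of the requested filter, run on the example names from the prompt:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Ksenia Rodionova', 'Artur Petrov', 'Ivan Sidorov', 'Ksenia Rodionova']})
# keep=False marks every occurrence of a repeated name, including the first
duplicates = df[df['name'].duplicated(keep=False)].sort_values('name')
print(duplicates)
```

With the default `keep='first'` the first occurrence would be left out, which the prompt explicitly asks to include.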

generate functionSun, 25 Dec 2022

Point objects have x and y attributes (X and Y in C#) attributes. Write a function calculating distance between Point a and Point b. Tests round answers to 6 decimal places.

from math import sqrt

def distance(a, b):
    # Euclidean distance; the tests round to 6 decimal places, so rounding to 2 would fail them
    x_diff_sq = (a.x - b.x) ** 2
    y_diff_sq = (a.y - b.y) ** 2
    return round(sqrt(x_diff_sq + y_diff_sq), 6)

Python

generate functionSun, 25 Dec 2022

Point objects have x and y attributes (X and Y in C#). Write a function calculating distance between Point a and Point b. Tests round answers to 6 decimal places.

from math import sqrt

def distance(a, b):
    return sqrt((a.x - b.x) ** 2 + (a.y - b.y) ** 2)

Python

generate functionMon, 26 Dec 2022

Write a code that looks for a repeating student number and replaces it with "9090". lst = [ ["0001", "Antonov", "Anton", "Igorevich", "08/20/2009", "BST161"] ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ]

def replace_student(lst):
    # Replace every student number that has already been seen with "9090"
    seen = set()
    for student in lst:
        if student[0] in seen:
            student[0] = "9090"
        else:
            seen.add(student[0])
    return lst

print(replace_student(lst))

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Create a new column "is_loyal" through the lambda function, in which you put the value "True" if duplicates are found in the "name" column. If not put a "False"

df['is_loyal'] = df.duplicated(subset='name', keep=False).apply(lambda x: 'True' if x else 'False')

Python

generate functionSun, 11 Dec 2022

Write a function that multiplies each dictionary value by portions portions = 5 grams = 0 y = {'Marlboro': [3, 13, 6, 66, 13, 7, 13]} for el in y.get('Marlboro'): el * portions print(el)

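portions = 5
y = {'Marlboro': [3, 13, 6, 66, 13, 7, 13]}

def multiply_values(d, portions):
    # Return a new dictionary with every list element multiplied by portions
    return {key: [el * portions for el in values] for key, values in d.items()}

print(multiply_values(y, portions))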

Python

generate functionWed, 15 Mar 2023

There is the "sl" list and there is the "sw" list. Write code that will create a "data" dataframe with the values from "sl" in the first column and "sw" in the second column

import pandas as pd

def get_dataframe(sl, sw):
    # Dictionary keys become the column names, in insertion order
    return pd.DataFrame({'sl': sl, 'sw': sw})

sl = [1, 2, 3, 4, 5]
sw = [1, 2, 3, 4, 5]
data = get_dataframe(sl, sw)
data

Python

generate functionTue, 13 Dec 2022

Create a function with two variables. The first variable will take a list. The second variable should output the value from that list by index. If the second variable is longer than the list length, it should not print an error but re-list the list. ([a, b, c, d], 1) should return a ([a, b, c, d], 4) should return d ([a, b, c, d], 5) should return a

def find_index(lst, ind):
    # Positions are 1-based in the examples and wrap around past the end of the list
    return lst[(ind - 1) % len(lst)]

find_index(["a", "b", "c", "d"], 1)

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a year_leaders function that will build a new data frame and leave only the rows that have the maximum value of num_of_hits in it when grouped by the chart_debut column

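import pandas as pd

df = pd.DataFrame({'chart_debut': ['2012', '2012', '2012', '2014', '2017'], 'num_of_hits': [1, 2, 3, 4, 5]})

def year_leaders(df):
    # Keep whole rows whose num_of_hits equals the maximum within their chart_debut group
    max_hits = df.groupby('chart_debut')['num_of_hits'].transform('max')
    return df[df['num_of_hits'] == max_hits]

year_leaders(df)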

Python

generate functionTue, 13 Dec 2022

Write a function that outputs True if the letters in it are one after the other in the string, and False otherwise. def goes_after(word: str, first: str, second: str) -> bool: s = '' for i, let in enumerate(word): print (let[i] + let[i+1]) print(first + second) if let[i] + let[i-1] == first + second: return True else: return False Examples: goes_after("world", "o", "r") Result: True goes_after("world", "w", "r") Result: False

def goes_after(word: str, first: str, second: str) -> bool:
    # True only when second immediately follows first somewhere in the word
    for i in range(len(word) - 1):
        if word[i] == first and word[i + 1] == second:
            return True
    return False

goes_after("world", "o", "r")

Python

generate functionSun, 11 Dec 2022

Change the code in this program so that the value of 'quantity' is multiplied by the variable 'portions' and the result is displayed: def change_quantity(dish, portions): return dish['quantity'] * portions change_quantity(dish, portions) dish = 'salad' portions = 5 recipe book = { 'salad': [ { 'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'} {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'} {'ingridient_name': 'cucumbers', 'quantity': 20, 'measure': 'g'} {'ingridient_name': 'olives', 'quantity': 10, 'measure': 'g'} {'ingridient_name': 'olive oil', 'quantity': 20, 'measure': 'ml'} {'ingridient_name': 'salad', 'quantity': 10, 'measure': 'g'} {'ingridient_name': 'pepper', 'quantity': 20, 'measure': 'g'} ]}

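# Assumes the dictionary from the prompt is bound to the name recipe_book
def prepare_dish(recipe_book, dish, portions):
    # Multiply the quantity of every ingredient of the chosen dish by portions
    for ingredient in recipe_book[dish]:
        ingredient['quantity'] = ingredient['quantity'] * portions
    return recipe_book[dish]

prepare_dish(recipe_book, dish, portions)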

Python

generate functionWed, 01 Feb 2023

There is a Python and Pandas function. def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy year_leaders(dfp) Modify it to output all 66 lines of the dataframe

def year_leaders_all(dfp):
    dfp_copy = dfp.copy()
    dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True)
    dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True)
    return dfp_copy

year_leaders_all(dfp)

Python

generate functionFri, 27 Jan 2023

There is a code for Pandas df3[df3.how_find_us.str.contains('yandex')] Rewrite it so that it outputs all unique rows with str.contains = 'yandex' and does not output the rest

df3[df3.how_find_us.str.contains('yandex')].drop_duplicates()

Python

generate functionTue, 13 Dec 2022

You are given a method called main, make it print the line Hello World!, (yes, that includes a new line character at the end) and don't return anything Note that for some languages, the function main is the entry point of the program. Here's how it will be tested: Solution.main("parameter1", "parameter2","parametern") Hints: Check your references Think about the scope of your method For prolog you can use write but there are better ways If you still don't get it probably you can define main as an attribute of the Solution class that accepts a single argument, and that only prints "Hello World!" without any return.

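class Solution(object):
    @staticmethod
    def main(*args):
        # Called on the class itself with any number of arguments; prints and returns nothing
        print("Hello World!")

Solution.main("parameter1", "parameter2")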

Python

generate functionSun, 25 Dec 2022

Point objects have x and y attributes (X and Y in C#). Write a function calculating distance between Point a and Point b. Tests round answers to 6 decimal places.

import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

def distance(a, b):
    return math.sqrt((a.x - b.x)**2 + (a.y - b.y)**2)

Python

generate functionTue, 21 Mar 2023

There is a Pandas dataframe: water nutri mineral 0 1 1 2 1 2 2 1 2 3 4 1 3 4 6 3 4 2 5 2 5 4 6 4 6 2 7 2 7 4 5 4 8 5 4 5 9 2 5 4 10 3 6 3 11 4 7 2 12 2 4 3 13 1 3 2 14 3 5 3 15 4 5 1 16 3 6 3 17 2 5 4 18 5 4 5 19 1 3 1 Construct a vertical bar graph comparing these values, where the x-axis is index and the y-axis is the value of each column

df.plot.bar(rot=0)

Python

generate functionTue, 28 Mar 2023

There is a dataframe "df" containing data about texts in the following format: Category Message Length Words 0 ham Go until jurong point, crazy... Available only ... 111 [go, jurong, point, crazy, available, bugis, g... 1 ham Ok lar... Joking wif u oni... 29 [ok, lar, joking, wif, oni] The texts in it are in the df['Message'] variable. And the classification categories are in df['Category']. The programmer has written code which should contain all the source messages classified incorrectly (specifying the actual and predicted ones). However, this code does not work and generates an error: KeyError. Here is this code: misclassified = [] for index, label_predicted in enumerate(predicted): if label_predicted != y_test[index]: misclassified.append({'message': df.iloc[index]['Message'], 'actual': df.iloc[index]['Category'], 'predicted': label_predicted}) misclassification_df = pd.DataFrame(misclassified) Fix it!

misclassified = []  # the list must exist before the loop appends to it
for index, label_predicted in enumerate(predicted):
    if label_predicted != y_test[index]:
        misclassified.append({'message': X_test[index], 'actual': y_test[index], 'predicted': label_predicted})
misclassification_df = pd.DataFrame(misclassified)

Python

generate functionThu, 15 Dec 2022

Create a function that allows you to move doc to another shelf directories. It should check for: 1) if the specified shelf exists in directories and give ERROR NO SUCH KEY if it does not exist 2) if the specified doc exists in the directories and show ERROR NO SUCH VALUE if it doesn't exist 3) if both doc and shelf are in the dictionary, the function should move the value of doc to the specified shelf and make return updated directories. Note that directories is a dictionary and doc values are in lists! Example: Input: directories = { '1': ['2207 876234', '11-2'], '2': ['10006'], '3': [] } doc = '11-2' shelf = '3' Output: directories = { '1': ['2207 876234'], '2': ['10006'], '3': ['11-2'] }

def move_doc(directories, doc, shelf):
    if shelf not in directories:
        print('ERROR NO SUCH KEY')
        return directories
    for docs in directories.values():
        if doc in docs:
            # Remove the document from its current shelf and put it on the target one
            docs.remove(doc)
            directories[shelf].append(doc)
            return directories
    print('ERROR NO SUCH VALUE')
    return directories

directories = {
    '1': ['2207 876234', '11-2', '5455 028765'],
    '2': ['10006', '5400 028765', '5455 002299'],
    '3': []
}
doc = '11-2'
shelf = '3'
move_doc(directories, doc, shelf)

Python

generate functionSun, 25 Dec 2022

My washing machine uses water amount of water to wash load (in JavaScript and Python) or max_load (in Ruby) amount of clothes. You are given a clothes amount of clothes to wash. For each single item of clothes above the load, the washing machine will use 10% more water (multiplicative) to clean. For example, if the load is 10, the amount of water it requires is 5 and the amount of clothes to wash is 14, then you need 5 * 1.1 ^ (14 - 10) amount of water. Write a function def how_much_water(L,X,N) to work out how much water is needed if you have a clothes amount of clothes. The function will accept 3 arguments: - water, load (or max_loadin Ruby) and clothes.

def how_much_water(L, X, N):
    # L is the water amount, X the load, N the amount of clothes
    return L * (1 + 0.1) ** (N - X)

how_much_water(5, 10, 14)

Python

generate functionTue, 13 Dec 2022

Create a function called _if which takes 3 arguments: a boolean value bool and 2 functions (which do not take any parameters): func1 and func2 When bool is truth-ish, func1 should be called, otherwise call the func2. Example: def truthy(): print("True") def falsey(): print("False") _if(True, truthy, falsey) # prints 'True' to the console

def _if(bool, func1, func2):
    if bool:
        func1()
    else:
        func2()

def truthy():
    print("True")

def falsey():
    print("False")

_if(True, truthy, falsey)

Python

generate functionTue, 14 Mar 2023

Change the code of the graph so that only whole numbers are displayed on the x scale plt.plot(ks, k_inertia) plt.plot(ks, k_inertia ,'ro') plt.xlabel('Number of clusters') plt.ylabel('Inertia')

plt.xticks(ks)

Python

generate functionTue, 28 Mar 2023

There is a dataframe "df" containing data about texts in the following format: Category Message Length Words 0 ham Go until jurong point, crazy... Available only ... 111 [go, jurong, point, crazy, available, bugis, g... 1 ham Ok lar... Joking wif u oni... 29 [ok, lar, joking, wif, oni] 2 spam Free entry in 2 a wkly comp to win FA Cup fina... 155 [free, entry, wkly, comp, win, fa, cup, final,... The programmer has written code which should contain all the source messages classified incorrectly (specifying the actual and predicted ones). However, this code does not work and generates an error: KeyError. Here is this code: misclassified = [] For index, label_predicted in enumerate(predicted): if label_predicted != y_test[index]: misclassified.append({'message': df.iloc[index]['message'], 'actual': df.iloc[index]['label'], 'predicted': label_predicted}) misclassification_df = pd.DataFrame(misclassified) Fix it

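# Assumes y_test is a pandas Series produced by train_test_split on df
misclassified = []
for index, label_predicted in enumerate(predicted):
    actual = y_test.iloc[index]  # positional access; y_test[index] would look up a row label
    if label_predicted != actual:
        # y_test keeps the original row labels of df, so they locate the source message
        misclassified.append({'message': df.loc[y_test.index[index], 'Message'], 'actual': actual, 'predicted': label_predicted})
misclassification_df = pd.DataFrame(misclassified)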
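One likely cause of the KeyError, assuming `y_test` is a pandas Series returned by `train_test_split`: the split shuffles the rows but keeps their original index labels, so `y_test[index]` searches for a label that may not be in the test set. The sketch below reproduces this with made-up stand-ins for `y_test` and `predicted` and shows positional access with `.iloc` as one way around it.

```python
import pandas as pd

# Hypothetical stand-ins: y_test holds rows 7, 2 and 5 of the original frame
y_test = pd.Series(["ham", "spam", "ham"], index=[7, 2, 5])
predicted = ["ham", "ham", "ham"]

try:
    y_test[0]  # label lookup: there is no row labelled 0
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Positional access works regardless of the index labels
misclassified = [
    {"row": y_test.index[pos], "actual": y_test.iloc[pos], "predicted": label}
    for pos, label in enumerate(predicted)
    if label != y_test.iloc[pos]
]
print(lookup_failed, misclassified)
```

The stored row label can then be used with `df.loc[...]` to fetch the original message. Resetting the index of `y_test` is an alternative, at the cost of losing the link back to `df`.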

Python

generate functionWed, 08 Mar 2023

Upgrade the code so that a roll of 000, 007, 017, 027, 037, 047, 057, 067, 077, 087, 097, 010, 020, 030, 040, 050, 060, 070, 080, 090 will drop 1 dollar for i in range(trials): num = np.random.randint(1, 1000) if num == 777: money += 200; successes += 1 elif num == 999: money += 100; successes += 1 elif num == 555: money += 50; successes += 1 elif num == 333: money += 15; successes += 1 elif num == 111: money += 10; successes += 1 elif ends77(num) == True: money += 5; successes += 1 elif ends7(num) == True: money += 3; successes += 1 elif ends00(num) == True: money += 2; successes += 1 elif ends0(num) == True: money += 1; successes += 1 else: money -= 1

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Apr 1 16:34:47 2020

@author: jordan
"""

def ends77(x):
    return x % 100 == 77

def ends7(x):
    return x % 10 == 7

def ends00(x):
    return x % 100 == 0

def ends0(x):
    return x % 10 == 0

def dropdollar(x):
    return x % 100 == 0 or x % 100 == 7

def dropdollars(x):
    return x % 100 == 0 or x % 100 == 7

money = 0
trials = 100000
for i in range(trials):
    num = np.random.randint(1, 1000)
    if num == 777:
        money += 200; successes += 1
    elif num == 999:
        money += 100; successes += 1
    elif num == 555:
        money += 50; successes += 1
    elif num == 333:
        money += 15;
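The generated code above stops mid-statement, so the requested one-dollar rule never appears in it. As a sketch of that rule only: the listed rolls can be collected in a set and tested by membership. The names `ONE_DOLLAR_ROLLS`, `JACKPOTS` and `payout` are hypothetical, the helper predicates from the prompt are inlined as modulo checks, and placing the new check before the suffix rules is an assumption (otherwise rolls such as 077 or 017 would already be caught by the "ends in 77" or "ends in 7" branches).

```python
# Rolls listed in the prompt: 000, then 007..097 and 010..090 in steps of ten
ONE_DOLLAR_ROLLS = {0} | set(range(7, 100, 10)) | set(range(10, 100, 10))

# Exact-match prizes taken from the prompt
JACKPOTS = {777: 200, 999: 100, 555: 50, 333: 15, 111: 10}

def payout(num):
    """Return the change in money for one roll (negative means a loss)."""
    if num in JACKPOTS:
        return JACKPOTS[num]
    if num in ONE_DOLLAR_ROLLS:  # assumption: checked before the suffix rules
        return 1
    if num % 100 == 77:
        return 5
    if num % 10 == 7:
        return 3
    if num % 100 == 0:
        return 2
    if num % 10 == 0:
        return 1
    return -1

print(payout(17), payout(117), payout(123))
```

Note that `np.random.randint(1, 1000)` draws from 1 to 999, so the roll 000 can only occur if the lower bound in the prompt's loop is changed to 0.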

Python

generate functionWed, 21 Dec 2022

Create a function that finds the key with the maximum value in 'Value' and displays it in key - value format: {'AUD': {'ID': 'R01010', {'NumCode': '036', {'CharCode': 'AUD', {'Nominal': 1, 'Name': 'Australian Dollar', 'Value': '46.9983, 'Previous': 45.9496}, 'AZN': {'ID': 'R01020A', 'NumCode': '944', 'CharCode': 'AZN', 'Nominal': 1, 'Name': 'AZN', Value: 41.4856, Previous': 40.5904}, 'GBP': {'ID': 'R01035', 'NumCode': '826', 'CharCode': 'GBP', 'Nominal': 1, 'Name': 'Pound Sterling United Kingdom', 'Value': 85.611, 'Previous': 83.7015},

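def max_in_dictionary(d):
    # Compare numerically, since some 'Value' entries in the prompt look like strings
    max_key = max(d, key=lambda key: float(d[key]['Value']))
    # Return the result in "key - value" format, as the prompt asks
    return f"{max_key} - {d[max_key]['Value']}"

max_in_dictionary(rates)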

Python

generate functionThu, 22 Dec 2022

Wilson primes satisfy the following condition. Let P represent a prime number. Then, ((P-1)! + 1) / (P * P) should give a whole number. Your task is to create a function that returns true if the given number is a Wilson prime.

from math import factorial

def am_i_wilson(n):
    # If n*n divides (n-1)! + 1 and n > 1, then n is prime by Wilson's theorem
    return n > 1 and (factorial(n - 1) + 1) % (n * n) == 0

Python

generate functionMon, 13 Feb 2023

Modify the formula below so that it rounds to the nearest integer: df2['respiratory_rate'] = df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median'))

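# Fill the gaps with the per-pulse median, then round the column to the nearest integer
df2['respiratory_rate'] = df2['respiratory_rate'].fillna(
    df2.groupby('pulse')['respiratory_rate'].transform('median')).round()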

Python

generate functionTue, 21 Mar 2023

There is a Pandas dataframe: water nutri mineral 0 1 1 2 1 2 2 1 2 3 4 1 3 4 6 3 4 2 5 2 5 4 6 4 6 2 7 2 7 4 5 4 8 5 4 5 9 2 5 4 10 3 6 3 11 4 7 2 12 2 4 3 13 1 3 2 14 3 5 3 15 4 5 1 16 3 6 3 17 2 5 4 18 5 4 5 19 1 3 1 Construct a vertical bar graph comparing these values, where the x-axis is index and the y-axis is the value of each column

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('./data/nutri.csv')
df.plot.bar()
plt.show()

Python

generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 Calculate the average lifetime of users. The lifetime is the difference between the maximum and minimum values of the "timestamp" column for a given "userId".

# ratings is the dataframe from the prompt (columns userId, movieId, rating, timestamp)
# Lifetime per user = max(timestamp) - min(timestamp); then average over all users
per_user = ratings.groupby('userId')['timestamp'].agg(['max', 'min'])
avg_lifetime = (per_user['max'] - per_user['min']).mean()
print(avg_lifetime)

Python

generate functionMon, 12 Dec 2022

Create a function that returns the CSV representation of a two-dimensional numeric array. Example: input: [[ 0, 1, 2, 3, 4 ], [ 10,11,12,13,14 ], [ 20,21,22,23,24 ], [ 30,31,32,33,34 ]] output: '0,1,2,3,4\n' +'10,11,12,13,14\n' +'20,21,22,23,24\n' +'30,31,32,33,34' Array's length > 2.

def to_csv_string(array):
    return '\n'.join([','.join([str(j) for j in i]) for i in array])

to_csv_string([[ 0, 1, 2, 3, 4 ], [ 10,11,12,13,14 ], [ 20,21,22,23,24 ], [ 30,31,32,33,34 ]])

Python

generate functionSat, 25 Feb 2023

rich = df[df.annual_inc >= 1000000] Rewrite the code above so that it takes into account the range from 100000 to 1000000

def rich_people(df, a, b):
    return df[(df.annual_inc >= a) & (df.annual_inc <= b)]

rich_people(df, 100000, 1000000)

Python

generate functionMon, 06 Feb 2023

There is a df_ru Pandas dataframe: date av_temp deviations country year decade 432393 1988-07-01 16.105 0.158 Russia 1988 1980-1990 432429 1991-07-01 16.385 0.243 Russia 1991 1990-2000 432489 1996-07-01 16.032 0.213 Russia 1996 1990-2000 432513 1998-07-01 16.677 0.183 Russia 1998 1990-2000 432525 1999-07-01 16.412 0.228 Russia 1999 1990-2000 432549 2001-07-01 16.539 0.190 Russia 2001 2000-2010 Write a function that will create a new hot_years dataframe, group the data by the "year" column, average the values in the "av_temp" column, and leave only those lines in the dataframe where av_temp > 15

def hot_years_create(df_ru, min_temp):
    df_ru_hot = df_ru.groupby(['year'])['av_temp'].mean().reset_index()
    return df_ru_hot[df_ru_hot['av_temp'] > min_temp]

hot_years = hot_years_create(df_ru, 15)

Python

generate functionSun, 25 Dec 2022

My washing machine uses water amount of water to wash load (in JavaScript and Python) or max_load (in Ruby) amount of clothes. You are given a clothes amount of clothes to wash. For each single item of clothes above the load, the washing machine will use 10% more water (multiplicative) to clean. For example, if the load is 10, the amount of water it requires is 5 and the amount of clothes to wash is 14, then you need 5 * 1.1 ^ (14 - 10) amount of water. Write a function def how_much_water(L,X,N) to work out how much water is needed if you have a clothes amount of clothes. The function will accept 3 arguments: - water, load (or max_loadin Ruby) and clothes.

def how_much_water(water, load, clothes):
    if load >= clothes:
        return water
    else:
        return water * 1.1 ** (clothes - load)

how_much_water(5, 10, 14)

Python

generate functionThu, 22 Dec 2022

Given a non-negative integer n, write a function to_binary/ToBinary which returns that number in a binary format. to_binary(1) # should return 1 to_binary(5) # should return 101 to_binary(11) # should return 1011

def to_binary(n):
    # bin() returns a string with a '0b' prefix; strip it and return the digits as a number
    return int(bin(n)[2:])

Python

generate functionThu, 20 Apr 2023

When seed=42 is fixed, try experimenting with the differential_evolution algorithm parameters: strategy, popsize, mutation. import numpy as np from scipy.optimize import differential_evolution import matplotlib.pyplot as plt %matplotlib inline def ackley(x): arg1 = -0.2 * np.sqrt(0.5 * (x[0] ** 2 + x[1] ** 2)) arg2 = 0.5 * (np.cos(2. * np.pi * x[0]) + np.cos(2. * np.pi * x[1])) return -20. * np.exp(arg1) - np.exp(arg2) + 20. + np.e bounds = [(-10, 10), (-10, 10)] result = differential_evolution(ackley, bounds, seed=42) ###YOUR CODE HERE

# artificial parameters
def ackley(x):
    arg1 = -0.2 * np.sqrt(0.5 * (x[0] ** 2 + x[1] ** 2))
    arg2 = 0.5 * (np.cos(2. * np.pi * x[0]) + np.cos(2. * np.pi * x[1]))
    return -20. * np.exp(arg1) - np.exp(arg2) + 20. + np.e

bounds = [(-10, 10), (-10, 10)]

def plot_optimisation(strategy):
    result = differential_evolution(ackley, bounds, strategy=strategy, seed=42)
    x = np.linspace(-10, 10, 100)
    y = np.linspace(-10, 10, 100)
    X, Y = np.meshgrid(x, y)
    Z = np.array([ackley([x, y]) for x, y in zip(np.ravel(X), np.ravel(Y))
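The generated code above is cut off before any parameters are varied. A minimal sketch of the experiment the prompt describes, assuming it is enough to compare the final objective value and iteration count for a few hand-picked settings; the `settings` list is illustrative, not a recommendation, and only the `strategy`, `popsize`, `mutation` and `seed` arguments of `scipy.optimize.differential_evolution` are used.

```python
import numpy as np
from scipy.optimize import differential_evolution

def ackley(x):
    arg1 = -0.2 * np.sqrt(0.5 * (x[0] ** 2 + x[1] ** 2))
    arg2 = 0.5 * (np.cos(2.0 * np.pi * x[0]) + np.cos(2.0 * np.pi * x[1]))
    return -20.0 * np.exp(arg1) - np.exp(arg2) + 20.0 + np.e

bounds = [(-10, 10), (-10, 10)]

# Illustrative settings only; the first entry matches the SciPy defaults
settings = [
    {"strategy": "best1bin", "popsize": 15, "mutation": (0.5, 1.0)},
    {"strategy": "rand1bin", "popsize": 30, "mutation": 0.8},
    {"strategy": "best2bin", "popsize": 10, "mutation": (0.3, 0.9)},
]

results = []
for params in settings:
    # seed=42 stays fixed so that differences come from the parameters alone
    res = differential_evolution(ackley, bounds, seed=42, **params)
    results.append((params, res.fun, res.nit))
    print(params, "fun =", res.fun, "iterations =", res.nit)
```

Comparing `res.nit` across runs gives a rough idea of how each setting affects convergence speed on this function; results on other objectives may differ.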

Python

generate functionMon, 13 Feb 2023

Modify the formula below so that it rounds to the nearest integer: df2['respiratory_rate'] = df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median'))

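# Fill the gaps with the per-pulse median, then round the column to the nearest integer
df2['respiratory_rate'] = df2['respiratory_rate'].fillna(
    df2.groupby('pulse')['respiratory_rate'].transform('median')).round()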

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 We need to find customers who have contacted us several times and make a separate table with them. That is, we need to find duplicate values in the name column and leave only them, all the rest removed. Example: name Ksenia Rodionova Artur Petrov Ivan Sidorov Ksenia Rodionova Result: name Ksenia Rodionova Ksenia Rodionova

def remove_duplicate(df):
    # keep only the rows whose name occurs more than once
    return df[df.duplicated(subset="name", keep=False)]

Python

generate functionFri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). In function return specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers. def interval(n, mean, sig, conf): h = return

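from scipy.stats import norm

def interval(n, mean, sig, conf):
    # Half-width of the two-sided interval: z * sig / sqrt(n), with z = norm.ppf((1 + conf) / 2)
    h = sig / (n ** 0.5) * norm.ppf((1 + conf) / 2)
    # The interval is (mean - h, mean + h), so its length is 2 * h
    return round(2 * h)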

Python

generate functionTue, 10 Jan 2023

import numpy as np def sum_matrix(N): my_matrix = * return * print(sum_matrix(5)) print(sum_matrix(10)) print(sum_matrix(15)) Create in Numpy a diagonal matrix with elements from N to 0. Count the sum of its values on the diagonal.

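import numpy as np

def sum_matrix(N):
    # Diagonal matrix whose diagonal runs N, N-1, ..., 0
    my_matrix = np.diag(np.arange(N, -1, -1))
    # Sum of the diagonal elements
    return np.trace(my_matrix)

print(sum_matrix(5))
print(sum_matrix(10))
print(sum_matrix(15))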

Python

generate functionTue, 13 Dec 2022

Make a function that returns the value multiplied by 50 and increased by 6. If the value entered is a string it should return "Error".

def math(a):
    # type(a) == 'str' compares a type with a string and is never true; use isinstance
    if isinstance(a, str):
        return "Error"
    return (a * 50) + 6

math(5)

Python

generate functionMon, 27 Mar 2023

There are variables X_train, X_test, y_train, y_test created through sklearn train_test_split. They contain text data in string format. Convert them to float numbers.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

df = pd.DataFrame(np.random.randn(10, 2), columns=['Col1', 'Col2'])
X_train, X_test, y_train, y_test = train_test_split(df['Col1'], df['Col2'], random_state=1)

def convert_to_float(x_train, x_test, y_train, y_test):
    return (
        x_train.astype(float),
        x_test.astype(float),
        y_train.astype(float),
        y_test.astype(float)
    )

convert_to_float(X_train, X_test, y_train, y_test)

Python

generate functionTue, 21 Mar 2023

There is a Pandas dataframe: water nutri mineral 0 1 1 2 1 2 2 1 2 3 4 1 3 4 6 3 4 2 5 2 5 4 6 4 6 2 7 2 7 4 5 4 8 5 4 5 9 2 5 4 10 3 6 3 11 4 7 2 12 2 4 3 13 1 3 2 14 3 5 3 15 4 5 1 16 3 6 3 17 2 5 4 18 5 4 5 19 1 3 1 Construct a vertical bar graph comparing these values, where the x-axis is index and the y-axis is the value of each column

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame(data=[1,1,2,2,4,1,3,2,5,4,3,2,4,1,3,4,3,2,4,5,1,3], columns=['water'])
df['nutri'] = pd.DataFrame(data=[1,2,2,4,6,2,4,5,4,5,6,4,3,3,5,5,6,5,4,3,3,5], columns=['nutri'])
df['mineral'] = pd.DataFrame(data=[2,1,1,3,2,4,1,2,5,4,3,3,2,2,3,1,3,4,5,4,1,3], columns=['mineral'])
plt.figure(figsize=(12,5))
sns.

Python

generate functionTue, 14 Feb 2023

Rewrite the function below so it is called 'is_month_end' instead of 'is_quarter_end' and output 1 if the 'Date' column shows the last day of the month df['is_quarter_end'] = np.where(df['month']%3==0,1,0) df.head()

# 1 when 'Date' is the last day of its month (28, 29, 30 or 31), otherwise 0
df['is_month_end'] = pd.to_datetime(df['Date']).dt.is_month_end.astype(int)
df.head()

Python

generate functionFri, 24 Feb 2023

There is a Pandas dataframe: spi_rank country spi_score basic_human_needs foundations_of_wellbeing opportunity basic_nutri_med_care water_sanitation shelter personal_safety access_to_knowledge access_to_communications health_wellness environmental_quality personal_rights personal_freedom inclusiveness access_to_advanced_education 61 62.0 Russia 73.45 83.0 79.4 57.94 95.49 97.04 88.56 50.92 94.34 77.42 64.7 81.16 51.79 70.26 30.31 79.4 Write a function that converts dataframe columns to rows

def columns_to_rows(dataframe):
    # Transpose: every column becomes a row
    return dataframe.T

Python

generate functionMon, 26 Dec 2022

Add to this code the numbering of all group_list values from 1 dct = convert_to_dict(students) def group_list(dct, gr): group_list = [] for k, v in dct.items(): if gr in v: group_list.append(' '.join(dct[k][0:3])) group_list.sort() return group_list group_list(dct, 'BST161')

# Number the sorted names starting from 1
for number, name in enumerate(group_list(dct, 'BST161'), start=1):
    print(number, name)

Python

generate functionWed, 21 Dec 2022

Rewrite this function without lambda max_key = max(dct, key=lambda key: dct[key]['Value'])

def max_key(dct):
    # Start below any possible value so that negative values are handled too
    max_v = float('-inf')
    max_k = None
    for key in dct:
        if dct[key]['Value'] > max_v:
            max_v = dct[key]['Value']
            max_k = key
    return max_k

dct = {'a': {'Value': 1, 'Other': 2}, 'b': {'Value': 5, 'Other': 4}, 'c': {'Value': 3, 'Other': 4}}
max_key(dct)

Python

generate functionTue, 21 Mar 2023

There is a "grass" Pandas dataframe: pid Name Class 1 Class 2 HP Attack Defense Sp. Atk Sp. Def Speed Legendary 0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 False 1 2 Ivysaur Grass Poison 60 62 63 80 80 60 False 2 3 Venusaur Grass Poison 80 82 83 100 100 80 False 3 4 Mega Venusaur Grass Poison 80 100 123 122 120 80 False 48 49 Oddish Grass Poison 45 50 55 75 65 30 False Write a function that will search for 'Rock' values in the Class 1 and Class 2 columns. And if it finds such values it will delete these rows.

def grass(df):
    df = df[df['Class 1'] != 'Rock']
    df = df[df['Class 2'] != 'Rock']
    return df

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Filter this dataframe and leave only the rows where the values in the name column are duplicated. Merge the rows with duplicates and sort the name column in ascending order

def filter_df(df, column):
    return df[df[column].duplicated(keep=False)].sort_values(column)

df = pd.DataFrame({'name': ['Ksenia Rodionova', 'Ulyana Selezneva', 'Konstantin Prokhorov', 'Petrov Vladimir', 'Arina Selivanova', 'Ksenia Rodionova'],
                   'profit_per_room': [1639.000000, 930.000000, 1057.720000, 1403.000000, 1639.000000, 1639.000000]})
filter_df(df, 'name')

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order Example: name Ksenia Rodionova Artur Petrov Ivan Sidorov Ksenia Rodionova Result: name Ksenia Rodionova Ksenia Rodionova

df = pd.DataFrame({'name': ['Ksenia Rodionova', 'Artur Petrov', 'Ivan Sidorov', 'Ksenia Rodionova']})
# keep=False marks the first occurrence and all repeats, so every duplicated row is kept
duplicates = df[df.duplicated(subset='name', keep=False)].sort_values(by='name', ascending=True)
duplicates

Python

generate functionTue, 21 Feb 2023

There is a Pandas dataframe: Type App 0 Paid 757 1 Free 8902 Construct a pie chart with these values in percent

df_type['App'].groupby(df_type['Type']).sum().plot(
    kind='pie',
    figsize=(5, 6),
    autopct='%1.1f%%',  # show percentages on the wedges
    startangle=90,      # start the first wedge at 90 degrees
    shadow=True,        # add shadow
)
plt.title('Pie chart of the Repartition between Free and Paid Apps')
plt.axis('equal')  # Sets the pie chart to look like a circle.
plt.show()

Python

generate functionThu, 16 Feb 2023

There is a Pandas Series column: 'loan_amnt' 10 78 54 GOOD 64 23 Write a function that replace all rows with letters in the column to NaN. Example: find_non_numbers(df, 'loan_amnt') df Result: 10 78 54 NaN 64 23

def find_non_numbers(df, col):
    # Values that cannot be parsed as numbers (e.g. 'GOOD') become NaN; df is modified in place
    df[col] = pd.to_numeric(df[col], errors='coerce')

find_non_numbers(df, 'loan_amnt')

Python

generate functionWed, 18 Jan 2023

There is a dataframe of Pandas: id movie rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (1994) 3.5 2 Three Colors: Blue (1993) 5.0 3 Underground (1995) 5.0 4 Singing in the Rain (1952) 3.5 Write a function that creates a new 'class' column and substitutes values there depending on the rating: - grade 2 and below - low rating; - grade 4 and below - average rating; - 4.5 and 5 - high rating.

def ratings(x):
    if x <= 2.0:
        return 'Low'
    elif x <= 4.0:
        return 'Average'
    else:
        return 'High'

# Write the labels to a new 'class' column instead of overwriting 'rating'
df['class'] = df['rating'].apply(ratings)
df.head()

Python

generate functionMon, 26 Dec 2022

A matrix matrix is given. matrix = [ [0,1,2,4,8], [6,2,2,1,9], [3,3,3,3,3], [4,6,7,1,2], [5,7,3,4,0] ] Write code (in pure Python, no third-party libraries) that will transpose it (that is, make columns out of rows) into matrix_t: matrix_t = [ [0, 6, 3, 4, 5], [1, 2, 3, 6, 7], [2, 2, 3, 7, 3], [4, 1, 3, 1, 4], [8, 9, 3, 2, 0] ]

def transpose(matrix):
    return list(map(list, zip(*matrix)))

matrix_t = transpose(matrix)

Python

generate functionWed, 15 Mar 2023

There is the "sl" list and there is the "sw" list. Write code that will create a scaled_data dataframe with the values from "sl" in the first column and "sw" in the second column

sl = [0.8,0.9,0.9,1.0,1.1,1.1,1.2,1.4,1.4,1.5,1.6,1.6,1.7,1.7,1.8,1.8,1.8,1.9,1.9,2.0,2.0,2.1,2.2,2.2,2.2,2.2,2.2,2.2,2.2,2.3,2.3,2.3,2.3,2.3,2.4,2.4,2.4,2.4,2.4,2.4,2.4,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.5,2.

Python
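
The generated answer above is cut off inside the list literal, so the DataFrame is never built. A minimal sketch of that step, assuming both lists have the same length; the short lists here are stand-ins, and the "sw" values in particular are made up for illustration.

```python
import pandas as pd

# Short stand-in lists; the real "sl" and "sw" are longer
sl = [0.8, 0.9, 0.9, 1.0]
sw = [3.5, 3.0, 3.2, 3.1]  # hypothetical values

# Dict keys become column names, in insertion order
scaled_data = pd.DataFrame({'sl': sl, 'sw': sw})
```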

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Pandas dataframe is available: song artist chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Write a function that will group all rows by the performer column (so that no duplicates are left), and all corresponding (unique) values to that column are written through the "," sign into a new hits column.

def chart_to_hits(df):
    # One row per performer; that performer's unique songs are joined with ","
    return (df.groupby('performer')['song']
              .agg(lambda s: ','.join(s.unique()))
              .reset_index(name='hits'))

Python

generate functionTue, 28 Mar 2023

There is a dataframe "df" containing data about texts in the following format: Category Message Length Words 0 ham Go until jurong point, crazy... Available only ... 111 [go, jurong, point, crazy, available, bugis, g... 1 ham Ok lar... Joking wif u oni... 29 [ok, lar, joking, wif, oni] The texts in it are in the df['Message'] variable. And the classification categories are in df['Category']. The programmer has written code which should contain all the source messages classified incorrectly (specifying the actual and predicted ones). However, this code does not work and generates an error: KeyError. Here is this code: misclassified = [] for index, label_predicted in enumerate(predicted): if label_predicted != y_test[index]: misclassified.append({'message': df.iloc[index]['Message'], 'actual': df.iloc[index]['Category'], 'predicted': label_predicted}) misclassification_df = pd.DataFrame(misclassified) Fix it!

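# Assumption: y_test is a pandas Series that kept df's index after the split,
# so y_test[index] is a label lookup and raises KeyError. Use positions instead.
misclassified = []
for pos, label_predicted in enumerate(predicted):
    actual = y_test.iloc[pos]
    if label_predicted != actual:
        row = df.loc[y_test.index[pos]]  # the source row for this test sample
        misclassified.append({'message': row['Message'], 'actual': actual, 'predicted': label_predicted})
misclassification_df = pd.DataFrame(misclassified)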

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 3819 Imagine Dragons Radioactive, Natural, Believer, Thunder, On To... 2012 11101 9304.0 2674 Ed Sheeran Thinking Out Loud, Photograph, Sing, Don't, On... 2014 10685 10039.0 9590 The Weeknd Starry Eyes, Save Your Tears, Acquainted, Blin... 2022 9429 6713.0 5812 Mariah Carey Without You/Never Forget You, Don't Forget Abo... 1994 8735 6616.0 5749 Madonna Like A Prayer, Erotica, Vogue, Lucky Star, Cau... 1989 8391 7534.0 Write a function divide_decade(df, decade) which, when a decade is specified, will perform the appropriate filter of the chart_debut column on the grouped Performer column. For example divide_decade(df, 1989) 5749 Madonna Like A Prayer, Erotica, Vogue, Lucky Star, Cau... 1989 8391 7534.0

def divide_decade(df, decade):
    # Keep the rows whose chart_debut year falls in the same decade
    decade_df = df[df['chart_debut'] // 10 == decade // 10]
    return decade_df

Python

generate functionTue, 13 Dec 2022

a list is given arr = [2, 1, 10, 5] Write a function with a while operator that sums all the numbers in the list with each other ((2+1)+(2+10)+(2+5)+(1+10)+(1+5)+(10+5) ) The result should be this: res = 54

arr = [2, 1, 10, 5]

def pair_sum(arr):  # renamed so the built-in sum() is not shadowed
    i = 0
    res = 0
    while i < len(arr):
        j = i + 1
        while j < len(arr):
            res += arr[i] + arr[j]
            j += 1
        i += 1
    return res

pair_sum(arr)

Python
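
As a cross-check on the nested while loops: each element is paired with each of the other n - 1 elements exactly once, so the total is (n - 1) * sum(arr). A small sketch comparing the two; the function names are hypothetical.

```python
def pair_sum_while(arr):
    # Add arr[i] + arr[j] for every pair with i < j
    i, res = 0, 0
    while i < len(arr):
        j = i + 1
        while j < len(arr):
            res += arr[i] + arr[j]
            j += 1
        i += 1
    return res

def pair_sum_closed(arr):
    # Every element appears in exactly len(arr) - 1 pairs
    return (len(arr) - 1) * sum(arr)

arr = [2, 1, 10, 5]
res = pair_sum_while(arr)
```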

generate functionThu, 23 Mar 2023

Write code in Python that downloads all the text file names at url: http://stup.pro/wp-content/uploads/2023/03/*.txt into one list of book_files

import requests from bs4 import BeautifulSoup def get_book_files(url): text_files = [] resp = requests.get(url) soup = BeautifulSoup(resp.text, 'lxml') for link in soup.find_all('a', href=True): if '.txt' in link['href']: text_files.append(link['href']) return text_files books = get_book_files('http://stup.pro/wp-content/uploads/2023/03/') print(books)

Python

generate functionFri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). Import necessary libraries. In function return specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers. import numpy as np from scipy import stats def seed(seed): np.random.seed(seed) n = np.random.choice(range(10,26)) mean = np.random.choice(range(120,141)) sig = np.random.choice(range(10,21)) conf = np.random.choice([0.90, 0.95, 0.98, 0.99, 0.999]) return n, mean, sig, conf def interval(n, mean, sig, conf): h = return print(interval(seed(12))) print(interval(seed(45))) print(interval(*seed(7))) The result of the function should be three values: 12, 28, 21

import numpy as np
from scipy import stats

def seed(seed):
    np.random.seed(seed)
    n = np.random.choice(range(10, 26))
    mean = np.random.choice(range(120, 141))
    sig = np.random.choice(range(10, 21))
    conf = np.random.choice([0.90, 0.95, 0.98, 0.99, 0.999])
    return n, mean, sig, conf

def interval(n, mean, sig, conf):
    # h is the half-width of the t-based interval; the length is twice that
    h = sig * stats.t.ppf((1 + conf) / 2, n - 1) / np.sqrt(n)
    return int(round(2 * h))

print(interval(*seed(12)))
print(interval(*seed(45)))
print(interval(*seed(7)))
# The task statement expects three values: 12, 28, 21

Python

generate functionFri, 17 Mar 2023

Determine the required sample size for the study if the following parameters are known: - error (delta); - variance (sigsqr); - Confidence level (conf). Round up the answer to a whole number.

import math
from scipy import stats

def get_sample_size(error, variance, confidence):
    # n = z^2 * variance / error^2, where z is the two-sided normal quantile
    z = stats.norm.ppf((1 + confidence) / 2)
    return math.ceil(z ** 2 * variance / error ** 2)

get_sample_size(0.02, 0.05, 0.95)

Python

generate functionWed, 18 Jan 2023

Fix this function so that when it finds a value in a Pandas dataframe string, it returns its key: def geo_class(row): geo_data = {'center': ['Moscow', 'Tula', 'Yaroslavl'], 'Northwest': ['petersburg', 'pskov', 'murmansk'], 'Far East': ['vladivostok', 'sakhalin', 'khabarovsk']} for key in geo_data.keys(): for value in geo_data.values(): if value in row: return key return 'undefined' df2['region'] = df2['keyword'].apply(geo_class)

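def geo_class(row):
    geo_data = {'center': ['Moscow', 'Tula', 'Yaroslavl'],
                'Northwest': ['petersburg', 'pskov', 'murmansk'],
                'Far East': ['vladivostok', 'sakhalin', 'khabarovsk']}
    # Pair each region with its own city list and test one city at a time;
    # the comparison ignores case because the lists mix capitalisation
    for key, cities in geo_data.items():
        for city in cities:
            if city.lower() in row.lower():
                return key
    return 'undefined'

df2['region'] = df2['keyword'].apply(geo_class)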

Python

generate functionTue, 13 Dec 2022

Given an array of numbers, check if any of the numbers are the character codes for lower case vowels (a, e, i, o, u). If they are, change the array value to a string of that vowel. Return the resulting array. Example: Input: inp = [118, "u",120,121,"u",98,122,"a",120,106,104,116,113,114,113,120,106 ] Output: [118,117,120,121,117,98,122,97,120,106,104,116,113,114,113,120,106 ]

def replace_vowels(vowels): vowel_codes = [97, 101, 105, 111, 117] # a, e, i, o, u result = [] for vowel in vowels: if isinstance(vowel, str): result.append(vowel) elif vowel in vowel_codes: result.append(chr(vowel)) else: result.append(vowel) return result inp = [118, "u",120,121,"u",98,122,"a",120,106,104,116,113,114,113,120,106 ] print(replace_vowels(inp))

Python

generate functionThu, 16 Feb 2023

Write a function that finds all object, string values in a Pandas dataframe column 'values'

def object_finder(row): if type(row['values']) == str or type(row['values']) == list: return row['values'] else: return None df['object'] = df.apply(object_finder, axis=1)

Python

generate functionSat, 04 Mar 2023

Change the code of this graph so that the x-axis reference scale (x-limit) starts at 6 top20 = df19[['Country or region', 'Score']].sort_values(by='Score', ascending=False).head(20) sns.barplot(data=top20, x='Score', y='Country or region', color='#5ed14f')

sns.barplot(data=top20, x='Score', y='Country or region', color='#5ed14f')
plt.xlim(left=6)  # start the x-axis at 6; the upper limit stays automatic

Python

generate functionSun, 25 Dec 2022

My washing machine uses water amount of water to wash load (in JavaScript and Python) or max_load (in Ruby) amount of clothes. You are given a clothes amount of clothes to wash. For each single item of clothes above the load, the washing machine will use 10% more water (multiplicative) to clean. For example, if the load is 10, the amount of water it requires is 5 and the amount of clothes to wash is 14, then you need 5 * 1.1 ^ (14 - 10) amount of water. Write a function howMuchWater (JS)/how_much_water (Python and Ruby) to work out how much water is needed if you have a clothes amount of clothes. The function will accept 3 arguments: - water, load (or max_loadin Ruby) and clothes.

# Water for a full load: 5 litres
# Each item above the load multiplies the water by 1.1
# 5 * 1.1 ** (14 - 10) = 5 * 1.4641 = 7.3205
def how_much_water(water, load, clothes):
    return (1.1 ** (clothes - load)) * water

print(how_much_water(5, 10, 14))

Python

generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: year date route operator group_name district injury_result incident_type victim_category victim_age 0 2015 01.01.2015 1 London General Go-Ahead Southwark 1 Onboard Injuries 7 10 1 2015 01.01.2015 4 Metroline Metroline Islington 1 Onboard Injuries 7 2 2 2015 01.01.2015 5 East London Stagecoach Havering 7 Onboard Injuries 7 8 3 2015 01.01.2015 5 East London Stagecoach None London Borough 7 Onboard Injuries 7 8 4 2015 01.01.2015 6 Metroline Metroline Westminster 5 Onboard Injuries 10 8 generate a unique id from 1 to 23158 for each line of the pandas dataframe

def generate_unique_id(dataframe):
    # Number the rows 1..len(dataframe) in a new 'id' column
    dataframe['id'] = range(1, len(dataframe) + 1)
    return dataframe

generate_unique_id(df_new)

Python

generate functionMon, 13 Feb 2023

Modify the formula below so that it rounds to the nearest integer, but not less than 1 and not more than 4: df2['respiratory_rate'] = df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median'))

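# Fill gaps with the per-pulse median, round to the nearest integer,
# then clamp the result to the range 1..4 (remaining NaN values stay NaN)
df2['respiratory_rate'] = (
    df2['respiratory_rate']
    .fillna(df2.groupby('pulse')['respiratory_rate'].transform('median'))
    .round()
    .clip(lower=1, upper=4)
)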

Python

generate functionWed, 01 Feb 2023

There is a Pandas function: df[['performer', 'time_on_chart']].groupby('performer').agg({'min', 'max'}) Sort it by 'max' column

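def sort_by_max(df):
    # Passing a list to agg yields flat 'min' and 'max' columns to sort on
    result = df.groupby('performer')['time_on_chart'].agg(['min', 'max'])
    return result.sort_values(by='max', ascending=True)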

Python

generate functionThu, 12 Jan 2023

There is a table: userId movieId rating timestamp title genres 0 1 296 5.0 1147880044 Pulp Fiction (1994) Comedy|Crime|Drama|Thriller 1 1 306 3.5 1147868817 Three Colors: Red (Trois couleurs: Rouge) (1994) Drama 2 1 307 5.0 1147868828 Three Colors: Blue (Trois couleurs: Bleu) (1993) Drama 3 1 665 5.0 1147878820 Underground (1995) Comedy|Drama|War 4 1 899 3.5 1147868510 Singin' in the Rain (1952) Comedy|Musical|Romance Write code for Pandas that shows the movie with the most rows with a value of 5.0 in the rating column

def most_5(data):
    # Keep only the 5.0 ratings, count rows per title, return the top one
    return data[data['rating'] == 5.0]['title'].value_counts().head(1)

most_5(df)

Python

generate functionMon, 12 Dec 2022

Create a program that will take two lists of integers, a and b. Each list will consist of 3 positive integers above 0, representing the dimensions of cuboids a and b. You must find the difference of the cuboids' volumes regardless of which is bigger. For example, if the parameters passed are ([2, 2, 3], [5, 4, 1]), the volume of a is 12 and the volume of b is 20. Therefore, the function should return 8. Write solution in one line of code.

from functools import reduce  # reduce is not a built-in in Python 3

def find_difference(a, b): return abs(reduce(lambda x, y: x * y, a) - reduce(lambda x, y: x * y, b))

Python
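
On Python 3.8 or newer, math.prod gives a shorter one-line body than reduce. A sketch under that version assumption:

```python
from math import prod

def find_difference(a, b): return abs(prod(a) - prod(b))

# Example from the task: volumes are 12 and 20
difference = find_difference([2, 2, 3], [5, 4, 1])
```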

generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: year date route operator group_name bus_garage district injury_result incident_type victim_category victim_sex victim_age 0 2015 01.01.2015 1 London General Go-Ahead Garage Not Available Southwark Injuries treated on scene Onboard Injuries Passenger Male Child 1 2015 01.01.2015 4 Metroline Metroline Garage Not Available Islington Injuries treated on scene Onboard Injuries Passenger Male Unknown 2 2015 01.01.2015 5 East London Stagecoach Garage Not Available Havering Taken to Hospital – Reported Serious Injury or... Onboard Injuries Passenger Male Elderly Calculate the average number of dates for all the operators

def avg_date_by_operator(df_new):
    # One reading of the task: count the distinct dates per operator,
    # then average those counts across operators
    return df_new.groupby("operator")["date"].nunique().mean()

df_2015 = pd.read_csv("bus_trucks_2015.csv")
avg_date_by_operator(df_2015)

Python

generate functionThu, 16 Feb 2023

There is a Pandas Series: Day Tuesday 358114 Wednesday 345393 Thursday 323337 Friday 293805 Saturday 292016 Monday 278905 Sunday 273823 construct a horizontal bar graph with these data

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'Day': ['Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Monday', 'Sunday'],
                   'Value': [358114, 345393, 323337, 293805, 292016, 278905, 273823]})
# Use the day names, not the row numbers, as bar labels
df.plot.barh(x='Day', y='Value')
plt.show()

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Filter this dataframe and leave only the rows where the values in the name column are duplicated. Sort the name values in ascending order

# keep=False marks every occurrence of a duplicated name; then sort by name
df[df.duplicated(subset=['name'], keep=False)].sort_values('name')

Python

generate functionMon, 06 Feb 2023

There is a df Pandas dataframe: date av_temp deviations country year decade 577457 2013-05-01 19.059 1.022 Zimbabwe 2013 2010-2020 577458 2013-06-01 17.613 0.473 Zimbabwe 2013 2010-2020 577459 2013-07-01 17.000 0.453 Zimbabwe 2013 2010-2020 577460 2013-08-01 19.759 0.717 Zimbabwe 2013 2010-2020 Write a function that: 1) calculates the average temperature in countries (average av_temp by decade column grouped by country) 2) build a list of the 20 coldest countries in ascending av_temp order

import pandas as pd

# Average temperature per country
def average_temp_of_country(df):
    return df.groupby('country')['av_temp'].mean()

# The 20 coldest countries in ascending av_temp order
def coldest_20_countries(df):
    return df.groupby('country')['av_temp'].mean().sort_values()[:20]

Python

generate functionMon, 27 Mar 2023

The code below gives an error TypeError: unhashable type: 'list'. Fix it from gensim import similarities cos_sim = similarities.MatrixSimilarity(tfidf[bows])

from gensim import similarities cos_sim = similarities.MatrixSimilarity(tfidf[bows])

Python

generate functionFri, 17 Mar 2023

Determine the sample size required to study the average coffee bill in city N, given the following parameters: error (delta); variance (sigsqr); confidence level (conf). Round the answer to a whole number.

from math import ceil
from scipy import stats

def get_sample_size(delta, conf, sigsqr):
    # z is the two-sided normal quantile for the requested confidence level
    z = stats.norm.ppf((1 + conf) / 2)
    return ceil(sigsqr * z ** 2 / delta ** 2)

get_sample_size(100, 0.95, 2000)

Python

generate functionMon, 27 Mar 2023

This code removes stopwords in lists incorrectly: without_sw = [[word for word in text if word not in stopwords_set] for text in only_words_text] It turns out like this: [[g], [u, n, l], [j, u, r, n, g], [p, n, ] It should be like this: [go, until, jurong, point Correct it.

stopwords_set = set(stopwords.words('english')) only_words_text = [[word for word in word_tokenize(text.lower()) if word not in stopwords_set] for text in df_message["text"]]

Python

generate functionFri, 23 Dec 2022

Write a simple regex to validate a username. Allowed characters are: lowercase letters, numbers, underscore Length should be between 4 and 16 characters (both included).

import re

# The whole string must be 4 to 16 lowercase letters, digits, or underscores
my_regex = re.compile(r"^[a-z0-9_]{4,16}$")

Python
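
A quick way to exercise the username rule is to wrap the pattern in a helper and use fullmatch, which requires the entire string to match. This is a sketch; the names USERNAME_RE and is_valid_username are hypothetical.

```python
import re

# One character class covers all allowed characters; {4,16} bounds the length
USERNAME_RE = re.compile(r"[a-z0-9_]{4,16}")

def is_valid_username(name):
    # fullmatch rejects strings with any extra leading or trailing characters
    return USERNAME_RE.fullmatch(name) is not None

checks = {name: is_valid_username(name)
          for name in ["abcd", "abc", "user_01", "User_01", "user-01", "a" * 17]}
```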

generate functionMon, 27 Mar 2023

The code below gives a ValueError: Input variables with an inconsistent number of samples were found: [8082, 5572]. Fix it! from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(tfidf, cats, test_size=0.3, random_state=42)

from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(tfidf, cats, test_size=0.3, random_state=42)

Python

generate functionWed, 15 Mar 2023

There is the "sl" list and there is the "sw" list. Write code that will create a "data" dataframe with the values from "sl" in the first column and "sw" in the second column. The values in the lists are contained in the following format: [[-0.90068117] [-1.14301691] [-1.38535265] [-1.50652052] [-1.02184904] [-0.53717756] [-1.50652052] [-1.02184904] [-1.74885626] [-1.14301691]]

import numpy as np
import pandas as pd

def dataframe(sl, sw):
    # Flatten the single-element rows so each cell holds a number, not a list
    data = {'sl': np.ravel(sl), 'sw': np.ravel(sw)}
    return pd.DataFrame(data)

sl = [[-0.90068117], [-1.14301691], [-1.38535265], [-1.50652052], [-1.02184904], [-0.53717756], [-1.50652052], [-1.02184904], [-1.74885626], [-1.14301691]]
sw = [[3.5], [2.5], [2.4], [1.5], [3.5], [2.2], [2.1], [1.5], [1.1], [1.3]]
dataframe(sl, sw)

Python

generate functionSat, 25 Feb 2023

x1 = (round(g1['rate_group'].value_counts() / g1['rate_group'].value_counts().sum() * 100, 2)) x2 = (round(g2['rate_group'].value_counts() / g2['rate_group'].value_counts().sum() * 100, 2)) x1.plot(kind='hist', alpha=0.5, bins=6, density=True) x2.plot(kind='hist', alpha=0.5, bins=6, density=True) plt.legend(loc='upper left') plt.title('Comparison of distributions with and without own housing') Add legend labels to the chart in the code above: - for x1 = 'x1' - for x2 = 'x2'

# label= gives each histogram its legend entry; call legend() after plotting
x1.plot(kind='hist', alpha=0.5, bins=6, density=True, label='x1')
x2.plot(kind='hist', alpha=0.5, bins=6, density=True, label='x2')
plt.legend(loc='upper left')
plt.title('Comparison of distributions with and without own housing')

Python

generate functionThu, 16 Feb 2023

There is a Pandas Series column: 'loan_amnt' 10 78 54 GOOD 64 23 There is a Pandas dataframe that has column 'loan_amnt' with both numbers and letters and words. Write a function that checks the rows in column 'loan_amnt' for numbers and letters. If the string has numbers it is converted to float64 If the string is not a number, it is replaced by np.nan. Example: replace_non_numbers(df, 'loan_amnt') df Result: 10 78 54 NaN 64 23

def replace_non_numbers(df, column): df[column] = pd.to_numeric(df[column], errors = 'coerce')

Python

generate functionSat, 28 Jan 2023

There is a Pandas dataframe: date city hotel total_rooms ocup_rooms revenue oper_costs adv_costs profit 0 2021-07-01 Yalta Rapsodia 33 24 78936 25641 19305 33990 1 2021-07-01 Yalta AquaMania 28 14 51156 21756 16380 13020 2 2021-07-01 Yalta Skyline 15 12 38016 11655 8775 17586 3 2021-07-01 Alushta Breeze 36 25 75475 27972 21060 26443 4 2021-07-01 Alushta Moreon 20 17 51091 15540 11700 23851 5 2021-07-01 Gurzuf Alpina 14 12 38736 10878 8190 19668 Write a function divide_hotels that creates new columns big_hotels, medium_hotels, small_hotels, and adds values according to the following conditions: 1) if df['total_rooms'] > 30, then profit value is substituted into big_hotels column 2) if df['total_rooms'] > 20, then the profit value is substituted in the medium_hotels column 3) if df['total_rooms'] > 10, then profit value is substituted for small_hotels column

import pandas as pd

def divide_hotels(df):
    rooms = df['total_rooms']
    # Each hotel's profit lands in exactly one size column; the others stay NaN
    df['big_hotels'] = df['profit'].where(rooms > 30)
    df['medium_hotels'] = df['profit'].where((rooms > 20) & (rooms <= 30))
    df['small_hotels'] = df['profit'].where((rooms > 10) & (rooms <= 20))
    return df

Python

generate functionMon, 13 Feb 2023

Modify the formula below so that it rounds to the nearest integer: df2['respiratory_rate'] = df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median'))

# .round() rounds the filled column to the nearest integer
df2['respiratory_rate'] = df2['respiratory_rate'].fillna(
    df2.groupby('pulse')['respiratory_rate'].transform('median')).round()

Python

generate functionMon, 13 Feb 2023

There is df2 Pandas dataframe: surgery age rectal_temp pulse respiratory_rate extremities_temp pain outcome 4 2.0 1 NaN NaN NaN 2.0 2.0 1.0 6 1.0 1 NaN 60.0 NaN 3.0 NaN 2.0 7 2.0 1 NaN 80.0 36.0 3.0 4.0 3.0 15 1.0 9 NaN 128.0 36.0 3.0 4.0 2.0 27 1.0 1 NaN NaN NaN NaN NaN 2.0 33 1.0 1 NaN 100.0 30.0 3.0 5.0 1.0 34 2.0 1 NaN 104.0 24.0 4.0 4.0 3.0 39 1.0 1 NaN 88.0 NaN 3.0 5.0 2.0 Write a function that: 1) takes as a parameter the column name (default = 'rectal_temp') 2) for each NaN line in this column, find a non-NaN line which has similar values in the pulse or respiratory_rate columns (the difference range is 20%) 3) replace NaN with values from the corresponding rows

def my_f(col_name, df): # get the rows for which the column is NaN df = df[ np.isnan(df[col_name]) ] # find the rows that have similar pulse df_pulse = df[ abs( (df.pulse - df.pulse.shift(1)) / df.pulse ) <= 0.2 ] # find the rows that have similar respiratory_rate df_respiratory_rate = df[ abs( (df.respiratory_rate - df.respiratory_rate.shift(1)) / df.respiratory_rate ) <= 0.2 ] # merge the dataframes df_merged = pd.concat( [df_pulse, df_respiratory_rate] ) # get rid of duplicates df_merged = df_merged.drop_duplicates() # return the rows return df_merged

Python

generate functionMon, 26 Dec 2022

There are two lists. Write a function that looks for matching values of these two lists and replaces them with replacing_num (but without touching the first one) stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ] replacing_num = '9090'

def replace(item, list1, replacing_number):
    seen = False
    for row in list1:
        for j, value in enumerate(row):
            if value == item:
                # Leave the first match untouched, replace every later one
                if seen:
                    row[j] = replacing_number
                seen = True

replace('4004', students, '9090')

Python

generate functionMon, 06 Feb 2023

There is a df Pandas dataframe: date av_temp deviations country year decade 577457 2013-05-01 19.059 1.022 Zimbabwe 2013 2010-2020 577458 2013-06-01 17.613 0.473 Zimbabwe 2013 2010-2020 577459 2013-07-01 17.000 0.453 Zimbabwe 2013 2010-2020 577460 2013-08-01 19.759 0.717 Zimbabwe 2013 2010-2020 Write a function that: 1) leave only the rows in the year column with a value higher than 1980 2) calculates the average temperature of the countries (the average of the av_temp column) 3) builds a list of the 20 coldest countries in ascending av_temp order

def filter_coldest(df, year_threshold):
    # Filter by year, average av_temp per country, keep the 20 coldest
    return (df[df['year'] > year_threshold]
            .groupby('country')['av_temp'].mean()
            .sort_values()[:20])

filter_coldest(df, 1980)

Python

generate functionFri, 17 Mar 2023

There are two lists with numbers: sync = [85.1 83.8 69.9 82.1 84.4 80.4 78.1 88.4 77. 91.5 76.7 86.6 91.8 73.3 83.9 76.7 85.8 89.6 91.7 87.2 79. 85.3] asyncr = [89.8 81.6 87.4 81. 66.9 72.5 78.4 68.5 78.3 62.6 73.7 77.7 63. 77.5] Write code that will calculate the similarity statistics of these samples, p-value and display them

from scipy import stats
import numpy as np

sync = [85.1, 83.8, 69.9, 82.1, 84.4, 80.4, 78.1, 88.4, 77., 91.5, 76.7, 86.6, 91.8, 73.3, 83.9, 76.7, 85.8, 89.6, 91.7, 87.2, 79., 85.3]
asyncr = [89.8, 81.6, 87.4, 81., 66.9, 72.5, 78.4, 68.5, 78.3, 62.6, 73.7, 77.7, 63., 77.5]

# Welch's t-test, which does not assume equal variances
statistic, p_value = stats.ttest_ind(sync, asyncr, equal_var=False)
print(statistic, p_value)
print(np.var(sync), np.var(asyncr))

Python

generate functionThu, 16 Feb 2023

There is a Pandas Series column: 'loan_amnt' 10 78 54 GOOD 64 23 Write a function that finds all rows with letters in the column and replace them with NaN. Example: find_non_numbers(df, 'loan_amnt') df Result: 10 78 54 NaN 64 23

def find_non_numbers(df, column):
    # Non-numeric entries such as 'GOOD' become NaN; numbers are kept
    df[column] = pd.to_numeric(df[column], errors='coerce')
    return df

Python

generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 Calculate the average lifetime of users who gave more than 100 ratings. The lifetime is the difference between the maximum and minimum values of the timestamp column for a given userId.

df = pd.read_csv('ratings.csv') df.head() df.columns df.groupby('userId').size() df.groupby('userId').size().index df.groupby('userId').size().values #df.groupby('userId').size().values > 100 df.groupby('userId').size().values[df.groupby('userId').size().values > 100] df.groupby('userId').size().index[df.groupby('userId').size().values > 100] df[df['userId'] == 1] df[df['userId'] == 1].timestamp df[df['userId'] == 1].timestamp.diff() df[df['userId'] == 1].timestamp.diff().min() df[df['userId'] == 1].timestamp.diff().max() df[df['userId'] == 1].timestamp.diff().max() - df[df['userId'] == 1].timestamp.diff().min() def get_lif

Python
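
The generated answer above breaks off before the function is defined. One way the calculation could be written, as a sketch: the function name, the min_ratings parameter, and the tiny sample frame are all hypothetical, and the threshold is lowered to 2 only so the sample data exercises the filter.

```python
import pandas as pd

def average_lifetime(df, min_ratings=100):
    # Per user: number of ratings, first timestamp, last timestamp
    per_user = df.groupby('userId')['timestamp'].agg(['count', 'min', 'max'])
    # Keep users with more than min_ratings ratings
    active = per_user[per_user['count'] > min_ratings]
    # Lifetime is last minus first timestamp; average it over the kept users
    return (active['max'] - active['min']).mean()

# Made-up data: user 1 has three ratings, user 2 has two
ratings = pd.DataFrame({
    'userId': [1, 1, 1, 2, 2],
    'timestamp': [100, 160, 130, 500, 900],
})
result = average_lifetime(ratings, min_ratings=2)
```

With the real data the call would be average_lifetime(df), which keeps the default threshold of 100.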

generate functionMon, 06 Feb 2023

There is a df Pandas dataframe: date av_temp deviations country 1 1743-12-01 NaN NaN Åland 2 1744-01-01 NaN NaN Åland 3 1744-02-01 NaN NaN Åland 4 1744-03-01 NaN NaN Åland 9 1744-08-01 NaN NaN Åland Create a new column year from date in which you convert all date values to years. Example: 1744-01-01 Date column: 1744

df = pd.DataFrame({'date': ['1743-12-01', '1744-01-01', '1744-02-01', '1744-03-01', '1744-08-01'],
                   'av_temp': [0, 10, 20, 30, 40],
                   'deviations': [0, 10, 20, 30, 40],
                   'country': ['Åland', 'Åland', 'Åland', 'Åland', 'Åland']})
# Parse the dates and keep only the year in a new column
df['year'] = pd.to_datetime(df['date']).dt.year

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order

# Rows whose name occurs more than once: the first occurrence and all later ones
duplicates = df[df.duplicated(subset='name', keep=False)]
# Sort the new dataframe by name in ascending order
duplicates = duplicates.sort_values('name')

Python

generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: id route operator group_name district injury_result incident_type victim_category victim_age 2 3.0 1 London General Go-Ahead Southwark NaN Onboard Injuries 7 10 3 4.0 4 Metroline Metroline Islington NaN Onboard Injuries 7 2 4 5.0 5 East London Stagecoach Havering NaN Onboard Injuries 7 8 5 6.0 5 East London Stagecoach None London Borough NaN Onboard Injuries 7 8 Count which routes had the most incidents_type

from collections import Counter

def most_incident(df, column):
    # Count how many incident rows each route has
    incident_id = Counter(df[column])
    # Return the route with the highest count
    return incident_id.most_common(1)[0][0]

most_incident(df_new, 'route')

Python

generate functionWed, 22 Mar 2023

There are two variables p.speed and boosted_p.speed. Plot two boxplots with their values next to each other on the same graph

import matplotlib.pyplot as plt

def plot_speed(speed, boosted_speed):
    # One axes, two boxes side by side
    fig, ax = plt.subplots()
    ax.boxplot([speed, boosted_speed])
    ax.set_xticklabels(['p.speed', 'boosted_p.speed'])
    plt.show()

plot_speed(p.speed, boosted_p.speed)

Python

generate functionFri, 17 Mar 2023

Create a Python function calculating the confidence interval for the mean value if you know - the sample size (n); - the mean value of the sample (mean); - standard deviation (sig); - level of confidence (conf). Import necessary libraries. In function return specify the length of the resulting confidence interval (the difference between the right and left ends), rounded to integers. def interval(n, mean, sig, conf): h = return The result of the function should be three values: 12, 28, 21

# Import necessary libraries
from math import sqrt
from scipy import stats

def interval(n, mean, sig, conf):
    # Half-width from the t quantile; the interval length is twice that
    h = sig * stats.t.ppf((1 + conf) / 2, n - 1) / sqrt(n)
    return int(round(2 * h))

# Illustrative calls only; the values 12, 28, 21 named in the task depend on inputs not given here
interval(100, 100, 10, 0.95)
interval(100, 100, 10, 0.99)
interval(100, 100, 10, 0.995)

Python

generate functionFri, 03 Feb 2023

Paste the id for the pandas dataframe

def get_id(df): return df.id get_id(df)

Python

generate functionMon, 13 Feb 2023

There is Pandas dataframe: surgery age rectal_temp pulse respiratory_rate extremities_temp pain outcome 0 38.50 66 3 3 ? 2 ? 0 1 39.2 88 ? ? 4 1 ? 0 2 38.30 40 1 1 3 1 ? 0 3 39.10 164 4 1 6 2 1 0 4 37.30 104 ? ? 6 2 ? 0 Write a function that will change the parameters of all columns to int64

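def to_int64(df):
    for col in df.columns:
        # '?' placeholders become missing values and fractions are rounded first.
        # Nullable 'Int64' is used because plain int64 cannot hold missing values.
        df[col] = pd.to_numeric(df[col], errors='coerce').round().astype('Int64')
    return df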

Python

generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: id route operator group_name district injury_result incident_type victim_category victim_age 2 3.0 1 London General Go-Ahead Southwark NaN Onboard Injuries 7 10 3 4.0 4 Metroline Metroline Islington NaN Onboard Injuries 7 2 4 5.0 5 East London Stagecoach Havering NaN Onboard Injuries 7 8 5 6.0 5 East London Stagecoach None London Borough NaN Onboard Injuries 7 8 Count which routes had the most incidents_type. Sort them in descending order by incident_type column

def get_most_incidents_route(df):
    # Count incident_type entries per route, highest count first
    counts = df.groupby('route')['incident_type'].count()
    return counts.sort_values(ascending=False)

get_most_incidents_route(df_new)

Python

generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singin' in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 Seventh Seal, The (Sjunde inseglet, Det) (1957) 5.0 9 Bridge on the River Kwai, The (1957) 4.0 in this dataframe the names of films are not unique, because the same film may have been given different ratings Your task is to write a function that calculates the average rating of each movie and removes non-unique strings

import pandas as pd
data = pd.DataFrame({
    'id': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'title': ['Pulp Fiction (1994)', 'Three Colors: Red (Trois couleurs: Rouge) (1994)', 'Three Colors: Blue (Trois couleurs: Bleu) (1993)', 'Underground (1995)', 'Singin\' in the Rain (1952)', 'Dirty Dancing (1987)', 'Delicatessen (1991)', 'Ran (1985)', 'Seventh Seal, The (Sjunde inseglet, Det) (1957)', 'Bridge on the River Kwai, The (1957)'],
    'rating': [5.0, 3.5, 5.0, 5.0, 3.5, 4.0, 3.5, 3.5, 5.0, 4.0]
})
def film_rating(df):
    # grouping by title leaves one row per film, holding its mean rating
    return df.groupby('title')['rating'].mean().reset_index()

Python

generate functionSun, 18 Dec 2022

Rewrite the code with a command that removes unnecessary quotation marks (' ') in the dictionary: Here's the problem: Output: {' "user_id"': ' "category"}', ' "1840e0b9d4"': ' "Products"}', How it should be: Output: {"user_id": "category"}, "1840e0b9d4": "Products"}, The code: f = open('purchase_log.txt', encoding='utf-8') import re import json purchases = {} for i, line in enumerate(f): line = re.split(r",|:", line.strip()) keys = line[1] values = line[3] purchases[keys] = values

import re
f = open('purchase_log.txt', encoding='utf-8')
purchases = {}
for i, line in enumerate(f):
    line = re.split(r",|:", line.strip())
    # strip the surrounding spaces, quotes and braces left over from the split
    keys = line[1].strip(' "{}')
    values = line[3].strip(' "{}')
    purchases[keys] = values
f.close()

Python

generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 Add an "av_ltv" column to the dataframe, in which you calculate the average lifetime of the users. Lifetime refers to the difference between the maximum and minimum values of the timestamp column for a given userId value.

# transform keeps the original row index, so each row receives its own user's lifetime
df['av_ltv'] = df.groupby('userId')['timestamp'].transform(lambda x: x.max() - x.min())

Python

generate functionThu, 22 Dec 2022

Given a non-negative integer n, write a function to_binary/ToBinary which returns that number in a binary format. to_binary(1) # should return 1 to_binary(5) # should return 101 to_binary(11) # should return 1011

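def to_binary(n):
    # bin() returns a string such as '0b101'; drop the prefix and return a number, as in the examples
    return int(bin(n)[2:])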

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order Example: name Ksenia Rodionova Artur Petrov Ivan Sidorov Ksenia Rodionova Result: name Ksenia Rodionova Ksenia Rodionova

# keep=False marks every occurrence of a repeated name, including the first one
duplicates = data[data.name.duplicated(keep=False)].sort_values(by='name')
duplicates

Python

generate functionWed, 01 Feb 2023

There is a Python and Pandas function. def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy Add it to output all 66 lines of the dataframe

def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy
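The function above is returned unchanged, so the request to show all 66 rows is not addressed. One possible approach, sketched here under assumptions, is to render the result with `DataFrame.to_string()`, which does not truncate rows by default. The helper name `render_all_rows` and the `sample` frame are hypothetical and only stand in for the output of `year_leaders(dfp)`.

```python
import pandas as pd

def render_all_rows(df):
    # to_string() renders every row instead of the shortened default repr
    return df.to_string()

# Hypothetical stand-in for the 66-row result of year_leaders(dfp)
sample = pd.DataFrame({'num_of_hits': range(66)})
print(render_all_rows(sample))
```

In a notebook, setting `pd.set_option('display.max_rows', None)` before displaying the frame is another option with a similar effect.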

Python

generate functionWed, 22 Feb 2023

Change the code below so that the columns have values on the x-axis question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='Which working environment do you prefer most?') fig.show()

def y_axis(question6): question6 = df[question6].value_counts() label = question6.index counts = question6.values return counts def x_axis(question6): question6 = df[question6].value_counts() label = question6.index counts = question6.values return label def plot(question6): fig = px.bar(x=y_axis(question6), y=x_axis(question6), orientation='h') fig.update_layout(title_text=question6) fig.show() plot('What is the most preferred working environment for you.')

Python

generate functionTue, 21 Mar 2023

Write code that draws two histograms A and B superimposed on each other with bins=12 and legend

import matplotlib.pyplot as plt
def draw_histogram(a, b, legend):
    # alpha below 1 keeps both histograms visible where they overlap
    plt.hist(a, bins=12, alpha=0.5, label=legend[0])
    plt.hist(b, bins=12, alpha=0.5, label=legend[1])
    plt.title("Histograms of A and B")
    plt.xlabel("x")
    plt.ylabel("Frequency")
    plt.legend()
    plt.show()
A = [1,2,3,4,5,6,7,8,9,10,11,12]
B = [1,2,3,4,5,3,3,2,2,2,2,1]
legend = ["A", "B"]
draw_histogram(A, B, legend)

Python

generate functionThu, 22 Dec 2022

Return a new array consisting of elements which are multiple of their own index in input array (length > 1). Some cases: [22, -6, 32, 82, 9, 25] => [-6, 32, 25] [68, -1, 1, -7, 10, 10] => [-1, 10] [-56,-85,72,-26,-14,76,-27,72,35,-21,-67,87,0,21,59,27,-92,68] => [-85, 72, 0, 68] [22] should equal [-6, 32, 25] [68] should equal [-1, 10] [11] should equal [-11] [-56] should equal [-85, 72, 0, 68] [28] should equal [38, -44, -99] [-1] should equal [-49, 8, -60, 35]

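def multiple_of_index(arr):
    # index 0 is skipped, matching the examples (and avoiding division by zero)
    return [x for i, x in enumerate(arr) if i > 0 and x % i == 0]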

Python

generate functionWed, 01 Feb 2023

There is a function for Pandas dataframe: dfp = df.copy() dfp.sort_values(by='performer', inplace=True) def group_by_performer(data): dfp['hits'] = dfp.groupby('performer')['song'].transform(lambda x: ', '.join(x.unique())) dfp.drop_duplicates(subset='performer', inplace=True) dfp.reset_index(inplace=True) return dfp group_by_performer(dfp) Add to it the function of sum of all values in the time_on_chart and consecutive_weeks columns if the values in the performer column are the same. The total should be displayed in the same columns (time_on_chart and consecutive_weeks)

dfp = df.copy() dfp.sort_values(by='performer', inplace=True) def group_by_performer(data): dfp['hits'] = dfp.groupby('performer')['song'].transform(lambda x: ', '.join(x.unique())) dfp['time_on_chart'] = dfp.groupby('performer')['time_on_chart'].transform(lambda x: x.sum()) dfp['consecutive_weeks'] = dfp.groupby('performer')['consecutive_weeks'].transform(lambda x: x.sum()) dfp.drop_duplicates(subset='performer', inplace=True) dfp.reset_index(inplace=True) return dfp group_by_performer(dfp)

Python

generate functionWed, 01 Feb 2023

There is a Python and Pandas function. def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy year_leaders(dfp) Modify it to output all 66 lines of the dataframe

def year_leaders(dfp, year): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy.loc[year] year_leaders(dfp, '1956')

Python

generate functionWed, 18 Jan 2023

Fix this function so that when it finds a value in a Pandas dataframe string, it returns its key: def geo_class(row): geo_data = {'center': ['Moscow', 'Tula', 'Yaroslavl'], 'Northwest': ['petersburg', 'pskov', 'murmansk'], 'Far East': ['vladivostok', 'sakhalin', 'khabarovsk']} for key in geo_data.keys(): for value in geo_data.values(): if value in row: return key return 'undefined'

def geo_class(row): geo_data = {'center': ['Moscow', 'Tula', 'Yaroslavl'], 'Northwest': ['petersburg', 'pskov', 'murmansk'], 'Far East': ['vladivostok', 'sakhalin', 'khabarovsk']} for key in geo_data.keys(): for value in geo_data[key]: if value in row: return key return 'undefined'

Python

generate functionFri, 03 Feb 2023

generate a unique id for each line of the pandas dataframe

import pandas as pd def get_line_id(row): return row.name df['line_id'] = df.apply(lambda row: get_line_id(row), axis=1)

Python

generate functionWed, 15 Mar 2023

There is the "sl" list and there is the "sw" list. Write code that will create a scaled_data dataframe with the values from "sl" in the first column and "sw" in the second column. The values in the lists are contained in the following format: [[-0.90068117] [-1.14301691] [-1.38535265] [-1.50652052] [-1.02184904] [-0.53717756] [-1.50652052] [-1.02184904] [-1.74885626] [-1.14301691]]

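import numpy as np
import pandas as pd
def create_dataframe(sl, sw):
    # each list holds one-element rows ([[x], [y], ...]), so flatten before building the columns
    scaled_data = pd.DataFrame({"sl": np.ravel(sl), "sw": np.ravel(sw)})
    return scaled_data
create_dataframe(sl, sw)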

Python

generate functionTue, 14 Mar 2023

Change the code of the graph so that only integers are displayed on the x-scale and there is a grid of integers on the whole graph plt.plot(ks, k_inertia) plt.plot(ks, k_inertia ,'ro') plt.xlabel('Number of clusters') plt.ylabel('Inertia')

plt.xticks(ks)  # one tick per cluster count, assuming ks holds integers
plt.grid(True)  # grid lines at every tick

Python

generate functionWed, 08 Feb 2023

def get_needed_posts(query): for article in articles: try: title = article.find('h2', class_='tm-article').text date = article.find('span', class_='tm-article').text.strip() link = article.find('h2', class_='tm-article').find('a').get('href') except: pass row = {'date': date, 'title': title, 'link': 'https://habr.com'+link} site = pd.concat([site, pd.DataFrame([row])]) return site.reset_index(drop=True) Modify the Python function above so that if it finds a repeated title or link, it will not add it to the return

import pandas as pd
import requests
from bs4 import BeautifulSoup
def get_needed_posts(query):
    site = pd.DataFrame(columns=['date', 'title', 'link'])
    articles = BeautifulSoup(requests.get(f'https://habr.com/ru/search/?target_type=posts&q={query}').text, 'lxml').find('div', class_='content-list').find_all('article', class_='post post_preview')
    for article in articles:
        try:
            title = article.find('h2', class_='post__title').text
            date = article.find('span', class_='post__time').text.strip()
            link = article.find('h2', class_='post__title').find('a').get('href')
        except AttributeError:
            # skip articles that lack any of the expected tags
            continue
        row = {'date': date, 'title': title, 'link': 'https://habr.com'+link}
        site = pd.concat([site, pd.DataFrame([row])])
    # drop rows that repeat an already collected title or link
    site = site.drop_duplicates(subset='title').drop_duplicates(subset='link')
    return site.reset_index(drop=True)

Python

generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 leave in this dataframe only users who have rated 100 or more in the rating column

# import data import pandas as pd data = pd.read_csv("ml-100k/u.data", sep="\t", names=["user_id", "item_id", "rating", "timestamp"]) data.head() # create a new dataframe that counts the number of ratings per user user_data = data.groupby('user_id').count() user_data.head() # create a new column that is true if a user has rated 100 or more user_data['hasrated'] = user_data['rating'] >= 100 user_data.head() # create a new dataframe from the original data, where user_id is in the list # user_id's that have more than 100 ratings rating_data = data[data['user_id'].isin(user_data[user_data['hasrated'] == True].index)] rating_data.head()

Python

generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 Add an "av_ltv" column to the dataframe, in which you calculate the average lifetime of the users. Lifetime refers to the difference between the maximum and minimum values of the timestamp column for a given userId value.

import pandas as pd df = pd.read_csv('./movies.csv') df['av_ltv'] = df.groupby('userId')['timestamp'].transform(lambda x: x.max() - x.min()) df

Python

generate functionTue, 24 Jan 2023

there are two Pandas dataframes: rzd client_id rzd_revenue 0 111 1093 1 112 2810 2 113 10283 3 114 5774 4 115 981 auto client_id auto_revenue 0 113 57483 1 114 83 2 115 912 3 116 4834 4 117 98 Create a new dataframe joined and merge these 2 dataframes inside it

import pandas as pd rzd = pd.DataFrame( { 'client_id': [111, 112, 113, 114, 115], 'rzd_revenue': [1093, 2810, 10283, 5774, 981] } ) auto = pd.DataFrame( { 'client_id': [113, 114, 115, 116, 117], 'auto_revenue': [57483, 83, 912, 4834, 98] } ) def merge_df(rzd, auto): joined = rzd.merge(auto, how='outer', on='client_id') return joined

Python

generate functionSat, 25 Feb 2023

This code gives an error "expected string or bytes-like object" Refactor it: import re def rate_group(int_rate): int_rate = re.sub('%', '', int_rate) int_rate = float(int_rate) if int_rate > 15.0: return '>15' elif 10.0 < int_rate <= 15.0: return '10-15' else: return '<10' df['rate_group'] = df['int_rate'].apply(rate_group)

import re
def rate_group(int_rate):
    # re.sub expects a string; casting first avoids the error on numeric cells
    int_rate = float(re.sub('%', '', str(int_rate)))
    if int_rate > 15.0:
        return '>15'
    elif 10.0 < int_rate <= 15.0:
        return '10-15'
    else:
        return '<10'
df['rate_group'] = df['int_rate'].apply(rate_group)

Python

generate functionWed, 21 Dec 2022

Change this function so that the employee's awards are also taken into account (add this variable to the class) and the employee's promotion condition is taken into account by the formula if (self.seniority)+(self.intlawards*2) % 7 == 0: self.grade_up() class Developer(Employee): def init(self, name, seniority): super().init(name, seniority) def check_if_it_is_time_for_upgrade(self): # for each accreditation, increase the counter by 1 # for now we assume that all of the developers pass the accreditation self.seniority += 1 # condition of promoting an employee from the presentation if self.seniority % 5 == 0: self.grade_up() # publication of the results return self.publish_grade()

def check_if_it_is_time_for_upgrade(self): if (self.seniority)+(self.intlawards*2) % 7 == 0: self.grade_up() return self.publish_grade()
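The answer above drops the class definition and never stores the awards. Note also that in Python `%` binds tighter than `+`, so the formula as written evaluates `self.seniority + ((self.intlawards*2) % 7)`, which cannot be zero for a positive seniority. The sketch below assumes the intended reading is "the whole sum is divisible by 7"; the `Employee` base class and the `awards=0` default are hypothetical stand-ins, since the real base class is not shown.

```python
class Employee:
    # Hypothetical minimal base class; the real Employee is not shown in the source
    def __init__(self, name, seniority):
        self.name = name
        self.seniority = seniority
        self.grade = 1

    def grade_up(self):
        self.grade += 1

    def publish_grade(self):
        return f'{self.name} {self.grade}'


class Developer(Employee):
    def __init__(self, name, seniority, awards=0):
        super().__init__(name, seniority)
        # international awards are stored on the instance
        self.intlawards = awards

    def check_if_it_is_time_for_upgrade(self):
        # each accreditation increases the counter by 1
        self.seniority += 1
        # parenthesised so the whole sum is taken modulo 7 (assumed intent)
        if (self.seniority + self.intlawards * 2) % 7 == 0:
            self.grade_up()
        return self.publish_grade()
```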

Python

generate functionMon, 13 Feb 2023

Modify the function below so that it rounds to the nearest integer, but not less than 1 and not more than 4: def value_filler(row): filler = df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median')) if filler < 1: return 1 elif filler > 4: return 4 else: return df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median')) df2['respiratory_rate'] = df2['respiratory_rate'].apply(value_filler)

# fill the gaps with the median respiratory_rate of rows sharing the same pulse
filled = df2['respiratory_rate'].fillna(df2.groupby('pulse')['respiratory_rate'].transform('median'))
# round to the nearest integer, then clamp the result into the range [1, 4]
df2['respiratory_rate'] = filled.round().clip(lower=1, upper=4)

Python

generate functionMon, 06 Feb 2023

There is a df Pandas dataframe: date av_temp deviations country year decade 577457 2013-05-01 19.059 1.022 Zimbabwe 2013 2010-2020 577458 2013-06-01 17.613 0.473 Zimbabwe 2013 2010-2020 577459 2013-07-01 17.000 0.453 Zimbabwe 2013 2010-2020 577460 2013-08-01 19.759 0.717 Zimbabwe 2013 2010-2020 Write a function that: 1) calculates the average temperature in countries (average av_temp by decade column grouped by country) 2) build a list of the 20 coldest countries in ascending av_temp order

import pandas import numpy as np df = pandas.DataFrame({'date': ['2013-05-01', '2013-06-01', '2013-07-01', '2013-08-01'], 'av_temp': [19.059, 17.613, 17.0, 19.759], 'deviations': [1.022, 0.473, 0.453, 0.717], 'country': ['Zimbabwe', 'Zimbabwe', 'Zimbabwe', 'Zimbabwe'], 'year': [2013, 2013, 2013, 2013], 'decade': ['2010-2020', '2010-2020', '2010-2020', '2010-2020']}) df
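The answer above only rebuilds the sample frame and stops before the requested function. A minimal sketch of one reading of the task follows: average `av_temp` per country over all rows, then take the coldest entries. The function name `coldest_countries`, the parameter `n`, and the toy frame are hypothetical; if the average is meant to be taken per decade first, the grouping would need to change.

```python
import pandas as pd

def coldest_countries(df, n=20):
    # mean av_temp per country across all rows
    mean_temp = df.groupby('country')['av_temp'].mean()
    # ascending sort puts the coldest countries first
    return mean_temp.sort_values().head(n).index.tolist()

# Toy data for illustration only (not taken from the source dataset)
toy = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B', 'C'],
    'av_temp': [10.0, 20.0, -5.0, -3.0, 0.0],
})
print(coldest_countries(toy, n=2))
```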

Python

generate functionWed, 01 Feb 2023

There is a function for Pandas dataframe: dfp = df.copy() dfp.sort_values(by='performer', inplace=True) def group_by_performer(data): dfp['hits'] = dfp.groupby('performer')['song'].transform(lambda x: ', '.join(x.unique())) dfp['time_on_chart'] = dfp.groupby('performer')['time_on_chart'].transform(lambda x: x.sum()) dfp['consecutive_weeks'] = dfp.groupby('performer')['consecutive_weeks'].transform(lambda x: x.sum()) dfp.drop_duplicates(subset='performer', inplace=True) dfp.reset_index(inplace=True) return dfp group_by_performer(dfp) Add a function to it to convert the values in the chart_debut column from the format '1966-06-25' to the format 1966, that is, just the year

def get_year(data): data['chart_debut'] = pd.to_datetime(data['chart_debut'], format='%Y-%m-%d').dt.year return data dfp = get_year(dfp)

Python

generate functionFri, 27 Jan 2023

Write a Pandas function that shows all str.contains = 'yandex' values

def show_yandex(df): return df[df['company'].str.contains('yandex')] show_yandex(df)

Python

generate functionWed, 18 Jan 2023

Count all value_counts in the decade pandas column of the dataview if the value in the rating column == 5.0

df = dataview[dataview['rating'] == 5.0] df['decade'].value_counts()

Python

generate functionMon, 13 Feb 2023

There is df2 Pandas dataframe: surgery age rectal_temp pulse respiratory_rate extremities_temp pain outcome 4 2.0 1 NaN NaN NaN 2.0 2.0 1.0 6 1.0 1 NaN 60.0 NaN 3.0 NaN 2.0 7 2.0 1 NaN 80.0 36.0 3.0 4.0 3.0 15 1.0 9 NaN 128.0 36.0 3.0 4.0 2.0 27 1.0 1 NaN NaN NaN NaN NaN 2.0 33 1.0 1 NaN 100.0 30.0 3.0 5.0 1.0 34 2.0 1 NaN 104.0 24.0 4.0 4.0 3.0 39 1.0 1 NaN 88.0 NaN 3.0 5.0 2.0 Write a function that: 1) takes as a parameter the column name (default = 'rectal_temp') 2) calculate the correlation of this column with other columns: pulse, respiratory_rate, extremities_temp, pain_outcome 3) fill all NaNs in the 'rectal_temp' column with the average values multiplied by the correlation weight

def fill_rectal_temp(df, col_name = 'rectal_temp'): corr = df.corr()[col_name][1:] corr_dict = corr.to_dict() corr_dict.pop('surgery') corr_dict.pop('outcome') corr_dict.pop('age') corr_dict.pop('pain') mean_col1 = df[col_name].mean() df_copy = df.copy() df_copy[col_name] = df_copy[col_name].fillna(0) for key, value in corr_dict.items(): col_mean = df[key].mean() df_copy[key] = df_copy[key].fillna(mean_col1*value) df_copy[col_name] = df_copy[col_name] + value*df_copy[key] return df_copy fill_rectal_temp(df2)

Python

generate functionThu, 16 Feb 2023

Write a function that loops through the columns in the dataframe and applies try .astype('float64') to them. If a string value is encountered, it will be assigned the value NaN.

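import pandas as pd
def convert_float(dataframe):
    for c in dataframe.columns:
        try:
            dataframe[c] = dataframe[c].astype('float64')
        except (ValueError, TypeError):
            # the column contains non-numeric strings: coerce those values to NaN
            dataframe[c] = pd.to_numeric(dataframe[c], errors='coerce')
    return dataframe
convert_float(dataframe)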

Python

generate functionSun, 25 Dec 2022

My washing machine uses water amount of water to wash load (in JavaScript and Python) or max_load (in Ruby) amount of clothes. You are given a clothes amount of clothes to wash. For each single item of clothes above the load, the washing machine will use 10% more water (multiplicative) to clean. For example, if the load is 10, the amount of water it requires is 5 and the amount of clothes to wash is 14, then you need 5 * 1.1 ^ (14 - 10) amount of water. Write a function howMuchWater (JS)/how_much_water (Python and Ruby) to work out how much water is needed if you have a clothes amount of clothes. The function will accept 3 arguments: - water, load (or max_loadin Ruby) and clothes.

def how_much_water(water, load, clothes): return water * (1.1 ** (clothes - load)) how_much_water(5, 10, 14)

Python

generate functionTue, 21 Mar 2023

There is a Pandas dataframe: water nutri mineral 0 1 1 2 1 2 2 1 2 3 4 1 3 4 6 3 4 2 5 2 5 4 6 4 6 2 7 2 7 4 5 4 8 5 4 5 9 2 5 4 10 3 6 3 11 4 7 2 12 2 4 3 13 1 3 2 14 3 5 3 15 4 5 1 16 3 6 3 17 2 5 4 18 5 4 5 19 1 3 1 Construct a vertical bar graph comparing these values, where the x-axis is index and the y-axis is the value of each column

import matplotlib.pyplot as plt
# one group of vertical bars per index value, one bar per column (water, nutri, mineral)
ax = df.plot.bar(figsize=(12, 4))
ax.set_xlabel('index')
ax.set_ylabel('value')
plt.show()

Python

generate functionTue, 21 Mar 2023

There is a "grass" Pandas dataframe: pid Name Class 1 Class 2 HP Attack Defense Sp. Atk Sp. Def Speed Legendary 0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 False 1 2 Ivysaur Grass Poison 60 62 63 80 80 60 False 2 3 Venusaur Grass Poison 80 82 83 100 100 80 False 3 4 Mega Venusaur Grass Poison 80 100 123 122 120 80 False 48 49 Oddish Grass Poison 45 50 55 75 65 30 False Write a function that will search for 'Rock' values in the Class 1 and Class 2 columns. And if it finds such values it will delete these rows.

def search_data(data, search_word):
    # a boolean mask works with any index (the sample skips from 3 to 48) and with missing Class 2 values
    mask = (data['Class 1'] == search_word) | (data['Class 2'] == search_word)
    return data[~mask]
search_data(grass, 'Rock')

Python

generate functionTue, 21 Mar 2023

You are given test results for two drugs: A (results for the group of people getting drug A) and B (results for the other group of people getting drug B). You need to assess whether there is a statistically significant difference in the effect in these patients. Use the null hypothesis, the inverse hypothesis, and Student's t-test for calculations.

from scipy import stats import pandas as pd def check_for_significance(a, b): t, p = stats.ttest_ind(a, b) df = pd.DataFrame({"t" : t, "p-value" : p}, index = [0]) return df

Python

generate functionThu, 16 Feb 2023

Write a function that loops through the columns in the dataframe and applies try .astype('float64') to them. If a string value is encountered, it will be replaced by NaN.

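import pandas as pd
def convert_float(df, cols):
    for col in cols:
        try:
            df[col] = df[col].astype('float64')
        except ValueError:
            # replace only the unparseable values with NaN, not the whole column
            df[col] = pd.to_numeric(df[col], errors='coerce')
    return df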

Python

generate functionSun, 25 Dec 2022

My washing machine uses water amount of water to wash load (in JavaScript and Python) or max_load (in Ruby) amount of clothes. You are given a clothes amount of clothes to wash. For each single item of clothes above the load, the washing machine will use 10% more water (multiplicative) to clean. For example, if the load is 10, the amount of water it requires is 5 and the amount of clothes to wash is 14, then you need 5 * 1.1 ^ (14 - 10) amount of water. Write a function howMuchWater (JS)/how_much_water (Python and Ruby) to work out how much water is needed if you have a clothes amount of clothes. The function will accept 3 arguments: - water, load (or max_loadin Ruby) and clothes.

def how_much_water(water, load, clothes): if clothes <= load: return water else: return water * 1.1 ** (clothes - load)

Python

generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 leave in this dataframe only users who have 100 or more values in the "rating" column

# keep the users who have 100 or more ratings
ratings_df = ratings_df.groupby("userId").filter(lambda x: x["rating"].count() >= 100)

Python

generate functionMon, 26 Dec 2022

There are two lists. Write a function that looks for matching values of these two lists. If a match is found, only the 2nd value must be replaced by replacing_num stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ] replacing_num = '9090'

def replace(stnums, students, replacing_num):
    for st_num in stnums:
        seen = False
        for student in students:
            if student[0] == st_num:
                if seen:
                    # repeated occurrence: overwrite the student number itself
                    student[0] = replacing_num
                seen = True
    return students

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Filter this dataframe and leave only the rows where the values in the name column are duplicated

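# keep=False keeps every row whose name occurs more than once, including the first occurrence
data[data.name.duplicated(keep=False)]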

Python

generate functionFri, 23 Dec 2022

Prolog: This kata series was created for friends of mine who just started to learn programming. Wish you all the best and keep your mind open and sharp! Task: Write a function that will accept two parameters: variable and type and check if type of variable is matching type. Return true if types match or false if not. Examples: 42, "int" --> True "42", "int" --> False

def type_check(v, t):
    # compare the name of the variable's type with the requested type name
    return type(v).__name__ == t

Python

generate functionWed, 21 Dec 2022

Rewrite this function without lambda max_key = max(dct, key=lambda key: dct[key]['Value'])

def max_key_value(dct):
    # a named helper replaces the lambda as the sort key
    def get_value(key):
        return dct[key]['Value']
    return max(dct, key=get_value)

Python

generate functionTue, 21 Mar 2023

Rewrite this code to calculate stats instead of mannwhitneyu by scipy.stats wilcoxon stat, p = mannwhitneyu(data[data['version'] == 'gate_30']['sum_gamerounds'], data[data['version'] == 'gate_40']['sum_gamerounds']) print('Mann-Whitney Statistics=%.3f, p=%.3f' % (stat, p))

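from scipy.stats import wilcoxon
def wilcoxon_test(a, b):
    # wilcoxon is a paired test, so a and b must have the same length
    stat, p = wilcoxon(a, b)
    print('Wilcoxon Statistics=%.3f, p=%.3f' % (stat, p))
    return stat, p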

Python

generate functionWed, 22 Feb 2023

Modify this code to make a horizontal bar graph instead of a vertical (plotly.express library) question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = go.Figure(data=[go.Bar(x=label, y=counts)]) fig.update_layout(title_text='Which working environment do you prefer most?') fig.show()

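import plotly.express as px
question6 = 'What is the most preferred working environment for you.'
counts = df[question6].value_counts()
# orientation='h' puts the counts on the x-axis and the answer labels on the y-axis
fig = px.bar(x=counts.values, y=counts.index, orientation='h')
fig.update_layout(title_text='Which working environment do you prefer most?')
fig.show()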

Python

generate functionTue, 10 Jan 2023

Create a numpy array with elements from number N to 0. For example, for N = 10 it would be array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

import numpy as np def array_creation(n): return np.arange(n - 1, -1, -1)

Python

generate functionMon, 06 Feb 2023

There is a df_ru Pandas dataframe: date av_temp deviations country year decade 432393 1988-07-01 16.105 0.158 Russia 1988 1980-1990 432429 1991-07-01 16.385 0.243 Russia 1991 1990-2000 432489 1996-07-01 16.032 0.213 Russia 1996 1990-2000 432513 1998-07-01 16.677 0.183 Russia 1998 1990-2000 432525 1999-07-01 16.412 0.228 Russia 1999 1990-2000 432549 2001-07-01 16.539 0.190 Russia 2001 2000-2010 Write a function that will create a new dataframe hot_years and leave in it only those lines where av_temp > 12

df_ru = pd.read_csv('https://raw.githubusercontent.com/OSU-geohackweek2020/tutorials/master/00_data/temperature/df_ru.csv') df_ru.head()
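The answer above only loads a file and does not build `hot_years`. A minimal sketch of the requested filter is given here; the function name `make_hot_years` and the `threshold` parameter are hypothetical, and the small frame reuses two rows from the prompt plus one illustrative row below the threshold.

```python
import pandas as pd

def make_hot_years(df, threshold=12):
    # the boolean mask keeps rows strictly above the threshold;
    # copy() detaches the result from the original frame
    return df[df['av_temp'] > threshold].copy()

df_ru = pd.DataFrame({
    'year': [1988, 1991, 2050],
    'av_temp': [16.105, 16.385, 11.0],  # last row is illustrative only
})
hot_years = make_hot_years(df_ru)
print(hot_years)
```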

Python

generate functionSat, 28 Jan 2023

There is a Pandas dataframe: hotels_rev = df1[['date', 'hotel', 'revenue', 'av_revenue', 'difference', 'in_percent']].sort_values(by=['hotel', 'date']) Draw a boxplot graph with the x scale as 'hotel' and the y scale as 'av_revenue'

hotels_rev = df1[['date', 'hotel', 'revenue', 'av_revenue', 'difference', 'in_percent']].sort_values(by=['hotel', 'date']) hotels_rev.boxplot(column=['av_revenue'], by='hotel')

Python

generate functionMon, 26 Dec 2022

A matrix matrix is given. matrix = [ [0,1,2,4,8], [6,2,2,1,9], [3,3,3,3,3], [4,6,7,1,2], [5,7,3,4,0] ] Write code (in pure Python, no third-party libraries, no list comprehensions) that will transpose it (that is, make columns out of rows) into matrix_t: matrix_t = [ [0, 6, 3, 4, 5], [1, 2, 3, 6, 7], [2, 2, 3, 7, 3], [4, 1, 3, 1, 4], [8, 9, 3, 2, 0] ]

def transpose(matrix): return [list(i) for i in zip(*matrix)] transpose(matrix)
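The one-liner above relies on a list comprehension, which the prompt rules out. A sketch with plain nested loops, assuming a rectangular, non-empty matrix, could look like this:

```python
def transpose(matrix):
    matrix_t = []
    # build one output row per input column
    for col in range(len(matrix[0])):
        new_row = []
        for row in matrix:
            new_row.append(row[col])
        matrix_t.append(new_row)
    return matrix_t

matrix = [
    [0, 1, 2, 4, 8],
    [6, 2, 2, 1, 9],
    [3, 3, 3, 3, 3],
    [4, 6, 7, 1, 2],
    [5, 7, 3, 4, 0],
]
matrix_t = transpose(matrix)
print(matrix_t)
```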

Python

generate functionMon, 27 Mar 2023

Write code that converts this Series object to an array object. data1 = df['Critic_Score'] The result should output data1 in the following format: array([43.48164727, 53.46433221, 45.08411794, 61.7768157 , 37.78951321, 48.271562 , 42.16219046, 46.88876532, 50.92428925, 43.03283088...])

# Convert the series to an array data1 = data1.values # Print the data type for data1 print(type(data1))

Python

generate functionThu, 02 Feb 2023

There is a Pandas dataframe: year date route operator group_name bus_garage bus_park injury_result incident_type victim_category victim_sex victim_age 0 2015 01.01.2015 1 London General Go-Ahead Garage Not Available Southwark Injuries treated on scene Onboard Injuries Passenger Male Child 1 2015 01.01.2015 4 Metroline Metroline Garage Not Available Islington Injuries treated on scene Onboard Injuries Passenger Male Unknown 2 2015 01.01.2015 5 East London Stagecoach Garage Not Available Havering Taken to Hospital – Reported Serious Injury or... Onboard Injuries Passenger Male Elderly Write a grouped_operators function that groups all values by operator column and sums all dates for each unique operator value

import pandas as pd data = {'year':[2015,2015,2015], 'date':['01.01.2015','01.01.2015','01.01.2015'], 'route':[1,4,5], 'operator':['London General','Metroline','East London'], 'group_name':['Go-Ahead','Metroline','Stagecoach'], 'bus_garage':['Southwark','Islington','Havering'], 'bus_park':['Garage Not Available','Garage Not Available','Garage Not Available'], 'injury_result':['Injuries treated on scene','Injuries treated on scene','Taken to Hospital – Reported Serious Injury or...'], 'incident_type':['Onboard Injuries','Onboard Injuries','Onboard Injuries'], 'victim_category':['Passenger','Passenger','Passenger'], 'victim_sex':['Male','Male','Male'], 'victim_age':['Child','Unknown','Elderly']
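The answer above breaks off inside the dictionary and never reaches the requested function. The sketch below reads "sums all dates" as counting the date entries per operator, which is an assumption; the trimmed `sample` frame uses only the `date` and `operator` values from the prompt.

```python
import pandas as pd

def grouped_operators(df):
    # number of date entries (incidents) recorded for each operator
    return df.groupby('operator')['date'].count()

sample = pd.DataFrame({
    'date': ['01.01.2015', '01.01.2015', '01.01.2015'],
    'operator': ['London General', 'Metroline', 'East London'],
})
print(grouped_operators(sample))
```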

Python

generate functionWed, 21 Dec 2022

Change the class so that self.seniority defaults to 1 and self.intlawards = 2 class Designer(Employee): def init(self, name, seniority, awards): super().init(name, seniority) self.intlawards = awards def check_if_it_is_time_for_upgrade(self): # for each accreditation, increase the counter by 1 # for now we assume that all designers are accredited self.seniority += 1 # condition for promoting an employee from the presentation if (self.seniority) + (self.intlawards*2) % 7 == 0: self.grade_up() # publication of the results return self.publish_grade()

function that: takes two numbers as input and returns True if the first is bigger than the second and False otherwise.
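The text above does not relate to the prompt. A minimal sketch of the requested change, using default parameter values in the constructor, might look as follows. The `Employee` base class is a hypothetical stand-in because the real one is not shown, and the rest of `Designer` is assumed to stay as in the prompt.

```python
class Employee:
    # Hypothetical minimal base class; the real Employee is not shown in the source
    def __init__(self, name, seniority):
        self.name = name
        self.seniority = seniority


class Designer(Employee):
    def __init__(self, name, seniority=1, awards=2):
        # seniority defaults to 1 and intlawards to 2 when not supplied
        super().__init__(name, seniority)
        self.intlawards = awards


designer = Designer('Eva')
print(designer.seniority, designer.intlawards)
```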

Python

generate functionSun, 11 Dec 2022

A recipe book is given with information about how many ingredients are needed to prepare a dish per serving. Write a program that will multiply the number of servings to prepare these dishes and display the information about the total amount of ingredients required in the given form. Note: The same ingredients with different dimensions must be counted separately! The result should be: Cheese: 250 g Tomatoes: 10 pcs. Cucumbers: 60g def total_ingredients(cook_book): dish = 'salad' portions = 5 grams = cook_book['quantity'] * portions if key == 'salad': grams = 'quantity' * portions print(grams) cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'}, {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]} total_ingredients(cook_book)

def total_ingredients(cook_book): dish = 'salad' portions = 5 grams = cook_book['quantity'] * portions cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'}, {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'}]} total_ingredients(cook_book)
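The generated answer above indexes the top-level dictionary with `'quantity'`, which raises a `KeyError`. A minimal sketch of the intended calculation might look like the following. The `dish` and `portions` parameters are an assumption (the original hard-codes them inside the function), and ingredients are keyed by name and measure so that the same ingredient in different units is counted separately. With the quantities given in the request, cucumbers come to 500 g, so the 60 g in the expected output presumably comes from different input data.

```python
def total_ingredients(cook_book, dish, portions):
    """Return {(ingredient, measure): total quantity} for the given dish."""
    totals = {}
    for item in cook_book[dish]:
        # Same ingredient with a different measure gets its own key
        key = (item['ingridient_name'], item['measure'])
        totals[key] = totals.get(key, 0) + item['quantity'] * portions
    return totals


cook_book = {'salad': [
    {'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'g'},
    {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pcs'},
    {'ingridient_name': 'cucumbers', 'quantity': 100, 'measure': 'g'},
]}

for (name, measure), amount in total_ingredients(cook_book, 'salad', 5).items():
    print(f"{name.capitalize()}: {amount} {measure}")
```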

Python

generate functionThu, 19 Jan 2023

There is a Pandas dataframe: place decade perc_of_5star 0 1 2010-2020 2.300 1 2 1900-1910 1.379 2 3 1970-1980 1.179 3 4 2000-2010 1.176 4 5 1960-1970 1.133 build a horizontal barchart with perc_of_5star columns in descending order and decade values on the y-axis

import matplotlib.pyplot as plt

def plot_city_reviews(df):
    fig, ax = plt.subplots(figsize=(10, 6))
    # Put decade (not place) on the y-axis, as the request asks
    df.sort_values('perc_of_5star', ascending=False)[:10].plot.barh(x='decade', y='perc_of_5star', ax=ax)
    _ = ax.set(ylabel='Decade', xlabel='% of 5-star reviews')

Python

generate functionTue, 31 Jan 2023

There is a Pandas dataframe: time_on_chart min max performer "Groove" Holmes 1 11 "Little" Jimmy Dickens 1 10 "Pookie" Hudson 1 1 "Weird Al" Yankovic 1 20 Sort these rows by the time_on_chart and max columns, in descending order, and display the first 20 rows

df.sort_values(['time_on_chart', 'max'], ascending=[False, False]).head(20)

Python

generate functionWed, 08 Mar 2023

Upgrade the code so that a roll of 000, 007, 017, 027, 037, 047, 057, 067, 077, 087, 097, 010, 020, 030, 040, 050, 060, 070, 080, 090 will drop 1 dollar for i in range(trials): num = np.random.randint(1, 1000) if num == 777: money += 200; successes += 1 elif num == 999: money += 100; successes += 1 elif num == 555: money += 50; successes += 1 elif num == 333: money += 15; successes += 1 elif num == 111: money += 10; successes += 1 elif ends77(num) == True: money += 5; successes += 1 elif ends7(num) == True: money += 3; successes += 1 elif ends00(num) == True: money += 2; successes += 1 elif ends0(num) == True: money += 1; successes += 1 else: money -= 1

def ends77(num): return num % 100 == 77 def ends7(num): return num % 10 == 7 def ends00(num): return num % 100 == 0 def ends0(num): return num % 10 == 0
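The helpers above do not yet implement the new rule. The listed rolls (000, 007 ... 097, 010 ... 090) are exactly the numbers below 100 that end in 0 or 7, so one possible sketch checks that case before the generic ending rules. Two assumptions are made here: "drop 1 dollar" is read as a 1-dollar payout, and rolls are drawn from 0 to 999 with the standard `random` module, since the original `np.random.randint(1, 1000)` can never produce 000. The `payout` and `simulate` names are illustrative.

```python
import random


def ends77(num): return num % 100 == 77
def ends7(num): return num % 10 == 7
def ends00(num): return num % 100 == 0
def ends0(num): return num % 10 == 0


def payout(num):
    """Return the change in money for one roll (negative means a loss)."""
    jackpots = {777: 200, 999: 100, 555: 50, 333: 15, 111: 10}
    if num in jackpots:
        return jackpots[num]
    # New rule (assumed reading): rolls 000-099 ending in 0 or 7 pay 1 dollar.
    # It must come first, otherwise 077 would pay 5 and 000 would pay 2.
    if num < 100 and (ends7(num) or ends0(num)):
        return 1
    if ends77(num):
        return 5
    if ends7(num):
        return 3
    if ends00(num):
        return 2
    if ends0(num):
        return 1
    return -1


def simulate(trials, seed=None):
    rng = random.Random(seed)
    money = 0
    successes = 0
    for _ in range(trials):
        change = payout(rng.randint(0, 999))  # inclusive range, so 000 is possible
        money += change
        if change > 0:
            successes += 1
    return money, successes
```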

Python

generate functionFri, 24 Feb 2023

There is a Pandas dataframe: spi_rank country spi_score basic_human_needs foundations_of_wellbeing opportunity basic_nutri_med_care water_sanitation shelter personal_safety access_to_knowledge access_to_communications health_wellness environmental_quality personal_rights personal_freedom inclusiveness access_to_advanced_education 61 62.0 Russia 73.45 83.0 79.4 57.94 95.49 97.04 88.56 50.92 94.34 77.42 64.7 81.16 51.79 70.26 30.31 79.4 Write a function that converts dataframe columns to rows

def row_from_columns(df): return df.melt(id_vars=['spi_rank', 'country', 'spi_score']) row_from_columns(df)

Python

generate functionThu, 16 Feb 2023

There is a Pandas Series column: 10 78 54 GOOD 64 23 Write a function that finds all rows with letters in the Series column. Example: find_non_numbers(df, 'loan_amnt') Result: GOOD

def find_non_numbers(df, col):
    # Cast to str first so numeric cells do not raise AttributeError on .isalpha()
    return df[df[col].astype(str).str.isalpha()]

Python

generate functionTue, 21 Mar 2023

There is a "grass" Pandas dataframe: pid Name Class 1 Class 2 HP Attack Defense Sp. Atk Sp. Def Speed Legendary 0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 False 1 2 Ivysaur Grass Poison 60 62 63 80 80 60 False 2 3 Venusaur Grass Poison 80 82 83 100 100 80 False 3 4 Mega Venusaur Grass Poison 80 100 123 122 120 80 False 48 49 Oddish Grass Poison 45 50 55 75 65 30 False Write a function that will search for 'Rock' values in the Class 1 and Class 2 columns. And if it finds such values it will delete these rows.

def RemoveRockGrass(dataset):
    # Keep only the rows where neither class column equals 'Rock'
    is_rock = (dataset['Class 1'] == 'Rock') | (dataset['Class 2'] == 'Rock')
    return dataset[~is_rock]

Python

generate functionMon, 27 Mar 2023

There are variables X_train, X_test, y_train, y_test created through sklearn train_test_split. They contain text data in string format. Convert them to float numbers.

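import numpy as np

# Sample data: the strings must hold numeric text, otherwise the conversion raises ValueError
X_train = [['1.0', '2.5'], ['3', '4']]
y_train = ['0', '1']
X_test = [['1.0', '2.5'], ['3', '4']]
y_test = ['0', '1']

def convert_to_float(text_data):
    # np.asarray accepts lists, arrays and Series; dtype=float parses numeric strings
    return np.asarray(text_data, dtype=float)

X_train = convert_to_float(X_train)
y_train = convert_to_float(y_train)
X_test = convert_to_float(X_test)
y_test = convert_to_float(y_test)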

Python

generate functionTue, 13 Dec 2022

create a password verification function in one line. The verification conditions are: 1) the length should be bigger than 6; 2) should contain at least one digit. Input: A string. Output: A bool. Examples: assert is_acceptable_password("short") == False assert is_acceptable_password("muchlonger") == False assert is_acceptable_password("ashort") == False assert is_acceptable_password("muchlonger5") == True

def is_acceptable_password(password: str) -> bool: return len(password)>6 and any(char.isdigit() for char in password) is_acceptable_password("muchlonger5")

Python

generate functionSun, 18 Dec 2022

Replace re.split with regular split to get the same result (you can use the json library and the loads function) The code: f = open('purchase_log.txt', encoding='utf-8') import re purchases = {} for i, line in enumerate(f): line = re.split(r",|:", line.strip()) keys = line[1] values = line[3] purchases[keys] = values

f = open('purchase_log.txt', encoding='utf-8') import re purchases = {} for i, line in enumerate(f): line = re.split(r",|:", line.strip()) keys = line[1] values = line[3] purchases[keys] = values
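The answer above repeats the original code unchanged. A sketch of the `json.loads` approach the request hints at could look like this. It assumes each line of `purchase_log.txt` is a JSON object with `user_id` and `category` keys, which is inferred from the sample output quoted elsewhere on this page rather than from the file itself. The `read_purchases` name and the in-memory sample lines are illustrative.

```python
import io
import json


def read_purchases(lines):
    """Build {user_id: category} from an iterable of JSON lines."""
    purchases = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        purchases[record['user_id']] = record['category']
    return purchases


# In-memory stand-in for the real file (hypothetical sample lines)
sample = io.StringIO(
    '{"user_id": "user_id", "category": "category"}\n'
    '{"user_id": "1840e0b9d4", "category": "Products"}\n'
)
purchases = read_purchases(sample)
```

For the real file, the same function can be called as `read_purchases(f)` inside `with open('purchase_log.txt', encoding='utf-8') as f:`.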

Python

generate functionMon, 26 Dec 2022

There are 2 functions. One looks for duplicate values and makes a dictionary. The second changes the duplicates to the desired value. The first function (find_duplicates) works correctly. But the second function (change_duplicates) is incomplete. Finish the code so that the duplicate student number changes to the one in brackets in the change_duplicates function. def find_duplicates(lst): stnums = [] cnt = {} for el in lst: stnums.append(el[0]) for i in stnums: if i in cnt: cnt[i] += 1 else: cnt[i] = 1 return cnt def change_duplicates(lst, replacing_num): stnums = [] duplicates = find_duplicates(lst) for k, v in duplicates.items(): if v > 1: stnums = [k] for student in lst: if stnums in lst: #need right code#

#Python 3.7.3
#https://stackoverflow.com/questions/57774616/find-duplicates-in-list-of-tuples-in-python
def find_duplicates(lst):
    stnums = []
    cnt = {}
    for el in lst:
        stnums.append(el[0])
    for i in stnums:
        if i in cnt:
            cnt[i] += 1
        else:
            cnt[i] = 1
    return cnt

def change_duplicates(lst, replacing_num):
    duplicates = find_duplicates(lst)
    seen = set()
    for student in lst:
        num = student[0]
        # Keep the first occurrence and replace every later repeat
        if duplicates[num] > 1 and num in seen:
            student[0] = replacing_num
        seen.add(num)
    return lst

students = [["1", "John", "Biology", "A"], ["2", "Mary", "Math", "C"], ["3", "Alex", "Computer Science", "B"], ["3", "Alex", "Computer Science", "B"]]
change_duplicates(students, "9090")
print(students)

Python

generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: year date route operator group_name bus_garage district injury_result incident_type victim_category victim_sex victim_age 0 2015 01.01.2015 1 London General Go-Ahead Garage Not Available Southwark Injuries treated on scene Onboard Injuries Passenger Male Child 1 2015 01.01.2015 4 Metroline Metroline Garage Not Available Islington Injuries treated on scene Onboard Injuries Passenger Male Unknown 2 2015 01.01.2015 5 East London Stagecoach Garage Not Available Havering Taken to Hospital – Reported Serious Injury or... Onboard Injuries Passenger Male Elderly Calculate the average number of "date" string values for all "operator"

# 'date' holds strings, so count the values per operator and then average the counts
df_new.groupby('operator')['date'].count().mean()

Python

generate functionSat, 04 Mar 2023

Add the x-axis value captions to the code of this graph and change its color to purple gdpdiff[['Country or region', 'GDP per capita']].plot(kind='bar')

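import matplotlib.pyplot as plt

# x= puts the country names on the x-axis ticks; color= makes the bars purple
gdpdiff.plot(kind='bar', x='Country or region', y='GDP per capita', color='purple')
plt.xlabel('Country or region')
plt.ylabel('GDP per capita')
plt.title('Change in GDP per capita')
plt.show()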

Python

generate functionSat, 28 Jan 2023

There is a Pandas dataframe: hotels_rev = df1[['date', 'hotel', 'revenue', 'av_revenue', 'difference', 'in_percent']].sort_values(by=['hotel', 'date']) Construct a bar graph with 'hotel' values in the x axis and change values in the 'date' column and 'av_revenue' values in the y axis

hotels_rev = df1[['date', 'hotel', 'revenue', 'av_revenue', 'difference', 'in_percent']].sort_values(by=['hotel', 'date']) hotels_rev %matplotlib inline hotels_rev['hotel'].value_counts() hotels_rev.plot('hotel', 'av_revenue', kind = 'bar')

Python

generate functionSat, 11 Feb 2023

There is Pandas dataframe: Date Notifications Times opened Conversion 6 09/01/2022 23 57 2.478261 10 09/05/2022 24 51 2.125000 8 09/03/2022 15 25 1.666667 1 08/27/2022 43 48 1.116279 9 09/04/2022 29 30 1.034483 17 09/12/2022 99 102 1.030303 Write a function that will change all values > 1 in the Conversion column to 1

def change_conv_to_one(df): df.loc[df['Conversion'] > 1, 'Conversion'] = 1 return df

Python

generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: id route operator group_name district injury_result incident_type victim_category victim_age 2 3.0 1 London General Go-Ahead Southwark NaN Onboard Injuries 7 10 3 4.0 4 Metroline Metroline Islington NaN Onboard Injuries 7 2 4 5.0 5 East London Stagecoach Havering NaN Onboard Injuries 7 8 5 6.0 5 East London Stagecoach None London Borough NaN Onboard Injuries 7 8 Count which routes had the most incidents_type

import pandas as pd df = pd.read_csv('../data/bus_data.csv') df = df[df['incident_type']=='Onboard Injuries'][['route', 'incident_type']] df.groupby('route')['incident_type'].count().sort_values(ascending=False)

Python

generate functionTue, 28 Mar 2023

There is a dataframe "df" containing data about texts in the following format: Category Message Length Words 0 ham Go until jurong point, crazy... Available only ... 111 [go, jurong, point, crazy, available, bugis, g... 1 ham Ok lar... Joking wif u oni... 29 [ok, lar, joking, wif, oni] The texts in it are in the df['Message'] variable. And the classification categories are in df['Category']. The programmer has written code which should contain all the source messages classified incorrectly (specifying the actual and predicted ones). However, this code does not work and generates an error: KeyError. Here is this code: misclassified = [] For index, label_predicted in enumerate(predicted): if label_predicted != y_test[index]: misclassified.append({'message': df.iloc[index]['message'], 'actual': df.iloc[index]['label'], 'predicted': label_predicted}) misclassification_df = pd.DataFrame(misclassified) Fix it

misclassified = [] for index, label_predicted in enumerate(predicted): if label_predicted != y_test[index]: misclassified.append({'Message': df.iloc[index]['Message'], 'actual': df.iloc[index]['Category'], 'predicted': label_predicted}) misclassification_df = pd.DataFrame(misclassified)

Python

generate functionFri, 24 Feb 2023

There is a Pandas dataframe: spi_rank country spi_score basic_human_needs foundations_of_wellbeing opportunity basic_nutri_med_care water_sanitation shelter personal_safety access_to_knowledge access_to_communications health_wellness environmental_quality personal_rights personal_freedom inclusiveness access_to_advanced_education 61 62.0 Russia 73.45 83.0 79.4 57.94 95.49 97.04 88.56 50.92 94.34 77.42 64.7 81.16 51.79 70.26 30.31 79.4 Write a function that finds the mean values of each column and transposes them to rows

import pandas as pd df = pd.read_csv('SDG_data.csv') def mean_row(data): df_2 = data[["basic_human_needs", "foundations_of_wellbeing", "opportunity", "basic_nutri_med_care", "water_sanitation", "shelter", "personal_safety", "access_to_knowledge", "access_to_communications", "health_wellness", "environmental_quality", "personal_rights", "personal_freedom", "inclusiveness", "access_to_advanced_education"]] mean_df = pd.DataFrame(df_2.mean(axis=0)).T return mean_df

Python

generate functionThu, 22 Dec 2022

Given a non-negative integer n, write a function to_binary/ToBinary which returns that number in a binary format. to_binary(1) # should return 1 to_binary(5) # should return 101 to_binary(11) # should return 1011

def to_binary(n):
    # bin(5) gives '0b101'; strip the '0b' prefix and return the digits as a number
    return int(bin(n)[2:])

Python

generate functionMon, 13 Feb 2023

Modify the formula below so that it rounds to the nearest integer: df2['respiratory_rate'] = df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median'))

# Compute the per-group median first, then round it before filling the gaps
df2['respiratory_rate'] = df2['respiratory_rate'].fillna(
    df2.groupby('pulse')['respiratory_rate'].transform('median').round())

Python

generate functionTue, 21 Mar 2023

There is a Pandas dataframe: water nutri mineral 0 1 1 2 1 2 2 1 2 3 4 1 3 4 6 3 4 2 5 2 5 4 6 4 6 2 7 2 7 4 5 4 8 5 4 5 9 2 5 4 10 3 6 3 11 4 7 2 12 2 4 3 13 1 3 2 14 3 5 3 15 4 5 1 16 3 6 3 17 2 5 4 18 5 4 5 19 1 3 1 Construct a vertical bar graph comparing these values, where the x-axis is index and the y-axis is the value

import pandas as pd import matplotlib.pyplot as plt data = {'water': [1, 2, 3, 4, 2, 4, 2, 4, 5, 2, 3, 4, 2, 1, 3, 4, 3, 2, 5, 1], 'nutri': [1, 2, 4, 6, 5, 6, 7, 5, 4, 5, 6, 7, 4, 3, 5, 5, 6, 5, 4, 3], 'mineral': [2, 1, 1, 3, 2, 4, 2, 4, 5, 4, 3, 2, 3, 2, 3, 1, 3, 4, 5, 1]} df = pd.DataFrame(data) df.plot(kind='bar', stacked=True)

Python

generate functionMon, 13 Feb 2023

Modify the formula below so that it rounds to the nearest integer: df2['respiratory_rate'] = df2['respiratory_rate'].fillna( df2.groupby('pulse')['respiratory_rate'].transform('median'))

# Fill the gaps with the per-group median, then round the whole column
df2['respiratory_rate'] = df2['respiratory_rate'].fillna(
    df2.groupby('pulse')['respiratory_rate'].transform('median')).round()

Python

generate functionMon, 26 Dec 2022

There are two lists. The first (stnums) contains rows that are considered duplicates. The second list is a list of students. Write a function that replaces the values in the second list that match the first (but only the 2nd, 3rd value, and so on, not the first one). stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ]

def replace(stnums, students): # you code here
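The stub above is left unfinished. One possible completion is sketched below. It returns a new list instead of modifying the input, keeps the first occurrence of each listed number, and replaces the later ones. The request does not state the replacement value, so the `replacing_num` parameter and its `"9090"` default are assumptions borrowed from the related requests on this page.

```python
def replace(stnums, students, replacing_num="9090"):
    """Return a copy of students with repeated numbers from stnums replaced."""
    seen = set()
    result = []
    for student in students:
        row = list(student)  # copy so the caller's list is left untouched
        number = row[0]      # the student number is the first field
        if number in stnums:
            if number in seen:
                row[0] = replacing_num  # second, third, ... occurrence
            else:
                seen.add(number)        # first occurrence stays as it is
        result.append(row)
    return result
```

Note that the list literal in the request is missing commas between several rows, so it needs to be corrected before it can be passed to the function.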

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order

hotel = pd.read_csv('hotel_bookings.csv', sep=';') hotel.duplicated(['name']) hotel[hotel.duplicated(['name'], keep = False)].sort_values(by = 'name')

Python

generate functionMon, 26 Dec 2022

There are two lists. Write a function that looks for matching values of these two lists and replaces them with replacing_num (but without touching the first one) stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ] replacing_num = '9090'

def replace_nums(stnums, students, replacing_num):
    seen = set()
    for student in students:
        number = student[0]  # only the student number field is compared
        if number in stnums:
            if number in seen:
                student[0] = replacing_num
            else:
                seen.add(number)  # leave the first match untouched
    return students

replace_nums(stnums, students, replacing_num)

Python

generate functionFri, 27 Jan 2023

There is a Pandas dataframe user_id name date hotel profit_per_room how_find_us resting_time rating 0 1 Ksenia Rodionova 2021-07-01 Alpina 1639.000000 by_recommendation 48 3.0 1 2 Ulyana Selezneva 2021-07-01 AquaMania 930.000000 by_airbnb.com 97 4.0 2 3 Konstantin Prokhorov 2021-07-01 Breeze 1057.720000 agg_trivago.com 173 4.0 3 4 Petrov Vladimir 2021-07-01 Moreon 1403.000000 agg_onlinetours.ru 229 4.0 4 5 Arina Selivanova 2021-07-01 Alpina 1639.000000 agg_sutochno.ru 63 4.0 Find the rows in the dataview where the values in the name column are duplicated. Create a new dataview in which the first row of the duplicate and all subsequent ones will be added. Sort the name column in ascending order Example: name Ksenia Rodionova Artur Petrov Ivan Sidorov Ksenia Rodionova Result: name Ksenia Rodionova Ksenia Rodionova

# Challenge 1
import pandas as pd

def duplicates(df):
    # keep=False marks the first occurrence as well as all later ones
    return df[df['name'].duplicated(keep=False)].sort_values(by='name', ascending=True).reset_index(drop=True)

df = pd.DataFrame({'name': ['Ksenia Rodionova', 'Artur Petrov', 'Ivan Sidorov', 'Ksenia Rodionova'], 'date': ['2021-07-01', '2021-07-02', '2021-07-03', '2021-07-04']})
duplicates(df)

Python

generate functionFri, 03 Mar 2023

Correct the code of these plotly plots so that the y-axis is a logarithmic scale fig = go.Figure() fig.add_trace(go.Bar( x=data['Country'], y=data['Total Cases'], name='Заболевших', marker_color='indianred')) fig.add_trace(go.Bar( x=data['Country'], y=data['Total Deaths'], name='Умерших', marker_color='lightsalmon')) fig.update_layout(barmode='group', xaxis_tickangle=-45) fig.show()

fig = go.Figure() fig.add_trace(go.Bar( x=data['Country'], y=data['Total Cases'], name='Заболевших', marker_color='indianred')) fig.add_trace(go.Bar( x=data['Country'], y=data['Total Deaths'], name='Умерших', marker_color='lightsalmon')) fig.update_layout(barmode='group', xaxis_tickangle=-45, yaxis_type="log") fig.show()

Python

generate functionSat, 04 Mar 2023

Construct a line graph using the data from this dataframe, where the x-axis is years and the y-axis is the columns hapscore = pd.DataFrame({'2015': [df15['Happiness Score'].mean()], '2016': [df16['Happiness Score'].mean()], '2017': [df17['Happiness.Score'].mean()], '2018': [df18['Score'].mean()], '2019': [df19['Score'].mean()]}) 2015 2016 2017 2018 2019 0 5.375734 5.382185 5.354019 5.375917 5.407096

# Transpose so the years become the index (x-axis) and the scores form one line
hapscore.T.plot(kind='line', legend=False)

Python

generate functionFri, 24 Feb 2023

There is a Pandas dataframe: spi_rank country spi_score basic_human_needs foundations_of_wellbeing opportunity basic_nutri_med_care water_sanitation shelter personal_safety access_to_knowledge access_to_communications health_wellness environmental_quality personal_rights personal_freedom inclusiveness access_to_advanced_education 61 62.0 Russia 73.45 83.0 79.4 57.94 95.49 97.04 88.56 50.92 94.34 77.42 64.7 81.16 51.79 70.26 30.31 79.4 Write a function that converts dataframe columns to strings

def convert_to_str(dataframe): rank = str(dataframe['spi_rank']) country = dataframe['country'] return rank + ' ' + country convert_to_str(df.iloc[0])

Python

generate functionSat, 28 Jan 2023

There is a Pandas dataframe: hotels_rev = df1[['date', 'hotel', 'revenue', 'av_revenue', 'difference', 'in_percent']].sort_values(by=['hotel', 'date']) Construct a bar graph with 'hotel' values in the x axis and change values in the 'date' column and 'av_revenue' values in the y axis

def bar_graph(df): data = df data1 = df[['date', 'revenue']] data2 = data.groupby(['date']).mean() data2['date'] = data2.index data3 = data2.merge(data1, on = 'date') data3 = data3.groupby('hotel', as_index=False).diff() data3['in_percent'] = data3.apply(lambda row: round((row['revenue_x'] / row['revenue_y'] - 1) * 100, 2), axis = 1) data3.head() plt.figure(figsize=(12, 8)) plt.bar(data3['hotel'], data3['in_percent']) plt.xlabel('hotel') plt.ylabel('change in percent') plt.title('change in %') plt.show()

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Plot a bar chart with number_of_hits on the x-axis and performer on the y-axis.

import pandas as pd import matplotlib.pyplot as plt def plot_top_performers(dataframe, column, number_of_hits): return dataframe.nlargest(number_of_hits, column).plot.barh(x='performer', y='num_of_hits', title='Top 20 Performers') data = pd.DataFrame({'performer' : ['Glee Cast', 'Taylor Swift', 'Drake', 'YoungBoy Never Broke Again', 'Aretha Franklin', 'The Beatles'], 'num_of_hits' : [191, 166, 125, 75, 66, 66]}) plot_top_performers(data, 'num_of_hits', 6)

Python

generate functionMon, 12 Dec 2022

Create a function that returns the CSV representation of a two-dimensional numeric array. Example: input: [[ 0, 1, 2, 3, 4 ], [ 10,11,12,13,14 ], [ 20,21,22,23,24 ], [ 30,31,32,33,34 ]] output: '0,1,2,3,4\n' +'10,11,12,13,14\n' +'20,21,22,23,24\n' +'30,31,32,33,34' Array's length > 2.

def to_csv_string(array): return '\n'.join(','.join(map(str,row)) for row in array)

Python

generate functionMon, 06 Feb 2023

There is a df Pandas dataframe: date av_temp deviations country 1 1743-12-01 NaN NaN Åland 2 1744-01-01 NaN NaN Åland 3 1744-02-01 NaN NaN Åland 4 1744-03-01 NaN NaN Åland 9 1744-08-01 NaN NaN Åland Create a new column year from date in which you pull out the first 4 digits of the year. Example: 1744-01-01 Date column: 1744 Example: 1744-01-01 Date column: 1744

def yearExtractor(date):
    # Assumes the date is stored as a 'YYYY-MM-DD' string
    year = int(date[:4])
    return year

df['year'] = df['date'].apply(yearExtractor)

Python

generate functionSat, 28 Jan 2023

Construct 4 seaborn bar graphs from these data: 1) df1['city'].value_counts() 2) df2['hotel'].value_counts() 3) df2['date'].value_counts() 4) df2['rating'].value_counts()

def bar_plots(df): for col in df.columns: df[col].value_counts().plot.bar() return

Python

generate functionWed, 08 Feb 2023

def get_needed_posts(query): site = pd.DataFrame() for q in query: URL = parseurl+'search/' params = { 'q': q } req = requests.get(URL, params=params) soup = BeautifulSoup(req.text) articles = soup.find_all('article', class_='tm-articles-list__item') for article in articles: try: title = article.find('h2', class_='tm-article-snippet__title').text date = article.find('span', class_='tm-article-snippet__datetime-published').text.strip() link = article.find('h2', class_='tm-article-snippet__title').find('a').get('href') except: pass row = {'date': date, 'title': title, 'link': 'https://habr.com'+link} site = pd.concat([site, pd.DataFrame([row])]) return site.reset_index(drop=True) Modify the function above so that if it finds a repeated title or link, it will not add it to the return

def get_needed_posts(query): site = pd.DataFrame() links = [] titles = [] for q in query: URL = parseurl+'search/' params = { 'q': q } req = requests.get(URL, params=params) soup = BeautifulSoup(req.text) articles = soup.find_all('article', class_='tm-articles-list__item') for article in articles: try: title = article.find('h2', class_='tm-article-snippet__title').text date = article.find('span', class_='tm-article-snippet__datetime-published').text.strip() link = article.find('h2', class_='tm-article-snippet__title').find('a').get('href') if link not in links and title not in titles: titles.append(title) links.append(link) row = {'date': date, '

Python

generate functionSat, 04 Mar 2023

Construct a line graph using the data from this dataframe, where the x-axis is years and the y-axis is the columns hapscore = pd.DataFrame({'2015': [df15['Happiness Score'].mean()], '2016': [df16['Happiness Score'].mean()], '2017': [df17['Happiness.Score'].mean()], '2018': [df18['Score'].mean()], '2019': [df19['Score'].mean()]}) 2015 2016 2017 2018 2019 0 5.375734 5.382185 5.354019 5.375917 5.407096

def graph(x, y): plt.plot(x, y) plt.show() graph(hapscore.columns, hapscore.loc[0])

Python

generate functionSun, 18 Dec 2022

Rewrite the code with a command that removes unnecessary quotation marks (' ') in the dictionary: Here's the problem: Output: {' "user_id"': ' "category"}', ' "1840e0b9d4"': ' "Products"}', How it should be: Output: {"user_id": "category"}, "1840e0b9d4": "Products"}, The code: f = open('purchase_log.txt', encoding='utf-8') import re import json purchases = {} for i, line in enumerate(f): line = re.split(r",|:", line.strip()) keys = line[1] values = line[3] purchases[keys] = values

def remove_quotes(s): if s[0] in '\'"' and s[-1] in '\'"': s = s[1:-1] return s keys = remove_quotes(line[1]) values = remove_quotes(line[3])

Python

generate functionFri, 17 Mar 2023

Write a function that will determine the size of the necessary sample for the study if the following parameters are known: - error (delta); - variance (sigsqr); - Confidence level (conf). Round up the answer to a whole number.

import math
from scipy.stats import norm

# function that determines the size of the necessary sample for the study
def sample_size(delta, sigsqr, conf):
    # significance level
    alpha = 1 - conf
    # standard deviation
    stderr = math.sqrt(sigsqr)
    # sample size
    n = (norm.ppf(1 - alpha / 2) * stderr / delta) ** 2
    # rounding up n
    n = math.ceil(n)
    return n

# calling function
sample_size(0.1, 1, 0.95)
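The calculation corresponds to the usual formula for estimating a mean, n = (z · σ / Δ)², where z is the two-sided normal quantile for the chosen confidence level. As a cross-check, here is a sketch that uses only the standard library (`statistics.NormalDist`, available from Python 3.8) in place of SciPy. The parameter names follow the request. That the formula above is the one intended is an assumption.

```python
import math
from statistics import NormalDist


def sample_size(delta, sigsqr, conf):
    """Sample size for estimating a mean with margin of error delta."""
    alpha = 1 - conf
    # Two-sided quantile of the standard normal distribution
    z = NormalDist().inv_cdf(1 - alpha / 2)
    sigma = math.sqrt(sigsqr)
    # Round up, since a fractional observation is not possible
    return math.ceil((z * sigma / delta) ** 2)


print(sample_size(0.1, 1, 0.95))  # 385
```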

Python

generate functionTue, 13 Dec 2022

Make a function that returns the value multiplied by 50 and increased by 6. If the value entered is a string it should return "Error".

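# Renamed from math() so the function does not shadow the standard math module
def multiply_and_add(a):
    if isinstance(a, str):
        return "Error"
    # Works for ints and floats alike
    return (a * 50) + 6

print(multiply_and_add(1))
print(multiply_and_add("Hello"))
print(multiply_and_add(2))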

Python

generate functionMon, 06 Feb 2023

There is a df_ru Pandas dataframe: date av_temp deviations country year decade 432393 1988-07-01 16.105 0.158 Russia 1988 1980-1990 432429 1991-07-01 16.385 0.243 Russia 1991 1990-2000 432489 1996-07-01 16.032 0.213 Russia 1996 1990-2000 432513 1998-07-01 16.677 0.183 Russia 1998 1990-2000 432525 1999-07-01 16.412 0.228 Russia 1999 1990-2000 432549 2001-07-01 16.539 0.190 Russia 2001 2000-2010 Write a function that will create a new dataframe hot_years and leave in it only those lines where av_temp > 12

def filter_hot_years(df, hot_temp = 12, country = 'Russia'):
    df_hot = df[(df['av_temp'] > hot_temp) & (df['country'] == country)]
    return df_hot

# The request asks for av_temp > 12, which is the default threshold
hot_years = filter_hot_years(df_ru)

Python

generate functionThu, 09 Feb 2023

There is 'actors' Pandas dataframe: actorid name date_of_birth birth_city birth_country height_inches biography gender ethnicity networth 1002 1155 Jackie Chan 1954-04-07 Victoria Peak Hong Kong 68.0 Hong Kong's cheeky, lovable and best known fil... Male NaN 400000000.0 1387 69 Keanu Reeves 1964-09-02 Beirut Lebanon 73.0 Keanu Charles Reeves, whose first name means "... Male Lebanese 360000000.0 2252 141 Sean Connery 1930-08-25 Edinburgh UK 74.0 The tall, handsome and muscular Scottish actor... Male White 350000000.0 291 6 Bruce Willis 1955-03-19 Idar-Oberstein West Germany 72.0 Actor and musician Bruce Willis is well known ... Male White 250000000.0 Write a function that sorts all rows in the 'networth' column in descending order and creates a new dataframe with only the first 393 rows remaining

def sort_and_return(df): df = df.sort_values('networth', ascending=False) df = df[0:393] return df

Python

generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singin' in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 Seventh Seal, The (Sjunde inseglet, Det) (1957) 5.0 9 Bridge on the River Kwai, The (1957) 4.0 both titles and ratings in this table are not unique, they are duplicated many times Write a function that groups unique movie titles in one column and displays their average rating in the second column

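def group_by_title(df):
    # One row per unique title with the mean of its ratings
    return df.groupby('title', as_index=False)['rating'].mean()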

Python

generate functionMon, 26 Dec 2022

A matrix matrix is given. matrix = [ [0,1,2,4,8], [6,2,2,1,9], [3,3,3,3,3], [4,6,7,1,2], [5,7,3,4,0] ] Write code (in pure Python, no third-party libraries, no list comprehensions) that will transpose it (that is, make columns out of rows) into matrix_t: matrix_t = [ [0, 6, 3, 4, 5], [1, 2, 3, 6, 7], [2, 2, 3, 7, 3], [4, 1, 3, 1, 4], [8, 9, 3, 2, 0] ]

def transpose(matrix): t = zip(*matrix) return [list(row) for row in t] matrix_t = transpose(matrix)
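The answer above relies on a list comprehension, which the request rules out. A sketch that uses only nested `for` loops is shown below. The `transpose` name is kept from the answer, and the matrix is the one from the request.

```python
def transpose(matrix):
    """Transpose a rectangular matrix using plain loops only."""
    result = []
    for col in range(len(matrix[0])):
        new_row = []
        for row in range(len(matrix)):
            # Element (row, col) of the input becomes element (col, row) of the output
            new_row.append(matrix[row][col])
        result.append(new_row)
    return result


matrix = [
    [0, 1, 2, 4, 8],
    [6, 2, 2, 1, 9],
    [3, 3, 3, 3, 3],
    [4, 6, 7, 1, 2],
    [5, 7, 3, 4, 0],
]
matrix_t = transpose(matrix)
```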

Python

generate functionMon, 20 Feb 2023

There is a Pandas code: df2.style.highlight_max(axis=1, props='color:white;\ font-weight:bold;\ background-color:green;') df2.style.highlight_min(axis=1, props='color:white;\ font-weight:bold;\ background-color:brown;') Write a function that combines these two actions into one and returns the changed dataframe

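def highlight_max_min(df):
    # Chain both calls on one Styler: each access to df.style creates a fresh Styler,
    # so the result of a separate highlight_max call would be discarded.
    # Note that the return value is a Styler object, not a dataframe.
    return (df.style
            .highlight_max(axis=1, props='color:white; font-weight:bold; background-color:green;')
            .highlight_min(axis=1, props='color:white; font-weight:bold; background-color:brown;'))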

Python

generate functionThu, 22 Dec 2022

Return a new array consisting of elements which are multiple of their own index in input array (length > 1). Some cases: [22, -6, 32, 82, 9, 25] => [-6, 32, 25] [68, -1, 1, -7, 10, 10] => [-1, 10] [-56,-85,72,-26,-14,76,-27,72,35,-21,-67,87,0,21,59,27,-92,68] => [-85, 72, 0, 68] [22] should equal [-6, 32, 25] [68] should equal [-1, 10] [11] should equal [-11] [-56] should equal [-85, 72, 0, 68] [28] should equal [38, -44, -99] [-1] should equal [-49, 8, -60, 35]

def multiple_of_index(arr): new_arr = [] for i in range(1, len(arr)): if arr[i] % i == 0: new_arr.append(arr[i]) return new_arr

Python

generate functionTue, 10 Jan 2023

Create a diagonal matrix with elements from N to 0. Count the sum of its values on the diagonal.

def create_diag_matrix(n): #create a list of lists first return [[0 for _ in range(n)] for _ in range(n)] def create_diag_matrix_2(n): return [[i*0+1 if i==j else i*0 for i in range(n)] for j in range(n)] def create_diag_matrix_3(n): return [[1 if i==j else 0 for i in range(n)] for j in range(n)]
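None of the three variants above places the values N to 0 on the diagonal or computes their sum. A pure-Python sketch of what the request describes might look like this. It assumes the matrix has N + 1 rows so that every value from N down to 0 appears once. The `diagonal_sum` helper name is illustrative.

```python
def create_diag_matrix(n):
    """Build an (n+1) x (n+1) matrix with n, n-1, ..., 0 on the main diagonal."""
    size = n + 1
    matrix = []
    for i in range(size):
        row = [0] * size
        row[i] = n - i  # diagonal element counts down from n to 0
        matrix.append(row)
    return matrix


def diagonal_sum(matrix):
    """Sum the elements on the main diagonal."""
    total = 0
    for i in range(len(matrix)):
        total += matrix[i][i]
    return total


m = create_diag_matrix(3)
print(m)
print(diagonal_sum(m))  # 3 + 2 + 1 + 0 = 6
```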

Python

generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singin' in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 Seventh Seal, The (Sjunde inseglet, Det) (1957) 5.0 9 Bridge on the River Kwai, The (1957) 4.0 both titles and ratings in this table are not unique, they are duplicated many times Write a function that groups unique movie titles in one column and displays their average rating in the second column

def group_avg_ratings(df): #your code here df = pd.DataFrame({'id':[0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9], 'title':['Pulp Fiction (1994)','Three Colors: Red (Trois couleurs: Rouge) (1994)','Three Colors: Blue (Trois couleurs: Bleu) (1993)','Underground (1995)','Singin\' in the Rain (1952)','Dirty Dancing (1987)','Delicatessen (1991)','Ran (1985)','Seventh Seal, The (Sjunde inseglet, Det) (1957)','Bridge on the River Kwai, The (1957)','Pulp Fiction (1994)','Three Colors: Red (Trois couleurs: Rouge) (1994)','Three Colors: Blue (Trois couleurs: Bleu) (1993)','Underground (1995)','Singin\' in the Rain (1952)','

Python

generate functionTue, 21 Mar 2023

There are 3 lists: water, nutri and mineral. They differ in length: print(len(water)) - 20 print(len(nutri)) -21 print(len(mineral)) -21 add them to df Pandas dataframe in the following format: index treatments value 0 0 water 25 1 1 water 30 2 2 water 28 3 3 nutri 36 4 4 nutri 29 5 0 nutri 45 6 1 mineral 55 7 2 mineral 29 8 3 mineral 56

import pandas as pd

def make_df(water, nutri, mineral):
    water = pd.DataFrame({'treatments': 'water', 'value': water}, index=list(range(len(water))))
    nutri = pd.DataFrame({'treatments': 'nutri', 'value': nutri}, index=list(range(len(nutri))))
    mineral = pd.DataFrame({'treatments': 'mineral', 'value': mineral}, index=list(range(len(mineral))))
    # reset_index() turns each list's own 0..n-1 numbering into the 'index' column
    # and gives the combined frame a continuous row index, as in the requested format.
    df = pd.concat([water, nutri, mineral]).reset_index()
    return df

make_df(water, nutri, mineral)

Python

generate functionFri, 23 Dec 2022

Return the Nth Even Number Example(Input --> Output) 1 --> 0 (the first even number is 0) 3 --> 4 (the 3rd even number is 4 (0, 2, 4)) 100 --> 198 1298734 --> 2597466 The input will not be 0.

def nth_even(n): return n * 2 - 2

Python

generate functionTue, 28 Mar 2023

There is a dataframe "df" containing data about texts in the following format: Category Message Length Words 0 ham Go until jurong point, crazy... Available only ... 111 [go, jurong, point, crazy, available, bugis, g... 1 ham Ok lar... Joking wif u oni... 29 [ok, lar, joking, wif, oni] The texts in it are in the df['Message'] variable. And the classification categories are in df['Category']. The programmer has written code which should contain all the source messages classified incorrectly (specifying the actual and predicted ones). However, this code does not work and generates an error: KeyError. Here is this code: misclassified = [] for index, label_predicted in enumerate(predicted): if label_predicted != y_test[index]: misclassified.append({'message': df.iloc[index]['Message'], 'actual': df.iloc[index]['Category'], 'predicted': label_predicted}) misclassification_df = pd.DataFrame(misclassified) Fix it!

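def misclassified_messages(y_test, predicted):
    # Assumption: y_test is a pandas Series produced by train_test_split, so it keeps
    # the original row labels of df. Indexing it with a positional counter
    # (y_test[index]) is what raises KeyError; use .iloc for positions instead.
    misclassified = []
    for position, label_predicted in enumerate(predicted):
        actual = y_test.iloc[position]
        if label_predicted != actual:
            # Look up the source message by the original row label, not by position.
            row_label = y_test.index[position]
            misclassified.append({'message': df.loc[row_label, 'Message'],
                                  'actual': actual,
                                  'predicted': label_predicted})
    misclassification_df = pd.DataFrame(misclassified)
    return misclassification_df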

Python

generate functionSat, 04 Mar 2023

Correct the code to correctly output the dataframe with this data: gdpdiff = pd.DataFrame({'Top1': df19[['Country or region', 'GDP per capita']]\ .sort_values(by='GDP per capita', ascending=False).head(20)[0:1], 'Top20': df19[['Country or region', 'GDP per capita']]\ .sort_values(by='GDP per capita', ascending=False).head(20)[19:20]}) gdpdiff

def gdp(df19):
    # Sort once and keep the 20 countries with the highest GDP per capita.
    top20 = df19[['Country or region', 'GDP per capita']]\
        .sort_values(by='GDP per capita', ascending=False).head(20)
    # Passing two DataFrames as dict values does not build the intended table,
    # so select the 1st and 20th rows and label them instead.
    dataset = top20.iloc[[0, 19]].copy()
    dataset.index = ['Top1', 'Top20']
    return dataset

gdp(df19)

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: song performer chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Pandas dataframe is available: song artist chart_debut peak_position worst_position time_on_chart consecutive_weeks 261636 Stupid Cupid Connie Francis 1958-08-02 17 72 12 11.0 46383 Chantilly Lace Big Bopper 1958-08-02 6 40 18 17.0 46384 Chantilly Lace Big Bopper 1958-08-02 6 40 19 18.0 46385 Chantilly Lace Big Bopper 1958-08-02 6 40 20 19.0 46386 Chantilly Lace Big Bopper 1958-08-02 6 40 21 20.0 Write a function that will group all rows by the performer column (so that no duplicates are left), and all corresponding (unique) values to that column are written through the "," sign into a new hits column. All values of both time_on_chart and consecutive_weeks columns must be summed if the performer value matches.

def get_songs_from_artists(df): # do something return df
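
The stub above returns the dataframe unchanged. One way the grouping could be sketched, assuming the columns are named `performer`, `song`, `time_on_chart` and `consecutive_weeks` as in the first table of the prompt (the `sample` frame is illustrative):

```python
import pandas as pd

def get_songs_from_artists(df):
    return (
        df.groupby('performer', as_index=False)
          .agg(
              # Unique song titles of the performer, joined with ", ".
              hits=('song', lambda s: ', '.join(s.unique())),
              # Numeric columns are summed per performer.
              time_on_chart=('time_on_chart', 'sum'),
              consecutive_weeks=('consecutive_weeks', 'sum'),
          )
    )

# Illustrative rows taken from the prompt's table.
sample = pd.DataFrame({
    'song': ['Stupid Cupid', 'Chantilly Lace', 'Chantilly Lace', 'Chantilly Lace', 'Chantilly Lace'],
    'performer': ['Connie Francis', 'Big Bopper', 'Big Bopper', 'Big Bopper', 'Big Bopper'],
    'time_on_chart': [12, 18, 19, 20, 21],
    'consecutive_weeks': [11.0, 17.0, 18.0, 19.0, 20.0],
})
grouped = get_songs_from_artists(sample)
print(grouped)
```

If the column is called `artist` rather than `performer` (the prompt shows both spellings), the `groupby` key would need to change accordingly.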

Python

generate functionWed, 18 Jan 2023

Write a function that calculates the value_counts in the decade pandas column of the dataview if the rows in the rating column = 5.0 if av_rating[av_rating.rating == 5.0]: av_rating['decade'].value_counts() else: pass

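def top_decade(av_rating):
    # A DataFrame cannot be used directly as an if-condition (it raises ValueError).
    # Filter the rows rated 5.0 first, then count them per decade.
    five_star = av_rating[av_rating['rating'] == 5.0]
    return five_star['decade'].value_counts()

top_decade(av_rating)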

Python

generate functionMon, 12 Dec 2022

Create a function that returns the CSV representation of a two-dimensional numeric array. Example: input: [[ 0, 1, 2, 3, 4 ], [ 10,11,12,13,14 ], [ 20,21,22,23,24 ], [ 30,31,32,33,34 ]] output: '0,1,2,3,4\n' +'10,11,12,13,14\n' +'20,21,22,23,24\n' +'30,31,32,33,34' Array's length > 2.

def toCsvText(array): return "\n".join(",".join(map(str, a)) for a in array)

Python

generate functionSat, 25 Feb 2023

There is a Pandas dataframe: loan_amnt int_rate 0 5000 10.65% 1 2500 15.27% 2 2400 15.96% 3 10000 13.49% 4 3000 12.69% 5 5000 7.90% 6 7000 15.96% You need to split the values of the int_rate column into categories. Important: The int_rate column is in str data format. Use regex to remove the '%' signs at the end of strings. Then convert the values to float format. Then write a function that creates the rate_group column and adds values to it as follows: 1) if int_rate > 15.00%, rate_group should be '>15' 2) if int_rate is between 10.00% and 15.00%, then rate_group should be '10-15'

import re

def rate_group(int_rate):
    # Strip the trailing '%' and convert the remaining text to a float.
    int_rate = float(re.sub(r'%$', '', int_rate))
    if int_rate > 15.0:
        return '>15'
    elif int_rate >= 10.0:
        # 10.00 itself belongs to the '10-15' group.
        return '10-15'
    else:
        return '<10'

df['rate_group'] = df['int_rate'].apply(rate_group)

Python

generate functionWed, 01 Feb 2023

There is a Pandas dataframe: performer hits chart_debut time_on_chart consecutive_weeks decade num_of_hits 3428 Glee Cast Somebody To Love, Friday, Loser Like Me, Baby,... 2009 290 47.0 2000-2010 191 8478 Taylor Swift Fifteen, Fearless, London Boy, Teardrops On My... 2008 14299 11880.0 2000-2010 166 2543 Drake Summer Sixteen, The Language, Weston Road Flow... 2016 7449 6441.0 2010-2020 125 10397 YoungBoy Never Broke Again Kacey Talk, Put It On Me, Dirty Iyanna, Lil To... 2020 1012 625.0 2020-2030 75 458 Aretha Franklin Chain Of Fools, My Song, Respect, Until You Co... 1967 3490 2921.0 1960-1970 66 8625 The Beatles Sgt. Pepper's Lonely Hearts Club Band/With A L... 1978 3548 2798.0 1970-1980 66 Write a year_leaders function that will build a new dataframe and leave only the performer and hits lines that have the maximum number of num_of_hits when grouped by the chart_debut column

df_year_leaders = year_leaders(df_performers)
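
The answer above only calls `year_leaders` without defining it. A minimal sketch of one possible definition, assuming the column names from the prompt (`chart_debut`, `num_of_hits`, `performer`, `hits`); the `sample` frame is illustrative:

```python
import pandas as pd

def year_leaders(df_performers):
    # Row label of the performer with the most hits in each debut year.
    leader_rows = df_performers.groupby('chart_debut')['num_of_hits'].idxmax()
    # Keep only performer and hits, indexed by the debut year.
    leaders = df_performers.loc[leader_rows, ['chart_debut', 'performer', 'hits']]
    return leaders.set_index('chart_debut')

# Illustrative data: two performers debut in 2008, one in 2016.
sample = pd.DataFrame({
    'performer': ['Glee Cast', 'Taylor Swift', 'Drake'],
    'hits': ['Somebody To Love', 'Fifteen, Fearless', 'Summer Sixteen'],
    'chart_debut': [2008, 2008, 2016],
    'num_of_hits': [10, 166, 125],
})
df_year_leaders = year_leaders(sample)
print(df_year_leaders)
```

When two performers tie within a year, `idxmax` keeps the first one it encounters.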

Python

generate functionWed, 18 Jan 2023

Fix this function so that when it finds a value in a Pandas dataframe string, it returns its key: def geo_class(row): geo_data = {'center': ['Moscow', 'Tula', 'Yaroslavl'], 'Northwest': ['petersburg', 'pskov', 'murmansk'], 'Far East': ['vladivostok', 'sakhalin', 'khabarovsk']} for key in geo_data.keys(): for value in geo_data.values(): if value in row: return key return 'undefined'

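def geo_class(row):
    geo_data = {'center': ['Moscow', 'Tula', 'Yaroslavl'],
                'Northwest': ['petersburg', 'pskov', 'murmansk'],
                'Far East': ['vladivostok', 'sakhalin', 'khabarovsk']}
    # Look for each city inside the string, ignoring case, and return the region key.
    text = str(row).lower()
    for key, cities in geo_data.items():
        if any(city.lower() in text for city in cities):
            return key
    return 'undefined'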

Python

generate functionWed, 22 Feb 2023

Add the x-axis value captions to the columns fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='Какая рабочая среда для вас наиболее предпочтительна?') fig.show()

def add_x_labels(fig, x_values, x_labels): fig.update_xaxes(tickvals=x_values, ticktext=x_labels)

Python

generate functionTue, 21 Mar 2023

There is a Pandas dataframe: water nutri mineral 0 1 1 2 1 2 2 1 2 3 4 1 3 4 6 3 4 2 5 2 5 4 6 4 6 2 7 2 7 4 5 4 8 5 4 5 9 2 5 4 10 3 6 3 11 4 7 2 12 2 4 3 13 1 3 2 14 3 5 3 15 4 5 1 16 3 6 3 17 2 5 4 18 5 4 5 19 1 3 1 Construct a vertical bar graph comparing these values, where the x-axis is index and the y-axis is the value of each column

import pandas as pd import matplotlib.pyplot as plt data = {'water': [1, 2, 3, 4, 2, 4, 2, 4, 5, 2, 3, 4, 2, 1, 3, 4, 3, 2, 5, 1], 'nutri': [1, 2, 4, 6, 5, 6, 7, 5, 4, 5, 6, 7, 4, 3, 5, 5, 6, 5, 4, 3], 'mineral': [2, 1, 1, 3, 2, 4, 2, 4, 5, 4, 3, 2, 3, 2, 3, 1, 3, 4, 5, 1]} data = pd.DataFrame(data, index = range(20)) plt.figure(figsize = (12, 4)) plt.bar(data.index, data['water'], color = 'r', label = 'water') plt.bar(data.index, data['nutri'], color = 'y', label = 'nutri', bottom = data

Python

generate functionMon, 20 Feb 2023

There is a Pandas code: df2.style.highlight_max(axis=1, props='color:white;\ font-weight:bold;\ background-color:green;') df2.style.highlight_min(axis=1, props='color:white;\ font-weight:bold;\ background-color:brown;') Write a function that combines these two actions into one and returns the changed dataframe

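def highlight_max_and_min(df):
    # Each Styler method returns the Styler, so the two highlights can be chained.
    # The result is a Styler (the object that carries the colours), not a plain DataFrame;
    # returning df itself would discard both highlights.
    return df.style\
        .highlight_max(axis=1, props='color:white; font-weight:bold; background-color:green;')\
        .highlight_min(axis=1, props='color:white; font-weight:bold; background-color:brown;')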

Python

generate functionWed, 01 Feb 2023

There is a Python and Pandas function. def year_leaders(dfp): dfp_copy = dfp.copy() dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True) dfp_copy.drop_duplicates(subset='chart_debut', keep='first', inplace=True) dfp_copy.set_index('chart_debut', inplace=True) return dfp_copy year_leaders(dfp) Modify it to output all 66 lines of the dataframe

def year_leaders(dfp):
    dfp_copy = dfp.copy()
    dfp_copy.sort_values(by=['chart_debut','num_of_hits'], ascending=False, inplace=True)
    dfp_copy.set_index('chart_debut', inplace=True)
    return dfp_copy

year_leaders(dfp)

Python

generate functionThu, 20 Jul 2023

You apply a convolution module to the data: y=convolve(x,kernel,bias), where x is the input sequence, kernel is the convolution kernel, bias are the shift parameters for each output channel. Write a function calculate_conv_x_grad that finds the value of the derivative of the result given the input diff(y)/diff(x). import sys import ast import numpy as np def parse_array(s): return np.array(ast.literal_eval(s)) def read_array(): return parse_array(sys.stdin.readline()) def write_array(arr): print(repr(arr. tolist())) def calculate_conv_x_grad(x, y, kernel, bias): """ x - InLen x InChannels y - OutLen x OutChannels kernel - OutChannels x InChannels x KernelSize bias - OutChannels returns InLen x InChannels """ x = read_array() y = read_array() kernel = read_array() bias = read_array() result = calculate_conv_x_grad(x, y, kernel, bias) write_array(result) Use the following values for testing and debugging: sample input: [[0.5031766517322117, 0.30744410216949514], [0.04690208449415345, 0.322727131626243], [0.1388690574185909, 0.48576543724022325 ], [0.5260018011862109, 0.5859221562109312] 0.8974007607375208, 0.5713329992292489], [0.378989716528242, 0.49787928388753266]] [[1.5157583762374225, 0.9460413662192456, 0.9802340338281511], [1.5728362445918327, 0.996409724139607, 1.2530013664472253], [1 .9068174476481374, 1.430592927945995, 1.6704630594015581], [2.189768979209843, 2.3149543871163503, 2.1601629609824995], [2.8353 457102707083, 1.7422359297539565, 1.816707087141475], [2.0532913525958474, 1.9924093441385802, 2.3069493556139014]] [[[0.8077620147648772, 0.006392942850116379, 0.6080212915877307], [0.6288229869798402, 0.6410664904844843, 0.75419330562945] ], [[0.5355186530459589, 0.9211024178840701, 0.27725553497982014], [0.4507098181629161, 0.081570594016668, 0.8234980185346139]], [[0.0325944131753374, 0.7744753133142763, 0.05946983249285043], [0.7059580971549311, 0.7969953841197822 , 0.5257810951530107]]] [0.2579976950685653, 0.029957050945287222, 
0.18958928880952108] sample output: [[1.3758750809861735, 1.7854909022976875], [3.0778457550346365, 3.305123370918622], [4.022592414095037, 5.4085957902356965], [4. 022592414095037, 5.4085957902356965], [4.022592414095037, 5.4085957902356965], [4.022592414095037, 5.4085957902356965], [2.646717 333108864, 3.623104887938009], [0.9447466590604012, 2.1034724193170744]]

import sys import ast import numpy as np def parse_array(s): return np.array(ast.literal_eval(s)) def read_array(): return parse_array(sys.stdin.readline()) def write_array(arr): print(repr(arr. tolist())) def calculate_conv_x_grad(x, y, kernel, bias): """ x - InLen x InChannels y - OutLen x OutChannels kernel - OutChannels x InChannels x KernelSize bias - OutChannels returns InLen x InChannels """ x = read_array() y = read_array() kernel = read_array() bias = read_array() result = calculate_conv_x_grad(x, y, kernel, bias) write_array(result)
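
The answer above repeats the template without filling in the function body. A minimal sketch of the gradient, assuming a stride-1 convolution without padding (so OutLen = InLen - KernelSize + 1) and assuming the requested value is the derivative of the sum of all outputs with respect to each input element. Under that reading the first row of the sample output equals the first kernel tap summed over the output channels, which is consistent with the sample data:

```python
import numpy as np

def calculate_conv_x_grad(x, y, kernel, bias):
    """
    x - InLen x InChannels
    y - OutLen x OutChannels
    kernel - OutChannels x InChannels x KernelSize
    bias - OutChannels
    returns InLen x InChannels
    """
    out_len = y.shape[0]
    kernel_size = kernel.shape[2]
    # Assumed forward pass (stride 1, no padding):
    #   y[t, o] = bias[o] + sum over c, k of kernel[o, c, k] * x[t + k, c]
    # Summing y over every t and o, x[t + k, c] is weighted by sum over o of kernel[o, c, k].
    per_tap = kernel.sum(axis=0)  # InChannels x KernelSize
    grad = np.zeros(x.shape, dtype=float)
    for t in range(out_len):
        for k in range(kernel_size):
            grad[t + k] += per_tap[:, k]
    return grad

# Small check with an all-ones kernel: 3 output channels, 2 input channels, kernel size 3.
x = np.zeros((8, 2))
y = np.zeros((6, 3))
kernel = np.ones((3, 2, 3))
bias = np.zeros(3)
grad = calculate_conv_x_grad(x, y, kernel, bias)
print(grad[:, 0].tolist())
```

The bias does not appear in the result because it does not depend on `x`; interior positions receive contributions from every kernel tap, while positions near the edges receive fewer.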

Python

generate functionMon, 26 Dec 2022

There are 2 functions. One looks for duplicate values and makes a dictionary. The second changes the duplicates to the desired value. The first function (find_duplicates) works correctly. But the second function (change_duplicates) is incomplete. Finish the code so that the duplicate student number changes to the one in brackets in the change_duplicates function. def find_duplicates(lst): stnums = [] cnt = {} for el in lst: stnums.append(el[0]) for i in stnums: if i in cnt: cnt[i] += 1 else: cnt[i] = 1 return cnt def change_duplicates(lst, replacing_num): stnums = [] duplicates = find_duplicates(lst) for k, v in duplicates.items(): if v > 1: stnums = [k] for student in lst: if stnums in lst: #need right code#

# Python ## Python Basics ### Basic Syntax Python is an easy to learn, powerful programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. Python’s elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas on most platforms. The official Python website is www.python.org. There you will find the latest Python documentation. There are also mailing lists, newsgroups, and a number of other resources available at the site. ### Running Python The Python interpreter is usually installed as /usr/local/bin/python3.7 on those machines where it is available; putting /usr/local/bin in your Unix shell’s search path makes it possible to start it by typing the command:
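
The generated text above does not address the task. A minimal sketch of how `change_duplicates` could be completed, assuming that the first occurrence of a student number is kept and only later occurrences are replaced (that policy, and the short `students` list, are assumptions for illustration):

```python
def find_duplicates(lst):
    # Count how many times each student number (first field) occurs.
    cnt = {}
    for el in lst:
        cnt[el[0]] = cnt.get(el[0], 0) + 1
    return cnt

def change_duplicates(lst, replacing_num):
    # Student numbers that occur more than once.
    duplicates = {k for k, v in find_duplicates(lst).items() if v > 1}
    seen = set()
    for student in lst:
        num = student[0]
        if num in duplicates:
            if num in seen:
                # Second or later occurrence: overwrite the number in place.
                student[0] = replacing_num
            seen.add(num)
    return lst

students = [
    ["0001", "Antonov", "Anton", "Igorevich", "20.08.2009", "BST161"],
    ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"],
    ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"],
]
change_duplicates(students, "9090")
print([s[0] for s in students])
```

The list is modified in place, so the caller's `students` reflects the change after the call.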

Python

generate functionTue, 21 Mar 2023

There are 3 lists: water, nutri and mineral. They differ in length: print(len(water)) - 20 print(len(nutri)) -21 print(len(mineral)) -21 add them to df Pandas dataframe in the following format: index treatments value 0 0 water 25 1 1 water 30 2 2 water 28 3 3 nutri 36 4 4 nutri 29 5 0 nutri 45 6 1 mineral 55 7 2 mineral 29 8 3 mineral 56

import pandas as pd

water = [25, 30, 28, 24, 34]
nutri = [36, 29, 45, 23, 30, 39, 28]
mineral = [55, 29, 56, 20]

# Build the label column from the list lengths. Replacing values by label would
# overwrite the measurements and mislabel values shared by two lists (29 is in both nutri and mineral).
df = pd.DataFrame({
    'index': list(range(len(water) + len(nutri) + len(mineral))),
    'treatments': ['water'] * len(water) + ['nutri'] * len(nutri) + ['mineral'] * len(mineral),
    'value': water + nutri + mineral,
})
df

Python

generate functionFri, 03 Feb 2023

generate a unique id for each line of the pandas dataframe

def unique_id(df, col1, col2, col3): return df[col1]+df[col2]+df[col3]

Python

generate functionWed, 15 Mar 2023

Write code that brings all the values of sl and sw variables to the same scale using the StandardScaler library from the sklearn.preprocessing library: sl = data['sepal_length'] sw = data['sepal_width']

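import numpy as np
from sklearn.preprocessing import StandardScaler

st_scaler = StandardScaler()
# StandardScaler expects 2-D input, so each column is reshaped to (n, 1).
sl = np.array(sl).reshape(-1, 1)
sw = np.array(sw).reshape(-1, 1)
# fit_transform refits the scaler on each call, so sl and sw are standardized independently.
sl = st_scaler.fit_transform(sl)
sw = st_scaler.fit_transform(sw)
print(sl)
print(sw)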

Python

generate functionTue, 13 Dec 2022

a list is given arr = [2, 1, 10, 5] Write a function with a for list that sums all the numbers in the list with each other ((2+1)+(2+10)+(2+5)+(1+10)+(1+5)+(10+5) ) The result should be this: res = 54

def res(arr):
    suma = 0
    for i in range(len(arr)):
        for j in range(i+1, len(arr)):
            suma += arr[i] + arr[j]
    return suma

res(arr)

Python

generate functionWed, 08 Mar 2023

Write code that adds money +=1 on any number from the list of winnums winnums = [777, 999, 555, 333, 111, 177, 277, 377, 477, 577, 677, 877, 977, 100, 200, 300, 400, 500, 600, 700, 800, 900, 110, 120, 130, 140, 150, 160, 170, 180, 190, 210, 220, 230, 240, 250, 260, 270, 280, 290, 310, 320, 330, 340, 350, 360, 370, 380, 390, 410, 420, 430, 440, 450, 460, 470, 480, 490, 510, 520, 530, 540, 550, 560, 570, 580, 590, 610, 620, 630, 640, 650, 660, 670, 680, 690, 710, 720, 730, 740, 750, 760, 770, 780, 790, 810, 820, 830, 840, 850, 860, 870, 880, 890, 910, 920, 930, 940, 950, 960, 970, 980, 990, 107, 117, 127, 137, 147, 157, 167, 177, 187, 197, 207, 217, 227, 237, 247, 257, 267, 277, 287, 297, 307, 317, 327, 337, 347, 357, 367, 377, 387, 397, 407, 417, 427, 437, 447, 457, 467, 477, 487, 497, 507, 517, 527, 537, 547, 557, 567, 577, 587, 597, 607, 617, 627, 637, 647, 657, 667, 677, 687, 697, 707, 717, 727, 737, 747, 757, 767, 787, 797, 807, 817, 827, 837, 847, 857, 867, 877, 887, 897, 907, 917, 927, 937, 947, 957, 967, 977, 987, 997, 000, 007, 017, 027, 037, 047, 057, 067, 077, 087, 097, 010, 020, 030, 040, 050, 060, 070, 080, 090]

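def add_money(number, money, winnums):
    # Assumption: `number` is the drawn number and `money` the current balance.
    # The balance grows by 1 only when the number is in the winning list;
    # the list itself is left unchanged.
    if number in winnums:
        money += 1
    return money

money = add_money(777, 0, winnums)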

Python

generate functionSat, 28 Jan 2023

There is seaborn graph code sns.factorplot(x='date', y='rating', col='hotel', data=hotels_rating, col_wrap=3, kind='bar') plt.suptitle('hotel rating change',size=16) plt.subplots_adjust(top=.925) Limit the height of the columns on the y-axis to 3

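# factorplot was renamed to catplot in newer seaborn releases, and its size argument became height.
g = sns.catplot(x='date', y='rating', col='hotel', data=hotels_rating, col_wrap=3, kind='bar', height=4, aspect=.9)
g.set(ylim=(0, 3))  # cap the y-axis of every panel at 3
plt.suptitle('hotel rating change', size=16)
plt.subplots_adjust(top=.9)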

Python

generate functionTue, 10 Jan 2023

import numpy as np def sum_matrix(N): my_matrix = np.diag(np.arange(N-1, -1, -1), k=0) return *** print(sum_matrix(5)) print(sum_matrix(10)) print(sum_matrix(15)) Create in Numpy a diagonal matrix with elements from N to 0. Count the sum of its values on the diagonal.

import numpy as np

def sum_matrix(N):
    my_matrix = np.diag(np.arange(N-1, -1, -1), k=0)
    # np.trace adds up the main diagonal only.
    return np.trace(my_matrix)

print(sum_matrix(5))
print(sum_matrix(10))
print(sum_matrix(15))

Python

generate functionSun, 25 Dec 2022

My washing machine uses water amount of water to wash load (in JavaScript and Python) or max_load (in Ruby) amount of clothes. You are given a clothes amount of clothes to wash. For each single item of clothes above the load, the washing machine will use 10% more water (multiplicative) to clean. For example, if the load is 10, the amount of water it requires is 5 and the amount of clothes to wash is 14, then you need 5 * 1.1 ^ (14 - 10) amount of water. Write a function def how_much_water(L,X,N) to work out how much water is needed if you have a clothes amount of clothes. The function will accept 3 arguments: - water, load (or max_loadin Ruby) and clothes.

def how_much_water(water, load, clothes):
    # Each item of clothes above the load multiplies the water needed by 1.1.
    return water * 1.1 ** (clothes - load)

Python

generate functionFri, 17 Mar 2023

There are two lists with numbers: sync = [85.1 83.8 69.9 82.1 84.4 80.4 78.1 88.4 77. 91.5 76.7 86.6 91.8 73.3 83.9 76.7 85.8 89.6 91.7 87.2 79. 85.3] asyncr = [89.8 81.6 87.4 81. 66.9 72.5 78.4 68.5 78.3 62.6 73.7 77.7 63. 77.5] Write code that calculates the similarity statistics of these samples, p-value

from scipy.stats import ttest_ind

sync = [85.1, 83.8, 69.9, 82.1, 84.4, 80.4, 78.1, 88.4, 77. , 91.5, 76.7, 86.6, 91.8, 73.3, 83.9, 76.7, 85.8, 89.6, 91.7, 87.2, 79. , 85.3]
asyncr = [89.8, 81.6, 87.4, 81., 66.9, 72.5, 78.4, 68.5, 78.3, 62.6, 73.7, 77.7, 63., 77.5]

ttest_ind(sync, asyncr)

Python

generate functionMon, 27 Mar 2023

The code below gives a ValueError: Input variables with an inconsistent number of samples were found: [8082, 5572]. Fix it! from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(tfidf, cats, test_size=0.3, random_state=42)

from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(tfidf, cats, test_size=0.3, random_state=42)

Python

generate functionSun, 11 Dec 2022

Write a function that multiplies each dictionary value by portions portions = 5 grams = 0 key = 'salad' cook_book = {'salad': [{'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'gr'}, {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pct'}, {'ingridient_name': 'pepper', 'quantity': 20, 'measure': 'гр'}]} if key == 'salad': grams = 'quantity' * portions print(grams) The result should be: Cheese: 210 g Tomatoes: 6 pcs. Cucumbers: 60g Olives: 30 g Olive oil: 60 ml Lettuce: 30 gr Pepper: 60 gr

def get_shop_list_by_dishes(dishes, person_count): shop_list = {} for dish in dishes: for ingridient in cook_book[dish]: new_shop_list_item = dict(ingridient) new_shop_list_item['quantity'] *= person_count if new_shop_list_item['ingridient_name'] not in shop_list: shop_list[new_shop_list_item['ingridient_name']] = new_shop_list_item else: shop_list[new_shop_list_item['ingridient_name']]['quantity'] +=\ new_shop_list_item['quantity'] return shop_list def print_shop_list(shop_list): for shop_list_item in shop_list.values(): print('{} {} {}'.format(shop_list_item['ingridient_name'], shop_list_item['quantity'], shop
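
The answer above is cut off mid-statement. The core step the prompt asks for, multiplying each quantity by the number of portions, can be sketched on its own as follows (the helper name `scale_recipe` is hypothetical, and the pepper unit is written as 'gr' here for readability):

```python
def scale_recipe(cook_book, dish, portions):
    scaled = []
    for ingredient in cook_book[dish]:
        item = dict(ingredient)  # copy, so the cook book itself is not modified
        item['quantity'] *= portions
        scaled.append(item)
    return scaled

cook_book = {'salad': [
    {'ingridient_name': 'cheese', 'quantity': 50, 'measure': 'gr'},
    {'ingridient_name': 'tomatoes', 'quantity': 2, 'measure': 'pct'},
    {'ingridient_name': 'pepper', 'quantity': 20, 'measure': 'gr'},
]}

scaled = scale_recipe(cook_book, 'salad', 5)
for item in scaled:
    print(f"{item['ingridient_name'].capitalize()}: {item['quantity']} {item['measure']}")
```

Copying each ingredient before scaling means the function can be called repeatedly with different portion counts without compounding the quantities.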

Python

generate functionSat, 25 Feb 2023

Write code that finds all lines in the Pandas dataframe without the "%" sign and replace all their content with NaN

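import numpy as np
import pandas as pd

def blank_rows_without_percent(df):
    # True for rows where at least one cell contains a '%' character.
    has_percent = df.astype(str).apply(lambda col: col.str.contains('%', regex=False)).any(axis=1)
    # Work on an object-typed copy so NaN can be written into any column.
    result = df.astype(object)
    result.loc[~has_percent, :] = np.nan
    return result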

Python

generate functionTue, 10 Jan 2023

Create in Numpy a diagonal matrix with elements from N to 0. Count the sum of its values on the diagonal.

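import numpy as np

def diagonal_matrix(n):
    # Diagonal entries run from n down to 0.
    return np.diag(np.arange(n, -1, -1))

matrix = diagonal_matrix(4)
print(matrix)
print(np.trace(matrix))  # sum of the values on the diagonal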

Python

generate functionTue, 21 Feb 2023

There is a Pandas dataframe: Last Updated Installs 7479 2010-05-21 100000 7430 2011-01-30 50000 10282 2011-03-16 100000 8418 2011-04-11 5000000 8084 2011-04-16 50000 9067 2011-04-18 50000 5144 2011-05-12 100 7237 2011-06-23 1000 10460 2011-06-26 5000 1544 2011-06-29 1000000 7080 2011-07-10 5000000 8200 2011-09-20 50000 5561 2011-09-22 1000000 Write a function that will create a 'year' column, in which the first 4 characters of the 'Last Updated' column will be added

def year(df):
    # Cast to str first so this also works when the column holds datetimes.
    df["year"] = df["Last Updated"].astype(str).str[:4]
    return df

Python

generate functionTue, 21 Mar 2023

You are given test results for two drugs: A (results for the group of people getting drug A) and B (results for the other group of people getting drug B). 1) You need to assess whether there is a statistically significant difference in the effect in these patients. Use the null hypothesis, the inverse hypothesis, and Student's t-test for calculations. 2) construct a boxplot and histogram to visually confirm the result

import numpy as np import scipy.stats as st import seaborn as sns import matplotlib.pyplot as plt def difference_of_means(data_1, data_2): """Difference in means of two arrays.""" # The difference of means of data_1, data_2: diff diff = np.mean(data_1) - np.mean(data_2) return diff def permutation_sample(data1, data2): """Generate a permutation sample from two data sets.""" # Concatenate the data sets: data data = np.concatenate((data1, data2)) # Permute the concatenated array: permuted_data permuted_data = np.random.permutation(data) # Split the permuted array into two: perm_sample_1, perm_sample_2 perm_sample_1 = permuted_data[:len(data1)] perm_sample_2 = permuted_data[

Python

generate functionTue, 28 Mar 2023

There is a dataframe "df" containing data about texts in the following format: Category Message Length Words 0 ham Go until jurong point, crazy... Available only ... 111 [go, jurong, point, crazy, available, bugis, g... 1 ham Ok lar... Joking wif u oni... 29 [ok, lar, joking, wif, oni] 2 spam Free entry in 2 a wkly comp to win FA Cup fina... 155 [free, entry, wkly, comp, win, fa, cup, final,... The programmer has written code which should contain all the source messages classified incorrectly (specifying the actual and predicted ones). However, this code does not work and generates an error: KeyError. Here is this code: misclassified = [] For index, label_predicted in enumerate(predicted): if label_predicted != y_test[index]: misclassified.append({'message': df.iloc[index]['message'], 'actual': df.iloc[index]['label'], 'predicted': label_predicted}) misclassification_df = pd.DataFrame(misclassified) Fix it

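# Fixes: 'for' must be lowercase; the columns are named 'Message' and 'Category';
# and y_test (assumed to be a pandas Series from train_test_split) keeps the original
# row labels of df, so it is read by position with .iloc and matched back to df with .loc.
misclassified = []
for position, label_predicted in enumerate(predicted):
    actual = y_test.iloc[position]
    if label_predicted != actual:
        row_label = y_test.index[position]
        misclassified.append({'message': df.loc[row_label, 'Message'],
                              'actual': df.loc[row_label, 'Category'],
                              'predicted': label_predicted})
misclassification_df = pd.DataFrame(misclassified)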

Python

generate functionMon, 26 Dec 2022

Write a function that takes a group number and outputs a numbered and alphabetically ordered list (surnames, first names, patronymics) of all students in that group from the dictionary dct. dct = {'0001': ['Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], '1102': ['Bogov', 'Artem', 'Igorvich', '25.01.2010', 'BST162'], '0333': ['Glagoleva', 'Anastasia', 'Nikolaevna', '11.07.2009', 'BST163'], '4004': ['Potapov', 'Dmitry', 'Stanislavovich', '14.02.2012', 'BST161'], '0045': ['Bokov', 'Igor', 'Kharitonovich', '02.06.2009', 'BST161'], '0096': ['Vasilkov', 'Valentin', 'Sergeevich', '20.03.2009', 'BST164'], '0607': ['Siropova', 'Violetta', 'Eduardovna', '28.05.2010', 'BST162']}

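def get_students(gr, dct=dct):
    students = []
    for k, v in dct.items():
        if gr == v[4]:
            students.append(v[0] + ' ' + v[1] + ' ' + v[2])
    # Sort alphabetically, then number the entries starting from 1.
    return [f'{n}. {name}' for n, name in enumerate(sorted(students), start=1)]

get_students("BST161")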

Python

generate functionTue, 21 Mar 2023

There is a "grass" Pandas dataframe: pid Name Class 1 Class 2 HP Attack Defense Sp. Atk Sp. Def Speed Legendary 0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 False 1 2 Ivysaur Grass Poison 60 62 63 80 80 60 False 2 3 Venusaur Grass Poison 80 82 83 100 100 80 False 3 4 Mega Venusaur Grass Poison 80 100 123 122 120 80 False 48 49 Oddish Grass Poison 45 50 55 75 65 30 False Write a function that will search for 'Rock' values in the Class 1 and Class 2 columns. And if it finds such values it will delete these rows.

def class_delete(df, Class1, Class2):
    df = df.loc[df['Class 1'] != Class1]
    df = df.loc[df['Class 2'] != Class2]
    return df

df = class_delete(df, 'Rock', 'Rock')

Python

generate functionMon, 26 Dec 2022

There are two lists. The first (stnums) contains rows that are considered duplicates. The second list is a list of students. Write a function that replaces the values in the second list that match the first (but only the 2nd, 3rd value, and so on, not the first one). stnums = ['4004'] students = [ ['0001', 'Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], ["1102", "Bogov", "Artem", "Igorevich", "25.01.2010", "BST162"] ["0333", "Glagoleva", "Anastasiya", "Nikolaevna", "11.07.2009", "BST163"] ["4004", "Stepanova", "Natalia", "Aleksandrovna", "13.02.2008", "BST161"] ["0045", "Bokov", "Igor", "Kharitonovich", "02.06.2009", "BST161"], ["0096", "Vasil'kov", "Valentin", "Sergeevich", "20.03.2009", "BST164"], ["0607", "Siropova", "Violetta", "Eduardovna", "28.05.2010", "BST162"], ["4004", "Potapov", "Dmitry", "Stanislavovich", "14.02.2012", "BST161"] ]

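def replace(stnums, students, new_num='9090'):
    # The prompt does not say what to replace with; '9090' is an assumed default.
    # The first occurrence of each number is kept, later ones are overwritten.
    # Nothing is removed from the list, which also avoids mutating it while iterating.
    seen = set()
    for student in students:
        num = student[0]
        if num in stnums:
            if num in seen:
                student[0] = new_num
            seen.add(num)
    return students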

Python

generate functionSun, 25 Dec 2022

Point objects have x and y attributes (X and Y in C#) attributes. Write a function calculating distance between Point a and Point b. Tests round answers to 6 decimal places.

import math

def distance(a, b):
    # Euclidean distance: square root of the summed squared differences.
    return math.hypot(a.x - b.x, a.y - b.y)

Python

generate functionTue, 21 Mar 2023

There is a "grass" Pandas dataframe: pid Name Class 1 Class 2 HP Attack Defense Sp. Atk Sp. Def Speed Legendary 0 1 Bulbasaur Grass Poison 45 49 49 65 65 45 False 1 2 Ivysaur Grass Poison 60 62 63 80 80 60 False 2 3 Venusaur Grass Poison 80 82 83 100 100 80 False 3 4 Mega Venusaur Grass Poison 80 100 123 122 120 80 False 48 49 Oddish Grass Poison 45 50 55 75 65 30 False Write a function that will search for 'Rock' values in the Class 1 and Class 2 columns. And if it finds such values it will delete these rows.

import pandas as pd

data = pd.read_csv('pokemon.csv')

def delete(data):
    # Keep only the rows where neither class column is 'Rock'.
    is_rock = (data['Class 1'] == 'Rock') | (data['Class 2'] == 'Rock')
    return data[~is_rock]

delete(data)

Python

generate functionThu, 19 Jan 2023

There is a Pandas dataframe: place decade perc_of_5star 0 1 2010-2020 2.300 1 2 1900-1910 1.379 2 3 1970-1980 1.179 3 4 2000-2010 1.176 4 5 1960-1970 1.133 build a horizontal barchart with perc_of_5star columns in descending order and decade values on the y-axis

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'place': [1, 2, 3, 4, 5],
                   'decade': ['2010-2020','1900-1910','1970-1980','2000-2010','1960-1970'],
                   'perc_of_5star': [2.300, 1.379, 1.179, 1.176, 1.133]})
# barh draws the first row at the bottom, so sort ascending to get the largest bar on top.
df = df.sort_values('perc_of_5star')
plt.barh(df['decade'], df['perc_of_5star'])
plt.xlabel('Percentage of 5 star reviews')
plt.ylabel('Decades')
plt.title('Percentage of 5 star reviews by decade')
plt.show()

Python

generate functionThu, 23 Mar 2023

Write code that uses import os, import glob or other libraries to write all the text files at this url into one variable book_files book_files = ('http://stup.pro/wp-content/uploads/2023/03/*.txt') book_files

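import os

def read_files_in_directory(directory):
    # os and glob work on local paths, not URLs, so this assumes the .txt files
    # were already downloaded into `directory`.
    book_files = []
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith('.txt'):
            continue
        # Join with the directory; a bare filename is resolved against the current folder.
        with open(os.path.join(directory, filename), 'r') as f:
            book_files.append(f.read())
    return book_files

book_files = read_files_in_directory(r'C:\Users\USER\Desktop\Python\Python Basics')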

Python

generate functionThu, 15 Dec 2022

Create a function that checks if there is a docnum value in the list with the dictionary, and if there is, deletes it, otherwise print Document not found docnum = '10006' documents = [ {'type': 'passport', 'number': '2207 876234', 'name': 'Василий Гупкин'}, {'type': 'invoice', 'number': '11-2', 'name': 'Геннадий Покемонов'}, {'type': 'insurance', 'number': '10006', 'name': 'Аристарх Павлов'} ]

def del_doc_by_num(doc_num):
    for document in documents:
        if document['number'] == doc_num:
            documents.remove(document)
            return
    print('Document not found')

del_doc_by_num('10006')

Python

generate functionWed, 21 Dec 2022

Rewrite this function without lambda max_key = max(dct, key=lambda key: dct[key]['Value'])

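def get_value(key):
    # Named function that replaces the lambda: the sort key is the entry's 'Value'.
    return dct[key]['Value']

max_key = max(dct, key=get_value)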

Python

generate functionFri, 24 Feb 2023

There is a Pandas dataframe: spi_rank country spi_score basic_human_needs foundations_of_wellbeing opportunity basic_nutri_med_care water_sanitation shelter personal_safety access_to_knowledge access_to_communications health_wellness environmental_quality personal_rights personal_freedom inclusiveness access_to_advanced_education 61 62.0 Russia 73.45 83.0 79.4 57.94 95.49 97.04 88.56 50.92 94.34 77.42 64.7 81.16 51.79 70.26 30.31 79.4 Write a function that converts dataframe columns to strings

def df_to_string(df):
    # Convert the values of every column to strings and return the new dataframe.
    return df.astype(str)

Python

generate functionMon, 26 Dec 2022

Write a function that takes a group number and outputs a numbered and alphabetically ordered list (surnames, first names, patronymics) of all students in that group from the dictionary dct. dct = {'0001': ['Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'], '1102': ['Bogov', 'Artem', 'Igorvich', '25.01.2010', 'BST162'], '0333': ['Glagoleva', 'Anastasia', 'Nikolaevna', '11.07.2009', 'BST163'], '4004': ['Potapov', 'Dmitry', 'Stanislavovich', '14.02.2012', 'BST161'], '0045': ['Bokov', 'Igor', 'Kharitonovich', '02.06.2009', 'BST161'], '0096': ['Vasilkov', 'Valentin', 'Sergeevich', '20.03.2009', 'BST164'], '0607': ['Siropova', 'Violetta', 'Eduardovna', '28.05.2010', 'BST162']}

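def student_list(gr):
    # Collect [surname, first name, patronymic] for every student in group gr
    lst = [dct[i][:3] for i in dct if dct[i][-1] == gr]
    # Lists compare element by element, so this sorts by surname, then first name, then patronymic
    lst.sort()
    # Number the entries starting from 1
    return [f"{n}. {' '.join(name)}" for n, name in enumerate(lst, start=1)]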

Python
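As a worked check of the task, the sketch below runs a group lookup on the dictionary from the prompt. Passing the dictionary as a second argument and the name students_in_group are assumptions made so the example is self-contained.

```python
dct = {
    '0001': ['Antonov', 'Anton', 'Igorevich', '20.08.2009', 'BST161'],
    '1102': ['Bogov', 'Artem', 'Igorvich', '25.01.2010', 'BST162'],
    '0333': ['Glagoleva', 'Anastasia', 'Nikolaevna', '11.07.2009', 'BST163'],
    '4004': ['Potapov', 'Dmitry', 'Stanislavovich', '14.02.2012', 'BST161'],
    '0045': ['Bokov', 'Igor', 'Kharitonovich', '02.06.2009', 'BST161'],
    '0096': ['Vasilkov', 'Valentin', 'Sergeevich', '20.03.2009', 'BST164'],
    '0607': ['Siropova', 'Violetta', 'Eduardovna', '28.05.2010', 'BST162'],
}


def students_in_group(group, students):
    """Return a numbered, alphabetically ordered list of full names in a group."""
    # Keep surname, first name, patronymic for records whose last field is the group
    names = sorted(record[:3] for record in students.values() if record[-1] == group)
    return [f"{number}. {' '.join(name)}" for number, name in enumerate(names, start=1)]


result = students_in_group('BST161', dct)
for line in result:
    print(line)
```

For group BST161 this lists Antonov, Bokov, and Potapov, in that order.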

generate functionSat, 04 Mar 2023

Construct a line graph using the data from this dataframe, where the x-axis is years and the y-axis is the columns hapscore = pd.DataFrame({'2015': [df15['Happiness Score'].mean()], '2016': [df16['Happiness Score'].mean()], '2017': [df17['Happiness.Score'].mean()], '2018': [df18['Score'].mean()], '2019': [df19['Score'].mean()]}) 2015 2016 2017 2018 2019 0 5.375734 5.382185 5.354019 5.375917 5.407096

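import matplotlib.pyplot as plt

def happinessgraph():
    # hapscore has one row and one column per year; transpose so the years become the x-axis
    hapscore.T.plot(kind='line', legend=False)
    plt.xlabel("Years")
    plt.ylabel("Happiness Score")
    plt.title("Graph of Happiness Score over the Years")
    plt.show()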

Python

generate functionFri, 24 Feb 2023

There is a Pandas dataframe: spi_rank country spi_score basic_human_needs foundations_of_wellbeing opportunity basic_nutri_med_care water_sanitation shelter personal_safety access_to_knowledge access_to_communications health_wellness environmental_quality personal_rights personal_freedom inclusiveness access_to_advanced_education 61 62.0 Russia 73.45 83.0 79.4 57.94 95.49 97.04 88.56 50.92 94.34 77.42 64.7 81.16 51.79 70.26 30.31 79.4 Write a function that finds the mean values of each column and transposes them to rows

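def mean_values(df):
    # numeric_only=True skips text columns such as 'country'; .T turns the column means into a single row
    return df.mean(numeric_only=True).to_frame().T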

Python

generate functionWed, 18 Jan 2023

There is a Pandas dataframe: id title rating 0 Pulp Fiction (1994) 5.0 1 Three Colors: Red (Trois couleurs: Rouge) (1994) 3.5 2 Three Colors: Blue (Trois couleurs: Bleu) (1993) 5.0 3 Underground (1995) 5.0 4 Singin' in the Rain (1952) 3.5 5 Dirty Dancing (1987) 4.0 6 Delicatessen (1991) 3.5 7 Ran (1985) 3.5 8 Seventh Seal, The (Sjunde inseglet, Det) (1957) 5.0 9 Bridge on the River Kwai, The (1957) 4.0 in this dataframe the names of films are not unique, because the same film may have been given different ratings Your task is to write a function that calculates the average rating of each movie and removes non-unique strings

import pandas as pd

df = pd.DataFrame({'id': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'title': ['Pulp Fiction (1994)', 'Three Colors: Red (Trois couleurs: Rouge) (1994)', 'Three Colors: Blue (Trois couleurs: Bleu) (1993)', 'Underground (1995)', "Singin' in the Rain (1952)", 'Dirty Dancing (1987)', 'Delicatessen (1991)', 'Ran (1985)', 'Seventh Seal, The (Sjunde inseglet, Det) (1957)', 'Bridge on the River Kwai, The (1957)'], 'rating': [5.0, 3.5, 5.0, 5.0, 3.5, 4.0, 3.5, 3.5, 5.0, 4.0]})

def average_rating(df):
    # One row per title, holding the mean of all ratings given to that title
    return df.groupby('title', as_index=False)['rating'].mean()

Python

generate functionTue, 24 Jan 2023

There is a Pandas dataframe: userId movieId rating timestamp 0 1 31 2.5 1260759144 1 1 1029 3.0 1260759179 2 1 1061 3.0 1260759182 3 1 1129 2.0 1260759185 4 1 1172 4.0 1260759205 leave in this dataframe only users who have rated 100 or more in the rating column

df_filtered = df.groupby('userId').filter(lambda x: len(x) >= 100)

Python
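The groupby-and-filter call above keeps every row that belongs to a user with at least 100 rows. The same logic can be sketched in plain Python, which may help when checking the result by hand; the function name keep_active_users, the min_ratings parameter, and the miniature data set are assumptions for illustration.

```python
from collections import Counter


def keep_active_users(rows, min_ratings=100):
    """Keep only the rows whose userId appears at least min_ratings times."""
    counts = Counter(row['userId'] for row in rows)
    return [row for row in rows if counts[row['userId']] >= min_ratings]


# Hypothetical miniature data set; a threshold of 2 stands in for 100
ratings = [
    {'userId': 1, 'movieId': 31, 'rating': 2.5},
    {'userId': 1, 'movieId': 1029, 'rating': 3.0},
    {'userId': 2, 'movieId': 10, 'rating': 4.0},
]

filtered = keep_active_users(ratings, min_ratings=2)
```

User 2 has a single rating and is dropped, while both rows for user 1 are kept.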

generate functionTue, 21 Mar 2023

There are two samples A and B. Draw boxplots comparing these two samples with a light blue filler and a caption for the names of the samples

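import matplotlib.pyplot as plt

def box_plots(A, B):
    fig = plt.figure(figsize=(8, 6))
    ax = fig.add_subplot(111)
    # patch_artist=True is required for the boxes to take a fill colour
    ax.boxplot([A, B], patch_artist=True, boxprops=dict(facecolor='lightblue', color='black', linewidth=1), medianprops=dict(color='black'))
    # Caption each box with its sample name
    ax.set_xticklabels(['A', 'B'])
    plt.show()

box_plots(A, B)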

Python

generate functionFri, 03 Feb 2023

There is a df_new Pandas dataframe: year date route operator group_name bus_garage district injury_result incident_type victim_category victim_sex victim_age 0 2015 01.01.2015 1 London General Go-Ahead Garage Not Available Southwark Injuries treated on scene Onboard Injuries Passenger Male Child 1 2015 01.01.2015 4 Metroline Metroline Garage Not Available Islington Injuries treated on scene Onboard Injuries Passenger Male Unknown 2 2015 01.01.2015 5 East London Stagecoach Garage Not Available Havering Taken to Hospital – Reported Serious Injury or... Onboard Injuries Passenger Male Elderly Calculate the average number of strings for all "operator"

# Average number of rows (incidents) per operator
df_new["operator"].value_counts().mean()

Python

generate functionSat, 04 Mar 2023

Merge all these 3 dataframes into one dataframe: df19[['Country or region', 'GDP per capita']]\ .sort_values(by='GDP per capita', ascending=False).head(20)[0:1] df19[['Country or region', 'GDP per capita']]\ .sort_values(by='GDP per capita', ascending=False).head(20)[19:20] rus[['Country or region', 'GDP per capita']]

def merge(df1, df2, df3): return pd.concat([df1, df2, df3], axis=0)

Python

generate functionMon, 13 Mar 2023

There is a logistic linear discriminant model trained using the following formula: from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split(data, data_class, random_state=42) from sklearn.discriminant_analysis import LinearDiscriminantAnalysis lda.fit(X_train, y_train) Write code to evaluate its quality with F1 measure

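from sklearn.metrics import f1_score

y_pred = lda.predict(X_test)
# The default average='binary' assumes two classes; pass average='macro' (or another mode) for multiclass targets
f1_score(y_test, y_pred)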

Python
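For reference, the F1 measure is the harmonic mean of precision and recall. The sketch below computes it by hand for binary labels so the library result can be sanity-checked; the function name binary_f1 and the sample labels are hypothetical, and returning 0.0 when there are no true positives is a convention chosen here.

```python
def binary_f1(y_true, y_pred, positive=1):
    """Compute F1 = 2 * precision * recall / (precision + recall) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 2 true positives, 1 false positive, 1 false negative
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
score = binary_f1(y_true, y_pred)
```

Here precision and recall are both 2/3, so the F1 score is also 2/3.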

generate functionMon, 27 Mar 2023

There are variables X_train, X_test, y_train, y_test created through sklearn train_test_split. They contain text data in string format. Convert them to float numbers.

def str_to_float(train, test):
    # Convert every string in both sequences to a float
    train = [float(x) for x in train]
    test = [float(x) for x in test]
    return train, test

X_train, X_test = str_to_float(X_train, X_test)
y_train, y_test = str_to_float(y_train, y_test)

Python
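The conversion above raises ValueError as soon as one string is not a valid number. A minimal sketch of a more forgiving variant is shown below, assuming that unparseable values should become NaN; that policy, the function name to_floats, and the sample values are assumptions, not part of the original answer.

```python
def to_floats(values):
    """Convert strings to floats, replacing unparseable entries with NaN."""
    converted = []
    for value in values:
        try:
            converted.append(float(value))
        except ValueError:
            # Assumed policy: mark bad input as NaN instead of stopping
            converted.append(float('nan'))
    return converted


# Hypothetical sample: float() accepts surrounding spaces and exponent notation
sample = ['1.5', ' 2 ', 'abc', '3e2']
result = to_floats(sample)
```

Whether NaN, a default value, or an error is the right choice depends on how the model downstream treats missing data.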

generate functionWed, 22 Feb 2023

Change the code below so that the columns have values on the x-axis question6 = 'What is the most preferred working environment for you.' question6 = df[question6].value_counts() label = question6.index counts = question6.values fig = px.bar(x=counts, y=label, orientation='h') fig.update_layout(title_text='Which working environment do you prefer most?') fig.show()

import pandas as pd
import plotly.express as px

def plot_bar(df, question, title_text, xlabel, ylabel, orientation='h'):
    counts = df[question].value_counts()
    # Horizontal bars put the counts on the x-axis; vertical bars put the categories there
    if orientation == 'h':
        fig = px.bar(x=counts.values, y=counts.index, orientation='h')
    else:
        fig = px.bar(x=counts.index, y=counts.values, orientation='v')
    fig.update_layout(title_text=title_text, xaxis_title=xlabel, yaxis_title=ylabel)
    fig.show()

df = pd.read_csv('https://raw.githubusercontent.com/GODKarma/Data-Analytics-2020/master/Data/survey_results_public.csv', index_col='Respondent')
plot_bar(df, 'LanguageWorkedWith', 'Which programming languages have you worked with?', 'Count', 'Languages')

Python

generate functionMon, 06 Feb 2023

There is a df Pandas dataframe: date av_temp deviations country year decade 577457 2013-05-01 19.059 1.022 Zimbabwe 2013 2010-2020 577458 2013-06-01 17.613 0.473 Zimbabwe 2013 2010-2020 577459 2013-07-01 17.000 0.453 Zimbabwe 2013 2010-2020 577460 2013-08-01 19.759 0.717 Zimbabwe 2013 2010-2020 Write a function that: 1) leave only the rows in the year column with a value higher than 1980 2) calculates the average temperature of the countries (the average of the av_temp column) 3) builds a list of the 20 coldest countries in ascending av_temp order

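def calc_av_temp(df):
    # 1) Keep only the rows with a year after 1980
    df = df[df['year'] > 1980]
    # 2) Average temperature per country
    df = df.groupby('country').agg({'av_temp': 'mean'})
    # 3) The 20 coldest countries, in ascending av_temp order
    return df.sort_values('av_temp').head(20)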

Python
