Generation

generate functionFri, 28 Apr 2023

Export all of the Japanese characters in a html file to a text file

from bs4 import BeautifulSoup import requests import re r = requests.get('http://www.aozora.gr.jp/cards/000148/files/773_14547.html') soup = BeautifulSoup(r.content, 'lxml') def get_text(soup): text_list = [] for text in soup.find_all('p'): text_list.append(text.get_text()) return text_list with open('text.txt', 'w') as f: for text in get_text(soup): f.write(text + '\n')

Questions about programming?Chat with your personal AI assistant