All generations.

generate function - Tue, 02 May 2023

Extracting text from HTML file

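import urllib.request
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India"
html = urllib.request.urlopen(url)
soup = BeautifulSoup(html, 'html.parser')

# Walk every table on the page and print the first text node of each cell.
for table in soup.find_all('table'):
    for td in table.find_all('td'):
        # string=True replaces the deprecated text=True argument;
        # the result is None when the cell holds no text.
        text = td.find(string=True)
        print(text)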

Python
generate function - Thu, 11 May 2023

Read a txt file into a list and replace \n with a space

def readIntoList():
    # Raw string, so the backslashes in the Windows path are not read as escapes.
    with open(r'C:\Users\xxx\Desktop\xxx\xxx.txt', 'r') as f:
        li = []
        for line in f:
            # Replace the trailing newline with a space, as the title asks.
            li.append(line.replace('\n', ' '))
    # The with block closes the file, so no explicit f.close() is needed.
    return li

Python
generate function - Fri, 28 Apr 2023

Find all the Japanese characters in an HTML file

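import re

def find_japanese(file):
    # Matches CJK punctuation, hiragana, katakana, full-width forms, and kanji.
    japanese_regex = re.compile(r'[\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf]')
    # Read the file as UTF-8 (assumed encoding) and close it when done.
    with open(file, encoding='utf-8') as f:
        text = f.read()
    # Keep every whitespace-separated token that contains a Japanese character.
    japanese = []
    for token in text.split():
        if japanese_regex.search(token):
            japanese.append(token)
    return japanese

find_japanese('html_file.txt')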

Python
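
The function above returns whole whitespace-separated tokens that contain at least one Japanese character. If individual characters are wanted instead, as the title suggests, a minimal sketch could use `re.findall` with the same Unicode ranges. The name `find_japanese_chars` and the sample string are hypothetical, introduced only for illustration.

```python
import re

# Same Unicode ranges as the snippet above: CJK punctuation, hiragana,
# katakana, full-width forms, and CJK ideographs.
JAPANESE_CHAR = re.compile(
    r'[\u3000-\u303f\u3040-\u309f\u30a0-\u30ff'
    r'\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf]'
)

def find_japanese_chars(text):
    """Return every matching character in text, in order of appearance."""
    # With a single character class and no groups, findall yields
    # one list item per matched character.
    return JAPANESE_CHAR.findall(text)

if __name__ == "__main__":
    sample = "<p>Hello 日本語 and カタカナ</p>"  # hypothetical input
    print(find_japanese_chars(sample))
```

Because the pattern is a character class, ASCII markup such as tag names is skipped without parsing the HTML first.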
generate function - Tue, 02 May 2023

Use BeautifulSoup to import an HTML file

from bs4 import BeautifulSoup
import urllib.request

def get_soup(url):
    # Download the page and parse it with Python's built-in HTML parser.
    page = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(page, "html.parser")
    return soup

soup = get_soup("https://www.pythonforbeginners.com/beautifulsoup/beautifulsoup-4-python")

Python
generate function - Fri, 28 Apr 2023

Export all of the Japanese characters in an HTML file to a text file

from bs4 import BeautifulSoup
import requests

r = requests.get('http://www.aozora.gr.jp/cards/000148/files/773_14547.html')
soup = BeautifulSoup(r.content, 'lxml')

def get_text(soup):
    # Collect the text of every <p> element.
    text_list = []
    for text in soup.find_all('p'):
        text_list.append(text.get_text())
    return text_list

# Write as UTF-8 so Japanese text is saved correctly on any platform.
with open('text.txt', 'w', encoding='utf-8') as f:
    for text in get_text(soup):
        f.write(text + '\n')

Python
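
The snippet above writes paragraph text as-is, so any Latin letters, digits, or punctuation in the page end up in the output too. A rough sketch of one way to keep only Japanese characters, using just the standard library, might look like the following. The names `TextCollector`, `is_japanese`, and `export_japanese`, and the chosen code point ranges (hiragana, katakana, and common kanji), are assumptions for illustration, not part of the original snippet.

```python
from html.parser import HTMLParser

# Assumed ranges for illustration: hiragana, katakana, common kanji.
JAPANESE_RANGES = ((0x3040, 0x309F), (0x30A0, 0x30FF), (0x4E00, 0x9FFF))

class TextCollector(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def is_japanese(ch):
    """True if the character's code point falls in one of the assumed ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in JAPANESE_RANGES)

def export_japanese(html, out_path):
    """Write only the Japanese characters found in html to out_path (UTF-8)."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    chars = "".join(
        ch for part in parser.parts for ch in part if is_japanese(ch)
    )
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(chars)
    return chars
```

A call such as `export_japanese(page_source, 'text.txt')` would take an already-decoded HTML string; fetching and decoding the page is left to the caller.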
generate function - Tue, 02 May 2023

Export all of the Japanese characters in the HTML file into a text file

const fs = require('fs');
const path = require('path');
const html = fs.readFileSync(path.join(__dirname, 'japan.html'), 'utf8');
const puppeteer = require('puppeteer');
const cheerio = require('cheerio');
const $ = cheerio.load(html);
const japanCharCodes = [0x3040, 0x3097];
const japanChars = [];
japanCharCodes.forEach(japanCharCode => {
  for (let i = 0; i < 96; i++) {
    japanChars.push(String.fromCodePoint(japanCharCode + i));
  }
});
const browser = await puppeteer.launch({
  headless: false,
});
const page = await browser.newPage();
await page.goto('http://www.rikai.com/library/kanjitables/kanji_codes.unicode.shtml');
await page.on('console', async msg => {

TypeScript