Generation

generate functionFri, 12 May 2023

Find all the Chinese characters in a file mixed with Chinese, Japanese and English using langid

def find_chinese(filename): text = open(filename, "r").read() text = re.sub("[\s+\.\!\/_,$%^*(+\"\'“”《》?“]+|[+——!,。?、~@#¥%……&*()]+", "",text) result = "" for i in text: if langid.classify(i)[0]=="zh": result += i return result

Questions about programming?Chat with your personal AI assistant