石井景都

Generation

generate function

Mon, 17 Apr 2023

Extract only the text images from a PDF file.

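// Caveats: PdfReader.Open and PdfDocumentOpenMode are PDFsharp APIs, but page.GetImages()
// does not appear to be part of PDFsharp and is assumed here to be a user-supplied extension method.
// Image.GetInstance resembles iTextSharp, while Save(stream, ImageFormat.Png) matches System.Drawing.Image.
// Because of yield return, this body must sit inside an iterator method returning IEnumerable<Content>.
using (var pdfDocument = PdfReader.Open("input.pdf", PdfDocumentOpenMode.ReadOnly))
{
    for (var i = 0; i < pdfDocument.PageCount; i++)
    {
        var page = pdfDocument.Pages[i];

        // Keep only grayscale and CMYK images, which this snippet treats as text images.
        var images = page.GetImages()
            .Where(x => x.ColorSpace.Name == "DeviceGray" || x.ColorSpace.Name == "DeviceCMYK");

        foreach (var image in images)
        {
            using (var pngStream = new MemoryStream())
            {
                // Re-encode the image as PNG in memory.
                var png = Image.GetInstance(image);
                png.Save(pngStream, ImageFormat.Png);

                // Content is an assumed user-defined type. C# does not allow a member named
                // after its enclosing type, so the Content property (or the class) needs renaming.
                var content = new Content()
                {
                    FileName = $"{i}_{image.GetHashCode()}.png",
                    ContentType = "image/png",
                    Content = pngStream.ToArray()
                };
                yield return content;
            }
        }
    }
}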
