Scraping PDF text with Python
If you want to extract text from a PDF with Python, there is a library called PDFMiner (beware: the original PDFMiner package does not work in Python 3). This example walks a directory structure, looks for PDFs, and writes a text rendition of each one to a “.txt” file next to the PDF.

import sys
from pdfminer.pdfparser import PDFDocument, PDFParser
from …
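Because the listing above is cut off, here is a rough sketch of the same directory walk. It is not the author's code. It assumes the pdfminer.six fork, which runs on Python 3, and it calls that fork's pdfminer.high_level.extract_text function instead of the lower-level PDFParser/PDFDocument classes imported above. The helper names find_pdfs, txt_path_for, convert_tree and pdfminer_extract are hypothetical. The extractor is passed in as a function, so you can try the walking and writing logic without PDFMiner installed.

```python
import os


def find_pdfs(root):
    """Yield the path of every file under root whose name ends in .pdf (any case)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".pdf"):
                yield os.path.join(dirpath, name)


def txt_path_for(pdf_path):
    """Return the sibling .txt path, e.g. docs/report.pdf -> docs/report.txt."""
    return os.path.splitext(pdf_path)[0] + ".txt"


def convert_tree(root, extract):
    """Write extract(pdf_path) to a .txt file next to each PDF under root.

    extract is any callable that takes a PDF path and returns a str.
    Returns the list of .txt paths written. Existing .txt files are overwritten.
    """
    written = []
    for pdf_path in find_pdfs(root):
        text = extract(pdf_path)
        out_path = txt_path_for(pdf_path)
        with open(out_path, "w", encoding="utf-8") as fh:
            fh.write(text)
        written.append(out_path)
    return written


def pdfminer_extract(pdf_path):
    """Extract text with pdfminer.six (assumed installed; imported only when called)."""
    from pdfminer.high_level import extract_text
    return extract_text(pdf_path)
```

With pdfminer.six installed, convert_tree("/path/to/pdfs", pdfminer_extract) converts every PDF under that (placeholder) directory and returns the list of .txt paths it wrote. Keep in mind that it overwrites any existing .txt file that shares a base name with a PDF.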