The Wolfe Pack

If you have to deal with those pesky PDF files, pdftohtml from the poppler package and w3m are all you need.

pdftohtml -i -s -stdout filename.pdf | w3m -T text/html

Make it into a convenient function by adding it to your .bashrc.

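pdf()
{
if [ $# -ne 1 ]; then
    # Usage errors go to stderr, with a non-zero exit status.
    echo "Usage: pdf filename" >&2
    return 1
else
    # Quote the argument so filenames containing spaces still work.
    pdftohtml -i -s -stdout "$1" | w3m -T text/html
fi
}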

@storm I have found the conversions from these to be ugly at best, and often worse than useless. Converting pdf, even properly tagged pdf, is still a dicey business.

@dgoodmaniii most actual pdf software is not accessible with orca, and even the few that are can be a pain to use. This method at least gets the text in a format that is usable. I figured the formatting wouldn't be exactly the same as the original, but I didn't realize it would be terrible. In my case, however, the layout isn't usually that important.

@storm Fair enough; as long as the pdf has text in it (many scans don't), you will get something readable. I spend a *lot* of time on document conversion, and find the whole process very frustrating with pdf.
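
A rough way to check up front whether a pdf has any text in it: run it through pdftotext (also from poppler) and see if anything other than whitespace comes out. This is only a sketch; the names has_text and pdf_has_text are made up for illustration, and it assumes pdftotext is installed.

```shell
# has_text: succeed if stdin contains at least one non-whitespace character.
# Form feeds (which pdftotext emits between pages) count as whitespace here.
has_text()
{
    grep -q '[^[:space:]]'
}

# pdf_has_text: hypothetical helper, assumes pdftotext from poppler.
# An output file of "-" makes pdftotext write to stdout.
pdf_has_text()
{
    pdftotext "$1" - 2>/dev/null | has_text
}
```

Something like `pdf_has_text filename.pdf && pdf filename.pdf` would then skip scans that have no text layer. It says nothing about how good the extracted text is, only that there is some.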

The mutools bundle (MuPDF's mutool command, packaged as mupdf-tools on Debian and Arch) is also worth a look.

@dgoodmaniii oh, no doubt about it, pdf is the root of all evil. It's like plain text came out and it was good, so companies immediately set about finding the worst thing possible lol. Now they all use it, no plain text instruction manuals to be found ever. I will look into mutools, thanks for the suggestion.

I found something called ocrmypdf that seems to do pretty well when you can't get a text conversion. Too many options though, with no simple way to just open up a document and try to extract the text from it in different ways. Checkboxes would be rather nice here, but I just use the sidecar output usually.
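
For the "just try to get the text out" case, one possible shape is a small wrapper that tries the embedded text first and only falls back to OCR when that comes up empty. This is a sketch under assumptions, not a tested recipe: pdftext is a made-up name, and it assumes both pdftotext (poppler) and ocrmypdf are installed. The only ocrmypdf option used is --sidecar, which writes the recognized text to a separate file.

```shell
# pdftext: hypothetical helper that prints a PDF's text to stdout.
pdftext()
{
    local text tmp status
    if [ $# -ne 1 ]; then
        echo "Usage: pdftext filename" >&2
        return 1
    fi
    # First try the embedded text layer ("-" sends output to stdout).
    text=$(pdftotext "$1" - 2>/dev/null)
    if printf '%s' "$text" | grep -q '[^[:space:]]'; then
        printf '%s\n' "$text"
        return 0
    fi
    # Nothing usable: OCR into a scratch directory, keep only the sidecar text.
    tmp=$(mktemp -d) || return 1
    ocrmypdf --sidecar "$tmp/out.txt" "$1" "$tmp/out.pdf" >/dev/null 2>&1 &&
        cat "$tmp/out.txt"
    status=$?
    rm -rf "$tmp"
    return $status
}
```

Usage would be something like `pdftext scan.pdf | less`. The OCR'd pdf itself is thrown away here, since only the text is wanted.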

@kyle ocrdesktop can also do it. The new version is quite nice for viewing pdfs. My little trick is good for when you are in a hurry and don't want to switch to an X session just to read some text. Although, now that I think about it, ocrdesktop may be able to do it and just pipe the contents to a file. The other good part about my trick is links in the generated document actually work. You can read with w3m and just hit enter on the section name to jump right to it.
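
If the goal is a text file rather than an interactive session, the same pdftohtml pipeline can be pointed at w3m's -dump option, which prints the rendered page to stdout instead of opening the browser. A minimal sketch, with pdf2txt as a made-up function name:

```shell
# pdf2txt: hypothetical helper; renders the PDF through w3m and saves plain text.
# Assumes pdftohtml (poppler) and w3m are installed.
pdf2txt()
{
    if [ $# -ne 2 ]; then
        echo "Usage: pdf2txt filename.pdf output.txt" >&2
        return 1
    fi
    # -dump makes w3m write the formatted text to stdout and exit.
    pdftohtml -i -s -stdout "$1" | w3m -dump -T text/html > "$2"
}
```

The links are lost in the dumped file, so this is for when only the text matters.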