PDFs

AlpacaChariot@lemmy.world · 2 months ago

Was it that the PDF produced by LaTeX was less OCR-friendly than the Word one, or just that you didn’t submit the PDF at all most of the time?

I guess if you trained a program to OCR PDFs produced by Word, it might get really good at those and worse at PDFs from other sources.

I’m curious, was your CV font Computer Modern?

just_an_average_joe@lemmy.dbzer0.com · 2 months ago

I think OCR is really good nowadays, but I think old ATS systems don’t use it, or at least use old OCR. If you parse a PDF directly (without OCR), a Word-exported PDF preserves the text order much better than a LaTeX one.

I actually tried some websites and Python libraries to extract the text from my LaTeX PDF, and none of them gave good results: the words would come out in the wrong order.

If I use OCR, I get good, coherent text, which is really important for an ATS. But I doubt people use OCR because it’s kind of expensive, or maybe they just use old ATS systems.
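To make the out-of-order problem concrete, here is a minimal sketch of one way it can happen. A PDF stores text as positioned fragments, and a parser that simply reads them in file order can produce scrambled output even though the page looks fine. The fragment data, the `Fragment` class, and both function names are made up for illustration; this is not the output of any real extraction library or the logic of any actual ATS.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """One piece of positioned text (hypothetical, simplified model)."""
    text: str
    x: float  # left edge of the fragment
    y: float  # baseline; in PDF coordinates a larger y is higher on the page


def stream_order(fragments):
    """Join fragments in the order they were stored, as a naive parser might."""
    return " ".join(f.text for f in fragments)


def reading_order(fragments, line_tolerance=2.0):
    """Rebuild a top-to-bottom, left-to-right order from positions alone."""
    lines = []
    # Walk the fragments from the top of the page downward.
    for frag in sorted(fragments, key=lambda f: -f.y):
        # Fragments whose baselines are close together share a line.
        if lines and abs(lines[-1][0].y - frag.y) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    # Within each line, read left to right.
    return " ".join(
        f.text for line in lines for f in sorted(line, key=lambda f: f.x)
    )


# Invented example: a CV line whose pieces were stored in a different
# order than they appear on the page.
SAMPLE = [
    Fragment("2019-2023", 450, 680),
    Fragment("Experience", 50, 700),
    Fragment("Acme", 200, 680),
    Fragment("Software", 50, 680),
    Fragment("Engineer,", 120, 680),
]

if __name__ == "__main__":
    print("stored order: ", stream_order(SAMPLE))
    print("reading order:", reading_order(SAMPLE))
```

OCR sidesteps this because it works from the rendered page image, so it only ever sees the visual layout. The position-based sort above is a rough stand-in for that idea and would need more care for multi-column layouts.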