Post by multicentric technology sdn bhd » Sat Mar 15, 2014 6:18 pm
I have a large PDF file (over 300 MB) with 251 pages. Are there any Tracker Software products that can help me reduce the size of this file?
Post by Tracker Supp-Stefan (Site Admin) » Mon Mar 17, 2014 10:30 am
Hello multicentric technology sdn bhd,
You can try to "reprint" the file, which I presume is image-based, through our printer drivers. This will reduce the image quality slightly, but also the file size.
Another option is to OCR the file, then make the OCR layer visible and remove the images underneath. This will result in a huge reduction of the file size, but unfortunately you cannot automate the process for now and will need to perform it manually.
Post by multicentric technology sdn bhd » Mon Mar 17, 2014 11:39 am
Stefan,
The PDF document is text-based, so I have to print it out first with the embedded fonts converted to curves. I then OCR the output file.
I cannot print it as an image because I need to edit the document content.
I find that I cannot use the PDF-XChange Editor for printing because it adds printer marks on the border, so I have to use the PDF-XChange Viewer.
The document is in Malay, so I OCR it using the nearest available language, Indonesian. The OCR recognizes the characters but not most of the words, i.e. the output contains a lot of extra spaces.
Post by Tracker Supp-Stefan » Mon Mar 17, 2014 12:12 pm
If the document is already text-based, then little to no optimization can be offered. I suspect that the embedded fonts are what makes the document so big, but if you do not embed the fonts, you cannot guarantee that the file will look the same at the recipient's end.
Actually, if the file is a collection of similar pages that were previously individual files, please try running it through our PDF Tools optimization process. It normally reduces file sizes by just a few percent, but in some extreme cases, when the same information is repeated multiple times, it can bring significant results.
Can you please also provide a sample extract of a few pages from this document?
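To illustrate why repeated content matters so much for the optimization mentioned above, here is a minimal Python sketch. It is not how PDF Tools works internally; it only assumes that an optimizer can store one copy of byte-identical data. The `stored_size` helper and the page data (a 1000-byte "font" repeated on each of 251 pages, plus 100 bytes of page-specific text) are hypothetical.

```python
import hashlib


def stored_size(blobs, deduplicate):
    """Return the bytes needed to store all blobs.

    With deduplicate=True, byte-identical blobs are counted once.
    """
    if not deduplicate:
        return sum(len(b) for b in blobs)
    unique = {}
    for b in blobs:
        # Identical content produces the same digest, so it is stored once.
        unique.setdefault(hashlib.sha256(b).digest(), len(b))
    return sum(unique.values())


# Hypothetical document: every page carries the same font data
# plus a small amount of text that differs from page to page.
font = b"F" * 1000
pages = []
for i in range(251):
    pages.append(font)
    pages.append(f"page {i}".encode().ljust(100, b"."))

before = stored_size(pages, deduplicate=False)
after = stored_size(pages, deduplicate=True)
print(f"without deduplication: {before} bytes")
print(f"with deduplication:    {after} bytes")
```

In this made-up case almost all of the size comes from the repeated font data, which is the "extreme case" where optimization pays off. If the pages shared nothing, the two numbers would be equal.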
Post by guebert » Tue Mar 25, 2014 9:53 am
Tracker Supp-Stefan wrote: Another option is to OCR the file, then make the OCR layer visible and remove the images underneath. This will result in a huge reduction of the file size, but unfortunately you cannot automate the process for now and will need to perform it manually.
Hm, unfortunately there's no "huge" reduction:
Document at 300 dpi: ScannedDoc.pdf = 1076 KB
OCR run with resample to 150 dpi: OcrDoc150.pdf = 954 KB
Resample to 100 dpi: OcrDoc100.pdf = 1179 KB
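To put the figures above in perspective, here is a small Python sketch that computes the relative size change against the 300 dpi original. The file names and sizes are the ones reported in this post; the `percent_change` helper is just an illustrative name.

```python
def percent_change(original_kb, new_kb):
    """Relative change in percent; negative means the file got smaller."""
    return (new_kb - original_kb) / original_kb * 100.0


# Sizes in KB as reported in the post above.
sizes_kb = {
    "ScannedDoc.pdf": 1076,
    "OcrDoc150.pdf": 954,
    "OcrDoc100.pdf": 1179,
}

baseline = sizes_kb["ScannedDoc.pdf"]
changes = {
    name: round(percent_change(baseline, kb), 1)
    for name, kb in sizes_kb.items()
    if name != "ScannedDoc.pdf"
}

for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
# prints:
# OcrDoc150.pdf: -11.3%
# OcrDoc100.pdf: +9.6%
```

So the 150 dpi run saves only about 11%, and the 100 dpi run actually produces a larger file than the original.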