How to remove added text layer

Posted: Thu Jan 12, 2012 9:25 am
by enaef
Hi

Somewhere (was it in the newsletter?) I read that the "convert to image only" option makes it impossible to remove the text. This probably means that with the "preserve the original content ..." option the text layer can be removed. How is this done?
Up until now I have kept the original files in case more advanced OCR functionality becomes available in the future.
If I consistently use the "preserve the original content ..." option and am also able to remove the text layer, I won't need to keep the original file (without OCR) anymore ...

Thanks, Ernst

Re: How to remove added text layer

Posted: Thu Jan 12, 2012 11:51 am
by Tracker Supp-Stefan
Hello Ernst,

The "convert to image only" will make all the contents of a page a single image - e.g. if the page contained images and typewritten annotation - it will all become a single image which will then be OCRed.

If you decide to preserve the original content - any machine recognizable text will remain as such.
In both cases - the OCRed text will be placed "on top" in a new invisible layer.
Currently there is no way to remove this layer in our products - but as you are aware - a more advanced OCR set of features is coming.

For now I would recommend keeping a non-OCRed copy just in case - once the new functions become available, you can decide for yourself whether the originals are still needed.

Best,
Stefan

Re: How to remove added text layer

Posted: Thu Sep 19, 2013 9:02 am
by Ludwig
Hi Stefan,

I am desperately waiting for this new OCR feature. I would very much like to have the option to remove existing text layers (sometimes the layers of given files are wrong and I would like to replace them) - but not by turning the file into a mere image-file with a size that is much larger than the original. Can it be said when such a new (but very essential) feature will be implemented?

Best regards
Ludwig

Re: How to remove added text layer

Posted: Thu Sep 19, 2013 12:21 pm
by Tracker Supp-Stefan
Hi Ludwig,

Actually using the new PDF-X Editor:
https://www.pdf-xchange.com/product ... nge-editor
You can modify the base contents of a file and even remove unwanted components, so do give it a try.
The advanced OCR tool that will allow you to pre-select such operations to be performed as part of the OCR process is coming a bit later.

Regards,
Stefan

Re: How to remove added text layer

Posted: Thu Oct 17, 2013 3:07 pm
by Ludwig
Hi,

I just want to double check: you recommended the PDF-X Editor and editing the content. Maybe I misunderstood something - does this mean that the PDF-X Editor can delete existing OCR layers too? I don't want to modify any base content, only the text layers on top. If there is such a feature already, then I simply cannot find it (I found the "Edit Content Tool" though).

Thanks a lot, Ludwig

Re: How to remove added text layer

Posted: Thu Oct 17, 2013 3:41 pm
by Tracker Supp-Stefan
Hi Ludwig,

When an OCR text layer is added to a document, it is added as a "base" content element and not as, e.g., an annotation. So yes, using the "Edit Content Tool" you should be able to select and remove that invisible text object.
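For readers who want to see what "removing that invisible text object" amounts to at the file level, here is a minimal Python sketch. It is not a PDF-XChange feature or API. It assumes the OCR text was written with text rendering mode 3 (`3 Tr`, "neither fill nor stroke"), which is how the PDF format expresses invisible text, and that the page content stream has already been decoded to a plain string. The function name `strip_invisible_text` is made up for illustration. The sketch ignores `q`/`Q` graphics-state nesting, string literals or inline image data that happen to contain `BT`/`ET`, and font settings that a later text object might inherit from a removed one.

```python
import re

# One text object: everything from a BT operator to the next ET operator.
TEXT_OBJECT = re.compile(r"\bBT\b(.*?)\bET\b", re.DOTALL)
# The text rendering mode operator, e.g. "3 Tr" (modes 0-7; 3 = invisible).
RENDER_MODE = re.compile(r"(?<![\d.])([0-7])\s+Tr\b")

INVISIBLE = 3


def strip_invisible_text(content: str) -> str:
    """Return a decoded content stream with invisible text objects removed.

    Hypothetical helper: a text object is dropped when every rendering
    mode that applies inside it is 3. The mode persists between text
    objects, so it is tracked across the whole stream.
    """
    kept = []
    pos = 0
    mode = 0  # the rendering mode defaults to 0 (fill) at the start of a page
    for match in TEXT_OBJECT.finditer(content):
        # Everything between text objects (images, paths, ...) is kept as-is.
        between = content[pos:match.start()]
        for value in RENDER_MODE.findall(between):
            mode = int(value)
        kept.append(between)

        # Modes set inside this text object; if none, the current mode applies.
        modes = [int(v) for v in RENDER_MODE.findall(match.group(1))]
        effective = modes or [mode]
        if not all(m == INVISIBLE for m in effective):
            kept.append(match.group(0))
        if modes:
            mode = modes[-1]
        pos = match.end()
    kept.append(content[pos:])
    return "".join(kept)


if __name__ == "__main__":
    page = (
        "q 100 0 0 100 0 0 cm /Im0 Do Q\n"
        "BT 3 Tr /F1 10 Tf 72 700 Td (hidden ocr text) Tj ET\n"
        "BT 0 Tr /F1 12 Tf 72 650 Td (visible) Tj ET\n"
    )
    print(strip_invisible_text(page))
```

Only the content stream is edited here. Scanned page images are normally stored as separate image objects that the stream merely references (the `/Im0 Do` above), so an edit of this kind would not recompress them. Applying it to a real file would additionally need a PDF library to decode and rewrite each page's stream, which this sketch leaves out.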

Regards,
Stefan

Re: How to remove added text layer

Posted: Sat Oct 19, 2013 9:27 am
by Ludwig
Hi Stefan,

As I found out, only preselected sections of a single page can be edited this way, so this does not work for a document of several hundred pages. Furthermore, the file size increases when such information is deleted. I also have to admit that it took me a while to find out how to use the editor for my purpose. Coming from "Menu - Tools - Content Editing Tools - Edit Content Tool", I miss a selection of the different things that can be edited and the ways to edit them. "Editing" can mean a lot of things.

So thank you for suggesting the editor but I think I will have to wait for the advanced OCR-feature.

Best regards
Ludwig

Re: How to remove added text layer

Posted: Mon Oct 21, 2013 10:33 am
by Tracker Supp-Stefan
Hi Ludwig,

Thanks for trying and understanding. We are already working on the new advanced OCR tool.

Regards,
Stefan

Re: How to remove added text layer

Posted: Wed Nov 05, 2014 1:19 pm
by David.P
Hi @all,

just found this thread by Ludwig, and would like to +1 it.
Ludwig wrote: I am desperately waiting for this new OCR feature. I would very much like to have the option to remove existing text layers (sometimes the layers of given files are wrong and I would like to replace them) - but not by turning the file into a mere image-file with a size that is much larger than the original. Can it be said when such a new (but very essential) feature will be implemented?
I have also "desperately" been looking for ways to remove so-called "renderable" text (layers) from PDF files.

For example, I often have scanned PDFs of around 500 pages that are only about 10 MB INCLUDING an OCR text layer, which I would like to remove for certain reasons. When I re-print the file to PDF, however, I always seem to end up with something that is 5 to 10 times bigger (and that is without a text layer).

So has there been any progress on this (i.e. remove text layers from entire documents)?

Regards David.P

Re: How to remove added text layer

Posted: Wed Nov 05, 2014 2:15 pm
by Tracker Supp-Stefan
Hello David,

When you go to Document -> OCR and select "create new searchable PDF", this will effectively rasterize the original file and OCR it after that. The result will be a new file in which each page has only a single raster image as background and the new OCR text layer on top.

Regards,
Stefan

Re: How to remove added text layer

Posted: Wed Nov 05, 2014 4:10 pm
by David.P
Yes thanks Stefan, however the "create new searchable PDF" feature again changes color space and/or resolution of the bitmaps in the original file which in many cases makes the file much larger, and possibly even less "sharp", dpi-wise.

The "Add Text Layer" function OTOH does not do that, but it also does not rasterize any vector objects (or text).

So it seems that there is still no way to remove/convert/rasterize all (hidden and/or visible) text in a PDF file while keeping existing bitmap compression untouched.

Best regards
David

Re: How to remove added text layer

Posted: Wed Nov 05, 2014 10:43 pm
by Will - Tracker Supp
Hi David,

I'm afraid that you're right, this is currently not possible, but it may be something that we can look into implementing in the future.

Cheers,

Re: How to remove added text layer

Posted: Thu Nov 06, 2014 1:17 pm
by Ludwig
Hi there,

I am still hoping that such an essential feature is going to be implemented one day. I would like to combine this with another request/suggestion: very often a book includes several languages (at least in the humanities and sciences). This becomes a problem especially when an English or German book has long Greek passages, because when the main language is applied to the whole book, the Greek parts just produce rubbish. At the moment I have two versions of a book - for instance, one OCRed in English and one in Greek.

I would like to ocr a book in the main language and then I want to "re-ocr" certain selected parts afterwards in another language. For this I suggest not only to implement a removing/erasing feature of the text layer for the whole book but also for selected parts of a page. This also means that it should be possible to ocr not only whole pages but also selected parts of a page.

Thanks and best regards
Ludwig

Re: How to remove added text layer

Posted: Thu Nov 06, 2014 1:19 pm
by Will - Tracker Supp
Hi Ludwig,

Thanks for that - I'll make sure the suggestion is passed along for consideration.

Cheers,