Crop White Space using PDF Tools SDK

PDF-XChange Viewer SDK for Developer's
(ActiveX and Simple DLL Versions)

relapse
User
Posts: 167
Joined: Wed Jan 18, 2012 11:10 am

Crop White Space using PDF Tools SDK

Post by relapse »

(Admin note: This initial post was originally included in https://forum.pdf-xchange.com/ ... 36&t=12972 )

Hi!

I have to revive this thread. I'd like to retrieve the BBoxes of all graphic objects in a PDF and then use one of them as the CropBox in order to remove the white space. I iterate through all objects and search their dictionaries for items with the "BBox" key, but without success. It seems to be the wrong strategy.

I actually need to walk through this structure (see the attachment), but I don't know how.

Could you advise me how to solve this issue?
Attachments
BBox.zip
(251 Bytes) Downloaded 258 times
relapse

Re: Crop White Space

Post by relapse »

I now use the PXCp_llGetPageByIndex method to get a page directly, and I'd like to go deeper into the structure. The next level is to get a handle to the Resources dictionary. I've coded it this way:

Code:

_handle = GetHandle(_tempFile);
try
{
    int hPage;
    var retVal = PdfXchangePro.PXCp_llGetPageByIndex(_handle, 0, out hPage);
    int dictionary;
    retVal = PdfXchangePro.PXCp_ObjectGetDictionary(hPage, out dictionary);
    if (dictionary == 0) { return; }
    int dictionaryItemsCount;
    retVal = PdfXchangePro.PXCp_DictionaryGetCount(dictionary, out dictionaryItemsCount);
    if (dictionaryItemsCount == 0) { return; }
    for (var j = 0; j < dictionaryItemsCount; j++)
    {
        var itemKeyHandle = PdfXchangePro.PXCp_StringCreate();
        int itemValueHandle;
        retVal = PdfXchangePro.PXCp_DictionaryGetPair(dictionary, j, itemKeyHandle, out itemValueHandle);
        int bufferLength;
        retVal = PdfXchangePro.PXCp_StringGetB(itemKeyHandle, null, out bufferLength);
        if (bufferLength == 0) continue;
        var bufferItemKeyName = new byte[bufferLength];
        retVal = PdfXchangePro.PXCp_StringGetB(itemKeyHandle, bufferItemKeyName, out bufferLength);
        var itemKeyName = Encoding.ASCII.GetString(bufferItemKeyName);
        if (!itemKeyName.Equals("Resources")) continue;
        int resourcesDictionary;
        retVal = PdfXchangePro.PXCp_ObjectGetDictionary(itemValueHandle, out resourcesDictionary); // here I get resourcesDictionary = 0, why???
    }
}
finally
{
    FreeHandle(); // also runs on the early returns above
}
Why can't I get the handle of the "Resources" dictionary?
relapse

Re: Crop White Space

Post by relapse »

I've tested it with another file and it works: although the Resources dictionary isn't optional, it can be empty. That's bad news for me, because it means there isn't always a BBox in a PDF.


PS: I think this thread belongs to PDF-Tools Library section.
Tracker Supp-Stefan
Site Admin
Posts: 17906
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Crop White Space using PDF Tools SDK

Post by Tracker Supp-Stefan »

Hi Relapse,

I've just split the topic and moved the new one to the appropriate forum, and I will ask the guys working on the Tools SDK to take a look and advise whether any alternative solution could be used.

Best,
Stefan
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

Thanks!

As a reminder, here is my idea: I think I could retrieve the BBox of each graphic object and set the CropBox to the largest BBox in order to remove the white space. If some white space is still left, I could take the next smaller one.
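A minimal C# sketch of this selection step, assuming the BBoxes have already been read into a hypothetical `PdfBox` struct (lower-left and upper-right corners in points). `LargestFirst` orders the candidates as described above; `Union` is a swapped-in alternative that is not from this thread: it returns the smallest box enclosing every object, so no object is cut off.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical box type: lower-left (Llx, Lly) and upper-right (Urx, Ury) corners in points.
public readonly struct PdfBox
{
    public readonly double Llx, Lly, Urx, Ury;

    public PdfBox(double llx, double lly, double urx, double ury)
    {
        Llx = llx; Lly = lly; Urx = urx; Ury = ury;
    }

    public double Area => Math.Max(0, Urx - Llx) * Math.Max(0, Ury - Lly);
}

public static class BoxSelection
{
    // CropBox candidates ordered from the largest BBox to the smallest.
    public static List<PdfBox> LargestFirst(IEnumerable<PdfBox> boxes) =>
        boxes.OrderByDescending(b => b.Area).ToList();

    // Smallest box enclosing all given boxes (alternative approach, not from the thread).
    public static PdfBox Union(IEnumerable<PdfBox> boxes)
    {
        var list = boxes.ToList();
        if (list.Count == 0)
            throw new ArgumentException("At least one box is required.", nameof(boxes));
        return new PdfBox(
            list.Min(b => b.Llx), list.Min(b => b.Lly),
            list.Max(b => b.Urx), list.Max(b => b.Ury));
    }
}
```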
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

Hi! I have a new idea 8) But I need a function (ideally in both the XChangeViewer and the PDF Tools Library) to convert/export the entire (visible) PDF to a bitmap. Is there such a function? In the XChangeViewer I've only found the Snapshot Tool, but it needs user interaction and I need something automated. In the PDF Tools Library I've found nothing of the kind.
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

I've found two functions in the XChangeViewer Library: PXCV_DrawPageToDC and PXCV_DrawPageToDIBSection. I think both of them can be used to export a PDF as a bitmap, can't they?
Ivan - Tracker Software
Site Admin
Posts: 3549
Joined: Thu Jul 08, 2004 10:36 pm
Location: Vancouver Island - Canada

Re: Crop White Space using PDF Tools SDK

Post by Ivan - Tracker Software »

I've found 2 functions in the XChangeViewer Library: PXCV_DrawPageToDC and PXCV_DrawPageToDIBSection. I think it's possible to use the both ones to export a pdf as a bitmap, isn't it?
Yes, you can also try PXCV_DrawPageToIStream.
Tracker Software (Project Director)

When attaching files to any message - please ensure they are archived and posted as a .ZIP, .RAR or .7z format - or they will not be posted - thanks.
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

Hi, I've tried to export a single-page PDF document, but without success. Could somebody point out my mistake? I get a .bmp image that is completely black, although it does have a size of 2 MB. Here is my code:

Code:

var handle = 0;
var retVal = PdfXchangeViewer.PXCV_Init(out handle, PdfXchangePro.SerialNumber, PdfXchangePro.DevelopmentCode);
retVal = PdfXchangeViewer.PXCV_ReadDocumentW(handle, _tempFile, 0); // use the handle returned by PXCV_Init
object outHeight, outWidth;
var width = Convert.ToInt32(pdfViewer.GetDocumentProperty(_documentId, "Pages[0].Width", out outWidth, 0));
var height = Convert.ToInt32(pdfViewer.GetDocumentProperty(_documentId, "Pages[0].Height", out outHeight, 0));
var bitmap = new Bitmap(width, height);
var graphics = Graphics.FromImage(bitmap);
var deviceContext = graphics.GetHdc();
var commonRendererParams = new PdfXchangeViewer.PXV_CommonRenderParameters();
// combine the flags with a bitwise OR
commonRendererParams.Flags = (int) PdfXchangeViewer.PXV_CommonRenderParametersFlags.pxvrpf_UseVectorRenderer |
                             (int) PdfXchangeViewer.PXV_CommonRenderParametersFlags.pxvrpf_NoTransparentBkgnd;
commonRendererParams.RenderTarget = PdfXchangeViewer.PXCV_RenderMode.pxvrm_Exporting;
var rect = Marshal.AllocHGlobal(Marshal.SizeOf(typeof(PdfXchangeViewerHelper.RECT)));
var pageRect = new PdfXchangeViewerHelper.RECT { top = height, left = width, right = 0, bottom = 0 };
commonRendererParams.DrawRect = 0;
Marshal.StructureToPtr(pageRect, rect, false);
commonRendererParams.WholePageRect = rect;
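PdfXchangeViewer.PXCV_DrawPageToDC(handle, 0, deviceContext, ref commonRendererParams);
graphics.ReleaseHdc(deviceContext); // GetHdc/ReleaseHdc must be paired before the bitmap is used again
graphics.Dispose();
Marshal.FreeHGlobal(rect); // free the unmanaged RECT allocated above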
bitmap.Save(@"C:\picture.bmp", ImageFormat.Bmp);
retVal = PdfXchangeViewer.PXCV_Delete(handle);
The PXCV_DrawPageToDC method executes without an error.
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

I've made some progress! I've changed the following line:

Code:

var pageRect = new PdfXchangeViewerHelper.RECT { top = height, left = width, right = 0, bottom = 0 };
into:

Code:

var pageRect = new PdfXchangeViewerHelper.RECT { top = 0, left = 0, right = width, bottom = height };
and now I can see the whole page exported to a BMP file. But it is still completely black (I can only see the page's images and colored text). I suppose I should define the background color of the destination graphics object.
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

My idea was right: the call that sets the background color of the destination bitmap was missing:

Code:

var graphics = Graphics.FromImage(bitmap);
graphics.Clear(Color.White); // this command is new and relevant! :)
var deviceContext = graphics.GetHdc();
Tracker Supp-Stefan

Re: Crop White Space using PDF Tools SDK

Post by Tracker Supp-Stefan »

Hi Relapse,

Thanks for sharing your results; hopefully they will be useful to someone else too!

So I presume you are now generating proper images from your PDFs and this case is resolved?

Best,
Stefan
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

I wanted to export a PDF page to a bitmap in order to analyze each pixel's color (white or not), so that I can find the white space to be cropped. :shock: Maybe a 1:1 ratio (point : pixel) is not precise enough, but I can export a PDF page zoomed, e.g. 1 point = 3 pixels, and so on. I'm now busy calculating the coordinates of the new visible area without white space.
Tracker Supp-Stefan

Re: Crop White Space using PDF Tools SDK

Post by Tracker Supp-Stefan »

Hi Relapse,

A 1 point to 1 pixel ratio should provide sufficient resolution: that's a 595 x 842 px image for an A4 page, or 612 x 792 px for a Letter/ANSI A page.
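The arithmetic behind these figures as a small C# sketch (the helper name is hypothetical): a page of w x h points rendered at s pixels per point needs a bitmap of about w*s x h*s pixels. The figures above use rounded page sizes; an exact A4 page measures about 595.28 x 841.89 pt, which this helper rounds up to 596 x 842 px.

```csharp
using System;

public static class PageRaster
{
    // Pixel size of a page rendered at `pixelsPerPoint` device pixels per PDF point
    // (1.0 = the 1:1 export discussed here, i.e. 72 dpi). Rounds up so that a partial
    // pixel at the right/bottom edge is still covered.
    public static (int Width, int Height) PixelSize(double widthPt, double heightPt, double pixelsPerPoint)
    {
        if (pixelsPerPoint <= 0)
            throw new ArgumentOutOfRangeException(nameof(pixelsPerPoint));
        return ((int)Math.Ceiling(widthPt * pixelsPerPoint),
                (int)Math.Ceiling(heightPt * pixelsPerPoint));
    }
}
```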

Please do share the rest of your calculations when done with them!

Best,
Stefan
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

I will, but a bitmap with a 1:1 relation between pixels and points has some shortcomings, e.g. where there is a small italic font: some small pixels are missing there. That's no problem in the middle of a text block, but if such a thing is at the edge of the text or of a group of figures, it can lead to a loss of calculation precision. That's why I lean towards 1:n scaling.

What's more, I suppose it would be useful to treat some light shades of gray as pure white too.
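A minimal C# sketch of the pixel analysis described above, assuming the rendered page has already been copied into a row-major array of 32-bit ARGB values (how that buffer is obtained from the bitmap is not shown). A pixel counts as white when all three channels are at or above a threshold, so `minChannel = 255` accepts only pure white and a lower value also accepts light grays; the function returns the inclusive pixel bounds of everything else. The names are hypothetical.

```csharp
using System;

public static class WhiteSpaceScan
{
    // True when every color channel is at least `minChannel`
    // (e.g. 253 also treats RGB(253, 253, 253) as white).
    public static bool IsWhite(uint argb, byte minChannel)
    {
        byte r = (byte)(argb >> 16), g = (byte)(argb >> 8), b = (byte)argb;
        return r >= minChannel && g >= minChannel && b >= minChannel;
    }

    // Inclusive bounds (left, top, right, bottom) of all non-white pixels in a row-major
    // ARGB buffer with a top-left origin, or null if the whole image is white.
    public static (int Left, int Top, int Right, int Bottom)? FindContentBounds(
        uint[] pixels, int width, int height, byte minChannel = 255)
    {
        if (pixels.Length != width * height)
            throw new ArgumentException("Buffer size does not match width * height.");
        int left = width, top = height, right = -1, bottom = -1;
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                if (IsWhite(pixels[y * width + x], minChannel)) continue;
                if (x < left) left = x;
                if (x > right) right = x;
                if (y < top) top = y;
                if (y > bottom) bottom = y;
            }
        }
        if (right < 0) return null;
        return (left, top, right, bottom);
    }
}
```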
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

Hi, I'm still working on this. I found that exporting a PDF to a bitmap has to be done at a scale of 1:8 or even larger, otherwise it's not precise enough. But parsing a huge number of pixels in a bitmap where the object to be cropped is small is computationally expensive. I've also found that the more white space there is, the less exact the found coordinates are. That's why I'm now trying to first export at a scale of 1:1 (or even 2:1) to detect the approximate coordinates of a non-white object, then add a couple of pixels (lines) to be sure it won't be cut too small, and then go back to the PDF and export that approximate area at a larger scale to find the exact coordinates. This approach should save calculation time.
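The hand-over between the two passes can be sketched in C#, assuming the rough pass reports inclusive pixel bounds measured from the top-left corner of a bitmap rendered at `coarseScale` pixels per point. The bounds are padded by a few rough-pass pixels, converted back to points and clamped to the page; the resulting region is what the second, larger-scale export would cover. The names are hypothetical.

```csharp
using System;

public static class CoarseToFine
{
    // `px`: inclusive bounds of non-white pixels from the rough pass, top-left origin.
    // `coarseScale`: pixels per point of the rough render (1.0 for a 1:1 export).
    // `marginPx`: extra rough-pass pixels added on every side as a safety margin.
    // Returns the region to re-render in points (top-left origin), clamped to the page.
    public static (double Left, double Top, double Right, double Bottom) ToPaddedRegion(
        (int Left, int Top, int Right, int Bottom) px, double coarseScale, int marginPx,
        double pageWidthPt, double pageHeightPt)
    {
        if (coarseScale <= 0)
            throw new ArgumentOutOfRangeException(nameof(coarseScale));
        double left = (px.Left - marginPx) / coarseScale;
        double top = (px.Top - marginPx) / coarseScale;
        // +1: the right/bottom index refers to a pixel that is itself one pixel wide.
        double right = (px.Right + 1 + marginPx) / coarseScale;
        double bottom = (px.Bottom + 1 + marginPx) / coarseScale;
        return (Math.Max(0.0, left), Math.Max(0.0, top),
                Math.Min(pageWidthPt, right), Math.Min(pageHeightPt, bottom));
    }
}
```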
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

A new finding: see the attached file. It's a screenshot taken after locating the coordinates following the export of a DIN A4 page to a 5000x5000 bitmap (sorry, I've edited this: the 1:1 scaling was earlier). The inaccuracy seems to be proportional to the distance from the object's center to the outer edges of the bitmap. The rulers' unit of measure is the point.
1to1.zip
(47.54 KiB) Downloaded 277 times
The question is: when should I add some fractions of a point and when should I subtract them? My first idea: if the distance from the object's center to the document's outer edge is larger than the object's own extent (the distance from its center to its outer edge) multiplied by a factor k, then subtract; otherwise add. Now I have to find the value of that factor.
John - Tracker Supp
Site Admin
Posts: 5219
Joined: Tue Jun 29, 2004 10:34 am
Location: United Kingdom

Re: Crop White Space using PDF Tools SDK

Post by John - Tracker Supp »

Hi,

a developer has been assigned to investigate your questions and will answer tomorrow, as this appears to need some time to answer thoroughly.

thanks for your patience.
If posting files to this forum - you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded - thank you.

Best regards
Tracker Support
http://www.tracker-software.com
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

Hi, I got a positive result in my testing using a two-step approach. First I roughly locate the non-white pixels using a 1:1 export, then I do a fine search using an export to a bitmap of a fixed size (5000 x 5000). Please check the attached files (the unit of measure is mm). I think the inaccuracy is minimal. What's missing is an automatic calculation of the number of additional pixels during the rough search; it's related to the issue I described in my previous message. I'm tidying up my code so I can post it here.

P.S. The search algorithm also needs to be optimized. In the example with the smiley it takes 8 seconds :shock: , with the rectangle under 2 seconds. But I also execute some test operations there, e.g. I store the bitmap to a .bmp file so that I can view it, and so on.
Attachments
pdfs.zip
(72.82 KiB) Downloaded 253 times
results.zip
(112.87 KiB) Downloaded 256 times
Tracker Supp-Stefan

Re: Crop White Space using PDF Tools SDK

Post by Tracker Supp-Stefan »

Hi Relapse,

You've achieved quite some precision there, and I am sure you will improve and speed up your algorithms, but is such precision necessary? A 1 point precision means that you will have a maximum of 0.3528 mm of "white space" (just over a third of a mm) in the worst case, and it will only require the 1:1 export and a single pass through your pages. Maybe a 4:1 export and a single pass would also be faster than your current two-pass method, and should give you 1/4 point precision, which is less than the 0.1 mm a normal human eye can distinguish (it effectively gives you a resolution of 288 dpi)?
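The precision figures above can be reproduced with a small C# sketch (helper names are hypothetical): one pixel rendered at s pixels per point covers 25.4 / 72 / s mm, which is the worst-case white space left at an edge located to the nearest pixel, and the equivalent resolution is 72 * s dpi.

```csharp
using System;

public static class RenderPrecision
{
    private const double MmPerInch = 25.4;
    private const double PointsPerInch = 72.0;

    // Size of one rendered pixel in millimetres, i.e. the worst-case white space left
    // at an edge when that edge is located to the nearest pixel.
    public static double PixelSizeMm(double pixelsPerPoint) =>
        MmPerInch / PointsPerInch / pixelsPerPoint;

    // Equivalent output resolution in dots per inch.
    public static double Dpi(double pixelsPerPoint) => PointsPerInch * pixelsPerPoint;
}
```

For a 1:1 export this gives about 0.3528 mm per pixel; for 4:1 about 0.0882 mm at 288 dpi.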

As a matter of fact, I tested your samples with our Viewer's "Remove White Space" and "Set to White Margins" options, and neither produces results as good as yours, even though your files contain only your test objects and no other hidden items or objects with a white background.

Best,
Stefan
Nico - Tracker Supp
User
Posts: 205
Joined: Fri May 18, 2012 8:41 pm

Re: Crop White Space using PDF Tools SDK

Post by Nico - Tracker Supp »

Hi relapse,

Thank you for your post.
I believe your approach to the first rough localization of the target object sounds logical. To discard big regions of white pixels, a rough approximation is the way to go. You can find a rectangle that encloses the target object and then do the fine work around the borders of the object. As for when to add fractions of a point and when to subtract them, I'm assuming you are talking about the rough localization. My interpretation is that, as you said, it depends on the size of the object compared to the size of the page. The bigger the object is, the fewer white pixels there are, so it would be a good idea to add some and remove them later (please let me know if this is right). The smaller the object is, the more white pixels there are, so it would be a good idea to remove some. You can do this per coordinate (x and y), and by specifying a maximum error you should arrive at the same approximation no matter where the object is located on the page. I guess this error will be related to the closest distance the object can be from a border. To reduce this error, I would consider implementing Stefan's idea of magnifying the image when you export it, to get a higher resolution later at the processing stage.

If you don't mind, we would like to know what the use or goal of this functionality is. In the end, this will influence which approach to follow.
Please feel free to ask any more questions.
Thanks.

Sincerely,
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

Hi! Please check the attachment! Yesterday I spent a lot of time trying to fix an error: I manually cropped an area of a PDF document containing a small object and then tried to delete the white space with my function. No luck: I got an endless loop while searching the fine bitmap. The object couldn't be exported to a bitmap because the ratio between its size (or the size of the cropped PDF, since I save it after manual cropping) and the size of the bitmap was too big (e.g. 20x20 points to 5000x5000 pixels). The size of the bitmap should be calculated dynamically!
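One way to size the fine bitmap dynamically, sketched in C# under the assumption that a fixed pixel budget for the longer side (such as the 5000 px used above) is still wanted: derive the scale from the region's size and cap it, so a 20 x 20 pt region is not blown up to 5000 x 5000 px. The cap is an illustrative parameter, not a limit documented for the SDK, and the names are hypothetical.

```csharp
using System;

public static class BitmapSizing
{
    // Chooses pixels-per-point so the longer side of a widthPt x heightPt region becomes
    // targetLongSidePx, but never more than maxScale; returns the scale and bitmap size
    // (aspect ratio preserved, sizes rounded up).
    public static (double Scale, int Width, int Height) ForRegion(
        double widthPt, double heightPt, int targetLongSidePx, double maxScale)
    {
        if (widthPt <= 0 || heightPt <= 0)
            throw new ArgumentException("The region must have a positive width and height.");
        double scale = Math.Min(maxScale, targetLongSidePx / Math.Max(widthPt, heightPt));
        return (scale,
                (int)Math.Ceiling(widthPt * scale),
                (int)Math.Ceiling(heightPt * scale));
    }
}
```

With a cap of 100, the 20 x 20 pt example would be rendered into a 2000 x 2000 px bitmap instead of 5000 x 5000.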
Attachments
attachment.zip
(1.88 KiB) Downloaded 227 times
Last edited by relapse on Fri Jul 20, 2012 9:24 am, edited 1 time in total.
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

I've changed the amounts by which the area size is altered in each of the four iterations, and this gives the most successful result when deleting the white space from the particular PDF containing a rectangle (see the attachment pdfs.zip)

Code:

selection.Top = isRoughCalculating ? y - 1 : y;
selection.Bottom = isRoughCalculating ? y + 3 : y;
selection.Left = isRoughCalculating ? x - 3 : x;
selection.Right = isRoughCalculating ? x + 3 : x;
regardless of what you use for the rough calculation: the whole document or a cropped area with different proportions of the distance from the object to the outer sides. If I use the document with the smiley, the result is not as precise. I don't see any logic there any more. Maybe the value to add/subtract depends on the size of the object itself? But I don't know the size of the object during the rough calculation! Then I would have to correct the final box size!
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

To Nico: The goal of this functionality is to delete white space in a PDF document based on color, independently of the structure of the PDF. You can also define a color if it's not pure white. By the way, I inverted the white pixels of the test PDF I originally posted here and found out that the color surrounding the colored object was not white (see the PDF and the inverted exported bitmap). It was RGB (253, 253, 253). :shock: (see the attachment). With my functionality you can define the color, e.g. everything in the range 253 - 255 should be interpreted as white.

To Stefan: Nico is right in pointing out that
The bigger the object is, the fewer white pixels there are
and vice versa (when calculating the rough size); that's why I have to find the relation between the object and the white border after the rough analysis, so that I can take that result into account when calculating the fine coordinates. I've also tested the functionality with large objects: in the resulting PDF they are cut off too "much" (some fractions of a point, but more than with small objects).
Attachments
test_object.zip
(15.69 KiB) Downloaded 229 times
Nico - Tracker Supp

Re: Crop White Space using PDF Tools SDK

Post by Nico - Tracker Supp »

Hi relapse,

Thank you for sharing your findings with us, that's really awesome.
I believe you are on the right track; if you have any further questions, let us know.
Thanks.

Sincerely,
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

Hi! I have a new idea. Instead of analyzing the whole rough object, it's possible to analyze 4 small areas, located by finding each of the first 4 non-white pixels and enlarging those points by a couple of additional pixels. Then I could export each of those small PDF areas at a scale of 1:1000 and achieve the best precision. My issue is: I've always exported the whole page, but how can I export a certain area of the page? I know I have to use the CommonRendererParams.DrawRect parameter of the PXCV_DrawPageToDC function, but what about the CommonRendererParams.WholePageRect parameter? Should they be equal? Probably not. Please see my attachment. The rects on the edge of the circle are the areas I want to export.
Attachments
new.zip
(75.87 KiB) Downloaded 229 times
Tracker Supp-Stefan

Re: Crop White Space using PDF Tools SDK

Post by Tracker Supp-Stefan »

Hi Relapse,

Have you checked the "comments" section in the help file for CommonRenderParameters? There are a couple of samples showing how you could render the desired section(s) of your page.

Best,
Stefan
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

I can't get it! :( If I have the coordinates of a part of the page, why do I need to specify the coordinates of the whole page? Why not simply export that part of the page to a bitmap of a certain size?
Tracker Supp-Stefan

Re: Crop White Space using PDF Tools SDK

Post by Tracker Supp-Stefan »

Hi Relapse,

This is done to ease zooming, e.g. if you are going to display that portion of the page on screen and allow some sort of scrolling. That way you don't need to specify the zoom every time, as the WholePageRect remains constant until you change the zoom. You then just change the DrawRect to render specific areas of the page with the desired offset.

So you need to specify the WholePageRect only once, and then use four separate DrawRects to render the areas around the border coordinates you found earlier at the lower resolution.

Best,
Stefan
Lzcat - Tracker Supp
Site Admin
Posts: 677
Joined: Thu Jun 28, 2007 8:42 am

Re: Crop White Space using PDF Tools SDK

Post by Lzcat - Tracker Supp »

It is very simple.
Let's imagine that we have a page of 3000x4000 pixels and need to draw the rect [800, 600, 950, 900] (in page coordinates) on a screen DC at coordinates 120, 70. With bitmap creation you must create a bitmap, draw to the bitmap and then draw the bitmap to the DC. It will work, but it requires creating a bitmap for every drawing. And when you try to print, you will be forced to rasterize the page before printing. Not such a good solution, is it?
In this case, all that needs to be done is to calculate two rects.
Normally WholePageRect is [0, 0, 3000, 4000] (in our example; the real width/height will be different) and DrawRect is [800, 600, 950, 900]. But we need to draw the rect at coordinates 120, 70, not 800, 600, so we offset both rects by 680 to the left and 530 up. So WholePageRect becomes [-680, -530, 2320, 3470] and DrawRect becomes [120, 70, 270, 370].
Your case is a bit simpler. For example, suppose your page has dimensions of 6000x9000 and you want to draw the rect [5500, 6200, 5600, 6400] into a 100x200 bitmap. Then your DrawRect should be [0, 0, 100, 200] (as the bitmap starts at 0, 0) and WholePageRect should be [-5500, -6200, 500, 2800].
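Victor's arithmetic as a small C# sketch, with a hypothetical `PxRect` type standing in for the RECT values (left, top, right, bottom) that the thread assigns to WholePageRect and DrawRect; filling the SDK structure itself is not shown. Both rectangles are shifted by the same offset, chosen so that the source area lands at the destination point. The page size passed in is the already-zoomed page size in device pixels.

```csharp
using System;

// Hypothetical stand-in for a RECT: left, top, right, bottom in device pixels.
public readonly struct PxRect
{
    public readonly int Left, Top, Right, Bottom;

    public PxRect(int left, int top, int right, int bottom)
    {
        Left = left; Top = top; Right = right; Bottom = bottom;
    }
}

public static class RenderRects
{
    // `source` is the wanted area in (zoomed) page pixels; (destX, destY) is where its
    // top-left corner should appear on the target DC or bitmap.
    public static (PxRect WholePage, PxRect Draw) ForDestination(
        int pageWidth, int pageHeight, PxRect source, int destX, int destY)
    {
        int dx = destX - source.Left;
        int dy = destY - source.Top;
        var wholePage = new PxRect(dx, dy, pageWidth + dx, pageHeight + dy);
        var draw = new PxRect(source.Left + dx, source.Top + dy, source.Right + dx, source.Bottom + dy);
        return (wholePage, draw);
    }
}
```

Called with Victor's first example (a 3000 x 4000 page, source [800, 600, 950, 900], destination 120, 70), it yields WholePageRect [-680, -530, 2320, 3470] and DrawRect [120, 70, 270, 370].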
Victor
Tracker Software
Project manager

Please archive any files posted to a ZIP, 7z or RAR file or they will be removed and not posted.
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

OK, but the part I want to export isn't scaled. I get the subarea that should be exported and define the new size of the bitmap, but after exporting I have a large bitmap with a single black point on it: the exported subarea wasn't scaled.

P.S. This was in reply to Stefan's message.
Last edited by relapse on Fri Aug 03, 2012 12:56 pm, edited 1 time in total.
Tracker Supp-Stefan

Re: Crop White Space using PDF Tools SDK

Post by Tracker Supp-Stefan »

Hi Relapse,

With Victor's explanation, does it make more sense, and are you getting the proper zoomed areas now?

Best,
Stefan
Lzcat - Tracker Supp

Re: Crop White Space using PDF Tools SDK

Post by Lzcat - Tracker Supp »

To scale the page you should enlarge WholePageRect; it determines the scaling coefficients. For example, if the page has dimensions of 612x792 pt, to scale it 1 to 100 WholePageRect should be [0, 0, 61200, 79200].
If you need a 100x100 pixel rect (scaled!) from the bottom-right corner, WholePageRect should be [-61100, -79100, 100, 100] and DrawRect [0, 0, 100, 100].
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

No, I'm too silly today :D
Tracker Supp-Stefan

Re: Crop White Space using PDF Tools SDK

Post by Tracker Supp-Stefan »

:)
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

Hi, I was on an unforgettable vacation last week. Now I can think a little better :). I have read Lzcat's last message, and as I have a very limited spatial visualization ability 8) could you please give me an analogous example with DrawRect coordinates that are not zero? I mean not DrawRect [0, 0, 100, 100], but e.g. DrawRect [10, 20, 30, 40]. It would be very helpful for me!


Thanks a lot!
Tracker Supp-Stefan

Re: Crop White Space using PDF Tools SDK

Post by Tracker Supp-Stefan »

Hi Relapse,

Glad that you enjoyed your holiday.

An analogy that might help: imagine a sheet of paper with a small hole in it, 100 by 100 "pixels" in size.
The hole's top-left corner has coordinates 0, 0 and its bottom-right corner has coordinates 100, 100.

You now have another sheet of paper with dimensions of 61200 by 79200 "pixels". You need to position it relative to the coordinate system of the sheet with the hole, so that you can see the area you want through the hole. If that is the bottom-right corner of the whole page, then you need to move the whole page's top-left corner to position -61100, -79100, and its bottom-right corner will be at exactly 100, 100.

But if you want an offset of 1000, 1000 from the top-left corner of the whole page, the whole page's top-left corner would be at -1000, -1000, and its bottom-right corner at 60200, 78200. You will now see in the DrawRect the area of the whole page which, in the page's own coordinate system starting at its upper-left corner, would be [1000, 1000, 1100, 1100].

Your DrawRect will always start at 0, 0 and will be as big as you need it to be, with the bottom-right coordinates calculated accordingly. You then "move" the whole page so that the desired part of it fits in that DrawRect window.

Best,
Stefan
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

Hi. Thanks for your explanation. To be absolutely sure (I'm changing my code, so I can't test it right now), I've tried to visualize it in an example. Please check the attachment.
As you can see, I have a PDF of 10 x 8. I'd like to export the whole colored area, which has the coordinates [3, 1, 6, 3], scaling it to 1000% (a scaling factor of 10). The DrawRect would then be [0, 0, 30, 20]. The WholePageRect would then be [-70, -90, 30, 10], wouldn't it? I hope I'm right at last!

My calculations:
left = -70 = - WholeScaledPageWidth + LeftScaled = - (10*10) + 3*10
top = -80 = - WholeScaledPageHeight + TopScaled = - (8*10) + 1*10
right = 30 = left + WholeScaledPageWidth = -70 + 100
bottom = 20 = top + WholeScaledPageHeight = -80 + 100


Thanks
Attachments
WholePageRect_DrawRect.zip
(25.44 KiB) Downloaded 218 times
Tracker Supp-Stefan

Re: Crop White Space using PDF Tools SDK

Post by Tracker Supp-Stefan »

Hi relapse,

Yes, the DrawRect coordinates are correct now, but the WholePageRect ones are not. The whole page's top-left corner should be at position -30, -10 (scaled), so that the start of your black area goes to position 0, 0 in the DrawRect coordinate system.

So your calculations should be:
left = -30 = -LeftScaled = -(3*10)
top = -10 = -TopScaled = -(1*10)
right = 70 = WholeScaledPageWidth - LeftScaled = 100 - 30
bottom = 70 = WholeScaledPageHeight - TopScaled = 80 - 10
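The same rules, now including the zoom, as a C# sketch (names are hypothetical; the region is given in points with a top-left origin, as in the example above): the WholePageRect is the scaled page shifted by the scaled top-left corner of the region, and the DrawRect always starts at 0, 0.

```csharp
using System;

// Hypothetical stand-in for a RECT: left, top, right, bottom in device pixels.
public readonly struct PxRect
{
    public readonly int Left, Top, Right, Bottom;

    public PxRect(int left, int top, int right, int bottom)
    {
        Left = left; Top = top; Right = right; Bottom = bottom;
    }
}

public static class ScaledRegion
{
    // Page of pageWidthPt x pageHeightPt points rendered at `scale` pixels per point.
    // Region [left, top, right, bottom] is in points with a top-left origin; its
    // top-left corner is mapped to (0, 0) of the target bitmap.
    public static (PxRect WholePage, PxRect Draw) ForRegion(
        double pageWidthPt, double pageHeightPt,
        double left, double top, double right, double bottom, double scale)
    {
        int offsetX = (int)Math.Round(left * scale);
        int offsetY = (int)Math.Round(top * scale);
        int pageW = (int)Math.Round(pageWidthPt * scale);
        int pageH = (int)Math.Round(pageHeightPt * scale);
        var wholePage = new PxRect(-offsetX, -offsetY, pageW - offsetX, pageH - offsetY);
        var draw = new PxRect(0, 0,
            (int)Math.Round((right - left) * scale),
            (int)Math.Round((bottom - top) * scale));
        return (wholePage, draw);
    }
}
```

With the numbers from relapse's example (page 10 x 8, region [3, 1, 6, 3], factor 10) this gives WholePageRect [-30, -10, 70, 70] and DrawRect [0, 0, 30, 20], matching the corrected calculation above.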

Best,
Stefan
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

8) Thanks! I've understood it: the top-left point of the scaled object has to be at the origin of the coordinates. It was surely explained somewhere earlier. Shame on me!
Tracker Supp-Stefan

Re: Crop White Space using PDF Tools SDK

Post by Tracker Supp-Stefan »

;)
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

Hi! I'm almost done, but one issue is left and I can't solve it. Please check my attachment. As you can see, after exporting a PDF (smiley2.pdf) to a bitmap (1to1export.bmp) without any scaling (1:1), the original round shape of the object was lost. It almost "mutated" into a rectangle. That's why searching for the first non-white pixel leads to a displacement (the "area" files), which may result in losing the 4 peaks I search for. So I have to correct the displacement by calculating the coordinates of the four little areas around the peaks "artificially" with some constants, instead of simply adding a couple of pixels/points in every direction. Could you advise me how to "smooth" those corners? Maybe it's possible to export a PDF using aliasing (I'm not sure I'm using the correct term here)?

Thanks for your reply!
Attachments
exportIssue.zip
(98.04 KiB) Downloaded 219 times
Tracker Supp-Stefan

Re: Crop White Space using PDF Tools SDK

Post by Tracker Supp-Stefan »

Hi Relapse,

You are exporting a 2.1 mm x 2.1 mm smiley at 72 dpi, so the resulting object will be around 6 by 6 pixels in size; that's why the shape is so distorted.

Could you please test your algorithms (particularly the one detecting the left border) with the attached revised sample?

If you are only taking the first occurrence of the leftmost non-white pixel, then you might be missing objects whose coordinates are closer to the start of your coordinate system than the one detected. In your case with the smiley, you have three pixels with the same X or Y coordinate, and it seems your algorithm picks the first one, when in reality you should be checking all three.

If I am right in guessing why you are getting those offsets, your current maximum error would be 1/72 of an inch (0.35 mm), but in reality the average would be closer to half that in extreme cases like my sample, and in practice I don't believe you are very likely to encounter such files.

Best,
Stefan
Attachments
smiley2.pdf
(94.29 KiB) Downloaded 295 times
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

Hi, I've tested the modified PDF; you can find the result in the attachment. Of course I'm aware that such samples are not the usual kind of PDF to be edited, but you can find such tiny curves in fonts, especially in italics, if you want to crop the white space in a document that contains text. I do need a bigger scale for the first rough export and larger areas around the 4 outer points. It's a problem if a curve is very flat: then the displacement can be large, so I need to really enlarge the exported portion of the PDF. It's a pity, I wanted to keep the portion bitmap as small as possible. But one can't have everything :D
Attachments
smiley2_reply_12345_201208281434368748.pdf
(93.91 KiB) Downloaded 221 times
Tracker Supp-Stefan

Re: Crop White Space using PDF Tools SDK

Post by Tracker Supp-Stefan »

Hi Relapse,

You just need to check all pixels that have the same X (for left and right) or Y (for top and bottom) coordinate and see how many of them are not white. Then check the areas around all of them with the same small zoomed-in image, and find the actual extreme.
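Stefan's suggestion for the left border, sketched in C#: instead of stopping at the first non-white pixel, collect every non-white pixel in the leftmost column that contains any, so each one can be refined with its own zoomed render. The `isInk` predicate (true for non-white pixels) is a hypothetical hook into whatever pixel test is used; the right, top and bottom borders work the same way with the scan direction changed.

```csharp
using System;
using System.Collections.Generic;

public static class ExtremeCandidates
{
    // Scans columns from left to right and returns every non-white pixel of the first
    // column that contains any, instead of only the first hit in that column.
    public static List<(int X, int Y)> LeftmostColumn(Func<int, int, bool> isInk, int width, int height)
    {
        for (int x = 0; x < width; x++)
        {
            var hits = new List<(int X, int Y)>();
            for (int y = 0; y < height; y++)
            {
                if (isInk(x, y))
                    hits.Add((x, y));
            }
            if (hits.Count > 0)
                return hits;
        }
        return new List<(int X, int Y)>(); // blank page: no candidates
    }
}
```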

Maybe there are better algorithms out there - but this is the one that comes to my mind right now.

Best,
Stefan
ajackson
User
Posts: 1
Joined: Wed Sep 12, 2012 3:32 pm

Re: Crop White Space using PDF Tools SDK

Post by ajackson »

I'd like to delete white space by examining the PDF structure. Is it possible to solve the issue that way?
Tracker Supp-Stefan

Re: Crop White Space using PDF Tools SDK

Post by Tracker Supp-Stefan »

Hi ajackson,

It's technically possible, but you would need to do that by examining each element and calculating its coordinates, and you would most likely need to rely on the Low Level API, for which we do not provide any support; you would need extensive knowledge of the PDF structure to be able to use it.

Best,
Stefan
relapse

Re: Crop White Space using PDF Tools SDK

Post by relapse »

Hi! I've found the error that resulted in incorrect CropBox coordinates, but only for elements with curves. It was the PXV_CommonRenderParametersFlags.pxvrpf_UseVectorRenderer flag of the PXV_CommonRenderParameters structure, which I use when exporting the PDF to a bitmap! I stopped using it (it's part of a disjunction) and it worked! My algorithm was correct. I'll try to post code snippets later to illustrate it. To see the effect this flag causes, see the attachment.
Attachments
pxvrpf_UseVectorRenderer.zip
(14.49 KiB) Downloaded 207 times
Tracker Supp-Stefan

Re: Crop White Space using PDF Tools SDK

Post by Tracker Supp-Stefan »

Hi Relapse,

Glad that you got it all working!
Looking forward to the snippets!

Cheers,
Stefan