PDFXCp_SaveImageFromPage problem and question

Posted: Thu Oct 21, 2004 3:04 pm
by SteveN

I am trying to convert scanned PDF files (each page is one image) to TIF files using VB6.

The code below works, but the TIF image isn't compressed.
(ImCompression_CCITT4 = 7)
How do I compress the file and what should mySaveFormat.CompressionLevel be set to?

I am hardcoding the imgType and resolution (xDPI, yDPI) in the example below. How can I find out what these were for the original image in the PDF?

Code:

    mySaveFormat.fmtID = PRO_FMT_TIFF_ID
    mySaveFormat.imgType = ImType_bw_1bpp
    mySaveFormat.bConvertToGray = False
    mySaveFormat.bDither = 0
    mySaveFormat.bWriteAlpha = False
    mySaveFormat.xDPI = 150
    mySaveFormat.yDPI = 150
    mySaveFormat.CompressionMethod = ImCompression_CCITT4
    'mySaveFormat.CompressionLevel ????
    mySaveFormat.bAppendToExisting = False
    res = PDFXCp_SaveImageFromPage(pdf, i - 1, j - 1, mySaveFormat, "C:\test.tif")
    If IS_DS_FAILED(res) Then
        ShowDSErrorString (res)
        Exit Sub
    End If

PS: On further testing I found that compression method 3 (LZW) does compress the file, but methods 5, 6 and 7 (CCITT) do not.
For a black and white TIF, CCITT compression (e.g. Group 4 fax) is the normal choice.


Posted: Fri Oct 22, 2004 5:01 pm
by John - Tracker Supp
Hi Steve,

There is nothing obviously wrong in your code. Can you please put together a small sample project, zip it and send it to us? Please include all the project files, the compiled exe and the image files in use, and we will take a look and come back to you.


Posted: Fri Oct 22, 2004 6:15 pm
by SteveN
Will do,

Also, can you offer any advice on "How can I find out what the imgType and resolution (xDPI, yDPI) were for the original image in the PDF?"


Posted: Mon Oct 25, 2004 12:26 am
by SteveN

PDF2TIF source code (VB6) and exe attached.

Program reads C:\Test.pdf and converts it to C:\Test.tif

It does not compress the file when CompressionMethod = 7 (Group 4), but it does compress when CompressionMethod = 3 (LZW). For a black and white image, Group 4 would be the normal compression method; LZW would not.

The input PDF is 28 KB; the output TIF file is 272 KB.
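
As a rough sanity check on that figure, the raw size of a bitmap can be estimated from its pixel dimensions. The sketch below is only illustrative: `UncompressedBytes` is a made-up helper name, and the 1240 x 1754 pixel size is an assumption (an A4 page at the 150 dpi used in the code above), since the actual page size is not stated here.

```vb
' Estimated size in bytes of an uncompressed bitmap.
' Each row is padded up to a whole number of bytes.
Public Function UncompressedBytes(ByVal widthPx As Long, ByVal heightPx As Long, _
                                  ByVal bitsPerPixel As Long) As Long
    Dim rowBytes As Long
    rowBytes = (widthPx * bitsPerPixel + 7) \ 8
    UncompressedBytes = rowBytes * heightPx
End Function

' ASSUMPTION: A4 page at 150 dpi, 1 bit per pixel.
' Debug.Print UncompressedBytes(1240, 1754, 1)   ' prints 271870
```

Under that assumption the raw bitmap is about 272 KB, which is consistent with the output file holding uncompressed 1 bpp data.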

Imaging for Windows and IrfanView both show the TIF file as 'not compressed'.
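
To double-check what the viewers report, the Compression tag (259) can be read straight from the TIFF header. This is a minimal sketch in plain VB6 file I/O, not part of the PDF-XChange API; `GetTiffCompression`, `U16` and `U32` are hypothetical helper names, and only the first image directory is inspected.

```vb
' Reads an unsigned 16-bit value from a byte array in the given byte order.
Private Function U16(b() As Byte, ByVal off As Long, ByVal bigEndian As Boolean) As Long
    If bigEndian Then
        U16 = CLng(b(off)) * 256& + b(off + 1)
    Else
        U16 = CLng(b(off + 1)) * 256& + b(off)
    End If
End Function

' Reads a 32-bit offset. Values of 2 GB or more overflow a VB6 Long,
' so this sketch only suits files smaller than that.
Private Function U32(b() As Byte, ByVal off As Long, ByVal bigEndian As Boolean) As Long
    If bigEndian Then
        U32 = U16(b, off, True) * 65536 + U16(b, off + 2, True)
    Else
        U32 = U16(b, off + 2, False) * 65536 + U16(b, off, False)
    End If
End Function

' Returns the Compression tag of the first image in a TIFF file,
' or -1 if the file is not a TIFF or the tag is absent.
' TIFF 6.0 values: 1 = none, 2 = CCITT RLE, 3 = Group 3, 4 = Group 4, 5 = LZW.
Public Function GetTiffCompression(ByVal path As String) As Long
    Dim f As Integer
    Dim hdr(0 To 7) As Byte, cnt(0 To 1) As Byte, ent(0 To 11) As Byte
    Dim bigEndian As Boolean
    Dim ifdPos As Long, n As Long, i As Long

    GetTiffCompression = -1
    f = FreeFile
    Open path For Binary Access Read As #f
    If LOF(f) >= 8 Then
        Get #f, 1, hdr
        bigEndian = (hdr(0) = &H4D)              ' "MM" = big-endian, "II" = little-endian
        If (hdr(0) = &H49 Or hdr(0) = &H4D) And hdr(0) = hdr(1) _
                And U16(hdr, 2, bigEndian) = 42 Then
            ifdPos = U32(hdr, 4, bigEndian)      ' 0-based offset of the first IFD
            Get #f, ifdPos + 1, cnt              ' Get uses 1-based file positions
            n = U16(cnt, 0, bigEndian)           ' number of 12-byte entries
            For i = 0 To n - 1
                Get #f, ifdPos + 3 + i * 12, ent
                If U16(ent, 0, bigEndian) = 259 Then
                    GetTiffCompression = U16(ent, 8, bigEndian)
                    Exit For
                End If
            Next i
        End If
    End If
    Close #f
End Function
```

Calling `GetTiffCompression("C:\Test.tif")` after the conversion should return 4 for a Group 4 file and 1 for an uncompressed one.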


Posted: Mon Oct 25, 2004 2:42 pm
by John - Tracker Supp
Hi Steve,

Thanks for the project. It has allowed us to locate and fix a problem in our TIFF module. The fix will be in build 65, available later this week.

With regard to the image extraction: the original image parameters are not available, I am afraid. When images are stored within a PDF file they are converted to Adobe's own internal format and therefore do not carry the usual parameters. The image is extracted with the same dimensions as it was originally converted from, unless you choose to scale it. The only way to achieve what you need would be to extract the image, inspect its properties with an imaging tool, perform any action you need, and then resave it.

However, this is from the dev team member responsible for the new version of XCPRO30.DLL, which will be released within the next 2 months:

1. There is no such property as resolution (xDpi, yDpi). The image is stored in the PDF and has only a Width and Height (in pixels). Therefore, you would need to know the exact scale factor applied to the image when it is placed onto the page in the PDF document (i.e. the image could be 200x300 pixels but be put on the PDF page as 100x500, so from this information one can derive the dpi, but it will be a very specific dpi).

2. With the new xcpro30 version one can obtain the full matrix that was used when the image was actually placed on the page in the PDF, so all parameters (that are defined only by that matrix) will be known.

This operation is not available in the existing version of xcpro30, because it requires parsing the contents of the page to obtain the matrix. That is a very complex operation: matrices are written in different places within the contents, and there are special rules for how the final matrix is assembled from the individual matrices. (In the new version all of this is already implemented.)
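
Once the final placement matrix is known, the effective dpi follows from simple arithmetic. The sketch below is an illustration only and does not use any xcpro30 call: `EffectiveDpi` is a hypothetical helper, and it assumes the matrix [a b c d e f] maps the image's unit square into default PDF user space, where one unit is 1/72 inch.

```vb
' Effective resolution of an image placed on a PDF page.
' a, b, c, d are the first four entries of the final placement matrix.
' ASSUMPTION: user space units are points (1/72 inch).
Public Sub EffectiveDpi(ByVal widthPx As Long, ByVal heightPx As Long, _
                        ByVal a As Double, ByVal b As Double, _
                        ByVal c As Double, ByVal d As Double, _
                        ByRef xDpi As Double, ByRef yDpi As Double)
    Dim placedW As Double, placedH As Double
    placedW = Sqr(a * a + b * b)    ' placed width in points
    placedH = Sqr(c * c + d * d)    ' placed height in points
    xDpi = widthPx * 72# / placedW
    yDpi = heightPx * 72# / placedH
End Sub
```

Taking the 200x300 pixel image from point 1 and assuming the 100x500 placement is in points with no rotation, `EffectiveDpi 200, 300, 100, 0, 0, 500, x, y` gives x = 144 and y = 43.2, which shows how placement-specific the resulting dpi can be.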

So hopefully in V3.5, to be released soon, the user should be able to access the required info.

Hope that helps

Posted: Mon Oct 25, 2004 3:04 pm
by SteveN

I'll pick up build 65 later in the week.

I'm happy to wait for the 3.5 release for the resolution information. Does the same apply to checking whether the PDF image is Black and White, Greyscale or Colour?


Posted: Mon Oct 25, 2004 3:23 pm
by John - Tracker Supp
Yes - this should indeed be possible :)