PDFdriverAPI example project: Word to PDF ocr issue

Post by **ase technologies** » Fri Mar 23, 2012 3:56 pm

I am converting a Word doc that contains an installed TrueType OCR font. I have the font set up on my machine, and the font appears correctly in the Word doc. When I use your PDFdriverAPI example project to convert it to PDF, the font does not make it into the PDF.

Is there additional coding that must be done in the conversion process to get the font to the PDF?

Post by **John - Tracker Supp** » Fri Mar 23, 2012 4:14 pm

Hi,

Please see the section in the manual on page 24 - regarding embedding fonts.

Post by **ase technologies** » Tue Mar 27, 2012 1:56 am

Using the embed all fonts property worked, thank you.

My next question is about applying a text watermark to this newly created PDF/A document. Is it possible to add a text watermark to a PDF/A document and still keep it within PDF/A specifications? Is there a way to embed the font of the text watermark within the PDF/A document?

Post by **Tracker Supp-Stefan** » Tue Mar 27, 2012 12:02 pm

Hello ase technologies,

Please check section 3.1.2.11, "Watermarks", in the Drivers API manual for all the available watermark options. The watermark font is handled at PDF creation time and is embedded as the PDF/A specification requires, especially since you are already using the "Embed all used fonts" option.

Best,
Stefan

Post by **John - Tracker Supp** » Tue Mar 27, 2012 1:35 pm

BUT - do note that if you modify an existing PDF - PDF/A status will be compromised.

For now our Libraries do not allow you to set the PDF/A status through the XCPRO40 library when modifying an existing PDF - this can only be done at creation time via the driver or the pxclib40.dll library.

Post by **ase technologies** » Wed Mar 28, 2012 4:59 am

John - Tracker Supp wrote: BUT - do note that if you modify an existing PDF - PDF/A status will be compromised.

For now our Libraries do not allow you to set the PDF/A status through the XCPRO40 library when modifying an existing PDF - this can only be done at creation time via the driver or the pxclib40.dll library.

John,
I'm trying to apply just a basic text watermark during the conversion with your PDFdriver (using the AddTextWatermark method in the PDFdriverAPI example code). I've added text watermark properties and a call to the AddTextWatermark method to your sample code below, but it doesn't seem to be applied to the PDF. Do I need to modify any of the properties or call the method differently?

    private void bFilePrint_Click(object sender, EventArgs e)
    {
        PDFPrinter.SetAsDefaultPrinter();

        PDFPrinter.Option["General.Specification"] = -1;
        PDFPrinter.Option["Fonts.EmbedAll"] = true;

        string watermarkName = "";
        string watermarkText = "THIS IS A SAMPLE WATERMARK";
        string watermarkFontName = "Helvetica";
        int fontweight = 5;
        int italic = 0;
        int outline = 0;
        int fontSize = 72;
        int lineWidth = 0;
        int textColor = 0;
        int alignment = 0;
        int xOffset = 100;
        int yoffset = 100;
        int angle = 0;
        int opacity = 0;
        int flags = 3;
        int placeType = 5;
        string pageRange = "1-3";

        bPXCPrinterDefault = true;

        OpenFileDialog ofd = new OpenFileDialog();
        ofd.Filter = "TextFiles|*.doc;*.txt";
        if (DialogResult.OK == ofd.ShowDialog(this))
        {
            System.Diagnostics.Process printJob = new System.Diagnostics.Process();
            printJob.StartInfo.FileName = ofd.FileName;
            printJob.StartInfo.UseShellExecute = true;
            printJob.StartInfo.Verb = "print";

            PDFPrinter.AddTextWatermark(watermarkName, watermarkText, watermarkFontName, fontweight, italic, outline, fontSize, lineWidth, textColor, alignment,
                xOffset, yoffset, angle, opacity, flags, placeType, pageRange);

            printJob.StartInfo.WindowStyle = System.Diagnostics.ProcessWindowStyle.Minimized;
            printJob.Start();
        }
    }

Post by **Tracker Supp-Stefan** » Wed Mar 28, 2012 5:28 am

Hello ase technologies,

After you have defined all your watermarks (using AddTextWatermark or AddImageWatermark) - you then need to enable them for insertion in your document.

Please check the Watermarks section in the API manual (3.1.2.11 in the manual) - you need to both enable printing of the defined watermarks and provide a list of those to be printed (as you can define e.g. 2 watermarks and only include one or the other in alternating PDFs).

Best,
Stefan

Post by **ase technologies** » Tue Apr 03, 2012 7:34 am

Adding a single watermark works fine, but I'm not able to add multiple. As the API states, I've enabled watermarks, I'm using a comma-separated string of watermark names, and I've added each watermark with the AddTextWatermark method. Is there something else that I'm missing?

I'm adding multiple watermarks as shown in the method below (The rest of my code for this class is attached). The text watermarks should be very similar, but contain different text and a different page range.

    public static void SetWatermarks(string[,] headerInfoArray)
    {
        string watermarkName = "TestWatermark";
        string watermarkText = "THIS IS A SAMPLE WATERMARK";
        string watermarkFontName = "Courier";
        int fontweight = 5;
        int italic = 0;
        int outline = 0;
        int fontSize = 3;
        int lineWidth = 1;
        int textColor = 0;
        int alignment = 0;
        int xOffset = 40;
        int yoffset = 2750;
        int angle = 0;
        int opacity = 50;
        int flags = 5;
        int placeType = 5;
        string pageRange = "";

        for (int i = 0; i < 7; i++) // i < length of the array
        {
            if (!string.IsNullOrEmpty(headerInfoArray[i, 1])) // if the page range string is not null or empty
            {
                watermarkName = headerInfoArray[i, 0];

                if (i == 0)
                    watermarkText = "$$FORMMETADATA$$:LIAB001F";
                if (i == 1)
                    watermarkText = "$$FORMMETADATA$$:LIAB001B";
                if (i == 2)
                    watermarkText = "$$FORMMETADATA$$:CCRDS01F";
                if (i == 3)
                    watermarkText = "$$FORMMETADATA$$:CCRDS01B";
                if (i == 4)
                    watermarkText = "$$FORMMETADATA$$:WHITE";
                if (i == 5)
                    watermarkText = "$$FORMMETADATA$$:BILL001F";
                if (i == 6)
                    watermarkText = "$$FORMMETADATA$$:BILL001B";

                pageRange = headerInfoArray[i, 1];

                PDFPrinter.AddTextWatermark(watermarkName, watermarkText, watermarkFontName, fontweight, italic, outline, fontSize, lineWidth, textColor, alignment,
                    xOffset, yoffset, angle, opacity, flags, placeType, pageRange);
            }
        }
    }

Post by **ase technologies** » Tue Apr 03, 2012 7:41 am

Code for the method that converts the document and applies the watermarks:

    public static void WritePDF(string[,] headerInfoArray, string inputFilePathName)
    {
        var l = new WORD.License();
        l.SetLicense("AsposeTotal.Lic");
        var k = new PDF.License();
        k.SetLicense("AsposeTotal.Lic");

        object testObject = new object();

        PXCComLib.CPXCControlEx prnFactory = new PXCComLib.CPXCControlEx();
        PDFPrinter = (PXCComLib.CPXCPrinter)prnFactory.get_Printer("", "Simple PDF-XChange", "DO NOT POST LICENSE INFO HERE!!!"); // "<YOUR REG KEY>", "<YOUR DEV CODE>")))

        PDFPrinter.SetAsDefaultPrinter();

        PDFPrinter.Option["General.Specification"] = -1;
        PDFPrinter.Option["Fonts.EmbedAll"] = true;
        PDFPrinter.Option["Watermarks.Enabled"] = true;

        SetWatermarks(headerInfoArray);

        string stringOfWatermarkNames = GetWatermarkNamesString(headerInfoArray);
        PDFPrinter.Option["Watermarks.Watermarks"] = stringOfWatermarkNames;

        bPXCPrinterDefault = true;

        System.Diagnostics.Process printJob = new System.Diagnostics.Process();

        printJob.StartInfo.FileName = inputFilePathName;
        printJob.StartInfo.UseShellExecute = true;
        printJob.StartInfo.Verb = "print";

        printJob.StartInfo.WindowStyle = System.Diagnostics.ProcessWindowStyle.Minimized;
        printJob.Start();
    }

Post by **Tracker Supp-Stefan** » Tue Apr 03, 2012 12:20 pm

Hello ase technologies,

First, please make sure NOT to post your license key in public forums like this one in the future.
I will now need to disable your code, and you will have to contact us at sales@pdf-xchange.com so that we can issue you a new one.

As for the problem: what exactly happens when you try to place the watermarks? Is only one watermark placed, or none? A sample result would be very helpful.

Best,
Stefan

P.S. I also had to remove your attachment as it contained your license info as well. Attached is your code with the license information properly hidden.

Post by **ase technologies** » Tue Apr 03, 2012 3:43 pm

Stefan,
My apologies. Attached is the latest sample PDF in which I attempted to add 3 watermarks, but none showed up on the output.

Post by **ase technologies** » Tue Apr 03, 2012 8:36 pm

Stefan,
I received an email from Paul O'Rorke (support@pdf-xchange.com) that said it may take a few days to get back to my forum posting because of your upcoming new release. Do you think it will take that long, or might you have a response sooner?

We are really in a crunch time ourselves with this client.

Post by **Walter-Tracker Supp** » Tue Apr 03, 2012 9:33 pm

Hi,

C# is not my area of expertise (I usually handle C++ related inquiries) but I will be looking into this for you as I am familiar with the usage of our products.

The first thing I would recommend is making sure that you do not use the same watermark name for each watermark. This is set with the first argument to the AddTextWatermark() method, and should be different for each watermark - otherwise each subsequent call will overwrite the previous one.

I will look over your code in more detail but on first glance this stuck out (maybe I missed something).

-Walter
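
The name collision Walter describes can be caught before any watermark is registered. The following is only a minimal sketch, assuming the watermark names sit in column 0 of a `string[,]` as in the `SetWatermarks` method posted above; `FindDuplicateNames` is a hypothetical helper and not part of the PDF-XChange API.

```csharp
using System;
using System.Collections.Generic;

public static class WatermarkNameCheck
{
    // Hypothetical helper: returns each name from column 0 that occurs more than once.
    // Comparison is case-sensitive; whether the driver treats names that way is an assumption.
    public static List<string> FindDuplicateNames(string[,] headerInfoArray)
    {
        var seen = new HashSet<string>();
        var duplicates = new List<string>();
        for (int i = 0; i < headerInfoArray.GetLength(0); i++)
        {
            string name = headerInfoArray[i, 0];
            if (string.IsNullOrEmpty(name)) continue;      // nothing to register for this row
            if (!seen.Add(name) && !duplicates.Contains(name))
                duplicates.Add(name);                      // second or later occurrence
        }
        return duplicates;
    }
}
```

If the returned list is non-empty, a later `AddTextWatermark` call with a repeated name would, per Walter's note, overwrite the earlier definition.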

Post by **Walter-Tracker Supp** » Tue Apr 03, 2012 10:42 pm

I've gone over the code and found a couple of things. I have successfully created multiple watermarks as follows:

            PDFPrinter.Option["Watermarks.Enabled"] = true;


            string watermarkText = "TextWatermark1";
            string watermarkName = "Name";
            string watermarkFontName = "Arial";
            int fontweight = 400;
            int italic = 0;
            int outline = 0;
            int fontSize = 20;
            int lineWidth = 5;
            int textColor = 0;
            int alignment = 0;
            int xOffset = 0;
            int yOffset = 0;
            int angle = 0;
            int opacity = 100;
            int flags = 15;
            int placeType = 0;
            string pageRange = "";
            //PDFPrinter.AddTextWatermark(watermarkText, watermarkText, watermarkFontName, fontweight, italic, outline, fontSize, lineWidth, textColor, alignment, xOffset, yOffset, angle, opacity, flags, placeType, pageRange);

            for (int i = 0; i < 3; i++)
            {
                if (i == 0)
                {
                    watermarkName = "Watermark1";
                    watermarkText = "THis is the first watermark.";
                    yOffset = 50;
                }
                if (i == 1)
                {
                    watermarkName = "Watermark2";
                    watermarkText = "A second one.";
                    yOffset = 100;
                }
                if (i == 2)
                {
                    watermarkName = "Watermark3";
                    watermarkText = "The third, and final, watermark.  The greatest one of all!";
                    yOffset = 150;
                }

                PDFPrinter.AddTextWatermark(watermarkName, watermarkText, watermarkFontName, fontweight, italic, outline, fontSize, lineWidth, textColor, alignment,
                    xOffset, yOffset, angle, opacity, flags, placeType, pageRange);
            }

            PDFPrinter.Option["Watermarks.Watermarks"] = "Watermark1;Watermark2;Watermark3";
            PDFPrinter.ApplyOptions(0);

Things I changed included:

0. The Option["Watermarks.Watermarks"] seems to require a semi-colon separated list of watermark names, not comma-separated as indicated in the documentation. I suppose this is a documentation mistake. We will get this sorted out internally to avoid this kind of confusion.
1. fontweight should be 400 (for normal) or 700 (for bold), according to the documentation.
2. fontsize 3 might be a little small but you can play with that.
3. I changed x and y offset to zero for simplicity and changed y offset for each subsequent watermark. You can play with this to suit your needs, obviously.
4. flags = 15 (binary: 1111). This is all flags on. Also worked with flags = 14 (1110), all on except "background". Background may work for your files, I'm not sure.
5. placetype = 0 for all pages.

Worked for me.
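
Item 0 in the list above is easy to get wrong when the names are assembled in code. The thread never shows the poster's `GetWatermarkNamesString`, so the `BuildWatermarkList` helper below is a hypothetical stand-in. It is a minimal sketch that assumes column 0 of the array holds the watermark name and column 1 its page range, as in the posted `SetWatermarks`.

```csharp
using System;
using System.Collections.Generic;

public static class WatermarkList
{
    // Hypothetical helper: joins the names of all rows that have a page range,
    // using ';' as the separator Walter found to work.
    public static string BuildWatermarkList(string[,] headerInfoArray)
    {
        var names = new List<string>();
        for (int i = 0; i < headerInfoArray.GetLength(0); i++)
        {
            // Skip rows without a page range, mirroring the check in SetWatermarks,
            // so the list only names watermarks that were actually added.
            if (string.IsNullOrEmpty(headerInfoArray[i, 1])) continue;
            names.Add(headerInfoArray[i, 0]);
        }
        return string.Join(";", names);
    }
}
```

The result would then be assigned to `PDFPrinter.Option["Watermarks.Watermarks"]`, followed by `PDFPrinter.ApplyOptions(0)` as in Walter's code.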

Post by **ase technologies** » Fri Apr 06, 2012 5:27 am

Walter,
Thank you for the response. It must have been the page range separator values that were causing the issue. I now have our converted document showing multiple text watermarks.

We have continually been receiving a MS Word dialog box during the conversion. It says:

"It will not be possible to send PRINT field data to the printer with the currently installed printer driver. Do you want to continue printing?"

Do you have any idea what this is related to? Do I need to install a Tracker print driver or add any additional parameters? If I click the "yes" button, the file converts and prints just fine. We are trying to automate a large number of conversions, and would like to not have this dialog box.

Also, it appears that the document is being opened by MS Word and then converted. There is a noticeable delay when MS Word is opening (even if the file is not displayed in Word). It seems like this is the slowest piece of the conversion (especially if you're converting hundreds or thousands of documents a day). Is there an option for disabling the opening of Microsoft Word before the conversion, or is that something that must be done in order to convert the Word document? It's possible that I missed something in the API, but I don't recall seeing much about it.

I'm running Word 2010 on a Windows 7 development machine.

Thanks.

Post by **ase technologies** » Fri Apr 06, 2012 5:32 am

On a recent test I received the following error:

Retrieving the COM class factory for component with CLSID {217974DB-6777-4A9F-90A7-AA5EBA834BA2} failed due to the following error: 80080005.

Are you familiar with this error?

The only forum search that came up with anything on that error was from back in early 2009:
https://forum.pdf-xchange.com/ ... 005#p25090

I haven't made any major changes to my code. I've just been testing different documents.

Post by **ase technologies** » Mon Apr 09, 2012 3:25 pm

Our client is looking for performance numbers in the range of 2 seconds per converted document. The conversion process is taking a bit longer than that, and it's partly due to the situation described below. Is there a way to convert Word to PDF without opening the MS Word application?

ase technologies wrote: Walter,

it appears that the document is being opened by MS Word, then being converted. There is a noticeable delay when MS Word is opening (even if the file is not displayed in Word). It seems like this is the slowest piece of the conversion (especially if you're converting hundreds or thousands of documents a day). Is there an option for disabling the opening of Microsoft Word before the conversion, or is that something that must be done in order to convert the Word document? It's possible that I missed something in the API, but I don't recall seeing much about it.

I'm running Word 2010 on a Windows 7 development machine.

Thanks.

Post by **ase technologies** » Mon Apr 09, 2012 3:40 pm

Attached is one of our sample Word docs that we need to convert in 2 seconds for our client. It's about 650k, and 20 pages.

Post by **Tracker Supp-Stefan** » Tue Apr 10, 2012 10:14 am

Hello Peter,

I am afraid that you will need Word to open/load the document and send it to our printing drivers. There is no way for our drivers to directly process that file and convert it to PDF on their own. You will need to check the Word API calls you are making and see if there is any way to optimise them, e.g. if it's possible to load Word only once and then process multiple files with the same instance. This will still incur the significant initial load time, but subsequent document conversions should happen significantly faster.

Best,
Stefan

Post by **ase technologies** » Tue Apr 10, 2012 1:54 pm

Stefan,
Thanks for the reply.

Unfortunately this may be a deal breaker for our client as the volumes on their conversions are extremely high. The majority of the code in our conversion piece in this project is modified sample code from your SDK conversion example (we are just adding some dynamic text watermarks). I'm not making any calls to the Word API at the moment. All of that is being done by Tracker code.

I didn't see too much in your API regarding calls to Word and the actual conversion of Word to PDF.

Is it possible to work further with someone in support on this?

The entire conversion method we are using is below. As I mentioned before, the conversion piece is taken from your SDK sample project. Might there be a faster way to convert Word docs?

    public static void WritePDF(string[,] headerInfoArray, string inputFilePathName, string outputPath)
    {
        PXCComLib.CPXCControlEx prnFactory = new PXCComLib.CPXCControlEx();
        PDFPrinter = (PXCComLib.CPXCPrinter)prnFactory.get_Printer("", "Simple PDF-XChange", "#"); // "<YOUR REG KEY>", "<YOUR DEV CODE>")))

        PDFPrinter.Option["General.Specification"] = -1;
        PDFPrinter.Option["Fonts.EmbedAll"] = true;
        PDFPrinter.Option["Watermarks.Enabled"] = true;
        PDFPrinter.Option["Save.SaveType"] = 2;
        PDFPrinter.Option["Save.ShowSaveDialog"] = false;
        PDFPrinter.Option["Save.Path"] = outputPath;

        PDFPrinter.Option["Save.RunApp"] = false;
        PDFPrinter.Option["Saver.ShowProgress"] = false;

        SetWatermarks(headerInfoArray);
        string stringOfWatermarkNames = GetWatermarkNamesString(headerInfoArray);

        PDFPrinter.Option["Watermarks.Watermarks"] = stringOfWatermarkNames;
        bPXCPrinterDefault = true;

        System.Diagnostics.Process printJob = new System.Diagnostics.Process();

        printJob.StartInfo.FileName = inputFilePathName;
        printJob.StartInfo.UseShellExecute = true;
        printJob.StartInfo.Verb = "print";

        PDFPrinter.SetAsDefaultPrinter();

        printJob.StartInfo.WindowStyle = System.Diagnostics.ProcessWindowStyle.Minimized;

        printJob.Start();
    }

Post by **Tracker Supp-Stefan** » Tue Apr 10, 2012 2:37 pm

Hello Peter,

Please try something like this:
http://www.c-sharpcorner.com/UploadFile ... ation.aspx
to load Word only once and then open, print and close as many documents as you need.
The initial loading of Word will still be present, but it should happen only once.

Best,
Stefan
Tracker
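
The "load Word once" idea can be sketched with late-bound COM automation, which avoids a compile-time reference to the Office interop assemblies. This is a rough, untested outline rather than Tracker's sample code: it assumes Word is installed, that the PDF-XChange printer is already the default printer (the sample does this with `PDFPrinter.SetAsDefaultPrinter()`), and that the driver options were applied beforehand. The class and method names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class WordBatchPrinter
{
    // Keeps only .doc/.docx paths so Word is never asked to open anything else.
    public static List<string> WordFiles(IEnumerable<string> paths)
    {
        return paths.Where(p =>
        {
            string ext = Path.GetExtension(p);
            return ext.Equals(".doc", StringComparison.OrdinalIgnoreCase)
                || ext.Equals(".docx", StringComparison.OrdinalIgnoreCase);
        }).ToList();
    }

    // Starts Word once, prints every document to the default printer, then quits Word.
    public static void PrintAll(IEnumerable<string> paths)
    {
        Type wordType = Type.GetTypeFromProgID("Word.Application");
        if (wordType == null)
            throw new InvalidOperationException("Word does not appear to be installed.");

        dynamic word = Activator.CreateInstance(wordType);
        try
        {
            word.Visible = false;
            foreach (string path in WordFiles(paths))
            {
                // Arguments: FileName, ConfirmConversions = false, ReadOnly = true
                dynamic doc = word.Documents.Open(path, false, true);
                try
                {
                    doc.PrintOut(false);   // Background = false: return after the job is spooled
                }
                finally
                {
                    doc.Close(false);      // SaveChanges = false
                }
            }
        }
        finally
        {
            word.Quit(false);              // always release the single Word instance
        }
    }
}
```

Compared with starting a new `Process` with the "print" verb per file, this pays Word's start-up cost once per batch instead of once per document. How much that helps against a 2-second target would need to be measured.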

Post by **ase technologies** » Wed Apr 11, 2012 4:12 am

Stefan,
I will check out that link on Word automation that you sent me.

I'm looking at the Xchange driver API SDK and can't find much information on the conversion process. The PDFX4DRVAPI.PDF document tells me about all the methods and properties that are available to an IPXCPrinter, but little about the actual conversion process.

From the sample project I can see that the conversion is done via the printJob object, but I'm not able to follow it at that point.

Do you know what your other enterprise level customers have been doing for conversions? Are they doing something different?

Post by **Tracker Supp-Stefan** » Wed Apr 11, 2012 11:38 am

Hello Peter,

The Drivers API provides you with all the tools needed so that you can correctly generate the needed PDF file from your original input. The PDF generation process itself is deliberately a black box. If you want direct control over the construction of a PDF document, you can use the PDF Tools SDK (low level), but this will require significant knowledge of the PDF standard and a lot more coding.

The sample projects are designed to show you how to set some of the most commonly used parameters - and do a simple "print" - if you want something more sophisticated/optimized - you will need to handle your input (and the printing application) in a more direct way e.g. via Word Automation in your case.

Best,
Stefan

Post by **ase technologies** » Fri Apr 27, 2012 3:58 am

Have you guys looked into this at all or seen it before?

I've attached a bunch of our test files that all display this message during conversion.

ase technologies wrote:Walter,
We have continually been receiving a MS Word dialog box during the conversion. It says:

"It will not be possible to send PRINT field data to the printer with the currently installed printer driver. Do you want to continue printing?"

Do you have any idea what this is related to? Do I need to install a Tracker print driver or add any additional parameters? If I click the "yes" button, the file converts and prints just fine. We are trying to automate a large number of conversions, and would like to not have this dialog box.

Post by **ase technologies** » Mon Apr 30, 2012 5:14 am

Any updates on this?

Is it related to the PDF-XChange driver, or to how the driver handles print commands? We have embedded print commands in the Word docs that we're converting. In case you can't see them in the attached files from my last post, each document may contain one or more of the following:

PRINT27 "&l1H"
PRINT27 "&l2H"
PRINT27 "&l3H"
PRINT27 "&l4H"

Post by **Walter-Tracker Supp** » Tue May 01, 2012 4:24 pm

I will need to check with one of the driver experts (developers); I have an inkling of what's going on but I want to be sure. I'll get back to you as soon as I can get some time with him (this morning).

-Walter

Post by **Walter-Tracker Supp** » Tue May 01, 2012 4:46 pm

Okay, what I suspected (and have confirmed) is that our driver is not a PostScript driver, so you cannot send PostScript commands (they are ignored). Word is kindly informing you that our printer is not compatible, so you'll have to work with Word to determine how to ignore the embedded PostScript without popping up a dialog box.

PostScript is basically a scripting / programming language that some printers use to print documents. The PRINT command is just a PostScript command, which compatible printers can interpret. Our print driver is GDI based, not PostScript based.

Post by **ase technologies** » Tue May 01, 2012 5:59 pm

Walter,
Thank you for your response. Do you or your driver experts know if there is a way around this? Can we somehow fool Word with the Tracker API? Might there be a way to simulate that you're another type of driver, or somehow not allow responses from Word on print driver types?

This is basically a deal breaker with our client as we are looking to install on an Enterprise server with millions of documents to convert and we can't have someone click "yes" every time.

Post by **Walter-Tracker Supp** » Tue May 01, 2012 6:22 pm

ase technologies wrote: Walter,
Thank you for your response. Do you or your driver experts know if there is a way around this? Can we somehow fool Word with the Tracker API? Might there be a way to simulate that you're another type of driver, or somehow not allow responses from Word on print driver types?

This is basically a deal breaker with our client as we are looking to install on an Enterprise server with millions of documents to convert and we can't have someone click "yes" every time.

I don't know offhand of a way to pretend to support PostScript with our printer (nor do I think it would make any sense to implement this, because you generally don't want to pretend to have capabilities you don't); I would guess that the approach to take would be to go at the source (Word itself, or the document). Perhaps there's a way to strip embedded PostScript before printing? I'm not familiar with the MS Word API so you'd have to dig around there yourself. I can't imagine that they would not provide a way to do this, though.

You may be able to do it by defining a macro to strip field codes and then running this macro before printing, if there's no direct access through the Word API. I don't know though.

I found this description of such a macro:
http://www.techrepublic.com/article/rem ... ro/6170792
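
As an alternative to a macro, the same stripping could be attempted from C# through Word automation before the document is printed. This is an untested sketch under several assumptions: `doc` is a late-bound Word document object already opened by the caller, the fields of interest live in the main document body (`doc.Fields` does not cover headers or footers), and removing them is acceptable for the output. Whether this actually suppresses the dialog has not been confirmed in this thread. The class and method names are illustrative.

```csharp
using System;

public static class PrintFieldStripper
{
    // True when a field code is a PRINT field, e.g. PRINT27 "&l1H", but not PRINTDATE.
    public static bool IsPrintField(string fieldCode)
    {
        if (string.IsNullOrWhiteSpace(fieldCode)) return false;
        string code = fieldCode.Trim();
        return code.StartsWith("PRINT", StringComparison.OrdinalIgnoreCase)
            && !code.StartsWith("PRINTDATE", StringComparison.OrdinalIgnoreCase);
    }

    // Deletes PRINT fields from an open Word document and returns how many were removed.
    public static int RemovePrintFields(dynamic doc)
    {
        int removed = 0;
        // Walk backwards because deleting a field renumbers the ones after it.
        for (int i = doc.Fields.Count; i >= 1; i--)
        {
            dynamic field = doc.Fields.Item(i);   // Word collections are 1-based
            string code = field.Code.Text;
            if (IsPrintField(code))
            {
                field.Delete();
                removed++;
            }
        }
        return removed;
    }
}
```

If the document is afterwards closed without saving, the original file on disk keeps its PRINT fields and only the printed copy is affected.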
