What SDK should we use?

A forum for questions or concerns related to the PDF-XChange Core API SDK

Moderators: TrackerSupp-Daniel, Tracker Support, Vasyl-Tracker Dev Team, Chris - Tracker Supp, Sean - Tracker, Tracker Supp-Stefan

jvanlaethem
User
Posts: 29
Joined: Thu Oct 29, 2015 3:16 pm

Post by jvanlaethem »

We have the impression that your Core and Editor SDKs are the newest, and that the idea is for them to replace all the other SDKs. However, we find the Core and Editor SDKs not very well documented, and we can't find any .NET C# samples for the Core SDK.
Also, the product selection wizard doesn't include the Core and Editor SDKs.

This is at a high level what we need to do.

We develop in .NET C#
We need to view PDFs embedded in our own UI
We need to extract text from zones on a page
We need to extract images from zones on a page
We need to convert image based PDFs to OCR-ed image+text PDFs
We need to be able to crop (real crop, not change the boundaries) pages in a PDF
We need to be able to convert TIFF, JPG and PNG files to PDF
We need to be able to merge and split PDF files
Patrick-Tracker Supp
Site Admin
Posts: 1645
Joined: Thu Mar 27, 2014 6:14 pm
Location: Vancouver Island
Contact:

Re: What SDK should we use?

Post by Patrick-Tracker Supp »

Hello jvanlaethem,

Thank you for your consideration. The documentation for our new SDKs is admittedly a bit lacking right now. For your needs, it sounds like the Editor SDK is your only option.

Our new website is being launched soon to reflect the changes to our product line. As for examples and help, you have come to the right place. Though I am not a developer and cannot provide this myself, these forums are monitored by our development support team, who should get back to you within 24 hours (Monday to Friday) with answers to any questions you may have.

Cheers!
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.

Cheers,

Patrick Charest
Tracker Support North America
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am
Contact:

Post by Sasha - Tracker Dev Team »

Hi jvanlaethem,

The Core SDK is part of the Editor SDK. As Patrick said, the Editor SDK is what you need: everything you listed can be done with it. For more technical information on the Editor SDK, please visit https://www.pdf-xchange.com/editor-sdk-more-info.
As for samples, there is a C# sample project for the Editor SDK, which you can find here: https://github.com/tracker-software/PDF ... DKExamples.

HTH
Subscribe at:
https://www.youtube.com/channel/UC-TwAMNi1haxJ1FX3LvB4CQ

Post by jvanlaethem »

Hello Sasha, Thank you for your help,

We've already been taking a look at the sample code of the Editor SDK. But we haven't found a way to disable all editing features.

Does the viewer control have a property to disable all editing features? We tried to hide the menu and the editing options in the ribbon, but some options we don't need still appear in the right-click menu: "Add bookmarks", "Properties"... We'd also like to avoid other editing-related behaviour: 1) opening a PDF/A file shows a message asking to "Enable Editing"; 2) hyperlinks still work; 3) a message "At least one signature requires validating" may show...

From a UI point of view, we need a viewer, where we'd also like to let the user draw a selection rectangle, similarly to the Snapshot tool. The selection would be used in one of the following ways:
1) to extract the text or image automatically
2) save the rectangle coordinates, show it on some other pages while browsing the document, and process multiple batches of documents unattended.

We'd also like to customize the icons, and use our own toolbar.

Jean

Maybe we should use this thread as the main discussion thread.

Post by Sasha - Tracker Dev Team »

Maybe we should delete other topics? :wink:

For the disabling of the editing features look here:
https://forum.pdf-xchange.com/ ... 66&t=24163

As for the text extraction from the rectangle do experiment with this code:

Code: Select all

public bool HasIntersect(ref PDFXEdit.PXC_Rect r1, ref PDFXEdit.PXC_Rect r2)
{
       if ((r1.left >= r2.right) || (r1.right <= r2.left))
              return false;
       if ((r1.top <= r2.bottom) || (r1.bottom >= r2.top))
              return false;
       return true;
}

// obtain text that is covered by rcTest rectangle on the first page 

string textInTestBox = "";

PDFXEdit.PXC_Rect rcTest;
rcTest.top = 500;
rcTest.right = 300;
rcTest.left = rcTest.right - 200;
rcTest.bottom = rcTest.top - 100; 

PDFXEdit.IPXV_Document doc = pdfCtl.Doc;
PDFXEdit.IPXC_PageText text = doc.CoreDoc.Pages[0].GetText(null); 

const uint undefVal = 0xffffffff; 

uint cnt = text.CharCount;
uint lastLineIndex = undefVal;
uint lastCharIndex = undefVal;
for (uint i = 0; i < cnt; i++)
{
       PDFXEdit.PXC_Rect rcChar = text.CharRect[i];
       if (HasIntersect(ref rcChar, ref rcTest))
       {
              uint lineIndex = text.CharLineIndex[i];
              if (lastLineIndex != lineIndex)
              {
                      if (lastLineIndex != undefVal)
                             textInTestBox += "\r\n";
                      lastLineIndex = lineIndex;
              }
              else if (lastCharIndex != undefVal && i - lastCharIndex > 1)
              {
                      // characters were skipped since the last match (they fell
                      // outside rcTest), so separate the two runs with one space
                      textInTestBox += " ";
              }

              lastCharIndex = i;
              textInTestBox += text.Char[i];
       }
}

Post by jvanlaethem »

Thank you for your reply,

However two questions remained unanswered. Could you please answer these:

1) From a UI point of view, we need a viewer, where we'd also like to let the user draw a selection rectangle, similarly to the Snapshot tool. We need to save the rectangle coordinates, show it on some other pages while browsing the document, and process multiple batches of documents unattended.

2) We'd also like to customize the icons, and use our own toolbar.

Jean

Post by Sasha - Tracker Dev Team »

Hi Jean,

As for the command customization do check this topic:
https://forum.pdf-xchange.com/ ... 66&t=24171

As for the first question, it would help if you could describe what you need at the source-code level and divide it into logical parts. From what I can see, for starters you'll need to create your own command that turns the ability to draw that rectangle on and off. Am I right? For that, you can check the topic linked above. Then we'll move on to drawing on the view. :wink:

HTH

Post by jvanlaethem »

Regarding the text extraction:
- with my sample files, page.GetDimension() returns the size in points. Is this always true? If no, how do I know the unit?
- text.CharRect[] doesn't seem to be in points. What is its unit? Does it always use the same unit?

Jean

Post by Sasha - Tracker Dev Team »

Hi Jean,

Both of these return values in points (1/72 inch). Note that PDF uses a coordinate system whose Y axis runs from bottom to top. The bottom coordinate of a page is not always 0: it can be a positive or a negative number.
If you want further assistance, please provide a code sample that reproduces your problem, together with the file where the problem occurs.
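
As a small illustration of the two points above (lengths are in points, and Y grows upwards), the following plain C# sketch converts a length in points to pixels and turns a PDF Y coordinate into a top-down pixel row. `PdfUnits` and its method names are hypothetical helpers, not part of the SDK, and the sketch assumes an unrotated page whose top edge is known.

```csharp
using System;

static class PdfUnits
{
    // One PDF point is 1/72 inch.
    public const double PointsPerInch = 72.0;

    // Converts a length in points to pixels at the given resolution.
    public static double PointsToPixels(double points, double dpi)
    {
        return points / PointsPerInch * dpi;
    }

    // Converts a PDF y coordinate (measured upwards) into a pixel row
    // measured downwards from the top edge of the page (pageTop, in points).
    public static double PdfYToPixelRow(double y, double pageTop, double dpi)
    {
        return PointsToPixels(pageTop - y, dpi);
    }
}
```

For a page whose top edge is at 792 pt, the top of the page maps to row 0 and y = 0 maps to the last row of a 96 DPI image.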

HTH

Post by jvanlaethem »

Thank you Sasha,

How do I then know the value of the bottom of a page?
Is the left of a page always 0?

Jean

Post by Sasha - Tracker Dev Team »

Hi Jean,

Code: Select all

	//Getting desired page from document
	PDFXEdit.IPXC_Page pPage = pDoc.Pages[0];
	//Media box represents page's physical size
	PDFXEdit.PXC_Rect rcMedia = pPage.get_Box(PDFXEdit.PXC_BoxType.PBox_MediaBox);
	//Page box represents page's visible size (for example when we have cropped pages)
	PDFXEdit.PXC_Rect rcPageBox = pPage.get_Box(PDFXEdit.PXC_BoxType.PBox_PageBox);
The left of the page can also differ - it all depends on the page's boxes.

HTH

Post by jvanlaethem »

Hi Sasha,

In the sample text extraction code above, what is the page origin (bottom, left) used for text.CharRect[], i.e. what offset should be applied to rcTest? Is it rcMedia or rcPageBox?

Jean
Tracker Supp-Stefan
Site Admin
Posts: 17824
Joined: Mon Jan 12, 2009 8:07 am
Location: London
Contact:

Post by Tracker Supp-Stefan »

Hello Jean,

Normally the lower-left corner of the page has the coordinates (0, 0), but that is not mandatory. So, as Sasha said, please provide sample code and a sample file so that we can test with the same input as you and see why the issues occur.

Regards,
Stefan

Post by jvanlaethem »

Hi Stefan

I don't have any problem with a particular file. I thought that the unit of the text.CharRect[] might not be points. But I was wrong.

What I'm trying to do is a 2 step process:
1) A program showing a sample pdf and allowing the user to draw a selection rectangle on a page. The coordinates of that selection are then saved.
2) A program processing multiple pdf's unattended. The processing consists of extracting the text at the selection that was saved in 1). This is done by a background task, without UI.

Step 1) is implemented, by rasterizing the pages using PDFDoc.DrawPageToDC(...).

I'm still working on step 2 and I just need to apply the selection. For that, I need to use the same unit and origin as the values returned by page.GetText(null).CharRect[]. But the sample code that Sasha provided doesn't say anything about that. He then made it clear that the unit is points, just like in page.GetDimension(). But I'm confused after he wrote "bottom coordinate will not always be 0". This is why I asked "what is the page origin (bottom, left) used by text.CharRect[]?". I'm still confused after you added that "Normally it's the lower left corner of the page that is with 0,0 coordinates but that is not mandatory".

How do I match a selection rectangle to the values returned by CharRect[]? Knowing this, I'd be able to extract the text at the specified selection.

I also have another question. My test program is based on your CSharp FullDemo. This is a UI based application, but my program retrieving the text has no UI. How could I retrieve the text of a page without UI control?

Thank you

Jean

Post by Sasha - Tracker Dev Team »

Hi jvanlaethem,

This is the PDF coordinate system:
Image

The page itself can be situated in different coordinates on the PDF coordinate grid:
Image

So the media box of the left page starts at the point (80, 87). Thus the bottom-left corner of the text is at approx. (90, 110) and the top-right corner at approx. (450, 650).
CharRect[] is also expressed in the PDF coordinate system. Your selection rectangle is a visible rectangle on the page. To transform screen coordinates to PDF coordinates, you'll need to use a ScreenToPage matrix.
The problem is that we don't have the DrawPageToDC in the Editor SDK or in Core API SDK - it's from our old SimpleViewerSDK. What SDK exactly are you using?
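
The SDK's own ScreenToPage matrix is not shown in this thread, so the following is only a minimal sketch of the underlying arithmetic in plain C#. It assumes an unrotated page that was rendered from a known page box at a uniform scale; `PdfRect`, `SelectionMapper` and `PixelsToPdf` are hypothetical names, not SDK types.

```csharp
using System;

// A rectangle in PDF coordinates (points, y growing upwards).
struct PdfRect { public double left, bottom, right, top; }

static class SelectionMapper
{
    // pxLeft/pxTop/pxRight/pxBottom: selection in image pixels, y growing downwards.
    // pageBox: the box that was rendered, in points.
    // pixelsPerPoint: the scale used when rendering, e.g. dpi / 72 * zoom.
    public static PdfRect PixelsToPdf(double pxLeft, double pxTop, double pxRight, double pxBottom,
                                      PdfRect pageBox, double pixelsPerPoint)
    {
        PdfRect r;
        r.left = pageBox.left + pxLeft / pixelsPerPoint;
        r.right = pageBox.left + pxRight / pixelsPerPoint;
        // The image's top row corresponds to the top edge of the page box.
        r.top = pageBox.top - pxTop / pixelsPerPoint;
        r.bottom = pageBox.top - pxBottom / pixelsPerPoint;
        return r;
    }
}
```

The resulting rectangle is in the same coordinate system as CharRect[], so it could be used as rcTest in the text extraction code earlier in the thread.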

Post by jvanlaethem »

1) I understand that the origin of the page, as used by the coordinates returned by CharRect[], is always 0, 0. MediaBox is something else that doesn't need to be considered when parsing CharRect[].

2) PXCV_DrawPageToDC comes from pxcview.dll (PXCView36.sln).
What current SDK should I use to rasterize an image? How should I proceed? In its simplest form, it could look like:

Code: Select all

Bitmap GetPageBitmap(String fileName, UInt32 pageIndex)
{
    using (PDFDoc pdf = new PDFDoc())
    {
        pdf.OpenDocFromPath(fileName);
        Bitmap bitmap = pdf.GetPageBitmap(pageIndex, 200);
        return bitmap;
    }
}
3) How could I retrieve the text of a page without using a UI control? What I need is something that looks like this:

Code: Select all

String GetPageSelectionText(String fileName, UInt32 pageIndex, RectF selection)
{
    using (PDFDoc pdf = new PDFDoc())
    {
        pdf.OpenDocFromPath(fileName);
        Page page = pdf.GetPage(pageIndex);
        Double width, height;
        page.GetDimension(out width, out height);
        PageText text = page.GetText();
        // ...the sample text extraction code that you provided earlier, where rcTest would be initialized from selection using width and height.
        return textInTestBox;
    }
}

Post by Sasha - Tracker Dev Team »

Hi jvanlaethem,

1) It depends on the rectangle that you need to draw. Note that pages in files can vary in size and in Media Box placement. Because of that, you'll need to work out which position of the drawn rectangle to use when applying it to multiple files. For example, you can calculate displacements or proportions from the current page's Media Box and apply these results to the selection rectangle on other pages. It all depends on what you need.

3) You don't need a UI control for that. To open a document, check this topic
https://forum.pdf-xchange.com/ ... 67&t=24299
And then, having a core document, you can use the code described above.
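
The "proportions" option mentioned in 1) can be sketched as follows: store the selection as fractions of the box it was drawn on, then scale it onto the box of another page. This is plain C# with a hypothetical `Box` struct and `RelativeSelection` helper, not SDK code; which box to use (MediaBox or another) remains the choice discussed above.

```csharp
using System;

// A rectangle in PDF coordinates (points, y growing upwards).
struct Box { public double left, bottom, right, top; }

static class RelativeSelection
{
    // Expresses sel as fractions (0..1) of the page box it was drawn on.
    public static Box ToFractions(Box sel, Box page)
    {
        double w = page.right - page.left;
        double h = page.top - page.bottom;
        Box f;
        f.left = (sel.left - page.left) / w;
        f.right = (sel.right - page.left) / w;
        f.bottom = (sel.bottom - page.bottom) / h;
        f.top = (sel.top - page.bottom) / h;
        return f;
    }

    // Maps stored fractions back to absolute coordinates on another page box.
    public static Box FromFractions(Box f, Box page)
    {
        double w = page.right - page.left;
        double h = page.top - page.bottom;
        Box r;
        r.left = page.left + f.left * w;
        r.right = page.left + f.right * w;
        r.bottom = page.bottom + f.bottom * h;
        r.top = page.bottom + f.top * h;
        return r;
    }
}
```

The fractions could be saved by the interactive setup program and read back by the unattended one, which then calls `FromFractions` once per page.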

Post by jvanlaethem »

Sasha,

1) There are two different things:
- identifying the offset(s) to apply to the text extraction
- applying the(se) offset(s)

I can easily do the second part. For instance, in the sample code, applying rcPageBox to rcTest to extract the text located in the top half of a page would be:

Code: Select all

                Double width;
                Double height;
                page.GetDimension(out width, out height);
                PXC_Rect rcTest = new PXC_Rect
                {
                    top = height,
                    bottom = height / 2,
                    left = 0,
                    right = width,
                };
                PXC_Rect rcPageBox = page.get_Box(PXC_BoxType.PBox_PageBox);
                rcTest.top += rcPageBox.bottom;
                rcTest.left += rcPageBox.left;
                rcTest.bottom += rcPageBox.bottom;
                rcTest.right += rcPageBox.left;
My problem is more about knowing what offset(s) to apply to be sure the code works with any pdf: MediaBox, PageBox, ViewBox... What does your tool kit consider in text.CharRect?

2) Is there a way with any of your current SDK's to rasterize an image?

3) Is implemented, thank you.

Jean

Post by Sasha - Tracker Dev Team »

1) You can, for example, store offsets relative to the PageBox boundaries (that is, deltas from the actually visible part of the page), so the same deltas can be used on pages of different sizes; the rectangle itself will then change size with the page. Or, if you don't need proportional resizing, recalculate the rectangle from the MediaBox coordinates (for example, save the offsets from the top-left corner of the MediaBox and then just move the whole rectangle without resizing it).
In short, the difference between PageBox and MediaBox is that the PageBox is, in general, the intersection of the CropBox and the MediaBox, meaning the visible part of the page.
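
The SDK returns the PageBox directly through get_Box, so the following is not needed in practice; it is only a sketch of the geometry behind "intersection of CropBox and MediaBox". `RectPt` and `PageBoxes.Intersect` are hypothetical names in plain C#.

```csharp
using System;

// A rectangle in PDF coordinates (points, y growing upwards).
struct RectPt { public double left, bottom, right, top; }

static class PageBoxes
{
    // Returns the overlap of two rectangles: the larger of the two
    // left/bottom edges and the smaller of the two right/top edges.
    public static RectPt Intersect(RectPt a, RectPt b)
    {
        RectPt r;
        r.left = Math.Max(a.left, b.left);
        r.bottom = Math.Max(a.bottom, b.bottom);
        r.right = Math.Min(a.right, b.right);
        r.top = Math.Min(a.top, b.top);
        return r;
    }
}
```

A crop box that extends past the media box is therefore clipped to it, which is why the visible page can start at a left or bottom coordinate other than 0.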

2) It took me quite some time, but I've implemented the GetBitmap function for you. It takes the page's rotation, the DPI and the zoom into account, and relies on several helper functions for the calculations.

Code: Select all

		void UpdateMinMax(double v, ref double vMin, ref double vMax)
		{
			if (vMin > v)
				vMin = v;
			if (vMax < v)
				vMax = v;
		}

		private void Transform(PDFXEdit.PXC_Matrix m1, ref double x, ref double y)
		{
			double tx = x;
			x = (tx * m1.a + y * m1.c + m1.e);
			y = (tx * m1.b + y * m1.d + m1.f);
		}

		private void TransformRect(PDFXEdit.PXC_Matrix m1, ref PDFXEdit.PXC_Rect rcPageBox)
		{
			double x = rcPageBox.left;
			double y = rcPageBox.bottom;
			Transform(m1, ref x, ref y);
			double x1 = x;
			double x2 = x;
			double y1 = y;
			double y2 = y;
			x = rcPageBox.left;
			y = rcPageBox.top;
			Transform(m1, ref x, ref y);
			UpdateMinMax(x, ref x1, ref x2);
			UpdateMinMax(y, ref y1, ref y2);
			x = rcPageBox.right;
			y = rcPageBox.top;
			Transform(m1, ref x, ref y);
			UpdateMinMax(x, ref x1, ref x2);
			UpdateMinMax(y, ref y1, ref y2);
			x = rcPageBox.right;
			y = rcPageBox.bottom;
			Transform(m1, ref x, ref y);
			UpdateMinMax(x, ref x1, ref x2);
			UpdateMinMax(y, ref y1, ref y2);
			rcPageBox.left = x1;
			rcPageBox.right = x2;
			rcPageBox.bottom = y1;
			rcPageBox.top = y2;
		}

		private PDFXEdit.PXC_Matrix Multiply(PDFXEdit.PXC_Matrix m1, PDFXEdit.PXC_Matrix m2)
		{
			double t0 = (double)(m1.a * m2.a + m1.b * m2.c);
			double t2 = (double)(m1.c * m2.a + m1.d * m2.c);
			double t4 = (double)(m1.e * m2.a + m1.f * m2.c + m2.e);
			m1.b = (double)(m1.a * m2.b + m1.b * m2.d);
			m1.d = (double)(m1.c * m2.b + m1.d * m2.d);
			m1.f = (double)(m1.e * m2.b + m1.f * m2.d + m2.f);
			m1.a = t0;
			m1.c = t2;
			m1.e = t4;
			return m1;
		}

		private Bitmap GetBitmap(PDFXEdit.IPXC_Document pDoc, UInt32 nPageNumber)
		{
			//DPI of the resulting bitmap
			const double cDPI = 96.0; //96 DPI
			//Zoom of the resulting bitmap
			const double cZoom = 1.5; //150%

			PDFXEdit.IPXC_Page pPage = pDoc.Pages[nPageNumber];
			PDFXEdit.PXC_Rect rcPageBox = pPage.get_Box(PDFXEdit.PXC_BoxType.PBox_PageBox);
			//Applying page matrix to page box
			TransformRect(pPage.Matrix, ref rcPageBox);
			//Page proportions in pt
			double nPageWidth = rcPageBox.right - rcPageBox.left;
			double nPageHeight = rcPageBox.top - rcPageBox.bottom;
			//Getting image proportions in px
			int nImageWidth = (int)Math.Round((nPageWidth / 72.0) * cDPI * cZoom);
			int nImageHeight = (int)Math.Round((nPageHeight / 72.0) * cDPI * cZoom);

			PDFXEdit.tagRECT rcImgRect;
			rcImgRect.left = 0;
			rcImgRect.right = nImageWidth;
			rcImgRect.top = 0;
			rcImgRect.bottom = nImageHeight;

			int nStride = nImageWidth * 4;
			int nSize = nStride * nImageHeight;
			//Allocating buffer for our new bitmap
			IntPtr pBuffer = Marshal.AllocHGlobal(nSize);
			double nKX = nImageWidth / nPageWidth;
			double nKY = nImageHeight / nPageHeight;
			//PDF Matrix for zoom and flip
			PDFXEdit.PXC_Matrix mZoomAndFlipMatrix;
			mZoomAndFlipMatrix.a = nKX;
			mZoomAndFlipMatrix.b = 0;
			mZoomAndFlipMatrix.c = 0;
			mZoomAndFlipMatrix.d = -nKY;
			mZoomAndFlipMatrix.e = 0;
			mZoomAndFlipMatrix.f = nImageHeight;
			//Now we need to multiply it to the page matrix that has rotation included
			PDFXEdit.PXC_Matrix mResMatrix = Multiply(pPage.Matrix, mZoomAndFlipMatrix);

			//Drawing page to memory buffer
			pPage.DrawToMemory(pBuffer, nStride, PDFXEdit.PXC_DrawFormat.kDrawFormat_BGRA, ref rcImgRect, ref mResMatrix);
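			//Creating new bitmap from buffer. The Bitmap does not take ownership of pBuffer:
			//release it with Marshal.FreeHGlobal only after the Bitmap has been disposed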
			Bitmap bmp = new Bitmap(nImageWidth, nImageHeight, nStride, System.Drawing.Imaging.PixelFormat.Format32bppArgb, pBuffer);
			return bmp;
		}
3) Glad to hear that

HTH =)
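
One thing to watch in GetBitmap above: the buffer allocated with Marshal.AllocHGlobal is never freed, and a Bitmap created from a scan0 pointer leaves allocating and freeing that memory to the caller. A minimal sketch of one way to keep track of it is a small disposable wrapper. `PixelBuffer` is a hypothetical helper, not part of the SDK, and it assumes 4 bytes per pixel as in the BGRA format used above.

```csharp
using System;
using System.Runtime.InteropServices;

// Owns an unmanaged 32-bit-per-pixel buffer and frees it on Dispose.
sealed class PixelBuffer : IDisposable
{
    public IntPtr Pointer { get; private set; }
    public int Stride { get; }
    public int Size { get; }

    public PixelBuffer(int width, int height)
    {
        // checked: fail loudly instead of allocating a too-small buffer on overflow
        Stride = checked(width * 4);
        Size = checked(Stride * height);
        Pointer = Marshal.AllocHGlobal(Size);
    }

    public void Dispose()
    {
        if (Pointer != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(Pointer);
            Pointer = IntPtr.Zero; // makes a second Dispose call harmless
        }
    }
}
```

One way to use it would be to render into `Pointer`, wrap it in a temporary Bitmap, copy that into a second Bitmap that owns its pixels (for example with the Bitmap(Image) constructor), and then dispose both the temporary Bitmap and the buffer.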

Post by jvanlaethem »

We now have the PDF viewer with the custom icons and extra features we need: selection tool, highlight,... and the text extraction running unattended.

Thank you very much, Sasha and Stefan, for your time and help.

FYI, using a DPI other than 96 requires setting the resolution of the bitmap so that it stays consistent:

Code: Select all

Bitmap bmp = new Bitmap(nImageWidth, nImageHeight, nStride, System.Drawing.Imaging.PixelFormat.Format32bppArgb, pBuffer);
bmp.SetResolution((float)cDPI, (float)cDPI);
return bmp;
Jean

Post by Sasha - Tracker Dev Team »

Hello Jean,

Glad to hear that and thanks for the code update :wink:

Cheers,
Alex

Post by jvanlaethem »

Sasha,

Do you have C# sample code to convert image-based PDFs to text-searchable PDFs?
Similarly to the text extraction, this would be a 2-step process:
1) setup to select the options: language(s), accuracy...
2) background, unattended conversion of multiple files

Jean

Post by Sasha - Tracker Dev Team »

Hi Jean,

Currently the OCR API is being drastically changed so the functionality that you need will be available in the nearest future builds.

Cheers,
Alex

Post by jvanlaethem »

Thank you, please let me know when the sdk is available for download.

With your SDK, how could I convert pdf files to pdf/a?

Jean

Post by Sasha - Tracker Dev Team »

capturebites bvba
User
Posts: 7
Joined: Wed Sep 02, 2015 10:41 am

Post by capturebites bvba »

Hi Sasha,

Last time we checked for conversion of image based PDF's to text searchable PDF's (30 NOV 2015), the reply was that the OCR API was being drastically changed so the functionality that we would need would be available in the nearest future builds.

Do you have any update on that?

The original question was:

Do you have a c# sample code to convert image based PDF's to text searchable PDF's.
Similarly to the text extraction, this would be a 2-step process:
1) setup to select the options: language(s), accuracy...
2) background, unattended conversion of multiple files

Post by Tracker Supp-Stefan »

Hello capturebites bvba,

We are planning to get the new OCR SDK module (based on the Editor) out for build 318 if all goes well (317 is planned for release next Monday).

Please note that the new OCR SDK will only work with an Editor SDK license. We do have a current OCR SDK package, that you can use if you have an Editor or PRO SDK license, but please note that this is a bit older and different than what the new OCR will be.

Regards,
Stefan

Post by capturebites bvba »

Can you tell me when build 318 with the new OCR SDK module will be available?

Post by Tracker Supp-Stefan »

Hello capturebites bvba,

We do not have a specific date for this yet - but our builds are usually 8-12 weeks apart.

Regards,
Stefan
Tom Princen
User
Posts: 83
Joined: Wed Mar 25, 2015 10:15 am

Post by Tom Princen »

Is OCR already available in the Core API?
And export to PDF/A?

Post by Sasha - Tracker Dev Team »

Hello Tom,

OCR is not available in the Core API and won't be.
Export to PDF/A is also unavailable. Only PDF/A creation (as a new document) is possible, which is described in one of the forum topics.

Cheers,
Alex

Post by Tom Princen »

do you have the link?

Post by Sasha - Tracker Dev Team »


Post by capturebites bvba »

Hi Sasha,

Last time we checked for conversion of image based PDF's to text searchable PDF's (30 NOV 2015), the reply was that the OCR API was being drastically changed so the functionality that we would need would be available in the nearest future builds.

Do you have any update on that?

The original question was:

Do you have a c# sample code to convert image based PDF's to text searchable PDF's.
Similarly to the text extraction, this would be a 2-step process:
1) setup to select the options: language(s), accuracy...
2) background, unattended conversion of multiple files

Post by Sasha - Tracker Dev Team »

Hello capturebites bvba,

The new OCR SDK has not yet been published.

Cheers,
Alex

Post by capturebites bvba »

When is it planned for?

Post by Tracker Supp-Stefan »

Hello capturebites bvba,

For now the plan is to release the new OCR SDK with the next build (321) of our products.
While internally it will be based on different code - the available methods will at first be exactly the same as in the current one, so you can download and start working with the existing one, and even develop with it - and with the next build the transition to the new OCR SDK will be pretty straight forward.

Regards,
Stefan

Post by capturebites bvba »

I suppose, we are back to the original question then:

Do you have a c# sample code to convert image based PDF's to text searchable PDF's.
This would be a 2-step process:
1) setup to select the options: language(s), accuracy...
2) background, unattended conversion of multiple files

Post by Tracker Supp-Stefan »

Hello capturebites bvba,

Please download and install this package:
https://www.pdf-xchange.com/produc ... ge-pro-sdk
You will then find the OCR sample projects under:
Pro SDK installation folder\Examples\OcrSDKExamples\C#Examples

Regards,
Stefan

Post by jvanlaethem »

Sasha,

I need to also apply a rotation of a multiple of 90° to the image. How could this be done with the zoom and flip matrix?

Jean

Post by Sasha - Tracker Dev Team »

Hello jvanlaethem,

What SDK are you using?

Cheers,
Alex

Post by jvanlaethem »

We are currently using the core api.

I just hoped that changing some setting in the matrix in the code for GetBitmap you provided earlier might do the trick.

Jean

Post by Sasha - Tracker Dev Team »

Hello Jean,

Check the https://sdkhelp.pdf-xchange.com/vie ... MathHelper interface of the auxInst that contains methods to work with matrices.
Also, to understand matrices more, read the PDF Specification (these chapters):
8.3 Coordinate System
8.4 Graphics State
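
For orientation, here is a minimal sketch of what a rotation by a multiple of 90° looks like as a PDF-style matrix (a b c d e f, with x' = a·x + c·y + e and y' = b·x + d·y + f, the same convention as the Transform helper earlier in the thread). It is plain C# with a hypothetical `Mtx` struct and `QuarterTurn` class. It does not use the SDK's MathHelper interface, and how it should be combined with the page matrix in GetBitmap is an assumption that would need testing.

```csharp
using System;

struct Mtx { public double a, b, c, d, e, f; }

static class QuarterTurn
{
    // Clockwise rotation by quarterTurns * 90 degrees of a width x height page,
    // translated so that the rotated page again starts at (0, 0).
    public static Mtx Rotation(int quarterTurns, double width, double height)
    {
        Mtx m = new Mtx();
        switch (((quarterTurns % 4) + 4) % 4)
        {
            case 0: m.a = 1; m.d = 1; break;                              // (x, y) -> (x, y)
            case 1: m.b = -1; m.c = 1; m.f = width; break;                // (x, y) -> (y, width - x)
            case 2: m.a = -1; m.d = -1; m.e = width; m.f = height; break; // (x, y) -> (width - x, height - y)
            default: m.b = 1; m.c = -1; m.e = height; break;              // (x, y) -> (height - y, x)
        }
        return m;
    }

    // Applies m to the point (x, y) in place.
    public static void Apply(Mtx m, ref double x, ref double y)
    {
        double tx = x;
        x = tx * m.a + y * m.c + m.e;
        y = tx * m.b + y * m.d + m.f;
    }
}
```

For an odd number of quarter turns the rotated page is height wide and width tall, so the image width and height would have to be swapped. Presumably such a matrix would be multiplied in before the zoom-and-flip matrix.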

Cheers,
Alex

Post by jvanlaethem »

Sasha,

I don't see how to call MathHelper.Matrix_Rotate() from the root IPXC_Inst.

Jean

Post by Sasha - Tracker Dev Team »

Hello Jean,

Please read my post again:
interface of the auxInst
And also look at wiki:
[Attachment: Capture8.PNG]
Cheers,
Alex

Post by jvanlaethem »

Sasha,

Thank you. So let's come back to the question: "I need to also apply a rotation of a multiple of 90° to the image." How should the sample code of your post #19 be updated to handle this?

Jean

Post by Sasha - Tracker Dev Team »

Hello Jean,

Have you tried experimenting on your own? We don't write code directly as part of support. We can help or advise, though, if you are struggling with something.

Cheers,
Alex