What SDK should we use?
-
- User
- Posts: 29
- Joined: Thu Oct 29, 2015 3:16 pm
What SDK should we use?
We have the impression that your Core & Editor SDKs are the newest, and that the idea is for them to replace all the other SDKs. However, we find the Core & Editor SDKs not very well documented, and we can't find any .NET C# samples for the Core SDK.
Also, the product selection wizard doesn't include the Core & Editor SDKs.
At a high level, this is what we need to do:
We develop in .NET C#
We need to view PDFs embedded in our own UI
We need to extract text from zones on a page
We need to extract images from zones on a page
We need to convert image-based PDFs to OCR-ed image+text PDFs
We need to be able to crop pages in a PDF (a real crop, not just a change of the boundaries)
We need to be able to convert TIFF, JPG and PNG files to PDF
We need to be able to merge and split PDF files
- Patrick-Tracker Supp
- Site Admin
- Posts: 1645
- Joined: Thu Mar 27, 2014 6:14 pm
- Location: Vancouver Island
- Contact:
Re: What SDK should we use?
Hello jvanlaethem,
Thank you for your consideration. The documentation for our new SDKs is admittedly a bit lacking right now. For your needs, it sounds like the Editor SDK is your only option.
Our new website is launching soon to reflect the changes to our product line. As for examples and help, you have come to the right place. Though I am not a developer and cannot provide these myself, these forums are monitored by our development support team, who should get back to you within 24 hours (Monday to Friday) with answers to any questions you may have.
Cheers!
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.
Cheers,
Patrick Charest
Tracker Support North America
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
- Contact:
Re: What SDK should we use?
Hi jvanlaethem,
The Core SDK is a part of the Editor SDK. As Patrick said, the Editor SDK is what you need; it will let you do the things you listed. For more technical information on the Editor SDK, please visit https://www.pdf-xchange.com/editor-sdk-more-info.
As for samples, there is a C# sample project for the Editor SDK, which you can find here: https://github.com/tracker-software/PDF ... DKExamples.
HTH
Subscribe at:
https://www.youtube.com/channel/UC-TwAMNi1haxJ1FX3LvB4CQ
-
- User
- Posts: 29
- Joined: Thu Oct 29, 2015 3:16 pm
Re: What SDK should we use?
Hello Sasha, Thank you for your help,
We've already been taking a look at the sample code of the Editor SDK. But we haven't found a way to disable all editing features.
Does the viewer control have a property to disable all editing features? We tried to hide the menu and the editing options in the ribbon, but some options we don't need still appear in the right-click menu: "Add bookmarks", "Properties"... We'd also like to avoid other editing-related behavior: 1) opening a PDF/A file shows a message asking to "Enable Editing", 2) hyperlinks still work, 3) a message "At least one signature requires validating" may show...
From a UI point of view, we need a viewer in which the user can also draw a selection rectangle, similar to the Snapshot tool. The selection would be used in one of the following ways:
1) to extract the text or image automatically
2) save the rectangle coordinates, show it on some other pages while browsing the document, and process multiple batches of documents unattended.
We'd also like to customize the icons, and use our own toolbar.
Jean
Maybe we should use this thread as the main discussion thread.
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
- Contact:
Re: What SDK should we use?
Maybe we should delete other topics?
To disable the editing features, look here:
https://forum.pdf-xchange.com/ ... 66&t=24163
As for extracting the text inside a rectangle, do experiment with this code:
Code: Select all
public bool HasIntersect(ref PDFXEdit.PXC_Rect r1, ref PDFXEdit.PXC_Rect r2)
{
    // PDF coordinates: the Y axis points up, so top > bottom
    if ((r1.left >= r2.right) || (r1.right <= r2.left))
        return false;
    if ((r1.top <= r2.bottom) || (r1.bottom >= r2.top))
        return false;
    return true;
}

// obtain text that is covered by rcTest rectangle on the first page
string textInTestBox = "";
PDFXEdit.PXC_Rect rcTest;
rcTest.top = 500;
rcTest.right = 300;
rcTest.left = rcTest.right - 200;
rcTest.bottom = rcTest.top - 100;
PDFXEdit.IPXV_Document doc = pdfCtl.Doc;
PDFXEdit.IPXC_PageText text = doc.CoreDoc.Pages[0].GetText(null);
const uint undefVal = 0xffffffff;
uint cnt = text.CharCount;
uint lastLineIndex = undefVal;
uint lastCharIndex = undefVal;
for (uint i = 0; i < cnt; i++)
{
    PDFXEdit.PXC_Rect rcChar = text.CharRect[i];
    if (HasIntersect(ref rcChar, ref rcTest))
    {
        uint lineIndex = text.CharLineIndex[i];
        if (lastLineIndex != lineIndex)
        {
            // a new text line starts: break the output line
            if (lastLineIndex != undefVal)
                textInTestBox += "\r\n";
            lastLineIndex = lineIndex;
        }
        else if (lastCharIndex != undefVal && i - lastCharIndex > 1)
        {
            // characters on this line were skipped since the last match:
            // separate the two runs with a single space
            textInTestBox += " ";
        }
        lastCharIndex = i;
        textInTestBox += text.Char[i];
    }
}
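The overlap test in the code above can be tried without the SDK. The following is only a minimal sketch, not part of the Editor SDK: `BoxRect` is a hypothetical stand-in for PDFXEdit.PXC_Rect, and `ContainsCenter` is an assumed, stricter alternative (not from this thread) that accepts a character only when the center of its rectangle lies inside the selection, so characters that merely graze the selection edge are left out.

```csharp
using System;

// Hypothetical stand-in for PDFXEdit.PXC_Rect so the logic runs without the SDK.
public struct BoxRect
{
    public double left, bottom, right, top;

    public BoxRect(double left, double bottom, double right, double top)
    {
        this.left = left;
        this.bottom = bottom;
        this.right = right;
        this.top = top;
    }
}

public static class RectOverlap
{
    // Same logic as HasIntersect in the post: Y points up, so top > bottom.
    // Rectangles that only touch along an edge do not count as overlapping.
    public static bool HasIntersect(BoxRect r1, BoxRect r2)
    {
        if (r1.left >= r2.right || r1.right <= r2.left)
            return false;
        if (r1.top <= r2.bottom || r1.bottom >= r2.top)
            return false;
        return true;
    }

    // Assumed alternative: true only if the center of 'ch' is strictly inside 'sel'.
    public static bool ContainsCenter(BoxRect ch, BoxRect sel)
    {
        double cx = (ch.left + ch.right) / 2.0;
        double cy = (ch.bottom + ch.top) / 2.0;
        return cx > sel.left && cx < sel.right && cy > sel.bottom && cy < sel.top;
    }
}
```

With the rcTest values from the post (left 100, bottom 400, right 300, top 500), a character rectangle that straddles the right edge passes `HasIntersect` but fails `ContainsCenter` when most of it lies outside the selection. Which behavior is preferable depends on how tightly users draw their selections.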
-
- User
- Posts: 29
- Joined: Thu Oct 29, 2015 3:16 pm
Re: What SDK should we use?
Thank you for your reply,
However, two questions remain unanswered. Could you please answer these:
1) From a UI point of view, we need a viewer in which the user can also draw a selection rectangle, similar to the Snapshot tool. We need to save the rectangle coordinates, show the rectangle on other pages while browsing the document, and process multiple batches of documents unattended.
2) We'd also like to customize the icons and use our own toolbar.
Jean
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
- Contact:
Re: What SDK should we use?
Hi Jean,
As for the command customization, please check this topic:
https://forum.pdf-xchange.com/ ... 66&t=24171
As for the first question, it would help if you could describe what you need at the source code level and divide it into logical parts. From what I can see, for starters you'll need to create your own command that turns the ability to draw that rectangle on and off. Am I right? For that, you can check the topic linked above. Then we'll move on to the drawing on the view.
HTH
-
- User
- Posts: 29
- Joined: Thu Oct 29, 2015 3:16 pm
Re: What SDK should we use?
Regarding the text extraction:
- With my sample files, page.GetDimension() returns the size in points. Is this always true? If not, how do I know the unit?
- text.CharRect[] doesn't seem to be in points. What is its unit? Does it always use the same unit?
Jean
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
- Contact:
Re: What SDK should we use?
Hi Jean,
Both of these functions return points. Note that a PDF file uses a coordinate system whose Y axis runs from bottom to top. Also, the bottom coordinate of a page will not always be 0; it can be a positive or negative number.
If you want further assistance, please provide a code sample that reproduces your problem and the file in which the problem occurs.
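Because the values are in points and the Y axis runs bottom to top, a rectangle drawn on a rasterized page image (pixels, origin at the top-left, Y pointing down) has to be scaled and flipped before it can be compared with CharRect values. Here is a minimal sketch of that conversion. It assumes an unrotated page rendered at a known DPI, with the top-left corner of the image sitting on the top-left corner of the page box. `PageRect` and `SelectionMapping` are hypothetical names, not SDK types.

```csharp
using System;

// Hypothetical rectangle in PDF points (Y axis points up).
public struct PageRect
{
    public double left, bottom, right, top;

    public PageRect(double left, double bottom, double right, double top)
    {
        this.left = left;
        this.bottom = bottom;
        this.right = right;
        this.top = top;
    }
}

public static class SelectionMapping
{
    // Converts a pixel rectangle (x, y = top-left corner, Y pointing down)
    // into PDF points, using the page box as the reference frame.
    // Assumption: no page rotation, image rendered at 'dpi' with zoom 1.
    public static PageRect PixelsToPoints(
        double x, double y, double width, double height,
        PageRect pageBox, double dpi)
    {
        double pointsPerPixel = 72.0 / dpi; // 72 points per inch
        return new PageRect(
            pageBox.left + x * pointsPerPixel,
            pageBox.top - (y + height) * pointsPerPixel, // flip Y
            pageBox.left + (x + width) * pointsPerPixel,
            pageBox.top - y * pointsPerPixel);
    }
}
```

Note that the page box offset (its left and top values) is added explicitly, which is exactly the part that goes wrong when the bottom-left corner of the page is assumed to be (0, 0).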
HTH
-
- User
- Posts: 29
- Joined: Thu Oct 29, 2015 3:16 pm
Re: What SDK should we use?
Thank you Sasha,
How do I then know the value of the bottom of a page?
Is the left of a page always 0?
Jean
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
- Contact:
Re: What SDK should we use?
Hi Jean,
The left of the page can also differ - it all depends on the page's boxes.
HTH
Code: Select all
//Getting desired page from document
PDFXEdit.IPXC_Page pPage = pDoc.Pages[0];
//Media box represents page's physical size
PDFXEdit.PXC_Rect rcMedia = pPage.get_Box(PDFXEdit.PXC_BoxType.PBox_MediaBox);
//Page box represents page's visible size (for example when we have cropped pages)
PDFXEdit.PXC_Rect rcPageBox = pPage.get_Box(PDFXEdit.PXC_BoxType.PBox_PageBox);
-
- User
- Posts: 29
- Joined: Thu Oct 29, 2015 3:16 pm
Re: What SDK should we use?
Hi Sasha,
In the sample text extraction code above, what is the page origin (bottom, left) used for text.CharRect[], i.e. what offset should be applied to rcTest? Is it rcMedia or rcPageBox?
Jean
- Tracker Supp-Stefan
- Site Admin
- Posts: 17824
- Joined: Mon Jan 12, 2009 8:07 am
- Location: London
- Contact:
Re: What SDK should we use?
Hello Jean,
Normally the lower-left corner of the page has the coordinates (0, 0), but that is not mandatory. So, as Sasha said, please do provide sample code and a sample file so that we can test with the same input as you and see why the issues occur.
Regards,
Stefan
-
- User
- Posts: 29
- Joined: Thu Oct 29, 2015 3:16 pm
Re: What SDK should we use?
Hi Stefan
I don't have any problem with a particular file. I thought that the unit of the text.CharRect[] might not be points. But I was wrong.
What I'm trying to do is a two-step process:
1) A program that shows a sample PDF and lets the user draw a selection rectangle on a page. The coordinates of that selection are then saved.
2) A program that processes multiple PDFs unattended. The processing consists of extracting the text at the selection that was saved in 1). This is done by a background task, without UI.
Step 1) is implemented by rasterizing the pages using PDFDoc.DrawPageToDC(...).
I'm still working on step 2 and I just need to apply the selection. For that, I need to use the same unit and origin as the values returned by page.GetText(null).CharRect[]. But the sample code that Sasha provided doesn't say anything about that. He then made it clear that the unit is points, just like in page.GetDimension(). But I'm confused after he wrote "bottom coordinate will not always be 0". This is why I asked "what is the page origin (bottom, left) used by text.CharRect[]?". I'm still confused after you added that "Normally it's the lower left corner of the page that is with 0,0 coordinates but that is not mandatory".
How do I match a selection rectangle to the values returned by CharRect[]? Knowing this, I'd be able to extract the text at the specified selection.
I also have another question. My test program is based on your CSharp FullDemo. This is a UI based application, but my program retrieving the text has no UI. How could I retrieve the text of a page without UI control?
Thank you
Jean
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
- Contact:
Re: What SDK should we use?
Hi jvanlaethem,
This is the PDF coordinate system:
The page itself can be situated in different coordinates on the PDF coordinate grid:
So the media box of the left page starts at the point (80, 87). Thus the bottom-left corner of the text is at approx. (90, 110) and the top-right corner is at approx. (450, 650).
The CharRect[] values are also in the PDF coordinate system. Your selection rectangle is a visible rectangle on the page. To transform screen coordinates to PDF coordinates you'll need to use a ScreenToPage matrix.
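The idea of a ScreenToPage matrix can be sketched independently of the SDK: build the page-to-screen matrix used for drawing, then invert it. The block below is only an illustration under stated assumptions. `Affine` and `ScreenMapping` are hypothetical types, the page is assumed unrotated, and the convention is x' = x*a + y*c + e, y' = x*b + y*d + f. How the SDK itself exposes such a matrix is not shown here.

```csharp
using System;

// Hypothetical PDF-style matrix (a, b, c, d, e, f).
public struct Affine
{
    public double a, b, c, d, e, f;

    public Affine(double a, double b, double c, double d, double e, double f)
    {
        this.a = a; this.b = b; this.c = c;
        this.d = d; this.e = e; this.f = f;
    }

    // x' = x*a + y*c + e,  y' = x*b + y*d + f
    public void Apply(double x, double y, out double xOut, out double yOut)
    {
        xOut = x * a + y * c + e;
        yOut = x * b + y * d + f;
    }

    // Inverse of the affine map; throws if the matrix is singular.
    public Affine Invert()
    {
        double det = a * d - b * c;
        if (Math.Abs(det) < 1e-12)
            throw new InvalidOperationException("Matrix is not invertible.");
        return new Affine(
            d / det, -b / det,
            -c / det, a / det,
            (c * f - d * e) / det,
            (b * e - a * f) / det);
    }
}

public static class ScreenMapping
{
    // Page-to-screen matrix for an unrotated page: scale by pixelsPerPoint,
    // flip Y, and move the top-left corner of the page box to pixel (0, 0).
    public static Affine PageToScreen(double boxLeft, double boxTop, double pixelsPerPoint)
    {
        return new Affine(
            pixelsPerPoint, 0,
            0, -pixelsPerPoint,
            -pixelsPerPoint * boxLeft, pixelsPerPoint * boxTop);
    }
}
```

Calling `ScreenMapping.PageToScreen(80, 879, 2.0).Invert()` gives a matrix that maps the pixel (100, 200) back to the PDF point (130, 779). Applying it to two opposite corners of a selection yields a rectangle that can be compared directly with CharRect values.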
The problem is that we don't have the DrawPageToDC in the Editor SDK or in Core API SDK - it's from our old SimpleViewerSDK. What SDK exactly are you using?
-
- User
- Posts: 29
- Joined: Thu Oct 29, 2015 3:16 pm
Re: What SDK should we use?
1) I understand that the origin of the page, as used by the coordinates returned by CharRect[], is always 0, 0. MediaBox is something else that doesn't need to be considered when parsing CharRect[].
2) PXCV_DrawPageToDC comes from pxcview.dll (PXCView36.sln).
What current SDK should I use to rasterize a page? How should I proceed? In its simplest form, it could look like:
Code: Select all
Bitmap GetPageBitmap(String fileName, UInt32 pageIndex)
{
    using (PDFDoc pdf = new PDFDoc())
    {
        pdf.OpenDocFromPath(fileName);
        Bitmap bitmap = pdf.GetPageBitmap(pageIndex, 200);
        return bitmap;
    }
}
3) How could I retrieve the text of a page without using a UI control? What I need is something that looks like this:
Code: Select all
String GetPageSelectionText(String fileName, UInt32 pageIndex, RectF selection)
{
    using (PDFDoc pdf = new PDFDoc())
    {
        pdf.OpenDocFromPath(fileName);
        Page page = pdf.GetPage(pageIndex);
        Double width, height;
        page.GetDimension(out width, out height);
        PageText text = page.GetText();
        // ...the sample text extraction code that you provided earlier, where
        // rcTest would be initialized from selection using width and height.
        return textInTestBox;
    }
}
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
- Contact:
Re: What SDK should we use?
Hi jvanlaethem,
1) It depends on the rectangle that you need to draw. Note that pages in files can vary in size and in Media Box placement. Because of that, you'll need to decide which position of the drawn rectangle to use when applying it to multiple files. For example, you can calculate displacements or proportions from the current page's Media Box and apply these results to the selection rectangle on other pages. It all depends on what you need.
3) You don't need a UI control for that. To open a document, check this topic:
https://forum.pdf-xchange.com/ ... 67&t=24299
And then, having a core document, you can use the code described above.
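For point 1), the "displacements or proportions" idea can be sketched as follows. This is an illustration only, with hypothetical names (`ZoneRect`, `ZoneTemplate`) that are not SDK types: the selection is stored either as fractions of a reference box, so it stretches with the page, or as a fixed-size rectangle at a fixed offset from the top-left corner of the box.

```csharp
using System;

// Hypothetical rectangle in PDF points (Y axis points up).
public struct ZoneRect
{
    public double left, bottom, right, top;

    public ZoneRect(double left, double bottom, double right, double top)
    {
        this.left = left;
        this.bottom = bottom;
        this.right = right;
        this.top = top;
    }

    public double Width { get { return right - left; } }
    public double Height { get { return top - bottom; } }
}

public static class ZoneTemplate
{
    // Stores the selection as fractions (0..1) of the reference box.
    public static ZoneRect ToFractions(ZoneRect sel, ZoneRect box)
    {
        return new ZoneRect(
            (sel.left - box.left) / box.Width,
            (sel.bottom - box.bottom) / box.Height,
            (sel.right - box.left) / box.Width,
            (sel.top - box.bottom) / box.Height);
    }

    // Rebuilds the selection on another page: it scales with that page's box.
    public static ZoneRect FromFractions(ZoneRect fractions, ZoneRect box)
    {
        return new ZoneRect(
            box.left + fractions.left * box.Width,
            box.bottom + fractions.bottom * box.Height,
            box.left + fractions.right * box.Width,
            box.bottom + fractions.top * box.Height);
    }

    // Alternative: keep the selection size and its offset from the
    // top-left corner of the reference box (no proportional resize).
    public static ZoneRect MoveByTopLeftOffset(ZoneRect sel, ZoneRect fromBox, ZoneRect toBox)
    {
        double dx = sel.left - fromBox.left;
        double dy = fromBox.top - sel.top;
        double newLeft = toBox.left + dx;
        double newTop = toBox.top - dy;
        return new ZoneRect(newLeft, newTop - sel.Height, newLeft + sel.Width, newTop);
    }
}
```

Proportions suit scans of the same form at different sizes, while a fixed offset suits documents where the zone has an absolute size, such as a stamp in a corner. Which reference box to use is the question discussed in the posts that follow.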
-
- User
- Posts: 29
- Joined: Thu Oct 29, 2015 3:16 pm
Re: What SDK should we use?
Sasha,
1) There are two different things:
- identifying the offset(s) to apply to the text extraction
- applying the(se) offset(s)
I can easily do the second part. For instance, in the sample code, applying rcPageBox to rcTest to extract the text located in the top half of a page would be:
Code: Select all
Double width;
Double height;
page.GetDimension(out width, out height);
PXC_Rect rcTest = new PXC_Rect
{
    top = height,
    bottom = height / 2,
    left = 0,
    right = width,
};
PXC_Rect rcPageBox = page.get_Box(PXC_BoxType.PBox_PageBox);
rcTest.top += rcPageBox.bottom;
rcTest.left += rcPageBox.left;
rcTest.bottom += rcPageBox.bottom;
rcTest.right += rcPageBox.left;
My problem is more about knowing what offset(s) to apply to be sure the code works with any PDF: MediaBox, PageBox, ViewBox... Which of these does your toolkit use for text.CharRect?
2) Is there a way to rasterize a page with any of your current SDKs?
3) Is implemented, thank you.
Jean
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
- Contact:
Re: What SDK should we use?
1) You can, for example, apply offsets from the PageBox boundaries (meaning delta offsets from the real visible part of the page, so you can use these deltas on pages of different sizes; the rectangle itself will change size depending on the page size). Or maybe you don't need a proportional resize. In that case you'll need to recalculate the rectangle from the MediaBox coordinates (for example, you can save offsets from the top-left coordinate of the MediaBox and then just move the whole rectangle without resizing it).
In short, the difference between PageBox and MediaBox is that the PageBox is in general the intersection of the CropBox and the MediaBox, meaning the visible part of the page.
2) It took me quite some time, but I've implemented the GetBitmap function for you. It takes the page rotation, DPI and zoom into account, though it uses several other functions for the calculations.
3) Glad to hear that
HTH =)
Code: Select all
// Requires: using System; using System.Drawing; using System.Runtime.InteropServices;
void UpdateMinMax(double v, ref double vMin, ref double vMax)
{
    if (vMin > v)
        vMin = v;
    if (vMax < v)
        vMax = v;
}

private void Transform(PDFXEdit.PXC_Matrix m1, ref double x, ref double y)
{
    double tx = x;
    x = (tx * m1.a + y * m1.c + m1.e);
    y = (tx * m1.b + y * m1.d + m1.f);
}

// Transforms all four corners and stores their bounding box back in rcPageBox
private void TransformRect(PDFXEdit.PXC_Matrix m1, ref PDFXEdit.PXC_Rect rcPageBox)
{
    double x = rcPageBox.left;
    double y = rcPageBox.bottom;
    Transform(m1, ref x, ref y);
    double x1 = x;
    double x2 = x;
    double y1 = y;
    double y2 = y;
    x = rcPageBox.left;
    y = rcPageBox.top;
    Transform(m1, ref x, ref y);
    UpdateMinMax(x, ref x1, ref x2);
    UpdateMinMax(y, ref y1, ref y2);
    x = rcPageBox.right;
    y = rcPageBox.top;
    Transform(m1, ref x, ref y);
    UpdateMinMax(x, ref x1, ref x2);
    UpdateMinMax(y, ref y1, ref y2);
    x = rcPageBox.right;
    y = rcPageBox.bottom;
    Transform(m1, ref x, ref y);
    UpdateMinMax(x, ref x1, ref x2);
    UpdateMinMax(y, ref y1, ref y2);
    rcPageBox.left = x1;
    rcPageBox.right = x2;
    rcPageBox.bottom = y1;
    rcPageBox.top = y2;
}

// Returns m1 * m2 (m1 is applied first, then m2)
private PDFXEdit.PXC_Matrix Multiply(PDFXEdit.PXC_Matrix m1, PDFXEdit.PXC_Matrix m2)
{
    double t0 = (double)(m1.a * m2.a + m1.b * m2.c);
    double t2 = (double)(m1.c * m2.a + m1.d * m2.c);
    double t4 = (double)(m1.e * m2.a + m1.f * m2.c + m2.e);
    m1.b = (double)(m1.a * m2.b + m1.b * m2.d);
    m1.d = (double)(m1.c * m2.b + m1.d * m2.d);
    m1.f = (double)(m1.e * m2.b + m1.f * m2.d + m2.f);
    m1.a = t0;
    m1.c = t2;
    m1.e = t4;
    return m1;
}

private Bitmap GetBitmap(PDFXEdit.IPXC_Document pDoc, UInt32 nPageNumber)
{
    // DPI of the resulting bitmap
    const double cDPI = 96.0; // 96 DPI
    // Zoom of the resulting bitmap
    const double cZoom = 1.5; // 150%
    PDFXEdit.IPXC_Page pPage = pDoc.Pages[nPageNumber];
    PDFXEdit.PXC_Rect rcPageBox = pPage.get_Box(PDFXEdit.PXC_BoxType.PBox_PageBox);
    // Applying page matrix to page box
    TransformRect(pPage.Matrix, ref rcPageBox);
    // Page proportions in pt
    double nPageWidth = rcPageBox.right - rcPageBox.left;
    double nPageHeight = rcPageBox.top - rcPageBox.bottom;
    // Getting image proportions in px
    int nImageWidth = (int)Math.Round((nPageWidth / 72.0) * cDPI * cZoom);
    int nImageHeight = (int)Math.Round((nPageHeight / 72.0) * cDPI * cZoom);
    PDFXEdit.tagRECT rcImgRect;
    rcImgRect.left = 0;
    rcImgRect.right = nImageWidth;
    rcImgRect.top = 0;
    rcImgRect.bottom = nImageHeight;
    int nStride = nImageWidth * 4;
    int nSize = nStride * nImageHeight;
    // Allocating buffer for our new bitmap.
    // Note: the buffer is not freed here; it must stay alive as long as the Bitmap uses it.
    IntPtr pBuffer = Marshal.AllocHGlobal(nSize);
    double nKX = nImageWidth / nPageWidth;
    double nKY = nImageHeight / nPageHeight;
    // PDF Matrix for zoom and flip
    PDFXEdit.PXC_Matrix mZoomAndFlipMatrix;
    mZoomAndFlipMatrix.a = nKX;
    mZoomAndFlipMatrix.b = 0;
    mZoomAndFlipMatrix.c = 0;
    mZoomAndFlipMatrix.d = -nKY;
    mZoomAndFlipMatrix.e = 0;
    mZoomAndFlipMatrix.f = nImageHeight;
    // Now we need to multiply it to the page matrix that has rotation included
    PDFXEdit.PXC_Matrix mResMatrix = Multiply(pPage.Matrix, mZoomAndFlipMatrix);
    // Drawing page to memory buffer
    pPage.DrawToMemory(pBuffer, nStride, PDFXEdit.PXC_DrawFormat.kDrawFormat_BGRA, ref rcImgRect, ref mResMatrix);
    // Creating new bitmap from buffer
    Bitmap bmp = new Bitmap(nImageWidth, nImageHeight, nStride, System.Drawing.Imaging.PixelFormat.Format32bppArgb, pBuffer);
    return bmp;
}
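On point 1) of this reply, the description of the PageBox as "in general" the intersection of the CropBox and the MediaBox can be illustrated with plain rectangle arithmetic. This is a sketch of that description only: `PageBoxRect` and `PageBoxes` are hypothetical names, and the SDK's own rules for computing the PageBox are not shown in this thread, so the value returned by get_Box(PBox_PageBox) remains the authority.

```csharp
using System;

// Hypothetical rectangle in PDF points (Y axis points up).
public struct PageBoxRect
{
    public double left, bottom, right, top;

    public PageBoxRect(double left, double bottom, double right, double top)
    {
        this.left = left;
        this.bottom = bottom;
        this.right = right;
        this.top = top;
    }
}

public static class PageBoxes
{
    // Computes the overlap of two boxes. Returns false when they do not
    // overlap, in which case 'result' is not a valid rectangle.
    public static bool TryIntersect(PageBoxRect a, PageBoxRect b, out PageBoxRect result)
    {
        result = new PageBoxRect(
            Math.Max(a.left, b.left),
            Math.Max(a.bottom, b.bottom),
            Math.Min(a.right, b.right),
            Math.Min(a.top, b.top));
        return result.left < result.right && result.bottom < result.top;
    }
}
```

For a MediaBox of (0, 0, 612, 792) and a CropBox of (50, 50, 700, 700), written as left, bottom, right, top, the overlap is (50, 50, 612, 700): the part of the crop rectangle that extends beyond the media is dropped.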
-
- User
- Posts: 29
- Joined: Thu Oct 29, 2015 3:16 pm
Re: What SDK should we use?
We now have the PDF viewer with the custom icons and extra features we need: selection tool, highlight,... and the text extraction running unattended.
Thank you very much, Sasha and Stefan, for your time and help.
FYI, using a DPI other than 96 requires setting the resolution of the bitmap so that it stays consistent:
Code: Select all
Bitmap bmp = new Bitmap(nImageWidth, nImageHeight, nStride, System.Drawing.Imaging.PixelFormat.Format32bppArgb, pBuffer);
// SetResolution takes float arguments, so the double constant needs a cast
bmp.SetResolution((float)cDPI, (float)cDPI);
return bmp;
Jean
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
- Contact:
Re: What SDK should we use?
Hello Jean,
Glad to hear that and thanks for the code update
Cheers,
Alex
-
- User
- Posts: 29
- Joined: Thu Oct 29, 2015 3:16 pm
Re: What SDK should we use?
Sasha,
Do you have C# sample code to convert image-based PDFs to text-searchable PDFs?
Similarly to the text extraction, this would be a 2-step process:
1) setup to select the options: language(s), accuracy...
2) background, unattended conversion of multiple files
Jean
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
- Contact:
Re: What SDK should we use?
Hi Jean,
The OCR API is currently being changed drastically, so the functionality that you need will become available in one of the upcoming builds.
Cheers,
Alex
-
- User
- Posts: 29
- Joined: Thu Oct 29, 2015 3:16 pm
Re: What SDK should we use?
Thank you. Please let me know when the SDK is available for download.
With your SDK, how could I convert PDF files to PDF/A?
Jean
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
- Contact:
Re: What SDK should we use?
-
- User
- Posts: 7
- Joined: Wed Sep 02, 2015 10:41 am
Re: What SDK should we use?
Hi Sasha,
The last time we asked about converting image-based PDFs to text-searchable PDFs (30 Nov 2015), the reply was that the OCR API was being changed drastically and that the functionality we need would be available in one of the upcoming builds.
Do you have any update on that?
The original question was:
Do you have C# sample code to convert image-based PDFs to text-searchable PDFs?
Similarly to the text extraction, this would be a 2-step process:
1) setup to select the options: language(s), accuracy...
2) background, unattended conversion of multiple files
- Tracker Supp-Stefan
- Site Admin
- Posts: 17824
- Joined: Mon Jan 12, 2009 8:07 am
- Location: London
- Contact:
Re: What SDK should we use?
Hello capturebites bvba,
We are planning to get the new OCR SDK module (based on the Editor) out for build 318 if all goes well (317 is planned for release next Monday).
Please note that the new OCR SDK will only work with an Editor SDK license. We do have a current OCR SDK package that you can use if you have an Editor or PRO SDK license, but it is a bit older and differs from what the new OCR will be.
Regards,
Stefan
-
- User
- Posts: 7
- Joined: Wed Sep 02, 2015 10:41 am
Re: What SDK should we use?
Can you tell me when build 318 with the new OCR SDK module will be available?
- Tracker Supp-Stefan
- Site Admin
- Posts: 17824
- Joined: Mon Jan 12, 2009 8:07 am
- Location: London
- Contact:
Re: What SDK should we use?
Hello capturebites bvba,
We do not have a specific date for this yet - but our builds are usually 8-12 weeks apart.
Regards,
Stefan
-
- User
- Posts: 83
- Joined: Wed Mar 25, 2015 10:15 am
Re: What SDK should we use?
Is OCR already available in the Core API?
And export to PDF/A?
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
- Contact:
Re: What SDK should we use?
Hello Tom,
OCR is not available in the Core API and won't be.
Export to PDF/A is also unavailable. Only PDF/A creation (as a new document) is possible; that is described in one of the forum topics.
Cheers,
Alex
-
- User
- Posts: 83
- Joined: Wed Mar 25, 2015 10:15 am
Re: What SDK should we use?
Do you have the link?
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
- Contact:
Re: What SDK should we use?
Here's the topic:
https://forum.pdf-xchange.com/ ... 878#p99878
-
- User
- Posts: 7
- Joined: Wed Sep 02, 2015 10:41 am
Re: What SDK should we use?
Hi Sasha,
Last time we checked on converting image-based PDFs to text-searchable PDFs (30 Nov 2015), the reply was that the OCR API was being changed drastically and that the functionality we need would be available in an upcoming build.
Do you have any update on that?
The original question was:
Do you have C# sample code to convert image-based PDFs to text-searchable PDFs?
Similarly to the text extraction, this would be a 2-step process:
1) setup to select the options: language(s), accuracy...
2) background, unattended conversion of multiple files
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
- Contact:
Re: What SDK should we use?
Hello capturebites bvba,
The new OCR SDK has not yet been published.
Cheers,
Alex
-
- User
- Posts: 7
- Joined: Wed Sep 02, 2015 10:41 am
Re: What SDK should we use?
When is it planned for?
- Tracker Supp-Stefan
- Site Admin
- Posts: 17824
- Joined: Mon Jan 12, 2009 8:07 am
- Location: London
- Contact:
Re: What SDK should we use?
Hello capturebites bvba,
For now, the plan is to release the new OCR SDK with the next build (321) of our products.
Internally it will be based on different code, but the available methods will at first be exactly the same as in the current one. You can therefore download the existing OCR SDK and start developing with it now; with the next build, the transition to the new OCR SDK should be pretty straightforward.
Regards,
Stefan
-
- User
- Posts: 7
- Joined: Wed Sep 02, 2015 10:41 am
Re: What SDK should we use?
I suppose we are back to the original question then:
Do you have C# sample code to convert image-based PDFs to text-searchable PDFs?
This would be a 2-step process:
1) setup to select the options: language(s), accuracy...
2) background, unattended conversion of multiple files
- Tracker Supp-Stefan
- Site Admin
- Posts: 17824
- Joined: Mon Jan 12, 2009 8:07 am
- Location: London
- Contact:
Re: What SDK should we use?
Hello capturebites bvba,
Please download and install this package:
https://www.pdf-xchange.com/produc ... ge-pro-sdk
You will then find the OCR sample projects under:
Pro SDK installation folder\Examples\OcrSDKExamples\C#Examples
Regards,
Stefan
-
- User
- Posts: 29
- Joined: Thu Oct 29, 2015 3:16 pm
Re: What SDK should we use?
Sasha,
I also need to rotate the image by a multiple of 90°. How could this be done with the zoom and flip matrix?
Jean
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
- Contact:
Re: What SDK should we use?
Hello jvanlaethem,
What SDK are you using?
Cheers,
Alex
-
- User
- Posts: 29
- Joined: Thu Oct 29, 2015 3:16 pm
Re: What SDK should we use?
We are currently using the Core API.
I just hoped that changing some setting in the matrix in the GetBitmap code you provided earlier might do the trick.
Jean
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
- Contact:
Re: What SDK should we use?
Hello Jean,
Check the https://sdkhelp.pdf-xchange.com/vie ... MathHelper interface of the auxInst; it contains methods for working with matrices.
Also, to understand matrices better, read these chapters of the PDF Specification:
8.3 Coordinate System
8.4 Graphics State
Cheers,
Alex
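For readers following the matrix hint above, here is a minimal sketch of how a rotation by a multiple of 90° can be folded into a zoom-and-flip matrix. It uses plain arithmetic on the six-value PDF matrix [a b c d e f] covered in chapter 8.3 of the PDF Specification, not the SDK's own types. The names Mat, Concat and QuarterTurn.PageToBitmap are made up for this illustration, and the assumption that the rendering code accepts a page-to-bitmap matrix of this form is not confirmed anywhere in this thread.

```csharp
// 2D affine matrix in the PDF convention [a b c d e f]:
//   x' = a*x + c*y + e
//   y' = b*x + d*y + f
// Hypothetical helper type, NOT an SDK type.
public struct Mat
{
    public double A, B, C, D, E, F;

    public Mat(double a, double b, double c, double d, double e, double f)
    {
        A = a; B = b; C = c; D = d; E = e; F = f;
    }

    // Returns the matrix that applies 'first' and then 'second'.
    public static Mat Concat(Mat first, Mat second)
    {
        return new Mat(
            first.A * second.A + first.B * second.C,
            first.A * second.B + first.B * second.D,
            first.C * second.A + first.D * second.C,
            first.C * second.B + first.D * second.D,
            first.E * second.A + first.F * second.C + second.E,
            first.E * second.B + first.F * second.D + second.F);
    }

    // Transforms the point (x, y).
    public void Apply(double x, double y, out double rx, out double ry)
    {
        rx = A * x + C * y + E;
        ry = B * x + D * y + F;
    }
}

public static class QuarterTurn
{
    // Builds a matrix mapping page space (origin bottom-left, y up) to
    // bitmap space (origin top-left, y down), after rotating the page
    // counterclockwise by quarterTurns * 90 degrees.
    public static Mat PageToBitmap(double pageW, double pageH, int quarterTurns, double zoom)
    {
        int q = ((quarterTurns % 4) + 4) % 4; // normalise to 0..3, also for negatives
        Mat rot;
        double rotatedH; // page height after rotation
        switch (q)
        {
            // Each case rotates about the origin, then translates the page
            // back into the positive quadrant. Exact 0/1/-1 entries avoid
            // the rounding noise of Math.Cos/Math.Sin.
            case 1: rot = new Mat(0, 1, -1, 0, pageH, 0); rotatedH = pageW; break;
            case 2: rot = new Mat(-1, 0, 0, -1, pageW, pageH); rotatedH = pageH; break;
            case 3: rot = new Mat(0, -1, 1, 0, 0, pageW); rotatedH = pageW; break;
            default: rot = new Mat(1, 0, 0, 1, 0, 0); rotatedH = pageH; break;
        }
        // Zoom and flip the y axis so that y grows downwards in the bitmap.
        Mat toBitmap = new Mat(zoom, 0, 0, -zoom, 0, zoom * rotatedH);
        return Mat.Concat(rot, toBitmap);
    }
}
```

For a 200 x 100 page, QuarterTurn.PageToBitmap(200, 100, 1, 2.0) yields [0 -2 -2 0 200 400]: the page's bottom-left corner (0, 0) lands on pixel (200, 400) and its top-right corner (200, 100) lands on pixel (0, 0) of a 200 x 400 bitmap. The six values would still have to be copied into whatever matrix structure the rendering call expects, and if positive angles are meant to be clockwise, cases 1 and 3 swap.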
-
- User
- Posts: 29
- Joined: Thu Oct 29, 2015 3:16 pm
Re: What SDK should we use?
Sasha,
I don't see how to call MathHelper.Matrix_Rotate() from the root IPXC_Inst.
Jean
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
- Contact:
Re: What SDK should we use?
Hello Jean,
Please read my post again: the MathHelper is an "interface of the auxInst".
And also look at the wiki.
Cheers,
Alex
-
- User
- Posts: 29
- Joined: Thu Oct 29, 2015 3:16 pm
Re: What SDK should we use?
Sasha,
Thank you. So let's come back to the question: "I need to also apply a rotation of a multiple of 90° to the image." How should the sample code of your post #19 be updated to handle this?
Jean
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
- Contact:
Re: What SDK should we use?
Hello Jean,
Have you tried experimenting on your own? We don't do direct coding as part of support. We can help or advise, though, if you are struggling with something.
Cheers,
Alex