NEWB Question-Region Selection

PDF-XChange Editor SDK for Developers

Forum rules
DO NOT post your license/serial key, or your activation code - these forums, and all posts within, are public and we will be forced to immediately deactivate your license.

If you encounter errors, use the IAUX_Inst::FormatHRESULT method to get their description, and include it in your post along with the error code.
Lambchop
User
Posts: 34
Joined: Mon May 02, 2022 5:58 pm

NEWB Question-Region Selection

Post by Lambchop »

My sincerest apologies, as I am just getting started with the SDK. I have already spent a couple of days on this and I am just not making progress. Problem: I want to select a region within the PDF and put it into a memory stream so it can be printed as its own PDF document. This is a precursor step to sending it to my own custom OCR engine. My problem is that I cannot figure out how to extract a selected region from the PDF. My ultimate goal is to pull searchable text from a PDF table based on a region selection.

I feel clueless, as this is probably super simple, but I just don't see how to make it work after reading through the FullDemo code. Is there other developer documentation besides (https://sdkhelp.pdf-xchange.com/view/PXV:IUIX_Cmd)?

Thank you!
-Eric
Ivan - Tracker Software
Site Admin
Posts: 3549
Joined: Thu Jul 08, 2004 10:36 pm
Location: Vancouver Island - Canada

Re: NEWB Question-Region Selection

Post by Ivan - Tracker Software »

Based on your description, one of the DrawXXX functions should help.
Tracker Software (Project Director)

When attaching files to any message - please ensure they are archived and posted as a .ZIP, .RAR or .7z format - or they will not be posted - thanks.
Lambchop
User
Posts: 34
Joined: Mon May 02, 2022 5:58 pm

Re: NEWB Question-Region Selection

Post by Lambchop »

Thank you for that reference. Do you have any actual implementation insight, or a suggested approach that has been used successfully before? I am sure I am not the first to ask this: "How do I select a region, retrieve all text and/or image data within that region, and save it to a memory stream?" Unfortunately, I cannot find any help documentation, forum posts, or sample material for this particular question, which is why I assume it is a newb question; I am still learning the SDK documentation. If you could provide more insight, that would be very helpful. Thank You!!
Ivan - Tracker Software
Site Admin
Posts: 3549
Joined: Thu Jul 08, 2004 10:36 pm
Location: Vancouver Island - Canada

Re: NEWB Question-Region Selection

Post by Ivan - Tracker Software »

Can you please describe your task in a bit more detail?

You say "How to select a region...". How do you expect this selection should happen? By your user on a rendered page? Programmatically?

We can try to help or point you in the right direction, but we need to understand exactly what you are trying to achieve.
Lambchop
User
Posts: 34
Joined: Mon May 02, 2022 5:58 pm

Re: NEWB Question-Region Selection

Post by Lambchop »

Ivan - very fair question! Specifically, there are 2 use cases. In each case, the user selects a region on the PDF with the mouse in the PDF viewer. The information being extracted is in tabular form and can break across pages.

Case 1: The PDF is searchable with text in a tabular format.
Case 2: The PDF is an image and the text is in a tabular format.

In both cases: the text data needs to be extracted from the selected region using positional reference data, so that the headers and the row descriptors in the first column can be identified as part of the extraction process. If the data breaks across pages, it can be extracted as a complete table, column, or row set. The number of rows is NOT consistent, but the number of columns per table and the table titles are consistent for each type of PDF report.

The users work with consistent PDF form layouts, for which they first map out the regions for data extraction. These mappings are saved in our database, so when a user later wants to extract tabular data, they select the appropriate mapping layout and I use the SDK to pull the associated data sets. "IF" the data breaks across pages, I will keep pulling data until the table or column has been completely extracted.
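A minimal C# sketch of the positional grouping described above, under stated assumptions: the page's text has already been obtained as words with bounding boxes in PDF page space (points, origin at the bottom-left, y growing upwards), either from the SDK for searchable PDFs or from an OCR engine for image PDFs. `Box`, `Word`, `RegionMapping`, `TableExtractor` and the `wordsOnPage` callback are hypothetical names, not PDF-XChange SDK types, and the 3-point row tolerance is an arbitrary starting value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types, not part of the PDF-XChange SDK.
// Coordinates are PDF page points with the origin at the bottom-left (y grows upwards).
public readonly struct Box
{
    public readonly double Left, Bottom, Right, Top;
    public Box(double left, double bottom, double right, double top)
    { Left = left; Bottom = bottom; Right = right; Top = top; }

    // A word belongs to the region when the centre of its box lies inside the region.
    public bool ContainsCenterOf(Box b)
    {
        double cx = (b.Left + b.Right) / 2, cy = (b.Bottom + b.Top) / 2;
        return cx >= Left && cx <= Right && cy >= Bottom && cy <= Top;
    }
}

public readonly struct Word
{
    public readonly string Text;
    public readonly Box Bounds;
    public Word(string text, Box bounds) { Text = text; Bounds = bounds; }
}

// One saved mapping row, e.g. loaded from the database (hypothetical schema).
public readonly struct RegionMapping
{
    public readonly int PageIndex;
    public readonly Box Region;
    public RegionMapping(int pageIndex, Box region) { PageIndex = pageIndex; Region = region; }
}

public static class TableExtractor
{
    // Returns the words inside `region`, grouped into rows (top to bottom),
    // with each row ordered left to right.
    public static List<List<string>> ExtractRows(IEnumerable<Word> pageWords, Box region,
                                                 double rowTolerance = 3.0)
    {
        var inside = pageWords.Where(w => region.ContainsCenterOf(w.Bounds))
                              .OrderByDescending(w => w.Bounds.Top) // highest on the page first
                              .ToList();

        var rows = new List<List<Word>>();
        foreach (var w in inside)
        {
            var current = rows.Count > 0 ? rows[rows.Count - 1] : null;
            // Same row when the top edges differ by at most the tolerance (in points).
            if (current != null && Math.Abs(current[0].Bounds.Top - w.Bounds.Top) <= rowTolerance)
                current.Add(w);
            else
                rows.Add(new List<Word> { w });
        }

        return rows.Select(r => r.OrderBy(w => w.Bounds.Left).Select(w => w.Text).ToList())
                   .ToList();
    }

    // A table that breaks across pages: apply each page's mapped region in page order
    // and append the rows. `wordsOnPage` is a hypothetical callback supplying a page's words.
    public static List<List<string>> ExtractAcrossPages(IEnumerable<RegionMapping> mappings,
                                                        Func<int, IEnumerable<Word>> wordsOnPage)
    {
        var all = new List<List<string>>();
        foreach (var m in mappings.OrderBy(mp => mp.PageIndex))
            all.AddRange(ExtractRows(wordsOnPage(m.PageIndex), m.Region));
        return all;
    }
}
```

Grouping rows by the top edge with a tolerance, rather than by exact equality, absorbs small vertical offsets between words on the same line (common in OCR output); columns could then be assigned by comparing each word's left edge with the positions of the header words.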

I have my own OCR engine that accepts a PDF image and converts it to text. But in both the searchable-text and the PDF-image scenarios, I cannot figure out how to pull the necessary information from the viewer control. I have not found which method returns the RECTANGLE properties I need to pull the information. I know this must be super simple, but I am stuck, and I am concerned that my unfamiliarity with the library is pointing me in the wrong direction. I know I have to compensate for viewer zoom, page orientation, and page relations. Let me know if you have any other questions, as this is super important for me to understand as I get going! :)
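One way to do the zoom and orientation compensation mentioned above, sketched in C#. The SDK may well provide its own conversion between view and page coordinates, so check the SDK help before relying on this. The helper assumes that the viewer reports positions in client pixels with y growing downwards, and that you know where the page's displayed top-left corner is drawn, the zoom factor (1.0 = 100 %), the screen DPI, the page's clockwise rotation (0/90/180/270), and the page box in PDF points (for example, values copied from the `get_Box(PXC_BoxType.PBox_ViewBox)` call discussed later in this thread). `PageBox` and `ViewToPage` are hypothetical names.

```csharp
using System;

// Hypothetical helper, not part of the PDF-XChange SDK: converts a point picked in the
// viewer (client pixels, y down) into PDF page space (points, y up).
public readonly struct PageBox
{
    public readonly double Left, Bottom, Right, Top;   // PDF points
    public PageBox(double left, double bottom, double right, double top)
    { Left = left; Bottom = bottom; Right = right; Top = top; }
    public double Width => Right - Left;
    public double Height => Top - Bottom;
}

public static class ViewToPage
{
    // originX/originY: where the page's displayed top-left corner is drawn, in pixels.
    public static (double X, double Y) ToPagePoint(double pxX, double pxY,
        double originX, double originY, double zoom, double dpi, int rotation, PageBox box)
    {
        double scale = zoom * dpi / 72.0;          // pixels per PDF point (1 pt = 1/72 in)
        double dx = (pxX - originX) / scale;       // points from the displayed left edge
        double dy = (pxY - originY) / scale;       // points from the displayed top edge
        double w = box.Width, h = box.Height;
        double x, y;

        // Undo the clockwise page rotation and flip the y axis.
        switch (((rotation % 360) + 360) % 360)
        {
            case 0:   x = dx;     y = h - dy; break;
            case 90:  x = dy;     y = dx;     break;
            case 180: x = w - dx; y = dy;     break;
            case 270: x = w - dy; y = h - dx; break;
            default: throw new ArgumentOutOfRangeException(nameof(rotation));
        }
        // Shift by the box origin, which is not always (0, 0).
        return (x + box.Left, y + box.Bottom);
    }

    // A dragged selection (two opposite corners in pixels) as a normalised page rectangle.
    public static PageBox ToPageRect(double x1, double y1, double x2, double y2,
        double originX, double originY, double zoom, double dpi, int rotation, PageBox box)
    {
        var a = ToPagePoint(x1, y1, originX, originY, zoom, dpi, rotation, box);
        var b = ToPagePoint(x2, y2, originX, originY, zoom, dpi, rotation, box);
        return new PageBox(Math.Min(a.X, b.X), Math.Min(a.Y, b.Y),
                           Math.Max(a.X, b.X), Math.Max(a.Y, b.Y));
    }
}
```

Converting both corners and then taking the min/max matters because a rotated page changes which screen corner ends up as the page's bottom-left.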
Chris - Tracker Supp
Site Admin
Posts: 795
Joined: Tue Apr 14, 2009 11:33 pm

Re: NEWB Question-Region Selection

Post by Chris - Tracker Supp »

Hi Lambchop,

Yesterday I sent you the sample app that our developers prepared for you.

Hope that helps :)
If posting files to this forum - you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded - thank you.


Chris Attrell
Tracker Sales & Support North America
http://www.tracker-software.com
zarkogajic
User
Posts: 1370
Joined: Thu Sep 05, 2019 12:35 pm

Re: NEWB Question-Region Selection

Post by zarkogajic »

Hi Chris,

Can you share this sample somewhere?

-žarko
Ivan - Tracker Software
Site Admin
Posts: 3549
Joined: Thu Jul 08, 2004 10:36 pm
Location: Vancouver Island - Canada

Re: NEWB Question-Region Selection

Post by Ivan - Tracker Software »

Yes, we will. We are working on adding this kind of sample to our GitHub repository.
Lambchop
User
Posts: 34
Joined: Mon May 02, 2022 5:58 pm

Re: NEWB Question-Region Selection

Post by Lambchop »

Ivan - thank you tons for the sample code. I notice that you used a few compressed lines of code that I cannot make sense of, because I cannot break them down cleanly into steps. When you can, please break out these lines with their associated references:

line 1: tagPOINT pt = pEvent.get_Pos();
issue: get_Pos() does not exist off of IUIX_Event

line 2: m_curPageBBox = pView.Doc.CoreDoc.Pages[(uint)editPageIndex].get_Box(PXC_BoxType.PBox_ViewBox);
issue: get_Box does not exist off of IPXC_Page or PXC_Rect

FYI ... the ambiguous use of implements vs. inherits in C# is just too fun :roll:
Vasyl-Tracker Dev Team
Site Admin
Posts: 2352
Joined: Thu Jun 30, 2005 4:11 pm
Location: Canada

Re: NEWB Question-Region Selection

Post by Vasyl-Tracker Dev Team »

What programming language are you using?
Vasyl Yaremyn
Tracker Software Products
Project Developer

Please archive any files posted to a ZIP, 7z or RAR file or they will be removed and not posted.
Lambchop
User
Posts: 34
Joined: Mon May 02, 2022 5:58 pm

Re: NEWB Question-Region Selection

Post by Lambchop »

Visual Studio 2019.
I code in VB.NET, but the class extensions are not recognized in C# either.
Vasyl-Tracker Dev Team
Site Admin
Posts: 2352
Joined: Thu Jun 30, 2005 4:11 pm
Location: Canada

Re: NEWB Question-Region Selection

Post by Vasyl-Tracker Dev Team »

Can you compile the C# sample project we provided on your side?
Vasyl-Tracker Dev Team
Site Admin
Posts: 2352
Joined: Thu Jun 30, 2005 4:11 pm
Location: Canada

Re: NEWB Question-Region Selection

Post by Vasyl-Tracker Dev Team »

I just made a simple VB.NET app (WinForms, desktop), added the Editor's ActiveX control to its main form, and I can see this:
[screenshot: image.png]
I'm not very experienced in VB.NET, but everything looks very similar to the corresponding parts of the C# project...
Lambchop
User
Posts: 34
Joined: Mon May 02, 2022 5:58 pm

Re: NEWB Question-Region Selection

Post by Lambchop »

I think we are not talking about the same thing... what I posted was about "get_Pos()" and "get_Box". Something is really odd: I set the project reference to the same PDXEdit.dll, and in a NEW project it shows this definition for IUIX_Event:
Namespace PDFXEdit
    <ComConversionLoss> <Guid("482E54A4-F8E8-4C78-8472-1FA890ED1C3A")> <TypeLibTypeAttribute(4288)>
    Public Interface IUIX_Event
        <DispId(1610743809)>
        Property Code As Integer
        <DispId(1610743811)>
        Property Handled As Boolean
        <ComAliasName("PDFXEdit.PARAM_T")> <DispId(1610743813)>
        Property Param1 As <ComAliasName("PDFXEdit.PARAM_T")> UInteger
        <ComAliasName("PDFXEdit.PARAM_T")> <DispId(1610743815)>
        Property Param2 As <ComAliasName("PDFXEdit.PARAM_T")> UInteger
        <ComAliasName("PDFXEdit.PARAM_T")> <DispId(1610743817)>
        Property Result As <ComAliasName("PDFXEdit.PARAM_T")> UInteger
        <DispId(1610743820)>
        Property Pos As tagPOINT

        <DispId(1610743808)>
        Function _GetRawPtr() As IntPtr
        <DispId(1610743819)>
        Function Clone() As IUIX_Event
    End Interface
End Namespace
************************
Notice that it does not contain any reference to the set_Pos and get_Pos methods. Your SDK help (https://sdkhelp.pdf-xchange.com/view/PXV:IUIX_Event#Methods) does not list a get_Pos method either. I am confused by the differences between the sample project code, the SDK documentation, and the class definitions shown in Visual Studio. If you create a NEW project, the DLL yields a different class definition for IUIX_Event.
I think this is an issue of exposing a managed vs. an unmanaged code base.
Current FullDemo project class definition for IUIX_Event; notice that it has the set_Pos and get_Pos methods:
namespace PDFXEdit
{
    [ComConversionLoss]
    [Guid("482E54A4-F8E8-4C78-8472-1FA890ED1C3A")]
    [TypeLibType(4288)]
    public interface IUIX_Event
    {
        [DispId(1610743808)]
        IntPtr _GetRawPtr();
        [DispId(1610743819)]
        IUIX_Event Clone();
        [DispId(1610743820)]
        tagPOINT get_Pos();
        [DispId(1610743820)]
        void set_Pos(ref tagPOINT stPos);

        [DispId(1610743809)]
        int Code { get; set; }
        [DispId(1610743811)]
        bool Handled { get; set; }
        [ComAliasName("PDFXEdit.PARAM_T")]
        [DispId(1610743813)]
        uint Param1 { get; set; }
        [ComAliasName("PDFXEdit.PARAM_T")]
        [DispId(1610743815)]
        uint Param2 { get; set; }
        [ComAliasName("PDFXEdit.PARAM_T")]
        [DispId(1610743817)]
        uint Result { get; set; }
        [DispId(1610743820)]
        tagPOINT Pos { get; set; }
    }
}
Vasyl-Tracker Dev Team
Site Admin
Posts: 2352
Joined: Thu Jun 30, 2005 4:11 pm
Location: Canada

Re: NEWB Question-Region Selection

Post by Vasyl-Tracker Dev Team »

This is a known 'issue'... When you have a C# project and add a reference to the ActiveX control, the IDE imports all interfaces and types from it. For some internal reasons, an ordinary COM property such as IUIX_Event::Pos can be imported as a pair of simple get_Pos()/set_Pos() functions.
In VB.NET, however, the standard COM importer interprets it as a natural Pos property (with get and set) of the IUIX_Event object.
The IPXC_Page::Box get/set property is a slightly different case: Box is a property with a parameter. So, to get or set its value you need to specify the parameter too (VB.NET):

Code: Select all

Dim page As PDFXEdit.IPXC_Page
Dim curViewBox As PDFXEdit.PXC_Rect
Dim newCropBox As PDFXEdit.PXC_Rect
...
' Box takes the box type as a parameter, both when reading and when writing
curViewBox = page.Box(PDFXEdit.PXC_BoxType.PBox_ViewBox)
page.Box(PDFXEdit.PXC_BoxType.PBox_CropBox) = newCropBox
And while VB.NET supports properties with parameters, C# doesn't! So in C# the only way to use such properties is through the corresponding get/set functions:

Code: Select all

curViewBox = page.get_Box(PDFXEdit.PXC_BoxType.PBox_ViewBox);
page.set_Box(PDFXEdit.PXC_BoxType.PBox_CropBox, ref newCropBox);
Please note: here we are talking about an SDK based on ActiveX technology, which exposes its API through COM interfaces. In accordance with the COM standard, the SDK carries a built-in description of all the public interfaces and features it contains (its type library).
So any programming language that 'understands' COM can automatically import this API description into your project, in terms of your programming language, when you simply add a reference to the SDK. The COM importer on your side (in your IDE) has its own rules for how it imports native COM interfaces from an external SDK into your project and your programming language. This is not controlled by the SDK at all.

But typically it is simpler than it may look:
- in VB.NET: all COM properties are imported as natural properties
- in C#:
a) when a COM property has a simple type (int, double, string, interface), it is imported as a natural C# property
b) when a COM property gets/sets a structure, it is imported as a pair of functions
c) when a COM property has a parameter, it is imported as a pair of functions
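To make the three rules above concrete, here is a small self-contained C# mock that reproduces the shapes quoted in this thread: get_Pos()/set_Pos(ref ...) for a structure-typed property and get_Box(type)/set_Box(type, ref ...) for a property with a parameter. The Mock* types are hypothetical stand-ins, not the real PDFXEdit interop types, so the sketch compiles without the SDK; the last class shows one optional way to wrap a get_/set_ pair in extension methods.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins mirroring the shapes quoted in this thread;
// NOT the real PDFXEdit interop types.
public struct MockPoint { public int x, y; }
public struct MockRect { public double left, bottom, right, top; }
public enum MockBoxType { PBox_CropBox, PBox_ViewBox }

// What an imported interface looks like in C#, per rules (a), (b) and (c).
public interface IMockImported
{
    int Code { get; set; }                               // (a) simple type: C# property
    MockPoint get_Pos();                                 // (b) structure: function pair
    void set_Pos(ref MockPoint pos);
    MockRect get_Box(MockBoxType type);                  // (c) parameter: function pair
    void set_Box(MockBoxType type, ref MockRect rect);
}

// In-memory implementation so the mock can be exercised without the SDK.
public sealed class MockImported : IMockImported
{
    private MockPoint _pos;
    private readonly Dictionary<MockBoxType, MockRect> _boxes = new Dictionary<MockBoxType, MockRect>();

    public int Code { get; set; }
    public MockPoint get_Pos() => _pos;
    public void set_Pos(ref MockPoint pos) => _pos = pos;
    public MockRect get_Box(MockBoxType type) =>
        _boxes.TryGetValue(type, out var r) ? r : default(MockRect);
    public void set_Box(MockBoxType type, ref MockRect rect) => _boxes[type] = rect;
}

// Optional convenience wrappers: C# has no named properties with parameters, but an
// extension method can give a friendlier call site on top of the get_/set_ pair.
public static class MockImportedExtensions
{
    public static MockRect ViewBox(this IMockImported page) =>
        page.get_Box(MockBoxType.PBox_ViewBox);

    public static void SetCropBox(this IMockImported page, MockRect rect) =>
        page.set_Box(MockBoxType.PBox_CropBox, ref rect);
}
```

With the real SDK the call sites keep the same shape as the C# lines quoted above, e.g. `page.get_Box(PDFXEdit.PXC_BoxType.PBox_ViewBox)` and `page.set_Box(PDFXEdit.PXC_BoxType.PBox_CropBox, ref newCropBox)`.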
Lambchop
User
Posts: 34
Joined: Mon May 02, 2022 5:58 pm

Re: NEWB Question-Region Selection

Post by Lambchop »

WOW! Great explanation ... I learned something new on this one. Thank you for all your patience!
Chris - Tracker Supp
Site Admin
Posts: 795
Joined: Tue Apr 14, 2009 11:33 pm

NEWB Question-Region Selection

Post by Chris - Tracker Supp »

:)