Feature Request: remove hard returns from copied text

gollux
User
Posts: 6
Joined: Mon Jan 06, 2020 12:15 am

Feature Request: remove hard returns from copied text

Post by gollux » Mon Jan 06, 2020 12:21 am

When copying and pasting text from the PDF-XChange Editor, each line of text in the PDF is inserted with a hard return (as if each line were a new paragraph). This means that using copied text requires manually deleting each hard return to create continuous text. This is cumbersome and seems unnecessary.

In an earlier post, Timur shared an AutoHotkey script that removes hard returns from copied text:

viewtopic.php?f=62&t=30795&p=135780&hil ... ks#p135780

I have also found this freestanding program, which does the same:

http://www.onehourprogramming.com/blog/ ... -pdfs.html

(However, the freestanding program inserts some errors into copied text.)

I would like to suggest that this feature should be incorporated into the PDF-XChange Editor itself.

Tracker Supp-Stefan
Site Admin
Posts: 14095
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Feature Request: remove hard returns from copied text

Post by Tracker Supp-Stefan » Mon Jan 06, 2020 2:40 pm

Hello gollux,

Please try switching to Edit Text mode. In this mode we try to determine continuous areas of text, so copying and pasting text should work better, with fewer 'hard returns'.

I have a sample file where pretty much each word is a separate object, and still with that file the above mentioned way of selecting the text does work better than the normal selection tool.

I will check with our devs if the same logic can be applied when copying text with the normal text select tool!

Regards,
Stefan

Timur Born
User
Posts: 702
Joined: Tue Jun 26, 2012 1:50 pm

Re: Feature Request: remove hard returns from copied text

Post by Timur Born » Tue Jan 07, 2020 12:42 pm

Using "Edit Text" works in part, but it is only usable on unlocked PDF files that allow editing. It does remove line breaks at the end of a column, but it does not remove those caused by text wrapping around images. Furthermore, it removes only some hyphen "-" breaks within words and leaves others behind.

Here is a comparison of text wrapped around an image, except for the headline and first line of text. The headline is special in this example. Only Acrobat Reader gets the upper vs. lower case correct. Furthermore my AHK script removes the line-break that should stay intact, because it cannot differentiate between headline and text-body.

Select Text

erzfeInde und bevorzuGtes Gelände
Eine gute Wahl für Erzfeinde in diesem Abenteuerpfad
wären Aberrationen und Humanoide (Menschen),
gefolgt von Externare (Böse) und Untote und
schlussendlich Magische Bestien und Schlicke.
Dies deckt zwar nicht alle Monster und Her-
ausforderungen ab, mit denen es dein Cha-
rakter zu tun bekommen wird, wohl aber die
häufigsten Gegner, welche auftreten werden.
Ein guter Teil der Handlung spielt in städ-
tischer Umgebung, so dass diese Option
sich bei Bevorzugtem Gelände als nütz-
lich erweisen dürfte. Die SC werden aber
mit dem Boot unterwegs sein und später
Wüsten bereisten, so dass auch Aquatisch
und Wüste gute Optionen sind.

Edit Text

erzfeInde und bevorzuGtes Gelände
Eine gute Wahl für Erzfeinde in diesem Abenteuerpfad wären Aberrationen und Humanoide (Menschen),
gefolgt von Externare (Böse) und Untote und
schlussendlich Magische Bestien und Schlicke.
Dies deckt zwar nicht alle Monster und Herausforderungen ab, mit denen es dein Cha-
rakter zu tun bekommen wird, wohl aber die
häufigsten Gegner, welche auftreten werden.
Ein guter Teil der Handlung spielt in städ-
tischer Umgebung, so dass diese Option
sich bei Bevorzugtem Gelände als nütz-
lich erweisen dürfte. Die SC werden aber
mit dem Boot unterwegs sein und später
Wüsten bereisten, so dass auch Aquatisch
und Wüste gute Optionen sind.

Acrobat Reader

Erzfeinde und Bevorzugtes Gelände
Eine gute Wahl für Erzfeinde in diesem Abenteuerpfad
wären Aberrationen und Humanoide (Menschen),
gefolgt von Externare (Böse) und Untote und
schlussendlich Magische Bestien und Schlicke.
Dies deckt zwar nicht alle Monster und Herausforderungen
ab, mit denen es dein Charakter
zu tun bekommen wird, wohl aber die
häufigsten Gegner, welche auftreten werden.
Ein guter Teil der Handlung spielt in städtischer
Umgebung, so dass diese Option
sich bei Bevorzugtem Gelände als nützlich
erweisen dürfte. Die SC werden aber
mit dem Boot unterwegs sein und später
Wüsten bereisten, so dass auch Aquatisch
und Wüste gute Optionen sind.

AutoHotkey script

erzfeInde und bevorzuGtes Gelände Eine gute Wahl für Erzfeinde in diesem Abenteuerpfad wären Aberrationen und Humanoide (Menschen), gefolgt von Externare (Böse) und Untote und schlussendlich Magische Bestien und Schlicke. Dies deckt zwar nicht alle Monster und Herausforderungen ab, mit denen es dein Charakter zu tun bekommen wird, wohl aber die häufigsten Gegner, welche auftreten werden.
Ein guter Teil der Handlung spielt in städtischer Umgebung, so dass diese Option sich bei Bevorzugtem Gelände als nützlich erweisen dürfte. Die SC werden aber mit dem Boot unterwegs sein und später Wüsten bereisten, so dass auch Aquatisch und Wüste gute Optionen sind.
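
For readers who want to see the general idea, here is a minimal Python sketch of this kind of clean-up. It is not the AHK script from the linked post; the function name `unwrap` and its rules (blank lines separate paragraphs, and a trailing "-" followed by a lower-case letter is treated as a wrapped word) are assumptions made for illustration only.

```python
import re


def unwrap(text: str) -> str:
    """Join hard-wrapped lines; blank lines are kept as paragraph breaks."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    result = []
    for para in paragraphs:
        lines = [ln.strip() for ln in para.splitlines() if ln.strip()]
        joined = ""
        for line in lines:
            if joined.endswith("-") and line[:1].islower():
                # "Her-" + "ausforderungen" is treated as one wrapped word.
                joined = joined[:-1] + line
            elif joined:
                joined += " " + line
            else:
                joined = line
        result.append(joined)
    return "\n".join(result)


sample = (
    "Dies deckt zwar nicht alle Monster und Her-\n"
    "ausforderungen ab, mit denen es dein Cha-\n"
    "rakter zu tun bekommen wird"
)
print(unwrap(sample))
```

Like the AHK result above, such a simple rule also merges a headline into the following body text, because nothing in the plain text marks the first line as a headline.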

TrackerSupp-Daniel
Site Admin
Posts: 3794
Joined: Wed Jan 03, 2018 6:52 pm

Re: Feature Request: remove hard returns from copied text

Post by TrackerSupp-Daniel » Tue Jan 07, 2020 11:53 pm

Hello Timur,

Thank you for the examples. Might I ask if any of these examples you have provided is "correct"?

I can understand the request itself, but any way that we implement this would either result in something identical to the AHK script, which you have said is incorrect, or would have to consider too many variables to be correct in most situations.

The end result is that it is always better to manually correct issues like this, as any automated method is likely to give errors that will need to be addressed regardless.

Kind regards,
Daniel McIntyre
Support Technician
Tracker Software Products (Canada) LTD

Support: <Support@tracker-software.com>
Sales: +1 (250) 324-1621
Fax: +1 (250) 324-1623

Timur Born
User
Posts: 702
Joined: Tue Jun 26, 2012 1:50 pm

Re: Feature Request: remove hard returns from copied text

Post by Timur Born » Wed Jan 08, 2020 9:15 am

The headline is correct in the Acrobat example. The removal of "-" hyphens is correct in the Acrobat and AHK examples. Except for the headline, the AHK example is the most correct one. So if Editor is able to distinguish headlines from body text, then AHK plus the correct headline would be the best version.

Timur Born
User
Posts: 702
Joined: Tue Jun 26, 2012 1:50 pm

Re: Feature Request: remove hard returns from copied text

Post by Timur Born » Wed Jan 08, 2020 9:38 am

How does Acrobat correct the case of the headline characters? It correctly copies the first character of the words as upper-case and keeps the middle characters lower-case. Do they use a dictionary for copy & paste? Editor copies the text as present in the content:
grafik.png

Tracker Supp-Stefan
Site Admin
Posts: 14095
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Feature Request: remove hard returns from copied text

Post by Tracker Supp-Stefan » Wed Jan 08, 2020 11:53 am

Hello Timur,

When I brought this to our devs' attention, they asked me to create this ticket:
#5059: FR: Copy Text Option: Add carriage-return symbols only between visual paragraphs.
So we will be looking at making improvements in the future.

As for how Adobe do it - they probably also apply custom logic to try and recognize continuous blocks of text.
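
To make "visual paragraphs" concrete, here is a rough Python sketch of one possible heuristic: start a new paragraph only where the vertical gap between two lines is clearly larger than the usual line spacing. This is not how Editor or Acrobat works internally; the `Line` record, its `top` coordinate, and the `factor` threshold are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Line:
    text: str
    top: float  # hypothetical: distance from the top of the page, in points


def group_paragraphs(lines, factor=1.4):
    """Join lines into paragraphs, breaking only at unusually large gaps."""
    if not lines:
        return []
    gaps = [b.top - a.top for a, b in zip(lines, lines[1:])]
    typical = median(gaps) if gaps else 0.0
    paragraphs = [[lines[0].text]]
    for gap, line in zip(gaps, lines[1:]):
        if gap > factor * typical:
            paragraphs.append([line.text])  # large gap: new visual paragraph
        else:
            paragraphs[-1].append(line.text)  # normal spacing: same paragraph
    return [" ".join(p) for p in paragraphs]


page = [
    Line("a", 100), Line("b", 112), Line("c", 124),
    Line("d", 148), Line("e", 160),
]
print(group_paragraphs(page))
```

A real implementation would also have to look at columns, indentation and text wrapped around images, which this sketch ignores.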

Regards,
Stefan

Timur Born
User
Posts: 702
Joined: Tue Jun 26, 2012 1:50 pm

Re: Feature Request: remove hard returns from copied text

Post by Timur Born » Wed Jan 08, 2020 1:17 pm

Keeping columns (CR at right column border) intact for copy & paste can be beneficial, too. So it's not a clear right or wrong thing. I am happy with my AHK solution, because it gives me both options.

I still wonder about the upper- vs. lower-case differences in the headline. Here is the original headline:

grafik.png
Why does Editor interpret the "I" and "G" in the middle of the words as upper-case and the first character of the words as lower-case?

TrackerSupp-Daniel
Site Admin
Posts: 3794
Joined: Wed Jan 03, 2018 6:52 pm

Re: Feature Request: remove hard returns from copied text

Post by TrackerSupp-Daniel » Wed Jan 08, 2020 6:23 pm

Hello Timur,

Most likely these letters are capitalized, and the rest are not, in the original text you copied from. I cannot say for certain without seeing the original source, of course, but as a test, copy this text and paste it into Notepad. The capitalization should be the same as when pasting into our software. If not, please send us a copy of the file or webpage you are copying this from, and we will see what can be done.

Kind regards,
Daniel McIntyre

Timur Born
User
Posts: 702
Joined: Tue Jun 26, 2012 1:50 pm

Re: Feature Request: remove hard returns from copied text

Post by Timur Born » Wed Jan 08, 2020 6:40 pm

According to Editor's Content pane the original content has that strange capitalization, see post:

viewtopic.php?f=62&t=33670#p139405

According to content and Editor's copy & paste the original is (wrong):

erzfeInde und bevorzuGtes Gelände


But according to the rendered PDF (see screenshot earlier) and Adobe's copy & paste it is (correct):

Erzfeinde und Bevorzugtes Gelände

So for some reason the rendered content in Editor is different to the content pane, albeit practically correct. But the copied content is the same as the content pane, albeit practically incorrect.

TrackerSupp-Daniel
Site Admin
Posts: 3794
Joined: Wed Jan 03, 2018 6:52 pm

Re: Feature Request: remove hard returns from copied text

Post by TrackerSupp-Daniel » Wed Jan 08, 2020 6:52 pm

Hello Timur,

A screenshot is not helpful in this situation. Without the original source that you are copying this content from, we cannot analyze what is happening or why it would be interpreted like that. As you have shown in the content pane, the I and G in the middle of those words are indeed capitalized as far as we can tell. Why this is, however, cannot be determined without investigating the original source file you copied this text from. Regardless of how Adobe or any competitors handle it, there is nothing that we can do without an example to test with.

Kind regards,
Daniel McIntyre


TrackerSupp-Daniel
Site Admin
Posts: 3794
Joined: Wed Jan 03, 2018 6:52 pm

Re: Feature Request: remove hard returns from copied text

Post by TrackerSupp-Daniel » Wed Jan 08, 2020 8:33 pm

Hello Timur,

Thank you for the sample file. As expected, the text is actually formatted like this in the original document:
image.png
This test was done completely separately from our products: I opened the document in both Edge and Firefox, copied the text from there, then pasted it into Notepad.

While we will be working on the ticket that Stefan mentioned to handle return characters, it is unlikely that we will be changing this capital letter handling, as respecting the existing capital state of pasted text is rather critical.

Kind regards,
Daniel McIntyre

Timur Born
User
Posts: 702
Joined: Tue Jun 26, 2012 1:50 pm

Re: Feature Request: remove hard returns from copied text

Post by Timur Born » Wed Jan 08, 2020 9:43 pm

Let's ignore Adobe's handling for a moment, but how does Editor interpret the underlying text capitalization into the different capitalization rendered on screen? Are the fonts different for the first letters of "E rzfeinde" and "B evorzugtes"?

It's a bit confusing to get different copy & paste results from what is rendered on screen, albeit I recognize that this is a special case of the rendered words being all capitalized anyway.

TrackerSupp-Daniel
Site Admin
Posts: 3794
Joined: Wed Jan 03, 2018 6:52 pm

Re: Feature Request: remove hard returns from copied text

Post by TrackerSupp-Daniel » Wed Jan 08, 2020 10:26 pm

Hello Timur,

As the document is locked, I cannot check this personally, but from my understanding (especially with fonts that appear in full caps regardless of the capitalization) this is often the case: either the fonts themselves are different, or the character size is different (sometimes both). Regardless, I have forwarded this to our Dev team for further investigation (they have the tech to get into the nitty-gritty details of the document and find out what is going on for you).

If this turns out to be something we are doing wrong, you can rest assured they will be fixing it in an upcoming build.

Kind regards,
Daniel McIntyre

Lzcat - Tracker Supp
Site Admin
Posts: 720
Joined: Thu Jun 28, 2007 8:42 am

Re: Feature Request: remove hard returns from copied text

Post by Lzcat - Tracker Supp » Thu Jan 09, 2020 7:36 am

Hi Timur.
It is very simple:
image.png
As you can see, these are not capital letters, just a font with small-caps letters that uses a larger size for the "capitals".
HTH.
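
As an illustration of how a viewer could recover sensible case from such content, here is a small Python sketch. It assumes each glyph of the small-caps run is available as a (character, font size) pair and treats the larger glyphs as capitals and all others as lower case. The function name `recover_case` and the sizes 12.0 and 9.0 are invented for this example; whether Acrobat does anything like this is not known from this thread.

```python
def recover_case(glyphs):
    """glyphs: (char, font_size) pairs from one small-caps run (hypothetical)."""
    sizes = [size for ch, size in glyphs if ch.isalpha()]
    if not sizes:
        return "".join(ch for ch, _ in glyphs)
    body = min(sizes)  # the smallest size is assumed to be the "lower-case" size
    out = []
    for ch, size in glyphs:
        if ch.isalpha():
            out.append(ch.upper() if size > body else ch.lower())
        else:
            out.append(ch)
    return "".join(out)


stored = "erzfeInde und bevorzuGtes Gelände"
large = {0, 14, 26}  # assumed positions of the glyphs drawn at the larger size
glyphs = [(c, 12.0 if i in large else 9.0) for i, c in enumerate(stored)]
print(recover_case(glyphs))
```

The stored character case is ignored entirely here, which is exactly why such a correction is risky for normal text and would have to be limited to small-caps runs.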
Victor
Tracker Software
Project manager


Timur Born
User
Posts: 702
Joined: Tue Jun 26, 2012 1:50 pm

Re: Feature Request: remove hard returns from copied text

Post by Timur Born » Thu Jan 09, 2020 11:43 am

Thank you Victor,

so the content was just created in a stupid way with no regard for possible copy & paste usage. It is still interesting that Acrobat Reader corrects the case automatically upon text copy. This suggests that they may be using a dictionary for cleaning up copied text, likely also to get rid of those "-" hyphenated word-wraps correctly.

I will keep my AHK script and decide on a per-use basis whether to copy from Editor or Acrobat.
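
A dictionary-based clean-up of that kind could look roughly like the following Python sketch. It only illustrates the guess above: the function name `merge_fragments` and the tiny `vocabulary` set are hypothetical, and nothing here is known about what Acrobat actually does.

```python
def merge_fragments(left: str, right: str, vocabulary: set) -> str:
    """Join a line-end fragment such as "Her-" with the start of the next line.

    The hyphen is dropped only if the joined word is in the vocabulary;
    otherwise it is kept, as in genuine compounds like "Nord-Süd".
    """
    stem = left[:-1] if left.endswith("-") else left
    joined = stem + right
    if joined.lower() in vocabulary:
        return joined
    return left + right


vocabulary = {"herausforderungen", "charakter"}  # stand-in for a real word list
print(merge_fragments("Her-", "ausforderungen", vocabulary))
print(merge_fragments("Nord-", "Süd", vocabulary))
```

Without a word list, a script can only guess from the letter case after the hyphen, which is simpler but will occasionally join words that should keep their hyphen.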

Tracker Supp-Stefan
Site Admin
Posts: 14095
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Feature Request: remove hard returns from copied text

Post by Tracker Supp-Stefan » Thu Jan 09, 2020 12:15 pm

Hello Timur,

Yes - it would seem like the file was created in a weird way!
Adobe apparently apply some uppercase and hyphenation correction logic of that kind!

Regards,
Stefan

Timur Born
User
Posts: 702
Joined: Tue Jun 26, 2012 1:50 pm

Re: Feature Request: remove hard returns from copied text

Post by Timur Born » Thu Jan 09, 2020 12:17 pm

I am a bit confused that Editor does remove some hyphenations, but seems to leave most intact. In the above example only the word "Her-ausforderungen" is corrected by Editor, but not the other words. Fortunately the AHK script takes care of this.

Tracker Supp-Stefan
Site Admin
Posts: 14095
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Feature Request: remove hard returns from copied text

Post by Tracker Supp-Stefan » Thu Jan 09, 2020 12:31 pm

Hello Timur,

Maybe there was something specific for that particular word. Can you include a sample page from the original file so that I can try copying it here at my end?

Cheers,
Stefan

Timur Born
User
Posts: 702
Joined: Tue Jun 26, 2012 1:50 pm

Re: Feature Request: remove hard returns from copied text

Post by Timur Born » Thu Jan 09, 2020 1:11 pm

The original is protected, so I "cannot" extract single pages, but I will take a look at the content on my end. I noticed this in another part of the text, too.

Timur Born
User
Posts: 702
Joined: Tue Jun 26, 2012 1:50 pm

Re: Feature Request: remove hard returns from copied text

Post by Timur Born » Thu Jan 09, 2020 1:14 pm

I don't see anything special about it compared to the next hyphenated word?!
grafik.png
