Text copied from PDF to Notepad has extra line breaks

Forum for the PDF-XChange Editor - Free and Licensed Versions

Seeker45
User
Posts: 162
Joined: Wed Dec 18, 2013 2:32 pm
Location: Germany

Text copied from PDF to Notepad has extra line breaks

Post by Seeker45 »

Hello,

I have a PDF with a (hidden) text layer. One passage reads as follows:

Die sehr gute Akzeptanz der zweiten Auflage hat die Autoren
dazu bewogen, das Buch erneut inhaltlich zu überarbeiten und
verbliebene Wünsche der Leserschaft zu ergänzen.
When I open the PDF with Acrobat Reader, select the text, copy the text and paste it into Notepad (or Word or Notepad++, etc.), the result is exactly as shown in the original.

When I do the same from PDF-XChange Editor, I get the following result when pasting:

Die 
 sehr gute Akzeptanz 
 der  zweiten Auflage hat  die 
 Autoren 
dazu bewogen, das Buch erneut inhaltlich zu überarbeiten 
 und  
verbliebene Wünsche 
 der  Leserschaft zu ergänzen.
As you see, there are many extra line breaks and also additional spaces. How can this be avoided?
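
As a stopgap, pasted text like the sample above can be cleaned up after the fact. The Python sketch below is a workaround outside PDF-XChange Editor, not a feature of it. It assumes, based only on the sample shown here, that a spurious break is always followed by a line starting with a space, while a genuine line break is followed by a line starting with a letter. The name `clean_pasted_text` is made up for illustration.

```python
import re


def clean_pasted_text(pasted: str) -> str:
    """Merge lines that start with whitespace into the previous line.

    Assumption (from the sample only): a line beginning with a space is the
    continuation of the line before it; any other line is a real new line.
    Empty lines are dropped, so paragraph breaks are not preserved.
    """
    merged: list[str] = []
    for line in pasted.splitlines():
        if not line.strip():
            continue  # skip empty or whitespace-only lines
        if merged and line[:1].isspace():
            merged[-1] += " " + line  # continuation of the previous line
        else:
            merged.append(line)  # genuine new line
    # Collapse runs of spaces and trim each rebuilt line.
    return "\n".join(re.sub(r"\s+", " ", text).strip() for text in merged)
```

Applied to the eight pasted lines above, this gives back the three lines of the original passage. Other files may not follow the same pattern, so the heuristic should be checked against each document.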

Thank you!

Ralf
Will - Tracker Supp
Site Admin
Posts: 6815
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Re: Text copied from PDF to Notepad has extra line breaks

Post by Will - Tracker Supp »

Hi Seeker45,

Thanks for the post - could you please send us the PDF to take a look at? I'm having trouble reproducing this here.

If you don't want the PDF available to the general public, could you send it to
support@pdf-xchange.com, with a link back to this topic?

Cheers,
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.

Best regards

Will Travaglini
Tracker Support (Europe)
Tracker Software Products Ltd.
http://www.tracker-software.com

Re: Text copied from PDF to Notepad has extra line breaks

Post by Seeker45 »

Hi Will,

you have mail :D

In fact, I have other files where PDF-XChange Editor behaves normally, i.e., adds no line breaks, but so far I have had some files that show this odd behaviour in PDF-XChange Editor, while Acrobat Reader copies the text correctly.

All the best
Ralf
Tracker Supp-Stefan
Site Admin
Posts: 17907
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: Text copied from PDF to Notepad has extra line breaks

Post by Tracker Supp-Stefan »

Thanks for the sample file Ralf,

A colleague of mine from the dev team made a preliminary check on it and asked me to make a ticket in our internal system:
RT#2214: Editor 3.0.307.1: New line breaks and extra spaces when copying text from some files.
This is so that we can investigate it in detail and fix it in a future build. We will post back in this topic when there is any further news on the ticket.

Regards,
Stefan

Re: Text copied from PDF has extra line breaks and spaces

Post by Seeker45 »

Hi,

Here is another example where PDF-XChange Editor adds spaces. You will see what I mean when you copy the line beginning with "Document Description:"

This is the result with PDF-XChange Editor:
D o c u m e n t D e scri ptio n: A f te r F i n a l C o n s i d e r a t i o n P i l o t P r o g r a m R e q u e s t

This is the result with Acrobat Reader:
Document Description: After Final Consideration Pilot Program Request

Looking forward to seeing a fix soon. Thank you.

Ralf
Attachments
AFCP_Request.pdf
(213.59 KiB) Downloaded 147 times

Re: Text copied from PDF to Notepad has extra line breaks

Post by Tracker Supp-Stefan »

Hi Ralf,

Thanks for the sample. It will certainly be useful when working on the ticket. I also just tested text selection in this file with other products, including our Viewer, and they all had some issues with its content.

Regards,
Stefan
steveawa
User
Posts: 2
Joined: Sun Mar 30, 2014 11:56 pm

Re: Text copied from PDF to Notepad has extra line breaks

Post by steveawa »

Do you need more examples of this?

It is happening for me with most journal articles that I download as pdfs.

I have attached an example:

PDF-XChange viewer gives:
Serotonin (5-HT) is a neurotransmitter that influences a broad
range of physiological processes and behaviors, including pain,
mood, cardiovascular function, and food intake.

PDF-XChange editor gives:
S erotonin (5-H T) is a n eurotra nsmitter that i nflu ences a b road
range o f physio logica l p ro ce ss e s and beha vio rs , i ncludi ng pa i n,
mood, c ar dio vasc ula r f un ctio n, a nd fo od i ntake.
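
The thread does not state what causes these spaces. As general background on PDF text extraction: a PDF often positions glyphs individually and stores no space characters, so an extractor has to infer spaces from the horizontal gaps between glyphs. The Python sketch below shows how a gap threshold that is too small can turn ordinary letter spacing into spurious spaces. The `Glyph` type, the `join_glyphs` function, the `gap_ratio` parameter and all coordinates are invented for illustration and do not describe the Editor's actual logic.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge of the glyph, in points
    width: float  # advance width of the glyph, in points


def join_glyphs(glyphs: list[Glyph], font_size: float, gap_ratio: float = 0.25) -> str:
    """Insert a space only where the gap between two glyphs exceeds
    gap_ratio * font_size (a hypothetical heuristic)."""
    if not glyphs:
        return ""
    out = [glyphs[0].char]
    for prev, cur in zip(glyphs, glyphs[1:]):
        gap = cur.x - (prev.x + prev.width)
        if gap > gap_ratio * font_size:
            out.append(" ")
        out.append(cur.char)
    return "".join(out)


if __name__ == "__main__":
    # Made-up positions: loose letter spacing inside "and", a real gap before "fo".
    glyphs = [
        Glyph("a", 0.0, 5.0),
        Glyph("n", 6.0, 5.0),
        Glyph("d", 11.3, 5.0),
        Glyph("f", 20.0, 3.0),
        Glyph("o", 23.2, 5.0),
    ]
    print(join_glyphs(glyphs, font_size=10.0, gap_ratio=0.25))  # and fo
    print(join_glyphs(glyphs, font_size=10.0, gap_ratio=0.05))  # a nd fo
```

With the smaller ratio, the slightly wider gap after "a" is already treated as a word boundary, which resembles the pattern in the Editor output above.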

In case it makes a difference, I am working on a high dpi display (Surface Pro 2)

Steve
Attachments
16530.full.pdf
(1.29 MiB) Downloaded 136 times

Re: Text copied from PDF to Notepad has extra line breaks

Post by Will - Tracker Supp »

Hi Steve,

Thanks for the post and sample - I've reproduced the issue with this file and have uploaded it to ticket 2214.

Cheers,
bqxmprij
User
Posts: 162
Joined: Tue Dec 18, 2012 3:51 am

Re: Text copied from PDF to Notepad has extra line breaks

Post by bqxmprij »

I think these issues are related.

https://forum.pdf-xchange.com/ ... 62&t=19385

I look forward to this being fixed since I copy text from pdfs daily.

Re: Text copied from PDF to Notepad has extra line breaks

Post by Will - Tracker Supp »

Hi bqxmprij,

I'm a little on the fence as to whether they are or aren't. There were some options implemented that generally help with this, but they don't seem to help here, which is why I created a new ticket.

Cheers,

Re: Text copied from PDF to Notepad has extra line breaks

Post by Seeker45 »

I am not sure that I saw a corresponding entry in the release notes, but, obviously, I tested the new 308 build with regard to this issue. With the 308 build, copying the text from the AFCP_Request.pdf document now produces:

Document Description: After Final Consideration Pilot Program Request

whereas build 307.2 produced:

D o c u m e n t D e scri ptio n: A f te r F i n a l C o n s i d e r a t i o n P i l o t P r o g r a m R e q u e s t

So, this part of the problem seems to be resolved.

The initial issue is not resolved yet; however, the behaviour is slightly less problematic:

Die sehr gute Akzeptanz der zweiten Auflage hat die Autoren 
dazu bewogen, das Buch erneut inhaltlich zu überarbeiten 

 und 
verbliebene Wünsche Leserschaft zu ergänzen.
Unfortunately, I noted another significant problem, which I will post in a separate thread.

Please keep me posted when this second issue is resolved. Thanks.

Cheers
Ralf

Re: Text copied from PDF to Notepad has extra line breaks

Post by Tracker Supp-Stefan »

Hi Ralf,

Did you try all of the "Edit -> Preferences -> Page Text -> Copy White Spaces Mode" options?

Regards,
Stefan

Re: Text copied from PDF to Notepad has extra line breaks

Post by Tracker Supp-Stefan »

Hello All,

Just a follow up to let you know that ticket RT#2214 is now resolved, and the fix will be included in build 310 of the Editor.

Regards,
Stefan

Re: Text copied from PDF to Notepad has extra line breaks

Post by Seeker45 »

Glad to hear that things are moving forward. I do not know when (after the release) we will go to 310, but rest assured that I will test it :lol:

Cheers
Ralf

Re: Text copied from PDF to Notepad has extra line breaks

Post by Will - Tracker Supp »

Hi Ralf,

Absolutely - do keep us posted on the results :)

Re: Text copied from PDF to Notepad has extra line breaks

Post by Seeker45 »

Hi,

I have sent you an email. This issue may still be present.

Cheers
Ralf

Re: Text copied from PDF to Notepad has extra line breaks

Post by Tracker Supp-Stefan »

Hi Ralf,

Thanks for the file. As discussed by e-mail, the text is actually spread over two separate objects. I've passed the e-mail to my colleagues in the dev team for further investigation.

Regards,
Stefan

Re: Text copied from PDF to Notepad has extra line breaks

Post by Seeker45 »

Hi,

I continue to see this issue in the latest version, 5.5.316.0. In fact, the situation is still as described in my original post, where PDF-XChange Editor adds line breaks while Acrobat Reader correctly treats the text as one line. In particular, why are line breaks inserted between words?

I just wanted to check that this ticket is still open. Thank you.

Best regards
Ralf

Re: Text copied from PDF to Notepad has extra line breaks

Post by Tracker Supp-Stefan »

Hi Ralf,

Is the problem still with the AFCP_Request.pdf file you posted above?
It seems that the contents of that file are broken down into quite small elements (each word and each space is a separate object).

When I right-click selected text and choose to copy it as rich text, it pastes without the line breaks in Word (but with line breaks in Notepad). There also seem to be two characters/spaces at the end of each line in the PDF file.

Attached is a test Word document where I pasted some parts of the first page - you can see there is a double space between Pilot and Program - but the two words are not on separate lines.

Regards,
Stefan
Attachments
test.zip
(10.88 KiB) Downloaded 87 times

Re: Text copied from PDF to Notepad has extra line breaks

Post by Seeker45 »

Hi Stefan,

I checked the "extra spaces" issue with AFCP_Request.pdf again, and I do not see this issue anymore. So this seems to have been solved.

What I was referring to is the original post about the "extra line breaks" which I continue to see.

Thank you.

Cheers
Ralf

Re: Text copied from PDF to Notepad has extra line breaks

Post by Tracker Supp-Stefan »

Hi Ralf,

There's only the AFCP_Request.pdf file in this topic so I tested with it.
Each word in that is a separate object - so the Editor does not know that they are one single block of text. I will check with our devs and see if there is any logic we can apply in such scenarios so that the copied text does not have line breaks in it.
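
To illustrate the kind of logic meant here, the following Python sketch groups separate word objects into lines by comparing their baseline positions. It is a rough illustration under stated assumptions, not the Editor's implementation: the `TextObject` type, the `group_into_lines` function, the `y_tolerance` value and the coordinates in the example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextObject:
    text: str
    x: float  # left edge of the object, in points
    y: float  # baseline of the object, in points (PDF y grows upward)


def group_into_lines(objects: list[TextObject], y_tolerance: float = 2.0) -> list[str]:
    """Treat objects whose baselines lie within y_tolerance of the first
    object of a line as part of that line, then order each line left to right."""
    lines: list[list[TextObject]] = []
    # Higher y comes first, which gives top-to-bottom reading order.
    for obj in sorted(objects, key=lambda o: -o.y):
        if lines and abs(lines[-1][0].y - obj.y) <= y_tolerance:
            lines[-1].append(obj)
        else:
            lines.append([obj])
    result = []
    for line in lines:
        # Skip objects that contain only a space; join the rest with single spaces.
        words = [o.text.strip() for o in sorted(line, key=lambda o: o.x) if o.text.strip()]
        result.append(" ".join(words))
    return result


if __name__ == "__main__":
    objects = [
        TextObject("sehr", 30.0, 700.4),
        TextObject("Die", 0.0, 700.0),
        TextObject(" ", 25.0, 700.0),
        TextObject("gute", 60.0, 699.8),
        TextObject("bewogen,", 35.0, 686.2),
        TextObject("dazu", 0.0, 686.0),
    ]
    for line in group_into_lines(objects):
        print(line)
    # Die sehr gute
    # dazu bewogen,
```

A break is emitted only when the baseline changes by more than the tolerance, so words that sit on the same visual line stay together even though each one is its own object.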

Regards,
Stefan

Re: Text copied from PDF to Notepad has extra line breaks

Post by Seeker45 »

Hi Stefan,

I am resending the file which I sent on Jan 14, 2014. Please attach this file to the pending ticket. Thank you.

Also, perhaps you misunderstood me: The "extra spaces" problem does not seem to be present anymore. Are you still seeing this? It is the "extra line breaks" problem that still exists to which I referred in my original post and for which I have resent the corresponding file.

Cheers
Ralf

Re: Text copied from PDF to Notepad has extra line breaks

Post by Tracker Supp-Stefan »

Hi Ralf,

Thanks - got the file and am checking it with a colleague from the dev team at the moment.

Regards,
Stefan

Re: Text copied from PDF to Notepad has extra line breaks

Post by Tracker Supp-Stefan »

Hello Ralf,

After checking the file, I've made a new ticket (the old one is resolved, and the files from it are different from the last one you sent): #3390
We are working on getting this file's text to copy and paste better in future builds, and I will update this forum topic as soon as there is any news on the subject.

Regards,
Stefan

Re: Text copied from PDF to Notepad has extra line breaks

Post by Seeker45 »

Hi Stefan,

Thank you for creating the ticket.

And, just to make sure that we are on the same page: the file I sent yesterday is identical to the file that I sent on Jan 14, 2014, to which my original post relates. Since a number of files have been associated with this thread, perhaps this original file was not attached to this case before. But, again, the original file is identical to the one you have now attached, so the "intermediate state" shown here still applies, in case this hint helps.

Cheers
Ralf

Re: Text copied from PDF to Notepad has extra line breaks

Post by Tracker Supp-Stefan »

Hi Ralf,

Yes, it is the same file, but its issue is different from the one in the files for the earlier ticket discussed in this topic. That's why a separate ticket was needed for yours.

Regards,
Stefan

Re: Text copied from PDF to Notepad has extra line breaks

Post by Tracker Supp-Stefan »

Hi all,

Ticket #3390 was resolved in build 318.1. Updating to this build will resolve the issue.

Regards,
Stefan

Re: Text copied from PDF to Notepad has extra line breaks

Post by Seeker45 »

Hi Stefan,

I am afraid I cannot confirm that. I just tried it with the latest version. This is the result:

Die 

 sehr gute Akzeptanz 

 der zweiten Auflage hat die 

 Autoren 
dazu bewogen, das Buch erneut inhaltlich zu überarbeiten 

 und 

verbliebene Wünsche 

 der Leserschaft zu ergänzen.
Please see my first post for a comparison.

Cheers
Ralf

Re: Text copied from PDF to Notepad has extra line breaks

Post by Tracker Supp-Stefan »

Hi Ralf,

Thanks. Indeed I am also able to reproduce the issue, but my colleagues from the dev team say that it works OK for them with a development version of what will become build 319. I am now asking them for an update once again so that we can make sure this will get properly fixed for 319.

Regards,
Stefan

Re: Text copied from PDF to Notepad has extra line breaks

Post by Tracker Supp-Stefan »

Hi All,

Just a follow-up: ticket #3390 is now confirmed to be fixed for build 319. Please let me know if you still experience the issue with the extra line breaks once that build is released.

Regards,
Stefan