Empty paragraphs are pasted in with an extra <br> when copied from Google Docs #1511

Open
ChiriVulpes opened this issue Feb 18, 2025 · 6 comments

Comments

@ChiriVulpes

I don't actually know whether this is specific to Google Docs's copy format or whether it would always occur for empty paragraphs like this.

To reproduce:

  1. Open this document https://docs.google.com/document/d/1eALoiE4ufLdYqKaGYfHrT3cnVHpYW-o1rUzHs2Q1yFY/edit
  2. Ctrl + A, Ctrl + C
  3. Paste into the example editor https://prosemirror.net/
  4. The resulting text is paragraph 1, followed by a paragraph containing two blank lines, followed by paragraph 2

@marijnh
Member

marijnh commented Feb 18, 2025

The HTML we get in this case looks, if I remove the attributes, like this:

<p><span>Paragraph 1</span></p><br><p><span>Paragraph 2</span></p>

ProseMirror assumes the stray <br> actually stands for a line break, so it parses it into a hard_break node. That node must occur in an inline context, so it gets wrapped in a parent paragraph. There is a hack in ProseMirror's parser that drops <br> nodes at the end of their parent block, since those are typically used as placeholders in empty textblocks. But the way Google Docs replaces the entire empty paragraph with a loose break node isn't a common style, and it is hard to distinguish from situations where the break node is intended to be part of the document.
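
To make the placeholder rule above concrete, here is a simplified model of it (a sketch on a toy tree, not ProseMirror's actual parser code). It assumes the rule is "a <br> that is the last child of a non-inline parent is a placeholder"; the ToyNode type, the abbreviated INLINE_PARENTS list and isPlaceholderBreak are all hypothetical names for illustration.

```typescript
// Toy stand-in for DOM nodes; a real implementation would inspect browser DOM nodes.
interface ToyNode {
  name: string; // lower-case tag name, or "#text"
  children: ToyNode[];
}

// Assumed, abbreviated list of inline elements: a trailing <br> inside one of
// these is still treated as a real line break.
const INLINE_PARENTS = new Set(["a", "b", "em", "i", "span", "strong", "u"]);

// True when the <br> at `index` in `parent` looks like an empty-textblock
// placeholder: it is the last child, and its parent is not an inline element.
function isPlaceholderBreak(parent: ToyNode, index: number): boolean {
  const node = parent.children[index];
  if (!node || node.name !== "br") return false;
  const isLast = index === parent.children.length - 1;
  return isLast && !INLINE_PARENTS.has(parent.name);
}

const p = (...children: ToyNode[]): ToyNode => ({ name: "p", children });
const br = (): ToyNode => ({ name: "br", children: [] });
const textNode = (): ToyNode => ({ name: "#text", children: [] });

// Conventional empty paragraph: <p><br></p>
const emptyPara = p(br());
// Shape from this issue: <p>Paragraph 1</p><br><p>Paragraph 2</p>
const clipboardBody: ToyNode = { name: "body", children: [p(textNode()), br(), p(textNode())] };

console.log(isPlaceholderBreak(emptyPara, 0)); // true
console.log(isPlaceholderBreak(clipboardBody, 1)); // false: a <p> follows it
```

In this model the loose <br> is never the last child of anything, so it is kept as a real break, which matches the behavior reported in this issue.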

@ChiriVulpes
Author

ChiriVulpes commented Feb 18, 2025

Hmm. Google at it again... This is going to come up a lot for the community using my application; lots of authors copy-paste from GDocs. But if that's not common usage for ProseMirror overall, maybe it should be fixed on my end? That would be transformPastedHTML, correct?

Edit: I went ahead and just fixed it on my end. Feel free to do what you like with this issue.
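
A fix on the application side could look roughly like the sketch below. Only transformPastedHTML itself is a real prosemirror-view editor prop; wrapLooseBreaks and its regex are hypothetical, and the sketch assumes (not confirmed here) that each loose <br> between two <p> elements stands for exactly one empty paragraph. It rewraps each one as <p><br></p>, so the existing trailing-<br> placeholder hack turns it into an empty paragraph.

```typescript
// Hypothetical helper: matches a run of bare <br> tags (with or without
// attributes) sitting directly between a closing </p> and the next opening <p>.
const LOOSE_BR_RUN = /(<\/p>)((?:\s*<br\b[^>]*>)+)\s*(?=<p[\s>])/gi;

function wrapLooseBreaks(html: string): string {
  return html.replace(LOOSE_BR_RUN, (_match: string, close: string, run: string) => {
    const count = (run.match(/<br\b/gi) || []).length;
    let out = close;
    // Assumption: one empty paragraph per loose <br>. The <br> inside each new
    // <p> is a trailing placeholder, which the paste parser drops.
    for (let i = 0; i < count; i++) out += "<p><br></p>";
    return out;
  });
}

// Wiring (assumed setup, prosemirror-view imports omitted):
//   new EditorView(mount, { state, transformPastedHTML: wrapLooseBreaks });

console.log(wrapLooseBreaks("<p><span>Paragraph 1</span></p><br><p><span>Paragraph 2</span></p>"));
// → <p><span>Paragraph 1</span></p><p><br></p><p><span>Paragraph 2</span></p>
```

The regex only fires when both neighbors are paragraphs, so a <br> inside a paragraph (a real line break) is left alone. Loose <br> tags next to headings or lists would need their own rule.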

@marijnh
Member

marijnh commented Feb 18, 2025

I'm going to leave this open and see if more people are running into it. It's quite possible that this is a recent change in Google Docs—I'm pretty sure that last time I looked, empty paragraphs had a <p> tag around the <br> in their clipboard format.

@bZichett
Contributor

I don't think it is recent, but I'm not positive. I would appreciate a special case here, as I paste a lot from Google Docs.

@moetelo

moetelo commented Feb 20, 2025

Right now, we're using a somewhat messy transformPastedHTML in production to handle pasting from Google Docs, MS Word, and OpenOffice. However, it doesn't feel reliable enough, since users report issues every few months. Plus, I'm not sure such logic should be the responsibility of an application that uses ProseMirror.
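
One way to keep such a transformPastedHTML manageable is to split it into small, source-specific steps that only run when the clipboard HTML appears to come from that source. The sketch below assumes Google Docs HTML can be recognized by an id starting with "docs-internal-guid-" (a heuristic, not a documented guarantee). PasteFixup, fixups and cleanPastedHTML are hypothetical names.

```typescript
// One clipboard-source-specific cleanup step.
interface PasteFixup {
  name: string;
  appliesTo: (html: string) => boolean; // cheap source detection
  apply: (html: string) => string; // the actual rewrite
}

const fixups: PasteFixup[] = [
  {
    name: "google-docs-loose-br",
    // Heuristic (assumption): Google Docs clipboard HTML carries an id
    // beginning with "docs-internal-guid-".
    appliesTo: (html) => html.indexOf("docs-internal-guid-") !== -1,
    // Wrap a bare <br> between two paragraphs as an empty paragraph.
    apply: (html) => html.replace(/(<\/p>)\s*<br\b[^>]*>\s*(?=<p[\s>])/gi, "$1<p><br></p>"),
  },
  // Word or OpenOffice steps would be added here as separate entries.
];

// Runs every matching fixup in order; intended for use as transformPastedHTML.
function cleanPastedHTML(html: string): string {
  let out = html;
  for (const fixup of fixups) {
    if (fixup.appliesTo(out)) out = fixup.apply(out);
  }
  return out;
}

console.log(cleanPastedHTML('<b id="docs-internal-guid-1234"><p>A</p><br><p>B</p></b>'));
// → <b id="docs-internal-guid-1234"><p>A</p><p><br></p><p>B</p></b>
```

Keeping each rule isolated and gated on its source means that a change in one editor's clipboard format, like the one suspected in this issue, only requires updating one small step.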

It would be great if ProseMirror natively preserved the formatting of content pasted from Google Docs and other Word-like editors.

@marijnh
Member

marijnh commented Feb 20, 2025

> It would be great if ProseMirror natively preserved the formatting of content pasted from Google Docs and other Word-like editors.

I'm not sure how you expect that to work. Firstly, ProseMirror is schema-agnostic, so it doesn't magically know how the nodes you define would map to whatever equivalent constructs exist in the various word processing systems. Secondly, as you found, these spit out all kinds of completely ludicrous HTML, and in some situations it's not even clear how to extract the semantic meaning from that.


4 participants