This issue replaces #30.
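The problem is that user-supplied data containing any of these Unicode line-break characters makes the dump treat a single line as if it were split into several:
- `\u2028` (LINE SEPARATOR)
- `\u2029` (PARAGRAPH SEPARATOR)
- `\x85` (NEXT LINE, NEL)

Python's `str.splitlines()`, for example, treats all three as line boundaries, whereas splitting on `\n` does not. As a result, the dump raises:

`ValueError("Mismatch between column names and values.")`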
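The snippet below is a hypothetical reconstruction of how this error can arise; it is not the project's parser. It assumes the dump is parsed in Python by splitting the `COPY` data into lines with `str.splitlines()` and then zipping each tab-separated line against the column names. `COLUMNS`, `COPY_DATA` and `parse_rows` are illustrative names only.

```python
from typing import Callable, Dict, List

# Hypothetical COPY data block: tab-separated, one row per "\n"-terminated line.
# The "bio" value of row 1 contains U+2028 from user input.
COLUMNS = ["id", "bio"]
COPY_DATA = "1\tfirst line\u2028second line\n2\tplain text\n"


def parse_rows(data: str, split_lines: Callable[[str], List[str]]) -> List[Dict[str, str]]:
    """Split `data` into lines with `split_lines`, then map column names to values."""
    rows = []
    for line in split_lines(data):
        values = line.split("\t")
        if len(values) != len(COLUMNS):
            raise ValueError("Mismatch between column names and values.")
        rows.append(dict(zip(COLUMNS, values)))
    return rows


# str.splitlines() also breaks on U+2028, so row 1 turns into two short lines.
try:
    parse_rows(COPY_DATA, str.splitlines)
except ValueError as exc:
    print("splitlines:", exc)

# Splitting only on "\n" keeps the row intact (strip the final newline first
# so there is no trailing empty line).
rows = parse_rows(COPY_DATA.rstrip("\n"), lambda s: s.split("\n"))
print(rows[0]["bio"] == "first line\u2028second line")
```

If the real parser works this way, splitting on `\n` instead of using `splitlines()` might avoid the error without altering the user's data, but that depends on code not shown here.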
As a workaround, I pipe the `pg_dump` output through `sed`, replacing each of these characters with a space:
```python
import subprocess

# `url` (a urllib.parse result) and `extra_params` come from the surrounding code.
dump = subprocess.Popen(
    (
        "pg_dump",
        # Force output to be UTF-8 encoded.
        "--encoding=utf-8",
        # Quote all table and column names, just in case.
        "--quote-all-identifiers",
        # Luckily `pg_dump` supports DB URLs, so we can just pass it the
        # URL as argument to the command.
        "--dbname",
        url.geturl().replace('postgis://', 'postgresql://'),
    ) + tuple(extra_params),
    stdout=subprocess.PIPE,
)
# Replace U+2028, U+2029 and U+0085 with spaces in a single sed process.
# Python expands the escapes before sed runs, so sed receives the literal
# characters; passing an argument list instead of shell=True avoids relying
# on bash's $'...' quoting, which /bin/sh may not support.
process = subprocess.Popen(
    [
        "sed",
        "-e", "s/\u2028/ /g",
        "-e", "s/\u2029/ /g",
        "-e", "s/\x85/ /g",
    ],
    stdin=dump.stdout,
    stdout=subprocess.PIPE,
)
# Close our copy of pg_dump's stdout so pg_dump gets SIGPIPE if sed exits early.
dump.stdout.close()
```
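If the shell pipeline is undesirable, the same substitution could be done in Python while reading the dump. This is a minimal sketch, not the project's code: `sanitize_lines` and `dump_sanitized` are hypothetical names, and it assumes the caller consumes the dump as UTF-8 text lines. `str.translate` maps all three characters to spaces in one pass.

```python
import subprocess
from typing import Iterable, Iterator, Sequence

# Hypothetical translation table: U+2028, U+2029 and U+0085 become spaces,
# mirroring the sed substitutions.
_TO_SPACE = str.maketrans({"\u2028": " ", "\u2029": " ", "\x85": " "})


def sanitize_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line with the Unicode line-break characters replaced by spaces."""
    for line in lines:
        yield line.translate(_TO_SPACE)


def dump_sanitized(args: Sequence[str]) -> Iterator[str]:
    """Run pg_dump with the extra `args` and yield sanitized UTF-8 text lines."""
    # Text-mode reading splits lines only on "\n", "\r" and "\r\n",
    # not on U+2028, U+2029 or U+0085.
    with subprocess.Popen(
        ["pg_dump", "--encoding=utf-8", *args],
        stdout=subprocess.PIPE,
        encoding="utf-8",
    ) as proc:
        yield from sanitize_lines(proc.stdout)
    # Leaving the `with` block waits for pg_dump, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, proc.args)
```

For example, `for line in dump_sanitized(["--quote-all-identifiers", "--dbname", db_url]): ...`, where `db_url` stands for whatever URL the existing code builds.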
I'd be happy to open a PR with this if it's helpful. Or is there a better way to handle the issue?