Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BytesIO used in BER (and CER, DER) decode #175

Closed
wants to merge 11 commits into from

Conversation

janpipek
Copy link
Contributor

I implemented the functionality so that instead of copying strings, it passes a BytesIO object whenever possible. It should solve the #174 issue.

Changes in API:

  • decodeStream in the modules accepts a stream directly
  • decode behaves as it did before, constructing a new object and passing it to decodeStream

Problems:

  • With streams, slicing the substrate via substrateFunc seems difficult to support. Is that a public feature?

The request is a draft at least until:

  • Remaining TODOs are solved
  • The discussion about the passing of substrateFunc in tests is resolved
  • Benchmark prove the new implementation is not slower in common scenarios

@codecov-io
Copy link

codecov-io commented Aug 29, 2019

Codecov Report

Merging #175 into master will decrease coverage by 0.36%.
The diff coverage is 94.16%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #175      +/-   ##
==========================================
- Coverage   86.08%   85.72%   -0.37%     
==========================================
  Files          29       29              
  Lines        4270     4293      +23     
==========================================
+ Hits         3676     3680       +4     
- Misses        594      613      +19
Impacted Files Coverage Δ
pyasn1/codec/cer/decoder.py 94.59% <100%> (+0.84%) ⬆️
pyasn1/codec/der/decoder.py 95.65% <100%> (+0.91%) ⬆️
pyasn1/codec/ber/decoder.py 87.42% <93.44%> (-1.9%) ⬇️
pyasn1/type/univ.py 83.59% <0%> (-0.16%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3cae125...b2498bf. Read the comment docs.

@etingof
Copy link
Owner

etingof commented Aug 29, 2019

Thank you for working on this matter! Once it's done, I believe it should improve things and overall design!

With streams, slicing the substrate via substrateFunc seems difficult to support. Is that a public feature?

This is not a public feature.

I think one of the use-cases for it is to optimize decoding fragments of bit/octet strings encoded in "chunked" mode. Instead of creating a full-blown ASN.1 object from each chunk, the chunks first merged together while they are still Python bytes/str, then a single ASN.1 object is created from the merged contents.

if isinstance(substrate, bytes):
return not bool(substrate)
else:
return not bool(asStream(substrate).peek(1))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about reducing the while function into:

return not (substrate and asStream(substrate).peek(1))

?

# untagged Any container, recover inner header substrate
length += len(fullSubstrate) - len(substrate)
substrate = fullSubstrate
substrate.seek(fullPosition, os.SEEK_SET)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, backtracking! Perhaps that's not always be possible to seek back, is it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, probably we need to keep the extra bytes elsewhere (like the original "fullSubstrate" but smaller in memory).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the different implementation (#176), I envelop the non-seekable objects in BufferedReader.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious which streams are not seekable since we depend on that... From Python docs it seems some are not.

@@ -1600,11 +1613,13 @@ def testErrorCondition(self):

def testRawDump(self):
decode = decoder.Decoder(decoder.tagMap, decoder.typeMap)
substrate = ints2octs((31, 8, 2, 1, 1, 131, 3, 2, 1, 12))
stream = decoder.asStream(substrate, )
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: hanging comma

Copy link
Owner

@etingof etingof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me from design perspective!

One thing I'd like to suggest is to keep the "streaming" spirit in en/decoder implementation. That is assume no backtracking, indefinitely long sequences of chunks, presence of other consumers of the same stream that will pick it up where we drop off.

The other thing is performance - it would be great to have at least performance parity with previous implementation...

@janpipek
Copy link
Contributor Author

janpipek commented Sep 6, 2019

The other pull request (#176) is superior to this one:

  1. It's fully compatible with pyasn1-modules (the test suite: OK)
  2. It's more compatible with file and other types in Python 2.7
  3. It's on par or faster in terms of performance for anything but a couple of bytes (while this is ~20 % slower for smallish objects).

I suggest to close this.

@etingof
Copy link
Owner

etingof commented Sep 7, 2019

The other pull request (#176) is superior to this one:

  1. It's fully compatible with pyasn1-modules (the test suite: OK)
  2. It's more compatible with file and other types in Python 2.7
  3. It's on par or faster in terms of performance for anything but a couple of bytes (while this is ~20 % slower for smallish objects).

I suggest to close this.

#176 is indeed way superior to #175! Closing this PR them.

@etingof etingof closed this Sep 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants