-
Notifications
You must be signed in to change notification settings - Fork 24
Switch to Google-style docstrings with Markdown and remove Sphinx/RST generation #592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
jonathan343
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Antonio, this is a great start!
This PR can really be broken down into two main components:
- Formatting docstrings to comply with the requirements of MkDocs
- Generating the MkDocs specific files and static
.md
The first 100% needs to happen at codegen time. However, the more I look at this, the more I feel like doing 2 during codegen time and committing all the static files will really bloat our SDK repo when we're working with 400+ clients. This open PR is adding 4000+ lines for one client. The more scalable approach here might be to do all this generation on the fly when needed like we do with botocore.
I don't see anything in the generated files in https://github.com/awslabs/aws-sdk-python/pull/24/files that would make it difficult to do the approach mentioned above. The generated clients themselves have all the information we need.
...aws/core/src/main/java/software/amazon/smithy/python/aws/codegen/AwsMkDocsFileGenerator.java
Outdated
Show resolved
Hide resolved
jonathan343
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good so far, thanks Antonio! Just had a few comments/questions.
| import software.amazon.smithy.python.codegen.GenerationContext; | ||
| import software.amazon.smithy.python.codegen.PythonSettings; | ||
|
|
||
| public class MarkdownConverterTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One case I saw often in botocore was nested formatting. I'm curious how pandoc handles this. For example, when you have a inline code that has nested elements that are italicized. This is something that's very common in AWS docs that we should verify is handled properly.
You can see above that amzn-s3-demo-bucket is italicized inside the inline code block.
Also might be worth trying our some of the other test cases I added in this PR: https://github.com/boto/botocore/pull/2817/files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had to write a long comment to make sure I covered all bases here.
Inline Code with Nested Formatting
It looks like markdown also doesn't support formatting inside code blocks according to their specification as all text inside is treated literally.
However, pandoc does handle this by separating the formatted text into their own code blocks and wrapping them in the specified inline format. For example, <code>Hello <b>World</b></code> becomes Hello World (literal markdown = `Hello `**`World`**. This does create some awkward padding between formatted words and regular inline code text but users are still able to copy the content if needed. We can possibly fix this by editing the CSS padding for inline code or remove the nested formatting like we do in botocore.
After discussing with Jonathan offline, we agreed that this is a better state than what our boto3/botocore docs are currently doing so we'll keep the pandoc behavior for now.
This an example of the Mkdocs rendering of inline code with nested formatting:

Test Cases
I added the test cases to MarkdownConverterTest.java and confirmed they all pass. The only modification necessary was trimming href links with whitespaces. I added this in a preprocessing function that will clean up our input before passing it into pandoc.
For future context, these tests were added due to issues found in two services:
- KMS - The service documentation has a link tag with nested italic formatting that lead to rendering issues in previous botocore versions (example). However, pandoc handles this without any modification:
- IAM - The
GetOrganizationsAccessReportoperation has inline code with a nested link that does not get rendered in previous botocore versions(example). Pandoc also handles this without modification and preserves the link:
Note about fullname tags
The KMS service description also has a <fullname> element at the beginning which is ignored in boto3/botocore docs. I added logic to also remove these tags in preprocessing.
| :param plugins: A list of callables that modify the configuration dynamically. These | ||
| can be used to set defaults, for example.""", rstDocs); | ||
| }); | ||
| .orElse("Client for " + service.getId().getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this change from .orElse("Client for " + serviceSymbol.getName());?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a minor improvement to the default client description in cases where a Smithy service does not have a service-level documentation trait.
serviceSymbol.getName() returns the client class name so the default description looks like "Client for BedrockRuntimeClient" which seems wrong to me.
This switches the default docstring to use the service id. For Bedrock Runtime, this looks like "Client for AmazonBedrockFrontendService".
I did consider using the AWS sdk id but that would exclude non-AWS services. I also considered using the smithy.api#title trait for the service but its not required so a service without it fails to generate.
This was only a minor improvement I noticed and don't feel too strong about it.
| build-backend = "hatchling.build" | ||
| [tool.hatch.build.targets.bdist] | ||
| [tool.hatch.build] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using tool.hatch.build seems to not be recommended by hatch:
Although not recommended, you may define global configuration in the
tool.hatch.build table. Keys may then be overridden by target config.
Ref: https://hatch.pypa.io/1.9/config/build/#file-selection
We initially had bdist which doesn't do anything. I'm assuming what we were actually trying to do was tool.hatch.build.targets.wheel.
Given that this wasn't doing anything in the first place AND the default behavior for wheels only includes source code, I don't think we actually need this.
The source distribution currently does include tests (and examples for transcribe) which could be removed to make things slimmer, however, keeping these seems fine to me for now tbh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this section in new commit.
| "--from=" + fromFormat, | ||
| "--to=" + toFormat, | ||
| "--wrap=auto", | ||
| "--columns=72"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
72 is pandoc's default character limit for text wrapping. I've explicitly specified it here (rather than relying on the default) to make this behavior visible to future developers who may need to debug docstring generation. I extracted this value into a named constant in the most recent commit I pushed up to eliminate the magic number.
I also chose 72 specifically because it provides a 16-character buffer below our 88-character line limit for generated code. I have not seen case where docstrings exceed 3 levels of indentation (12 characters), so this 72-character wrap limit ensures they stay within our enforced line length.

Description
This PR updates the generated docstring format to use the Google style with Markdown descriptions for all services (AWS and non-AWS).
This PR also removes the Sphinx-based reStructuredText (RST) documentation generation system introduced in #418. All documentation stubs will now be generated at build time in the
awslabs/aws-sdk-pythonrepo. This will prevent static doc stubs from bloating the repo when 400+ services are supported.Key changes:
Client,Structure,Config,Enum,Union) to generate Google style docstrings with Markdown descriptions using the newMarkdownConverterclassAwsRstDocFileGeneratorplugin that generates.rstfiles for Sphinx doc genNote
The new docstring format in AWS clients enables us to generate documentation using Material for MkDocs. We will generate MkDocs stubs in awslabs/aws-sdk-python that work with the
mkdocstringstool. This will automatically create documentation from docstrings for all clients.Testing
Important
For local testing, please install pandoc v3.8.2 before running code generator.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.