Switch to Google-style docstrings with Markdown and remove Sphinx/RST generation #592

arandito · 2025-10-30T19:53:15Z

Description

This PR updates the generated docstring format to use the Google style with Markdown descriptions for all services (AWS and non-AWS).

This PR also removes the Sphinx-based reStructuredText (RST) documentation generation system introduced in #418. All documentation stubs will now be generated at build time in the awslabs/aws-sdk-python repo. This will prevent static doc stubs from bloating the repo when 400+ services are supported.

Key changes:

- Updates all code generators (Client, Structure, Config, Enum, Union) to generate Google-style docstrings with Markdown descriptions using the new MarkdownConverter class
- Adds the pandoc CLI tool as a required build dependency to convert Smithy model documentation strings to Markdown
- Updates the README with the new pandoc dependency
- Removes the AwsRstDocFileGenerator plugin that generates .rst files for Sphinx doc generation
- Removes docs dependencies and Sphinx configuration files from generated clients
- Removes RST-to-Markdown conversion logic
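
For readers unfamiliar with the target format, here is a minimal sketch of a Google-style docstring with Markdown in its descriptions. The names `ExampleInput` and `example_operation` are hypothetical and are not taken from any generated client; the actual generated output may be laid out differently.

```python
from dataclasses import dataclass


@dataclass
class ExampleInput:
    """Input for the hypothetical `example_operation`.

    Attributes:
        name: The resource name. Descriptions may contain *Markdown*,
            such as `inline code` or [links](https://example.com).
        max_results: The maximum number of results to return.
    """

    name: str
    max_results: int = 10


def example_operation(input: ExampleInput) -> list[str]:
    """Return placeholder results for the given input.

    Args:
        input: The operation's input.

    Returns:
        A list of at most `max_results` result strings.

    Raises:
        ValueError: If `max_results` is negative.
    """
    if input.max_results < 0:
        raise ValueError("max_results must be non-negative")
    # Placeholder logic so the sketch is runnable end to end.
    return [f"{input.name}-{i}" for i in range(input.max_results)]
```

Tools such as mkdocstrings read the `Args:`, `Returns:`, `Raises:`, and `Attributes:` sections and render the Markdown inside them.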

Note

The new docstring format in AWS clients enables us to generate documentation using Material for MkDocs. We will generate MkDocs stubs in awslabs/aws-sdk-python that work with the mkdocstrings tool. This will automatically create documentation from docstrings for all clients.

Testing

- Added unit tests
- Regenerated Bedrock Runtime and Transcribe Streaming clients and confirmed all docstrings were updated
  - PR for updated clients: Update clients to support Google-style docstrings. awslabs/aws-sdk-python#24

Important

For local testing, please install pandoc v3.8.2 before running the code generator.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

jonathan343

Thanks Antonio, this is a great start!

This PR can really be broken down into two main components:

1. Formatting docstrings to comply with the requirements of MkDocs
2. Generating the MkDocs-specific files and static .md files

The first 100% needs to happen at codegen time. However, the more I look at this, the more I feel like doing 2 during codegen time and committing all the static files will really bloat our SDK repo when we're working with 400+ clients. This open PR is adding 4000+ lines for one client. The more scalable approach here might be to do all this generation on the fly when needed like we do with botocore.

I don't see anything in the generated files in https://github.com/awslabs/aws-sdk-python/pull/24/files that would make it difficult to do the approach mentioned above. The generated clients themselves have all the information we need.

.github/workflows/ci.yml

...aws/core/src/main/java/software/amazon/smithy/python/aws/codegen/AwsMkDocsFileGenerator.java

jonathan343

This looks good so far, thanks Antonio! Just had a few comments/questions.

jonathan343 · 2025-11-17T21:54:43Z

...n/core/src/test/java/software/amazon/smithy/python/codegen/writer/MarkdownConverterTest.java

+import software.amazon.smithy.python.codegen.GenerationContext;
+import software.amazon.smithy.python.codegen.PythonSettings;
+
+public class MarkdownConverterTest {


One case I saw often in botocore was nested formatting. I'm curious how pandoc handles this. For example, inline code that contains nested italicized elements. This is very common in AWS docs, so we should verify that it's handled properly.

For example:

You can see above that amzn-s3-demo-bucket is italicized inside the inline code block.

Also might be worth trying out some of the other test cases I added in this PR: https://github.com/boto/botocore/pull/2817/files

arandito · 2025-11-20

Had to write a long comment to make sure I covered all bases here.

Inline Code with Nested Formatting

It looks like Markdown also doesn't support formatting inside inline code according to its specification, as all text inside is treated literally.

However, pandoc does handle this by splitting the formatted text into its own inline code span and wrapping it in the specified inline format. For example, <code>Hello <b>World</b></code> renders as Hello World (literal Markdown: `Hello `**`World`**). This does create some awkward padding between formatted words and regular inline code text, but users are still able to copy the content if needed. We can possibly fix this by editing the CSS padding for inline code or by removing the nested formatting like we do in botocore.

After discussing with Jonathan offline, we agreed that this is a better state than what our boto3/botocore docs are currently doing so we'll keep the pandoc behavior for now.

This is an example of the MkDocs rendering of inline code with nested formatting:

Test Cases

I added the test cases to MarkdownConverterTest.java and confirmed they all pass. The only modification necessary was trimming whitespace in href links. I added this in a preprocessing function that cleans up our input before passing it to pandoc.

For future context, these tests were added due to issues found in two services:

KMS - The service documentation has a link tag with nested italic formatting that led to rendering issues in previous botocore versions (example). However, pandoc handles this without any modification:

IAM - The GetOrganizationsAccessReport operation has inline code with a nested link that does not get rendered in previous botocore versions (example). Pandoc also handles this without modification and preserves the link:

Note about fullname tags

The KMS service description also has a <fullname> element at the beginning which is ignored in boto3/botocore docs. I added logic to also remove these tags in preprocessing.
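
The two preprocessing steps described in this comment (trimming whitespace in href values and dropping `<fullname>` elements) can be sketched roughly as follows. This is an illustration in Python, not the Java code from this PR; the function name `preprocess` and the regular expressions are assumptions, and it assumes the whole `<fullname>` element, including its text, is dropped, as the comparison with boto3/botocore docs suggests.

```python
import re

# Assumption: the entire <fullname> element (tags and text) is discarded.
_FULLNAME_RE = re.compile(r"<fullname>.*?</fullname>\s*", re.DOTALL | re.IGNORECASE)

# Matches a double-quoted href value with optional surrounding whitespace.
_HREF_RE = re.compile(r'href\s*=\s*"\s*([^"]*?)\s*"')


def preprocess(documentation: str) -> str:
    """Clean up a raw documentation string before handing it to pandoc."""
    without_fullname = _FULLNAME_RE.sub("", documentation)
    # Rewrite each href so its value has no leading or trailing whitespace.
    return _HREF_RE.sub(r'href="\1"', without_fullname)
```

For example, `preprocess('<a href=" https://example.com/docs ">docs</a>')` returns the same anchor with the padding inside the quotes removed.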

jonathan343 · 2025-11-18T03:23:24Z

codegen/core/src/main/java/software/amazon/smithy/python/codegen/ClientGenerator.java

-                        :param plugins: A list of callables that modify the configuration dynamically. These
-                            can be used to set defaults, for example.""", rstDocs);
-            });
+                    .orElse("Client for " + service.getId().getName());


Why did this change from .orElse("Client for " + serviceSymbol.getName());?

arandito · 2025-11-19

This was a minor improvement to the default client description in cases where a Smithy service does not have a service-level documentation trait.

serviceSymbol.getName() returns the client class name so the default description looks like "Client for BedrockRuntimeClient" which seems wrong to me.

This switches the default docstring to use the service id. For Bedrock Runtime, this looks like "Client for AmazonBedrockFrontendService".

I did consider using the AWS SDK ID, but that would exclude non-AWS services. I also considered using the smithy.api#title trait for the service, but it's not required, so a service without it fails to generate.

This was only a minor improvement I noticed and I don't feel too strongly about it.
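
The fallback discussed in this thread can be sketched in Python as below. The real logic lives in the Java ClientGenerator and uses `.orElse(...)`; the function name `client_docstring` and its parameters are hypothetical.

```python
from typing import Optional


def client_docstring(documentation: Optional[str], service_shape_name: str) -> str:
    """Pick the client's docstring, falling back to a default description.

    Args:
        documentation: The service-level documentation, if the model has any.
        service_shape_name: The name from the service shape ID (not the
            generated client class name).
    """
    if documentation is not None:
        return documentation
    return f"Client for {service_shape_name}"
```

With the shape name from this thread, `client_docstring(None, "AmazonBedrockFrontendService")` gives "Client for AmazonBedrockFrontendService", whereas passing the client class name would give "Client for BedrockRuntimeClient".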

jonathan343 · 2025-11-18T05:21:23Z

codegen/core/src/main/java/software/amazon/smithy/python/codegen/generators/SetupGenerator.java

                    build-backend = "hatchling.build"

-                    [tool.hatch.build.targets.bdist]
+                    [tool.hatch.build]


Using tool.hatch.build seems to not be recommended by hatch:

Although not recommended, you may define global configuration in the tool.hatch.build table. Keys may then be overridden by target config.

Ref: https://hatch.pypa.io/1.9/config/build/#file-selection

We initially had bdist which doesn't do anything. I'm assuming what we were actually trying to do was tool.hatch.build.targets.wheel.

Given that this wasn't doing anything in the first place AND the default behavior for wheels only includes source code, I don't think we actually need this.

The source distribution currently does include tests (and examples for transcribe) which could be removed to make things slimmer, however, keeping these seems fine to me for now tbh.

arandito · 2025-11-20

Removed this section in new commit.

jonathan343 · 2025-11-18T05:28:32Z

codegen/core/src/main/java/software/amazon/smithy/python/codegen/writer/MarkdownConverter.java

+                "--from=" + fromFormat,
+                "--to=" + toFormat,
+                "--wrap=auto",
+                "--columns=72");


Why this number?

arandito · 2025-11-20

72 is pandoc's default character limit for text wrapping. I've explicitly specified it here (rather than relying on the default) to make this behavior visible to future developers who may need to debug docstring generation. I extracted this value into a named constant in the most recent commit I pushed up to eliminate the magic number.

I also chose 72 specifically because it provides a 16-character buffer below our 88-character line limit for generated code. I have not seen a case where docstrings exceed 3 levels of indentation (12 characters), so this 72-character wrap limit ensures they stay within our enforced line length.
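
As a rough check of the arithmetic above, here is a Python sketch that builds the same pandoc arguments shown in the diff and computes how much indentation the 72-column wrap leaves room for. The constant and function names are hypothetical, the 4-space indent width is inferred from "3 levels of indentation (12 characters)", and the `html` and `markdown` format names are assumptions, since the diff only shows the `fromFormat` and `toFormat` variables.

```python
import shutil
import subprocess

PANDOC_WRAP_COLUMNS = 72  # pandoc's default wrap width, made explicit
MAX_LINE_LENGTH = 88      # line limit for generated code, per this thread
INDENT_WIDTH = 4          # assumption: 4 spaces per indentation level


def build_pandoc_command(from_format: str, to_format: str) -> list[str]:
    """Build the pandoc argument list with explicit wrapping options."""
    return [
        "pandoc",
        f"--from={from_format}",
        f"--to={to_format}",
        "--wrap=auto",
        f"--columns={PANDOC_WRAP_COLUMNS}",
    ]


def max_indent_levels() -> int:
    """Indentation levels that fit between the wrap width and the line limit."""
    return (MAX_LINE_LENGTH - PANDOC_WRAP_COLUMNS) // INDENT_WIDTH


def convert(text: str, from_format: str = "html", to_format: str = "markdown") -> str:
    """Run pandoc on `text`; requires the pandoc CLI to be installed."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    result = subprocess.run(
        build_pandoc_command(from_format, to_format),
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Under these assumptions `max_indent_levels()` is 4, so the deepest case observed so far (3 levels, 12 characters) stays within the 88-character limit with 4 characters to spare.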

arandito requested a review from a team as a code owner October 30, 2025 19:53

arandito marked this pull request as draft October 30, 2025 20:46

arandito force-pushed the add-mkdocs branch from e056f72 to 93586b0 Compare October 31, 2025 16:45

arandito mentioned this pull request Oct 31, 2025

Update clients to support Google-style docstrings. awslabs/aws-sdk-python#24

Open

arandito marked this pull request as ready for review October 31, 2025 18:38

arandito requested review from SamRemis, alexgromero and jonathan343 and removed request for SamRemis November 3, 2025 22:55

jonathan343 reviewed Nov 11, 2025

arandito changed the title ~~Replace Sphinx doc gen with MkDocs and Markdown~~ Switch to Google-style docstrings with Markdown and remove Sphinx/RST generation Nov 14, 2025

arandito force-pushed the add-mkdocs branch from 503f895 to fd4368a Compare November 14, 2025 08:19

jonathan343 reviewed Nov 18, 2025

arandito and others added 4 commits November 24, 2025 10:12

Replace Sphinx doc gen with MkDocs and Markdown

c5cb2a0

Exclude docs from sdists and wheels

3d9b40e

Remove Mkdocs stub file generation

d4317e4

Address PR feedback

f86d70d

arandito force-pushed the add-mkdocs branch from b5a5627 to f86d70d Compare November 24, 2025 15:15

Add docstring for Plugin type alias

03ae94a
