Skip to content

Conversation

@arandito
Copy link
Contributor

@arandito arandito commented Oct 30, 2025

Description

This PR updates the generated docstring format to use the Google style with Markdown descriptions for all services (AWS and non-AWS).

This PR also removes the Sphinx-based reStructuredText (RST) documentation generation system introduced in #418. All documentation stubs will now be generated at build time in the awslabs/aws-sdk-python repo. This will prevent static doc stubs from bloating the repo when 400+ services are supported.

Key changes:

  • Updates all code generators (Client, Structure, Config, Enum, Union) to generate Google style docstrings with Markdown descriptions using the new MarkdownConverter class
  • Adds pandoc CLI tool as a required build dependency to convert Smithy model documentation strings to Markdown
  • Updates README with new pandoc dependency
  • Removes the AwsRstDocFileGenerator plugin that generates .rst files for Sphinx doc gen
  • Remove docs dependencies and Sphinx configuration files from generated clients
  • Removes RST-to-Markdown conversion logic

Note

The new docstring format in AWS clients enables us to generate documentation using Material for MkDocs. We will generate MkDocs stubs in awslabs/aws-sdk-python that work with the mkdocstrings tool. This will automatically create documentation from docstrings for all clients.

Testing

Important

For local testing, please install pandoc v3.8.2 before running code generator.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@arandito arandito requested a review from a team as a code owner October 30, 2025 19:53
@arandito arandito marked this pull request as draft October 30, 2025 20:46
@arandito arandito marked this pull request as ready for review October 31, 2025 18:38
@arandito arandito requested review from SamRemis, alexgromero and jonathan343 and removed request for SamRemis November 3, 2025 22:55
Copy link
Contributor

@jonathan343 jonathan343 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Antonio, this is a great start!

This PR can really be broken down into two main components:

  1. Formatting docstrings to comply with the requirements of MkDocs
  2. Generating the MkDocs specific files and static .md

The first 100% needs to happen at codegen time. However, the more I look at this, the more I feel like doing 2 during codegen time and committing all the static files will really bloat our SDK repo when we're working with 400+ clients. This open PR is adding 4000+ lines for one client. The more scalable approach here might be to do all this generation on the fly when needed like we do with botocore.

I don't see anything in the generated files in https://github.com/awslabs/aws-sdk-python/pull/24/files that would make it difficult to do the approach mentioned above. The generated clients themselves have all the information we need.

@arandito arandito changed the title Replace Sphinx doc gen with MkDocs and Markdown Switch to Google-style docstrings with Markdown and remove Sphinx/RST generation Nov 14, 2025
Copy link
Contributor

@jonathan343 jonathan343 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good so far, thanks Antonio! Just had a few comments/questions.

import software.amazon.smithy.python.codegen.GenerationContext;
import software.amazon.smithy.python.codegen.PythonSettings;

public class MarkdownConverterTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One case I saw often in botocore was nested formatting. I'm curious how pandoc handles this. For example, when you have a inline code that has nested elements that are italicized. This is something that's very common in AWS docs that we should verify is handled properly.

For example:
Image

You can see above that amzn-s3-demo-bucket is italicized inside the inline code block.

Also might be worth trying our some of the other test cases I added in this PR: https://github.com/boto/botocore/pull/2817/files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to write a long comment to make sure I covered all bases here.

Inline Code with Nested Formatting

It looks like markdown also doesn't support formatting inside code blocks according to their specification as all text inside is treated literally.

However, pandoc does handle this by separating the formatted text into their own code blocks and wrapping them in the specified inline format. For example, <code>Hello <b>World</b></code> becomes Hello World (literal markdown = `Hello `**`World`**. This does create some awkward padding between formatted words and regular inline code text but users are still able to copy the content if needed. We can possibly fix this by editing the CSS padding for inline code or remove the nested formatting like we do in botocore.

After discussing with Jonathan offline, we agreed that this is a better state than what our boto3/botocore docs are currently doing so we'll keep the pandoc behavior for now.

This an example of the Mkdocs rendering of inline code with nested formatting:
Screenshot 2025-11-20 at 11 12 24 AM

Test Cases

I added the test cases to MarkdownConverterTest.java and confirmed they all pass. The only modification necessary was trimming href links with whitespaces. I added this in a preprocessing function that will clean up our input before passing it into pandoc.

For future context, these tests were added due to issues found in two services:

  • KMS - The service documentation has a link tag with nested italic formatting that lead to rendering issues in previous botocore versions (example). However, pandoc handles this without any modification:
Screenshot 2025-11-20 at 11 21 20 AM
  • IAM - The GetOrganizationsAccessReport operation has inline code with a nested link that does not get rendered in previous botocore versions(example). Pandoc also handles this without modification and preserves the link:
Screenshot 2025-11-20 at 11 22 07 AM

Note about fullname tags

The KMS service description also has a <fullname> element at the beginning which is ignored in boto3/botocore docs. I added logic to also remove these tags in preprocessing.

:param plugins: A list of callables that modify the configuration dynamically. These
can be used to set defaults, for example.""", rstDocs);
});
.orElse("Client for " + service.getId().getName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this change from .orElse("Client for " + serviceSymbol.getName());?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a minor improvement to the default client description in cases where a Smithy service does not have a service-level documentation trait.

serviceSymbol.getName() returns the client class name so the default description looks like "Client for BedrockRuntimeClient" which seems wrong to me.

This switches the default docstring to use the service id. For Bedrock Runtime, this looks like "Client for AmazonBedrockFrontendService".

I did consider using the AWS sdk id but that would exclude non-AWS services. I also considered using the smithy.api#title trait for the service but its not required so a service without it fails to generate.

This was only a minor improvement I noticed and don't feel too strong about it.

build-backend = "hatchling.build"
[tool.hatch.build.targets.bdist]
[tool.hatch.build]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using tool.hatch.build seems to not be recommended by hatch:

Although not recommended, you may define global configuration in the tool.hatch.build table. Keys may then be overridden by target config.

Ref: https://hatch.pypa.io/1.9/config/build/#file-selection

We initially had bdist which doesn't do anything. I'm assuming what we were actually trying to do was tool.hatch.build.targets.wheel.

Given that this wasn't doing anything in the first place AND the default behavior for wheels only includes source code, I don't think we actually need this.

The source distribution currently does include tests (and examples for transcribe) which could be removed to make things slimmer, however, keeping these seems fine to me for now tbh.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this section in new commit.

"--from=" + fromFormat,
"--to=" + toFormat,
"--wrap=auto",
"--columns=72");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this number?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

72 is pandoc's default character limit for text wrapping. I've explicitly specified it here (rather than relying on the default) to make this behavior visible to future developers who may need to debug docstring generation. I extracted this value into a named constant in the most recent commit I pushed up to eliminate the magic number.

I also chose 72 specifically because it provides a 16-character buffer below our 88-character line limit for generated code. I have not seen case where docstrings exceed 3 levels of indentation (12 characters), so this 72-character wrap limit ensures they stay within our enforced line length.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants