Investigate ways to reduce final jar size #411

Closed
aajtodd opened this issue Nov 10, 2021 · 3 comments
Labels
feature-request A feature should be added or improved. no-auto-closure We do not want this issue to be automatically closed.

Comments

aajtodd commented Nov 10, 2021

The dynamodb jar size is ~5MB, which is roughly twice the size of the corresponding aws-sdk-java-v2 jar.

The task here is to investigate ways to reduce the final size. I did some preliminary investigation, and there is definitely some low-hanging fruit.

aajtodd commented Nov 10, 2021

Investigation Results

Below are quick investigations into the dynamodb class files, along with some opportunities to reduce the overall size.

tl;dr
Some things contributing to our overall size:

  • compiler-generated state machines to support suspend at the deserializer level (which we don't even use)
  • backing classes for lambdas in operation middleware (and probably elsewhere)
  • model classes still need to be investigated; some seem large for what they contain (e.g. CreateTableRequest.class is 10 KB)

Use of suspend in deserializers

> ls CreateTableOperationDeserializerKt*
CreateTableOperationDeserializerKt$deserializeCreateTableOperationBody$2.class
CreateTableOperationDeserializerKt$deserializeCreateTableOperationBody$OBJ_DESCRIPTOR$1.class
CreateTableOperationDeserializerKt$throwCreateTableError$1.class
CreateTableOperationDeserializerKt.class

> javap CreateTableOperationDeserializerKt*

Compiled from "CreateTableOperationDeserializer.kt"
final class aws.sdk.kotlin.services.dynamodb.transform.CreateTableOperationDeserializerKt$deserializeCreateTableOperationBody$2 extends kotlin.coroutines.jvm.internal.SuspendLambda implements kotlin.jvm.functions.Function2<aws.smithy.kotlin.runtime.serde.Deserializer$FieldIterator, kotlin.coroutines.Continuation<? super kotlin.Unit>, java.lang.Object> {
  java.lang.Object L$1;
  int label;
  final aws.smithy.kotlin.runtime.serde.SdkFieldDescriptor $TABLEDESCRIPTION_DESCRIPTOR;
  final aws.sdk.kotlin.services.dynamodb.model.CreateTableResponse$DslBuilder $builder;
  final aws.smithy.kotlin.runtime.serde.json.JsonDeserializer $deserializer;
  aws.sdk.kotlin.services.dynamodb.transform.CreateTableOperationDeserializerKt$deserializeCreateTableOperationBody$2(aws.smithy.kotlin.runtime.serde.SdkFieldDescriptor, aws.sdk.kotlin.services.dynamodb.model.CreateTableResponse$DslBuilder, aws.smithy.kotlin.runtime.serde.json.JsonDeserializer, kotlin.coroutines.Continuation<? super aws.sdk.kotlin.services.dynamodb.transform.CreateTableOperationDeserializerKt$deserializeCreateTableOperationBody$2>);
  public final java.lang.Object invokeSuspend(java.lang.Object);
  public final kotlin.coroutines.Continuation<kotlin.Unit> create(java.lang.Object, kotlin.coroutines.Continuation<?>);
  public final java.lang.Object invoke(aws.smithy.kotlin.runtime.serde.Deserializer$FieldIterator, kotlin.coroutines.Continuation<? super kotlin.Unit>);
  public java.lang.Object invoke(java.lang.Object, java.lang.Object);
}


Compiled from "CreateTableOperationDeserializer.kt"
final class aws.sdk.kotlin.services.dynamodb.transform.CreateTableOperationDeserializerKt$deserializeCreateTableOperationBody$OBJ_DESCRIPTOR$1 extends kotlin.jvm.internal.Lambda implements kotlin.jvm.functions.Function1<aws.smithy.kotlin.runtime.serde.SdkObjectDescriptor$DslBuilder, kotlin.Unit> {
  final aws.smithy.kotlin.runtime.serde.SdkFieldDescriptor $TABLEDESCRIPTION_DESCRIPTOR;
  aws.sdk.kotlin.services.dynamodb.transform.CreateTableOperationDeserializerKt$deserializeCreateTableOperationBody$OBJ_DESCRIPTOR$1(aws.smithy.kotlin.runtime.serde.SdkFieldDescriptor);
  public final void invoke(aws.smithy.kotlin.runtime.serde.SdkObjectDescriptor$DslBuilder);
  public java.lang.Object invoke(java.lang.Object);
}


Compiled from "CreateTableOperationDeserializer.kt"
final class aws.sdk.kotlin.services.dynamodb.transform.CreateTableOperationDeserializerKt$throwCreateTableError$1 extends kotlin.coroutines.jvm.internal.ContinuationImpl {
  java.lang.Object L$0;
  java.lang.Object L$1;
  java.lang.Object result;
  int label;
  aws.sdk.kotlin.services.dynamodb.transform.CreateTableOperationDeserializerKt$throwCreateTableError$1(kotlin.coroutines.Continuation<? super aws.sdk.kotlin.services.dynamodb.transform.CreateTableOperationDeserializerKt$throwCreateTableError$1>);
  public final java.lang.Object invokeSuspend(java.lang.Object);
}


Compiled from "CreateTableOperationDeserializer.kt"
public final class aws.sdk.kotlin.services.dynamodb.transform.CreateTableOperationDeserializerKt {
  public static final java.lang.Object access$throwCreateTableError(aws.smithy.kotlin.runtime.client.ExecutionContext, aws.smithy.kotlin.runtime.http.response.HttpResponse, kotlin.coroutines.Continuation);
  public static final java.lang.Object access$deserializeCreateTableOperationBody(aws.sdk.kotlin.services.dynamodb.model.CreateTableResponse$DslBuilder, byte[], kotlin.coroutines.Continuation);
}

The output above shows what's generated for a single operation deserializer. There is overhead from the backing state machines needed to implement deserialization as suspend. This doesn't explain the model package size, though, since the model classes have no suspend functionality.

A quick test to remove suspend from our deserializer interface and codegen:

10064 -rw-r--r--  1 todaaron  staff   4.9M Oct 27 14:52 dynamodb-0.8.1-SNAPSHOT.jar
10248 -rw-r--r--  1 todaaron  staff   4.6M Nov  9 09:38 dynamodb-0.9.2-SNAPSHOT.jar

I wasn't able to remove all uses of it, so the jar could potentially be even smaller. Every suspend function generates a backing class for its state machine, yet we don't actually use suspend in our deserializer. It was added in the hope that we could process bytes off the wire as they arrive, but the work on replacing gson showed that using suspend at the tokenizer level resulted in terrible performance. I doubt we'll ever implement deserialization using suspend outside of a custom use case: 99% of the documents that come back will be small enough that reading them into memory and deserializing directly from a ByteArray is the right move (and larger payloads should be paginated anyway).

If we needed to support incremental parsing/tokenization, we would adapt the tokenizer to work off chunks rather than use suspend. The tokenizer could return, e.g., a sealed class that indicates either a token or that the input is incomplete and more data is needed.
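To make the chunk-based idea concrete, here is a minimal sketch of a tokenizer that is fed chunks and reports either a token or "incomplete" through a sealed class, with no suspend anywhere. All names here (TokenResult, ChunkTokenizer, feed, next) are hypothetical and not part of smithy-kotlin, and the "grammar" is a toy one (whitespace-delimited ASCII words) rather than JSON.

```kotlin
// Hypothetical sketch: none of these types exist in smithy-kotlin.
sealed class TokenResult {
    // A complete token was available in the buffered input.
    data class Token(val text: String) : TokenResult()

    // Not enough buffered input to produce a token; the caller should feed more data.
    object Incomplete : TokenResult()
}

// Toy tokenizer that splits whitespace-delimited words out of incoming chunks.
// Assumes ASCII input so that a chunk boundary never splits a character.
class ChunkTokenizer {
    private val buffer = StringBuilder()

    // Append the next chunk read off the wire.
    fun feed(chunk: ByteArray) {
        buffer.append(chunk.decodeToString())
    }

    // Plain (non-suspend) call: returns a token if a full one is buffered, otherwise Incomplete.
    fun next(): TokenResult {
        // Drop leading whitespace.
        var start = 0
        while (start < buffer.length && buffer[start].isWhitespace()) start++
        buffer.delete(0, start)

        // A token is only complete once its trailing delimiter has arrived.
        val end = buffer.indexOfFirst { it.isWhitespace() }
        if (end < 0) return TokenResult.Incomplete

        val text = buffer.substring(0, end)
        buffer.delete(0, end + 1)
        return TokenResult.Token(text)
    }
}
```

The caller drives the loop: it calls next() until it sees Incomplete, then feeds another chunk and retries. Because nothing here is a suspend function, the compiler has no state machine classes to generate for it.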

Operation Middleware

Operation middleware classes (uncompressed) add up to 1.1M; the default client is 700K. I wonder whether we can find a different way to generate this:

internal fun registerCreateTableMiddleware(config: DynamoDbClient.Config, op: SdkHttpOperation<CreateTableRequest,CreateTableResponse>) {
    op.apply {
        install(ResolveAwsEndpoint) {
            serviceId = ServiceId
            resolver = config.endpointResolver
        }
        install(RetryFeature) {
            strategy = config.retryStrategy
            policy = AwsDefaultRetryPolicy
        }
        install(AwsJsonProtocol) {
            serviceShapeName = "DynamoDB_20120810"
            version = "1.0"
        }
        install(UserAgent) {
            staticMetadata = awsUserAgentMetadata
        }
        install(AwsSigV4SigningMiddleware) {
            this.credentialsProvider = config.credentialsProvider
            this.signingService = "dynamodb"
        }
    }
}

> javap OperationMiddlewareKt\$registerCreateTableMiddleware*
Compiled from "OperationMiddleware.kt"
final class aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$1 extends kotlin.jvm.internal.Lambda implements kotlin.jvm.functions.Function1<aws.sdk.kotlin.runtime.http.middleware.ResolveAwsEndpoint$Config, kotlin.Unit> {
  final aws.sdk.kotlin.services.dynamodb.DynamoDbClient$Config $config;
  aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$1(aws.sdk.kotlin.services.dynamodb.DynamoDbClient$Config);
  public final void invoke(aws.sdk.kotlin.runtime.http.middleware.ResolveAwsEndpoint$Config);
  public java.lang.Object invoke(java.lang.Object);
}
Compiled from "OperationMiddleware.kt"
final class aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$2 extends kotlin.jvm.internal.Lambda implements kotlin.jvm.functions.Function1<aws.smithy.kotlin.runtime.http.middleware.Retry$Config, kotlin.Unit> {
  final aws.sdk.kotlin.services.dynamodb.DynamoDbClient$Config $config;
  aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$2(aws.sdk.kotlin.services.dynamodb.DynamoDbClient$Config);
  public final void invoke(aws.smithy.kotlin.runtime.http.middleware.Retry$Config);
  public java.lang.Object invoke(java.lang.Object);
}
Compiled from "OperationMiddleware.kt"
final class aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$3 extends kotlin.jvm.internal.Lambda implements kotlin.jvm.functions.Function1<aws.sdk.kotlin.runtime.protocol.json.AwsJsonProtocol$Config, kotlin.Unit> {
  public static final aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$3 INSTANCE;
  aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$3();
  public final void invoke(aws.sdk.kotlin.runtime.protocol.json.AwsJsonProtocol$Config);
  public java.lang.Object invoke(java.lang.Object);
  static {};
}
Compiled from "OperationMiddleware.kt"
final class aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$4 extends kotlin.jvm.internal.Lambda implements kotlin.jvm.functions.Function1<aws.sdk.kotlin.runtime.http.middleware.UserAgent$Config, kotlin.Unit> {
  public static final aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$4 INSTANCE;
  aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$4();
  public final void invoke(aws.sdk.kotlin.runtime.http.middleware.UserAgent$Config);
  public java.lang.Object invoke(java.lang.Object);
  static {};
}
Compiled from "OperationMiddleware.kt"
final class aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$5 extends kotlin.jvm.internal.Lambda implements kotlin.jvm.functions.Function1<aws.sdk.kotlin.runtime.auth.signing.AwsSigV4SigningMiddleware$Config, kotlin.Unit> {
  final aws.sdk.kotlin.services.dynamodb.DynamoDbClient$Config $config;
  aws.sdk.kotlin.services.dynamodb.OperationMiddlewareKt$registerCreateTableMiddleware$1$5(aws.sdk.kotlin.services.dynamodb.DynamoDbClient$Config);
  public final void invoke(aws.sdk.kotlin.runtime.auth.signing.AwsSigV4SigningMiddleware$Config);
  public java.lang.Object invoke(java.lang.Object);
}

It looks like a class is being created for each lambda passed to install. We could easily get rid of all of this, and I've been thinking about it anyway. We still want per-operation middleware, but we could also have a class of middleware that exists per client and is created only once. Most middleware (maybe all of it) doesn't retain any state, so we could have a single instance per client rather than allocating one per operation.

We could also revisit the whole Feature interface. Most of the time it's not necessary and we could implement Middleware directly. This would cut down on the number of backing lambda classes that are generated behind the scenes.
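A rough sketch of what that could look like, assuming simplified stand-in types: Request, Middleware, Operation, and Client below are hypothetical and much smaller than the real runtime interfaces. Each middleware implements Middleware directly and takes its configuration through the constructor, and the client builds the stateless instances once and shares them across operations.

```kotlin
// Hypothetical, simplified stand-ins; not the real smithy-kotlin runtime types.
class Request(val headers: MutableMap<String, String> = mutableMapOf())

fun interface Middleware {
    fun handle(request: Request)
}

class Operation(val name: String) {
    val middleware = mutableListOf<Middleware>()

    // Run every registered middleware against a fresh request.
    fun execute(): Request = Request().also { req -> middleware.forEach { it.handle(req) } }
}

// Configured through constructor arguments, so no config lambda is needed per operation.
class UserAgent(private val value: String) : Middleware {
    override fun handle(request: Request) {
        request.headers["User-Agent"] = value
    }
}

class SigningMiddleware(private val signingService: String) : Middleware {
    override fun handle(request: Request) {
        // Placeholder for real signing; only records which service would be signed for.
        request.headers["X-Signing-Service"] = signingService
    }
}

class Client {
    // Stateless middleware, created once per client and reused by every operation.
    private val shared: List<Middleware> = listOf(
        UserAgent("sketch/0.1"),
        SigningMiddleware("dynamodb"),
    )

    fun newOperation(name: String): Operation =
        Operation(name).apply { middleware += shared }
}
```

With this shape, adding an operation adds no new lambda call sites in generated code, so the number of compiler-generated backing classes no longer scales with the number of operations. Middleware that really does need per-operation state could still be installed on the individual Operation.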

aajtodd commented Jan 13, 2022

As an update here, we did end up refactoring the middleware: smithy-lang/smithy-kotlin#536

Looks like we are currently down to ~3.1MB.

The current Java V2 SDK sits at ~2.2MB.

A few other areas to look into:

  • There are some areas in generated serde that contribute to unnecessary size (e.g. use of this)
  • The way we generate and throw operation errors contributes to the overall size. We could probably generate a single throwServiceError function rather than per-operation functions (we actually had something similar at one point, where all errors were registered in a single place). This should work at least for AWS services, since they rely on an errorCode for matching.
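As an illustration of the second point, a single service-wide function could dispatch on the error code instead of each operation carrying its own throw function. This is only a sketch under assumptions: the exception classes and the (errorCode, message) signature are simplified placeholders, not the generated SDK types, and a real version would deserialize the error body rather than take a message string.

```kotlin
// Hypothetical sketch; simplified placeholders rather than the generated SDK code.
open class ServiceException(message: String) : Exception(message)
class ResourceInUseException(message: String) : ServiceException(message)
class ResourceNotFoundException(message: String) : ServiceException(message)

// One function for the whole service: match on the error code and throw the modeled error,
// falling back to the generic service exception for unknown codes.
fun throwServiceError(errorCode: String, message: String): Nothing = throw when (errorCode) {
    "ResourceInUseException" -> ResourceInUseException(message)
    "ResourceNotFoundException" -> ResourceNotFoundException(message)
    else -> ServiceException("$errorCode: $message")
}
```

Every operation would then call the same throwServiceError on a non-success response, so the error-matching logic is emitted once per service instead of once per operation.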

I'll leave this open for tracking and +1s, but the low-hanging fruit is probably gone at this point.

@ianbotsf ianbotsf added the no-auto-closure We do not want this issue to be automatically closed. label Jul 11, 2022