Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix getPartitionedStats miss subscription's messageAckRate #2

Open
wants to merge 89 commits into
base: master
Choose a base branch
from

Conversation

wangjialing218
Copy link
Owner

No description provided.

RobertIndie and others added 30 commits March 21, 2023 16:17
…apache#19852)

### Motivation

The `onNegativeAcksSend` is only called by:
https://github.com/apache/pulsar/blob/80c5791b87482bee3392308ecef45f455f8de885/pulsar-client/src/main/java/org/apache/pulsar/client/impl/NegativeAcksTracker.java#L83

It will not be null. The statement `null if acknowledge fail.` is incorrect.

### Modifications

* Remove the incorrect statement and refine the doc.

Signed-off-by: Zike Yang <[email protected]>
Co-authored-by: Jun Ma <[email protected]>
…nti-affinity-group policy from topk load bundles (apache#19742)

### Motivation

Raising a PR to implement apache#16691.

Bundles with isolation policies or anti-affinity-group policies are not ideal targets to auto-unload as destination brokers are limited. 

### Modifications
This PR 
- Excluded bundles with isolation policy or anti-affinity-group policy from topk load bundles
- Introduced a config `loadBalancerSheddingBundlesWithPoliciesEnabled ` to control this behavior.
…eleted (apache#19825)

When deleting the zk node of the cursor, if the exception `MetadataStoreException.NotFoundException`  occurs, the deletion is considered successful.
…pic fails due to enabled geo-replication (apache#19879)

Motivation: As expected, If geo-replication is enabled, a topic cannot be deleted. However deleting that topic returns a 500, and no further info.
Modifications: Make response code to 400 instead of 500 when delete topic fails due to enabled geo-replication
…ioned-topic stat (apache#19942)

### Motivation

Pulsar will merge the variable `PartitionedTopicStatsImpl.replication[x].connected` by the way below when we call `pulsar-admin topics partitioned-stats`

``` java
this.connected = this.connected & other.connected
```

But the variable `connected` of `PartitionedTopicStatsImpl.replication` is initialized `false`, so the expression `this.connected & other.connected` will always be `false`.

Then we will always get the value `false` if we call `pulsar-admin topics partitioned-stats`.

### Modifications

make the variable `` of `PartitionedTopicStatsImpl` is initialized `true`
…pache#19851)

PIP: apache#16691

### Motivation
Raising a PR to implement apache#16691.

We need to support delete namespace bundle admin API.

### Modifications

* Support delete namespace bundle admin API.
* Add units test.
Master Issue: Master Issue: apache#16691, apache#18099

### Motivation

Raising a PR to implement Master Issue: apache#16691, apache#18099

We want to reduce unload frequencies from flaky traffic.

### Modifications
This PR 
- Introduced a config `loadBalancerSheddingConditionHitCountThreshold` to further restrict shedding conditions based on the hit count.
- Normalized offload traffic
- Lowered the default `loadBalanceSheddingDelayInSeconds` value from 600 to 180, as 10 mins are too long. 3 mins can be long enough to catch the new load after unloads.
- Changed the config `loadBalancerBundleLoadReportPercentage` to `loadBalancerMaxNumberOfBundlesInBundleLoadReport` to make the topk bundle count absolute instead of relative.
- Renamed `loadBalancerNamespaceBundleSplitConditionThreshold` to `loadBalancerNamespaceBundleSplitConditionHitCountThreshold` to be consistent with `*ConditionHitCountThreshold`.
- Renamed `loadBalancerMaxNumberOfBrokerTransfersPerCycle ` to `loadBalancerMaxNumberOfBrokerSheddingPerCycle`.
- Added LoadDataStore cleanup logic in BSC monitor.
- Added `msgThroughputEMA` in BrokerLoadData to smooth the broker throughput info.
- Updated Topk bundles sorted in a ascending order (instead of descending)
- Update some info logs to only show in the debug mode.
- Added load data tombstone upon Own, Releasing, Splitting
- Added the bundle ownership(isOwned) check upon split and unload.
- Added swap unload logic
heesung-sn and others added 27 commits April 10, 2023 10:27
PIP: apache#19771 

### Motivation

This is the primary PR for PIP 257 (apache#19771). It adds an OpenID Conenct `AuthenticationProvider` implementation. The implementation is intended to be compliant with the OpenID Specs defined here: https://openid.net/developers/specs/. We specifically implement the discovery and these two:

> * [OpenID Connect Core](https://openid.net/specs/openid-connect-core-1_0.html) – Defines the core OpenID Connect functionality: authentication built on top of OAuth 2.0 and the use of claims to communicate information about the End-User
> * [OpenID Connect Discovery](https://openid.net/specs/openid-connect-discovery-1_0.html) – Defines how clients dynamically discover information about OpenID Providers

### Modifications

* Add new module `pulsar-broker-auth-oidc`
* Add implementation that relies on auth0 client libraries to verify the signature and claims of the JWT
* Use async http client for all http requests
* Cache the provider metadata and the JWKS results
* Support different types of `FallbackDiscoveryMode`s, as documented in the code. Essentially, this setting allows users to more easily integrate with k8s. We need this coupling with kubernetes to deal with some of the nuances of the k8s implementation. Note that this part of the code is experimental and is subject to change as requirements and cloud provider implementations change. One important reason we use the k8s client is because the API Server requires special configuration for authentication and TLS. Since these do not appear to be generic requirements, the K8s client simplifies this integration. Here is a reference to the decision that requires authentication by default for getting OIDC info kubernetes/kubernetes#80724. That discussion also indicate to me that this is an isolated design decision in k8s. If we find that authentication is a generic requirement, we should easily be able to expand the existing feature at a later time.
* Add metrics to help quantify success and failure. (I had thought I would add audit logging, but that is an independent feature that we can add to the Pulsar framework. It seems outside the scope of an Authentication Provider implementation to implement this feature.)

### Verifying this change

There are many new tests to cover this new implementation. Some of the tests are unit tests while others are integration tests that rely on Wire Mock to return the public key information.

### Documentation

- [x] `doc-required` 

This feature will need new docs.

### Matching PR in forked repository

PR in forked repository: michaeljmarshall#35
### Motivation

With BK 4.16, we don't need to pass `SafeRunnable` instances to the `OrderedExecutor` anymore. The executor has embedded the logic of checking and logging exceptions.

Removing the SafeRun will avoid extra allocations in critical path and clutter of the code

### Verifying this change

- [ ] Make sure that the change passes the CI checks.

*(Please pick either of the following options)*

This change is a trivial rework / code cleanup without any test coverage.

*(or)*

This change is already covered by existing tests, such as *(please describe tests)*.

*(or)*

This change added tests and can be verified as follows:

*(example:)*
  - *Added integration tests for end-to-end deployment with large payloads (10MB)*
  - *Extended integration test for recovery after broker failure*

### Does this pull request potentially affect one of the following parts:

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

*If the box was checked, please highlight the changes*

- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment

### Documentation

<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->

- [ ] `doc` <!-- Your PR contains doc changes. -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

### Matching PR in forked repository

PR in forked repository: (merlimat#6)

<!--
After opening this PR, the build in apache/pulsar will fail and instructions will
be provided for opening a PR in the PR author's forked repository.

apache/pulsar pull requests should be first tested in your own fork since the 
apache/pulsar CI based on GitHub Actions has constrained resources and quota.
GitHub Actions provides separate quota for pull requests that are executed in 
a forked repository.

The tests will be run in the forked repository until all PR review comments have
been handled, the tests pass and the PR is approved by a reviewer.
-->
Motivation: PulsarClient using ExecutorService as ScheduledExecutorService

Modifications: Use exactly ScheduledExecutorService
…e#19514)

Motivation: If automatic topic creation is enabled, producers and consumers can create topics with long names, but once created, such topics cannot be managed by the Admin API, it will respond "414 URI is too large".

Modifications: add a config `pulsar.conf.httpMaxRequestHeaderSize` to make the jetty can accept the request with a longer Uri
### Motivation
After PR apache#8611, the acquiring permits can be larger than configured msg-rate if used by subscribing. But doc was not updated in time.

### Modifications

fix the outdated doc
…ersistentTopic (apache#19737)

part-1 of PIP-240: add a new method unloadSubscription( String subName ) for PersistentTopic
…ls (apache#20031)

### Motivation
After apache#15868, we allow `PULSAR_MEM` & `PULSAR_GC` to be overridden in `pulsar_tool_env.sh`.

Many users set `-Xms` to `2G` or larger in `PULSAR_MEM`, this will make the tools(such as `pulsar-admin`) cost a lot of memory, and if users execute `pulsar-admin` or another tool on the machine where the Broker is deployed, the current device will not have enough memory to allocate, resulting in a broker crash.

### Modifications

When `PULSAR_MEM` is overridden  in `pulsar_tool_env.sh`, delete parameter `-Xms`
PIP apache#19623 
Relates to apache#19540

### Motivation

In order to get more information about connections, it is helpful for the proxy to supply its version to the broker.

### Modifications

* Add `proxy_version` field to the `CommandConnect` protobuf message 
* Update proxy and broker to handle this new field

### Verifying this change

New tests are added with this PR.

### Does this pull request potentially affect one of the following parts:

- [x] The binary protocol

This will be submitted as part of a PIP.

### Documentation

- [x] `doc-not-needed`
Fixes: apache#10816
PIP: apache#19771
Supersedes: apache#19026
Depends on: apache#20062

### Motivation

The Pulsar Proxy does not properly handle authentication data refresh when in state `ProxyLookupRequests`. The consequence is described in apache#10816. Essentially, the problem is that the proxy caches stale authentication data and sends it to the broker leading to connection failures.

apache#17831 attempted to fix the underlying problem, but it missed an important edge cases. Specifically, it missed the case that the `ConnectionPool` will have multiple connections when a lookup gets redirected. As such, the following problem exists (and is fixed by this PR):

1. Client opens connection to perform lookups.
2. Proxy connects to broker 1 to get the topic ownership info.
3. Time passes.
4. Client does an additional lookup, and this topic is on a newly created broker 2. In this case, the proxy opens a new connection with the stale client auth data.
5. Broker 2 rejects the connection because it fails with expired authentication.

### Modifications

* Remove some of the implementation from apache#17831. This new implementation still allows a broker to challenge the client through the proxy, but notably, it limits the number of challenges sent to the client. Further, the proxy does not challenge the client when the auth data is not expired.
* Introduce authentication refresh in the proxy so that the proxy challenges the client any time the auth data is expired.
* Update the `ProxyClientCnx` to get the `clientAuthData` from the `ProxyConnection` to ensure that it gets new authentication data.
* Add clock skew to the `AuthenticationProviderToken`. This is necessary to make some of the testing not flaky and it will also be necessary for users to configure in their clusters.

### Verifying this change

The `ProxyRefreshAuthTest` covers the existing behavior and I expanded it to cover the edge case described above.

Additionally, testing this part of the code will be much easier to test once we implement apache#19624.

### Documentation

- [x] `doc-not-needed`

### Matching PR in forked repository

PR in forked repository: the relevant tests pass locally, so I am going to skip the forked tests.
apache#20070)

### Motivation

BK deployed with Pulsar fail on start with
```
ERROR org.apache.bookkeeper.common.component.AbstractLifecycleComponent - Failed to start Component: http-service
java.lang.NoSuchMethodError: 'io.vertx.core.Future io.vertx.core.Vertx.deployVerticle(io.vertx.core.Verticle)'
	at org.apache.bookkeeper.http.vertx.VertxHttpServer.startServer(VertxHttpServer.java:93) ~[org.apache.bookkeeper.http-vertx-http-server-4.16.0.jar:4.16.0]
	at org.apache.bookkeeper.server.service.HttpService.doStart(HttpService.java:62) ~[org.apache.bookkeeper-bookkeeper-server-4.16.0.jar:4.16.0]
	at org.apache.bookkeeper.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:83) ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
	at org.apache.bookkeeper.common.component.LifecycleComponentStack.lambda$start$4(LifecycleComponentStack.java:144) ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
	at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:422) ~[com.google.guava-guava-31.0.1-jre.jar:?]
	at org.apache.bookkeeper.common.component.LifecycleComponentStack.start(LifecycleComponentStack.java:144) ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
	at org.apache.bookkeeper.common.component.ComponentStarter.startComponent(ComponentStarter.java:84) ~[org.apache.bookkeeper-bookkeeper-common-4.16.0.jar:4.16.0]
	at org.apache.bookkeeper.server.Main.doMain(Main.java:224) ~[org.apache.bookkeeper-bookkeeper-server-4.16.0.jar:4.16.0]
	at org.apache.bookkeeper.server.Main.main(Main.java:199) ~[org.apache.bookkeeper-bookkeeper-server-4.16.0.jar:4.16.0]

```

The root cause is that BK upgraded to 4.16.0 and needs newer version of vertx that introduced API changes apache/bookkeeper#3435

Pulsar forces older version of dependency.

### Modifications

Upgrade Vertx to the same version as BK
…ocksDB dependency (apache#20072)

### Motivation
BookKeeper has upgraded the RocksDB dependency to 7.9.2, related discussion:
https://lists.apache.org/thread/8j90y4vrvgz1nvt5pb0xdjjy3o8z57z7

apache/bookkeeper#3734

However, Pulsar also has the RocksDB dependency and it will override the BookKeeper's RocksDB dependency version. It will lead to the release package still using the old RocksDB version (6.29.4.1)

### Modifications
Upgrade the Pulsar's RocksDB dependency to 7.9.2 to keep sync with the BookKeeper's RocksDB dependency.
@Technoboy- Technoboy- force-pushed the branch-partition-stats branch from 1ea2398 to 77ba342 Compare April 12, 2023 14:37
@github-actions
Copy link

The pr had no activity for 30 days, mark with Stale label.

@github-actions github-actions bot added the Stale label May 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.