TiKV v5.3.0 server is not compatible with higher version clients #646

Open
Smityz opened this issue Dec 22, 2022 · 1 comment

Comments

@Smityz
Contributor

Smityz commented Dec 22, 2022

Because the proto field number of ttls changed, I have found it difficult to achieve a seamless upgrade when using the BatchPut interface.

To reproduce this problem, use a v5.3.0 client (2fd3841) to access a v5.4.0 TiKV server.
You will get error messages like:

[2022/12/22 19:53:16.458 +08:00] [INFO] [client_batch.go:697] ["batchRecvLoop re-create streaming success"] [target=127.0.0.1:20160] [forwardedHost=]
[2022/12/22 19:53:16.458 +08:00] [WARN] [region_request.go:1263] ["receive a grpc cancel signal from remote"] [error="rpc error: code = Canceled desc = CANCELLED"] [errorVerbose="rpc error: code = Canceled desc = CANCELLED\ngithub.com/tikv/client-go/v2/internal/client.sendBatchRequest\n\t/Users/yanzhao.tang/Code/client-go-1/internal/client/client_batch.go:789\ngithub.com/tikv/client-go/v2/internal/client.(*RPCClient).sendRequest\n\t/Users/yanzhao.tang/Code/client-go-1/internal/client/client.go:497\ngithub.com/tikv/client-go/v2/internal/client.(*RPCClient).SendRequest\n\t/Users/yanzhao.tang/Code/client-go-1/internal/client/client.go:540\ngithub.com/tikv/client-go/v2/internal/locate.(*RegionRequestSender).sendReqToRegion\n\t/Users/yanzhao.tang/Code/client-go-1/internal/locate/region_request.go:1190\ngithub.com/tikv/client-go/v2/internal/locate.(*RegionRequestSender).SendReqCtx\n\t/Users/yanzhao.tang/Code/client-go-1/internal/locate/region_request.go:1023\ngithub.com/tikv/client-go/v2/internal/locate.(*RegionRequestSender).SendReq\n\t/Users/yanzhao.tang/Code/client-go-1/internal/locate/region_request.go:233\ngithub.com/tikv/client-go/v2/rawkv.(*Client).doBatchPut\n\t/Users/yanzhao.tang/Code/client-go-1/rawkv/rawkv.go:903\ngithub.com/tikv/client-go/v2/rawkv.(*Client).sendBatchPut.func1\n\t/Users/yanzhao.tang/Code/client-go-1/rawkv/rawkv.go:869\nruntime.goexit\n\t/usr/local/Cellar/go/1.19.3/libexec/src/runtime/asm_amd64.s:1594"]
[2022/12/22 19:53:16.460 +08:00] [WARN] [backoff.go:120] ["tikvRPC backoffer.maxSleep 40000ms is exceeded, errors:\nsend tikv request error: rpc error: code = Canceled desc = CANCELLED, ctx: region ID: 2, meta: id:2 region_epoch:<conf_ver:1 version:1 > peers:<id:3 store_id:1 > , peer: id:3 store_id:1 , addr: 127.0.0.1:20160, idx: 0, reqStoreType: TiKvOnly, runStoreType: tikv, try next peer later at 2022-12-22T19:53:12.326711+08:00\nsend tikv request error: rpc error: code = Canceled desc = CANCELLED, ctx: region ID: 2, meta: id:2 region_epoch:<conf_ver:1 version:1 > peers:<id:3 store_id:1 > , peer: id:3 store_id:1 , addr: 127.0.0.1:20160, idx: 0, reqStoreType: TiKvOnly, runStoreType: tikv, try next peer later at 2022-12-22T19:53:13.923517+08:00\nsend tikv request error: rpc error: code = Canceled desc = CANCELLED, ctx: region ID: 2, meta: id:2 region_epoch:<conf_ver:1 version:1 > peers:<id:3 store_id:1 > , peer: id:3 store_id:1 , addr: 127.0.0.1:20160, idx: 0, reqStoreType: TiKvOnly, runStoreType: tikv, try next peer later at 2022-12-22T19:53:15.127061+08:00\nlongest sleep type: tikvRPC, time: 40039ms"]

In v5.3.0, the proto looks like this:

// version: 0f5764a128ad
message RawBatchPutRequest {
    Context context = 1;
    repeated KvPair pairs = 2;
    string cf = 3;
    repeated uint64 ttls = 4;
    bool for_cas = 5;
}

But in higher versions, the proto has changed:

// master version
message RawBatchPutRequest {
    Context context = 1;
    repeated KvPair pairs = 2;
    string cf = 3;
    uint64 ttl = 4 [deprecated=true];
    bool for_cas = 5;
    repeated uint64 ttls = 6;
}

Related change: pingcap/kvproto#844
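
To make the incompatibility concrete, here is a minimal Go sketch that hand-encodes the TTL field the way the protobuf wire format specifies. It assumes the messages use proto3, where repeated scalar fields are packed by default; the helper names (appendVarint, appendTag, encodePackedUint64, encodeUint64) are made up for illustration and are not part of client-go or kvproto.

```go
package main

import "fmt"

// Protobuf wire types relevant to this message.
const (
	wireVarint = 0 // singular uint64, bool, ...
	wireBytes  = 2 // length-delimited: strings, messages, packed repeated scalars
)

// appendVarint appends v in protobuf base-128 varint encoding.
func appendVarint(b []byte, v uint64) []byte {
	for v >= 0x80 {
		b = append(b, byte(v)|0x80)
		v >>= 7
	}
	return append(b, byte(v))
}

// appendTag appends the field key: (field_number << 3) | wire_type.
func appendTag(b []byte, field, wireType uint64) []byte {
	return appendVarint(b, field<<3|wireType)
}

// encodePackedUint64 encodes a packed `repeated uint64` field (illustrative helper).
func encodePackedUint64(field uint64, vals []uint64) []byte {
	var payload []byte
	for _, v := range vals {
		payload = appendVarint(payload, v)
	}
	out := appendTag(nil, field, wireBytes)
	out = appendVarint(out, uint64(len(payload)))
	return append(out, payload...)
}

// encodeUint64 encodes a singular `uint64` field (illustrative helper).
func encodeUint64(field, v uint64) []byte {
	return appendVarint(appendTag(nil, field, wireVarint), v)
}

func main() {
	ttls := []uint64{10, 300}
	fmt.Printf("v5.3.0 ttls (field 4, packed): % x\n", encodePackedUint64(4, ttls))
	fmt.Printf("master ttl  (field 4, varint): % x\n", encodeUint64(4, 10))
	fmt.Printf("master ttls (field 6, packed): % x\n", encodePackedUint64(6, ttls))
}
```

Under these assumptions, the v5.3.0 client writes ttls with key byte 0x22 (field 4, length-delimited), while a newer server expects field 4 to arrive with key byte 0x20 (varint) and looks for ttls under key byte 0x32 (field 6). The same field number therefore carries a different wire type on each side, which is consistent with the failure above, although the exact server-side decode behaviour is not shown in the logs.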

In this case, the old client can only access old TiKV servers. But during a rolling upgrade it is inevitable that some TiKV instances will be on the old proto while the others are on the new proto. This results in a period of unavailability while upgrading.

I have tried to introduce a temporary version of the proto like this:

// compatible version
message RawBatchPutRequest {
    Context context = 1;
    repeated KvPair pairs = 2;
    string cf = 3;
    repeated uint64 oldttl = 4 [deprecated=true];
    bool for_cas = 5;
    repeated uint64 ttls = 6;
}

If I only set oldttl, it works with v5.3.0, and if I only set ttls, it works with newer versions. But I don't know how to switch between the two fields when a compatibility error occurs.
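
As a rough sketch of the two modes described above, the snippet below fills exactly one of the two TTL fields of the compatible message. RawBatchPutRequest here is a hypothetical stand-in for the generated struct (only the TTL fields are modeled), and setTTLs and its useLegacyField flag are assumptions for illustration. How to decide that flag per store is exactly the open question and is not answered here.

```go
package main

import "fmt"

// RawBatchPutRequest is a hypothetical stand-in for the struct generated from
// the "compatible version" proto; only the two TTL fields are modeled.
type RawBatchPutRequest struct {
	OldTTL []uint64 // field 4, read as `ttls` by v5.3.0 servers
	TTLs   []uint64 // field 6, read as `ttls` by newer servers
}

// setTTLs populates exactly one of the two TTL fields and clears the other.
// useLegacyField is an assumed per-store flag; deriving it is out of scope.
func setTTLs(req *RawBatchPutRequest, ttls []uint64, useLegacyField bool) {
	req.OldTTL, req.TTLs = nil, nil
	if useLegacyField {
		req.OldTTL = ttls
		return
	}
	req.TTLs = ttls
}

func main() {
	var req RawBatchPutRequest

	setTTLs(&req, []uint64{60, 120}, true)
	fmt.Printf("for v5.3.0 store: oldttl=%v ttls=%v\n", req.OldTTL, req.TTLs)

	setTTLs(&req, []uint64{60, 120}, false)
	fmt.Printf("for newer store:  oldttl=%v ttls=%v\n", req.OldTTL, req.TTLs)
}
```

Clearing both fields before setting one keeps a reused request from carrying both, since only the single-field cases are reported above to work.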

So I would like to ask whether there is a good way to upgrade from this version without downtime. I would be extremely grateful for any help.

@sticnarf
Collaborator

cc @iosmanthus
