fix(feishu): retry sends with a fresh tenant token #401

Open
xukp20 wants to merge 1 commit into chenhg5:main from xukp20:fix/feishu-token-retry

Contributor

@xukp20 commented Apr 1, 2026

Summary

Fix Feishu/Lark outgoing replies failing with 99991663 Invalid access token after long uptime.

Partially addresses #395.

What changed

  • add a Feishu-specific recovery path for outgoing API calls
  • detect tenant token invalid failures (99991663 / Invalid access token)
  • fetch a fresh tenant access token explicitly
  • replay the failed request once with a cache-disabled client and the fresh token
  • apply this recovery consistently to:
    • normal replies
    • new message sends
    • media sends/uploads
    • preview message create/update/delete
    • interactive card sends/replies
  • add regression tests covering:
    • reply retry after stale cached token
    • create/send retry after stale cached token
    • non-token errors do not trigger refresh
    • token-invalid error detection

Why

The current Lark SDK does retry once on 99991663, but in practice that is not always enough.

The root cause is that the tenant token cache remains enabled, so on retry the SDK can still reuse the same stale cached token. In addition, a request-level tenant token override is not sufficient while the token cache is enabled.

So cc-connect now adds a narrow recovery layer on top of the SDK:

  • normal path still uses the default SDK client and cache
  • only when token-invalid is detected, cc-connect explicitly fetches a fresh token
  • the failed request is replayed once through a cache-disabled client

This keeps the fix small and targeted while making outgoing Feishu sends much more resilient.

Scope

This PR only fixes the Feishu/Lark outgoing message failure caused by stale tenant access tokens.

It does not attempt to address the other symptoms mentioned in #395, such as Telegram polling conflicts or unexpected restarts, which may have different root causes.

Validation

  • pnpm install --frozen-lockfile && pnpm build in web/
  • go build ./...
  • go test ./... -v -race
  • go test ./... -coverprofile=coverage.out -covermode=atomic
  • $(go env GOPATH)/bin/actionlint -color
  • go test -v -tags=smoke,no_web ./tests/e2e/...
  • go test -v -tags=regression,no_web ./tests/e2e/...
  • go test -bench=. -benchmem -tags=performance,no_web ./tests/performance/...

Owner

@chenhg5 left a comment

LGTM. Robust fix for Feishu tenant token issues.

Review summary:

  • ✅ Targeted recovery for stale token errors
  • ✅ Fresh token fetch + cache-disabled client replay
  • ✅ Comprehensive test coverage
  • ✅ CI passes
  • ✅ Partially fixes #395

Good resilience improvement for Feishu sends.
