bthread should run existing jobs first when rq is full #204

Open
jamesge opened this issue Jan 15, 2018 · 7 comments
Labels
enhancement improvements on existing features official created by brpc authors

Comments

@jamesge
Contributor

jamesge commented Jan 15, 2018

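Instead of sleep-spinning, which can deadlock.
Update: the ready_to_run family of functions in bthread needs to be redesigned, mainly so that they take a TaskGroup** pg argument. In other words, these functions may make the calling bthread do a context switch. Implemented this way, the change avoids the possible deadlock and also throttles the rate at which bthreads are created, so the capacity of the wsq can be reduced further.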
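
The idea above can be illustrated with a minimal single-threaded sketch. This is not brpc's implementation: `BoundedRunQueue` and `submit_or_help` are hypothetical names, and a plain function call stands in for the bthread context switch that the proposed `TaskGroup** pg` parameter would allow. It only shows the control flow: when the queue is full, the producer runs an already-queued job instead of sleeping and retrying.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for a worker's bounded run queue; not brpc's TaskGroup.
class BoundedRunQueue {
public:
    explicit BoundedRunQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);  // a zero-capacity queue could never accept a job
    }

    // Returns false instead of blocking when the queue is full.
    bool try_push(std::function<void()> job) {
        if (jobs_.size() >= capacity_) return false;
        jobs_.push_back(std::move(job));
        return true;
    }

    // Moves the oldest job into *out; returns false when the queue is empty.
    bool try_pop(std::function<void()>* out) {
        if (jobs_.empty()) return false;
        *out = std::move(jobs_.front());
        jobs_.pop_front();
        return true;
    }

    std::size_t size() const { return jobs_.size(); }

private:
    std::size_t capacity_;
    std::deque<std::function<void()>> jobs_;
};

// Enqueues `job`. If the queue is full, runs already-queued jobs on the
// caller until a slot frees up, instead of sleeping and retrying.
// Returns how many queued jobs the caller had to run first.
inline int submit_or_help(BoundedRunQueue& rq, std::function<void()> job) {
    int helped = 0;
    while (!rq.try_push(job)) {
        std::function<void()> queued;
        if (rq.try_pop(&queued)) {
            queued();  // the producer is throttled by doing queued work
            ++helped;
        }
    }
    return helped;
}
```

Because the producer makes progress on queued work whenever it cannot enqueue, it never waits on a queue that only it could drain, and its creation rate is naturally limited by how fast jobs complete.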

@jamesge jamesge added official created by brpc authors enhancement improvements on existing features labels Jan 15, 2018
@cool-colo

E0121 00:11:10.885643 22582 task_group_inl.h:88] _rq is full, capacity=4096
E0121 00:11:11.515247 22548 task_group.cpp:665] _remote_rq is full, capacity=2048
E0121 00:11:11.885922 22582 task_group_inl.h:88] _rq is full, capacity=4096
E0121 00:11:12.515522 22551 task_group.cpp:665] _remote_rq is full, capacity=2048
E0121 00:11:12.886600 22582 task_group_inl.h:88] _rq is full, capacity=4096
E0121 00:11:13.515552 22552 task_group.cpp:665] _remote_rq is full, capacity=2048
E0121 00:11:13.887511 22582 task_group_inl.h:88] _rq is full, capacity=4096
E0121 00:11:14.515688 22552 task_group.cpp:665] _remote_rq is full, capacity=2048
E0121 00:11:14.887925 22582 task_group_inl.h:88] _rq is full, capacity=4096
E0121 00:11:15.515926 22551 task_group.cpp:665] _remote_rq is full, capacity=2048
E0121 00:11:15.889062 22582 task_group_inl.h:88] _rq is full, capacity=4096
E0121 00:11:16.516187 22548 task_group.cpp:665] _remote_rq is full, capacity=2048
E0121 00:11:16.889631 22582 task_group_inl.h:88] _rq is full, capacity=4096
E0121 00:11:17.516273 22552 task_group.cpp:665] _remote_rq is full, capacity=2048
E0121 00:11:17.890579 22582 task_group_inl.h:88] _rq is full, capacity=4096
E0121 00:11:18.516994 22551 task_group.cpp:665] _remote_rq is full, capacity=2048
E0121 00:11:18.890707 22582 task_group_inl.h:88] _rq is full, capacity=4096
E0121 00:11:19.517732 22551 task_group.cpp:665] _remote_rq is full, capacity=2048
E0121 00:11:19.890990 22582 task_group_inl.h:88] _rq is full, capacity=4096
E0121 00:11:20.518440 22549 task_group.cpp:665] _remote_rq is full, capacity=2048

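Last weekend our program kept core dumping for a whole day. It had happened occasionally before as well: the problem shows up whenever machine load gets high, and this error is always what appears in the log right before the coredump. Could this bug be the cause?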

@jamesge
Contributor Author

jamesge commented Jan 22, 2018

"rq is full" does not cause problems directly. The coredump is more likely related to your own logic.

@cool-colo

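I previously counted the number of in-flight bthreads by incrementing a counter at function entry and decrementing it at exit.
When the program misbehaves, the counter stays stuck at a fairly large number (the normal instantaneous value is only in the tens or hundreds).
That seems to match the deadlock you described.
The coredump was caused by running out of memory. The program forwards intranet requests to the external network and is implemented asynchronously. The outlet of the pool that the bthreads drain was completely blocked, while requests kept pouring in at the inlet.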
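
The entry/exit counting described above could look roughly like the following sketch. `g_in_flight`, `InFlightGuard`, and `handle_request` are hypothetical names chosen for illustration, not code from the reporter's program or from brpc. An RAII guard is used so that the decrement also happens on early returns.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical process-wide count of requests currently being handled.
static std::atomic<long> g_in_flight{0};

// Increments the counter on construction and decrements it on destruction,
// so every exit path of the enclosing scope is covered.
class InFlightGuard {
public:
    InFlightGuard() { g_in_flight.fetch_add(1, std::memory_order_relaxed); }
    ~InFlightGuard() {
        long prev = g_in_flight.fetch_sub(1, std::memory_order_relaxed);
        assert(prev > 0);  // catches an unbalanced decrement in debug builds
        (void)prev;
    }
    InFlightGuard(const InFlightGuard&) = delete;
    InFlightGuard& operator=(const InFlightGuard&) = delete;
};

// Hypothetical handler: returns the in-flight count observed while it runs.
inline long handle_request() {
    InFlightGuard guard;
    return g_in_flight.load(std::memory_order_relaxed);
}
```

A counter that stays pinned at a large value, as reported here, means that many handlers were entered and never reached their exit.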

@brianjcj

brianjcj commented Dec 7, 2018

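We also ran into the deadlock during stress testing. The process kept printing the two log lines "_rq is full" and "_remote_rq is full" and could not handle any requests.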

@GOGOYAO
Contributor

GOGOYAO commented Mar 15, 2022

I have a feeling this issue will turn out to be the biggest pitfall in brpc.

@serverglen
Contributor

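  1. You can increase the gflag task_group_runqueue_capacity.
  2. Increase the number of bthread worker threads.
  3. The cause may be that blocking pthread APIs are used in business callbacks, which ends up blocking all of the bthread workers.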

@kevinfzs

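  1. You can increase the gflag task_group_runqueue_capacity.
  2. Increase the number of bthread worker threads.
  3. The cause may be that blocking pthread APIs are used in business callbacks, which ends up blocking all of the bthread workers.

@serverglen A question: increasing the gflag task_group_runqueue_capacity and the number of bthread worker threads cannot, in theory, solve the problem at its root, can it? Under high concurrency the program would still end up blocked. How did you solve it?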
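
The concern in this question can be put into rough numbers. The sketch below is a back-of-the-envelope model, not a measurement of brpc: `seconds_until_full` is a hypothetical helper that assumes constant arrival and service rates and a queue that starts empty. When jobs arrive faster than they are drained, the time to fill is the capacity divided by the rate difference, so a larger capacity only postpones the "rq is full" state.

```cpp
#include <cassert>
#include <limits>

// Seconds until a queue with `capacity` slots fills up when jobs arrive at
// `arrival_per_sec` and are drained at `service_per_sec`, starting empty.
// Returns infinity when the consumers keep up with the producers.
inline double seconds_until_full(double capacity,
                                 double arrival_per_sec,
                                 double service_per_sec) {
    assert(capacity >= 0);
    const double net = arrival_per_sec - service_per_sec;
    if (net <= 0) return std::numeric_limits<double>::infinity();
    return capacity / net;
}
```

For example, with the capacity of 4096 shown in the log above, an assumed 1024 new jobs per second, and all workers blocked (service rate 0), the queue fills in 4 seconds. Doubling the capacity to 8192 only stretches that to 8 seconds.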

6 participants