[VPP-1398] VPP crashes when deleting tapv2 interface #2862

@vvalderrv

Description

VPP crashes when the VPP ML2 agent tries to delete a tapv2 interface during cleanup.

 

Here are the VPP logs:

 

Aug 01 15:27:24 overcloud-controller-0.opnfvlf.org vnet[437543]: acl_plugin: ACL_FA_CLEANER_DELETE_BY_SW_IF_INDEX bitmap: 0, clear_all: 0

Aug 01 15:27:24 overcloud-controller-0.opnfvlf.org vnet[437543]: acl_plugin: ACL_FA_CLEANER: thread 0, pending clear bitmap: 0

Aug 01 15:27:24 overcloud-controller-0.opnfvlf.org vnet[437543]: acl_plugin: ACL_FA_CLEANER: thread 4294967295, pending clear bitmap: 0

Aug 01 15:27:24 overcloud-controller-0.opnfvlf.org vnet[437543]: acl_plugin: CLEANER mains len: 2 per-worker len: 2

Aug 01 15:27:24 overcloud-controller-0.opnfvlf.org vnet[437543]: acl_plugin: ACL_FA_NODE_CLEAN: cleaning done

Aug 01 15:27:26 overcloud-controller-0.opnfvlf.org vnet[437543]: acl_plugin: ACL_FA_CLEANER_DELETE_BY_SW_IF_INDEX bitmap: 0, clear_all: 0

Aug 01 15:27:26 overcloud-controller-0.opnfvlf.org vnet[437543]: acl_plugin: ACL_FA_CLEANER: thread 0, pending clear bitmap: 0

Aug 01 15:27:26 overcloud-controller-0.opnfvlf.org vnet[437543]: acl_plugin: ACL_FA_CLEANER: thread 4294967295, pending clear bitmap: 0

Aug 01 15:27:26 overcloud-controller-0.opnfvlf.org vnet[437543]: acl_plugin: CLEANER mains len: 2 per-worker len: 2

Aug 01 15:27:26 overcloud-controller-0.opnfvlf.org vnet[437543]: acl_plugin: ACL_FA_NODE_CLEAN: cleaning done

Aug 01 15:27:27 overcloud-controller-0.opnfvlf.org vnet[437543]: received signal SIGSEGV, PC 0x7f4370ce9c9a, faulting address 0xcc

Aug 01 15:27:27 overcloud-controller-0.opnfvlf.org vnet[437543]: #0  0x00007f43716186a5 0x7f43716186a5

Aug 01 15:27:27 overcloud-controller-0.opnfvlf.org vnet[437543]: #1  0x00007f436f9ca6d0 0x7f436f9ca6d0

Aug 01 15:27:27 overcloud-controller-0.opnfvlf.org vnet[437543]: #2  0x00007f4370ce9c9a 0x7f4370ce9c9a

Aug 01 15:27:27 overcloud-controller-0.opnfvlf.org vnet[437543]: #3  0x00007f432dc1ef7c dpdk_buffer_free_avx2 + 0xbdc

Aug 01 15:27:27 overcloud-controller-0.opnfvlf.org vnet[437543]: #4  0x00007f43710afb92 virtio_free_used_desc + 0x92

Aug 01 15:27:27 overcloud-controller-0.opnfvlf.org vnet[437543]: #5  0x00007f43710d3eab virtio_vring_free + 0x33b

Aug 01 15:27:27 overcloud-controller-0.opnfvlf.org vnet[437543]: #6  0x00007f43710d7b19 tap_delete_if + 0x119

Aug 01 15:27:27 overcloud-controller-0.opnfvlf.org vnet[437543]: #7  0x00007f43710d85e6 0x7f43710d85e6

Aug 01 15:27:27 overcloud-controller-0.opnfvlf.org systemd[1]: vpp.service: main process exited, code=killed, status=6/ABRT

 

 

 

Here's the gdb backtrace:

 

Program received signal SIGSEGV, Segmentation fault.

0x00007ffff7061c9a in replication_recycle_callback (vm=0x7ffff7bacf80 <vlib_global_main>, fl=0x7fffb569b700) at /usr/src/debug/vpp-18.07/src/vnet/replication.c:181

181           feature_node_index = ctx->recycle_node_index;

(gdb) bt

#0  0x00007ffff7061c9a in replication_recycle_callback (vm=0x7ffff7bacf80 <vlib_global_main>, fl=0x7fffb569b700) at /usr/src/debug/vpp-18.07/src/vnet/replication.c:181

#1  0x00007fffb3f96f7c in vlib_buffer_free_inline (follow_buffer_next=1, n_buffers=<optimized out>, buffers=<optimized out>, vm=<optimized out>)

    at /w/workspace/vpp-merge-1807-centos7/build-root/rpmbuild/vpp-18.07/build-data/../src/plugins/dpdk/buffer.c:388

#2  dpdk_buffer_free_avx2 (vm=<optimized out>, buffers=<optimized out>, n_buffers=<optimized out>)

    at /w/workspace/vpp-merge-1807-centos7/build-root/rpmbuild/vpp-18.07/build-data/../src/plugins/dpdk/buffer.c:398

#3  0x00007ffff7427b92 in vlib_buffer_free (n_buffers=1, buffers=<optimized out>, vm=0x7ffff7bacf80 <vlib_global_main>) at /usr/src/debug/vpp-18.07/src/vlib/buffer_funcs.h:544

#4  virtio_free_used_desc (vm=vm@entry=0x7ffff7bacf80 <vlib_global_main>, vring=vring@entry=0x7fffb7336fc0) at /usr/src/debug/vpp-18.07/src/vnet/devices/virtio/device.c:115

#5  0x00007ffff744beab in virtio_vring_free (vm=vm@entry=0x7ffff7bacf80 <vlib_global_main>, vif=vif@entry=0x7fffb7337a80, idx=<optimized out>)

    at /usr/src/debug/vpp-18.07/src/vnet/devices/virtio/virtio.c:165

#6  0x00007ffff744fb19 in tap_delete_if (vm=0x7ffff7bacf80 <vlib_global_main>, sw_if_index=sw_if_index@entry=4) at /usr/src/debug/vpp-18.07/src/vnet/devices/tap/tap.c:447

#7  0x00007ffff74505e6 in vl_api_tap_delete_v2_t_handler (mp=0x30172084) at /usr/src/debug/vpp-18.07/src/vnet/devices/tap/tapv2_api.c:166

#8  0x00007ffff7bb61d3 in vl_msg_api_handler_with_vm_node (am=am@entry=0x7ffff7dda000 <api_main>, the_msg=0x30172084, vm=vm@entry=0x7ffff7bacf80 <vlib_global_main>,

    node=node@entry=0x7fffb5943000) at /usr/src/debug/vpp-18.07/src/vlibapi/api_shared.c:508

#9  0x00007ffff7bbdcb5 in void_mem_api_handle_msg_i (am=<optimized out>, q=<optimized out>, node=0x7fffb5943000, vm=0x7ffff7bacf80 <vlib_global_main>)

    at /usr/src/debug/vpp-18.07/src/vlibmemory/memory_api.c:687

#10 vl_mem_api_handle_msg_main (vm=vm@entry=0x7ffff7bacf80 <vlib_global_main>, node=node@entry=0x7fffb5943000) at /usr/src/debug/vpp-18.07/src/vlibmemory/memory_api.c:697

#11 0x00007ffff7bcce1c in vl_api_clnt_process (vm=<optimized out>, node=0x7fffb5943000, f=<optimized out>) at /usr/src/debug/vpp-18.07/src/vlibmemory/vlib_api.c:349

#12 0x00007ffff7956836 in vlib_process_bootstrap (_a=<optimized out>) at /usr/src/debug/vpp-18.07/src/vlib/main.c:1231

#13 0x00007ffff6483068 in clib_calljmp () at /usr/src/debug/vpp-18.07/src/vppinfra/longjmp.S:110

#14 0x00007fffb5b4ce30 in ?? ()

#15 0x00007ffff7957b29 in vlib_process_startup (f=0x0, p=0x7fffb5943000, vm=0x7ffff7bacf80 <vlib_global_main>) at /usr/src/debug/vpp-18.07/src/vlib/main.c:1253

#16 dispatch_process (vm=0x7ffff7bacf80 <vlib_global_main>, p=0x7fffb5943000, last_time_stamp=9708908937542267, f=0x0) at /usr/src/debug/vpp-18.07/src/vlib/main.c:1298

#17 0x0000000000000000 in ?? ()

(gdb)
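
A note on reading the fault: the syslog reports faulting address 0xcc, and gdb stops on the read of ctx->recycle_node_index at replication.c:181. A fault address that small would be consistent with (though it does not prove) ctx being NULL and the field sitting 0xcc bytes into the structure. The sketch below only illustrates that arithmetic. demo_context_t and field_address are hypothetical names, and the layout is NOT the real replication_context_t from VPP; the padding is sized purely so the field lands at offset 0xcc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration only. This is NOT the real
   replication_context_t; the leading array is sized so that the
   field of interest lands at offset 0xcc. */
typedef struct
{
  uint8_t preceding_fields[0xcc];   /* stand-in for whatever comes first */
  uint32_t recycle_node_index;      /* the field read at replication.c:181 */
} demo_context_t;

/* Compile-time check that the assumed offset matches the fault address. */
static_assert (offsetof (demo_context_t, recycle_node_index) == 0xcc,
               "demo field must sit at offset 0xcc");

/* Address the CPU would touch when reading ctx->recycle_node_index,
   given the numeric value of the ctx pointer. No memory is accessed. */
static uintptr_t
field_address (uintptr_t ctx_base)
{
  return ctx_base + offsetof (demo_context_t, recycle_node_index);
}
```

Under these assumptions, field_address(0) evaluates to 0xcc, matching the reported faulting address. Printing ctx in gdb at frame #0 (for example with "p ctx") would confirm or rule out this reading.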

Assignee

Mohsin Kazmi

Reporter

Onong Tayeng

Comments

  • sykazmi (Wed, 20 Feb 2019 12:54:08 +0000):

    API custom dump trace:

https://pastebin.com/PSdMynkQ

  • ot (Tue, 19 Feb 2019 15:11:12 +0000): Yes, I still see the issue with 19.01.
  • jhahn (Sun, 17 Feb 2019 23:08:31 +0000): Onong Tayeng, is this still an issue in 19.01?
  • ot (Thu, 16 Aug 2018 17:14:06 +0000): VPP version = 18.07 RC1
  • ot (Thu, 16 Aug 2018 17:13:33 +0000): The API trace which reproduces the issue.

Original issue: https://jira.fd.io/browse/VPP-1398
