Ticket #82 (closed defect: fixed)

Opened 2 years ago

Last modified 18 months ago

Crash in timed-out transfer

Reported by: www.google.com/accounts/o8/id?id=aitoawmz93igogrpknbehv21jdxrf37ft8ksncg Owned by: pbatard
Milestone: Component: libusb-1.0
Keywords: Cc:
Blocked By: Blocks:

Description

Not sure why I'm seeing this crash spuriously. This is a timeout of an HCI bulk-pipe transfer. This is based on 0816743fe70013739d31bac79017a64c7ceb0055 from Pete Batard's master branch. Also seeing the same thing from Ubuntu 10.10 (libusb 1.0.8-2)

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6a52728 in cancel_control_transfer (itransfer=0x6f8e80) at ../../libusb/os/linux_usbfs.c:1705
1705 usbi_err(TRANSFER_CTX(transfer),
(gdb) bt
#0 0x00007ffff6a52728 in cancel_control_transfer (itransfer=0x6f8e80) at ../../libusb/os/linux_usbfs.c:1705
#1 0x00007ffff6a529c0 in op_cancel_transfer (itransfer=0x6f8e80) at ../../libusb/os/linux_usbfs.c:1767
#2 0x00007ffff6a4d19d in libusb_cancel_transfer (transfer=0x6f8ed8) at ../../libusb/io.c:1309
#3 0x00007ffff6a4d91c in handle_timeout (itransfer=0x6f8e80) at ../../libusb/io.c:1707
#4 0x00007ffff6a4da95 in handle_timeouts_locked (ctx=0x6d0620) at ../../libusb/io.c:1760
#5 0x00007ffff6a4db65 in handle_timerfd_trigger (ctx=0x6d0620) at ../../libusb/io.c:1788
#6 0x00007ffff6a4de72 in handle_events (ctx=0x6d0620, tv=0x7fffffffd960) at ../../libusb/io.c:1878
#7 0x00007ffff6a4e07e in libusb_handle_events_timeout_check (ctx=0x6d0620, tv=0x7fffffffd9c0, completed=0x0) at ../../libusb/io.c:1970
#8 0x00007ffff6a4e143 in libusb_handle_events_timeout (ctx=0x6d0620, tv=0x7fffffffd9c0) at ../../libusb/io.c:2005

(gdb) p *transfer
$1 = {dev_handle = 0x6f6fd0, flags = 0 '00', endpoint = 0 '00', type = 0 '00', timeout = 5000, status = LIBUSB_TRANSFER_COMPLETED, length = 30, actual_length = 0,

callback = 0x4380e7 <usb::deviceio_transfer_finished(libusb_transfer*)>, user_data = 0x6f90b0, buffer = 0x6f9328 " ", num_iso_packets = 0, iso_packet_desc = 0x6f8ed8}

Attachments

0001-Add-ability-to-initialize-a-recursive-mutex.patch (1.9 KB) - added by www.google.com/accounts/o8/id?id=aitoawkqduuilezm1tu1seo2uz1ndphzkn82lqc 2 years ago.
[1/6]
0002-Add-some-more-transfer-status-flags.patch (1.1 KB) - added by www.google.com/accounts/o8/id?id=aitoawkqduuilezm1tu1seo2uz1ndphzkn82lqc 2 years ago.
[2/6]
0003-Fix-the-urb-discard-to-detect-the-device-disappearin.patch (1.0 KB) - added by www.google.com/accounts/o8/id?id=aitoawkqduuilezm1tu1seo2uz1ndphzkn82lqc 2 years ago.
[3/6]
0004-Add-book-keeping-for-cancelled-transfers.patch (1.0 KB) - added by www.google.com/accounts/o8/id?id=aitoawkqduuilezm1tu1seo2uz1ndphzkn82lqc 2 years ago.
[4/6]
0005-Initialize-the-events-lock-as-recursive.patch (924 bytes) - added by www.google.com/accounts/o8/id?id=aitoawkqduuilezm1tu1seo2uz1ndphzkn82lqc 2 years ago.
[5/6]
0006-Fix-the-do_close-path-to-properly-clean-up-transfer-.patch (2.7 KB) - added by www.google.com/accounts/o8/id?id=aitoawkqduuilezm1tu1seo2uz1ndphzkn82lqc 2 years ago.
[6/6] v3 - correct condition check logic for printing error messages. doesn't impact actual behaviour.

Download all attachments as: .zip

Change History

comment:1 Changed 2 years ago by www.google.com/accounts/o8/id?id=aitoawmz93igogrpknbehv21jdxrf37ft8ksncg

Woops - forgot to format stack trace.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6a52728 in cancel_control_transfer (itransfer=0x6f8e80) at ../../libusb/os/linux_usbfs.c:1705
1705 usbi_err(TRANSFER_CTX(transfer),
(gdb) bt
#0 0x00007ffff6a52728 in cancel_control_transfer (itransfer=0x6f8e80) at ../../libusb/os/linux_usbfs.c:1705
#1 0x00007ffff6a529c0 in op_cancel_transfer (itransfer=0x6f8e80) at ../../libusb/os/linux_usbfs.c:1767
#2 0x00007ffff6a4d19d in libusb_cancel_transfer (transfer=0x6f8ed8) at ../../libusb/io.c:1309
#3 0x00007ffff6a4d91c in handle_timeout (itransfer=0x6f8e80) at ../../libusb/io.c:1707
#4 0x00007ffff6a4da95 in handle_timeouts_locked (ctx=0x6d0620) at ../../libusb/io.c:1760
#5 0x00007ffff6a4db65 in handle_timerfd_trigger (ctx=0x6d0620) at ../../libusb/io.c:1788
#6 0x00007ffff6a4de72 in handle_events (ctx=0x6d0620, tv=0x7fffffffd960) at ../../libusb/io.c:1878
#7 0x00007ffff6a4e07e in libusb_handle_events_timeout_check (ctx=0x6d0620, tv=0x7fffffffd9c0, completed=0x0) at ../../libusb/io.c:1970
#8 0x00007ffff6a4e143 in libusb_handle_events_timeout (ctx=0x6d0620, tv=0x7fffffffd9c0) at ../../libusb/io.c:2005
(gdb) p *transfer
$1 = {dev_handle = 0x6f6fd0, flags = 0 '00', endpoint = 0 '00', type = 0 '00', timeout = 5000, status = LIBUSB_TRANSFER_COMPLETED, length = 30, actual_length = 0, 
callback = 0x4380e7 <usb::deviceio_transfer_finished(libusb_transfer*)>, user_data = 0x6f90b0, buffer = 0x6f9328 " ", num_iso_packets = 0, iso_packet_desc = 0x6f8ed8}

comment:2 Changed 2 years ago by www.google.com/accounts/o8/id?id=aitoawmz93igogrpknbehv21jdxrf37ft8ksncg

Probably related - I can see crashes when running under gdb with breakpoints in handle_control_completion & handle_timeout. I call libusb_get_string_descriptor_ascii quite frequently too. Could it be a race condition?

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6a52728 in cancel_control_transfer (itransfer=0x6f8e80) at ../../libusb/os/linux_usbfs.c:1705
1705                            usbi_err(TRANSFER_CTX(transfer),
(gdb) bt
#0  0x00007ffff6a52728 in cancel_control_transfer (itransfer=0x6f8e80) at ../../libusb/os/linux_usbfs.c:1705
#1  0x00007ffff6a529c0 in op_cancel_transfer (itransfer=0x6f8e80) at ../../libusb/os/linux_usbfs.c:1767
#2  0x00007ffff6a4d19d in libusb_cancel_transfer (transfer=0x6f8ed8) at ../../libusb/io.c:1309
#3  0x00007ffff6a4d91c in handle_timeout (itransfer=0x6f8e80) at ../../libusb/io.c:1707
#4  0x00007ffff6a4da95 in handle_timeouts_locked (ctx=0x6d0620) at ../../libusb/io.c:1760
#5  0x00007ffff6a4db65 in handle_timerfd_trigger (ctx=0x6d0620) at ../../libusb/io.c:1788
#6  0x00007ffff6a4de72 in handle_events (ctx=0x6d0620, tv=0x7fffffffd130) at ../../libusb/io.c:1878
#7  0x00007ffff6a4e07e in libusb_handle_events_timeout_check (ctx=0x6d0620, tv=0x7fffffffd170, completed=0x7fffffffd1dc) at ../../libusb/io.c:1970
#8  0x00007ffff6a4e17c in libusb_handle_events_check (ctx=0x6d0620, completed=0x7fffffffd1dc) at ../../libusb/io.c:2014
#9  0x00007ffff6a4ec30 in libusb_control_transfer (dev_handle=0x6f6f10, bmRequestType=128 '200', bRequest=6 '06', wValue=771, wIndex=1033, data=0x7fffffffd280 "0403	04", wLength=255, timeout=1000)
    at ../../libusb/sync.c:105
#10 0x00007ffff6a4b373 in libusb_get_string_descriptor (dev=0x6f6f10, desc_index=3 '03', langid=1033, data=0x7fffffffd280 "0403	04", length=255) at ../../libusb/libusb.h:1274
#11 0x00007ffff6a4c684 in libusb_get_string_descriptor_ascii (dev=0x6f6f10, desc_index=3 '03', data=0x7fffffffd3b0 "300325377377377177", length=255) at ../../libusb/descriptor.c:697

Changed 2 years ago by www.google.com/accounts/o8/id?id=aitoawkqduuilezm1tu1seo2uz1ndphzkn82lqc

Changed 2 years ago by www.google.com/accounts/o8/id?id=aitoawkqduuilezm1tu1seo2uz1ndphzkn82lqc

Changed 2 years ago by www.google.com/accounts/o8/id?id=aitoawkqduuilezm1tu1seo2uz1ndphzkn82lqc

Changed 2 years ago by www.google.com/accounts/o8/id?id=aitoawkqduuilezm1tu1seo2uz1ndphzkn82lqc

Changed 2 years ago by www.google.com/accounts/o8/id?id=aitoawkqduuilezm1tu1seo2uz1ndphzkn82lqc

comment:3 Changed 2 years ago by www.google.com/accounts/o8/id?id=aitoawkqduuilezm1tu1seo2uz1ndphzkn82lqc

Attached 6 patches that when all applied should ensure that this won't happen. I suspect the issue here is as a result of the device disappearing (e.g. usb reset) while transfers are pending.

Valgrind reports a similar looking error:

==11780== Invalid read of size 8
==11780==    at 0x646FBA4: discard_urbs (linux_usbfs.c:1362)
==11780==    by 0x64707C0: op_cancel_transfer (linux_usbfs.c:1748)
==11780==    by 0x646B14D: libusb_cancel_transfer (io.c:1314)
==11780==    by 0x646B8CC: handle_timeout (io.c:1712)
==11780==    by 0x646BA45: handle_timeouts_locked (io.c:1755)
==11780==    by 0x646BB15: handle_timerfd_trigger (io.c:1782)
==11780==    by 0x646BE22: handle_events (io.c:1872)
==11780==    by 0x646C012: libusb_handle_events_timeout (io.c:1961)
==11780==  Address 0xaa51790 is 64 bytes inside a block of size 76 free'd
==11780==    at 0x4C27D71: free (vg_replace_malloc.c:366)
==11780==    by 0x6468629: do_close (core.c:1025)
==11780==    by 0x6468795: libusb_close (core.c:1082)

Changed 2 years ago by www.google.com/accounts/o8/id?id=aitoawkqduuilezm1tu1seo2uz1ndphzkn82lqc

[6/6] v3 - correct condition check logic for printing error messages. doesn't impact actual behaviour.

comment:4 Changed 2 years ago by pbatard

  • Owner set to pbatard
  • Status changed from new to accepted

comment:5 Changed 21 months ago by Vitali Lovich <vlovich@…>

In [c4a3b026a3a7da73101667b7bd6260ca5c70e7b9/libusb-pbatard]:

Add recursive mutexes to threading abstraction

This is necessary for the device close path which needs to attain the
events lock, but which might itself be called while handling an event.
The events lock is necessary to properly clean up transfers which might
still be pointing to the device. References #82.

comment:6 Changed 21 months ago by Vitali Lovich <vlovich@…>

In [b3a2b5104ecde345b3e8c5c824adfc4a9f52aaff/libusb-pbatard]:

Linux: Detect and report that device has gone away while discarding URBs

Return LIBUSB_ERROR_NO_DEVICE if the device is no longer available
(ENODEV) to discard the urbs. References #82.

comment:7 Changed 21 months ago by Vitali Lovich <vlovich@…>

In [1ff39e34d4cfed48182a55cc1b8033206ae42c73/libusb-pbatard]:

Add USBI_TRANSFER_CANCELLING and _DEVICE_DISAPPEARED status flags

The flags are used to indicate if a cancellation has started, and if
a cancellation has failed because the device is no longer available.
References #82.

comment:8 Changed 21 months ago by Vitali Lovich <vlovich@…>

In [5c87b9c617efe62037078b3489aab53e74dea584/libusb-pbatard]:

Clean up in-flight transfers and device handle when closing a device

Any in-flight transfers should properly invalidate their references
to device handles that are being closed. Additionally, they should be
removed from the transfer-in-flight list. This is done with the events
lock held to protect against another thread processing the same transfer.
The events lock is initialized as a recursive mutex, because the device
close code might itself be called while an event is being handled.
Fixes #82.

[stuge: Trivial rework to reduce indenting]

comment:9 Changed 18 months ago by Vitali Lovich <vlovich@…>

In [87ba8a9ead4fd9e81ae3c5a894f698779de1f2c2/libusb]:

Linux: Detect and report that device has gone away while discarding URBs

Return LIBUSB_ERROR_NO_DEVICE if the device is no longer available
(ENODEV) to discard the urbs. References #82.

comment:10 Changed 18 months ago by Vitali Lovich <vlovich@…>

In [8f1bc0659136faf312dabe8fe67380a798299d33/libusb]:

Add USBI_TRANSFER_CANCELLING and _DEVICE_DISAPPEARED status flags

The flags are used to indicate if a cancellation has started, and if
a cancellation has failed because the device is no longer available.
References #82.

comment:11 Changed 18 months ago by Vitali Lovich <vlovich@…>

In [74282582cc879f091ad1d847411337bc3fa78a2b/libusb]:

Add recursive mutexes to threading abstraction

This is necessary for the device close path which needs to attain the
events lock, but which might itself be called while handling an event.
The events lock is necessary to properly clean up transfers which might
still be pointing to the device. References #82.

[stuge: Move usbi_mutex_init_recursive() into threads_posix.c]
[stuge: Must also #define _XOPEN_SOURCE 500 to be able to build]
[pbatard: Un-inline usbi_mutex_init_recursive() to make Cygwin happy]

comment:12 Changed 18 months ago by Vitali Lovich <vlovich@…>

  • Resolution set to fixed
  • Status changed from accepted to closed

In [c775c2f43037cd235b65410583179195e25f9c4a/libusb]:

Clean up in-flight transfers and device handle when closing a device

Any in-flight transfers should properly invalidate their references
to device handles that are being closed. Additionally, they should be
removed from the transfer-in-flight list. This is done with the events
lock held to protect against another thread processing the same transfer.
The events lock is initialized as a recursive mutex, because the device
close code might itself be called while an event is being handled.
Fixes #82.

[stuge: Trivial rework to reduce indenting]

Note: See TracTickets for help on using tickets.