Ticket #38 (closed defect: needinfo)

Opened 4 years ago

Last modified 4 years ago

segfault in cancel_bulk_transfer on linux x86_64 using fedora 12 with libusb1-1.0.6-1.fc12 installed

Reported by: akohlmey
Owned by:
Milestone:
Component: libusb-1.0
Keywords:
Cc: akohlmey@…
Blocked By:
Blocks:

Description

while working on interfacing a USB haptic device to a visualization software, i ran into occasional segmentation faults like this one.

in its current incarnation, the interface opens and closes the usb device very often (up to 1000 times per second for each of two devices), so i suspect that this is running into a race condition somewhere. i am trying to rewrite my code to avoid this frequent open/close. nevertheless, there should be no segfault but just an error condition, IMO.

i am running on fedora 12 x86_64 with all updates and
libusb1-devel-1.0.6-1.fc12 and the corresponding requirements
installed. it requires a lot of different libraries and codes
_and_ the devices to reproduce the error, so i'm not attaching
any code or executables to reproduce it.

(gdb) list
1694	
1695		tpriv->reap_action = CANCELLED;
1696		for (i = 0; i < tpriv->num_urbs; i++) {
1697			int tmp = ioctl(dpriv->fd, IOCTL_USBFS_DISCARDURB, &tpriv->urbs[i]);
1698			if (tmp && errno != EINVAL)
1699				usbi_warn(TRANSFER_CTX(transfer),
1700					"unrecognised discard errno %d", errno);
1701		}
1702		return 0;
1703	}
(gdb) where
#0  0x0000003186006663 in cancel_bulk_transfer (itransfer=<value optimized out>) at os/linux_usbfs.c:1699
#1  op_cancel_transfer (itransfer=<value optimized out>) at os/linux_usbfs.c:1737
#2  0x0000003186004ff9 in libusb_cancel_transfer (transfer=0xa475a8) at io.c:1275
#3  0x0000003186005116 in handle_timeout (ctx=<value optimized out>) at io.c:1673
#4  handle_timeouts_locked (ctx=<value optimized out>) at io.c:1726
#5  0x00000031860054cf in handle_timerfd_trigger (ctx=0x794870, tv=<value optimized out>) at io.c:1754
#6  handle_events (ctx=0x794870, tv=<value optimized out>) at io.c:1842
#7  0x0000003186005b93 in libusb_handle_events_timeout (ctx=0x794870, tv=<value optimized out>) at io.c:1931
#8  0x0000003186005c2d in libusb_handle_events (ctx=<value optimized out>) at io.c:1974
#9  0x00000031860063fd in libusb_control_transfer (dev_handle=0x816680, bmRequestType=<value optimized out>, 
    bRequest=<value optimized out>, wValue=<value optimized out>, wIndex=<value optimized out>, data=0x0, 
    wLength=<value optimized out>, timeout=<value optimized out>) at sync.c:105
#10 0x00000000004b5e1b in libnifalcon::FalconCommLibUSB::open(unsigned int) ()
#11 0x00000000004b0a03 in libnifalcon::FalconDevice::open(unsigned int) ()
#12 0x00000000004918c8 in vrpn_Tracker_NovintFalcon::get_report() ()
#13 0x0000000000490689 in vrpn_Tracker_NovintFalcon::mainloop() ()
#14 0x0000000000437cd2 in vrpn_Generic_Server_Object::mainloop() ()
#15 0x0000000000437c21 in main ()

Change History

comment:1 Changed 4 years ago by akohlmey

  • Cc akohlmey@… added

comment:2 follow-up: Changed 4 years ago by dsd

is it possible that there are still outstanding transfers while the device is being closed?

comment:3 in reply to: ↑ 2 Changed 4 years ago by akohlmey

Replying to dsd:

is it possible that there are still outstanding transfers while the device is being closed?

yes. how can i detect/debug this?

comment:4 follow-up: Changed 4 years ago by dsd

at least in the current design, you are expected to terminate all transfers (and wait for the completion of the termination) before closing the device. This could probably be documented better. Does the problem go away if you adjust your application in this manner?
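The rule above (terminate every transfer and wait for the termination to complete before closing) can be enforced on the application side with a small in-flight counter. What follows is a minimal sketch, not libusb code: `dev_session` and the `session_*` helpers are hypothetical names, and the comments only mark where the real `libusb_submit_transfer()`, `libusb_cancel_transfer()`, `libusb_handle_events()` and `libusb_close()` calls would sit in an actual application.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical application-side bookkeeping; none of this is libusb API. */
struct dev_session {
    int is_open;    /* nonzero while the device handle is still valid */
    int in_flight;  /* transfers submitted but whose callback has not run */
};

void session_init(struct dev_session *s)
{
    s->is_open = 1;
    s->in_flight = 0;
}

/* Call right after a successful libusb_submit_transfer(). */
int session_note_submit(struct dev_session *s)
{
    if (!s->is_open)
        return -ENODEV;
    s->in_flight++;
    return 0;
}

/* Call from the transfer callback, whatever the final status was
 * (completed, timed out, cancelled, error). */
void session_note_complete(struct dev_session *s)
{
    assert(s->in_flight > 0);
    s->in_flight--;
}

/* Close guard: refuse to close while any transfer is outstanding.
 * On -EBUSY the caller would cancel the pending transfers with
 * libusb_cancel_transfer() and keep calling libusb_handle_events()
 * until every callback has run, then try again. */
int session_try_close(struct dev_session *s)
{
    if (!s->is_open)
        return -ENODEV;
    if (s->in_flight > 0)
        return -EBUSY;
    s->is_open = 0;  /* this is where libusb_close() would be called */
    return 0;
}
```

A nonzero counter at close time is also a cheap way to detect the situation asked about earlier: if `session_try_close()` ever returns `-EBUSY`, the application was about to close the device with transfers still outstanding.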

comment:5 in reply to: ↑ 4 ; follow-up: Changed 4 years ago by akohlmey

Replying to dsd:

at least in the current design, you are expected to terminate all transfers (and wait for the completion of the termination) before closing the device. This could probably be documented better. Does the problem go away if you adjust your application in this manner?

i will have to dig through the documentation and see how i can adjust it.
i am not the author of the code, but use it indirectly through an abstraction
layer. it looks like it needs a bit of work anyways. ;-)

the real issue is that i cannot communicate with two haptic devices concurrently
(one of them is initialized, but then doesn't communicate), and my naive workaround
was to open and close each device in the interface library before and after each
high-level access. that worked most of the time, but also added (not
unexpectedly) a lot of unwanted latency to the process.

ultimately, i almost expect to do a partial rewrite of the abstraction library
(and teach myself USB programming) in order to bring the latencies down.
it just doesn't make sense to open and close each device hundreds of times
per second...
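One way to avoid the repeated open/close is to open each device lazily on first use and then reuse the handle for every later access. The sketch below only illustrates that idea under stated assumptions: `handle_cache`, `cache_init`, `cache_get` and `fake_open` are hypothetical names, the handle is an opaque `void *`, and `fake_open` stands in for whatever routine really opens the device in the abstraction layer.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEVICES 2

/* Signature of the routine that really opens device number `index`. */
typedef void *(*open_fn)(unsigned int index);

/* Hypothetical per-process cache of already opened device handles. */
struct handle_cache {
    void *handle[MAX_DEVICES];  /* NULL means "not opened yet" */
    open_fn open_device;        /* how to open a device on first use */
    unsigned long open_calls;   /* number of open attempts, for diagnostics */
};

void cache_init(struct handle_cache *c, open_fn fn)
{
    for (size_t i = 0; i < MAX_DEVICES; i++)
        c->handle[i] = NULL;
    c->open_device = fn;
    c->open_calls = 0;
}

/* Return the handle for `index`, opening the device only if it has not
 * been opened before. Returns NULL for an invalid index or a failed open;
 * a failed open is retried on the next call. */
void *cache_get(struct handle_cache *c, unsigned int index)
{
    assert(c != NULL && c->open_device != NULL);
    if (index >= MAX_DEVICES)
        return NULL;
    if (c->handle[index] == NULL) {
        c->handle[index] = c->open_device(index);
        c->open_calls++;
    }
    return c->handle[index];
}

/* Stand-in for the real open routine: returns a distinct non-NULL token
 * per device index so the cache can be exercised without hardware. */
void *fake_open(unsigned int index)
{
    static int tokens[MAX_DEVICES];
    return index < MAX_DEVICES ? (void *)&tokens[index] : NULL;
}
```

With this shape, a thousand accesses per second to two devices cost two opens in total; the handles would be closed once at shutdown, after all outstanding transfers have finished.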

comment:6 in reply to: ↑ 5 Changed 4 years ago by akohlmey

Replying to akohlmey:

that all being said, i still believe that a library should not segfault
on something like this but rather fail gracefully and return with a
suitable failure status.

comment:7 Changed 4 years ago by dsd

That's going to be quite difficult given the asynchronous design. At the end of the day you're misusing the library (well, probably; neither of us actually knows what's going on at this point). It is also outside of the design: libusb is designed to be low-level and lightweight, and you have to play by its rules, in the same way that you can't free a pointer twice with glibc.

either way, in order to make any kind of fix or improvement, you're going to have to understand your own software and explain exactly what's happening.

comment:8 Changed 4 years ago by stuge

  • Resolution set to needinfo
  • Status changed from new to closed

I'm closing this for now. Please re-open the ticket if/when more information becomes available. Thanks!
