Reverse engineering USB audio gear

Earlier this year I helped reverse engineer the USB audio interface on the Pioneer DJM line of DJ mixers in order to add Linux support. I’d like to share my process, the knowledge I picked up along the way, and some technical details.

Background

I’m a record collector and amateur DJ / hacker and picked up a used DJM-750 around 3 years ago without really thinking too much about the audio interface and whether it would work with the Linux kernel.

Over time the mixer became the central hub for my desktop audio needs, living on my desk alongside my decks. I wanted to use it as a general-purpose audio interface since it has 8 input and 8 output channels and supports sample rates of up to 96 kHz. Unfortunately, although the mixer was detected by my Linux distribution, ALSA (the Linux audio subsystem) would not detect that there was an audio device present and I could not use any of the audio IO. Bummer. The only option, I thought, was to reverse engineer the device and add support myself.

Pioneer does provide several drivers for Windows that allow you to use their Rekordbox software, but the software is designed to be locked down and requires a subscription. Even then, it doesn’t allow you to access the IO directly, use it with other software, or use the mixer as a generic audio interface in Windows. The proprietary driver is still useful for reverse engineering efforts, though.

Preliminary investigations

Often we can learn a lot about devices by using standard tools. In particular, lsusb -v will list all of the devices on all buses along with their “descriptors”. This is the technical term for a specification, stored in the firmware of the device, of how the device communicates. It usually contains things like endpoint addresses, transfer types, vendor ID, product ID and so on. Many devices expose multiple “interfaces”, and each interface has its own descriptor. Interfaces are a bit like ports in networking: they simply allow frames for different functions to be separated.
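To make the idea of a descriptor more concrete, here is a minimal sketch in C that decodes a few fields of the standard 18-byte USB device descriptor, which is the first block that lsusb -v prints for each device. The field offsets come from the USB 2.0 specification. The struct and function names are my own, not from any library, and the vendor and product IDs mentioned afterwards are placeholders rather than the real DJM-750 values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEVICE_DESCRIPTOR_SIZE 18
#define DESCRIPTOR_TYPE_DEVICE 0x01

/* A subset of the standard device descriptor fields (names are illustrative). */
struct device_descriptor {
    uint16_t usb_version;        /* bcdUSB, e.g. 0x0200 for USB 2.0 */
    uint8_t  device_class;       /* bDeviceClass, 0xff means vendor specific */
    uint8_t  max_packet_size0;   /* bMaxPacketSize0 for endpoint 0 */
    uint16_t vendor_id;          /* idVendor */
    uint16_t product_id;         /* idProduct */
    uint8_t  num_configurations; /* bNumConfigurations */
};

/* Multi-byte descriptor fields are little-endian on the wire. */
uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if buf does not hold a device descriptor. */
int parse_device_descriptor(const uint8_t *buf, size_t len,
                            struct device_descriptor *out)
{
    assert(buf != NULL && out != NULL);

    if (len < DEVICE_DESCRIPTOR_SIZE ||
        buf[0] != DEVICE_DESCRIPTOR_SIZE ||  /* bLength */
        buf[1] != DESCRIPTOR_TYPE_DEVICE)    /* bDescriptorType */
        return -1;

    out->usb_version        = le16(&buf[2]);
    out->device_class       = buf[4];
    out->max_packet_size0   = buf[7];
    out->vendor_id          = le16(&buf[8]);
    out->product_id         = le16(&buf[10]);
    out->num_configurations = buf[17];
    return 0;
}
```

Feeding this the 18 raw bytes of a device descriptor (from a usbmon capture, for instance) fills in the struct. A descriptor whose bytes 8 to 11 are 34 12 78 56 would decode to a placeholder vendor ID of 0x1234 and product ID of 0x5678.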

Tools

The silver lining is that I could set up a Windows virtual machine, install the Pioneer driver, and sniff the USB traffic between the mixer and the virtual machine to figure out what is going on. For that, I used two things:

The usbmon kernel module with Wireshark
OpenVizsla: an FPGA-based USB analyzer. I used ViewSB, developed by Qyriad and Kate Temkin, as the frontend since Wireshark doesn’t yet work with OpenVizsla. I do want to develop a driver for Wireshark support though, since the OpenVizsla hardware is really quite cheap compared to things like the TotalPhase Beagle 480.

Hardware capture

The protocol analyzer sits between the host PC and the mixer. It is also connected to a PC that will receive the analysis data (this can be the same computer as the host):

┌───────┐   ┌───────────┐   ┌──────┐
│Host PC│◄──│Open Vizsla│──►│DJM750│
└───────┘   └─────┬─────┘   └──────┘
                  │
            ┌─────▼─────┐
            │Analysis PC│
            └───────────┘

USB Specification

USB 2.0 is not just a protocol but a complete specification, covering everything from the physical cable and connector up to the structure of the frames and packets that are transmitted between host and device. For the purposes of reverse engineering the data that is sent, we only need to focus on the higher layers.

For the DJM-750, we look to the high-speed mode. In full-speed/low-speed, the frame duration is fixed at 1 ms because it’s used as a timing reference for isochronous transfers. More on isochronous later. In high-speed, each frame is divided into 8 microframes, each with a duration of 125 µs. Thus, 8 microframes add up to 1 ms. This is important because in a mixed device tree of HS and FS/LS devices, all of the communication between hubs is done at HS. It’s still possible to retain the crucial timing though, since there is a common denominator.
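To get a feel for what the microframe timing means for audio, here is a rough back-of-the-envelope sketch in C. It works out how many samples per channel fall into each 125 µs microframe at a given sample rate, and how many bytes that would be. The function names are my own, and the sample size is an assumption: I haven’t stated the mixer’s sample format here, so treat the 3 bytes per sample used afterwards as illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* 8 microframes per 1 ms frame gives 8000 microframes per second. */
#define MICROFRAMES_PER_SECOND 8000u

/* Whole samples per channel carried in every microframe. */
uint32_t samples_per_microframe(uint32_t sample_rate_hz)
{
    return sample_rate_hz / MICROFRAMES_PER_SECOND;
}

/* Samples per second left over when the rate does not divide evenly.
 * A non-zero result means some microframes must carry an extra sample. */
uint32_t leftover_samples_per_second(uint32_t sample_rate_hz)
{
    return sample_rate_hz % MICROFRAMES_PER_SECOND;
}

/* Payload bytes per microframe, counting whole samples only.
 * bytes_per_sample is an assumed value supplied by the caller. */
uint32_t bytes_per_microframe(uint32_t sample_rate_hz, uint32_t channels,
                              uint32_t bytes_per_sample)
{
    assert(channels > 0 && bytes_per_sample > 0);
    return samples_per_microframe(sample_rate_hz) * channels * bytes_per_sample;
}
```

At 96 kHz this gives exactly 12 samples per channel per microframe with nothing left over. With 8 channels and an assumed 3 bytes per sample, that is 288 bytes per microframe in each direction. A rate like 44.1 kHz does not divide evenly (5 whole samples, with 4100 samples per second left over), so packet sizes have to vary from one microframe to the next.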

Isochronous transfer

Audio is naturally time-sensitive, so audio interfaces generally use isochronous transfers, which are continuous and periodic in nature. This is in contrast to the interrupt transfers used for mice and keyboards and the bulk transfers used for mass storage.

What is vendor specific?

Although the device is quite close to being class-compliant, there are a few different control values that need to be sent to the device to make it work. The device descriptor is also a little non-standard. Below I’ll cover the discovered control values and quirks about the descriptor.

Discovered control values

Input Type
Name	Value
Control Tone LINE	0x0000
Control Tone CD/LINE	0x0001 (check)
Control Tone PHONO	0x0003
Post Fader	0x0006
Cross Fader A	0x0007
Cross Fader B	0x0008
MIC	0x0009
AUX	0x000d
REC OUT	0x000a
NONE	0x000f

Channel Mask
Mask	Name
0x0100	Channel 1
0x0200	Channel 2
0x0300	Channel 3
0x0400	Channel 4

…and so on.

In C this can be represented as n << 8, where n is the channel number.

The actual control value is a combination of the input type and the channel mask. For example, to set channel 2 to PHONO the value would be 0x0203, which can be calculated by taking the bitwise OR of the two.

In C this is 0x0200 | 0x0003.
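Putting the two tables together, a small sketch of building and taking apart a control value might look like this. The input type values are the ones from the table above (0x0001 is still marked as needing a check). The enum and function names are my own and are not taken from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Input types from the table above (identifier names are illustrative). */
enum djm_input_type {
    DJM_INPUT_CONTROL_TONE_LINE    = 0x0000,
    DJM_INPUT_CONTROL_TONE_CD_LINE = 0x0001, /* marked "(check)" in the table */
    DJM_INPUT_CONTROL_TONE_PHONO   = 0x0003,
    DJM_INPUT_POST_FADER           = 0x0006,
    DJM_INPUT_CROSS_FADER_A        = 0x0007,
    DJM_INPUT_CROSS_FADER_B        = 0x0008,
    DJM_INPUT_MIC                  = 0x0009,
    DJM_INPUT_REC_OUT              = 0x000a,
    DJM_INPUT_AUX                  = 0x000d,
    DJM_INPUT_NONE                 = 0x000f
};

/* Channel mask: the channel number in the high byte, i.e. n << 8. */
uint16_t djm_channel_mask(unsigned channel)
{
    assert(channel >= 1 && channel <= 0xff);
    return (uint16_t)(channel << 8);
}

/* Control value: channel mask OR'ed with the input type. */
uint16_t djm_control_value(unsigned channel, enum djm_input_type type)
{
    return (uint16_t)(djm_channel_mask(channel) | (uint16_t)type);
}

/* Split a control value back into its two parts. */
unsigned djm_value_channel(uint16_t value)
{
    return value >> 8;
}

unsigned djm_value_input(uint16_t value)
{
    return value & 0x00ff;
}
```

For example, djm_control_value(2, DJM_INPUT_CONTROL_TONE_PHONO) gives 0x0203, matching the hand calculation above. Going the other way is handy when reading a capture: 0x0203 splits back into channel 2 and input type 0x03.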

For each Pioneer device we need to know:
* The number of channels
* The supported input types for each channel

Adding support to the kernel

We need to pay attention to the following files:
- sound/usb/quirks-table.h: describes the vendor-specific interfaces
- sound/usb/quirks.c: sets the sample rate
- sound/usb/mixer_quirks.c: adds alsamixer controls that send specific values to the device
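To illustrate what “sending a value to the device” looks like on the wire, here is a userspace sketch in C that packs the 8-byte setup packet of a USB control transfer. The packet layout is standard USB 2.0. Everything specific to the mixer is an assumption made for illustration: that the request is a vendor-type request addressed to the device, that the control value travels in wValue, and whatever bRequest and wIndex the caller passes in. This is not the kernel code, just a way to see the bytes you would look for in a capture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define USB_SETUP_PACKET_SIZE 8

/* bmRequestType bit fields from the USB 2.0 specification. */
#define REQTYPE_DIR_OUT      0x00 /* host to device */
#define REQTYPE_TYPE_VENDOR  0x40 /* vendor-specific request */
#define REQTYPE_RECIP_DEVICE 0x00 /* recipient is the device */

/* Pack a host-to-device vendor request with no data stage.
 * request and index are caller-supplied; this sketch does not know
 * which values the mixer actually expects. */
void build_vendor_setup(uint8_t out[USB_SETUP_PACKET_SIZE],
                        uint8_t request, uint16_t value, uint16_t index)
{
    assert(out != NULL);

    out[0] = REQTYPE_DIR_OUT | REQTYPE_TYPE_VENDOR | REQTYPE_RECIP_DEVICE;
    out[1] = request;                 /* bRequest */
    out[2] = (uint8_t)(value & 0xff); /* wValue, little-endian */
    out[3] = (uint8_t)(value >> 8);
    out[4] = (uint8_t)(index & 0xff); /* wIndex, little-endian */
    out[5] = (uint8_t)(index >> 8);
    out[6] = 0;                       /* wLength = 0: no data stage */
    out[7] = 0;
}
```

Note that 16-bit fields are little-endian, so a control value of 0x0203 shows up in a capture as the bytes 03 02.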