User manual

Glossary

Virtual Cable

A Virtual Cable is a software emulation of a real (wireline) audio cable. It consists of a pair of audio endpoints (formerly "devices"): a playback one and a recording one. These endpoints are internally connected to each other, making a loopback. All sounds played to the output/playback endpoint can be immediately recorded from the input/recording one. See the basic principles for more details.

In VAC documentation, the "Virtual Cable" term is used as a placeholder for the appropriate endpoint pair. For example, "Virtual Cable 2" usually has endpoints named "Line 2", unless you have renamed them. The "Virtual Cable 2" name appears only in KS-aware applications like Audio Repeater KS.

Audio device

Commonly, "audio device" term means a hardware peripheral ("physical device", "adapter", "controller"), either separate (connected to an external connector), or built-in (soldered to the motherboard or even enclosed into an IC (chip)). In operating systems and applications, "audio device" often means a software resource ("logical device", "port", "connection point") that represents a particular function of the hardware device, accessible via system API.

At the dawn of Windows, the terms "playback device", "recording device" and "auxiliary device" represented system logical devices that could be used to play sounds to speakers/headphones, record sounds from microphones/lines, and control audio parameters such as volume level. In most cases, all three types of such "software devices" were incorporated into a single hardware device, governed by a single driver.

Legacy Windows logical audio devices are called Waveform devices. For example, a typical audio adapter (card) has two logical waveform devices: Wave Input (Wave In) (typically the microphone input) and Wave Output (Wave Out) (typically the speaker output). The former is used to record (capture) audio data from hardware inputs (sources) to applications, and the latter to play (render) audio data from applications to hardware outputs (destinations).

Logical audio devices are accessed via audio interfaces. Different interfaces may expose different logical device sets. For example, for a single WDM/KS resource (a pin), higher-level interfaces may create several endpoints that represent separate source lines served by the same pin.

Using the "device" term in documentation, it is required to distinguish between hardware devices and software/logical ones. To get rid of such confusion, starting from Vista, the "endpoint" term is used to designate a logical device.

Audio endpoint

Windows versions prior to 6.x create a recording (capture) logical waveform device for each capture waveform pin exposed by a driver. This recording resource usually represents all recording capabilities of the corresponding hardware device, which may have several source lines. Selecting such a resource in a recording application was not enough to record an audio signal from a particular source line. The device mixer had to be configured to select (connect) the appropriate source line before recording started.

To avoid confusion between software and hardware devices, and to eliminate such additional steps, Windows 6.x introduced a new resource type called an endpoint, or "endpoint device". Instead of creating a single logical recording audio "port" common to all inputs of a single hardware device, Windows now creates a separate endpoint for each source line, concatenating the source line name with the hardware device name, like "Mic Volume (AC97 Audio)" or "Speakers (Realtek High Definition Audio)", and exposes them as legacy logical waveform devices. When an application opens a particular endpoint for recording, Windows automatically selects the appropriate source line in the device's mixer.

Since source lines in most audio devices are multiplexed, not mixed, only a single endpoint can be used for recording at a time. While one endpoint is being used for recording, all other device endpoints multiplexed in the same group become unavailable.

Virtual Cable source lines are software-emulated and represented as "multiplexed" to look similar to other audio devices.

Each endpoint can be either in the connected (plugged in) or unconnected (not plugged in) state. An endpoint of a real hardware device becomes connected when a connector is plugged into its jack. Only connected endpoints are available for playback/recording and visible in an application's audio device list. The VAC driver represents the connection state of its recording endpoints in accordance with VAC Control Panel settings.

You can view all audio endpoints and their states in the Windows Audio Properties applet by right-clicking any item in the Recording or Playback list and enabling "Show disabled endpoints" and "Show disconnected endpoints".

Please note that endpoint creation in Windows 6.x+ is a very slow process. For example, creation of 30-40 endpoints (when a driver is initially loaded or restarted) may require up to a minute of 100% CPU load on a 2 GHz machine. Therefore, avoid creating a large number of Virtual Cables unless really needed.

Also please note that all Virtual Cables with the speaker pin type enabled will have the same playback endpoint name, like "Speakers (Virtual Audio Cable)". This is by system design, not due to a bug. If you have enabled the speaker pin type for two or more Virtual Cables, rename the appropriate endpoints manually (in the Audio Properties applet or in the System - Sound - Properties window) to distinguish them.

At the Kernel Streaming level, elementary (low-level) audio endpoints are called pins and are provided by low-level device drivers. High-level (usual) endpoints are created from such pins by the System Audio Endpoint Builder and served by the System Audio Engine.

Audio Stream

A digital audio stream is a sequence of audio data samples/frames representing an audio signal. Each stream has an associated format.

"Fixed" (recorded) audio stream data can be stored on any type of data storage: RAM, ROM, HDD, SSD etc. Transferring a stream in real time is called real-time or live streaming.

In real-time streaming, a stream is a data flow through audio hardware and software. For an audio application, it means a connection made by a client to an endpoint, with audio data flowing between them.

VAC, being a KS driver, processes audio streams as KS streams.

Audio Format

A combination of audio encoding parameters (sampling rate, sample size and number of channels) is called a digital audio format.

An audio format can be specified briefly by these three values: 48000/16/2 or (48000, 16, 2) means 48000 samples per second, 16 bits per sample and two channels (stereo).

Multichannel (more than 2 channels) formats also contain a channel configuration (distribution) parameter.
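
As an illustration (a hedged sketch, not code from VAC itself; the function name is arbitrary), the 48000/16/2 format maps to the standard Windows WAVEFORMATEX structure, used by MME, DirectSound and WASAPI clients, roughly like this:

    // Minimal C++ sketch: describing a 48000/16/2 PCM format with the
    // standard WAVEFORMATEX structure (not VAC-specific code).
    #include <windows.h>
    #include <mmreg.h>

    WAVEFORMATEX MakeFormat48k16s2()
    {
        WAVEFORMATEX wfx = {};
        wfx.wFormatTag      = WAVE_FORMAT_PCM;  // fixed-point PCM encoding
        wfx.nSamplesPerSec  = 48000;            // sampling rate (frames per second)
        wfx.wBitsPerSample  = 16;               // sample size
        wfx.nChannels       = 2;                // stereo
        wfx.nBlockAlign     = wfx.nChannels * wfx.wBitsPerSample / 8; // frame size in bytes
        wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign;   // byte rate
        wfx.cbSize          = 0;                // no extra bytes for plain PCM
        return wfx;
    }

Multichannel formats are usually described with the extended WAVEFORMATEXTENSIBLE structure, which adds a channel mask field for the channel configuration (see "Channel/speaker configuration" below).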

Audio data encoding

In digital audio, a signal is represented by a sequence, or flow, of momentary amplitude values, or samples. All samples in the stream have the same bitness, or size: 8, 16, 20, 24 bits and so on. Sample size defines the sample value range and the dynamic range of the digital sound in decibels, which is roughly six times the sample size in bits. For example, 16-bit samples can represent approximately 96 dB of dynamic range.

Audio signal can have one or more channels. Most signals are single-channel (mono) or two-channel (stereo). Modern audio hardware and software support five, eight or even more channels (for example, Dolby 5.1, 7.1 etc.). To represent these channels, audio samples are packed into blocks (frames), from left to right. A mono frame consists of a single sample value. In a single frame, all channel sample values are sampled simultaneously, at the same moment of time.

The sampling rate defines how frequently sample values are measured (sampled) on recording or converted to signal amplitudes on playback. The sampling rate determines how many frames are transferred per second. This parameter is often called "samples per second" but that is correct only for a mono stream. Generally, it should be read as "frames per second" (FPS). But since all channels are sampled at the same time, the "samples per second" term is correct too.

The sampling rate also defines the maximum signal frequency available for coding; it is half of the given sampling rate, in Hertz. For example, to represent an audio signal containing frequencies up to 16 kHz, at least 32000 samples (frames) per second are required.
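
For example, a 48000/16/2 stream has a frame size of 2 × 16 bits = 4 bytes, a data rate of 48000 × 4 = 192000 bytes (about 187.5 KB) per second, a dynamic range of roughly 16 × 6 ≈ 96 dB, and can encode signal frequencies up to 48000 / 2 = 24000 Hz.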

Sample values can be represented using various encoding methods. The simplest and most widely used method is Pulse Code Modulation (PCM), where a numeric sample value directly represents an absolute linear signal amplitude.

There are also compressed encoding methods (ADPCM, a-law, u-law, MPEG, WMA, OGG etc.). They reduce the amount of audio data but require more resources to process and frequently cause quality loss.

VAC supports only fixed-point PCM encoding.

Real-time performance

Performance of a computer system is commonly understood as throughput, meaning how fast common operations (calculations, data transfer, image rendering etc.) are performed. Such overall performance is always measured as an average, by counting the number of operations and dividing it by the elapsed time interval.

Within the interval, operations are not always performed uniformly. There can be noticeable variations in CPU frequency, number of cores involved, power saving modes etc. so only average values are meaningful. Each computer task is performed sequentially, step by step, and the time required for each step may vary. Commonly, only total time required to complete the entire task is taken into account.

In audio processing, such overall performance is meaningful for "offline" operations only, when all audio data involved in processing are already stored in files. To process them (cut undesired parts off, adjust signal volume etc.) fast enough, it does not matter how uniformly the data are read, processed and written. For example, a processing application can read all the data first, then process them all, then write them all, or read, process and write the data in small portions. In other words, it does not matter how the processing is organized internally; only the final result is important. If the application's "internal" or "virtual" time flows non-uniformly, it does not affect the final result.

But in the real world, time always flows uniformly. If audio processing is related to real-world events (for example, recording a speech or musical performance, playing a sound etc.), the "internal" or "virtual" computing time must flow uniformly too, in accordance with real-world time. This is called real-time computing. Real-time performance means how fast and reliably the hardware and software react to events that occur in the real world, in real time. In other words, not only overall performance (measured over long time intervals) is important, but short-time, momentary performance is important too.

A good example of real-time performance is conveyor (assembly line) production. To maintain a uniform, uninterrupted process, the conveyor must not move faster than the slowest workers can perform their operations. Therefore, even a single worker that does not complete their work in time can break the entire process. To keep the process moving uniformly, each worker must operate fast enough, creating no excessive delays.

Unfortunately, Windows is not a real-time OS. Despite good overall performance, Windows never guarantees that all live streaming will be performed reliably, with no drop-outs, glitches, cracks or pops. In general, Windows real-time audio performance is good enough. But sometimes internal system parts or drivers may interrupt audio processing for several, or even dozens of, milliseconds. To achieve reliable audio streaming, especially for low-latency processing, the system may require special testing and tuning.

Audio interface

To communicate with an audio device, each application must use some interface presented by Windows. Each interface consists of a set of functions and is restricted by a set of conditions and rules.

MME (MultiMedia Extensions), or WinMM, is the oldest audio interface, introduced in Windows 3.0. It is intended for streaming audio and has relatively high latency. Under Windows 9x/ME, the MME interface uses old-style 16-bit code, as do some other operating system parts. But MME is simple, has existed in all Windows versions since 1991, and its behavior has never changed. MME supports only the shared access mode. Audio data are passed via a buffer chain provided by the client. Most audio applications can use the MME interface. MME limits endpoint name length to 31 characters, so names may appear truncated. MME endpoints are accessed by their numbers, so arrangement issues may occur.

DirectSound was introduced in Windows 95 as a part of the DirectX acceleration set. It combines low-level, hardware-close, low-latency audio operations with high-level, device-independent programming. Audio data are passed via a circular buffer. Originally intended for games, the DirectSound interface quickly became very popular in sound synthesis and recording/playback applications. In Windows 5.x, DirectSound supports both shared and exclusive access modes. In Windows 6.x+, DirectSound acceleration features (the exclusive access mode) are not supported anymore, so the interface efficiency became the same as MME. Later, DirectSound was superseded by XAudio2.

WASAPI was introduced in Vista. The abbreviation stands for Windows Audio Session API. It supports both shared and exclusive access modes. In the exclusive mode (and especially with event-driven notification), it is highly efficient, like hardware-accelerated DirectSound, and ensures BitPerfect transmission; in the shared mode, its efficiency is comparable to shared-mode MME and DirectSound. The main WASAPI advantage is its modern style: it is object-oriented and flexible, allows the use of local and global effect processors (LFX and GFX), and provides a way to insert user-defined Audio Processing Objects (APOs) into the signal path. Audio data are passed via a circular buffer.
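
As a rough illustration only (a minimal sketch with error handling omitted; the function name and buffer duration are arbitrary assumptions, and it is not VAC code), a shared-mode WASAPI playback stream is typically set up like this:

    // Minimal WASAPI shared-mode playback setup sketch (no error handling).
    #include <windows.h>
    #include <mmdeviceapi.h>
    #include <audioclient.h>

    void OpenSharedRenderStream()
    {
        CoInitializeEx(nullptr, COINIT_MULTITHREADED);

        IMMDeviceEnumerator *pEnum = nullptr;
        CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr, CLSCTX_ALL,
                         __uuidof(IMMDeviceEnumerator), (void**)&pEnum);

        IMMDevice *pDevice = nullptr;                 // default playback endpoint
        pEnum->GetDefaultAudioEndpoint(eRender, eConsole, &pDevice);

        IAudioClient *pClient = nullptr;
        pDevice->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr, (void**)&pClient);

        WAVEFORMATEX *pMixFormat = nullptr;           // the Engine's shared mix format
        pClient->GetMixFormat(&pMixFormat);

        // Buffer duration is in 100-ns units; request about 500 ms of buffering.
        pClient->Initialize(AUDCLNT_SHAREMODE_SHARED, 0, 5000000, 0, pMixFormat, nullptr);

        IAudioRenderClient *pRender = nullptr;
        pClient->GetService(__uuidof(IAudioRenderClient), (void**)&pRender);

        pClient->Start();
        // ... periodically fill buffers obtained with pRender->GetBuffer()/ReleaseBuffer() ...
    }

In the exclusive mode, the client passes AUDCLNT_SHAREMODE_EXCLUSIVE and its own format instead of the mix format; see "Shared and exclusive pin access" below.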

The Kernel Streaming (KS) interface is the basic one since Windows 2000. While the high-level interfaces above work via the System Audio Engine, KS is implemented directly by a device driver. All other audio interfaces are built on top of KS. By default, KS involves no signal processing, so it can be used for BitPerfect transmission.

WASAPI in exclusive mode has almost the same efficiency as KS but may be affected by the system layer implementation (for example, there were some problems in Win7 that were fixed in Win8).

The four interfaces described above are the Windows standard ones. Additionally, there are some third-party interfaces intended for professional audio processing.

ASIO is the most popular of the third-party interfaces. It is designed for simplicity, extreme performance and precision, and is suitable for BitPerfect transmission. VAC does not support ASIO directly.

The "low-level" and "high-level" terms are quite relative. In one case, interface level may represent its universality and usability (the simpler, the higher level). In another case, the level may represent interface features and efficiency (the more features or the higher efficiency, the lower level).

WASAPI, DirectSound and MME interfaces can be considered "high-level" only in comparison to Kernel Streaming because they are built on top of KS. In modern Windows versions, these interfaces are often considered "low-level" because higher-level ones (AudioGraph, MediaCapture, MediaElement, XAudio2) are offered. Meanwhile, XAudio2 is often called "low-level" because it offers hardware-close streaming control.

Client

A software entity (an application, a service, a part of the operating system) that uses an audio endpoint and opens it, connecting to its provider, is called a client. Most low-level audio drivers allow only a single client (instance) for each pin, but the System Audio Engine supports an unlimited number of clients, sharing device endpoints among them. The VAC driver allows any number of clients to open each pin directly (in the exclusive mode).

If an application requests an endpoint connection several times without closing it, the endpoint provider has several clients. In other words, each endpoint opening request is treated as the appearance of a new client. A client disconnects from the provider when it closes an endpoint instance.

When an application accesses a low-level audio endpoint (pin), provided directly by a low-level driver, via the KS interface, it becomes the driver's client. When an application accesses a high-level audio endpoint, it becomes the System Audio Engine's client. In turn, the System Audio Engine accesses a low-level pin and becomes the low-level driver's client.

A process that connects to the cable clock to adjust it dynamically becomes a cable clock client.

Audio buffers

All audio interfaces use the audio buffer concept to interchange audio data between applications and devices. A buffer is a memory block containing an audio data fragment or intended to be filled with such a fragment. Data buffering is used to make data transfer smoother and more reliable.

The smallest data unit that a computer can transfer between a device and main memory is a word. In audio, each computer word contains only 1-2 frames of 16-bit stereo sound. This is an extremely small amount for real-time streaming, so audio frames are packed into blocks containing thousands, hundreds or (as an exception) dozens of frames each.

In MME and legacy WDM/KS, a buffer chain algorithm is used to interchange audio data between an application and the audio subsystem: the application sends several buffers to the driver; the driver plays them or fills them with data and returns them back to the application. An application must use more than one buffer because some time passes between the moment when the device driver reports buffer completion and the moment when it receives the next buffer from the application.

In DirectSound, WASAPI and modern (RT Audio, or WaveRT) WDM/KS, circular (ring) buffers are used: the application allocates a single memory block, logically divided into several parts, and tells the driver which part is ready for playback/recording, and the driver notifies the application when processing of a part is done.

To maintain a smooth and continuous audio stream, an application must provide enough buffering time. Most applications use 500..1000 ms of buffering time, divided into 8..12 chain buffers in MME or into 4..12 circular buffer parts in DirectSound, WDM/KS or WASAPI. But the more buffering time, the more latency is introduced. So you need to balance these parameters for each configuration used.
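
For illustration, the following sketch (not VAC-specific; the buffer count and size are arbitrary assumptions) shows the MME buffer chain technique for playback:

    // Sketch of an MME playback buffer chain (no error handling); link with winmm.lib.
    #include <windows.h>
    #include <mmsystem.h>

    const int NUM_BUFFERS   = 8;     // assumed chain length
    const int BUFFER_FRAMES = 4800;  // assumed 100 ms per buffer at 48000 frames/s

    void PlayWithBufferChain(WAVEFORMATEX *pwfx)     // 16-bit stereo format assumed
    {
        HWAVEOUT hwo = nullptr;
        waveOutOpen(&hwo, WAVE_MAPPER, pwfx, 0, 0, CALLBACK_NULL);

        static WAVEHDR hdr[NUM_BUFFERS] = {};
        static short   data[NUM_BUFFERS][BUFFER_FRAMES * 2];    // 2 channels per frame

        for (int i = 0; i < NUM_BUFFERS; i++)
        {
            // ... fill data[i] with audio samples here ...
            hdr[i].lpData         = (LPSTR)data[i];
            hdr[i].dwBufferLength = sizeof(data[i]);
            waveOutPrepareHeader(hwo, &hdr[i], sizeof(WAVEHDR));
            waveOutWrite(hwo, &hdr[i], sizeof(WAVEHDR));         // submit to the chain
        }
        // When the driver completes a buffer (WHDR_DONE is set in dwFlags),
        // refill it and submit it again with waveOutWrite to keep the chain going.
    }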

Kernel Streaming audio interface

Kernel Streaming (WDM/KS Audio, or KS Audio) is the lowest-level multimedia interface, introduced in Windows 2000/98. WDM stands for Windows Driver Model, a universal driver structure and behavior that makes it possible to use a common binary driver file in Windows 98/ME/2000/XP and later systems. KS stands for Kernel Streaming, the audio/video streaming technology of the Windows kernel. KS interfaces are implemented by WDM device drivers.

KS is a very sophisticated, lowest-level multimedia interface supporting a huge set of features. It allows achieving the highest audio precision and the lowest latency. In Windows 2000/XP and later, all higher-level audio interfaces are implemented on top of KS.

KS supports accurate time-stamping for linking audio data to a timeline.

KS supports only the exclusive access mode. Only multi-client audio drivers like VAC allow creating multiple streams through a single pin.

Only the KS interface allows a user-mode client to communicate directly with an audio driver, with no intermediate layers. An application with properly implemented KS interaction can achieve the highest possible stream quality, efficiency and stability. Only driver and/or hardware bugs may affect signal quality and latency.

To communicate with the driver, KS clients use KS protocols. A full KS protocol implementation is very hard for a particular driver, so Windows offers the Port Class Driver to simplify multimedia driver creation.

Kernel Streaming protocols

To exchange audio data and control the stream, KS driver and client must negotiate a protocol. There are two different streaming protocols used between KS client and KS driver: "legacy" and "realtime".

Legacy (or "standard streaming") is a native KS streaming protocol, available in all KS implementations, starting from Windows 98. Audio data are passed via buffer chains, as in MME interface. To send and receive each buffer in the chain, a switch to kernel mode must be performed. The higher processing event frequency, the lower latency, the more overhead.

"Realtime" protocol ("looped streaming" or "RT Audio mode") was added in Windows 6.x+. Audio data are passed via single circular buffer (usually located in the hardware) that is directly accessible to user-mode client. No periodic kernel mode switching is required to write data on playback and read them on recording. If the driver supports a position register, no kernel mode switching is required to obtain current playback/recording position.

Inside each protocol, different processing modes can be negotiated between the driver and client.

Don't confuse the "realtime" in protocol naming with real-time performance/streaming. Since all audio streaming protocols are designed and used for playback or recording real world audio signals, they all definitely work in real time. The "realtime" term in protocol naming has a meaning like "more suitable for real time processing", "very low latency" etc.

Although the standard streaming protocol is supported by all Windows versions that support KS, whether a particular driver supports it is the driver's own choice. Starting from Windows 6.x, RT Audio is considered preferred for most embedded hardware. Only USB audio drivers still support the legacy protocol because there is no direct access from the CPU to the internal circular buffer inside a USB device.

VAC driver supports both legacy (in all systems) and RT (in modern systems) KS protocols. They can be chosen via PortCls port/miniport types.

Stream processing modes

Stream processing modes represent various peculiarities of the Kernel Streaming protocol used by KS clients:

  • Looped - a looped (circular) data buffer is used.
    In legacy KS protocol, a buffer chain is normally used, when completed buffer parts are being returned to client, and new parts are being submitted in reply. In looped mode, buffer parts are never returned until explicitly requested, so the driver continuously loops submitted parts.
    In real-time protocol, a single circular buffer is the only way to interchange data between driver and client, so this mode is always indicated for RT streams.
  • Event notification - the driver signals events to notify the client about stream progress. Currently, the RT protocol allows specifying up to two events that are signaled as the appropriate half of the looped buffer is completed.
    If events are not used, clients have to poll the stream position with a sufficient frequency.
  • Packet mode - stream data are submitted and completed in packets (parts of circular buffer). Packet mode is a kind of flow control. Currently, the system supports only two packets (halves) per buffer.
    In packet mode, both the driver and its client maintain packet counters to check stream integrity and detect potential data overflows/underflows.
    Without packet mode, data are submitted and completed in portions of any size, and only the client can detect overflows/underflows by the stream position maintained by the driver. The driver never knows how much data have been submitted or completed by the client.
  • Clock register - a hardware (or emulated) register is used by the client to read stream's clock information directly, without issuing a special API request and switching from user mode to kernel mode, and then back to user mode.
  • Position register - a hardware (or emulated) register is used by the client to read directly current stream position.

Looped mode can be used in both legacy and RT protocols. Other modes are used in RT protocol only.

Event notifications are supported by Windows 7 and later. Packet mode is supported in Windows 10 and later, and can be selectively disabled for any cable side in cable parameters.

VAC Control Panel shows stream processing modes in the "Modes" column of the cable streams list.

ASIO audio interface

ASIO (Audio Stream Input/Output) is an audio interface designed by Steinberg for their audio hardware and software.

Standard Windows audio interfaces are frame-oriented (individual channel samples are packed into frames), and all channels in the stream are logically connected (represent the same sound picture). On the contrary, ASIO is channel-oriented (samples of each channel are placed consecutively and stored in separate memory buffers, forming multiple single-channel (mono) streams), and the channels are usually independent, carrying different sounds.

Therefore, standard interfaces are most suitable for "solid" multichannel streams (stereo, quadro, 5.1, 7.1 etc.), while ASIO is most suitable for multi-source and multi-destination streaming (voices of people and musical instruments, sound effects, mixing consoles, recorders, sequencers etc.).

ASIO supports time-stamping, allowing the client to determine or specify the exact position of each block of audio data on the timeline.

Instead of device/endpoint enumeration used in Windows audio interfaces, ASIO uses driver enumeration. There is no way to determine how many devices are served by the driver. It is supposed that the driver serves only a single device, but the driver could serve multiple synchronized devices at a time, combining their channels into larger channel arrays.

Usually, format conversion is not supported, all samples are transmitted "as is". Client software must query ASIO driver for sampling rates and sample representation supported by the underlying device.

ASIO drivers are single-client: a driver cannot serve multiple client applications. To create complex software routing schemes, VST host applications are used. A single VST host communicates with all available ASIO drivers, making them available to every VST plug-in.

An ASIO driver is a user-mode DLL implementing an in-process COM server. Because the DLL cannot communicate with the hardware directly, hardware vendors also provide a kernel-mode hardware driver. Some vendors provide a WDM/KS driver as well, allowing their devices to be used from any Windows application. Other vendors provide only a special kernel-mode driver that works with the ASIO driver DLL, so their devices can be used only by ASIO-aware applications.

Filter

In the WDM/KS driver technology, each device driver exposes a set of logical subdevices called filters (they are streaming filters like DirectShow ones). Each filter emulates a hardware processing unit. It can accept signal and/or data, produce them, or perform both of these operations.

Each filter exposes a set of pins.

Pin

A pin represents a connection point of a filter: recording (capture), playback (render), volume control, mixer, clock and so on. A pin is a low-level synonym of an endpoint. The term "pin" is similar to the "lead" in electronics.

To use a pin for audio streaming, Windows creates an instance of the pin by "instantiating" or "opening" it. Each pin instance forms a separate stream. If an application accesses the pin directly using the WDM/KS interface, the instance is created for the application itself, and nobody else can use this instance and the associated stream. If an application accesses the pin in shared mode through the System Audio Engine, the Engine creates a single instance for itself and then shares it among all connected applications. This allows single-client drivers to be used by multiple applications at the same time.

If the driver supports multiple pin instances, multiple clients can use the appropriate pin, creating their own instances. Most KS drivers support only a single pin instance so there can be only a single client (System Audio Engine or a KS-aware application).

KS stream (a pin instance)

In general, a term "stream" in sound/audio context is used to designate any kind of audio stream.

For a WDM/KS driver, this term is used to designate a data flow connected to a particular pin instance by request from a client. Each new pin instance forms a new stream. Each stream has its own audio format. An application can be connected to a KS stream either directly or via the System Audio Engine proxy layer.

VAC processes KS streams only. Higher-level streams are maintained by the appropriate interface providers.

Formats of Virtual Cable streams can affect the cable format.

Format conversion

Audio format conversion (resampling) is a particular case of data conversion in audio streams. To convert an audio data stream with a given format to a stream with another format, the following actions should be performed:

  • Sampling rate conversion if sampling rates are different. Usually, it is the most time-consuming operation.
  • Sample size (bit depth) conversion. It is the least time-consuming operation, unless special smoothing measures are used.
  • Channel set conversion. Depending on the source and destination channel sets, it can consume more or less processing time.

There are some known format conversion issues.

Format attributes

Format attributes are additional format properties for KS streams. These attributes can be provided together with KSDATAFORMAT descriptor if KSDATAFORMAT_ATTRIBUTES flag is specified in the Flags field.

Format attribute support was announced in Windows 2000/98, but was actually implemented only in Windows 8.1 for signal processing modes. If a KS filter exposes format attribute support, systems prior to 8.1 may incorrectly form the KSDATAFORMAT descriptor in their property requests, so the driver has to fail the request, and the requested format cannot be used. Even in Windows 8.1 and 10-11, this problem may occur with WaveCyclic ports.

VAC format attribute policies

To avoid problems caused by Windows bugs related to format attributes, the VAC driver supports format attribute policies:

If format attribute support is allowed, the appropriate miniport exposes support for signal processing modes.

Format attribute policies can be configured either globally (for all cables) or for particular cables. On VAC driver startup, actual policies chosen for render/capture sides of each cable are reported in driver event log.

Signal processing modes

Windows uses signal processing modes to tell KS drivers how to process stream data (don't touch, alter signal volume, apply audio effects etc.). Signal modes are selected via format attributes.

VAC driver doesn't support signal processing modes other than raw, but supports the appropriate property requests to conform to Windows requirements.

BitPerfect audio data transmission

BitPerfect audio transmission stands for the exact transmission of the audio stream, ensuring that every audio sample is delivered intact, "as is".

By default, the Windows audio subsystem prefers versatility and compatibility, to provide applications with the ability to play and record any possible sounds. For this, the System Audio Engine automatically performs format conversion if the format requested by the application does not match the device format.

The format range accepted by the System Audio Engine is very wide: from 1000 to 384000 samples per second, from 8 to 32 bits per sample (integer or floating-point), and from 1 to 8 channels at a time.

For example, if an application has a built-in sound in the 44100/16/2 format, but a particular device supports only 48000/24/2, this sound cannot be played back to this device directly. In earlier versions of Windows, all applications had to convert audio formats themselves, although resampling algorithms are not simple. Now the application can simply open the device's endpoint with the desired format, and the conversion is done transparently. Any application may use any format supported by the system, without querying particular devices about the formats they support.

Additionally, even if format conversion is not used, the Engine can mix multiple streams together, adjust signal volumes, apply audio effects and so on. Every operation may alter digital representation of the stream, even if the changes are not audible.

Such wide compatibility is convenient for most audio applications, but is not suitable for specialized ones performing measurements, precise signal generation, transmission of audio streams between digital audio interfaces etc. Such applications require a reliable, transparent signal path. A transparent path ensures that all audio samples sent to the device will be delivered intact, and only leading/trailing silence may be added.

Among the standard Windows audio interfaces, BitPerfect transmission is possible only in WASAPI in Exclusive Mode and in KS. MME, DirectSound and other standard interfaces cannot ensure transparency.

ASIO ensures BitPerfect transmission by default, but some devices and/or drivers may alter audio samples.

Source line

A typical audio device has a single digital recording channel that an application can use to capture an audio signal, but several source (input) lines like Microphone, Line, Phone, CD and others. To capture an audio signal coming from a particular source line, this line should be selected first. In Windows 5.x, you need to use the Windows Mixer application to select a source line. Windows 6.x+ creates audio endpoints for this purpose.

As a virtual device, VAC has no real source lines. But some applications want to connect to a given line type (Microphone or S/PDIF), so VAC provides several emulated source lines. All of them are identical, but Windows can use different default settings for each line. For example, Windows 6.x+ uses a lower sampling rate when recording from the Microphone line than from the Line In.

Virtual Cable's source lines are internally connected to a multiplexer (mux), allowing only a single source line to be selected for recording. The only effect is volume control (if enabled): only the selected line's volume controls affect the cable volume. The currently selected source line is shown in the cable state window.

Source line set is a cable configuration parameter.

Cable format

When a Virtual Cable gets its first input or output client (stream), a particular audio format is chosen as the cable format. Cable format parameters are determined from both first client (stream) format and the cable format range.

Cable format is fixed while a cable is active (has at least one client/stream).

Internal mixing is performed in cable format so all render/output stream data are converted to cable format, and mixed results are converted from the cable format to capture/input stream formats. If cable channel mixing is enabled, VAC also converts channel sets in multichannel streams.

Cable format range

A range of audio formats allowed to be selected as a cable format. The wider this range is, the more formats are available to be chosen for a cable, the more often format conversion occurs. Format range is a cable configuration parameter.

If the range of each format parameter (sampling rate, bits per sample, number of channels) is narrowed to a single value (for example, 48000..48000, 24..24, 2..2), the cable format will not vary and will always be that fixed format, regardless of stream formats. But some streams may not be created due to stream format limiting.

Cable channel mixing

By default, VAC converts channel sets in multichannel streams if the numbers of channels in the stream and cable formats are different. For example, when converting from a stereo stream to mono, VAC mixes (sums) the left and right channels, producing a single-channel stream. When converting from a mono stream to stereo, VAC spreads the single channel into left and right ones. When converting 4-, 6- and 8-channel streams, more complex rules are applied.
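
The following sketch only illustrates the general principle of stereo-to-mono and mono-to-stereo conversion; it is not VAC's internal code, and the mixing coefficients are assumptions:

    // Illustration of the stereo<->mono principle (not VAC's internal code).
    void StereoToMono(const short *in, short *out, int frames)
    {
        for (int i = 0; i < frames; i++)            // mix (sum) left and right channels
            out[i] = (short)(((int)in[2 * i] + in[2 * i + 1]) / 2);
    }

    void MonoToStereo(const short *in, short *out, int frames)
    {
        for (int i = 0; i < frames; i++)            // spread one channel to both
        {
            out[2 * i]     = in[i];                 // left
            out[2 * i + 1] = in[i];                 // right
        }
    }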

If channel mixing is disabled, VAC performs channel scattering (placing sequentially packed channel data to specified channel configuration positions) or gathering (extracting specified channel configuration positions and placing them to a sequentially packed set) instead of mixing them. See here for details.

In some situations, disabled channel mixing may produce undesirable effects.

Channel mixing mode is a cable configuration parameter.

Channel/speaker configuration (channel distribution)

For mono and stereo audio data formats, only known, dedicated speaker placements are available: for mono, there is only a single speaker; for stereo, there are only two speakers or headphone parts. So it is not necessary to specify their placement additionally.

For multi-channel formats, there can be several speaker placement schemes for the same number of channels. For example, old 5.1 speaker configuration used back channels (the "5.1 back" scheme), while modern configuration uses side channels (the "5.1 surround") instead. Therefore, it is not enough to specify only a number of channels; you need to specify channel-to-speaker mapping as well.

VAC supports the following audio channels:

Abbreviation  Location                   Hex mask
FL            Front Left                 1
FR            Front Right                2
FC            Front Center               4
LF            Low Frequency (Subwoofer)  8
BL            Back Left                  10
BR            Back Right                 20
FLC           Front Left of Center       40
FRC           Front Right of Center      80
BC            Back Center                100
SL            Side Left                  200
SR            Side Right                 400
TC            Top Center                 800
TFL           Top Front Left             1000
TFC           Top Front Center           2000
TFR           Top Front Right            4000
TBL           Top Back Left              8000
TBC           Top Back Center            10000
TBR           Top Back Right             20000

Hex mask represents a bit mask corresponding to a single channel, in hexadecimal form. To get a mask for several channels, add their masks together, using Windows calculator in HEX mode. For example, a mask for FC+BR+SL channels will be 4+20+200 = 224 hex (0x224 in popular C/C++ notation).
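
The same mask can also be computed programmatically; the small illustration below (not VAC code) uses the bit values of the standard SPEAKER_* constants from the Windows headers:

    // Small illustration (not VAC code): computing the channel mask for FC + BR + SL
    // using the standard SPEAKER_* bit values from the Windows headers.
    const unsigned int mask = 0x004      // SPEAKER_FRONT_CENTER (FC)
                            | 0x020      // SPEAKER_BACK_RIGHT   (BR)
                            | 0x200;     // SPEAKER_SIDE_LEFT    (SL)
    // mask == 0x224, matching the 4 + 20 + 200 hex sum above.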

Channel data (sample values) are always arranged within the frame in the same order as listed in the table above. So left channel samples always precede right channel samples, subwoofer channel samples always precede back channel samples, and so on.

For details, see the description on Microsoft site.

See here how to set speaker configuration for an audio device/endpoint.

Stream format limiting mode

A rule to limit a new stream format. Mostly used to control automatic format selection features of System Audio Engine. See here for details. Limiting mode is a cable configuration parameter.

VAC driver event log

VAC driver event log is a list of various internal events processed by VAC driver (stream creation, start/stop, termination etc.). Event log can represent some hidden details of driver and/or cable devices activity, helping to understand what is happening and isolate problems.

The VAC driver internally keeps a relatively small number of recent events. If the Control Panel application is running, it constantly retrieves new events from the driver and places them into a drop-down list, so no events are lost. But if the Control Panel is not running, newer events may overwrite older ones in the driver's internal memory, so unread events may be lost.

In Control Panel's list, all events are kept until closing the application and event descriptions can be viewed and/or saved to a file.

Windows Audio Subsystem

Windows Audio Subsystem includes several components, the most important of which are the following:

  • ks.sys - common Kernel Streaming kernel-mode library. Provides common routines to process various KS requests and objects. Used as a helper by most KS drivers.

  • portcls.sys - kernel-mode Port Class Driver. Offers a framework to simplify KS driver development. Performs most typical KS operations, while the actual device driver (called "Miniport Driver") provides device-specific operations only.

  • ksproxy.ax - user-mode component that wraps KS filters to represent them as DirectShow filters. Thanks to this, every device that has a KS driver, automatically becomes accessible from a DirectShow filter graph, with the minimum possible overhead costs.

  • AudioDG.exe - System Audio Engine. Communicates with KS device drivers, mixes sounds played back by applications, splits sounds to be recorded by applications, performs format conversion etc.

  • Audiosrv.dll - System audio services. Perform various device/endpoint maintenance tasks.

System Audio Engine

The System Audio Engine is a system code that supports most of system audio features. It is hosted by the AudioDG (Audio Device Graph [Isolation]) process.

The System Audio Engine acts as a "proxy" to each WDM/KS audio driver accessed via WASAPI, MME, DirectSound and other higher-level interfaces in shared mode. When an application uses the shared connection mode, a separate pin instance is implicitly created for the System Audio Engine. See Audio layering issues for details.

Additionally, the Engine hosts Audio Processing Objects (APOs) implementing local and global audio effects (LFX/GFX).

Before Win 6.x, the same role was played by KMixer (the kernel-mode audio mixer), a system kernel-mode audio component (a special kind of audio driver) and a part of the Windows 98/ME and 2k/XP/2k3 audio subsystem.

System audio services

Starting from Vista, the system has a dedicated Audio Service (AudioSrv), named Windows Audio in the service list. This service maintains audio endpoint properties.

Audio endpoint database is built by the Windows Audio Endpoint Builder service (AudioEndpointBuilder). This service queries all audio pins exposed by KS filters and creates an endpoint for each pin.

These services run in the Service Host process container (svchost.exe). An instance of such a process may run several different services. To help find the appropriate service, the VAC driver shows service tags in its event log.

In some cases, restarting System Audio Service may help to eliminate some audio endpoint problems without rebooting the entire system.

Service tags

Most Windows services run in a dedicated Service Host process container (svchost.exe). An instance of such a process may run several different services. Each service acts on behalf of its container process. When a service accesses a device, the device driver can determine only the process (PID) and thread (TID) identifiers, but not the service name. To identify a particular service, the driver may access the Service Tag, a numeric identifier of the service. The VAC driver shows service tags in its event log.

Unfortunately, a driver cannot access the Service Manager database to identify the name of the service. To identify the service by its tag, use the third-party "sctagquery" command-line utility. For example, if the PID is 184 and the service tag is 12, enter the following command line under an administrator account:

sctagqry -n 12 -p 184

Shared and exclusive pin access

Most audio device drivers support only a single instance of each capture or render pin (they are single-client drivers). To allow these pins to be accessed from several applications at the same time, an intermediate (proxy) layer is required. In Windows, this layer is provided by the System Audio Engine: MME (always) and DirectSound/WASAPI (by default) connections are established in the shared mode, where the engine creates a single pin instance for itself and all clients are connected to the engine, not directly to the filter and the pin. The System Audio Engine chooses an appropriate format for the pin instance, and then converts audio data between the pin format and client stream formats. This mode is convenient but often not efficient enough.

DirectSound (in Windows 5.x), WASAPI (in Windows 6.x+) and WDM/KS (in all systems) support exclusive pin access modes, where the pin instance is created for the requesting application only. No other clients (applications and even system sounds) are allowed to share this instance. The pin is instantiated with the requested format, and no format conversion is performed between the client application and the driver. This mode is efficient but not convenient enough because only a single application can use the pin at a time. If the driver supports multiple pin instances, like VAC, there is no such restriction.

Implementing multi-client pin access, VAC behaves like the System Audio Engine in shared mode, mixing playback streams together, distributing cable data among recording streams and performing format conversions. So the most efficient way to use VAC is to connect to Virtual Cables in an exclusive access mode whenever possible.

In WASAPI, the exclusive access mode is supported in two forms: polling (also called "push" for playback and "pull" for recording) and event-driven notification. In the polling mode, the client periodically queries the status of the stream to determine when to write or read the next portion of audio data. In the notification mode, the driver signals the event every time room/data become available. In addition to better CPU resource utilization, the notification mode allows using very small KS buffers (down to 1 ms).
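
As a rough sketch (under the assumption that pClient is an already activated IAudioClient and pwfx is a format the device supports in exclusive mode; not a complete implementation), the event-driven exclusive mode differs from the polling setup mainly in the Initialize flags and the event handle:

    // Sketch of event-driven exclusive-mode WASAPI initialization (no error handling).
    #include <windows.h>
    #include <audioclient.h>

    void InitExclusiveEventDriven(IAudioClient *pClient, WAVEFORMATEX *pwfx)
    {
        REFERENCE_TIME period = 0;
        pClient->GetDevicePeriod(nullptr, &period);    // minimum (exclusive-mode) period

        // In exclusive event-driven mode, buffer duration and periodicity must be equal.
        pClient->Initialize(AUDCLNT_SHAREMODE_EXCLUSIVE,
                            AUDCLNT_STREAMFLAGS_EVENTCALLBACK,
                            period, period, pwfx, nullptr);

        HANDLE hEvent = CreateEvent(nullptr, FALSE, FALSE, nullptr);
        pClient->SetEventHandle(hEvent);               // the driver signals this event
        pClient->Start();

        // Streaming loop (per event): WaitForSingleObject(hEvent, INFINITE), then
        // read or write exactly one buffer via the capture/render client service.
    }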

PortCls

PortCls stands for "Port Class Driver". It is Windows kernel-mode module (portcls.sys) implementing most common multimedia driver functions and intended to simplify drivers for particular multimedia hardware. A driver based on PortCls functionality is called "minidriver", or "miniport driver". PortCls receives all KS client and some system internal requests, translates them and passes to a miniport driver. So a miniport driver must implement only device-specific code.

In Windows XP and later systems, on multi-CPU/core hardware, PortCls has some bugs. To avoid problems linked to them, VAC implements a workaround, processing most streaming WavePci requests without calling PortCls. Processing can be switched back to the PortCls engine for particular cables using cable configuration parameters.

Port class driver port/miniport types

VAC, as well as most other audio drivers, is built on the "miniport driver" model, where the driver binary module contains only code that handles driver-specific functions. Common functions are handled by the standard Windows Port Class Driver module. The "port" and "miniport" terms mean internal system interfaces provided for software module communication. They are not related to hardware ports used for device connection, or to I/O ports used for low-level device communication.

To communicate with audio miniport driver, Port Class Driver provides three internal port (interface) types:

  • WaveCyclic - intended for legacy audio adapters with a single circular hardware buffer common for all clients. It is the simplest (and usually most stable) interface but also the slowest one.

  • WavePci - intended for adapters with multiple bus mastering buffers, individual for each client, and internal hardware mixing support. Can provide lower latency than WaveCyclic, but port/miniport communication is much more complex and may cause problems in some cases.

  • WaveRT - intended for modern adapters having one or more circular hardware buffers directly accessible to user-mode clients. It is the most efficient interface having almost no overhead.

WaveCyclic and WavePci exist in all Kernel Streaming implementations. WaveRT was introduced in Windows Vista so it is not available in XP and older versions.

For a user-mode Kernel Streaming client (including System Audio Engine), audio drivers that use WaveCyclic or WavePci port interfaces are indistinguishable. In Windows terms, they support a "standard streaming protocol". Kernel Streaming version of Audio Repeater application calls such drivers "legacy". On the contrary, drivers using WaveRT port interface support "looped streaming protocol" and are considered "realtime". Audio Repeater calls them "RT Audio".

Most modern audio drivers for embedded (hidden under the cover) hardware support RT Audio protocol. USB audio drivers usually support legacy one.

Prior to 4.50, VAC supported only the WavePci interface (and therefore only the standard/legacy KS protocol). Starting from 4.50, each side of each cable can be configured to support any of the three port/interface types. Of course, WaveRT is not available in XP.

With WaveRT, VAC supports notification events (PKEY_AudioEndpoint_Supports_EventDriven_Mode property), and clock/position registers, allowing a user-mode client to maintain a stream with no periodic kernel mode switching.

Please look here how to properly choose between miniport types and KS protocols.

Latency

When an application and a driver deal with digital audio, they don't interchange single samples/frames; instead, they use memory buffers to store blocks of audio data. To respond to a real-time event, an application or a driver needs some time, from microseconds to dozens of milliseconds. Due to buffering and processing delays, some time passes between the moment an audio signal arrives at a device input and the moment its digitized value arrives in application memory. This time period is called latency.

You can easily hear sound latency while talking on the phone to someone near you. You will first hear the sounds made by the other person directly, and a fraction of a second later in the phone's speaker, like an echo. This effect occurs because the voice is converted into digital form, recorded in the memory of communication equipment, and transmitted as data packets over communication lines, and a small delay is added at each stage.

Timing event (formerly the "interrupt")

Due to the discrete nature of digital audio, a continuous digital audio stream is transferred as a series of blocks. To transfer a stream from each cable's output endpoint to its input endpoint, VAC has an internal timing clock that generates system events, allowing VAC to be called to transfer the next data block of the stream. Earlier VAC versions used timer interrupts for that. Current versions use timer events, but the "interrupt" term was kept for continuity.

This is a cable configuration parameter; you can control it with the VAC Control Panel GUI.

The higher the event frequency, the shorter the event period, the smaller the block size, the smoother the stream transfer, and the lower the latency. But decreasing the interrupt/event period increases the system timer resolution and timer interrupt frequency, so the system overhead increases too. The VAC driver sets the system timer resolution to half of the specified "MS per int" parameter.
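
For example, with a 48000 frames/s cable format and a 5 ms event period, each timing event transfers 48000 × 0.005 = 240 frames (960 bytes for 16-bit stereo), and the driver requests a system timer resolution of about 2.5 ms.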

Large (more than 10-15 ms) values may significantly disrupt stream uniformity and even cause stream breaking.

Mixer (volume control tool)

In general, a mixer is a device that mixes several audio signals together and allows controlling their volume, balance, timbre and other parameters. Almost every audio adapter has its own mixer that allows selecting recording sources, changing recording/playback levels, and so on. These features are called an "adapter mixer" or a "driver mixer".

Windows versions prior to Vista/Win7 have a standard mixer control application that is often called the "Windows mixer". It can be invoked by double-clicking the speaker icon in the system tray. A window with a set of sliders and check boxes is displayed. You can view the mixer in two modes: playback (the default) and recording. In playback mode, the mixer controls output audio lines available for playback and repeated (monitored) input lines routed to the output (speakers). In recording mode, the mixer controls input lines available for recording.

To configure mixer panel, open Options menu and select Properties. In the dialog, select a device you want to control, select a mode and check input/output lines you want to see in the panel.

In Windows 6.x+, the system mixer became much simpler. It can be opened by right-clicking the speaker icon in the system tray and selecting "Open Volume Mixer". You can change playback/recording levels for the devices selected under an icon or for system/application sounds. By clicking the speaker icons at the bottom, you can mute/unmute audio sources.

Don't confuse the Windows mixer control application with the System Audio Service and System Audio Engine Windows components.

Start menu

The Windows Start menu is a hierarchical menu used to run applications and tools. Before Windows 8, it could be opened by the "Start" button located at the bottom left corner. In Windows 8, it was replaced by the Tile Interface (Metro), but was still available via a click in the bottom-left corner. In 8.1, a small Windows logo button was placed in the bottom left corner. In Win10, it was reverted back to a hierarchical menu view.

System tray

The System tray (notification area) is the rightmost area of a horizontal taskbar or the lowest area of a vertical taskbar, where the system clock and application icons are located. In Win 5.x, this area always shows all existing icons. Starting from Win 6.x, some icons can be hidden. If there are hidden icons, a double up arrow is displayed in the notification area.

Command line

The Command Line is a legacy way to specify the operations to be performed by typing textual commands. A command line consists of a command (action name) and optional parameters (arguments).

In Windows, any executable file can be invoked as a command. For example, Audio Repeater can be started with a command line, supplying the parameters.

To enter the commands, Windows offers a feature named "Command Prompt" or "Console Session". Commands are executed by the command interpreter (cmd.exe). Applications designed for a console session are named "console applications". Unlike GUI applications, they can only "print" their results as text inside the console window. Some useful system and third-party utilities exist as console applications only; there are no GUI analogs with the same functionality.

To work with console applications, open the Command Prompt item in the Windows System or Accessories folder from the Start Menu, or enter "cmd" in the "Run" dialog invoked by Win-R. To open a privileged console session, right-click the Command Prompt item and choose "Run as administrator".

Although you can run a console application directly from the "Run" dialog, typing or pasting the command line, the console window will be closed immediately after the console application exits. If you need to see the results, open a console window permanently.

To paste a command line into the console window, try Shift-Ins or Ctrl-V combinations. If they don't work, left-click the upper left corner, then click "Edit" and select "Paste". Press Enter to execute the command.

To execute a command/application, the Command Interpreter must know where it is located. If the application is not installed into the system but just downloaded, it is better to place it in a separate folder with a short name containing no spaces (for example, c:\tmp). To run the application (for example, ScTagQuery) from the console session, place the path prefix before the application name:

c:\tmp\sctagqry -p 1752

To run the application several times, change the default folder first:

cd c:\tmp

Then you can run the application by entering only its file name, followed by the optional arguments.

To exit the session, type "exit" and press Enter.

Commands can be grouped into a script (batch/command file).

Windows Control Panel

Windows Control Panel is a set of Windows built-in control and management applets. In Windows XP, Vista or Windows 7, you can open it by clicking Start - Settings - Control Panel. In Windows 8 and 10-11, Control Panel is replaced by the Settings window, which can be opened by right-clicking the bottom left screen corner and choosing "Settings" from the menu.

Don't confuse Windows Control Panel with VAC Control Panel.

Windows Device Manager

Device Manager is a built-in Windows Management Console applet, displaying a device list and allowing devices to be configured/reinstalled. Device Manager can be opened in any of the following ways:

  • Right-click on My Computer and select Manage.
  • Right-click on My Computer, select Properties, open Hardware tab and click Device Manager.
  • Open Windows Control Panel, open Administrative Tools then open Computer Management.

If started from a non-privileged account, Device Manager warns that it can only show device parameters. In a privileged mode, Device Manager can perform various device control operations.

Windows Task Manager

Windows Task Manager is a standard system application that shows running applications, processes and services, CPU usage and other useful information.

Task Manager can be quickly invoked by the Ctrl-Shift-Esc key combination. Alternative way is to open the Run Dialog by pressing Win-R, and enter executable file name (taskmgr) in the "Open" field.

In default non-privileged mode, Task Manager shows only processes belonging to the current user. To show all processes running in the system, run it from a privileged (administrator's) account.

Windows Resource Monitor

Windows Resource Monitor is a standard system application that shows system resource (CPU, memory, storage, network) consumption by running processes.

To run Resource Monitor, press Win-R and enter executable file name (resmon) in the "Open" field. If your account is not a privileged one, you will get a privilege elevation prompt.

Sysinternals utilities

Sysinternals is a long-running project started and maintained by several experienced system software developers (initially independent, later part of the Microsoft developer team). They have created many useful system utilities that help collect various information and troubleshoot the system.

Some utilities that may help in VAC troubleshooting:

  • Process Explorer - shows running processes, threads, services, loaded DLLs, open handles and much more.
  • Process Monitor - shows file, registry, network and process operations performed by the processes being traced.
  • Handle (command-line) - shows open handles to files or devices.

Windows Sound Settings

Windows Sound Settings is the main audio/sound settings page introduced in Windows 10.

This page can be opened in two ways:

  • Right-click the bottom left corner of the desktop. Left-click Settings, then System, then Sound.
  • Right-click the speaker icon in the system tray (bottom right area). Click Open Sound settings.

The Audio Properties Applet can be accessed via the "Sound Control Panel" link.

Open Windows Sound Settings now (Windows 10-11 only)

Microphone Privacy Settings

Windows Microphone Privacy Settings is the new settings page introduced in Windows 10 version 1803. It can be found in Settings - Privacy - Microphone.

According to its description, the Microphone Privacy feature should prevent application access to microphone endpoints, but it actually blocks application access to all input/recording endpoints. Windows 10-11 may turn microphone access off automatically during an upgrade, so you may need to turn it back on manually.

Open Microphone Privacy Settings now (Windows 10-11 only)

If privacy settings block input/recording endpoint access, VAC Control Panel shows a red border around the Privacy Settings button.

Audio Properties Applet

Audio Properties Applet is a built-in Windows Control Panel applet that manages and controls multimedia and audio devices. Do not confuse it with the Sound Settings Page of Windows 10-11 settings.

The applet can be opened in several ways: from the Windows Control Panel, by entering "mmsys.cpl" in the "Run" dialog, or via the "Sound Control Panel" link in Windows Sound Settings.

On the Audio tab, you can view and change default playback and recording devices, adjust DirectSound acceleration level, set speaker configuration and so on.

Clock

Each streaming device, whether hardware or software, needs a clock source to transfer streaming data smoothly and uniformly. After each time period measured by the clock, a portion of streaming data is sent or received.

An asynchronous device (a computer, audio card, radio etc.) has its own clock source (a clock generator). A synchronous device (also called a "slave") receives a clock signal from a master device that has its own clock generator. Some audio devices (for example, professional audio cards) that have their own clocks and normally work asynchronously can use an external clock source to work synchronously with an external device.

When the CPU interacts with an audio adapter (an on-board chip, a card, an external USB adapter) to send/receive a command or data block, the CPU clock is used as the source, and the audio adapter acts as a slave (synchronous) device. But in audio streaming mode, the adapter's clock is used (except for professional adapters that support an external clock source) and the CPU acts as a slave device.

In other words, the CPU determines when to send/receive a command or data block, but the adapter determines when playback/recording of a data portion completes. Once streaming has started, the CPU cannot instruct the adapter to move streaming data slower or faster; it can only change the sampling rate, if the adapter supports changing it on the fly.

Because it emulates a virtual audio adapter, the VAC driver has no hardware clock generator of its own; it uses the system timer to generate periodic events and form time intervals.

Clock rate difference

Two asynchronous devices, each having its own clock source, cannot have exactly the same clock frequency (rate, or speed). Since their clocks differ, their clock rates always differ slightly, and their actual sampling rates differ slightly too. Like a pair of real watches, two asynchronous audio devices cannot keep their data streams in strict step: one stream slowly runs ahead while the other lags behind. As a result, there will be gaps or losses in the overall audio data stream.
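
As a minimal illustration (the rates below are assumed example values, not VAC figures), the following fragment shows how even a 0.01% (100 ppm) rate mismatch accumulates into an audible amount of data within a minute:

#include <cstdio>

int main() {
    const double nominalRate = 48000.0;  // both devices claim 48 kHz
    const double actualRateA = 48000.0;  // device A runs exactly at the nominal rate
    const double actualRateB = 48004.8;  // device B runs 0.01% (100 ppm) fast - assumed example
    const double seconds     = 60.0;     // observe one minute of streaming

    double driftSamples = (actualRateB - actualRateA) * seconds;   // 288 samples
    double driftMs      = driftSamples / nominalRate * 1000.0;     // 6 ms
    printf("Drift after %.0f s: %.0f samples (%.1f ms)\n", seconds, driftSamples, driftMs);
    // One side produces samples slightly faster than the other side consumes them,
    // so any buffer placed between them eventually overflows or underflows.
    return 0;
}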

VAC supports cable clock correction features to minimize this effect.

Cable clock correction

To compensate for clock rate difference effects, the VAC driver offers cable clock correction features. There are two types of clock correction: permanent and temporary.

The permanent correction amount can be set via VAC Control Panel and persists until it is changed again. It can be used to bring the cable clock as close as possible to the clock of a particular hardware device when a third-party application uses that device's endpoint together with a Virtual Cable endpoint at the same time. Unfortunately, a fixed permanent amount cannot fully eliminate the clock rate difference effect; it can only reduce it. VAC-aware applications might adjust it dynamically via the VAC driver API, but this is dangerous: if the application that changed the amount forgets to restore it, the cable clock keeps running at a reduced or increased rate, affecting all subsequent operations.

The temporary correction (so-called client clock control, because it can be controlled only by a single client process) is available only to VAC-aware applications (for example, Audio Repeater). It is implemented in the VAC filter API, allowing the client to adjust the cable clock to the clock rate of the other audio device in the pair. When the client disconnects, the temporary correction is automatically removed, so only the permanent correction remains in effect. The status of client clock correction can be seen in the cable information window of VAC Control Panel.

Both clock correction parameters are specified as factors/multipliers. The final correction amount is their product. For example, if the permanent correction parameter is specified as 1.02 (102% or +2%), and the client clock correction parameter is specified as 0.97 (97% or -3%), the actual correction amount will be 1.02 * 0.97 = 0.9894 (98.94% or -1.06%).
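
As a minimal sketch (the 48000 Hz nominal rate is an assumed example, and the factor is assumed to scale the cable clock rate directly), the combined factor translates into an effective cable clock rate like this:

#include <cstdio>

int main() {
    const double permanent = 1.02;               // permanent correction, set in VAC Control Panel
    const double client    = 0.97;               // temporary (client) correction
    const double combined  = permanent * client; // 0.9894
    const double nominalHz = 48000.0;            // assumed example cable sampling rate

    printf("Combined correction factor: %.4f\n", combined);
    printf("Effective cable clock rate: %.1f Hz\n", nominalHz * combined);  // 47491.2 Hz
    return 0;
}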

System timer (system clock)

To keep track of world time and to synchronize various tasks, Windows maintains an internal system time using hardware clock generators. The piece of kernel code responsible for timekeeping is called the system timer, or system clock.

Drivers of purely virtual (software-only) devices, which have no hardware clock generators of their own, use the system timer to generate periodic events and form the desired time intervals. Since hardware audio devices use their own clocks, there are always clock rate differences between any two devices. Even virtual devices served by different drivers have such rate differences, because they don't use a common, standardized calculation algorithm and calculation accuracy is always limited.

The shortest time period that can be formed using a timer is called the timer resolution (granularity). Windows timer resolution can be changed by drivers and/or applications. The finer the timer granularity, the more exact the time intervals that can be formed, but the more overhead is introduced, and vice versa.
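
For example, a user-mode application can request a finer granularity through the multimedia timer API; this only illustrates the mechanism, since the VAC driver changes the resolution from kernel mode through different interfaces:

// Link with winmm.lib.
#include <windows.h>
#include <mmsystem.h>
#include <cstdio>

int main() {
    TIMECAPS caps;
    if (timeGetDevCaps(&caps, sizeof(caps)) == MMSYSERR_NOERROR)
        printf("Supported timer resolution: %u..%u ms\n", caps.wPeriodMin, caps.wPeriodMax);

    if (timeBeginPeriod(1) == TIMERR_NOERROR) {  // request 1 ms granularity
        // ... time-critical work: more exact intervals, at the cost of more overhead ...
        timeEndPeriod(1);                        // always pair with timeBeginPeriod
    }
    return 0;
}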

The VAC driver sets the system timer resolution to a quarter of the smallest timing event duration among all cables. The current resolution requested by the driver, as well as the actual resolution set by the system, is displayed in the VAC Control Panel application. System timer resolution changes are registered in the event log.

System default (preferred) audio device/endpoint

Windows supports up to 256 different MME/DirectSound/WASAPI audio endpoints for audio input and output. Any of them can be assigned as the system default, or preferred, device that is used by default. The assignment can be performed using the Audio Properties applet. In an application's device selection menu, the default device appears as "Microsoft Sound Mapper", "Wave Mapper", "System Default" or similar.

If an application issues a request to the default device, Windows routes it to the actual device that was previously set as the default.
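
For example, a legacy MME application does not need to know which endpoint is the default; it simply opens the mapper and lets Windows route the request (a minimal sketch, error handling shortened):

// Link with winmm.lib.
#include <windows.h>
#include <mmsystem.h>
#include <cstdio>

int main() {
    WAVEFORMATEX wfx = {};
    wfx.wFormatTag      = WAVE_FORMAT_PCM;
    wfx.nChannels       = 2;
    wfx.nSamplesPerSec  = 44100;
    wfx.wBitsPerSample  = 16;
    wfx.nBlockAlign     = wfx.nChannels * wfx.wBitsPerSample / 8;
    wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign;

    HWAVEOUT hwo = nullptr;
    // WAVE_MAPPER asks Windows to pick whatever endpoint is currently set as the default.
    if (waveOutOpen(&hwo, WAVE_MAPPER, &wfx, 0, 0, CALLBACK_NULL) == MMSYSERR_NOERROR) {
        printf("Default playback endpoint opened via the mapper.\n");
        waveOutClose(hwo);
    }
    return 0;
}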

In the latest Windows 10-11 releases, default devices can be set on a per-application basis. In Windows 6.x+, default devices are set on a per-system basis. In Windows 5.x, default audio devices are set on a per-user basis.

See the system default device issues for details.

Default audio format for a device/endpoint

To support shared access mode, the System Audio Engine creates a single, common pin instance for all client applications, using a common audio format. But the engine does not know which formats will be used in the future, so it cannot automatically choose the best format.

In Windows 5.x, common format selection is based on the first connection request. For a recording request, the common format is the same as the requestor's format. For a playback request, the common format is the "widest" format supported by the pin. If the pin supports high sampling rates, bit depths and channel counts, clients using more common formats suffer unnecessary format conversions, overhead and audio degradation.

To address these issues, Windows 6.x+ maintains a default audio format for each playback and recording device. When a device is opened in shared mode, the System Audio Engine always uses its default format. So you can choose the default format to achieve the best audio quality, the best performance, or the best compatibility.
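
A WASAPI client can query which format the engine uses for shared-mode streams on the default playback endpoint; the following is a minimal sketch with error handling omitted for brevity:

#include <windows.h>
#include <mmdeviceapi.h>
#include <audioclient.h>
#include <cstdio>

int main() {
    CoInitializeEx(nullptr, COINIT_MULTITHREADED);

    IMMDeviceEnumerator* enumerator = nullptr;
    CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr, CLSCTX_ALL,
                     __uuidof(IMMDeviceEnumerator), (void**)&enumerator);

    IMMDevice* device = nullptr;
    enumerator->GetDefaultAudioEndpoint(eRender, eConsole, &device);

    IAudioClient* client = nullptr;
    device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr, (void**)&client);

    WAVEFORMATEX* mix = nullptr;
    if (SUCCEEDED(client->GetMixFormat(&mix))) {   // format used for shared-mode streams
        printf("Shared-mode format: %lu Hz, %u channels, %u bits\n",
               mix->nSamplesPerSec, (unsigned)mix->nChannels, (unsigned)mix->wBitsPerSample);
        CoTaskMemFree(mix);
    }

    client->Release();
    device->Release();
    enumerator->Release();
    CoUninitialize();
    return 0;
}

The reported sampling rate and channel count follow the endpoint's default format; the bit depth is typically reported as 32 because the engine mixes in floating point regardless of the configured default bit depth.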

Default formats can be chosen in the "Sound" applet of the Windows Control Panel: open the "Playback" or "Recording" tab, double-click a device and select the "Advanced" tab.

The "Listen" feature

Starting from Windows 7, Windows implements the Listen feature, intended to monitor (hear the signal from) a recording audio device. When this feature is turned on for a particular recording device, the System Audio Engine continuously records from the device and immediately plays the recorded signal back to the given playback device. This makes the feature similar to Audio Repeater applications, but it does not allow the use of dynamic cable clock adjustment features.

To use the Listen feature for a device, open the Audio Properties Applet, select the Recording tab, double-click an endpoint, open the Listen tab, check the "Listen to this device" checkbox and select the desired output device (usually speakers).

With this feature, you can either listen to a signal coming from a Virtual Cable (enable the Listen feature for the cable's recording endpoint and choose headphones/speakers as the target device) or supply a Virtual Cable with a signal (enable the Listen feature for a source device and choose a Virtual Cable playback endpoint as the target device).

Once you turn the Listen feature on, it remains active until explicitly turned off. Since there is no visible activity indicator, it is easy to forget about it. If you get wrong/unwanted audio signals played back and/or recorded, make sure the Listen feature has not been activated by mistake for any of the devices in use.

Speaker pin type

WDM/KS drivers of most audio adapters assign the KSNODETYPE_SPEAKER type to their playback pins. This pin type allows channel distribution to be configured with the "Configure" button of the Windows Audio Properties Applet, but the Audio Endpoint Builder always assigns the "Speakers" name to endpoints linked to such pins. If an audio adapter (real or virtual) has more than a single output line, it is impossible to distinguish the lines by name.

As a workaround, VAC uses KSNODETYPE_LINE_CONNECTOR for playback pins by default. This allows a unique name like "Line N" for each output line, but prevents the use of the system channel configuration features. Additionally, some channel processing problems may occur in applications using DirectSound.

To control the playback pin type, use the "Enable spk pin" parameter in the cable configuration section of VAC Control Panel.

Please note that enabling the speaker pin type for several cables under Win 6.x causes the playback endpoints of these cables to have the same "Speakers" name. Most audio applications distinguish between audio devices only by their names, not by unique internal identifiers. Therefore, if you enable speaker pin types for two Virtual Cables, both cables will have the same "Speakers (Virtual Audio Cable)" name and an audio application might confuse them. Most probably, such an application will lose the proper cable selection between running sessions.

Administrator account

A privileged (superuser) account in the system. Applications run from this account can perform various administrative actions. Since VAC is a device driver, a special privilege is required to install it into the system or to restart it when applying a configuration change.

In Windows 5.x, any member of the "Administrators" group has this privilege and can perform these actions.

In Windows 6.x+, there is the UAC feature (enabled by default). Only the built-in administrator account (named "Administrator") can perform all privileged actions without additional measures. Other accounts marked as "administrator" are only "virtually privileged" and can perform privileged operations only through privilege elevation. The only difference is that a plain user account requires the administrator password for privilege elevation while an account marked as administrator does not.

If UAC is enabled, Windows tries to automatically elevate the privilege level for applications marked as privileged (such applications cannot work from a non-privileged account). The VAC installer and uninstaller are marked as privileged applications, so Windows will ask for privilege elevation if you don't have enough privileges.

But VAC Control Panel is not marked as privileged because it can perform both privileged and non-privileged actions. So if you run VAC Control Panel from an account other than "Administrator" (even one marked as administrator), only non-privileged actions will be available. You will not be able to perform actions that require a driver or System Audio Service restart.

To start an application on behalf of the built-in Administrator account, either log in using "Administrator" as the user name or right-click the application icon and select the "Run as administrator" item.

Remote session

Initially, Windows supported only a local work session, where the user sits directly at the console (display, keyboard, and mouse) of the computer running Windows. Starting from Windows 2000, Remote Desktop Sessions (RDS) were introduced, where the user sits at a "terminal" (or "client") computer but all user actions are sent to a remote "terminal server" computer running Windows, and the results (screen contents, sounds etc.) are sent back to the client computer. Terminal connections are established via the Remote Desktop Protocol (RDP). Using such a connection, a user can work with a remote computer as if working with it locally, but there are some limitations.

There are two types of Windows remote connections: separate (isolated) and direct ("console"). For a separate connection, Windows establishes a new logon session, isolating the user environment from other users. For a direct connection, the remote user is connected to the main console session environment.

With desktop-sharing products like TightVNC, TeamViewer, LogMeIn or Remote Administrator, connections are established directly, by transferring screen updates from the server to your computer and your mouse/keyboard actions from your computer to the server. No separate logon session is created, and your computer acts as a "remote console" for the server computer. With such a connection, you can use all server resources as if you were sitting directly at the server.

Remote connection/session support in Windows allows sound to be transferred from/to the remote computer. By default, when you establish a session, all native audio endpoints of the remote system are hidden, and virtual endpoints named like "Remote audio" or "RDP Audio" are created instead. If a remote audio application plays a sound to such an endpoint, the sound is transferred to your local system, and you can hear it in your speakers/headphones. If an application records a sound from such an endpoint, the sound is transferred from your local microphone or source line to the remote system.

You can turn this behavior off in the "Local resources - Remote audio" settings of the remote connection dialog before establishing a remote session. In that case, no virtual audio endpoints are created for the session, and all audio applications will see the same audio endpoint set as in a typical local session.

There are some compatibility issues, and even problems, related to remote sessions/connections.

Virtualized environment

An environment created by virtualization tools is called a "virtualized environment".

Different virtual machine products offer different layers of platform virtualization. Some platform virtualization products offer full hardware virtualization (VMware, VirtualBox, Parallels Workstation, QEMU, Bochs), while others, like Parallels Virtuozzo (mostly used in Windows VPS/VDS technology), offer only partial virtualization (for example, OS-level virtualization).

There are some compatibility issues related to virtualized environment.

Digital signature

A digital signature is a special digital sequence used to check data authenticity and integrity. If data are illegally modified (tampered with), the signature becomes invalid, and this is easy to detect.

Windows uses three types of digital signatures for executable (PE) files:

  • Publisher signature is applied directly by a software publisher (vendor) using a publisher certificate, and certifies the file as authentic. A publisher certificate is obtained once and then used to sign all developed software. VAC executable modules are signed with an EV code signing certificate. This signature is enough to use a kernel-mode driver in Windows Vista, 7, 8 and 8.1.
  • Attestation signature is applied by the Microsoft Hardware Dev Center after a kernel-mode driver is registered in its database and passes some basic checks. This signature is required to use a kernel-mode driver in Windows 10 and later.
  • Windows Logo signature is applied by the Microsoft Windows Hardware Quality Lab (WHQL) and certifies that the file has passed Windows compliance tests. A software package is submitted to WHQL for testing and, if the tests are successful, WHQL signs the executable files as Windows-compliant. This signature is required to use a kernel-mode driver on Server 2012 and later server systems in Secure Boot mode. For desktop systems, or in Legacy Boot mode, this signature is not mandatory. The VAC driver does not have it.

If you have digital signature problems with VAC driver, please read the guidelines.

Buffer overflows/underflows

A buffer overflow occurs when the buffer has no room for data that should be placed into it (some data will be lost from the flow). An underflow occurs when the buffer does not contain enough data that should be retrieved from it (the data flow will contain a gap). If a buffer is filled at one side and drained at the other, overflows occur when the filling flow is "thicker" than the draining one, and vice versa.
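
The following minimal sketch (illustration only, not VAC code) models a buffer filled at one side and drained at the other, counting overflows and underflows the same way a cable's statistics would:

#include <cstdio>
#include <vector>

struct Ring {
    std::vector<short> data;              // storage for audio samples
    size_t count = 0;                     // samples currently stored
    unsigned overflows = 0, underflows = 0;

    explicit Ring(size_t capacity) : data(capacity) {}

    void write(size_t samples) {          // filling side (e.g. a playback client)
        if (count + samples > data.size()) { overflows++;  count = data.size(); }
        else                                 count += samples;
    }
    void read(size_t samples) {           // draining side (e.g. a recording client)
        if (samples > count)               { underflows++; count = 0; }
        else                                 count -= samples;
    }
};

int main() {
    Ring ring(4800);                      // 100 ms at 48 kHz, mono
    ring.write(4000); ring.read(2000);
    ring.write(3000);                     // 2000 + 3000 > 4800: overflow, excess data is lost
    ring.read(6000);                      // only 4800 stored: underflow, the flow gets a gap
    printf("overflows=%u underflows=%u\n", ring.overflows, ring.underflows);
    return 0;
}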

Compared with car fuel tank usage: an overflow occurs when the tank has no more room for the fuel supplied via the filling hose; if the hose has no protective cutoff, the excess fuel spills out and is lost. Conversely, if you don't refill the tank in time, it runs dry and your car stops suddenly, requiring attention.

In a multi-node or multi-layer chain, a stream data overflow/underflow does not necessarily mean a crack, pop or glitch. Conversely, the absence of overflows/underflows does not mean that there are no cracks, pops or glitches. A data processing node/layer (a driver, an application, a service, a plug-in etc.) that registers an overflow/underflow knows nothing about other nodes/layers in the chain; if they use extensive buffering, the audio stream may be kept from breaking. Likewise, even if a node/layer has successfully passed a data portion to the next node/layer in the chain, buffering problems in other nodes/layers may still cause stream breaks.

If packet mode is not used in the RT Audio protocol, the client doesn't inform the driver about reading from or writing to the buffer, so the driver doesn't count overflows/underflows at all. If packet mode is used, both parties inform each other of their buffer operations.

Affinity restriction

The PortCls system driver has a bug that causes a significant drop in audio data processing efficiency on multi-CPU or multi-core hardware with the WavePci port/miniport type. VAC implements multiple subdevices where most audio drivers implement only a single subdevice, so this bug does not occur with most hardware audio drivers.

The problem is described in detail in the microsoft.public.development.device.drivers newsgroup. You can find the post by entering "skype portcls getmapping" into a Google search. Unfortunately, there is no stable link.

The best way to avoid this problem is to use the VAC internal data processing engine instead of the PortCls one (the internal engine is used by default).

If the PortCls engine must be used for some reason, the only possible workaround is to disable concurrent request processing in PortCls. For this, VAC offers driver parameters and cable parameters to restrict the CPU affinity of worker threads belonging to the System Audio Engine or to any client. Create the appropriate driver parameter value in the registry, setting it to the number of cables causing the problem, or create affinity restriction parameters for the selected cables. Then restart the VAC driver to propagate the parameter changes.

WaveRT and WaveCyclic port types are not affected by this bug.

Worker thread

A worker thread is a driver thread that performs all stream processing work (data transfer, volume control, format conversion etc.). Different threads can execute on different CPUs/cores, achieving optimal performance and load distribution. By default, the VAC driver starts one worker thread per physical CPU/core so that all available CPU power can be used to process stream data.

By default, the worker thread scheduling priority is relatively high (between the normal thread priority and the highest possible priority value). This guarantees that stream data processing takes precedence over most regular threads but does not consume all available CPU time. If the priority is set to Auto, the VAC driver uses the audio stream resource management offered by the system Port Class driver.

You can limit the number of these threads and/or change their priority using the VAC Control Panel application.

Cable and/or stream problem

A cable or stream problem is a condition that prevents (or may prevent) normal cable/stream functioning.

A problem is indicated by a red exclamation point. Cable problems are indicated in the cable list window, to the left of the cable number. Stream problems are indicated in the stream list window, to the left of the stream identifier.

If one or more cable streams have problems, a cable problem is also indicated. If a cable problem is indicated but there are no stream problems, the problem concerns the cable itself.

If there are no explicit messages describing the problem, see the driver event log for details. Problem-related messages are marked with "!".

Windows boot process

Originally, every Windows shutdown implied a complete destruction of all system components stored in RAM. Accordingly, every system startup (boot) was performed "from scratch", by reading (loading) all required code and data from the disk and initializing them. A system restarted (rebooted) this way was always clean of any temporary objects that don't persist on disk. This is often called a "full" or "clean" restart.

Later, Windows began to support hibernation and hybrid shutdown modes, in which the computer looks completely turned off but the live system state is not rebuilt from scratch. Instead, some (or all) code/data in memory are saved to disk during shutdown and read back in the same state during the boot process, without initialization, to make booting quicker.

With such an optimized, partial boot/startup, not all internal system and/or application data structures are rebuilt. If an action (for example, software installation/uninstallation, a system update etc.) requires a complete system restart, and the system is configured for hibernation or hybrid shutdown, such a partial shutdown/startup sequence will not perform a full system restart, and the configuration process may not finish correctly.

Therefore, the only way to perform a complete (cold) Windows restart is to choose the "restart" option from the Power Menu. Other ways (choosing "shutdown", pressing the power button etc.) don't guarantee a full system restart. Use alternative ways only if you are absolutely sure that your system is configured for a full shutdown followed by a full, complete startup.

Windows versions

Modern Windows NT releases are the following:

  • 5.0 - Windows 2000 (Win2k)
  • 5.1 - Windows XP (WinXP)
  • 5.2 - Windows 2003 Server (Win2k3)
  • 6.0 - Windows Vista, Server 2008
  • 6.1 - Windows 7 (Win7), Server 2008 R2
  • 6.2 - Windows 8 (Win8), Server 2012
  • 6.3 - Windows 8.1 (Win8.1), Server 2012 R2
  • 10.0 - Windows 10 (Win10), Server 2016/2019, Windows 11 (Win11), Server 2022

So the "Win 5.x" means Windows 2000/XP/2003 and "Win 6.x" means Vista, Server 2008, Windows 7 and Windows 8. "Win 6.x+" means Vista and all later versions (Win 7, Win 8 and so on).

Windows 11 (10.0.22000, 21H2) is in fact a next version of Windows 10, it has no significant differences from the previous release (21H1).

Windows 95/98/ME have 4.x versions and don't belong to Windows NT family.

In general, a property added or removed in a particular version, persists in next versions too, unless explicitly stated.