Source: download.microsoft.com/.../MFAudio_Walkthrough.docx


MFAudioFilter Walkthrough: C++
Capturing Audio Streams with a Media Foundation Audio Filter

About This Walkthrough

In the Kinect™ for Windows® Software Development Kit (SDK), the MFAudioFilter sample shows how to capture an audio stream from the microphone array of the Kinect for Xbox 360® sensor by using the MSRKinectAudio Microsoft DirectX® media object (DMO) in filter mode in a Windows Media® Foundation topology. This document is a walkthrough of the MFAudioFilter sample.

Resources

For a complete list of documentation for the Kinect for Windows SDK Beta, plus related references and links to the online forums, see the beta SDK website at:
http://kinectforwindows.org

Contents

Introduction
Program Basics
Create and Configure the MSRKinectAudio DMO
    Configure System Mode
    Configure Source Mode
    Configure the Array Descriptor
    Configure Feature Mode
    Configure Noise Suppression
    Configure Automatic Gain Control
    Configure Input and Output Types
Incorporate the MSRKinectAudio DMO into a Media Foundation Topology
    Create the Encoder and Sink Objects
    Create the Source Object
    Create the Topology
Capture the Audio Stream
Resources

License: The Kinect for Windows SDK Beta is licensed for non-commercial use only. By installing, copying, or otherwise using the beta SDK, you agree to be bound by the terms of its license. Read the license.

Disclaimer: This document is provided "as-is". Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it. This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes.

© 2011 Microsoft Corporation. All rights reserved.

Microsoft, DirectShow, DirectX, Kinect, MSDN, Windows, and Windows Media are trademarks of the Microsoft group of companies. All other trademarks are property of their respective owners.


Introduction

The audio component of the Kinect™ for Xbox 360® sensor is a four-element microphone array. An array provides some significant advantages over a single microphone, including more sophisticated acoustic echo cancellation and noise suppression, and the ability to determine the direction of a sound source.

The primary way for C++ applications to access the Kinect sensor's microphone array is through the MSRKinectAudio Microsoft® DirectX® media object (DMO). A DMO is a standard COM object that can be incorporated into a Microsoft DirectShow® graph or a Windows Media® Foundation topology. The Kinect for Windows® Software Development Kit (SDK) Beta includes an extended version of the Windows microphone array DMO—referred to here as the MSRKinectAudio DMO—to support the Kinect microphone array.

The MFAudioFilter sample shows how to capture an audio stream from the Kinect sensor's microphone array by using the MSRKinectAudio DMO in filter mode in a Windows Media Foundation topology. This document is a walkthrough of the MFAudioFilter sample. To prepare for and understand this walkthrough, read "MicArrayEchoCancellation Walkthrough," which describes how to use the MSRKinectAudio DMO in source mode.

Note Media Foundation is COM-based, so this document assumes that you are familiar with the basics of how to use COM objects and interfaces. You do not need to know how to implement COM objects. For the basics of how to use COM objects, see "Programming DirectX with COM" on the Microsoft Developer Network (MSDN®) website. That MSDN topic is written for DirectX programmers, but the basic principles apply to all COM-based applications.

Program Basics

MFAudioFilter is installed with the Kinect for Windows SDK Beta samples in %KINECTSDK_DIR%\Samples\KinectSDKSamples.zip. MFAudioFilter is a C++ console application that is implemented in MFAudioFilter.cpp. The basic program flow is as follows:

1. Initialize and configure the MSRKinectAudio DMO.
2. Incorporate the DMO into a Media Foundation topology.
3. Record an audio stream from the microphone array and write the data to a Windows Media Audio (.wma) file.

The following is a lightly edited version of the MFAudioFilter output:

    Recording using Media Foundation
    Sound output will be written to file: C:\Code_Projects\NUI\samples\unmanaged\MFAudioFilter\MFAudioFilter.wma
    MESessionTopologySet
    Media session event: 112
    MESessionTopologyStatus: MF_TOPOSTATUS_READY
    MESessionStarted
    Recording. Press 's' to stop.
    MESessionStopped
    MESessionClosed


Beginning with "MESessionTopologySet", most of this output represents various Media Foundation events. The exception is the line beginning with "Recording," which informs the user how to stop the session.

The remainder of this document walks you through the application.

Note This document includes code examples, most of which have been edited for brevity and readability. In particular, most routine error-handling code has been removed. For the complete code, see the MFAudioFilter sample. Hyperlinks in this walkthrough refer to content on the MSDN website.

Create and Configure the MSRKinectAudio DMO

The application's entry point is _tmain, which manages the overall program execution, with most of the details handled by private functions. The first step is to create an instance of the MSRKinectAudio DMO, as follows:

    int __cdecl _tmain(int argc, const TCHAR ** argv)
    {
        HRESULT hr = S_OK;
        CoInitialize(NULL);
        IMediaObject* pDMO = NULL;
        IPropertyStore* pPS = NULL;
        IMFTransform* pMFT = NULL;

        SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);

        CoCreateInstance(CLSID_CMSRKinectAudio, NULL, CLSCTX_INPROC_SERVER,
                         IID_IMediaObject, (void**)&pDMO);
        pDMO->QueryInterface(IID_IPropertyStore, (void**)&pPS);
        pDMO->QueryInterface(IID_IMFTransform, (void**)&pMFT);
        ...
    }

MFAudioFilter first calls the SetPriorityClass function and sets the process's priority to HIGH_PRIORITY_CLASS. This helps ensure that the microphone is not preempted during the capture process.

MFAudioFilter calls the CoCreateInstance function to create an instance of the MSRKinectAudio DMO and obtain its IMediaObject interface, which supports the methods that control the DMO. MFAudioFilter then calls the DMO's QueryInterface method to obtain the following two additional interface pointers:

- The IPropertyStore interface provides access to the DMO's property store, which contains a set of key-value pairs. You configure the DMO by setting the appropriate keys.
- The IMFTransform interface is used to incorporate the DMO into a Media Foundation topology, as discussed later in this document.


The next section of _tmain is a series of code blocks that configure the DMO by assigning values—as PROPVARIANT structures—to the appropriate property keys. The general procedure for setting a key is as follows:

1. Declare a PROPVARIANT structure and initialize it by calling the PropVariantInit function.
2. Specify the key's data type by assigning the appropriate VARENUM value to PROPVARIANT.vt. For example, VT_I4 specifies a 4-byte signed int.
3. Assign a value to the structure's value member. The member name depends on PROPVARIANT.vt. For VT_I4, the corresponding value member is PROPVARIANT.lVal.
4. Call the DMO's IPropertyStore::SetValue method to add the key-value pair to the property store.
5. Call the PropVariantClear function to free the PROPVARIANT structure.

MFAudioFilter configures the DMO, as described in the following sections.

Configure System Mode

The system mode key, MFPKEY_WMAAECMA_SYSTEM_MODE, determines the DMO's basic operating mode and is set as follows:

    int __cdecl _tmain(int argc, const TCHAR ** argv)
    {
        ...
        PROPVARIANT pvSysMode;
        PropVariantInit(&pvSysMode);
        pvSysMode.vt = VT_I4;
        pvSysMode.lVal = (LONG)(2);
        pPS->SetValue(MFPKEY_WMAAECMA_SYSTEM_MODE, pvSysMode);
        PropVariantClear(&pvSysMode);
        ...
    }

Four system modes are available, each of which has a corresponding value, as shown in the following table.

    Mode                                                    Value
    Single-channel with acoustic echo cancellation (AEC)    0
    Microphone array                                        2
    Microphone array with AEC                               4
    Single-channel with automatic gain control (AGC)        5

MFAudioFilter configures the DMO for a microphone array without AEC.

Note The MSRKinectAudio DMO currently cannot be used in a Media Foundation topology with AEC enabled, because AEC requires input from both the microphone array and the speaker. The microphone array signal can be represented in a topology by a standard Media Foundation audio source object, but currently no such object is available to represent the speaker signal.


Configure Source Mode

The source-mode key, MFPKEY_WMAAECMA_DMO_SOURCE_MODE, specifies whether the DMO is used as a source or a filter, as follows:

    PROPVARIANT pvSourceMode;
    PropVariantInit(&pvSourceMode);
    pvSourceMode.vt = VT_BOOL;
    pvSourceMode.boolVal = VARIANT_FALSE;
    pPS->SetValue(MFPKEY_WMAAECMA_DMO_SOURCE_MODE, pvSourceMode);

MFAudioFilter specifies filter mode, which allows the DMO to be treated as a Media Foundation transform and incorporated into a topology.

Configure the Array Descriptor

The array-descriptor key, MFPKEY_WMAAECMA_MICARRAY_DESCPTR, specifies the array geometry, as follows:

    ...
    PROPVARIANT pvGeometry;
    PropVariantInit(&pvGeometry);
    pvGeometry.vt = VT_BLOB;
    pvGeometry.blob.cbSize = sizeof(maKinectGeometry);
    pvGeometry.blob.pBlobData = (BYTE*)&maKinectGeometry;
    pPS->SetValue(MFPKEY_WMAAECMA_MICARRAY_DESCPTR, pvGeometry);
    ...

The geometry is described by a KSAUDIO_MIC_ARRAY_GEOMETRY structure, which is a VT_BLOB type as far as PROPVARIANT structures are concerned. The microphone coordinates are contained in KSAUDIO_MICROPHONE_COORDINATES structures. However, KSAUDIO_MIC_ARRAY_GEOMETRY declares coordinates for only one microphone, so MFAudioFilter defines a KINECT_GEOMETRY structure that manually allocates space for the other three microphones, as follows:

    typedef struct
    {
        KSAUDIO_MIC_ARRAY_GEOMETRY geometry;
        KSAUDIO_MICROPHONE_COORDINATES coordinates[3];
    } KINECT_GEOMETRY;

    KINECT_GEOMETRY maKinectGeometry =
    {
        256, 0, 0, 0, -8726, 8726, 200, 7200, 4,
        {2, 0, -113, 0, 0, 0},
        {{2, 0, 36, 0, 0, 0}, {2, 0, 76, 0, 0, 0}, {2, 0, 113, 0, 0, 0}}
    };

The structure definition can be found near the beginning of MFAudioFilter.cpp.

Configure Feature Mode

The feature-mode key, MFPKEY_WMAAECMA_FEATURE_MODE, toggles feature mode, as follows:

    ...
    PROPVARIANT pvFeatrModeOn;
    PropVariantInit(&pvFeatrModeOn);
    pvFeatrModeOn.vt = VT_BOOL;
    pvFeatrModeOn.boolVal = VARIANT_TRUE;
    pPS->SetValue(MFPKEY_WMAAECMA_FEATURE_MODE, pvFeatrModeOn);
    PropVariantClear(&pvFeatrModeOn);
    ...

Feature mode must be enabled before the default settings of several properties can be overridden. MFAudioFilter turns on feature mode so that it can change the noise-suppression and automatic gain control values.

Configure Noise Suppression

The noise-suppression key, MFPKEY_WMAAECMA_FEATR_NS, toggles noise suppression, as follows:

    ...
    PROPVARIANT pvNoiseSup;
    PropVariantInit(&pvNoiseSup);
    pvNoiseSup.vt = VT_I4;
    pvNoiseSup.lVal = 1;
    pPS->SetValue(MFPKEY_WMAAECMA_FEATR_NS, pvNoiseSup);
    PropVariantClear(&pvNoiseSup);
    ...

A value of 1 enables noise suppression, and a value of 0 disables it. Because it sets lVal to 1, MFAudioFilter enables noise suppression.

Configure Automatic Gain Control

The AGC key, MFPKEY_WMAAECMA_FEATR_AGC, toggles AGC, as follows:

    ...
    PROPVARIANT pvAGC;
    PropVariantInit(&pvAGC);
    pvAGC.vt = VT_BOOL;
    pvAGC.boolVal = VARIANT_TRUE;
    CHECKHR(pPS->SetValue(MFPKEY_WMAAECMA_FEATR_AGC, pvAGC));
    PropVariantClear(&pvAGC);
    ...

MFAudioFilter turns on AGC.

Configure Input and Output Types

To use a DMO in a Media Foundation topology, you must specify the DMO's input and output types. MFAudioFilter specifies these types, as follows:

    ...
    IMFMediaType* pOutputMediaType;
    MFCreateMediaType(&pOutputMediaType);
    pOutputMediaType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio);
    pOutputMediaType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM);
    pOutputMediaType->SetUINT32(MF_MT_AUDIO_NUM_CHANNELS, 1);
    pOutputMediaType->SetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, 16000);
    pOutputMediaType->SetUINT32(MF_MT_AUDIO_AVG_BYTES_PER_SECOND, 32000);
    pOutputMediaType->SetUINT32(MF_MT_AUDIO_BLOCK_ALIGNMENT, 2);
    pOutputMediaType->SetUINT32(MF_MT_AUDIO_BITS_PER_SAMPLE, 16);
    pMFT->SetOutputType(0, pOutputMediaType, 0);

    IMFMediaType* pInputMediaType;
    MFCreateMediaType(&pInputMediaType);
    pInputMediaType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio);
    pInputMediaType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_Float);
    pInputMediaType->SetUINT32(MF_MT_AUDIO_NUM_CHANNELS, 4);
    pInputMediaType->SetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, 16000);
    pInputMediaType->SetUINT32(MF_MT_AUDIO_AVG_BYTES_PER_SECOND, 256000);
    pInputMediaType->SetUINT32(MF_MT_AUDIO_BLOCK_ALIGNMENT, 16);
    pInputMediaType->SetUINT32(MF_MT_AUDIO_BITS_PER_SAMPLE, 32);
    pMFT->SetInputType(0, pInputMediaType, 0);
    ...

To create an input or output type, call the MFCreateMediaType function and pass it the address of a pointer to an IMFMediaType interface. MFCreateMediaType returns an empty object, and you use the IMFMediaType interface's IMFAttributes::SetGUID or IMFAttributes::SetUINT32 method to specify the various settings as key-value pairs. MFAudioFilter uses the settings in the following table.

    Setting                          Input        Output
    Major media type                 Audio        Audio
    Media subtype (audio format)     IEEE float   Pulse-code modulation (PCM)
    Number of channels               4            1
    Samples/second                   16,000       16,000
    Average bytes/second             256,000      32,000
    Block alignment                  16           2
    Bits/sample                      32           16

The block alignment is the size in bytes of one audio frame (channels × bits per sample ÷ 8), and the average bytes/second value is the sample rate multiplied by the block alignment. For a list of available keys, see "Alphabetical List of Media Foundation Attributes" on the MSDN website.

After configuring the DMO, MFAudioFilter calls the private MFRecord method, which incorporates the DMO into a Media Foundation topology, captures an audio stream from the Kinect sensor's microphone array, and writes the processed stream to a .wma file, as described in the following sections.

Incorporate the MSRKinectAudio DMO into a Media Foundation Topology

A Media Foundation topology is essentially a processing pipeline that consists of a series of objects called nodes. The input stream is passed from one node to the next, with each node performing one step in the overall process. There are three node types:

- Source node. Source nodes represent an audio source, such as a microphone. They get an audio stream from the source and pass it to a downstream transform or sink node. Source nodes are represented by an IMFMediaSource interface.
- Transform node. Transform nodes take an audio stream from an upstream node, either a source node or another transform node. They process the stream—for example, by filtering or encoding it—and pass the processed stream to a downstream transform or sink node. Transform nodes are represented by an IMFTransform interface.
- Sink node. Sink nodes send the processed stream to its final destination, usually to a render device such as a speaker or to a file. Sink nodes are represented by an IMFMediaSink interface.

Applications use a node's standard interface to manage the node. For example, applications can use a source node's IMFMediaSource interface to—among other things—start, stop, or pause the audio stream.

The topology usually starts with a source node, which gets a stream from an audio source. It ends with a sink node, which sends the stream to an output device such as a speaker or a file on the hard disk drive. In between, the topology can have any number of transform nodes to handle tasks such as filtering, de-multiplexing, and so on. The topology is represented by an IMFTopology interface, which applications can use for purposes such as constructing the topology.

In general, topologies can have multiple branches and an arbitrary number of nodes. MFAudioFilter uses a simple linear topology with four nodes, as shown in Figure 1.

Figure 1. MFAudioFilter Topology

The nodes serve the following purposes:

1. The audio source object gets the audio stream from the Kinect sensor's microphone array.
2. The MSRKinectAudio DMO operates as a transform node and performs noise suppression and AGC on the audio stream from the source object.
3. The WMA encoder node converts the output stream from the MSRKinectAudio DMO to .wma file format.
4. The audio sink node takes the stream from the WMA encoder and writes it to a .wma file.

_tmain passes the following to MFRecord:

- The MSRKinectAudio DMO's IMFTransform interface (pMFT).
- An IMFMediaType interface for the DMO's output media type (pOutputMediaType).
- The output file name.

[Figure 1 depicts the linear chain: Audio Source (IMFMediaSource) → MSRKinectAudio DMO (IMFTransform) → WMA Encoder (IMFTransform) → Audio Sink (IMFMediaSink).]


The first part of MFRecord sets up the topology, as follows:

    HRESULT MFRecord(IMFTransform* pFilter, IMFMediaType* pFilteredMediaType,
                     const TCHAR* szOutfile)
    {
        HRESULT hr;
        IMFTransform *pEncoder = NULL;
        IMFMediaSink *pSink = NULL;
        IMFMediaSource *pMicSource = NULL;
        IMFTopology *pTopology = NULL;
        int iMicDevIdx = -1;
        TCHAR szOutfileFullName[MAX_PATH];

        MFStartup(MF_VERSION);

        DWORD dwRet = GetFullPathName(szOutfile, (DWORD)ARRAYSIZE(szOutfileFullName),
                                      szOutfileFullName, NULL);

        CreateEncoderAndSink(pFilteredMediaType, szOutfileFullName, &pEncoder, &pSink);

        hr = GetMicArrayDeviceIndex(&iMicDevIdx);
        CreateSourceFromAudioDevice(iMicDevIdx, &pMicSource);

        hr = CreateTopology(pMicSource, pFilter, pEncoder, pSink, &pTopology);
        ...
    }

MFRecord calls the MFStartup function to initialize Media Foundation and then creates a fully qualified path for the output file. MFRecord calls several private methods to create objects for the remaining nodes—audio source, WMA encoder, and audio sink—and then constructs the topology, as described in the following sections.

Create the Encoder and Sink Objects

MFRecord passes CreateEncoderAndSink the interface for the DMO's output media type, the file name, and the addresses of pointers to the IMFTransform and IMFMediaSink interfaces that will represent the encoder and sink nodes, respectively. CreateEncoderAndSink creates encoder and sink objects and returns their interfaces.

The Encoder Node

CreateEncoderAndSink starts with the encoder node, as follows:

    HRESULT CreateEncoderAndSink(IMFMediaType *pFilteredMediaType,
                                 const TCHAR *szOutfileFullName,
                                 IMFTransform **ppEncoder, IMFMediaSink **ppSink)
    {
        IMFTransform *pEncoder = NULL;
        ...

        CoCreateInstance(__uuidof(CWMAEncMediaObject), NULL, CLSCTX_INPROC_SERVER,
                         __uuidof(IMFTransform), (void**)&pEncoder);
        pEncoder->SetInputType(0, pFilteredMediaType, 0);

        IMFMediaType* pOutputMediaType = NULL;
        int iType = 0;
        while (true)
        {
            pEncoder->GetOutputAvailableType(0, iType++, &pOutputMediaType);
            GUID guidSubtype;
            pOutputMediaType->GetGUID(MF_MT_SUBTYPE, &guidSubtype);
            if (guidSubtype == MFAudioFormat_WMAudioV8)
                break;
            SAFE_RELEASE(pOutputMediaType);
        }
        pEncoder->SetOutputType(0, pOutputMediaType, 0);
        ...
    }

CreateEncoderAndSink creates a Windows Media Audio Encoder object and obtains its IMFTransform interface. It then calls the node's IMFTransform::SetInputType method to specify the input type. The input type is the media type for the upstream node—the MSRKinectAudio DMO in this case—so the media type is the one specified earlier for the DMO's output stream.

CreateEncoderAndSink iterates through the encoder's available output types until it finds one that has the desired format—MFAudioFormat_WMAudioV8 in this example. CreateEncoderAndSink then passes the corresponding type object's IMFMediaType interface to IMFTransform::SetOutputType to specify the encoder's output type.

The Sink Object

The remainder of CreateEncoderAndSink creates the sink object, as follows:

    HRESULT CreateEncoderAndSink(...)
    {
        IMFByteStream *pStream = NULL;
        IMFMediaSink *pSink = NULL;
        IMFASFContentInfo *pContentInfo = NULL;
        IMFASFProfile *pProfile = NULL;
        IMFASFStreamConfig *pStreamConfig = NULL;

...

        MFCreateFile(MF_ACCESSMODE_WRITE, MF_OPENMODE_DELETE_IF_EXIST,
                     MF_FILEFLAGS_NONE, szOutfileFullName, &pStream);
        MFCreateASFMediaSink(pStream, &pSink);
        pSink->QueryInterface(__uuidof(IMFASFContentInfo), (void**)&pContentInfo);
        MFCreateASFProfile(&pProfile);
        pProfile->CreateStream(pOutputMediaType, &pStreamConfig);
        pStreamConfig->SetStreamNumber(1);
        pProfile->SetStream(pStreamConfig);
        pContentInfo->SetProfile(pProfile);

        // Clean up and return.
    }

CreateEncoderAndSink first calls the MFCreateFile function to create the output file. It calls the MFCreateASFMediaSink function to create a sink object and obtain the node's IMFMediaSink interface, and then calls QueryInterface to get the node's IMFASFContentInfo interface, which is used to configure the node.

To configure the sink node, CreateEncoderAndSink:

1. Calls the MFCreateASFProfile function to create an Advanced Systems Format (ASF) profile object.
2. Passes the output media type to the profile object's IMFASFProfile::CreateStream method to create an ASF stream configuration object.
3. Calls the stream configuration object's IMFASFStreamConfig::SetStreamNumber method to set the stream number.
4. Calls the profile object's IMFASFProfile::SetStream method to add the stream to the profile.
5. Calls the sink object's IMFASFContentInfo::SetProfile method to configure the sink.

Create the Source Object

The source node obtains an audio stream from the Kinect sensor's microphone array and passes it to the MSRKinectAudio DMO node for further processing.

Enumerate Capture Devices

In general, a system can have multiple capture devices, which are identified by a zero-based device index. MFRecord calls GetMicArrayDeviceIndex to get the index that is assigned to the Kinect sensor's microphone array, as follows:

    HRESULT GetMicArrayDeviceIndex(int *piDevice)
    {
        HRESULT hr = S_OK;
        UINT index, dwCount;
        IMMDeviceEnumerator* spEnumerator;
        IMMDeviceCollection* spEndpoints;

        *piDevice = -1;

        CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_ALL,
                         __uuidof(IMMDeviceEnumerator), (void**)&spEnumerator);
        spEnumerator->EnumAudioEndpoints(eCapture, DEVICE_STATE_ACTIVE, &spEndpoints);
        ...
    }

GetMicArrayDeviceIndex does the following:

1. Creates a device enumerator object and gets a pointer to its IMMDeviceEnumerator interface.
2. Enumerates the system's capture devices by calling the enumerator object's IMMDeviceEnumerator::EnumAudioEndpoints method, which enumerates the specified types of audio endpoints.


The EnumAudioEndpoints parameter values are as follows:

- A value from the EDataFlow enumeration that indicates the device type. eCapture directs EnumAudioEndpoints to enumerate only capture devices.
- A DEVICE_STATE_XXX constant that specifies which device states to enumerate. DEVICE_STATE_ACTIVE directs EnumAudioEndpoints to enumerate only active devices.
- The address of an IMMDeviceCollection interface pointer that receives the enumerated capture devices.

Determine the Device Index

GetMicArrayDeviceIndex then determines the device index of the Kinect sensor's microphone array, as follows:

    HRESULT GetMicArrayDeviceIndex(int *piDevice)
    {
        ...
        spEndpoints->GetCount(&dwCount);

        for (index = 0; index < dwCount; index++)
        {
            IMMDevice* spDevice;

            spEndpoints->Item(index, &spDevice);

            GUID subType = {0};
            GetJackSubtypeForEndpoint(spDevice, &subType);
            if (subType == KSNODETYPE_MICROPHONE_ARRAY)
            {
                *piDevice = index;
                break;
            }
        }
        ... // Clean up and return
    }

To determine the device index, GetMicArrayDeviceIndex:

1. Calls the IMMDeviceCollection::GetCount method to determine the number of devices in the collection.
2. Calls the IMMDeviceCollection::Item method for each capture device to get its IMMDevice interface.
3. For each capture device, passes the IMMDevice interface to the private GetJackSubtypeForEndpoint method to determine the device subtype. The KSNODETYPE_MICROPHONE_ARRAY subtype corresponds to a microphone array, presumably belonging to the Kinect sensor.

When GetMicArrayDeviceIndex finds this subtype, it returns the associated device index to _tmain.GetJackSubtypeForEndpoint determines the device’s subtype, as follows:HRESULT GetJackSubtypeForEndpoint(IMMDevice* pEndpoint, GUID* pgSubtype)

Page 13: Introductiondownload.microsoft.com/.../MFAudio_Walkthrough.docx · Web viewThe topology usually starts with a source node, which gets a stream from an audio source. It ends with a

MFAudioFilter Walkthrough: C++ – 13

{ ... IDeviceTopology* spEndpointTopology; IConnector* spPlug; IConnector* spJack; IPart* spJackAsPart;

pEndpoint->Activate(__uuidof(IDeviceTopology), CLSCTX_INPROC_SERVER, NULL, (void**)&spEndpointTopology);

spEndpointTopology->GetConnector(0, &spPlug); spPlug->GetConnectedTo(&spJack); spJack->QueryInterface(__uuidof(IPart), (void**)&spJackAsPart); hr = spJackAsPart->GetSubType(pgSubtype); ...}To determine a capture device’s subtype, you must determine what the capture device is connected to and query that connector for the capture device’s subtype. GetJackSubtypeForEndpoint calls the following:1. The IMMDevice::Activate method to get the object’s IDeviceTopology interface.2. The IDeviceTopology::GetConnector method to get the device’s connector.3. IConnector::GetConnectedTo to determine what the connector from step 2 is

connected to.4. QueryInterface on the object from step 3 to get its IPart interface.5. The IPart::GetSubType method to get the capture device’s subtype globally unique

identifier (GUID).Create a Media Source ObjectMFRecord calls CreateSourceFromAudioDevice to create a media source object for the Kinect sensor’s microphone array, as follows:HRESULT CreateSourceFromAudioDevice(UINT32 iDevice, IMFMediaSource **ppSource){ IMFAttributes* pAttrs = NULL; IMFActivate **ppDevices = NULL; UINT32 count = 0;

    MFCreateAttributes(&pAttrs, 1);
    pAttrs->SetGUID(MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE,
                    MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_AUDCAP_GUID);
    MFEnumDeviceSources(pAttrs, &ppDevices, &count);
    ppDevices[iDevice]->ActivateObject(IID_PPV_ARGS(ppSource));
    ...
    // Clean up and return IMFMediaSource
}


CreateSourceFromAudioDevice calls the following:
1. The MFCreateAttributes function to create an empty attribute store.
2. The IMFAttributes::SetGUID method to specify the source type as an audio capture device.
3. The MFEnumDeviceSources function, which enumerates the capture sources and returns an array of IMFActivate interface pointers, indexed by device.
4. The IMFActivate::ActivateObject method, which creates a media source object to represent the Kinect sensor's microphone array and returns a pointer to the source object's IMFMediaSource interface.

Create the Topology
After creating and configuring the required nodes, MFRecord "snaps" them together to form the topology by calling the private CreateTopology method. MFRecord passes in the interfaces that represent the four nodes and the address of a pointer to an IMFTopology interface that will represent the completed topology, as follows:

HRESULT CreateTopology(IMFMediaSource *pSource, IMFTransform *pFilter,
                       IMFTransform *pEncoder, IMFMediaSink *pSink,
                       IMFTopology **ppTopology)
{
    IMFTopology *pTopology = NULL;
    IMFPresentationDescriptor *pPD = NULL;
    IMFStreamDescriptor *pSD = NULL;
    DWORD cStreams = 0;

    MFCreateTopology(&pTopology);
    pSource->CreatePresentationDescriptor(&pPD);
    pPD->GetStreamDescriptorCount(&cStreams);
    ...
    for (DWORD iStream = 0; iStream < cStreams; iStream++)
    {
        pPD->GetStreamDescriptorByIndex(iStream, &fSelected, &pSD);
        if (!fSelected)
        {
            continue;
        }
        GetStreamMajorType(pSD, &majorType);
        if (majorType != MFMediaType_Audio)
        {
            pPD->DeselectStream(iStream);
            continue;
        }
        CreateTopologyBranch(pTopology, pSource, pPD, pSD, pFilter, pEncoder, pSink);
        break;
    }
    // Clean up and return IMFTopology
}


CreateTopology creates a topology object by calling the MFCreateTopology function, which returns the object's IMFTopology interface. CreateTopology then:
1. Calls the source object's IMFMediaSource::CreatePresentationDescriptor method to retrieve the presentation descriptor.
2. Calls the presentation descriptor's IMFPresentationDescriptor::GetStreamDescriptorCount method to get the number of stream descriptors.
3. Iterates through the stream descriptors until it reaches the one for the Kinect sensor's microphone array.

CreateTopology then calls the private CreateTopologyBranch method to assemble the topology. CreateTopology passes in the interfaces that represent the four nodes, the presentation and stream descriptors, and the topology's IMFTopology interface, as follows:

HRESULT CreateTopologyBranch(IMFTopology *pTopology, IMFMediaSource *pSource,
                             IMFPresentationDescriptor *pPD, IMFStreamDescriptor *pSD,
                             IMFTransform *pFilter, IMFTransform *pEncoder,
                             IMFMediaSink *pSink)
{
    IMFTopologyNode *pSourceNode = NULL;
    IMFTopologyNode *pOutputNode = NULL;
    IMFTopologyNode *pFilterTransformNode = NULL;
    IMFTopologyNode *pEncoderTransformNode = NULL;

    CreateSourceNode(pSource, pPD, pSD, &pSourceNode);
    CreateOutputNode(pSink, 0, &pOutputNode);
    CreateTransformNode(pFilter, &pFilterTransformNode);
    CreateTransformNode(pEncoder, &pEncoderTransformNode);

    pTopology->AddNode(pSourceNode);
    pTopology->AddNode(pOutputNode);
    pTopology->AddNode(pFilterTransformNode);
    pTopology->AddNode(pEncoderTransformNode);

    pSourceNode->ConnectOutput(0, pFilterTransformNode, 0);
    pFilterTransformNode->ConnectOutput(0, pEncoderTransformNode, 0);
    pEncoderTransformNode->ConnectOutput(0, pOutputNode, 0);
    ...
    // Clean up and return
}

From the topology's perspective, the source, sink, and transform objects are packaged as node objects, each represented by an IMFTopologyNode interface. To create the topology, CreateTopologyBranch calls the following:
1. Several private methods to create the nodes, as discussed in the following sections.
2. The IMFTopology::AddNode method to add the nodes to the topology.


3. The IMFTopologyNode::ConnectOutput method on the source node, passing in the MSRKinectAudio DMO node's IMFTopologyNode interface.
4. The IMFTopologyNode::ConnectOutput method on the MSRKinectAudio DMO node, passing in the WMA encoder node's IMFTopologyNode interface.
5. The IMFTopologyNode::ConnectOutput method on the WMA encoder node, passing in the sink node's IMFTopologyNode interface.

Create the Source Node
CreateSourceNode creates the topology's source node, as follows:

HRESULT CreateSourceNode(IMFMediaSource *pSource, IMFPresentationDescriptor *pPD,
                         IMFStreamDescriptor *pSD, IMFTopologyNode **ppNode)
{
    IMFTopologyNode *pNode = NULL;

    MFCreateTopologyNode(MF_TOPOLOGY_SOURCESTREAM_NODE, &pNode);

    pNode->SetUnknown(MF_TOPONODE_SOURCE, pSource);
    pNode->SetUnknown(MF_TOPONODE_PRESENTATION_DESCRIPTOR, pPD);
    pNode->SetUnknown(MF_TOPONODE_STREAM_DESCRIPTOR, pSD);

    *ppNode = pNode;
    (*ppNode)->AddRef();
    ...
    // Clean up and return
}

CreateSourceNode creates a new source node by calling the MFCreateTopologyNode function and passing in the MF_TOPOLOGY_SOURCESTREAM_NODE value from the MF_TOPOLOGY_TYPE enumeration. CreateSourceNode then configures the node by calling IMFTopologyNode::SetUnknown to set the following attributes:
• The MF_TOPONODE_SOURCE attribute specifies the node's source object.
• The MF_TOPONODE_PRESENTATION_DESCRIPTOR attribute specifies the node's presentation descriptor.
• The MF_TOPONODE_STREAM_DESCRIPTOR attribute specifies the node's stream descriptor.

CreateSourceNode then calls AddRef on the node object to hold a reference to the object, which ensures that the object is not deleted until the application is finished with it.

Create the Transform Nodes
CreateTransformNode creates the DMO and WMA encoder nodes, as follows:

HRESULT CreateTransformNode(IMFTransform *pMft, IMFTopologyNode **ppNode)
{
    IMFTopologyNode *pNode = NULL;

    MFCreateTopologyNode(MF_TOPOLOGY_TRANSFORM_NODE, &pNode);
    pNode->SetObject(pMft);


    *ppNode = pNode;
    (*ppNode)->AddRef();
    ...
    // Clean up and return
}

CreateTransformNode creates a new transform node by calling the MFCreateTopologyNode function with the first parameter set to the MF_TOPOLOGY_TRANSFORM_NODE value from the MF_TOPOLOGY_TYPE enumeration. CreateTransformNode then calls the IMFTopologyNode::SetObject method to specify the transform object and calls AddRef to hold a reference on the node object.

Create the Sink Node
CreateOutputNode creates the sink node, as follows:

HRESULT CreateOutputNode(IMFMediaSink *pSink, DWORD iStream, IMFTopologyNode **ppNode)
{
    IMFTopologyNode *pNode = NULL;
    IMFStreamSink *pStream = NULL;

    pSink->GetStreamSinkByIndex(iStream, &pStream);
    MFCreateTopologyNode(MF_TOPOLOGY_OUTPUT_NODE, &pNode);
    pNode->SetObject(pStream);

    *ppNode = pNode;
    (*ppNode)->AddRef();
    ...
    // Clean up and return
}

CreateOutputNode first calls the IMFMediaSink::GetStreamSinkByIndex method, passing in the stream index, to get the associated IMFStreamSink interface. CreateOutputNode then creates a new sink node by calling the MFCreateTopologyNode function with the first parameter set to the MF_TOPOLOGY_OUTPUT_NODE value from the MF_TOPOLOGY_TYPE enumeration. Finally, CreateOutputNode calls the IMFTopologyNode::SetObject method to specify the stream sink object and calls AddRef to hold a reference on the node object.

Capture the Audio Stream
As the following code example shows, MFRecord calls the private RunMediaSession method to run a Media Foundation session that captures the Kinect sensor's audio stream by using the topology created in the previous sections. The session runs until the user stops it by entering 's' or 'S'.

HRESULT RunMediaSession(IMFTopology *pTopology)
{
    IMFMediaSession *pSession = NULL;
    IMFMediaEvent *pEvent = NULL;
    PROPVARIANT varStartPosition;

    PropVariantInit(&varStartPosition);

    MFCreateMediaSession(NULL, &pSession);
    pSession->SetTopology(0, pTopology);
    ...
}


RunMediaSession starts by calling the MFCreateMediaSession function to create a session object and return the object's IMFMediaSession interface. RunMediaSession then calls the IMFMediaSession::SetTopology method to specify the topology to be used for the session.

The session object maintains a queue of events. The session itself is handled by a while loop that retrieves the top event in the queue and responds appropriately, as follows:

HRESULT RunMediaSession(IMFTopology *pTopology)
{
    ...
    while (bGetAnotherEvent)
    {
        MediaEventType meType = MEUnknown;
        MF_TOPOSTATUS TopoStatus = MF_TOPOSTATUS_INVALID;

        pSession->GetEvent(0, &pEvent);
        pEvent->GetStatus(&hrEventStatus);
        pEvent->GetType(&meType);
        ...
}

The loop first calls the IMFMediaSession::GetEvent method to get the top event from the queue and then calls the event object's IMFMediaEvent::GetStatus and IMFMediaEvent::GetType methods to get the event's status and type, respectively.

The following switch handles the various event types and prints notifications to the console window for all events; the notification code is omitted here for brevity:

HRESULT RunMediaSession(IMFTopology *pTopology)
{
    ...
    switch (meType)
    {
    case MESessionTopologySet:
        break;

    case MESessionTopologyStatus:
        pEvent->GetUINT32(MF_EVENT_TOPOLOGY_STATUS, (UINT32*)&TopoStatus);
        switch (TopoStatus)
        {
        case MF_TOPOSTATUS_READY:
            hr = pSession->Start(&GUID_NULL, &varStartPosition);
            break;
        case MF_TOPOSTATUS_ENDED:
            break;
        }
        break;

    case MESessionStarted:
        while (TRUE)
        {
            int ch = _getch();
            if (ch == 's' || ch == 'S')
            {


                hr = pSession->Stop();
                break;
            }
        }
        break;

    case MESessionEnded:
        hr = pSession->Stop();
        break;

    case MESessionStopped:
        hr = pSession->Close();
        break;

    case MESessionClosed:
        bGetAnotherEvent = FALSE;
        break;

    default:
        break;
    }

    SAFE_RELEASE(pEvent);
}

exit:
    if (pSession != NULL)
    {
        pSession->Shutdown();
    }
    PropVariantClear(&varStartPosition);
    ...
    // Clean up and return
}

There are a number of possible Media Foundation events. The loop handles them in the following ways:
• MESessionTopologySet occurs after the topology has been resolved and is ready for playback. The loop prints a notification to the console window but takes no action.
• MESessionTopologyStatus occurs when the topology status changes. The loop calls the IMFMediaEvent::GetUINT32 method to get the status code, which is one of the following:
  • MF_TOPOSTATUS_READY indicates that the topology is ready, so the loop calls the IMFMediaSession::Start method to start the session.
  • MF_TOPOSTATUS_ENDED indicates that capture is complete. The session is not over until the session object raises an MESessionEnded event, so the loop simply prints a notification to the console window.


• MESessionStarted indicates that the session has started. While the capture session runs on a background thread, the loop waits until the user enters 's' or 'S' and then calls the IMFMediaSession::Stop method to stop the session.
• MESessionEnded is raised after the last presentation in the queue has been handled. The loop calls IMFMediaSession::Stop to stop the session.
• MESessionStopped indicates that IMFMediaSession::Stop has completed and the session has stopped. The loop calls the IMFMediaSession::Close method to close the session and release resources.
• MESessionClosed indicates that the session has been closed. The loop sets the loop variable to FALSE, which terminates the loop.

RunMediaSession then cleans up, and the application terminates.

Resources
The following topics on the MSDN website provide additional information about Windows audio and related topics:
• Multiple Channel Audio Data and WAVE Files
• Programming DirectX with COM

For more information about implementing audio and related samples, see the documentation and samples contained within the Kinect for Windows SDK.