RIFF WAVE (.WAV) file format
----------------------------

The following is taken from RIFFMCI.RTF, "Multimedia Programming Interface
and Data Specification v1.0", a Windows RTF (Rich Text Format) file contained
in the .zip file, RMRTF.ZRT.  The original document is quite long and this
constitutes pages 83-95 of the text format version (starting on roughly
page 58 of the RTF version). 


About the RIFF Tagged File Format

     
     RIFF (Resource Interchange File Format) is the tagged file structure
     developed for multimedia resource files.  The structure of a RIFF file
     is similar to the structure of an Electronic Arts IFF file. RIFF is
     not actually a file format itself (since it does not represent a
     specific kind of information), but its name contains the words
     ``interchange file format'' in recognition of its roots in IFF. Refer
     to the EA IFF definition document, EA IFF 85 Standard for Interchange
     Format Files, for a list of reasons to use a tagged file format.
     
     RIFF has a counterpart, RIFX, that is used to define RIFF file formats
     that use the Motorola integer byte- ordering format rather than the
     Intel format. A RIFX file is the same as a RIFF file, except that the
     first four bytes are `RIFX' instead of `RIFF', and integer byte
     ordering is represented in Motorola format.

Notation Conventions

     
     The following table lists some of the notation conventions used in
     this document. Further conventions and the notation for documenting
     RIFF forms are presented later in the document in the section
     ``Notation for Representing Sample RIFF Files.''
     
                        
     Notation               Description                            
     <element label>        RIFF file element with the label       
                            ``element label''                      
     <element label: TYPE>  RIFF file element with data type                   
			    ``TYPE''                               
     [<element label>] 	    Optional RIFF file element             

     <element label>...     One or more copies of the              
 			    specified element                      

     [<element label>]...   Zero or more copies of the             
 			    specified element                            
                        

Chunks

     
     The basic building block of a RIFF file is called a
     chunk. Using C syntax, a chunk can be defined as
     follows:
      
      typedef unsigned long DWORD;
      typedef unsigned char BYTE;
      
      typedef DWORD FOURCC;        // Four-character code
      
      typedef FOURCC CKID;         // Four-character-code chunk identifier
      typedef DWORD CKSIZE;        // 32-bit unsigned size
      value
      
      typedef struct {             // Chunk structure
          CKID     ckID;           // Chunk type identifier
          CKSIZE   ckSize;         // Chunk size field (size of ckData)
          BYTE     ckData[ckSize]; // Chunk data
      } CK;
     
     A FOURCC is represented as a sequence of one to four ASCII
     alphanumeric characters, padded on the right with blank characters
     (ASCII character value 32) as required, with no embedded blanks.
     
     For example, the four-character code `FOO' is stored as
     a sequence of four bytes: 'F', 'O', 'O', ' ' in
     ascending addresses. For quick comparisons, a four-
     character code may also be treated as a 32-bit number.
     
     The three parts of the chunk are described in the
     following table:
     
               
     Part      Description
     ckID      A four-character code that identifies the
               representation of the chunk data data. A
               program reading a RIFF file can skip over
               any chunk whose chunk ID it doesn't
               recognize; it simply skips the number of
               bytes specified by ckSize plus the pad
               byte, if present.
     ckSize    A 32-bit unsigned value identifying the
               size of ckData. This size value does not
               include the size of the ckID or ckSize
               fields or the pad byte at the end of
               ckData.
     ckData    Binary data of fixed or variable size. The
               start of ckData is word-aligned with
               respect to the start of the RIFF file. If
               the chunk size is an odd number of bytes, a
               pad byte with value zero is written after
               ckData. Word aligning improves access speed
               (for chunks resident in memory) and
               maintains compatibility with EA IFF. The
               ckSize value does not include the pad byte.
               
     
     We can represent a chunk with the following notation
     (in this example, the ckSize and pad byte are
     implicit):
      
      <ckID> ( <ckData> )
     
     Two types of chunks, the `LIST' and `RIFF' chunks, may
     contain nested chunks, or subchunks. These special
     chunk types are discussed later in this document. All
     other chunk types store a single element of binary data
     in <ckData>.


     Using the notation for representing a chunk, a RIFF form looks like
     the following:
      
      RIFF ( <formType> <ck>... )
     
     The first four bytes of a RIFF form make up a chunk ID with values
     `R', `I', `F', `F'. The ckSize field is required, but for simplicity
     it is omitted from the notation.
     
     The first DWORD of chunk data in the `RIFF' chunk (shown above as
     <formType>) is a four-character code value identifying the data
     representation, or form type, of the file. Following the form-type
     code is a series of subchunks. Which subchunks are present depends on
     the form type.



          Waveform Audio File Format (WAVE)


               This section describes the Waveform format, which is used to
               represent digitized sound.

               The WAVE form is defined as follows. Programs must expect
               (and ignore) any unknown chunks encountered, as with all
               RIFF forms. However, <fmt-ck> must always occur before
               <wave-data>, and both of these chunks are mandatory in a
               WAVE file.

                <WAVE-form> ->
                      RIFF( 'WAVE'
                           <fmt-ck>               // Format
                           [<fact-ck>]            // Fact chunk
                           [<cue-ck>]             // Cue points
                           [<playlist-ck>]        // Playlist
                           [<assoc-data-list>]    // Associated data list
                           <wave-data>   )        // Wave data

               The WAVE chunks are described in the following sections.


          WAVE Format Chunk


               The WAVE format chunk <fmt-ck> specifies the format of the
               <wave-data>. The <fmt-ck> is defined as follows:

                <fmt-ck> ->   fmt( <common-fields>
                                 <format-specific-fields> )

                <common-fields> ->
                      struct
                      {
                         WORD wFormatTag;         // Format category
                         WORD wChannels;          // Number of channels
                         DWORDdwSamplesPerSec;    // Sampling rate
                         DWORDdwAvgBytesPerSec;   // For buffer estimation
                         WORD wBlockAlign;        // Data block size
                      }

               The fields in the <common-fields> chunk are as follows:



               Field             Description
				
               wFormatTag     	 A number indicating the WAVE format
                              	 category of the file. The content of
                              	 the <format-specific-fields> portion
                              	 of the `fmt' chunk, and the
                              	 interpretation of the waveform data,
                              	 depend on this value.
				
                              	 You must register any new WAVE format
                              	 categories. See ``Registering
                              	 Multimedia Formats'' in Chapter 1,
                              	 ``Overview of Multimedia
                              	 Specifications,'' for information on
                              	 registering WAVE format categories.
				
                              	 ``Wave Format Categories,'' following
                              	 this section, lists the currently
                              	 defined WAVE format categories.
				
               wChannels      	 The number of channels represented in
                              	 the waveform data, such as 1 for mono
                              	 or 2 for stereo.
				
               dwSamplesPerSec   The sampling rate (in samples per
                              	 second) at which each channel should
				 be played.

               dwAvgBytesPerSec  The average number of bytes per second
                              	 at which the waveform data should be
                              	 transferred. Playback software can
				 estimate the buffer size using this value.
                              	
               wBlockAlign 	 The block alignment (in bytes) of the
                              	 waveform data. Playback software needs
                              	 to process a multiple of wBlockAlign
                              	 bytes of data at a time, so the value
				 of wBlockAlign can be used for buffer
				 alignment.

               The <format-specific-fields> consists of zero or more bytes
               of parameters. Which parameters occur depends on the WAVE
               format category-see the following section for details.
               Playback software should be written to allow for (and
               ignore) any unknown <format-specific-fields> parameters that
               occur at the end of this field.



          WAVE Format Categories


               The format category of a WAVE file is specified by the value
               of the wFormatTag field of the `fmt' chunk. The
               representation of data in <wave-data>, and the content of
               the <format-specific-fields> of the `fmt' chunk, depend on
               the format category.

               The currently defined open non-proprietary WAVE format
               categories are as follows:



               wFormatTag Value         Format Category


               WAVE_FORMAT_PCM (0x0001) Microsoft Pulse Code
                                        Modulation (PCM) format



               The following are the registered proprietary WAVE format
               categories:



               wFormatTag Value         Format Category


               IBM_FORMAT_MULAW         IBM mu-law format
               (0x0101)

               IBM_FORMAT_ALAW (0x0102) IBM a-law format

               IBM_FORMAT_ADPCM         IBM AVC Adaptive
               (0x0103)                 Differential Pulse Code
                                        Modulation format



               The following sections describe the Microsoft
               WAVE_FORMAT_PCM format.


               Pulse Code Modulation (PCM) Format


               If the wFormatTag field of the <fmt-ck> is set to
               WAVE_FORMAT_PCM, then the waveform data consists of samples
               represented in pulse code modulation (PCM) format. For PCM
               waveform data, the <format-specific-fields> is defined as
               follows:

                <PCM-format-specific> ->
                      struct
                      {
                         WORD wBitsPerSample;      // Sample size
                      }

               The wBitsPerSample field specifies the number of bits of
               data used to represent each sample of each channel. If there
               are multiple channels, the sample size is the same for each
               channel.

               For PCM data, the wAvgBytesPerSec field of the `fmt' chunk
               should be equal to the following formula rounded up to the
               next whole number:

                                              wBitsPerSample
                 wChannels x wBitsPerSecond x --------------
                                                     8

               The wBlockAlign field should be equal to the following
               formula, rounded to the next whole number:

                             wBitsPerSample
                 wChannels x --------------
                                    8

               Data Packing for PCM WAVE Files

               In a single-channel WAVE file, samples are stored
               consecutively. For stereo WAVE files, channel 0 represents
               the left channel, and channel 1 represents the right
               channel. The speaker position mapping for more than two
               channels is currently undefined. In multiple-channel WAVE
               files, samples are interleaved.

               The following diagrams show the data packing for a 8-bit
               mono and stereo WAVE files:


                     Sample 1     Sample 2     Sample 3    Sample 4


                     Channel 0    Channel 0   Channel 0    Channel 0



                             Data Packing for 8-Bit Mono PCM



                            Sample 1                 Sample 2

                     Channel 0    Channel 1   Channel 0    Channel 0
                      (left)       (right)      (left)      (right)



                            Data Packing for 8-Bit Stereo PCM



               The following diagrams show the data packing for 16-bit mono
               and stereo WAVE files:


                            Sample 1                 Sample 2

                     Channel 0    Channel 0   Channel 0    Channel 0

                     low-order   high-order   low-order   high-order
                       byte         byte         byte        byte


                             Data Packing for 16-Bit Mono PCM



                                        Sample 1

                     Channel 0    Channel 0   Channel 1    Channel 1
                      (left)       (left)      (right)      (right)
                     low-order   high-order   low-order   high-order
                       byte         byte         byte        byte


                            Data Packing for 16-Bit Stereo PCM



               Data Format of the Samples

               Each sample is contained in an integer i. The size of i is
               the smallest number of bytes required to contain the
               specified sample size. The least significant byte is stored
               first. The bits that represent the sample amplitude are
               stored in the most significant bits of i, and the remaining
               bits are set to zero.

               For example, if the sample size (recorded in nBitsPerSample)
               is 12 bits, then each sample is stored in a two-byte
               integer. The least significant four bits of the first (least
               significant) byte is set to zero.

               The data format and maximum and minimums values for PCM
               waveform samples of various sizes are as follows:



               Sample Size  Data Format Maximum Value  Minimum Value


               One to       Unsigned    255 (0xFF)     0
               eight bits   integer

               Nine or      Signed      Largest        Most negative
               more bits    integer i   positive       value of i
                                        value of i


               For example, the maximum, minimum, and midpoint values for
               8-bit and 16-bit PCM waveform data are as follows:

               Format       Maximum     Minimum Value  Midpoint
                            Value                      Value


               8-bit PCM    255 (0xFF)  0              128 (0x80)

               16-bit PCM   32767       -32768         0
                            (0x7FFF)    (-0x8000)


               Examples of PCM WAVE Files

               Example of a PCM WAVE file with 11.025 kHz sampling rate,
               mono, 8 bits per sample:

                RIFF( 'WAVE'     fmt(1, 1, 11025, 11025, 1, 8)
                              data( <wave-data> ) )

               Example of a PCM WAVE file with 22.05 kHz sampling rate,
               stereo, 8 bits per sample:

                RIFF( 'WAVE'     fmt(1, 2, 22050, 44100, 2, 8)
                              data( <wave-data> ) )

               Example of a PCM WAVE file with 44.1 kHz sampling rate,
               mono, 20 bits per sample:

                RIFF( 'WAVE'     INFO(INAM("O Canada"Z))
                              fmt(1, 1, 44100, 132300, 3, 20)
                              data( <wave-data> ) )


          Storage of WAVE Data


               The <wave-data> contains the waveform data. It is defined as
               follows:

                <wave-data>  ->  { <data-ck> : <data-list> }

                <data-ck>    ->  data( <wave-data> )

                <wave-list>  ->  LIST( 'wavl' { <data-ck> :  // Wave samples
                                           <silence-ck> }... )   // Silence

                <silence-ck> ->  slnt( <dwSamples:DWORD> )   // Count of
                                                             // silent samples

               Note:  The `slnt' chunk represents silence, not necessarily
               a repeated zero volume or baseline sample. In 16-bit PCM
               data, if the last sample value played before the silence
               section is a 10000, then if data is still output to the D to
               A converter, it must maintain the 10000 value. If a zero
               value is used, a click may be heard at the start and end of
               the silence section. If play begins at a silence section,
               then a zero value might be used since no other information
               is available. A click might be created if the data following
               the silent section starts with a nonzero value.


          FACT Chunk


               The <fact-ck> fact chunk stores important information about
               the contents of the WAVE file. This chunk is defined as
               follows:

                <fact-ck> -> fact( <dwFileSize:DWORD> )     // Number
                                                            // of samples

               The `fact'' chunk is required if the waveform data is
               contained in a `wavl'' LIST chunk and for all compressed
               audio formats. The chunk is not required for PCM files using
               the `data'' chunk format.

               The "fact" chunk will be expanded to include any other
               information required by future WAVE formats. Added fields
               will appear following the <dwFileSize> field. Applications
               can use the chunk size field to determine which fields are
               present.


          Cue-Points Chunk


               The <cue-ck> cue-points chunk identifies a series of
               positions in the waveform data stream. The <cue-ck> is
               defined as follows:

                <cue-ck> ->   cue( <dwCuePoints:DWORD>    // Count of cue points
                                   <cue-point>... )       // Cue-point
                table

                <cue-point> ->   struct {
                                 DWORD  dwName;
                                 DWORD  dwPosition;
                                 FOURCC fccChunk;
                                 DWORD  dwChunkStart;
                                 DWORD  dwBlockStart;
                                 DWORD  dwSampleOffset;
                              }

               The <cue-point> fields are as follows:



               Field          Description

               dwName         Specifies the cue point name. Each
                              <cue-point> record must have a unique
                              dwName field.

               dwPosition     Specifies the sample position of the
                              cue point. This is the sequential
                              sample number within the play order.
                              See ``Playlist Chunk,'' later in this
                              document, for a discussion of the play
                              order.

               fccChunk       Specifies the name or chunk ID of the
                              chunk containing the cue point.

               dwChunkStart   Specifies the file position of the
                              start of the chunk containing the cue
                              point. This is a byte offset relative
                              to the start of the data section of
                              the `wavl' LIST chunk.

               dwBlockStart   Specifies the file position of the
                              start of the block containing the
                              position. This is a byte offset
                              relative to the start of the data
                              section of the `wavl' LIST chunk.

               dwSampleOffset Specifies the sample offset of the cue
                              point relative to the start of the
                              block.




               Examples of File Position Values


               The following table describes the <cue-point> field values
               for a WAVE file containing multiple `data' and `slnt' chunks
               enclosed in a `wavl' LIST chunk:



               Cue Point     Field         Value
               Location


               In a `slnt'   fccChunk      FOURCC value `slnt'.
               chunk

                             dwChunkStart  File position of the
                                           `slnt' chunk relative to
                                           the start of the data
                                           section in the `wavl' LIST
                                           chunk.

                             dwBlockStart  File position of the data
                                           section of the `slnt'
                                           chunk relative to the
                                           start of the data section
                                           of the `wavl' LIST chunk.

                             dwSampleOffs  Sample position of the cue
                             et            point relative to the
                                           start of the `slnt' chunk.

               In a PCM      fccChunk      FOURCC value `data'.
               `data' chunk

                             dwChunkStart  File position of the
                                           `data' chunk relative to
                                           the start of the data
                                           section in the `wavl' LIST
                                           chunk.

                             dwBlockStart  File position of the cue
                                           point relative to the
                                           start of the data section
                                           of the `wavl' LIST chunk.

                             dwSampleOffs  Zero value.
                             et

               In a          fccChunk      FOURCC value `data'.
               compressed
               `data' chunk

                             dwChunkStart  File position of the start
                                           of the `data' chunk
                                           relative to the start of
                                           the data section of the
                                           `wavl' LIST chunk.

                             dwBlockStart  File position of the
                                           enclosing block relative
                                           to the start of the data
                                           section of the `wavl' LIST
                                           chunk. The software can
                                           begin the decompression at
                                           this point.

                             dwSampleOffs  Sample position of the cue
                             et            point relative to the
                                           start of the block.



               The following table describes the <cue-point> field values
               for a WAVE file containing a single `data' chunk:

               Cue Point     Field         Value
               Location


               Within PCM    fccChunk      FOURCC value `data'.
               data

                             dwChunkStart  Zero value.

                             dwBlockStart  Zero value.

                             dwSampleOffs  Sample position of the cue
                             et            point relative to the
                                           start of the `data' chunk.

               In a          fccChunk      FOURCC value `data'.
               compressed
               `data' chunk

                             dwChunkStart  Zero value.

                             dwBlockStart  File position of the
                                           enclosing block relative
                                           to the start of the `data'
                                           chunk. The software can
                                           begin the decompression at
                                           this point.

                             dwSampleOffs  Sample position of the cue
                             et            point relative to the
                                           start of the block.



          Playlist Chunk


               The <playlist-ck> playlist chunk specifies a play order for
               a series of cue points. The <playlist-ck> is defined as
               follows:

                <playlist-ck> ->   plst(
                                 <dwSegments:DWORD>    // Count of play
                segments
                                 <play-segment>... )   // Play-segment
                table

                <play-segment> ->  struct {
                                   DWORD dwName;
                                   DWORD dwLength;
                                   DWORD dwLoops;
                                 }

               The <play-segment> fields are as follows:

               Field          Description


               dwName         Specifies the cue point name. This
                              value must match one of the names
                              listed in the <cue-ck> cue-point
                              table.

               dwLength       Specifies the length of the section in
                              samples.

               dwLoops        Specifies the number of times to play
                              the section.




          Associated Data Chunk


               The <assoc-data-list> associated data list provides the
               ability to attach information like labels to sections of the
               waveform data stream. The <assoc-data-list> is defined as
               follows:

                <assoc-data-list> ->  LIST('adtl'
                                        <labl-ck>                // Label
                                        <note-ck>                // Note
                                        <ltxt-ck>                // Text
                with data length
                                        <file-ck> )              // Media
                file

                <labl-ck> ->       labl(<dwName:DWORD>
                                        <data:ZSTR> )

                <note-ck> ->       note(<dwName:DWORD>
                                        <data:ZSTR> )

                <ltxt-ck> ->       ltxt(<dwName:DWORD>
                                        <dwSampleLength:DWORD>
                                        <dwPurpose:DWORD>
                                        <wCountry:WORD>
                                        <wLanguage:WORD>
                                        <wDialect:WORD>
                                        <wCodePage:WORD>
                                        <data:BYTE>... )

                <file-ck> ->       file(<dwName:DWORD>
                                        <dwMedType:DWORD>
                                        <fileData:BYTE>...)




               Label and Note Information


               The `labl' and `note' chunks have similar fields. The `labl'
               chunk contains a label, or title, to associate with a cue
               point. The `note' chunk contains comment text for a cue
               point. The fields are as follows:



               Field          Description


               dwName         Specifies the cue point name.  This
                              value must match one of the names
                              listed in the <cue-ck> cue-point
                              table.

               data           Specifies a NULL-terminated string
                              containing a text label (for the
                              `labl' chunk) or comment text (for the
                              `note' chunk).




               Text with Data Length Information


               The ``ltxt'' chunk contains text that is associated with a
               data segment of specific length. The chunk fields are as
               follows:



               Field          Description


               dwName         Specifies the cue point name.  This
                              value must match one of the names
                              listed in the <cue-ck> cue-point
                              table.

               dwSampleLength Specifies the number of samples in the
                              segment of waveform data.

               dwPurpose      Specifies the type or purpose of the
                              text. For example, dwPurpose can
                              specify a FOURCC code like `scrp' for
                              script text or `capt' for close-
                              caption text.

               wCountry       Specifies the country code for the
                              text. See ``Country Codes'' in Chapter
                              2, ``Resource Interchange File
                              Format,'' for a current list of
                              country codes.

               wLanguage,     Specify the language and dialect codes
               wDialect       for the text. See ``Language and
                              Dialect Codes'' in Chapter 2,
                              ``Resource Interchange File Format,''
                              for a current list of language and
                              dialect codes.

               wCodePage      Specifies the code page for the text.




               Embedded File Information


               The `file' chunk contains information described in other
               file formats (for example, an `RDIB' file or an ASCII text
               file). The chunk fields are as follows:



               Field          Description


               dwName         Specifies the cue point name.  This
                              value must match one of the names
                              listed in the <cue-ck> cue-point
                              table.

               dwMedType      Specifies the file type contained in
                              the fileData field. If the fileData
                              section contains a RIFF form, the
                              dwMedType field is the same as the
                              RIFF form type for the file.

                              This field can contain a zero value.

               fileData       Contains the media file.
 