Matroska Media Container Codec Specifications

Introduction Matroska is a multimedia container format. It stores interleaved and timestamped audiovisual data using various codecs. To interpret the codec data, a mapping between the way the data is stored in Matroska and how it is understood by such a codec is necessary. This document defines this mapping for many commonly used codecs in Matroska.

Notation and Conventions The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 when, and only when, they appear in all capitals, as shown here.

Codec Mappings A Codec Mapping is a set of attributes to identify, name, and contextualize the format and characteristics of encoded data that can be contained within Matroska Clusters. Each TrackEntry used within Matroska MUST reference a defined Codec Mapping using the CodecID to identify and describe the format of the encoded data in its associated Clusters. This CodecID is an identifier, uniquely defined in the Matroska Codec IDs Registry, that represents the encoding stored within the Track. Certain encodings MAY also require some form of codec initialization to provide their decoder with context and technical metadata. The intention behind this list is not to list all existing audio and video codecs, but rather to list those codecs that are currently supported in Matroska and therefore need a well-defined CodecID so that all developers supporting Matroska will use the same CodecID. If you feel we missed support for a very important codec, please tell us on our development mailing list (cellar at ietf.org).

Defining Matroska Codec Support Support for a codec is defined in Matroska with the following values.

Codec ID Each codec supported for storage in Matroska MUST have a unique CodecID. Each CodecID MUST be prefixed with the string from the following table according to the associated Track Type of the codec. All characters of a Codec ID Prefix MUST be capital letters (A-Z) except for the last character of a Codec ID Prefix which MUST be an underscore ("_"). Codec ID Prefix by Track Type

Track Type	Codec ID Prefix
Video (1)	"V_"
Audio (2)	"A_"
Complex (3)	"O_"
Logo (16)	"L_"
Subtitle (17)	"S_"
Button (18)	"B_"
Control (32)	"C_"
Metadata (33)	"M_"

Each CodecID MUST include a Major Codec ID immediately following the Codec ID Prefix. A Major Codec ID MAY be followed by an OPTIONAL Codec ID Suffix to communicate a refinement of the Major Codec ID. If a Codec ID Suffix is used, then the CodecID MUST include a forward slash ("/") as a separator between the Major Codec ID and the Codec ID Suffix. The Major Codec ID MUST be composed of only capital letters (A-Z) and numbers (0-9). The Codec ID Suffix MUST be composed of only capital letters (A-Z), numbers (0-9), underscore ("_"), and forward slash ("/"). The following table provides examples of valid Codec IDs and their components: Codec ID Components

Codec ID Prefix	Major Codec ID	Separator	Codec ID Suffix	Codec ID
A_	AAC	/	MPEG2/LC/SBR	A_AAC/MPEG2/LC/SBR
V_	MPEG4	/	ISO/ASP	V_MPEG4/ISO/ASP
V_	MPEG1			V_MPEG1
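The composition rules above can be checked mechanically. The following is a minimal sketch, not a normative validator: the names `split_codec_id` and `prefix_matches_track_type` are hypothetical, and the prefix map is simply copied from the "Codec ID Prefix by Track Type" table.

```python
import re

# Prefixes copied from the "Codec ID Prefix by Track Type" table.
PREFIX_BY_TRACK_TYPE = {
    1: "V_", 2: "A_", 3: "O_", 16: "L_",
    17: "S_", 18: "B_", 32: "C_", 33: "M_",
}

# Prefix: capital letters then "_"; Major Codec ID: A-Z and 0-9;
# optional "/" separator followed by a suffix of A-Z, 0-9, "_" and "/".
CODEC_ID_RE = re.compile(
    r"(?P<prefix>[A-Z]+_)"
    r"(?P<major>[A-Z0-9]+)"
    r"(?:/(?P<suffix>[A-Z0-9_/]+))?"
)


def split_codec_id(codec_id):
    """Return (prefix, major, suffix) or None if the string is malformed."""
    match = CODEC_ID_RE.fullmatch(codec_id)
    if match is None:
        return None
    return match.group("prefix"), match.group("major"), match.group("suffix")


def prefix_matches_track_type(codec_id, track_type):
    """True if the CodecID prefix is the one listed for the given TrackType."""
    parts = split_codec_id(codec_id)
    return parts is not None and PREFIX_BY_TRACK_TYPE.get(track_type) == parts[0]
```

The sketch only checks syntax; whether a CodecID is actually registered still has to be looked up in the registry.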

Codec Name Each encoding supported for storage in Matroska MUST have a Codec Name. The Codec Name provides a readable label for the encoding.

Description An optional description for the encoding. This value is only intended for human consumption.

Initialization Each encoding supported for storage in Matroska MUST have a defined Initialization. The Initialization MUST describe the storage of data necessary to initialize the decoder, which MUST be stored within the CodecPrivate element. When the Initialization is updated within a track, then that updated Initialization data MUST be written into the CodecState element of the first Cluster to require it. If the encoding does not require any form of Initialization, then none MUST be used to define the Initialization and the CodecPrivate element SHOULD NOT be written and MUST be ignored. If the TrackEntry contains a CodecPrivate element, its data MUST be provided to the decoder.

Codec BlockAdditions Additional data that contextualizes or supplements a Block can be stored within the BlockAdditional element of a BlockMore element . Each BlockAdditional is coupled with a BlockAddID that identifies the kind of data it contains. A BlockAddID of 1 means the data in the BlockAdditional element are tied to the codec. This BlockAdditional data with a BlockAddID of 1 MAY be passed to the associated decoder alongside the Block content . A codec definition MUST contain a "Codec BlockAdditions" section if the codec can use BlockAdditional data with a BlockAddID of 1. The BlockAddID values are defined in .

Citation Documentation of the associated normative and informative references for the codec is RECOMMENDED.

Superseded By When a Superseded By is set, the superseding CodecID value MUST be used instead of the superseded CodecID. Files MAY exist with the superseded CodecID and MAY be supported by Matroska Players.

Recommendations for the Creation of New Codec Mappings Creators of new Codec Mappings to be used in the context of Matroska:

MUST assume that all Codec Mappings they create might become standardized, public, commonly deployed, or usable across multiple implementations.
MUST employ meaningful values for CodecID and Codec Name that are not already included in the Matroska Codec IDs Registry, and are not otherwise known or suspected to be in use, even if they are not already registered.
MUST NOT prefix their CodecID with "X_" or similar constructs.

These recommendations are based on .

Video Codec Mappings All codecs described in this section MUST have a TrackType () value of "1" for video. The track using these codecs MUST contain a Video element -- i.e., EBML Path \Segment\Tracks\TrackEntry\Video. Most video codecs contain meta information about the data they contain, like encoded width and height, chroma subsampling, etc. Whenever possible, this information inside the codec SHOULD be extracted and repeated at the Matroska level with the appropriate element(s) inside the \Segment\Tracks\TrackEntry\Video and \Segment\Tracks\TrackEntry elements. These values MUST be valid for the whole Segment.

V_AV1 Codec ID: V_AV1 Codec Name: Alliance for Open Media AV1 Video codec Description: Only one Sequence Header OBU, as defined in section 6.4 of , is supported per Matroska Segment. Each Block contains one Temporal Unit containing one or more OBUs. Each OBU stored in the Block MUST contain its header and its payload. The OBUs in the Block follow the Low Overhead Bitstream Format syntax. They MUST have the obu_has_size_field set to 1 except for the last OBU in the frame, for which obu_has_size_field MAY be set to 0, in which case it is assumed to fill the remainder of the frame. A SimpleBlock MUST NOT be marked as a Keyframe if it doesn't contain a Frame OBU. A SimpleBlock MUST NOT be marked as a Keyframe if the first Frame OBU doesn't have a frame_type of KEY_FRAME. A SimpleBlock MUST NOT be marked as a Keyframe if it doesn't contain a Sequence Header OBU. A Block inside a BlockGroup MUST use ReferenceBlock elements if the first Frame OBU in the Block has a frame_type other than KEY_FRAME. A Block inside a BlockGroup MUST use ReferenceBlock elements if the Block doesn't contain a Sequence Header OBU. A Block with frame_header_obu where the frame_type is INTRA_ONLY_FRAME MUST use a ReferenceBlock with a value of 0 to reference itself. Initialization: The CodecPrivate consists of the AV1CodecConfigurationRecord described in section 2.3 of . PixelWidth: MUST be max_frame_width_minus_1+1 of the Sequence Header OBU. PixelHeight: MUST be max_frame_height_minus_1+1 of the Sequence Header OBU.
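The three Keyframe conditions for a SimpleBlock can be combined into one predicate. The sketch below assumes the OBUs of a Block have already been parsed into a simple record; the `Obu` class, its `kind` strings, and the function names are hypothetical and only illustrate the rules, not an actual AV1 parser.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Obu:
    """Hypothetical, already-parsed view of one OBU inside a Block."""
    kind: str                         # e.g. "sequence_header" or "frame"
    frame_type: Optional[str] = None  # e.g. "KEY_FRAME", "INTRA_ONLY_FRAME"


def may_mark_keyframe(obus: List[Obu]) -> bool:
    """True only if none of the three MUST NOT conditions applies."""
    frames = [obu for obu in obus if obu.kind == "frame"]
    if not frames:
        return False  # no Frame OBU in the Block
    if frames[0].frame_type != "KEY_FRAME":
        return False  # first Frame OBU is not a KEY_FRAME
    # A Sequence Header OBU must also be present in the Block.
    return any(obu.kind == "sequence_header" for obu in obus)


def pixel_dimensions(max_frame_width_minus_1: int,
                     max_frame_height_minus_1: int) -> Tuple[int, int]:
    """PixelWidth and PixelHeight derived from the Sequence Header OBU."""
    return max_frame_width_minus_1 + 1, max_frame_height_minus_1 + 1
```

The same predicate, negated, indicates when a Block inside a BlockGroup needs ReferenceBlock elements.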

V_AVS2 Codec ID: V_AVS2 Codec Name: AVS2-P2/IEEE.1857.4 Description: Individual pictures of AVS2-P2 stored as described in the second part of . Initialization: none

V_AVS3 Codec ID: V_AVS3 Codec Name: AVS3-P2/IEEE.1857.10 Description: Individual pictures of AVS3-P2 stored as described in the second part of . Initialization: none

V_CAVS Codec ID: V_CAVS Codec Name: AVS1-P2, JiZhun profile Description: Individual pictures of AVS1-P2 stored as described in . Initialization: none

V_DIRAC Codec ID: V_DIRAC Codec Name: BBC Dirac Description: A video codec developed by the BBC . The Intra-only version of Dirac, also known as Dirac Pro, resulted in SMPTE VC-2 . Each Matroska frame corresponds to a Sequence as defined in . Initialization: none

V_FFV1 Codec ID: V_FFV1 Codec Name: FF Video Codec 1 Description: FFV1 is a lossless intra-frame video encoding format designed to efficiently compress video data in a variety of pixel formats. Compared to uncompressed video, FFV1 offers storage compression, frame fixity, and self-description, which makes FFV1 useful as a preservation or intermediate video format. Initialization: For FFV1 versions 0 or 1, CodecPrivate SHOULD NOT be written. For FFV1 version 3 or greater, the CodecPrivate MUST contain the FFV1 Configuration Record structure, as defined in , and no other data.

V_JPEG2000 Codec ID: V_JPEG2000 Codec Name: JPEG 2000 Description: Each Matroska frame corresponds to a JPEG 2000 image, as defined in . Initialization: none

V_MJPEG Codec ID: V_MJPEG Codec Name: Motion JPEG Description: Motion JPEG is a video compression format in which each video frame or interlaced field is compressed separately as a JPEG image. Initialization: none

V_MPEGH/ISO/HEVC Codec ID: V_MPEGH/ISO/HEVC Codec Name: HEVC/H.265 Description: Individual pictures (which could be a frame, a field, or 2 fields having the same timestamp) of HEVC/H.265 stored as described in . Initialization: The CodecPrivate contains a HEVCDecoderConfigurationRecord structure, as defined in .

V_MPEGI/ISO/VVC Codec ID: V_MPEGI/ISO/VVC Codec Name: VVC/H.266 Description: Individual pictures (which could be a frame, a field, or 2 fields having the same timestamp) of VVC/H.266 stored as described in . Initialization: The CodecPrivate contains a VVCDecoderConfigurationRecord structure, as defined in .

V_MPEG1 Codec ID: V_MPEG1 Codec Name: MPEG 1 Description: Frames correspond to a Video Sequence as defined in . Initialization: none

V_MPEG2 Codec ID: V_MPEG2 Codec Name: MPEG 2 Description: Frames correspond to a Video Sequence as defined in . Initialization: none

V_MPEG4/ISO/AVC Codec ID: V_MPEG4/ISO/AVC Codec Name: AVC/H.264 Description: Individual pictures (which could be a frame, a field, or 2 fields having the same timestamp) of AVC/H.264 stored as described in . Initialization: The CodecPrivate contains an AVCDecoderConfigurationRecord structure, as defined in . For legacy reasons, because Block Additional Mappings are preferred; see , the AVCDecoderConfigurationRecord structure MAY be followed by an extension block beginning with a 4-byte extension block size field in big-endian byte order which is the size of the extension block minus 4 (excluding the size of the extension block size field) and a 4-byte field corresponding to a BlockAddIDType of "mvcC" followed by a content corresponding to the content of BlockAddIDExtraData for mvcC; see .
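The legacy extension block layout can be illustrated with a short parser. This is a sketch under stated assumptions: the length of the AVCDecoderConfigurationRecord is taken as already known from parsing that record (the hypothetical `avcc_length` parameter), and the 4-byte type field is assumed to hold the ASCII bytes of "mvcC".

```python
import struct


def split_avc_codec_private(codec_private: bytes, avcc_length: int):
    """Split CodecPrivate into (record, extension type, extension content).

    avcc_length is assumed to come from parsing the
    AVCDecoderConfigurationRecord itself. Returns (record, None, None)
    when no extension block follows the record.
    """
    record = codec_private[:avcc_length]
    rest = codec_private[avcc_length:]
    if not rest:
        return record, None, None
    if len(rest) < 8:
        raise ValueError("truncated extension block")
    # Big-endian size field: size of the extension block minus the
    # 4 bytes of the size field itself.
    (size,) = struct.unpack(">I", rest[:4])
    if size < 4 or size > len(rest) - 4:
        raise ValueError("extension block size out of range")
    ext_type = rest[4:8]          # expected to be b"mvcC" (assumption)
    content = rest[8:4 + size]    # BlockAddIDExtraData-style content
    return record, ext_type, content
```

Writers are better served by Block Additional Mappings; the parser is only useful for reading existing files.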

V_MPEG4/ISO/AP Codec ID: V_MPEG4/ISO/AP Codec Name: MPEG4 ISO Advanced Profile Description: Frames correspond to frames defined in . The stream was created via an improved codec API (UCI) or transmuxed from MP4, not simply transmuxed from AVI. Note that B-frames are handled differently in these streams than in a VfW-created stream, because no dummy frames are inserted here. Initialization: none

V_MPEG4/ISO/ASP Codec ID: V_MPEG4/ISO/ASP Codec Name: MPEG4 ISO Advanced Simple Profile (DivX5, XviD) Description: Frames correspond to frames defined in . The stream was created via an improved codec API (UCI) or transmuxed from MP4, not simply transmuxed from AVI. Note that B-frames are handled differently in these streams than in a VfW-created stream, because no dummy frames are inserted here. Initialization: none

V_MPEG4/ISO/SP Codec ID: V_MPEG4/ISO/SP Codec Name: MPEG4 ISO Simple Profile (DivX4) Description: Frames correspond to frames defined in . The stream was created via an improved codec API (UCI) or even transmuxed from AVI (there are no B-frames in Simple Profile). Initialization: none

V_MPEG4/MS/V3 Codec ID: V_MPEG4/MS/V3 Codec Name: Microsoft MPEG4 V3 Description: Microsoft MPEG4 V3 and derivatives, i.e., DivX3, Angelpotion, SMR, etc. The stream was created using a VfW codec or transmuxed from AVI. Note that V1/V2 are covered by the "V_MS/VFW/FOURCC" CodecID. Initialization: none

V_MS/VFW/FOURCC Codec ID: V_MS/VFW/FOURCC Codec Name: Microsoft Video Codec Manager (VCM) Description: Video frames originating from Video For Windows, using the Microsoft Video Codec Manager codecs. This is a codec designed to be transmuxed back and forth from AVI sources. Initialization: The CodecPrivate contains the VCM structure BITMAPINFOHEADER including the extra private bytes, as defined in . The data are stored in little-endian format (like on IA32 machines).
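As an illustration of the little-endian layout, the sketch below unpacks the fixed 40-byte part of a BITMAPINFOHEADER from CodecPrivate and returns the extra private bytes separately. The function name and the returned dictionary keys are hypothetical; the field order follows the Windows structure definition.

```python
import struct

# DWORD biSize, LONG biWidth, LONG biHeight, WORD biPlanes, WORD biBitCount,
# DWORD biCompression, DWORD biSizeImage, LONG biXPelsPerMeter,
# LONG biYPelsPerMeter, DWORD biClrUsed, DWORD biClrImportant
BITMAPINFOHEADER = struct.Struct("<IiiHHIIiiII")  # 40 bytes, little-endian


def parse_vfw_codec_private(codec_private: bytes) -> dict:
    """Unpack the fixed header and keep any trailing private bytes."""
    if len(codec_private) < BITMAPINFOHEADER.size:
        raise ValueError("CodecPrivate shorter than BITMAPINFOHEADER")
    fields = BITMAPINFOHEADER.unpack_from(codec_private)
    compression = fields[5]
    return {
        "width": fields[1],
        "height": fields[2],
        "bit_count": fields[4],
        # biCompression carries the FourCC; repack it to get its 4 bytes.
        "fourcc": struct.pack("<I", compression),
        "extra": codec_private[BITMAPINFOHEADER.size:],
    }
```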

V_QUICKTIME Codec ID: V_QUICKTIME Codec Name: Video taken from QuickTime files Description: Several codecs as stored in QuickTime (e.g., Sorenson or Cinepak). Initialization: The CodecPrivate contains all additional data that is stored in the 'stsd' (sample description) atom in the QuickTime file after the mandatory video descriptor structure (starting with the size and FourCC fields). For an explanation of the QuickTime file format read .

V_PRORES Codec ID: V_PRORES Codec Name: Apple ProRes Initialization: The CodecPrivate contains the FourCC as found in MP4 movies:

ap4x: ProRes 4444 XQ
ap4h: ProRes 4444
apch: ProRes 422 High Quality
apcn: ProRes 422 Standard Definition
apcs: ProRes 422 LT
apco: ProRes 422 Proxy
aprh: ProRes RAW High Quality
aprn: ProRes RAW Standard Definition

ProRes is defined as .

V_REAL/RV10 Codec ID: V_REAL/RV10 Codec Name: RealVideo 1.0 aka RealVideo 5 Description: Individual slices from the Real container are combined into a single frame. Initialization: The CodecPrivate contains a real_video_props_t structure in big-endian byte order as found in .

V_REAL/RV20 Codec ID: V_REAL/RV20 Codec Name: RealVideo G2 and RealVideo G2+SVT Description: Individual slices from the Real container are combined into a single frame. Initialization: The CodecPrivate contains a real_video_props_t structure in big-endian byte order as found in .

V_REAL/RV30 Codec ID: V_REAL/RV30 Codec Name: RealVideo 8 Description: Individual slices from the Real container are combined into a single frame. Initialization: The CodecPrivate contains a real_video_props_t structure in big-endian byte order as found in .

V_REAL/RV40 Codec ID: V_REAL/RV40 Codec Name: rv40 : RealVideo 9 Description: Individual slices from the Real container are combined into a single frame. Initialization: The CodecPrivate contains a real_video_props_t structure in big-endian byte order as found in .

V_THEORA Codec ID: V_THEORA Codec Name: Theora Description: Frames correspond to a Theora Frame as defined in . Initialization: The CodecPrivate contains the first three Theora packets in order. The lengths of the packets precede them. The actual layout is:

Byte 1: number of distinct packets #p minus one inside the CodecPrivate block. This MUST be "2" for current (as of 2016-07-08) Theora headers.
Bytes 2..n: lengths of the first #p packets, coded in Xiph-style lacing. The length of the last packet is the length of the CodecPrivate block minus the lengths coded in these bytes minus one.
Bytes n+1..: The Theora identification header, followed by the comment header, followed by the codec setup header. Those are described in the .
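The layout above can be read back with a few lines of code. The following is a minimal sketch (the function name `split_xiph_headers` is hypothetical): it decodes the packet count, the Xiph-style laced lengths, and derives the last length from the total size.

```python
def split_xiph_headers(codec_private: bytes) -> list:
    """Split a Xiph-laced CodecPrivate into its header packets."""
    if not codec_private:
        raise ValueError("empty CodecPrivate")
    count = codec_private[0] + 1      # byte 1 stores the packet count minus one
    pos = 1
    sizes = []
    for _ in range(count - 1):        # the last length is not stored
        size = 0
        while True:
            if pos >= len(codec_private):
                raise ValueError("truncated lacing")
            value = codec_private[pos]
            pos += 1
            size += value
            if value != 255:          # a byte below 255 ends one length
                break
        sizes.append(size)
    last = len(codec_private) - pos - sum(sizes)
    if last < 0:
        raise ValueError("laced lengths exceed CodecPrivate size")
    sizes.append(last)
    packets = []
    for size in sizes:
        packets.append(codec_private[pos:pos + size])
        pos += size
    return packets
```

For Theora, the result is expected to hold three packets: identification, comment, and setup header, in that order.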

V_UNCOMPRESSED Codec ID: V_UNCOMPRESSED Codec Name: Video, raw uncompressed video frames Description: The codec doesn't use any form of compression. All the relevant fields of the TrackEntry\Video element MUST be filled to play this content correctly. In addition the packing of RGB, YUV, etc. pixels MUST be declared with a TrackEntry\Video\UncompressedFourCC element with values defined in . Initialization: none

V_VP8 Codec ID: V_VP8 Codec Name: VP8 Codec format Description: VP8 is an open and royalty free video compression format developed by Google and created by On2 Technologies as a successor to VP7. Codec BlockAdditions: A single-channel encoding of an alpha channel MAY be stored in BlockAdditions. The BlockAddID of the BlockMore containing these data MUST be 1. Initialization: none

V_VP9 Codec ID: V_VP9 Codec Name: VP9 Codec format Description: VP9 is an open and royalty free video compression format developed by Google as a successor to VP8. Codec BlockAdditions: A single-channel encoding of an alpha channel MAY be stored in BlockAdditions. The BlockAddID of the BlockMore containing these data MUST be 1. Initialization: Encoders are strongly encouraged to provide a CodecPrivate that contains a list of specific VP9 codec features as described in the "VP9 Codec Feature Metadata" section of . This piece of data helps to select a decoder on playback, but as many muxers don't provide the CodecPrivate for "V_VP9" it is not a hard requirement. It is possible for the decoder to reconstruct the "VP9 Codec Feature Metadata" from the first frame in case the CodecPrivate is not present. Note that the format differs from the VPCodecConfigurationRecord structure, as defined in .

Audio Codec Mappings All codecs described in this section MUST have a TrackType () value of "2" for audio. The track using these codecs MUST contain an Audio element -- i.e., EBML Path \Segment\Tracks\TrackEntry\Audio. Most audio codecs contain meta information about the data they contain, like encoded sampling frequency, channel count, etc. Whenever possible, this information inside the codec SHOULD be extracted and repeated at the Matroska level with the appropriate element(s) inside the \Segment\Tracks\TrackEntry\Audio and \Segment\Tracks\TrackEntry elements. These values MUST be valid for the whole Segment.

A_AAC Codec ID: A_AAC Codec Name: Advanced Audio Coding (AAC) Description: Individual frames of AAC raw_data_block(), stored as defined in subpart 4 of . Initialization: The CodecPrivate contains an AudioSpecificConfig structure, as defined in .

A_AAC/MPEG2/LC Codec ID: A_AAC/MPEG2/LC Codec Name: Low Complexity Description: The audio stream is stripped of its ADTS headers and the normal Matroska frame-based muxing scheme is applied. Initialization: none Superseded By: A_AAC ()

A_AAC/MPEG2/LC/SBR Codec ID: A_AAC/MPEG2/LC/SBR Codec Name: Low Complexity with Spectral Band Replication Description: The audio stream is stripped of its ADTS headers and the normal Matroska frame-based muxing scheme is applied. Initialization: none Superseded By: A_AAC ()

A_AAC/MPEG2/MAIN Codec ID: A_AAC/MPEG2/MAIN Codec Name: MPEG2 Main Profile Description: The audio stream is stripped of its ADTS headers and the normal Matroska frame-based muxing scheme is applied. Initialization: none Superseded By: A_AAC ()

A_AAC/MPEG2/SSR Codec ID: A_AAC/MPEG2/SSR Codec Name: Scalable Sampling Rate Description: The audio stream is stripped of its ADTS headers and the normal Matroska frame-based muxing scheme is applied. Initialization: none Superseded By: A_AAC ()

A_AAC/MPEG4/LC Codec ID: A_AAC/MPEG4/LC Codec Name: Low Complexity Description: The audio stream is stripped of its ADTS headers and the normal Matroska frame-based muxing scheme is applied. Initialization: none Superseded By: A_AAC ()

A_AAC/MPEG4/LC/SBR Codec ID: A_AAC/MPEG4/LC/SBR Codec Name: Low Complexity with Spectral Band Replication Description: The audio stream is stripped of its ADTS headers and the normal Matroska frame-based muxing scheme is applied. Initialization: none Superseded By: A_AAC ()

A_AAC/MPEG4/LTP Codec ID: A_AAC/MPEG4/LTP Codec Name: Long Term Prediction Description: The audio stream is stripped of its ADTS headers and the normal Matroska frame-based muxing scheme is applied. Initialization: none Superseded By: A_AAC ()

A_AAC/MPEG4/MAIN Codec ID: A_AAC/MPEG4/MAIN Codec Name: MPEG4 Main Profile Description: The audio stream is stripped of its ADTS headers and the normal Matroska frame-based muxing scheme is applied. Initialization: none Superseded By: A_AAC ()

A_AAC/MPEG4/SSR Codec ID: A_AAC/MPEG4/SSR Codec Name: Scalable Sampling Rate Description: The audio stream is stripped of its ADTS headers and the normal Matroska frame-based muxing scheme is applied. Initialization: none Superseded By: A_AAC ()

A_AC3 Codec ID: A_AC3 Codec Name: Dolby Digital / AC-3 Description: Individual frames of AC-3 syncframe() stored as described in or when the value of the bsid field defined in Section 5.4.2.1 of or Section 4.4.2.1 of is 10 or below. The number of channels has to be read from the corresponding Audio element. Initialization: none

A_AC3/BSID9 Codec ID: A_AC3/BSID9 Codec Name: Dolby Digital / AC-3 Description: Individual frames of AC-3 syncframe() stored as described in or when the value of the bsid field defined in Section 5.4.2.1 of or Section 4.4.2.1 of is 9. Note that the value 9 in the bsid field is not standard, but it is used de facto to divide the sampling rate defined in Section 5.4.1.3 of or Section 4.4.2.1 of by 2. Using this Codec ID is NOT RECOMMENDED as many Matroska Players don't support it. The generic A_AC3 Codec ID SHOULD be used instead as it supports a bsid of 9 as well. Initialization: none

A_AC3/BSID10 Codec ID: A_AC3/BSID10 Codec Name: Dolby Digital / AC-3 Description: Individual frames of AC-3 syncframe() stored as described in or when the value of the bsid field defined in Section 5.4.2.1 of or Section 4.4.2.1 of is 10. Note that the value 10 in the bsid field is not standard, but it is used de facto to divide the sampling rate defined in Section 5.4.1.3 of or Section 4.4.2.1 of by 4. Using this Codec ID is NOT RECOMMENDED as many Matroska Players don't support it. The generic A_AC3 Codec ID SHOULD be used instead as it supports a bsid of 10 as well. Initialization: none

A_ALAC Codec ID: A_ALAC Codec Name: ALAC (Apple Lossless Audio Codec) Initialization: The CodecPrivate contains ALAC's magic cookie (both the codec specific configuration as well as the optional channel layout information). Its format is described in the "Magic Cookie" defined in .

A_ATRAC/AT1 Codec ID: A_ATRAC/AT1 Codec Name: Sony ATRAC1 Codec Description: The original ATRAC codec by Sony, mainly used in MiniDisc platforms. The core technical details on ATRAC1 can be found in . An example encoder/decoder can be found at . Initialization: None

A_DTS Codec ID: A_DTS Codec Name: Digital Theatre System Description: Supports DTS, DTS-ES, DTS-96/24, DTS-HD High Resolution Audio and DTS-HD Master Audio. It corresponds to the base codec defined in . Initialization: none

A_DTS/EXPRESS Codec ID: A_DTS/EXPRESS Codec Name: Digital Theatre System Express Description: DTS Express (a.k.a. LBR) audio streams. It corresponds to the LBR extension of the DTS codec defined in section 9 of . Initialization: none

A_DTS/LOSSLESS Codec ID: A_DTS/LOSSLESS Codec Name: Digital Theatre System Lossless Description: DTS Lossless audio that does not have a core substream. It corresponds to the Lossless extension (XLL) of the DTS codec defined in section 8 of . Initialization: none

A_EAC3 Codec ID: A_EAC3 Codec Name: Dolby Digital Plus / E-AC-3 Description: Individual frames of E-AC-3 syncframe() stored as described in or when the value of the bsid field defined in Annex E Section 2.1 of or Section E.1.3.1.6 of is 11, 12, 13, 14, 15 or 16. Initialization: none
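The bsid ranges given for A_AC3, A_AC3/BSID9, A_AC3/BSID10, and A_EAC3 can be summarized in a small lookup. This is only a sketch of how a muxer might choose the recommended Codec ID; the function names are hypothetical, and the bsid value is assumed to have been read from the syncframe already.

```python
def codec_id_for_bsid(bsid: int) -> str:
    """Pick the recommended Codec ID for a syncframe's bsid value."""
    if 0 <= bsid <= 10:
        # The generic ID also covers the non-standard bsid values 9 and 10.
        return "A_AC3"
    if 11 <= bsid <= 16:
        return "A_EAC3"
    raise ValueError(f"bsid {bsid} is not covered by these mappings")


def ac3_sample_rate_divisor(bsid: int) -> int:
    """De facto sampling rate divisor: 2 for bsid 9, 4 for bsid 10, else 1."""
    return {9: 2, 10: 4}.get(bsid, 1)
```

For example, a stream whose header signals 48,000 Hz with a bsid of 9 would be played at 24,000 Hz under this convention.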

A_FLAC Codec ID: A_FLAC Codec Name: FLAC (Free Lossless Audio Codec) Description: The mapping of the FLAC framing and CodecPrivate is described in . Initialization: All FLAC data before the first audio frame; see .

A_MLP Codec ID: A_MLP Codec Name: Meridian Lossless Packing / MLP Description: A lossless audio codec used in DVD-Audio discs. The format is similar to Dolby TrueHD () but with fewer channels. Initialization: none

A_MPEG/L1 Codec ID: A_MPEG/L1 Codec Name: MPEG Audio 1, 2 Layer I Description: Frames correspond to Audio Frames of a Layer I bitstream as defined in . Initialization: none

A_MPEG/L2 Codec ID: A_MPEG/L2 Codec Name: MPEG Audio 1, 2 Layer II Description: Frames correspond to Audio Frames of a Layer II bitstream as defined in . Initialization: none

A_MPEG/L3 Codec ID: A_MPEG/L3 Codec Name: MPEG Audio 1, 2, 2.5 Layer III Description: Frames correspond to Audio Frames of a Layer III bitstream as defined in . Initialization: none

A_MS/ACM Codec ID: A_MS/ACM Codec Name: Microsoft Audio Codec Manager (ACM) Description: The data are stored in little-endian format (like on IA32 machines). Initialization: The CodecPrivate contains the WAVEFORMATEX structure including the extra format information bytes. The structure is stored without packing or padding bytes. A WORD corresponds to an unsigned 2-octet integer, and a DWORD corresponds to an unsigned 4-octet integer. The extra format information is appended after the WAVEFORMATEX octets.
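A reader for this CodecPrivate could look like the sketch below. It assumes the standard 18-octet WAVEFORMATEX field order and reads WORD and DWORD as unsigned integers, matching their Windows SDK definitions; the function name and dictionary keys are hypothetical.

```python
import struct

# WORD wFormatTag, WORD nChannels, DWORD nSamplesPerSec,
# DWORD nAvgBytesPerSec, WORD nBlockAlign, WORD wBitsPerSample, WORD cbSize
WAVEFORMATEX = struct.Struct("<HHIIHHH")  # 18 octets, no padding, little-endian


def parse_acm_codec_private(codec_private: bytes) -> dict:
    """Unpack WAVEFORMATEX and return the extra format bytes that follow."""
    if len(codec_private) < WAVEFORMATEX.size:
        raise ValueError("CodecPrivate shorter than WAVEFORMATEX")
    (format_tag, channels, sample_rate, avg_bytes_per_sec,
     block_align, bits_per_sample, cb_size) = WAVEFORMATEX.unpack_from(codec_private)
    extra = codec_private[WAVEFORMATEX.size:]
    return {
        "format_tag": format_tag,
        "channels": channels,
        "sample_rate": sample_rate,
        "avg_bytes_per_sec": avg_bytes_per_sec,
        "block_align": block_align,
        "bits_per_sample": bits_per_sample,
        "cb_size": cb_size,   # declared size of the extra format information
        "extra": extra,
    }
```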

A_REAL/14_4 Codec ID: A_REAL/14_4 Codec Name: Real Audio 1 Initialization: The CodecPrivate contains either the "real_audio_v4_props_t" or the "real_audio_v5_props_t" structure (differentiated by their "version" field; big-endian byte order) as found in [librmff].

A_REAL/28_8 Codec ID: A_REAL/28_8 Codec Name: Real Audio 2 Initialization: The CodecPrivate contains either the "real_audio_v4_props_t" or the "real_audio_v5_props_t" structure (differentiated by their "version" field; big-endian byte order) as found in [librmff].

A_REAL/ATRC Codec ID: A_REAL/ATRC Codec Name: Sony Atrac3 Codec Initialization: The CodecPrivate contains either the "real_audio_v4_props_t" or the "real_audio_v5_props_t" structure (differentiated by their "version" field; big-endian byte order) as found in [librmff].

A_REAL/COOK Codec ID: A_REAL/COOK Codec Name: Real Audio Cook Codec (codename: Gecko) Initialization: The CodecPrivate contains either the "real_audio_v4_props_t" or the "real_audio_v5_props_t" structure (differentiated by their "version" field; big-endian byte order) as found in [librmff].

A_REAL/RALF Codec ID: A_REAL/RALF Codec Name: Real Audio Lossless Format Initialization: The CodecPrivate contains either the "real_audio_v4_props_t" or the "real_audio_v5_props_t" structure (differentiated by their "version" field; big-endian byte order) as found in [librmff].

A_REAL/SIPR Codec ID: A_REAL/SIPR Codec Name: Sipro Voice Codec Initialization: The CodecPrivate contains either the "real_audio_v4_props_t" or the "real_audio_v5_props_t" structure (differentiated by their "version" field; big-endian byte order) as found in [librmff].

A_OPUS Codec ID: A_OPUS Codec Name: Opus interactive speech and audio codec Description: The OPUS audio codec defined by using a similar encapsulation as the Ogg Encapsulation . Initialization: The track CodecPrivate MUST be present and contain the Identification Header defined in . Channels: The track Channels element value MUST be the "Output Channel Count" value of the Identification Header. SamplingFrequency: The track SamplingFrequency element value MUST be the "Input Sample Rate" value of the Identification Header. CodecDelay: The track CodecDelay element MUST be present and set to the "Pre-skip" value of the Identification Header translated to Matroska Ticks. The "Pre-skip" value is in samples at 48,000 Hz. The formula to get the CodecDelay is: CodecDelay = pre-skip * 1,000,000,000 / 48,000. SeekPreRoll: The track SeekPreRoll element SHOULD be present and set to 80,000,000 -- 80 ms in Matroska Ticks -- in order to ensure that the output audio is correct by the time it reaches the seek target.
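The conversion of "Pre-skip" to CodecDelay is plain unit arithmetic: pre-skip counts samples at 48,000 Hz, and Matroska Ticks here are nanoseconds (80 ms corresponds to 80,000,000). A minimal sketch follows; the function name is hypothetical, and rounding to the nearest tick is an assumption of this example, not something the mapping prescribes.

```python
OPUS_CLOCK_HZ = 48_000
NS_PER_SECOND = 1_000_000_000
SEEK_PRE_ROLL_NS = 80_000_000  # 80 ms, the value suggested for SeekPreRoll


def opus_codec_delay_ns(pre_skip_samples: int) -> int:
    """Translate the Identification Header pre-skip into Matroska Ticks."""
    if pre_skip_samples < 0:
        raise ValueError("pre-skip cannot be negative")
    # Integer arithmetic with round-half-up (rounding mode is an assumption).
    return (pre_skip_samples * NS_PER_SECOND + OPUS_CLOCK_HZ // 2) // OPUS_CLOCK_HZ
```

A pre-skip of 312 samples, for instance, is 312 / 48,000 s, which is 6.5 ms or 6,500,000 ticks.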

A_PCM/FLOAT/IEEE Codec ID: A_PCM/FLOAT/IEEE Codec Name: Floating-Point, IEEE compatible Description: The audio bit depth MUST be read and set from the BitDepth element (32 bits in most cases). The floats are stored as defined in and in little-endian order. Initialization: none

A_PCM/INT/BIG Codec ID: A_PCM/INT/BIG Codec Name: PCM Integer Big Endian Description: The audio bit depth MUST be read and set from the BitDepth element. Audio samples MUST be considered as signed values, unless the audio bit depth is 8 which MUST be interpreted as unsigned values. Initialization: none

A_PCM/INT/LIT Codec ID: A_PCM/INT/LIT Codec Name: PCM Integer Little Endian Description: The audio bit depth MUST be read and set from the BitDepth element. Audio samples MUST be considered as signed values, unless the audio bit depth is 8 which MUST be interpreted as unsigned values. Initialization: none
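The byte order and signedness rules for the two integer PCM mappings can be sketched as follows. The function name is hypothetical, and the example only handles bit depths that are a whole number of bytes, which is a simplification of this sketch.

```python
BYTE_ORDER = {"A_PCM/INT/BIG": "big", "A_PCM/INT/LIT": "little"}


def decode_pcm_int(data: bytes, bit_depth: int, codec_id: str) -> list:
    """Decode raw integer PCM into a flat list of sample values."""
    if bit_depth % 8 != 0:
        raise ValueError("this sketch only handles whole-byte bit depths")
    byteorder = BYTE_ORDER[codec_id]
    width = bit_depth // 8
    signed = bit_depth != 8  # 8-bit samples are unsigned, all others signed
    return [
        int.from_bytes(data[i:i + width], byteorder, signed=signed)
        for i in range(0, len(data) - width + 1, width)
    ]
```

The samples are returned interleaved as stored; splitting them per channel would use the track's Channels value.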

A_QUICKTIME Codec ID: A_QUICKTIME Codec Name: Audio taken from QuickTime files Description: Several codecs as stored in QuickTime (e.g., QDesign Music v1 or v2). Initialization: The CodecPrivate contains all additional data that is stored in the 'stsd' (sample description) atom in the QuickTime file after the mandatory sound descriptor structure (starting with the size and FourCC fields). For an explanation of the QuickTime file format read .

A_QUICKTIME/QDMC Codec ID: A_QUICKTIME/QDMC Codec Name: QDesign Music Description: Initialization: The CodecPrivate contains all additional data that is stored in the 'stsd' (sample description) atom in the QuickTime file after the mandatory sound descriptor structure (starting with the size and FourCC fields). For an explanation of the QuickTime file format read . Superseded By: A_QUICKTIME ()

A_QUICKTIME/QDM2 Codec ID: A_QUICKTIME/QDM2 Codec Name: QDesign Music v2 Description: Initialization: The CodecPrivate contains all additional data that is stored in the 'stsd' (sample description) atom in the QuickTime file after the mandatory sound descriptor structure (starting with the size and FourCC fields). For an explanation of the QuickTime file format read . Superseded By: A_QUICKTIME ()

A_TRUEHD Codec ID: A_TRUEHD Codec Name: Dolby TrueHD Description: Lossless audio codec from Dolby. Each Matroska frame corresponds to a single Access Unit as defined in . Initialization: none

A_TTA1 Codec ID: A_TTA1 Codec Name: The True Audio lossless audio compressor Description: The format is described in . Each frame is kept intact, including the CRC32. The header and seektable are dropped. SamplingFrequency, Channels and BitDepth are used in the TrackEntry. Initialization: The CodecPrivate contains the TTA Header Structure, as defined in .

A_VORBIS Codec ID: A_VORBIS Codec Name: Vorbis Initialization: The CodecPrivate contains the first three Vorbis packets in order. The lengths of the packets precede them. The actual layout is:

Byte 1: number of distinct packets #p minus one inside the CodecPrivate block. This MUST be "2" for current (as of 2016-07-08) Vorbis headers.
Bytes 2..n: lengths of the first #p packets, coded in Xiph-style lacing. The length of the last packet is the length of the CodecPrivate block minus the lengths coded in these bytes minus one.
Bytes n+1..: The "Identification Header" as defined in Section 4.2.2 of , followed by the "Comment Header" as defined in Section 5 of , followed by the "Setup Header" as defined in Section 4.2.4 of .
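From the writer's side, the same layout can be produced by encoding every length except the last in Xiph-style lacing. The sketch below is illustrative only; both function names are hypothetical.

```python
def xiph_lace_size(size: int) -> bytes:
    """Encode one length: as many 255 bytes as fit, then the remainder."""
    if size < 0:
        raise ValueError("size cannot be negative")
    return b"\xff" * (size // 255) + bytes([size % 255])


def build_xiph_codec_private(packets: list) -> bytes:
    """Assemble CodecPrivate from header packets given in stream order."""
    if not 1 <= len(packets) <= 256:
        raise ValueError("packet count must fit in one byte")
    out = bytearray([len(packets) - 1])    # byte 1: packet count minus one
    for packet in packets[:-1]:            # the last length is implicit
        out += xiph_lace_size(len(packet))
    for packet in packets:
        out += packet
    return bytes(out)
```

For Vorbis, `packets` would hold the Identification, Comment, and Setup headers, so the first byte of the result is 2.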

A_WAVPACK4 Codec ID: A_WAVPACK4 Codec Name: WavPack lossless audio compressor Description: The WavPack packets consist of a block defined in with a WavpackHeader header. For multichannel (> 2 channels) a frame consists of many packets. For more details, check the WavPack muxing description . Codec BlockAdditions: For hybrid A_WAVPACK4 encodings (that include a lossy encoding with a supplemental correction to produce a lossless encoding), the correction part is stored in BlockAdditional. The BlockAddID of the BlockMore containing these data MUST be 1. Initialization: The CodecPrivate contains the version 16-bit integer from the WavpackHeader of stored in little-endian.

Subtitle Codec Mappings All codecs described in this section MUST have a TrackType () value of "17" for subtitles. Subtitle codecs often contain meta information about the data they contain, like expected output dimension, language, etc. Whenever possible, this information inside the codec SHOULD be extracted and repeated at the Matroska level with the appropriate element(s) inside the \Segment\Tracks\TrackEntry\Video and \Segment\Tracks\TrackEntry elements. These values MUST be valid for the whole Segment.

S_ARIBSUB Codec ID: S_ARIBSUB Codec Name: ARIB STD-B24 subtitles Description: This is the textual subtitle format used in the ISDB/ARIB broadcasting standard. For more information see on ARIB (ISDB) subtitles. Initialization: The CodecPrivate data are defined in .

S_DVBSUB Codec ID: S_DVBSUB Codec Name: Digital Video Broadcasting (DVB) subtitles Description: This is the graphical subtitle format used in the Digital Video Broadcasting standard. For more information see on Digital Video Broadcasting (DVB). Initialization: The CodecPrivate data are defined in .

S_HDMV/PGS Codec ID: S_HDMV/PGS Codec Name: HDMV presentation graphics subtitles (PGS) Description: This is the graphical subtitle format used on Blu-rays. For more information, see on HDMV graphics presentation. Initialization: none

S_HDMV/TEXTST Codec ID: S_HDMV/TEXTST Codec Name: HDMV text subtitles Description: This is the textual subtitle format used on Blu-rays. For more information, see on HDMV text presentation. Initialization: The CodecPrivate data are defined in .

S_KATE Codec ID: S_KATE Codec Name: Karaoke And Text Encapsulation Description: A subtitle format developed for Ogg. The mapping for Matroska is described in the "Matroska mapping" section of . The codec MAY use embedded fonts from attachments, as defined in ; in that case, the TrackEntry MUST contain an AttachmentLink element. Initialization: Kate headers are stored in the CodecPrivate as Xiph-laced packets. The length of the last packet is not encoded; it is deduced from the sizes of the other packets and the total size of the CodecPrivate.

S_IMAGE/BMP Codec ID: S_IMAGE/BMP Codec Name: Bitmap Description: Basic image-based subtitle format. The subtitles are stored as images, like in the DVD . The timestamp in the block header of Matroska indicates the start display time; the duration is set with the BlockDuration element. The full data for the subtitle bitmap is stored in the Block's data section. Initialization: none

S_TEXT/ASS Codec ID: S_TEXT/ASS Codec Name: Advanced SubStation Alpha Format Description: Each event is stored in its own Block. For more information see on SSA/ASS. This codec ID MUST be used when "ScriptType: v4.00+" or "[V4+ Styles]" sections are found in the original SSA script. The codec MAY use embedded fonts from attachments, as defined in ; in that case, the TrackEntry MUST contain an AttachmentLink element. The codec MAY also be found with the Codec ID S_ASS in legacy media containers, but using that value is NOT RECOMMENDED. Initialization: The "[Script Info]" and "[V4+ Styles]" sections are stored in the CodecPrivate.

S_TEXT/ASCII Codec ID: S_TEXT/ASCII Codec Name: ASCII Plain Text Description: Basic text subtitles with only ASCII characters allowed. Initialization: none

S_TEXT/SSA Codec ID: S_TEXT/SSA Codec Name: SubStation Alpha Format Description: Each event is stored in its own Block. For more information see on SSA/ASS. This codec ID MUST NOT be used when "ScriptType: v4.00+" or "[V4+ Styles]" sections are found in the original SSA script. The codec MAY use embedded fonts from attachments, as defined in ; in that case, the TrackEntry MUST contain an AttachmentLink element. The codec MAY also be found with the Codec ID S_SSA, but using that value is NOT RECOMMENDED. Initialization: The "[Script Info]" and "[V4 Styles]" sections are stored in the CodecPrivate.

S_TEXT/USF Codec ID: S_TEXT/USF Codec Name: Universal Subtitle Format Description: An XML-based subtitle format. Each BlockGroup contains XML data from a "subtitle" XML element as defined in section 3.4 of , without the "subtitle" element itself and with the start and stop or duration values mapped to the BlockGroup timestamp and BlockDuration element. The "image" XML elements are turned into Matroska attachments and replaced in the stream with their attachment filename. The codec MAY use embedded fonts from attachments, as defined in ; in that case, the TrackEntry MUST contain an AttachmentLink element. Initialization: The CodecPrivate element MAY be present. If present, it MAY contain "metadata", "styles", and "effects" XML elements usable in the whole stream inside a parent "USFSubtitles" XML element, similar to the "USFSubtitles" element of a standalone USF file but without the "subtitles" XML element.

S_TEXT/UTF8 Codec ID: S_TEXT/UTF8 Codec Name: UTF-8 Plain Text Description: Basic text subtitles. For more information see on Subtitles. Initialization: none

S_TEXT/WEBVTT Codec ID: S_TEXT/WEBVTT Codec Name: Web Video Text Tracks Format (WebVTT) Description: Advanced text subtitles defined by . For more information see . Initialization: The CodecPrivate contains the WebVTT file body up to the first WebVTT cue block. Codec BlockAdditions: Intermediate non-Cue Blocks SHOULD be stored in BlockAdditions. The BlockAddID of the BlockMore containing these data MUST be 1.

S_VOBSUB Codec ID: S_VOBSUB Codec Name: VobSub subtitles Description: Uses data from files. The data represent subtitle data used on DVDs . VobSubs consist of two files: the .idx, containing information, and the .sub, containing the actual data. Only version 7 and newer of VobSub files are supported. The line of the .idx file beginning with "id:" MUST be transformed into the appropriate Matroska track language element. For each line of the .idx file containing "timestamp:" and "filepos:", data is read from the appropriate position in the .sub file. This data consists of an MPEG program stream, which in turn contains SPU packets. The MPEG program stream data is discarded, and each SPU packet is put into one Matroska frame. Initialization: The CodecPrivate contains the "palette:" and "size:" lines from the .idx file. Other lines from the .idx file that are not empty lines, comments, or lines starting with "alt:", "langidx:", "id:", or "timestamp:" MAY be added in the CodecPrivate data for preservation.

Button Codec Mappings All codecs described in this section MUST have a TrackType () value of "18" for buttons.

B_VOBBTN Codec ID: B_VOBBTN Codec Name: VobBtn Buttons Description: Based on MPEG/VOB PCI packets. The frame contains a header consisting of the string "butonDVD" followed by the width and height in pixels (16-bit unsigned integer each) and 4 reserved bytes. The rest is a full PCI packet described in .

Block Addition Mappings This section describes the various types of BlockAdditionMapping that can be stored in Matroska. These help the player interpret the multiple BlockAdditions that can be added to each Matroska BlockGroup. More details can be found in section .

Defining Block Addition Mappings Support for a Block Addition mapping is defined in Matroska with the following values.

Block Type Identifier Each BlockAdditionMapping supported in Matroska MUST have a unique BlockAddIDType. It MUST be defined for each Block Addition Mapping.

Block Type Name Each BlockAdditionMapping supported in Matroska MAY have a BlockAddIDName. The BlockAddIDName provides a readable label for the encoding.

Description An optional description for the encoding. This value is only intended for human consumption.

Initial Block Addition Mappings

Use BlockAddIDValue Block type identifier: 0 Block type name: "Use BlockAddIDValue" Description: This value indicates that the actual type is stored in BlockAddIDValue instead. This value is used when it is important to have a strong compatibility with players or derived formats not supporting BlockAdditionMapping but using BlockAdditions with an unknown BlockAddIDValue, and SHOULD NOT be used if it is possible to use another value.

Opaque Data Block type identifier: 1 Block type name: "Opaque data" Description: the BlockAdditional data is interpreted as opaque additional data passed to the codec with the Block data. The usage of these BlockAdditional data is defined in the "Codec BlockAdditions" section of the codec; see .

ITU T.35 Metadata Block type identifier: 4 Block type name: "ITU T.35 metadata" Description: the BlockAdditional data is interpreted as ITU T.35 metadata, as defined by terminal codes. BlockAddIDValue MUST be 4. HDR10+ dynamic metadata can be stored as ITU T.35 terminal codes as defined in Table 8 of .

SMPTE ST 12-1 Timecode Block type identifier: 121 Block type name: "SMPTE ST 12-1 timecode" Description: the BlockAdditional data is defined in .

avcE Block type identifier: 0x61766345 Block type name: "Dolby Vision enhancement-layer AVC configuration" Description: the BlockAddIDExtraData data is interpreted as the Dolby Vision enhancement-layer AVC configuration box as described in . This extension MUST NOT be used if CodecID is not V_MPEG4/ISO/AVC.

hvcE Block type identifier: 0x68766345 Block type name: "Dolby Vision enhancement-layer HEVC configuration" Description: the BlockAddIDExtraData data is interpreted as the Dolby Vision enhancement-layer HEVC configuration as described in . This extension MUST NOT be used if CodecID is not V_MPEGH/ISO/HEVC.

dvcC Block type identifier: 0x64766343 Block type name: "Dolby Vision configuration dvcC" Description: the BlockAddIDExtraData data is interpreted as DOVIDecoderConfigurationRecord structure, as defined in , for Dolby Vision profiles 0 to 7 inclusive.

dvvC Block type identifier: 0x64767643 Block type name: "Dolby Vision configuration dvvC" Description: the BlockAddIDExtraData data is interpreted as DOVIDecoderConfigurationRecord structure, as defined in , for Dolby Vision profiles 8 to 10 inclusive and 20.

dvwC Block type identifier: 0x64767743 Block type name: "Dolby Vision configuration dvwC" Description: the BlockAddIDExtraData data is interpreted as DOVIDecoderConfigurationRecord structure, as defined in , for Dolby Vision profiles 11 to 19 inclusive.

mvcC Block type identifier: 0x6D766343 Block type name: "MVC configuration" Description: the BlockAddIDExtraData data is interpreted as MVCDecoderConfigurationRecord structure, as defined in . This extension MUST NOT be used if CodecID is not V_MPEG4/ISO/AVC.
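
The hexadecimal identifiers of the last six mappings coincide with the ASCII bytes of their four-character names read as a big-endian integer. That is an observation from the listed values rather than a rule stated in this document; the helper name below is hypothetical and the sketch is only a convenience for checking values.

```python
def fourcc_to_block_add_id_type(tag: str) -> int:
    """Interpret a four-character name as a big-endian 32-bit integer."""
    raw = tag.encode("ascii")
    if len(raw) != 4:
        raise ValueError("expected exactly four ASCII characters")
    return int.from_bytes(raw, "big")


# Cross-check against identifiers listed in this section.
for name in ("avcE", "hvcE", "dvcC", "dvvC", "dvwC", "mvcC"):
    print(name, hex(fourcc_to_block_add_id_type(name)))
```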

Audio Codecs

WavPack WavPack is an audio codec primarily designed for lossless audio, but it can also be used as a lossy codec. It stores data in variable-length frames, which means each frame can have a different number of samples. Each WavPack block starts with a WavpackHeader header as defined in , stored in little-endian. To save space and avoid redundant information, some data from the WavpackHeader header are removed when stored in Matroska. All the data kept from the WavpackHeader remain in little-endian. The CodecPrivate contains the version 16-bit integer from the WavpackHeader of stored in little-endian. Depending on the number of audio channels and whether the hybrid mode is kept or not, the storage of WavPack blocks in Matroska differs.

Lossless And Lossy Storage For multichannel files (more than 2 channels, such as 5.1), a frame consists of multiple WavPack blocks. The first block has the INITIAL_BLOCK (bit 11) flag set, and the last block has the FINAL_BLOCK (bit 12) flag set. For a mono or stereo file, both flags are set in each WavPack block.
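
As a rough illustration of how these two flags delimit a frame, the following sketch groups a sequence of WavPack block flag values into frames. The bit positions come from the text above; the function name and the idea of passing a plain list of flag integers are assumptions made for the example.

```python
INITIAL_BLOCK = 1 << 11   # set on the first WavPack block of a frame
FINAL_BLOCK = 1 << 12     # set on the last WavPack block of a frame


def group_into_frames(block_flags: list[int]) -> list[list[int]]:
    """Return, for each frame, the indices of the blocks it contains."""
    frames = []
    current = []
    for index, flags in enumerate(block_flags):
        if flags & INITIAL_BLOCK:
            current = []              # a new frame starts here
        current.append(index)
        if flags & FINAL_BLOCK:
            frames.append(current)    # the frame is complete
            current = []
    return frames
```

With mono or stereo input, every block carries both flags, so each frame contains exactly one block.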

Mono/Stereo A Block or SimpleBlock frame contains the following header, with some fields taken from the WavpackHeader of a single WavPack block, followed by the data of that WavPack block.

Multichannel For multichannel files, a WavPack file uses multiple WavPack blocks to store all channels of a frame. The WavPack blocks for the channels of a frame are stored consecutively in a Matroska Block or SimpleBlock. Each WavPack block is preceded by a header. The header for the first WavPack block is similar to the mono/stereo one () with the addition of a "blocksize" field, which is the size of the first WavPack block minus the WavpackHeader size. The headers for the following WavPack blocks use the "flags" and "crc" of the WavpackHeader of each respective WavPack block, followed by the size of each respective WavPack block minus the WavpackHeader size.

Hybrid Storage WavPack has a hybrid mode that splits the audio frames between lossy and correction packets. Adding both gives a lossless version of the original audio. It is possible to store only the lossy part in Matroska or both together. Storing only the lossy part is equivalent to the format described in . This section explains how to store all hybrid data in Matroska. Hybrid WavPack is encoded in 2 files. The first file has the lossy part, and the second file has the correction part used to reconstruct the original audio losslessly. Each WavPack block in the correction file corresponds to a WavPack block in the lossy file with the same number of samples; this is also true for a multichannel file. This means that if a frame is made of 4 WavPack blocks, the correction file will have 4 WavPack blocks in the corresponding frame. The header of the correction WavPack block is exactly the same as in the lossy WavPack block, except for the CRC. In Matroska, the correction part is stored as additional data available to the Block (see ). This way, a file could be remuxed without keeping the Block Additional data and still be usable as a lossy WavPack file. The Block data of the lossy file are stored exactly the same as for lossy storage defined in . A BlockAdditionMapping MUST be used for a hybrid WavPack TrackEntry. The BlockAddIDType of that BlockAdditionMapping MUST be set to 1 for hybrid WavPack, corresponding to Opaque data; see . Each WavPack frame is stored in a BlockGroup that MUST have at least one BlockMore to hold the correction data. The BlockAddID of that BlockMore MUST be 1, i.e., the default value.

Mono/Stereo The BlockAdditional element of the correction data BlockMore contains the following header with the "crc" field from the WavpackHeader of the WavPack block of the correction file matching the WavPack block of the lossy frame used to fill the Block data, followed by the data of that correction file WavPack block.

Multichannel The BlockAdditional element of the correction data BlockMore contains the following header with the data from each WavpackHeader of the WavPack blocks of the correction file matching the WavPack blocks in the lossy file used to fill the Block data, each followed by the data of the corresponding correction file WavPack block.

Subtitles Here is a list of guidelines for storing subtitles in Matroska:

As a general rule of thumb for all codecs, information that is global to an entire stream SHOULD be stored in the CodecPrivate element, although not all codec mappings are designed this way.
As subtitles usually come with start and stop timestamps or a start timestamp and a duration, SimpleBlock is usually not used, as it does not allow storing the BlockDuration. One exception would be if the subtitle track has a DefaultDuration, in which case a BlockDuration is not required.
Start and stop timestamps that are used in a subtitle's original storage format SHOULD be removed when being placed in Matroska, as they could interfere if the file is edited afterwards. Instead, the Block's timestamp and BlockDuration SHOULD be used to say when the subtitle is displayed.
Because a "subtitle" stream is actually just an overlay stream, anything with a transparency layer could be used, including video.

Images Subtitles A common image format imported into Matroska is the VobSub subtitle format. This subtitle type is generated by exporting the subtitles from a DVD . If the subtitle version in the .IDX file is less than v7, the content has to be remuxed, as the S_VOBSUB CodecID only supports version 7 and newer of VobSub files; see . One way to convert the subtitles into v7 format is to use the SubResync utility from VobSub 2.23 (or MPC). Generally, any newly created subtitles will be in v7 format. The .IFO file will not be used at all.

If there is more than one subtitle stream in the VobSub set, each stream is stored in a separate Matroska track. For example, if the VobSub file contains streams for both English and German subtitles, the resulting Matroska file will contain one track for each, and the language information can be mapped to Matroska's language tags and dropped from the streams. The .IDX file is reformatted (see below) and placed in the CodecPrivate. Each .BMP will be stored in its own Block. The timestamp will be stored in the Block timestamp and the duration will be stored in the Default Duration.

Here is an example .IDX file: First, lines beginning with "#" are removed. These are comments to make text file editing easier, and as this is not a text file, they aren't needed. Next, remove the "langidx" and "id" lines. These are used to differentiate the subtitle streams and define the language. As the streams will be stored separately anyway, there is no need to differentiate them here. Also, the language setting will be stored in the Matroska tags, so there is no need to store it here. Finally, the "timestamp" will be used to set the Block's timestamp. Once it is set there, there is no need for it to be stored here. Also, as it may interfere if the file is edited, it SHOULD NOT be stored here and it MUST NOT be used by the decoder.

Once all of these items are removed, the data to store in the CodecPrivate SHOULD look like this: There SHOULD also be two Blocks containing one image each with the timestamps "00:00:01:101" and "00:00:08:708".
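
The filtering steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, the "alt:" prefix and empty-line handling follow the S_VOBSUB mapping earlier in this document, and the sample input used with it is invented for the example rather than taken from a real .IDX file.

```python
# Prefixes of .IDX lines that are not carried into the CodecPrivate.
DROPPED_PREFIXES = ("#", "alt:", "langidx:", "id:", "timestamp:")


def idx_to_codec_private(idx_text: str) -> str:
    """Keep only the .IDX lines that remain meaningful inside Matroska."""
    kept = []
    for line in idx_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue                      # empty lines are dropped
        if stripped.startswith(DROPPED_PREFIXES):
            continue                      # comments, language, and timing lines
        kept.append(stripped)
    return "\n".join(kept) + "\n"
```

The "timestamp:" lines are not lost: a muxer would use them to locate each SPU packet in the .sub file and to set the Block's timestamp before discarding them.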

SRT Subtitles SRT is perhaps the most basic of all subtitle formats. It consists of four parts, all in text:

A number indicating which subtitle it is in the sequence.
The time that the subtitle appears on the screen, and then disappears.
The subtitle itself.
A blank line indicating the start of a new subtitle.

When placing SRT in Matroska, part 3 is converted to UTF-8 (S_TEXT/UTF8) and placed in the data portion of the Block. Part 2 is used to set the timestamp of the Block and the BlockDuration element. Nothing else is used. Here is an example SRT file:

1
00:02:17,440 --> 00:02:20,375
Senator, we're making our final approach into Coruscant.

2
00:02:20,476 --> 00:02:22,501
Very good, Lieutenant.

In this example, the text "Senator, we're making our final approach into Coruscant." would be converted into UTF-8 and placed in the Block. The timestamp of the block would be set to "00:02:17,440", and the BlockDuration element would be set to "00:00:02,935". The same is repeated for the next subtitle. Because there are no general settings for SRT, the CodecPrivate is left blank.
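
The timing arithmetic in this example can be checked with a short sketch. It assumes Block timestamps and BlockDuration are expressed in milliseconds, which is an assumption about the track's timestamp scale and not something SRT itself dictates; the function name is hypothetical.

```python
import re

# "hh:mm:ss,mmm --> hh:mm:ss,mmm" as found in part 2 of an SRT entry.
TIMING = re.compile(
    r"(\d+):(\d+):(\d+),(\d+)\s*-->\s*(\d+):(\d+):(\d+),(\d+)"
)


def srt_timing_to_block(timing_line: str) -> tuple[int, int]:
    """Return (block timestamp, block duration) in milliseconds."""
    match = TIMING.match(timing_line.strip())
    if match is None:
        raise ValueError("not an SRT timing line")
    h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
    start = ((h1 * 60 + m1) * 60 + s1) * 1000 + ms1
    end = ((h2 * 60 + m2) * 60 + s2) * 1000 + ms2
    return start, end - start
```

For a cue running from "00:02:17,440" to "00:02:20,375", this yields a duration of 2935 ms, matching the "00:00:02,935" stated above.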

SSA/ASS Subtitles SSA stands for Sub Station Alpha. It is the file format used by the popular subtitle editor SubStation Alpha. It supports advanced display features such as positioning, karaoke, and style management. For detailed information on SSA/ASS, see the SSA specs . They include a description of the SSA format and of the advanced features added by the ASS format (standing for Advanced SSA). Because SSA and ASS are so similar, they are treated the same here. Like SRT, this format is text based with a particular syntax. A file consists of 4 or 5 parts, declared similarly to an INI file.

The first, "[Script Info]", contains some information about the subtitle file, such as its title, who created it, the type of script, and "PlayResY". The latter is very important because everything in the script (font size, positioning) is scaled by it. Sub Station Alpha uses the desktop's Y resolution to write this value, so a script edited on a large, high-resolution monitor can be messed up when it is saved again in SSA on a low-resolution monitor.

The second, "[V4 Styles]" or "[V4+ Styles]", is a list of style definitions. A style describes how a text will look on the screen. It defines font, font size, primary/.../outline color, position, alignment, etc. For example, this:

The third, "[Events]", is the list of text to display at the right timing. Some attributes can be specified here, such as the style to use for this event (MUST be defined in the list), the position of the text (Left, Right, Vertical Margin), or some effect. The Name is used by the translator to know who said this sentence. Timing is in h:mm:ss.cc (centiseconds).

A "[Pictures]" or "[Fonts]" part can be found in some SSA files. These parts contain UUE-encoded pictures or fonts. These features are only used by Sub Station Alpha -- i.e., no filter (VobSub/Avery Lee Subtitler filter) uses them. Now, how are they stored in Matroska?

All text is converted to UTF-8
All the headers, "[Script Info]" and the "[V4 Styles]"/"[V4+ Styles]" list, are stored in CodecPrivate.
The Start and End fields are used to set the Block's timestamp and the BlockDuration element. The remaining data is stored as follows:
Events are stored in the Block in this order: ReadOrder, Layer, Style, Name, MarginL, MarginR, MarginV, Effect, Text. The Layer field comes from the ASS specs and is empty for SSA. The ReadOrder field is needed for the decoder to be able to reorder the streamed samples as they were placed originally in the file.
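
The conversion of one event can be sketched as below. This is a hedged illustration, not the reference muxer behavior: it assumes an ASS "Dialogue:" line whose fields follow the common order Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text, and it assumes millisecond Block timestamps. The function names and the sample line used with it are hypothetical.

```python
def parse_ssa_time(text: str) -> int:
    """Convert h:mm:ss.cc (centiseconds) into milliseconds."""
    hours, minutes, seconds = text.split(":")
    whole, centis = seconds.split(".")
    total_seconds = (int(hours) * 60 + int(minutes)) * 60 + int(whole)
    return total_seconds * 1000 + int(centis) * 10


def ass_event_to_block(line: str, read_order: int) -> tuple[int, int, bytes]:
    """Return (timestamp ms, duration ms, Block payload) for one event."""
    # Drop the "Dialogue:" label, then split into at most ten fields so
    # that commas inside the Text field are preserved.
    fields = line.split(":", 1)[1].strip().split(",", 9)
    layer, start, end = fields[0], fields[1], fields[2]
    remaining = fields[3:]            # Style ... Text
    start_ms = parse_ssa_time(start)
    duration_ms = parse_ssa_time(end) - start_ms
    payload = ",".join([str(read_order), layer] + remaining)
    return start_ms, duration_ms, payload.encode("utf-8")
```

The Start and End values leave the payload entirely; they survive only as the Block's timestamp and BlockDuration.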

Here is an example of an SSA file. Here is what would be placed into the CodecPrivate element. And here are the two blocks that would be generated. Block's timestamp: 00:02:40.650 BlockDuration: 00:00:01.140 Block's timestamp: 00:02:42.420 BlockDuration: 00:00:01.730

WebVTT The "Web Video Text Tracks Format" (short: WebVTT) is developed by the World Wide Web Consortium (W3C). Its specifications are freely available at . The guiding principles for the storage of WebVTT in Matroska are:

Consistency: store data in a similar way to other subtitle codecs
Simplicity: making decoding and remuxing as easy as possible for existing infrastructures
Completeness: keeping as much data as possible from the original WebVTT file

Track Parameters The CodecID to use is S_TEXT/WEBVTT. The CodecPrivate contains all global blocks before the first subtitle entry. It starts at the "WEBVTT" file identification marker but excludes the optional byte order mark.

Storage of non-global WebVTT blocks Non-global WebVTT blocks (e.g., "NOTE") before a WebVTT caption or subtitle cue text are stored in Matroska's BlockAddition element together with the Matroska Block containing the WebVTT caption or subtitle cue text these blocks precede (see below for the actual format).

Storage of Cues in Matroska blocks Each WebVTT caption or subtitle cue text is stored directly in the Matroska Block. A muxer MUST change all WebVTT cue timestamp(s) present within the WebVTT caption or subtitle cue text to be relative to the Matroska Block's timestamp. The Cue's start timestamp is used as the Matroska Block's timestamp. The difference between the Cue's end timestamp and its start timestamp is used as the Matroska BlockDuration.
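
A minimal sketch of this mapping follows. It assumes millisecond Block timestamps and treats any `<hh:mm:ss.ttt>` or `<mm:ss.ttt>` tag in the cue text as a WebVTT cue timestamp; the helper names are hypothetical and the regular expression is a simplification of the WebVTT grammar.

```python
import re

# Simplified match for WebVTT cue timestamp tags inside cue text.
INNER_TIMESTAMP = re.compile(r"<((?:\d+:)?\d{2}:\d{2}\.\d{3})>")


def parse_timestamp(text: str) -> int:
    """Convert [hh:]mm:ss.ttt into milliseconds."""
    parts = text.split(":")
    seconds, millis = parts[-1].split(".")
    minutes = int(parts[-2])
    hours = int(parts[-3]) if len(parts) == 3 else 0
    return ((hours * 60 + minutes) * 60 + int(seconds)) * 1000 + int(millis)


def format_timestamp(ms: int) -> str:
    """Convert milliseconds back into hh:mm:ss.ttt."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def cue_to_block(start: str, end: str, payload: str) -> tuple[int, int, str]:
    """Return (Block timestamp, BlockDuration, rebased cue text)."""
    start_ms = parse_timestamp(start)
    duration_ms = parse_timestamp(end) - start_ms

    def rebase(match: re.Match) -> str:
        # Make the inner timestamp relative to the Block's timestamp.
        relative = parse_timestamp(match.group(1)) - start_ms
        return "<" + format_timestamp(relative) + ">"

    return start_ms, duration_ms, INNER_TIMESTAMP.sub(rebase, payload)
```

Storing inner timestamps relative to the Block means that shifting a Block in time during editing does not require rewriting the cue text.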

BlockAdditions Each Matroska Block may be accompanied by one BlockAdditions element. Its format is as follows:

The first line contains the WebVTT caption or subtitle cue text's optional WebVTT cue settings list followed by one line feed character (U+000A). The WebVTT cue settings list may be empty, in which case the line consists of the line feed character only.
The second line contains the WebVTT caption or subtitle cue text's optional WebVTT cue identifier followed by one line feed character (U+000A). The line may be empty, indicating that there was no WebVTT cue identifier in the source file, in which case the line consists of the line feed character only.
The third and all following lines contain all WebVTT comment block(s) that precede the current WebVTT cue block. These may be absent. Each WebVTT comment block includes its WebVTT line terminator and is followed by one line feed character (U+000A). The last WebVTT comment block MAY omit the WebVTT line terminator and the line feed character.

If there is no Matroska BlockAddition element stored together with the Matroska Block, then WebVTT cue settings list, WebVTT cue identifier, and WebVTT comment block(s) MUST be assumed to be absent.
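
A reader for this three-part layout can be sketched as follows. It is an illustrative sketch that tolerates a missing trailing line feed, which is an assumption about lenient parsing rather than a requirement stated above; the function name is hypothetical.

```python
def parse_webvtt_block_additional(data: bytes) -> tuple[str, str, str]:
    """Return (cue settings list, cue identifier, comment blocks)."""
    text = data.decode("utf-8")
    # First line: cue settings list (may be empty).
    settings, _, rest = text.partition("\n")
    # Second line: cue identifier (may be empty).
    identifier, _, comments = rest.partition("\n")
    # Everything after the second line feed is comment block content.
    return settings, identifier, comments
```

When no BlockAdditional is present at all, a caller would simply treat all three values as empty.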

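Example of Matroska Muxing Here is an example of how a WebVTT file is transformed. Consider the following example WebVTT file:

WEBVTT with text after the signature

STYLE
::cue {
  background-image: linear-gradient(to bottom, dimgray, lightgray);
  color: papayawhip;
}
/* Style blocks cannot use blank lines nor "dash dash greater than" */

NOTE comment blocks can be used between style blocks.

STYLE
::cue(b) {
  color: peachpuff;
}

REGION
id:bill width:40% lines:3 regionanchor:0%,100% viewportanchor:10%,90% scroll:up

NOTE
Notes always span a whole block and can cover multiple lines. Like this one.
An empty line ends the block.

hello
00:00:00.000 --> 00:00:10.000
Example entry 1: Hello world.

NOTE style blocks can't appear after the first cue.

00:00:25.000 --> 00:00:35.000
Example entry 2: Another entry.
This one has multiple lines.

00:01:03.000 --> 00:01:06.500 position:90% align:right size:35%
Entry 3: That stuff to the right of the timestamps are cue settings.

00:03:10.000 --> 00:03:20.000
Entry 4: Entries can even include timestamps.
For example:<00:03:15.000>This becomes visible five seconds after the first part.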

CodecPrivate The following XML depicts the CodecPrivate element, which contains the UTF-8 text of all global WebVTT blocks before the first subtitle entry:

<CodecPrivate>WEBVTT with text after the signature

STYLE
::cue {
  background-image: linear-gradient(to bottom, dimgray, lightgray);
  color: papayawhip;
}
/* Style blocks cannot use blank lines nor "dash dash greater than" */

NOTE comment blocks can be used between style blocks.

STYLE
::cue(b) {
  color: peachpuff;
}

REGION
id:bill width:40% lines:3 regionanchor:0%,100% viewportanchor:10%,90% scroll:up

NOTE
Notes always span a whole block and can cover multiple lines. Like this one.
An empty line ends the block.
</CodecPrivate>

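Cue Block 1 The following XML depicts the nested elements of a BlockGroup element with the content of the first WebVTT cue block. The cue block timings are turned into Matroska timestamps. The last line feed character (U+000A) is stripped. The BlockAddition content starts with one empty line as there is no WebVTT cue settings list:

<BlockGroup>
  <Block>Example entry 1: Hello world.</Block>
  <BlockDuration>10000</BlockDuration>
  <BlockAdditions>
    <BlockMore>
      <BlockAddID>1</BlockAddID>
      <BlockAdditional>
hello</BlockAdditional>
    </BlockMore>
  </BlockAdditions>
</BlockGroup>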

Cue Block 2 The following XML depicts the nested elements of a BlockGroup element with the content of the second WebVTT cue block. The last line feed character (U+000A) is stripped. The BlockAddition content starts with two empty lines as there is neither a WebVTT cue settings list nor a WebVTT cue identifier. Then follows the content of the WebVTT comment block(s). The last line feed character (U+000A) is stripped.

<BlockGroup>
  <Block>Example entry 2: Another entry.
This one has multiple lines.</Block>
  <BlockDuration>10000</BlockDuration>
  <BlockAdditions>
    <BlockMore>
      <BlockAddID>1</BlockAddID>
      <BlockAdditional>

NOTE style blocks can't appear after the first cue.</BlockAdditional>
    </BlockMore>
  </BlockAdditions>
</BlockGroup>

Cue Block 3 The following XML depicts the nested elements of a BlockGroup element with the content of the third WebVTT cue block. The last line feed character (U+000A) is stripped. The BlockAddition content ends with an empty line as there is no WebVTT cue identifier and there were no WebVTT comment blocks.

<BlockGroup>
  <Block>Entry 3: That stuff to the right of the timestamps are cue settings.</Block>
  <BlockDuration>3500</BlockDuration>
  <BlockAdditions>
    <BlockMore>
      <BlockAddID>1</BlockAddID>
      <BlockAdditional>position:90% align:right size:35%
</BlockAdditional>
    </BlockMore>
  </BlockAdditions>
</BlockGroup>

Cue Block 4 The following XML depicts the nested elements of a BlockGroup element with the content of the fourth WebVTT cue block. The last line feed character (U+000A) is stripped. The WebVTT cue timestamp inside the cue text is made relative to the Block's timestamp. No BlockAddition is used.

<BlockGroup>
  <Block>Entry 4: Entries can even include timestamps.
For example:<00:00:05.000>This becomes visible five seconds after the first part.</Block>
  <BlockDuration>10000</BlockDuration>
</BlockGroup>

Storage of WebVTT in Matroska vs. WebM Note: the storage of WebVTT in Matroska is not the same as the design document for storage of WebVTT in WebM . There are several reasons for this, including but not limited to the following: the WebM document is old (from February 2012), was based on an earlier draft of WebVTT, and ignores several parts that were added to WebVTT later; WebM still does not support subtitles at all ; and the proposal suggests splitting the information across multiple tracks, making the work of demuxers and remuxers very difficult. WebM uses the "D_WEBVTT/SUBTITLES", "D_WEBVTT/CAPTIONS", "D_WEBVTT/DESCRIPTIONS", and "D_WEBVTT/METADATA" CodecID with different tracks depending on the data type and without a CodecPrivate.

HDMV Presentation Graphics Subtitles The specifications for the HDMV Presentation Graphics Subtitle format (short: HDMV PGS) can be found in section 9.14 "HDMV graphics streams" of the Blu-ray specifications .

Track Parameters The CodecID to use is S_HDMV/PGS. A CodecPrivate element is not used.

Matroska Blocks Each HDMV PGS Segment (short: Segment) will be stored in a Matroska Block. A Segment is the data structure described in section 9.14.2.1 "Segment coding structure and parameters" of the Blu-ray specifications . Each Segment contains a presentation timestamp. This timestamp will be used as the timestamp for the Matroska Block. A Segment is normally shown until a subsequent Segment is encountered. Therefore, the Matroska Block MAY have no Duration. In that case, a player MUST display a Segment within a Matroska Block until the next Segment is encountered. A muxer MAY use a Duration, e.g., by calculating the distance between two subsequent Segments. If a Matroska Block has a Duration, a player MUST display that Segment only for the duration of the BlockDuration.
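
The display rule above can be summarized in a few lines. This is a sketch of the player-side decision only; the function name is hypothetical, and timestamps are assumed to be integers in whatever unit the track uses.

```python
from typing import Optional


def pgs_display_end(
    block_timestamp: int,
    block_duration: Optional[int],
    next_block_timestamp: Optional[int],
) -> Optional[int]:
    """Return when a Segment stops being shown, or None if open-ended."""
    if block_duration is not None:
        # A BlockDuration is present: show the Segment only that long.
        return block_timestamp + block_duration
    # No duration: the Segment stays until the next Segment arrives.
    return next_block_timestamp
```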

HDMV Text Subtitles The specifications for the HDMV Text Subtitle format (short: HDMV TextST) can be found in section 9.15 "HDMV text subtitle streams" of the Blu-ray specifications .

Track Parameters The CodecID to use is S_HDMV/TEXTST. A CodecPrivate element is required. It MUST contain the stream's Dialog Style Segment as described in section 9.15.4.2 "Dialog Style Segment" of the Blu-ray specifications .

Matroska Blocks Each HDMV Dialog Presentation Segment (short: Segment) will be stored in a Matroska Block. A Segment is the data structure described in section 9.15.4.3 "Dialog presentation segment" of the Blu-ray specifications . Each Segment contains a start and an end presentation timestamp (short: start PTS & end PTS). The start PTS will be used as the timestamp for the Matroska Block. The Matroska Block MUST have a Duration, and that Duration is the difference between the end PTS and the start PTS. A player MUST use the Matroska Block's timestamp and BlockDuration instead of the Segment's start and end PTS for determining when and how long to show the Segment.

Character set When TextST subtitles are stored inside Matroska, the only allowed character set is UTF-8. Each HDMV text subtitle stream in a Blu-ray can use one of a handful of character sets. This information is not stored in the MPEG2 Transport Stream itself but in the accompanying Clip Information file. Therefore, a muxer MUST parse the accompanying Clip Information file. If the information indicates a character set other than UTF-8, it MUST re-encode all text Dialog Presentation Segments from the indicated character set to UTF-8 prior to storing them in Matroska.

Digital Video Broadcasting (DVB) subtitles The specifications for the Digital Video Broadcasting subtitle bitstream format (short: DVB subtitles) can be found in the document. The storage of DVB subtitles in MPEG transport streams is specified in the document.

Track Parameters The CodecID to use is S_DVBSUB. The CodecPrivate element is five bytes long and has the following structure:

2 bytes: composition page ID (bit string, left bit first)
2 bytes: ancillary page ID (bit string, left bit first)
1 byte: subtitling type (bit string, left bit first)

The semantics of these bytes are the same as the ones described in section 6.2.41 "Subtitling descriptor" of .
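
The five-byte layout can be sketched with Python's `struct` module. The sketch assumes that "bit string, left bit first" implies most-significant-byte-first (big-endian) storage of the two 16-bit fields, which is an interpretation and should be checked against the referenced DVB documents; the function names are hypothetical.

```python
import struct

# Two unsigned 16-bit fields and one unsigned 8-bit field, big-endian
# (byte order is an assumption, see the note above).
DVBSUB_PRIVATE = struct.Struct(">HHB")


def pack_dvbsub_codec_private(
    composition_page_id: int, ancillary_page_id: int, subtitling_type: int
) -> bytes:
    """Build the five-byte CodecPrivate for an S_DVBSUB track."""
    return DVBSUB_PRIVATE.pack(
        composition_page_id, ancillary_page_id, subtitling_type
    )


def unpack_dvbsub_codec_private(data: bytes) -> tuple[int, int, int]:
    """Split the CodecPrivate back into its three fields."""
    if len(data) != DVBSUB_PRIVATE.size:
        raise ValueError("S_DVBSUB CodecPrivate must be five bytes long")
    return DVBSUB_PRIVATE.unpack(data)
```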

Matroska Blocks Each Matroska Block consists of one or more DVB Subtitle Segments as described in section 7.2 "Syntax and semantics of the subtitling segment" of . Each Matroska Block SHOULD have a Duration indicating how long the DVB Subtitle Segments in that Block SHOULD be displayed.

ARIB (ISDB) subtitles The specifications for the ARIB B-24 subtitle bitstream format (short: ARIB subtitles) and its storage in MPEG transport streams can be found in the documents , , and .

Track Parameters The CodecID to use is S_ARIBSUB. The CodecPrivate element is three bytes long and has the following structure:

1 byte: component tag (bit string, left bit first)
2 bytes: data component ID (bit string, left bit first)

The semantics of the component tag are the same as those described in , part 2, Annex J. The semantics of the data component ID are the same as those described in , fascicle 2, Vol. 3, Section 2, 4.2.8.1.

Matroska Blocks Each Matroska Block consists of a single synchronized PES data structure as described in chapter 5 "Independent PES transmission protocol" of , volume 3, with a Synchronized_PES_data_byte block containing one or more ISDB Caption Data Groups as described in chapter 9 "Transmission of caption and superimpose" of , volume 1, part 3. All of the Caption Statement Data Groups in a given Matroska Track MUST use the same language index. A Data Group is normally shown until a subsequent Group provides instructions to clear it. Therefore, the Matroska Block SHOULD NOT have a Duration. A player SHOULD display a Data Group within a Matroska Block until its internal duration elapses, or until a subsequent Data Group removes it.

Block Additional Mapping Extra data or metadata can be added to each Block using BlockAdditional data. Each BlockAdditional contains a BlockAddID that identifies the kind of data it contains. When the BlockAddID is set to "1", the contents of the BlockAdditional element are defined by the "Codec BlockAdditions" section of the codec; see . The following XML depicts the nested elements of a BlockGroup element with an example of BlockAdditions with a BlockAddID of "1":

    <BlockGroup>
      <Block>{Binary data of a VP9 video frame in YUV}</Block>
      <BlockAdditions>
        <BlockMore>
          <BlockAddID>1</BlockAddID>
          <BlockAdditional>
            {alpha channel encoding to supplement the VP9 frame}
          </BlockAdditional>
        </BlockMore>
      </BlockAdditions>
    </BlockGroup>

When the BlockAddID is set to a value greater than "1", the contents of the BlockAdditional element are defined by the BlockAdditionMapping element, within the associated TrackEntry element, whose BlockAddIDValue equals the BlockAddID element of the BlockAdditional element. That BlockAdditionMapping element identifies a particular Block Additional Mapping by the BlockAddIDType. The values of BlockAddID that are 2 or greater have no semantic meaning, but simply associate the BlockMore element with a BlockAdditionMapping of the associated Track. See on Block Additional Mappings for more information. It is RECOMMENDED to not use a value of 4 for BlockAddID and BlockAddIDValue when BlockAddIDType is not 4 (i.e., ITU T.35 metadata), as some WebM-oriented demuxers consider a block with BlockAddID of 4 as ITU T.35 metadata without checking the BlockAddIDType element. The following XML depicts a use of a Block Additional Mapping to associate a timecode value with a Block:

    <Segment>
      <Tracks>
        <TrackEntry>
          <TrackNumber>1</TrackNumber>
          <TrackUID>568001708</TrackUID>
          <TrackType>1</TrackType>
          <BlockAdditionMapping>
            <BlockAddIDValue>2</BlockAddIDValue>
            <BlockAddIDName>timecode</BlockAddIDName>
            <BlockAddIDType>121</BlockAddIDType>
          </BlockAdditionMapping>
          <CodecID>V_FFV1</CodecID>
        </TrackEntry>
      </Tracks>
      <Cluster>
        <Timestamp>3000</Timestamp>
        <BlockGroup>
          <Block>{binary video frame}</Block>
          <BlockAdditions>
            <BlockMore>
              <BlockAddID>2</BlockAddID>
              <BlockAdditional>01:00:00:00</BlockAdditional>
            </BlockMore>
          </BlockAdditions>
        </BlockGroup>
      </Cluster>
    </Segment>

Block Additional Mappings detail how additional data is stored in the BlockMore element with a BlockAdditionMapping element, within the Track element, which identifies the BlockAdditional content. Block Additional Mappings define the BlockAddIDType value reserved to identify that type of data as well as providing an optional label stored within the BlockAddIDName element. When the Block Additional Mapping is dependent on additional contextual information, then the Mapping SHOULD describe how such additional contextual information is stored within the BlockAddIDExtraData element.
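
The lookup rule described in this section (a BlockAddID of "1" is codec-defined; any greater value is matched against the BlockAddIDValue of the Track's BlockAdditionMapping elements) can be sketched as follows. This is a non-normative illustration: the `BlockAdditionMapping` class and `resolve_block_add_id` function are hypothetical names, not part of any Matroska library, and raising an error for an unmatched BlockAddID is a design choice of the sketch rather than a requirement of this document.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BlockAdditionMapping:
    block_add_id_value: int                   # BlockAddIDValue
    block_add_id_type: int                    # BlockAddIDType
    block_add_id_name: Optional[str] = None   # BlockAddIDName (optional label)


def resolve_block_add_id(
    block_add_id: int, mappings: List[BlockAdditionMapping]
) -> Optional[BlockAdditionMapping]:
    """Return the Track's mapping for a BlockAddID.

    Returns None for BlockAddID 1, whose content is defined by the codec's
    "Codec BlockAdditions" section instead of a BlockAdditionMapping.
    """
    if block_add_id == 1:
        return None
    for mapping in mappings:
        if mapping.block_add_id_value == block_add_id:
            return mapping
    # Sketch-only choice: report an unmatched BlockAddID as an error.
    raise LookupError(f"no BlockAdditionMapping for BlockAddID {block_add_id}")
```

With the timecode example above, a track carrying a mapping of (BlockAddIDValue 2, BlockAddIDType 121) resolves a BlockMore with BlockAddID 2 to the SMPTE ST 12-1 timecode mapping.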

SMPTE ST 12-1 Timecode Description SMPTE ST 12-1 timecode values can be stored in the BlockMore element to associate the content of a Matroska Block with a particular timecode value. If the Block uses Lacing, the timecode value is associated with the first frame of the Lace. The Block Additional Mapping contains a full binary representation of a 64-bit SMPTE timecode value stored in big-endian format and expressed exactly as defined in Sections 8 and 9 of SMPTE 12M, without the 16-bit synchronization word. For convenience, here are the time address bit assignments as described in : SMPTE ST 12-1 Time Address Bit Positions

Bit Positions	Label
0--3	Units of frames
8--9	Tens of frames
16--19	Units of seconds
24--26	Tens of seconds
32--35	Units of minutes
40--42	Tens of minutes
48--51	Units of hours
56--57	Tens of hours

For example, a timecode value of "07:12:26;18" can be expressed as a 64-bit SMPTE 12M value. Disregarding the irrelevant bits leaves 26 usable bits, which are interpreted in hexadecimal as:

0x8 units of frames
0x1 tens of frames
0x6 units of seconds
0x2 tens of seconds
0x2 units of minutes
0x1 tens of minutes
0x7 units of hours
0x0 tens of hours

Given that no value is above 9, the BCD coding corresponds to the actual values:

8 units of frames
1 tens of frames
6 units of seconds
2 tens of seconds
2 units of minutes
1 tens of minutes
7 units of hours
0 tens of hours

Or:

18 frames
26 seconds
12 minutes
07 hours
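
The worked example above can be reproduced with a short, non-normative sketch. The names `FIELDS`, `extract_digits`, and `time_address` are hypothetical. The sketch assumes that bit position 0 is the leftmost (most significant) bit of the first byte and that each field is read left to right as an unsigned integer. The input bytes in the usage note have every flag and user bit cleared, purely for illustration.

```python
from typing import Dict, Tuple

# (first bit, last bit, label) rows taken from the time address table.
FIELDS = [
    (0, 3, "frames_units"),
    (8, 9, "frames_tens"),
    (16, 19, "seconds_units"),
    (24, 26, "seconds_tens"),
    (32, 35, "minutes_units"),
    (40, 42, "minutes_tens"),
    (48, 51, "hours_units"),
    (56, 57, "hours_tens"),
]


def extract_digits(raw: bytes) -> Dict[str, int]:
    """Pull the eight BCD digits out of a 64-bit big-endian value."""
    if len(raw) != 8:
        raise ValueError("expected a 64-bit (8-byte) value")
    value = int.from_bytes(raw, "big")
    digits = {}
    for first, last, label in FIELDS:
        width = last - first + 1
        # Assumption: bit 0 is the leftmost bit, so a field ending at
        # position `last` sits 64 - (last + 1) bits from the right edge.
        shift = 64 - (last + 1)
        digits[label] = (value >> shift) & ((1 << width) - 1)
    return digits


def time_address(raw: bytes) -> Tuple[int, int, int, int]:
    """Combine tens and units digits into (hours, minutes, seconds, frames)."""
    d = extract_digits(raw)
    return (
        d["hours_tens"] * 10 + d["hours_units"],
        d["minutes_tens"] * 10 + d["minutes_units"],
        d["seconds_tens"] * 10 + d["seconds_units"],
        d["frames_tens"] * 10 + d["frames_units"],
    )
```

Under these assumptions, `time_address(bytes([0x80, 0x40, 0x60, 0x40, 0x20, 0x20, 0x70, 0x00]))` yields `(7, 12, 26, 18)`, matching the digits listed above.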

Security Considerations This document inherits security considerations from the EBML and Matroska documents. Codec handling may be one of the more error-prone aspects of using Matroska. The parsing and interpretation of binary data can lead to many types of security issues. Although these issues don't come from Matroska itself, it's worth noting some issues that need to be considered. The CodecPrivate may be missing from the TrackEntry description. The TrackEntry MAY be discarded in that case. Existing CodecPrivate data may be corrupted, incomplete, or too big. The TrackEntry MAY be discarded in that case. Many codecs have internal fields to hold values that are already found in the TrackEntry, like the video dimensions or the audio sampling frequency. If these values differ, this can lead to playback issues and even crashes.
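
The checks listed above might be sketched as follows. This is a non-normative illustration only: the function names, the `codec_needs_private` flag, and the size limit are all hypothetical, since this document does not define a maximum CodecPrivate size or a specific validation procedure.

```python
from typing import Optional

# Hypothetical upper bound; this document does not specify one.
MAX_CODEC_PRIVATE_SIZE = 1 << 20


def may_discard_track(
    codec_private: Optional[bytes],
    codec_needs_private: bool,
    max_size: int = MAX_CODEC_PRIVATE_SIZE,
) -> bool:
    """Return True when the TrackEntry is a candidate for being discarded."""
    if codec_private is None:
        # A missing CodecPrivate only matters for codecs that need one.
        return codec_needs_private
    # Oversized data is rejected before any codec-specific parsing.
    return len(codec_private) > max_size


def values_disagree(track_value, codec_value) -> bool:
    """True when the TrackEntry and the codec data report different values."""
    if track_value is None or codec_value is None:
        return False
    return track_value != codec_value
```

Detecting corrupted or incomplete CodecPrivate data requires codec-specific parsing and is outside the scope of this sketch.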

IANA Considerations

Matroska Codec IDs Registry This document defines registries for Codec IDs stored in the CodecID element. A CodecID is a case-sensitive ASCII string with a prefix defined in . The details of the string format are found in . To register a new Codec ID in this registry, one needs a Codec ID string, a TrackType value, a description, a Change Controller, and an optional Reference to a document describing the Codec ID. Some Codec ID values are deprecated. Such Codec IDs are marked as "Reclaimed" in the "Matroska Codec IDs" registry. "Matroska Codec IDs" are to be allocated according to the "First Come First Served" policy. The following table shows the initial contents of the "Matroska Codec IDs" registry. The Change Controller for the initial entries is the IETF. Initial Contents of "Matroska Codec IDs" Registry

Codec ID	Track Type	Description	Reference
V_AV1	1	Alliance for Open Media AV1	This document,
V_AVS2	1	AVS2-P2/IEEE.1857.4	This document,
V_AVS3	1	AVS3-P2/IEEE.1857.10	This document,
V_CAVS	1	AVS1-P2/IEEE.1857.3	This document,
V_DIRAC	1	Dirac / VC-2	This document,
V_FFV1	1	FFV1	This document,
V_JPEG2000	1	JPEG 2000	This document,
V_MJPEG	1	Motion JPEG	This document,
V_MPEGH/ISO/HEVC	1	HEVC/H.265	This document,
V_MPEGI/ISO/VVC	1	VVC/H.266	This document,
V_MPEG1	1	MPEG 1	This document,
V_MPEG2	1	MPEG 2	This document,
V_MPEG4/ISO/AVC	1	AVC/H.264	This document,
V_MPEG4/ISO/AP	1	MPEG4 ISO advanced profile	This document,
V_MPEG4/ISO/ASP	1	MPEG4 ISO advanced simple profile	This document,
V_MPEG4/ISO/SP	1	MPEG4 ISO simple profile	This document,
V_MPEG4/MS/V3	1	Microsoft MPEG4 V3	This document,
V_MS/VFW/FOURCC	1	Microsoft Video Codec Manager	This document,
V_QUICKTIME	1	Video taken from QuickTime files	This document,
V_PRORES	1	Apple ProRes	This document,
V_REAL/RV10	1	RealVideo 1.0 aka RealVideo 5	This document,
V_REAL/RV20	1	RealVideo G2 and RealVideo G2+SVT	This document,
V_REAL/RV30	1	RealVideo 8	This document,
V_REAL/RV40	1	rv40 : RealVideo 9	This document,
V_THEORA	1	Theora	This document,
V_UNCOMPRESSED	1	Raw uncompressed video frames	This document,
V_VP8	1	VP8 Codec format	This document,
V_VP9	1	VP9 Codec format	This document,
A_AAC	2	Advanced Audio Coding	This document,
A_AAC/MPEG2/LC	2	Low Complexity	This document,
A_AAC/MPEG2/LC/SBR	2	Low Complexity with Spectral Band Replication	This document,
A_AAC/MPEG2/MAIN	2	MPEG2 Main Profile	This document,
A_AAC/MPEG2/SSR	2	Scalable Sampling Rate	This document,
A_AAC/MPEG4/LC	2	Low Complexity	This document,
A_AAC/MPEG4/LC/SBR	2	Low Complexity with Spectral Band Replication	This document,
A_AAC/MPEG4/LTP	2	Long Term Prediction	This document,
A_AAC/MPEG4/MAIN	2	MPEG4 Main Profile	This document,
A_AAC/MPEG4/SSR	2	Scalable Sampling Rate	This document,
A_AC3	2	Dolby Digital / AC-3	This document,
A_AC3/BSID9	2	Dolby Digital / AC-3	This document,
A_AC3/BSID10	2	Dolby Digital / AC-3	This document,
A_ALAC	2	ALAC (Apple Lossless Audio Codec)	This document,
A_ATRAC/AT1	2	Sony ATRAC1 Codec	This document,
A_DTS	2	Digital Theatre System	This document,
A_DTS/EXPRESS	2	Digital Theatre System Express	This document,
A_DTS/LOSSLESS	2	Digital Theatre System Lossless	This document,
A_EAC3	2	Dolby Digital Plus / E-AC-3	This document,
A_FLAC	2	FLAC	This document,
A_MLP	2	Meridian Lossless Packing / MLP	This document,
A_MPEG/L1	2	MPEG Audio 1, 2 Layer I	This document,
A_MPEG/L2	2	MPEG Audio 1, 2 Layer II	This document,
A_MPEG/L3	2	MPEG Audio 1, 2, 2.5 Layer III	This document,
A_MS/ACM	2	Microsoft Audio Codec Manager (ACM)	This document,
A_REAL/14_4	2	Real Audio 1	This document,
A_REAL/28_8	2	Real Audio 2	This document,
A_REAL/ATRC	2	Sony Atrac3 Codec	This document,
A_REAL/COOK	2	Real Audio Cook Codec	This document,
A_REAL/RALF	2	Real Audio Lossless Format	This document,
A_REAL/SIPR	2	Sipro Voice Codec	This document,
A_OPUS	2	Opus interactive speech and audio codec	This document,
A_PCM/FLOAT/IEEE	2	Floating-Point, IEEE compatible	This document,
A_PCM/INT/BIG	2	PCM Integer Big Endian	This document,
A_PCM/INT/LIT	2	PCM Integer Little Endian	This document,
A_QUICKTIME	2	Audio taken from QuickTime files	This document,
A_QUICKTIME/QDMC	2	QDesign Music	This document,
A_QUICKTIME/QDM2	2	QDesign Music v2	This document,
A_TRUEHD	2	Dolby TrueHD	This document,
A_TTA1	2	The True Audio	This document,
A_VORBIS	2	Vorbis	This document,
A_WAVPACK4	2	WavPack	This document,
S_ARIBSUB	17	ARIB STD-B24 subtitles	This document,
S_DVBSUB	17	Digital Video Broadcasting subtitles	This document,
S_HDMV/PGS	17	HDMV presentation graphics subtitles	This document,
S_HDMV/TEXTST	17	HDMV text subtitles	This document,
S_KATE	17	Karaoke And Text Encapsulation	This document,
S_IMAGE/BMP	17	Bitmap	This document,
S_ASS	17	Advanced SubStation Alpha Format	Reclaimed,
S_TEXT/ASS	17	Advanced SubStation Alpha Format	This document,
S_TEXT/ASCII	17	ASCII Plain Text	This document,
S_TEXT/SSA	17	SubStation Alpha Format	This document,
S_TEXT/USF	17	Universal Subtitle Format	This document,
S_TEXT/UTF8	17	UTF-8 Plain Text	This document,
S_TEXT/WEBVTT	17	Web Video Text Tracks (WebVTT)	This document,
S_SSA	17	SubStation Alpha Format	Reclaimed,
S_VOBSUB	17	VobSub subtitles	This document,
B_VOBBTN	18	VobBtn Buttons	This document,
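
As a non-normative illustration, the pairing of prefix and Track Type seen in the table above can be checked mechanically. The names `PREFIX_BY_TRACK_TYPE` and `codec_id_matches_track_type` are hypothetical, and the mapping only covers the four Track Type values that appear in this table.

```python
# Prefixes as they appear next to each Track Type value in the table above.
PREFIX_BY_TRACK_TYPE = {1: "V_", 2: "A_", 17: "S_", 18: "B_"}


def codec_id_matches_track_type(codec_id: str, track_type: int) -> bool:
    """Check that a CodecID is ASCII and carries the prefix of its Track Type."""
    prefix = PREFIX_BY_TRACK_TYPE.get(track_type)
    if prefix is None:
        return False
    # The comparison is case-sensitive, as CodecID strings are.
    return (
        codec_id.isascii()
        and codec_id.startswith(prefix)
        and len(codec_id) > len(prefix)
    )
```

For example, `codec_id_matches_track_type("V_VP9", 1)` holds, while `"v_vp9"` fails the case-sensitive prefix check.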

Matroska BlockAdditional Type IDs Registry This document defines registries for BlockAdditional Type IDs stored in the BlockAddIDType element. The values correspond to the unsigned integer BlockAddIDType value described in . To register a new BlockAdditional Type ID in this registry, one needs a BlockAddIDType unsigned integer, a BlockAddIDName string value, a Change Controller, and an optional Reference to a document describing the BlockAdditional Type ID. "Matroska BlockAdditional Type IDs" are to be allocated according to the "First Come First Served" policy. The following table shows the initial contents of the "Matroska BlockAdditional Type IDs" registry. The Change Controller for the initial entries is the IETF. Initial Contents of "Matroska BlockAdditional Type IDs" Registry

BlockAddIDType	BlockAddIDName	Reference
0	Use BlockAddIDValue	This document,
1	Opaque data	This document,
4	ITU T.35 metadata	This document,
121	SMPTE ST 12-1 timecode	This document,
0x61766345	Dolby Vision enhancement-layer AVC configuration	This document,
0x68766345	Dolby Vision enhancement-layer HEVC configuration	This document,
0x64766343	Dolby Vision configuration dvcC	This document,
0x64767643	Dolby Vision configuration dvvC	This document,
0x64767743	Dolby Vision configuration dvwC	This document,
0x6D766343	MVC configuration	This document,
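
The large hexadecimal values in the table above spell four ASCII characters when written as a big-endian 32-bit integer; for instance, 0x64766343 reads as "dvcC", which matches the name given in its table row. The following non-normative sketch renders a BlockAddIDType that way when all four bytes are printable, and as a decimal number otherwise. The function name `block_add_id_type_label` is hypothetical.

```python
def block_add_id_type_label(value: int) -> str:
    """Render a BlockAddIDType as four ASCII characters when possible."""
    if not 0 <= value <= 0xFFFFFFFF:
        # Values outside 32 bits cannot be a four-character code.
        return str(value)
    raw = value.to_bytes(4, "big")
    # Only treat the value as a four-character code if every byte is a
    # printable, non-space ASCII character.
    if all(0x20 < b < 0x7F for b in raw):
        return raw.decode("ascii")
    return str(value)
```

Small values such as 4 or 121 contain zero bytes and therefore fall through to their decimal form.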