Wednesday, January 9, 2013

MP4 file format

These are notes on the MP4 file format, based on the 2005 edition of ISO/IEC 14496-12 (Information technology - Coding of audio-visual objects - Part 12: ISO base media file format).

These notes do not explain each atom in detail. For detailed information, please read the ISO/IEC 14496-12 document.

General Format


In general, an MP4 file has the following structure:

ftyp
  • File type box, which denotes the MP4 media type
mdat
  • Media data box, which contains the actual AV frames
  • Within an mdat, there are chunks and samples
moov
  • Movie box, which is the container for all metadata
  • Each moov has an mvhd (movie header box)
  • It can contain N trak boxes. Each trak box contains media-specific metadata. Usually, there are 2 tracks (video and audio)
  • More importantly, it contains sample information such as stsd, stts, stsz, stsc, stco, etc.
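
As a rough illustration of the layout above, here is a small Python sketch that walks the boxes in a byte buffer. It assumes the standard box header: a 32-bit big-endian size followed by a 4-character type, where a size of 1 means a 64-bit size follows and a size of 0 means the box runs to the end of its container. The name iter_boxes is made up for this example.

```python
import struct


def iter_boxes(data, start=0, end=None):
    """Yield (box_type, payload_offset, payload_size) for each box in data[start:end].

    iter_boxes is a hypothetical helper name, not part of any library.
    """
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        # Standard header: 32-bit big-endian size, then a 4-character type.
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            if pos + 16 > end:
                break
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing container.
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed or truncated box, stop walking
        yield box_type.decode("latin-1"), pos + header, size - header
        pos += size
```

Calling it on the whole file gives the top-level boxes (ftyp, mdat, moov). Calling it again with the payload offset and end of a moov box gives that box's children, such as mvhd and trak.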

Mdat Atom

MPEG4 sample


H.264 sample


Mdat is the media data atom, which contains the video and audio frames. As you can see from the screenshot, it is separated into 2 tracks (video and audio). Each track has multiple chunks and each chunk has multiple samples. Usually, you can treat each sample as an AV frame.

The number of samples in each chunk is defined in the stsc atom (sample to chunk box) and the chunk offset is defined in the stco atom (chunk offset box).

For MPEG4 (see MPEG4 sample), the red box denotes the start code of the MPEG4 elementary stream. ISO 14496-14 states that MPEG4 media data is stored as access units, with a range of contiguous bytes for each access unit (a single access unit is the definition of a 'sample' for an MPEG-4 media stream). See 3.1.1 of that document.

For H.264 (see H.264 sample), the red box denotes the frame size (4 bytes). The blue box is the start of the frame; in this case, it is an H.264 non-IDR frame. ISO 14496-15 states that an H.264 sample needs a length field preceding each NAL unit. See 5.2.3 of that document.
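
To make the length-prefixed layout concrete, here is a minimal Python sketch that splits one H.264 sample into its NAL units. It assumes a 4-byte big-endian length field, as in the screenshot; in general the field width comes from the AVC configuration box in stsd. The function names are made up for this example.

```python
def split_nal_units(sample, length_size=4):
    """Split one length-prefixed H.264 sample into a list of NAL unit byte strings.

    length_size=4 is an assumption taken from the screenshot described above.
    """
    nals = []
    pos = 0
    while pos + length_size <= len(sample):
        # Each NAL unit is preceded by its length in big-endian byte order.
        nal_len = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        if nal_len == 0 or pos + nal_len > len(sample):
            break  # malformed or truncated sample
        nals.append(sample[pos:pos + nal_len])
        pos += nal_len
    return nals


def nal_unit_type(nal):
    """Return the NAL unit type, stored in the low 5 bits of the first byte."""
    return nal[0] & 0x1F
```

A type of 1 is a non-IDR slice and a type of 5 is an IDR slice, so the blue box in the screenshot would correspond to a first byte whose low 5 bits equal 1.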

STSC - Sample To Chunk Box


The stsc tells you the number of samples in a chunk. To read it, you need to read the first chunk and samples per chunk fields together. In the screenshot, first chunk has the values 1, 3, 5, 6, ... and samples per chunk has the values 4, 5, 4, 5, ... This means the following:

chunks 1 - 2 have 4 samples each
chunks 3 - 4 have 5 samples each
chunk 5 has 4 samples

and so on...
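
The same reading can be sketched in Python. This is a simplified illustration: each entry is treated as a (first chunk, samples per chunk) pair, the third field that real stsc entries carry (the sample description index) is left out, and the total chunk count of 7 in the test data is an assumed value. The function name is made up for this example.

```python
def samples_per_chunk(stsc_entries, chunk_count):
    """Expand stsc run-length entries into one sample count per chunk.

    stsc_entries: list of (first_chunk, samples_per_chunk), first_chunk is 1-based.
    chunk_count: total number of chunks in the track (normally taken from stco).
    """
    counts = []
    for i, (first_chunk, per_chunk) in enumerate(stsc_entries):
        # An entry applies until the next entry's first_chunk (exclusive),
        # or until the last chunk of the track for the final entry.
        if i + 1 < len(stsc_entries):
            next_first = stsc_entries[i + 1][0]
        else:
            next_first = chunk_count + 1
        counts.extend([per_chunk] * (next_first - first_chunk))
    return counts


# Values from the screenshot described above, with an assumed total of 7 chunks.
print(samples_per_chunk([(1, 4), (3, 5), (5, 4), (6, 5)], 7))
```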

STCO - Chunk Offset Box


This box tells you the location of each chunk. The offset is measured from the start of the file. In the screenshot, it has the values 1516, 4880, ...

As this is a video track, the first video chunk is located at byte offset 1516 in the file.
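
For reference, a minimal Python sketch of reading the stco payload could look like this. It assumes the payload starts right after the box header and consists of a 4-byte version/flags field, a 32-bit entry count, and then one 32-bit big-endian offset per chunk. The function name parse_stco is made up for this example.

```python
import struct


def parse_stco(payload):
    """Return the list of chunk offsets stored in an stco box payload."""
    # 4 bytes of version/flags, then a 32-bit entry count.
    _version_flags, entry_count = struct.unpack_from(">II", payload, 0)
    # Each entry is a 32-bit offset measured from the start of the file.
    return list(struct.unpack_from(">%dI" % entry_count, payload, 8))


# A hand-built payload holding the two offsets quoted above.
example = struct.pack(">II2I", 0, 2, 1516, 4880)
print(parse_stco(example))
```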

STSZ - Sample Size Box


This box tells you the size of each sample in the track. It also tells you the number of samples in the track.

If you look at the entry size, it states 2229, 529, ...

That means the first sample is 2229 bytes and the second sample is 529 bytes.
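
Putting stco, stsc and stsz together, the file offset of every sample can be worked out by starting at each chunk offset and adding up sample sizes. The following Python sketch shows the idea under simplifying assumptions: stsc entries are (first chunk, samples per chunk) pairs, and only the first two sample sizes (2229, 529) and the two chunk offsets (1516, 4880) come from the screenshots; the remaining sizes are invented for illustration. The function name is made up for this example.

```python
def sample_offsets(chunk_offsets, stsc_entries, sample_sizes):
    """Return the file offset of each sample in a track.

    chunk_offsets: list from stco (offsets from the start of the file).
    stsc_entries: list of (first_chunk, samples_per_chunk), first_chunk is 1-based.
    sample_sizes: list from stsz, one size per sample.
    """
    offsets = []
    sample_index = 0
    for chunk_number, chunk_offset in enumerate(chunk_offsets, start=1):
        # Use the last stsc entry whose first_chunk is <= this chunk number.
        per_chunk = 0
        for first_chunk, count in stsc_entries:
            if first_chunk > chunk_number:
                break
            per_chunk = count
        pos = chunk_offset
        for _ in range(per_chunk):
            if sample_index >= len(sample_sizes):
                return offsets
            offsets.append(pos)
            # Samples inside a chunk are stored back to back.
            pos += sample_sizes[sample_index]
            sample_index += 1
    return offsets


# Sizes after the first two are hypothetical.
sizes = [2229, 529, 100, 200, 300, 50, 60, 70]
print(sample_offsets([1516, 4880], [(1, 4)], sizes))
```
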

STSD - Sample Description Box


This box tells you the codec type, the initialization data, and any other information required for decoding the track.

As you can see in the screenshot, it contains an AVC configuration box. It holds the information (SPS, PPS, etc.) required for decoding this video track.
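
As an illustration, here is a Python sketch that pulls the SPS and PPS out of an AVC configuration box payload. It assumes the basic record layout from ISO 14496-15 (version, profile, compatibility and level bytes, a length-size field, then counted lists of 16-bit length-prefixed SPS and PPS entries) and ignores any extension bytes that may follow for some profiles. The function name parse_avcc and the dictionary keys are made up for this example.

```python
import struct


def parse_avcc(payload):
    """Parse the basic fields of an AVC configuration record (avcC payload)."""
    version, profile, compat, level = struct.unpack_from(">BBBB", payload, 0)
    # Low 2 bits hold (NAL length field size - 1); the upper bits are reserved.
    length_size = (payload[4] & 0x03) + 1
    # Low 5 bits hold the number of SPS entries; the upper bits are reserved.
    sps_count = payload[5] & 0x1F
    pos = 6
    sps_list = []
    for _ in range(sps_count):
        (n,) = struct.unpack_from(">H", payload, pos)
        pos += 2
        sps_list.append(payload[pos:pos + n])
        pos += n
    pps_count = payload[pos]
    pos += 1
    pps_list = []
    for _ in range(pps_count):
        (n,) = struct.unpack_from(">H", payload, pos)
        pos += 2
        pps_list.append(payload[pos:pos + n])
        pos += n
    return {
        "version": version,
        "profile": profile,
        "compatibility": compat,
        "level": level,
        "nal_length_size": length_size,
        "sps": sps_list,
        "pps": pps_list,
    }
```

The nal_length_size value is the width of the length field that precedes each NAL unit in the mdat samples of this track.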

References: ISO/IEC 14496-12, ISO/IEC 14496-14, ISO/IEC 14496-15

MP4 File Structure




MP4 is a popular multimedia container format. Many video and audio files are in this format. It should not be confused with the MPEG-4 codecs.
It is an instance of the more general format defined by ISO/IEC 14496-12:2004 (MPEG-4 Part 12: ISO base media file format), which is directly based on the QuickTime File Format. In my opinion, the ISO base media file format is over-engineered.
The MP4 file format can also be used for video streaming by using movie fragment boxes. The minimum structure for such streaming is depicted in the following figure:











http://www.tomkv.com/what-is-mp4-format.html
http://perso.telecom-paristech.fr/~concolat/MPEGFileFormats.pdf
http://hzhang.net/home/blog/mp4-file-structure

























