AviSynth, variable framerate (vfr) video and hybrid video

There are two kinds of video when considering framerate, constant framerate (cfr) video and variable framerate (vfr) video. For cfr video the frames have a constant duration, and for vfr video the frames have a non-constant duration. Many editing programs (including VirtualDub and AviSynth) assume that the video is cfr. One of the reasons is that avi doesn't support vfr. This won't change in the near future for various reasons. Although the avi container doesn't support vfr, there are several containers (mkv, mp4 and wmv/asf for example) which do support vfr.

Hybrid video is commonly defined as being a mix of pulled-down material and non-pulled-down material (where the pulldown can be of fields, as in standard 3:2 pulldown, or frames, as in frame duplication). It's not relevant whether the pulldown is hard (the fields/frames are duplicated before the encoding) or soft (the pulldown is done by adding the appropriate flags to the stream, which indicate which fields/frames should be duplicated during playback). So, it can be either cfr or vfr. Thus hybrid video is simply video with different base framerates (for example 8, 12, and 16 fps at which anime is often drawn). The base framerate is the rate before any pulldown. What makes hybrids challenging is the need to decide what final framerate to use.

Variable framerate and hybrid video

It's important to understand that video is usually cfr. There is one case where converting to vfr can be very useful, which is hybrid video. The most common example of hybrid video consists of parts that are interlaced/progressive NTSC (29.97 fps) and other parts that are FILM (telecined from 23.976 fps to 29.97 fps). With soft pulldown, the NTSC part (also called the video part) is played back at 29.97 fps as stored, and the FILM part is also played back at 29.97 fps, by duplicating fields during playback (to go from 23.976 fps to 29.97 fps). With hard pulldown, the duplicated fields are already encoded in the stream, so everything is played back at 29.97 fps without adding any fields.

Examples of hybrid video include many modern anime TV series, many Sci-Fi TV series (such as Stargate: SG1, Star Trek: TNG, and Babylon 5), and many of the "Making Of" documentaries included on DVD, which mix 29.97 fps interviews and photographs (either interlaced or progressive) with film clips that were originally 23.976 fps.

How to recognize vfr content (mkv/mp4)

Here are some ways to determine if the mkv/mp4 is vfr:

mkv: get the timecodes file using mkv2vfr and check whether the frame durations in it vary.

mp4: use mp4dump (from the MPEG4IP tools package). Open a DOS prompt and type (using appropriate paths):

mp4dump -verbose=2 holly_xvid.mp4 > log.txt

Open the log file, and look for output like this (look up the stts atom to figure out the length of each frame):

type stts
       version = 0 (0x00)
       flags = 0 (0x000000)
       entryCount = 41 (0x00000029)
        sampleCount = 3 (0x00000003)
        sampleDelta = 1000 (0x000003e8)
        sampleCount[1] = 1 (0x00000001)
        sampleDelta[1] = 2000 (0x000007d0)
        sampleCount[2] = 3 (0x00000003)
        sampleDelta[2] = 1000 (0x000003e8)
        sampleCount[3] = 1 (0x00000001)
        sampleDelta[3] = 2000 (0x000007d0)
        sampleCount[4] = 3 (0x00000003)
        sampleDelta[4] = 1000 (0x000003e8)
        sampleCount[5] = 1 (0x00000001)
        sampleDelta[5] = 2000 (0x000007d0)
        etc ...

sampleDelta indicates how long each frame is displayed and sampleCount tells how many consecutive frames have that length. Thus in the example above:
3 frames are displayed with length 1000
then 1 frame is displayed with length 2000
then 3 frames with length 1000
then 1 frame with length 2000
etc ...

The time values are of course not seconds, but "ticks", which you have to convert into seconds via the "timescale" value. This "timescale" is stored in the mdhd atom of the video track (make sure that you look at the right timescale for your track, because every track has its own timescale). Look for output like this:

type mdia
    type mdhd
     version = 0 (0x00)
     flags = 0 (0x000000)
     creationTime = 3197912378 (0xbe9c453a)
     modificationTime = 3197912378 (0xbe9c453a)
     timeScale = 24976 (0x00006190)
     duration = 208000 (0x00032c80)
     language = 21956 (0x55c4)
     reserved = <2 bytes> 00 00 

In this example the timeScale is 24976. Most of the frames have a length of 1000. 1000/24976 ≈ 0.04, which means each of the first 3 frames is displayed for about 0.04 seconds; this corresponds to 24976/1000 = 24.976 fps, or roughly 25 fps (1/25 = 0.04). The next frame has a length of 2000. 2000/24976 ≈ 0.08, which means that it is displayed for about 0.08 seconds, corresponding to 12.488 fps, or roughly 12.5 fps (1/12.5 = 0.08). etc ...
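The calculation above can also be done for all entries at once. The following is a minimal Python sketch (Python is used here only for illustration; it is not part of any of the tools mentioned). The (sampleCount, sampleDelta) pairs and the timeScale are typed in by hand from the dumps above, and the function names frame_durations and is_vfr are made up for this example.

```python
from fractions import Fraction

# (sampleCount, sampleDelta) pairs copied by hand from the stts dump above
stts_entries = [(3, 1000), (1, 2000), (3, 1000), (1, 2000), (3, 1000), (1, 2000)]
time_scale = 24976  # timeScale from the mdhd atom of the same track


def frame_durations(entries, time_scale):
    """Expand the run-length stts entries into one duration (in seconds) per frame."""
    durations = []
    for sample_count, sample_delta in entries:
        durations.extend([Fraction(sample_delta, time_scale)] * sample_count)
    return durations


def is_vfr(entries):
    """Treat the track as vfr when more than one distinct sampleDelta occurs."""
    return len({delta for _, delta in entries}) > 1


durations = frame_durations(stts_entries, time_scale)
for seconds in durations[:4]:
    print(f"{float(seconds):.5f} s  ->  {float(1 / seconds):.3f} fps")
print("vfr" if is_vfr(stts_entries) else "cfr")
```

For the values above, the first three frames come out at 24.976 fps (close to 25 fps) and the fourth at 12.488 fps, so the track is reported as vfr.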

The log file above comes from a video which is in fact hybrid.

Opening hybrid video (MPEG-2) in AviSynth and re-encoding

Assuming you have hybrid video, there are several ways to encode it. They are listed below. The first method is to convert it to cfr video (either 23.976 or 29.97 fps). The second one is to encode it at 120 fps using avi and dropped frames (where duplicate frames are dropped upon playback). The third one is to create true vfr using the mkv or mp4 container.

encoding to 23.976 fps or 29.97 fps

If we choose the video rate, the video sequences will be OK, but the FILM sequences will not be decimated, appearing slightly jumpy (due to the duplicated frames). On the other hand, if we choose the FILM rate, the FILM sequences will be OK, but the video sequences will be decimated, appearing jumpy (due to the "missing" frames). Additionally, when encoding to 29.97 fps, you will get lower quality for the same file size, because of the 25% greater number of frames. It's a tough decision which to choose. If the clip is mostly FILM you might choose 23.976 fps, and if the clip is mostly video you might choose 29.97 fps. The source also is a factor. If the majority of the video portions are fairly static "talking heads", for example, you might be able to decimate them to 23.976 fps without any obvious stutter on playback. When you create your d2v project file you will see whether the clip is mostly video (NTSC) or FILM (in the information box). However, many of these hybrids are encoded entirely as NTSC, with the film portions being "hard telecined" (the already telecined extra fields having also been encoded) so you'll have to examine the source carefully to determine what you have, and how you wish to treat it.

The AviSynth plugin Decomb provides two special decimation modes to better handle hybrid clips. To quote the Decomb documentation on how to use this plugin:

Mostly Film Clips (mode=3)

Let's first consider the case where the clip is mostly film. In this case, we want to decimate the film portions normally so they will be smooth. For the nonfilm portions, we want to reduce their frame rate by blend decimating each cycle of frames from 5 frames to 4 frames. Video sequences so rendered appear smoother than when they are decimated as film.

Here is a typical script to enable this mode of operation:

Telecide(order=0, guide=1)
Decimate(mode=3, threshold=1.0)

There are 2 factors that enable Decimate to treat the film and nonfilm portions appropriately. First, when Telecide declares guide=1, it is able to pass information to Decimate about which frames are derived from film and which from video. For this mechanism to work, Decimate must immediately follow Telecide. Clearly, the better job you do with pattern locking in Telecide (by tweaking parameters as required), the better job Decimate can do.

The second factor is the threshold. If a cycle of frames is seen that does not have a duplicate, then the cycle is treated as video. The threshold determines what percentage of frame difference is considered to be a duplicate. In order to choose the right threshold, it's a good idea to call Decimate with the argument show=true and visually determine how Decimate works with different threshold values. Note that threshold=0 disables the second factor.

Make sure to get the field order correct - DVDs are generally order=1, and captured video is generally order=0. The included DecombTutorial.html explains how to determine the field order.

Another IVTC filter was developed specifically to handle hybrid material without the blended frames that Decomb's mode=3 introduces: SmartDecimate. While you do get "clean" frames as a result, it may also play with slightly more stutter than Decomb's result. A typical script might go:

B = TDeint(mode=1) # or KernelBob(order=1)
SmartDecimate(24, 60, B)

In order to keep the result as smooth playing as possible, it will insert the "Smart Bobbed" frames from time to time.

Mostly Video Clips (mode=1)

Now let's consider the case where the clip is mostly video. In this case, we want to avoid decimating the video portions in order to keep playback as smooth as possible. For the film portions, we want to leave them at the video rate but change the duplicated frames into frame blends so it is not so obvious.

Here is a typical script to enable this mode of operation:

Telecide(order=0, guide=1)
Decimate(mode=1, threshold=1.0)

There are 2 factors that enable Decimate to treat the film and nonfilm portions appropriately. First, when Telecide declares guide=1, it is able to pass information to Decimate about which frames are derived from film and which from video. For this mechanism to work, Decimate must immediately follow Telecide. Clearly, the better job you do with pattern locking in Telecide (by tweaking parameters as required), the better job Decimate can do.

The second factor is the threshold. If a cycle of frames is seen that does not have a duplicate, then the cycle is treated as video. The threshold determines what percentage of frame difference is considered to be a duplicate. Note that threshold=0 disables the second factor.

encoding at 120 fps using avi with dupe frames

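You need the following utilities: DVD2AVI 1.76, MPEG2DEC.dll, FPSChk.exe, LoadPluginEx.dll, Dec60.dll and AVI60GUI.exe (each is used in the steps below).

The guide "Creating 120 fps video" (see References) describes the procedure in detail. But since it can be done more simply, the process is repeated here: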

1) Create the index (idx) file:

Put DVD2AVI 1.76 and MPEG2DEC.dll in the same directory as FPSChk.exe, or else it will crash. Start up FPSChk.exe. Unfortunately it's in Japanese, but we will guide you through it. Hit ALT+F, then hit O to open a file. Open your D2V file and you should be able to see it. Next, hit ALT+A and then S to scan the D2V for the portions which are FILM and the ones which are VIDEO. The counters on the status bar at the bottom will count down, so be patient and wait. When it is done scanning, hit ALT+F and then hit W to write the IDX file.

2) Encode a crap 29.97 fps file using dec60.dll:

Here you can use AviSynth v2.5 again (but you do need mpeg2dec.dll and dvd2avi 1.76). You need LoadPluginEx.dll to load the v2.0x plugins. Create the following script (don't forget to adjust the paths of course)

LoadPlugin("C:\Program Files\AviSynth2_temp\plugins\LoadPluginEx.dll")
LoadPlugin("C:\Program Files\AviSynth2_temp\Dec60.dll")
LoadPlugin("C:\Program Files\AviSynth2_temp\mpeg2dec.dll")

Mpeg2Source("F:\Guides\Hybrid\TNGsample.d2v")
Dec60(idxfile="F:\Guides\Hybrid\TNGsample.idx", deint=false)

Encode this script to DivX/XviD.

3) Convert crappy 29.97 fps file to 119.88 fps:

Now run AVI60GUI.exe and you will notice three fields. The top one is where you load the avi that you encoded in the Dec60 step. The second field asks where you wish to save the modified avi. The bottom field asks for your IDX file. Once all three are loaded and the 119.88 fps field is highlighted, hit enter and wait. Once it is done, voila, you've made a silky smooth 120 fps encode. (Now is the time to mux in your audio.)

This works because the frames actually encoded are at 29.97 fps, but the file is hacked in such a way that the 23.976 fps and 29.97 fps sections are given different rates of playback via dropped frames (which delay the frames by the correct number of ms). If you open the avi in VirtualDub you will see which frames will be dropped upon playback; below the frame you will see, for example: Frame 23 (0:00:00.192) [D] [43.91 kB].
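The arithmetic behind the 119.88 fps trick can be sketched as follows (in Python, for illustration only). It assumes the exact NTSC rates 120000/1001, 30000/1001 and 24000/1001 for 119.88, 29.97 and 23.976 fps, and it assumes each real frame is followed by its drop frames; the function names are made up for this example and do not describe how AVI60GUI is implemented.

```python
from fractions import Fraction

CONTAINER_FPS = Fraction(120000, 1001)  # 119.88 fps (assumed exact NTSC value)
FILM_FPS = Fraction(24000, 1001)        # 23.976 fps
VIDEO_FPS = Fraction(30000, 1001)       # 29.97 fps


def slots_per_frame(source_fps, container_fps=CONTAINER_FPS):
    """Number of 119.88 fps frame slots that one real frame occupies."""
    slots = container_fps / source_fps
    if slots.denominator != 1:
        raise ValueError("source rate does not divide the container rate evenly")
    return int(slots)


def layout(source_fps, real_frames):
    """Return 'F' for each real frame and 'D' for each drop frame that follows it."""
    n = slots_per_frame(source_fps)
    return ("F" + "D" * (n - 1)) * real_frames


print(slots_per_frame(VIDEO_FPS), layout(VIDEO_FPS, 3))  # video: 1 real + 3 drop frames
print(slots_per_frame(FILM_FPS), layout(FILM_FPS, 3))    # film:  1 real + 4 drop frames
```

Because 119.88 is a common multiple of both rates (4 x 29.97 and 5 x 23.976), each real frame can be held for a whole number of slots, which is why this rate is used.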

creating vfr video using the mkv (matroska) container

First download mkvtoolnix. We will use this to mux our video into the MKV container WITH a timecode adjustment file.

If your video will contain any out-of-order frames (b-frames), you will need the TIVTC package from Tritical's AviSynth page. In addition to that utility, if you are encoding to ASP (like DivX or XviD) you will need the MPEG4 Modifier program. Alternatively, if you are encoding to AVC (like x264) you will need MPEG4IPTools.

There are several AviSynth plugins that you can use to generate the VFR video and required timecode file. An example is given below using the Decomb521VFR plugin. Another alternative is the TDecimate plugin contained in the TIVTC package that you may have already downloaded. My personal favorite is the DeDup plugin. All these work in the same basic fashion by generating a timecode file that can later be used in mkvmerge.

Anyway, here are further details using the Decomb521VFR plugin:

Make the following script

Mpeg2Source("F:\Guides\Hybrid\TNGsample.d2v")
Decomb521VFR_Decimate(mode=4, threshold=1.0, progress=true, timecodes="F:\Guides\Hybrid\timecodes.txt", vfrstats="F:\Guides\Hybrid\video.vfrstats")
# see the Decomb521VFR documentation for details

and encode it (see note just below).

If your video will contain any out-of-order frames (b-frames), you should obtain the latest mkvmerge 1.5.5 or newer, which supports in-order timecode v2 files. You will still need to encode to MP4 (or MKV) for AVC video, but you can skip to the mux section below.

For b-frames with older versions of mkvmerge, encode to the AVI container format for ASP (divx, xvid) or MP4 container format for AVC (x264). In the latter case, if you can only generate raw x264, mux only the video stream into an MP4 container before you proceed. This is to handle a short-coming of mkvmerge. After that you need to get the frame ordering information from the AVI or MP4 container.

(More precisely, when muxing ASP files with b-frames with old versions of mkvmerge, you should only need to reorder the timecode file if you are using "--engage native_mpeg4". It should play back properly without reordering the timecode file if you mux normally (vfw compatibility mode), at least according to tritical. However, due to the various vfw workarounds for b-frames, I think the timecodes may end up being off by a few frames. Moreover, --engage native_mpeg4 has always been broken and buggy all the way up to mkvmerge 1.5.5, so it was never recommended for use. In any case, it is definitely best to let the new versions (1.5.6 or newer) of mkvmerge handle it.)

For ASP AVI use MPEG4 Modifier:

  1. Run MPEG4 Modifier
  2. Load the AVI file
  3. Click on the "Video Info" button
  4. Click on the "Write Frame List..." button
  5. Save the file as dump.txt

For AVC MP4 use the mp4dump utility from the MPEG4IPTools package:

mp4dump -verbose movie.mp4 > dump.txt

Again, only for b-frame video, you now need to reorder the timecode file to match the reordered frames. Use the AVCTcA utility included in the TIVTC package like so:

AVCTcA dump.txt timecodes.txt reordered.txt

Now, whether you used b-frames or not, it is time to mux to MKV format:

  1. From mkvtoolnix, open mmg.exe which is a gui for mkvmerge
  2. Add your video stream file
  3. Add your audio stream file
  4. Click on the imported video track
  5. Browse for the timecode file ("reordered.txt" if you reordered it above, otherwise "timecodes.txt")
  6. Click on the audio track
  7. If you used b-frames OR if your audio already needs a delay, set one now
  8. Start muxing

To play it you need a Matroska splitter. For AVC you will need Haali's Splitter, but for ASP you can use it or Gabest's Splitter.

creating vfr video using the mp4 container

It's not possible to do this in an automatic way (using a timecodes file). One way is to encode multiple cfr avi files (some with 23.976 fps FILM and some with 29.97 fps video) and join them directly into one vfr mp4 file with mp4box and the -cat option.

summary of the methods

To sum up the advantages and disadvantages of the above-mentioned methods: when encoding to 23.976 or 29.97 fps the clip will be cfr (which is good if you want to re-edit it in AviSynth/VDub, because many editors need cfr material), but it may look jumpy on playback due to duplicated or missing frames. When encoding to 120 fps using drop frames, the clip is cfr and not jumpy on playback. However, it takes a lot of effort to create those 120 fps encodings (and some of the required tools are closed source). Encoding to mkv using true vfr (using timecodes) is the only option that neither loses nor duplicates frames; however, it is not nearly as broadly supported as AVI.

creating vfr video using the mp4 container using n-vops

To create vfr video in MP4, first make an AVI with n-vops, and then transcode the AVI to MP4 (with MP4box or with the 3ivx mp4 muxer) to create a true vfr MP4 stream (the n-vop frames are dropped). GSpot can be used to check whether your AVI has n-vop frames (by the way, a clip with n-vops is still cfr). However, with this procedure there is no guarantee that all duplicated frames will be marked as n-vop frames; the codec either encodes extra frames to hit a required bitrate or misses near-dups.

This will not work with AVC as it does not support n-vops.

Opening hybrid video (non MPEG-2) in AviSynth and re-encoding

There is one way to open vfr video in AviSynth without losing sync (using DirectShowSource). If you have vfr MKV content you will have an additional way to open it (using mkv2vfr to extract a cfr avi and a timecodes file, enabling you to re-encode to mkv after processing).

opening vfr content in AviSynth

It is possible to open vfr video in AviSynth (and keep the audio in sync). One of the ways to open it is to find out the average framerate (by dividing the total number of frames by the duration in seconds) and use this rate in DirectShowSource. Thus

av_rate = ...
DirectShowSource("F:\Guides\Hybrid\vfr.mp4", fps=av_rate, convertfps=true)

Depending on the duration of a frame, frames will be added or dropped to keep sync.
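What "adding or dropping frames" means can be illustrated with a small Python sketch. This only shows the general idea of resampling vfr frames onto a constant-rate grid; it is an assumption for illustration and not the actual algorithm used by DirectShowSource. The function name to_cfr and the sample clip are made up.

```python
from bisect import bisect_right
from fractions import Fraction


def to_cfr(start_times, duration, fps):
    """For each constant-rate output frame, return the index of the source frame shown at that time.

    start_times: sorted start time of each source frame, in seconds
    duration:    total duration of the clip, in seconds
    fps:         constant output framerate
    """
    fps = Fraction(fps)
    output_count = int(duration * fps)
    picks = []
    for n in range(output_count):
        t = n / fps  # presentation time of output frame n
        picks.append(bisect_right(start_times, t) - 1)
    return picks


# Hypothetical clip: three frames at 10 fps followed by four frames at 40 fps
starts = [Fraction(k, 40) for k in (0, 4, 8, 12, 13, 14, 15)]
print(to_cfr(starts, Fraction(16, 40), 20))
```

In this example the slow (10 fps) frames are each used twice and two of the fast (40 fps) frames are skipped, while the total duration stays the same.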

encoding to cfr video using DirectShowSource

Re-encoding to 23.976 or 29.97 fps:

DirectShowSource("F:\Guides\Hybrid\vfr_startrek.mkv", fps=29.97, convertfps=true) # or fps=23.976

or

DirectShowSource("F:\Guides\Hybrid\vfr_startrek.mkv", fps=119.88, convertfps=true)
FDecimate(29.97) # or FDecimate(23.976)

(I don't know which will produce better results in theory.) This works for MP4 too.

re-encoding vfr video using DirectShowSource

The description below is an adjusted copy of a description that HeadlessCow posted in a forum thread.

0) Get vfrtools and install Java VM.

1) Grab the MultiDecimate filter and create an avs for the file you wish to encode. Make sure to trim off any junk frames you don't want before you call MultiDecimate in your script. You should end up with a file that looks something like this (no processing filters are necessary now):

DirectShowSource("F:\Guides\Hybrid\vfr_startrek.mkv", fps=119.88, convertfps=true) # or use AviSource for 120 fps avi
ConvertToYUY2() # as of writing this guide MultiDecimate requires YUY2
MultiDecimate(pass=1)

2) Open your avs in VirtualDub and play through it exactly once, then close it. You should now have an mfile.txt file in the same directory as your avs script. It will look like this:

0 0.000000
1 0.000000
2 0.000000
3 0.000000
4 0.004137
5 0.000000
...

Each line contains a framenumber and the difference to the previous frame.
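To get a feeling for what is in mfile.txt, the following Python sketch lists the frames that differ from their predecessor. It is only an illustration of the file format described above: the function name changed_frames and its threshold parameter are made up, and this is not how MultiDecimate or vfrtools decide which frames to keep.

```python
def changed_frames(mfile_text, threshold=0.0):
    """Return the frame numbers whose difference to the previous frame exceeds threshold.

    Frame 0 has no predecessor, so it is always listed.
    """
    kept = []
    for line in mfile_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or elided ("...") lines
        frame, diff = int(parts[0]), float(parts[1])
        if frame == 0 or diff > threshold:
            kept.append(frame)
    return kept


# The first lines of the mfile.txt shown above
sample = """0 0.000000
1 0.000000
2 0.000000
3 0.000000
4 0.004137
5 0.000000
"""
print(changed_frames(sample))
```

For the lines shown above only frames 0 and 4 are listed; frames 1 to 3 and 5 are exact repeats of the frame before them, as expected for a 119.88 fps stream made from 29.97 fps material.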

3) You're probably thinking, I know what's coming, run the MultiDecimate GUI...but no! This is where my program comes in.

Make sure you have a Java VM installed first. Open a dos prompt and pass vfrtools.jar the name of the mfile.txt file and the original fps of the file, probably 119.88 (after putting vfrtools.jar in the same path as your AviSynth script and mfile.txt).

java -jar vfrtools.jar mfile.txt 119.88

This will generate two files in your directory. The first is the dlist.txt file that MultiDecimate pass 2 will expect.

3 2 1 4550 970
0 0
1 4
2 8
3 12
4 16
5 20
...

The first line is bogus except for the frame counts, but the first 3 values are set to stuff that seemed "safe". (no crashes or other weirdness)

The second is the timecodes.txt file, which is in the format that mkvmerge expects.

# timecode format v1
Assume 119.88
0,297,29.97
298,968,23.976
969,969,39.96

Assume fps is set to the input framerate, which seems safe enough. From then on every frame in the file is part of some range and assigned an fps.
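How such a v1 timecode file translates into a start time for every frame can be sketched in Python as follows. This is a minimal reading of the format as described above (an "Assume" line plus "first,last,fps" ranges); the function names are made up for this example and the parser is not taken from mkvmerge.

```python
from fractions import Fraction


def parse_timecodes_v1(text):
    """Parse a 'timecode format v1' file into (assumed_fps, [(first, last, fps), ...])."""
    assumed_fps = None
    ranges = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the format header
        if line.lower().startswith("assume"):
            assumed_fps = Fraction(line.split()[1])
        else:
            first, last, fps = line.split(",")
            ranges.append((int(first), int(last), Fraction(fps)))
    if assumed_fps is None:
        raise ValueError("missing 'Assume' line")
    return assumed_fps, ranges


def frame_start_times(assumed_fps, ranges, frame_count):
    """Start time in seconds of each frame; frames outside every range use assumed_fps."""
    fps_of = {}
    for first, last, fps in ranges:
        for frame in range(first, last + 1):
            fps_of[frame] = fps
    times, now = [], Fraction(0)
    for frame in range(frame_count):
        times.append(now)
        now += 1 / fps_of.get(frame, assumed_fps)
    return times


sample = """# timecode format v1
Assume 119.88
0,297,29.97
298,968,23.976
969,969,39.96
"""
assumed, ranges = parse_timecodes_v1(sample)
times = frame_start_times(assumed, ranges, 970)
print(f"frame 298 starts at {float(times[298]):.3f} s")
print(f"frame 969 starts at {float(times[969]):.3f} s")
```

Fractions are used instead of floats so that the start times do not drift through rounding when many frame durations are added up.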

4) Switch to using the second pass of MultiDecimate and encode your video. Your avs should look something like this, but feel free to add any filtering you want to the end.

DirectShowSource("F:\Guides\Hybrid\vfr_startrek.mkv", fps=119.88, convertfps=true) # or use AviSource for 120 fps avi
ConvertToYUY2()
MultiDecimate(pass=2)

Yes, this output is gonna be at an odd fps. But it doesn't matter, we'll fix it in the muxing process.

5) Merge your video and any audio you want using mkvmerge. Just make sure you use the timecodes.txt file with your video. Make sure you use a version new enough to support this. 

Open mmg.exe which is a gui for mkvmerge: open your video, your audio (add a delay if necessary), the timecodes.txt file (which is possible if you click on the video clip) and start muxing.

To play it you need a Matroska splitter, such as Gabest's or Haali's.

(You could also use DeDup for this, although I haven't tried it. I think it also works for YV12.)

encoding to vfr video (using MPEG-2)

http://forum.doom9.org/showthread.php?t=93691

I haven't looked at it yet, so I can't give any comments/hints.

re-encoding MKV content in AviSynth using mkv2vfr

mkv2vfr extracts all video frames from Matroska to a cfr AVI file and a timecode file. You can extract the video to avi, process it with any application and mux it back to Matroska using a timecode file. If you didn't add or remove frames you can use the same timecode file; if you changed the number of frames you need to edit the timecode file by hand. mkv2vfr is a command line application. Usage:

mkv2vfr.exe xvid.mkv xvid.avi timecodes.txt

Open the avi in AviSynth (which is 23.976 or 29.97 fps; see the timecodes file):

AviSource("xvid.avi")

Instead of using the timecode file created by mkv2vfr you can also use the timecode file created by vfrtools (as explained in the section 're-encoding vfr video using DirectShowSource').

Audio synchronization

Several methods are discussed to encode your video (at 23.976, 29.97 or vfr video). You might wonder why your audio stays in sync regardless of the method you used to encode your video. Prior to encoding, the video and audio have the same duration, so they start out in sync. The following two situations might occur:

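If you encode the video stream at 23.976 or 29.97 fps (both cfr) by using Decimate(mode=3, threshold=1.0) or Decimate(mode=1, threshold=1.0), frames are removed (and the framerate lowered accordingly) or replaced by blends, so the total duration of the video does not change and thus your audio stream stays in sync. By similar reasoning the vfr encoding will be in sync: the timecodes give every frame the duration it needs, so the total duration again stays the same.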

Finally, suppose you open vfr video in AviSynth with DirectShowSource. Compare the following

DirectShowSource("F:\Guides\Hybrid\vfr_startrek.mkv", fps=29.97) # or fps=23.976

and

DirectShowSource("F:\Guides\Hybrid\vfr_startrek.mkv", fps=29.97, convertfps=true) # or fps=23.976

The former will be out of sync since the video is sped up (or slowed down), and the latter will be in sync since frames are added or dropped to convert it to cfr.
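The size of the sync error in the first case can be estimated with a few lines of Python. The clip below is hypothetical (the frame counts are chosen only for illustration) and the function names are made up; the sketch just compares the true duration of a vfr clip with the duration you get when every frame is played at one fixed rate.

```python
from fractions import Fraction


def vfr_duration(sections):
    """Total duration in seconds of a vfr clip given as (frame_count, fps) sections."""
    return sum(Fraction(count) / Fraction(fps) for count, fps in sections)


def duration_at_fixed_rate(sections, fps):
    """Duration when every frame is simply played at one fixed rate (no convertfps)."""
    total_frames = sum(count for count, _ in sections)
    return Fraction(total_frames) / Fraction(fps)


# Hypothetical clip: 298 frames of video followed by 671 frames of film
sections = [(298, "29.97"), (671, "23.976")]
audio = vfr_duration(sections)  # the audio is as long as the true video duration
forced = duration_at_fixed_rate(sections, "29.97")
print(f"true duration   : {float(audio):.3f} s")
print(f"forced 29.97 fps: {float(forced):.3f} s")
print(f"drift at the end: {float(audio - forced):.3f} s")
```

The film section is played too fast when the whole clip is forced to 29.97 fps, so the video ends before the audio does; with convertfps=true the frame count is adjusted instead and the duration is preserved.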

References

Essential reading: Force Film, IVTC, and Deinterlacing and more (an article written by some people from doom9).
Creating 120 fps video.
Documentation of Decomb521VFR.
About Decomb521VFR1.0 mod for automated Matroska VFR.
Mkvextract GUI by DarkDudae.

Besides all people who contributed to the tools mentioned in this guide, the author of this tutorial (Wilbert) would like to thank bond, manono and tritical for their useful suggestions and corrections of this tutorial.

$Date: 2005/10/03 16:31:31 $