FVL — Fourier Video Loop
♦ seamlessly loop a frame sequence by cutting the spectrum of each pixel's transformation over time ♦
Each example video is a concatenation of 2 copies of a certain frame sequence, so that the «seam», or «rupture», is exactly at the middle of the time scale. Original videos were taken from videos.pexels.com.
Now, let's try it on a video with almost no initial periodicity and with a wide range of colours.
Given a channel (R, G, B) of a pixel, its values at consecutive frames of the sequence form a function, also called a signal, f(t), which is approximated by the partial sum of its Fourier series
f(t) = a₀/2 + ∑ₖ₌₁ⁿ [ aₖ·cos(kt) + bₖ·sin(kt) ]
where the time t = tⱼ = 2πj/SignalLength belongs to the [0; 2π) interval and, like the signal f(t) itself, is discrete. As usual, n = SignalLength/2 and SignalLength = 2ʳ. Using the well-known direct/inverse Fast Fourier Transform, we switch between the signal and its spectrum, which consists of the aₖ and bₖ coefficients. Then we nullify or reduce some of them, in particular those corresponding to the high frequencies, to decrease the «leap» of f(t) at 0, which causes the «seam» of the looped video.
Filtering the spectrum of the time series of pixel values is a well-known approach to video processing and transformation, see e.g. [Fuchs2010]. It is also used to amplify subtle motions, which are hard to perceive by the naked eye, see [Wu2012], [Wadhwa2013].
Speaking of looping, we consider this approach «intermediate» in quality and simplicity between the classic «cut the video in 2 halves and append the 1st one to the 2nd one with a small crossfaded overlap» and more sophisticated methods like [Liao2013], [Liao2015], [Tralie2018].
The advantage of this approach is also its disadvantage, from another point of view. The classic method produces a video in which the overlap interval may be distinguishable from the rest, because visual perception tends to separate the two crossfaded «layers». In contrast, the spectrum cut transformation affects the whole video, which makes it harder to recognize the moment when the period ends. However, as you can see in the examples above, this transformation introduces blurring, echoes of some moving objects, and loss of «high frequency» details, making the video more static than the original.
Run the FVL binary without any options to get the most essential parts of what is given below.
FVL works with frame images in the lossless .png format, not with video files (.mp4, .avi, etc.). Therefore you first need to extract the frames from the video, and afterwards make the video from the transformed frames. For this purpose use e.g. ffmpeg. From ffmpeg's manual:
Extract frames from the video (the fragment starts at the 2nd second and lasts 10 seconds):
$ ffmpeg -ss 2 -t 10 -i /path/to/movie.mp4 -s 720x408 /path/to/frames/%06d.png
(here you apply FVL)
Make video from frames:
$ ffmpeg -framerate 24 -i /path/to/frames/%06d.png -pix_fmt yuv420p -crf 21 /path/to/fragment.mp4
As for FVL itself, the syntax of its command-line usage is:
$ ./fvl /path/to/in /path/to/out --tile <size> --cut <method> <param> --[no]delin
/path/to/in dir contains input images: 000001.png, 000002.png, etc. Their number must be a power of 2.
/path/to/out dir will contain output images with the same enumeration.
--tile (or -t) specifies the tile size: the frame rectangle is processed tile by tile, each tile no bigger than this size, to reduce memory consumption.
--cut (or -c) is followed by the cutting method and its parameter. Possible arguments are described further.
--[no]delin (or -[n]dl) switches delinearization on or off (see further).
Input and output paths are mandatory, the rest of the options may be omitted, in which case they are set to defaults: --tile 256 --cut acmax 1.5 --delin.
Spectrum cut methods
All methods can be preceded by delinearization: subtracting the linear component K × t from the signal, where K is estimated as in linear regression. This should improve the result e.g. when the pixels become increasingly brighter or darker throughout the sequence. To [not] delinearize, provide the --[no]delin option.
Each method has a single parameter, denoted by p.
• high: nullify all frequencies higher than p × HighestFrequency, where HighestFrequency = SignalLength / 2
• all: nullify all frequencies whose Amplitude < p × MaxAmplitude
• up: start from HighestFrequency and nullify them until the one whose Amplitude ≥ p × MaxAmplitude
• logall: similar to all, but compares log(1 + Amplitude²)
• logup: similar to up, but compares log(1 + Amplitude²)
• acmax: multiply Amplitude of each frequency by (Amplitude / MaxAmplitude)p
• dzhigh: start from HighestFrequency and nullify until |Derivative at 0| < p
• dzacmax: use the minimal power s such that |Derivative at 0| < p after the acmax transform with p ← s
Don't hesitate: grab the source from Download and add more efficient cut methods that you may be aware of (see spectrumcut.go). For instance, make the cutting more synchronous across the 3 pixel channels.
$ ./fvl ~/test/in ~/test/out -t 408 -c acmax 1.5 -dl
$ ./fvl ~/test/in ~/test/out -t 408 -c dzhigh 20 -dl
$ ./fvl ~/test/in ~/test/out -t 320 -c up 0.6 -ndl
Probably the most annoying limitation of the current implementation is that the number of frames must be an exact power of 2, e.g. 128, 256, 512.
As is common for «periodicity amplification» algorithms aimed at preserving structure, it is not enough just to have something to amplify in the first place. The input frame sequence should be periodic enough, on its own, where the periodicity is needed; otherwise, as the 2nd example above implies, the structure may be lost. Typical suitable subjects are water streams/waves/ripples, swaying leaves and branches of trees, and other kinds of small oscillations.
See Videos for more examples, which reveal other limitations. In general, you have to find the equilibrium between retaining a too noticeable seam and losing too many motion details.
The amount of memory, in bytes, consumed by a running instance of FVL should be approximately
Memory = TileWidth × TileHeight × Channels × NumberOfFrames
but it can occupy 2–3 times more than that due to supplemental data structures and the behaviour of the garbage collector.
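For a concrete feel of the formula, here is the estimate for the example runs above (tile 408×408, 3 channels, 256 frames — the frame count is an assumption of this sketch, since only the power-of-2 requirement is fixed), before the 2–3× overhead factor:

```go
package main

import "fmt"

func main() {
	// Memory = TileWidth × TileHeight × Channels × NumberOfFrames,
	// as in the formula above; values from the -t 408 example runs,
	// with an assumed sequence of 256 frames.
	tileW, tileH, channels, frames := 408, 408, 3, 256
	bytes := tileW * tileH * channels * frames
	fmt.Printf("≈ %d bytes (%.0f MiB) before the 2–3× overhead\n",
		bytes, float64(bytes)/(1<<20))
}
```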
All stages of the process are parallelized, so the more CPU cores you have, the faster it comes to an end. Just beware of overheating, to make sure that «it» in the previous sentence means the process, not the CPU...
References & Links
[Fuchs2010] M. Fuchs, T. Chen, O. Wang, R. Raskar, H.-P. Seidel, H.P.A. Lensch: Real-time temporal shaping of high-speed video streams, 2010, Comp. Graph. 34(5):575–584.
[Liao2013] Z. Liao, N. Joshi, H. Hoppe: Automated video looping with progressive dynamism, 2013, ACM Trans. Graph., 32(4):77:1–77:10.
[Liao2015] J. Liao, M. Finch, H. Hoppe: Fast computation of seamless video loops, 2015, ACM Trans. Graph. 34(6):197:1–197:10.
[Tralie2018] C.J. Tralie, M. Berger: Topological Eulerian Synthesis of Slow Motion Periodic Videos, 2018.
[Wadhwa2013] N. Wadhwa, M. Rubinstein, F. Durand, W.T. Freeman: Phase-Based Video Motion Processing, 2013, ACM Trans. Graph. 32(4), Article 80.
[Wu2012] H.-Y. Wu, M. Rubinstein, E. Shih, J. Guttag, F. Durand, W. Freeman: Eulerian Video Magnification for Revealing Subtle Changes in the World, 2012, ACM Trans. Graph. 31(4): 1–8.
Copyright © 2014–2019 Sunkware