Is ffmpeg's "Output Stream" framerate... wrong?


Is ffmpeg's "Output Stream" framerate... wrong?

roninpawn
I develop a Python application used to conduct official timing for
speedrunning leaderboards based on automated video analysis. And I've
caught a little oversight of my own that leads me to wonder if there isn't
an oversight at the core of ffmpeg in the 'fps' reported in the output
stream.

This relates to Variable Frame Rate video, so please try to hold your
shouts of "*JUST CONVERT IT TO CFR*" until after.

---

So, I'm opening a rawvideo pipe to ffmpeg in Python (using ffmpeg-python
0.2.0 by Karl Kroening as a command-line wrapper) to receive a bitstream of
the frames in a video for analysis. It's blazing fast, btw! Used to do this
with OpenCV and it was a slog. (Not to mention OpenCV was unable to seek to
the ACTUAL frame or time requested, landing on a near-ish keyframe instead.)
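For reference, the pipe described here can be sketched with ffmpeg-python
0.2.0 roughly as below. The file path and frame size are placeholders, and
`split_frames` is a hypothetical helper of mine, not part of the library:

```python
def read_raw_frames(path, width, height):
    """Decode every frame of `path` into one flat bgr24 byte buffer."""
    import ffmpeg  # ffmpeg-python; imported here so split_frames below
                   # stays usable even without the package installed
    out, _ = (
        ffmpeg
        .input(path)
        .output('pipe:', format='rawvideo', pix_fmt='bgr24',
                s=f'{width}x{height}')
        .run(capture_stdout=True, quiet=True)
    )
    return out

def split_frames(buf, width, height):
    """Slice the raw byte stream into per-frame chunks (3 bytes/pixel)."""
    frame_size = width * height * 3
    return [buf[i:i + frame_size] for i in range(0, len(buf), frame_size)]
```

Every frame arrives as a fixed-size chunk, so frame counting is just byte
arithmetic on the buffer.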

Before accessing the file, I call ffprobe to get 'r_frame_rate,' and
use those values to identify the frames per second of the footage. And
apparently, what 'r_frame_rate' returns is the "Output" stream fps. Which
is NOT the "Input" frames per second on VFR video.

Have a look at the Input data in my console for this VFR footage,
specifically the fps under Stream#0:0:


> *Input #0*, mov,mp4,m4a,3gp,3g2,mj2, from 'I:/Downloads/Medieval 111
> IGT.mp4':
>   Metadata:
>     major_brand     : isom
>     minor_version   : 512
>     compatible_brands: isomiso2avc1mp41
>     encoder         : Lavf58.29.100
>   Duration: 00:01:23.94, start: 0.000000, bitrate: 2270 kb/s
>     *Stream #0:0*(und): Video: h264 (Main) (avc1 / 0x31637661),
> yuv420p(tv, bt709), 1280x720 [SAR 1:1 DAR 16:9], 2137 kb/s, *30.01 fps*,
> 30 tbr, 90k tbn, 60 tbc (default)


So the input is 30.01 fps. Cool. Now look at the Output stream reported.

> *Output #0*, rawvideo, to 'pipe:':
>   Metadata:
>     major_brand     : isom
>     minor_version   : 512
>     compatible_brands: isomiso2avc1mp41
>     encoder         : Lavf58.45.100
>     *Stream #0:0*: Video: rawvideo (BGR[24] / 0x18524742), bgr24, 446x344
> [SAR 1:1 DAR 223:172], q=2-31, 110465 kb/s, *30 fps*, 30 tbn, 30 tbc
> (default)


It's just straight-up 30fps. Okay! Great. So, ffmpeg is automatically
converting this VFR footage to CFR and handing it back to me at 30.00fps.
...is what I had erroneously assumed.

(lot of "buts" coming)

But my application that times events detected in the frames of the footage
counted 2301 frames between two events. And then told me that 2301 / 30fps
is 1m 16s 700ms. Which is correct! AT 30FPS ANYWAY.

But when the footage is converted to CFR, or simply mathematically measured
at 30.0fps, there are NOT 2301 frames between the two events. There are 2300
frames between those events. Because if you account for the .01 of
30.01fps, a frame must be dropped. 2301 frames / 30.01fps = 1m 16s 674ms.
And at a flat integer of 30fps, it's not 2301, but 2300 frames / 30fps that
gets you a 667ms time. As a matter of approximation, a frame must be
discarded to stay as accurate as possible while squeezing the footage down
to a flat 30fps.
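That arithmetic can be worked through in a few lines of pure Python (frame
counts and rates taken from this thread; nothing here is ffmpeg-specific):

```python
frames = 2301

t_nominal = frames / 30.0    # treat the footage as flat 30 fps
t_actual = frames / 30.01    # use the true average rate of the source
print(f'{t_nominal:.3f}s vs {t_actual:.3f}s')   # 76.700s vs 76.674s

# Conforming 30.01 fps to flat 30 fps over ~76.7 s discards roughly
# 0.01 * 76.7 / 30 * 30 ~= 0.77 of a frame, i.e. one whole frame here:
t_conformed = (frames - 1) / 30.0
print(f'{t_conformed:.3f}s')                    # 76.667s
```

The 26 ms gap between the nominal and actual figures is exactly the timing
error at stake.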

But all this is just backstory, because I now realize that ffmpeg is NOT
altering the footage of VFR in any way, which makes the MOST sense. It's
just handing me back a bitstream of every frame in the video. Perfectly
sensible!

But while it's handing me back every frame of the 30.01fps media in the
output stream, it's also identifying that output as a flat 30fps. Which it
IS NOT. As ffmpeg clearly reports in the input stream's console log. And
yet more confusing, it seems that ffprobe's 'r_frame_rate' is returning the
output frame rate of 30 / 1, instead of whatever values it used to come up
with the 30.01 it declares for the input at runtime.

To deepen the quagmire, I have pushed VFR video through this application
before that averaged 59.96fps. And ffprobe / ffmpeg have reported back to
me a frame rate of 59.94fps instead -- snapping into the common NTSC frame
base of US-broadcast 60fps TV.

So I think, for the first time, I understand the information ffmpeg is
giving me. That's good! But I find myself wondering why this would possibly
be the intended behavior. The "output" frame rate and the response from
ffprobe are automatically snapped to the nearest entry in an internally
stored list of industry-standard framebases and reported as the fps, when
that is definitely not the true rate. There would be no need to "snap" to
these values if they were already correct. And because the output tends to
report round integer values of decisive framerates, it led me to the
conclusion that ffmpeg was doing some automatic magic on VFR to present it
back to me as CFR. Which it is not doing.

AND AND, because the frame rate ffprobe returns is this willfully incorrect
industry-standard framebase data, any calculations done USING that value
will be decidedly more wrong than they would be when using the VFR frame
rate echoed in the INPUT stream. So my question is:

Why?

What is the virtue or benefit of this mis-reporting behavior of the
output-frame rate? If it's a convenience feature to report the nearest
industry standard frame rate without another developer having to maintain
their own list, that's a good idea! But I can't see why it would be
reported as the output frames per second on console, when the output
stream's frames are being delivered at a different frame rate. The console
representation led me, as a developer, to draw the most obvious conclusion
from an application I trust: That if ffmpeg is telling me this output
stream is 30fps, the output frames of this stream must be at 30fps. I
assumed there was some automatic transcoding applied to VFR, or internal
frame-drop / add happening to conform the input to the figure presented on
the console. And I scratched my head for a long time wondering why I never
got a float value back from 'r_frame_rate' when handing it VFR.

So, I'm asking for confirmation that I'm understanding this right. And it's
not a rhetorical question when I ask: why does ffmpeg intentionally
mis-report the best-known frame rate figure in the output stream? Is there
a reason?
Does it make more sense when you aren't pulling a raw bytestream or
something? Also, I presume I'll find in the documentation a value other
than 'r_frame_rate' where I can poll the actual frame rate - the INPUT
frame rate - instead of this snapped and conformed one. Feel free to save
me the search if you know, though!

Thanks in advance for replies.
-Roninpawn
_______________________________________________
ffmpeg-user mailing list
[hidden email]
https://ffmpeg.org/mailman/listinfo/ffmpeg-user

To unsubscribe, visit link above, or email
[hidden email] with subject "unsubscribe".

Re: Is ffmpeg's "Output Stream" framerate... wrong?

Gyan Doshi-2


On 2021-03-16 07:21, roninpawn wrote:

> I develop a Python application used to conduct official timing for
> speedrunning leaderboards based on automated video analysis. And I've
> caught a little oversight of my own that leads me to wonder if there isn't
> an oversight at the core of ffmpeg in the 'fps' reported in the output
> stream.
>
> [...]
>
> AND AND, because the frame rate ffprobe returns is this willfully incorrect
> industry-standard framebase data, any calculations done USING that value
> will be decidedly more wrong than they would be when using the VFR frame
> rate echoed in the INPUT stream. So my question is:
>
> Why?

Timestamps are expressed in units of time_base. The reciprocal of the
time_base is reported as the tbn value for that stream. Your input has a
tbn of 90000; the output actually has no timestamps (rawvideo), but for
processing the timebase is assigned as the reciprocal of the output
framerate, here 30. Imagine two stopwatches, one with a millisecond
display and one limited to whole seconds: the average lap time calculated
from each is liable to differ. Same here: since the timekeeping resolution
is much lower, the output (nominal) timestamps shift a bit, affecting the
fps as well.

Regards,
Gyan
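Gyan's stopwatch analogy can be sketched in code. This is an illustrative
reimplementation of the rescaling idea, not FFmpeg's actual internals, using
the tbn values from the console logs in this thread:

```python
from fractions import Fraction

def rescale(ts, tb_in, tb_out):
    # Convert an integer timestamp between time bases, rounding to the
    # nearest representable tick (the same idea as FFmpeg's av_rescale_q).
    return round(ts * tb_in / tb_out)

tb_in = Fraction(1, 90000)   # input tbn from the console log above
tb_out = Fraction(1, 30)     # rawvideo output: timebase = 1/framerate

# Frame n of a 30.01 fps source lands at n/30.01 s, ~2999 ticks apart in
# the 90k base. Rescaled to 1/30, those intervals quantize to whole ticks.
pts_in = [round(n * Fraction(9000000, 3001)) for n in range(5)]
pts_out = [rescale(p, tb_in, tb_out) for p in pts_in]
print(pts_out)  # [0, 1, 2, 3, 4] -- indistinguishable from flat 30 fps
```

At the coarse 1/30 resolution every frame looks exactly 1/30 s apart, which
is one way to read the "30 fps" on the output line.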

Re: Is ffmpeg's "Output Stream" framerate... wrong?

Carl Zwanzig
On 3/15/2021 6:51 PM, roninpawn wrote:
> I develop a Python application used to conduct official timing for
> speedrunning leaderboards based on automated video analysis.

Instead of relying on output frame rate, have you considered using the
Presentation Time Stamps (PTS) of the input? Those more accurately direct
when a frame should be displayed (presented) so the timing ought to be more
accurate.

(There was much discussion about PTS on this list recently.)

z!
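Carl's suggestion might look roughly like this. `frame_timestamps` and
`elapsed` are hypothetical helpers; note the ffprobe frame field name varies
by version (older builds expose `pkt_pts_time` rather than
`best_effort_timestamp_time`):

```python
import subprocess

def frame_timestamps(path):
    """Return every video frame's presentation time, in seconds."""
    out = subprocess.run(
        ['ffprobe', '-v', 'error', '-select_streams', 'v:0',
         '-show_entries', 'frame=best_effort_timestamp_time',
         '-of', 'csv=p=0', path],
        capture_output=True, text=True, check=True).stdout
    return [float(line) for line in out.split() if line]

def elapsed(timestamps, first_frame, last_frame):
    """Wall-clock seconds between two detected events, by frame index."""
    return timestamps[last_frame] - timestamps[first_frame]
```

Because each frame carries its own timestamp, this sidesteps the nominal
frame rate entirely and stays exact on VFR footage.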

Re: Is ffmpeg's "Output Stream" framerate... wrong?

roninpawn
My fix was - as expected - as simple as finding the correct variable to
poll with ffprobe:

> *'r_frame_rate' is "the lowest framerate with which all timestamps can be
> represented accurately (it is the least common multiple of all framerates
> in the stream)."*
>
> *'avg_frame_rate' is just that: the average rate - total # of frames /
> total duration.*

Switching to 'avg_frame_rate' gets me the correct rate to calculate against
- bing, bosh. VFR timing works as expected.
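For anyone landing here later, the fix can look like this. `probe_rates`
and `parse_rate` are hypothetical helpers of mine; the only real points are
the ffprobe invocation and that rates come back as 'num/den' strings:

```python
import subprocess
from fractions import Fraction

def parse_rate(rational):
    """ffprobe reports rates as 'num/den' strings, e.g. '30000/1001'."""
    return Fraction(rational)

def probe_rates(path):
    """Return (r_frame_rate, avg_frame_rate) for the first video stream.

    ffprobe's default writer prints stream fields in its own fixed order,
    with r_frame_rate ahead of avg_frame_rate.
    """
    out = subprocess.run(
        ['ffprobe', '-v', 'error', '-select_streams', 'v:0',
         '-show_entries', 'stream=r_frame_rate,avg_frame_rate',
         '-of', 'default=noprint_wrappers=1:nokey=1', path],
        capture_output=True, text=True, check=True).stdout.split()
    return parse_rate(out[0]), parse_rate(out[1])
```

On the clip in this thread, r_frame_rate comes back as 30/1 while
avg_frame_rate works out to roughly 30.0119 - the figure to divide frame
counts by.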

---
But I am still trying to argue here that the console output of ffmpeg,
insofar as I understand it, is just plain *INCORRECT*. The output stream
in this instance is 'rawvideo.' There is no frame drop/duplication enabled
or happening as far as I know -- and the frame count of the test I
originally described in this thread supports that. The source footage is
VFR with an average frame rate that calculates to 30.011923688394276fps.
When I open a stream to ffmpeg, what I receive from the output stream is
every single one of that source's frames. Literally nothing has changed
from input to output in this operation to alter the frame base/rate or
number of frames I will receive. (as far as I am aware)

Despite the equivalence of INPUT and OUTPUT here, the output stream is
loudly declaring itself 30fps. *Which it is not*. That's just not at all
correct. And the value '30fps' occupies the same position in the stream
metadata where the source input frame rate was displayed. Which
communicates clearly that on the input this was 30.01fps -- but the output
is now 30fps flat. A purely FALSE declaration that misled me into thinking
that both 'r_frame_rate' and the actual output stream were converting VFR
to the nearest standard timebase by some internal magic. Trusting the
console output resulted in a bug in my code and subsequent timing
inaccuracies... because the console output is not correct.

All that said... After glancing at some superuser threads, it looks like
this output stream frame rate is accurate / helpful in other circumstances,
like when altering the frame rate of the output with vsync declarations, or
when full-on transcoding. And it wouldn't make any difference when you've
got a CFR source at an industry-standard frame base. But in this use case
of rawvideo output from a VFR source, that fps figure is nothing but a
*wrong number* in an important place, misleading anyone who looks at it.
Which, to my mind, makes it a bug that wants some error-trapping.

Am I wrong?
-Roninpawn

On Tue, Mar 16, 2021 at 12:56 PM Carl Zwanzig <[hidden email]> wrote:

> Instead of relying on output frame rate, have you considered using the
> Presentation Time Stamps (PTS) of the input? Those more accurately direct
> when a frame should be displayed (presented) so the timing ought to be
> more accurate.
>
> [...]



--
Roninpawn on YouTube (http://www.youtube.com/user/roninpawn)