Unable to sync audio and video


Unable to sync audio and video

Livio Tenze
Dear group,

I am trying to write code that encodes (in real time) a stream with audio and
video in a Linux environment.
- The video source is read with av_read_frame after opening /dev/videoX.
The PTS values relative to the first packet seem reasonable.
- The audio source comes from ALSA; here I see really strange behaviour. The
PTS of the second packet, relative to the first, seems to be "delayed" by
anywhere from 0.4 s to 1.8 s. I don't understand this behaviour: I measured
the wall-clock time from start to the second packet, and the elapsed time
cannot justify the gap (milliseconds, not seconds).
- Finally, when I write audio and video to the output MP4 stream (I use
libx264 for video and libfdk_aac for audio), the audio stream is delayed.

I don't know what to check next. Please give me suggestions on how to solve
this issue.

Thanks in advance and best regards.
Livius
_______________________________________________
ffmpeg-user mailing list
[hidden email]
https://ffmpeg.org/mailman/listinfo/ffmpeg-user

To unsubscribe, visit link above, or email
[hidden email] with subject "unsubscribe".

Re: Unable to sync audio and video

Nicolas George
Livio Tenze (12021-02-15):
> -The audio source comes from alsa: here I see really strange behaviour. The
> pts value of the second packet with respect to the first pts packet seems
> to be "delayed" from 0.4s to 1.8s. I don't understand this behaviour: I
> checked the time from start to the second packet and the elapsed time
> cannot be justified (ms and not s).

Can you observe the same phenomenon using ffprobe?

ffprobe -f alsa -i default -show_packets -of compact | head -n 50 | less -S

would be a good way to look.

Do you observe warnings on the console?

How does your application handle parallelism between encoding and
reading from devices?

> -Finally, when I write audio and video in the output MP4 stream (I use
> libx264 and libfdk_aac for video and audio respectively) the audio stream
> is delayed.

Do the timestamps between audio and video match before encoding?

Regards,

--
  Nicolas George

Re: Unable to sync audio and video

Livio Tenze
Thank you for your answer.

On Mon, Feb 15, 2021 at 7:13 PM Nicolas George <[hidden email]> wrote:

> Livio Tenze (12021-02-15):
> > -The audio source comes from alsa: here I see really strange behaviour.
> The
> > pts value of the second packet with respect to the first pts packet seems
> > to be "delayed" from 0.4s to 1.8s. I don't understand this behaviour: I
> > checked the time from start to the second packet and the elapsed time
> > cannot be justified (ms and not s).
>
> Can you observe the same phenomenon using ffprobe?
>
> ffprobe -f alsa -i default -show_packets -of compact | head -n 50 | less -S
>
> would be a good way to look.
>
> Do you observe warnings on the console?
>

Ok, I will check.

>
> How does your application handle parallelism between encoding and
> reading from devices?
>

At the moment I have only one thread (the main one, as in doc/examples).
The process I implemented is the following:
0) initialize the output PTS values (one for video and another for audio);
1) compare the video PTS with the audio PTS;
2) if ptsvideo > ptsaudio, decode one video frame and encode it into the
output stream; otherwise decode one audio frame and encode it into the
output stream;
3) go back to step 1.

This is a chunk of my code:

    AVRational video_timebase = out_video_stream->time_base;
    AVRational audio_timebase = out_audio_stream->time_base;

    if (av_compare_ts(audio_pts, audio_timebase,
                      video_pts, video_timebase) < 0) {
        if (decodeAudio(picture))
            encode_audio(picture, audio_pts, out_audio_stream);
    } else {
        // https://stackoverflow.com/questions/49280566/ffmpeg-c-convert-compress-a-single-image-out-of-buffer
        // https://stackoverflow.com/questions/49446335/ffmpeg-h264-encode-each-single-image
        if (decodeVideo(picture, (picture2) ? &picture2 : NULL))
            encode_video(picture, picture2, video_pts, out_video_stream);
    }



> > -Finally, when I write audio and video in the output MP4 stream (I use
> > libx264 and libfdk_aac for video and audio respectively) the audio stream
> > is delayed.
>
> Do the timestamps between audio and video match before encoding?
>

Yes, please see above. I use two PTS values: one for audio and one for
video. The PTS values from the input streams differ from those of the
output streams. I measure the distance between the first packet.pts and the
current one: here I saw that strange behaviour where the audio elapsed time
is very large compared with the video input stream.
Do you think this behaviour is due to the lack of multithreading? I found
the above flowchart in doc/examples, where no threads are used.

>
> Regards,
>
> --
>   Nicolas George
>

Many thanks for your help.
Livius



Re: Unable to sync audio and video

Nicolas George
Livio Tenze (12021-02-16):
> At the moment I have only one thread (the main one, as in doc/examples).
> The process I implemented is the following:
> 0) Initialize PTS output frame (one PTS for video and another for audio)
> 1) check PTS difference between PTS video and audio
> 2) if ptsvideo>ptsaudio then decode one video frame, and encode it in the
> output stream
>     otherwise decode one audio frame and encode it into the output stream.
> 3) go to point 1

That should work, provided you checked that your timestamps relate to
the same origin. If some timestamps relate to the system boot and some
to 1970-01-01, you will get a desync.

Plus, if the capture did not start at the same time, you will get extra
frames at the beginning of a stream, and it is possible that some
players will not catch up or catch up slowly. It would probably be more
reliable to discard frames captured before the first frame of the other
stream.

> > Do the timestamps between audio and video match before encoding?
> Yes, please check above. I use two PTS values: one for audio and one for
> video. The PTS from the input stream differ from the ones of the output
> stream. I check the distance between the first packet.pts and the current:
> here I saw that strange behaviour where the audio elapsed time is very high
> with respect to the video input stream.

All you describe here is timestamp consistency within each stream. I was
asking about timestamp consistency between the streams.

> Do you think that this behaviour is due to missing multithread? I found the

Yes. Without parallelism for the capture, the codec initialization could
take so much time as to cause a buffer overrun in one of the capture
drivers.

> above flowchart in doc/examples where no threads are used.

It is entirely possible this flowchart is not up to date, or chose to
gloss over implementation details that were not deemed relevant to the
information expressed. The command-line tool ffmpeg uses threads for
inputs as soon as there is more than one.

Regards,

--
  Nicolas George


Re: Unable to sync audio and video

Livio Tenze
On Tue, Feb 16, 2021 at 11:09 AM Nicolas George <[hidden email]> wrote:

> Livio Tenze (12021-02-16):
> > At the moment I have only one thread (the main one, as in doc/examples).
> > The process I implemented is the following:
> > 0) Initialize PTS output frame (one PTS for video and another for audio)
> > 1) check PTS difference between PTS video and audio
> > 2) if ptsvideo>ptsaudio then decode one video frame, and encode it in the
> > output stream
> >     otherwise decode one audio frame and encode it into the output stream.
> > 3) go to point 1
>
> That should work, provided you checked that your timestamps relate to
> the same origin. If some timestamps relate to the system boot and some
> to 1970-01-01, you will get a desync.
>

The timestamps I am currently using come from the pts of the AVPacket
packets: I use the first packet's PTS as the reference. Is that the right
approach for syncing?


> Plus, if the capture did not start at the same time, you will get extra
> frames at the beginning of a stream, and it is possible that some
> players will not catch up or catch up slowly. It would probably be more
> reliable to discard frames captured before the first frame of the other
> stream.
>

I haven't found information about this issue: does av_read_frame always
return the latest acquired packet, or does it return a buffered one? The
question matters for real-time acquisition.

>
>
> > Do you think that this behaviour is due to missing multithread? I found
> the
>
> Yes. Without parallelism for the capture, the codec initialization could
> take so much time as to cause a buffer overrun in one of the capture
> drivers.
>
Ok, thank you for this suggestion. Do you suggest using one thread for
every source and one thread for encoding? Is that a good approach in your
opinion?

Thanks a lot!
Livius

Re: Unable to sync audio and video

Nicolas George
Livio Tenze (12021-02-17):
> > That should work, provided you checked that your timestamps relate to
> > the same origin. If some timestamps relate to the system boot and some
> > to 1970-01-01, you will get a desync.
>
> The timestamp I am currently using is related to the pts obtained from the
> AVPacket packets: I use the first PTS packet as reference. Is it a right
> approach for syncing?

It is the only right approach. But you have not answered the question:
have you checked that the timestamps of both streams are relative to the
same origin?

> I haven't found info about this issue: does the av_read_frame call return
> always the latest acquired packet or does it return a buffered packet? I
> haven't found this info. The question is related to real-time acquisition.

av_read_frame() will not skip packets. A device driver may skip data,
but you should try to avoid it at all costs.

> Ok, thank you for this suggestion. Do you suggest to use one thread for
> every source and one thread for encoding? Is it a good approach in your
> opinion?

For devices, running each in its own thread is probably the most
reliable solution. At least until we have a proper event loop.

Regards,

--
  Nicolas George


Re: Unable to sync audio and video

Livio Tenze
Hi

On Wed, Feb 17, 2021 at 12:44 PM Nicolas George <[hidden email]> wrote:

> Livio Tenze (12021-02-17):
> > > That should work, provided you checked that your timestamps relate to
> > > the same origin. If some timestamps relate to the system boot and some
> > > to 1970-01-01, you will get a desync.
> >
> > The timestamp I am currently using is related to the pts obtained from
> the
> > AVPacket packets: I use the first PTS packet as reference. Is it a right
> > approach for syncing?
>
> It is the only right approach. But you have not answered the question:
> have you checked that the timestamps of both streams are relative to the
> same origin?
>

No, they are not, because the audio and video streams come from two
different sources: a webcam and an external microphone. The starting PTS
values are different for the audio and video sources. How should I get the
same origin with this configuration? Please suggest how to handle this case.

>
> > I haven't found info about this issue: does the av_read_frame call return
> > always the latest acquired packet or does it return a buffered packet? I
> > haven't found this info. The question is related to real-time
> acquisition.
>
> av_read_frame() will not skip packets. A device driver may skip data,
> but you should try to avoid it at all costs.
>

Ok!

>
> > Ok, thank you for this suggestion. Do you suggest to use one thread for
> > every source and one thread for encoding? Is it a good approach in your
> > opinion?
>
> For devices, running each in its own thread is probably the most
> reliable solution. At least until we have a proper event loop.
>

Ok!

Thanks again!
Livius

Re: Unable to sync audio and video

Nicolas George
Livio Tenze (12021-02-17):
> No, it is not, because the audio and the video streams come from two
> different sources: one webcam and an external microphone. The starting PTS

Different sources do not mean different timestamp origins. In fact, you
NEED the same timestamp origin if you want to sync.

> values are different for two audio and video sources. How should I  get the
> same origin with this configuration? Please suggest how to treat this case.

Ideally, the documentation of the device you are using should be stating
it. For example:

http://ffmpeg.org/ffmpeg-all.html#video4linux2_002c-v4l2

"Depending on the kernel version and configuration, the timestamps may
be derived from the real time clock (origin at the Unix Epoch) or the
monotonic clock (origin usually at boot time, unaffected by NTP or
manual changes to the clock)."

Unfortunately this is rather the exception than the norm.

If the documentation does not say it, then you need to experiment:
actually look at the timestamps, look at the actual time when you
capture, do some arithmetic to find the origin, and try to guess what it
corresponds to.

And then propose a patch to document the origin of timestamps for that
particular device.

Regards,

--
  Nicolas George


Re: Unable to sync audio and video

Livio Tenze
Dear Nicolas,

I am following your suggestion: I added one thread to acquire the audio
stream. Currently I have a thread only for the audio stream; I will also
implement threads for video acquisition.
Then I see a strange behaviour:
===============================================
Crop frame system is 1
Dummy read one frame
VIDEO PTS START from video1
AUDIO: void AudioInputThread::startLoop()
AUDIO: audioThread started NOW!
PTS START video: 83565572 audio: -1
stream v:1/15360 a:1/22050
AUDIO: audio: 0.000 0.185759 = 0.186 1614099985303395
set record for 15.00 minutes
AUDIO: head 0, tail 1
start recording
AUDIO: audio: 1.840 0.185759 = 2.026 1614099987143524
AUDIO: head 0, tail 2
AUDIO: audio: 2.068 0.185759 = 2.253 1614099987371002
AUDIO: head 0, tail 3
AUDIO: audio: 1.735 0.185759 = 1.920 1614099987037933
AUDIO: head 0, tail 4
AUDIO: audio: 1.476 0.185759 = 1.662 1614099986779509
AUDIO: head 0, tail 5
AUDIO: audio: 1.293 0.185759 = 1.479 1614099986596620
AUDIO: head 0, tail 6
AUDIO: audio: 1.116 0.185759 = 1.302 1614099986419398
AUDIO: head 0, tail 7
video1: 0.032 83597620 33333
>>> video timer0 0
>>> video timer1 20
===============================================
Please look at the lines starting with the "AUDIO:" tag.
This is a log of my application. I notice that the estimated audio position
(estimated relative to the first received PTS) starts by increasing, then
decreases, and, after some packets, the behaviour becomes monotonic and
correct. Moreover, the distance between the starting PTS and the second one
is almost 2 seconds.
Do you have any suggestions about this strange behaviour?

Thanks again.
Livius

On Wed, Feb 17, 2021 at 2:20 PM Nicolas George <[hidden email]> wrote:

> Livio Tenze (12021-02-17):
> > No, it is not, because the audio and the video streams come from two
> > different sources: one webcam and an external microphone. The starting
> PTS
>
> Different sources do not mean different timestamps origins. In fact, you
> NEED the same timestamps origin if you want to sync.
>
> > values are different for two audio and video sources. How should I  get
> the
> > same origin with this configuration? Please suggest how to treat this
> case.
>
> Ideally, the documentation of the device you are using should be stating
> it. For example:
>
> http://ffmpeg.org/ffmpeg-all.html#video4linux2_002c-v4l2
>
> "Depending on the kernel version and configuration, the timestamps may
> be derived from the real time clock (origin at the Unix Epoch) or the
> monotonic clock (origin usually at boot time, unaffected by NTP or
> manual changes to the clock)."
>
> Unfortunately this is rather the exception than the norm.
>
> If the documentation does not say it, then you need to experiment:
> actually look at the timestamps, look at at the actual time when you
> capture, do some arithmetic to find the origin, and try to guess what it
> corresponds to.
>
> And then propose a patch to document the origin of timestamps for that
> particular device.
>
> Regards,
>
> --
>   Nicolas George

Re: Unable to sync audio and video

Nicolas George
Livio Tenze (12021-02-23):

> I am following your suggestion. I added one thread to acquire the audio
> stream. Currently I have only the thread for the audio stream; I will
> implement threads also for video acquisition.
> The see a strange behaviour:
> ===============================================
> Crop frame system is 1
> Dummy read one frame
> VIDEO PTS START from video1
> AUDIO: void AudioInputThread::startLoop()
> AUDIO: audioThread started NOW!
> PTS START video: 83565572 audio: -1
> stream v:1/15360 a:1/22050
> AUDIO: audio: 0.000 0.185759 = 0.186 1614099985303395
> set record for 15.00 minutes
> AUDIO: head 0, tail 1
> start recording
> AUDIO: audio: 1.840 0.185759 = 2.026 1614099987143524
> AUDIO: head 0, tail 2
> AUDIO: audio: 2.068 0.185759 = 2.253 1614099987371002
> AUDIO: head 0, tail 3
> AUDIO: audio: 1.735 0.185759 = 1.920 1614099987037933
> AUDIO: head 0, tail 4
> AUDIO: audio: 1.476 0.185759 = 1.662 1614099986779509
> AUDIO: head 0, tail 5
> AUDIO: audio: 1.293 0.185759 = 1.479 1614099986596620
> AUDIO: head 0, tail 6
> AUDIO: audio: 1.116 0.185759 = 1.302 1614099986419398
> AUDIO: head 0, tail 7
> video1: 0.032 83597620 33333
> >>> video timer0 0
> >>> video timer1 20
> ===============================================
> Please, take into account lines starting with "AUDIO: " tag.
> This is a log of my application. I notice that the audio estimated position
> (it is estimated taking into account the first received pts) starts
> increasing, then decreases, and, after some packets, the behaviour is
> monotonic and right. Moreover, the distance from the starting PTS and the
> second one is almost 2 seconds.
> Have you any suggestions about this strange behaviour?

I am sorry, but I do not think I can help with that: I have no idea what
these numbers are supposed to mean.

In the meantime, I notice that you have not checked if you had buffer
overruns in ALSA, and that you have not determined the origin of your
timestamps.

Regards,

--
  Nicolas George
