ffmpeg basic concept

Xiao Qiang Lv4

ffmpeg java demo

  • What is ffmpeg
  • High frequency using library
    • libavcodec, contains all encoders and decoders, which is used to encode or decode the resource you input
    • libavformat, deal with the muxers or demuxers for various container. container is a file which is used to save your video, audio streams, and anyclose what is muxer and demuxer in pic?
    • libavfilter, used to modify audio or video
    • libavdevice, used to support different input and output devices
    • libavutil,
    • libwscale,
    • libswreasample
  • ffmpeg package tools
    • ffmpeg, itself is a command line tool which is used to process the video or audio resource
    • ffplay, which is a minimal video player, you can use it play a large wide of video format
    • ffprobe, which is used to extract video or audio information

Media Concepts

  • image
    • pixel, is a 2D point, has color RGB or YUV, alpha value transparency
    • resolution, the quality of the image of the number of pixel of the image
    • Aspect Ratio, the radio of the width and height
  • Audio
    • smallest digital representation of a sound
    • 8/16/24/32 bit value, more bits means high quality video source
    • audio frequency, how many samples during the per second, the higher, the higher quality
    • audio channel, which means that several audios will be played together
      • for example, one audio channel is named ‘Mono’, two is named ‘Stereo’ and both of them are called channel layouts
    • audio track, organize different audio channels
      • for example, an audio can obtain multilingual channel, an English channel or a French chanel, and also it can take a background chanel, some of them can be mixed
  • Video
    • sequence of images, has the time duration, Each image is called a frame
    • Video frame rate(FPS), Frame per second
    • video compression, not very clear about the concept? remove redundant information, Spatial redundancy (within frame), Temporal redundancy (across frame). How to make it?
  • Codec & Container
    the overview of Codec and Container
    • Codec
      • encode, because video is a very big file, if we transfer it directly, which is an impractical idea. and decode is used to play the video
      • some encode algorithm
        • video, H.264, H.265, VP9, Prores, DNxHd,
        • audio, PCM, AAC, MP3
        • the benefits of different codec algorithm
    • Container
      • use to package or wrapper a media file
      • format
        • video, MP4, MXF, QT/MOV, MKV
        • audio, WAV, M4A
  • Transcoding
    • transfer one codec to another one
    • why we need transcode, because it depends on different situation, sometimes, we need take care of quality, being smaller during the transporting in the internet
    • using scenario
      • Transmuxing, From one container to another
      • thumbnail generation, preview image, search hit, hover-scrubbing on the web page, poster frame
      • Frame rate conversion
        • support different TV standards
        • higher FPS: preserve slow motion quality for editing
        • lower FPS: playback or streaming
      • Bitrate conversion
        • quality the higher bitrate the higher quality
        • smaller storage space
        • support different network bandwidth
      • change GOP size
        • good for editing based on I-frame (keyframe) only, 保持一个关键帧,然后可以继续操作,就可以得到动画
        • good for compression and streaming based on long GOP
      • overlay
        • logo ….
      • subtitle/capture
      • timecode
      • covert video resolution
      • Audio volume adjustment
      • Audio mixing eg: mix the person speaking voice and the background voice
      • audio resampling, changing the frequency

How to setup it?

  • how to get it?
    • No official builds
    • Build from source code, download it from github repo
    • Download Pre-build packages, Use package management tool, like homebrew and APT
  • support OS
    • macOS brew install ffmpeg
    • linux
    • Windows

How to use its sub-component tool

  • ffprobe
    • When you install the ffmpeg in brew, you can use ffmpeg, ffmprobe or ffmplay directly. For the part, you will use ffmprobe to analyze the video or audio resource.
    • Input the following command, ffprobe xxxx.mp4, you will get the following result with log.image * If you don't need any log, you can input the following command, `ffprobe -v error bunny_1080p_60fps.mp4 -show_format`, you will retrieve the whole video format details
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      19
      20
      21
      [FORMAT]
      filename=bunny_1080p_60fps.mp4
      nb_streams=3
      nb_programs=0
      format_name=mov,mp4,m4a,3gp,3g2,mj2
      format_long_name=QuickTime / MOV
      start_time=0.000000
      duration=634.533333
      size=355856562
      bit_rate=4486529
      probe_score=100
      TAG:major_brand=isom
      TAG:minor_version=1
      TAG:compatible_brands=isomavc1
      TAG:creation_time=2013-12-16T17:59:32.000000Z
      TAG:title=Big Buck Bunny, Sunflower version
      TAG:artist=Blender Foundation 2008, Janus Bager Kristensen 2013
      TAG:comment=Creative Commons Attribution 3.0 - http://bbb3d.renderfarming.net
      TAG:genre=Animation
      TAG:composer=Sacha Goedegebure
      [/FORMAT]
      * If you want to know the details of the stream, you can input `ffprobe -v error bunny_1080p_60fps.mp4 -show_format -show_streams` * with parameter `-print_format`, you can only **format** output csv or json
      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      13
      14
      15
      {
      "programs": [],
      "streams": [
      {
      "codec_name": "h264"
      },
      {
      "codec_name": "mp3"
      },
      {
      "codec_name": "ac3",
      "side_data_list": [{}]
      }
      ]
      }
      * with parameter `-show_stream -select-streams v`, only output video stream
      1
      2
      3
      4
      5
      {
      "programs": [],
      "streams": [{
      "codec_name": "h264"}]
      }
      * with parameter `-show_entries stream=codec_name`, only output codec name, you might need to remove the other parameters like `-show_stream` or `-show_format` * with parameter `-print_format default=noprint_wrappers=1`, remove outside tag
      1
      2
      3
      codec_name=h264
      codec_name=mp3
      codec_name=ac3
      * with parameter `-print_format default=noprint_wrappers=1:nokey=1`, you can get the only value without key
      1
      2
      3
      h264
      mp3
      ac3
      * You can analyze the stream information by URL directly without downloading it to loacl machine `ffprobe -v error ` * [more details, www.ffmpeg.org/ffprobe.html](https://www.ffmpeg.org/ffprobe.html)
  • ffplay
    • with the parameter -x , -y, you can control the size of the video window
    • with the parameter -noborder, you can remove the border of the video window
    • with the parameter -top -left, you can control the position of the video window
    • with the parameter -fs, you can get the full screen video window
    • with the parameter -an, you can disable the audio
    • with the parameter vn, you can disable the video
      • tips, you can press key W to switch the show-mode
    • with the parameter -showmode waves, you can get the video wave
    • with the parameter -loop <times>, you can get looping video
    • with pressing key S, you can step through the frames of a video one frame at a time
  • ffmpeg
    • overview of how to edit media
      overview
    • details
      • input & output
        • ffmpeg -i <protocol>:<device>, file is default, you also can choose the network pattern, like http, ftp, rtp etc.
        • according to the above processing, you can run the following command like ffmpeg -i input.mov ...add some filter... output.mp4
        • with the parameter, -f extension, you can do the muxing or demux use the command. During the input process, it was used to demux. During the output process, it was used to mux.
          • ffmpeg -i input.mov ... -f ... output.extension
        • with parameter pipe:0, don’t know very well
        • capture, I guess this can be used to record screen;
          • ffmpeg -f avfoundation -list_devices true -i "" list the useful device in the current machine
          • screen
            • ffmpeg -f avfoundation -i ":1" ....
          • audio (microphone)
            • ffmpeg -f avfoundation -i ":2" ....
          • webcam
            • ffmpeg -f avfoundation -i "default" ....
        • testing
          • ffmpeg -f lavfi -i color=color=red..., lavfi is a virtual space
      • stream selection
        the example of streams in container
        • For one container, generally speaking, it includes one video stream and one more audios streams
        • how to select a stream, ::, these three inputs will control the stream you want to select.
        • ffmpeg -v error -y -i bunny_1080p_60fps.mp4 -to 1 bunny_1080p_60fps-1s.mp4, the command is used to edit a video
          • -y override the file anyway
          • -to cut the video until 1 second
          • For default, ffmpeg only picks up one video and one audio
            • get all, with parameter -map 0
            • get video, with parameter -map 0:v
            • get audio, with parameter -map 0:a
            • get first audio, with parameter -map 0:a:0
        • You also can process multiple data together, integration some video, layout(logo), audio together
          • ffmpeg -v error -y -i bunny_1080p_60fps.mp4 -i bullfinch.mov -to 1 -map 0:v:0 -map 1:a:0 bunny_1080p_60fps-1s-mix.mp4
      • filter
        • do some change for the media, the common library is libavfilter
        • syntax, filter=key1=values:key2=value2
          • eg: scale filter scale=width=1920:height=1080
            • different filter labels are used ; [semicolon sign] to split each others. In my opinion, I think that semicolon sign is used to do the textual separation, which the filter is more readable.
            • for one filter with multiple options, using : [colon sign] to split each other
            • using comma , to split the different filter
        • it can be labeled, which can be referred for later option
        • filter graph
          • with param -vf process graphs with single input and output, especially operate video resource
          • eg:
            1
            2
            3
            4
            5
            ffmpeg -v error -i <medida name with extension> -vf
            "split[bg][ol]; // split the media with the same two parts
            [bg]scale=width=1920:height=1080,format=gray[bg_output];
            [ol]scale=-1:480,hflip[ol_output];
            [bg_output][ol_output]overlay=x=W-w:y=(H-h)/2" <ouput file > // combine two outputs togther.
          • with param -filter-complex, you can manipulate all the streams, like video, audio and picture.
            1
            2
            3
            4
            5
            ffmpeg -v error -y -i bullfinch.mov -i logo.png -filter_complex 
            "[1:v]scale=-1:200[s_logo];
            [0:v][s_logo]overlay=x=W-w-50:y=H-h-50,split=2[sd_in][hd_in];
            [sd_in]scale=-2:480[sd];[hd_in]scale=-2:1080[hd]"
            -map \[sd\] bully-sd.mp4 -map \[hd\] bully-hd.mp4
            • tips: If you want to use it in zsh, you should add escape symbol before the special character, like '[', ']'
          • with param -af, you can do some editions on audio stream only for one input and output
          • common filter
            • scale, one input and one output
            • split, one input and multiple output
            • overlay, multiple input and one output
      • encoding
        • what is kind of thing should we take into consideration?
          • compression
          • quality vs size
          • stream vs post-production
          • target application
          • compatibility, check the codec is supported by the container
        • converting
          • If you want to transfer one container to another and don’t set the specific codec method, ffmpeg will set a default codec (suitable one).
            • eg: ffmpeg -v error -y -i bullfinch.mov transcoded.mxf
            • with parameter, -vcodec <codec name> to set specific video codec method
              • ffmpeg -encoders, list all the useful codec method
            • with parameter -avodec, set the specific audio codec method
          • detail of AVC short for advanced video codec, H.264
            • codec library: libx264
            • profile, baseline, main, high, it is optional
            • Rate control (edit the bitrate, effect the video quality)
              • with parameter -crf
              • CRF: constant quality, variable bitrate, focus on high quality crf 0-51, the higher, the worse quality
              • Two-pass ABR, this is a quantitative option, with parameter -b:v 2M, which means that the bitrate is set as 2Mbps
                • tips, it cannot give you very accurate bitrate, therefore you can use two pass method. like -b:v 2M -pass 1 -f null /dev/null, then -b:v 2M -pass 2 filename. The result of the first step will be saved into the statics
              • preset, which is used to control the speed of compression, the quicker, the bigger compression file you get
      • Streaming
        • Basic Concept
          • The device connect to the internet, then you can get the media resource from the internet transportation, rather than download the media directly. This process can be called as streaming.
          • And the streaming can be consumed without any waiting just watch it directly when your device connects to the internet
            overview of the streaming get or provide
          • Streaming playback, it is a mechanism for keep a comfortable watching experience.
            • Playable instantly, without downloading the whole media files
            • Seekable, when you choose the different timeline, the media can be played based on the correct timeline rather than play it from the start
            • Adaptive, based on different network situation (adaptive streaming bitrate, how to make it? not only the network situation, but also the player window size)
              • In my opinion, it will start with a file extended with m3u8. Actually, the following playable videos are separated very small parts whose format is ts. When the network becomes worse, it will retrieve some low bitrate media resource until the network turn to the good situation.
        • Streaming protocol
          • different protocol is suitable to different situations, some are applicable to ingest, some are for distribution.
            • RTMP
              • real-time-messaging protocol
              • based on TCP
              • low latency
              • stop updating, so not support to latest codec method
              • required flash plugin
              • is popular to live-streaming ingest
            • HTTP
              • widest reach
              • TCP based
              • unlikely to be blocked anywhere
              • support its native player HTML5 video, no need to add extra plugin, like flash
              • native support HLS and MPEG-DASH adaptive method
              • good at distribution, not widely used to ingest
            • SRT
              • secure reliable transport
              • based on UDP based
              • faster than RTMP
              • cannot support any browsers, because its based on UDP, but it’s extremely applicable to ingest streaming
        • Progressive Download
          • The progressive download method downloads and caches video on the viewer’s device. A short period of time is required to buffer and cache the beginning of the media file before it starts playing
          • single-file media
            • advantage
              • Not segmented
              • Easier to handle
              • native browser support
              • copy and download easily, you can send the whole file to other services
          • container formats
            • MP4, WebM, Ogg, these formats are supported by native browser
          • The index
            • It’s used to look up the media data of a time aor frame, just like a table or a map to save the media information
            • It will be written into the end of the mp4 file
            • Similar to Apple player format (QuickTime, MOV)
            • the data will get some hierarchical sections, the section are atom/box, atom will be set into the stream during the codec period
              what is the atom
              the category of atoms
          • fast-started
            • fast-started structure
            • This will let the video be played gradually rather than download the whole file once time
            • with command ffmpeg -v trace -i <video name>, you can get the verbose information
            • pick atom ffmpeg -v trace -i fast-start.mp4 2>&1 | grep -e type:\'mdat\' -e type:\'moov\'
            • convert to a fast-started media ffmpeg -i fast-start.mp4 -movflags +faststart -c copy test-fast-started.mp4
              • option -movflag
        • Adaptive Streaming
          • Sometime, the outside condition is changeable, for single-quality media, it just only has one quality, single resolution, bitrate choice
          • adaptive resolution, different screen size the stream will choose some related stream. But if the screen resolution is greater than the low standard resolution and less than the high standard resolution, it will still use the higher one, take the 560p as an example, the play will still play the 720p media.
          • ffmpeg can do it?
          • how it work
            • adaptive streaming solution
              • HLS, http live streaming
                • codec (H.264 avc H.265 havc)
                • container (TS, fMP4, Fragmented MP4)
                • manifest
                  • m3u8
                    • master playlist is used to point the media playlist
                    • media playlist list the individual the segments of ts file in sequence
              • DASH, dynamic adaptive streaming over http
                • MPEG-DASH (mp4)
                • codec freedom, H.264, H.265, VP9, etc.
                • container only fMP4
                • manifest (mdp, media presentation description, format is xml)
              • common characteristics
                • both can split media into several small segments
          • the facts need to consider when encoding the media to adaptive streaming
            • frame type
            • concept of frame
            • What is the self-contained in GOPs
          • how to use ffmpeg to convert the media into HLS or DASH
            • HLS, TS, A+V, the format of playlist is m3u8 and the segment is ts
              •   ffmpeg -y -i bunny_1080p_60fps.mp4 -to 10 \
                  -filter_complex "[0:v]fps=30,split=3[720_in][480_in][240_in];[720_in]scale=-2:720[720_out];[480_in]scale=-2:480[480_out];[240_in]scale=-2:240[240_out]" \
                  -map \[720_out\] -map \[480_out\] -map \[240_out\] -map 0:a -map 0:a -map 0:a \
                  -b:v:0 3500k -maxrate:v:0 3500k -bufsize:v:0 3500k \
                  -b:v:1 1690k -maxrate:v:1 1690k -bufsize:v:1 1690k \
                  -b:v:2 326k -maxrate:v:2 326k -bufsize:v:2 326k \
                  -b:a:0 128k \
                  -b:a:1 96k \
                  -b:a:2 64k \
                  -x264-params "keyint=60:min-keyint=60:scenecut=0" \
                  -var_stream_map "v:0,a:0,name:720-4M v:1,a:1,name:480-2M v:2,a:2,name:240p-500k" \
                  -hls_time 2 \
                  -hls_list_size 0 \
                  -hls_segment_filename adaptive-%v-%03d.ts \
                  -master_pl_name adaptive.m3u8 \
                  adaptive-%v.m3u8
                 
                1
                2
                3
                4
                5
                6
                7
                8
                9
                10
                11
                12
                13
                14
                15
                16
                17
                18
                19
                20
                21
                22
                23
                24
                25
                26
                27
                28
                29
                30
                31
                32
                33
                34
                35
                36
                37
                38
                39
                          * `-b:v:0` is used to select the video stream and set the average bitrate, like `-b:v:0 128k`;
                * `-bufsize` is used to check the average bitrate in this set value. eg: `-bufsize 128k`, During each 128k media transmission, ffmpeg will calculate the current average bitrate
                * if the smaller value is set, the frequency of checking bitrate will be too high. Otherwise, it will take a long duration
                * Generally speaking, it can set the same as the video average bitrate or even the half of it
                * `-maxrate` is used to limit the max transmission bitrate. eg: Your max bandwidth is **1024kb/s**, and assuming you can totally use this bandwidth, audio will use 128k/s (this audio is stereo, 64k per channel), therefore, only 896k/s you can use, so you should set this value as your **maxrate**
                * references
                * [What does -bufsize do?](https://trac.ffmpeg.org/wiki/Limiting%20the%20output%20bitrate)
                * [Encoding for streaming sites](https://trac.ffmpeg.org/wiki/EncodingForStreamingSites)
                * HLS, TS separate the audio and video, only single audio stream, will have itself playlist manifest
                * HLS, fMP4
                * DASH, fMP4
                * HLS+DASH, fMP4
                * ![HLS DASH mixed pattern](https://user-images.githubusercontent.com/6279298/210923276-23ba054e-eec4-4120-9ac0-403ee39494b7.png)
                * ```yaml
                ffmpeg -y -i bullfinch.mov -to 10 \
                -filter_complex "[0:v]fps=30,split=3[720_in][480_in][240_in];[720_in]scale=-2:720[720_out];[480_in]scale=-2:480[480_out];[240_in]scale=-2:240[240_out]" \
                -map \[720_out\] -map \[480_out\] -map \[240_out\] -map 0:a \
                -b:v:0 3500k -maxrate:v:0 3500k -bufsize:v:0 3500k \
                -b:v:1 1690k -maxrate:v:1 1690k -bufsize:v:1 1690k \
                -b:v:2 326k -maxrate:v:2 326k -bufsize:v:2 326k \
                -b:a:2 128k \
                -x264-params "keyint=60:min-keyint=60:scenecut=0" \
                -hls_playlist 1 \
                -hls_master_name adaptive.m3u8 \
                -seg_duration 2 \
                adaptive.mpd
                ```
                * example of manipulate video
                * Trimming
                * `ffmpeg -y -v error -i nature.mp4 -ss 00:03:55 -to 240 bear.mp4`
                * `-v [flags+]loglevel` set the processing log error verbose
                * `-ss` set up the start second
                * `-to` set up the total duration
                * `-t` set up the segment duration, take the above command as an example, if you want to achieve the same behavior, you can use `-ss 00:03:55 -t 5`
                * Merge
                * define a list file
                * ```text
                file 'bear.mp4'
                file 'flower.mp4'
        • ffmpeg -y -v error -f concat -i list.txt merge.mp4
      • Thumbnail
        • ffmpeg -v error -i bullfinch.mov -vframes 1 bullfinch-poster-image.jpg
          • -vframes set up the number of frames
          • You also can use filter during this processing, by the following command
            • ffmpeg -v error -i bullfinch.mov -vframes 1 -vf scale=320:180 bullfinch-poster-image-thumbnail.jpg
          • if you don’t want to output the first frame as the thumbnail, you can add the -ss before the -vframes
          • If you want to output multiple frame pictures, you can run the following command
            • ffmpeg -v error -i bullfinch.mov -vf fps=1,scale=320:180 bullfinch-thumbnail-%02d.jpg
            • -%02d is a template syntax, which can be used to generate multiple file
      • Scaling
        • resize the media or photo resource
          • -vf scale=-2:480, minus two means we want to keep the aspect ratio
          • force original aspect ratio option, force_original_aspect_ratio
          • If you can want to keep the desired window size, you can add the pad option: pad=640:480:(ow-iw)/2:(oh-oh)/2, with this option, you can output the video in the desired bound box.
      • Overlay
        • ffmpeg -v error -y -i (video) -i (pic) -filter_complex "overlay" output_file.extension
        • adjust the transparency of the overlay -filter_complex "[1:v]colorchannelmixer=aa=0.4[transparent_log];[0:v][transparent_log]overlay"
          • aa Adjust contribution of input red, green, blue and alpha channels for output alpha channel. Default is 1 for aa, and 0 for ar, ag and ab.
        • multiple overlay, overlay the video with logo at first, then set a label to them. Overlaying the previous output and the third input video
      • Drawing text and or timeCode
        • fixed text when play
          • ffplay bullfinch.mov -v error -an -vf "drawtext=text=Bird"
            • fixed draw text
          • with parameter x and y, you can adjust its position, like ffplay bullfinch.mov -v error -an -vf "drawtext=fontfile=Example.ttf:text=Bird:x=(w-text_w)/2:y=(h-text_h)/2:fontsize=60"
            • Screen Shot 2023-01-08 at 7 22 05 PM
        • moving text when play
          • with parameter t which represents the time, if you want the text from the top to the center you can use the following command:
            • y=t*50
            • Generally speaking, If you want to move the text from the right to the left, you can add the parameter x=(w-text_w)/2-100*t
        • add time code
          • default
            • ffplay bullfinch.mov -v error -an -vf "drawtext=fontfile=Example.ttf:text=Bird:x=(w-text_w)/2:y=(h-text_h)/2:fontsize=60:timecode='00\:00\:00\:00':rate='24000/1001'
          • with boxColor
            • :box=1: boxcolor=red@0.2
            • time code with color box
    • example of manipulation audio
      • extract channels
        • some channels play the voice, and some channels play the background music
          • with filter amerge=inputs=2, you can merge the whole stream to a one stream with multiple channels
          • with the following command you can split the whole stream into different files ffmpeg -v error -y -i two-stereo-tracks.m4a -filter_complex "amerge=inputs=2,asplit=4[all0][all1][all2][all3];[all0]pan=mono|c0=c0[ch0];[all1]pan=mono|c0=c1[ch1];[all2]pan=mono|c0=c2[ch2];[all3]pan=mono|c0=c3[ch3]" -map \[ch0\] ch0.m4a -map \[ch1\] ch1.m4a -map \[ch2\] ch2.m4a -map \[ch3\] ch3.m4a
            • with the filter asplit like [in] asplit=<output number> [out0][out1][out2], you can get number of output
            • with the filter pan, you can manipulate the audio
      • mix channels
        • with ffmpeg -v error -y -i ch0.m4a -i ch1.m4a -i ch2.m4a -i ch3.m4a -filter_complex "amerge=inputs=4" mix_four_in_one_channel.m4a
        • with the filter amix=inputs=<number of input>, it can work, but the mixed audio will drop down the volume
        • with "amerge=inputs=4,pan=mono|c0=c0+c1+c2+c3", you can avoid this issue
  • Title: ffmpeg basic concept
  • Author: Xiao Qiang
  • Created at : 2023-03-05 11:56:02
  • Updated at : 2025-03-08 10:49:30
  • Link: http://fdslk.github.io/tech/video/ffmpeg/2023/03/05/ffmpeg_basic_concept/
  • License: This work is licensed under CC BY-NC-SA 4.0.
Comments