A Walkthrough of the libvpx Decoding Process

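This post continues from 《libvpx再深入一点》 ("Going a Little Deeper into libvpx") and picks up where that one ended, at the function `vp9_receive_compressed_data`.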

# `vp9_receive_compressed_data`

int vp9_receive_compressed_data(VP9Decoder *pbi, size_t size,
                                const uint8_t **psource) {
  VP9_COMMON *volatile const cm = &pbi->common;
  BufferPool *volatile const pool = cm->buffer_pool;
  RefCntBuffer *volatile const frame_bufs = cm->buffer_pool->frame_bufs;
  const uint8_t *source = *psource;
  int retcode = 0;
  cm->error.error_code = VPX_CODEC_OK;

Initialize a few local variables and reset the error code to `VPX_CODEC_OK`.

  if (size == 0) {
    // This is used to signal that we are missing frames.
    // We do not know if the missing frame(s) was supposed to update
    // any of the reference buffers, but we act conservative and
    // mark only the last buffer as corrupted.
    //
    // TODO(jkoleszar): Error concealment is undefined and non-normative
    // at this point, but if it becomes so, [0] may not always be the correct
    // thing to do here.
    if (cm->frame_refs[0].idx > 0) {
      assert(cm->frame_refs[0].buf != NULL);
      cm->frame_refs[0].buf->corrupted = 1;
    }
  }

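A `size` of 0 signals that one or more frames were lost. As the comment explains, the decoder cannot know which reference buffers the missing frame would have updated, so it conservatively marks only the "last" reference, `cm->frame_refs[0]`, as corrupted.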

  pbi->ready_for_new_data = 0;

  // Check if the previous frame was a frame without any references to it.
  if (cm->new_fb_idx >= 0 && frame_bufs[cm->new_fb_idx].ref_count == 0 &&
      !frame_bufs[cm->new_fb_idx].released) {
    pool->release_fb_cb(pool->cb_priv,
                        &frame_bufs[cm->new_fb_idx].raw_frame_buffer);
    frame_bufs[cm->new_fb_idx].released = 1;
  }

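Judging from the buffer search that follows, `cm->new_fb_idx` at this point still holds the index of the frame buffer used by the previous frame. Combined with the comment, this step releases the previous frame's buffer if nothing references it any more and it has not been released yet.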

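The call that matters here is `pool->release_fb_cb`, a function-pointer member of `BufferPool`, declared right after `get_fb_cb`.

The `_cb` suffix marks both as callbacks: `get_fb_cb` is called when a frame buffer has to be allocated and `release_fb_cb` when one is released. Both are set in `init_buffer_callbacks`.

If the `ctx` passed in carries no external callbacks, the defaults `vp9_get_frame_buffer` and `vp9_release_frame_buffer` are installed. Judging by their names and by the call above, these functions do not run after an allocation or release has finished; they perform it. So "callback" here does not mean a completion notification. It means a hook that the decoder calls to do the work, and that an application can replace with its own buffer management.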
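The fallback logic described above can be sketched in a few lines. This is a minimal illustration, not libvpx code: every `Demo*` / `demo_*` name is hypothetical, and the default functions simply use `calloc` and `free` as an assumption about what "doing the work" might look like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the libvpx types; not the real API. */
typedef struct {
  void *data;
  size_t size;
} DemoFrameBuffer;

typedef int (*demo_get_fb_fn)(void *priv, size_t min_size, DemoFrameBuffer *fb);
typedef int (*demo_release_fb_fn)(void *priv, DemoFrameBuffer *fb);

typedef struct {
  void *cb_priv;
  demo_get_fb_fn get_fb_cb;
  demo_release_fb_fn release_fb_cb;
} DemoPool;

/* Default "callbacks": they perform the allocation and release themselves. */
static int demo_default_get(void *priv, size_t min_size, DemoFrameBuffer *fb) {
  (void)priv;
  fb->data = calloc(1, min_size);
  fb->size = fb->data ? min_size : 0;
  return fb->data ? 0 : -1;
}

static int demo_default_release(void *priv, DemoFrameBuffer *fb) {
  (void)priv;
  free(fb->data);
  fb->data = NULL;
  fb->size = 0;
  return 0;
}

/* Use the caller's pair if both are given, otherwise fall back to defaults. */
static void demo_init_callbacks(DemoPool *pool, demo_get_fb_fn get,
                                demo_release_fb_fn release, void *priv) {
  assert(pool != NULL);
  if (get != NULL && release != NULL) {
    pool->get_fb_cb = get;
    pool->release_fb_cb = release;
    pool->cb_priv = priv;
  } else {
    pool->get_fb_cb = demo_default_get;
    pool->release_fb_cb = demo_default_release;
    pool->cb_priv = NULL;
  }
}
```

Calling `demo_init_callbacks(&pool, NULL, NULL, NULL)` installs the defaults, after which `pool.release_fb_cb(pool.cb_priv, &fb)` releases a buffer in the same way the excerpt above calls `pool->release_fb_cb`.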

  // Find a free frame buffer. Return error if can not find any.
  cm->new_fb_idx = get_free_fb(cm);
  if (cm->new_fb_idx == INVALID_IDX) {
    pbi->ready_for_new_data = 1;
    release_fb_on_decoder_exit(pbi);
    vpx_clear_system_state();
    vpx_internal_error(&cm->error, VPX_CODEC_MEM_ERROR,
                       "Unable to find free frame buffer");
    return cm->error.error_code;
  }

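Find a free frame buffer, and report an error and return if there is none. A buffer was released just above and one is claimed here, which fits together nicely.

The core of this step is `get_free_fb`.

It simply searches `buffer_pool->frame_bufs` inside the given `VP9_COMMON` for a buffer whose `ref_count` is 0 and returns `INVALID_IDX` if none exists.

So whether a buffer is available is tracked by nothing more than its `ref_count`, and the chosen buffer is then accessed as `buffer_pool->frame_bufs[cm->new_fb_idx]`.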

  // Assign a MV array to the frame buffer.
  cm->cur_frame = &pool->frame_bufs[cm->new_fb_idx];

  pbi->hold_ref_buf = 0;
  pbi->cur_buf = &frame_bufs[cm->new_fb_idx];

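Next, the buffer that was just found is assigned to both `cm->cur_frame` and `pbi->cur_buf`.

(One assignment uses `&pool->frame_bufs[cm->new_fb_idx]` and the other `&frame_bufs[cm->new_fb_idx]`. Since `pool` is `cm->buffer_pool` and `frame_bufs` is `cm->buffer_pool->frame_bufs`, both expressions name the same element; the difference is only an inconsistency in style.)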

  if (setjmp(cm->error.jmp)) {
    cm->error.setjmp = 0;
    pbi->ready_for_new_data = 1;
    release_fb_on_decoder_exit(pbi);
    // Release current frame.
    decrease_ref_count(cm->new_fb_idx, frame_bufs, pool);
    vpx_clear_system_state();
    return -1;
  }

  cm->error.setjmp = 1;

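`setjmp` and `longjmp` come from the C standard library (`<setjmp.h>`) and act as a non-local label and a non-local `goto`, respectively.

`setjmp` saves the calling environment (the stack context) of the program; later, code elsewhere can call `longjmp` to restore that saved environment and resume execution at the `setjmp` call, which then returns a nonzero value.

This is the usual way to get a rudimentary try/catch in C. Here the `if (setjmp(...))` body is the "catch" block: it runs only when a later `vpx_internal_error` call jumps back, and `cm->error.setjmp` records whether the jump target is currently valid.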

  vp9_decode_frame(pbi, source, source + size, psource);

  swap_frame_buffers(pbi);

  vpx_clear_system_state();

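`vp9_decode_frame` is called directly, so it should be the main decoding function. Neither of the two calls after it frees memory: `swap_frame_buffers` updates the reference-frame bookkeeping (which buffers the newly decoded frame replaces, and their reference counts), and `vpx_clear_system_state` resets the floating-point/MMX register state on x86 builds.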

  if (!cm->show_existing_frame) {
    cm->last_show_frame = cm->show_frame;
    cm->prev_frame = cm->cur_frame;
    if (cm->seg.enabled) vp9_swap_current_and_last_seg_map(cm);
  }

  if (cm->show_frame) cm->cur_show_frame_fb_idx = cm->new_fb_idx;

  // Update progress in frame parallel decode.
  cm->last_width = cm->width;
  cm->last_height = cm->height;
  if (cm->show_frame) {
    cm->current_video_frame++;
  }

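Finally, there are several operations related to `show_frame`. Following `show_existing_frame` and `show_frame` to their definitions shows that both are assigned in `read_uncompressed_header`, which `vp9_decode_frame` calls, and that both are read from the frame header.

The two `if` statements show that when `cm->show_frame` is true, the index of the frame just decoded, `cm->new_fb_idx`, is stored in `cm->cur_show_frame_fb_idx` (by its name, the buffer currently being shown), and `cm->current_video_frame` is incremented.

The comment on `current_video_frame` suggests what the two flags are for: `show_frame` controls whether a decoded frame is displayed at all (a frame can be decoded and kept as a reference without being shown), and `show_existing_frame` asks the decoder to display a frame it already holds.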


  cm->error.setjmp = 0;
  return retcode;
}

End of function.

`vp9_receive_compressed_data` does not reach the core of decoding either; it only prepares the state that decoding needs. The actual decoding happens in `vp9_decode_frame`.

# `vp9_decode_frame`

void vp9_decode_frame(VP9Decoder *pbi, const uint8_t *data,
                      const uint8_t *data_end, const uint8_t **p_data_end) {

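The function begins. From the earlier analysis, `data` points to the start of the buffer holding the compressed frame and `data_end` points to its end. In `vp9_receive_compressed_data`, the argument passed as `p_data_end` is `psource`, the pointer to the caller's data pointer (`source` is a copy of `*psource`). So `p_data_end` is an output parameter: on entry `*p_data_end` equals `data`, and the function writes back the position where decoding stopped.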

  VP9_COMMON *const cm = &pbi->common;
  MACROBLOCKD *const xd = &pbi->mb;

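Two contexts are fetched. One is the runtime state `cm` that has appeared many times already; the other, judging by its name, is the macroblock descriptor used while decoding.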

  struct vpx_read_bit_buffer rb;
  int context_updated = 0;
  uint8_t clear_data[MAX_VP9_HEADER_SIZE];
  const size_t first_partition_size = read_uncompressed_header(
      pbi, init_read_bit_buffer(pbi, &rb, data, data_end, clear_data));

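This reads the frame header. Inside `read_uncompressed_header` there is a long series of bit-level reads (`vpx_rb_read_bit`, `vpx_rb_read_literal`, plus `vp9_read_sync_code` to check the sync code), and the values read are assigned to fields of `cm`. The return value, `first_partition_size`, is the size of the compressed header that follows.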
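Header parsing of this kind rests on a bit reader that hands out one bit or one fixed-width field at a time. The sketch below shows the idea with hypothetical `demo_*` names; it assumes most-significant-bit-first order and, as a simplification, returns 0 when reading past the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader; names are illustrative only. */
typedef struct {
  const uint8_t *buf;
  const uint8_t *buf_end;
  size_t bit_offset; /* number of bits consumed so far */
} DemoBitReader;

/* Read one bit, most significant bit of each byte first.
 * Reading past the end yields 0 in this sketch. */
static int demo_read_bit(DemoBitReader *rb) {
  const size_t byte = rb->bit_offset >> 3;
  const int shift = 7 - (int)(rb->bit_offset & 7);
  assert(rb->buf != NULL);
  if (rb->buf + byte >= rb->buf_end) return 0;
  rb->bit_offset++;
  return (rb->buf[byte] >> shift) & 1;
}

/* Read an unsigned value that is `bits` wide. */
static int demo_read_literal(DemoBitReader *rb, int bits) {
  int value = 0;
  int i;
  for (i = 0; i < bits; ++i) value = (value << 1) | demo_read_bit(rb);
  return value;
}

/* Bytes touched so far, rounding a partial byte up. */
static size_t demo_bytes_read(const DemoBitReader *rb) {
  return (rb->bit_offset + 7) >> 3;
}
```

Reading the bytes `0x82 0x49 0x83 0x42` as a 2-bit field, a 6-bit field and three 8-bit fields yields 2, 2, `0x49`, `0x83` and `0x42`. The byte count from `demo_bytes_read` plays the same role as the `vpx_rb_bytes_read(&rb)` call that appears later in this function.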

  const int tile_rows = 1 << cm->log2_tile_rows;
  const int tile_cols = 1 << cm->log2_tile_cols;

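Two tile-count variables are initialized. The `log2` fields are read from the header in `setup_tile_info`, which `read_uncompressed_header` calls. Tile counts can only be powers of two, so storing just the exponent in the header is a sensible way to save bits.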

  YV12_BUFFER_CONFIG *const new_fb = get_frame_new_buffer(cm);

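`get_frame_new_buffer` returns the picture buffer of the frame buffer at index `cm->new_fb_idx`, that is, the buffer prepared in `vp9_receive_compressed_data`. Nothing surprising here.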

#if CONFIG_BITSTREAM_DEBUG || CONFIG_MISMATCH_DEBUG
  bitstream_queue_set_frame_read(cm->current_video_frame * 2 + cm->show_frame);
#endif
#if CONFIG_MISMATCH_DEBUG
  mismatch_move_frame_idx_r();
#endif

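Both blocks are compiled only when the debug options `CONFIG_BITSTREAM_DEBUG` or `CONFIG_MISMATCH_DEBUG` are enabled, so they can be skipped here.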

  xd->cur_buf = new_fb;

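This points the macroblock descriptor at the new frame buffer, the buffer that will receive the decoded picture (not the compressed input).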

  if (!first_partition_size) {
    // showing a frame directly
    *p_data_end = data + (cm->profile <= PROFILE_2 ? 1 : 2);
    return;
  }

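When `first_partition_size` is 0, the frame is shown directly. This is the `show_existing_frame` case seen earlier: there is nothing to decode, an already decoded frame is displayed, and only the one or two header bytes (depending on the profile) are consumed.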

  data += vpx_rb_bytes_read(&rb);
  if (!read_is_valid(data, first_partition_size, data_end))
    vpx_internal_error(&cm->error, VPX_CODEC_CORRUPT_FRAME,
                       "Truncated packet or corrupt header length");

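`data` is advanced past the bytes consumed by `read_uncompressed_header`, and `read_is_valid` then checks that `first_partition_size` bytes are actually available before `data_end`. If they are not, the packet is truncated or the header length is corrupt.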
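A bounds check like this can be written in one line. The sketch below is an assumption about its shape, with a hypothetical name, and treats a zero length as invalid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if `len` bytes starting at `start` fit before `end`.
 * A zero length is treated as invalid in this sketch. */
static int demo_read_is_valid(const uint8_t *start, size_t len,
                              const uint8_t *end) {
  assert(start <= end);
  return len != 0 && len <= (size_t)(end - start);
}
```

For a 10-byte packet whose uncompressed header took 3 bytes, a claimed length of 7 passes and a claimed length of 8 fails. Comparing `len` against `end - start`, rather than computing `start + len`, avoids forming a pointer beyond the buffer.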

  cm->use_prev_frame_mvs =
      !cm->error_resilient_mode && cm->width == cm->last_width &&
      cm->height == cm->last_height && !cm->last_intra_only &&
      cm->last_show_frame && (cm->last_frame_type != KEY_FRAME);

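If all of these conditions hold (no error-resilient mode, same frame size as the previous frame, and the previous frame was shown and was neither intra-only nor a key frame), `use_prev_frame_mvs` is set and the previous frame's motion vectors can be used.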

  vp9_setup_block_planes(xd, cm->subsampling_x, cm->subsampling_y);

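`vp9_setup_block_planes` does not partition anything. It takes the chroma subsampling factors from `cm` and stores them in each plane of `xd`, which determines the dimensions of the chroma planes relative to luma.

😂 So it really just sets up plane sizes.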

  *cm->fc = cm->frame_contexts[cm->frame_context_idx];
  if (!cm->fc->initialized)
    vpx_internal_error(&cm->error, VPX_CODEC_CORRUPT_FRAME,
                       "Uninitialized entropy context.");

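The entropy context holds the probability tables used by entropy decoding. This statement copies the saved context selected by `cm->frame_context_idx` into the working context `cm->fc`, and decoding fails if that saved context was never initialized.

In short, this initializes the per-frame context, which stores probabilities for the entropy decoder rather than decoded picture data.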

  xd->corrupted = 0;
  new_fb->corrupted = read_compressed_header(pbi, data, first_partition_size);
  if (new_fb->corrupted)
    vpx_internal_error(&cm->error, VPX_CODEC_CORRUPT_FRAME,
                       "Decode failed. Frame data header is corrupted.");

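Another header read, this time of the compressed header, which is `first_partition_size` bytes long. If it cannot be read, the new frame is marked corrupted and an error is raised.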

  if (cm->lf.filter_level && !cm->skip_loop_filter) {
    vp9_loop_filter_frame_init(cm, cm->lf.filter_level);
  }

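If the filter level is nonzero and loop filtering is not skipped, the loop filter is initialized.

Quantization after the forward DCT is lossy. After dequantization (rescaling) and the inverse DCT, the reconstructed block differs from the original, especially at block boundaries, which makes the reconstructed image look blocky. Blocky frames also degrade the prediction of later frames, so a loop filter is applied to remove the blocking artifacts.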

  if (pbi->tile_worker_data == NULL ||
      (tile_cols * tile_rows) != pbi->total_tiles) {
    const int num_tile_workers =
        tile_cols * tile_rows + ((pbi->max_threads > 1) ? pbi->max_threads : 0);
    const size_t twd_size = num_tile_workers * sizeof(*pbi->tile_worker_data);
    // Ensure tile data offsets will be properly aligned. This may fail on
    // platforms without DECLARE_ALIGNED().
    assert((sizeof(*pbi->tile_worker_data) % 16) == 0);
    vpx_free(pbi->tile_worker_data);
    CHECK_MEM_ERROR(cm, pbi->tile_worker_data, vpx_memalign(32, twd_size));
    pbi->total_tiles = tile_rows * tile_cols;
  }

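Following on from `tile_rows` and `tile_cols` above, this block makes sure `pbi->tile_worker_data` is large enough and `pbi->total_tiles` is correct. The array is reallocated whenever it has not been allocated yet or the tile count has changed.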

  if (pbi->max_threads > 1 && tile_rows == 1 &&
      (tile_cols > 1 || pbi->row_mt == 1)) {

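This is clearly the entry to the multithreaded path: more than one thread is allowed, there is exactly one tile row, and either there are several tile columns or `row_mt` is enabled.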

    if (pbi->row_mt == 1) {
      *p_data_end =
          decode_tiles_row_wise_mt(pbi, data + first_partition_size, data_end);

When `row_mt` is 1, the row-based multithreaded decoder `decode_tiles_row_wise_mt` is called. Loop filtering should be handled inside it (it is; see 《libvpx中的decode_tiles》).

    } else {
      // Multi-threaded tile decoder
      *p_data_end = decode_tiles_mt(pbi, data + first_partition_size, data_end);
      if (!pbi->lpf_mt_opt) {
        if (!xd->corrupted) {
          if (!cm->skip_loop_filter) {
            // If multiple threads are used to decode tiles, then we use those
            // threads to do parallel loopfiltering.
            vp9_loop_filter_frame_mt(
                new_fb, cm, pbi->mb.plane, cm->lf.filter_level, 0, 0,
                pbi->tile_workers, pbi->num_tile_workers, &pbi->lf_row_sync);
          }
        } else {
          vpx_internal_error(&cm->error, VPX_CODEC_CORRUPT_FRAME,
                             "Decode failed. Frame data is corrupted.");
        }
      }
    }

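Otherwise, with a single tile row and several tile columns, `decode_tiles_mt` decodes the tiles in parallel. Unless `pbi->lpf_mt_opt` is set, the multithreaded loop filter `vp9_loop_filter_frame_mt` is then run on the same worker threads.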

  } else {
    *p_data_end = decode_tiles(pbi, data + first_partition_size, data_end);
  }

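This is the path taken when tile-level multithreading is not used: a single `decode_tiles` call decodes all tiles. Loop filtering should be handled inside it (it is; see 《libvpx中的decode_tiles》).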

  if (!xd->corrupted) {
    if (!cm->error_resilient_mode && !cm->frame_parallel_decoding_mode) {
      vp9_adapt_coef_probs(cm);

      if (!frame_is_intra_only(cm)) {
        vp9_adapt_mode_probs(cm);
        vp9_adapt_mv_probs(cm, cm->allow_high_precision_mv);
      }
    }
  } else {
    vpx_internal_error(&cm->error, VPX_CODEC_CORRUPT_FRAME,
                       "Decode failed. Frame data is corrupted.");
  }

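The `else` branch is error handling, but the three `vp9_adapt_*` functions are not recovery attempts. They run only when the frame decoded without corruption, and they adapt the coefficient, mode and motion-vector probabilities in `cm->fc` based on what was decoded in this frame, so that later frames start from better estimates. Adaptation is skipped in error-resilient mode and in frame-parallel decoding mode.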
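The general idea of adapting a probability from observed symbol counts can be illustrated with a small sketch. This is a simplified model rather than the libvpx implementation: the function names are hypothetical, and the saturation count of 20 and maximum update factor of 128 are illustrative values chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Probability of a 0 bit in 1/256 units, estimated from counts and
 * clamped to 1..255. Returns 128 when nothing was observed. */
static uint8_t demo_binary_prob(unsigned n0, unsigned n1) {
  const unsigned den = n0 + n1;
  unsigned p;
  if (den == 0) return 128;
  p = (n0 * 256 + (den >> 1)) / den;
  if (p < 1) p = 1;
  if (p > 255) p = 255;
  return (uint8_t)p;
}

/* Blend the previous probability with the observed one. The more
 * symbols were seen (up to a saturation point), the more weight the
 * observation gets. Constants are illustrative assumptions. */
static uint8_t demo_adapt_prob(uint8_t pre_prob, unsigned n0, unsigned n1) {
  const unsigned count_sat = 20;
  const unsigned max_factor = 128;
  unsigned count = n0 + n1;
  unsigned factor;
  unsigned observed;
  assert(pre_prob != 0);
  if (count == 0) return pre_prob; /* nothing observed, keep the prior */
  if (count > count_sat) count = count_sat;
  factor = max_factor * count / count_sat;
  observed = demo_binary_prob(n0, n1);
  return (uint8_t)((pre_prob * (256 - factor) + observed * factor + 128) >> 8);
}
```

Starting from a prior of 128, observing 30 zeros and 10 ones moves the probability to 160, while observing nothing leaves it at 128. That is the sense in which these functions "adapt": they refine the context after a successful decode instead of retrying a failed one.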

  // Non frame parallel update frame context here.
  if (cm->refresh_frame_context && !context_updated)
    cm->frame_contexts[cm->frame_context_idx] = *cm->fc;

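Finally, if `refresh_frame_context` is set, the working context `cm->fc` is saved back into `cm->frame_contexts[cm->frame_context_idx]` so that later frames can start from it.

End of function.

One step closer to the truth! `vp9_decode_frame` reads the frame headers, initializes the context structures, and finally calls the multithreaded `decode_tiles_row_wise_mt` or `decode_tiles_mt`, or the single-threaded `decode_tiles`, to do the decoding. Those three functions are therefore the next layer of core code.

Next, let's look at how `decode_tiles_row_wise_mt`, `decode_tiles_mt` and `decode_tiles` actually "decode tiles": 《libvpx中的decode_tiles》.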

Yin的笔记本
