A slightly deeper look at `libvpx`

Prerequisite: get comfortable with the outermost API first.

This post goes one level deeper and traces how the internal encode and decode functions get called.

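In "libvpx的使用方法简析 - simple_decoder.c" (a brief look at using libvpx through simple_decoder.c) we saw that decoding comes down to two API functions, vpx_codec_decode and vpx_codec_get_frame, and that at their core these two simply call dec.decode and dec.get_frame from the vpx_codec_iface.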

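Likewise, in "libvpx的使用方法简析 - simple_encoder.c" (the same for simple_encoder.c) we saw that encoding also comes down to two API functions, vpx_codec_encode and vpx_codec_get_cx_data, which in turn call enc.encode and enc.get_cx_data from the vpx_codec_iface.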

We also know that vpx_codec_iface is implemented separately for VP8 and VP9, and that each version has its own encoder and decoder implementation:

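The VP9 decoder implementation is vpx_codec_vp9_dx
The VP9 encoder implementation is vpx_codec_vp9_cx
The VP8 decoder implementation is vpx_codec_vp8_dx
The VP8 encoder implementation is vpx_codec_vp8_cx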

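Clearly, the functions inside these interface implementations are where the actual encoding and decoding happens, so they are the entry points for digging further.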

# VP9 decoding

The VP9 decoder implementation is vpx_codec_vp9_dx. In it, decoder_decode is the function installed as dec.decode, and decoder_get_frame is the one installed as dec.get_frame.
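The dispatch described above can be sketched with a table of function pointers. This is only a toy model: every `demo_` name below is hypothetical and the types are far simpler than the real `vpx_codec_iface`, but it shows how a generic entry point forwards to whichever codec-specific implementation was chosen at init time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the libvpx types. */
typedef struct demo_ctx demo_ctx_t;

typedef struct {
  int (*decode)(demo_ctx_t *ctx, const unsigned char *data, size_t size);
  const char *(*get_frame)(demo_ctx_t *ctx);
} demo_dec_iface_t;

typedef struct {
  const char *name;
  demo_dec_iface_t dec;
} demo_iface_t;

struct demo_ctx {
  const demo_iface_t *iface; /* chosen once, when the codec is initialized */
  size_t bytes_seen;         /* toy decoder state */
};

/* Codec-specific functions, playing the role of decoder_decode and
 * decoder_get_frame. */
static int demo_vp9_decode(demo_ctx_t *ctx, const unsigned char *data,
                           size_t size) {
  (void)data;
  ctx->bytes_seen += size;
  return 0;
}

static const char *demo_vp9_get_frame(demo_ctx_t *ctx) {
  return ctx->bytes_seen > 0 ? "frame" : NULL;
}

/* The interface object, playing the role of vpx_codec_vp9_dx. */
static const demo_iface_t demo_vp9_dx = {
  "demo vp9 decoder", { demo_vp9_decode, demo_vp9_get_frame }
};

/* Generic entry points: they know nothing about VP9 and only forward. */
static int demo_codec_decode(demo_ctx_t *ctx, const unsigned char *data,
                             size_t size) {
  assert(ctx != NULL && ctx->iface != NULL);
  return ctx->iface->dec.decode(ctx, data, size);
}

static const char *demo_codec_get_frame(demo_ctx_t *ctx) {
  assert(ctx != NULL && ctx->iface != NULL);
  return ctx->iface->dec.get_frame(ctx);
}
```

Swapping in a VP8 decoder would then only mean pointing `iface` at a different table, which is presumably why the public API can stay the same for both codecs.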

# `decoder_get_frame`

Let's start with the simpler one, decoder_get_frame:

static vpx_image_t *decoder_get_frame(vpx_codec_alg_priv_t *ctx,
                                      vpx_codec_iter_t *iter) {

Start of the function.

  vpx_image_t *img = NULL;

This is the result to be returned: one frame as an image.

  // Legacy parameter carried over from VP8. Has no effect for VP9 since we
  // always return only 1 frame per decode call.
  (void)iter;

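As the comment says, this parameter is a leftover from VP8. In VP9 each decode call yields at most one frame, so there is nothing to iterate over and the iterator argument is unused.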

  if (ctx->pbi != NULL) {

Everything below only runs if ctx->pbi exists.

    YV12_BUFFER_CONFIG sd;
    vp9_ppflags_t flags = { 0, 0, 0 };
    if (ctx->base.init_flags & VPX_CODEC_USE_POSTPROC) set_ppflags(ctx, &flags);

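First a vp9_ppflags_t is set up. Judging by the condition in the if, these flags control post-processing and are only filled in when the decoder was initialized with VPX_CODEC_USE_POSTPROC.

For background on post-processing in video decoding, see the FFmpeg wiki page on the topic (wiki:Postprocessing).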

    if (vp9_get_raw_frame(ctx->pbi, &sd, &flags) == 0) {

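This condition contains the core of the function: the call to vp9_get_raw_frame.

In essence, vp9_get_raw_frame looks up the frame data in pbi->common and puts it into sd, and runs post-processing on it if post-processing was requested.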

      VP9_COMMON *const cm = &ctx->pbi->common;
      RefCntBuffer *const frame_bufs = cm->buffer_pool->frame_bufs;

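This fetches the decoder's common state (cm) from pbi and, through its buffer pool, the array of reference-counted frame buffers (frame_bufs).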

      ctx->last_show_frame = ctx->pbi->common.new_fb_idx;

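This looks like bookkeeping: new_fb_idx, the index of the frame buffer that was just decoded, is remembered as the last frame shown.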

      if (ctx->need_resync) return NULL;

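need_resync is not an error returned by vp9_get_raw_frame. As decode_one shows further down, it is set when vp9_receive_compressed_data fails, and it is only cleared again by check_resync once a key frame or intra-only frame has been decoded. While it is set, no frame is handed out.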

      yuvconfig2image(&ctx->img, &sd, ctx->user_priv);

As the name suggests, this converts the frame data in sd into the vpx_image_t format, writing the result into ctx->img.

      ctx->img.fb_priv = frame_bufs[cm->new_fb_idx].raw_frame_buffer.priv;

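This one is not about a "previous frame". It copies the private pointer (priv) of the raw frame buffer backing the new frame into img.fb_priv, presumably so that the caller can tell which frame buffer the image lives in.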

      img = &ctx->img;
      return img;

Return. &ctx->img is the frame image that yuvconfig2image filled in from sd.

    }
  }
  return NULL;
}

End of the function.

# `decoder_decode`

Next, the slightly more complex decoder_decode:

static vpx_codec_err_t decoder_decode(vpx_codec_alg_priv_t *ctx,
                                      const uint8_t *data, unsigned int data_sz,
                                      void *user_priv, long deadline) {

Start of the function.

  const uint8_t *data_start = data;
  const uint8_t *const data_end = data + data_sz;

First the start and end addresses of the input data are computed.

  vpx_codec_err_t res;

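This declares the return value. It is not initialized here; it is first assigned by vp9_parse_superframe_index below. The function only ever returns an error code.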

  uint32_t frame_sizes[8];
  int frame_count;

These are for the case where one data chunk decodes into several frames. Judging by the size of frame_sizes, the maximum is 8 frames.

  if (data == NULL && data_sz == 0) {
    ctx->flushed = 1;
    return VPX_CODEC_OK;
  }

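A call with no data (NULL pointer and zero size) is treated as a flush request: the flushed flag is set and the function returns right away.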

  // Reset flushed when receiving a valid frame.
  ctx->flushed = 0;

When there is data, nothing is flushed. The flushed flag is simply reset to 0, as the comment says.

  // Initialize the decoder on the first frame.
  if (ctx->pbi == NULL) {
    const vpx_codec_err_t res = init_decoder(ctx);
    if (res != VPX_CODEC_OK) return res;
  }

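If ctx->pbi does not exist yet, it is initialized first. According to the comment, ctx->pbi is only NULL on the first frame.

We have seen ctx->pbi a lot already. Since it is created lazily on the first frame and, going by the signature of check_resync further down, has type VP9Decoder, it is the decoder instance itself, holding the state that persists across frames.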

  res = vp9_parse_superframe_index(data, data_sz, frame_sizes, &frame_count,
                                   ctx->decrypt_cb, ctx->decrypt_state);
  if (res != VPX_CODEC_OK) return res;

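Going by the function name, the first step is to read the superframe index.

A superframe is libvpx's way of packing several frames into one data chunk. So frame_sizes and frame_count are the outputs here: parsing the superframe index yields the number of frames in the superframe and the size of each frame's data.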
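To make the superframe index less abstract, here is a minimal sketch of how such an index can be parsed. It assumes the usual VP9 layout as I understand it: the chunk ends with a marker byte whose top three bits are `110`, whose low three bits hold the frame count minus one and whose bits 3 and 4 hold the bytes per size minus one, and the same marker byte also opens the index. The function name is hypothetical, and decryption (which the real function supports through decrypt_cb) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified parser. Returns 0 on success and -1 if an index
 * is announced but malformed. On success *count is 0 when the buffer
 * carries no superframe index at all. */
static int demo_parse_superframe_index(const uint8_t *data, size_t data_sz,
                                       uint32_t sizes[8], int *count) {
  assert(data != NULL && data_sz > 0 && sizes != NULL && count != NULL);
  const uint8_t marker = data[data_sz - 1]; /* last byte of the chunk */
  *count = 0;

  if ((marker & 0xe0) == 0xc0) {
    const uint32_t frames = (marker & 0x7) + 1;        /* 1..8 frames */
    const uint32_t mag = ((marker >> 3) & 0x3) + 1;    /* bytes per size */
    const size_t index_sz = 2 + (size_t)mag * frames;  /* two marker bytes */

    /* The chunk must be large enough to hold the index it announces. */
    if (data_sz < index_sz) return -1;
    /* The index must start with the same marker byte it ends with. */
    if (data[data_sz - index_sz] != marker) return -1;

    const uint8_t *x = &data[data_sz - index_sz + 1];
    for (uint32_t i = 0; i < frames; ++i) {
      uint32_t this_sz = 0;
      /* Sizes are stored little-endian, mag bytes each. */
      for (uint32_t j = 0; j < mag; ++j) this_sz |= (uint32_t)(*x++) << (j * 8);
      sizes[i] = this_sz;
    }
    *count = (int)frames;
  }
  return 0;
}
```

For example, a 9-byte chunk holding a 3-byte frame, a 2-byte frame and the index `c1 03 02 c1` parses to two frames of sizes 3 and 2. A chunk whose last byte does not match the marker pattern is reported as having no index (count 0).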

  if (ctx->svc_decoding && ctx->svc_spatial_layer < frame_count - 1)
    frame_count = ctx->svc_spatial_layer + 1;

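SVC stands for Scalable Video Coding; for details see "SVC和视频通信" (SVC and video communication). When SVC decoding is enabled and the requested spatial layer is lower than the highest one in the superframe, frame_count is cut down so that only the frames up to that layer are decoded.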
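The effect of that two-line check is easy to try out in isolation. The helper below is hypothetical and only mirrors the condition from the source; it assumes that each frame in the superframe corresponds to one spatial layer, lowest layer first.

```c
#include <assert.h>

/* Hypothetical helper mirroring the check in decoder_decode: with SVC
 * decoding enabled, decode only the frames up to the requested layer. */
static int demo_limit_frame_count(int svc_decoding, int svc_spatial_layer,
                                  int frame_count) {
  assert(svc_spatial_layer >= 0);
  if (svc_decoding && svc_spatial_layer < frame_count - 1)
    return svc_spatial_layer + 1;
  return frame_count;
}
```

With three frames in the superframe, requesting layer 0 leaves one frame to decode and requesting layer 1 leaves two. Requesting a layer at or above the highest one, or turning SVC decoding off, leaves all three.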

  // Decode in serial mode.
  if (frame_count > 0) {

First the frame_count > 0 case. This is when a superframe index was found, so the chunk holds one or more frames whose sizes are known.

    int i;

    for (i = 0; i < frame_count; ++i) {

A loop that decodes the frames one by one.

      const uint8_t *data_start_copy = data_start;
      const uint32_t frame_size = frame_sizes[i];

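Get the start and size of this frame's data. data_start_copy is a copy because decode_one takes a pointer to the pointer and may move it, while this loop advances data_start itself by frame_size.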

      vpx_codec_err_t res;
      if (data_start < data || frame_size > (uint32_t)(data_end - data_start)) {
        set_error_detail(ctx, "Invalid frame size in index");
        return VPX_CODEC_CORRUPT_FRAME;
      }

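If the size taken from the index does not fit into what is left of the buffer, the data is corrupt: an error detail is recorded and VPX_CODEC_CORRUPT_FRAME is returned.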

      res = decode_one(ctx, &data_start_copy, frame_size, user_priv, deadline);
      if (res != VPX_CODEC_OK) return res;

decode_one should be the core decoding function.

      data_start += frame_size;

After each frame is decoded, the start pointer moves past it, which makes sense.

    }

End of the per-frame loop.

  } else {

Next is the frame_count == 0 case, where no superframe index was found.

    while (data_start < data_end) {

With no index to go by, the buffer is scanned directly until data_start reaches data_end.

      const uint32_t frame_size = (uint32_t)(data_end - data_start);

Here frame_size is computed too: it is simply everything that remains in the buffer.

      const vpx_codec_err_t res =
          decode_one(ctx, &data_start, frame_size, user_priv, deadline);
      if (res != VPX_CODEC_OK) return res;

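The core operation is again decode_one. This time &data_start itself is passed in, so it is evidently decode_one that moves the pointer past the data it consumed.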

      // Account for suboptimal termination by the encoder.
      while (data_start < data_end) {
        const uint8_t marker =
            read_marker(ctx->decrypt_cb, ctx->decrypt_state, data_start);
        if (marker) break;
        ++data_start;
      }

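This inner loop skips any zero bytes left after the frame (the comment calls this suboptimal termination by the encoder) and stops at the first non-zero byte, which is taken as the start of the next frame.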
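Stripped of decryption, the padding skip is just a scan for the first non-zero byte. The sketch below assumes that read_marker, without a decrypt callback, returns the byte it points at; the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return a pointer to the first non-zero byte in
 * [p, end), or end if only zero padding is left. */
static const uint8_t *demo_skip_zero_padding(const uint8_t *p,
                                             const uint8_t *end) {
  assert(p != NULL && end != NULL && p <= end);
  while (p < end && *p == 0) ++p;
  return p;
}
```

If the remaining bytes are all zero, the returned pointer equals `end`, which is exactly what makes the outer while loop in decoder_decode terminate.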

    }
  }

End of the loops.

  return res;
}

End of the function.

So decoder_decode mainly works out how many frames the data chunk contains and then calls decode_one on each of them.
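The indexed branch of that summary can be sketched as a small frame walker. The callback type stands in for decode_one, and all names are hypothetical. The part worth noticing is the bounds check before each frame, which keeps a bad size in the index from reading past the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback standing in for decode_one. Returns 0 on success. */
typedef int (*demo_frame_fn)(const uint8_t *frame, uint32_t size, void *user);

/* Visit each frame listed in sizes[]. Returns the number of frames visited,
 * or -1 if a size does not fit in the buffer or the callback fails. */
static int demo_walk_frames(const uint8_t *data, size_t data_sz,
                            const uint32_t *sizes, int count,
                            demo_frame_fn fn, void *user) {
  assert(data != NULL && sizes != NULL && fn != NULL);
  const uint8_t *start = data;
  const uint8_t *const end = data + data_sz;
  for (int i = 0; i < count; ++i) {
    /* Reject a frame size larger than what is left in the buffer. */
    if (sizes[i] > (size_t)(end - start)) return -1;
    if (fn(start, sizes[i], user) != 0) return -1;
    start += sizes[i]; /* move on to the next frame */
  }
  return count;
}

/* Example callback: add up the sizes of the frames it is given. */
static int demo_sum_sizes(const uint8_t *frame, uint32_t size, void *user) {
  (void)frame;
  *(uint32_t *)user += size;
  return 0;
}
```

Walking a 5-byte buffer with sizes {3, 2} visits both frames. With sizes {3, 4} the first frame is visited and the second is rejected, which corresponds to the VPX_CODEC_CORRUPT_FRAME path in the real function.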

# `decode_one`

Now let's see what decode_one does.

static vpx_codec_err_t decode_one(vpx_codec_alg_priv_t *ctx,
                                  const uint8_t **data, unsigned int data_sz,
                                  void *user_priv, int64_t deadline) {
  (void)deadline;

Start of the function. The deadline parameter is not used.

  // Determine the stream parameters. Note that we rely on peek_si to
  // validate that we have a buffer that does not wrap around the top
  // of the heap.
  if (!ctx->si.h) {
    int is_intra_only = 0;
    const vpx_codec_err_t res =
        decoder_peek_si_internal(*data, data_sz, &ctx->si, &is_intra_only,
                                 ctx->decrypt_cb, ctx->decrypt_state);
    if (res != VPX_CODEC_OK) return res;

    if (!ctx->si.is_kf && !is_intra_only) return VPX_CODEC_ERROR;
  }

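This block was puzzling at first, but the code and its comment explain it. ctx->si.h is still 0 as long as the stream parameters are unknown, that is, before the first usable frame. In that case decoder_peek_si_internal parses just enough of the frame header to fill in ctx->si. If that frame is neither a key frame nor an intra-only frame, decoding cannot start from it, so VPX_CODEC_ERROR is returned.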
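The gate on the first frame can be modelled in a few lines. This is a sketch under simplifying assumptions: the header is treated as already parsed (the real code gets it from decoder_peek_si_internal), and both struct types and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stream info: h stays 0 until the parameters are known. */
typedef struct {
  unsigned int w, h;
  int is_kf;
} demo_stream_info_t;

/* Hypothetical, already parsed frame header. */
typedef struct {
  unsigned int w, h;
  int is_kf;
  int is_intra_only;
} demo_frame_header_t;

/* Returns 0 if decoding may proceed with this frame, -1 if the stream
 * cannot start here. */
static int demo_accept_frame(demo_stream_info_t *si,
                             const demo_frame_header_t *hdr) {
  assert(si != NULL && hdr != NULL);
  if (si->h == 0) {
    /* Stream parameters unknown: only a key frame or an intra-only frame
     * can be decoded without earlier reference frames. */
    if (!hdr->is_kf && !hdr->is_intra_only) return -1;
    si->w = hdr->w;
    si->h = hdr->h;
    si->is_kf = hdr->is_kf;
  }
  return 0;
}
```

An inter frame arriving first is rejected and leaves the stream info untouched. Once a key frame has set the dimensions, later inter frames pass straight through.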

  ctx->user_priv = user_priv;

  // Set these even if already initialized.  The caller may have changed the
  // decrypt config between frames.
  ctx->pbi->decrypt_cb = ctx->decrypt_cb;
  ctx->pbi->decrypt_state = ctx->decrypt_state;

  if (vp9_receive_compressed_data(ctx->pbi, data_sz, data)) {
    ctx->pbi->cur_buf->buf.corrupted = 1;
    ctx->pbi->need_resync = 1;
    ctx->need_resync = 1;
    return update_error_state(ctx, &ctx->pbi->common.error);
  }

  check_resync(ctx, ctx->pbi);

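The core operation should be vp9_receive_compressed_data.

From the way it is used, it returns non-zero on failure and 0 on success. On failure the current buffer is marked as corrupted, need_resync is set on both pbi and ctx, and the error is passed up through update_error_state.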


  return VPX_CODEC_OK;
}

End of the function.

The check_resync called at the end deserves a closer look:

static INLINE void check_resync(vpx_codec_alg_priv_t *const ctx,
                                const VP9Decoder *const pbi) {
  // Clear resync flag if the decoder got a key frame or intra only frame.
  if (ctx->need_resync == 1 && pbi->need_resync == 0 &&
      (pbi->common.intra_only || pbi->common.frame_type == KEY_FRAME))
    ctx->need_resync = 0;
}

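The comment says the resync flag is cleared once the decoder has received a key frame or an intra-only frame. Put together with the error path in decode_one, the picture is this: a decoding failure sets need_resync, inter frames that follow cannot be trusted because the frames they refer to may be damaged, and decoder_get_frame returns NULL as long as the flag is set. Only a key frame or intra-only frame, which does not depend on earlier frames, lets the decoder get back in sync and clears the flag.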
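The interplay between decode_one, check_resync and decoder_get_frame can be condensed into a tiny state machine. This is a simplified model, not the real logic: it folds ctx->need_resync and pbi->need_resync into a single flag, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
  DEMO_KEY_FRAME,
  DEMO_INTRA_ONLY,
  DEMO_INTER_FRAME
} demo_frame_kind_t;

typedef struct {
  int need_resync; /* set after a decode error */
  int has_frame;   /* at least one frame decoded without error */
} demo_decoder_t;

/* corrupted != 0 simulates vp9_receive_compressed_data failing. */
static int demo_decode(demo_decoder_t *d, demo_frame_kind_t kind,
                       int corrupted) {
  assert(d != NULL);
  if (corrupted) {
    d->need_resync = 1; /* later inter frames can no longer be trusted */
    return -1;
  }
  d->has_frame = 1;
  /* Like check_resync: only a frame that does not depend on earlier
   * frames clears the flag. */
  if (d->need_resync && kind != DEMO_INTER_FRAME) d->need_resync = 0;
  return 0;
}

/* Like the need_resync check in decoder_get_frame. */
static int demo_can_output(const demo_decoder_t *d) {
  assert(d != NULL);
  return d->has_frame && !d->need_resync;
}
```

After a corrupted frame, an inter frame decodes "successfully" in this model but still produces no output. Output resumes only with the next key frame or intra-only frame.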

Continued in "libvpx解码过程解读" (a walkthrough of the libvpx decoding process).

Yin的笔记本
