How to Implement Upload-Free Hardware Rendering for CUDA Hardware-Decoded Video in Qt / Qml (Part 1)

【Preface】

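In most cases, decoding and rendering a video goes through one of the following paths:

- Software decoding: the video frame lives in system memory.
  - Software rendering copies it into an image and then renders it; hardware rendering uploads it to a texture and then renders it.
- Hardware decoding: the video frame lives in GPU memory.
  - Software rendering downloads it to system memory, copies it into an image and then renders it; hardware rendering copies it directly into a texture and then renders it.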

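When working with hardware decoding, we usually download the decoded frame to system memory and then render it (it is easier to process that way).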

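However, at ultra-high resolutions (4K, 8K) the performance cost of these uploads and downloads is too high, because the CPU becomes the bottleneck. A better approach is needed for smoother playback and lower resource usage.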
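To put a number on "too high", the sketch below computes the raw size of one decoded frame and the copy bandwidth per direction at 60 fps. It assumes tightly packed frames with no row padding. These figures come from arithmetic, not from measurements, and real decoder output is usually padded, so actual traffic is somewhat higher. The function names are illustrative.

```cpp
#include <cstdint>
#include <cstdio>

// Bytes in one tightly packed NV12 frame: a full-resolution Y plane plus an
// interleaved UV plane at half resolution in both directions (1.5 bytes/pixel).
constexpr std::uint64_t nv12FrameBytes(std::uint64_t w, std::uint64_t h)
{
    return w * h + w * h / 2;
}

// Bytes in one tightly packed RGBA8 frame (4 bytes/pixel).
constexpr std::uint64_t rgbaFrameBytes(std::uint64_t w, std::uint64_t h)
{
    return w * h * 4;
}

// Bytes per second for one copy direction at a given frame rate.
constexpr std::uint64_t bytesPerSecond(std::uint64_t frameBytes, std::uint64_t fps)
{
    return frameBytes * fps;
}

static_assert(nv12FrameBytes(3840, 2160) == 12441600, "4K NV12 frame");
static_assert(rgbaFrameBytes(3840, 2160) == 33177600, "4K RGBA frame");

// Print the per-direction bandwidth for 4K and 8K at 60 fps.
void printBandwidth()
{
    const struct { const char *name; std::uint64_t w, h; } modes[] = {
        {"4K", 3840, 2160},
        {"8K", 7680, 4320},
    };
    for (const auto &m : modes) {
        std::printf("%s @60fps: NV12 %.2f GB/s, RGBA %.2f GB/s per direction\n", m.name,
                    bytesPerSecond(nv12FrameBytes(m.w, m.h), 60) / 1e9,
                    bytesPerSecond(rgbaFrameBytes(m.w, m.h), 60) / 1e9);
    }
}
```

At 4K and 60 fps this comes to about 0.75 GB/s for NV12 and 1.99 GB/s for RGBA. At 8K it is about 2.99 GB/s and 7.96 GB/s. The download-then-upload path pays this twice, once in each direction.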

Software decoding is not worth discussing here, because hardware rendering always requires an upload in that case.

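In addition, most popular hardware decoding today uses Nvidia CUDA (cuvid). This post therefore uses CUDA hardware decoding as its only example of hardware rendering. The approach for other hardware decoders is essentially the same.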

【Prerequisites】

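First, I assume you already have a stream puller (live555 or ffmpeg; not needed for local files) and an NV hardware decoder, NVDecoder. Some OpenGL background is also required, because the hardware rendering here uses OpenGL.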

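Tools you need to have ready:

- A stream puller (for local files, just read the bitstream directly)
- An Nvidia hardware decoder
- An OpenGL environment (since this is Qt, QOpenGL is used)
- A CUDA environment (the version used here is CUDA 11.0)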

【Main Text】

Note: for simplicity, the image format used here is plain RGBA (normally it would be NV12).
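For reference, this is roughly what "normally NV12" involves. NV12 stores a full-resolution Y plane followed by a half-resolution plane of interleaved U/V byte pairs. Each 2x2 block of pixels therefore shares one chroma pair, and rendering it needs either two textures plus a YUV-to-RGB shader or a separate conversion step.

The sketch below assumes that both planes share one row pitch and that the UV plane begins right after `lumaRows` rows of luma. A decoder may pad `lumaRows` beyond the visible height. `Nv12Layout` is a hypothetical name.

```cpp
#include <cstddef>

// Byte offsets into a single NV12 buffer whose Y and UV planes share one row pitch.
// lumaRows is the number of rows reserved for the Y plane; it is a separate
// parameter because a decoder may pad it beyond the visible height (assumption).
struct Nv12Layout {
    std::size_t pitch;     // bytes per row, >= visible width
    std::size_t lumaRows;  // rows allocated for the Y plane

    // Offset of the luma byte for pixel (x, y).
    std::size_t yOffset(std::size_t x, std::size_t y) const
    {
        return y * pitch + x;
    }

    // Offset of the U byte shared by the 2x2 block containing (x, y); V is at +1.
    std::size_t uvOffset(std::size_t x, std::size_t y) const
    {
        return lumaRows * pitch + (y / 2) * pitch + (x / 2) * 2;
    }

    // Total bytes: Y plane plus a half-height interleaved UV plane.
    std::size_t totalBytes() const
    {
        return lumaRows * pitch + (lumaRows / 2) * pitch;
    }
};
```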

In fact, Nvidia's official sample is quite straightforward.

Its core idea is to associate a CUDA graphics resource with an OpenGL resource:

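// Register the OpenGL PBO with CUDA. WRITE_DISCARD tells CUDA it will overwrite
// the whole buffer, so the previous contents need not be preserved.
CUgraphicsResource cuda_tex_resource;
ck(cuGraphicsGLRegisterBuffer(&cuda_tex_resource, m_pbo.bufferId(), CU_GRAPHICS_REGISTER_FLAGS_WRITE_DISCARD));
// Map it for CUDA access; OpenGL must not touch the PBO until it is unmapped
ck(cuGraphicsMapResources(1, &cuda_tex_resource, 0));
// Get the device address (and size) of the mapped PBO
CUdeviceptr d_tex_buffer;
size_t d_tex_size;
ck(cuGraphicsResourceGetMappedPointer(&d_tex_buffer, &d_tex_size, cuda_tex_resource));
// GPU-to-GPU copy of the decoded RGBA frame into the PBO (width * 4 bytes per row)
GetImageHW((CUdeviceptr)g_ppFrame, (uchar *)d_tex_buffer, m_videoWidth * 4, m_videoHeight);
// Unmap so OpenGL can use the PBO again
ck(cuGraphicsUnmapResources(1, &cuda_tex_resource, 0));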
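The map, copy, unmap sequence must always stay balanced. According to the CUDA documentation, results are undefined if OpenGL accesses a resource while CUDA has it mapped, so an early return between the two calls would leave the PBO in that state. A common C++ remedy is an RAII guard.

In the sketch below, the CUDA calls are abstracted as callables so that it compiles without the CUDA SDK. In the renderer, the two lambdas would call `cuGraphicsMapResources` and `cuGraphicsUnmapResources`. `ScopedMap` and `copyFrame` are hypothetical names.

```cpp
#include <functional>
#include <utility>

// RAII guard: runs `map` on construction and `unmap` on destruction, so every
// exit path (normal return, early return, exception) leaves the resource unmapped.
// If `map` itself throws, the guard is never constructed and `unmap` is not called.
class ScopedMap {
public:
    ScopedMap(const std::function<void()> &map, std::function<void()> unmap)
        : m_unmap(std::move(unmap))
    {
        map();
    }

    ~ScopedMap()
    {
        if (m_unmap) m_unmap();
    }

    ScopedMap(const ScopedMap &) = delete;
    ScopedMap &operator=(const ScopedMap &) = delete;

private:
    std::function<void()> m_unmap;
};

// Example: the copy step returns early on failure, yet unmap still runs.
inline bool copyFrame(int &mapDepth, bool copySucceeds)
{
    ScopedMap guard([&] { ++mapDepth; }, [&] { --mapDepth; });
    if (!copySucceeds)
        return false;   // guard's destructor unmaps here
    // ... cuGraphicsResourceGetMappedPointer + GetImageHW would go here ...
    return true;        // ... and here
}
```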

Here, g_ppFrame is the GPU memory address (a CUdeviceptr) of the video frame decoded by NVDecoder, and m_pbo is an OpenGL pixel buffer object (PBO). GetImageHW is simply a thin wrapper around a GPU memory copy function.
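The post does not show GetImageHW. Judging by its arguments (source, destination, bytes per row, row count), it is presumably a 2D device-to-device copy such as `cuMemcpy2D`, but that is an assumption.

The host-side sketch below mirrors the semantics of such a pitched copy with `std::memcpy`, so the row and pitch arithmetic can be checked without a GPU. `copy2D` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Host-side analogue of a pitched 2D copy: copy `height` rows of `widthBytes`
// bytes each, where consecutive rows start `srcPitch` / `dstPitch` bytes apart.
// Pitches may exceed widthBytes because of row padding; padding bytes are skipped.
inline void copy2D(std::uint8_t *dst, std::size_t dstPitch,
                   const std::uint8_t *src, std::size_t srcPitch,
                   std::size_t widthBytes, std::size_t height)
{
    for (std::size_t row = 0; row < height; ++row)
        std::memcpy(dst + row * dstPitch, src + row * srcPitch, widthBytes);
}
```

In this post the frames are tightly packed RGBA, so the pitch equals width * 4 on both sides and the loop amounts to one contiguous copy of width * height * 4 bytes. Decoder output surfaces, by contrast, are often padded, which is why 2D copies take separate pitches.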

So the work here is quite simple: register, map, and copy.

After these steps the video frame is in the PBO, and rendering it is easy:

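// With a PBO bound as GL_PIXEL_UNPACK_BUFFER, the data pointer passed to setData()
// is an offset into the PBO (nullptr = offset 0), so the upload stays on the GPU
m_pbo.bind();
if (m_texture.isCreated()) {
    m_texture.bind();
    // depth is 1 for a 2D texture
    m_texture.setData(0, 0, 0, m_videoWidth, m_videoHeight, 1, QOpenGLTexture::RGBA, QOpenGLTexture::UInt8, nullptr);
}
m_pbo.release();

m_program.bind();
m_vbo.bind();
// Attribute 0: positions, the first four vec2 values in the VBO
m_program.enableAttributeArray(0);
m_program.setAttributeBuffer(0, GL_FLOAT, 0, 2, 2 * sizeof(GLfloat));

// Attribute 1: texture coordinates, the four vec2 values after the positions
m_program.enableAttributeArray(1);
m_program.setAttributeBuffer(1, GL_FLOAT, 2 * 4 * sizeof(GLfloat), 2, 2 * sizeof(GLfloat));

m_program.setUniformValue("texture", 0);

// GL_QUADS is not available in core profiles; a fan over the same 4 vertices draws the quad
glDrawArrays(GL_TRIANGLE_FAN, 0, 4);

m_vbo.release();
m_program.release();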

This copies the PBO data into the OpenGL texture, which then only needs to be drawn.
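Two details of the drawing code can be checked without a GL context:

- The VBO holds the four positions first and the four texture coordinates after them, so attribute 1 starts 2 * 4 * sizeof(GLfloat) = 32 bytes in.
- GL_QUADS is not available in a core-profile context, and the shaders used later declare `#version 330 core`. GL_TRIANGLE_FAN over the same four vertices covers the quad with the triangles {0, 1, 2} and {0, 2, 3}.

The sketch reproduces both on the CPU. The constant and function names are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Same layout as `points` in Renderer::initializeGL: 4 positions, then 4 texcoords.
constexpr std::size_t kVertexCount = 4;
constexpr std::size_t kComponents = 2;                          // vec2 per attribute
constexpr std::size_t kStride = kComponents * sizeof(float);    // bytes per vertex
constexpr std::size_t kTexCoordOffset = kVertexCount * kStride; // start of texcoords

// Matches the offset passed to setAttributeBuffer(1, ...).
static_assert(kTexCoordOffset == 2 * 4 * sizeof(float), "texcoord offset");

// Expand a triangle fan of N vertices into an explicit triangle list:
// triangle i uses vertices 0, i + 1 and i + 2.
template <std::size_t N>
std::array<std::size_t, 3 * (N - 2)> fanToTriangles()
{
    std::array<std::size_t, 3 * (N - 2)> indices{};
    for (std::size_t i = 0; i + 2 < N; ++i) {
        indices[3 * i + 0] = 0;
        indices[3 * i + 1] = i + 1;
        indices[3 * i + 2] = i + 2;
    }
    return indices;
}
```

With the quad's vertex order (top-left, top-right, bottom-right, bottom-left), the two triangles cover the whole screen. Their winding does not matter here because render() disables face culling.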

That is the general idea, but there are plenty of pitfalls along the way. For example, the CUDA context must be on the same thread as the OpenGL context.
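This rule is easy to break without noticing, for example by touching the renderer from the stream callback thread. During development it can help to record which thread owns the contexts and check it at each GL/CUDA entry point. Below is a minimal standard-library sketch. `ThreadAffinity` is a hypothetical helper, not part of Qt or CUDA.

```cpp
#include <mutex>
#include <thread>

// Remembers the first thread that binds it and reports whether later calls
// come from that same thread. Intended for debug checks around GL/CUDA calls.
class ThreadAffinity {
public:
    // Claim the current thread; returns false if another thread already owns it.
    bool bind()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        const std::thread::id self = std::this_thread::get_id();
        if (m_owner == std::thread::id{})
            m_owner = self;
        return m_owner == self;
    }

    // True when called on the owning thread.
    bool check() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_owner == std::this_thread::get_id();
    }

private:
    mutable std::mutex m_mutex;
    std::thread::id m_owner;  // default-constructed id: no owner yet
};
```

In the Renderer, bind() could be called at the top of initializeGL() and check() at the top of display(), for instance inside Q_ASSERT. Both placements are suggestions, not part of the original code.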

Next, let's put the whole flow together in Qt, starting with a Renderer:

class Renderer : public QObject, protected QOpenGLFunctions
{
    Q_OBJECT

public:
    Renderer(QObject *window) :
        m_texture(QOpenGLTexture::Target2D)
      , m_vbo(QOpenGLBuffer::VertexBuffer)
      , m_pbo(QOpenGLBuffer::PixelUnpackBuffer)
      , m_window(window)
    {

    }

    ~Renderer()
    {
        if (m_vbo.isCreated()) m_vbo.destroy();
        if (m_pbo.isCreated()) m_pbo.destroy();
        if (m_texture.isCreated()) m_texture.destroy();
    }

    static int stream_callback(int channelId, void *userPtr, int mediaType, char *pbuf, FFS_FRAME_INFO *frameInfo)
    {
        Q_UNUSED(channelId);

        auto _this = reinterpret_cast<Renderer *>(userPtr);

        if (mediaType == MEDIA_TYPE_VIDEO && frameInfo) {
            auto frameWidth = frameInfo->width;
            auto frameHeight = frameInfo->height;

            if (!_this->m_initCodec) {
                _this->initializeVideoSize(frameWidth, frameHeight);

                int errCode;
                std::string erroStr;
                _this->m_deocder_handle = NvDecoder_Create(FFmpeg2NvCodecId(frameInfo->codec), _this->m_pbo.bufferId()
                                                           , frameWidth, frameHeight, false, true, rgba, errCode, erroStr);
                qDebug() << __func__ << "NvDecoder_Create:" << _this->m_deocder_handle << errCode << QString::fromStdString(erroStr);

                cuMemAlloc((CUdeviceptr *)&g_ppFrame, frameWidth * frameHeight * 4);

                _this->m_initCodec = true;
            }

            if (_this->m_deocder_handle && pbuf) {
                uint8_t **ppFrame;
                int nFrameReturned = 0;
                int nFrameLen = 0;
                int nRet = NvDecoder_DecodeHW(_this->m_deocder_handle, (const uint8_t *)pbuf, frameInfo->length, &ppFrame, &nFrameLen, &nFrameReturned);
                //qDebug() << __func__ << "NvDecoder_DecodeHW:" << nRet << nFrameReturned << nFrameLen;
                for (int i = 0; i < nFrameReturned; i++) {
                    GetImageHW((CUdeviceptr)ppFrame[i], g_ppFrame, frameWidth * 4, frameHeight);
                    QMetaObject::invokeMethod(_this->m_window, "update");
                    std::unique_lock<std::mutex> locker(_this->m_mutex);
                    _this->m_condition.wait_for(locker, std::chrono::milliseconds(100));
                }
            }
        }

        return 0;
    }

    void initializeGL(int w, int h)
    {
        m_width = w;
        m_height = h;

        initializeOpenGLFunctions();

        initializeShader();

        GLfloat points[] {
            -1.0f, 1.0f,
            1.0f, 1.0f,
            1.0f, -1.0f,
            -1.0f, -1.0f,

            0.0f, 0.0f,
            1.0f, 0.0f,
            1.0f, 1.0f,
            0.0f, 1.0f
        };

        m_vbo.create();
        m_vbo.bind();
        m_vbo.allocate(points, sizeof(points));
        m_vbo.release();

        if (!context) {
            ck(cuInit(0));
            CUdevice cuDevice;
            ck(cuDeviceGet(&cuDevice, 0));
            char szDeviceName[80];
            ck(cuDeviceGetName(szDeviceName, sizeof(szDeviceName), cuDevice));
            qDebug() << "GPU in use: " << szDeviceName;
            ck(cuCtxCreate(&context, CU_CTX_SCHED_BLOCKING_SYNC, cuDevice));
        }

        FFS_Init(&m_ffs_handle);

        FFS_OpenStream(m_ffs_handle, 1000, (char *)m_videoUrl.toStdString().c_str(), RTP_OVER_TCP, MEDIA_TYPE_VIDEO | MEDIA_TYPE_AUDIO | MEDIA_TYPE_EVENT
                       , this, (void *)&stream_callback, 1000, 1);
    }

public slots:
    void render()
    {
        glClearColor(0.2f, 0.3f, 0.3f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        glDisable(GL_DEPTH_TEST);
        glDisable(GL_CULL_FACE);
        glDepthMask(false);

        m_pbo.bind();
        if (m_texture.isCreated()) {
            m_texture.bind();
            // depth is 1 for a 2D texture; nullptr is offset 0 into the bound PBO
            m_texture.setData(0, 0, 0, m_videoWidth, m_videoHeight, 1, QOpenGLTexture::RGBA, QOpenGLTexture::UInt8, nullptr);
        }
        m_pbo.release();

        m_program.bind();
        m_vbo.bind();
        m_program.enableAttributeArray(0);
        m_program.setAttributeBuffer(0, GL_FLOAT, 0, 2, 2 * sizeof(GLfloat));

        m_program.enableAttributeArray(1);
        m_program.setAttributeBuffer(1, GL_FLOAT, 2 * 4 * sizeof(GLfloat), 2, 2 * sizeof(GLfloat));

        m_program.setUniformValue("texture", 0);

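        // GL_QUADS is not available in core profiles; a fan over the same 4 vertices draws the quad
        glDrawArrays(GL_TRIANGLE_FAN, 0, 4);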

        m_vbo.release();
        m_program.release();
    }

    void initializeVideoSize(int w, int h)
    {
        m_videoWidth = w;
        m_videoHeight = h;
        m_updateResource = true;
    }

    void resizeGL(int w, int h)
    {
        // Always apply the viewport: initializeGL() already stores the initial size,
        // so a "size changed?" check would skip the first call
        m_width = w;
        m_height = h;
        glViewport(0, 0, w, h);
    }

    void display()
    {
        if (m_initCodec) {
            ck(cuCtxSetCurrent(context));

            if (m_updateResource) {
                if (m_texture.isCreated()) m_texture.destroy();
                m_texture.create();
                m_texture.bind();
                m_texture.setMinificationFilter(QOpenGLTexture::Nearest);
                m_texture.setMagnificationFilter(QOpenGLTexture::Nearest);
                m_texture.setWrapMode(QOpenGLTexture::ClampToEdge);
                m_texture.setSize(m_videoWidth, m_videoHeight, 1);
                m_texture.setFormat(QOpenGLTexture::RGBAFormat);
                m_texture.allocateStorage(QOpenGLTexture::RGBA, QOpenGLTexture::UInt8);
                m_texture.release();

                if (m_pbo.isCreated()) m_pbo.destroy();
                m_pbo.create();
                m_pbo.bind();
                // The usage pattern only takes effect if it is set before allocate()
                m_pbo.setUsagePattern(QOpenGLBuffer::StreamDraw);
                m_pbo.allocate(nullptr, m_videoWidth * m_videoHeight * 4);
                m_pbo.release();

                ck(cuGraphicsGLRegisterBuffer(&cuda_tex_resource, m_pbo.bufferId(), CU_GRAPHICS_REGISTER_FLAGS_WRITE_DISCARD));

                m_updateResource = false;
            }

            ck(cuGraphicsMapResources(1, &cuda_tex_resource, 0));
            CUdeviceptr d_tex_buffer;
            size_t d_tex_size;
            ck(cuGraphicsResourceGetMappedPointer(&d_tex_buffer, &d_tex_size, cuda_tex_resource));
            GetImageHW((CUdeviceptr)g_ppFrame, (uchar *)d_tex_buffer, m_videoWidth * 4, m_videoHeight);
            ck(cuGraphicsUnmapResources(1, &cuda_tex_resource, 0));

            render();
        }

        m_condition.notify_one();
    }

private:
    void initializeShader()
    {
        if (!m_program.addShaderFromSourceCode(QOpenGLShader::Vertex,
                                               "#version 330 core\n"
                                               "layout(location = 0) in vec4 position;"
                                               "layout(location = 1) in vec2 texCoord0;"
                                               "out vec2 texCoord;"
                                               "void main(void)"
                                               "{"
                                               "    gl_Position = position;"
                                               "    texCoord = texCoord0;"
                                               "}"))
            qDebug() << m_program.log();

        if (!m_program.addShaderFromSourceCode(QOpenGLShader::Fragment,
                                               "#version 330 core\n"
                                               "in vec2 texCoord;"
                                               "out vec4 FragColor;"
                                               "uniform sampler2D texture;"
                                               "void main(void)"
                                               "{"
                                               "    FragColor = texture2D(texture, texCoord);"
                                               "}"))
            qDebug() << m_program.log();

        if (!m_program.link())
            qDebug() << m_program.log();

        if (!m_program.bind())
            qDebug() << m_program.log();
    }


    bool m_initCodec = false;
    bool m_updateResource = false;
    int m_width = 0, m_height = 0;
    int m_videoWidth = 0, m_videoHeight = 0;
    std::mutex m_mutex;
    std::condition_variable m_condition;
    QOpenGLTexture m_texture;
    QOpenGLBuffer m_vbo, m_pbo;
    QOpenGLShaderProgram m_program;
    CUgraphicsResource cuda_tex_resource = nullptr;
    CUcontext context = nullptr;
    void *m_ffs_handle = nullptr, *m_deocder_handle = nullptr;
    QString m_videoUrl = "rtsp://admin:pass123456@192.168.0.101:554/h264/ch1/main/av_stream";
    QObject *m_window = nullptr;
};

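It looks a bit complicated, and turning it into a real product takes far more than this, but none of that matters here. If we set stream_callback aside, the whole flow is the standard OpenGL workflow:

- Initialize the various OpenGL buffers and shaders.
- Issue the draw commands in render().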

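stream_callback, on the other hand, is the callback invoked after the stream is pulled. What it receives is encoded bitstream data, which must be decoded:

- Initialize the decoder using the frame info.
- Decode the video frame with NVDecoder and copy it into g_ppFrame. This is exactly where the code at the start of the main text picks up.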
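The callback and display() hand each frame over through m_mutex and m_condition. The decoding thread requests an update and then waits, for at most 100 ms, until the render thread has copied g_ppFrame into the PBO.

As written, wait_for() has no predicate, which causes two problems:

- A notify_one() that arrives before the wait starts is lost, and the thread just sits out the timeout.
- A spurious wakeup ends the wait early.

Below is a sketch of the same handshake with a flag guarded by the mutex, using only the standard library. `FrameHandoff` and its members are hypothetical names, and the comments mark where the renderer's own calls would go.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// One-slot handoff between the decoding thread (producer) and the render thread
// (consumer). The `m_pending` flag records state under the mutex, so a notification
// sent before the producer starts waiting is not lost, and spurious wakeups are ignored.
class FrameHandoff {
public:
    // Producer: publish a frame, then wait until the consumer has taken it or the
    // timeout expires. Returns true if the consumer took the frame in time.
    template <class Rep, class Period>
    bool publishAndWait(const std::chrono::duration<Rep, Period> &timeout)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_pending = true;
        // (the renderer would request a repaint here, e.g. invokeMethod(m_window, "update"))
        const bool taken = m_cond.wait_for(lock, timeout, [this] { return !m_pending; });
        // On timeout, withdraw the frame so the consumer never reads it while the
        // producer goes on to overwrite the shared buffer.
        m_pending = false;
        return taken;
    }

    // Consumer: called once per paint; returns true if a new frame was waiting.
    bool consume()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            if (!m_pending)
                return false;
            // ... copy g_ppFrame into the PBO here; the producer is blocked meanwhile ...
            m_pending = false;
        }
        m_cond.notify_one();
        return true;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    bool m_pending = false;
};

// Usage sketch: a producer thread publishes one frame while this thread plays the
// render thread and polls consume() the way repeated paint calls would.
inline bool demoHandoff()
{
    FrameHandoff handoff;
    bool taken = false;
    std::thread producer([&] { taken = handoff.publishAndWait(std::chrono::seconds(5)); });
    while (!handoff.consume())
        std::this_thread::yield();
    producer.join();
    return taken;
}
```

The design choice is to drop a frame on timeout rather than let the decoder overwrite g_ppFrame while the render thread might still be reading it.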

With the renderer in place, all that remains is to create and call it from QWidget / Qml.

QWidget needs QOpenGLWidget:

class VideoWidget : public QOpenGLWidget
{
public:
    VideoWidget(QWidget* parent = nullptr)
        : QOpenGLWidget(parent)
    {
        m_renderer = new Renderer(this);
    }

    ~VideoWidget() override
    {
        // The renderer owns GL objects, so release them with this widget's context current
        makeCurrent();
        delete m_renderer;
        doneCurrent();
    }

    virtual void initializeGL() override
    {
        m_renderer->initializeGL(width(), height());
    }

    virtual void paintGL() override
    {
        m_renderer->display();
    }

    virtual void resizeGL(int w, int h) override
    {
        m_renderer->resizeGL(w, h);
    }

private:
    Renderer *m_renderer = nullptr;
};

This is very simple, because that is exactly how the renderer was designed.

How is it used in Qml? I wrote an article about this earlier:

Modern OpenGL Tutorial Series (0): Using OpenGL in Qt/Quick

https://blog.csdn.net/u011283226/article/details/83217741

So the implementation here is:

class VideoItem : public QQuickItem
{
    Q_OBJECT

public:
    VideoItem()
    {
        connect(this, &QQuickItem::windowChanged, this, [this](QQuickWindow *window){
            if (window) {
                // beforeSynchronizing is emitted on the render thread while the GUI thread
                // is blocked, so reading window state in sync() is safe
                connect(window, &QQuickWindow::beforeSynchronizing, this, &VideoItem::sync,
                        Qt::DirectConnection);
                connect(window, &QQuickWindow::sceneGraphInvalidated, this, &VideoItem::cleanup,
                        Qt::DirectConnection);
                // The video is drawn underneath the QML scene, so Qt Quick must not clear over it
                window->setClearBeforeRendering(false);
            }
        });
    }

public slots:
    void sync()
    {
        // Viewport size in device pixels (matters on high-DPI screens)
        const qreal dpr = window()->devicePixelRatio();
        m_viewportSize = QSize(int(window()->width() * dpr), int(window()->height() * dpr));

        if (!m_renderer) {
            m_renderer = new Renderer(window());
            m_renderer->initializeGL(m_viewportSize.width(), m_viewportSize.height());
            connect(window(), &QQuickWindow::beforeRendering, this, [this]() {
                window()->resetOpenGLState();
                // Apply the viewport here on the render thread: widthChanged/heightChanged
                // are delivered on the GUI thread, where no GL context is current
                m_renderer->resizeGL(m_viewportSize.width(), m_viewportSize.height());
                m_renderer->display();
                // Hand a default GL state back to the scene graph
                window()->resetOpenGLState();
            }, Qt::DirectConnection);
        }
    }

    void cleanup()
    {
        if (m_renderer) {
            delete m_renderer;
            m_renderer = nullptr;
        }
    }

private:
    Renderer *m_renderer = nullptr;
    QSize m_viewportSize;
};
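Qt Quick's threaded render loop needs some care with resizing. widthChanged and heightChanged are delivered on the GUI thread, and the scene graph's GL context is not current there, so GL calls made from those handlers have nothing to act on.

One option, if you want to keep reacting to those signals instead of re-reading the size every frame, is a lock-free "latest value" mailbox. The GUI thread writes to it and the render thread drains it. Below is a standard-library sketch; `SizeMailbox` is a hypothetical name.

```cpp
#include <atomic>
#include <cstdint>

// Latest-value mailbox for a viewport size: the GUI thread posts, the render thread
// takes. Width and height share one 64-bit atomic so they can never be read torn apart.
// Layout: bit 63 = "new value" flag, bits 32..62 = width, bits 0..31 = height.
class SizeMailbox {
public:
    void post(std::int32_t w, std::int32_t h)   // GUI thread
    {
        m_value.store(pack(w, h) | kDirty, std::memory_order_release);
    }

    // Render thread: returns true and fills w/h only if a size was posted since the last take.
    bool take(std::int32_t &w, std::int32_t &h)
    {
        const std::uint64_t v = m_value.fetch_and(~kDirty, std::memory_order_acq_rel);
        if (!(v & kDirty))
            return false;
        w = std::int32_t((v >> 32) & 0x7FFFFFFF);
        h = std::int32_t(v & 0xFFFFFFFF);
        return true;
    }

private:
    static constexpr std::uint64_t kDirty = 1ULL << 63;

    static std::uint64_t pack(std::int32_t w, std::int32_t h)
    {
        return (std::uint64_t(std::uint32_t(w) & 0x7FFFFFFF) << 32) | std::uint32_t(h);
    }

    std::atomic<std::uint64_t> m_value{0};
};
```

One way to wire it up:

- The size lambdas call post() with the size multiplied by devicePixelRatio().
- The beforeRendering lambda calls take() and, when it returns true, calls resizeGL(w, h).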

[Figure: running result]

【Conclusion】

Finally, the code in this post does work, but you will need to adapt it to your own project.

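Of course, since this is only a demo, it has no frame rate control, no handling of varying network conditions and no control over decoding and rendering. All of that has to be added and refined step by step.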
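Here is one example of what "frame rate control" could mean in this setup. Instead of pushing frames as fast as the decoder returns them, the decoding thread sleeps until each frame's scheduled time. That time is computed from a fixed start point so that errors do not accumulate.

The sketch assumes a constant frame rate; a real stream would use the timestamps delivered with each frame instead. `FramePacer` is a hypothetical name.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

// Fixed-rate pacer: frame n is due at start + n * period. Deadlines are computed
// from the start point rather than as "now + period", so scheduling jitter does not drift.
class FramePacer {
public:
    using Clock = std::chrono::steady_clock;

    explicit FramePacer(double fps, Clock::time_point start = Clock::now())
        : m_period(std::chrono::duration_cast<Clock::duration>(std::chrono::duration<double>(1.0 / fps)))
        , m_start(start)
    {
    }

    // Time point at which frame `index` should be presented.
    Clock::time_point deadline(std::int64_t index) const
    {
        return m_start + m_period * index;
    }

    // Block the calling (decoding) thread until frame `index` is due.
    void waitFor(std::int64_t index) const
    {
        std::this_thread::sleep_until(deadline(index));
    }

private:
    Clock::duration m_period;
    Clock::time_point m_start;
};
```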

For reasons of space, the next post will cover a better implementation and integration in Qml. Stay tuned ヾ(￣▽￣)~~

Original article: https://blog.csdn.net/u011283226/article/details/128613596
