As the saying goes, good work calls for good notes, so now I will start taking them.
1. Every video is a sequence of images, whose content is far richer and more expressive than a single image and carries much more information. Video analysis is usually based on video frames, but frames contain a great deal of redundancy, and naive frame extraction suffers from both missed frames and duplicates. Keyframe extraction captures the salient features of each shot in a video; it can significantly reduce the time needed for video retrieval and improve retrieval precision.
2. Definition of a key frame: if every video frame is stacked in the image coordinate system, the feature vectors of the frames in a shot trace out a trajectory in feature space, and the frames corresponding to the characteristic values along this trajectory are called key frames [1].
3. Video has a hierarchical structure composed of three logical units: scenes, shots, and frames. Video retrieval is usually performed at the frame level, so extracting a video's key frames is crucial [2].
1. The idea of keyframe extraction: first segment the video sequence into shots, then extract content key frames within each shot, and finally use the key frames to obtain low-level features such as shape, texture, and color.
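To make that last step concrete, here is a minimal sketch (my own illustration, not from any of the cited papers) of the simplest such low-level feature, a normalized color histogram computed with OpenCV:

import cv2
import numpy as np

def color_histogram(frame, bins=16):
    # per-channel histogram, normalized so frames of any size compare fairly
    channels = []
    for c in range(3):  # B, G, R
        h = cv2.calcHist([frame], [c], None, [bins], [0, 256])
        channels.append(h)
    hist = np.concatenate(channels).ravel()
    return hist / hist.sum()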
2. Keyframe extraction methods:
(1) Full image sequence
The shot-boundary method takes the first and last frame (or the middle frame) of a shot as its key frames. It is simple and easy to implement and suits shots whose content changes little or not at all, but it ignores the visual complexity of the shot's content and caps the number of key frames per shot, so the extracted key frames are not very representative and the results are unstable.
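As a minimal sketch of this idea, assuming the shot boundaries are already known as (start, end) frame-index pairs from some prior segmentation step (a hypothetical input, not part of the method above):

import cv2

def shot_boundary_keyframes(video_path, shot_boundaries):
    # pick the first and last frame of each shot as key frames
    cap = cv2.VideoCapture(video_path)
    keyframes = []
    for start, end in shot_boundaries:
        for idx in (start, end):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
            ok, frame = cap.read()
            if ok:
                keyframes.append((idx, frame))
    cap.release()
    return keyframes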
(2) Compressed video
(3) Keyframe extraction via clustering with a self-chosen k plus content analysis [2]
(4) a. Sampling-based keyframe extraction [3]
Sampling-based methods pick frames at random or at fixed time intervals. This is simple but impractical (a minimal sketch follows this list).
b. Keyframe extraction based on color features
c. Keyframe extraction based on motion analysis
d. Keyframe extraction based on shot boundaries (*)
e. Keyframe extraction based on video content (*)
f. Keyframe extraction based on clustering (probably the best fit here, since the classes I need to recognize are already fixed).
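As promised under (a), a minimal sketch of the sampling approach, taking one frame per fixed interval (the interval of 30 frames is an arbitrary assumption):

import cv2

def sample_keyframes(video_path, interval=30):
    # keep every interval-th frame; simple, but blind to content changes
    cap = cv2.VideoCapture(video_path)
    keyframes = []
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % interval == 0:
            keyframes.append((idx, frame))
        idx += 1
    cap.release()
    return keyframes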
(5) Keyframe extraction with a 3D-CNN [4]
The authors propose a semantics-based keyframe extraction algorithm: a hierarchical clustering algorithm first performs a preliminary extraction of key frames; then a semantic-correlation step compares histograms of the preliminary key frames to remove redundant ones and determine the video's final key frames. Compared with other algorithms, the proposed one extracts key frames with relatively little redundancy.
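The paper's exact procedure is not reproduced here, but the redundancy-removal step it describes, comparing histograms of candidate key frames and dropping near-duplicates, can be sketched as follows (the 0.9 correlation threshold is my assumption, not the paper's):

import cv2

def drop_redundant(candidates, threshold=0.9):
    # keep a candidate only if its histogram differs enough from the last kept frame
    kept = []
    last_hist = None
    for frame in candidates:
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
        hist = cv2.normalize(hist, hist).flatten()
        if last_hist is None or cv2.compareHist(last_hist, hist, cv2.HISTCMP_CORREL) < threshold:
            kept.append(frame)
            last_hist = hist
    return kept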
(6) First, a convolutional autoencoder extracts deep features from the video frames, which are clustered with K-means; within each cluster the sharpest frame is selected as a preliminary key frame. A point-density method then performs a second optimization pass over the preliminary key frames, and the resulting final key frames are used for sign language recognition [5].
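The sharpness screening step in (6) is commonly implemented as the variance of the Laplacian; the paper's exact measure is not given in my notes, so treat this as a sketch of that common proxy:

import cv2

def sharpest_frame(frames):
    # variance of the Laplacian: higher means more edges, i.e. a sharper frame
    def sharpness(frame):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        return cv2.Laplacian(gray, cv2.CV_64F).var()
    return max(frames, key=sharpness)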
(7) Video keyframe extraction methods generally fall into four categories:
Category 1: methods based on image content
Category 2: methods based on motion analysis
Category 3: keyframe detection based on point-density features of trajectory curves
Category 4: the current mainstream, clustering-based methods
(8) Inter-frame difference method
Source: the following code comes from zyb_as's GitHub.
# -*- coding: utf-8 -*-
"""
Inter-frame maximum method
Created on Tue Dec 4 16:48:57 2018

keyframes extract tool

This key frame extraction algorithm is based on the inter-frame difference.
The principle is very simple: first, we load the video and compute the
inter-frame difference between consecutive frames. Then we can choose one of
three methods to extract keyframes, all based on the difference values:

1. Use the difference order. The first few frames with the largest average
   inter-frame difference are considered to be key frames.
2. Use a difference threshold. The frames whose average inter-frame
   difference is larger than the threshold are considered to be key frames.
3. Use local maxima. The frames whose average inter-frame difference is a
   local maximum are considered to be key frames. Note that smoothing the
   average difference values before computing the local maxima effectively
   removes noise and avoids repeatedly extracting frames from similar scenes.

After a few experiments, the third method gives the best key frame
extraction results.

The original code comes from the link below; I optimized it to reduce
unnecessary memory consumption.
https://blog.csdn.net/qq_21997625/article/details/81285096

@author: zyb_as
"""
import cv2
import operator
import numpy as np
import matplotlib.pyplot as plt
import sys
from scipy.signal import argrelextrema


def smooth(x, window_len=13, window='hanning'):
    """Smooth the data using a window of the requested size.

    This method is based on the convolution of a scaled window with the
    signal. The signal is prepared by introducing reflected copies of the
    signal (with the window size) at both ends, so that transient parts are
    minimized at the beginning and end of the output signal.

    input:
        x: the input signal
        window_len: the dimension of the smoothing window
        window: the type of window from 'flat', 'hanning', 'hamming',
            'bartlett', 'blackman'; a flat window will produce a moving
            average smoothing.
    output:
        the smoothed signal

    example:
        t = np.linspace(-2, 2, 40)
        x = np.sin(t) + np.random.randn(len(t)) * 0.1
        y = smooth(x)

    see also: numpy.hanning, numpy.hamming, numpy.bartlett, numpy.blackman,
        numpy.convolve, scipy.signal.lfilter

    TODO: the window parameter could be the window itself (an array)
        instead of a string
    """
    print(len(x), window_len)
    if x.ndim != 1:
        raise ValueError("smooth only accepts 1 dimension arrays.")
    if x.size < window_len:
        raise ValueError("Input vector needs to be bigger than window size.")
    if window_len < 3:
        return x
    if window not in ('flat', 'hanning', 'hamming', 'bartlett', 'blackman'):
        raise ValueError("Window must be 'flat', 'hanning', 'hamming', 'bartlett' or 'blackman'")
    # pad the signal with reflected copies at both ends
    s = np.r_[2 * x[0] - x[window_len:1:-1], x, 2 * x[-1] - x[-1:-window_len:-1]]
    if window == 'flat':   # moving average
        w = np.ones(window_len, 'd')
    else:
        w = getattr(np, window)(window_len)
    y = np.convolve(w / w.sum(), s, mode='same')
    # trim the padding so the output length matches the input
    return y[window_len - 1:-window_len + 1]
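The posted snippet ends at smooth(); the remainder of zyb_as's script (loading the video, averaging the inter-frame differences, and picking the local maxima, i.e. method 3 from the docstring) is not reproduced in this note. A minimal sketch of that missing tail, reusing the imports and smooth() from the snippet above (the function and variable names here are mine, not the original's):

def keyframes_by_local_maxima(video_path, len_window=13):
    # compute the average inter-frame difference for each consecutive pair
    cap = cv2.VideoCapture(video_path)
    diffs = []
    prev = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if prev is not None:
            diffs.append(cv2.absdiff(frame, prev).mean())
        prev = frame
    cap.release()
    # smooth the difference curve; key frames sit at its local maxima
    smoothed = smooth(np.array(diffs), len_window)
    return argrelextrema(smoothed, np.greater)[0]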
Keyframe extraction via motion analysis (optical flow)
Source: AillenAnthony's GitHub.
# Scripts to try and detect key frames that represent scene transitions
# in a video. Has only been tried out on video of slides, so is likely not
# robust for other types of video.
# 1. Based on image information
# 2. Based on motion analysis (optical flow)

import cv2
import argparse
import json
import os
import numpy as np
import errno

def getInfo(sourcePath):
    cap = cv2.VideoCapture(sourcePath)
    info = {
        "framecount": cap.get(cv2.CAP_PROP_FRAME_COUNT),
        "fps": cap.get(cv2.CAP_PROP_FPS),
        "width": int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
        "height": int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)),
        "codec": int(cap.get(cv2.CAP_PROP_FOURCC))
    }
    cap.release()
    return info

def scale(img, xScale, yScale):
    # INTER_AREA is the recommended interpolation for shrinking
    res = cv2.resize(img, None, fx=xScale, fy=yScale, interpolation=cv2.INTER_AREA)
    return res

def resize(img, width, height):
    res = cv2.resize(img, (width, height), interpolation=cv2.INTER_AREA)
    return res
#
# Extract [numCols] dominant colors from an image.
# Uses KMeans on the pixels and then returns the centroids
# of the colors.
#
def extract_cols(image, numCols):
    # convert to an np.float32 matrix that can be clustered
    Z = image.reshape((-1, 3))
    Z = np.float32(Z)

    # set parameters for the clustering
    max_iter = 20
    epsilon = 1.0
    K = numCols
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, max_iter, epsilon)

    # cluster
    compactness, labels, centers = cv2.kmeans(Z, K, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)

    clusterCounts = []
    for idx in range(K):
        count = len(Z[labels.ravel() == idx])   # number of pixels in this cluster
        clusterCounts.append(count)

    # reverse the channels stored in centers, because OpenCV stores colors as BGR
    rgbCenters = []
    for center in centers:
        bgr = center.tolist()
        bgr.reverse()
        rgbCenters.append(bgr)

    cols = []
    for i in range(K):
        iCol = {
            "count": clusterCounts[i],
            "col": rgbCenters[i]
        }
        cols.append(iCol)
    return cols
#
# Calculates change data from one frame to the next.
#
def calculateFrameStats(sourcePath, verbose=False, after_frame=0):
    # compute the difference between adjacent frames
    cap = cv2.VideoCapture(sourcePath)   # open the video
    data = {
        "frame_info": []
    }
    lastFrame = None
    while(cap.isOpened()):
        ret, frame = cap.read()
        if frame is None:
            break
        frame_number = cap.get(cv2.CAP_PROP_POS_FRAMES) - 1
        # Convert to grayscale, scale down and blur to make
        # calculating image differences more robust to noise
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # grayscale
        gray = scale(gray, 0.25, 0.25)                   # shrink to a quarter of the size
        gray = cv2.GaussianBlur(gray, (9, 9), 0.0)       # Gaussian blur
        if frame_number < after_frame:
            lastFrame = gray
            continue
        # -- the pasted snippet is cut off at this point; the completion below
        # follows the fields consumed by detectScenes further down (an assumption) --
        if lastFrame is not None:
            diff = cv2.subtract(gray, lastFrame)
            diffMag = cv2.countNonZero(diff)
            data["frame_info"].append({
                "frame_number": int(frame_number),
                "diff_count": int(diffMag)
            })
        lastFrame = gray
    cap.release()
    # summary statistics over all inter-frame changes
    diff_counts = [fi["diff_count"] for fi in data["frame_info"]]
    data["stats"] = {
        "num": len(diff_counts),
        "min": int(np.min(diff_counts)),
        "max": int(np.max(diff_counts)),
        "mean": float(np.mean(diff_counts)),
        "sd": float(np.std(diff_counts))
    }
    if verbose:
        print(data["stats"])
    return data
#
# Take an image and write it out at various sizes.
#
# TODO: Create output directories if they do not exist.
#
def writeImagePyramid(destPath, name, seqNumber, image):
    fullPath = os.path.join(destPath, "full", name + "-" + str(seqNumber) + ".png")
    halfPath = os.path.join(destPath, "half", name + "-" + str(seqNumber) + ".png")
    quarterPath = os.path.join(destPath, "quarter", name + "-" + str(seqNumber) + ".png")
    eigthPath = os.path.join(destPath, "eigth", name + "-" + str(seqNumber) + ".png")
    sixteenthPath = os.path.join(destPath, "sixteenth", name + "-" + str(seqNumber) + ".png")

    hImage = scale(image, 0.5, 0.5)
    qImage = scale(image, 0.25, 0.25)
    eImage = scale(image, 0.125, 0.125)
    sImage = scale(image, 0.0625, 0.0625)

    cv2.imwrite(fullPath, image)
    cv2.imwrite(halfPath, hImage)
    cv2.imwrite(quarterPath, qImage)
    cv2.imwrite(eigthPath, eImage)
    cv2.imwrite(sixteenthPath, sImage)
#
# Selects a set of frames as key frames (frames that represent a significant difference in
# the video i.e. potential scene changes). Key frames are selected as those frames where the
# number of pixels that changed from the previous frame are more than 1.85 standard deviations
# times from the mean number of changed pixels across all interframe changes.
#
def detectScenes(sourcePath, destPath, data, name, verbose=False):
    destDir = os.path.join(destPath, "images")
    # TODO make sd multiplier externally configurable
    #diff_threshold = (data["stats"]["sd"] * 1.85) + data["stats"]["mean"]
    diff_threshold = (data["stats"]["sd"] * 2.05) + data["stats"]["mean"]
    cap = cv2.VideoCapture(sourcePath)
    for index, fi in enumerate(data["frame_info"]):
        if fi["diff_count"] < diff_threshold:
            continue
        # -- the pasted snippet is cut off here; the completion below wires up the
        # helpers defined above (an assumption, not the author's exact code) --
        cap.set(cv2.CAP_PROP_POS_FRAMES, fi["frame_number"])
        ret, frame = cap.read()
        if frame is None:
            continue
        # record dominant colors and write the key frame out at several sizes
        fi["dominant_cols"] = extract_cols(frame, 5)
        writeImagePyramid(destDir, name, fi["frame_number"], frame)
    cap.release()
    return data
(9) Keyframe extraction with ffmpeg
The code can be found here.
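The linked code is not reproduced here. A common way to get key frames from ffmpeg alone is to keep only codec-level I-frames with the select filter (the standard documented idiom, not necessarily what the linked code does); a sketch wrapped in Python, with the output pattern as a placeholder:

import subprocess

def extract_iframes(video_path, out_pattern="keyframe_%03d.png"):
    # select='eq(pict_type,I)' keeps only intra-coded (I) frames;
    # -vsync vfr writes them at their original, variable timing
    cmd = [
        "ffmpeg", "-i", video_path,
        "-vf", r"select=eq(pict_type\,I)",
        "-vsync", "vfr",
        out_pattern,
    ]
    subprocess.run(cmd, check=True)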
(10) K-means clustering
The source code can be found here.
filenames = dir('images/*.jpg');
%file_name = fly-1;
num = size(filenames,1);       % number of images in the images folder
key = zeros(1,num);            % key(i)==1 marks frame i as a key frame
cluster = zeros(1,num);        % cluster(i) is the cluster that frame i belongs to
clusterCount = zeros(1,num);   % number of frames in each cluster
count = 0;                     % number of clusters
%threshold = 0.75;             % the larger the threshold, the more key frames
% for the "airplane" video, a threshold of 0.93 is suitable; 0.95 is better
%*************************************** threshold ***************************************%
threshold = 0.91;              % similarity threshold
centrodR = zeros(num,256);     % R histograms of the cluster centroids, adjusted as frames join
centrodG = zeros(num,256);     % G histograms of the cluster centroids
centrodB = zeros(num,256);     % B histograms of the cluster centroids

if num == 0
    error('Sorry, there is no pictures in images folder!');
else
    % let the first frame form the first cluster
    img = imread(strcat('images/',filenames(1).name));
    count = count + 1;
    [preCountR,x] = imhist(img(:,:,1));   % red histogram: counts of the 256 intensity levels
    [preCountG,x] = imhist(img(:,:,2));   % green histogram
    [preCountB,x] = imhist(img(:,:,3));   % blue histogram
    cluster(1) = 1;                       % the first frame belongs to cluster 1
    clusterCount(1) = clusterCount(1) + 1;
    centrodR(1,:) = preCountR;            % the first centroid is the first frame's histograms
    centrodG(1,:) = preCountG;
    centrodB(1,:) = preCountB;
    visit = 1;
    for k = 2:num
        img = imread(strcat('images/',filenames(k).name));   % read each remaining frame in turn
        [tmpCountR,x] = imhist(img(:,:,1));   % red histogram of the current frame
        [tmpCountG,x] = imhist(img(:,:,2));   % green histogram
        [tmpCountB,x] = imhist(img(:,:,3));   % blue histogram
        clusterGroupId = 1;
        maxSimilar = 0;                       % best similarity found so far
        for clusterCountI = visit:count       % find the most similar existing cluster
            sR = 0; sG = 0; sB = 0;
            % histogram intersection: sum the per-bin minima of centroid and frame
            for j = 1:256
                sR = min(centrodR(clusterCountI,j), tmpCountR(j)) + sR;
                sG = min(centrodG(clusterCountI,j), tmpCountG(j)) + sG;
                sB = min(centrodB(clusterCountI,j), tmpCountB(j)) + sB;
            end
            dR = sR / sum(tmpCountR);
            dG = sG / sum(tmpCountG);
            dB = sB / sum(tmpCountB);
            % YUV weighting: the eye is most sensitive to luminance (Y)
            d = 0.30*dR + 0.59*dG + 0.11*dB;  % color-histogram similarity
            if d > maxSimilar
                clusterGroupId = clusterCountI;
                maxSimilar = d;
            end
        end
        if maxSimilar > threshold
            % high similarity, i.e. close to this centroid:
            % join the cluster and move its centroid towards the new frame
            for ii = 1:256
                centrodR(clusterGroupId,ii) = centrodR(clusterGroupId,ii)*clusterCount(clusterGroupId)/(clusterCount(clusterGroupId)+1) + tmpCountR(ii)*1.0/(clusterCount(clusterGroupId)+1);
                centrodG(clusterGroupId,ii) = centrodG(clusterGroupId,ii)*clusterCount(clusterGroupId)/(clusterCount(clusterGroupId)+1) + tmpCountG(ii)*1.0/(clusterCount(clusterGroupId)+1);
                centrodB(clusterGroupId,ii) = centrodB(clusterGroupId,ii)*clusterCount(clusterGroupId)/(clusterCount(clusterGroupId)+1) + tmpCountB(ii)*1.0/(clusterCount(clusterGroupId)+1);
            end
            clusterCount(clusterGroupId) = clusterCount(clusterGroupId) + 1;
            cluster(k) = clusterGroupId;      % frame k joins cluster clusterGroupId
        else
            % otherwise form a new cluster with this frame as its centroid
            count = count + 1;
            visit = visit + 1;
            clusterCount(count) = clusterCount(count) + 1;
            centrodR(count,:) = tmpCountR;
            centrodG(count,:) = tmpCountG;
            centrodB(count,:) = tmpCountB;
            cluster(k) = count;               % frame k starts the new cluster
        end
    end
    % all frames are now assigned to one of the count clusters, frame k to cluster(k);
    % pick from each cluster the frame most similar to its centroid as the key frame
    maxSimilarity = zeros(1,count);
    frame = zeros(1,count);
    for i = 1:num
        % recompute the histograms of frame i before comparing it with its centroid
        img = imread(strcat('images/',filenames(i).name));
        [tmpCountR,x] = imhist(img(:,:,1));
        [tmpCountG,x] = imhist(img(:,:,2));
        [tmpCountB,x] = imhist(img(:,:,3));
        sR = 0; sG = 0; sB = 0;
        for j = 1:256
            sR = min(centrodR(cluster(i),j), tmpCountR(j)) + sR;
            sG = min(centrodG(cluster(i),j), tmpCountG(j)) + sG;
            sB = min(centrodB(cluster(i),j), tmpCountB(j)) + sB;
        end
        dR = sR / sum(tmpCountR);
        dG = sG / sum(tmpCountG);
        dB = sB / sum(tmpCountB);
        d = 0.30*dR + 0.59*dG + 0.11*dB;
        if d > maxSimilarity(cluster(i))
            maxSimilarity(cluster(i)) = d;
            frame(cluster(i)) = i;
        end
    end
    for j = 1:count
        key(frame(j)) = 1;
        figure(j);
        imshow(strcat('images/',filenames(frame(j)).name));
    end
end
keyFrameIndexes = find(key)
This method extracted 198 key frames from 878 images, so the redundancy is still fairly high.
(11) Using a CNN on the frames, then extracting key frames by clustering. Because of problems configuring the TensorFlow environment, this method was not actually run; the code is linked here
and here.
I used a one-minute-fifteen-second video, 00000.MP4, with 1898 frames in total, divided into seven classes. The inter-frame difference method with local maxima extracted key frames very well. The k-means clustering method did not work well, because (1) the code was copied from others without any optimization, and (2) my theoretical grasp of inter-frame clustering methods is too shallow to guide practice.
k-means clustering extracted 484 frames.
The inter-frame difference method (local maxima) extracted 35 frames.
[1]苏筱涵.深度学习视角下视频关键帧提取与视频检索研究[J].网络安全技术与应用,2020(05):65-66.
[2]王红霞,王磊,晏杉杉.视频检索中的关键帧提取方法研究[J].沈阳理工大学学报,2019,38(03):78-82.
[3]王俊玲,卢新明.基于语义相关的视频关键帧提取算法[J/OL].计算机工程与应用:1-10[2020-11-04].http://kns.cnki.net/kcms/detail/11.2127.TP.20200319.1706.018.html.
[4]张晓宇,张云华.基于融合特征的视频关键帧提取方法[J].计算机系统应用,2019,28(11):176-181.http://www.c-s-a.org.cn/1003-3254/7163.html
[5]周舟,韩芳,王直杰.面向手语识别的视频关键帧提取和优化算法[J/OL].华东理工大学学报(自然科学版):1-8[2020-11-05].https://doi.org/10.14135/j.cnki.1006-3080.20191201002.
[6]https://me.csdn.net/cungudafa (this blogger's CSDN page).
Appendix:
1. Summary of notes on digital video and audio processing