YOLOv5 Memory Layout Analysis (Repost)

news/2024/10/11 2:25:28


Transpose Output Analysis

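Assuming batch_size is 1, YOLOv5 has three outputs with the following shapes:

(1, 3, 80, 80, 85)
(1, 3, 40, 40, 85)
(1, 3, 20, 20, 85)

Here 3 is the number of anchors, 20*20 (likewise 40*40 and 80*80) is the feature map size, and 85 is the bounding box's (x, y, w, h, c, plus the probabilities of the 80 classes), where c is the box confidence.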

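The 85 values (x, y, w, h, c, plus the 80 class probabilities) of one box are stored contiguously in memory, and the whole (1, 3, 20, 20, 85) array is contiguous as well, in row-major order with the last dimension varying fastest:

(0, 0, 0, 0, 0) -> x -> x of the bounding box for the first anchor at the first cell
(0, 0, 0, 0, 1) -> y -> y of the bounding box for the first anchor at the first cell
(0, 0, 0, 0, 2) -> w -> w of the bounding box for the first anchor at the first cell
(0, 0, 0, 0, 3) -> h -> h of the bounding box for the first anchor at the first cell
......
(0, 0, 0, 1, 0) -> x -> x of the bounding box for the first anchor at the second cell
(0, 0, 0, 1, 1) -> y -> y of the bounding box for the first anchor at the second cell
(0, 0, 0, 1, 2) -> w -> w of the bounding box for the first anchor at the second cell
(0, 0, 0, 1, 3) -> h -> h of the bounding box for the first anchor at the second cell
......
(0, 1, 0, 0, 0) -> x -> x of the bounding box for the second anchor at the first cell
(0, 1, 0, 0, 1) -> y -> y of the bounding box for the second anchor at the first cell
(0, 1, 0, 0, 2) -> w -> w of the bounding box for the second anchor at the first cell
(0, 1, 0, 0, 3) -> h -> h of the bounding box for the second anchor at the first cell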

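That is, element (0, a, i, j, k) sits at flat offset ((a * 20 + i) * 20 + j) * 85 + k: consecutive attributes of one box are adjacent, consecutive cells are 85 elements apart, and consecutive anchors are 20 * 20 * 85 = 34000 elements apart.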
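The index-to-address mapping listed above can be written as a small helper. This is only a sketch, assuming a contiguous row-major float buffer; the name `offsetTransposed` and its parameters are hypothetical and do not come from YOLOv5 or ncnn.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: flat offset of element (0, a, i, j, k) in a row-major
// tensor of shape (1, numAnchors, gridH, gridW, numAttrs).
//   a = anchor index, i = grid row, j = grid column,
//   k = attribute index (0..3 = x, y, w, h; 4 = box confidence; 5.. = classes).
inline std::size_t offsetTransposed(std::size_t a, std::size_t i, std::size_t j,
                                    std::size_t k, std::size_t gridH,
                                    std::size_t gridW, std::size_t numAttrs)
{
    assert(i < gridH && j < gridW && k < numAttrs);
    // The last dimension varies fastest, so the attributes of one box are adjacent.
    return ((a * gridH + i) * gridW + j) * numAttrs + k;
}
```

For the 20x20 output, `offsetTransposed(0, 0, 1, 0, 20, 20, 85)` gives 85 (the x of the second cell) and `offsetTransposed(1, 0, 0, 0, 20, 20, 85)` gives 34000 (the x of the second anchor's first cell).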

Post-processing Code Analysis

// Iterate over the anchors, starting from the first one
for (int q = 0; q < num_anchors; q++)
{
    const float anchor_w = anchors[q * 2];
    const float anchor_h = anchors[q * 2 + 1];
    const ncnn::Mat feat = feat_blob.channel(q);

    // Iterate over the cells, starting from the first one
    for (int i = 0; i < num_grid_y; i++)
    {
        for (int j = 0; j < num_grid_x; j++)
        {
            const float* featptr = feat.row(i * num_grid_x + j);

            // The 5th value (index 4) is box_confidence; it must go through sigmoid
            float box_confidence = sigmoid(featptr[4]);
            if (box_confidence >= prob_threshold)
            {
                // Compare the raw class scores and apply sigmoid only to the winner.
                // sigmoid is monotonic, so the argmax is unchanged, and this saves
                // the relatively expensive sigmoid(class_score) calls.
                // find class index with max class score
                int class_index = 0;
                float class_score = -FLT_MAX;
                for (int k = 0; k < num_class; k++)
                {
                    // The per-class scores follow box_confidence
                    float score = featptr[5 + k];
                    if (score > class_score)
                    {
                        class_index = k;
                        class_score = score;
                    }
                }

                // Final confidence = box confidence * class probability, as the paper defines it
                float confidence = box_confidence * sigmoid(class_score);
                if (confidence >= prob_threshold)
                {
                    // Read x, y, w, h in order
                    float dx = sigmoid(featptr[0]);
                    float dy = sigmoid(featptr[1]);
                    float dw = sigmoid(featptr[2]);
                    float dh = sigmoid(featptr[3]);

                    // The rest is omitted; see the ncnn code.
                    // ......
                }
            }
        }
    }
}
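The class search above compares raw scores and calls sigmoid only once. The sketch below illustrates why that is safe: sigmoid is monotonically increasing, so the largest raw score is also the largest probability. The helper names `argmaxLogit` and `argmaxSigmoid` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

inline float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Hypothetical helper: index of the largest raw score, with no sigmoid calls.
inline std::size_t argmaxLogit(const float* logits, std::size_t n)
{
    assert(n > 0);
    std::size_t best = 0;
    for (std::size_t k = 1; k < n; ++k)
        if (logits[k] > logits[best]) best = k;
    return best;
}

// Reference version: applies sigmoid to every score before comparing (n calls).
inline std::size_t argmaxSigmoid(const float* logits, std::size_t n)
{
    assert(n > 0);
    std::size_t best = 0;
    float bestProb = sigmoid(logits[0]);
    for (std::size_t k = 1; k < n; ++k)
    {
        const float p = sigmoid(logits[k]);
        if (p > bestProb)
        {
            bestProb = p;
            best = k;
        }
    }
    return best;
}
```

Both functions pick the same class for well-separated scores, but the first needs no sigmoid at all, so a single `sigmoid(class_score)` on the winner is enough. With 80 classes that replaces 80 sigmoid evaluations per candidate box with one.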

====================================================================================

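Conv Output Analysis

When an NPU is used to accelerate a model, shape operators such as reshape and transpose are often not supported by the accelerator. There are two workarounds:

Reimplement the reshape and transpose operators in C/C++ and run them on the CPU (to be completed)
Read the data directly according to the memory layout of the conv layer output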
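The first workaround, a CPU reimplementation of the reshape plus transpose, might look like the following. It is a minimal sketch, assuming the conv output is a contiguous float buffer of shape (numAnchors * numAttrs, gridH, gridW) whose channel index is a * numAttrs + k; the function name `transposeToAnchorMajor` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU fallback: rearrange a conv output of shape
// (numAnchors * numAttrs, gridH, gridW) into (numAnchors, gridH, gridW, numAttrs).
inline std::vector<float> transposeToAnchorMajor(const std::vector<float>& src,
                                                 std::size_t numAnchors,
                                                 std::size_t numAttrs,
                                                 std::size_t gridH,
                                                 std::size_t gridW)
{
    const std::size_t plane = gridH * gridW;  // elements in one channel
    assert(src.size() == numAnchors * numAttrs * plane);

    std::vector<float> dst(src.size());
    for (std::size_t a = 0; a < numAnchors; ++a)
    {
        for (std::size_t k = 0; k < numAttrs; ++k)
        {
            // Source: one whole plane holds attribute k of anchor a for every cell.
            const float* srcPlane = src.data() + (a * numAttrs + k) * plane;
            for (std::size_t cell = 0; cell < plane; ++cell)
            {
                // Destination: the numAttrs values of one box are adjacent.
                dst[(a * plane + cell) * numAttrs + k] = srcPlane[cell];
            }
        }
    }
    return dst;
}
```

The copy touches every element once, so for the 80x80 output it moves 3 * 85 * 6400 floats per frame. That cost is the reason the second workaround, reading the conv layout in place, can be attractive.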

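Assuming batch_size is 1, the output shapes of the conv layers are:

(1, 255, 80, 80)
(1, 255, 40, 40)
(1, 255, 20, 20)

Here 255 is 3*85, where 3 is the number of anchors and 85 is the bounding box's (x, y, w, h, c, plus the probabilities of the 80 classes); 20x20 is the feature map size.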

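In this layout the 85 values (x, y, w, h, c, plus the 80 class probabilities) of one box are no longer adjacent in memory. Each value occupies its own 20x20 plane, so the same attribute of neighboring cells is adjacent, while consecutive attributes of one box are 20*20 elements apart. The (1, 255, 20, 20) array as a whole is still contiguous:

(0, 0, 0, 0) -> x -> x of the bounding box for the first anchor at the first cell
(0, 0, 0, 1) -> x -> x of the bounding box for the first anchor at the second cell
(0, 0, 0, 2) -> x -> x of the bounding box for the first anchor at the third cell
......
(0, 1, 0, 0) -> y -> y of the bounding box for the first anchor at the first cell
(0, 1, 0, 1) -> y -> y of the bounding box for the first anchor at the second cell
(0, 1, 0, 2) -> y -> y of the bounding box for the first anchor at the third cell
......
(0, 85, 0, 0) -> x -> x of the bounding box for the second anchor at the first cell
(0, 85, 0, 1) -> x -> x of the bounding box for the second anchor at the second cell
(0, 85, 0, 2) -> x -> x of the bounding box for the second anchor at the third cell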
....

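That is, element (0, c, i, j) sits at flat offset (c * 20 + i) * 20 + j with c = a * 85 + k: consecutive cells are adjacent, and consecutive attributes of one box are 20 * 20 = 400 elements apart.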
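The planar mapping listed above can be sketched the same way. This assumes a contiguous NCHW float buffer with channel index a * numAttrs + k; the name `offsetPlanar` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: flat offset of attribute k of anchor a at cell (i, j)
// in a row-major tensor of shape (1, numAnchors * numAttrs, gridH, gridW).
inline std::size_t offsetPlanar(std::size_t a, std::size_t k, std::size_t i,
                                std::size_t j, std::size_t gridH,
                                std::size_t gridW, std::size_t numAttrs)
{
    assert(i < gridH && j < gridW && k < numAttrs);
    const std::size_t channel = a * numAttrs + k;  // 0..254 for 3 anchors x 85
    // The grid column varies fastest, so each attribute fills a whole plane.
    return (channel * gridH + i) * gridW + j;
}
```

For the 20x20 output, moving from x to y of the same box (k = 0 to k = 1) advances the offset by 400, and the second anchor's x plane starts at 85 * 400 = 34000.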

Post-processing Code Analysis

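// Start from the first cell
for (int shiftY = 0; shiftY < gridY; shiftY++)
{
    for (int shiftX = 0; shiftX < gridX; shiftX++)
    {
        // Start from the first anchor
        for (int i = 0; i < 3; i++)
        {
            // pRecord, cls, and score are declared elsewhere; pMatData[i] is
            // taken to point at the first of anchor i's 85 planes
            pRecord = pMatData[i];
            // Offset of the current cell within one plane
            int pindex = shiftY * gridX + shiftX;
            // coordindex keeps the offset of x
            const int coordindex = pindex;
            // Move to y
            pindex = pindex + gridX * gridY;
            // Move to w
            pindex = pindex + gridX * gridY;
            // Move to h
            pindex = pindex + gridX * gridY;
            // Move to C (box confidence)
            pindex = pindex + gridX * gridY;
            // Read the value of C
            float precord4 = sigmoid(pRecord[pindex]);
            // Move to P (the first class probability)
            pindex = pindex + gridX * gridY;
            for (cls = 0; cls < classNum; cls++)
            {
                // Read the value of P
                float precord5 = sigmoid(pRecord[pindex]);
                // Move to the next class probability
                pindex = pindex + gridX * gridY;
                score = precord5 * precord4;
                if (score > gYolov7Para.confidenceThreshold)
                {
                    // Above the configured threshold.
                    // Walk a local copy so that coordindex still points at x
                    // if another class of the same box also passes.
                    int boxindex = coordindex;
                    // Get x
                    float precord0 = sigmoid(pRecord[boxindex]);
                    boxindex = boxindex + gridX * gridY;
                    // Get y
                    float precord1 = sigmoid(pRecord[boxindex]);
                    boxindex = boxindex + gridX * gridY;
                    // Get w
                    float precord2 = sigmoid(pRecord[boxindex]);
                    boxindex = boxindex + gridX * gridY;
                    // Get h
                    float precord3 = sigmoid(pRecord[boxindex]);
                    boxindex = boxindex + gridX * gridY;
                    // The rest is omitted.
                    // ......
                }
            }
        }
    }
}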
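For experimenting with the planar traversal outside an NPU project, the loop above can be condensed into a self-contained function. This is a sketch under stated assumptions, not the original author's implementation: the `Candidate` struct and the `scanPlanar` function are hypothetical, the input is assumed to be one contiguous (numAnchors * (5 + numClasses), gridH, gridW) float buffer, and the early exit on low box confidence is an added optimization.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical record for one (box, class) pair that passed the threshold.
struct Candidate
{
    std::size_t anchor, row, col, classIndex;
    float score;
    float x, y, w, h;  // sigmoid-activated raw outputs, not yet decoded to pixels
};

inline float sigmoid(float v) { return 1.0f / (1.0f + std::exp(-v)); }

inline std::vector<Candidate> scanPlanar(const float* data, std::size_t numAnchors,
                                         std::size_t numClasses, std::size_t gridH,
                                         std::size_t gridW, float threshold)
{
    const std::size_t plane = gridH * gridW;       // stride between attributes
    const std::size_t numAttrs = 5 + numClasses;   // x, y, w, h, c, classes
    std::vector<Candidate> out;
    for (std::size_t row = 0; row < gridH; ++row)
    {
        for (std::size_t col = 0; col < gridW; ++col)
        {
            for (std::size_t a = 0; a < numAnchors; ++a)
            {
                // base points at x of this box; attribute k is at base[k * plane].
                const float* base = data + a * numAttrs * plane + row * gridW + col;
                const float boxConf = sigmoid(base[4 * plane]);
                // score = boxConf * p with p <= 1, so it cannot exceed boxConf.
                if (boxConf <= threshold) continue;
                for (std::size_t c = 0; c < numClasses; ++c)
                {
                    const float score = boxConf * sigmoid(base[(5 + c) * plane]);
                    if (score > threshold)
                    {
                        out.push_back({a, row, col, c, score,
                                       sigmoid(base[0]), sigmoid(base[plane]),
                                       sigmoid(base[2 * plane]), sigmoid(base[3 * plane])});
                    }
                }
            }
        }
    }
    return out;
}
```

Indexing every attribute as `base[k * plane]` avoids the running `pindex` and `coordindex` counters, so a box that passes the threshold for several classes always reads its coordinates from the same four positions.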

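This article was submitted by an internet user. The views expressed are the author's own and do not represent the position of this site. This site only provides information storage, holds no ownership, and assumes no related legal liability. If you repost it, please cite the source: http://www.ryyt.cn/news/42763.html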
