Linux Basics: Troubleshooting an Unexpected Crash with vmcore on BClinux8.2

news/2024/10/7 22:26:44

 

I. No dump file is generated under /var/crash

1. Reference configuration:

https://cloud.tencent.cn/developer/article/2367955

 

2. Adjust the configuration on BCoe8.2
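
The screenshots for this step did not survive in this copy of the article. As a rough sketch only, and assuming BClinux 8.2 follows the usual RHEL 8 kdump layout (a crashkernel= boot parameter plus a path directive in /etc/kdump.conf, neither of which the original text shows), the two settings that most often explain an empty /var/crash could be checked as below. The helper names has_crashkernel and kdump_path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check the two settings kdump needs before it can write a dump.
# Assumes a RHEL 8 style layout; helper names are illustrative only.

# Succeeds if the given kernel command line reserves memory for the capture kernel.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints the dump directory configured in a kdump.conf style file.
# Falls back to /var/crash (the default when no "path" line is present).
kdump_path() {
    local conf="$1" p=""
    if [ -r "$conf" ]; then
        p=$(awk '$1 == "path" { print $2; exit }' "$conf")
    fi
    echo "${p:-/var/crash}"
}

# Example use on a live system (read-only):
#   has_crashkernel "$(cat /proc/cmdline)" && echo "crashkernel is set"
#   kdump_path /etc/kdump.conf
```

If either check fails, the fix is typically to set crashkernel= on the kernel command line (which needs a reboot) or to correct the path line and restart the kdump service.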

 

 

 

3. Manually trigger a crash

i. Reference: detailed explanation of the parameters

https://blog.csdn.net/tombaby_come/article/details/134038949

echo 1 > /proc/sys/kernel/sysrq

echo c > /proc/sysrq-trigger

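Note: running the two commands above crashes the host immediately. kdump then dumps the memory contents to the /var/crash directory and the host reboots.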
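
Because the trigger takes the machine down with no confirmation, it can help to wrap it. The following is a minimal sketch, not part of the original procedure: it runs the same two echo commands, but only when a hypothetical CONFIRM_CRASH variable is set to yes, and otherwise just reports what it would do.

```shell
#!/usr/bin/env bash
# Sketch: wrap the manual crash trigger so it cannot fire by accident.
# CONFIRM_CRASH is a hypothetical variable introduced for this example.

trigger_crash() {
    if [ "${CONFIRM_CRASH:-no}" != "yes" ]; then
        echo "dry run: set CONFIRM_CRASH=yes to panic this host"
        return 0
    fi
    # Flush dirty pages first; the panic below does not shut anything down cleanly.
    sync
    echo 1 > /proc/sys/kernel/sysrq     # enable all SysRq functions
    echo c > /proc/sysrq-trigger        # panic the kernel; kdump takes over
}

# Dry run:   trigger_crash
# Real run:  CONFIRM_CRASH=yes; trigger_crash    (as root)
```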

 

4. Check kdump

i. Reference: how kdump works

https://zhuanlan.zhihu.com/p/684699511

 

II. Consistency check between the crash tool and the vmlinux kernel

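1. Check that /boot/vmlinuz-4.19.0-240.23.35.el8_2.bclinux.x86_64 and /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux come from the same kernel build. What must match is the kernel release string. The MD5 sums of the two files themselves will not be equal, because vmlinuz is the compressed boot image while vmlinux is the uncompressed ELF image with debug information.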
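
One way to sanity-check that the debuginfo vmlinux belongs to the same kernel as the boot image is to compare the release strings embedded in the two paths. The sketch below does only that string comparison; the function names are made up for this example, and it assumes the standard naming used by the paths in this article.

```shell
#!/usr/bin/env bash
# Sketch: compare the kernel release embedded in the two file paths.
# Helper names are illustrative, not part of any tool.

# /boot/vmlinuz-<release>  ->  <release>
release_from_vmlinuz() {
    local base="${1##*/}"
    echo "${base#vmlinuz-}"
}

# /usr/lib/debug/usr/lib/modules/<release>/vmlinux  ->  <release>
release_from_debuginfo() {
    local dir="${1%/vmlinux}"
    echo "${dir##*/}"
}

# Succeeds when both paths name the same kernel release.
same_kernel() {
    [ "$(release_from_vmlinuz "$1")" = "$(release_from_debuginfo "$2")" ]
}
```

On a host with crash installed, crash --osrelease vmcore prints the release recorded in the dump itself, which can be compared against the same string.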

 

2. Location of the host kernel's vmlinux

/usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux

 

3. Location of the vmcore file from the crash

/var/crash/127.0.0.1-2024-05-06-03\:24\:36/vmcore

 

 

 

III. Analyzing the vmcore

 

1. Open the vmcore with the crash tool

 

[root@NewOSBC8 127.0.0.1-2024-05-06-03:24:36]# crash vmcore /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux

crash 7.2.7-3.el8.1
Copyright (C) 2002-2020  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [178MB]: patching 97096 gdb minimal_symbol values

      KERNEL: /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 2
        DATE: Mon May  6 03:24:31 2024
      UPTIME: 00:12:44
LOAD AVERAGE: 0.00, 0.02, 0.03
       TASKS: 346
    NODENAME: NewOSBC8.2
     RELEASE: 4.19.0-240.23.35.el8_2.bclinux.x86_64
     VERSION: #1 SMP Wed Sep 27 10:49:35 EDT 2023
     MACHINE: x86_64  (1796 Mhz)
      MEMORY: 2 GB
       PANIC: "sysrq: SysRq : Trigger a crash"
         PID: 2289
     COMMAND: "bash"
        TASK: ffff8d1122bf0000  [THREAD_INFO: ffff8d1122bf0000]
         CPU: 0
       STATE: TASK_RUNNING (SYSRQ)

crash> bt
PID: 2289   TASK: ffff8d1122bf0000  CPU: 0   COMMAND: "bash"
 #0 [ffffa2ab80cefbe8] machine_kexec at ffffffff8c25fabe
 #1 [ffffa2ab80cefc40] __crash_kexec at ffffffff8c3658ba
 #2 [ffffa2ab80cefd00] crash_kexec at ffffffff8c36678d
 #3 [ffffa2ab80cefd18] oops_end at ffffffff8c2259fd
 #4 [ffffa2ab80cefd38] no_context at ffffffff8c26fd4e
 #5 [ffffa2ab80cefd90] do_page_fault at ffffffff8c270872
 #6 [ffffa2ab80cefdc0] page_fault at ffffffff8cc0122e
    [exception RIP: sysrq_handle_crash+18]
    RIP: ffffffff8c74eb12  RSP: ffffa2ab80cefe78  RFLAGS: 00010246
    RAX: ffffffff8c74eb00  RBX: 0000000000000063  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff8d1131017108  RDI: 0000000000000063
    RBP: 0000000000000004   R8: 00000000000005ce   R9: 000000000000002d
    R10: 0000000000000000  R11: ffffa2ab80cefd30  R12: 0000000000000000
    R13: 0000000000000000  R14: ffffffff8d53c3e0  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffa2ab80cefe78] __handle_sysrq.cold.10 at ffffffff8c74f6f8
 #8 [ffffa2ab80cefea8] write_sysrq_trigger at ffffffff8c74f5bb
 #9 [ffffa2ab80cefeb8] proc_reg_write at ffffffff8c55de29
#10 [ffffa2ab80cefed0] vfs_write at ffffffff8c4e0db5
#11 [ffffa2ab80ceff00] ksys_write at ffffffff8c4e102f
#12 [ffffa2ab80ceff38] do_syscall_64 at ffffffff8c2041ab
#13 [ffffa2ab80ceff50] entry_SYSCALL_64_after_hwframe at ffffffff8cc000ad
    RIP: 00007f515c78ab28  RSP: 00007ffc1172a678  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000000002  RCX: 00007f515c78ab28
    RDX: 0000000000000002  RSI: 000055b65d8c05c0  RDI: 0000000000000001
    RBP: 000055b65d8c05c0   R8: 000000000000000a   R9: 00007f515c81bc80
    R10: 000000000000000a  R11: 0000000000000246  R12: 00007f515ca5b6c0
    R13: 0000000000000002  R14: 00007f515ca56880  R15: 0000000000000002
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
crash> dis -l sysrq_handle_crash+18
/usr/src/debug/kernel-4.19.0-240.23.35.el8/linux-4.19.0-240.23.35.el8_2.bclinux.x86_64/drivers/tty/sysrq.c: 159
0xffffffff8c74eb12 <sysrq_handle_crash+18>:     movb   $0x1,0x0
crash> dis -l 0xffffffff8c74eb12
/usr/src/debug/kernel-4.19.0-240.23.35.el8/linux-4.19.0-240.23.35.el8_2.bclinux.x86_64/drivers/tty/sysrq.c: 159
0xffffffff8c74eb12 <sysrq_handle_crash+18>:     movb   $0x1,0x0
crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM   458790       1.8 GB         ----
         FREE   194411     759.4 MB   42% of TOTAL MEM
         USED   264379         1 GB   57% of TOTAL MEM
       SHARED    50717     198.1 MB   11% of TOTAL MEM
      BUFFERS      530       2.1 MB    0% of TOTAL MEM
       CACHED   103545     404.5 MB   22% of TOTAL MEM
         SLAB    31239       122 MB    6% of TOTAL MEM

   TOTAL HUGE        0            0         ----
    HUGE FREE        0            0    0% of TOTAL HUGE

   TOTAL SWAP   532479         2 GB         ----
    SWAP USED        0            0    0% of TOTAL SWAP
    SWAP FREE   532479         2 GB  100% of TOTAL SWAP

 COMMIT LIMIT   761874       2.9 GB         ----
    COMMITTED   511634         2 GB   67% of TOTAL LIMIT
crash> sys
      KERNEL: /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 2
        DATE: Mon May  6 03:24:31 2024
      UPTIME: 00:12:44
LOAD AVERAGE: 0.00, 0.02, 0.03
       TASKS: 346
    NODENAME: NewOSBC8.2
     RELEASE: 4.19.0-240.23.35.el8_2.bclinux.x86_64
     VERSION: #1 SMP Wed Sep 27 10:49:35 EDT 2023
     MACHINE: x86_64  (1796 Mhz)
      MEMORY: 2 GB
       PANIC: "sysrq: SysRq : Trigger a crash"
crash> p cpu_info:1
per_cpu(cpu_info, 1) = $1 = {
  x86 = 23 '\027',
  x86_vendor = 2 '\002',
  x86_model = 104 'h',
  x86_stepping = 1 '\001',
  x86_tlbsize = 3072,
  x86_virt_bits = 48 '0',
  x86_phys_bits = 45 '-',
  x86_coreid_bits = 0 '\000',
  cu_id = 255 '\377',
  extended_cpuid_level = 2147483680,
  cpuid_level = 16,
  x86_capability = {126614527, 802421759, 0, 129319184, 4277678595, 0, 4195321, 376123396, 557056, 563872169, 15, 0, 0, 17584641, 4, 0, 4194308, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 229696, 0},
  x86_vendor_id = "AuthenticAMD\000\000\000",
  x86_model_id = "AMD Ryzen 7 5700U with Radeon Graphics\000        \000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
  x86_cache_size = 512,
  x86_cache_alignment = 64,
  x86_cache_max_rmid = -1,
  x86_cache_occ_scale = -1,
  x86_power = 256,
  loops_per_jiffy = 1796624,
  x86_max_cores = 1,
  apicid = 2,
  initial_apicid = 2,
  x86_clflush_size = 64,
  booted_cores = 1,
  phys_proc_id = 2,
  logical_proc_id = 1,
  cpu_core_id = 0,
  cpu_index = 1,
  microcode = 0,
  x86_cache_bits = 45 '-',
  initialized = 1,
  cpuinfo_x86_extended_size_rh = 0,
  _rh = {
    cpu_die_id = 0,
    logical_die_id = 1,
    vmx_capability = {0, 0, 0}
  }
}
crash> ps 1489
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
   1489   1382   0  ffff8d110eb20000  IN  11.9 3106588 249348  llvmpipe-1
crash>

 

crash vmcore /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux

Time the vmcore was generated: DATE: Mon May  6 03:24:31 2024

Cause of the crash: PANIC: "sysrq: SysRq : Trigger a crash"
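
When several dumps need to be compared, the DATE and PANIC fields can be pulled out of saved sys output instead of being read by eye. The following is a small sketch under the assumption that the output of the sys command has been saved as text; sys_field is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: pull a single field out of saved "sys" output from crash.
# Reads the text on stdin; sys_field is a hypothetical helper name.

sys_field() {
    # crash right-aligns the labels, so skip leading blanks before matching.
    sed -n "s/^[[:space:]]*$1:[[:space:]]*//p" | head -n 1
}

# Example:
#   sys_field PANIC < sys_output.txt
#   sys_field DATE  < sys_output.txt
```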

 

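2. Inspect the register state and the faulting function (RIP) at the time of the crash

i. Use the bt output to work out which process was running and called sysrq_handle_crash, bringing the host down. Here it is PID 2289 (bash), which reached sysrq_handle_crash through write_sysrq_trigger, that is, a write to /proc/sysrq-trigger;

ii. Reference: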

https://blog.csdn.net/weixin_43564241/article/details/130692946

 

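3. Locate the code that was executing at the crash point

i. Take the "[exception RIP: sysrq_handle_crash+18]" entry from the backtrace (highlighted in yellow in the original screenshot) and pass it to dis -l to see the calling code. In this dump it resolves to drivers/tty/sysrq.c line 159, the instruction movb $0x1,0x0, which is a deliberate write to address 0;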
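
The lookup in this step can be scripted when bt output has been saved to a file. This is an illustrative sketch only (the function names are invented here): it extracts the symbol+offset from the exception RIP line and prints the matching dis -l command to paste into crash.

```shell
#!/usr/bin/env bash
# Sketch: turn the "[exception RIP: symbol+offset]" line of saved bt output
# into the dis command used in this step. Reads bt text on stdin.

exception_rip() {
    sed -n 's/.*\[exception RIP: \([^]]*\)].*/\1/p' | head -n 1
}

dis_command() {
    local rip
    rip=$(exception_rip)
    # Print nothing if the backtrace has no exception frame.
    [ -n "$rip" ] && echo "dis -l $rip"
}

# Example:
#   dis_command < bt_output.txt
```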

 

 

4. Check memory usage at the time of the crash (kmem -i)
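
The PAGES column of kmem -i can be cross-checked against the TOTAL and PERCENTAGE columns with simple arithmetic. The sketch below assumes 4 KiB pages, the usual base page size on x86_64; the function names are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: convert page counts from "kmem -i" into MiB and percentages.
# Assumes 4 KiB pages unless a different size (in KiB) is passed as $2.

pages_to_mib() {
    local pages="$1" page_kib="${2:-4}"
    echo $(( pages * page_kib / 1024 ))
}

# Share of one page count in another, as a whole percentage (rounded down).
pages_percent() {
    echo $(( $1 * 100 / $2 ))
}
```

With the figures from this dump, pages_to_mib 194411 gives 759 and pages_percent 194411 458790 gives 42, which agrees with the FREE row (759.4 MB, 42% of TOTAL MEM).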

 

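5. Conclusion: triggered from the user side

i. The crash was triggered manually by a user, which caused the memory contents to be dumped to /var/crash.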

 
