Applicable version:
OCP Community Edition: 4.2.2-20240315150922
Background
- After OCP takes over a host, host standardization keeps failing at the "set clock source" step.
Analysis
```text
2024-05-10 14:44:37.552  INFO 823423 --- [pool-manual-subtask-executor16,82ea1ce829564495,4c251a8e816d] c.o.o.e.internal.template.HttpTemplate : POST request to agent, url:http://10.186.61.51:62888/api/v1/system/setClockSource,request body:SetClockSourceRequest(sourceType=tsc), params:null
2024-05-10 14:44:37.565 ERROR 823423 --- [pool-manual-subtask-executor16,82ea1ce829564495,4c251a8e816d] c.o.o.c.c.i.r.methods.RepairClockSource : set clock source to tsc failed:[AgentClient]:http request is failed, response:Unexpected error: symlink /usr/lib/systemd/system/set_clocksource.service /etc/systemd/system/multi-user.target.wants/set_clocksource.service: file exists
2024-05-10 14:44:37.586 ERROR 823423 --- [pool-manual-subtask-executor16,82ea1ce829564495,4c251a8e816d] c.o.o.c.c.i.h.SystemCheckerHelperImpl : Failed to repair 277. Please see the log for details
2024-05-10 14:44:37.592 ERROR 823423 --- [pool-manual-subtask-executor16,82ea1ce829564495,4c251a8e816d] c.o.ocp.core.util.ExceptionUtils : Checked Exception:com.oceanbase.ocp.core.exception.UnexpectedException occurred with code error.common.unexpected, and args [4]
2024-05-10 14:44:37.597 ERROR 823423 --- [pool-manual-subtask-executor16,82ea1ce829564495,4c251a8e816d] c.o.o.c.t.e.c.w.subtask.SubtaskExecutor : An unknown error has occurred. Cause: 4. Error message: {1}. Contact the administrator.
com.oceanbase.ocp.core.exception.UnexpectedException: [OCPUnexpectedException]: status=500 INTERNAL_SERVER_ERROR,errorCode=COMMON_UNEXPECTED, args=4
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at com.oceanbase.ocp.core.util.ExceptionUtils.newException(ExceptionUtils.java:96)
    at com.oceanbase.ocp.core.util.ExceptionUtils.throwException(ExceptionUtils.java:90)
    at com.oceanbase.ocp.core.util.ExceptionUtils.unExpected(ExceptionUtils.java:71)
    at com.oceanbase.ocp.compute.checker.internal.task.RepairCheckItemTask.run(RepairCheckItemTask.java:59)
    at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.execute(JavaSubtaskRunner.java:64)
    at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.doRun(JavaSubtaskRunner.java:32)
    at com.oceanbase.ocp.core.task.engine.runner.JavaSubtaskRunner.run(JavaSubtaskRunner.java:26)
    at com.oceanbase.ocp.core.task.engine.runner.RunnerFactory.doRun(RunnerFactory.java:76)
    at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.doRun(SubtaskExecutor.java:203)
    at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.redirectConsoleOutput(SubtaskExecutor.java:197)
    at com.oceanbase.ocp.core.task.engine.coordinator.worker.subtask.SubtaskExecutor.lambda$submit$2(SubtaskExecutor.java:134)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Set state for subtask: 2609, operation:EXECUTE, state: DISREGARDED
```
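When this check fails repeatedly, a filter like the one below can pull the agent's error text out of the OCP server log. This is a minimal sketch. The function name `find_clocksource_failures` is hypothetical. The log path is passed in as an argument because its location depends on the deployment. The filter matches only the `RepairClockSource` and `response:Unexpected error:` strings seen in the log above.

```bash
#!/usr/bin/env bash
# Sketch: print the agent's error text for failed "set clock source" repairs.
# find_clocksource_failures is a hypothetical helper; pass the OCP server log path.
find_clocksource_failures() {
  local log="$1"
  # Keep only RepairClockSource failure lines, then strip everything up to
  # the agent's error message (e.g. "symlink ... : file exists").
  grep -F 'RepairClockSource' "$log" \
    | grep -F 'set clock source to' \
    | sed -n 's/.*response:Unexpected error: //p'
}
```

Usage: `find_clocksource_failures /path/to/ocp-server.log` (the path is a placeholder).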
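- Checking /usr/lib/systemd/system and /etc/systemd/system/multi-user.target.wants/ shows that the symlink is already in place. This means set_clocksource.service is already enabled to start at boot under systemd.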
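The symlink state described above can be checked with a small helper. This is a sketch and is not part of OCP. The helper `check_clocksource_link` is hypothetical. So is its optional root-prefix argument, which exists only so the check can run against a test directory. Only the unit name and the two paths come from the log.

```bash
#!/usr/bin/env bash
# Sketch: report whether set_clocksource.service is enabled via a symlink in
# multi-user.target.wants. The root-prefix argument ($1, default "") is a
# hypothetical addition for testing; with no argument the real paths are used.
UNIT=set_clocksource.service

check_clocksource_link() {
  local root="${1:-}"
  local link="$root/etc/systemd/system/multi-user.target.wants/$UNIT"
  if [ -L "$link" ]; then
    # readlink prints the stored target without resolving it
    if [ "$(readlink "$link")" = "/usr/lib/systemd/system/$UNIT" ]; then
      echo "enabled"
    else
      echo "foreign-link"
    fi
  elif [ -e "$link" ]; then
    echo "not-a-symlink"
  else
    echo "missing"
  fi
}
```

On a host that shows this problem, `check_clocksource_link` with no argument should print `enabled`.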
Conclusion
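When OCP takes over a host, it already writes the set_clocksource service link into /etc/systemd/system. During automatic repair, OCP tries to create the same link in /etc/systemd/system again. If the link already exists at that point, the repair fails with a "file exists" error.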
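The failure is easy to reproduce with `ln -s`. Creating a symlink at a path that already exists fails with EEXIST ("File exists"), which is the same error the agent returns. A link step that tolerates an existing link could look like the sketch below. This sketch is only an assumption about how the step could be made idempotent. It is not OCP's implementation, and `enable_clocksource_link` and its root-prefix argument are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): create the wants/ symlink only if it is not
# already present with the expected target, so a second run does not fail.
UNIT=set_clocksource.service

enable_clocksource_link() {
  local root="${1:-}"   # hypothetical root prefix for testing; "" = real system
  local src="/usr/lib/systemd/system/$UNIT"
  local dir="$root/etc/systemd/system/multi-user.target.wants"
  local link="$dir/$UNIT"
  mkdir -p "$dir"
  if [ -L "$link" ] && [ "$(readlink "$link")" = "$src" ]; then
    echo "already enabled, nothing to do"
    return 0
  fi
  # A plain `ln -s` over an existing path fails with "File exists",
  # which is the same condition the OCP repair runs into.
  ln -s "$src" "$link"
}
```

Calling the function twice succeeds both times. The second call only reports that the link is already there.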
Solution
The unit is already enabled on the host:

```bash
[root@localhost multi-user.target.wants]# systemctl list-unit-files | egrep set_clocksource.service
set_clocksource.service enabled
```

Workaround: rename /etc/systemd/system/multi-user.target.wants/set_clocksource.service so that the repair can create the link again:

```bash
mv /etc/systemd/system/multi-user.target.wants/set_clocksource.service /etc/systemd/system/multi-user.target.wants/set_clocksource.service.bak
```
- Run the repair again from the OCP web console. The repair creates an identical symlink and the check is reported as repaired.
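The workaround and the follow-up check can be scripted as shown below. This sketch assumes the unit file stays in /usr/lib/systemd/system and that tsc is the expected clock source, as in the log. The helpers `backup_clocksource_link` and `verify_clocksource`, and their root-prefix argument, are hypothetical. Run the backup before you trigger the repair in the web console, and run the verification afterwards.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers): back up the existing wants/ link before the
# repair, then confirm afterwards that the link exists and tsc is active.
UNIT=set_clocksource.service

backup_clocksource_link() {
  local root="${1:-}"   # hypothetical root prefix for testing; "" = real system
  local link="$root/etc/systemd/system/multi-user.target.wants/$UNIT"
  # mv on a symlink renames the link itself, not the unit file it points to
  if [ -L "$link" ] || [ -e "$link" ]; then
    mv "$link" "$link.bak"
  fi
}

verify_clocksource() {
  local root="${1:-}"
  local link="$root/etc/systemd/system/multi-user.target.wants/$UNIT"
  local cur="$root/sys/devices/system/clocksource/clocksource0/current_clocksource"
  # Succeeds only if the repair recreated the link and tsc is in use
  [ -L "$link" ] && [ "$(cat "$cur")" = "tsc" ]
}
```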
Additional note:
On hosts deployed with OAT, the clock source setting is written to /etc/rc.local instead:

```bash
[root@10-186-57-25 ~]# cat /etc/rc.local
#!/bin/bash
# THIS FILE IS ADDED FOR COMPATIBILITY PURPOSES
#
# It is highly advisable to create own systemd services or udev rules
# to run scripts during boot instead of using this file.
#
# In contrast to previous versions due to parallel execution during boot
# this script will NOT be run after all other services.
#
# Please note that you must run 'chmod +x /etc/rc.d/rc.local' to ensure
# that this script will be executed during boot.

touch /var/lock/subsys/local
/usr/local/bin/set_deadline.sh
echo never > /sys/kernel/mm/transparent_hugepage/enabled
/usr/local/sbin/set_nic_irq_ob.sh start
echo tsc > /sys/devices/system/clocksource/clocksource0/current_clocksource
/usr/local/bin/auto_start_ob.sh >> /var/log/ob.autostart.log 2>&1 &
/usr/local/bin/set_cpufreq.sh
```
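A guarded version of the `echo tsc` line above can check first whether the kernel lists tsc in `available_clocksource`. It writes tsc only if tsc is listed and tsc is not already active. This is a sketch, not the OAT script. The function `set_clocksource_tsc` and its root-prefix argument are hypothetical. The two sysfs files are the standard clocksource0 attributes.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): switch to the requested clock source only if
# the kernel offers it. $1 = hypothetical root prefix ("" = real system),
# $2 = desired clock source (default tsc, as in the rc.local line).
set_clocksource_tsc() {
  local root="${1:-}"
  local want="${2:-tsc}"
  local base="$root/sys/devices/system/clocksource/clocksource0"
  if [ "$(cat "$base/current_clocksource")" = "$want" ]; then
    echo "already $want"
    return 0
  fi
  # available_clocksource is a space-separated list, e.g. "tsc hpet acpi_pm"
  case " $(cat "$base/available_clocksource") " in
    *" $want "*) echo "$want" > "$base/current_clocksource" ;;
    *) echo "$want not available" >&2; return 1 ;;
  esac
}
```

On a real host, run `set_clocksource_tsc` as root with no arguments.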