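1. Remove the reference list at the end and the cover page at the front.
2. Remove watermarks.
3. Remove all anchors on the right (macro), clear embedded objects (macro), remove tables (macro), and remove all graphics.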
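Sub DeleteAllShapes()
    Dim i As Long
    ' Loop backwards so that deleting an item does not shift the ones still to visit
    For i = ActiveDocument.Shapes.Count To 1 Step -1
        ActiveDocument.Shapes(i).Delete
    Next i
End Sub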
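Sub DeleteAllInlineShapesAndShapes()
    Dim i As Long
    ' Delete all embedded (inline) objects, last to first
    For i = ActiveDocument.InlineShapes.Count To 1 Step -1
        ActiveDocument.InlineShapes(i).Delete
    Next i
    ' Delete all shapes, last to first
    For i = ActiveDocument.Shapes.Count To 1 Step -1
        ActiveDocument.Shapes(i).Delete
    Next i
End Sub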
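Sub DeleteAllTables()
    Dim i As Long
    ' Loop backwards so that deleting a table does not shift the ones still to visit
    For i = ActiveDocument.Tables.Count To 1 Step -1
        ActiveDocument.Tables(i).Delete
    Next i
End Sub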
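Sub RemoveAllShapes()
    Dim i As Long
    ' Delete inline graphics, last to first
    For i = ActiveDocument.InlineShapes.Count To 1 Step -1
        ActiveDocument.InlineShapes(i).Delete
    Next i
    ' Delete floating graphics, last to first
    For i = ActiveDocument.Shapes.Count To 1 Step -1
        ActiveDocument.Shapes(i).Delete
    Next i
End Sub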
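4. Convert to PDF, crop the page numbers at the top, then convert back to Word.
In Microsoft Office Word, save as PDF; crop the pages; then convert the PDF back to Word.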
5. Cleanup
5.1 Remove parentheses and their contents
Sub DeleteParenthesesAndContent()
    Dim doc As Document
    Dim rng As Range
    Set doc = ActiveDocument
    Set rng = doc.Content
    With rng.Find
        .Text = "\(*\)"
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll
    End With
End Sub
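For text that has already been exported to plain text, a similar cleanup can be sketched in Python. This is only a sketch and not a port of the macro: the name `strip_parentheses` is hypothetical, matching full-width parentheses （…） is an added assumption (the macro's pattern covers ASCII parentheses only), and nested pairs are removed innermost first, which may differ from how Word's wildcard search treats them.

```python
import re

# A parenthesized span with no parentheses inside it.
# ASCII "()" mirrors the macro's pattern; full-width "（）" is an added assumption.
PAREN_PATTERN = re.compile(r"\([^()]*\)|（[^（）]*）")


def strip_parentheses(text: str) -> str:
    """Remove parenthesized spans, repeating until nested pairs are gone too."""
    previous = None
    while previous != text:
        previous = text
        text = PAREN_PATTERN.sub("", text)
    return text
```

Unmatched parentheses are left in place, so a stray "(" without a closing ")" survives the cleanup and still needs manual attention.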
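5.2 Delete text of the form "图1-8" (figure labels)
Ctrl + H > More > Use wildcards > Find what: 图[0-9]{1,}-[0-9]{1,}
> Replace with nothing, then change "图" (figure) to "表" (table) in the pattern and run the cleanup once more.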
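The same wildcard search can be approximated with a Python regular expression when working on exported plain text. This is a minimal sketch under that assumption; the function name `strip_figure_table_labels` is hypothetical, and the pattern handles "图" and "表" in a single pass instead of two.

```python
import re

# "图" (figure) or "表" (table), then digits, a hyphen, and digits, e.g. "图1-8".
LABEL_PATTERN = re.compile(r"[图表][0-9]+-[0-9]+")


def strip_figure_table_labels(text: str) -> str:
    """Delete labels such as "图1-8" or "表12-3" from the text."""
    return LABEL_PATTERN.sub("", text)
```

Words that merely contain "图" or "表" without the number-hyphen-number suffix are not touched.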
5.3 Remove "续表" ("table continued") text, and remove text of the form "三 、"
Find and replace: [0-9一二三四五六七八九十] 、
,
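On plain text, this step might look like the following Python sketch. The function name `strip_heading_marks` is hypothetical, and the code assumes that both kinds of text are replaced with nothing, as the step heading suggests.

```python
import re

# One digit or Chinese numeral, a space, then the enumeration comma "、", e.g. "三 、".
HEADING_MARK = re.compile(r"[0-9一二三四五六七八九十] 、")


def strip_heading_marks(text: str) -> str:
    """Remove "续表" markers and numbered heading marks such as "三 、"."""
    text = text.replace("续表", "")  # "table continued" marker
    return HEADING_MARK.sub("", text)
```

Because the pattern requires the space before "、", ordinary enumerations such as "甲、乙" are left alone.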
5.4 Remove every paragraph that contains the sentence "选择题参考答案:" (answer key for the multiple-choice questions)
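Sub DeleteParagraphsContainingText()
    Dim i As Long
    Dim searchText As String
    ' Text to search for
    searchText = "选择题参考答案:"
    ' Walk the paragraphs last to first so that deletions do not skip any
    For i = ActiveDocument.Paragraphs.Count To 1 Step -1
        If InStr(ActiveDocument.Paragraphs(i).Range.Text, searchText) > 0 Then
            ActiveDocument.Paragraphs(i).Range.Delete
        End If
    Next i
End Sub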
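If the document has already been exported to plain text, the same filter can be sketched in Python. This assumes each paragraph occupies exactly one line of the export; the function name `drop_paragraphs_containing` is hypothetical.

```python
def drop_paragraphs_containing(text: str, needle: str = "选择题参考答案:") -> str:
    """Drop every line (assumed to be one paragraph) that contains `needle`."""
    kept = [line for line in text.split("\n") if needle not in line]
    return "\n".join(kept)
```

Passing a different `needle` reuses the function for other boilerplate sentences.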
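5.5 Manually delete all end-of-chapter exercises, and manually delete any leftover anchors.
6. Remove spaces and line breaks
Set the layout to a single column, copy the text into Notepad, and run the Python script: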
with open("新建文本文档.txt", 'r', encoding='utf-8') as file:
    text = file.read()
text = text.replace('\n', '')
with open('output.txt', 'w', encoding='utf-8') as file:
    file.write(text)
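The script above removes only line breaks. A variant that also removes spaces, as the step title suggests, might look like the sketch below. The function name `remove_spaces_and_newlines` and the exact set of characters removed (ASCII space, full-width space U+3000, tab, carriage return, line feed) are assumptions.

```python
def remove_spaces_and_newlines(text: str) -> str:
    """Remove spaces, full-width spaces, tabs, and line breaks from the text."""
    for ch in (" ", "\u3000", "\t", "\r", "\n"):
        text = text.replace(ch, "")
    return text
```

Removing every space also joins adjacent English words, so this variant is only suitable for text that is predominantly Chinese.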