Looping Through File Content in Bash

Technical Background

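Bash scripts often need to process a file line by line, for example when reading a configuration file or working through a log. The available looping techniques differ in behavior and in where they fit, and knowing those differences makes file processing more reliable and efficient.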

Implementation Steps

Basic while loop

while read p; do
    echo "$p"
done < peptides.txt

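This form has side effects: it trims leading and trailing whitespace from each line, treats backslashes as escape characters, and skips the last line if the file does not end with a newline.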

A while loop without the side effects

while IFS="" read -r p || [ -n "$p" ]
do
    printf '%s\n' "$p"
done < peptides.txt

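Here IFS="" prevents leading and trailing whitespace from being trimmed, -r stops read from treating backslashes as escape characters, and || [ -n "$p" ] makes sure the last line is processed even when it has no trailing newline.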
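To make the difference concrete, here is a minimal sketch that runs both loops over the same input. The sample file and the array names naive and robust are made up for this illustration; the file has an indented line, a line containing a backslash, and no final newline.

```bash
#!/usr/bin/env bash
# Hypothetical sample: leading spaces, a backslash, and no trailing newline.
sample=$(mktemp)
printf '  indented\nback\\slash\nlast' > "$sample"

# Plain read: trims whitespace, eats the backslash, drops the last line.
naive=()
while read p; do
    naive+=("$p")
done < "$sample"

# IFS= and -r keep each line intact; the -n test rescues the last line.
robust=()
while IFS= read -r p || [ -n "$p" ]; do
    robust+=("$p")
done < "$sample"

printf 'naive:  %d lines\n' "${#naive[@]}"
printf '<%s>\n' "${naive[@]}"
printf 'robust: %d lines\n' "${#robust[@]}"
printf '<%s>\n' "${robust[@]}"
rm -f "$sample"
```

The naive loop collects two lines, <indented> and <backslash>, while the robust loop collects all three exactly as written in the file.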

When the loop body may read from standard input

while read -u 10 p; do
    ...
done 10<peptides.txt

Here the file is opened on a separate file descriptor (10 in this example), so reading it does not compete with standard input.
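The sketch below illustrates why the separate descriptor matters. The function greedy is a hypothetical stand-in for any command that drains standard input (ssh is a well-known case), and descriptor 3 is an arbitrary choice. In the second loop stdin is pointed at /dev/null only so the demo never waits on a terminal.

```bash
#!/usr/bin/env bash
list=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$list"

# Hypothetical stand-in for a command that consumes all of stdin.
greedy() { cat > /dev/null; }

# File on stdin: greedy swallows "beta" and "gamma" after the first read.
seen_stdin=0
while read -r p; do
    seen_stdin=$((seen_stdin + 1))
    greedy
done < "$list"

# File on descriptor 3: greedy reads stdin and cannot touch the file.
seen_fd=0
while read -r -u 3 p; do
    seen_fd=$((seen_fd + 1))
    greedy
done 3< "$list" < /dev/null

echo "via stdin: $seen_stdin line(s), via fd 3: $seen_fd line(s)"
rm -f "$list"
```

With the file on stdin the loop sees only one line; with the file on descriptor 3 it sees all three.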

Using a pipe and a while loop

cat peptides.txt | while read line
do
    # do something with $line here
    printf '%s\n' "$line"
done

Like the basic loop, this skips the last line if the file does not end with a newline. The following variant avoids that:

cat peptides.txt | while read line || [[ -n $line ]];
do
    # do something with $line here
    printf '%s\n' "$line"
done

Using a for loop

for word in $(cat peptides.txt); do echo $word; done

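This splits the file on any whitespace, so it only suits files whose items contain no spaces. If the file contains spaces that should not split a line, set IFS to a newline first: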

OLDIFS=$IFS; IFS=$'\n'; for line in $(cat peptides.txt); do cmd_a.sh $line; cmd_b.py $line; done > outfile.txt; IFS=$OLDIFS
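As a rough illustration of the two splitting modes, the sketch below counts loop iterations over a small made-up file of names. The set -f and set +f calls are an addition not present in the one-liner above; they switch off glob expansion, which also applies to an unquoted command substitution.

```bash
#!/usr/bin/env bash
names=$(mktemp)
printf 'Ada Lovelace\nAlan Turing\n' > "$names"

# Default IFS: every run of spaces, tabs, or newlines separates items.
words=0
for w in $(cat "$names"); do
    words=$((words + 1))
done

# Newline-only IFS: each non-empty line is one item.
lines=0
OLDIFS=$IFS
IFS=$'\n'
set -f            # addition: keep * and ? in the data from expanding
for l in $(cat "$names"); do
    lines=$((lines + 1))
done
set +f
IFS=$OLDIFS

echo "default IFS: $words items; newline IFS: $lines items"
rm -f "$names"
```

This prints "default IFS: 4 items; newline IFS: 2 items". Because newline counts as IFS whitespace, empty lines in the file are silently dropped by this approach, which is one more reason to prefer while read when every line matters.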

Reading a delimited file

while IFS=: read -r field1 field2 field3; do
    # process the fields
    printf '%s | %s | %s\n' "$field1" "$field2" "$field3"
done < input.txt

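Here : is the delimiter: each line is read and split into fields. If a line has more fields than there are variables, the last variable receives the rest of the line.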
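A small sketch of field splitting, using made-up records in a passwd-like layout. The variable names user, pw, uid, and rest are chosen for this example only; rest shows how the last variable collects the remaining fields.

```bash
#!/usr/bin/env bash
accounts=$(mktemp)
printf 'alice:x:1000:/home/alice:/bin/bash\nbob:x:1001:/home/bob:/bin/sh\n' > "$accounts"

summary=()
# IFS=: applies only to this read; "rest" gets everything after the third field.
while IFS=: read -r user pw uid rest; do
    summary+=("$user=$uid ($rest)")
done < "$accounts"

printf '%s\n' "${summary[@]}"
rm -f "$accounts"
```

The first output line is "alice=1000 (/home/alice:/bin/bash)": the colons inside the remainder are kept because no further variables are left to split into.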

Reading from the output of another command

while read -r line; do
    # process the line
    printf '%s\n' "$line"
done < <(command ...)

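This is preferable to command ... | while read -r line; do ... because the while loop runs in the current shell rather than in a subshell, so variables set inside the loop are still available after it ends.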
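The subshell effect is easy to observe with a counter. In this sketch printf stands in for an arbitrary command, and the counter names are invented for the example; it assumes Bash's default settings for a non-interactive script.

```bash
#!/usr/bin/env bash
# Pipe: the loop runs in a subshell, so its counter is lost afterwards.
count_pipe=0
printf 'a\nb\nc\n' | while read -r line; do
    count_pipe=$((count_pipe + 1))
done

# Process substitution: the loop runs in the current shell.
count_procsub=0
while read -r line; do
    count_procsub=$((count_procsub + 1))
done < <(printf 'a\nb\nc\n')

echo "pipe: $count_pipe, process substitution: $count_procsub"
```

This prints "pipe: 0, process substitution: 3".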

Reading null-delimited input

while IFS= read -r -d '' line; do
    # logic
    printf '%s\n' "$line"
done < <(find /path/to/dir -print0)
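Null-delimited reading pays off when file names contain spaces or newlines. The following sketch builds a throwaway directory with such names (all invented for the demo) and collects them into an array. The IFS= prefix is an addition that keeps leading and trailing whitespace in names intact.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
touch "$dir/plain.txt" "$dir/with space.txt" "$dir/"$'two\nlines.txt'

found=()
# -d '' makes read stop at each NUL byte instead of at each newline.
while IFS= read -r -d '' path; do
    found+=("$path")
done < <(find "$dir" -type f -print0)

echo "found ${#found[@]} files"
rm -rf "$dir"
```

All three files are counted once each, including the one whose name spans two lines.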

Reading several files at the same time

while read -u 3 -r line1 && read -u 4 -r line2; do
    # process the lines
    printf '%s\t%s\n' "$line1" "$line2"
done 3< input1.txt 4< input2.txt
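A short sketch of pairing lines from two inputs follows. Both files and the names ids, labels, and pairs are hypothetical; the second file is deliberately one line shorter to show when the loop stops.

```bash
#!/usr/bin/env bash
ids=$(mktemp)
labels=$(mktemp)
printf '1\n2\n3\n' > "$ids"
printf 'one\ntwo\n' > "$labels"     # one line shorter on purpose

pairs=()
# The && chain ends the loop as soon as either file runs out of lines.
while IFS= read -r -u 3 id && IFS= read -r -u 4 label; do
    pairs+=("$id:$label")
done 3< "$ids" 4< "$labels"

printf '%s\n' "${pairs[@]}"
rm -f "$ids" "$labels"
```

Only "1:one" and "2:two" are produced; the unmatched third line of the longer file is read but never processed.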

Reading a whole file into an array (Bash versions before 4)

while read -r line; do
    my_array+=("$line")
done < my_file

If the last line of the file has no trailing newline:

while read -r line || [[ $line ]]; do
    my_array+=("$line")
done < my_file

Reading a whole file into an array (Bash 4 and later)

readarray -t my_array < my_file
# or
mapfile -t my_array < my_file

for line in "${my_array[@]}"; do
    # process the lines
    printf '%s\n' "$line"
done
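To show what ends up in the array, this sketch loads a small made-up file containing a blank line, a padded line, and a final line without a newline, then prints each element with its line number. It assumes Bash 4 or later.

```bash
#!/usr/bin/env bash
steps=$(mktemp)
printf 'build\n\n  test  \ndeploy' > "$steps"   # blank line, padding, no final newline

# -t strips the trailing newline from each element.
mapfile -t my_array < "$steps"

echo "lines: ${#my_array[@]}"
for i in "${!my_array[@]}"; do
    printf '%d: <%s>\n' "$((i + 1))" "${my_array[i]}"
done
rm -f "$steps"
```

The array holds four elements: the blank line is kept as an empty string, the padding around "test" is preserved, and the unterminated last line is included.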

Using xargs

cat peptides.txt | xargs -I % sh -c "echo %"

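xargs is powerful and well suited to the command line. The -t option prints each command before it runs, and -p asks for confirmation before running it.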
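One possible variation, sketched here on made-up input, passes each line to sh -c as a positional parameter instead of substituting it into the command string. That keeps file content from being interpreted as shell code. The placeholder {} and the variable report are choices made for this example.

```bash
#!/usr/bin/env bash
items=$(mktemp)
printf 'first item\nsecond item\n' > "$items"

# -I {} runs the command once per input line, replacing {} with that line.
# The line arrives in the inner shell as "$1"; "_" fills the $0 slot.
report=$(xargs -I {} sh -c 'printf "got: %s\n" "$1"' _ {} < "$items")

printf '%s\n' "$report"
rm -f "$items"
```

Each input line produces one output line, "got: first item" and then "got: second item".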

Using head and tail to read a specific line

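# USER_FILE must already hold the path of the file to read
TOTAL_LINES=$(wc -l < "$USER_FILE")
for (( i = 1; i <= TOTAL_LINES; i++ ))
do
    # take the first i lines, then keep only the last of them
    LINE=$(head -n "$i" "$USER_FILE" | tail -n 1)
    echo "$LINE"
done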

Using sed and wc to walk through a file

end=$(wc -l < peptides.txt)
currentLine=1
while [ "$currentLine" -le "$end" ]
do
    # print only line number $currentLine
    sed -n "${currentLine}p" peptides.txt
    currentLine=$((currentLine + 1))
done

Core Code

The following complete example reads a file line by line with the most robust form and preserves all whitespace:

while IFS= read -r line || [[ -n $line ]]; do
    printf "'%s'\n" "$line"
done < /tmp/test.txt

Best Practices

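Use IFS= read -r when leading and trailing whitespace must be preserved.
Use || [[ -n $line ]] so that the last line is processed even when it has no trailing newline.
Open the file on a separate file descriptor when the loop body may read from standard input.
Keep performance in mind for large files: a tool such as awk can replace while read for splitting lines into fields.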
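As a sketch of the last point, the example below extracts the first field of each record twice, once with a shell loop and once with awk, and checks that the results agree. The sample data and variable names are invented, and no timing claim is made here; the point is only that a single awk process can replace the per-line read calls.

```bash
#!/usr/bin/env bash
records=$(mktemp)
printf 'alice:30\nbob:25\ncarol:41\n' > "$records"

# Shell loop: one read call per line.
loop_out=$(while IFS=: read -r name age; do
    printf '%s\n' "$name"
done < "$records")

# awk: one process splits every line on ":" and prints field 1.
awk_out=$(awk -F: '{ print $1 }' "$records")

[ "$loop_out" = "$awk_out" ] && echo "same output"
rm -f "$records"
```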

Common Problems

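Last line skipped: many approaches built on the read builtin skip the last line when the file does not end with a newline. Use while IFS= read -r line || [[ -n $line ]]; to avoid this.
Lines split at spaces: for word in $(cat peptides.txt) splits the content on whitespace, so a line containing spaces becomes several words. Setting IFS=$'\n' makes it split on newlines only.
Performance: on large files while read can be slow; consider a tool such as awk.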

https://119291.xyz/posts/looping-through-file-content-in-bash/

Published on May 21, 2025