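Why is reading lines from standard input much slower in C++ than in Python?

Technical background

Reading data from standard input is a common operation, but languages can differ widely in how fast they do it. One comparison of line-by-line reads from stdin found the C++ code to be an order of magnitude slower than the equivalent Python code, which prompted a closer look at the cause.

Implementation steps

C++ code example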

#include <iostream>
#include <string>
#include <time.h>

using namespace std;

int main() {
    string input_line;
    long line_count = 0;
    time_t start = time(NULL);
    int sec;
    int lps;

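    // Loop from the original benchmark: the stream state is tested before
    // each read, and the eof check skips the final, failed read.
    while (cin) {
        getline(cin, input_line);
        if (!cin.eof())
            line_count++;
    }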

    sec = (int) time(NULL) - start;
    cerr << "Read " << line_count << " lines in " << sec << " seconds.";
    if (sec > 0) {
        lps = line_count / sec;
        cerr << " LPS: " << lps << endl;
    } else
        cerr << endl;
    return 0;
}

Compile command:

`g++ -O3 -o readline_test_cpp foo.cpp`

Python code example

#!/usr/bin/env python
import time
import sys

count = 0
start = time.time()

for line in sys.stdin:
    count += 1

delta_sec = int(time.time() - start)
if delta_sec > 0:  # guard against division by zero on sub-second runs
    lines_per_sec = int(round(count / delta_sec))
    print("Read {0} lines in {1} seconds. LPS: {2}".format(count, delta_sec,
                                                            lines_per_sec))

Test results

$ cat test_lines | ./readline_test_cpp
Read 5570000 lines in 9 seconds. LPS: 618889

$ cat test_lines | ./readline_test.py
Read 5570000 lines in 1 seconds. LPS: 5570000
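Both programs time themselves with whole-second clocks, so a figure such as "1 seconds" is quite coarse. As a rough sketch, assuming C++11 or later, the elapsed time could instead be measured with std::chrono::steady_clock. The helper names `lines_per_second` and `time_it` are hypothetical and are not part of the original benchmark.

```cpp
#include <chrono>

// Hypothetical helper: lines per second for a given elapsed time.
// Returns 0 when the elapsed time is not positive, to avoid dividing by zero.
double lines_per_second(long line_count, std::chrono::duration<double> elapsed) {
    const double seconds = elapsed.count();
    return seconds > 0.0 ? line_count / seconds : 0.0;
}

// Hypothetical helper: runs `work` once and returns the elapsed time
// measured with a monotonic clock, as fractional seconds.
template <typename Work>
std::chrono::duration<double> time_it(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    return std::chrono::steady_clock::now() - start;
}
```

The read loop would be passed to `time_it` as a lambda, and the returned duration handed to `lines_per_second` together with the line count.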

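Core code analysis

Why the C++ version is slow

By default, cin is synchronized with stdio, which causes it to avoid any input buffering of its own. Instead of reading the input in large chunks, the stream fetches it in very small pieces, and over millions of lines that overhead adds up. The synchronization exists for code that mixes both APIs, for example: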

int myvalue1;
cin >> myvalue1;
int myvalue2;
scanf("%d",&myvalue2);

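If cin read more input than it actually needed, the second integer would not be available to scanf, because the two have independent buffers. To prevent this, streams are synchronized with stdio by default, at a cost in performance.

Solution

Turning synchronization off improves performance. Add the following at the top of main, before any I/O is performed: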

`std::ios_base::sync_with_stdio(false);`

A larger buffer can also be supplied, although the effect of pubsetbuf is implementation-defined:

std::ios_base::sync_with_stdio(false);
char buffer[1048576];
std::cin.rdbuf()->pubsetbuf(buffer, sizeof(buffer));

Other optimizations

Reading large blocks with fread and counting the newline characters can also improve performance:

// Requires <algorithm>, <cstdio>, and <vector>.
const size_t buffer_size = 500 * 1024;
std::vector<char> buffer(buffer_size);
size_t size;
long line_count = 0;
// Read raw blocks from stdin and count the newline characters in each block.
while ((size = fread(buffer.data(), sizeof(char), buffer_size, stdin)) > 0) {
    line_count += std::count(buffer.begin(), buffer.begin() + size, '\n');
}

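Best practices

When cin does not need to stay synchronized with stdio, turn synchronization off to improve performance.
For large amounts of data, consider supplying a larger buffer or reading in blocks with fread.
Avoid unnecessary intermediate tools such as cat; redirect the file straight into the program under test:

`$ /usr/bin/time program_to_benchmark < big_file`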

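FAQ

Why does the C++ program count one more line than Python?

The eof flag is set only after a read attempts to go past the end of the file, so a loop that tests cin before calling getline runs one extra time after the last line has been read. The loop should test the result of getline instead: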

while (getline(cin, input_line)) {
    line_count++;
}

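What is wrong with using `cat` for the test?

Using cat adds unnecessary overhead and can distort the results. The CPU usage reported by the time command is that of cat, not of the program being tested. cat may also do its own input and output buffering and other processing, which further affects the measurement. Redirect the file straight into the program instead.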