Why is reading lines from standard input so much slower in C++ than in Python?

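Technical background

Programs often need to read data from standard input, and C++ can turn out to be much slower than Python at reading it line by line. The main cause is a difference in defaults: with its default settings, C++ ends up making more system calls than Python does.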

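Implementation steps

C++ side

1. Synchronization

By default, cin is synchronized with stdio, which causes it to avoid any input buffering. To fix this, add the following line at the top of main, before any input or output takes place: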

`std::ios_base::sync_with_stdio(false);`

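This lets the C++ standard streams buffer their I/O independently, which can speed things up considerably in some cases. Once synchronization is off, avoid mixing cin with C stdio input functions such as scanf or fgets on the same input, because each side then keeps its own buffer.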

2. Setting a larger buffer

Setting a larger buffer can improve performance further, for example:

`char buffer[1048576];`
`std::cin.rdbuf()->pubsetbuf(buffer, sizeof(buffer));`

Python side

Python's default handling of standard input is relatively efficient. For example, the following simple code counts the lines of input:

import sys
count = 0
for line in sys.stdin:
    count += 1
print(count)

Core code

Optimized C++ reading code

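#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <iostream>
#include <vector>

int main() {
    // These two settings affect std::cin only; the fread loop below reads stdin directly.
    std::ios_base::sync_with_stdio(false);
    static char buffer[1048576];  // static keeps the 1 MiB array off the stack
    std::cin.rdbuf()->pubsetbuf(buffer, sizeof(buffer));

    const std::size_t buffer_size = 500 * 1024;
    std::vector<char> buffer_vec(buffer_size);
    std::size_t size;  // fread returns std::size_t, not int
    long long line_count = 0;

    // Read fixed-size chunks and count the newline characters in each one.
    // A final line without a trailing newline is not counted.
    while ((size = std::fread(buffer_vec.data(), sizeof(char), buffer_size, stdin)) > 0) {
        line_count += std::count(buffer_vec.begin(), buffer_vec.begin() + size, '\n');
    }

    std::cout << "Line count: " << line_count << std::endl;
    return 0;
}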

Python reading code

import sys
count = 0
for line in sys.stdin:
    count += 1
print(count)

Best practices

C++

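Avoid unnecessary synchronization: turn it off with std::ios_base::sync_with_stdio(false);.
Set a suitable buffer: choose a buffer size that fits the actual workload.
Use efficient read functions: fgets, for example, performs well when reading strings.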

Python

For a simple line-counting task, iterate directly with for line in sys.stdin.

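Common problems

The C++ line count is one higher than Python's

This happens because the C++ eof flag is set only after a read attempts to go past the end of the file, so a loop that tests the stream before reading runs one extra time. A correct loop looks like this: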

#include <iostream>
#include <string>

int main() {
    std::string input_line;
    int line_count = 0;
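    // Use the read itself as the loop condition: getline fails at end of input,
    // so the body runs once per line actually read, including a final line
    // that has no trailing newline.
    while (std::getline(std::cin, input_line)) {
        line_count++;
    }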
    std::cout << "Line count: " << line_count << std::endl;
    return 0;
}

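Problems with using `cat` in performance tests

Using cat in a benchmark can give inaccurate results. For example, /usr/bin/time cat big_file | program_to_benchmark actually times cat, not the program under test. A better approach is /usr/bin/time program_to_benchmark < big_file, which has the shell open the file and pass it to the program under test as an already-open file descriptor.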