c++ string 函数优化后，执行反而耗时很高，求解惑

#include <iostream>

//优化前函数 remove_ctrl
std::string remove_ctrl(std::string s) {
    std::string result;
    for(int i = 0;i < s.length();i++) {
        if(s[i] >= 0x20)
            result = result + s[i];
    }
    return result;
}

//优化后函数 remove_ctrl_opt
std::string remove_ctrl_opt(const std::string& src,std::string& dst) {
    int n = src.size();
    dst.reserve(n);
    for(auto it = src.begin();it != src.end();it++) {
        if(*it >= 0x20)
            dst +=  *it;
    }
    return dst;
}

int main(){
    std::string src = "hello,world,hello,world,hello,world,hello,world,hello,world,hello,world,hello,world,hello,world";
    std::string result;
    for(int i = 0;i < 1000000;i++)
        std::string result = remove_ctrl(src);

    for(int i = 0;i < 1000000;i++)
        remove_ctrl_opt(src,result);
    
    return 0;
}

优化函数，从参数拷贝，string 提前 reserve,减少临时变量,结果分别循环 1000000 次，使用 time ./a.out 测试执行时间，优化后的耗时反而很高（优化前 6s 左右，优化后 60s 没执行完直接 control+c 了）。

排查原因，发现将函数 remove_ctrl_opt 的返回类型修改为 void,函数体 return;优化后耗时提升 6 倍左右。这到底是什么原因？ return string ，难道开销这么大？但是优化前函数也是这样，执行才 6s 。

lovelylain

2023-10-28 16:56:50 +08:00

因为你这优化出 bug 了，原来的版本编译器本身会做返回值优化，所以返回 string 没有性能问题，你优化为传入引用后，每次调用复用同一个对象且没有 clear ，string 越来越长占用内存越来越多，不慢才怪了。

cnbatch

2023-10-28 17:41:59 +08:00

第二个优化是错的，楼上各位已经给出错误原因。
如果真想优化，可以这么写：
std::string& remove_ctrl_opt(const std::string& src,std::string& dst)
没错，也就是返回值是个引用。
然后记得 dst.clear() 再 dst.reserve(n)。

不过看着还是很累赘，不如这样写：

std::string remove_ctrl_a(const std::string& s)
{
std::string result;
std::copy_if(s.begin(), s.end(), std::back_inserter(result), [](auto ch){ return ch >= 0x20; } );
return result;
}

或者：

std::string& remove_ctrl_b(const std::string& s, std::string& dst)
{
dst.clear();
dst.reserve(s.size());
std::copy_if(s.begin(), s.end(), std::back_inserter(dst), [](auto ch){ return ch >= 0x20; } );
return dst;
}

经 O3 选项优化后，remove_ctrl_b 是最快的，比起手动使用迭代器更快。

至于 remove_ctrl_a 和修改正确后的 remove_ctrl_opt 谁更快，那得视乎优化选项，以及编译器版本。在三大编译器（ MSVC, GCC Clang) 不同优化选项都测了下，这两个互有胜负。

xiangyuecn

2023-10-28 19:30:37 +08:00

100 万次，为啥 js 只需 1 秒😂

remove_ctrl=(s)=>{
var result="";
for(var i = 0;i < s.length;i++) {
if(s.charCodeAt(i) >= 0x20)
result = result + s.charAt(i);
}
return result;
}

main=()=>{
var src = "hello,world,hello,world,hello,world,hello,world,hello,world,hello,world,hello,world,hello,world";
var result;
for(var i = 0;i < 1000000;i++)
result = remove_ctrl(src);

return result;
}

console.time(1)
console.log(main())
console.timeEnd(1)

cnbatch

2023-10-28 20:38:51 +08:00

@xiangyuecn 机器型号大家都没说，差异可以很大的
再说了，JS 解释器也有 RVO 吧

我自己尝试了好几台机器、不同的系统，原始版本的写法从用时 16 秒到用时 1.7 秒都有，优化后的版本从 124 毫秒到 1.4 秒都有，脱离具体机器谈速度就很不可靠了

20230710

2023-10-29 01:33:28 +08:00

@cnbatch 不是很懂 c++, debug 模式下测试的, remove_ctrl_b 慢出的这些速度是不是 predicate 函数调用产生的. O3 选项能优化掉吗

// 901 milliseconds
std::string remove_ctrl(const std::string &s) {
std::string result;
for (int i = 0; i < s.length(); i++) {
if (s[i] >= 0x20)
result.push_back(s[i]);
}
return result;
}

// 1648 milliseconds
std::string remove_ctrl_b(const std::string &s) {
std::string dst;
dst.clear();
dst.reserve(s.size());
std::copy_if(s.begin(), s.end(), std::back_inserter(dst), [](auto ch) { return ch >= 0x20; });
return dst;
}

csfreshman

2023-10-29 10:47:09 +08:00

@crystalyouth @20230710 @cnbatch @xiangyuecn @Inn0Vat10n @lovelylain @exiledkingcc @agagega @liberize benchmark 测试，符合预期，感谢各位大佬们指出。改进点：
1.传参 dst ，无需 return dst,未优化的函数返回之所以不耗时，因为有编译器优化&&局部变量不会一直追加
2.优化函数里传引用参，函数体内未 clear,依赖外部传入变量
3.copy 逻辑，可以使用 std::copy_if

这是一个专为移动设备优化的页面（即为了让你能够在 Google 搜索结果里秒开这个页面），如果你希望参与 V2EX 社区的讨论，你可以继续到 V2EX 上打开本讨论主题的完整版本。

https://www.v2ex.com/t/986260

V2EX 是创意工作者们的社区，是一个分享自己正在做的有趣事物、交流想法，可以遇见新朋友甚至新机会的地方。

V2EX is a community of developers, designers and creative people.