輝夜 · Kaguya Houraisan's glori...


    October, 2009

    fstream File IO Tidbits

    http://dantvt.is-programmer.com/posts/11949.html

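    File IO on large amounts of data is often the bottleneck. To speed things up, it is sometimes better to read the file in large chunks first and process it afterwards. Below are two common ways of doing this.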

    1. Reading the whole file into a string in one go.

    As far as I can tell, none of std::getline, istream::getline or operator>> offers a way to read straight through to the end of the file in a single call. istreambuf_iterator can do it:

    ifstream in("input.txt");
    string instr((istreambuf_iterator<char>(in)), istreambuf_iterator<char>());

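    The first argument to the string constructor needs an extra pair of parentheses. Otherwise the compiler parses the whole line as a function declaration (the "most vexing parse") = = ...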
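Below is a self-contained version of the istreambuf_iterator approach, wrapped in a hypothetical helper named `read_file_to_string`. Opening the file in `std::ios::binary` mode is an added assumption and does not appear in the snippet above. It keeps the Windows runtime from turning "\r\n" into "\n" while reading.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads an entire file into a std::string.
// std::ios::binary is an added assumption: it stops the Windows runtime
// from converting "\r\n" to "\n" while reading.
std::string read_file_to_string(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);

    // Extra parentheses around the first argument avoid the most vexing parse.
    std::string contents((std::istreambuf_iterator<char>(in)),
                         std::istreambuf_iterator<char>());
    return contents;
}
```

The string grows as the characters arrive, which is the reallocation cost discussed next.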

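    A string filled this way grows along with its contents. Each time it runs out of room, it triggers an extra reallocation and copy. For efficiency, it is worth pre-allocating enough space up front: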

    ifstream in("input.txt");
    in.seekg(0, ios::end);
    streampos len = in.tellg();
    in.seekg(0, ios::beg);

    string instr;
    instr.reserve(len);
    instr.assign(istreambuf_iterator<char>(in), istreambuf_iterator<char>());
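The snippet above relies on `assign()` making use of the reserved capacity, and that is up to the library. A variant that avoids the question entirely sizes the string from the file length and fills it with one `read()` call. This is a sketch under stated assumptions: the helper name `read_file_sized` is hypothetical, and binary mode is assumed so that `tellg()` matches the number of bytes actually read.

```cpp
#include <cassert>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: size the string from the file length, then fill it
// with a single read() call instead of going through input iterators.
std::string read_file_sized(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);  // binary: tellg() matches the byte count
    if (!in)
        throw std::runtime_error("cannot open " + path);

    in.seekg(0, std::ios::end);
    std::streamoff len = in.tellg();
    in.seekg(0, std::ios::beg);
    if (len < 0)
        throw std::runtime_error("cannot determine size of " + path);

    std::string contents(static_cast<std::string::size_type>(len), '\0');
    if (len > 0)
        in.read(&contents[0], len);  // contents.data() is writable only from C++17
    // Trim in case fewer bytes arrived than tellg() promised.
    contents.resize(static_cast<std::string::size_type>(in.gcount()));
    return contents;
}
```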

    2. Reading the whole file into a stringstream in one go.

    A filebuf cannot be swapped in for a stringbuf (or vice versa) through rdbuf(). Getting data from a filebuf into a stringbuf therefore takes one copy. The simplest way is to copy the entire streambuf:

    ifstream in("input.txt");
    stringstream ss;
    ss<<in.rdbuf();
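Below is a self-contained sketch of the `ss << in.rdbuf()` copy, using a hypothetical helper named `load_stream`. One detail is worth handling. The `operator<<(streambuf*)` overload sets failbit on the destination stream when it inserts no characters. For an empty file, that would leave `ss` unusable unless the flag is cleared.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: copies the whole file into a stringstream
// through operator<<(std::streambuf*).
void load_stream(const std::string& path, std::stringstream& ss) {
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    // Copies every character from the file's filebuf into ss's stringbuf.
    if (!(ss << in.rdbuf()))
        ss.clear();  // inserting nothing (most likely an empty file) sets failbit; reset it
}
```

After loading, `ss` can be parsed with `>>` exactly as if the data had been written into it.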

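    As with string, the same reallocation-and-copy issue arises here. A streambuf's internal buffer is not as easy to manage, though. The workaround is to hand it a buffer we allocate ourselves: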

    ifstream in("input.txt", ios::binary);
    in.seekg(0, ios::end);
    streampos len = in.tellg();
    in.seekg(0, ios::beg);

    vector<char> buffer(len);
    in.read(&buffer[0], len);

    stringstream ss;
    ss.rdbuf()->pubsetbuf(&buffer[0], len);
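Whether `pubsetbuf` takes effect on a stringstream differs between standard libraries, as the VC9 case described below shows. A defensive sketch can check at run time whether the buffer was adopted and fall back to one copy if it was not. In this sketch, the helper name `load_with_pubsetbuf` and the use of `in_avail()` as the check are assumptions, not part of the original snippet.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads `path` into `buffer`, then asks `ss` to read
// from that buffer directly via pubsetbuf(). If the library ignores the
// request (a no-op setbuf), the bytes are copied into `ss` instead.
// Returns true when pubsetbuf took effect; then `buffer` must outlive `ss`.
bool load_with_pubsetbuf(const char* path, std::vector<char>& buffer,
                         std::stringstream& ss) {
    std::ifstream in(path, std::ios::binary);  // binary: tellg() equals the byte count
    if (!in)
        throw std::runtime_error("cannot open file");
    in.seekg(0, std::ios::end);
    std::streamoff len = in.tellg();
    in.seekg(0, std::ios::beg);
    if (len < 0)
        throw std::runtime_error("cannot determine file size");

    // One spare byte keeps &buffer[0] valid even for an empty file.
    buffer.assign(static_cast<std::size_t>(len) + 1, '\0');
    in.read(&buffer[0], len);

    ss.rdbuf()->pubsetbuf(&buffer[0], len);
    if (ss.rdbuf()->in_avail() == len)
        return true;  // the get area now points into `buffer`

    // Fallback: pubsetbuf was ignored, so pay for one copy instead.
    ss.str(std::string(buffer.begin(), buffer.begin() + len));
    return false;
}
```

The return value reports which path was taken. With libstdc++ (MinGW/GCC), one would expect `true`. With a library whose `setbuf` does nothing, the fallback runs and the data is still readable.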

    Finally, while I'm at it, a jab at VC's STL = =...

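    VC's compiler is excellent, no argument there, but that counts for little if the STL holds it back. For file IO (fstream), VC's library is several times slower than the one that ships with MinGW (GCC 4.4.0). GCC's fstream formatted reads and writes are already on par with C stdio. There should be room for further gains, because iostream formatting is selected at compile time through overloading, while printf/scanf parse their format strings at run time.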
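Below is a rough harness for checking this kind of fstream-vs-stdio comparison on your own toolchain. The helper names (`sum_with_fscanf`, `sum_with_ifstream`, `time_ms`) are hypothetical, and the input format (whitespace-separated integers) is an assumption. Timings depend on compiler, library, flags and disk cache, so no numbers are claimed here.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <fstream>

// Hypothetical benchmark helpers: sum whitespace-separated integers from a
// file, once with C stdio and once with an ifstream, so the two can be timed.
long long sum_with_fscanf(const char* path) {
    long long sum = 0;
    long long value = 0;
    if (std::FILE* f = std::fopen(path, "r")) {
        while (std::fscanf(f, "%lld", &value) == 1)  // format string parsed at run time
            sum += value;
        std::fclose(f);
    }
    return sum;
}

long long sum_with_ifstream(const char* path) {
    long long sum = 0;
    long long value = 0;
    std::ifstream in(path);
    while (in >> value)  // operator>>(long long&) chosen at compile time
        sum += value;
    return sum;
}

// Returns the wall-clock time of one call to `f`, in milliseconds.
template <typename F>
double time_ms(F f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For a meaningful comparison, generate a large input file and time `[] { sum_with_ifstream("numbers.txt"); }` against the fscanf version. Run each more than once so both see a warm OS file cache.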

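    Also, the pubsetbuf snippet above most likely won't produce the expected result under VS2008 (VC9.0). Stepping into it in the debugger, I found that the setbuf behind pubsetbuf in VC's standard library actually has an empty body! Here it is (pubsetbuf reaches it through one more function call):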

    virtual _Myt *__CLR_OR_THIS_CALL setbuf(_Elem *, streamsize)
            {       // offer buffer to external agent (do nothing)
            return (this);
            }

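    Looks like it's waiting for us to derive from it = =. With MinGW (GCC 4.4.0), the snippet does give the expected result. Neither library is actually at fault. The C++ standard makes the effect of basic_stringbuf::setbuf implementation-defined (except that setbuf(0, 0) has no effect), so this trick is not portable.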
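Taking up the "derive from it" route, a minimal read-only streambuf can point its get area at a caller-owned array through `setg()`. This does not depend on how any library implements `stringbuf::setbuf`. The sketch below assumes a class named `membuf`, which is hypothetical and not a standard type.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical class: a read-only streambuf whose get area is an existing
// char array. The buffer is installed with setg() in our own class, so no
// copy is made and library-specific setbuf behavior does not matter.
class membuf : public std::streambuf {
public:
    membuf(char* data, std::size_t size) {
        setg(data, data, data + size);  // eback, gptr, egptr
    }
};
```

For example, fill a `std::vector<char>` with the file's bytes as in the snippet above, then wrap it: `membuf mb(&buffer[0], buffer.size()); std::istream is(&mb);`. Because only the get area is set up, this supports reading but not writing or seeking. `seekoff`/`seekpos` would have to be overridden for `tellg`/`seekg` to work.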

    Comments (2)

    縁 千日 wrote:
    Dropbox is really handy,
    thanks again for the pointers ♪
    Oct. 12
    卡修 本 wrote:
    Luckily all I use IO for is writing logs... heh, I'll just read along and not dig any deeper
    Oct. 6
