How do I read a folder that contains a million files?

This topic was created 3302 days ago; the information mentioned may have changed since then.

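I saved all the source files fetched by my crawler into a single folder, and there are now more than a million of them. I need to read them, but calling os.walk(path) directly has been stuck at that step for several hours. Is there another way to read them quickly?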

read

file

path

Walk

14 replies • 2017-06-01 01:43:26 +08:00

cheneydog

May 5, 2017

ls /xxx >list

sheep3

May 5, 2017

```
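import subprocess

# "ll" is usually a shell alias (commonly for "ls -l"), which Popen cannot resolve without a shell.
proc = subprocess.Popen(["ls", "-l"], stdout=subprocess.PIPE, text=True)
for line in proc.stdout:
    print(line.rstrip("\n"))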
```

dsg001

May 5, 2017

```
import glob

# iglob returns an iterator; the pattern needs a wildcard to match the directory's entries.
for file in glob.iglob('path/*'):
    print(file)
```
minbaby

May 5, 2017

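The posters above have clearly never dealt with millions of files in one directory. I ran into this once (caused by a crontab job that ran wget). At that scale plain ls no longer works well, so the only option is to exploit patterns in the file names. For example, if the names are five-digit numbers, try something like ls 11* to reduce the number of files read each time.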
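The prefix idea above can be sketched in Python roughly as follows. This is only an illustration: the function name `iter_batches_by_prefix` and the two-digit prefix scheme are assumptions, not something from this thread. Each pattern still has to scan the whole directory, so this bounds how many names are handled per batch rather than reducing the total work.

```python
import glob
import os


def iter_batches_by_prefix(directory, prefixes):
    """Yield (prefix, paths) pairs, one batch of matching paths per name prefix."""
    for prefix in prefixes:
        # glob.escape protects any wildcard characters in the directory name itself.
        pattern = os.path.join(glob.escape(directory), prefix + "*")
        yield prefix, glob.glob(pattern)


# Hypothetical layout: five-digit numeric names, batched by their first two digits.
two_digit_prefixes = [f"{n:02d}" for n in range(100)]
```

Usage would look like `for prefix, paths in iter_batches_by_prefix("/xxx", two_digit_prefixes): ...`, processing one batch at a time.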

rrfeng

May 5, 2017

ls -U

Osk

May 5, 2017

$ time ls -U | wc -l
5973243

real 0m2.330s
user 0m1.743s
sys 0m0.790s

Without -U it takes a bit over 24 seconds.
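For the Python side of the original question, a rough analogue of the unsorted listing is `os.scandir`, which yields entries in whatever order the directory returns them and does not build a full list first. The sketch below assumes Python 3.6+ and a flat directory; the helper name `iter_file_paths` is made up for illustration.

```python
import os


def iter_file_paths(directory):
    """Yield paths of regular files in `directory`, unsorted, one at a time."""
    with os.scandir(directory) as entries:
        for entry in entries:
            # is_file() can usually rely on type information from the directory
            # listing itself, so no extra stat call per file is needed in most cases.
            if entry.is_file(follow_symlinks=False):
                yield entry.path
```

Because it is a generator, processing can start on the first file immediately instead of waiting for the whole listing.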

privil

May 5, 2017

@Osk That's the right answer: the -U flag! I used ls -U to delete twenty million files...

Ouyangan

May 5, 2017

A million files, how thrilling.

lxf1992521

May 5, 2017

Run find > a to build an index, then use Python to process that text file of a few million lines.
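A minimal sketch of the second half, assuming the index file has one path per line and that no file name contains a newline. The function name and the `exclude` parameter are hypothetical. When `find > a` is run inside the directory itself, the output will typically also contain `.` and `./a`, which is why an exclusion list is included.

```python
def iter_indexed_paths(index_file, exclude=(".",)):
    """Yield one path per line from an index file, skipping blanks and excluded entries."""
    # surrogateescape keeps file names that are not valid UTF-8 from raising errors.
    with open(index_file, encoding="utf-8", errors="surrogateescape") as fh:
        for line in fh:
            path = line.rstrip("\n")
            if path and path not in exclude:
                yield path
```

For example, `for path in iter_indexed_paths("a", exclude=(".", "./a")): ...` reads the index lazily, so memory use stays flat regardless of how many lines it has.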

dbj1991

May 6, 2017

@Osk @minbaby @lxf1992521 What if this is on Windows?

minbaby

May 6, 2017

@dbj1991 No idea about Windows, I've never dealt with this there.

ihciah

May 6, 2017

This reminds me of a classmate who once ran out of inodes...

beordle

May 7, 2017

@Osk Learned something new.

HMSQQbA

Jun 1, 2017 via Android

@cheneydog That will produce one extra entry.