A 文件是个 json 文件: a.txt
{
"_id": "113.254.82.124",
"_index": "fofapro_subdomain",
"header": "HTTP/1.1 401 Unauthorized\r\nConnection: close\r\nContent-Length: 195\r\nCache-Control: no-cache\r\nContent-Type: text/html\r\nDate: Sat, 20 Oct 2018 15:59:44 GMT\r\nEtag: \"0-29d-b90\"\r\nServer: Embedthis-Appweb/3.3.1\r\nWww-Authenticate: Basic realm=\"DCS-2530L\"\r\nX-Frame-Options: SAMEORIGIN\r\n",
}
{
"_id": "http://10.254.82.12",
"_index": "fofapro_subdomain",
"header": "HTTP/1.1 401 Unauthorized\r\nConnection: close\r\nContent-Length: 195\r\nCache-Control: no-cache\r\nContent-Type: text/html\r\nDate: Sat, 20 Oct 2018 15:59:44 GMT\r\nEtag: \"0-29d-b90\"\r\nServer: Embedthis-Appweb/3.3.1\r\nWww-Authenticate: Basic realm=\"DCS-2530L\"\r\nX-Frame-Options: SAMEORIGIN\r\n",
}
{
"_id": "https://192.168.1.10:9090",
"_index": "fofapro_subdomain",
"header": "HTTP/1.1 401 Unauthorized\r\nConnection: close\r\nContent-Length: 195\r\nCache-Control: no-cache\r\nContent-Type: text/html\r\nDate: Sat, 20 Oct 2018 15:59:44 GMT\r\nEtag: \"0-29d-b90\"\r\nServer: Embedthis-Appweb/3.3.1\r\nWww-Authenticate: Basic realm=\"DCS-2530L\"\r\nX-Frame-Options: SAMEORIGIN\r\n",
}
{
"_id": "127.0.0.1:8343",
"_index": "fofapro_subdomain",
"header": "HTTP/1.1 401 Unauthorized\r\nConnection: close\r\nContent-Length: 195\r\nCache-Control: no-cache\r\nContent-Type: text/html\r\nDate: Sat, 20 Oct 2018 15:59:44 GMT\r\nEtag: \"0-29d-b90\"\r\nServer: Embedthis-Appweb/3.3.1\r\nWww-Authenticate: Basic realm=\"DCS-2530L\"\r\nX-Frame-Options: SAMEORIGIN\r\n",
}
B 文件: b.txt
127.0.01
192.168.1.10
192.168.88.88
代码
import re
import json
def filesJson(filepath,dstpaths):
datas = set()
#正则匹配
rule = re.compile('^[a-zA-z]{1}.*$')
with open(filepath, 'r', encoding='UTF-8') as a, open(dstpaths, 'r', encoding='UTF-8') as b:
b.seek(0)
for realine_a in a:
json_datas = json.loads(realine_a)
ips = json_datas['_id']
if rule.findall(ips):
ips = ips.strip("http[s]?://")
ips = ips.split(":")[0]
datas.add(ips)
for realine_b in b:
if realine_b in datas:
print(realine_b)
else:
break
if __name__ == '__main__':
file_paths = "a.txt"
dstpaths = 'b.txt'
filesJson(file_paths, dstpaths)
我的想法是把 A 文件里的 IP,去除协议和端口,只保留 IP 写入到一个集合中,然后在通过 B 文件的数据去匹配这个集合,有没有这个 IP,如果有这个 IP,把 A 文件这行数据写入到 C 文件中,现在问题是 B 文件无法匹配 A 文件的数据,而且如果 A 文件内容是几百万行数据,B 文件内容是几万行数据,这种逻辑是不是有很大的问题。
这是一个专为移动设备优化的页面(即为了让你能够在 Google 搜索结果里秒开这个页面),如果你希望参与 V2EX 社区的讨论,你可以继续到 V2EX 上打开本讨论主题的完整版本。
V2EX 是创意工作者们的社区,是一个分享自己正在做的有趣事物、交流想法,可以遇见新朋友甚至新机会的地方。
V2EX is a community of developers, designers and creative people.