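Lately I keep running out of memory when processing large text files, so I have been looking at how to cut unnecessary intermediate objects and free memory as early as possible.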

  • 1 A different iteration approach: do not read the whole file into memory at once
  • 2 Free intermediate memory that is no longer needed

1 The classic, straightforward approach

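First, create a test file with one million short lines. It is only about 16 MB on disk when encoded as UTF-8, but it takes roughly 90 MB once loaded as a list of Python strings: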

with open("text.txt", "w") as f:
    for i in range(1000000):
        f.write("我爱你中国" + "\n")
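The gap between the size on disk and the size in memory comes from per-object overhead: every line becomes its own str object. A minimal sketch of that arithmetic, assuming UTF-8 on disk and CPython in memory (the exact sys.getsizeof value varies between Python versions, so no figure is quoted here):

```python
import sys

line = "我爱你中国" + "\n"

# Bytes one line occupies on disk, assuming the file is UTF-8 encoded.
bytes_on_disk = len(line.encode("utf-8"))

# Bytes the same line occupies as a Python str object (CPython-specific),
# not counting the pointer that the enclosing list holds for it.
bytes_in_memory = sys.getsizeof(line)

print(f"on disk:   {bytes_on_disk} bytes per line")
print(f"in memory: {bytes_in_memory} bytes per line")
print(f"estimated file size: {bytes_on_disk * 1_000_000 / 1024**2:.1f} MB")
```

Multiplying the in-memory figure by one million lines, plus the list's own pointer array, gives a number close to what tracemalloc reports for a single list.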

Now process the file:

import tracemalloc

tracemalloc.start()

with open("text.txt", "r") as f:
    lines = f.readlines()

new_lines = []
for line in lines:
    new_lines.append("1" + line)
# del lines  (left commented out here; section 2 enables it)
current, peak = tracemalloc.get_traced_memory()
print(f"current memory is {current / 1024**2} MB")
print(f"peak memory is {peak / 1024**2} MB")
tracemalloc.stop()

# current memory is 182.10345554351807 MB
# peak memory is 182.10350131988525 MB

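2 Releasing memory

Peak memory does not drop, but current memory does. Both lines and new_lines are fully alive just before del lines runs, so the peak has already been reached; only the memory held afterwards shrinks.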

import tracemalloc

tracemalloc.start()

with open("text.txt", "r") as f:
    lines = f.readlines()

new_lines = []
for line in lines:
    new_lines.append("1" + line)
del lines
current, peak = tracemalloc.get_traced_memory()
print(f"current memory is {current / 1024**2} MB")
print(f"peak memory is {peak / 1024**2} MB")
tracemalloc.stop()

# current memory is 92.21963405609131 MB
# peak memory is 182.10350131988525 MB
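If the whole list has to be loaded anyway, one way to avoid holding two full lists is to overwrite each element in place, so each old string can be freed as soon as it is replaced. This is a different technique from the del shown above, and only a sketch: prefix_in_place is a hypothetical helper, and the generated list stands in for the result of f.readlines().

```python
import tracemalloc


def prefix_in_place(lines, prefix="1"):
    """Replace each element with its prefixed version (hypothetical helper).

    Only one extra string is alive at a time, because the old element
    loses its list reference as soon as the slot is overwritten.
    """
    for i, line in enumerate(lines):
        lines[i] = prefix + line
    return lines


tracemalloc.start()

# Stand-in for f.readlines(): 100,000 distinct string objects.
lines = [f"{i}\n" for i in range(100_000)]
before, _ = tracemalloc.get_traced_memory()

prefix_in_place(lines)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"before: {before / 1024**2:.2f} MB")
print(f"current: {current / 1024**2:.2f} MB")
print(f"peak: {peak / 1024**2:.2f} MB")
```

The trade-off is that the original list is destroyed, so this only fits when the unmodified lines are not needed afterwards.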

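3 A different iteration approach

Compared with section 1, both current and peak memory drop. Iterating over the file object yields one line at a time, so the full list of original lines never exists in memory; only new_lines is kept.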

import tracemalloc

tracemalloc.start()

def process():
    new_lines = []
    with open("text.txt", "r") as f:
        for line in f:
            new_lines.append("1" + line)
    return new_lines

new_lines = process()
current, peak = tracemalloc.get_traced_memory()
print(f"current memory is {current / 1024**2} MB")
print(f"peak memory is {peak / 1024**2} MB")
tracemalloc.stop()

# current memory is 92.21870708465576 MB
# peak memory is 92.23535537719727 MB
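When the processed lines are only going to be written out again, even new_lines can be dropped by streaming through a generator. The following is a minimal sketch under that assumption; prefixed_lines and the temporary file names are made up for illustration, and the encoding is set to UTF-8 explicitly.

```python
import os
import tempfile
from typing import Iterator


def prefixed_lines(path: str, prefix: str = "1") -> Iterator[str]:
    """Yield each line of the file with a prefix, one line at a time."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield prefix + line


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "text.txt")
    dst = os.path.join(tmp, "out.txt")

    # Small stand-in for the large input file.
    with open(src, "w", encoding="utf-8") as f:
        for _ in range(3):
            f.write("我爱你中国" + "\n")

    # writelines accepts any iterable, so lines are consumed lazily
    # and no list of results is ever built.
    with open(dst, "w", encoding="utf-8") as out:
        out.writelines(prefixed_lines(src))

    with open(dst, "r", encoding="utf-8") as f:
        result = f.read()

print(result)
```

With this shape, memory use depends on the longest line and the I/O buffers rather than on the size of the file.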