When learning Python you will often hear about iterators and generators. So what is the difference between the two?

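The first thing to be clear about is that a generator is a special kind of iterator, and an iterator exists to traverse data one item at a time.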

Differences between iterators and generators:

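  • An iterator is generally used in a for loop, which walks through the data the iterator provides.
  • A generator is like a tube of toothpaste: each squeeze gives you one dab of data. In other words, each call to next() returns one value.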
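
To make "used in a for loop" concrete, here is a minimal sketch of what a for loop does under the hood with the built-in iter() and next() functions. The helper name manual_for is made up for this illustration.

```python
def manual_for(iterable):
    """Collect items the way a for loop would, calling iter() and next() by hand."""
    collected = []
    iterator = iter(iterable)          # a for loop first asks the object for an iterator
    while True:
        try:
            item = next(iterator)      # then it requests one item at a time
        except StopIteration:          # StopIteration signals that the data is exhausted
            break
        collected.append(item)         # stands in for the body of the for loop
    return collected


result = manual_for([1, 2, 3])
print(result)
```

Any object that works in this helper also works in a real for loop, because both rely on the same two calls.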

What an iterator must have:

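  • The __iter__ special method: returns the iterator object, usually self
  • The __next__ special method: returns the next item, and raises StopIteration when no data is left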
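
The two methods above can be sketched with a small class. Countdown is a hypothetical example written for this illustration, not something from the standard library.

```python
class Countdown:
    """Iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself, so it can be used directly in a for loop
        return self

    def __next__(self):
        # Signal the end of the data by raising StopIteration
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown_values = list(Countdown(3))
print(countdown_values)
```

Note that the class has to keep its own state (self.current) between calls to __next__.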

What a generator must have:

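  • The yield keyword: calling a function that contains yield returns a generator object, which implements __iter__ and __next__ automatically. Each next() call runs the function up to the next yield and returns that value; the following call resumes right after that yield.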
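
As a rough sketch of the point about yield, the generator function below (countdown is a made-up name for this illustration) produces the values 3, 2, 1 without defining any class or special methods by hand.

```python
def countdown(start):
    """Generator function: the function body is paused at each yield."""
    while start > 0:
        yield start        # hand one value back to the caller and pause here
        start -= 1         # execution resumes from this line on the next next() call


gen = countdown(3)

# The generator object gets both protocol methods for free
has_protocol = hasattr(gen, "__iter__") and hasattr(gen, "__next__")

first = next(gen)          # runs the body up to the first yield
second = next(gen)         # resumes after the yield, loops, and yields again
rest = list(gen)           # consumes whatever is left

print(has_protocol, first, second, rest)
```

The state that an iterator class would store on self lives in the paused function's local variables instead.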

Let's use an example to explore the most important point when comparing iterators and generators: the iteration logic.

1.1 First create a text file with 1,000,000 lines

with open("text.txt", "w") as f:
    for i in range(1000000):
        f.write("我爱你中国" + "\n")

1.2 Read the file with an iterator and measure memory usage

Reading the whole file at once:

import tracemalloc

def process_line(line):
    pass

tracemalloc.start()
with open("text.txt", "r") as f:
    # readlines() loads every line into one list, so the whole file sits in memory
    lines = f.readlines()
for line in lines:
    process_line(line)

current, peak = tracemalloc.get_traced_memory()
print(f"current memory is {current / 1024**2} MB")
print(f"peak memory is {peak / 1024**2} MB")
tracemalloc.stop()

# current memory is 89.88475894927979 MB
# peak memory is 89.90067100524902 MB

Without reading the whole file at once:

import tracemalloc

def process_line(line):
    pass

tracemalloc.start()

class LineIter:
    def __init__(self):
        self.f = open("text.txt", "r")

    def __iter__(self):
        return self

    def __next__(self):
        # readline() returns an empty string only at end of file
        line = self.f.readline()
        if line:
            return line
        else:
            self.f.close()
            raise StopIteration

lines = LineIter()
for line in lines:
    process_line(line)

current, peak = tracemalloc.get_traced_memory()
print(f"current memory is {current / 1024**2} MB")
print(f"peak memory is {peak / 1024**2} MB")
tracemalloc.stop()

# current memory is 0.0029458999633789062 MB
# peak memory is 0.04485607147216797 MB
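
As a side note on the class above: a file object returned by open() already implements the iterator protocol itself, so a hand-written wrapper is mainly useful for showing the mechanics. The sketch below checks this on a small temporary file; the file contents and variable names are made up for the illustration.

```python
import os
import tempfile

# Write a small throwaway file so the example does not depend on text.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("first\nsecond\nthird\n")
    sample_path = tmp.name

with open(sample_path, "r", encoding="utf-8") as f:
    is_own_iterator = iter(f) is f           # __iter__ returns the file object itself
    first_line = next(f)                     # __next__ returns one line per call
    remaining = [line for line in f]         # the for loop continues from where next() stopped

os.remove(sample_path)
print(is_own_iterator, repr(first_line), remaining)
```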

1.3 Read the file with a generator and measure memory usage

Reading the whole file at once: peak memory covers the entire file's contents, held in memory as a list of strings.

import tracemalloc

def process_line(line):
    pass

tracemalloc.start()

def LineIter():
    with open("text.txt", "r") as f:
        # readlines() still builds the full list before the first yield
        lines = f.readlines()
        for line in lines:
            yield line

lines = LineIter()
for line in lines:
    process_line(line)

current, peak = tracemalloc.get_traced_memory()
print(f"current memory is {current / 1024**2} MB")
print(f"peak memory is {peak / 1024**2} MB")
tracemalloc.stop()
# current memory is 0.0005741119384765625 MB
# peak memory is 89.90090751647949 MB

Without reading the whole file at once:

import tracemalloc

def process_line(line):
    pass

tracemalloc.start()

def LineIter():
    with open("text.txt", "r") as f:
        # iterating over the file object reads one line at a time
        for line in f:
            yield line

lines = LineIter()
for line in lines:
    process_line(line)

current, peak = tracemalloc.get_traced_memory()
print(f"current memory is {current / 1024**2} MB")
print(f"peak memory is {peak / 1024**2} MB")
tracemalloc.stop()

# current memory is 0.0005702972412109375 MB
# peak memory is 0.03522205352783203 MB
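
The four measurements point the same way: what drives peak memory is whether the code reads everything up front, not whether it is written as a class or as a generator. The sketch below condenses that comparison into one script. It assumes a much smaller temporary file (20,000 lines) so it runs quickly, and the helper names peak_memory, eager_lines and lazy_lines are made up for this illustration; exact byte counts will vary by machine and Python version.

```python
import os
import tempfile
import tracemalloc


def peak_memory(read_lines, path):
    """Return the peak traced memory (in bytes) while consuming read_lines(path)."""
    tracemalloc.start()
    for _ in read_lines(path):
        pass
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


def eager_lines(path):
    # Generator that still loads the whole file before yielding anything
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()
    for line in lines:
        yield line


def lazy_lines(path):
    # Generator that pulls one line at a time from the file object
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield line


# Build a small throwaway file instead of the 1,000,000-line text.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    for _ in range(20000):
        tmp.write("我爱你中国\n")
    demo_path = tmp.name

eager_peak = peak_memory(eager_lines, demo_path)
lazy_peak = peak_memory(lazy_lines, demo_path)
os.remove(demo_path)

print(f"eager peak: {eager_peak / 1024:.1f} KB")
print(f"lazy peak:  {lazy_peak / 1024:.1f} KB")
```

Both functions are generators, yet only the lazy one keeps memory flat as the file grows.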

