Author: 王怡君3018 | Source: Internet | 2023-07-04 21:27
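I have a script that typically operates on entire block devices. If every block it reads gets cached, it will evict data that other applications are using. To prevent this, I added support for using mmap(2) together with posix_fadvise(2), with the following logic: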
The function that indicates a block is no longer needed:
def advise_dont_need(fd, offset, length):
    """
    Announce that data in a particular location is no longer needed.

    Arguments:
    - fd (int): File descriptor.
    - offset (int): Beginning of the unneeded data.
    - length (int): Length of the unneeded data.
    """
    # TODO: macOS support
    if hasattr(os, "posix_fadvise"):
        # posix_fadvise(2) states that "If the application requires that data
        # be considered for discarding, then offset and len must be
        # page-aligned." When this code aligns the offset and length, the
        # advised area is widened under the presumption it is better to discard
        # more memory than needed than to leak it which could cause resource
        # issues.

        # If the offset is unaligned, extend it toward 0 to align it and adjust
        # the length to compensate for the change.
        aligned_offset = offset - offset % PAGE_SIZE
        length += offset - aligned_offset
        offset = aligned_offset

        # If the length is unaligned, widen it to align it.
        length -= length % -PAGE_SIZE

        os.posix_fadvise(fd, offset, length, os.POSIX_FADV_DONTNEED)
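The alignment arithmetic in advise_dont_need can be checked in isolation. The following is a minimal sketch, not part of the original script: the helper name align_range and the default page size of 4096 are assumptions made for illustration. It shows that the offset is rounded down and the length is rounded up, relying on the fact that in Python `x % -n` yields a value in the range (-n, 0].

```python
def align_range(offset, length, page_size=4096):
    """Widen (offset, length) so that both are multiples of page_size.

    Hypothetical helper that mirrors the arithmetic in advise_dont_need.
    """
    # Move the start down to the nearest page boundary.
    aligned_offset = offset - offset % page_size
    # Grow the length by the number of bytes the start moved.
    length += offset - aligned_offset
    # Round the length up: x % -n is zero or negative in Python,
    # so subtracting it adds the missing bytes up to the next multiple.
    length -= length % -page_size
    return aligned_offset, length


print(align_range(5000, 100))   # (4096, 4096)
print(align_range(4096, 8192))  # (4096, 8192)
print(align_range(4095, 2))     # (0, 8192)
```

With block-aligned reads such as the 1 MiB blocks in the logs, and assuming a page size that divides the block size, the range passes through unchanged, so the widening only matters for unaligned offsets or a short final block.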
The logic that reads the file:
with open(path, "rb", buffering=0) as file, \
        ProgressBar("Reading file") as progress, timer() as read_loop:
    size = file_size(file)

    if mmap_file:
        # At the time of this writing, mmap.mmap in CPython uses
        # st_size to determine the size of a file which will not
        # work with every file type which is why file size
        # autodetection (size=0) cannot be used here.
        fd = file.fileno()
        view = mmap.mmap(fd, size, prot=mmap.PROT_READ)

    try:
        while writer.error is None and hash_queue.error is None:
            # Skip offsets that are already in the block map.
            if offset in blocks:
                while offset in blocks:
                    if mmap_file:
                        advise_dont_need(fd, offset, block_size)
                    offset += block_size

                if not mmap_file:
                    file.seek(offset)

            if mmap_file:
                block = view[offset:offset + block_size]
                advise_dont_need(fd, offset, len(block))
            else:
                block = file.read(block_size)

            if not block:
                break

            bytes_read += len(block)

            while hash_queue.error is None:
                try:
                    hash_queue.put((offset, block), timeout=0.1)
                    offset += len(block)
                    progress.update(offset / size)
                    break
                except queue.Full:
                    pass
    finally:
        if mmap_file:
            view.close()
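When I run the script and monitor the output of free -h, I can see the buffer cache usage increase despite this logic. Is my logic incorrect, or is this a consequence of posix_fadvise(2) being advisory rather than mandatory?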
Here are some logs showing the length and offset values at the end of a script run with block_size set to 1048576:
offset=107296587776; length=1048576
offset=107297636352; length=1048576
offset=107298684928; length=1048576
offset=107299733504; length=1048576
offset=107300782080; length=1048576
offset=107301830656; length=1048576
offset=107302879232; length=1048576
offset=107303927808; length=1048576
offset=107304976384; length=0
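One detail in these logs may be worth isolating: the final call has length=0, because the slice taken at end of file is empty. According to the posix_fadvise(2) man page, a len of 0 means "until the end of the file" rather than "no bytes". At end of file that is most likely harmless, but a guard makes the intent explicit. The sketch below is an assumption-laden illustration; the wrapper name advise_dont_need_nonempty is hypothetical and does not appear in the original script.

```python
import os


def advise_dont_need_nonempty(fd, offset, length):
    """Issue POSIX_FADV_DONTNEED only for non-empty ranges.

    Hypothetical wrapper; returns True if posix_fadvise was called.
    """
    # posix_fadvise(2) treats a length of 0 as "through end of file",
    # so an empty block is skipped instead of being passed along.
    if length <= 0:
        return False
    # os.posix_fadvise is not available on every platform.
    if not hasattr(os, "posix_fadvise"):
        return False
    os.posix_fadvise(fd, offset, length, os.POSIX_FADV_DONTNEED)
    return True
```

In the read loop this would correspond to checking `if not block: break` before advising, so the empty final slice never reaches posix_fadvise.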