Advent of Code 2024 Day 9 – Disk Fragmenter

Advent of Code 2024 Day 9 – Disk Fragmenter

- 22 mins

Day 9: Disk Fragmenter

Part 1

Today’s input is a sequence of digits corresponding to a disk map. The example input is:

2333133121414131402

This disk map maps the blocks that files and free space are taking up on the disk. The digits alternate between showing the number of blocks that a file takes up, and then the number of free blocks.

So for example, 12345 would mean:

Also, every file has an associated ID number, which starts at 0 and increments by 1 for each new file on disk. So if we visualize the disk map above, with every block represented by a file ID or a . for free space, it would be:

0..111....22222

The example input above would be represented as:

00...111...2...333.44.5555.6666.777.888899

We would like to move blocks one at a time from the end of the desk to the leftmost free space block, until there are no free spaces between files.

For the 12345 example above, the disk map would look like this after moving blocks one at a time:

0..111....22222
02.111....2222.
022111....222..
0221112...22...
02211122..2....
022111222......

And for the example input, the disk map would look like this after moving blocks:

00...111...2...333.44.5555.6666.777.888899
009..111...2...333.44.5555.6666.777.88889.
0099.111...2...333.44.5555.6666.777.8888..
00998111...2...333.44.5555.6666.777.888...
009981118..2...333.44.5555.6666.777.88....
0099811188.2...333.44.5555.6666.777.8.....
009981118882...333.44.5555.6666.777.......
0099811188827..333.44.5555.6666.77........
00998111888277.333.44.5555.6666.7.........
009981118882777333.44.5555.6666...........
009981118882777333644.5555.666............
00998111888277733364465555.66.............
0099811188827773336446555566..............

Once we have compacted the disk map, we want to calculate the checksum of the files on the disk. This is done as such:

For the small example, the checksum is 60; and for the example input, the checksum is 1928.

We need to calculate the checksum for the compacted input disk map.

My Solution to Part 1

A couple things to note about the question:

Let’s go through my pseudocode solution:

First, I read the input disk map as a list of integers:

with open("input.txt", "r") as f:
	line = f.read().strip()
	int_line = [int(x) for x in line]

Converting the list to a list of alternating file IDs and free space blocks:

def convert(line: list) -> list:
    is_freespace = False
    id = 0
    diskmap = []
    for i in line:
        if is_freespace:
            for _ in range(i):
                diskmap.append(FREE_SPACE)
        else:
            for _ in range(i):
                diskmap.append(id)
            id += 1
        is_freespace = not is_freespace
    return diskmap

Breaking down the code above:

Compacting the disk map:

def make_contiguous1(diskmap: list) -> list:
    first_free_block = 0
    last_file_block = len(diskmap) - 1

    while first_free_block < last_file_block:
        while (
            first_free_block < len(diskmap) and diskmap[first_free_block] != FREE_SPACE
        ):
            first_free_block += 1

        while last_file_block >= 0 and diskmap[last_file_block] == FREE_SPACE:
            last_file_block -= 1

        if first_free_block < last_file_block:
            diskmap[first_free_block], diskmap[last_file_block] = (
                diskmap[last_file_block],
                diskmap[first_free_block],
            )

    return diskmap

Breaking down the code above:

By the end of the make_contiguous1 function, the disk map will be compacted.

Calculating the checksum:

files_only = takewhile(lambda x: x != FREE_SPACE, contiguous_diskmap)
checksum = sum(i * id for i, id in enumerate(files_only))
print(f"LOG: {checksum = }")

Breaking down the code above:

As always, I’ve only included the relevant parts of the code here, but to see my full solution, you can check out my Advent of Code GitHub repository.

Part 2

In part 2, we still need to compact the disk map as in part 1, but we cannot move blocks one at a time. Instead, we need to move the entire file to free space. We try to move each file from the end to free space on the left exactly once, and if we can’t move it to free space, we leave the file as is and try to move the next file.

So the example input disk map when compacted looks like this instead:

00...111...2...333.44.5555.6666.777.888899
0099.111...2...333.44.5555.6666.777.8888..
0099.1117772...333.44.5555.6666.....8888..
0099.111777244.333....5555.6666.....8888..
00992111777.44.333....5555.6666.....8888..

Going through the steps:

Calculating the checksum is the same as in part 1, so the new checksum for the example input is 2858.

We need to calculate the checksum for the compacted disk map as per the new rules.

My Solution to Part 2

My solution is a bit more complicated than part 1, as now we need to keep track of file sizes and free space sizes, and move files to free spaces based on their sizes.

I’ll go through my pseudocode solution:

The input file is read the same as in part 1. The conversion function is different:

def convert_with_size(line: list) -> list:
    is_freespace = False
    id = 0
    diskmap = []
    for i in line:
        if is_freespace:
            diskmap.append((FREE_SPACE, i))
        else:
            diskmap.append((id, i))
            id += 1
        is_freespace = not is_freespace
    return diskmap

Instead of just appending the file ID or free space block, I append a tuple of the file ID or free space block and its size.

The compaction function is different as well:

def make_contiguous2(diskmap: list) -> list:
    files = list(reversed([x for x in (diskmap) if x[0] != FREE_SPACE]))
    for file in files:
        swap_file(diskmap, file)
    return diskmap

This function finds all the files starting from the end of the disk map, and for each one, sees if it can be moved to a free space block to the left.

My swap_file function is where most of the logic is:

def swap_file(diskmap: list, file: tuple):
    file_index = diskmap.index(file)
    free_space_index = -1

    for i in range(file_index):
        if diskmap[i][0] == FREE_SPACE and diskmap[i][1] >= file[1]:
            free_space_index = i
            break

    if free_space_index == -1:
        return

    if diskmap[free_space_index][1] > file[1]:
        diskmap[free_space_index] = (FREE_SPACE, diskmap[free_space_index][1] - file[1])
        diskmap[file_index] = (FREE_SPACE, file[1])
        diskmap.insert(free_space_index, file)

    elif diskmap[free_space_index][1] == file[1]:
        diskmap[free_space_index], diskmap[file_index] = (
            diskmap[file_index],
            diskmap[free_space_index],
        )

    return

Breaking down the code above:

Once this is done, the checksum function is a bit more involved than part 1:

def checksum(diskmap: list):
    index = 0
    checksum = 0
    for x in diskmap:
        if x[0] != FREE_SPACE:
            for i in range(x[1]):
                checksum += (index + i) * x[0]
        index += x[1]
    return checksum

Explained:

Finally, I call the functions in order:

diskmap = convert_with_size(int_line)
contiguous_diskmap = make_contiguous2(diskmap)
print(f"LOGF: checksum for contiguous files {checksum(contiguous_diskmap)}")

Again, I’ve only included the relevant parts of the code here, check out my repository for the full solution.

Update: Optimizations!

My code runs in \(O(n^2)\) time, due to looking for free space blocks that can fit the file for each file. I can optimize this by using 10 priority queues to store the indices of free space blocks of size 0 to 9. This way, I can just pop the leftmost free space block that can fit the file from one of the priority queues. This will reduce the time complexity to \(O(n \log n)\).

The new part 2 logic is almost the same as before, but with a few changes:

Looking at my new conversion function:

def convert_with_heaps(line: list) -> tuple[list, list]:
    is_freespace = False
    id = 0
    diskmap = []
    heaps = [[] for _ in range(10)]
    for i in line:
        if is_freespace:
            # Push index of free space to the heap of the size of the free space
            heapq.heappush(heaps[i], len(diskmap))
            for _ in range(i):
                diskmap.append(FREE_SPACE)
        else:
            for _ in range(i):
                diskmap.append(id)
            id += 1
        is_freespace = not is_freespace
    return diskmap, heaps

Going over the differences from my originial conversion function from my Part 1 Solution:

The new make_contiguous2 function has a bit more changes:

def make_contiguous2_heap(diskmap: list, heaps: list) -> list:
    index = len(diskmap) - 1
    while index >= 0:
        if diskmap[index] == FREE_SPACE:
            index -= 1
            continue

        id = diskmap[index]
        file_width = 0
        # Get the width of the file
        while index >= 0 and diskmap[index] == id:
            file_width += 1
            index -= 1

        best_width = -1
        smallest_index = len(diskmap)
        # Find the leftmost index of free space that can fit the file
        for width in range(file_width, 10):
            if heaps[width]:
                if smallest_index > heaps[width][0]:
                    smallest_index = heaps[width][0]
                    best_width = width

        if smallest_index == len(diskmap):
            continue
        if smallest_index > index:
            continue

        # Remove the smallest index from the heap
        # In-place swap the file with the free space
        heapq.heappop(heaps[best_width])
        for j in range(file_width):
            diskmap[smallest_index + j] = id
            diskmap[index + j + 1] = FREE_SPACE
        # Push the new smaller free space to the heap
        heapq.heappush(heaps[best_width - file_width], smallest_index + file_width)

    return diskmap

Again this is more similar to my part 1 make_contiguous function, but with a few changes:

The rest of the code is the same as Part 1, but now I call the new functions:

def main3():
    with open("input.txt", "r") as f:
        line = f.read().strip()
        int_line = [int(x) for x in line]
    diskmap, heaps = convert_with_heaps(int_line)
    contiguous_diskmap = make_contiguous2_heap(diskmap, heaps)
    checksum = sum(
        i * id for i, id in enumerate(contiguous_diskmap) if id != FREE_SPACE
    )
    print(f"LOGF: {checksum = }")

This code is more optimized than my original solution, and runs much faster than my original solution.

LOG: checksum = files 6415666220005
Function 'main2' executed in 3.6767s
LOG: checksum = 6415666220005
Function 'main3' executed in 0.0288s

I took a lot of time to get to my Part 2 solution today. The problems are only going to get harder from here, so we’ll see how long I last. I hope you enjoyed reading my solution and that’s it for Day 9!

Vinesh Benny

Vinesh Benny

A Software Engineer learning about different things in life and otherwise