Nachos Runtime Conservative Garbage Collector by Collin Jackson written 12/9/2003 for CS 421.5 Assignment 7 Extra Credit ======== FOREWORD (added 8/28/2005) ======== I am publishing my README because I find it amusing. I have not included my source code because it might spoil the Operating Systems assignment for some other wayward computer science student. If think my source code would be useful to you for some purpose other than plagiarism, let me know and I will happily cough it up. ============ INTRODUCTION ============ This directory structure contains an implementation of the Nachos operating system. Nachos is maintained by Tom Anderson and is copyright (c) 1992-1993 The Regents of the University of California; see the file "threads/copyright.h" for more details. The threading, user program, virtual memory, and filesystem components were written by Boting Zhang and myself as a group project for CS 422 (Operating Systems) during the spring 2003 term. I modified it by implementing a new system call, malloc, which returns blocks of memory that may be garbage collected. Note that the network component of Nachos is not implemented. =============== GETTING STARTED =============== You will need about 60 MB of storage to compile this project. I recommend untarring it in the /tmp folder. To compile Nachos, run "make" from the filesys/ directory. To compile the test cases, run "make" from the test/ directory. Use the "nachos" executable in the filesys/ directory, and the test cases in the test/ directory. For example, to run "gctest0", run this command from the test/ directory: ../filesys/nachos -f -cp gctest0 gctest0 -x gctest0 The "-f" flag means "format the Nachos filesystem". The "-cp" flag copies "gctest0" into the Nachos filesystem. The "-x" flag tells Nachos to execute "gctest0". You may also wish to use the "-d [flags]" option to display debugging messages. Some useful flags are as follows: g - displays garbage collection debugging information v - displays virtual memory allocation information e - displays process exit codes To aid testing, I have provided a script (testgc) in the tests/ directory. See the TESTCASES section of this README file for more information. ============== MODIFIED FILES ============== Unfortunately, the compiler (egcs) that we used last semester for Operating Systems is no longer available in the zoo. With assistance from Jim Faulkner and Professor Arvind Krishnamurthy, I was able to get Nachos to compile using a different compiler and C preprocessor (g++296 and 3.2.2/cpp0). Minor changes had to be made to the filesystem and thread modules to allow Nachos to run. For example, realloc was supported by egcs but is not available in g++296. The files I modified for the garbage collector were as follows: in userprog/: syscall.h - added malloc system call declaration exception.cc - added malloc system call addrspace.h, addrspace.cc - address spaces now have a unique GC added Malloc method that calls into GC gc.h, gc.cc - new files for garbage collector in test/: gctest0-9.c - test cases for the garbage collector =========================== USING THE GARBAGE COLLECTOR =========================== The malloc system call is available for user programs that run on top of Nachos. It has the following declaration in syscall.h: /* Returns a pointer to corresponding to a newly allocated region of * the requested size. May trigger garbage collection or heap increase * and is unsafe for threads; if you want to use fork and join, please * do your memory allocation manually with mmap and sbrk. Returns 0 if * the request could not be satisfied. */ void *malloc(int size); Here are some important facts to keep in mind when using pointers from malloc: 1) Do not mask, tag, or XOR your pointers. Doing so may cause the garbage collector to erroneously collect accessible data. 2) If you no longer need a large data structure, you should set all your pointers to it to zero so that it can be reclaimed. 3) Storage cycles can be reclaimed normally. 4) Having a pointer to anywhere inside an allocated region prevents the region from being deallocated. ====================== IMPLEMENTATION DETAILS ====================== The GC creates a heap using the mmap system call I implemented in the virtual memory assignment for CS 422. Only one segment is created for the heap (it would be possible to have a segment for each allocation, but for modularity reasons I chose not to do so). The heap is initialized to 1 KB when malloc is first called. If garbage collection fails to reclaim sufficient space to satisfy an allocation request from the user program, the heap size is doubled. For very large requests, it grows to match the size of the demand. Because our implementation of Nachos uses a fixed-size swap file, heaps of more than about 10 KB are not allowed. The malloc system call returns a null pointer if the request could not be fulfilled by garbage collection or a heap increase. My GC maintains a linked list of the allocated segments and uses a first-fit linear search algorithm to allocate new space. This method is slower than maintaining a free list [1], though both are prone to fragmentation. When a suitable gap between allocated segments, cannot be found, the garbage collector method Collect() is called. I implemented a conservative mark-sweep garbage allocator that also tracks pointers inside objects. The algorithm can be described as follows: 1) The registers are scanned. Accessible segments are marked. 2) The user memory space is scanned, both including global variables and stack variables. Accessible segments are marked. 3) I construct a stack out of the marked segments. When a segment is added to the stack, its "DONE" bit is set so that it will not be added to the stack again. 4) I pop elements off the stack, scan their contents, and mark any other segments they point to. Segments marked in this manner that do not have their "DONE" bit set are added to the stack. 5) I scan the segment list, removing unmarked segments. The segment list is sorted by address and is turned into an array each time garbage collection occurs. Most registers and memory words will be outside the bounds of the heap; they are ignored. Determining the segment pointed to by a pointer can be done efficiently using binary search; this functionality is implemented in my Mark() function. I used many ideas from the paper "Garbage Collection in an Uncooperative Environment" [1], by Boem and Weiser. However, my garbage collector differs from theirs in several significant ways: 1) I check for pointers into the middle of an allocated space. As a result, the optimal data structure for searching the segment list is a sorted array (for binary search), which I implemented with Mark(), rather than a hash table on pointer values. 1) Rather than maintaining a free list, I used a first-fit allocator. This method would allow efficient implementation of realloc() [1]. 2) I use the lowest available address for the base address of the heap. This value is generally around 2000 or so, depending on the code size. I wanted to debug legitimate pointer/integer collisions so I chose this suboptimal value. A better idea for the long run would be to choose a large integer that is unlikely to occur in user code. Change HeapBaseAddr in gc.h to get this functionality. 3) I did not implement an explicit system call free(), instead relying on user programs to zero any pointers they no longer need. 4) I did not implement two different versions of malloc for atomic and composite objects. All objects are considered composite. ======= TESTING ======= I provide ten test cases: gctest0.c Identifying collectible segments gctest1.c Segment array of segments gctest2.c Linked list of segments gctest3.c Various stress tests gctest4.c XORing pointers to mask them gctest5.c Cycles of storage gctest6.c Large binary trees gctest7.c Segments of various sizes gctest8.c Multiple threads and GC gctest9.c Memory-mapped files and GC For more details, please see the comment at the beginning of each test case. To aid testing, I have provided a script (testgc) in the test/ directory. It calls Nachos on each test case. If the test succeeds, it will report "gctestX PASSED," where X is the test number. Otherwise, it will report "gctestX FAILED." The test cases shut down Nachos when they are finished to allow the script to move on to the next test case. Because the test cases are designed to test specific characteristics of the memory environment, they assume they are the only program running. Trying to run a test case from the shell may cause it to erroneously report failure because of the drastically reduced amount of space remaining in the swap file. Please be patient when running this script. Nachos takes a long time to load an executable file into memory, and some of the tests cause a lot of page faults, memory waste, and meaningless computation to stress-test the garbage collector. It's not my garbage collector that's slowing things down; even programs that don't use it are slow. ========== REFERENCES ========== [1] Hans-Juergen Boehm and Mark Weiser. "Garbage collection in an uncooperative environment." Software Practice and Experience, 18(9):80820, 1988.