Reducing Allocation Errors in Testbeds



Network testbeds have become widely used in computer science, both for evaluating research technologies and for hands-on teaching. This growing use can naturally lead to oversubscription and resource allocation failures, as limited testbed resources cannot meet the increasing demand.

This project examines the causes of resource allocation failures on the DeterLab testbed and finds three main culprits that create perceived resource oversubscription even when nodes are available: (1) overuse of mapping constraints by users, (2) testbed software errors, and (3) suboptimal resource allocation. We propose solutions that could resolve these issues and reduce allocation failures to 57.3% of the baseline. In the remaining cases, real resource oversubscription occurs. We examine testbed usage patterns and show that a small fraction of unfair projects starve others of resources under the current first-come-first-served allocation policy. Because testbeds are used interactively, traditional fair-sharing techniques are not suitable solutions. We then propose two novel approaches, Take-a-Break and Borrow-and-Return, that temporarily pause long-running experiments. These approaches can reduce resource allocation failures to 25% of the baseline by gently prolonging 1 to 2.5% of instances. While our investigation uses DeterLab testbed data, it should apply to all testbeds that run Emulab software.



Please click here for most of the data we used in our publication above.