In this work we seek to improve the current testbed infrastructure by making it more robust and more efficient.
We examined the causes of resource allocation failures on the DeterLab testbed and found three main culprits that create perceived resource oversubscription even when available nodes exist: (1) overuse of mapping constraints by users, (2) testbed software errors, and (3) suboptimal resource allocation. We proposed solutions that could resolve these issues and reduce allocation failures to 57.3% of the baseline. In the remaining cases, real resource oversubscription occurs. We examined testbed usage patterns and showed that, under the current first-come-first-served allocation policy, a small fraction of unfair projects starve others of resources. Because testbeds are used interactively, traditional fair-sharing techniques are not suitable. We therefore proposed two novel approaches, Take-a-Break and Borrow-and-Return, that temporarily pause long-running experiments. These approaches can reduce resource allocation failures to 25% of the baseline case by gently prolonging 1–2.5% of instances.
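As a rough illustration of the general idea of pausing long-running experiments (not the Take-a-Break or Borrow-and-Return algorithms themselves), the toy Python sketch below contrasts plain first-come-first-served admission with a variant that pauses long-running experiments to admit a request that would otherwise fail. The `Experiment` class, the `allocate_with_pauses` function, the 24-hour `min_hours` threshold, and the longest-running-first selection rule are all hypothetical assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    # Hypothetical record of a running experiment (illustration only).
    name: str
    nodes: int
    hours_running: float


def allocate_with_pauses(free_nodes, running, request_nodes, min_hours=24.0):
    """Return (granted, paused) for a request needing request_nodes nodes.

    Plain FCFS grants the request only if enough nodes are free right now.
    This assumed variant, when that check fails, pauses experiments that
    have run for at least min_hours (longest-running first) until enough
    nodes are reclaimed. If even that is not enough, nothing is paused.
    """
    if free_nodes >= request_nodes:
        return True, []  # FCFS would succeed; no pausing needed

    # Only sufficiently long-running experiments are eligible for a pause.
    candidates = sorted(
        (e for e in running if e.hours_running >= min_hours),
        key=lambda e: e.hours_running,
        reverse=True,
    )

    paused = []
    available = free_nodes
    for exp in candidates:
        if available >= request_nodes:
            break
        paused.append(exp)
        available += exp.nodes

    if available >= request_nodes:
        return True, paused
    return False, []  # cannot satisfy the request; leave everyone running


running = [
    Experiment("a", nodes=10, hours_running=100.0),
    Experiment("b", nodes=5, hours_running=2.0),
    Experiment("c", nodes=8, hours_running=48.0),
]
granted, paused = allocate_with_pauses(3, running, 12)
print(granted, [e.name for e in paused])
```

With 3 free nodes and a 12-node request, FCFS alone would fail; pausing only the longest-running experiment `a` frees enough nodes, while the short-lived experiment `b` is never touched. Limiting pauses to long-running experiments is one plausible way to keep the number of prolonged instances small.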