Today, industry and academic research drive progress in the systems, networking and security fields at lightning speed. But validation and demonstration of research solutions are still largely ad hoc, and are frequently performed using distributed, large-scale and complex experiments on network testbeds. This project builds a strong foundation for rigorous and repeatable testbed-based experimentation by providing testbed-supported experiment lifecycles. Our foundation, called Elie, consists of a new experiment representation and several supporting services that make experiment workflows robust, fault tolerant, and easy to share and reuse. Our experiment representation, called DEW (Distributed Experiment Workflow), enables specification of experiment workflows in a human-readable and machine-parsable manner. Elie's supporting services are basic experiment orchestration, fault detection, and archiving, as well as tools that translate DEW to and from current experiment representations and manual workflows. Outcomes of this project will significantly improve the ease of testbed experimentation by offloading tedious, repetitive and detail-sensitive tasks to testbeds. This will shorten experiment duration and improve the quality and reliability of results.
Our work will further make experiments more robust to failure. Finally, this work will enable easy sharing and reuse of experiments with minimal user effort, and will facilitate repeatability and reproducibility. Overall, such advances in testbed experimentation will enable vertical development (building upon the work of others) in fields that use testbeds, such as distributed systems, networking and cybersecurity. Sharing and reuse will also improve the quality of scientific research in these fields by enabling the creation of larger and more complex experiments built on the shared artifacts of other researchers.
Here is an example experiment script in bash:
And here is the same experiment represented in DEW:
We can also represent DEW workflows as a DAG (directed acyclic graph). Here is the same experiment as a DAG, with nodes representing actions and color representing the actor where the action occurs (red=attacker, blue=victim server, green=legitimate client).
Notice that start_log, restart_server and start_detector are independent actions, but this may not be what the user wants. Instead, the user may want to run those actions before starting legitimate traffic. Here is the corrected DAG:
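The effect of this correction can also be sketched in code. The snippet below is only an illustration, not DEW syntax or part of the Elie tooling: it models the DAG as a plain Python dictionary and uses the standard-library `graphlib` module to compute which actions may run concurrently. The action names start_log, restart_server and start_detector come from the text above; `start_legit_traffic`, `build_dag` and `ready_batches` are hypothetical names introduced for this example.

```python
from graphlib import TopologicalSorter

# Setup actions named in the text above.
SETUP_ACTIONS = {"start_log", "restart_server", "start_detector"}


def build_dag(ordered_setup: bool) -> dict[str, set[str]]:
    """Map each action to the set of actions it must wait for.

    "start_legit_traffic" is a hypothetical name for the legitimate-traffic action.
    """
    dag = {action: set() for action in SETUP_ACTIONS}
    # Uncorrected DAG: traffic has no prerequisites and may start at any time.
    # Corrected DAG: traffic waits for every setup action to finish.
    dag["start_legit_traffic"] = set(SETUP_ACTIONS) if ordered_setup else set()
    return dag


def ready_batches(dag: dict[str, set[str]]) -> list[list[str]]:
    """Group actions into batches; actions within a batch can run concurrently."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    print("uncorrected:", ready_batches(build_dag(ordered_setup=False)))
    print("corrected:  ", ready_batches(build_dag(ordered_setup=True)))
```

In the uncorrected graph all four actions fall into a single batch, so legitimate traffic could start before logging or the detector is ready. In the corrected graph the three setup actions form the first batch and the traffic action runs only after they complete.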