field | meaning |
---|---|
start_time | flow start in epoch time |
end_time | flow end in epoch time |
s_IP | anonymized source IP |
s_port | original source port |
d_IP | anonymized destination IP |
d_port | original destination port |
proto | protocol number from IANA |
flags | TCP flags, cumulative, converted into decimal format (for ICMP flows, this field can be interpreted as ICMP type and code) |
bytes | total bytes on the flow, upsampled |
pkts | total packets on the flow, upsampled |
label | A - attack, B - benign, N - not labeled |
ex_src | o - old source, n - new source, N - not labeled |

Flow fields
Example:
#start_time end_time s_IP s_port d_IP d_port proto flags bytes pkts label ex_src
1589324697 1589324697 14.181.72.246 64330 137.221.131.228 443 6 17 163840 4096 B o
This record means there was a flow from 1589324697 to 1589324697 (in practice, only one packet was sampled, at time 1589324697) from source 14.181.72.246 port 64330 to destination 137.221.131.228 port 443. The protocol was TCP (6) and the flags value was 17 (16 for ACK plus 1 for FIN). The sampling rate for this flow was 1:4096 packets, and the single captured packet was 40 bytes long; upsampled, this maps to 163,840 bytes and 4,096 packets on this flow. The flow was marked as benign (B). The source is an old source (o): it has sent benign traffic to the same destination in the past.
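The arithmetic in this example can be checked with a short script. This is a minimal sketch, assuming flow records are whitespace-separated lines in the column order of the header shown above; the helper names `parse_flow` and `tcp_flag_names` are hypothetical and not part of the released data or tools.

```python
# Column names follow the header line of the example record.
FIELDS = ["start_time", "end_time", "s_IP", "s_port", "d_IP", "d_port",
          "proto", "flags", "bytes", "pkts", "label", "ex_src"]
INT_FIELDS = {"start_time", "end_time", "s_port", "d_port",
              "proto", "flags", "bytes", "pkts"}

# Standard TCP flag bits, as decimal values.
TCP_FLAGS = {1: "FIN", 2: "SYN", 4: "RST", 8: "PSH", 16: "ACK", 32: "URG"}


def parse_flow(line):
    """Split one flow record into a dict (assumes whitespace-separated columns)."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return {name: int(value) if name in INT_FIELDS else value
            for name, value in zip(FIELDS, values)}


def tcp_flag_names(flags):
    """Return the names of the TCP flags set in a cumulative decimal value."""
    return [name for bit, name in TCP_FLAGS.items() if flags & bit]


flow = parse_flow("1589324697 1589324697 14.181.72.246 64330 "
                  "137.221.131.228 443 6 17 163840 4096 B o")
print(tcp_flag_names(flow["flags"]))   # ['FIN', 'ACK']
print(flow["bytes"] // flow["pkts"])   # 40 bytes per sampled packet
```

Dividing the upsampled byte count by the upsampled packet count recovers the average sampled packet size only because both fields are scaled by the same sampling rate.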
field | meaning |
---|---|
record type | C for commercial defense, I for our inferred record |
ID | unique ID for this record |
start_time | attack start, as seen by C in epoch time (can be -1 if we don't have mitigation end report) |
end_time | attack end, as seen by C in epoch time |
target_p | anonymized target /24 prefix |
type | attack type, cumulative, shown as decimal (see attack type table for further details) |
severity | low/medium/high |

Alert fields
Attack types and their values as decimal numbers are shown below:
type | signature | decimal value |
---|---|---|
DNS Amplification | src port 53 and proto udp | 1 |
ICMP Flood | proto icmp | 2 |
Total Traffic | | 4 |
IP Fragmentation | src port 0 | 8 |
CLDAP Amplification | src port 389 | 16 |
TCP SYNACK Amplification | proto tcp and flags & 18 != 0 | 32 |
TCP RST Flood | proto tcp and flags & 1 != 0 | 64 |
UDP Flood | proto udp | 128 |
NTP Amplification | proto udp and src port 123 | 256 |
mDNS Amplification | src port 5353 and proto udp | 2048 |
TCP SYN Flood | proto tcp and flags & 2 != 0 | 8192 |
Chargen Amplification | src port 19 and proto udp | 16384 |
L2TP Amplification | src port 1701 and proto udp | 32768 |
Memcached Amplification | src port 11211 and proto udp | 65536 |
DNS Flood | dst port 53 and proto udp | 131072 |
RPCbind Amplification | src port 111 and proto udp | 262144 |
TCP ACK Flood | proto tcp and flags & 16 != 0 | 524288 |

Attack type table
Attack types are encoded as cumulative values: the type field is the sum of the decimal values of all attack types observed. For example, an attack that has fragmented flows and CLDAP amplification flows is encoded as type 8+16=24.
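Because every value in the attack type table is a distinct power of two, a cumulative type can be split back into its components with bitwise tests. The sketch below illustrates this; the function name `decode_attack_type` is hypothetical, and the mapping simply copies the table (values that do not appear in the table, such as 512, are reported as unknown rather than guessed).

```python
# Decimal value -> attack type, copied from the attack type table.
ATTACK_TYPES = {
    1: "DNS Amplification",
    2: "ICMP Flood",
    4: "Total Traffic",
    8: "IP Fragmentation",
    16: "CLDAP Amplification",
    32: "TCP SYNACK Amplification",
    64: "TCP RST Flood",
    128: "UDP Flood",
    256: "NTP Amplification",
    2048: "mDNS Amplification",
    8192: "TCP SYN Flood",
    16384: "Chargen Amplification",
    32768: "L2TP Amplification",
    65536: "Memcached Amplification",
    131072: "DNS Flood",
    262144: "RPCbind Amplification",
    524288: "TCP ACK Flood",
}

# All keys are distinct powers of two, so their sum is the full bit mask.
KNOWN_MASK = sum(ATTACK_TYPES)


def decode_attack_type(value):
    """Return (list of attack type names, leftover bits not in the table)."""
    names = [name for bit, name in ATTACK_TYPES.items() if value & bit]
    unknown = value & ~KNOWN_MASK
    return names, unknown


print(decode_attack_type(24))
# (['IP Fragmentation', 'CLDAP Amplification'], 0)
```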
Four confounders prevent us from labeling traffic directly from attack alerts. First, alert start and stop times may lag behind the actual onset of attacks, since commercial defenses delay alerts in some cases to reduce false positives. Second, the link observed by C differs from the set of links that generate our NetFlow records, so some attacks may be observable in our records but not by C, and vice versa. Third, our traffic records are generated from sampled (and in some cases double-sampled) traffic and may therefore be skewed. Fourth, our attack alerts are anonymized at the prefix level, whereas our flows are anonymized at the IP level.
We label traffic by performing the following steps:
Labeling
We also release our matched alerts. The table below details their fields. We show the matched alerts alongside C's alerts.
field | meaning |
---|---|
record type | C for commercial defense, I for our inferred record |
ID | unique ID for this record |
start_time | attack start, inferred in epoch time |
end_time | attack end, inferred in epoch time |
target | target IP, inferred |
type | attack type, inferred, cumulative, shown as decimal (see attack type table for further details) |
cID | unique ID of commercial attack record that matches this inferred record |

Alert match fields
Additionally, we would like to assemble a collection of algorithms that may help label DDoS attack data accurately. We expect participants to develop their own attack detection approaches using the data we will release. If we (as a community) label more data during the hackathon, participants are welcome to revise their algorithms as needed. All code should be uploaded to our shared Google Drive prior to evaluation. The code should ingest all flow fields except the label and produce flows labeled B or A.
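One possible shape for a submission is sketched below. It is only an illustration of the input/output contract described above, under several assumptions: flows arrive as whitespace-separated lines in the column order of the flow fields table, lines starting with `#` are headers, and the output repeats each flow with the label column rewritten. The names `label_flows` and `always_benign` are hypothetical, and the placeholder detector is not a real detection algorithm.

```python
FIELDS = ["start_time", "end_time", "s_IP", "s_port", "d_IP", "d_port",
          "proto", "flags", "bytes", "pkts", "label", "ex_src"]


def always_benign(features):
    """Placeholder detector (assumption): replace with a real algorithm."""
    return "B"


def label_flows(lines, detect=always_benign):
    """Yield each flow line with its label replaced by the detector's output.

    The detector receives every flow field except `label`, so it cannot
    peek at any existing label.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        values = line.split()
        if len(values) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
        flow = dict(zip(FIELDS, values))
        features = {k: v for k, v in flow.items() if k != "label"}
        label = detect(features)
        if label not in ("A", "B"):
            raise ValueError(f"detector must return 'A' or 'B', got {label!r}")
        flow["label"] = label
        yield " ".join(flow[name] for name in FIELDS)


sample = [
    "#start_time end_time s_IP s_port d_IP d_port proto flags bytes pkts label ex_src",
    "1589324697 1589324697 14.181.72.246 64330 137.221.131.228 443 6 17 163840 4096 N o",
]
for labeled in label_flows(sample):
    print(labeled)
```

Keeping the detector as a separate function that takes a dictionary of features makes it straightforward to swap in a revised algorithm if more data is labeled during the hackathon.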
Hackathon Instructions
Please follow these steps to participate in the hackathon.