{"id":22355288,"url":"https://github.com/drdub/tcp2d","last_synced_at":"2025-03-26T12:43:12.724Z","repository":{"id":6997232,"uuid":"8261764","full_name":"DrDub/tcp2d","owner":"DrDub","description":"A utility to produce data sets for machine learning and statistical analysis from tcp streams.","archived":false,"fork":false,"pushed_at":"2013-01-16T05:53:40.000Z","size":2607,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-31T13:44:37.121Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DrDub.png","metadata":{"files":{"readme":"README","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-02-18T04:39:32.000Z","updated_at":"2019-02-21T22:00:46.000Z","dependencies_parsed_at":"2022-09-17T14:43:25.803Z","dependency_job_id":null,"html_url":"https://github.com/DrDub/tcp2d","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DrDub%2Ftcp2d","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DrDub%2Ftcp2d/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DrDub%2Ftcp2d/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DrDub%2Ftcp2d/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DrDub","download_url":"https://codeload.github.com/DrDub/tcp2d/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245658982,"owners_count":206
51520,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-04T13:17:30.833Z","updated_at":"2025-03-26T12:43:12.667Z","avatar_url":"https://github.com/DrDub.png","language":"C","readme":"//ABOUT\n\ntcp2d is a result of my wanting to produce pcap based data sets in Weka for testing machine learning algorithms.  The specific features I needed were the ability to 'mash' tcp streams by displaying the request and response packets side by side (for flow classification) and the ability to easily classify packets within a data set by label.\n\nI found several projects which allowed for transferring pcap files to csv; however, they lacked the features I needed in order to build nice Weka data sets.  So, I started working on a (very simple and limited) tcp stream reassembler so that I could mash and classify packets and create the kinds of data sets I wanted to test.  It seems I wasn't alone in this either: https://list.scms.waikato.ac.nz/pipermail/wekalist/2010-August/049433.html\n\nI've also added various filters for payload data and a formatting scheme that allows any field in the ip/tcp header to be displayed as a column of data, as well as the ability to dump binaries of reassembled payloads (see the '-B' option).  I'm also working on auto generating ARFF headers.  I feel like this project is just starting to get to a usable state, and I'll be building some training sets with it in the coming weeks/months that I'll share.  Hopefully some other people out there can find use in it as well.  
\n\n--M\n\n//USAGE OPTIONS\n\nOptions with arguments:\n-f --filename     \t  (path to a packet capture file to be used)\n-o --output \t  \t  (name/path to an output file)\n-d --delimitor\t  \t  (change the field delimiter, default is ',')\n-c --classification-rules (allows classification of individual packets)\n-r --rule-file\t\t  (load classification rules from a file)\n-l --default-label\t  (set default label for classification rules)\n-m --format\t  \t  (change the fields displayed)\n-h --format-file\t  (load format options from a file)\n\nformat example: -m 'srcip;dstip;tcpflags;ip_len;payload;'\nwill output the source ip, destination ip, tcp flags, ip length, and payload.\nAll data in the header structs from netinet/ip.h and netinet/tcp.h is available; field names generally follow the struct variable name (e.g. ip_len), and each must be followed by a semicolon.\n\nFlags:\n-L --list-conversations (shows all tcp conversations available in a data set)\n-A --arff-header  (adds an arff header to the top of any output)\n-H --no-header\t  (disables column headings)\n-T --text-payload (remove all non-printable values from payloads)\n-X --hex-payload  (show all payloads as hex, most useful for document classification of payloads)\n-S --split-output (split output streams into individual files)\n-M --mash\t  (print requests and responses on a single line)\n-B --binary-dump  (dumps binary payloads from streams into individual files for pcap mining)\n-D --keep-duplicates (turns off dropping of duplicate packets)\n-Q --quiet\t  (disable stdout)\n-N --no-wrap\t  (print results as a single line)\n-! 
--wait-for-args(load pcap into memory and wait for arguments)\n\n//FOR INFO ON FORMATTING AND LABELING SEE:\n\t\n\tHOW-TO-FORMAT-OUTPUT\n\tand\n\tHOW-TO-LABEL-PACKETS\n\n//A FEW USAGE EXAMPLES\n\nmisc.)\n\nDumping binary files for each conversation recorded into current directory: ./tcp2d -f \u003cpcap\u003e -o \u003coutfile\u003e -B\n\nSaving regular output to individual files: ./tcp2d -f \u003cpcap\u003e -o \u003coutfile\u003e -S \n\n1.) Mashed output of sequence number, ack number, and tcpflags.\n\n./tcp2d -f samples/browsing_google_for_gnu.pcap -m \"seq;ack;tcpflags;\" -M\n\nseq_number,ack_number,fin,syn,rst,psh,ack,urg,response_seq_number,response_ack_number,response_fin,response_syn,response_rst,response_psh,response_ack,response_urg\n2180390404,1220652710,0,0,0,1,1,0,1220652710,1190600196,0,0,0,1,1,0\n1190600196,297971366,0,0,0,0,1,0,297971366,200809988,0,0,0,1,1,0\n200809988,3670191782,0,0,0,0,1,0,3670191782,3505921540,0,0,0,0,1,0\n3505921540,2378411686,0,0,0,0,1,0,2378411686,2516131332,0,0,0,0,1,0\n\n2.) The default display (\"default;\") which is also equivalent to the format options shown below:\n\n./tcp2d -f samples/browsing_google_for_gnu.pcap -m \"ts;seq;ack;srcip;srcport;dstip;dstport;\"\n\ntimestamp|sequence#|acknowledgement#|srcip|srcport|dstip|dstport\n\n1357328815.3646,3859921346,0,199.47.216.173,443,192.168.1.104,43881\n1357328792.707250,333991214,3981934104,192.168.1.104,36907,199.47.219.159,443\n1357328792.780175,3981934104,0,199.47.219.159,443,192.168.1.104,36907\n1357328790.542219,1974316442,1610865336,199.47.218.160,443,192.168.1.104,45379\n1357328790.542259,1610865336,2595073434,192.168.1.104,45379,199.47.218.160,443\n1357328790.556821,2595073434,1610865336,199.47.218.160,443,192.168.1.104,45379\n\n\n3.) 
Showing the default display along with tcp flags, ip length, and a text only payload:\n\n./tcp2d -f samples/browsing_google_for_gnu.pcap -m \"default;tcpflags;ip_len;payload;\" -T\n\npayload|timestamp|sequence#|acknowledgement#|srcip|srcport|dstip|dstport|fin|syn|rst|psh|ack|urg|ip_len\n\n\"POST / HTTP/1.1^M\nHost: ocsp.thawte.com^M\nUser-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:17.0) Gecko/20100101 Firefox/17.0^M\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8^M\nAccept-Language: en-US,en;q=0.5^M\nAccept-Encoding: gzip, deflate^M\nConnection: keep-alive^M\nContent-Length: 115^M\nContent-Type: application/ocsp-request^M\n^M\n0q0o0M0K0I0     ^F^E+^N^C^B^Z^E\",1357328845.532917,955990035,2680340203,192.168.1.104,50674,199.7.52.72,80,1,0,0,1,1,0,65025\n\"HTTP/1.0 200 Ok^M\nlast-modified: Wed, 02 Jan 2013 01:41:24 GMT^M\nexpires: Wed, 09 Jan 2013 01:41:24 GMT^M\ncontent-type: application/ocsp-response^M\ncontent-transfer-encoding: binary^M\ncontent-length: 1084^M\ncache-control: max-age=366841, public, no-transform, must-revalidate^M\ndate: Fri, 04 Jan 2013 19:47:23 GMT^M\nnncoection: close^M\nConnection: Keep-Alive^M\n^M\n0\u003c82\u003e^D8\n^A\",1357328851.485156,2680340203,251478035,199.7.52.72,80,192.168.1.104,50674,0,0,0,1,1,0,50437\n\n4.) 
hex payloads, syn flags, ack flags, ip offsets\n\n./tcp2d -f samples/browsing_google_for_gnu.pcap -m \"synflag;ackflag;ip_off;payload;\" -X\n\n 656e 742d 6c62 0764 726f 7062 6f786e 742d 6c62 0764 726f 7062 6f78 0363 6f6d \",0,0,64\n\"0101 080\",0,1,64\n\"0204 05b4 0402 0\",0,0,64\n\"0101 080a 0148 c\",0,1,64\n\"0101 080\",0,1,64\n\"0101 080a 0148 c\",0,1,64\n\"0101 080\",0,1,64\n\"0101 080\",0,0,64\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdrdub%2Ftcp2d","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdrdub%2Ftcp2d","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdrdub%2Ftcp2d/lists"}