The current release is 0.21.0 (October 30, 2020).
• Documents: http://bioinf.shenwei.me/csvtk/
• Usage: http://bioinf.shenwei.me/csvtk/usage/
• Tutorial: http://bioinf.shenwei.me/csvtk/tutorial/
From the author's GitHub page:
Features
Similar to the FASTA/Q formats in bioinformatics, CSV/TSV are basic and ubiquitous file formats in both bioinformatics and data science.
People usually process tabular data with spreadsheet software such as MS Excel. However, this is all done by clicking and typing, which is not automated and is time-consuming to repeat, especially when you want to apply similar operations to different datasets or for different purposes.
You can also perform some CSV/TSV manipulations with shell commands, but extra code is needed to handle the header line, and shell commands cannot select columns by name either.
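To make the header problem concrete, here is a minimal Python sketch (standard library only, not part of csvtk) of selecting columns by name while treating the first line as a header, which is roughly the work that 'csvtk cut -f' spares you. The function name cut_by_name and the sample data are hypothetical. Unlike a plain 'cut -d,', the csv module also copes with quoted fields that contain commas.

```python
import csv
import io


def cut_by_name(text, fields, delimiter=","):
    """Keep only the named columns (hypothetical helper), treating row 1 as the header."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    # Map each requested column name to its position; raises ValueError if a name is missing.
    idx = [header.index(f) for f in fields]
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(fields)
    for row in reader:
        writer.writerow([row[i] for i in idx])
    return out.getvalue()


data = "id,name,score\n1,foo,3.5\n2,bar,4.0\n"
print(cut_by_name(data, ["name", "id"]), end="")
```

This prints the header "name,id" followed by "foo,1" and "bar,2": columns come out in the requested order, not the file's order.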
csvtk is convenient for rapid data investigation and is also easy to integrate into analysis pipelines. It can save you a lot of time you would otherwise spend writing Python/R scripts.
• Cross-platform (Linux/Windows/Mac OS X/OpenBSD/FreeBSD)
• Lightweight and ready to use out of the box: no dependencies, no compilation, no configuration
• Fast, with multi-CPU support (some commands)
• Practical functions provided by 45 subcommands
• Supports STDIN and gzipped input/output files, making it easy to use in pipes
• Most subcommands support unselecting fields and fuzzy fields, e.g. -f "-id,-name" for all fields except "id" and "name", and -F -f "a.*" for all fields with the prefix "a."
• Supports some common plots (see the Plotting subcommands listed below)
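The unselect and fuzzy-field syntax described in the list above can be approximated as follows. This is a Python sketch of the documented behaviour, not csvtk's own code: the function name select_fields is hypothetical, and using fnmatch-style wildcards for -F and returning selected fields in pattern order are assumptions.

```python
from fnmatch import fnmatchcase


def select_fields(header, spec, fuzzy=False):
    """Approximate a csvtk-style -f spec (hypothetical helper): '-x' unselects, '*' is a wildcard with fuzzy=True."""
    items = [s for s in spec.split(",") if s]
    negated = all(s.startswith("-") for s in items)
    patterns = [s[1:] if negated else s for s in items]

    def matches(name, pat):
        # Assumption: fuzzy matching behaves like shell-style wildcards.
        return fnmatchcase(name, pat) if fuzzy else name == pat

    if negated:
        # Unselecting: keep every column that matches none of the patterns.
        return [h for h in header if not any(matches(h, p) for p in patterns)]
    # Selecting: keep matching columns in the order the patterns were given.
    selected = []
    for p in patterns:
        for h in header:
            if matches(h, p) and h not in selected:
                selected.append(h)
    return selected


header = ["id", "name", "a.x", "a.y", "b.z"]
print(select_fields(header, "-id,-name"))
print(select_fields(header, "a.*", fuzzy=True))
```

With this header, the first call prints ['a.x', 'a.y', 'b.z'] and the second prints ['a.x', 'a.y'].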
csvtk Help
From the command line, type: csvtk -h
To save the help to a text file, type: csvtk -h > csvtk_help.txt
Subcommands
45 subcommands in total.
If you go to the program's page on GitHub and click on a subcommand, you are taken to the author's documentation for that command, with usage examples; for example, https://bioinf.shenwei.me/csvtk/usage/#freq for the subcommand 'freq'.
Information
• headers: prints headers
• dim: dimensions of CSV file
• nrow: print number of records
• ncol: print number of columns
• summary: summary statistics of selected numeric fields (grouped by group fields)
• watch: online monitoring and histogram of selected field
• corr: calculate Pearson correlation between numeric columns
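As an illustration of what a grouped summary means, here is a minimal Python sketch that groups rows by one column and computes count, min, max and mean of a numeric column. It is not csvtk's implementation; the function name summarize, the chosen statistics and the sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def summarize(text, group_field, value_field):
    """Group rows by one column and summarize a numeric column (hypothetical helper)."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        groups[row[group_field]].append(float(row[value_field]))
    return {
        key: {"n": len(vals), "min": min(vals), "max": max(vals), "mean": mean(vals)}
        for key, vals in groups.items()
    }


data = "sample,group,value\ns1,A,1.0\ns2,A,3.0\ns3,B,10.0\n"
for group, stats in summarize(data, "group", "value").items():
    print(group, stats)
```

Group A gets n=2, min 1.0, max 3.0 and mean 2.0; group B has a single value, so all of its statistics equal 10.0.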
Format conversion
• pretty: converts CSV to readable aligned table
• csv2tab: converts CSV to tabular format
• tab2csv: converts tabular format to CSV
• space2tab: converts space-delimited format to TSV
• transpose: transposes CSV data
• csv2md: converts CSV to markdown format
• csv2json: converts CSV to JSON format
• xlsx2csv: converts XLSX to CSV format
Set operations
• head: prints first N records
• concat: concatenates CSV/TSV files by rows
• sample: sampling by proportion
• cut: selects parts of fields
• grep: greps data by selected fields with patterns/regular expressions
• uniq: unique data without sorting
• freq: frequencies of selected fields
• inter: intersection of multiple files
• filter: filters rows by values of selected fields with arithmetic expression
• filter2: filters rows by awk-like arithmetic/string expressions
• join: join files by selected fields (inner, left and outer join)
• split: splits CSV/TSV into multiple files according to column values
• splitxlsx: splits XLSX sheet into multiple sheets according to column values
• collapse: collapses one field with selected fields as keys
• comb: compute combinations of items at every row
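To clarify the difference between the three join types listed for 'join', here is a minimal Python sketch using a simple hash join on one key column. It only illustrates inner, left and outer join semantics; csvtk's internals and its handling of missing values may differ. The names read_rows, join_rows and the fill parameter are hypothetical.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text (first line = header) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def join_rows(left, right, key, how="inner", fill=""):
    """Join two row lists on one key column; how is 'inner', 'left' or 'outer'.

    Assumes the non-key column names differ between the two inputs.
    """
    left_cols = [c for c in (left[0] if left else {}) if c != key]
    right_cols = [c for c in (right[0] if right else {}) if c != key]
    # Index the right-hand rows by key value (a simple hash join).
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    out, matched = [], set()
    for row in left:
        hits = index.get(row[key], [])
        if hits:
            matched.add(row[key])
            for r in hits:
                out.append({**row, **{c: r[c] for c in right_cols}})
        elif how in ("left", "outer"):
            # Keep unmatched left rows, padding the right-hand columns.
            out.append({**row, **{c: fill for c in right_cols}})
    if how == "outer":
        # Append right rows whose key never matched, padding the left-hand columns.
        for k, rows in index.items():
            if k not in matched:
                for r in rows:
                    out.append({key: k, **{c: fill for c in left_cols},
                                **{c: r[c] for c in right_cols}})
    return out


a = read_rows("id,name\n1,foo\n2,bar\n")
b = read_rows("id,score\n1,3.5\n3,9.0\n")
for how in ("inner", "left", "outer"):
    print(how, join_rows(a, b, "id", how))
```

The inner join keeps only id 1. The left join also keeps id 2 with an empty score, and the outer join additionally keeps id 3 with an empty name.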
Edit
• add-header: add column names
• del-header: delete column names
• rename: renames column names with new names
• rename2: renames column names by regular expression
• replace: replaces data of selected fields by regular expression
• round: round float to n decimal places
• mutate: creates new columns from selected fields by regular expression
• mutate2: creates new column from selected fields by awk-like arithmetic/string expressions
• sep: separate column into multiple columns
• gather: gathers columns into key-value pairs
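'gather' reshapes a table from wide to long format. Here is a minimal Python sketch of that idea: each listed column becomes a key/value pair on its own row, and the remaining columns are repeated. It is not csvtk's code; the function name gather and the default output column names "key" and "value" are assumptions.

```python
import csv
import io


def gather(text, fields, key_name="key", value_name="value"):
    """Turn the listed columns into key/value rows, wide to long (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    # Columns not being gathered are copied onto every output row.
    keep = [c for c in reader.fieldnames if c not in fields]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(keep + [key_name, value_name])
    for row in reader:
        for f in fields:
            writer.writerow([row[c] for c in keep] + [f, row[f]])
    return out.getvalue()


wide = "id,jan,feb\n1,10,20\n2,30,40\n"
print(gather(wide, ["jan", "feb"]), end="")
```

Each input row yields one output row per gathered column, so two rows with two gathered columns become four rows under the header "id,key,value".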
Ordering
• sort: sorts by selected fields
Plotting
• plot: see usage at http://bioinf.shenwei.me/csvtk/usage/#plot
• plot hist: histogram
• plot box: boxplot
• plot line: line plot and scatter plot
Misc
• cat: stream file and report progress
• version: print version information and check for update
• genautocomplete: generate shell autocompletion script
• csvtk - CSV/TSV Toolkit (Author's page): https://bioinf.shenwei.me/csvtk/
• GitHub project page: https://github.com/shenwei356/csvtk
• GitHub download page: https://github.com/shenwei356/csvtk/rel ... ag/v0.21.0
• direct download release 0.21.0 for win32: https://github.com/shenwei356/csvtk/rel ... exe.tar.gz
• direct download release 0.21.0 for win64: https://github.com/shenwei356/csvtk/rel ... exe.tar.gz
Note_1:
File name for win32: 'csvtk_windows_386.exe.tar.gz' size 7.95 MB - unzipped 16.711 MB
File name for win64: 'csvtk_windows_amd64.exe.tar.gz' size 8.17 MB - unzipped 18.885 MB
(you must extract these archives to obtain the command-line *.exe programs)
Note_2:
The author suggests that Windows users copy 'csvtk.exe' to 'C:\WINDOWS\system32'.
Why? Because 'C:\WINDOWS\system32' is in the system PATH, so you can run the program from any directory. However, an easy way to add a PATH is to run (launch) 'csvtk' with NirSoft's 'AdvancedRun' program: https://www.portablefreeware.com/index.php?id=2734