q - command line tool that runs SQL directly on CSV or TSV files

Submit command line tools that you find here.
Post Reply
Message
Author
User avatar
rbon
Posts: 361
Joined: Wed Mar 28, 2007 2:16 am
Location: Italy

q - command line tool that runs SQL directly on CSV or TSV files

#1 Post by rbon »

q - command line tool that runs SQL directly on CSV or TSV files
q - Text as Data

Overview
q is a command line tool that allows direct execution of SQL-like queries on CSVs/TSVs (and any other tabular text files).
q treats ordinary files as database tables, and supports all SQL constructs, such as WHERE, GROUP BY, JOINs etc.
It supports automatic column name and column type detection, and provides full support for multiple encodings.
It can be used to execute queries on web server log files https://en.wikipedia.org/wiki/Server_log (a standardized text file format used by web servers).
It works on CSV https://en.wikipedia.org/wiki/Comma-separated_values or TSV https://en.wikipedia.org/wiki/Tab-separated_values files.
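To give a flavor of the idea (a minimal sketch, not q itself): according to its implementation notes, q is written in Python and loads the tabular data into an in-memory SQLite database, then runs the query there. A toy version of that approach, using hypothetical sample data:

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a CSV file on disk.
SAMPLE = """name,score
alice,10
bob,25
carol,40
"""

def query_csv(csv_text, sql):
    """Load CSV text into an in-memory SQLite table 't' and run sql on it."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (%s)" % ", ".join(header))
    placeholders = ", ".join("?" for _ in header)
    conn.executemany("INSERT INTO t VALUES (%s)" % placeholders, data)
    return conn.execute(sql).fetchall()

print(query_csv(SAMPLE, "SELECT name FROM t WHERE CAST(score AS INTEGER) > 20"))
# [('bob',), ('carol',)]
```

The real tool adds the things this sketch skips: delimiter and encoding handling, automatic type detection, multiple files, JOINs across them, and so on.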

Operating systems supported:
• OSX https://github.com/harelba/q/releases/d ... _64-Darwin
• RPM Package: https://github.com/harelba/q/releases/d ... x86_64.rpm
• DEB Package: https://github.com/harelba/q/releases/d ... _amd64.deb
• Windows Installer: https://github.com/harelba/q/releases/d ... taller.exe
• Windows CLI: https://github.com/harelba/q/releases/d ... indows.exe
Please note that q.exe runs only on 64-bit systems.

About the Windows packages, the author writes:
• Run the installer executable and hit next next next... q.exe will be added to the PATH so you can access it everywhere.
• Windows doesn't update the PATH retroactively for open windows, so you'll need to open a new cmd window after the installation is done.
q.exe details:
Current Windows version: 2.0.19
Windows program sizes: 6.5/6.52 MB

Program links:
• q program page: https://harelba.github.io/q/#installation
• GitHub program page: https://github.com/harelba/q
• Softpedia: https://www.softpedia.com/get/PORTABLE- ... /q-H.shtml

The q program page contains everything you need to download and use q immediately:
• Overview https://harelba.github.io/q/#overview
• Installation: https://harelba.github.io/q/#installation
• Requirements: https://harelba.github.io/q/#requirements
• Usage: https://harelba.github.io/q/#usage
• Query: https://harelba.github.io/q/#query
• Flags: https://harelba.github.io/q/#flags
• Additional functions: https://harelba.github.io/q/#additional-functions
• Examples: https://harelba.github.io/q/#examples
• Implementation: https://harelba.github.io/q/#implementation
• Development: https://harelba.github.io/q/#development
• Limitations: https://harelba.github.io/q/#limitations
• Rationale: https://harelba.github.io/q/#rationale
• Benchmark: https://harelba.github.io/q/#benchmark
• Future: https://harelba.github.io/q/#future

To obtain the help text, run from the command line:

Code: Select all

q -h
q --help
Help output:

Code: Select all

Options:
  -h, --help            show this help message and exit
  -v, --version         Print version
  -V, --verbose         Print debug info in case of problems
  -S SAVE_DB_TO_DISK_FILENAME, --save-db-to-disk=SAVE_DB_TO_DISK_FILENAME
                        Save database to an sqlite database file
  --save-db-to-disk-method=SAVE_DB_TO_DISK_METHOD
                        Method to use to save db to disk. 'standard' does not
                        require any deps, 'fast' currenty requires manually
                        running `pip install sqlitebck` on your python
                        installation. Once packing issues are solved, the fast
                        method will be the default.

  Input Data Options:
    -H, --skip-header   Skip header row. This has been changed from earlier
                        version - Only one header row is supported, and the
                        header row is used for column naming
    -d DELIMITER, --delimiter=DELIMITER
                        Field delimiter. If none specified, then space is used
                        as the delimiter.
    -p, --pipe-delimited
                        Same as -d '|'. Added for convenience and readability
    -t, --tab-delimited
                        Same as -d <tab>. Just a shorthand for handling
                        standard tab delimited file You can use $'\t' if you
                        want (this is how Linux expects to provide tabs in the
                        command line
    -e ENCODING, --encoding=ENCODING
                        Input file encoding. Defaults to UTF-8. set to none
                        for not setting any encoding - faster, but at your own
                        risk...
    -z, --gzipped       Data is gzipped. Useful for reading from stdin. For
                        files, .gz means automatic gunzipping
    -A, --analyze-only  Analyze sample input and provide information about
                        data types
    -m MODE, --mode=MODE
                        Data parsing mode. fluffy, relaxed and strict. In
                        strict mode, the -c column-count parameter must be
                        supplied as well
    -c COLUMN_COUNT, --column-count=COLUMN_COUNT
                        Specific column count when using relaxed or strict
                        mode
    -k, --keep-leading-whitespace
                        Keep leading whitespace in values. Default behavior
                        strips leading whitespace off values, in order to
                        provide out-of-the-box usability for simple use cases.
                        If you need to preserve whitespace, use this flag.
    --disable-double-double-quoting
                        Disable support for double double-quoting for escaping
                        the double quote character. By default, you can use ""
                        inside double quoted fields to escape double quotes.
                        Mainly for backward compatibility.
    --disable-escaped-double-quoting
                        Disable support for escaped double-quoting for
                        escaping the double quote character. By default, you
                        can use \" inside double quoted fields to escape
                        double quotes. Mainly for backward compatibility.
    --as-text           Don't detect column types - All columns will be
                        treated as text columns
    -w INPUT_QUOTING_MODE, --input-quoting-mode=INPUT_QUOTING_MODE
                        Input quoting mode. Possible values are all, minimal
                        and none. Note the slightly misleading parameter name,
                        and see the matching -W parameter for output quoting.
    -M MAX_COLUMN_LENGTH_LIMIT, --max-column-length-limit=MAX_COLUMN_LENGTH_LIMIT
                        Sets the maximum column length.
    -U, --with-universal-newlines
                        Expect universal newlines in the data. Limitation: -U
                        works only with regular files for now, stdin or .gz
                        files are not supported yet.

  Output Options:
    -D OUTPUT_DELIMITER, --output-delimiter=OUTPUT_DELIMITER
                        Field delimiter for output. If none specified, then
                        the -d delimiter is used if present, or space if no
                        delimiter is specified
    -P, --pipe-delimited-output
                        Same as -D '|'. Added for convenience and readability.
    -T, --tab-delimited-output
                        Same as -D <tab>. Just a shorthand for outputting tab
                        delimited output. You can use -D $'\t' if you want.
    -O, --output-header
                        Output header line. Output column-names are determined
                        from the query itself. Use column aliases in order to
                        set your column names in the query. For example,
                        'select name FirstName,value1/value2 MyCalculation
                        from ...'. This can be used even if there was no
                        header in the input.
    -b, --beautify      Beautify output according to actual values. Might be
                        slow...
    -f FORMATTING, --formatting=FORMATTING
                        Output-level formatting, in the format X=fmt,Y=fmt
                        etc, where X,Y are output column numbers (e.g. 1 for
                        first SELECT column etc.
    -E OUTPUT_ENCODING, --output-encoding=OUTPUT_ENCODING
                        Output encoding. Defaults to 'none', leading to
                        selecting the system/terminal encoding
    -W OUTPUT_QUOTING_MODE, --output-quoting-mode=OUTPUT_QUOTING_MODE
                        Output quoting mode. Possible values are all, minimal,
                        nonnumeric and none. Note the slightly misleading
                        parameter name, and see the matching -w parameter for
                        input quoting.
    -L, --list-user-functions
                        List all user functions

  Query Related Options:
    -q QUERY_FILENAME, --query-filename=QUERY_FILENAME
                        Read query from the provided filename instead of the
                        command line, possibly using the provided query
                        encoding (using -Q).
    -Q QUERY_ENCODING, --query-encoding=QUERY_ENCODING
                        query text encoding. Experimental. Please send your
                        feedback on this
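The help text above documents two behaviors worth connecting: with -H the header row is used for column naming, and with -O the output header is determined from the query itself, via column aliases such as 'select value1/value2 MyCalculation from ...'. A sketch of what those two options amount to, in the same in-memory SQLite spirit (an illustration of the documented behavior with made-up rows, not q's code):

```python
import sqlite3

# Hypothetical input rows, as if read from a tab-delimited file (-t)
# whose first row is the header; -H uses it for column naming.
lines = ["value1\tvalue2", "10\t4", "9\t3"]
header = lines[0].split("\t")
data = [line.split("\t") for line in lines[1:]]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (%s)" % ", ".join(header))
conn.executemany("INSERT INTO t VALUES (?, ?)", data)

# A column alias names the output column, as the -O help text describes:
cur = conn.execute("SELECT CAST(value1 AS REAL)/value2 AS MyCalculation FROM t")
out_header = [d[0] for d in cur.description]  # would become the -O header line
results = [row[0] for row in cur]
print(out_header)  # ['MyCalculation']
print(results)     # [2.5, 3.0]
```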
"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the universe trying to build bigger and better idiots. So far, the universe is winning." - Rick Cook.

User avatar
webfork
Posts: 10818
Joined: Wed Apr 11, 2007 8:06 pm
Location: US, Texas
Contact:

Re: q - command line tool that runs SQL directly on CSV or TSV files

#2 Post by webfork »

I do a lot of text manipulation, so it's important to first say that SQL queries were definitely on my roadmap, followed hopefully one day by Python or other more involved languages. Something that treats text files as SQL tables definitely changes gears for me. Regular expressions can do a lot, but of course SQL manipulation does a lot more (in part by using regular expressions).
Rationale

Have you ever stared at a text file on the screen, hoping it would have been a database so you could ask anything you want about it? I had that feeling many times, and I've finally understood that it's not the database that I want. It's the language - SQL.

SQL is a declarative language for data, and as such it allows me to define what I want without caring about how exactly it's done. This is the reason SQL is so powerful, because it treats data as data and not as bits and bytes (and chars).

The goal of this tool is to provide a bridge between the world of text files and of SQL.

... to bring SQL expressive power to manipulating text data using the command line.
Great post, thank you.

Post Reply