cdx can handle text files with a special header line.
This line starts with “ CDX<delim>”, that is a space, the letters CDX, and the column delimiter used in the file.
That tag is followed by the column names, separated by the given delimiter. Column names must start with an alphabetic character, which is then followed by zero or more alphanumeric characters or underscores.
Legal names include “title”, “the_author”, “x” and “X_12345”.
Illegal names include “7”, “_title” and “the-author”.
A header line is malformed if it contains multiple columns that share the same name.
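The header rules above can be sketched in Python. This is only an illustration, not cdx's own code: the function name `parse_cdx_header` is hypothetical, and it assumes that “alphabetic” means ASCII letters and that the delimiter is a single character.

```python
import re

# Assumption: names are an ASCII letter followed by zero or more
# ASCII letters, digits or underscores.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def parse_cdx_header(line):
    """Return (delimiter, column_names) for a cdx header line.

    Hypothetical helper for illustration; raises ValueError if the
    line is not a well-formed header.
    """
    line = line.rstrip("\r\n")
    # The tag is a space, the letters CDX, then the delimiter.
    if len(line) < 5 or not line.startswith(" CDX"):
        raise ValueError("not a cdx header line")
    delim = line[4]
    names = line[5:].split(delim)
    seen = set()
    for name in names:
        if not NAME_RE.fullmatch(name):
            raise ValueError(f"illegal column name: {name!r}")
        if name in seen:
            raise ValueError(f"duplicate column name: {name!r}")
        seen.add(name)
    return delim, names


# A tab-delimited header with two columns.
delim, names = parse_cdx_header(" CDX\ttitle\tthe_author")
print(repr(delim), names)
```

Rejecting duplicates while parsing keeps a later lookup by name unambiguous.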
When a tool asks for a column, you can use a name or a number, e.g. “7” or “title”.
When a tool asks for a ColumnSet, give it a comma-delimited list of Ranges.
A Range starts with an optional “Name:”, giving a new name to the column, followed by an optional “~” signaling “not”, followed by one of
(range,<=dog>=cat)
The full column set then represents all the regular ranges, in order, with any and all negated columns removed.
This means that repeating a “yes” column is significant, but repeating a negated column changes nothing. For example:
1,3,5
- columns 1, 3 and 5
1-3,5
- columns 1, 2, 3 and 5
1-5,~3
- columns 1, 2, 4 and 5
1-
- all the columns
1,stuff:1
- column 1 with its original name, then column 1 again with the column name “stuff”
~2-3,1-5,~3-4
- columns 1 and 5
A scoped value lets you specify a value that is potentially different for every column. It is a value followed by a comma followed by a ColumnSet. Scoped values can be specified multiple times, to give different values to different columns. If the same column is given multiple values, the rightmost one on the command line takes priority. If no comma is present, the value applies to all columns. For example, if the command line option is “-s”, and assuming 5 columns,
-s foo
- columns 1-5 get “foo”
-s foo -s bar,2-4
- columns 1 and 5 get “foo”, columns 2-4 get “bar”
-s ,,3
- column 3 gets “,”
-s foo,~2-4
- columns 1 and 5 get “foo”
A column set can be followed by a “+” and a Transform chain, for example “1-3+lower+to_base64”.
This will give you the values of those fields, each individually transformed as requested.
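The column set and scoped value rules described earlier can be sketched in Python. This is a minimal illustration under stated assumptions, not cdx's implementation: `expand_column_set` and `resolve_scoped` are hypothetical names, only numeric ranges of the forms `N`, `N-M` and `N-` are handled, a set containing only negated ranges is taken to mean “all columns except those” (inferred from the “-s foo,~2-4” example), and a leading “,,” is taken to mean that the value is a literal comma (inferred from the “-s ,,3” example). Column names and transform chains are not handled.

```python
def expand_column_set(spec, ncols):
    """Expand a column set into a list of (column_number, new_name) pairs.

    Hypothetical helper: column numbers are 1-based, new_name is None
    unless the range carried a "Name:" prefix.
    """
    positives = []      # regular ranges, in order, duplicates kept
    negated = set()     # every column mentioned in a "~" range
    saw_positive = False
    for part in spec.split(","):
        name = None
        if ":" in part:
            name, part = part.split(":", 1)
        negate = part.startswith("~")
        if negate:
            part = part[1:]
        lo, dash, hi = part.partition("-")
        first = int(lo)
        if not dash:
            last = first            # single column, e.g. "3"
        elif hi:
            last = int(hi)          # closed range, e.g. "2-4"
        else:
            last = ncols            # open range, e.g. "1-"
        cols = range(first, last + 1)
        if negate:
            negated.update(cols)
        else:
            saw_positive = True
            positives.extend((c, name) for c in cols)
    if not saw_positive:
        # Assumption: only negated ranges means "everything except these".
        positives = [(c, None) for c in range(1, ncols + 1)]
    return [(c, n) for c, n in positives if c not in negated]


def resolve_scoped(options, ncols, default=None):
    """Map each column number to its value; later options win."""
    values = {c: default for c in range(1, ncols + 1)}
    for opt in options:
        if opt.startswith(",,"):
            # Assumption: ",,3" means the value "," for column set "3".
            value, colspec = ",", opt[2:]
        else:
            value, comma, colspec = opt.partition(",")
            if not comma:
                colspec = "1-"      # no comma: applies to all columns
        for col, _name in expand_column_set(colspec, ncols):
            values[col] = value
    return values


print(expand_column_set("~2-3,1-5,~3-4", 5))    # [(1, None), (5, None)]
print(resolve_scoped(["foo", "bar,2-4"], 5))
```

Processing the options left to right and overwriting earlier entries is what gives the rightmost value priority.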