cdxdoc

Name Columns and how to use them

cdx can handle text files with a special header line.

This line starts with “ CDX<delim>”, that is a space, the letters CDX, and the column delimiter used in the file.

That tag is followed by the column names, delimited by the given delimiter. Column names must start with an alphabetic character which is then followed by one or more alphanumeric characters or underscores.

So Legal names include “title” “the_author” “x” and “X_12345”

Illegal names include “7” “_title” “the-author”

A header line is malformed if it contains multiple columns that share the same name.

Named Column

When a tool asks for a column, you can use a name or number, e.g. “7” or “title”

Column Set

When a tool asks for a ColumnSet, that is a comma delimited list of Ranges

A Range starts with an optional “Name:”, giving a new name to the column, followed by a optional “~” signaling “not”, followed by one of

The full column set then represents all the regular ranges, in order, with any and all not’ed column removed.

This means that repeating a “yes” column is significant, but repeating a negated column changes nothing. For example

Scoped Value

A scoped value lets you specify a value that is potentiall different for every column. It is a value followed by a comma followed by a Column Set. These can be specified multiple times, to give different values to different columns. If the same column is given multiple values, the rightmost one on the command line take priority. If no comma is present, the vthe alue applies to all columns. For example, if the command like option is “-s”, and assuming 5 columns,

Transforms

A column set can be followed by a + and a Transform chain, for example 1-3+lower+to_base64. This will give you the values of those fields, each individually transformed as requested.