
Treating files as sets to perform rapid set manipulation.

Setdown - Line based set manipulation

Version: 0.1.2.0 | Hackage

Author: Robert Massaioli | Created in: 2015

Installation

Via nix-shell (quickest, no local setup required)

$ nix-shell -p haskellPackages.setdown
$ setdown --help

Via Hackage

stack install setdown

This works because setdown is on Hackage.

What is setdown and how does it work?

Setdown is a command line tool for line based set operations. To use setdown you write a "setdown definitions file", usually given the .setdown suffix. If you are familiar with Make, you can think of this .setdown file much like a Makefile. Inside that file you write a number of definitions of the form:

definitionName: "file-1.txt" /\ "file-2.txt"

This line says that "definitionName" is a new set definition that is a label for the intersection of "file-1.txt" and "file-2.txt". You can write more complicated expressions than this.

Example Setdown Projects

Check out the setdown-examples project on Bitbucket for working examples of how setdown is used.

However, to get an in-depth description of setdown and its abilities you should read the sections below.

Input Files

In setdown each file is treated as a list of elements where each line is an element. Input files do not need to begin as sets; they can contain duplicate and unsorted elements. Setdown will automatically sort and de-duplicate all input files, turning them into sets.
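
A minimal sketch of this normalization in Haskell, assuming the `containers` package's `Data.Set`. The helper names `toElementSet` and `renderSet` are hypothetical and this is not setdown's actual implementation; it only shows why building a set from the lines of a file both sorts and de-duplicates them:

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Hypothetical helper: each line of the file contents becomes one element.
-- Set.fromList discards duplicates and keeps the elements in sorted order.
toElementSet :: String -> Set String
toElementSet = Set.fromList . lines

-- Hypothetical helper: render a set back to text, one element per line,
-- in ascending order.
renderSet :: Set String -> String
renderSet = unlines . Set.toAscList
```

For example, `renderSet (toElementSet "b\na\nb\n")` gives `"a\nb\n"`. This sketch sorts with Haskell's default `Ord String` ordering and keeps a blank line as an empty-string element; setdown's own ordering and blank-line handling may differ.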

Another important point is how relative paths are resolved. Suppose your .setdown file references the input file "some-elements.txt" and you run the setdown executable from a directory other than the one containing the .setdown file: where will setdown look for some-elements.txt? The answer is that setdown always looks for files relative to the .setdown file, because that is where you wrote your definitions. This makes setdown current-working-directory invariant, unlike many other command line programs: you can run it from anywhere in the directory tree and still get the same result. Please keep this in mind.
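
The rule can be sketched with `takeDirectory` and `</>` from the `filepath` package. `resolveRelativeTo` is a hypothetical helper for illustration, not part of setdown:

```haskell
import System.FilePath (takeDirectory, (</>))

-- Hypothetical helper: resolve a path written inside a .setdown file by
-- joining it onto the directory that contains the .setdown file, so the
-- result never depends on the current working directory.
resolveRelativeTo :: FilePath -> FilePath -> FilePath
resolveRelativeTo setdownFile referenced = takeDirectory setdownFile </> referenced
```

For example, `resolveRelativeTo "project/defs.setdown" "some-elements.txt"` yields `"project/some-elements.txt"` wherever you run it from. Be aware that `</>` returns its second argument unchanged when that path is absolute; how setdown itself treats absolute paths is not described here.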

Output

When setdown runs, it creates an output/ directory next to your .setdown file. Each named definition produces a result file in that directory. The result files are named with a UUID and contain one element per line, sorted and de-duplicated.

Progress and status messages are written to stdout as setdown works through your definitions. At the end of a successful run, a summary table is printed showing each definition name alongside the path to its result file.

You can choose a different output directory with the --output flag:

setdown --output=results mydefinitions.setdown

The path given to --output is relative to the .setdown file, not the current working directory.

Set Operations and Precedence

In the setdown language there are a number of supported operators:

  • Intersection: /\
  • Union: \/
  • Difference: -

For example, they might be used in the following way:

definition: (A - B) \/ (C /\ D)
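
Each operator corresponds to a standard set function. As an illustration only, here is the expression above written against Haskell's `Data.Set` (from `containers`); the function name `example` is hypothetical:

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- setdown operator   Data.Set function
--   /\               Set.intersection
--   \/               Set.union
--   -                Set.difference (left elements not in the right set)

-- (A - B) \/ (C /\ D), keeping the explicit grouping from the definition.
example :: Set String -> Set String -> Set String -> Set String -> Set String
example a b c d = (a `Set.difference` b) `Set.union` (c `Set.intersection` d)
```

With A = {"1","2","3"}, B = {"2"}, C = {"3","4","5"} and D = {"4","6"}, A - B is {"1","3"}, C /\ D is {"4"}, and the result is {"1","3","4"}.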

You may be wondering what operator precedence the setdown language uses. The answer is that there is none: you must make the grouping of nested expressions explicit with brackets. This is very important, because omitting the brackets results in a parsing error. Here is the reasoning behind requiring explicit grouping:

-- Here is a simple expression
def: A /\ B \/ C
-- Now, should this be parsed as:
defV1: (A /\ B) \/ C
-- or as:
defV2: A /\ (B \/ C)
-- If you pretend that B is the empty set (E) then you can see that these expressions evaluate
-- completely differently. If we simplify them with that assumption then they become:
defV1-bempty: C
defV2-bempty: A /\ C

As you can see, the two readings give different results, so order of operations really matters for set operations. Because it is so critical, brackets are mandatory.
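
The same comparison can be checked concretely. This Haskell sketch uses `Data.Set` from `containers`, with small integer sets standing in for files, and evaluates both readings of `A /\ B \/ C`:

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- The two possible groupings of the unbracketed expression A /\ B \/ C.
defV1, defV2 :: Set Int -> Set Int -> Set Int -> Set Int
defV1 a b c = (a `Set.intersection` b) `Set.union` c   -- (A /\ B) \/ C
defV2 a b c = a `Set.intersection` (b `Set.union` c)   -- A /\ (B \/ C)
```

With `a = Set.fromList [1,2,3]`, `b = Set.empty` and `c = Set.fromList [3,4]`, `defV1 a b c` is `fromList [3,4]` (just C), while `defV2 a b c` is `fromList [3]` (A /\ C).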

Comments

In the setdown language you can add a comment by writing a double-dash (--); everything from there to the end of the line is a comment. Comments can appear anywhere on a line: at the start, or inline after an expression.

-- This is a definition for A, created because we wanted to do X
A: "y.txt" - "z.txt"

-- This is an example of a comment halfway through an expression
B: (A \/ C) -- \/ D   This is still a comment and \/ D never happens

You can use comments to leave messages for any people that might read your setdown definitions in the future.
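
A sketch of how a line could be stripped of its comment before parsing. `stripComment` is hypothetical, not setdown's lexer; it assumes that a `--` inside a quoted file name is not a comment and that file names never contain escaped quotes:

```haskell
-- Hypothetical comment stripper: drop everything from the first "--" that
-- appears outside double quotes to the end of the line.
stripComment :: String -> String
stripComment = go False
  where
    go _ [] = []
    go inQuote (c:cs)
      | c == '"' = c : go (not inQuote) cs                -- entering/leaving a file name
      | not inQuote, ('-':'-':_) <- c : cs = []           -- comment starts here
      | otherwise = c : go inQuote cs
```

For example, stripping the inline comment line above leaves `B: (A \/ C) ` (with a trailing space), and a line that starts with `--` becomes empty.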

Language Reference

Identifier rules

Definition names (identifiers) may contain letters (upper and lowercase), digits, hyphens, and underscores:

[a-zA-Z0-9_-]+

For example, mySet, result-2, and Final_Output are all valid identifiers. Spaces and punctuation other than - and _ are not permitted.
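
The identifier rule translates directly into a character check. A minimal Haskell sketch (`isValidIdentifier` is a hypothetical name):

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit)

-- Matches [a-zA-Z0-9_-]+ : at least one character, and every character is
-- an ASCII letter, an ASCII digit, a hyphen or an underscore.
isValidIdentifier :: String -> Bool
isValidIdentifier name = not (null name) && all allowed name
  where
    allowed c = isAsciiUpper c || isAsciiLower c || isDigit c || c == '-' || c == '_'
```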

Definition ordering

Definitions may appear in any order in your .setdown file. A definition may reference another that is defined later in the file. Setdown resolves all identifiers by name after parsing the complete file.
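
Because names are resolved only after the whole file has been parsed, checking references needs nothing more than the full set of defined names. A Haskell sketch, assuming each parsed definition is represented as a hypothetical `(name, referencedNames)` pair; a non-empty result corresponds to exit code 12 in the Troubleshooting section:

```haskell
import qualified Data.Set as Set

-- All definition names are collected first, so a reference to a definition
-- written later in the file is accepted just like one written earlier.
undefinedReferences :: [(String, [String])] -> [String]
undefinedReferences defs =
  [ ref | (_, refs) <- defs, ref <- refs, ref `Set.notMember` defined ]
  where
    defined = Set.fromList (map fst defs)
```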

Circular definitions

Definitions must not form a cycle. For example:

A: "file.txt" \/ B
B: A /\ "other.txt"

This is invalid because A depends on B and B depends on A. Setdown detects cycles and exits with an error before performing any operations.
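
Cycle detection and a dependency-respecting evaluation order can both come from a strongly connected components pass. This sketch uses `stronglyConnComp` from `Data.Graph` (`containers`); the `(name, referencedNames)` input shape and the name `checkDefinitions` are assumptions, and setdown may detect cycles differently:

```haskell
import Data.Graph (SCC (..), stronglyConnComp)

-- Hypothetical input: (definition name, names of definitions it references).
-- stronglyConnComp groups mutually dependent definitions into CyclicSCC
-- components (a definition that refers to itself is also a CyclicSCC) and
-- lists components with dependencies before the definitions that use them.
checkDefinitions :: [(String, [String])] -> Either [[String]] [String]
checkDefinitions defs
  | null cycles = Right [name | AcyclicSCC name <- sccs]   -- safe evaluation order
  | otherwise = Left cycles                                -- names involved in each cycle
  where
    sccs = stronglyConnComp [(name, name, deps) | (name, deps) <- defs]
    cycles = [names | CyclicSCC names <- sccs]
```

`stronglyConnComp` returns components in reverse topological order, so when `A` references `B` and `B` references `C` the result is `Right ["C","B","A"]`, while the A/B example above yields a `Left` holding one cycle that contains both names.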

Writing your own definitions

In the setdown language you can write a definition in the following format:

<definitionName>: <expression>

Here the definition name is the identifier that you give to the expression, and an expression applies set operations to identifiers or files. A practical example should make this concrete. Here is a valid setdown file:

-- A is the intersection of the file b-1.out and the set B
A: "b-1.out" /\ B

-- B is the union of the files a-1.out and a-2.out
B: "a-1.out" \/ "a-2.out"

-- C is the difference of the file b-1.out and the set B
C: "b-1.out" - B

Usually, when you write these definitions you put them in a file that has a suffix of .setdown. You can then feed this file into the setdown executable like so:

setdown path/to/mydefinitions.setdown
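
To show how definitions, file references and operators fit together, here is a small evaluator for the example file above. It is a hypothetical sketch: the `Expr` type and `evalDef` are not setdown's internals, file contents are passed in as a `Map` instead of being read from disk, and references are assumed to have already been checked for undefined names and cycles:

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set
import Data.Map (Map)
import Data.Set (Set)

-- Hypothetical AST for a setdown expression.
data Expr
  = File FilePath          -- a quoted input file, e.g. "b-1.out"
  | Ref String             -- a reference to another definition, e.g. B
  | Intersect Expr Expr    -- /\
  | Union Expr Expr        -- \/
  | Diff Expr Expr         -- -

-- Evaluate one named definition. Map.! is used because names and files are
-- assumed to have been validated before evaluation.
evalDef :: Map FilePath (Set String) -> Map String Expr -> String -> Set String
evalDef files defs = evalName
  where
    evalName name = eval (defs Map.! name)
    eval (File path) = files Map.! path
    eval (Ref name) = evalName name
    eval (Intersect l r) = Set.intersection (eval l) (eval r)
    eval (Union l r) = Set.union (eval l) (eval r)
    eval (Diff l r) = Set.difference (eval l) (eval r)

-- The example .setdown file, written as data.
exampleDefs :: Map String Expr
exampleDefs = Map.fromList
  [ ("A", Intersect (File "b-1.out") (Ref "B"))
  , ("B", Union (File "a-1.out") (File "a-2.out"))
  , ("C", Diff (File "b-1.out") (Ref "B"))
  ]
```

Because this sketch has no caching, a definition referenced twice is evaluated twice; a real implementation would more likely compute each definition once and reuse the result.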

Command-line flags

setdown evaluates a .setdown definitions file to perform set operations
(intersection, union, difference) on line-based text files, writing one result
file per definition to an output directory.

setdown [OPTIONS]

Common flags:
  -o --output[=DIR]               Directory in which to place output files,
                                  relative to your .setdown file. Defaults to
                                  'output' if omitted.
  -i --input=definitions.setdown  The .setdown definitions file to evaluate.
                                  If omitted, setdown looks for a single
                                  .setdown file in the current directory and
                                  uses it automatically. Exits with an error if
                                  zero or more than one are found.
     --show-transient             Also show intermediate results for
                                  sub-expressions generated internally to
                                  evaluate your definitions. Useful for
                                  debugging complex .setdown files.
  -? --help                       Display help message
  -V --version                    Print version information
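
The fallback used when --input is omitted can be sketched as a pure function over a directory listing (which could come from `listDirectory` in the `directory` package). `pickDefinitionsFile` is hypothetical; its `Left` values are the exit codes listed under Troubleshooting:

```haskell
import System.FilePath (takeExtension)

-- Hypothetical: choose the single *.setdown file among the directory entries.
-- Left 3 = no .setdown file found, Left 2 = more than one found.
pickDefinitionsFile :: [FilePath] -> Either Int FilePath
pickDefinitionsFile entries =
  case filter ((== ".setdown") . takeExtension) entries of
    [single] -> Right single
    [] -> Left 3
    _ -> Left 2
```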

Building the code

To build the project, install Stack and then run:

stack build

To run setdown during development:

stack exec -- setdown --help
stack exec -- setdown mydefinitions.setdown

Troubleshooting

Setdown prints a short error message to stdout and exits with a non-zero code when something goes wrong. The exit codes are:

Exit code   Cause
1           The file specified with --input does not exist.
2           Multiple .setdown files found in the current directory; use --input to select one.
3           No .setdown files found in the current directory; use --input to specify one.
11          Two or more definitions share the same name.
12          A definition references an identifier that has not been defined.
13          One or more input files referenced in the definitions could not be found.
20          A cyclic dependency was detected between definitions.

All file paths in error messages are relative to the .setdown file, not the current working directory.

Contributing to the setdown project

Contributions are welcome. The preferred workflow is:

  1. Open an issue describing what you intend to fix or improve.
  2. Write the code.
  3. Open a pull request and ask Robert Massaioli to review it.
  4. Iterate until the code is clean and merged.
  5. Celebrate!