How do I format my data file?

Data should be formatted as either Linear TSV or JSON. The column data types are the ones you defined in the config endpoint or file described above.

JSON Format

The data should be formatted as a JSON array with no wrapping object.

The data types supported are listed in the loaderOptions >> columns section of the schema. Example values for common supported data types are listed below.

Type      Format                                                       Example value as JSON
string    default                                                      {"fieldname": "foo"}
string    uri                                                          {"fieldname": ""}
          (data needs to be in ISO 6709 string expression (Annex H))   {"fieldname": "+48.499998+23.3833318/"}
          (data needs to be in ISO 8601)                               {"fieldname": "2011-10-05T14:48:00.000Z"}
number    default                                                      {"fieldname": 123.6}

Example values of common supported data types

Other formats and types are listed in our schema; these are ones we would like to support in future. Please let us know if any are of particular importance to you.
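As a rough illustration of the JSON layout described above, the following Python sketch serialises records as a bare top-level array. The column names (`name`, `captured`, `score`) and the helpers `iso8601_utc` and `to_json_array` are hypothetical and not part of Zegami's schema. The datetime is rendered with millisecond precision and a `Z` suffix on the assumption that the example value in the table is the expected shape.

```python
import json
from datetime import datetime, timezone


def iso8601_utc(dt):
    """Format an aware datetime as ISO 8601 in UTC with a trailing Z."""
    text = dt.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return text.replace("+00:00", "Z")


def to_json_array(records):
    """Serialise a list of dicts as a top-level JSON array (no wrapping object)."""
    return json.dumps(records, ensure_ascii=False)


# Hypothetical columns, for illustration only.
records = [
    {
        "name": "foo",
        "captured": iso8601_utc(datetime(2011, 10, 5, 14, 48, tzinfo=timezone.utc)),
        "score": 123.6,
    },
]
payload = to_json_array(records)
```

Passing the list straight to `json.dumps` is what keeps the output a plain array; wrapping it in a dict such as `{"data": records}` would produce the wrapping object that the format disallows.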

Linear TSV Format

A tabular data file consists of zero or more records, each consisting of fields. Data types are formatted as above; quoting is optional. Encoding should be UTF-8.

Records are separated by ASCII newlines (0x0a, the unix line ending commonly abbreviated as \n). Fields within a record are separated by ASCII tabs (0x09, commonly abbreviated as \t). It is permitted but discouraged to separate records with carriage-return-newline (0x0d followed by 0x0a). A literal carriage return or unix line ending in any other position is non-conforming.
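The separator rules above can be sketched in a few lines of Python. This is a minimal illustration rather than Zegami's own parser; the function names `encode_records` and `decode_records` are hypothetical. The writer always emits unix line endings, while the reader also tolerates the discouraged carriage-return-newline form.

```python
def encode_records(records):
    """Join fields with tabs and terminate each record with a unix newline."""
    text = "".join("\t".join(fields) + "\n" for fields in records)
    return text.encode("utf-8")


def decode_records(data):
    """Split UTF-8 bytes into records and fields, tolerating CRLF separators."""
    lines = data.decode("utf-8").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # the final newline leaves an empty trailing piece
    records = []
    for line in lines:
        if line.endswith("\r"):
            line = line[:-1]  # permitted but discouraged CRLF ending
        records.append(line.split("\t"))
    return records
```

For example, `encode_records([["id", "name"], ["1", "foo"]])` yields the bytes `id<TAB>name<LF>1<TAB>foo<LF>`, and `decode_records` turns them back into the same list of lists.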

Zegami expects there to be a header line across the top of the file. This should give names to each of the columns and should not contain empty values. Field names should not be repeated.
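A quick pre-upload check of the header line might look like the sketch below. The function name `check_header` and the wording of the messages are hypothetical; it only tests the two rules stated above (no empty names, no repeated names) and assumes a name consisting solely of whitespace counts as empty.

```python
def check_header(header):
    """Return a list of problems found in a header row (empty list means OK)."""
    problems = []
    seen = set()
    for index, name in enumerate(header):
        if name.strip() == "":
            # Assumption: whitespace-only names are treated as empty.
            problems.append(f"column {index + 1} has an empty name")
        elif name in seen:
            problems.append(f"column name {name!r} is repeated")
        seen.add(name)
    return problems
```

Calling `check_header(["id", "name"])` returns an empty list, whereas a header such as `["id", "", "id"]` is reported with one message per offending column.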

All tabs, carriage returns and newlines in text fields must be removed and replaced with spaces to allow Zegami to process the data.
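One way to apply this rule before writing a file is sketched below. The helper name `clean_text_field` is hypothetical, and the sketch assumes each offending character is replaced by exactly one space, so a carriage-return-newline pair becomes two spaces.

```python
# Map tab (0x09), newline (0x0a) and carriage return (0x0d) to a space.
_TO_SPACE = {0x09: " ", 0x0A: " ", 0x0D: " "}


def clean_text_field(value):
    """Replace tabs, carriage returns and newlines in a text field with spaces."""
    return value.translate(_TO_SPACE)
```

Running this over every text value before joining fields with tabs ensures that the only tabs and newlines left in the file are the separators themselves.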

Records must contain at least one field, and all fields must be present in every record. To indicate missing data for a field, either leave the field blank or use the character sequence \N (bytes 0x5c and 0x4e). Note that the N is capitalized.

This character sequence is exactly that used by SQL databases to indicate SQL NULL in their tab-separated output mode.
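The missing-data convention can be illustrated with a pair of small helpers. The names `NULL_MARKER`, `encode_field` and `decode_field` are hypothetical, and the sketch assumes that Python's `None` stands for a missing value on the application side.

```python
NULL_MARKER = "\\N"  # backslash (0x5c) followed by capital N (0x4e)


def encode_field(value):
    """Write None as the null sequence; everything else as its string form."""
    return NULL_MARKER if value is None else str(value)


def decode_field(text):
    """Treat a blank field or the null sequence as missing data."""
    return None if text in ("", NULL_MARKER) else text
```

Because the N must be capitalized, a lowercase `\n` in a field is not treated as missing data by `decode_field`.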

If a single backslash is encountered at the end of a field, it is an error. If a backslash precedes another character but does not form a null sequence \N, it is a “superfluous backslash” and is removed from the field on read. Such a “superfluous backslash” must never be written by a conforming implementation.
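The backslash rules above might be applied on read roughly as follows. This is a sketch under stated assumptions, not Zegami's implementation: the name `read_field` is hypothetical, the null sequence is only recognised when it makes up the whole field, and a blank field is treated as missing data in the same way.

```python
def read_field(text):
    """Decode one raw field: nulls, trailing-backslash errors, superfluous backslashes."""
    if text in ("", "\\N"):
        return None  # blank field or null sequence means missing data
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i == len(text) - 1:
            # A lone backslash at the end of a field is an error.
            raise ValueError("single backslash at end of field")
        # Superfluous backslash: drop it and keep the character it precedes.
        out.append(text[i + 1])
        i += 2
    return "".join(out)
```

A writer that follows the rules never needs this clean-up, since a conforming implementation does not emit superfluous backslashes in the first place.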
