
1 - DDF file format
2 - Character conversions
3 - Invoking

------------------------------------------------------------------------
1) DDF file format

DDF files define how l2disasm and l2asm treat L2's dat files. DDF is short for "dat definition file", and the same file is used by both tools. Its syntax loosely resembles C.

You can use C and C++ style comments. At the top of your file, you can specify a few control variables (all of them are optional). Note that certain literals have numeric values: NO = 0, YES = 1, OFF = -1. You can use the literal and its number interchangeably.

List of available control variables, with their default values.

FS = "\t";

FS is a literal string (like in sed) used as the field separator. You can use any string you like. The escapes \t and \" are recognized and replaced by a tab and a double quote character. L2asm ignores this variable and always assumes "\t".
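The escape handling described for FS can be sketched in Python as follows. This is only an illustration: decode_fs is a hypothetical helper, not a function of l2disasm, and it handles only the two escapes named above.

```python
def decode_fs(literal: str) -> str:
    """Expand the two escapes documented for FS: \\t and \\"."""
    # Any other backslash sequence is left untouched (assumption).
    return literal.replace("\\t", "\t").replace('\\"', '"')


if __name__ == "__main__":
    # The default separator, written in a DDF file as "\t".
    print(repr(decode_fs("\\t")))
```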

RECCNT = OFF;

RECCNT specifies the number of records (rows). Nearly all dat files, except a very few (e.g. chargrp.dat), store this count themselves. If you specify it, l2disasm will read exactly RECCNT rows and assume that the counter normally stored in the dat file is not present. L2asm will not write a counter.

MTXCNT_OUT = YES;

This boolean variable controls output of counters in MTX and MTX2 pairs. L2asm silently assumes it to be YES.

MATCNT_OUT = YES;

Same as above, but for MAT fields. L2asm silently assumes it to be YES.

ORD_IGNORE = NO;

Controls whether l2disasm should ignore per-field ORD properties. L2asm always ignores per-field ORD properties, as well as this global variable.

HEADER = YES;

For l2disasm - controls whether the first row is a line with tag names. For l2asm - whether to skip that line. The command line -q option always overrides this variable.

MAGIC = 0;

Number of unusual dwords appearing at the end of a file. No dat file currently needs it.



The main body of a DDF file could look like this:


{
        UINT ID;
        UINT val1;
        UINT val_enb;
        UINT cnt;
        ASCF str1;
                ORD = 1;
        UINT val2[10];
                SKIPIF = [(1, 3), (4 .. 8, -4 .. -2)];
        ASCF str2;
        ASCF str3;
        CNTR cntc;
        UINT tabc[cntc];
        CHAR c;
        UINT val3[val1];
                SOFT = 4;
        MTX tab[cnt];
                SOFT = 4;
                SOFTM = 2;
                SOFTT = 3;
                ENBBY = [(val_enb,1)];
                ENBBY = [(val_enb,2),(val1,3)];
        FILLER void{50};
        MTX2 arr;
}

Each field consists of a type and an identifier. The identifier must not start with a digit. It can contain anything except:

whitespace, '[', ']', '(', ')', '{', '}', '=', ',' and ';'.

Type must be one of the following (in round brackets - C's counterparts): UINT (uint32_t), INT (int32_t), UWORD (uint16_t), WORD (int16_t), UCHAR (uint8_t), CHAR (int8_t), HEX (uint32_t printed as hex), CHEX (uint8_t printed as hex), FLOAT (float), UNICODE, ASCF, MAT, MTX, MTX2, FILLER, CNTR.

ASCF is a specific half-ASCII/half-Unicode format commonly used in dat files. UNICODE is plain Unicode with an int size in front of the string. Both are written as UTF-8, unless forced otherwise.

CNTR is a special "packed" counter, 1 to n bytes long (a similar counter is used internally by ASCF).

The remaining composite types can be described as follows (as you can see, MTX is easy to emulate with basic fields, but the others are not):

MTX {
	INT cntm;
	UNICODE mesh[cntm];
	INT cntt;
	UNICODE text[cntt];
}

MTX2 {
	INT cntm;
	{
		UNICODE mesh;
		UINT val1;
		UCHAR val2;
	} submtx2[cntm];
	INT cntt;
	UNICODE text[cntt];
}

MAT {
	INT cnt;
	{
		INT id;
		INT val;
	} mats[cnt];
}
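To make the MAT layout above concrete, here is a minimal Python sketch that decodes one MAT field from raw bytes. It assumes little-endian 32-bit integers with no padding, which this document does not state; read_mat is a hypothetical helper and not part of l2disasm.

```python
import struct


def read_mat(buf: bytes, offset: int = 0):
    """Decode one MAT field: INT cnt, then cnt pairs of (INT id, INT val).

    Returns the list of pairs and the offset just past the field.
    Byte order is assumed to be little-endian.
    """
    (cnt,) = struct.unpack_from("<i", buf, offset)
    offset += 4
    mats = []
    for _ in range(cnt):
        mat_id, val = struct.unpack_from("<ii", buf, offset)
        mats.append((mat_id, val))
        offset += 8
    return mats, offset


if __name__ == "__main__":
    sample = struct.pack("<iiiii", 2, 10, 100, 11, 200)
    print(read_mat(sample))
```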

Square brackets denote a table - its size is given either by a literal number (we'll call it a static table), or by the name of another field (we'll call it a dynamic table). In the latter case, that field must be of a numeric type.

FILLER uses curly braces to distinguish itself from regular tables. It fills {cnt} bytes with a predefined value and occupies only one column in a decoded file. FILLER cannot be a table - so, for example, FILLER dat{10}[10] is not allowed.



Every field may also have certain properties:

ENBBY controls which other fields enable the current one. This is primarily a feature for weapongrp.dat, which has more fields for 2H weapons. Pairs inside one ENBBY line must all match, while separate ENBBY lines are alternatives. In the example above, the MTX dynamic table 'tab' will be read from the dat file only if ( val_enb == 1 || ( val_enb == 2 && val1 == 3 ) ).
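The ENBBY condition can be sketched in Python as an OR over lines, each of which is an AND over (field, value) pairs. The names is_enabled, rules and record are hypothetical, and treating a field without any ENBBY line as always enabled is an assumption.

```python
def is_enabled(rules, record):
    """rules: one list of (field_name, value) pairs per ENBBY line.

    A field is enabled when every pair of at least one line matches.
    """
    if not rules:
        return True  # assumption: no ENBBY means always enabled
    return any(
        all(record[name] == value for name, value in line)
        for line in rules
    )


if __name__ == "__main__":
    # The two ENBBY lines attached to 'tab' in the example above.
    rules = [[("val_enb", 1)], [("val_enb", 2), ("val1", 3)]]
    print(is_enabled(rules, {"val_enb": 2, "val1": 3}))
```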

ORD controls the output order. ORD = OFF makes l2disasm omit the whole column. Any other integer sets the relative order. Fields without ORD are written afterwards, in their default order. In the example above the first field will be ASCF str1, and the rest will follow in their default order. The ORD_IGNORE variable mentioned earlier can disable the custom order globally. L2asm ignores both the per-field ORD properties and the global variable.
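A rough Python sketch of the column ordering described above. The function output_order is hypothetical; breaking ties between equal ORD values by declaration order, and having ORD_IGNORE also cancel ORD = OFF, are assumptions rather than documented behaviour.

```python
OFF = -1


def output_order(fields, ord_ignore=False):
    """fields: list of (name, ord) in declaration order; ord is None if unset."""
    if ord_ignore:
        # Assumption: ignoring ORD also ignores ORD = OFF.
        return [name for name, _ in fields]
    # Fields with an explicit ORD come first, sorted by that value.
    explicit = sorted(
        (o, i, name)
        for i, (name, o) in enumerate(fields)
        if o is not None and o != OFF
    )
    # Fields without ORD follow in their default order; OFF fields are dropped.
    rest = [name for name, o in fields if o is None]
    return [name for _, _, name in explicit] + rest


if __name__ == "__main__":
    fields = [("ID", None), ("val1", None), ("str1", 1), ("val2", OFF)]
    print(output_order(fields))
```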

SOFT, SOFTM, SOFTT - these properties control how many columns are printed when a dynamic table is used. SOFT is for regular fields like UINT val3[val1] in the example above. SOFTM and SOFTT control the internals of MTX and MTX2. SOFTM also controls MAT.

L2disasm updates these properties in its first pass if they are not specified or are too low. L2asm REQUIRES these properties to be set on all MTX, MTX2, MAT and dynamic table fields. To generate DDF files for l2asm easily, please use l2disasm's -e option.

SKIPIF is used to skip printing some of a column's data, depending on the values. Consider this fragment of the example above:

        UINT val2[10];
                SKIPIF = [(1, 3), (4 .. 8, -4 .. -2)];

In this case, certain values of val2 will be printed as empty strings:

val2[1] if it equals 3
val2[4] to val2[8] if the value of the given column is in [-4 .. -2] inclusive.

You can use float ranges for float fields, and integer ranges for integer-like fields. Specifying a single number is the same as specifying the range 'number .. number'. Don't forget the spaces around '..' (otherwise the range will be mistaken for an invalid float number).
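The SKIPIF rule can be sketched in Python like this. Each rule is written as an (index range, value range) pair with inclusive bounds; should_skip and render are hypothetical names, and zero-based table indices are an assumption.

```python
def should_skip(rules, index, value):
    """rules: list of ((idx_lo, idx_hi), (val_lo, val_hi)), bounds inclusive."""
    return any(
        idx_lo <= index <= idx_hi and val_lo <= value <= val_hi
        for (idx_lo, idx_hi), (val_lo, val_hi) in rules
    )


def render(values, rules):
    """Return the printed columns; skipped values become empty strings."""
    return [
        "" if should_skip(rules, i, v) else str(v)
        for i, v in enumerate(values)
    ]


if __name__ == "__main__":
    # SKIPIF = [(1, 3), (4 .. 8, -4 .. -2)];
    rules = [((1, 1), (3, 3)), ((4, 8), (-4, -2))]
    print(render([3, 3, 3, 3, -3, -5, -2, 0, -4, -3], rules))
```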



------------------------------------------------------------------------
2) Character conversions

DAT files have two types of strings:

 - plain UCS-2LE unicode
 - hybrid 8-bit / UCS-2LE

L2disasm always saves them as UTF-8.

If the -l flag is specified, it forces behaviour analogous to that of older versions. In that case, strings are saved as almost-dumb conversions from UCS-2LE to ASCII (some basic transliterations are performed), or 1:1 from ASCII to ASCII. This will likely produce invalid UTF-8, so remember to pass -l to l2asm as well.

Regardless of the mode of operation, backslashes, tabs, NULs, CRs and LFs are saved as \\, \t, \0, \r and \n respectively.
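A minimal Python sketch of this escaping, assuming it is applied character by character (so a backslash produced by one replacement is never escaped again). escape_field is a hypothetical helper, not the tools' actual code.

```python
# Mapping taken from the list above: backslash, tab, NUL, CR, LF.
_ESCAPES = {
    "\\": "\\\\",
    "\t": "\\t",
    "\0": "\\0",
    "\r": "\\r",
    "\n": "\\n",
}


def escape_field(text: str) -> str:
    """Escape the five special characters; everything else passes through."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(escape_field("a\tb\\c\n"))
```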

ASCFs are saved as a,<string> or u,<string>. The initial code letter is a hint telling l2asm how the string should be encoded later.

Another available flag is -f, which forces all ASCF strings to be saved with the 'a' hint (l2disasm), or forces encoding as 8-bit regardless of the hint (l2asm). Note, though, that if you operate in non-legacy mode, l2asm will usually fail when forced to translate complex charsets (kanji, etc.) to plain 8-bit.

Generally, -f should be used together with -l, to get results like those of pre-1.05 versions.

Finally, -a lets you choose how 8-bit characters should be interpreted. The default is ISO-8859-1, if nothing is specified.


------------------------------------------------------------------------
3) Invoking


l2disasm <-d ddf_file> [-q] [-l] [-f] [-a chartab] [-e export] input_file output_file

-d is mandatory, and so are the input and output files

-e is optional, and writes a beautified DDF file with automatically updated properties (particularly useful for the SOFT* properties that l2asm requires).

-q overrides the HEADER variable, and suppresses printing of the header line

-l forces 'dumb' translation with basic transliterations

-f forces saving of all ASCFs with the 'a' hint

-a <chartab> lets you select how 8-bit chars should be interpreted - defaults to ISO-8859-1



l2asm <-d ddf_file> [-q] [-l] [-f] [-a chartab] input_file output_file

-q overrides the HEADER variable - l2asm will assume the header line is not present

-l forces 'dumb' translation with basic transliterations

-f forces encoding of all ASCF strings as 8-bit

-a <chartab> lets you select how 8-bit chars should be interpreted - defaults to ISO-8859-1
