Saturday, May 25, 2019

Loading Varying Column Names

Imagine you have a number of text files to load; for example extract files from different regions.  The files are similar but have slight differences in field name spelling.   For example the US-English files use “Address” for a field name, the German file “Adresse” to represent the same field and the Spanish file “Dirección”.
We want to harmonize these different spellings so we have a single field in our final loaded table.  While we could code up individual load statements with “as xxx” clause to handle the rename, that approach could be difficult to maintain with many variations.  Ideally we want to load all files in a single load statement and describe any differences in a clear structure.  That’s where ALIAS is useful.  Before we load the files, use a set of ALIAS statements only for the fields we need to rename.
ALIAS Adresse as Address;
ALIAS Dirección as Address;
ALIAS Estado as Status;
The ALIAS will apply the equivalent “as” clause to those fields if found in a Load.
We can now load the files using wildcard “*” for both the fieldlist and the filename:
Clients:
LOAD *
FROM addr*.csv (ansi, txt, delimiter is ',', embedded labels, msq)
;
What if the files have some extra fields picked up by “LOAD *” that we don’t want?  It’s also possible that the files have different numbers of fields in which case automatic concatenation won’t work.  We would get some number of “Client-n” tables which is incorrect.
First we will add the Concatenate keyword to force all files to be loaded into a single table.   As the table doesn’t exist, the script will error with “table not found” unless we are clever.  Here is my workaround for that problem.
Clients:
LOAD 0 as DummyField AutoGenerate 0;
Concatenate (Clients)
LOAD *
FROM addr*.csv (ansi, txt, delimiter is ',', embedded labels, msq)
;
DROP Field DummyField;
Now let’s get rid of those extra fields we don’t want.  First build a mapping list of the fields we want to keep.
MapFieldsToKeep:
 Mapping
 LOAD *, 1 Inline [
 Fieldname
 Address
 Status
 Client
 ]
 ;
I’ll use a loop to DROP fields that are not in our “keep list”.
For idx = NoOfFields('Clients') to 1 step -1
  let vFieldName = FieldName($(idx), 'Clients');
  if not ApplyMap('MapFieldsToKeep', '$(vFieldName)',   0) THEN
    Drop Field [$(vFieldName)] From Clients;
EndIf
Next idx
The final “Clients” table contains only the fields we want, with consistent fieldnames.

No comments:

Post a Comment