Saturday, May 25, 2019

Loading Variables from another QVW

Sometimes you encounter a need for an ad hoc import or export of variables. 

Here is code to load variables from another QVW.
VariableDescription:
LOAD 
 Name,
 RawValue
FROM [..\..\data\StudentFile.qvw] 
(XmlSimple, Table is [DocumentSummary/VariableDescription])
Where IsConfig = 'false' and IsReserved = 'false' // Exclude system vars
// Any additional filtering here
;

FOR idx=0 to NoOfRows('VariableDescription')-1
 LET vVarname = Peek('Name',$(idx),'VariableDescription');
 LET [$(vVarname)] = Peek('RawValue',$(idx),'VariableDescription');
NEXT idx

SET idx=;
SET vVarname=;
DROP Table VariableDescription;
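
For the export side mentioned above, one option is to STORE the table before the DROP. A minimal sketch, assuming a hypothetical output path:

// Ad hoc export of the variable list; place this before the DROP Table
STORE VariableDescription INTO [..\..\data\VariableExport.qvd] (qvd);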

Loading Varying Column Names

Imagine you have a number of text files to load; for example, extract files from different regions. The files are similar but have slight differences in field name spelling. For example, the US-English files use “Address” for a field name, the German files use “Adresse” for the same field, and the Spanish files use “Dirección”.
We want to harmonize these different spellings so we have a single field in our final loaded table. While we could code individual load statements with an “as xxx” clause to handle the renaming, that approach could be difficult to maintain with many variations. Ideally we want to load all files in a single load statement and describe any differences in a clear structure. That’s where ALIAS is useful. Before we load the files, we use a set of ALIAS statements covering only the fields we need to rename.
ALIAS Adresse as Address;
ALIAS Dirección as Address;
ALIAS Estado as Status;
The ALIAS will apply the equivalent “as” clause to those fields if found in a Load.
We can now load the files using wildcard “*” for both the fieldlist and the filename:
Clients:
LOAD *
FROM addr*.csv (ansi, txt, delimiter is ',', embedded labels, msq)
;
What if the files have some extra fields picked up by “LOAD *” that we don’t want? It’s also possible that the files have different numbers of fields, in which case automatic concatenation won’t work. We would get some number of “Clients-n” tables, which is incorrect.
First we will add the Concatenate keyword to force all files to be loaded into a single table.   As the table doesn’t exist, the script will error with “table not found” unless we are clever.  Here is my workaround for that problem.
Clients:
LOAD 0 as DummyField AutoGenerate 0;
Concatenate (Clients)
LOAD *
FROM addr*.csv (ansi, txt, delimiter is ',', embedded labels, msq)
;
DROP Field DummyField;
Now let’s get rid of those extra fields we don’t want.  First build a mapping list of the fields we want to keep.
MapFieldsToKeep:
Mapping LOAD *, 1 Inline [
Fieldname
Address
Status
Client
];
I’ll use a loop to DROP fields that are not in our “keep list”.
For idx = NoOfFields('Clients') to 1 step -1
  LET vFieldName = FieldName($(idx), 'Clients');
  IF not ApplyMap('MapFieldsToKeep', '$(vFieldName)', 0) THEN
    DROP Field [$(vFieldName)] From Clients;
  ENDIF
Next idx
The final “Clients” table contains only the fields we want, with consistent fieldnames.

Tuesday, May 21, 2019

QVD Questions & Answers



Q: Does the QVD data get stored in an RDBMS like Oracle, or is it in a file system?
QVD files are stored in the file system.
Q: What is the compression factor for QVDs?
QVD files are stored uncompressed. A QVD contains the physical representation of an in-memory QlikView table. This “RAM image” format is what allows an optimized QVD load to be so quick: the physical blocks of disk are read directly into RAM, “ready to go”. Because the QVD is the RAM image, there is no compression.
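To illustrate, a sketch with a hypothetical sales.qvd: a plain load from a QVD is optimized, while most WHERE clauses or expressions drop the load to un-optimized.

// Optimized: disk blocks are read straight into RAM
LOAD * FROM sales.qvd (qvd);

// Un-optimized: the WHERE clause forces row-by-row processing
LOAD * FROM sales.qvd (qvd) WHERE Year = 2019;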
Q: Can we trace a QVD back to its source?
As of QV10SR2, the XML header in a QVD file contains the name of the QVW that created the QVD as well as file sources and database connections/SQL statements.
Q: Why is sorting not possible while loading a QVD?
Sorting (ORDER BY) is only possible with a Resident (already in memory) table. Sorting is not possible when reading from files.
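If you need a sorted result, load the QVD first and then sort with a Resident load. A minimal sketch, assuming a hypothetical transactions.qvd with a TransDate field:

Temp:
LOAD * FROM transactions.qvd (qvd);

Sorted:
NoConcatenate
LOAD * RESIDENT Temp
ORDER BY TransDate;  // allowed here because the table is in memory

DROP TABLE Temp;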
Q: Could you go over again the concept of “forcing” an un-optimized load for the MAPPING function, with respect to the QVD?
MAPPING tables may be loaded from a QVD, but it must be an un-optimized load (this is sometimes called “unwrapping”).
MyMap:
MAPPING LOAD F1, F2 FROM sometable.qvd (qvd);
The above mapping table will be created but it will appear to be empty when used in MAP USING or ApplyMap().  No error, just no resulting mapping.  One workaround is to create a condition that will cause an un-optimized load.  We want all the rows, so we create an always-true condition that will return all rows.
MyMap:
MAPPING LOAD F1, F2 FROM sometable.qvd (qvd)
WHERE 1=1;

Note: In QV10+, the MAPPING prefix will trigger an unoptimized load. The 1=1 trick is not necessary.
A corollary to this is that the target of a mapping operation cannot be an optimized QVD.
MAP Country USING MyMap;
// Optimized load, Country will not get mapped.
LOAD Customer, Country FROM customer.qvd (qvd);
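The same WHERE 1=1 trick can be used to force an un-optimized load of the target table so the mapping applies:

MAP Country USING MyMap;
// Un-optimized load, Country will get mapped.
LOAD Customer, Country FROM customer.qvd (qvd)
WHERE 1=1;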
Q: Is the use of WHERE 1=1 something that will keep working for mapping fields in the future, or is it possible that QlikView will determine that WHERE 1=1 allows an optimized load?
Good question. We use WHERE 1=1 to force the un-optimized load required by MAPPING LOAD. I’m hopeful that if QlikView were changed to treat 1=1 as an optimized load, it would also recognize that MAPPING LOAD should be non-optimized.
(Note: In QV10, MAPPING LOAD is automatically non-optimized).
Q: How would you handle the need to load multiple models (i.e., multiple QVWs)? I don’t think you can do multiple binary loads, so what do you recommend?
You can generate QVDs from each model and then load all the QVDs to form the larger model. You can generate all the QVDs from a QVW with a simple loop. Add this code to each of your model QVWs.
FOR i = 1 to NoOfTables()
  LET vTableName = TableName($(i)-1);
  LET vOutfile = '$(vTableName).qvd';
  STORE [$(vTableName)] INTO [$(vOutfile)] (qvd);
NEXT i
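In the combined model, load the generated QVDs back in. A sketch, using hypothetical table names:

// Each QVD becomes one table in the larger model
Orders:
LOAD * FROM Orders.qvd (qvd);
Customers:
LOAD * FROM Customers.qvd (qvd);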
Q: How are QVD refreshes scheduled?
QVDs are created by script in a QVW executed by the reload process. Schedule the reload as you would the reload of a user-facing QVW, using the QlikView Enterprise Management Console (QEMC) or a batch file.
Q: Is an optimized QVD load really worthwhile, since it is fairly limited? In other words, should we load the data into memory, striving for an optimized QVD load, and then work with the in-memory tables within the script?
Optimized vs non-optimized load has two impacts: load duration and Server RAM usage. If your application is relatively small, or you have no concerns about those impacts, don’t spend time trying to maintain an optimized load. Some of the script techniques used to maintain an optimized load can make your script harder to follow.
If, for a given document, you have concerns about load duration or RAM usage, then making the effort to maintain an optimized load would be worthwhile.
Q: Can a QVD be accessed from an AS400 DB2 database to get some data?
Nothing but QlikView can read QVDs, so no, DB2 cannot read directly from a QVD.
In the same script that creates the QVD with the store statement:
STORE mytable INTO mytable.qvd (qvd);
You can also create a CSV copy for other consumers:
STORE mytable INTO mytable.csv (txt); 
The csv file can be read by any number of programs, including an ODBC text driver or a bulk database loader. You can use QV to do the ETL and then push csv files back into a Data Warehouse, using something like SQL Server DTS or other data pump.
Q: I’m pulling data from a database over a slow WAN link. Would using a QVD speed this up? If so, would the QVD file reside on the same side as the database or at the other end of the WAN link (client side)?
Using QVDs could speed up your overall process by allowing multiple reloads to load from the qvd instead of going to the database over the slow WAN link. The QVD should live at the client end of the link – where the qvw is reloading.
Q:  If the data source is constantly changing (such as portfolio management software) can we refresh qvds frequently? Will this overburden the process?
QVDs may be refreshed frequently. Exactly how frequently depends on your data volumes and architecture. Refreshing every 30 minutes is common, and I have seen intervals of 5 minutes.  Frequent refresh of large volumes usually requires incremental load, which is covered in the Reference Guide and the Forums.
Q: If I add the BUFFER prefix before each load statement pulling from a DBMS, does the first execution pull from the DB while all subsequent executions are incremental loads pulling from the file-system batch of QVDs?
The BUFFER prefix does not provide incremental load when loading from a DBMS.  Subsequent reloads will load from the buffered file system QVD, but new rows will not be fetched from the database.
When used with a load from txt files, BUFFER will provide automatic incremental load. Subsequent reloads will add new data from the file to the buffered QVD.
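A sketch of both cases, with a hypothetical orders table and log.txt file (note the incremental option on the text-file form, which is what enables the append behavior):

// DBMS case: later reloads read only the buffered QVD;
// new database rows are NOT fetched.
BUFFER SELECT * FROM orders;

// Text-file case: rows appended to log.txt since the last reload
// are added to the buffered QVD.
BUFFER (incremental) LOAD * FROM log.txt (txt, utf8, embedded labels, delimiter is ',');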
Q: I am running SBE Server so documents are reloaded right from the Documents folder.  What is your recommendation for location of the QVD generator documents?  In other words, do you place them in the Documents folder alongside your production QVW’s? 
I recommend putting the QVD generators in a separate “Loaders” folder. Make this a mounted folder in QVS and schedule reloads as needed. Use NTFS permissions to hide the folder from standard AP users.
A number of questions were asked about the QVX format. I haven’t had much experience with QVX yet. Rob Patterson has indicated he will schedule a QlikLearn webinar specifically on the topic of QVX.
Q: Does QVX also have two types of load, optimized and not optimized?
No, optimized load only applies to QVD.
Q: What are the other differences between QVD and QVX?
QVD is a proprietary file format provided by QlikView for storage. Only QlikView software can read and write to QVD files.
QVX is an open file format for storage of QlikView data. A customer or third party can create QVX files on any platform, without needing QlikView software.
QVD files will typically load faster than a QVX file.
Q: Is QVX used as a source for other systems, or is it used to pull data from source systems that have no ODBC provider?
The use cases for QVX are still being discovered, and I’m sure we’ll see some interesting uses.  The scenario I currently understand is to provide data to Qlikview when there is no ODBC provider.
Q: How can I create a QVX?
Q: How do you write out to a QVX?
Q: How can you read a QVX from software other than QlikView?
Documentation of the internal QVX format is available in the Qlikview SDK. The SDK can be installed from the Qlikview Server installation package. Also look for examples in the “Share Qlikviews” section of QlikCommunity.
You can also create a QVX with a script STORE statement:
STORE mytable INTO mytable.qvx (qvx);
This is useful to generate a sample QVX for examination or testing.

INCREMENTAL LOAD USING SQL SERVER “TIMESTAMP” DATA TYPE




Incremental Load (extracting only new or changed rows from the database) requires a table column that identifies when a row has been updated. This is usually a datetime column like “LastUpdate”. The script extracts the max timestamp from the existing QVD and uses it to create a select predicate like:
WHERE LastUpdate >= '01-20-2010 13:55:01'
If the database is Microsoft SQL Server, you may have another option for identifying updated rows. A table column of type “timestamp” is incremented whenever a row is changed. The name “timestamp” is somewhat confusing, as this column does not contain time values. Here is the description from the SQL Server 2005 reference:
“timestamp is a data type that exposes automatically generated, unique binary numbers within a database. timestamp is generally used as a mechanism for version-stamping table rows. The storage size is 8 bytes. The timestamp data type is just an incrementing number and does not preserve a date or a time. To record a date or time, use a datetime data type.”

In SQL Server 2008, the type “rowversion” was introduced as an alias for “timestamp”. Rowversion is preferred for 2008 and forward.

If your table does not already have a timestamp column, you can add one with an ALTER TABLE DDL statement. For example, to add a timestamp column named “Revision” to a table named “Orders”:

alter table Orders ADD [Revision] [timestamp] NOT NULL;

The Revision column will be returned by default as a hexadecimal string, so it’s easiest to convert to an integer before storing in QV. 

SQL SELECT 
    OrderID, etc, 
    cast(Revision as bigint) as Revision
FROM Orders …
In subsequent incremental loads, add a predicate to select only rows greater than the last revision. 
// Select the max value from the qvd
tempmax:
LOAD max(Revision) as Revision FROM myqvd.qvd (qvd);
LET vMaxRevision = PEEK('Revision');
DROP TABLE tempmax;
SQL SELECT
  OrderID, etc,
  cast(Revision as bigint) as Revision
FROM Orders
WHERE Revision > $(vMaxRevision);
I find the timestamp value, when available, to be easier to use than a datetime column. It’s just a numeric, so no literal formatting is required. 

Because it’s a precise and unique value, you avoid the “always one row” problem. When selecting from a datetime, you usually have to specify “>=” because a datetime is not a unique value. This means that a select will return at least one row, even if there were no real updates. 
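For comparison, a minimal sketch of the datetime variant (hypothetical orders.qvd with OrderID and LastUpdate fields). Note the format literal and the Exists() de-duplication that “>=” makes necessary:

// Select the max datetime from the qvd; a formatted literal is required
tempmax:
LOAD max(LastUpdate) as LastUpdate FROM orders.qvd (qvd);
LET vMaxUpdate = Date(PEEK('LastUpdate'), 'MM-DD-YYYY hh:mm:ss');
DROP TABLE tempmax;

Updates:
SQL SELECT OrderID, etc,
  LastUpdate
FROM Orders
WHERE LastUpdate >= '$(vMaxUpdate)';

// ">=" re-reads the boundary row(s), so drop duplicates
// when concatenating the prior QVD contents.
CONCATENATE (Updates)
LOAD * FROM orders.qvd (qvd)
WHERE NOT EXISTS(OrderID);

STORE Updates INTO orders.qvd (qvd);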

LOADING MULTIPLE EXCEL SHEETS




Load from Excel is usually pretty straightforward, but sometimes you’ll need to load multiple sheets and make some determinations at runtime. Details such as sheetnames may not be known at script creation time. The QV statements “SQLTables” and “SQLColumns” may be used to discover information about the sheets and columns available in a workbook. Both of these statements require an ODBC connection. The ODBC connection may also be used to subsequently read the data, but I find using a LOAD with the biff format specifier more convenient. First make an OLEDB connection to the workbook:

CONNECT TO [Provider=Microsoft.Jet.OLEDB.4.0;Data Source=workbook.xls;Extended Properties="Excel 8.0;"];

Specify the workbook name, relative to the current directory, in the “Data Source=” parameter. This example uses a “DSN-less” connection; it does not require you to predefine an ODBC datasource. The SQLTables statement returns a set of fields describing the tables in the currently connected ODBC datasource, in this case the workbook. A “Table” is an Excel sheet.

tables:
SQLTables;

Now I’ve got a list of sheets in the QV “tables” table. The field name that contains the sheetname is “TABLE_NAME”. I’ll loop through the set of TABLE_NAME values and load each one using a standard biff LOAD.

FOR i = 0 to NoOfRows('tables')-1
  LET sheetName = purgeChar(peek('TABLE_NAME', $(i), 'tables'), chr(39));
  Sales:
  LOAD * FROM workbook.xls (biff, embedded labels, table is [$(sheetName)]);
NEXT

Sheetnames that contain blanks will be surrounded by single quotes. The purgeChar() function above removes any single quotes that may be present in the sheetname.

What if I only want to load those sheets whose name begins with “Sales”? Wrap the LOAD statement in an IF statement to test the sheetname:

IF wildmatch('$(sheetName)', 'Sales*') THEN
  LOAD …..
END IF

How about this case? I want to load any sheet that contains the three columns “Sales”, “Year” and “Quarter”:

columns:
SQLColumns;  // Get list of columns

// Join list with columns of interest
RIGHT JOIN (columns)
LOAD *;
LOAD * INLINE [
COLUMN_NAME
Quarter
Sales
Year
]
;

// Create a count of how many columns of interest each sheet has
selectSheets:
LOAD TABLE_NAME as SheetName, count(*) as count
RESIDENT columns
GROUP BY TABLE_NAME
;

// Keep only the SheetNames that have all 3 columns
RIGHT JOIN
LOAD SheetName
RESIDENT selectSheets
WHERE count = 3
;

// Load the selected sheets
FOR i = 0 to NoOfRows('selectSheets')-1
  LET sheetName = purgeChar(peek('SheetName', $(i), 'selectSheets'), chr(39));
  LOAD….
NEXT

You may wonder if you could use the Excel Driver instead of the Jet provider, like this:

CONNECT TO [Provider=MSDASQL;Driver={Microsoft Excel Driver (*.xls)};DBQ=workbook.xls];

The connection will complete and you can use this connection for SQL SELECTs. However, when SQLTables is called, the connection will enumerate tables/columns for all the *.xls files in the current directory. This provider uses the parameter “DefaultDir=” (default is .) to control which directory is enumerated for SQLTables and SQLColumns calls. The DBQ parm plays no part. You may find this useful as an alternative to using a traditional “for each filelist…” loop to process multiple files.