Tuesday, May 21, 2019

LOADING MULTIPLE EXCEL SHEETS




Load from Excel is usually pretty straightforward, but sometimes you’ll need to load multiple sheets and make some determinations at runtime. Details such as sheetnames may not be known at script creation time. The QV statements “SQLTables” and “SQLColumns” may be used to discover information about the sheets and columns available in a workbook. Both of these statements require an ODBC connection. The ODBC connection may also be used to subsequently read the data, but I find using the LOAD biff more convenient. First make a OLEDB connection to the workbook:

CONNECT TO [Provider=Microsoft.Jet.OLEDB.4.0;Data Source=workbook.xls;Extended Properties=”Excel 8.0;”];

 Specify the workbook name, relative to the current directory, in the “Data Source=” parameter. This example uses a “DSN-less” connection. It does not require you to predefine an ODBC datasource. The SQLTables statement return a set of fields describing the tables in the currently connected ODBC datasource, in this case the workbook. A “Table” is an Excel Sheet. tables: SQLtables; Now I’ve got a list of sheets in the QV “tables” table. The field name that contains the sheetname is “TABLE_NAME”. I’ll loop through the set of TABLE_NAME values and load each one using a standard biff LOAD.
 FOR i = 0 to NoOfRows(‘tables’)-1 LET sheetName = purgeChar(peek(‘TABLE_NAME’, i, ‘tables’), chr(39)); Sales: LOAD * FROM workbook.xls (biff, embedded labels, table is [$(sheetName)]); NEXT Sheetnames that contain blanks will be surrounded by single quotes. The purgeChar() function above removes any single quotes that may be present in the sheetname. What if I only want to load those sheets names whose name begins with “Sales”? Wrap the LOAD statement in an IF statement to test the sheetname: IF wildmatch(‘$(sheetName)’, ‘Sales*’) THEN LOAD ….. END IF How about this case? I want to load any sheet that contains the three columns “Sales”, “Year” and “Quarter”: columns: SQLColumns; // Get list of columns // Join list with columns of interest RIGHT JOIN (columns) LOAD *; LOAD * INLINE [ COLUMN_NAME Quarter Sales Year ] ; // Create a count of how many columns of interest each sheet has selectSheets: LOAD TABLE_NAME as SheetName, count(*) as count RESIDENT columns GROUP BY TABLE_NAME ; // Keep only the SheetName that have all 3 columns RIGHT JOIN LOAD SheetName RESIDENT selectSheets WHERE count = 3 // Load the selected sheets FOR i = 0 to NoOfRows(‘selectSheets’)-1 LET sheetName = purgeChar(peek(‘SheetName’, i, ‘selectSheets’), chr(39)); LOAD…. NEXT You may wonder if you could use the Excel Driver instead of the Jet provider like this: CONNECT TO [Provider=MSDASQL;Driver={Microsoft Excel Driver (*.xls)};DBQ=workbook.xls]; The connection will complete and you can use this connection for SQL SELECTs. However, when SQLTables is called, the connection will enumerate tables/columns for all the *.xls files in the current directory. This provider uses the parameter “DefaultDir=” (default is .) to control which directory is enumerated for SQLTables and SQLColumns calls. The DBQ parm plays no part. You may find this useful as an alternative to using a traditional “for each filelist…” loop to process multiple files.

No comments:

Post a Comment