Friday, September 12, 2008

Setting Up TPC-H: Part 1

TPC-H is the data warehouse benchmark of the Transaction Processing Performance Council (their web site has lots of results submitted by vendors trying to display the prowess of their hardware and/or software). Like all benchmarks, TPC-H is not perfect: it's not even a star schema, so it doesn't really represent the vast majority of real data warehouses, and comparing results between systems, groups, companies etc. is fraught with difficulty and complication. But we're gonna do it anyways! Just remember all the usual benchmark caveats.

Download and untar the files from the TPC-H web site, then copy makefile.suite to makefile and edit the four lines that specify the compiler on your system (CC), the database, the machine and the workload:

CC = gcc
DATABASE= ORACLE
MACHINE = LINUX
WORKLOAD = TPCH
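If you'd rather script the copy-and-edit step than do it by hand, something like the following should work. Treat it as a rough sketch: it assumes the four variables each start at the beginning of a line in makefile.suite, and the stand-in file it creates when no makefile.suite is present is purely hypothetical (it only exists so the snippet can be tried outside the kit).

```shell
#!/bin/sh
# Sketch: run this from the directory where the TPC-H kit was untarred.

# Hypothetical stand-in so the snippet runs anywhere; with the real kit
# makefile.suite already exists and this block does nothing.
if [ ! -f makefile.suite ]; then
  printf 'CC      =\nDATABASE=\nMACHINE =\nWORKLOAD =\n' > makefile.suite
fi

# Work on a copy so the original makefile.suite stays untouched.
cp makefile.suite makefile

# Rewrite the four settings in place (a backup is kept as makefile.bak).
sed -i.bak \
  -e 's/^CC[[:space:]]*=.*/CC = gcc/' \
  -e 's/^DATABASE[[:space:]]*=.*/DATABASE= ORACLE/' \
  -e 's/^MACHINE[[:space:]]*=.*/MACHINE = LINUX/' \
  -e 's/^WORKLOAD[[:space:]]*=.*/WORKLOAD = TPCH/' \
  makefile

# Show the result so the four lines can be checked by eye.
grep -E '^(CC|DATABASE|MACHINE|WORKLOAD)[[:space:]]*=' makefile
```

Editing the file by hand is just as good; the script is only handy if you rebuild the kit often.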

When setting the database you'll notice that there is no predefined type for Oracle. Huh? The company that has 45% of the RDBMS market is not listed here? Either Oracle requested this or the TPC guys are extremely biased in favor of IBM or Microsoft (their web site does use ASP.NET :-). Because of this we need to define an Oracle section ourselves. Edit tpcd.h and add a section for Oracle, with all variables defined as empty strings (this is the simplest setup that works):

#ifdef ORACLE
#define GEN_QUERY_PLAN ""
#define START_TRAN ""
#define END_TRAN ""
#define SET_OUTPUT ""
#define SET_ROWCOUNT ""
#define SET_DBASE ""
#endif /* ORACLE */

Then just type make to compile. This generates two executables, dbgen and qgen, which are used, respectively, to generate the flat files for loading into the database and the queries to run. See the README for the gory details.
