CmpMain.cpp

Using the language manager for UDF compiler interface

blueprint cmp-tmudf-compile-time-interface

This change includes new CLI calls, to be used in the compiler to

invoke routines. Right now, only trusted routines are supported,

executed in the same process as the caller, but in the future we may

extend this to isolated routines. Using a CLI call allows us to share

the language manager between compiler and executor, since language

manager resources such as the JVM and loaded DLLs exist only once per

process. This change is in preparation for Java UDFs.

Changes in a bit more detail:

- Added 4 new CLI calls to allocate a routine, invoke it, retrieve

updated invocation and plan infos and deallocate (put) the routine.

The CLI globals now have a C/C++ and a Java language manager that

is allocated on demand.

- The compiler no longer loads a DLL for the UDF compiler interface,

it uses the new CLI calls instead.

- DDL syntax is changed to allow TMUDFs in Java (not officially

supported, so don't use it quite yet).

- TMUDFs in C are no longer supported, only C++ and Java are.

Converted remaining TMUDF tests to C++.

- C++ TMUDFs now do a basic verification at DDL time, so errors

like missing entry points are detected earlier. Validation for

Java TMUDFs is also done through the CLI.

- Make sure we have no memory or resource leaks:

- CmpContext keeps track of UDF-related objects allocated on

system heap and in the CLI, cleaned up at the end of a statement

- CLI keeps a list of allocated trusted routines, cleaned up

when a CLI context is deallocated

- Using ExeCliInterface class to make the new CLI calls (4 new calls

added).

- Removed CmpCli class in the optimizer directory and converted

tracking compiler to use ExeCliInterface as well.

- Compile-time parameter values are no longer baked into the

UDRInvocationInfo. Instead, they are provided as an input row, the

same way as they are provided at runtime.

- Bug fixes in C++ UDR code, mostly related to serialization and

to multiple interactions with the UDF through serialized objects.

- Added more info to UDRInvocationInfo (SQL access type, etc.).

- Since there are multiple plans per invocation, each of which

can have multiple interactions with the UDF, plans need to be

numbered so the UDF side can tell them apart to attach the

right state (owned by the UDF) to it.

- The language manager needs some functions that are provided by

the process it's running in. Added those (empty, for now) functions

as cli/CliImplLmExtFunc.cpp.

- Added a new class for Java TMUDFs, LmRoutineJavaObj. Added methods

to allocate such routines and to load their class as well as to

create Java objects by invoking the default constructor through JNI.

- Java TMUDFs use the new UDR interface (to be provided by Suresh and

Pavani). In the language manager, the container is the class of

the UDF, the external path is the fully qualified jar name. The

Java method name is <init>, the default constructor, with signature

"()V". Some code changes were required to do this.

- Created a new directory trafodion/core/sql/src for Java sources in

the sql engine. Right now, only language manager java

sources are in this directory, but I am planning to move the other

java sources under sql in a future checkin. Suresh and Pavani

will add their UDF-related Java files there as well.

- Renamed the udr jar to trafodion-sql-<version>.jar, in anticipation

of combining all the sql Java sources into this jar.

- Created a maven project file trafodion/core/sql/pom.xml and

changed makefiles to invoke maven to build java sources.

- More work to separate new UDR interface from older SPInfo object,

so that we can get rid of SPInfo if/when we don't support the older

style anymore.

- Small fix to odb makefile, make clean failed when executed twice.
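
The routine lifecycle and leak protection described above (allocate a routine, invoke it, put it back, and have the owner clean up whatever is still allocated when it goes away) can be sketched roughly as follows. This is only an illustration under stated assumptions: RoutineRegistry, allocate, invoke, put and liveCount are hypothetical names, not the actual CLI calls, and the "routine" is a stand-in callable rather than a real language manager object.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a trusted routine that runs in the caller's process.
using Routine = std::function<std::string(const std::string&)>;

// Hypothetical registry playing the role of the list of allocated routines.
// It hands out integer handles and remembers every routine still allocated.
class RoutineRegistry {
public:
    // Allocate a routine and return its handle.
    int allocate(Routine r) {
        assert(static_cast<bool>(r));  // an empty callable cannot be invoked
        int handle = nextHandle_++;
        routines_.emplace(handle, std::move(r));
        return handle;
    }

    // Invoke an allocated routine; throws if the handle is unknown.
    std::string invoke(int handle, const std::string& input) const {
        auto it = routines_.find(handle);
        if (it == routines_.end())
            throw std::out_of_range("unknown routine handle");
        return it->second(input);
    }

    // Deallocate ("put") a routine; returns false if it was not allocated.
    bool put(int handle) { return routines_.erase(handle) == 1; }

    // Number of routines that are still allocated.
    std::size_t liveCount() const { return routines_.size(); }

    // No explicit destructor is needed: when the registry (the "context")
    // is destroyed, the map releases any routine the caller never put back.

private:
    int nextHandle_ = 1;
    std::map<int, Routine> routines_;
};
```

A caller would allocate once, invoke as often as needed, and put the handle back; anything forgotten is released together with the registry.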

Patch set 2: Adding a custom filter for test regress/udr/TEST108.

Change-Id: Ic827a42ac25505fb1ee451b79636c0f9349d8841

  1. … 98 more files in changeset.
metadata fixes and 'sqlmp' code cleanup

-- NATable struct for metadata was being created multiple

times whenever information for a new table was read

from metadata. That has been fixed.

-- an 'initialize trafodion, drop' followed by 'initialize traf'

from the same session was failing due to priv info not getting

reset. This would show up if 'initialize authorization' was

done earlier. That has been fixed.

-- code cleanup mostly related to sqlmp legacy code and reference.

Change-Id: I346e3f3bbc6c7784b38e7e2e1f11d487854c281c

  1. … 54 more files in changeset.
Add new PCode Expression Cache feature.

This new cache is maintained by the SQL Compiler. The purpose of this

cache is to avoid the fairly expensive logic involved in transforming

unoptimized PCode to optimized PCode and, where applicable, to also avoid

the logic involved in transforming optimized PCode to a Native

Expression. This cache is accessed ONLY by the SQL Compiler code.

NOTES:

* This is the second attempt to check in this code. The first attempt had to

be abandoned as other developers made changes which prevented

automatic merging.

* This code has been pre-reviewed by Justin, Qifan, Selva, Mike,

Ravisha, Suresh, and Dave B. Many thanks to them for various

suggestions. Most of those suggestions have been incorporated into

this delivery. A few are left for future improvements.

* There is one instance of this new cache per CmpContext.

* There are 5 new CQDs used to control this cache. To be effective for

all instances of the cache, these need to be set in the system

defaults table. The CQD command given to sqlci will affect only the

instance of the cache for the current CmpContext.

The 5 CQDs are:

PCODE_EXPR_CACHE_ENABLED - set to 0 to disable the cache. Default is 1

PCODE_EXPR_CACHE_SIZE - max size in bytes. Default is 2,000,000.

PCODE_EXPR_CACHE_CMP_ONLY - Compare Only mode - useful to QA and

Development only.

PCODE_EXPR_CACHE_DEBUG - set to 1 to enable debug mode. Default is 0

PCODE_DEBUG_LOGDIR - pathname of existing directory where debug log

files will be placed -- one log file per cache

instance. Log files are designed to be easily

imported into an Excel Spreadsheet. No default.

* Also included are a small number of changes to the Native Expressions

feature to (a) Use the new PCODE_DEBUG_LOGDIR cqd to specify where to

put the Native Expressions debug log files, (b) measure cpu-time

rather than wall-clock time for measuring how long it took to produce

a Native Expression, and (c) add a CQD named PCODE_NE_ENABLED so we

can easily disable the Native Expressions feature [though there is

currently no known reason for doing so.]
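
To illustrate the idea of a cache bounded by a maximum size in bytes (as PCODE_EXPR_CACHE_SIZE is described above), here is a minimal sketch. It is not the Trafodion implementation: ExprCache and its methods are hypothetical names, the key is assumed to be the unoptimized PCode bytes, the value the optimized bytes, and least-recently-used eviction is an assumption since the notes do not state the eviction policy.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical byte-bounded cache: key = unoptimized PCode bytes,
// value = optimized PCode bytes. LRU eviction is an assumption.
class ExprCache {
    using Entry = std::pair<std::string, std::string>;
    using Index = std::unordered_map<std::string, std::list<Entry>::iterator>;

public:
    explicit ExprCache(std::size_t maxBytes) : maxBytes_(maxBytes) {}

    // On a hit, copy the optimized form into 'out' and mark the entry as
    // most recently used. Returns false on a miss.
    bool lookup(const std::string& unoptimized, std::string& out) {
        auto it = index_.find(unoptimized);
        if (it == index_.end()) return false;
        entries_.splice(entries_.begin(), entries_, it->second);
        out = it->second->second;
        return true;
    }

    // Insert or replace an entry, evicting least recently used entries
    // until the new one fits. Entries larger than the cache are skipped.
    void insert(const std::string& unoptimized, const std::string& optimized) {
        std::size_t cost = unoptimized.size() + optimized.size();
        if (cost > maxBytes_) return;
        auto it = index_.find(unoptimized);
        if (it != index_.end()) removeEntry(it);
        while (usedBytes_ + cost > maxBytes_)
            removeEntry(index_.find(entries_.back().first));
        entries_.emplace_front(unoptimized, optimized);
        index_[unoptimized] = entries_.begin();
        usedBytes_ += cost;
    }

    std::size_t usedBytes() const { return usedBytes_; }

private:
    // Remove one entry from both the list and the index.
    void removeEntry(Index::iterator it) {
        usedBytes_ -= it->first.size() + it->second->second.size();
        entries_.erase(it->second);
        index_.erase(it);
    }

    std::size_t maxBytes_;
    std::size_t usedBytes_ = 0;
    std::list<Entry> entries_;  // front = most recently used
    Index index_;
};
```

With one such object per compiler context, a setting that changes the size of one instance would naturally leave the other instances untouched, which matches the per-CmpContext behavior noted above.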

Change-Id: I58f833f63099743ff6c1107acdff94fe8aef4b70

  1. … 14 more files in changeset.
OSIM (Optimizer Simulator) redesign 1.

Simulate query plan generation of a production cluster on a dev workstation

by collecting information from the production cluster and restoring it on the dev workstation.

--running on production clusters, collect table DDLs, statistics, and CQDs into osim-directory;

--the directory path can be either full (absolute) or relative.

osim capture location '<osim-directory>'[, force];

--run queries on the cluster

osim capture stop;

--restore DDLs, CQDs, statistics and cluster information.

osim load from '<osim-directory>';

--setup runtime information, like cpu number, node number.

osim simulate start|continue '<osim-directory>';

Change-Id: I30882e87a6ea0f08c9aa64685705eebebcbb3bf0

  1. … 38 more files in changeset.
Fixes for security gaps

Fix summary:

1389791 – Create table with 128 character-long schema & table names hangs on HortonWorks

fix 1 - Privilege checks not working for UDRs

fix 2 - QI not working when UDR's are involved

fix 3 - Routines are not being removed from NARoutineDB cache

Code cleanup

Miscellaneous changes

1389791: Create table with 128 character-long schema & table names hangs on HortonWorks

Check to make sure the total name length is not longer than supported value,

see: https://issues.apache.org/jira/browse/HDFS-6055

bin/SqlciErrors.txt - new error message

sqlcomp/CmpCatSqlErrorCodes.h - new error message

sqlcomp/CmpSeabaseDDLmd.h - new literal describing length of generated HBase name

sqlcomp/CmpSeabaseDDLcommon.cpp - new check for maximum HBase name length
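
The kind of check described for 1389791 can be sketched as below. This is a hedged illustration only: hbaseNameFits is a hypothetical helper, the "catalog.schema.object" layout of the generated name is an assumption, and the limit is passed in by the caller rather than taken from the actual literal in CmpSeabaseDDLmd.h.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true when the generated name
// "catalog.schema.object" (assumed layout) does not exceed maxLen.
bool hbaseNameFits(const std::string& catalog,
                   const std::string& schema,
                   const std::string& object,
                   std::size_t maxLen) {
    // The two extra characters account for the '.' separators.
    std::size_t total = catalog.size() + schema.size() + object.size() + 2;
    return total <= maxLen;
}
```

Two 128-character names plus a catalog name already exceed a limit of 255 characters, so a check of this kind would reject the DDL with an error instead of letting the create hang.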

fix 1: privilege checks are not working correctly for UDR's

The method RelRoot::checkPrivileges is called to verify privileges for all object types.

However, some UDR object checks were skipped because they were not added to the UDR Stoi list.

optimizer/BindItemExpr.cpp - add function to Stoi list

optimizer/BindRelExpr.cpp - add procedures to Stoi List

optimizer/RelMisc.h - signature changes for privilege related work

optimizer/BindRelExpr.cpp - rewrote checkPrivileges

optimizer/NARoutine.h/NARoutineDB.cpp - added method

moveRoutineToDeleteList

fix 2: QI is not working when UDR's are dropped

Code to drop items from NARoutineDB cache was missing.

Code to set security keys for the user in the plan was missing

Code to set objectUIDs in the plan was missing

When security keys were added, they were incorrect

sqlcomp/CmpMain.h (.cpp) - added calls to compare invalidation keys with objects stored in

NARoutineDB cache; if found, then remove item from cache by

calling helper methods in NARoutineDB class.

optimizer/NARoutineDB.h (NARoutine.cpp) - added helper method to remove entries from the cache

free_entries_with_QI_key - based off of similar method for table cache

ComSecurityKey.h (.cpp) - new method to check invalidation keys shared by tables/routines

qiCheckForInvalidObject

optimizer/NATable.cpp - rewrote table invalidation code so it could be shared with routines.

generator/GenUdr.cpp - add the routine's object UID to the query plan

sqlcomp/CmpSeabaseDDLroutine.cpp - code to send invalidation keys during drop routine

common/ComSmallDefs.h - new QI actions for USAGE and REFERENCES

common/ComDistribution.cpp - add EXECUTE as a privilege for QI, also added USAGE and REFERENCES

sqlcomp/PrivMgrPrivileges.cpp - not generating correct security keys

fix 3: Routines were not being removed from NARoutineDB cache

Added new fields to the various routine structures for objectOwnerID, schemaOwnerID, and privInfo.

Set up the correct routineID in various routine structures

At drop time, made sure routine was removed from NARoutineDB cache

comexe/ComTdb.h - added new fields to routine descriptor and TDB

generator/Generator.cpp - new fields for routines

optimizer/NARoutine.h (.cpp) - new fields for routines

removeNARoutine - based off similar method for table cache

optimizer/NARoutine.cpp - added new field to store privilege information in NARoutine,

which also gets security keys needed for query invalidation

sqlcat/desc.h - new fields for routines

sqlcomp/CmpSeabaseDDLtable.cpp - set up new values in NARoutine structure

sqlcomp/CmpSeabaseDDLroutine.cpp - code to remove entries from cache at drop time

Other changes:

sqlcomp/PrivMgrCommand.h (.cpp) - performance change, don't check authorization enabled

sqlcomp/PrivMgrMD.h (.cpp) - performance change, don't check authorization enabled

sqlcomp/PrivMgrDesc.cpp - missing object_type

parser/sqlparser.y - incorrect object type set for grant/revoke on UDRs

ustat/hs_globals.cpp - incorrect error returned

Code cleanup:

cli/Statement.h - remove obsolete code

cli/Statement.cpp - remove obsolete code

common/Collections.h - remove obsolete code

generator/GenRelMisc.cpp - remove obsolete code

optimizer/ItemCache.cpp - remove obsolete code

optimizer/RelCache.cpp - remove obsolete code

optimizer/NARoutine.h - remove obsolete code

optimizer/NARoutine.cpp - remove obsolete code

executor/SqlTableOpenInfo.h - new helper methods to check privileges

sqlcomp/PrivMgrMD.h - new helper methods to check privileges and get text for error

sqlcomp/PrivMgrDefs.h - simplification of code for checkPrivileges method

Change-Id: I981ad7f094b79a25f5e0aca30dedea4601b424ea

  1. … 39 more files in changeset.
Fix HQC Bugs: LP1421374 LP1409863 LP1409830

Change-Id: Icdf2c983d83456feb1120af28d2320549ecc7638

  1. … 14 more files in changeset.
Hybrid Query Cache feature implemented.

The Hybrid Query Cache (HQC) is an enhancement of the existing Query Cache

that tries to find matching queries in the existing query cache at an earlier point,

i.e. just after the parser and before the binder, so as to avoid binder overhead if there's a hit.
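
The control flow this implies (look up a key right after parsing and only run the binder on a miss) can be sketched as follows. This is a toy model, not the HQC code: EarlyLookupSketch and its members are hypothetical, the key is simply a string supplied by the caller, and the commit message does not say how real HQC keys are built.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical model of an early cache lookup placed between parser and binder.
class EarlyLookupSketch {
public:
    // 'parsedKey' stands for whatever key is derived from the parsed query.
    std::string compile(const std::string& parsedKey) {
        auto it = plans_.find(parsedKey);
        if (it != plans_.end()) {
            ++hits_;             // hit: the binder is skipped entirely
            return it->second;
        }
        ++bindCalls_;            // miss: pay for binding and later phases
        std::string plan = "plan:" + parsedKey;
        plans_.emplace(parsedKey, plan);
        return plan;
    }

    int hits() const { return hits_; }
    int bindCalls() const { return bindCalls_; }

private:
    std::unordered_map<std::string, std::string> plans_;
    int hits_ = 0;
    int bindCalls_ = 0;
};
```

Compiling the same key twice runs the simulated binder once and records one hit, which is the saving the feature is after.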

Two virtual table ISPs are added to show stats of the Hybrid Query Cache.

Add control of ISP to run locally or remotely.

Changes after reviewers' comments.

Fixup errors in SqlciErrors.txt that cause core/TEST014 failure.

Fix minor issues about (hybrid)query cache ISP.

Add HQC virtual table ISP tests to compGeneral/TEST042.

Change-Id: Ib5be56e04990639153747255834b30fc9c3f3829

  1. … 40 more files in changeset.
Fix performance regression due to QI for DDL

This check-in omits object UIDs from query plans for the tables

SB_HISTOGRAMS and SB_HISTOGRAMS_INTERVALS. Previously, when the

code generator tried to add the object UIDs for these, it had to

make a special query to the metadata, since the corresponding

internal cached structure omitted object UIDs when they were

created via methods like Generator::createVirtualTableDesc. The

special query to look up these object UIDs was shown to be

responsible for a large pathlength regression.

Change-Id: Id5046c5c55a4fc8dd2ba3f891449ea87d35a5534

Closes-Bug: #1398600

  1. … 7 more files in changeset.
Query Invalidation triggered by DDL, phase 1

This first check-in implements most of the framework which will

be used to complete the QI DDL feature. It redefines the old

security invalidation key (SQL_SIKEY) to handle DDL operations in

addition to REVOKE. In a limited number of DDL operations, the object

UIDs of affected Seabase objects are propagated to all nodes for

use by the compiler to invalidate NATable cache entries, as

well as a limited number of types of cached queries. Later this

month, the framework will be completed by allowing prepared queries

that have already been returned from the compiler to be invalidated.

Then the next step for the framework will be support for invalidating

the HTable cache. Finally an effort will be made to cover all of

the necessary DDL operations and all types of cached queries.
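
The core mechanism described here, dropping cached entries whose object UID appears among the propagated invalidation keys, can be sketched as follows. This is a simplified illustration under assumptions: ObjectCache and its methods are hypothetical, and the real NATable cache and SQL_SIKEY structures carry far more information than a name and a UID.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical cache of compiled metadata, keyed by object name and
// remembering the object UID each entry was built from.
class ObjectCache {
public:
    void add(const std::string& name, std::int64_t uid) { entries_[name] = uid; }

    bool contains(const std::string& name) const {
        return entries_.count(name) != 0;
    }

    // Remove every entry whose UID is in the invalidation set and
    // return how many entries were removed.
    std::size_t invalidate(const std::set<std::int64_t>& uids) {
        std::size_t removed = 0;
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (uids.count(it->second) != 0) {
                it = entries_.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

private:
    std::map<std::string, std::int64_t> entries_;
};
```

Because a table that is dropped and recreated under the same name gets a new UID, matching on the UID rather than the name is what lets a stale entry be recognized.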

The check-in includes a new regression test (executor/TEST122) that

demonstrates the cases that are covered. Specifically, a table will

be dropped and recreated with the same name but different definition

in one sqlci session. In another session, which has already populated

NATable cache and query cache for INSERT, UPDATE, DELETE, SELECT,

SELECT COUNT(*), INVOKE and SHOWDDL statements, those same types

of statements will be resubmitted and correctly compiled.

Change-Id: Ie61ce751089b57ce1894f1764c338e9400bb7b8a

Closes-Bug: #1329358

Implements: blueprint ddl-query-invalidation

  1. … 41 more files in changeset.
Index-join scan trimming heuristics rework II

Change-Id: I9cf63be7967a012559e27dc4ca950bb28b8ccd3b

  1. … 15 more files in changeset.
Changes to support OSS poc.

This checkin contains multiple changes that were added to support OSS poc.

These changes are enabled through a special cqd mode_special_4 and not

yet externalized for general use.

A separate spec contains details of these changes.

These changes have been contributed and pre-reviewed by Suresh, Jim C,

Ravisha, Mike H, Selva and Khaled.

All dev regressions have been run and passed.

Change-Id: I2281c1b4ce7e7e6a251bbea3bf6dc391168f3ca3

  1. … 143 more files in changeset.
Move three global variables into OptDefaults.

Change-Id: I1588c746e1a61418c19d4ded13f5c201c581d0bd

  1. … 13 more files in changeset.
Security changes to support authorization

Added support for authorization commands:

- initialize authorization [, drop]

- create/drop roles

- register/unregister components

- create/drop component operations

- grant/revoke object privileges

- grant/revoke role privileges

- grant/revoke component privileges

- updates to GET and SHOWDDL statements

- checking of privileges for DML requests

- checking of privileges for DDL requests

- regression tests added to catman1 library

Fixed a testware problem in catman1 TEST135 and TEST139

Fixed a parser problem introduced by the recently added

compGeneral/TESTTOK2.

More details:

This delivery was part of code worked on by many people for several

months on a remote branch. This team held bi-weekly meetings

for several months to design and implement these features. These

meetings also included extensive code reviews.

The security features, which include authentication (delivered

in June) and authorization, are turned off by default. The

traf_authentication_setup script located in $MY_SQROOT/sql/scripts needs

to be run to enable both authentication and authorization. This

procedure is described on the Trafodion Twiki page and will be updated to

include authorization once this delivery is completed.

Delivery updates:

Updated traf_authentication_setup to return consistent error messages

and added a comment to ComSmallDefs.h to address a buf size issue for

metadata tables.

Change-Id: I896f1ee006590284653b2c9882901c05b5f2ba22

  1. … 100 more files in changeset.
changes to embedded compiler logic to fix compiler GUI debugger feature

The compiler GUI debugger feature was affected by changes to enable the

embedded compiler. Recent behavior in the compiler GUI debugger

includes displaying internal queries when displaying a user query

and occasionally crashing the sqlci process. The change to fix this

issue is to ensure that the GUI debugger is only enabled when the

(compiler) context of the user query is the active context. Another change is

to ensure that the GUI debugger is disabled and related structures are

appropriately reset when the user query fails during compilation.

Fixes to the reentrant compiler logic to support SPJs so that external

compilers are avoided when compiling "call" statements. Changes

were made to method CmpSeabaseDDL::getSeabaseRoutineDesc in file

CmpSeabaseDDLtable.cpp.

Moved the global DisplayGraph to CmpStatement class for

readability and maintainability.

Closes-Bug: #1340960

Change-Id: Ia75f889c2ed24efabc4501432cc6b5706094b3e6

Closes-Bug: #1339205

  1. … 17 more files in changeset.
Merge "Squashed commit of the following:"

  1. … 1 more file in changeset.
Squashed commit of the following:

commit 1b8106079000418f4afa6ce0e247e81f3f5b2e2c

Merge: 7b7c311 e79bbdf

Author: Justin Du <justin.du@hp.com>

Date: Thu Jun 26 13:41:50 2014 -0800

Merge remote branch 'gerrit/master' into bp/reentrant_cmp

Conflicts:

sql/common/arkcmp_proc.cpp

Change-Id: I56a9cd33544e863c233a1b4e6a811bfb15efe27e

commit 7b7c3110c55f837672240f25dd5496540212ba17

Author: Justin Du <justin.du@hp.com>

Date: Mon Jun 23 13:33:27 2014 -0800

Expect file changes for executor/TEST013

To report the expected error (8193) when the schema name has a reserved

word as prefix for either type of statement.

Change-Id: I8ce1854afa1d916eecd01c70d27a89857fb6c2a9

commit 376666657d99b200fbcf8cd5e177096619d44c0e

Author: Justin Du <justin.du@hp.com>

Date: Thu Jun 19 13:03:06 2014 -0800

Pass diags info between CmpContext instances

1) Preserve the diags info when restoring cqds and controls after

metadata access.

2) Pass diags info from current CmpContext to the previous CmpContext at

the CmpContext switch back call.

Change-Id: Ibac9ff19c82f8dc17f278ec4327f39837085503e

commit b22407a2b78ff8c2b7f1ea14c908833fe8e98540

Author: Justin Du <justin.du@hp.com>

Date: Fri Jun 13 17:08:24 2014 -0800

Rework after review

Added warning (SQLCODE 2032) and assert when improperly using CmpContext

switch logic

Change-Id: I32e6e4f2168e3cf52dc58ec85da1555f4fd29051

commit 2870519283471f6b3c5ea844875546b0dda32a04

Author: Justin Du <justin.du@hp.com>

Date: Thu Jun 12 10:14:12 2014 -0800

Bug fixes related to CmpContext switch

1. CmpContext switch takes place only if the embedded compiler is

involved.

2. Fix for TMF error 75 (process doesn't have the active transaction)

seen in DDL operations.

Change-Id: Icdca929c4c782464e1f8267d497bf91518a8b3a1

commit 1cebd4b2c6134097a2b4bd419b00fdee71d93f1f

Merge: 8145ba5 0862f91

Author: Justin Du <justin.du@hp.com>

Date: Tue Jun 10 11:21:11 2014 -0800

Merge remote branch 'gerrit/master' into bp/reentrant_cmp

Conflicts:

sql/arkcmp/CmpStatement.cpp

Change-Id: Iba12fa3fb64809f9c9393d06db0215d7bbf6ee7d

commit 8145ba58927d0d954b68183031747a77367e000f

Author: Justin Du <justin.du@hp.com>

Date: Tue Jun 10 10:12:59 2014 -0800

Associate global empty input LogProp with CmpContext

1. Changed GLOBAL_EMPTY_INPUT_LOGPROP to an alias (via #define) instead of a

thread pointer, to access the default input LogProp from the current

CmpContext.

2. Restored recursion counter for embedded compiler so that histogram

access is done by external compiler

Change-Id: I32560e3b0b1dfe5fc2b1ee9a839d75bfdb57fa9a

commit 80b36c32a0db1cf0dc5ae5cead2a04522da1231e

Author: Justin Du <justin.du@hp.com>

Date: Mon Jun 2 20:58:47 2014 -0800

First set of recursive compilation with CmpContext switch

1. Disable recursion counter when entering compileDirect

2. Start or reuse a CmpContext for metadata access during compilation

3. Fixed a few problems in CmpContext switching code.

Change-Id: Iff66309319e989a247b80d92ea8e3e32a35e1755

Change-Id: Ib096a46288a8616fc26c9f52474873778e76ed8b

  1. … 18 more files in changeset.
Fix for query cache issue for Hive selects.

Two changes

1) Fix for bug #1293816.

2) Discontinue linking in libprotobuf.so, since it is currently unused.

For the hive query cache bug, the issue was that any change to a HDFS file

in a Hive directory, or to the directory itself (add/drop a file), was not

reflected in the query cache key. So the compiler could give a plan with an

incorrect list of HDFS files to the HDFSScan operator. The fix is to add

max(fileInfo.mLastMod) to the query cache key. The max is taken over all

files for a given Hive table. The number of files for a given Hive table has

also been added to the query cache key to cover cases where a file is deleted

from a Hive directory. Both query cache and query text cache are addressed.

The mLastMod time for each file and the number of files are determined through

the libHdfs call hdfsListDirectory(), which we already make.
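
The key extension described above can be sketched like this. It is a minimal illustration, not the actual query cache code: HiveFile and hiveCacheKeyPart are hypothetical names, the lastMod field merely stands in for the mLastMod value returned per file by the directory listing, and the "count:maxLastMod" string layout is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-file record; lastMod stands in for fileInfo.mLastMod.
struct HiveFile {
    std::string path;
    long long lastMod;  // modification time, seconds since the epoch
};

// Build the part of the cache key that reflects the state of the table's
// files: the number of files and the newest modification time.
std::string hiveCacheKeyPart(const std::vector<HiveFile>& files) {
    long long maxLastMod = 0;
    for (const HiveFile& f : files)
        maxLastMod = std::max(maxLastMod, f.lastMod);
    return std::to_string(files.size()) + ":" + std::to_string(maxLastMod);
}
```

Modifying a file raises the maximum timestamp, and deleting an older file changes the count even when the maximum stays the same, so either event yields a different key and therefore a fresh compile.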

Linking in libprotobuf.so is causing issues on certain MapR clusters since

MapR also uses this library and sometimes the version used by MapR is

different from what Trafodion uses. Since this library is not being used by

Trafodion stack right now, we will no longer link in this library in SQL or

connectivity layers. When a fix is found for the version incompatibility

issue, this change will be reversed.

Patch Set 2.

Thank you Dave for catching these issues. They have been resolved in

Patch Set 2.

Change-Id: Idbe599a876fdcaf77d2bdb9fdbf4b77a3f431e46

  1. … 10 more files in changeset.
Code Drop Update - 5/23/14

Change-Id: If478e8857cbfa9652227af7ed83cd61dd075a889

  1. … 163 more files in changeset.
Initial code drop of Trafodion

  1. … 4886 more files in changeset.