ExExeUtil.h

Clone Tools
Constraints
Constraints: committers
 
Constraints: files
Constraints: dates
[TRAFODION-2654] Change the location of trafodion-site.xml from $TRAF_HOME/etc to config [TRAFODION-2653] Sort operator loops at times

All HBase client connections from Trafodion uses Trafodion configuration

concept. Any client side property can now be set in trafodion-site.xml.

The scratch files are now open in blocking mode. Hence any scratch file

operation shouldn't return EAGAIN.

The internal table TDDL of table is now created without any table

coprocessors attached to it.

  1. … 23 more files in changeset.
[TRAFODION-2420] RMS enhancements

New counters introduced as part of BMO

InterimRowCount - Counter to keep track intermediate rows

produced by hash join and for intermediate

runs in sort

phase - BMO Phase for hash-join and sort

get statistics for statement s1, options 'cs' now works.

Removed attaching BMO stats to Non-root Paritial and leaf partition hash group by operators.

These operators are non-blocking BMOs.

Introduced 3 phases for hash grby operator to be visible in RMS Stats

options 'sl' now works with get statistics for qid.

The relevant new counters are now part of accumulated statistics type.

Extended the support upto 512 opeartors from 256 operators in RMS

Fixed both Type 2 and mxosrvr code to obtain all the externalized counters in accumualted stats correctly.

  1. … 27 more files in changeset.
[TRAFODION-2584] Add support to register hive objects in traf metadata

Syntax:

register hive table/view [if not exists] <object-name> [cascade]

unregister hive table/view [if exists] <object-name> [cascade]

cascade option: register/unregister all underlying objects that are

part of the specified view

update statistics, grant/revoke, traf views or external table creation

on hive objects will automatically and internally register those objects

in trafodion metadata.

invoke/showddl will show if this object is registered and whether

that registration was internal or explicit.

Get command extensions:

get hive registered tables/view/objects in catalog trafodion;

get hive external tables in catalog trafodion;

Cleanup command extensions:

cleanup metadata command will clean up inconsistent hive objects

(underlying hive object is missing but object is registered or

an external table exists)

cleanup [hive table | hive view] on <object-name>;

Existing hive objects that had implicit or explicit external tables

created prior to this checkin will have no change in behavior.

ObjectUID of those external tables will continue to be used for

relevant operations.

One can drop those external tables and explicitly register them,

or a subsequent operation(upd stats, grant, etc) that needs objectUID will

automatically register them.

minor changes based on review comments of previous checkin

get all objects command on hive metadata no longer fails.

get views on objects return 3-part name that could be used to

differentiate between a traf and hive view.

regress/hive/TEST007 has been extended.

TBD: Add register/unregister privileges

  1. … 52 more files in changeset.
[TRAFODION-2420] RMS enhancements Introduced 2 new counters HeapWM: - "SQL Heap WM" - Tracks the high water mark of the used memory including space buffer and bmo operator memory. It is enough to monitor this counter alone to track the memory utilization of the query.

scrIOTime: "Scr. IO Time" - Tracks the IO time taken to read/write to

the scratch files at BMO operator level

Introduced a 'Single line Format' option to display BMO stats and SE stats

in a single line format to enable exporting the output to Excel easily.

GET STATISTICS FOR QID CURRENT PROGRESS, OPTIONS 'SL'

Removed and cleaned up the code further

Also fixed memory corruption issues seen with GET STATISTICS command.

  1. … 16 more files in changeset.
TRAFODION-2498 Add support to run hive stmts from traf interface

Syntax:

process hive statement '<string>';

<string>: hive statement starting with create/drop/alter/truncate.

These are the only stmts currently supported.

Ex:

>>process hive statement 'create database trafsch';

will create hive database 'trafsch'

>>process hive statement 'create table trafsch.t (a int)';

will create hive table 't' in hive schema 'trafsch'.

'process hive statment ..' could be issued from any traf interface

(sqlci/trafci/jdbc...)

  1. … 24 more files in changeset.
TRAFODION-2492 Support for native hive views, plus few more changes

-- added support for native hive views created externally. These views

can be accessed in a traf sql query.

There are some restrictions as listed below:

-- privileges support has not yet been added. That will be done, as

needed, in the next checkin.

-- usage info for tables contained in a native hive view is not maintained.

-- native hive views cannot be inserted/updated/deleted

-- cannot update stats on a native hive view

-- support for GET statements to retrieve hive data

get tables/views/objects in schema hive.sch;

-- this will return all tables/views in specified schema(hive database)

get tables/views/objects in catalog hive;

-- this will return all tables/views in hive

get schemas in catalog hive;

-- this will return all schemas(databases) that exist in hive

-- GET statements to retrieve hive data now supports optional MATCH clause.

This can be used to filter results.

-- showddl will show if an external table on hive was created implicitly

for internal usage (upd stats, privileges, views) or explicitly by user.

Internal creation of external table now use 'create implicit external table'

syntax and that info is stored in metadata.

-- drop external table will drop that table even if underlying hive table

is missing. Currently it returns an error.

-- at runtime, accessing a missing hive table now returns a better error

message instead of an internal "data mod check failed" message.

-- regress/tools/runallsb has 2 new options:

-basetests (this will run core,compGeneral,executor,seabase,hive)

-othertests (this will run all other tests suites)

-- optimizer/NATable.cpp was incorrectly indented by some checkin from

a few days back. TABs were used instead of spaces which caused code

to become all crooked. That has been fixed.

-- regress/hive/TEST007 has been updated with additional hive view tests

  1. … 26 more files in changeset.
TRAFODION-2175 a user should only see specific schemas/tables that he has privs TRAFODION-1573 Additional GET commands for privileges

This is one of several deliveries to improve GET commands:

- return details about objects that are only visible to the current_user.

- support additional statements

This delivery adds visibility checking to to the following commands:

get users;

get roles;

get users for role <role>;

get roles for user <user>;

It now supports the following GET statements:

get privileges for user <user>;

get privileges for role <role>;

When getting privileges for a user or role, the output is as follows:

<privileges> <object> <optional column name>

<privileges> output is a character string containing granted privileges:

char 1: SELECT_PRIV - "S"

char 2: INSERT_PRIV - "I"

char 3: DELETE_PRIV - "D"

char 4: UPDATE_PRIV - "U"

char 5: USAGE_PRIV - "G"

char 6: REFERENCES_PRIV - "R"

char 7: EXECUTE_PRIV - "E"

If the user/role has no privilege then "-" is specified for the character

Example: get privileges for user userx;

SI--G-E TRAFODION.MYSCH.MYTABLE1

S------ TRAFODION.MYSCH.MYTABLE2 <Column> column1

S--U--- TRAFODION.MYSCH.MYTABLE2 <Column> column2

When getting privileges for a role, only those privileges granted to the role

are returned.

When getting privileges for a user, privileges directly granted to the user

plus privileges granted to any roles associated with the user are returned.

A new regression test - privs1/TEST123 has been added to test these changes.

In addition, unused methods were removed from Get code.

  1. … 4 more files in changeset.
[TRAFODION-2351] Bulk load with log error rows enhancements

LOAD with log error rows failed when ERRORCOUNTER hbase table already exists.

LOAD command now returns the number of error rows skipped.

LOAD with truncate option dumps core when encounters an error.

Similar issue exist with PURGEDATA command too.

  1. … 18 more files in changeset.
Merge remote branch 'origin/master' into empty_lob_work

  1. … 22 more files in changeset.
Changes to support new syntax and implementation empty_blob/empty_clob for insert/updates Changes to support new syntax for updating a lob directly through a lobhandle without scanning the table.This is done via new ExeUtil operator. Changes to add columnname info to LOBMD_ table. This helps in mapping any given lobhandle back to the table and the column. Added new sections to the regression test executor/TEST130 to test these changes

  1. … 40 more files in changeset.
Merge [TRAFODION-2351] PR 876 Bulk load with log error rows enhancements

  1. … 4 more files in changeset.
TRAFODION-1586 Add support for external Traf tables mappedd to HBase tables

This is the initial support for HBase mapped tables.

A doc for external manuals will be added later.

regress/seabase/TEST022 contains various testcases.

  1. … 77 more files in changeset.
[TRAFODION-2351] Bulk load with log error rows enhancements

The LOAD command now outputs as follows:

Load with log error rows into selva.customer select * from hive.hive.customer ;

Task: LOAD Status: Started Object: TRAFODION.SELVA.CUSTOMER

Task: CLEANUP Status: Started Time: 2016-12-11 00:54:37.642

Task: CLEANUP Status: Ended Time: 2016-12-11 00:54:37.672

Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.030

Logging Location: /bulkload/logs/ERR_TRAFODION.SELVA.CUSTOMER_20161211_005437

Task: LOADING DATA Status: Started Time: 2016-12-11 00:54:37.672

Rows Processed: 99997

Task: LOADING DATA Status: Ended Time: 2016-12-11 00:54:58.296

Task: LOADING DATA Status: Ended Elapsed Time: 00:00:20.624

Task: COMPLETION Status: Started Time: 2016-12-11 00:54:58.296

Task: COMPLETION Status: Ended Time: 2016-12-11 00:54:59.521

Task: COMPLETION Status: Ended Elapsed Time: 00:00:00.756

In addition, currently the "LOADING DATA" task status is shown as it happens. In future,

the task status will be shown as it happens for all tasks.

  1. … 13 more files in changeset.
Merge [TRAFODION-2365] PR 850 HBase snapshot concept needs to be contained within HBaseClinet in Trafodion

[Trafodion 2351] HBase snapshot concept needs to be contained within HBaseClient in Trafodion

Moved snapshot creation/deletion from SequenceFileWriter to HBaseClient.

  1. … 14 more files in changeset.
[TRAFODION-2356] HBase snapshot concept needs to be contained within HBaseClient in Trafodion

Moved snapshot creation/deletion from SequenceFileWriter to HBaseClient.

  1. … 13 more files in changeset.
TRAFODION-2363 Add support to return hive MD info in relational format

  1. … 21 more files in changeset.
[TRAFODION-2317] Infrastructure for common subexpressions

This is a first set of changes to allow us to make use of CTEs

(Common Table Expressions) declared in WITH clauses and to create

a temp table for them that is then read multiple times in the query.

This also includes a fix for

[TRAFODION-2280] Need to remove salt columns from uniqueness constraints

Summary of changes:

- Adding a unique statement number in CmpContext

- Moving the execHiveSQL method from the OSIM code to CmpContext

- Adding a list of common subexpressions and their references

to CmpStatement

- Adding the ability to the Hive Truncate operator to drop the

table when the TCB gets deallocated

- Adding the ability to the HDFS scan to compute scan ranges at

runtime. Those are usually determined in the compiler. This is

only supported for simple, non-partitioned, delimited tables.

We need this because we populate the temp table and read from

in in the same statement, without the option of compiling

after we inserted into the temp table.

- Special handling in the MapValueIds node of common subexpressions.

See the comment in MapValueId::preCodeGen().

- Moved the binder code to create a FastExtract node into a new

method FastExtract::makeFastExtractTree(), to be able to call

it from another place.

- MapValueIds no longer looks at the "used by MVQR flag" to determine

the method for VEGRewrite. Instead it checks whether a list of

values has been provided to do this.

- Adding a new method, RelExpr::prepareMeForCSESharing, that is

kind of an "unnormalizer", undoing some of the normalizer

transformations.

- Implementing the steps for common subexpression materializations

described below.

- Adding the ability to suppress the Hive timestamp modification

check when truncating a Hive table

- Adding an optimizer rule to eliminate CommonSubExprRef nodes.

These nodes should not normally survive past the SQO phase, but

if the SQO phase gets interrupted by an exception, that could

happen, since we then fall back to a copy of the tree before

SQO. In the future, we can consider cost-based decision on

what to do with common subexpressions.

- Adding CommonSubExprRef nodes in the parser whenever we expand

a CTE reference.

- Adding cleanup code to the "cleanup obsolete volatile tables"

command that removes obsolete Hive tables used for common

subexpressions.

Other changes contained in this change set:

- Optimization for empty scans, like select * from t where 1=0

This now generates a cardinality constraint with 0 rows, which

can be used later to eliminate parts of a tree.

(file OptLogRelExpr.cpp)

- [TRAFODION-2280] Need to remove salt columns from uniqueness

constraints generated on salted tables.

(file OptLogRelExpr.cpp)

- Got rid of the now meaningless "seamonster" display in EXPLAIN.

(file GenExplain.cpp and misc. expected files)

- Suppress display of "zombies" in the cstat command. Otherwise,

these zombies (marked as <defunct>) prevent Trafodion from

starting, because they are incorrectly considered "orphan"

processes. This could require a reboot when no reboot is necessary.

(file core/sqf/sql/scripts/pstat)

Incomplete list of things left to be done:

- TRAFODION-2316: Hive temp tables are not secure. Use volatile

tables instead.

- TRAFODION-2315: Add heuristics to decide when to use the temp table

approach.

- TRAFODION-2320: Make subquery unnesting work with common subexpressions.

Generated Plans

---------------

The resulting query plan for a query Q with n common

subexpressions CSE1 ... CSEn looks like this:

Root

|

MapValueIds

|

BlockedUnion

/ \

Union Q

/ \

... CTn

/

Union

/ \

CT1 CT2

Each of the CTi variables looks like the following, an

INSERT OVERWRITE TABLE tempi ...

BlockedUnion

/ \

Truncate FastExtract TEMPi

TEMPi |

CSEi

The original query Q has the common subexpressions replaced

with the following:

MapValueIds

|

scan TEMPi

Here is a simple query and its explain:

prepare s from

with cse1 as (select d_date_sk, d_date, d_year, d_dow, d_moy from date_dim)

select x.d_year, y.d_date

from cse1 x join cse1 y on x.d_date_sk = y.d_date_sk

where x.d_moy = 3;

>>explain options 'f' s;

LC RC OP OPERATOR OPT DESCRIPTION CARD

---- ---- ---- -------------------- -------- -------------------- ---------

11 . 12 root 1.46E+005

5 10 11 blocked_union 1.46E+005

7 9 10 merge_join 7.30E+004

8 . 9 sort 1.00E+002

. . 8 hive_scan CSE_TEMP_CSE1_MXID11 1.00E+002

6 . 7 sort 5.00E+001

. . 6 hive_scan CSE_TEMP_CSE1_MXID11 5.00E+001

1 4 5 blocked_union 7.30E+004

2 . 4 hive_insert CSE_TEMP_CSE1_MXID11 7.30E+004

. . 2 hive_scan DATE_DIM 7.30E+004

. . 1 hive_truncate 1.00E+000

--- SQL operation complete.

>>

CQDs to control common subexpressions

-------------------------------------

CSE_FOR_WITH is the master switch.

CQD Value Default Behavior

--------------------- --------- ------- ---------------------------------------

CSE_FOR_WITH OFF Y No change

ON Insert a CommonSubExprRef node in the

tree whenever we reference a CTE

(table defined in a WITH clause)

CSE_USE_TEMP OFF Y Disable creation of temp tables

for common subexpressions

SYSTEM Same as OFF for now

ON Always create a temp table for

common subexpressions where possible

CSE_DEBUG_WARNINGS OFF Y No change

ON Emit diagnostic warnings that show why

we didn't create temp tables for

common subexpressions

CSE_CLEANUP_HIVE_TABLES OFF Y No change

ON Cleanup Hive tables used for CSEs with

the "cleanup obsolete volatile tables"

command

CommonSubExprRef relational operators

-------------------------------------

This is a new RelExpr class that is introduced. It marks the common

subexpressions in a RelExpr tree. This operator remembers the name of

a common subexpression (e.g. the name used in the WITH clause).

Multiple such operators can reference to the same name. Each of

these references has a copy of the tree.

Right now, these operators are created in the parser when we expand a

CTE (Common Table Expression), declared in a WITH clause. If the CTE

is referenced only once, then the CommonSubExprRef operator is removed

in the binder - it also doesn't live up to its name in this case.

The remaining CommonSubExprRef operators keep track of various changes

to their child trees, during the binder and normalizer phases. In

particular, it tracks which predicates are pushed down into the child

tree and which outputs are eliminated.

The CmpStatement object keeps a global list of all the

CommonSubExprRef operators in a statement, so the individual operators

have a way to communicate with their siblings:

- A statement can have zero or more named common subexpressions.

- Each reference to a common subexpression is marked in the RelExpr

tree with a CommonSubExprRef node.

- In the binder and normalizer, common subexpressions are expanded,

meaning that multiple copies of them exist, one copy per

CommonSubExprRef.

- Common subexpressions can reference other common subexpressions,

so they, together with the main query, for a DAG (directed

acyclic graph) of dependencies.

- Note that CTEs declared in a WITH clause but not referenced are

ignored and are not part of the query tree.

In the semantic query optimization phase (SQO), the current code makes

a heuristic decision what to do with common subexpressions - to

evaluate them multiple times (expand) or to create a temporary table

once and read that table multiple times.

If we decide to expand, the action is simple: Remove the

CommonSubExprRef operator from the tree and replace it with its child.

If we decide to create a temp table, things become much more difficult.

We need to do several steps:

- Pick one of the child trees of the CommonSubExprRefs as the one to

materialize.

- Undo any normalization steps that aren't compatible with the other

CommonSubExprRefs. That means pulling out predicates that are not

common among the references and adding back outputs that are

required by other references. If that process fails, we fall back

and expand the expressions.

- Create a temp table tmp.

- Prepare an INSERT OVERWRITE TABLE tmp SELECT * FROM cse tree

that materializes the common subexpression in a table.

- Replace the CommonSubExprRef nodes with scans of the temp table.

- Hook up the insert query tree with the main query, such that it

is executed before the main query starts.

Temporary tables

----------------

At this time, temporary tables are created as Hive tables, with a

fabricated, unique name, including the session id, a unique statement

number within the session, and a unique identifier of the common

subexpression within the statement. The temporary table is created at

compile time. The query plan contains an operator to truncate the

table before populating it. The "temporary" Hive table is dropped when

the executor TCB is deallocated.

Several issues are remaining with this approach:

- If the process exits before executing and deallocating the statement,

the Hive table is not cleaned up.

Solution (TBD): Clean up these tables like we clean up left-over

volatile tables. Both are identified by the session id.

- If the executor runs into memory pressure and deallocates the TCB,

then allocates it again at a later time, the temp table is no longer

there.

Solution (TBD): Use AQR to recompile the query and create a new table.

- Query cache: If we cache a query, multiple queries may end up with

the same temporary table. This works ok as long as these queries are

executed serially, but it fails if both queries are executed at the

same time (e.g. open two cursors and fetch from both, alternating).

Solution (TBD): Add a CQD that disables caching queries with temp tables

for common subexpressions.

In the future we also want to support volatile tables. However, those also

have issues:

- Volatile tables aren't cleaned up until the session ends. If we run

many statements with common subexpressions, that is undesirable. So,

we have a similar cleanup problem as with Hive tables.

- Volatile tables take a relatively long time to create.

- Insert and scan rates on volatile Trafodion tables are slower than

those on Hive tables.

To-do items are marked with "Todo: CSE: " in the code.

  1. … 78 more files in changeset.
[JIRA TRAFODION-2141] Cluster view of hbase stats

-- added column REGION_SERVER to region stats virtual table

-- added cluster view of hbase stats

Syntax: select * from table(cluster stats());

-- added support for 'where' pred in stats queries

Syntax: select * from table(cluster stats()) where schema_name = 'SCH';

  1. … 22 more files in changeset.
Merge remote branch 'origin/master' into lobglobals_fix

Conflicts:

core/sql/cli/Context.cpp

core/sql/exp/ExpLOBaccess.cpp

  1. … 13 more files in changeset.
Changes to move hdfFs handing to the context globals level and remove all disconnects from hdfs from the various components.

  1. … 27 more files in changeset.
JIRA TRAFODION-2022 Some more changes related to hive truncate

-- support for 'TRUNCATE' command for hive tables

-- support for truncate with PARTITION specification

-- separation of traf purgedata and hive truncate code into

different classes and tdbs

-- some cleanup of obsolete older purgedata code

-- new tests added to regress/hive/TEST005

  1. … 23 more files in changeset.
Merge remote branch 'origin/pr/507/head' into merge_507

  1. … 9 more files in changeset.
hive data modification detection: commit #4

  1. … 24 more files in changeset.
Support for external lobs. Data files are stored externally in HDFS and only filehandles and file locations are stored in internal Trafodion tables.

  1. … 25 more files in changeset.
[TRAFODION-1955] JNI optimization at the time of compilation

Removed ByteArrayList class and used array of byte array instead.

Only one JNI to java transistion is done to getRegionStartKeys

and getRegionEndKeys at the time of query compilation

Also fixed an issue with lib_mgmt Makefile.

  1. … 15 more files in changeset.
Various fixes (details below)

-- ddl_transactions cqd is now on by default.

All ddl, where allowed and supported, will run

within one transaction

-- drop schema has been disabled to run within one transaction.

When jira 1948 is fixed, it will be enabled.

-- hdfs scan handles hive col values > 32K

-- pcode handles varchar with len > 32K

-- errors (gaps, unsorted order) for cqd values in nadefaults are

detected without crashing arkcmp.

-- volatile and regular schema drop returns the reason if an error occured

(for ex: name of objects that could not be dropped)

-- dateformat displays european format correctly with a blank

seperator between date and time values

-- dateformat timestamp in usa format displayes time with AM/PM

-- dateformat inside a cast function returns correct values.

  1. … 19 more files in changeset.
support for get command for LOB tables. Support for tracing LOB operations.

  1. … 30 more files in changeset.
JIRA TRAFODION-1798 (ddl xns) and few other fixes, details below.

-- support for sql part of ddl xns. Section 1 of JIRA TRAFODION-1798

-- cqd ddl_transactions to enable or disable ddl xns.

Default is currently off. Once it is tested, it will be turned on.

Dev regressions are run with cqd set to ON

-- get stmts run with read committed to get changes in current xns

-- support for where preds with get stmts

-- scan to pass in transid even if running with read uncommitted access.

This enables rows modified in current xn to be returned.

-- cleanup no longer return multiple duplicate error messages if

objects id is not found.

-- cleanup no longer includes internallay created schemas (_HV_ , _HB_)

during cleanup operations.

-- Correct error msg was not getting returned if an invalid index

existed in table and the same index was created again.

-- init traf, drop md views was giving an error if views didnt exist.

That has been fixed.

-- regressions with -diff option now show original file timestamps

instead of the timestamp when the diff command was run.

  1. … 72 more files in changeset.
Merge remote branch 'origin/master' into lob_work_buffer

Conflicts:

core/sql/regress/executor/EXPECTED130

core/sql/regress/executor/TEST130

  1. … 9 more files in changeset.