parser.cpp

CTAS fixes for ddl on hive objects

  1. … 26 more files in changeset.
TRAFODION-3086 Traf support for DDL operations on Hive objects

-- Support for TRAFODION-3086. Details in document attached to jira.

Other changes:

-- support for "if not exists", "if exists" clause for create/drop view

-- Support for: truncate T, truncate table T.

-- same as purgedata

-- showddl <tab>, detail

-- unregister hive schema <sch>

-- will unregister all objects in specified schema

-- Support for "if not registered", "if registered" clauses for

register/unregister command.

  1. … 90 more files in changeset.
[TRAFODION-2888] Streamline setjmp/longjmp concepts in Trafodion

First set of changes to streamline setjmp/longjmp.

a) Removed the setjmp in heap management within Trafodion.

b) Removed obsolete code related to No-wait operation concepts in SQL

  1. … 30 more files in changeset.
JIRA TRAFODION-2731 CodeCleanup: Phase 6. Cleanup of obsolete/unused cqds.

  1. … 25 more files in changeset.
TRAFODION-2731 CodeCleanup: Phase 4. Remove legacy/obsolete pragmas

  1. … 392 more files in changeset.
TRAFODION-2731 CodeCleanup: Phase 2: Remove obsolete code

This phase handles the following:

-- removed files:

cli/rtdu.h, rtdu2.h, rtdu.cpp, rtdu.cpp

executor/dmeasql.h

executor/ExMeas.h, ExMeas.cpp

executor/tempfile.h, .cpp

executor/rcb.h

executor/stubs.cpp, stubs2.cpp

exp/srlversion.cpp

exp/exp_space.h

cli/VicKeyValuePair.h

cli/CliDll.cpp

cli/CliStubsStaticBuild.cpp

cli/globalsrlversion.cpp

cli/globalstubs.cpp

cli/sqlciSRLStubs.cpp

cli/test.cpp

cli/privsrlversion.cpp

common/SqlExpDllDefines.h

common/SqlExportDllDefines.h

sqlcat/enum.h

sqlcat/ReadTableDef.h, cpp

sqlcat/readRealArk.h, cpp

sqlshare/catapirequest.*

-- removed defines and code referencing them:

-- NA_STD_NAMESPACE

-- NA_NO_CMPCONTEXT

-- NA_CATMAN_SIM, NA_CATMAN_SIM_FS

-- common/purify.h

-- DONT_USE_MATH_H

-- NT_PORT

-- NA_MSVC

-- NA_NO_FRIENDS_WITH_TEMPLATE_REFS

-- NA_FLEXBUILD

-- removed multiple obsolete sqlci features and syntax:

(report writer, MACL, Help, Simulators, Utils, MXCS mode,

and a few others).

-- removed following files in sqlci dir:

CSInterface.h

CharSetConstants.cpp

CharSetConstants.h

MsgCat.cpp

MsgCat.h

MxciEHCallBack.cpp

MxciEHCallBack.h

RWInterface.cpp

RWInterface.h

SqlciCSCmd.cpp

SqlciCSCmd.h

SqlciCSSimulator.cpp

SqlciHelp.cpp

SqlciRWCmd.cpp

SqlciRWCmd.h

SqlciRWSimulator.cpp

SqlciUsage.cpp

SqlciUtil.cpp

SqlciUtil.h

UtilInt.cpp

UtilInt.h

UtilMsg.cpp

UtilMsg.h

immudefs.cpp

immudefs.h

  1. … 85 more files in changeset.
code cleanup, commit #1

  1. … 128 more files in changeset.
Multiple fixes: REPLACE func maxlen, hbase unregister, DDLExpr cleanup

-- REPLACE function max length is no longer limited to 32K

-- HBase object unregister was not working correctly. That is fixed.

-- class DDLExpr has been cleaned up to have one constructor for all cases.

All options are now set using flags instead of constructor params.

-- "showddl <tab>, explain " command now returns an error instead of crashing.
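As an illustration of the DDLExpr cleanup described above (one constructor, options set through flags), here is a minimal C++ sketch. The class name DdlExprSketch, its flag names, and their bit values are hypothetical and are not taken from the Trafodion sources.

```cpp
#include <cstdint>

// Hypothetical stand-in for a DDL expression node. Names and bit values are
// illustrative assumptions, not the real DDLExpr definition.
class DdlExprSketch {
public:
  enum Flag : std::uint32_t {
    IS_CREATE   = 1u << 0,
    IS_DROP     = 1u << 1,
    IF_EXISTS   = 1u << 2,
    HIVE_OBJECT = 1u << 3
  };

  // Single constructor for all cases: every option starts out cleared.
  DdlExprSketch() : flags_(0) {}

  // Set or clear one option after construction.
  void setFlag(Flag f, bool on = true) {
    if (on)
      flags_ |= f;
    else
      flags_ &= ~static_cast<std::uint32_t>(f);
  }

  bool hasFlag(Flag f) const { return (flags_ & f) != 0; }

private:
  std::uint32_t flags_;
};
```

A caller would construct the object once and then call setFlag() for each option it needs, so adding a new option does not require another constructor overload.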

  1. … 10 more files in changeset.
TRAFODION-2498 Add support to run hive stmts from traf interface

Syntax:

process hive statement '<string>';

<string>: hive statement starting with create/drop/alter/truncate.

These are the only stmts currently supported.

Ex:

>>process hive statement 'create database trafsch';

will create hive database 'trafsch'

>>process hive statement 'create table trafsch.t (a int)';

will create hive table 't' in hive schema 'trafsch'.

'process hive statement ..' could be issued from any traf interface

(sqlci/trafci/jdbc...)
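The restriction above (only statements starting with create/drop/alter/truncate) could be checked along these lines. This is a sketch only: the helper name isSupportedHiveStmt is hypothetical, and the case-insensitive comparison and the skipping of leading whitespace are assumptions rather than documented behavior.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns true if the first word of `stmt` is one of the keywords the commit
// message lists as supported. Hypothetical helper, not the Trafodion check.
// Assumptions: leading whitespace is ignored and matching is case-insensitive.
bool isSupportedHiveStmt(const std::string &stmt) {
  static const char *const kAllowed[] = {"create", "drop", "alter", "truncate"};

  std::size_t i = 0;
  while (i < stmt.size() && std::isspace(static_cast<unsigned char>(stmt[i])))
    ++i;

  // Collect the first alphabetic word, lowercased.
  std::string first;
  while (i < stmt.size() && std::isalpha(static_cast<unsigned char>(stmt[i]))) {
    first += static_cast<char>(std::tolower(static_cast<unsigned char>(stmt[i])));
    ++i;
  }

  for (const char *kw : kAllowed)
    if (first == kw)
      return true;
  return false;
}
```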

  1. … 24 more files in changeset.
[TRAFODION-2317] Infrastructure for common subexpressions

This is a first set of changes to allow us to make use of CTEs

(Common Table Expressions) declared in WITH clauses and to create

a temp table for them that is then read multiple times in the query.

This also includes a fix for

[TRAFODION-2280] Need to remove salt columns from uniqueness constraints

Summary of changes:

- Adding a unique statement number in CmpContext

- Moving the execHiveSQL method from the OSIM code to CmpContext

- Adding a list of common subexpressions and their references

to CmpStatement

- Adding the ability to the Hive Truncate operator to drop the

table when the TCB gets deallocated

- Adding the ability to the HDFS scan to compute scan ranges at

runtime. Those are usually determined in the compiler. This is

only supported for simple, non-partitioned, delimited tables.

We need this because we populate the temp table and read from

it in the same statement, without the option of compiling

after we inserted into the temp table.

- Special handling in the MapValueIds node of common subexpressions.

See the comment in MapValueIds::preCodeGen().

- Moved the binder code to create a FastExtract node into a new

method FastExtract::makeFastExtractTree(), to be able to call

it from another place.

- MapValueIds no longer looks at the "used by MVQR flag" to determine

the method for VEGRewrite. Instead it checks whether a list of

values has been provided to do this.

- Adding a new method, RelExpr::prepareMeForCSESharing, that is

kind of an "unnormalizer", undoing some of the normalizer

transformations.

- Implementing the steps for common subexpression materializations

described below.

- Adding the ability to suppress the Hive timestamp modification

check when truncating a Hive table

- Adding an optimizer rule to eliminate CommonSubExprRef nodes.

These nodes should not normally survive past the SQO phase, but

if the SQO phase gets interrupted by an exception, that could

happen, since we then fall back to a copy of the tree before

SQO. In the future, we can consider a cost-based decision on

what to do with common subexpressions.

- Adding CommonSubExprRef nodes in the parser whenever we expand

a CTE reference.

- Adding cleanup code to the "cleanup obsolete volatile tables"

command that removes obsolete Hive tables used for common

subexpressions.

Other changes contained in this change set:

- Optimization for empty scans, like select * from t where 1=0

This now generates a cardinality constraint with 0 rows, which

can be used later to eliminate parts of a tree.

(file OptLogRelExpr.cpp)

- [TRAFODION-2280] Need to remove salt columns from uniqueness

constraints generated on salted tables.

(file OptLogRelExpr.cpp)

- Got rid of the now meaningless "seamonster" display in EXPLAIN.

(file GenExplain.cpp and misc. expected files)

- Suppress display of "zombies" in the cstat command. Otherwise,

these zombies (marked as <defunct>) prevent Trafodion from

starting, because they are incorrectly considered "orphan"

processes. This could require a reboot when no reboot is necessary.

(file core/sqf/sql/scripts/pstat)

Incomplete list of things left to be done:

- TRAFODION-2316: Hive temp tables are not secure. Use volatile

tables instead.

- TRAFODION-2315: Add heuristics to decide when to use the temp table

approach.

- TRAFODION-2320: Make subquery unnesting work with common subexpressions.

Generated Plans

---------------

The resulting query plan for a query Q with n common

subexpressions CSE1 ... CSEn looks like this:

                Root
                  |
             MapValueIds
                  |
             BlockedUnion
              /        \
          Union         Q
          /   \
        ...   CTn
        /
     Union
     /   \
   CT1   CT2

Each of the CTi subtrees looks like the following, an

INSERT OVERWRITE TABLE TEMPi ...

        BlockedUnion
        /          \
   Truncate     FastExtract TEMPi
    TEMPi            |
                    CSEi

The original query Q has the common subexpressions replaced

with the following:

   MapValueIds
        |
   scan TEMPi
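The overall plan shape described above (a left-deep chain of Unions over CT1 ... CTn, joined to Q by a BlockedUnion) can be sketched as a small C++ function that prints the tree in nested form. The function name buildPlanShape is hypothetical, and the n = 1 case (no Union node at all) is an assumption, since the description only shows the general case.

```cpp
#include <cassert>
#include <string>

// Builds a textual picture of the plan for n common subexpressions.
// Hypothetical helper for illustration only; it does not build RelExpr nodes.
std::string buildPlanShape(int n) {
  assert(n >= 1);
  // Left-deep chain: Union(Union(CT1, CT2), CT3) ...
  std::string left = "CT1";
  for (int i = 2; i <= n; ++i)
    left = "Union(" + left + ", CT" + std::to_string(i) + ")";
  // The BlockedUnion finishes its left child (all materializations) before
  // it starts the right child, the rewritten main query Q.
  return "Root(MapValueIds(BlockedUnion(" + left + ", Q)))";
}
```

For example, buildPlanShape(3) yields Root(MapValueIds(BlockedUnion(Union(Union(CT1, CT2), CT3), Q))).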

Here is a simple query and its explain:

prepare s from

with cse1 as (select d_date_sk, d_date, d_year, d_dow, d_moy from date_dim)

select x.d_year, y.d_date

from cse1 x join cse1 y on x.d_date_sk = y.d_date_sk

where x.d_moy = 3;

>>explain options 'f' s;

LC    RC    OP    OPERATOR              OPT       DESCRIPTION           CARD
----  ----  ----  --------------------  --------  --------------------  ---------

11    .     12    root                                                  1.46E+005
5     10    11    blocked_union                                         1.46E+005
7     9     10    merge_join                                            7.30E+004
8     .     9     sort                                                  1.00E+002
.     .     8     hive_scan                       CSE_TEMP_CSE1_MXID11  1.00E+002
6     .     7     sort                                                  5.00E+001
.     .     6     hive_scan                       CSE_TEMP_CSE1_MXID11  5.00E+001
1     4     5     blocked_union                                         7.30E+004
2     .     4     hive_insert                     CSE_TEMP_CSE1_MXID11  7.30E+004
.     .     2     hive_scan                       DATE_DIM              7.30E+004
.     .     1     hive_truncate                                         1.00E+000

--- SQL operation complete.

>>

CQDs to control common subexpressions

-------------------------------------

CSE_FOR_WITH is the master switch.

CQD                      Value   Default  Behavior
-----------------------  ------  -------  ---------------------------------------
CSE_FOR_WITH             OFF     Y        No change
                         ON               Insert a CommonSubExprRef node in the
                                          tree whenever we reference a CTE
                                          (table defined in a WITH clause)
CSE_USE_TEMP             OFF     Y        Disable creation of temp tables
                                          for common subexpressions
                         SYSTEM           Same as OFF for now
                         ON               Always create a temp table for
                                          common subexpressions where possible
CSE_DEBUG_WARNINGS       OFF     Y        No change
                         ON               Emit diagnostic warnings that show why
                                          we didn't create temp tables for
                                          common subexpressions
CSE_CLEANUP_HIVE_TABLES  OFF     Y        No change
                         ON               Cleanup Hive tables used for CSEs with
                                          the "cleanup obsolete volatile tables"
                                          command

CommonSubExprRef relational operators

-------------------------------------

This is a new RelExpr class. It marks the common

subexpressions in a RelExpr tree. This operator remembers the name of

a common subexpression (e.g. the name used in the WITH clause).

Multiple such operators can refer to the same name. Each of

these references has its own copy of the tree.

Right now, these operators are created in the parser when we expand a

CTE (Common Table Expression), declared in a WITH clause. If the CTE

is referenced only once, then the CommonSubExprRef operator is removed

in the binder - it also doesn't live up to its name in this case.

The remaining CommonSubExprRef operators keep track of various changes

to their child trees, during the binder and normalizer phases. In

particular, they track which predicates are pushed down into the child

tree and which outputs are eliminated.

The CmpStatement object keeps a global list of all the

CommonSubExprRef operators in a statement, so the individual operators

have a way to communicate with their siblings:

- A statement can have zero or more named common subexpressions.

- Each reference to a common subexpression is marked in the RelExpr

tree with a CommonSubExprRef node.

- In the binder and normalizer, common subexpressions are expanded,

meaning that multiple copies of them exist, one copy per

CommonSubExprRef.

- Common subexpressions can reference other common subexpressions,

so they, together with the main query, form a DAG (directed

acyclic graph) of dependencies.

- Note that CTEs declared in a WITH clause but not referenced are

ignored and are not part of the query tree.

In the semantic query optimization phase (SQO), the current code makes

a heuristic decision about what to do with common subexpressions - to

evaluate them multiple times (expand) or to create a temporary table

once and read that table multiple times.

If we decide to expand, the action is simple: Remove the

CommonSubExprRef operator from the tree and replace it with its child.
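The expand action above (remove the CommonSubExprRef and replace it with its child) can be sketched on a toy tree. The Node struct and the functions makeNode and expandCseRefs are hypothetical stand-ins for illustration. They are not the RelExpr classes or the real normalizer code.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy operator tree. The operator kind is kept as a string for brevity.
struct Node {
  std::string op;  // e.g. "Join", "Scan", "CommonSubExprRef"
  std::vector<std::unique_ptr<Node>> children;
};

std::unique_ptr<Node> makeNode(const std::string &op) {
  std::unique_ptr<Node> n(new Node);
  n->op = op;
  return n;
}

// Expand: splice every single-child CommonSubExprRef out of the tree and put
// its child in its place. Other nodes are kept and visited recursively.
std::unique_ptr<Node> expandCseRefs(std::unique_ptr<Node> n) {
  while (n && n->op == "CommonSubExprRef" && n->children.size() == 1) {
    std::unique_ptr<Node> child = std::move(n->children[0]);
    n = std::move(child);  // the old reference node is destroyed here
  }
  if (n)
    for (auto &c : n->children)
      c = expandCseRefs(std::move(c));
  return n;
}
```

After the call, a Join over two CommonSubExprRef nodes becomes a Join over the two copies of the child tree, which are then evaluated independently.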

If we decide to create a temp table, things become much more difficult.

We need to do several steps:

- Pick one of the child trees of the CommonSubExprRefs as the one to

materialize.

- Undo any normalization steps that aren't compatible with the other

CommonSubExprRefs. That means pulling out predicates that are not

common among the references and adding back outputs that are

required by other references. If that process fails, we fall back

and expand the expressions.

- Create a temp table tmp.

- Prepare an INSERT OVERWRITE TABLE tmp SELECT * FROM cse tree

that materializes the common subexpression in a table.

- Replace the CommonSubExprRef nodes with scans of the temp table.

- Hook up the insert query tree with the main query, such that it

is executed before the main query starts.
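The "undo normalization" step in the list above can be pictured as simple set arithmetic: keep only the predicates every reference pushed down, and produce every output any reference needs. The sketch below assumes predicates and columns can be compared as plain strings. The types CseRefInfo and SharedShape and the function computeSharedShape are hypothetical and much simpler than what RelExpr::prepareMeForCSESharing has to handle.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// What one reference to a common subexpression did to its copy of the tree.
struct CseRefInfo {
  std::set<std::string> pushedPreds;  // predicates pushed into this copy
  std::set<std::string> outputs;      // outputs this reference still needs
};

// Shape of the single tree that gets materialized into the temp table.
struct SharedShape {
  std::set<std::string> commonPreds;  // intersection over all references
  std::set<std::string> allOutputs;   // union over all references
};

SharedShape computeSharedShape(const std::vector<CseRefInfo> &refs) {
  SharedShape s;
  if (refs.empty())
    return s;
  s.commonPreds = refs[0].pushedPreds;
  for (const CseRefInfo &r : refs) {
    std::set<std::string> tmp;
    std::set_intersection(s.commonPreds.begin(), s.commonPreds.end(),
                          r.pushedPreds.begin(), r.pushedPreds.end(),
                          std::inserter(tmp, tmp.begin()));
    s.commonPreds.swap(tmp);
    s.allOutputs.insert(r.outputs.begin(), r.outputs.end());
  }
  return s;
}
```

Applied to the sample query, reference x pushes d_moy = 3 and needs d_year and d_date_sk, while y pushes nothing and needs d_date and d_date_sk. The shared tree therefore keeps no predicate and outputs all three columns. Predicates that are not common would have to be reapplied above each temp table scan.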

Temporary tables

----------------

At this time, temporary tables are created as Hive tables, with a

fabricated, unique name, including the session id, a unique statement

number within the session, and a unique identifier of the common

subexpression within the statement. The temporary table is created at

compile time. The query plan contains an operator to truncate the

table before populating it. The "temporary" Hive table is dropped when

the executor TCB is deallocated.
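A name built from the three parts mentioned above might be assembled as follows. Only the CSE_TEMP_ prefix followed by the CSE name and the start of a session id is visible in the (truncated) explain output. The separators, the statement-number suffix, and the function name makeCseTempName are assumptions for illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical naming scheme. The commit message says only that the name
// combines a CSE identifier, the session id and a statement number; the
// exact layout used by Trafodion is not shown and is assumed here.
std::string makeCseTempName(const std::string &cseName,
                            const std::string &sessionId,
                            unsigned statementNum) {
  std::ostringstream os;
  os << "CSE_TEMP_" << cseName << "_" << sessionId << "_S" << statementNum;
  return os.str();
}
```

Because the session id and statement number are part of the name, two statements in the same session, or the same statement in two sessions, get different tables. The query cache issue discussed in this section arises when a cached plan reuses one such name.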

Several issues remain with this approach:

- If the process exits before executing and deallocating the statement,

the Hive table is not cleaned up.

Solution (TBD): Clean up these tables like we clean up left-over

volatile tables. Both are identified by the session id.

- If the executor runs into memory pressure and deallocates the TCB,

then allocates it again at a later time, the temp table is no longer

there.

Solution (TBD): Use AQR to recompile the query and create a new table.

- Query cache: If we cache a query, multiple queries may end up with

the same temporary table. This works ok as long as these queries are

executed serially, but it fails if both queries are executed at the

same time (e.g. open two cursors and fetch from both, alternating).

Solution (TBD): Add a CQD that disables caching queries with temp tables

for common subexpressions.

In the future we also want to support volatile tables. However, those also

have issues:

- Volatile tables aren't cleaned up until the session ends. If we run

many statements with common subexpressions, that is undesirable. So,

we have a similar cleanup problem as with Hive tables.

- Volatile tables take a relatively long time to create.

- Insert and scan rates on volatile Trafodion tables are slower than

those on Hive tables.

To-do items are marked with "Todo: CSE: " in the code.

  1. … 78 more files in changeset.
Post merge commit. All files here relate to NAList

  1. … 64 more files in changeset.
[TRAFODION-2252] Fix two memory leaks during UPSERT statements

  1. … 2 more files in changeset.
JIRA TRAFODION-2180 enable/externalize non-ansi sql syntax/functionality

Enabled and externalized support for following functionality:

-- RANDOM and RANDOM(seed) functions

-- CEIL (same as CEILING) math function

-- GREATEST(val1, val2), LEAST(val1, val2)

-- return greater or lesser value between val1 and val2

-- MONTHS_BETWEEN(date1, date2)

-- number of months between date1 and date2.

Positive if date1 > date2, negative otherwise.

-- LAST_DAY(<date>)

-- date of the last day of the month of <date>

-- NEXT_DAY(<date>, <string>)

-- returns the date of the first weekday named by <string>

that is later than the date <date>

-- TRUNC. Same as DATE_TRUNC

-- TO_NUMBER. Limited support.

-- TO_TIMESTAMP. Limited support.

-- syntax BYTEINT (same as TINYINT)

-- select UNIQUE ... from <tab>. Same as DISTINCT

-- NOT NULL ENABLE syntax in col defn (same as NOT NULL)

-- Removal of all reserved keywords

-- support for 'select * from DUAL'
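To make the LAST_DAY description above concrete, here is a small C++ sketch that computes the last day of the month for a Gregorian calendar date. The Date struct and the functions isLeapYear and lastDay are hypothetical names. This is not the Trafodion implementation, which operates on SQL datetime values.

```cpp
#include <cassert>

// Minimal calendar date for illustration (Gregorian, month is 1..12).
struct Date {
  int year;
  int month;
  int day;
};

bool isLeapYear(int y) {
  return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

// LAST_DAY(<date>): same year and month, day set to the month's last day.
Date lastDay(const Date &d) {
  static const int kDays[12] = {31, 28, 31, 30, 31, 30,
                                31, 31, 30, 31, 30, 31};
  assert(d.month >= 1 && d.month <= 12);
  int n = kDays[d.month - 1];
  if (d.month == 2 && isLeapYear(d.year))
    n = 29;
  Date r = {d.year, d.month, n};
  return r;
}
```

For a date in February 2016 the result is the 29th, and for February 2017 it is the 28th.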

Other changes:

-- error 8413 to indicate that source data being displayed is in hex

-- "-failed" option to runregr and runallsb.

It will rerun only the tests that have failed

-- removed mode_special_2, mode_special_3, mode_special_5 cqds

-- fixed an issue with hive data modification check

-- alter rename stmt now writes generated object into metadata

-- Infrastructure support for a couple of JIRAs.

These have not been enabled as default for this check-in but

developer regressions are run after enabling them.

-- JIRA TRAFODION-2181 Incompatible operations

-- JIRA TRAFODION-2184 Groupby/Orderby extensions

  1. … 86 more files in changeset.
initial support of WITH

  1. … 4 more files in changeset.
Part 1 of updates to licensing info in Trafodion source

Added NOTICE.txt file in root directory per ASF guidelines.

Updated copyright text in one directory (core/sql/sqlcomp)

as a test of a tool to update such text. One or more later

check-ins will take care of the remaining directories.

  1. … 63 more files in changeset.
Move core into subdir to combine repos

  1. … 10768 more files in changeset.
Move core into subdir to combine repos

  1. … 10622 more files in changeset.
Move core into subdir to combine repos

Use: git log --follow -- <file>

to view file history thru renames.

  1. … 10837 more files in changeset.