GenRelScan.cpp

initial support for returning multiple versions and column timestamps

This feature is not yet externalized.

Support added to:

-- return multiple versions of rows

-- select * from table {versions N | MAX | ALL}

-- get hbase timestamp of a column

-- select hbase_timestamp(col) from t;

-- get version number of a column.

-- select hbase_version(col) from t

Change-Id: I37921681fc606a22c19d2c0cb87a35dee5491e1e

  1. … 48 more files in changeset.
Merge "Avoid scanner timeout for Update Statistics"

  1. … 3 more files in changeset.
Move core into subdir to combine repos

  1. … 10768 more files in changeset.
Move core into subdir to combine repos

  1. … 10622 more files in changeset.
Move core into subdir to combine repos

Use: git log --follow -- <file>

to view file history through renames.

  1. … 10837 more files in changeset.
Avoid scanner timeout for Update Statistics

For performance reasons, Update Stats pushes sampling down into HBase,

using a filter that returns only randomly selected rows. When the

sampling rate is very low, as is the case when the default sampling

protocol (which includes a sample limit of a million rows) is used on

a very large table, a long time can elapse in the region server

before returning to Trafodion, with the resultant risk of an

OutOfOrderScannerNextException. To avoid these timeouts, this fix

reduces the scanner cache size (the number of rows accumulated before

returning) used by a given scan based on the sampling rate. If an

adequate return time cannot be achieved in this manner without

going below the scanner cache minimum prescribed by the

HBASE_NUM_CACHE_ROWS_MIN cqd, then the scanner cache reduction is

complemented by a modification of the sampling rate used in HBase.

The sampling rate used in HBase is increased, but the overall rate

is maintained by doing supplementary sampling of the returned rows in

Trafodion. For example, if the original sampling rate is .000001,

and reducing the scanner cache to the minimum still results in an

excessive average time spent in the region server, the sampling

may be split into a .0001 rate in HBase and a .01 rate in Trafodion,

resulting in the same effective .000001 overall rate.
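The split preserves the overall probability because the two sampling stages are independent. A minimal Python sketch of the arithmetic (illustrative only, not the Trafodion code; the function name is hypothetical):

```python
def split_sample_rate(overall_rate, hbase_rate):
    """Given the desired overall sampling rate and a coarser rate to run
    in the HBase-side filter, return the supplementary rate Trafodion
    must apply to the returned rows so the effective rate is unchanged."""
    assert 0 < overall_rate <= hbase_rate <= 1
    return overall_rate / hbase_rate

# Splitting .000001 overall into .0001 (HBase) x .01 (Trafodion):
traf_rate = split_sample_rate(0.000001, 0.0001)
print(traf_rate)            # ~0.01
print(0.0001 * traf_rate)   # effective rate, ~0.000001
```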

Change-Id: Id05ab5063c2c119c21b5c6c002ba9554501bb4e1

Closes-Bug: #1391271

  1. … 6 more files in changeset.
Enabling Bulk load and Hive Scan error logging/skip feature

Also fixed the hanging issue with Hive scan (ExHdfsScan operator) when there

is an error in data conversion.

ExHbaseAccessBulkLoadPrepSQTcb was not releasing all the resources when there

is an error or when the last buffer had some rows.

Error logging/skip feature can be enabled in

hive scan using CQDs and in bulk load using the command line options.

For Hive Scan

CQD TRAF_LOAD_CONTINUE_ON_ERROR 'ON' to skip errors

CQD TRAF_LOAD_LOG_ERROR_ROWS 'ON' to log the error rows in Hdfs files.

For Bulk load

LOAD WITH CONTINUE ON ERROR [TO <location>] -- to skip error rows

LOAD WITH LOG ERROR ROWS -- to log the error rows in hdfs files.

The default parent error logging directory in hdfs is /bulkload/logs. The error

rows are logged in subdirectory ERR_<date>_<time>. A separate hdfs file is

created for every process/operator involved in the bulk load in this directory.

Error rows in hive scan are logged in

<sourceHiveTableName>_hive_scan_err_<inst_id>

Error rows in bulk upsert are logged in

<destTrafTableName>_traf_upsert_err_<inst_id>

Bulk load can also be aborted after a certain number of error rows are seen using

LOAD WITH LOG ERROR ROWS, STOP AFTER <n> ERROR ROWS option

Change-Id: Ief44ebb9ff74b0cef2587705158094165fca07d3

  1. … 33 more files in changeset.
Changes to enable Rowset select - Fix for bug 1423327

HBase always returns an empty result set when the row is not found. Trafodion

is changed to exploit this concept to project no data in a rowset select.

Now optimizer has been enabled to choose a plan involving Rowset Select

wherever possible. This can result in plan changes for the queries:

nested join plan instead of hash join,

vsbb delete instead of delete,

vsbb insert instead of regular insert.

A new CQD HBASE_ROWSET_VSBB_SIZE is now added to control the hbase rowset size.

The default value is 1000.

Change-Id: Id76c2e6abe01f2d1a7b6387f917825cac2004081

  1. … 19 more files in changeset.
Trafodion Metadata Cleanup command support.

Various changes to support the cleanup command have been added.

A separate external spec contains the details.

Summary of syntax:

cleanup [ table t | index i | sequence s | object o] [, uid <value>]

cleanup [private | shared] schema sch

cleanup uid <value>

cleanup metadata, check, return details

In addition, a new command to get names of various hbase objects

has also been added:

get [ all | user | system | external ] hbase objects;

Change-Id: I93f1f45e7fd78091bacd7c9f166420edd7c1abee

  1. … 79 more files in changeset.
Snapshot Scan changes

The changes in this delivery include:

-decoupling the snapshot scan from the bulk unload feature. Setup of the

temporary space and folders before running the query and cleanup afterwards

used to be done by the bulk unload operator because snapshot scan was specific

to bulk unload. In order to make snapshot scan independent from bulk unload

and use it in any query the setup and cleanup tasks are now done by the query

itself at run time (the scan and root operators).

-caching of the snapshot information in NATable to optimize compilation time

Rework for caching: when the user sets TRAF_TABLE_SNAPSHOT_SCAN to LATEST

we flush the metadata and then set caching back to on so that metadata

gets cached again. If newer snapshots are created after setting the cqd, they

won't be seen if the metadata is already cached, unless the user issues a command/cqd

to invalidate or flush the cache. One way to do that is to issue

"cqd TRAF_TABLE_SNAPSHOT_SCAN 'latest';" again

-code cleanup

Below is a description of the CQDs used with snapshot scan:

TRAF_TABLE_SNAPSHOT_SCAN

This CQD can be set to:

NONE --> (default) Snapshot scan is disabled and regular scan is used.

SUFFIX --> Snapshot scan is enabled for the bulk unload (bulk unload

behavior is not changed).

LATEST --> Snapshot Scan is enabled independently from bulk unload and

the latest snapshot is used if it exists. If no snapshot exists

the regular scan is used. For this phase of the project the user

needs to create the snapshots using hbase shell or other tools.

And in the next phase of the project new commands to create,

delete and manage snapshots will be added.

TRAF_TABLE_SNAPSHOT_SCAN_SNAP_SUFFIX

This CQD is used with bulk unload and its value is used to build the

snapshot name as the table name followed by the suffix string

TRAF_TABLE_SNAPSHOT_SCAN_TABLE_SIZE_THRESHOLD

When the estimated table size is below the threshold (in MBs) defined by

this CQD the regular scan is used instead of snapshot scan. This CQD

does not apply to bulk unload which maintains the old behavior

TRAF_TABLE_SNAPSHOT_SCAN_TIMEOUT

The timeout beyond which we give up trying to create the snapshot scanner

TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION

Location for temporary links and files produced by snapshot scan

Change-Id: Ifede88bdf36049bac8452a7522b413fac2205251

  1. … 44 more files in changeset.
Merge "Change to avoid placing large scan results in RegionServer cache"

  1. … 9 more files in changeset.
Remove code and cqds related to Thrift interface

ExpHbaseInterface_Thrift class was removed a few months ago. Completing

that cleanup work. exp/Hbase_types.{cpp,h} still remain. These are Thrift

generated files but we use the structs/classes generated for JNI access.

Change-Id: I7bc2ead6cc8d6025fb38f86fbdf7ed452807c445

  1. … 19 more files in changeset.
Change to avoid placing large scan results in RegionServer cache

By default the result of every Scan and Get request to HBase is placed in

the RegionServer cache. When a scan returns a lot of rows this can lead to

cache thrashing, causing results which are being shared by other queries

to be flushed out. This change uses cardinality estimates and hbase row size

estimates along with the configured size of region server cache size to

determine when such thrashing may occur. Heap size for region server is

specified through a cqd HBASE_REGION_SERVER_MAX_HEAP_SIZE. The units are in MB.

The fraction of this heap allocated to block cache is read from the config

file once per session through a lightweight (no disk access) JNI call. The

heuristic used is approximate as it does not consider the total number of region

servers or that sometimes a scan may be concentrated in one or a few region

servers. We simply do not place rows in RS cache, if the memory used by all

rows in a scan will exceed the cache in a single RS. Change can be overridden

with cqd HBASE_CACHE_BLOCKS 'ON'. The default is now SYSTEM. Change applies

to both count(*) coproc plans and regular scans.
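The core of the heuristic can be pictured with a small sketch (hypothetical names; the real decision also draws on optimizer cardinality and row-size estimates, and is overridable via HBASE_CACHE_BLOCKS):

```python
def should_cache_blocks(est_rows, est_row_size_bytes,
                        rs_heap_mb, block_cache_fraction):
    """Skip the RegionServer block cache when the bytes a scan would pull
    through a single RS exceed that RS's block cache, since caching them
    would thrash entries shared by other queries."""
    block_cache_bytes = rs_heap_mb * 1024 * 1024 * block_cache_fraction
    scan_bytes = est_rows * est_row_size_bytes
    return scan_bytes <= block_cache_bytes

# A 10M-row scan at 500 bytes/row (~5 GB) against a 4 GB heap with a 40%
# block cache (~1.6 GB) would not be placed in the cache:
print(should_cache_blocks(10_000_000, 500, 4096, 0.4))   # False
```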

Change-Id: I0afc8da44df981c1dffa3a77fb3efe2f806c3af1

  1. … 20 more files in changeset.
Bulk unload optimization using snapshot scan

resubmitting after facing git issues

The changes consist of:

*implementing the snapshot scan optimization in the Trafodion scan operator

*changes to the bulk unload changes to use the new snapshot scan.

*Changes to scripts and permissions (using ACLS)

*Rework based on review

Details:

*Snapshot Scan:

----------------------

**Added support for snapshot scan to Trafodion scan

**The scan expects the hbase snapshots themselves to be created before running

the query. When used with bulk unload, the snapshots can be created by bulk unload.

**The snapshot scan implementation can be used without the bulk-unload. To use

the snapshot scan outside bulk-unload we need to use the below cqds

cqd TRAF_TABLE_SNAPSHOT_SCAN 'on';

-- the snapshot name will be the table name concatenated with the suffix-string

cqd TRAF_TABLE_SNAPSHOT_SCAN_SNAP_SUFFIX 'suffix-string';

-- temp dir needed for the hbase snapshot scan

cqd TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION '/bulkload/temp_scan_dir/';

**snapshot scan can be used with table scan, index scans etc…

*Bulk unload utility :

-------------------------------

**The bulk unload optimization is due to the newly added support for snapshot scan.

By default bulk unload uses the regular scan. But when snapshot scan is

specified it will use snapshot scan instead of regular scan

**To use snapshot scan with Bulk unload we need to specify the new options in

the bulk unload syntax: NEW|EXISTING SNAPSHOT HAVING SUFFIX QUOTED_STRING

***using NEW in the above syntax means the bulk unload tool will create new

snapshots while using EXISTING means bulk unload expects the snapshot to

exist already.

***The snapshot names are based on the table names in the select statement. The

snapshot name needs to start with table name and have a suffix QUOTED-STRING

***For example, for "unload with NEW SNAPSHOT HAVING SUFFIX 'SNAP111' into 'tmp'

select from cat.sch.table1;" the unload utility will create a snapshot

CAT.SCH.TABLE1_SNAP111; and for "unload with EXISTING SNAPSHOT HAVING SUFFIX

'SNAP111' into 'tmp' select from cat.sch.table1;" the unload utility will

expect a snapshot CAT.SCH.TABLE1_SNAP111; to already exist. Otherwise

an error is produced.

***If this newly added option is not used in the syntax, bulk unload will use

the regular scan instead of snapshot scan

**The bulk unload queries the explain plan virtual table to get the list of

Trafodion tables that will be scanned, and depending on the option it either creates

the snapshots for those tables or verifies whether they already exist.

*Configuration changes

--------------------------------

**Enable ACLs in hdfs


*Testing

--------

**All developer regression tests were run and all passed

**bulk unload and snapshot scan were tested on the cluster

*Examples:

**Example of using snapshot scan without bulk unload:

(we need to create the snapshot first)

>>cqd TRAF_TABLE_SNAPSHOT_SCAN 'on';

--- SQL operation complete.

>>cqd TRAF_TABLE_SNAPSHOT_SCAN_SNAP_SUFFIX 'SNAP777';

--- SQL operation complete.

>>cqd TRAF_TABLE_SNAPSHOT_SCAN_TMP_LOCATION '/bulkload/temp_scan_dir/';

--- SQL operation complete.

>>select [first 5] c1,c2 from tt10;

C1 C2

--------------------- --------------------

.00 0

.01 1

.02 2

.03 3

.04 4

--- 5 row(s) selected.

**Example of using snapshot scan with unload:

UNLOAD

WITH PURGEDATA FROM TARGET

NEW SNAPSHOT HAVING SUFFIX 'SNAP778'

INTO '/bulkload/unload_TT14_3' select * from seabase.TT20 ;

Change-Id: Idb1d1807850787c6717ab0aa604dfc9a37f43dce

  1. … 35 more files in changeset.
Reworked fix for LP bug 1404951

The scan cache size for an mdam probe is now set to the hbase default of 100.

Setting it to values like 1 or 2 resulted in intermittent failures. The cqd

COMP_BOOL_184 can be set ON to get a cache size of 1 for mdam probes.

Root cause for this intermittent failure will be investigated later.

Change-Id: Ic05a77ecb0deeb260784f156de251a0f0dbdf49c

  1. … 5 more files in changeset.
Fixes for SQL security

LP bugs fixed:

1392805 - DB__ROOT incorrectly gets "NOT AUTHORIZED" messages

1398546 - revoke priv from role fails when view is present

1401233 - USAGE privilege not checked when creating procedure (and

revoking privileges)

1403995 - Update stats failures due to schema PUBLIC_ACCESS_SCHEMA

1401683 - (Partial) DDL operations see error 8841 about transaction

started by SQL

Regressions updated:

catman1/TEST135 & EXPECTED138

catman1/EXPECTED138 (fix in common/ComUser.cpp)

Bug descriptions:

1392805:

Changed create view code to allow DB__ROOT to create views. Some

reorganization was required to make sure create view sets the updatable

and insertable privileges correctly. This also fixed the problem where

the incorrect privileges were set when created by DB__ROOT.

Sqlcomp/CmpSeabaseDDLview.cpp

Sqlcomp/PrivMgrPrivileges.h (sets default privileges)

1398546:

The check to see if the "select" privilege is still in existence needed

to be deferred until after all the privilege descriptors were analyzed.

Sqlcomp/PrivMgrPrivileges.cpp (gatherViewPrivileges)

Sqlcomp/PrivMgrDesc.h

1401233:

Missing checks at create UDR and revoke USAGE privilege were added.

Sqlcomp/CmpSeabaseDDLroutine.cpp

Sqlcomp/PrivMgrMD (getUdrsThatReferenceLibraries)

Sqlcomp/PrivMgrPrivileges.cpp (dealWithUdrs)

1403995:

This is a critical case QA filed because the PUBLIC_ACCESS_SCHEMA does

not exist for temporary sample tables during Update Statistics. If the

PUBLIC_ACCESS_SCHEMA does not exist, the temporary sample table will be

created in the same schema as the source table. Also fixed an issue for

private schemas not owned by DB__ROOT to make the histogram table's

owner the current user.

ustat/hs_cli.cpp

ustat/hs_globals.cpp

1401683:

There are several 8841 issues being detected. This is a fix for one of

them related to Update Statistics where an embedded "get" statement

causes a transaction to be started in a child tdm_arkcmp process. The

fix is to not automatically start a transaction for the get request.

generator/GenRelScan.cpp

Change-Id: Ied42fdea6c6f8c43f29dab661b06b74f0f07ff99

  1. … 14 more files in changeset.
Fix performance regression due to QI for DDL

This check-in omits object UIDs from query plans for the tables

SB_HISTOGRAMS and SB_HISTOGRAMS_INTERVALS. Previously, when the

code generator tried to add the object UIDs for these, it had to

make a special query to the metadata, since the corresponding

internal cached structure omitted object UIDs when they were

created via methods like Generator::createVirtualTableDesc. The

special query to look up these object UIDs was shown to be

responsible for a large path-length regression.

Change-Id: Id5046c5c55a4fc8dd2ba3f891449ea87d35a5534

Closes-Bug: #1398600

  1. … 7 more files in changeset.
Native external hbase table access (select, IUD) changes.

-- IUD on external hbase tables is now enabled by default

-- predicates on native hbase tables can now be pushed down to

hbase region server

-- traf varchar col maxlength is now 200K by default,

can be changed by cqd max_character_col_size

-- executor handles column values length greater than 32K during

move to/from JNI

-- error is correctly returned if data retrieved from hbase exceeds expected

max row length

-- hbase column_create function now takes an expression/param as its

column name operand

Change-Id: Ieb3fcabfebaa22008eff2a049fc1e2000e68861e

  1. … 46 more files in changeset.
Various LP fixes, bugs and code cleanup.

-- removed obsolete code (label create/alter/delete, get disk/label/buffer stats,

dp2 scan)

-- metadata structs are now created as classes and initialized during

creation. LP 1394649

-- warnings are now being returned from compiler to executor after DDL operations.

-- duplicate constraint names now return error.

-- handle NOT ENFORCED constraints: give warning during creation and not enforce

during use. LP 1361784

-- drop all indexes (enabled and disabled indexes) on a table during drop table

and schema now works. LP 1384380

-- drop constraint on disabled index succeeds. LP 1384479

-- string truncation error is now returned if default value doesn't fit in

column. LP 1394780

-- fixed issue where a failure during multiple constraints creation in a create

stmt was not cleaning up metadata. LP 1389871

-- update where current of is now supported. LP 1324679

Change-Id: Iec1b0b4fc6a8161a33b7f69228c0f1e3f441f330

  1. … 54 more files in changeset.
Query Invalidation triggered by DDL, phase 1

This first check-in implements most of the framework which will

be used to complete the QI DDL feature. It redefines the old

security invalidation key (SQL_SIKEY) to handle DDL operations in

addition to REVOKE. In a limited number of DDL operations, the object

UIDs of affected Seabase objects are propagated to all nodes for

use by the compiler to invalidate NATable cache entries, as

well as a limited number of types of cached queries. Later this

month, the framework will be complete by allowing prepared queries

that have already been returned from the compiler to be invalidated.

Then the next step for the framework will be support for invalidating

the HTable cache. Finally an effort will be made to cover all of

the necessary DDL operations and all types of cached queries.

The check-in include a new regression test (executor/TEST122) that

demonstrates the cases that are covered. Specifically, a table will

be dropped and recreated with the same name but different definition

in one sqlci session. In another session, which has already populated

NATable cache and query cache for INSERT, UPDATE, DELETE, SELECT,

SELECT COUNT(*), INVOKE and SHOWDDL statements, those some types

of statements will be resubmitted and correctly compiled.

Change-Id: Ie61ce751089b57ce1894f1764c338e9400bb7b8a

Closes-Bug: #1329358

Implements: blueprint ddl-query-invalidation

  1. … 41 more files in changeset.
Initial changes for ORC file support.

Access to ORC (optimized row columnar) format tables is not enabled by

default yet. This checkin is initial and infrastructure changes for

that support.

Change-Id: I683c1b63c502dd4d2c736181952cb40f9f299cfd

  1. … 53 more files in changeset.
Enabling runtime stats for hbase operators

This is the first set of changes to collect the runtime stats info for

hbase tables and operators. It contains:

1) Populate the estimated row count in hbase access TDB.

2) Collect the hbase access time and accessed row count at the JNI layer

(only for select operations now).

Partially reviewed by Mike H. and Selva G.

Removed the part that divides the estimated rows by the number of ESPs, based on the comments

Change-Id: I5a98a8ae9c4462aa53ad889edfe4cd8563502477

  1. … 11 more files in changeset.
Changes to support OSS poc.

This checkin contains multiple changes that were added to support OSS poc.

These changes are enabled through a special cqd mode_special_4 and not

yet externalized for general use.

A separate spec contains details of these changes.

These changes have been contributed and pre-reviewed by Suresh, Jim C,

Ravisha, Mike H, Selva and Khaled.

All dev regressions have been run and passed.

Change-Id: I2281c1b4ce7e7e6a251bbea3bf6dc391168f3ca3

  1. … 143 more files in changeset.
Reducing the path length for scan, single row get, batch get operations

Earlier, the key-values per row were buffered as a ByteBuffer and shipped to the JNI side. There were 2 C->Java transitions to get each HBase row into tuple format.

Now, we avoid this copy on the Java side and ship the length and offsets of

the Key-Value for all rows in a batch along with reference to the

Key-Value buffer. The data is directly moved from Hbase Buffers to

row format buffer. There are only 2 C->Java transitions per batch of rows.

This has shown 3-4 times reduction in path length in Trafodion SQL processes

while scanning a table with 12 million rows.
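The batching idea, shipping one contiguous key-value buffer plus a compact (offset, length) index across the boundary instead of copying row by row, can be sketched as follows (illustrative Python with hypothetical names, not the actual JNI code):

```python
import struct

def pack_kv_batch(key_values):
    """Concatenate all key-values of a batch into one buffer and build a
    flat little-endian array of (offset, length) uint32 pairs. Only this
    small index plus a reference to the buffer crosses the boundary."""
    buf = bytearray()
    pairs = []
    for kv in key_values:
        pairs.extend((len(buf), len(kv)))
        buf.extend(kv)
    index = struct.pack('<%dI' % len(pairs), *pairs)
    return bytes(buf), index

def get_kv(buf, index, i):
    """Read the i-th key-value straight out of the shared buffer."""
    off, length = struct.unpack_from('<2I', index, i * 8)
    return buf[off:off + length]

buf, idx = pack_kv_batch([b'row1:colA', b'row1:colB', b'row2:colA'])
print(get_kv(buf, idx, 2))   # b'row2:colA'
```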

Trafodion SQL processes dump core when the ex_assert function is called or when they

exit with a non-zero value. These processes will also dump core when an

environment variable ABORT_ON_ERROR is set in release mode.

Fix for seabase/TEST010 failure with the patchSet 1

Rowset Select was sometimes dumping core. A wrong tuple was used to calculate the row length, which caused the process to dump core.

Change-Id: I0ec4669b54971b6a0c699c0fa0662c85f68bd25d

  1. … 13 more files in changeset.
Fix for two hive scan issues

1) Hive string columns can be wider in their DDL definition than their target

column when inserting from hive. With the fix the output of hive scan will

be the narrower target column rather than the original wider hive source

column.

2) Option to ignore a conversion error during hive/hdfs scan and continue to the

next row. The cqd HDFS_READ_CONTINUE_ON_ERROR should be set to ON to enable

this option.

Change-Id: Id8afe497e6c043c4b7bae6d556cee76c42ab0163

  1. … 5 more files in changeset.
Change to automatically set the HBase client side cache size.

Previously the size of HBase Scan cache size was set through a CQD.

With this change this cache size will be set to a value between [min, max]

based of the estimated number of rows accessed per scan node instance.

The values of min and max can be specified through cqds. The default values

are [100, 10000]. The previous default value was 100. The desired effect

of this change is to improve throughput for large scans. Simple experiments

are showing an improvement in the range of 2-3X.

Patch Set 2:

Set cache size for update and delete too, this will be functional when

card estimate code is added later

Added stubs for estRowsAccess for Update and Delete

Changed name of cqds

Added cache size info to explain output

modularized code by putting logic to set cache size in a method

Patch Set 3:

For scan the number of rows is estimated as MAX(rowsAccessed, maxCardEst).

This is to guard against cardinality underestimations as suggested by Qifan
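Under the stated defaults, the sizing rule reduces to a simple clamp (function name illustrative, not the actual code):

```python
def pick_scan_cache_size(est_rows_per_scan_node,
                         min_rows=100, max_rows=10_000):
    """Choose the HBase client-side scanner cache size: the estimated
    rows accessed per scan-node instance, clamped to [min, max]. The
    bounds correspond to the cqds with defaults of 100 and 10000."""
    return max(min_rows, min(max_rows, int(est_rows_per_scan_node)))

print(pick_scan_cache_size(7))          # 100   (small scans keep the old default)
print(pick_scan_cache_size(2500))       # 2500
print(pick_scan_cache_size(5_000_000))  # 10000 (large scans hit the cap)
```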

Change-Id: I5fac104666bd125671c818fc71ddad666f186ad7

  1. … 9 more files in changeset.
Launchpad fixes_2

-- hbase access errors were not being returned during process bringup phase.

arkcmp/CmpContext.*, bin/SqlciErrors.txt

sqlcomp/CmpSeabaseDDLcommon.cpp, CmpSeabaseDDLtable.cpp, nadefaults.cpp

launchpad #1343061

-- showddl now shows hbase options, salted partitions.

-- hbase_options and salt info is kept in metadata.

comexe/ComTdb.h, optimizer/NATable.cpp, NATable.h

parser/sqlparser.y

sqlcat/desc.h, sqlcomp/CmpDescribe.cpp, CmpSeabaseDDLcommon.cpp

CmpSeabaseDDL.h, CmpSeabaseDDLTable.cpp

launchpad #1342465

launchpad #1342996

-- added support for update where current of cursor.

generator/GenRelScan.cpp, GenRelUpdate.cpp,

launchpad #1324679

-- check constrs in update and views

generator/GenRelUpdate.cpp, optimizer/BindRelExpr.cpp

Launchpad #1320034, #1321479

-- long column name returned assertion.

optimizer/ObjectNames.cpp

launchpad #1324689

-- purgedata does not recreate with the original salt and hbase options.

optimizer/RelExeUtil.cpp, sqlcomp/CmpSeabaseDDLcommon.cpp

launchpad #1322400

-- long keys got assertion failure:

optimizer/ValueDesc.cpp

launchpad #1332607

-- cqd hide_indexes was causing constraint creation to fail.

sqlcomp/CmpSeabaseDDLcommon.cpp

launchpad #1340385

-- current_timestamp/time functions are now non-nullable.

Change-Id: Ib3a071894d11d0e3719b98f0cfddfc5ce8624519

  1. … 26 more files in changeset.
Various Launchpad and other fixes.

-- metadata and statistics tables will no longer be created with

serialization attribute even if cqd hbase_serialization is set to ON.

also, reenabled HBASE_SERIALIZATION for regressions run

(sqlcomp/CmpSeabaseDDLcommon.cpp, regress/tools/sbdefs)

-- rowwise hbase rows from native hbase tables are now being created correctly

in all cases.

executor/ExHbaseAccess.*

exp/exp_function.*, optimizer/BindItemExpr.cpp, ItemExpr.cpp

-- IUD and SELECT execution state is now being correctly initialized at the

beginning of a run. Multiple executions were failing otherwise.

(executor/ExHbaseIUD.cpp, ExHbaseSelect.cpp)

-- sign is now allowed in an interval literal

(generator/GenItemFunc.cpp, GenRelScan.cpp, ItemFunc.h, ValueDesc.cpp)

-- location value being returned from updates was not being set correctly in

some cases. That has been fixed

(generator/GenRelUpdate.cpp)

-- self referencing updates were not returning the right values due to

halloween issue. It is fixed by transforming it to insert/delete.

(optimizer/BindRelExpr.cpp)

-- purgedata now returns an error if issued on hbase, hive, neoview tables,

or on a view.

(optimizer/RelExeUtil.cpp, sqlcomp/CmpSeabaseDDLcommon.cpp)

-- referencing and referenced columns in a foreign key are now enforced

to have the same datatype attributes

(sqlcomp/CmpSeabaseDDLtable.cpp)

-- drop schema now works with delimited schema names

-- in some cases, a create constraint failure was not dropping the table on

which the constraint was being created.

that has been fixed.

(sqlcomp/CmpSeabaseDDLtable.cpp)

-- some additional infra changes for Traf as a mysql storage engine.

(cli/*, executor/ExExeUtilCli.*)

Change-Id: I94d5eb13c826efdf44ba10c04ac52a671f86553e

  1. … 23 more files in changeset.
Update Statistics performance improved by sampling in HBase

Update Statistics is much slower on HBase tables than it was for

Seaquest. A recent performance analysis revealed that much of the

deficit is due to the time spent retrieving the data from HBase that

is used to derive the histograms. Typically, Update Statistics uses a

1% random sample of a table’s rows for this purpose. All rows of the

table are retrieved from HBase, and the random selection of which rows

to use is done in Trafodion.

To reduce the number of rows flowing from HBase to Trafodion for queries

using a SAMPLE clause that specifies random sampling, the sampling logic

was pushed into the HBase layer using a RandomRowFilter, one of the

built-in filters provided by HBase. In the typical case of a 1% sample,

this reduces the number of rows passed from HBase to Trafodion by 99%.

After the fix was implemented, Update Stats on various tables was 2 to 4

times faster than before, when using a 1% random sample.
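The effect of the pushdown can be simulated: each row passes independently with a given probability, as with HBase's RandomRowFilter, so only about 1% of the rows ever leave the region server (Python simulation only, not the actual filter code):

```python
import random

def random_row_filter(rows, chance, rng=None):
    """Keep each row independently with the given probability, mimicking
    server-side RandomRowFilter sampling."""
    rng = rng or random.Random(42)   # fixed seed for a repeatable demo
    return [r for r in rows if rng.random() < chance]

sampled = random_row_filter(range(100_000), 0.01)
print(len(sampled))   # ~1000: roughly 99% fewer rows cross the wire
```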

Change-Id: Icd40e4db1dde444dec76165c215596755afae96c

  1. … 13 more files in changeset.
Fix for a few Hive issues

a) Improved Hive file directory parsing to include a MapR format with no hostName or portNum. The logic to parse the directory name was previously repeated in three places in the code; now we have one common place that encapsulates this parsing logic. Most files were touched due to this refactoring.

b) Fix for Launchpad bug #1320344. Hive Delete will now raise an error

c) Fix for Launchpad bug #1320385. Hive Update will now raise an error

d) Hive Merge will also raise an error. A query cache problem described in #1320344 and #1320385 has not been resolved in this delivery.

Patch Set 2:

This patch set addresses the two issues that Hans found. Thank you.

Patch Set 3:

This patch set addresses the two issues that Dave found. Thank you.

Change-Id: I19764227663dbb5c2d410608864a000592ce964a

  1. … 18 more files in changeset.