Clone Tools
Constraints: committers
Constraints: files
Constraints: dates
Costing and statistics compiler interfaces for UDFs

blueprint cmp-tmudf-compile-time-interface

bug 1433192

This change adds compiler interfaces for UDFs that give information

about statistics of the result table and also a cost estimate. It also

has more code for the upcoming Java UDF feature, retrieving updated

invocation infos and returning them back to the executor/compiler C++


Description of the changes in more detail:

- Addressed remaining review comments from my last checkin,


- Make sure that user-generated exceptions during deallocation of

a routine are reported. These happens in the destructor of the

object derived from tmudr::UDR. For Java, we may need a deallocate


- Java and JNI code to serialize the updated UDRInvocationInfo and

UDRPlanInfo object after calling the user code and return them back

through the JNI interface to the calling C++ code.

- The cost method source files had some inline methods defined in

the .cpp file and used an include file that included other .cpp

files. Make didn't pick up changes made in these files. Removed

this code and changed it to regular methods and inlines.

- Replaced some Context * parameters in costing with PlanWorkSpace *,

to be able to get to UDF-related info that's stored in a special


- Changed the behavior or isBigMemoryOperator() for TMUDFs. If the

UDF writer specifies the DoP for the UDF invocation, then consider

it a BMO.

- If possible, synthesize the HASH2 partitioning function of a TMUDF's

child as the partitioning function of the UDF. This can be done if

the partitioning key gets passed through the UDF.

- Statistics interface for TMUDFs:

- TMUDF now populates statistics field in the UDRInvocationInfo

object and calls the describeStatistics() method.

- Added an estimated # of partitions for partitioned input tables

of TMUDFs. Also changed row count methods to "estimated" row count.

- Added code to incorporate the information on row count and UEC

provided by the UDF writer into statistics of the TMUDF. This code

is not that suitable for coding it as the default implementation

of describeStatistics(). Therefore, the default implementation of

describeStatistics() does nothing, but the compiler applies some

heuristics in case the UDF writer provides no statistics.

- Changed cost method for TMUDFs to incorporate an estimated cost

per row from the UDF writer. There is no special compiler interface

call to ask for the cost, it can be set from the

describeDesiredDegreeOfParallelism() call and, once supported, from

the describePlanProperties() call. Note that we don't have immediate

plans to support describePlanProperties(), that might come after 2.0.

Patch Set 3: Addressed Dave's review comments.

Patch Set 4: Fixed misplaced copyright in expected file.

Change-Id: Ia9ae076b7ae1fc2968c3d253d6d2d0e1d9a2ea40

  1. … 45 more files in changeset.
Using the language manager for UDF compiler interface

blueprint cmp-tmudf-compile-time-interface

This change includes new CLI calls, to be used in the compiler to

invoke routines. Right now, only trusted routines are supported,

executed in the same process as the caller, but in the future we may

extend this to isolated routines. Using a CLI call allows us to share

the language manager between compiler and executor, since language

manager resources such as the JVM and loaded DLLs exist only once per

process. This change is in preparation for Java UDFs.

Changes in a bit more detail:

- Added 4 new CLI calls to allocate a routine, invoke it, retrieve

updated invocation and plan infos and deallocate (put) the routine.

The CLI globals now have a C/C++ and a Java language manager that

is allocated on demand.

- The compiler no longer loads a DLL for the UDF compiler interface,

it uses the new CLI calls instead.

- DDL syntax is changed to allow TMUDFs in Java (not officially

supported, so don't use it quite yet).

- TMUDFs in C are no longer supported, only C++ and Java are.

Converted remaining TMUDF tests to C++.

- C++ TMUDFs now do a basic verification at DDL time, so errors

like missing entry points are detected earlier. Validation for

Java TMUDFs is also done through the CLI.

- Make sure we have no memory or resource leaks:

- CmpContext keeps track of UDF-related objects allocated on

system heap and in the CLI, cleaned up at the end of a statement

- CLI keeps a list of allocated trusted routines, cleaned up

when a CLI context is deallocated

- Using ExeCliInterface class to make the new CLI calls (4 new calls


- Removed CmpCli class in the optimizer directory and converted

tracking compiler to use ExeCliInterface as well.

- Compile-time parameter values are no longer baked into the

UDRInvocationInfo. Instead, they are provided as an input row, the

same way as they are provided at runtime.

- Bug fixes in C++ UDR code, mostly related to serialization and

to multiple interactions with the UDF through serialized objects.

- Added more info to UDRInvocationInfo (SQL access type, etc.).

- Since there are multiple plans per invocation, each of which

can have multiple interactions with the UDF, plans need to be

numbered so the UDF side can tell them apart to attach the

right state (owned by the UDF) to it.

- The language manager needs some functions that are provided by

the process it's running in. Added those (empty, for now) functions

as cli/CliImplLmExtFunc.cpp.

- Added a new class for Java TMUDFs, LmRoutineJavaObj. Added methods

to allocate such routines and to load their class as well as to

create Java objects by invoking the default constructor through JNI.

- Java TMUDFs use the new UDR interface (to be provided by Suresh and

Pavani). In the language manager, the container is the class of

the UDF, the external path is the fully qualified jar name. The

Java method name is <init>, the default constructor, with signature

"()V". Some code changes were required to do this.

- Created a new directory trafodion/core/sql/src for Java sources in

the sql engine. Right now, only language manager java

sources are in this directory, but I am planning to move the other

java sources under sql in a future checkin. Suresh and Pavani

will add their UDF-related Java files there as well.

- Renamed the udr jar to trafodion-sql-<version>.jar, in anticipation

of combining all the sql Java sources into this jar.

- Created a maven project file trafodion/core/sql/pom.xml and

changed makefiles to invoke maven to build java sources.

- More work to separate new UDR interface from older SPInfo object,

so that we can get rid of SPInfo if/when we don't support the older

style anymore.

- Small fix to odb makefile, make clean failed when executed twice.

Patch set 2: Adding a custom filter for test regress/udr/TEST108.

Change-Id: Ic827a42ac25505fb1ee451b79636c0f9349d8841

  1. … 98 more files in changeset.
TIMESERIES UDF for repository queries and UDF bug fixes

Bug fixes:

bug 1436593 TMUDF: getScale() returns a wrong scale for the TIME column

bug 1400812 Name resolution for predefined table mapping functions may need to be improved

bug 1436963 TMUDF: Unsigned numeric is mapped to TypeInfo::NUMERIC

bug 1436450 TMUDF: copyPassThruData() fails to pad nchar data properly

Added a predefined UDF to do timeseries queries. "Predefined" means that

like a built-in function it is not registered in the metadata. It is still

a UDF, though, using the SDK for UDFs.

Here is how to invoke the UDF:

select ...

from udf(timeseries(table(select ...

from ...

[partition by ...]

order by <tscol>),

<name of time slice column>,

<time slice width>

[ { , <col name>, <instr> } ... ]



<tscol> is a date, time or timestamp column from the input table

that describes the time dimension of the data. The data

can optionally be partitioned into multiple time series

that are independent of each other, using a PARTITION BY.

<name of time slice column> is the name of the generated output

column that contains the starting time of each time slice.

<time slice width> is an interval literal that determines how wide

each time slice is.

An optional list of pairs of <col name> and <instr> indicates

column values to be interpolated, according to the instructions.


Instruction Value at Interpolation Ignore nulls

----------- --------- ------------- ------------

FC beginning constant no

LC end constant no

FCI beginning constant yes

LCI end constant yes

FL beginning linear no

LL end linear no

FLI beginning linear yes

LLI end linear yes


select *

from udf(timeseries(table(select cust_id,



from e_meters

partition by cust_id

order by tstamp),

'HOURLY_READING', -- name of time slice column

interval '1' hour, -- time slice width

'KWH', 'FL', -- value of KWH column at beginning

-- of time slice, use linear

-- interpolation

'KWH', 'LCI')); -- end value, constant interpolation,

-- ignore NULL values

This will chop the time range of each customer into time slices,

1 hour wide, and will use linear interpolation for the meter readings

(assume we have readings for cust1 at 8:00 for 1000 and 10:30 for 1002

and readings for cust2 at 8:00 for 400 and at 9:30 with a NULL value).


------- ------------------- -------- -------

cust1 2015-03-20 08:00:00 1000.00 1000

cust1 2015-03-20 09:00:00 1000.80 1000

cust1 2015-03-20 10:00:00 1001.60 1002

cust2 2015-03-20 08:00:00 400.00 400

cust2 2015-03-20 09:00:00 ? 400

Other changes:

- Added DCS gui support to install_local_hadoop. If installed with

non-standard ports, see file $MY_SQROOT/sql/scripts/swurls.html

for the port numbers to use. I would recommend that you bookmark

this file in the browser you are using locally on your workstation.

- Addressed comments made by Dave B. in earlier checkins:

- Make error message 3286 more easy to understand.

- Change name resolution rules for predefined UDRs such that

real (user-defined) UDRs take precedence.

- Add comments to Trafodion engine files where some logic

is duplicated in the UDR SDK (file sqludr/sqludr.cpp and

in the future the equivalent Java file).

- Fix for "orphan entries in up queue" assert when canceling

a TMUDF while it is still reading data from its table-valued


Patch set 2: The jenkins build flagged some warnings as errors

that were not flagged on my workstation.

Change-Id: I1b806e35e2b2e91a42318fbbfd788e92d8cba070

  1. … 22 more files in changeset.
cli api to store explain in repository and few more changes.

-- new api to stored packed explain in repository


-- storage of packed explain in encoded form to

mask out any null characters. This helps with

treating of explain data as a string.

-- bug fix in cleanup code

-- fix to handle input param values greater than short max.

-- test code in sqlci frontend to test explain get/store apis.

-- fixed udr/TEST001

Change-Id: I1d501cb93eb7d32808405863afdbdb5c1cec79ee

  1. … 23 more files in changeset.
Fixes for a few scalar UDF bugs

LP 1426605: change in NormRelExpr.cpp. When left linearizing a join backbone

sufficients inputs were not being provided. The change ensures that inputs

from the old tree and still marked as required inputs for a node in the new


LP 1420530: Error handling added to BiArith::bindNode.

LP 1420938: Error handling to CREATE FUNCTION statement to flag more than 32


LP 1421438: showddl [function | procedure | table_mapping function] <name>;

now works. If one of the optional tokens is not specified then we will look

for a table called <name>.

Patch Set 1

Changes to address comments by Dave.

One more fix in ExUdr.cpp. There is no LP for this bug. If a dll is missing

at runtime or other LOAD errors during UDF fixup could lead to an assertion,

since we try to place an error in UDF's up queue, before there are entries

in the corresponding down queue. Fix is to remove this line and let existing

error handling report this error. Thanks for your help Hans.

Couple of items that I forgot to mention before

1) Changes in Analyzer.cpp related to printing predecessorJBBC are due to Hans.

2) Showddl code is mostly refactored from previous versions.

Change-Id: Idfde89d73c47735c4405befa6b9cdd4ae0d2e641

  1. … 14 more files in changeset.
Fixes for TMUDF bugs

Bugs fixed in this change:

bug 1430034 TMUDF: processData() fails to handle several data type

bug 1430438 TMUDF: A TMUDF with nonexisting external name crashes

sqlci with a core

bug 1430453 TMUDF: A TMUDF missing 'language cpp' crashes sqlci with a core

bug 1430484 TMUDF: User defined error number for UDRException() not

returned at run time

bug 1404053 Core with SIGSEGV during CREATE PROCEDURE statement when

procedure validation fails

Summary of changes:

- Support for more data types, including char types with character set UCS2

and interval types. Unlike in other types of UDFs, intervals are

represented as integers, but that representation is not directly

exposed to the UDF writer. CLOBs and BLOBs are only supported when

they are mapped to VARCHARS.

- Support to get and set values by specifying a C++ data type of time_t

for datetime and certain interval values.

- The default language for creating UDFs is changing. It used to be

C, the new rule depends on the type of the library. If we can determine

the library to be a jar file, the default language is Java. For

other libraries, the default is C for scalar UDFs, C++ for table-valued

UDFs. Also the PARAMETER STYLE clause is no longer used for TMUDFs, the

parameter style is computed automatically for now. PARAMETER STYLE is

now optional for Java stored procedures.

- Error handling is somewhat improved, fewer internal errors occur when

a UDR raises an exception. Still needs work.

- Making interfaces of OrderInfo and PartitionInfo more similar

by giving them similar methods.

Change-Id: I0bd0cebcf32f2185907c7d3573d4a511b17ead3e

  1. … 36 more files in changeset.
Normalizer interface for TMUDFs, blueprint cmp-tmudf-compile-time-interface

Added new compiler interfaces:

describeParamsAndColumns has been extended to allow updating

PARTITION BY and ORDER BY clauses specified for input tables.

describeDataFlowAndPredicates() allows the TMUDF to eliminate

columns not needed by the query and to push predicates into

the TMUDF or to the children.

describeConstraints() allows the TMUDF to see cardinality and

uniqueness constraints of the table-valued inputs (children) and

to synthesize cardinality and uniqueness constraints on the

TMUDF result.

TMUDFs now have 3 function types:

GENERIC - makes most conservative assumptions in the compiler

MAPPER - assumes TMUDF carries no state between input rows

REDUCER - assumes TMUDF carries no state between input partitions

defined by PARTITION BY clause

Query id and user id are now available to the UDR.

Added doxygen documentation for the C++ UDR interface. The resulting

web page will be published on the wiki. To generate the documentation

yourself, do the following:

cd $MY_SQROOT/../sql/sqludr

doxygen doxygen_tmudr.1.6.config

# now open tmudr_1.0/html/index.html in a web browser

Patch set 2: Updated copyrights in 2 files.

Change-Id: I3735eb3dd7e5292ba308ac00332fdebdf66a7472

  1. … 25 more files in changeset.
C++ run-time interface for TMUDFs

blueprint cmp-tmudf-compile-time-interface

- Support for C++ run-time interface:

- A new language, C++ is added to langman, the existing

LanguageManagerC handles both C and C++

- Two new parameter styles got added, C++ and Java

object-oriented parameter styles. Routines written in C++

use the new object-oriented C++ parameter style. The compiler

interface is only supported for that style (and in the future

for the Java object-oriented style).

- Also added one more compile time interface, the "completeDescription()"

call in the generator. Added logic to extract the UDRPlanInfo of

the optimal plan.

- Changes to UDRInvocationInfo and UDRPlanInfo classes:

- UDRInvocationInfo and UDRPlanInfo objects can now be serialized

and they are added to generated plans, as part of the UDR TDB.

- Split TableInfo into TupleInfo and TableInfo classes. TupleInfo

is now the common base class for describing both parameters and

input/output tables.

- TypeInfo now has offsets for data, null indicator and varchar


- New get<type> and set<type> methods on class TupleInfo, to be

used at compile time for parameters and at runtime for parameters,

input and output tables.

- Added a "call phase" member, to be able to throw exceptions when

certain methods are called at the wrong time (e.g. trying to modify

compile time members at runtime).

- Routine class in langman now has a new subclass, LmRoutineCppObj

and a new method, invokeRoutineMethod, that is used to invoke

the object-oriented methods, requiring UDRInvocationInfo and

UDRPlanInfo as parameters.

- Fixed some executor issues with error handling for UDFs, this is

still not very well supported

- Emitting the EOD row in the UDF is no longer required, and no longer

supported or even possible.

- UDRPlanInfo is now part of the physical properties, so that we

can extract it from the optimal plan.

- Disabling TMUDF as the inner of a nested join - for now.

We might support this "routine join" at a later time.

- regress/udr/TEST001:

- SESSIONIZE_STATIC remains in C, but other TMUDFs are now

rewritten in C++ (the runtime part that was not yet in C++)

- SESSIONIZE_DYNAMIC is now the same as the example on the wiki

- regress/udr/TEST002: Added some tests for event log reader UDF,

but can't add the part that copies a sample log file, since

in Jenkins, we don't have $MY_SQROOT set. Tried the test on my

workstation, though. Steve tells me $MY_SQROOT should be available,

so in a future checkin I'll enable this code again.

- For patch set 2: Removed fix for LP bug 1420539 and addressed

other review comments.

Change-Id: I008ad68a8f25f1aaee94e1c45bbf097a267129bb

  1. … 73 more files in changeset.
Fixed typos in user-visible messages and sample code

Fixed spellings and minor grammar in strings used in

in output, and sample code supplied to users.

One possibly harmful misspelling was found in source code,

file sqf/monitor/linux/montim.cxx used "Contianer_ExitProcess"

where it should have used "Container_ExitProcess".

Five files which could be changed for this reason are being held back

because they are also in open change 875. They will be checked in after

that change is merged to the main branch:






In patch set 2, modified these files as notde by reviewer:







Change-Id: I3761e9e1518ab39415806bba0f4ed93d14e1d41c

  1. … 59 more files in changeset.
TMUDF C++ compiler interface, part of log-reading TMUDF

This is the infrastructure for a new C++ interface for TMUDFs

(table-mapping UDFs). It is used by a new log-reading TMUDF that

is not yet complete, but should be finished in the next few days.

See blueprint cmp-tmudf-compile-time-interface for more info.

Change-Id: I5a74e461462313b6d9722ac0deb21cd16c4b02ce

  1. … 55 more files in changeset.
GET commands for SPJ and UDF

Fix for bug #1322691

These 7 GET statements are now supported.

get procedures [in schema <schema-name>];

get libraries [in schema <schema-name>];

get functions [in schema <schema-name>];

get table_mapping functions [in schema <schema-name>];

get procedures for library <library-name>;

get functions for library <library-name>;

get table_mapping for library <library-name>;

A fix for a problem reported by Anoop where GET TABLES IN SCHEMA <hive-schema-name> did not work is also fixed. The issue was that the schema name was not getting passed through when the internal statement was used to retrieve tables ina hive schema.

Change-Id: I4649a0432ed766504a95cb6da34b0c01881fc1c3

  1. … 11 more files in changeset.
Initial code drop of Trafodion

  1. … 4886 more files in changeset.