Hans Zeller
on 01 Jun 15
Costing and statistics compiler interfaces for UDFs
blueprint cmp-tmudf-compile-time-interface
bug 1433192

This change adds compiler interfaces for UDFs that provide statistics
about the result table and a cost estimate. It also adds more code for
the upcoming Java UDF feature: retrieving updated invocation infos and
returning them to the executor/compiler C++ code.

Description of the changes in more detail:

- Addressed the remaining review comments from my last check-in.

- Make sure that user-generated exceptions during deallocation of
  a routine are reported. This happens in the destructor of the
  object derived from tmudr::UDR. For Java, we may need a deallocate

- Java and JNI code to serialize the updated UDRInvocationInfo and
  UDRPlanInfo objects after calling the user code and return them
  through the JNI interface to the calling C++ code.

- The cost method source files had some inline methods defined in
  the .cpp file and used an include file that included other .cpp
  files, so make did not pick up changes made in these files.
  Removed this code and changed it to regular methods and inlines.

- Replaced some Context * parameters in costing with PlanWorkSpace *,
  to be able to get to UDF-related info that's stored in a special

- Changed the behavior of isBigMemoryOperator() for TMUDFs. If the
  UDF writer specifies the degree of parallelism (DoP) for the UDF
  invocation, the TMUDF is now considered a big memory operator (BMO).

- If possible, synthesize the HASH2 partitioning function of a TMUDF's
  child as the partitioning function of the UDF. This is possible when
  the partitioning key columns are passed through the UDF.

- Statistics interface for TMUDFs:
  - The TMUDF now populates the statistics field in the
    UDRInvocationInfo object and calls the describeStatistics() method.
  - Added an estimated number of partitions for partitioned input
    tables of TMUDFs. Also changed the row count methods to return
    "estimated" row counts.
  - Added code to incorporate the row count and unique entry count
    (UEC) information provided by the UDF writer into the statistics
    of the TMUDF. This code is not well suited to being the default
    implementation of describeStatistics(). Therefore, the default
    implementation of describeStatistics() does nothing, but the
    compiler applies some heuristics in case the UDF writer provides
    no statistics.

- Changed the cost method for TMUDFs to incorporate an estimated cost
  per row from the UDF writer. There is no special compiler interface
  call to ask for the cost; it can be set from the
  describeDesiredDegreeOfParallelism() call and, once supported, from
  the describePlanProperties() call. Note that we don't have immediate
  plans to support describePlanProperties(); that might come after 2.0.
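The deallocation item above relies on exceptions thrown from a destructor reaching the caller. A minimal C++ sketch of that idea follows, assuming a hypothetical base class UDRBase standing in for tmudr::UDR and a hypothetical helper deallocateRoutine(); neither is the actual Trafodion code. The key point it shows is that the base destructor must be noexcept(false), because since C++11 a destructor is implicitly noexcept and a throw from it would call std::terminate instead of propagating.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for tmudr::UDR. Declaring the virtual destructor
// noexcept(false) allows destructors of derived classes to throw.
class UDRBase {
 public:
  virtual ~UDRBase() noexcept(false) {}
};

// Hypothetical user routine whose cleanup fails.
class FailingCleanupUDR : public UDRBase {
 public:
  ~FailingCleanupUDR() noexcept(false) override {
    throw std::runtime_error("cleanup failed");
  }
};

// Hypothetical helper: deletes the routine object and returns the message
// of any exception raised by its destructor, or "" if deletion succeeded.
std::string deallocateRoutine(UDRBase *routine) {
  try {
    delete routine;
  } catch (const std::exception &e) {
    return e.what();
  } catch (...) {
    return "unknown exception during UDR deallocation";
  }
  return "";
}
```

The caller can then turn a non-empty result into an error report instead of silently losing it.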
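The JNI item above describes serializing updated objects and handing them back to C++. The sketch below shows only a generic byte-buffer round trip on the C++ side, assuming a hypothetical, heavily simplified PlanInfoSnapshot struct with two invented fields; the real UDRInvocationInfo and UDRPlanInfo serialization formats are not shown here. Native byte order is used because both ends of this sketch run in the same process; a Java writer would have to agree on byte order explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical subset of values a UDF might update; not the real object layout.
struct PlanInfoSnapshot {
  std::int64_t costPerRow;
  std::int32_t degreeOfParallelism;
};

// Appends the raw bytes of a trivially copyable value to the buffer.
template <typename T>
void putValue(std::vector<char> &buf, const T &value) {
  const char *bytes = reinterpret_cast<const char *>(&value);
  buf.insert(buf.end(), bytes, bytes + sizeof(T));
}

// Reads a value at offset pos and advances pos; throws if the buffer is short.
template <typename T>
T getValue(const std::vector<char> &buf, std::size_t &pos) {
  if (pos + sizeof(T) > buf.size())
    throw std::runtime_error("serialized buffer is truncated");
  T value;
  std::memcpy(&value, buf.data() + pos, sizeof(T));
  pos += sizeof(T);
  return value;
}

std::vector<char> serialize(const PlanInfoSnapshot &info) {
  std::vector<char> buf;
  putValue(buf, info.costPerRow);
  putValue(buf, info.degreeOfParallelism);
  return buf;
}

PlanInfoSnapshot deserialize(const std::vector<char> &buf) {
  std::size_t pos = 0;
  PlanInfoSnapshot info;
  info.costPerRow = getValue<std::int64_t>(buf, pos);
  info.degreeOfParallelism = getValue<std::int32_t>(buf, pos);
  return info;
}
```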
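The isBigMemoryOperator() item above states one rule: an explicit DoP from the UDF writer makes the TMUDF a BMO. A minimal sketch of that rule, assuming a hypothetical TmudfParallelismInfo struct where 0 means "no DoP specified" and a parameter existingDecision that stands for whatever the compiler's other BMO checks concluded (those checks are not described in this change and are not modeled here):

```cpp
#include <cassert>

// Hypothetical: udfWriterDoP == 0 means the UDF writer did not specify a DoP.
struct TmudfParallelismInfo {
  int udfWriterDoP;
};

// An explicit DoP from the UDF writer forces BMO treatment; otherwise the
// result of the pre-existing (unmodeled) checks is kept.
bool isBigMemoryOperator(const TmudfParallelismInfo &info,
                         bool existingDecision) {
  if (info.udfWriterDoP > 0)
    return true;
  return existingDecision;
}
```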
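The HASH2 item above reuses the child's partitioning function only when the partitioning key passes through the UDF. The following sketch illustrates that check under stated assumptions: Hash2Partitioning and the passThrough map (child column name to the UDF output column carrying it) are hypothetical representations, not the compiler's PartitioningFunction classes. If any key column is not passed through, no partitioning function is synthesized.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical HASH2 partitioning: ordered key columns plus partition count.
struct Hash2Partitioning {
  std::vector<std::string> keyColumns;
  int numPartitions;
};

// Rewrites the child's partitioning in terms of UDF output columns.
// Returns std::nullopt if some key column is not passed through.
std::optional<Hash2Partitioning> synthesizeUdfPartitioning(
    const Hash2Partitioning &child,
    const std::map<std::string, std::string> &passThrough) {
  Hash2Partitioning result{{}, child.numPartitions};
  for (const std::string &col : child.keyColumns) {
    auto it = passThrough.find(col);
    if (it == passThrough.end())
      return std::nullopt;  // key column not passed through
    result.keyColumns.push_back(it->second);
  }
  return result;
}
```

Requiring every key column matters because rows hashed on a partial key would no longer be co-located the way the child's function promises.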
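The statistics items above combine row count and UEC values supplied through describeStatistics() with compiler heuristics when nothing is supplied. A minimal sketch of that combination, assuming hypothetical UdfProvidedStats and ResultStats structs; the fallbacks used here (reuse the input row count and UEC, cap UEC at the row count) are placeholders chosen for illustration, not the compiler's actual heuristics.

```cpp
#include <cassert>
#include <optional>

// Hypothetical subset of what a UDF writer may set in describeStatistics().
// An empty optional means "not provided".
struct UdfProvidedStats {
  std::optional<double> estimatedRowCount;
  std::optional<double> uec;
};

struct ResultStats {
  double rowCount;
  double uec;
};

// Prefers UDF-writer values and falls back to placeholder heuristics.
ResultStats deriveTmudfStats(const UdfProvidedStats &udf,
                             double inputRowCount, double inputUec) {
  ResultStats r;
  r.rowCount = udf.estimatedRowCount.value_or(inputRowCount);
  r.uec = udf.uec.value_or(inputUec);
  // A UEC can never exceed the number of rows.
  if (r.uec > r.rowCount)
    r.uec = r.rowCount;
  return r;
}
```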
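The costing item above folds a per-row cost estimate from the UDF writer into the TMUDF cost. The sketch below shows one way such an estimate could enter a cost formula, assuming a hypothetical TmudfCostInputs struct, a hypothetical default of 1.0 per row when the writer supplies nothing, and a simple "rows times cost per row, divided evenly over the DoP" formula. It is an illustrative assumption, not the formula in the TMUDF cost method.

```cpp
#include <cassert>

// Hypothetical cost inputs; udfCostPerRow <= 0 means "not supplied".
struct TmudfCostInputs {
  double inputRows;
  double udfCostPerRow;
  int degreeOfParallelism;
};

// Hypothetical default used when the UDF writer gives no per-row cost.
constexpr double kDefaultCostPerRow = 1.0;

// Per-row work spread evenly over the parallel instances.
double estimateTmudfCost(const TmudfCostInputs &in) {
  assert(in.degreeOfParallelism > 0);
  double perRow = in.udfCostPerRow > 0 ? in.udfCostPerRow : kDefaultCostPerRow;
  return in.inputRows * perRow / in.degreeOfParallelism;
}
```

For example, 1000 rows at the default cost with a DoP of 4 yield 250.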

Patch Set 3: Addressed Dave's review comments.

Patch Set 4: Fixed misplaced copyright in expected file.

Change-Id: Ia9ae076b7ae1fc2968c3d253d6d2d0e1d9a2ea40
