Costing and statistics compiler interfaces for UDFs

blueprint cmp-tmudf-compile-time-interface
bug 1433192

This change adds compiler interfaces for UDFs that provide statistics for the result table and a cost estimate. It also adds more code for the upcoming Java UDF feature: retrieving the updated invocation infos and returning them to the executor/compiler C++ code.
Description of the changes in more detail:
- Addressed remaining review comments from my last checkin, https://review.trafodion.org/1655
- Make sure that user-generated exceptions during deallocation of a routine are reported. These happen in the destructor of the object derived from tmudr::UDR. For Java, we may need a deallocate method.
- Java and JNI code to serialize the updated UDRInvocationInfo and UDRPlanInfo objects after calling the user code and return them through the JNI interface to the calling C++ code.
- The cost method source files had some inline methods defined in the .cpp file and used an include file that included other .cpp files. Make didn't pick up changes made in these files. Removed this code and changed it to regular methods and inlines.
- Replaced some Context * parameters in costing with PlanWorkSpace *, to be able to get to UDF-related info that's stored in a special PlanWorkSpace.
- Changed the behavior of isBigMemoryOperator() for TMUDFs. If the UDF writer specifies the DoP for the UDF invocation, then consider it a BMO.
- If possible, synthesize the HASH2 partitioning function of a TMUDF's child as the partitioning function of the UDF. This can be done if the partitioning key gets passed through the UDF.
- Statistics interface for TMUDFs:
  - TMUDF now populates the statistics field in the UDRInvocationInfo object and calls the describeStatistics() method.
  - Added an estimated # of partitions for partitioned input tables of TMUDFs. Also changed row count methods to "estimated" row count.
  - Added code to incorporate the information on row count and UEC provided by the UDF writer into statistics of the TMUDF. This code is not well suited to serve as the default implementation of describeStatistics(). Therefore, the default implementation of describeStatistics() does nothing, but the compiler applies some heuristics in case the UDF writer provides no statistics.
- Changed cost method for TMUDFs to incorporate an estimated cost per row from the UDF writer. There is no special compiler interface call to ask for the cost; it can be set from the describeDesiredDegreeOfParallelism() call and, once supported, from the describePlanProperties() call. Note that we don't have immediate plans to support describePlanProperties(); that might come after 2.0.
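
The statistics and cost flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: all type and function names (InvocationSketch, PlanSketch, UdrSketch, estimateUdf) are hypothetical stand-ins and not the real tmudr classes, and the fallback heuristic (one output row per input row, a caller-supplied default cost per row) is invented for the example and is not the heuristic the compiler actually applies.

```cpp
#include <cassert>

// Hypothetical stand-ins for UDRInvocationInfo / UDRPlanInfo.
// A negative value means "the UDF writer provided no estimate".
struct InvocationSketch {
    double estimatedRowCount = -1.0;
};
struct PlanSketch {
    double costPerRow = -1.0;
};

// Hypothetical base class: both compiler hooks do nothing by default,
// mirroring the "default implementation does nothing" rule above.
class UdrSketch {
public:
    virtual ~UdrSketch() {}
    virtual void describeStatistics(InvocationSketch &) {}
    virtual void describeDesiredDegreeOfParallelism(InvocationSketch &, PlanSketch &) {}
};

// Example UDF writer that supplies a row count and a cost per row.
class CountingUdf : public UdrSketch {
public:
    void describeStatistics(InvocationSketch &info) override {
        info.estimatedRowCount = 1000.0;
    }
    void describeDesiredDegreeOfParallelism(InvocationSketch &, PlanSketch &plan) override {
        plan.costPerRow = 2.5; // the cost is set here, there is no separate cost call
    }
};

struct UdfEstimateSketch {
    double rows;
    double cost;
};

// Compiler side: call the hooks, then fall back to assumed heuristics
// for anything the UDF writer left unset.
UdfEstimateSketch estimateUdf(UdrSketch &udf, double inputRows, double defaultCostPerRow) {
    assert(inputRows >= 0.0 && defaultCostPerRow >= 0.0);
    InvocationSketch info;
    PlanSketch plan;
    udf.describeStatistics(info);
    udf.describeDesiredDegreeOfParallelism(info, plan);
    double rows = info.estimatedRowCount >= 0.0 ? info.estimatedRowCount : inputRows;
    double perRow = plan.costPerRow >= 0.0 ? plan.costPerRow : defaultCostPerRow;
    return UdfEstimateSketch{rows, rows * perRow};
}
```

With a plain UdrSketch the estimate falls back entirely to the assumed defaults; with CountingUdf both values come from the UDF writer.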
Patch Set 3: Addressed Dave's review comments.
Patch Set 4: Fixed misplaced copyright in expected file.

Using the language manager for UDF compiler interface

blueprint cmp-tmudf-compile-time-interface

This change includes new CLI calls, to be used in the compiler to invoke routines. Right now, only trusted routines are supported, executed in the same process as the caller, but in the future we may extend this to isolated routines. Using a CLI call allows us to share the language manager between compiler and executor, since language manager resources such as the JVM and loaded DLLs exist only once per process. This change is in preparation for Java UDFs.
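
As a rough illustration of keeping one language manager per language per process, here is a minimal sketch of lazy allocation. The names (CliGlobalsSketch, LanguageManagerSketch, getLanguageManager) are hypothetical and do not match the real CLI globals, and the sketch ignores thread safety, which real code would have to handle.

```cpp
#include <cassert>
#include <memory>

enum class LmLanguageSketch { Cpp, Java };

// Hypothetical stand-in for a language manager; the real one owns
// process-wide resources such as the JVM or loaded DLLs.
class LanguageManagerSketch {
public:
    explicit LanguageManagerSketch(LmLanguageSketch lang) : lang_(lang) {}
    LmLanguageSketch language() const { return lang_; }
private:
    LmLanguageSketch lang_;
};

// Hypothetical stand-in for the CLI globals: one slot per language,
// filled on first use and shared by every later caller in the process.
class CliGlobalsSketch {
public:
    LanguageManagerSketch &getLanguageManager(LmLanguageSketch lang) {
        std::unique_ptr<LanguageManagerSketch> &slot =
            (lang == LmLanguageSketch::Java) ? javaLm_ : cppLm_;
        if (!slot) {
            slot.reset(new LanguageManagerSketch(lang)); // allocate on demand
            ++allocations_;
        }
        assert(slot->language() == lang);
        return *slot;
    }
    int allocations() const { return allocations_; }
private:
    std::unique_ptr<LanguageManagerSketch> cppLm_;
    std::unique_ptr<LanguageManagerSketch> javaLm_;
    int allocations_ = 0;
};
```

Compiler and executor code calling getLanguageManager() on the same globals object get the same instance back, so nothing is created twice.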
Changes in a bit more detail:
- Added 4 new CLI calls to allocate a routine, invoke it, retrieve updated invocation and plan infos and deallocate (put) the routine. The CLI globals now have a C/C++ and a Java language manager that is allocated on demand.
- The compiler no longer loads a DLL for the UDF compiler interface; it uses the new CLI calls instead.
- DDL syntax is changed to allow TMUDFs in Java (not officially supported, so don't use it quite yet).
- TMUDFs in C are no longer supported, only C++ and Java are. Converted remaining TMUDF tests to C++.
- C++ TMUDFs now do a basic verification at DDL time, so errors like missing entry points are detected earlier. Validation for Java TMUDFs is also done through the CLI.
- Make sure we have no memory or resource leaks:
  - CmpContext keeps track of UDF-related objects allocated on system heap and in the CLI, cleaned up at the end of a statement
  - CLI keeps a list of allocated trusted routines, cleaned up when a CLI context is deallocated
- Using ExeCliInterface class to make the new CLI calls (4 new calls added).
- Removed CmpCli class in the optimizer directory and converted tracking compiler to use ExeCliInterface as well.
- Compile-time parameter values are no longer baked into the UDRInvocationInfo. Instead, they are provided as an input row, the same way as they are provided at runtime.
- Bug fixes in C++ UDR code, mostly related to serialization and to multiple interactions with the UDF through serialized objects.
- Added more info to UDRInvocationInfo (SQL access type, etc.).
- Since there are multiple plans per invocation, each of which can have multiple interactions with the UDF, plans need to be numbered so the UDF side can tell them apart and attach the right state (owned by the UDF) to each plan.
- The language manager needs some functions that are provided by the process it's running in. Added those (empty, for now) functions as cli/CliImplLmExtFunc.cpp.
- Added a new class for Java TMUDFs, LmRoutineJavaObj. Added methods to allocate such routines and to load their class as well as to create Java objects by invoking the default constructor through JNI.
- Java TMUDFs use the new UDR interface (to be provided by Suresh and Pavani). In the language manager, the container is the class of the UDF, the external path is the fully qualified jar name. The Java method name is <init>, the default constructor, with signature "()V". Some code changes were required to do this.
- Created a new directory trafodion/core/sql/src for Java sources in the sql engine. Right now, only language manager Java sources are in this directory, but I am planning to move the other Java sources under sql in a future checkin. Suresh and Pavani will add their UDF-related Java files there as well.
- Renamed the udr jar to trafodion-sql-<version>.jar, in anticipation of combining all the sql Java sources into this jar.
- Created a Maven project file trafodion/core/sql/pom.xml and changed makefiles to invoke Maven to build Java sources.
- More work to separate new UDR interface from older SPInfo object, so that we can get rid of SPInfo if/when we don't support the older style anymore.
- Small fix to odb makefile: make clean failed when executed twice.
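
The routine lifecycle and plan numbering described in the list above can be sketched as follows. Everything here is hypothetical: CliContextSketch and its methods only mimic the shape of the four new calls (allocate, invoke, get info, put) and are not the real CLI entry points, and the "serialized" info string is a made-up format used to keep the example checkable.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical routine record. State owned by the UDF side is keyed
// by plan number so that several plans of one invocation stay apart.
struct RoutineSketch {
    std::string name;
    std::map<int, int> interactionsPerPlan;
};

// Hypothetical CLI context that tracks every allocated routine so that
// leftovers can be released when the context goes away.
class CliContextSketch {
public:
    // Allocate a routine and return its handle.
    int allocRoutine(const std::string &name) {
        int handle = nextHandle_++;
        routines_[handle].name = name;
        return handle;
    }
    // Invoke the routine for one plan; returns the number of
    // interactions seen so far for that plan.
    int invokeRoutine(int handle, int planNumber) {
        auto it = routines_.find(handle);
        assert(it != routines_.end());
        return ++it->second.interactionsPerPlan[planNumber];
    }
    // Retrieve the (made-up) serialized info for one plan.
    std::string getRoutineInfo(int handle, int planNumber) const {
        auto it = routines_.find(handle);
        assert(it != routines_.end());
        auto plan = it->second.interactionsPerPlan.find(planNumber);
        int calls = (plan == it->second.interactionsPerPlan.end()) ? 0 : plan->second;
        return it->second.name + ":plan" + std::to_string(planNumber) +
               ":calls=" + std::to_string(calls);
    }
    // Deallocate (put) a single routine.
    void putRoutine(int handle) { routines_.erase(handle); }
    // Release whatever is still allocated; returns how many were freed.
    std::size_t releaseAll() {
        std::size_t released = routines_.size();
        routines_.clear();
        return released;
    }
    std::size_t openRoutines() const { return routines_.size(); }
private:
    int nextHandle_ = 1;
    std::map<int, RoutineSketch> routines_;
};
```

Keeping the list inside the context means a caller that forgets putRoutine() still cannot leak past the context's lifetime, which is the intent of the cleanup bullets above.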
Patch set 2: Adding a custom filter for test regress/udr/TEST108.