Clone
Hans Zeller <hans.zeller@hp.com>
committed
on 30 Mar 15
TIMESERIES UDF for repository queries and UDF bug fixes
Bug fixes:

bug 1436593 TMUDF: getScale() returns a wrong scale for the TIME column… Show more
TIMESERIES UDF for repository queries and UDF bug fixes

Bug fixes:

bug 1436593 TMUDF: getScale() returns a wrong scale for the TIME column

bug 1400812 Name resolution for predefined table mapping functions may need to be improved

bug 1436963 TMUDF: Unsigned numeric is mapped to TypeInfo::NUMERIC

bug 1436450 TMUDF: copyPassThruData() fails to pad nchar data properly

Added a predefined UDF to do timeseries queries. "Predefined" means that

like a built-in function it is not registered in the metadata. It is still

a UDF, though, using the SDK for UDFs.

Here is how to invoke the UDF:

select ...

from udf(timeseries(table(select ...

                         from ...

                         [partition by ...]

                         order by <tscol>),

                   <name of time slice column>,

                   <time slice width>

                   [ { , <col name>, <instr> } ... ]

                  )

       )

<tscol> is a date, time or timestamp column from the input table

       that describes the time dimension of the data. The data

       can optionally be partitioned into multiple time series

       that are independent of each other, using a PARTITION BY.

<name of time slice column> is the name of the generated output

       column that contains the starting time of each time slice.

<time slice width> is an interval literal that determines how wide

       each time slice is.

An optional list of pairs of <col name> and <instr> indicates

       column values to be interpolated, according to the instructions.

       Instructions:

       Instruction  Value at   Interpolation  Ignore nulls

       -----------  ---------  -------------  ------------

           FC       beginning     constant         no

           LC          end        constant         no

           FCI      beginning     constant         yes

           LCI         end        constant         yes

           FL       beginning      linear          no

           LL          end         linear          no

           FLI      beginning      linear          yes

           LLI         end         linear          yes

Example:

 select *

 from udf(timeseries(table(select cust_id,

                                  tstamp,

                                  kwh

                           from e_meters

                           partition by cust_id

                           order by tstamp),

                     'HOURLY_READING',  -- name of time slice column

                     interval '1' hour, -- time slice width

                     'KWH', 'FL',       -- value of KWH column at beginning

                                        -- of time slice, use linear

                                        -- interpolation

                     'KWH', 'LCI'));    -- end value, constant interpolation,

                                        -- ignore NULL values

 This will chop the time range of each customer into time slices,

 1 hour wide, and will use linear interpolation for the meter readings

 (assume we have readings for cust1 at 8:00 for 1000 and 10:30 for 1002

 and readings for cust2 at 8:00 for 400 and at 9:30 with a NULL value).

 CUST_ID  HOURLY_READING       KWH_FL    KWH_LCI

 -------  -------------------  --------  -------

 cust1    2015-03-20 08:00:00   1000.00     1000

 cust1    2015-03-20 09:00:00   1000.80     1000

 cust1    2015-03-20 10:00:00   1001.60     1002

 cust2    2015-03-20 08:00:00    400.00      400

 cust2    2015-03-20 09:00:00         ?      400

Other changes:

- Added DCS gui support to install_local_hadoop. If installed with

 non-standard ports, see file $MY_SQROOT/sql/scripts/swurls.html

 for the port numbers to use. I would recommend that you bookmark

 this file in the browser you are using locally on your workstation.

- Addressed comments made by Dave B. in earlier checkins:

 - Make error message 3286 more easy to understand.

 - Change name resolution rules for predefined UDRs such that

   real (user-defined) UDRs take precedence.

 - Add comments to Trafodion engine files where some logic

   is duplicated in the UDR SDK (file sqludr/sqludr.cpp and

   in the future the equivalent Java file).

- Fix for "orphan entries in up queue" assert when canceling

 a TMUDF while it is still reading data from its table-valued

 inputs.

Patch set 2: The jenkins build flagged some warnings as errors

            that were not flagged on my workstation.

Change-Id: I1b806e35e2b2e91a42318fbbfd788e92d8cba070

Show less

default + 9 more