Fix for bug 1442932 and bug 1442966, encoding for varchar

Submitting this before the regression run on the workstation has finished, in the interest of time.
Key encodings for VARCHAR values used to put a varchar length indicator in front of the encoded value. The value written was the maximum length of the varchar, and the indicator was 2 or 4 bytes long, matching the length of the indicator in the source field. That length used to depend only on the maximum number of bytes in the field: for more than 32767 bytes we would use a 4-byte VC length indicator, otherwise a 2-byte one.
Now, with the introduction of long rows, the varchar indicator length for varchars in aligned rows is always 4 bytes, regardless of the character length. This causes a problem for the key encoding.
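The two rules for the indicator size can be sketched as follows. This is only an illustration: oldVcIndLen and vcIndLen are hypothetical helpers, not functions in the code base, and the sketch assumes that rows which are not in aligned format keep the old size rule.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not the actual code): size in bytes of the VC length
// indicator under the old rule, which looked only at the max number of bytes.
constexpr std::size_t oldVcIndLen(std::uint32_t maxBytes) {
    return maxBytes > 32767 ? 4 : 2;
}

// Hypothetical helper: with long rows, varchars in aligned rows always get a
// 4-byte indicator. Assumption: other row formats keep the old rule.
constexpr std::size_t vcIndLen(std::uint32_t maxBytes, bool alignedRow) {
    return alignedRow ? 4 : oldVcIndLen(maxBytes);
}

// A short VARCHAR now has a 4-byte indicator in an aligned row, which is
// what the key encoding did not expect.
static_assert(oldVcIndLen(100) == 2, "old rule: short field, 2 bytes");
static_assert(vcIndLen(100, true) == 4, "aligned row: always 4 bytes");
```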
We could have computed the encoded VC indicator length from the field length. Anoop suggested a better solution: do not include the VC indicator in the encoded value at all, since it is unnecessary. Note that for HBase row keys stored on disk, we already remove the VC indicator by converting such keys from varchar to fixed char. Therefore, the issue happens only for encodings needed within a query, for example when sorting or in a merge join or union.
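A minimal sketch of why the indicator is not needed in an encoded key. It assumes the encoded value is blank-padded to the maximum length of the field and compared bytewise; encodeVarcharKey and compareKeys are hypothetical names for illustration, not the code in CompEncode or ex_function_encode.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

using Key = std::vector<unsigned char>;

// Hypothetical sketch: encode a VARCHAR value as a fixed-width key of maxLen
// bytes. No length indicator is written; the unused tail is blank-padded
// (padding with blanks is an assumption of this sketch).
Key encodeVarcharKey(const std::string& value, std::size_t maxLen) {
    Key key(maxLen, ' ');
    const std::size_t n = std::min(value.size(), maxLen);
    if (n > 0) {
        std::memcpy(key.data(), value.data(), n);
    }
    return key;
}

// Encoded keys of the same width are ordered with a plain byte comparison,
// as a sort or merge join would do.
int compareKeys(const Key& a, const Key& b) {
    assert(a.size() == b.size());
    return a.empty() ? 0 : std::memcmp(a.data(), b.data(), a.size());
}
```

Because every key for a given field has the same width and carries no prefix, two sources whose VC indicators differ in size (2 bytes on one side, 4 bytes on the other) still produce comparable keys.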
Description of the fix:
1. Change CompEncode::synthType not to include the VC length indicator in the encoded buffer. This change also includes some minor code clean-up.
2. Change the assert in CompEncode::codeGen not to include the VC indicator length anymore.
3. Changes in ex_function_encode::encodeKeyValue():
   a) Read 2 and 4 byte VC length indicators for VARCHAR/NVARCHAR.
   b) Small code cleanup, don't copy buffer for case-insensitive encode, since that is not necessary.
   c) Don't write max length as VC length indicator into target and adjust target offsets accordingly (for VARCHAR/NVARCHAR).
4. Other changes in sql/exp/exp_function.cpp:
   d) Handle 2 and 4 byte VC length indicators in the hash function and the Hive hash function (these problems are unrelated to the LP bugs fixed here).
   e) Add some asserts for cases where we assume the VC length indicator is a 2 byte integer.
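Reading a VC length indicator that may be 2 or 4 bytes long, as in items 3a and 4d above, can be sketched like this. readVcLen is a hypothetical helper, not the function used in exp_function.cpp; the sketch assumes the indicator is stored in native byte order at the start of the source buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: return the actual length stored in a VC length
// indicator that is vcIndLen bytes long (2 or 4). memcpy is used so that
// the read does not depend on the alignment of src.
inline std::uint32_t readVcLen(const char* src, std::size_t vcIndLen) {
    if (vcIndLen == sizeof(std::uint16_t)) {
        std::uint16_t len16 = 0;
        std::memcpy(&len16, src, sizeof(len16));
        return len16;
    }
    // Any other size is a coding error; this mirrors the idea of asserting
    // on the indicator size instead of silently assuming 2 bytes.
    assert(vcIndLen == sizeof(std::uint32_t));
    std::uint32_t len32 = 0;
    std::memcpy(&len32, src, sizeof(len32));
    return len32;
}
```

The caller passes the indicator size of the source field, so the same code path serves both short varchars with a 2-byte indicator and varchars in aligned rows with a 4-byte indicator.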
CompDecode is not yet changed. Filed bug 1444134 to do that for the next release, since that change is less urgent.
Patch set 2: Copyright notice changes only.
Patch set 3: Updated expected regression test file that prints out encoded key in hex.