Ali Alsuliman <ali.al.solaiman@gmail.com> committed on 22 Apr
[ASTERIXDB-2713][EXT] CSV & TSV support for external dataset p3

- user model changes: no

- storage format changes: no

- interface changes: yes

        IRecordDataParser, IRecordReader, IRecordConverter

Details:

- record parser:

        - delimited-data (CSV/TSV) parser: ignore and warn for invalid records.

        - other parsers: continue to use their existing behaviour.

- stream parser:

        continue to use its existing behaviour.

- fixes:

        - fixed S3 stream read() to advance properly to the next file and

          to notify consumers so they handle per-file properties such as the header.

        - fixed localfs stream read() handling when the end of the current

          file is reached and consumers are notified of a new file source.

        - extracted the read() of both streams since now they are identical.

- report the file, record number and field number in parser warnings

- propagate the stream name to parsers that need to report it

- add test cases
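
Illustration of the read() fixes listed above: a minimal sketch, not the code in this change, of a stream that moves on to the next source when the current one is exhausted and tells a listener about each new source so that per-file state such as a CSV header can be reset. All names here (SourceListener, MultiSourceStream, the in-memory sources) are hypothetical and chosen only for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical callback, invoked each time reading moves to a new source. */
interface SourceListener {
    void newSource(String name);
}

/** Hypothetical stream that reads several named sources back to back. */
class MultiSourceStream extends InputStream {
    private final Iterator<Map.Entry<String, byte[]>> sources;
    private final SourceListener listener;
    private InputStream current;

    MultiSourceStream(Map<String, byte[]> namedSources, SourceListener listener) {
        this.sources = namedSources.entrySet().iterator();
        this.listener = listener;
    }

    /** Opens the next source and notifies the listener; returns false when none remain. */
    private boolean advance() throws IOException {
        if (current != null) {
            current.close();
            current = null;
        }
        if (!sources.hasNext()) {
            return false;
        }
        Map.Entry<String, byte[]> next = sources.next();
        current = new ByteArrayInputStream(next.getValue());
        listener.newSource(next.getKey());
        return true;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : one[0] & 0xFF;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (current == null && !advance()) {
            return -1;
        }
        int n = current.read(buf, off, len);
        // End of the current source: keep advancing, skipping empty sources.
        while (n < 0) {
            if (!advance()) {
                return -1;
            }
            n = current.read(buf, off, len);
        }
        return n;
    }
}

class MultiSourceStreamDemo {
    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("a.csv", "id\n1\n".getBytes(StandardCharsets.UTF_8));
        files.put("b.csv", "id\n2\n".getBytes(StandardCharsets.UTF_8));
        List<String> seen = new ArrayList<>();
        try (InputStream in = new MultiSourceStream(files, seen::add)) {
            byte[] buf = new byte[4];
            int total = 0;
            for (int n = in.read(buf, 0, buf.length); n != -1; n = in.read(buf, 0, buf.length)) {
                total += n;
            }
            System.out.println("sources seen: " + seen + ", bytes read: " + total);
        }
    }
}
```

Because a single read() implementation handles the advance-and-notify step, the same logic can in principle be shared by any stream that iterates over multiple sources, which is the motivation the commit gives for extracting it.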

Change-Id: Ie1ba545d753d8afef9cef4e290e058019a465201

Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/5926

Reviewed-by: Ali Alsuliman <ali.al.solaiman@gmail.com>

Reviewed-by: Murtadha Hubail <mhubail@apache.org>

Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>

Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
