DRILL-7485: NPE on PCAP Batch Reader

closes #1932

    • binary
    ./pcap/arpWithNullIP.pcap
  1. … 3 more files in changeset.
DRILL-7484: Malware found in the Drill test folder

closes #1934

    • -0
    • +1
    ./pcap/dataFromRemote.txt
  1. … 1 more file in changeset.
DRILL-7473: Parquet reader failed to get field of repeated map

closes #1933

  1. … 5 more files in changeset.
DRILL-7443: Enable PCAP Plugin to Reassemble TCP Streams

closes #1898

    • binary
    ./pcap/attack-trace.pcap
  1. … 12 more files in changeset.
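Reassembling a TCP stream means ordering one direction's segments by sequence number and stitching their payloads together while discarding retransmitted or overlapping bytes. The Python sketch below illustrates that general technique. It is not the Drill PCAP plugin's implementation. The function name `reassemble` and the `(seq, payload)` input shape are hypothetical. For brevity, it ignores sequence-number wraparound and stops at the first gap.

```python
from typing import Iterable, Tuple


def reassemble(segments: Iterable[Tuple[int, bytes]], initial_seq: int) -> bytes:
    """Rebuild one direction of a TCP stream from (sequence number, payload) pairs.

    initial_seq is the sequence number of the first payload byte (ISN + 1 after SYN).
    Retransmitted or overlapping bytes are dropped; reassembly stops at the first gap.
    Hypothetical sketch: sequence-number wraparound (mod 2**32) is not handled.
    """
    out = bytearray()
    next_seq = initial_seq
    for seq, payload in sorted(segments, key=lambda s: s[0]):
        end = seq + len(payload)
        if end <= next_seq:
            continue  # every byte of this segment was already delivered
        if seq > next_seq:
            break  # missing data: the stream cannot continue contiguously
        out += payload[next_seq - seq:]  # keep only the bytes not yet seen
        next_seq = end
    return bytes(out)
```

Out-of-order, duplicated, and overlapping segments all collapse into one contiguous byte string. For example, `reassemble([(1003, b"lo "), (1000, b"hel"), (1004, b"o w"), (1006, b"world")], 1000)` yields `b"hello world"`.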
DRILL-7292: Remove V1 and V2 text readers

Drill 1.16 introduced the "V3" text reader, based on the row set and provided schema mechanisms. V3 was available through a system/session option because the functionality was considered experimental.

The functionality has now undergone thorough testing. This commit makes the V3 text reader the default and removes the code for the original "V1" and the "new" (compliant, "V2") text readers.

The system/session options that controlled reader selection are retained for backward compatibility, but they no longer do anything.

Specific changes:

* Removed the V2 "compliant" text reader.

* Moved the "V3" reader to replace the "compliant" version.

* Renamed the "compliant" package to "reader."

* Removed the V1 text reader.

* Moved the V1 text writer (still used with the V2 and V3 readers) into a new "writer" package adjacent to the reader.

* Removed the CSV tests for the V2 reader, including those that demonstrated bugs in V2.

* V2 did not properly handle the quote escape character. One or two unit tests depended on the broken behavior; they were fixed to expect the correct behavior.

* The behavior of "messy quotes" (quotes that appear in a non-quoted field) was undefined for the text reader. Added a test that clearly demonstrates the (somewhat odd) behavior. The behavior itself was not changed.

Reran all unit tests to ensure that they work with the now-default V3 text reader.

closes #1806

  1. … 59 more files in changeset.
DRILL-7196: Queries are still runnable on disabled plugins

- The storage client is no longer created for disabled plugins.

- The GET "/storage/{name}.json" endpoint now works with the plugin configuration directly, without instantiating the client. This improves UI responsiveness.

- HBase and Mongo base test classes were refactored to honor the plugin's enabled attribute.

- Fixed the path constructor for Mongo test datasets. It is now cross-platform.

- Fixed the format of test JSON files that use plugin definitions.

- Code cleanup

    • -2
    • +2
    ./json/project_pushdown_json_physical_plan.json
  1. … 105 more files in changeset.
DRILL-7032: Ignore corrupt rows in a PCAP file

closes #1637

  1. … 4 more files in changeset.
DRILL-7096: Develop vector for canonical Map<K,V>

- Added new type DICT;

- Created value vectors for the type for single and repeated modes;

- Implemented corresponding FieldReaders and FieldWriters;

- Made changes in EvaluationVisitor to be able to read values from the map by key;

- Made changes to DrillParquetGroupConverter to be able to read Parquet's MAP type;

- Added an option `store.parquet.reader.enable_map_support` to disable reading MAP type as DICT from Parquet files;

- Updated AvroRecordReader to use new DICT type for Avro's MAP;

- Added support for the new type to ParquetRecordWriter.

    • binary
    ./parquet/complex/map/parquet/000000_0.parquet
    • binary
    ./parquet/complex/simple_map.parquet
  1. … 107 more files in changeset.
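A DICT column can be pictured as a repeated structure. Each row owns a contiguous run of key/value pairs in flattened child arrays, an offsets array marks where each run begins and ends, and reading a value by key scans that row's run. The toy Python class below illustrates this assumed layout. `DictColumn` and its methods are hypothetical names, not Drill's value-vector API.

```python
from typing import Any, Dict, List, Optional


class DictColumn:
    """Toy columnar DICT: per-row key/value pairs flattened into child lists.

    Row i owns entries offsets[i] .. offsets[i + 1] - 1 of keys/values.
    Hypothetical illustration only; not Drill's vector classes.
    """

    def __init__(self) -> None:
        self.offsets: List[int] = [0]
        self.keys: List[Any] = []
        self.values: List[Any] = []

    def append_row(self, mapping: Dict[Any, Any]) -> None:
        for key, value in mapping.items():
            self.keys.append(key)
            self.values.append(value)
        self.offsets.append(len(self.keys))  # end of this row's run

    def row_count(self) -> int:
        return len(self.offsets) - 1

    def get(self, row: int, key: Any) -> Optional[Any]:
        """Return the value stored under key in the given row, or None if absent."""
        for i in range(self.offsets[row], self.offsets[row + 1]):
            if self.keys[i] == key:
                return self.values[i]
        return None
```

Appending `{"a": 1, "b": 2}`, `{}` and `{"c": 3}` produces offsets `[0, 2, 2, 3]`. An empty map is a zero-length run, and a key lookup touches only the entries belonging to the requested row.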
DRILL-7068: Support memory adjustment framework for resource management with Queues. closes #1677

    • -3
    • +9
    ./json/project_pushdown_json_physical_plan.json
  1. … 37 more files in changeset.
DRILL-4858: REPEATED_COUNT on an array of maps and an array of arrays is not implemented

- Implemented 'repeated_count' function for repeated MAP and repeated LIST;

- Updated RepeatedListReader and RepeatedMapReader implementations to return correct value from size() method

- Moved repeated_count to freemarker template and added support for more repeated types for the function

closes #1641

    • binary
    ./parquet/complex/repeated_types.parquet
  1. … 8 more files in changeset.
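Repeated columns are commonly stored as flattened child data plus an offsets array with one more entry than there are rows. Under that assumed layout, the count for a row is the difference between two adjacent offsets, regardless of the element type (maps, inner arrays, or scalars). For an array of arrays, this counts the inner arrays, not their contents. The Python sketch below shows that arithmetic. The helper names are hypothetical, and the code illustrates the idea rather than Drill's reader code.

```python
from typing import Any, List


def to_offsets(rows: List[List[Any]]) -> List[int]:
    """Build an offsets array (len(rows) + 1 entries) for a repeated column."""
    offsets = [0]
    for row in rows:
        offsets.append(offsets[-1] + len(row))  # end of this row's elements
    return offsets


def repeated_counts(offsets: List[int]) -> List[int]:
    """Per-row element counts: row i spans offsets[i] .. offsets[i + 1] - 1."""
    return [offsets[i + 1] - offsets[i] for i in range(len(offsets) - 1)]
```

For an array of maps such as `[[{"a": 1}, {"a": 2}], [], [{"a": 3}]]`, the offsets are `[0, 2, 2, 3]` and the counts are `[2, 0, 1]`. For an array of arrays such as `[[[1, 2], [3]], [[4, 5, 6]]]`, the counts are `[2, 1]`.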
DRILL-6670: Align Parquet TIMESTAMP_MICROS logical type handling with earlier versions + minor fixes

closes #1428

    • binary
    ./parquet/complex/parquet_logical_types_complex.parquet
    • binary
    ./parquet/complex/parquet_logical_types_complex_nodict.parquet
    • binary
    ./parquet/complex/parquet_logical_types_complex_nullable.parquet
    • binary
    ./parquet/complex/parquet_logical_types_complex_nullable_nodict.parquet
  1. … 12 more files in changeset.
DRILL-5797: Use the new Parquet reader for all queries on non-complex columns

    • binary
    ./parquet/complex/complex_special_cases.parquet
  1. … 6 more files in changeset.
DRILL-6179: Added pcapng-format support

    • binary
    ./pcapng/example.pcapng
    • binary
    ./pcapng/sniff.pcapng
  1. … 21 more files in changeset.
DRILL-6375: Support for ANY_VALUE aggregate function

closes #1256

    • -0
    • +50
    ./json/test_anyvalue.json
  1. … 36 more files in changeset.
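ANY_VALUE returns some value from each group without promising which one. This is typically useful for carrying a column through a GROUP BY when all of its values within a group are known to be equal. The Python sketch below models those semantics by keeping the first value seen for each group. That choice, and the hypothetical function name, are assumptions of this sketch and say nothing about which row Drill actually picks.

```python
from typing import Dict, Hashable, Iterable, Tuple, TypeVar

V = TypeVar("V")


def any_value_by_group(rows: Iterable[Tuple[Hashable, V]]) -> Dict[Hashable, V]:
    """Model `ANY_VALUE(value) ... GROUP BY key`: one arbitrary value per group.

    This sketch keeps the first value encountered for each key. Callers should
    rely only on the result being *some* value from that group.
    """
    result: Dict[Hashable, V] = {}
    for key, value in rows:
        result.setdefault(key, value)  # ignore later values for an existing key
    return result
```

Because the choice is arbitrary, a query whose groups contain differing values can legitimately return different results from run to run.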
DRILL-6191: Add acknowledgement sequence number and flags fields, details for flags

closes #1134

  1. … 7 more files in changeset.
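The acknowledgement and flags fields come straight from the fixed TCP header. The acknowledgement number is the big-endian 32-bit word at byte offset 8. The control flags occupy byte 13 plus the NS bit at the bottom of byte 12 (RFC 793, RFC 3168, RFC 3540). The Python sketch below decodes these fields from raw header bytes. The function names and flag labels are hypothetical and do not correspond to the Drill PCAP plugin's column names.

```python
import struct
from typing import Dict

# TCP control flags: FIN..URG from RFC 793, ECE/CWR from RFC 3168, and the
# experimental NS bit from RFC 3540. NS sits in the low bit of byte 12, so in a
# combined 9-bit value it is bit 8.
TCP_FLAG_BITS = {
    "NS": 0x100,
    "CWR": 0x080,
    "ECE": 0x040,
    "URG": 0x020,
    "ACK": 0x010,
    "PSH": 0x008,
    "RST": 0x004,
    "SYN": 0x002,
    "FIN": 0x001,
}


def ack_number(header: bytes) -> int:
    """Acknowledgement number: big-endian unsigned 32-bit value at bytes 8..11."""
    return struct.unpack("!I", header[8:12])[0]


def flags_value(header: bytes) -> int:
    """Combine the NS bit (low bit of byte 12) with byte 13 into one 9-bit value."""
    if len(header) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    return ((header[12] & 0x01) << 8) | header[13]


def decode_flags(flags: int) -> Dict[str, bool]:
    """Expand a 9-bit flags value into one boolean per named flag."""
    return {name: bool(flags & bit) for name, bit in TCP_FLAG_BITS.items()}
```

For a SYN/ACK segment, byte 13 is `0x12`, so `decode_flags` reports `ACK` and `SYN` as set and every other flag as clear.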
DRILL-5783, DRILL-5841, DRILL-5894: Rationalize test temp directories

This change includes:

DRILL-5783:

- Created a unit test for the priority queue in the TopN operator.

- In some places, the code generation classes passed around a completely unused function registry reference, so that reference was removed.

- Some methods of the priority queue had unused parameters, so those parameters were removed.

DRILL-5841:

- Created the standardized temp directory classes DirTestWatcher, SubDirTestWatcher, and BaseDirTestWatcher, and updated all unit tests to use them.

DRILL-5894:

- Removed the dfs_test storage plugin for tests and replaced it with the already existing dfs storage plugin.

Misc:

- General code cleanup.

- Removed unnecessary use of String.format in the tests.

This closes #984

  1. … 365 more files in changeset.
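A TopN operator keeps a priority queue holding the best N rows seen so far, and discards anything that cannot make the cut. The general technique can be sketched as a bounded min-heap using Python's standard `heapq` module. This is a generic illustration with a hypothetical function name, not Drill's own priority queue implementation.

```python
import heapq
from typing import Iterable, List


def top_n(values: Iterable[int], n: int) -> List[int]:
    """Return the n largest values in descending order using a bounded min-heap.

    The heap never holds more than n items, so memory is O(n) regardless of the
    input size, and each element costs O(log n).
    """
    if n <= 0:
        return []
    heap: List[int] = []
    for v in values:
        if len(heap) < n:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            heapq.heapreplace(heap, v)  # evict the smallest of the current top n
    return sorted(heap, reverse=True)
```

`top_n([5, 1, 9, 3, 7, 9, 2], 3)` returns `[9, 9, 7]`, the same result as `heapq.nlargest(3, ...)`. The explicit loop shows where the "replace the current worst" decision happens.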
DRILL-5971: Fix INT64, INT32 logical types in complex parquet reader

Added the following types: ENUM (binary annotated as ENUM) and INT96 (dictionary encoded).

Fixed an issue in the dictionary-encoded fixed-width reader.

Added a test file generator.

This closes #1049

    • binary
    ./parquet/complex/logical_int_complex.parquet
    • binary
    ./parquet/complex/parquet_logical_types_complex.parquet
    • binary
    ./parquet/complex/parquet_logical_types_complex_nullable.parquet
  1. … 10 more files in changeset.
DRILL-4264: Allow field names to include dots

    • -10
    • +15
    ./parquet/complex/baseline8.json
  1. … 98 more files in changeset.
DRILL-5432: Added pcap-format support as format plugin

closes #831

  1. … 20 more files in changeset.
DRILL-3178: csv reader should allow newlines inside quotes

This closes #593

    • -0
    • +6
    ./text/WithQuotedCrLf.tbl
  1. … 3 more files in changeset.
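Allowing newlines inside quotes means the record terminator is honored only outside a quoted field. The small state machine below illustrates that rule in Python, including the common doubled-quote (`""`) escape. It is a hypothetical sketch, not Drill's text reader, and it does not attempt to model how Drill handles quotes that appear inside unquoted fields.

```python
from typing import List


def parse_csv(text: str, delimiter: str = ",", quote: str = '"') -> List[List[str]]:
    """Split text into records. A newline inside a quoted field stays in the field.

    A doubled quote ("") inside a quoted field is read as one literal quote.
    Outside quotes, both \\r\\n and \\n end a record.
    """
    records: List[List[str]] = []
    record: List[str] = []
    field: List[str] = []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < len(text) and text[i + 1] == quote:
                    field.append(quote)  # escaped quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(ch)  # includes \r and \n inside quotes
        elif ch == quote:
            in_quotes = True
        elif ch == delimiter:
            record.append("".join(field))
            field = []
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < len(text) and text[i + 1] == "\n":
                i += 1  # treat \r\n as a single terminator
            record.append("".join(field))
            records.append(record)
            record, field = [], []
        else:
            field.append(ch)
        i += 1
    if field or record:  # last record without a trailing newline
        record.append("".join(field))
        records.append(record)
    return records
```

For example, `parse_csv('1,"line one\r\nline two",x\r\n')` yields a single record whose middle field still contains the `\r\n`.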
DRILL-4364: Image Metadata Format Plugin - Initial commit of Image Metadata Format Plugin - See https://issues.apache.org/jira/browse/DRILL-4364

This closes #367

    • binary
    ./image/1_webp_a.webp
    • binary
    ./image/adobeJpeg1.eps
    • -0
    • +213
    ./image/jpeg.json
    • binary
    ./image/rose-128x174-24bit-lzw.tiff
    • binary
    ./image/rose-128x174-24bit.bmp
  1. … 24 more files in changeset.
DRILL-4108: Handle non existing cols for query w extractHeader

Closes #269

    • -0
    • +6
    ./text/data/cars.csvh-test
    • -0
    • +6
    ./text/data/d2/cars1.csvh
    • -0
    • +5
    ./text/data/d2/cars2.csvh
  1. … 5 more files in changeset.
DRILL-3423: Adding HTTPd Log Parsing functionality including full pushdown, type remapping and wildcard support. Pushed through the requested columns for push down to the parser. Added more tests to cover a few more use cases. Ensured that user query fields are now completely consistent with returned values.

    • -0
    • +2
    ./httpd/dfs-bootstrap.httpd
    • -0
    • +5
    ./httpd/dfs-test-bootstrap-test.httpd
  1. … 12 more files in changeset.
DRILL-951: Add support for csv header row parsing

This closes #232

  1. … 9 more files in changeset.
DRILL-4006: Reallocate offset vector in repeated vectors when index is beyond the current capacity

Author: Steven Phillips <smp@apache.org>

This closes #243, #242

    • -0
    • +7
    ./json/emptyLists/a.json
    • -0
    • +4
    ./json/emptyLists/b.json
    • -0
    • +1113
    ./json/emptyLists/c.json
  1. … 2 more files in changeset.
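This fix follows the usual pattern for growable buffers. Before writing at an index, the buffer checks that index against its current capacity. If the index is out of range, the buffer is reallocated until it fits, instead of failing. The toy Python class below illustrates that pattern. It is a hypothetical class, not Drill's value-vector code, and the doubling growth policy is an assumption.

```python
from typing import List


class GrowableOffsets:
    """Toy offsets buffer that reallocates when written past its capacity.

    Hypothetical illustration; the doubling growth policy is an assumption.
    """

    def __init__(self, capacity: int = 4) -> None:
        self.data: List[int] = [0] * max(1, capacity)  # never start at 0 capacity

    @property
    def capacity(self) -> int:
        return len(self.data)

    def _ensure_capacity(self, index: int) -> None:
        new_capacity = self.capacity
        while index >= new_capacity:
            new_capacity *= 2
        if new_capacity != self.capacity:
            # "Reallocate": copy existing entries into a larger zero-filled buffer.
            self.data = self.data + [0] * (new_capacity - self.capacity)

    def set(self, index: int, value: int) -> None:
        if index < 0:
            raise IndexError("negative index")
        self._ensure_capacity(index)
        self.data[index] = value
```

Writing index 10 into a buffer of capacity 4 grows it to 16 and preserves the entries that were already written.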
DRILL-4028: Update Drill to leverage latest version of Parquet library.

- Remove references to the shaded version of a Jackson @JsonCreator annotation from parquet, replace with proper fasterxml version.

- Fixing imports using the wrong parquet packages after rebase.

- Fixing issues with the Drill Parquet read and write path after merging the Drill Parquet fork back into mainline.

- Fixed the issue with the writer: the RecordConsumer needed to be flushed in the ParquetRecordWriter.

- Consolidate page reading code

- Added test code that prints additional context when an ordered comparison of two datasets fails in a test.

- Fix up parquet API usage in Hive Module.

- Adding a unit test to read and write all types in Parquet; the decimal types and interval year still have some issues.

- Use direct codec factory from new package in the parquet library now that it has been moved.

- Moving the test for Direct Codec Factory out of the Drill source as the class itself has been moved.

- Small fix after consolidating two different ByteBuffer based implementations of BytesInput.

- Small fixes to accommodate interface changes.

- Small changes to remove direct references to DirectCodecFactory, this class is not accessible outside of parquet, but an instance with the same contract is now accessible with a new factory method on CodecFactory.

- Fixed failing test using miniDFS when reading a larger parquet file.

This closes #236

    • -0
    • +0
    ./json/donuts_short.json
  1. … 56 more files in changeset.
DRILL-3718: After TextReader finishes reading a field surrounded by double quotes, the reader skips whitespace only if that whitespace is not used as the delimiter

  1. … 5 more files in changeset.
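The rule in this fix can be sketched as a helper that runs right after the closing quote is consumed. The helper skips trailing whitespace but stops at any whitespace character that is itself the field delimiter, because that character ends the field. The helper name, and the choice of space and tab as "whitespace", are assumptions of this Python illustration, not Drill's TextReader code.

```python
def skip_whitespace_after_quote(text: str, pos: int, delimiter: str) -> int:
    """Return the index of the first character after a closing quote that is
    not skippable whitespace.

    pos is the index just past the closing quote. Whitespace equal to the
    delimiter is never skipped, so tab- or space-delimited boundaries survive.
    """
    while pos < len(text) and text[pos] in " \t" and text[pos] != delimiter:
        pos += 1
    return pos
```

With a comma delimiter, `'"a"  ,b'` skips the two spaces and lands on the comma at index 5. With a tab delimiter, `'"a"\t"b"'` stays at index 3, so the tab is still treated as the field separator.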
DRILL-3423: Initial HTTPD log plugin. Needs tests. Would be good to improve the timestamp and cookies behaviors since we can make those more type specific.

  1. … 6 more files in changeset.
DRILL-3557: Ensure empty CSV's path can be added

    • -0
    • +0
    ./text/directoryWithEmpyCSV/empty.csv
  1. … 2 more files in changeset.
DRILL-3537: When scanning files in ScanBatch, ignore all empty files before reaching a non-empty file

    • -0
    • +0
    ./json/jsonDirectoryWithEmpyFile/a.json
    • -0
    • +3
    ./json/jsonDirectoryWithEmpyFile/b.json
  1. … 2 more files in changeset.