DRILL-6168: Revise format plugin table functions

Allows table functions to inherit properties from a

defined format plugin.

Also DRILL-7612: enforces immutability for all format plugins.

  1. … 45 more files in changeset.
DRILL-7620: Fix plugin mutability issues

A recent commit made the plugin registry more strict about

the rule that, once a plugin is registered, it must be

immutable. A flaw in enforcing that rule in the UI put the

registry in an inconsistent state.

Also

* Registry-specific errors

* Push more operations from UI layer into registry

* Clean up semantics of "resolve" for plugins

* Add more unit tests

* Better handling of "bad" plugins

* Force plugin names to lower case

* Fix comparison bugs in some format plugins
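
The two rules named above (registered plugins are immutable, and plugin names are forced to lower case) can be illustrated with a minimal sketch. This is not Drill's registry code: `PluginRegistrySketch`, `PluginConfig`, and `withEnabled` are hypothetical names chosen for the example, and lower-casing with `Locale.ROOT` is an assumption.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; not the actual Drill plugin registry.
final class PluginRegistrySketch {

  // Immutable stand-in for a plugin config: final fields, no setters.
  static final class PluginConfig {
    private final String type;
    private final boolean enabled;

    PluginConfig(String type, boolean enabled) {
      this.type = type;
      this.enabled = enabled;
    }

    String type() { return type; }

    boolean enabled() { return enabled; }

    // A "change" returns a new object and leaves the original untouched.
    PluginConfig withEnabled(boolean newEnabled) {
      return new PluginConfig(type, newEnabled);
    }
  }

  private final Map<String, PluginConfig> plugins = new ConcurrentHashMap<>();

  // Assumption: names are normalized with a locale-independent lower-casing.
  static String normalize(String name) {
    return name.toLowerCase(Locale.ROOT);
  }

  // Registers or replaces the config stored under the normalized name.
  void put(String name, PluginConfig config) {
    plugins.put(normalize(name), config);
  }

  // Looks up a config regardless of the case used by the caller.
  Optional<PluginConfig> get(String name) {
    return Optional.ofNullable(plugins.get(normalize(name)));
  }

  public static void main(String[] args) {
    PluginRegistrySketch registry = new PluginRegistrySketch();
    PluginConfig original = new PluginConfig("file", true);
    registry.put("DFS", original);
    // Updating means storing a replacement copy, never mutating in place.
    registry.put("dfs", original.withEnabled(false));
    System.out.println(registry.get("Dfs").get().enabled());
  }
}
```

Because an update is expressed as a new object stored under the same normalized key, a caller holding the old config never sees it change underneath them.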

  1. … 101 more files in changeset.
DRILL-7592: Add missing licenses and update plugins exclusion list and fix licenses

closes #1989

    • -18
    • +21
    ./main/resources/drill-module.conf
  1. … 85 more files in changeset.
DRILL-7590: Refactor plugin registry

Major cleanup of the plugin registry to split it into components

in preparation for a proper plugin API.

Better coordinates the named and ephemeral plugin caches.

Cleans up the registry API. Sharpens rules for modifying

plugin configs.

closes #1988

  1. … 162 more files in changeset.
DRILL-5674: Support ZIP compression

1. Added ZipCodec implementation which can read / write a single file.

2. Revisited Drill plugin formats to ensure 'openPossiblyCompressedStream' method is used in those which support compression.

3. Added unit tests.

4. General refactoring.
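
For readers unfamiliar with single-file ZIP handling, the idea in item 1 can be sketched with the JDK's `java.util.zip` classes. This is an illustration only and not the ZipCodec from this commit: `SingleFileZipSketch` and its method names are hypothetical, and reading only the first entry is an assumption about what "single file" means here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; this is not Drill's ZipCodec.
final class SingleFileZipSketch {

  // Writes `content` into a ZIP archive that contains exactly one entry.
  static byte[] compress(String entryName, byte[] content) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
      zip.putNextEntry(new ZipEntry(entryName));
      zip.write(content);
      zip.closeEntry();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buffer.toByteArray();
  }

  // Reads the first entry of the archive and ignores any later entries.
  static byte[] decompressFirstEntry(InputStream in) {
    try (ZipInputStream zip = new ZipInputStream(in)) {
      if (zip.getNextEntry() == null) {
        throw new IOException("ZIP archive contains no entries");
      }
      // readAllBytes stops at the end of the current entry.
      return zip.readAllBytes();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] original = "a,b,c\n1,2,3\n".getBytes(StandardCharsets.UTF_8);
    byte[] zipped = compress("data.csv", original);
    byte[] restored = decompressFirstEntry(new ByteArrayInputStream(zipped));
    System.out.print(new String(restored, StandardCharsets.UTF_8));
  }
}
```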

  1. … 16 more files in changeset.
DRILL-7350: Move RowSet related classes from test folder

  1. … 292 more files in changeset.
DRILL-7030: Make format plugins fully pluggable

- Bootstrap files for format plugins were introduced and added to the existing plugins in contrib.

- Formats from these files are being added dynamically to the corresponding storage plugins.

closes #1780

    • -0
    • +26
    ./main/resources/bootstrap-format-plugins.json
  1. … 5 more files in changeset.
DRILL-5603: Replace String file paths to Hadoop Path

- replaced all String path representation with org.apache.hadoop.fs.Path

- added PathSerDe.Se JSON serializer

- refactoring of DFSPartitionLocation code by leveraging existing listPartitionValues() functionality

closes #1657

  1. … 83 more files in changeset.
DRILL-6582: SYSLOG (RFC-5424) Format Plugin

closes #1530

    • -0
    • +22
    ./main/resources/drill-module.conf
    • -0
    • +8
    ./test/resources/syslog/logs.syslog
    • -0
    • +8
    ./test/resources/syslog/logs.syslog1
    • -0
    • +6
    ./test/resources/syslog/logs1.syslog
    • -0
    • +1
    ./test/resources/syslog/test.syslog
    • -0
    • +2
    ./test/resources/syslog/test.syslog1
  1. … 11 more files in changeset.
DRILL-1328: Support table statistics - Part 2

Add support for avg row-width and major type statistics.

Parallelize the ANALYZE implementation and stats UDF implementation to improve stats collection performance.

Update/fix rowcount, selectivity and ndv computations to improve plan costing.

Add options for configuring collection/usage of statistics.

Add new APIs and implementation for stats writer (as a precursor to Drill Metastore APIs).

Fix several stats/costing related issues identified while running TPC-H and TPC-DS queries.

Add support for CPU sampling and nested scalar columns.

Add more testcases for collection and usage of statistics and fix remaining unit/functional test failures.

Thanks to Venki Korukanti (@vkorukanti) for the description below (modified to account for new changes). He graciously agreed to rebase the patch onto the latest master, fixed a few issues, and added a few tests.

FUNCS: Statistics functions as UDFs:

Separate

Currently using FieldReader to ensure consistent output type so that Unpivot doesn't get confused. All stats columns should be Nullable, so that stats functions can return NULL when N/A.

* custom versions of "count" that always return BigInt

* HyperLogLog-based NDV that returns BigInt and works only on VarChars

* HyperLogLog with binary output that only works on VarChars

OPS: Updated protobufs for new ops

OPS: Implemented StatisticsMerge

OPS: Implemented StatisticsUnpivot

ANALYZE: AnalyzeTable functionality

* JavaCC syntax more-or-less copied from LucidDB.

* (Basic) AnalyzePrule: DrillAnalyzeRel -> UnpivotPrel StatsMergePrel FilterPrel(for sampling) StatsAggPrel ScanPrel

ANALYZE: Add getMetadataTable() to AbstractSchema

USAGE: Change field access in QueryWrapper

USAGE: Add getDrillTable() to DrillScanRelBase and ScanPrel

* since ScanPrel does not inherit from DrillScanRelBase, this requires adding a DrillTable to the constructor

* This is done so that a custom ReflectiveRelMetadataProvider can access the DrillTable associated with Logical/Physical scans.

USAGE: Attach DrillStatsTable to DrillTable.

* DrillStatsTable represents the data scanned from a corresponding ".stats.drill" table

* In order to avoid doing query execution right after the ".stats.drill" table is found, metadata is not actually collected until the MaterializationVisitor is used.

** Currently, the metadata source must be a string (so that a SQL query can be created). Doing this with a table is probably more complicated.

** Query is set up to extract only the most recent statistics results for each column.
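
The "most recent statistics results for each column" selection can be pictured with a small in-memory sketch. The commit does this with a SQL query over the ".stats.drill" table; the Java below only illustrates the selection rule. `LatestStatsSketch`, `StatRow`, and the field names `column`, `computedAt`, and `rowCount` are hypothetical and do not reflect the actual stats table schema.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; field names do not reflect the real stats schema.
final class LatestStatsSketch {

  // One statistics record for one column, tagged with when it was computed.
  static final class StatRow {
    final String column;
    final long computedAt;
    final long rowCount;

    StatRow(String column, long computedAt, long rowCount) {
      this.column = column;
      this.computedAt = computedAt;
      this.rowCount = rowCount;
    }
  }

  // Keeps, for every column, only the row with the greatest computedAt.
  static Map<String, StatRow> latestPerColumn(List<StatRow> rows) {
    Map<String, StatRow> latest = new HashMap<>();
    for (StatRow row : rows) {
      latest.merge(row.column, row,
          (kept, candidate) -> kept.computedAt >= candidate.computedAt ? kept : candidate);
    }
    return latest;
  }

  public static void main(String[] args) {
    List<StatRow> rows = Arrays.asList(
        new StatRow("l_orderkey", 1L, 100L),
        new StatRow("l_orderkey", 2L, 120L),
        new StatRow("l_partkey", 1L, 100L));
    for (StatRow row : latestPerColumn(rows).values()) {
      System.out.println(row.column + " " + row.computedAt + " " + row.rowCount);
    }
  }
}
```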

closes #729

  1. … 143 more files in changeset.