DRILL-6168: Revise format plugin table functions

Allows table functions to inherit properties from a defined format plugin.

Also DRILL-7612: enforces immutability for all format plugins.

  1. … 46 more files in changeset.
DRILL-7620: Fix plugin mutability issues

A recent commit made the plugin registry stricter about the rule that, once a plugin is registered, it must be immutable. A flaw in enforcing that rule in the UI put the registry in an inconsistent state.

Also

* Registry-specific errors

* Push more operations from UI layer into registry

* Clean up semantics of "resolve" for plugins

* Add more unit tests

* Better handling of "bad" plugins

* Force plugin names to lower case

* Fix comparison bugs in some format plugins

  1. … 101 more files in changeset.
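
The immutability rule, lower-cased plugin names, and comparison fixes described in DRILL-7620 can be illustrated with a minimal sketch. This is not Drill's code: `SketchFormatConfig`, `SketchPluginRegistry`, and their fields are hypothetical names chosen for illustration. The sketch assumes that a config is changed by creating a copy rather than editing the registered instance, and that `equals`/`hashCode` cover every field.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical immutable format config: all fields are final and there are no setters.
final class SketchFormatConfig {
  private final String logFormat;
  private final List<String> extensions;

  SketchFormatConfig(String logFormat, List<String> extensions) {
    this.logFormat = logFormat;
    // Unmodifiable defensive copy, so callers cannot mutate the config later.
    this.extensions = List.copyOf(extensions);
  }

  String logFormat() { return logFormat; }

  List<String> extensions() { return extensions; }

  // "Editing" yields a new instance; the registered one is left untouched.
  SketchFormatConfig withLogFormat(String newFormat) {
    return new SketchFormatConfig(newFormat, extensions);
  }

  // Compare every field; omitting one is a typical comparison bug.
  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof SketchFormatConfig)) return false;
    SketchFormatConfig other = (SketchFormatConfig) o;
    return Objects.equals(logFormat, other.logFormat)
        && extensions.equals(other.extensions);
  }

  @Override
  public int hashCode() {
    return Objects.hash(logFormat, extensions);
  }
}

// Hypothetical registry that forces plugin names to lower case on every access.
final class SketchPluginRegistry {
  private final Map<String, SketchFormatConfig> plugins = new ConcurrentHashMap<>();

  private static String normalize(String name) {
    return name.trim().toLowerCase(Locale.ROOT);
  }

  void put(String name, SketchFormatConfig config) {
    plugins.put(normalize(name), Objects.requireNonNull(config, "config"));
  }

  SketchFormatConfig get(String name) {
    return plugins.get(normalize(name));
  }
}
```

With this shape, a UI layer that wants to change a plugin would build a modified copy and hand it back to the registry, so the registry never holds an object that changes underneath it.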
DRILL-7530: Fix class names in loggers

1. Fix incorrect class names for loggers.

2. Minor code cleanup.

closes #1957

  1. … 53 more files in changeset.
DRILL-7021: HTTPD Throws NPE and Doesn't Recognize Timeformat

    • -0
    • +80
    ./HttpdLogFormatConfig.java
  1. … 5 more files in changeset.
DRILL-5603: Replace String file paths to Hadoop Path

* Replaced all String path representations with org.apache.hadoop.fs.Path

* Added PathSerDe.Se JSON serializer

* Refactoring of DFSPartitionLocation code by leveraging existing listPartitionValues() functionality

closes #1657

  1. … 83 more files in changeset.
DRILL-6724: Dump operator context to logs when error occurs during query execution

closes #1455

  1. … 102 more files in changeset.
DRILL-6422: Replace guava imports with shaded ones

  1. … 982 more files in changeset.
DRILL-6639: Exception happens while displaying operator profiles for some queries

  1. … 17 more files in changeset.
DRILL-6639: Exception happens while displaying operator profiles for some queries

closes #1404

  1. … 17 more files in changeset.
DRILL-6320: Fixed license headers.

closes #1207

  1. … 2064 more files in changeset.
DRILL-5771: Fix serDe errors for format plugins

1. Fix various serde issues for format plugins described in DRILL-5771.

2. Throw a meaningful exception instead of an NPE when a table is not found while a table function is used.

3. Added unit tests for all format plugins to ensure serde is checked (physical plan is generated in JSON format and then submitted).

4. Fix physical plan submission on Windows (DRILL-4640).
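
Item 2 above (a descriptive error instead of an NPE) might look roughly like the following sketch. `SketchTableLookup` and its methods are hypothetical, and the exception type used here is an assumption; the commit message does not say which exception Drill throws.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup used only to illustrate the pattern; not Drill's API.
final class SketchTableLookup {
  private final Map<String, String> tables = new HashMap<>();

  void register(String name, String location) {
    tables.put(name, location);
  }

  // Fail early with a message that names the missing table, rather than
  // returning null and letting a later dereference raise an NPE.
  String resolve(String name) {
    String location = tables.get(name);
    if (location == null) {
      throw new IllegalArgumentException(
          "Table '" + name + "' not found; cannot apply table function");
    }
    return location;
  }
}
```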

This closes #1014

  1. … 14 more files in changeset.
DRILL-3243: Added CSG mods. Fixed field names. Removed old test files. Added Parse_url and parse_query() functions. Fixed unit test.

This closes #607

    • -0
    • +48
    ./HttpdParserTest.java
  1. … 6 more files in changeset.
DRILL-3423: Adding HTTPd Log Parsing functionality including full pushdown, type remapping and wildcard support. Pushed through the requested columns for push down to the parser. Added more tests to cover a few more use cases. Ensured that user query fields are now completely consistent with returned values.

    • -0
    • +246
    ./HttpdLogFormatPlugin.java
    • -0
    • +299
    ./HttpdLogRecord.java
    • -0
    • +171
    ./HttpdParser.java
  1. … 11 more files in changeset.
DRILL-3423: Initial HTTPD log plugin. Needs tests. Would be good to improve the timestamp and cookies behaviors since we can make those more type specific.

    • -0
    • +487
    ./HttpdFormatPlugin.java
  1. … 6 more files in changeset.
DRILL-1328: Support table statistics - Part 2

Add support for avg row-width and major type statistics.

Parallelize the ANALYZE implementation and stats UDF implementation to improve stats collection performance.

Update/fix rowcount, selectivity and ndv computations to improve plan costing.

Add options for configuring collection/usage of statistics.

Add new APIs and implementation for stats writer (as a precursor to Drill Metastore APIs).

Fix several stats/costing related issues identified while running TPC-H and TPC-DS queries.

Add support for CPU sampling and nested scalar columns.

Add more testcases for collection and usage of statistics and fix remaining unit/functional test failures.

Thanks to Venki Korukanti (@vkorukanti) for the description below (modified to account for new changes). He graciously agreed to rebase the patch onto the latest master, fixed a few issues and added a few tests.

FUNCS: Statistics functions as UDFs:

Separate

Currently using FieldReader to ensure consistent output type so that Unpivot doesn't get confused. All stats columns should be Nullable, so that stats functions can return NULL when N/A.

* custom versions of "count" that always return BigInt

* HyperLogLog based NDV that returns BigInt that works only on VarChars

* HyperLogLog with binary output that only works on VarChars
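
As a rough illustration of the HyperLogLog-based NDV functions listed above, here is a minimal self-contained sketch. It is not Drill's implementation: the class name `HllSketch`, the precision of 12 bits, and the hash function (FNV-1a followed by a 64-bit finalizer) are all assumptions made for this example. `estimate()` corresponds loosely to the variant that returns a BigInt, and `toBytes()` to the variant with binary output that can be merged later.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical HyperLogLog sketch over strings; parameters are illustrative.
final class HllSketch {
  private static final int P = 12;      // index bits (assumed precision)
  private static final int M = 1 << P;  // number of registers
  private final byte[] registers = new byte[M];

  // FNV-1a over the UTF-8 bytes, then a 64-bit finalizer to spread the bits.
  static long hash64(String value) {
    long h = 0xcbf29ce484222325L;
    for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
      h ^= (b & 0xff);
      h *= 0x100000001b3L;
    }
    h ^= h >>> 33;
    h *= 0xff51afd7ed558ccdL;
    h ^= h >>> 33;
    h *= 0xc4ceb9fe1a85ec53L;
    h ^= h >>> 33;
    return h;
  }

  void add(String value) {
    if (value == null) return;  // NULL inputs are not counted
    long h = hash64(value);
    int index = (int) (h >>> (64 - P));  // top P bits select a register
    long rest = h << P;                  // remaining bits determine the rank
    int rank = (rest == 0) ? (64 - P + 1) : Long.numberOfLeadingZeros(rest) + 1;
    if (rank > registers[index]) registers[index] = (byte) rank;
  }

  // Register-wise maximum: merging partial sketches equals sketching the union.
  void merge(HllSketch other) {
    for (int i = 0; i < M; i++) {
      registers[i] = (byte) Math.max(registers[i], other.registers[i]);
    }
  }

  // Binary form of the sketch, e.g. for shipping partial results to a merge step.
  byte[] toBytes() {
    return registers.clone();
  }

  long estimate() {
    double sum = 0;
    int zeros = 0;
    for (byte r : registers) {
      sum += Math.scalb(1.0, -r);  // 2^(-r)
      if (r == 0) zeros++;
    }
    double alpha = 0.7213 / (1 + 1.079 / M);
    double raw = alpha * M * M / sum;
    // Small-range correction (linear counting) while empty registers remain.
    if (raw <= 2.5 * M && zeros > 0) {
      raw = M * Math.log((double) M / zeros);
    }
    return Math.round(raw);
  }
}
```

Because merging is a register-wise maximum, per-fragment sketches can be computed in parallel and combined afterwards, which is the property a merge operator for statistics would rely on.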

OPS: Updated protobufs for new ops

OPS: Implemented StatisticsMerge

OPS: Implemented StatisticsUnpivot

ANALYZE: AnalyzeTable functionality

* JavaCC syntax more-or-less copied from LucidDB.

* (Basic) AnalyzePrule: DrillAnalyzeRel -> UnpivotPrel StatsMergePrel FilterPrel(for sampling) StatsAggPrel ScanPrel

ANALYZE: Add getMetadataTable() to AbstractSchema

USAGE: Change field access in QueryWrapper

USAGE: Add getDrillTable() to DrillScanRelBase and ScanPrel

* since ScanPrel does not inherit from DrillScanRelBase, this requires adding a DrillTable to the constructor

* This is done so that a custom ReflectiveRelMetadataProvider can access the DrillTable associated with Logical/Physical scans.

USAGE: Attach DrillStatsTable to DrillTable.

* DrillStatsTable represents the data scanned from a corresponding ".stats.drill" table

* In order to avoid doing query execution right after the ".stats.drill" table is found, metadata is not actually collected until the MaterializationVisitor is used.

** Currently, the metadata source must be a string (so that a SQL query can be created). Doing this with a table is probably more complicated.

** Query is set up to extract only the most recent statistics results for each column.
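
The idea of keeping only the most recent statistics per column can be sketched in plain Java. The commit does this with a SQL query over the ".stats.drill" table; the in-memory version below only illustrates the selection logic. `StatRow`, its fields, and `LatestStats` are hypothetical names, not taken from Drill.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

// Hypothetical statistics record: one row per column per ANALYZE run.
final class StatRow {
  final String column;
  final long computedAt;  // assumed timestamp of the ANALYZE run
  final long rowCount;

  StatRow(String column, long computedAt, long rowCount) {
    this.column = column;
    this.computedAt = computedAt;
    this.rowCount = rowCount;
  }
}

final class LatestStats {
  // Group by column and keep the row with the greatest computedAt value.
  static Map<String, StatRow> latestPerColumn(List<StatRow> rows) {
    return rows.stream().collect(Collectors.toMap(
        r -> r.column,
        r -> r,
        BinaryOperator.maxBy(Comparator.comparingLong((StatRow r) -> r.computedAt))));
  }
}
```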

closes #729

  1. … 143 more files in changeset.