Igor Guzenko <ihor.huzenko.igs@gmail.com>
committed
on 19 Mar
DRILL-7115: Improve Hive schema show tables performance

1. To make SHOW TABLES for Hive schema work much faster, the additional Drill
  feature of showing only accessible tables when Storage-Based authorization
  is enabled was sacrificed. The behaviour now matches Hive/Beeline: all
  tables are shown regardless of accessibility. For details about the previous
  SHOW TABLES results, see the description of DRILL-540.

2. Implemented a faster getTableNamesAndTypes() method in HiveDatabaseSchema
  and removed bulk-related code.

3. Deprecated bulk-related options and removed bulk code from AbstractSchema
  and DrillHiveMetastoreClient.

4. For 8000 Hive tables, the query returned in 1.8 seconds; for a combination
  of 4000 tables and 8000 views, the query returned in 2.3 seconds. Note that
  after the first query, table names are cached and subsequent queries
  complete in less than 1 second.

5. Refactored WorkspaceSchemaFactory's getTableNamesAndTypes()
  method to reuse existing getViews() method.

6. Refactored DrillHiveMetastoreClient. Classes were unnested and enclosed
  within the client package with restricted visibility. The cache value type
  was also updated to avoid unnecessary back-and-forth List to Set
  conversions. Client creation methods were moved to a separate class, so the
  new package exposes only the factory and client classes.
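
The caching and cache-value-type points above can be illustrated with a minimal sketch. It is not the actual DrillHiveMetastoreClient code: the class name TableNameCacheSketch, the loader function, and the choice of Set<String> as the cached value type are assumptions made for illustration only. The idea is to convert the metastore's List result once, at load time, and store it in the type callers consume, so repeated reads need no List to Set conversion.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical illustration; names here do not come from the Drill code base. */
public class TableNameCacheSketch {

  // Stand-in for a slow metastore call that returns a database's table names as a List.
  private final Function<String, List<String>> metastoreLoader;

  // Values are cached as Set (assumed to be the type callers want),
  // so the List -> Set conversion happens once per database, not on every read.
  private final Map<String, Set<String>> tableNamesByDb = new ConcurrentHashMap<>();

  // Counts loader invocations, to make the caching effect observable.
  private final AtomicInteger loads = new AtomicInteger();

  public TableNameCacheSketch(Function<String, List<String>> metastoreLoader) {
    this.metastoreLoader = metastoreLoader;
  }

  /** Returns the cached table names, calling the loader only on the first request per database. */
  public Set<String> getTableNames(String dbName) {
    return tableNamesByDb.computeIfAbsent(dbName, db -> {
      loads.incrementAndGet();
      return Collections.unmodifiableSet(new LinkedHashSet<>(metastoreLoader.apply(db)));
    });
  }

  public int loadCount() {
    return loads.get();
  }
}
```

In this sketch a second call to getTableNames() for the same database returns the same Set instance without invoking the loader again, which mirrors why queries after the first one are faster.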

closes #1706
