Commit b80ce99

amitkdutta authored and Leziak committed
# This is a combination of 17 commits.
# This is the 1st commit message:

[native] Add a test group for async data cache e2e tests.

# This is the commit message prestodb#2:

[native] Advance velox.

# This is the commit message prestodb#3:

Fix error when describing a nonexistent table

# This is the commit message prestodb#4:

[native] test modification.

# This is the commit message prestodb#5:

Changes to enable ssl/tls in hms

Co-authored-by: Arin Mathew <arin.mathew1@ibm.com>

Changes to move ssl related properties to separate class

# This is the commit message prestodb#6:

[native] Add tests for UUID type

# This is the commit message prestodb#7:

Reintroduced json_extract to generate canonicalized output (prestodb#24879)

## Description

The original pull request [prestodb#24614](prestodb#24614) incorrectly compares canonicalizedJsonExtract and legacyJsonCast in the equals function of an object. This issue can be seen in the code [here](https://github.com/prestodb/presto/pull/24614/files#diff-e921c5d186f9d5daa836bc7330f52caf8c1b84d19cf42288d5a8a7c9a6d2a5d5R156).

As a result, whenever a SQL function requires caching, the cache is never hit, so new SQL function objects are created repeatedly. This behavior eventually causes an OOM error in the JVM metaspace, and that error eventually led to a UER SEV.

After the problematic comparison was updated and tested through a shadow cluster by @rschlussel, we are confident that the issue has been resolved in this PR. Therefore, we plan to bring back the canonicalized json_extract.

## Motivation and Context

Reintroduced json_extract to generate canonicalized output

## Impact
<!---Describe any public API or user-facing feature change or any performance impact-->

low impact

## Test Plan
<!---Please fill in how you tested your change-->

N/A

## Contributor checklist

- [x] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [x] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
- [x] Documented new properties (with its default value), SQL syntax, functions, or other functionality.
- [x] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [x] Adequate tests were added if applicable.
- [x] CI passed.

## Release Notes

Please follow [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines) and fill in the release notes below.

```
== NO RELEASE NOTE ==
```

# This is the commit message prestodb#8:

Reuse JoinType in IndexJoinNode

Reuse JoinType instead of creating IndexJoinNode's own. JoinType is already part of the Prestissimo protocol. Adding IndexJoinNode with another JoinType would cause a naming conflict.

# This is the commit message prestodb#9:

Add jaro-winkler implementation, documentation and tests

# This is the commit message prestodb#10:

Add documentation for Iceberg support in PrestoCPP

# This is the commit message prestodb#11:

Add memory pool debug regex

# This is the commit message prestodb#12:

doc on hive csv limitations

# This is the commit message prestodb#13:

Add support for S3 WebIdentity authentication

# This is the commit message prestodb#14:

use com.facebook.airlift:security in presto-hive-metastore

# This is the commit message prestodb#15:

[native] Advance Velox

# This is the commit message prestodb#16:

[native] Add protocol for index lookup join plan

# This is the commit message prestodb#17:

[native] Add sidecar and sidecar plugin documentation

1 parent c875296 commit b80ce99

79 files changed

Lines changed: 2965 additions & 573 deletions


.github/workflows/hive-tests.yml

Lines changed: 4 additions & 1 deletion
@@ -114,6 +114,9 @@ jobs:
         run: |
           export MAVEN_OPTS="${MAVEN_INSTALL_OPTS}"
           ./mvnw install ${MAVEN_FAST_INSTALL} -am -pl :presto-hive
-      - name: Run Hive Dockerized Tests
+      - name: Run Hive Insert Overwrite Tests
         if: needs.changes.outputs.codechange == 'true'
         run: ./mvnw test ${MAVEN_TEST} -pl :presto-hive -P test-hive-insert-overwrite
+      - name: Run Hive SSL Enabled Tests
+        if: needs.changes.outputs.codechange == 'true'
+        run: ./mvnw test ${MAVEN_TEST} -pl :presto-hive -P test-ssl-enabled-hms

presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowMetadata.java

Lines changed: 3 additions & 1 deletion
@@ -175,7 +175,9 @@ public Map<SchemaTableName, List<ColumnMetadata>> listTableColumns(ConnectorSess
         for (SchemaTableName tableName : tables) {
             try {
                 ConnectorTableHandle tableHandle = getTableHandle(session, tableName);
-                columns.put(tableName, getTableMetadata(session, tableHandle).getColumns());
+                if (tableHandle != null) {
+                    columns.put(tableName, getTableMetadata(session, tableHandle).getColumns());
+                }
             }
             catch (ClassCastException | NotFoundException e) {
                 throw new ArrowException(ARROW_FLIGHT_METADATA_ERROR, "Table columns could not be listed for table: " + tableName, e);
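
The guard added in this hunk skips a table whose handle resolves to `null` instead of passing `null` on to `getTableMetadata`. A minimal, self-contained sketch of that pattern follows. The names `ListColumnsSketch`, `lookupHandle`, and the string-based catalog are hypothetical stand-ins chosen for illustration; they are not the Arrow Flight connector API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ListColumnsSketch {
    // Hypothetical in-memory catalog: table name -> column names.
    private static final Map<String, List<String>> CATALOG = new LinkedHashMap<>();

    static {
        CATALOG.put("orders", Arrays.asList("orderkey", "totalprice"));
    }

    // Hypothetical lookup that returns null for an unknown table,
    // standing in for a handle lookup that may not find the table.
    static String lookupHandle(String tableName) {
        return CATALOG.containsKey(tableName) ? tableName : null;
    }

    // Collects columns only for tables whose handle could be resolved.
    static Map<String, List<String>> listTableColumns(List<String> tables) {
        Map<String, List<String>> columns = new LinkedHashMap<>();
        for (String table : tables) {
            String handle = lookupHandle(table);
            if (handle != null) {
                columns.put(table, new ArrayList<>(CATALOG.get(handle)));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // "missing" has no handle, so it is skipped rather than causing a failure.
        System.out.println(listTableColumns(Arrays.asList("orders", "missing")));
    }
}
```

Skipping unresolved tables keeps one missing table from failing the whole listing, which appears to be the intent of the null check in the diff.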

presto-base-arrow-flight/src/test/java/com/facebook/plugin/arrow/TestArrowFlightQueries.java

Lines changed: 11 additions & 0 deletions
@@ -44,6 +44,7 @@
 import static com.facebook.presto.common.type.VarcharType.VARCHAR;
 import static com.facebook.presto.common.type.VarcharType.createVarcharType;
 import static com.facebook.presto.testing.MaterializedResult.resultBuilder;
+import static com.facebook.presto.testing.assertions.Assert.assertEquals;
 import static java.lang.String.format;
 import static org.testng.Assert.assertTrue;

@@ -137,6 +138,16 @@ public void testSelectTime()
         assertTrue(actualRow.equals(expectedRow));
     }

+    @Test
+    public void testDescribeUnknownTable()
+    {
+        MaterializedResult actualRows = computeActual("DESCRIBE information_schema.enabled_roles");
+        MaterializedResult expectedRows = resultBuilder(getSession(), VARCHAR, VARCHAR, VARCHAR, VARCHAR)
+                .row("role_name", "varchar", "", "")
+                .build();
+        assertEquals(actualRows, expectedRows);
+    }
+
     private LocalDate getDate(String dateString)
     {
         DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd");

presto-common/src/main/java/com/facebook/presto/common/function/SqlFunctionProperties.java

Lines changed: 19 additions & 4 deletions
@@ -37,6 +37,7 @@ public class SqlFunctionProperties
     private final boolean legacyJsonCast;
     private final Map<String, String> extraCredentials;
     private final boolean warnOnCommonNanPatterns;
+    private final boolean canonicalizedJsonExtract;

     private SqlFunctionProperties(
             boolean parseDecimalLiteralAsDouble,
@@ -50,7 +51,8 @@ private SqlFunctionProperties(
             boolean fieldNamesInJsonCastEnabled,
             boolean legacyJsonCast,
             Map<String, String> extraCredentials,
-            boolean warnOnCommonNanPatterns)
+            boolean warnOnCommonNanPatterns,
+            boolean canonicalizedJsonExtract)
     {
         this.parseDecimalLiteralAsDouble = parseDecimalLiteralAsDouble;
         this.legacyRowFieldOrdinalAccessEnabled = legacyRowFieldOrdinalAccessEnabled;
@@ -64,6 +66,7 @@ private SqlFunctionProperties(
         this.legacyJsonCast = legacyJsonCast;
         this.extraCredentials = requireNonNull(extraCredentials, "extraCredentials is null");
         this.warnOnCommonNanPatterns = warnOnCommonNanPatterns;
+        this.canonicalizedJsonExtract = canonicalizedJsonExtract;
     }

     public boolean isParseDecimalLiteralAsDouble()
@@ -127,6 +130,9 @@ public boolean shouldWarnOnCommonNanPatterns()
         return warnOnCommonNanPatterns;
     }

+    public boolean isCanonicalizedJsonExtract()
+    { return canonicalizedJsonExtract; }
+
     @Override
     public boolean equals(Object o)
     {
@@ -146,15 +152,16 @@ public boolean equals(Object o)
                 Objects.equals(sessionLocale, that.sessionLocale) &&
                 Objects.equals(sessionUser, that.sessionUser) &&
                 Objects.equals(extraCredentials, that.extraCredentials) &&
-                Objects.equals(legacyJsonCast, that.legacyJsonCast);
+                Objects.equals(legacyJsonCast, that.legacyJsonCast) &&
+                Objects.equals(canonicalizedJsonExtract, that.canonicalizedJsonExtract);
     }

     @Override
     public int hashCode()
     {
         return Objects.hash(parseDecimalLiteralAsDouble, legacyRowFieldOrdinalAccessEnabled, timeZoneKey,
                 legacyTimestamp, legacyMapSubscript, sessionStartTime, sessionLocale, sessionUser,
-                extraCredentials, legacyJsonCast);
+                extraCredentials, legacyJsonCast, canonicalizedJsonExtract);
     }

     public static Builder builder()
@@ -176,6 +183,7 @@ public static class Builder
         private boolean legacyJsonCast;
         private Map<String, String> extraCredentials = emptyMap();
         private boolean warnOnCommonNanPatterns;
+        private boolean canonicalizedJsonExtract;

         private Builder() {}

@@ -251,6 +259,12 @@ public Builder setWarnOnCommonNanPatterns(boolean warnOnCommonNanPatterns)
             return this;
         }

+        public Builder setCanonicalizedJsonExtract(boolean canonicalizedJsonExtract)
+        {
+            this.canonicalizedJsonExtract = canonicalizedJsonExtract;
+            return this;
+        }
+
         public SqlFunctionProperties build()
         {
             return new SqlFunctionProperties(
@@ -265,7 +279,8 @@ public SqlFunctionProperties build()
                     fieldNamesInJsonCastEnabled,
                     legacyJsonCast,
                     extraCredentials,
-                    warnOnCommonNanPatterns);
+                    warnOnCommonNanPatterns,
+                    canonicalizedJsonExtract);
         }
     }
 }
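
The commit message for prestodb#24879 attributes the earlier regression to an `equals` method that compared `canonicalizedJsonExtract` against `legacyJsonCast`, so a cache keyed on these properties never hit. The sketch below illustrates that failure mode under stated assumptions: `BuggyProps`, `FixedProps`, and `countCompilations` are hypothetical names, the cache is a plain `HashMap`, and the exact form of the faulty comparison is inferred from the description rather than copied from the original pull request.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class CacheKeySketch {
    // Hypothetical key with the cross-field comparison described in the commit message.
    static final class BuggyProps {
        final boolean legacyJsonCast;
        final boolean canonicalizedJsonExtract;

        BuggyProps(boolean legacyJsonCast, boolean canonicalizedJsonExtract) {
            this.legacyJsonCast = legacyJsonCast;
            this.canonicalizedJsonExtract = canonicalizedJsonExtract;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof BuggyProps)) return false;
            BuggyProps that = (BuggyProps) o;
            // Bug: compares canonicalizedJsonExtract with the other object's legacyJsonCast.
            return legacyJsonCast == that.legacyJsonCast
                    && canonicalizedJsonExtract == that.legacyJsonCast;
        }

        @Override
        public int hashCode() {
            return Objects.hash(legacyJsonCast, canonicalizedJsonExtract);
        }
    }

    // Same key with each field compared against its own counterpart.
    static final class FixedProps {
        final boolean legacyJsonCast;
        final boolean canonicalizedJsonExtract;

        FixedProps(boolean legacyJsonCast, boolean canonicalizedJsonExtract) {
            this.legacyJsonCast = legacyJsonCast;
            this.canonicalizedJsonExtract = canonicalizedJsonExtract;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof FixedProps)) return false;
            FixedProps that = (FixedProps) o;
            return legacyJsonCast == that.legacyJsonCast
                    && canonicalizedJsonExtract == that.canonicalizedJsonExtract;
        }

        @Override
        public int hashCode() {
            return Objects.hash(legacyJsonCast, canonicalizedJsonExtract);
        }
    }

    // Counts how many cache misses occur when the same logical key is looked up repeatedly.
    static int countCompilations(boolean fixed, int lookups) {
        Map<Object, String> cache = new HashMap<>();
        int compilations = 0;
        for (int i = 0; i < lookups; i++) {
            // A fresh key instance per lookup, with the two flags set to different values.
            Object key = fixed ? new FixedProps(false, true) : new BuggyProps(false, true);
            if (!cache.containsKey(key)) {
                cache.put(key, "compiled function");
                compilations++;
            }
        }
        return compilations;
    }

    public static void main(String[] args) {
        System.out.println("buggy misses: " + countCompilations(false, 5));
        System.out.println("fixed misses: " + countCompilations(true, 5));
    }
}
```

With the buggy key, every lookup misses and the map grows by one entry per lookup whenever the two flags differ, which matches the unbounded growth described in the commit message. The hunk above compares each field with its own counterpart and also adds the new field to `hashCode`.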

presto-delta/src/main/java/com/facebook/presto/delta/DeltaModule.java

Lines changed: 2 additions & 0 deletions
@@ -44,6 +44,7 @@
 import com.facebook.presto.hive.metastore.InvalidateMetastoreCacheProcedure;
 import com.facebook.presto.hive.metastore.MetastoreCacheStats;
 import com.facebook.presto.hive.metastore.MetastoreConfig;
+import com.facebook.presto.hive.metastore.thrift.ThriftHiveMetastoreConfig;
 import com.facebook.presto.spi.procedure.Procedure;
 import com.fasterxml.jackson.databind.DeserializationContext;
 import com.fasterxml.jackson.databind.deser.std.FromStringDeserializer;
@@ -104,6 +105,7 @@ protected void setup(Binder binder)
         configBinder(binder).bindConfig(MetastoreConfig.class);
         configBinder(binder).bindConfig(HiveClientConfig.class);
         configBinder(binder).bindConfig(MetastoreClientConfig.class);
+        configBinder(binder).bindConfig(ThriftHiveMetastoreConfig.class);
         binder.bind(MetastoreCacheStats.class).to(HiveMetastoreCacheStats.class).in(Scopes.SINGLETON);
         newExporter(binder).export(MetastoreCacheStats.class).as(generatedNameOf(MetastoreCacheStats.class, connectorId));
         binder.bind(ExtendedHiveMetastore.class).to(InMemoryCachingHiveMetastore.class).in(Scopes.SINGLETON);

presto-docs/src/main/sphinx/connector/hive.rst

Lines changed: 72 additions & 3 deletions
@@ -213,9 +213,9 @@ Metastore Configuration Properties

 The required Hive metastore can be configured with a number of properties.

-======================================================= ============================================================= ============
+======================================================== ============================================================= ============
 Property Name                                            Description                                                   Default
-======================================================= ============================================================= ============
+======================================================== ============================================================= ============
 ``hive.metastore-timeout``                               Timeout for Hive metastore requests.                          ``10s``

 ``hive.metastore-cache-ttl``                             Duration how long cached metastore data should be considered  ``0s``
@@ -232,7 +232,17 @@ Property Name Descriptio
 ``hive.invalidate-metastore-cache-procedure-enabled``    When enabled, users will be able to invalidate metastore      false
                                                          cache on demand.

-======================================================= ============================================================= ============
+``hive.metastore.thrift.client.tls.enabled``             Whether TLS security is enabled.                              false
+
+``hive.metastore.thrift.client.tls.keystore-path``       Path to the PEM or JKS key store.                             NONE
+
+``hive.metastore.thrift.client.tls.keystore-password``   Password for the key store.                                   NONE
+
+``hive.metastore.thrift.client.tls.truststore-path``     Path to the PEM or JKS trust store.                           NONE
+
+``hive.metastore.thrift.client.tls.truststore-password`` Password for the trust store.                                 NONE
+
+======================================================== ============================================================= ============

 AWS Glue Catalog Configuration Properties
 -----------------------------------------
@@ -361,6 +371,13 @@ Property Name Description
 ``hive.s3.skip-glacier-objects``             Ignore Glacier objects rather than failing the query. This
                                              will skip data that may be expected to be part of the table
                                              or partition. Defaults to ``false``.
+
+``hive.s3.web.identity.auth.enabled``        Enables Web Identity authentication for S3 access. Requires
+                                             ``hive.s3.iam-role`` to be specified. Additionally, ensure that
+                                             the environment variables ``AWS_WEB_IDENTITY_TOKEN_FILE`` and
+                                             ``AWS_REGION`` are set for proper authentication. Since this
+                                             implementation uses AWS SDK 1.x, setting these environment
+                                             variables is necessary.
 ============================================ =================================================================

 S3 Credentials
@@ -1117,4 +1134,56 @@ Drop a schema::
 Hive Connector Limitations
 --------------------------

+SQL DELETE
+^^^^^^^^^^
+
 :doc:`/sql/delete` is only supported if the ``WHERE`` clause matches entire partitions.
+
+CSV Format Type Limitations
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When creating tables with CSV format, all columns must be defined as ``VARCHAR`` due to
+the underlying OpenCSVSerde limitations. `OpenCSVSerde <https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/OpenCSVSerde.java>`_ deserializes all CSV columns
+as strings only. Using any other data type will result in an error similar to the following::
+
+    CREATE TABLE hive.csv.csv_fail (
+        id BIGINT,
+        value INT,
+        date_col DATE
+    ) with ( format = 'CSV' ) ;
+
+.. code-block:: none
+
+    Query failed: Hive CSV storage format only supports VARCHAR (unbounded).
+    Unsupported columns: id integer, value integer, date_col date
+
+To work with other data types when using CSV format:
+
+1. Create the table with all the columns as ``VARCHAR``
+2. Create a view or another table that casts the columns to their desired data types
+
+Example::
+
+    -- First create table with VARCHAR columns
+    CREATE TABLE hive.csv.csv_data (
+        id VARCHAR,
+        value VARCHAR,
+        date_col VARCHAR
+    )
+    WITH (format = 'CSV');
+
+    -- Then create a view with the proper data types
+    CREATE VIEW hive.csv.csv_data_view AS
+    SELECT
+        CAST(id AS BIGINT) AS id,
+        CAST(value AS INT) AS value,
+        CAST(date_col AS DATE) AS date_col
+    FROM hive.csv.csv_data;
+
+    -- OR another table with the proper data types
+    CREATE TABLE hive.csv.csv_data_cast AS
+    SELECT
+        CAST(id AS BIGINT) AS id,
+        CAST(value AS INT) AS value,
+        CAST(date_col AS DATE) AS date_col
+    FROM hive.csv.csv_data;
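
The ``hive.s3.web.identity.auth.enabled`` entry added earlier in this file states that ``AWS_WEB_IDENTITY_TOKEN_FILE`` and ``AWS_REGION`` must be set in the environment. A minimal sketch of a preflight check for those two variables is shown below. `WebIdentityEnvCheck` and `missingVariables` are hypothetical names introduced for illustration; nothing here is part of Presto or the AWS SDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class WebIdentityEnvCheck {
    // The two variable names come from the hive.rst text; the check itself is an illustrative assumption.
    static final List<String> REQUIRED =
            Arrays.asList("AWS_WEB_IDENTITY_TOKEN_FILE", "AWS_REGION");

    // Returns the required variables that are absent or blank in the given environment.
    static List<String> missingVariables(Map<String, String> env) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingVariables(System.getenv());
        if (missing.isEmpty()) {
            System.out.println("Web Identity environment variables are set");
        }
        else {
            System.out.println("Missing environment variables: " + missing);
        }
    }
}
```

Passing the environment in as a map keeps the check testable without modifying the process environment.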
