This repository was archived by the owner on Jul 29, 2024. It is now read-only.

Commit c79bbc1

updated the base docker to delta 3.0 and spark 3.5. (#60)
1 parent 1d24e69 commit c79bbc1

File tree: 4 files changed (+116 −96 lines)

static/quickstart_docker/Dockerfile_delta_quickstart

Lines changed: 12 additions & 8 deletions
@@ -18,20 +18,23 @@
 # Dockerfile for Delta Lake quickstart
 # ------------------------------------------------

-# This docker image uses the official Docker image of [OSS] Apache Spark v3.3.2 as the base container
+# This docker image uses the official Docker image of [OSS] Apache Spark v3.5.0 as the base container
 # Note: Python version in this image is 3.9.2 and is available as `python3`.
-ARG BASE_CONTAINER=apache/spark-py:v3.3.2
+# Note: PySpark v3.5.0 (https://spark.apache.org/docs/latest/api/python/getting_started/install.html#dependencies)
+ARG BASE_CONTAINER=spark:3.5.0-scala2.12-java11-python3-ubuntu
 FROM $BASE_CONTAINER as spark
 FROM spark as delta

 # Authors (add your name when updating the Dockerfile)
-LABEL authors="Prashanth Babu,Denny Lee,Andrew Bauman"
+LABEL authors="Prashanth Babu,Denny Lee,Andrew Bauman, Scott Haines"

 # Docker image was created and tested with the versions of following packages.
 USER root
-ARG DELTA_SPARK_VERSION="2.3.0"
-ARG DELTALAKE_VERSION="0.9.0"
-ARG JUPYTERLAB_VERSION="3.6.3"
+ARG DELTA_SPARK_VERSION="3.0.0"
+# Note: for 3.0.0 https://pypi.org/project/deltalake/
+ARG DELTALAKE_VERSION="0.12.0"
+ARG JUPYTERLAB_VERSION="4.0.7"
+# requires pandas >1.0.5, py4j>=0.10.9.7, pyarrow>=4.0.0
 ARG PANDAS_VERSION="1.5.3"
 ARG ROAPI_VERSION="0.9.0"

@@ -45,7 +48,7 @@ FROM delta as startup
 ARG NBuser=NBuser
 ARG GROUP=NBuser
 ARG WORKDIR=/opt/spark/work-dir
-ENV DELTA_PACKAGE_VERSION=delta-core_2.12:${DELTA_SPARK_VERSION}
+ENV DELTA_PACKAGE_VERSION=delta-spark_2.12:${DELTA_SPARK_VERSION}

 # OS Installations Configurations
 RUN groupadd -r ${GROUP} && useradd -r -m -g ${GROUP} ${NBuser}
@@ -62,7 +65,8 @@ RUN chown -R ${NBuser}:${GROUP} /home/${NBuser}/ \
 # Rust install
 USER ${NBuser}
 RUN curl https://sh.rustup.rs -sSf | sh -s -- -y
-RUN source "$HOME/.cargo/env"
+# moved the source command into the bash process in the entrypoint startup.sh
+#RUN source "$HOME/.cargo/env"

 # Establish entrypoint
 ENTRYPOINT ["bash", "startup.sh"]
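The net effect of these changes: the image now runs Spark 3.5 with Delta Lake 3.0, whose Scala artifact was renamed from `delta-core` to `delta-spark`. A minimal sketch of the equivalent wiring outside the image, assuming `pip install pyspark==3.5.0 delta-spark==3.0.0` in your own environment (the sketch is not part of this commit):

```python
# Minimal sketch: wire Delta 3.0 into a Spark 3.5 session, mirroring the
# --conf flags and the io.delta:delta-spark_2.12:3.0.0 package the image uses.
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip  # shipped with delta-spark

builder = (
    SparkSession.builder.appName("delta-quickstart")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
# Resolves the delta-spark_2.12 Maven jars matching the installed pip package.
spark = configure_spark_with_delta_pip(builder).getOrCreate()
```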

static/quickstart_docker/README.md

Lines changed: 95 additions & 84 deletions
@@ -45,10 +45,12 @@ Note, there are different versions of the Delta Lake docker
 | ----------------- | -------- | ------ | ------ | ----------- | ----- | ---------- | ------ | ----- |
 | 0.8.1_2.3.0 | amd64 | 0.8.1 | latest | 2.3.0 | 3.3.2 | 3.6.3 | 1.5.3 | 0.9.0 |
 | 0.8.1_2.3.0_arm64 | arm64 | 0.8.1 | latest | 2.3.0 | 3.3.2 | 3.6.3 | 1.5.3 | 0.9.0 |
-| latest | amd64 | 0.9.0 | latest | 2.3.0 | 3.3.2 | 3.6.3 | 1.5.3 | 0.9.0 |
-| latest | arm64 | 0.9.0 | latest | 2.3.0 | 3.3.2 | 3.6.3 | 1.5.3 | 0.9.0 |
+| 1.0.0_3.0.0 | amd64 | 0.12.0 | latest | 3.0.0 | 3.5.0 | 3.6.3 | 1.5.3 | 0.9.0 |
+| 1.0.0_3.0.0_arm64 | arm64 | 0.12.0 | latest | 3.0.0 | 3.5.0 | 3.6.3 | 1.5.3 | 0.9.0 |
+| latest | amd64 | 0.12.0 | latest | 3.0.0 | 3.5.0 | 3.6.3 | 1.5.3 | 0.9.0 |
+| latest | arm64 | 0.12.0 | latest | 3.0.0 | 3.5.0 | 3.6.3 | 1.5.3 | 0.9.0 |

-\*\* Note, the arm64 version is built for ARM64 platforms like Mac M1
+> Note, the arm64 version is built for ARM64 platforms like Mac M1

 Download the appropriate tag, e.g.:

@@ -75,7 +77,7 @@ Once the image has been built or you have downloaded the correct image, you can

 In the following instructions, the variable `${DELTA_PACKAGE_VERSION}` refers to the Delta Lake Package version.

-The current version is `delta-core_2.12:2.3.0` which corresponds to Apache Spark 3.3.x release line.
+The current version is `delta-spark_2.12:3.0.0` which corresponds to Apache Spark 3.5.x release line.

 ## Choose an Interface

@@ -98,7 +100,7 @@ The current version is `delta-core_2.12:2.3.0` which corresponds to Apache Spark
 python3
 ```

-> Note: The Delta Rust Python bindings are already installed in this docker. To do this manually in your own environment, run the command: `pip3 install deltalake==0.9.0`
+> Note: The Delta Rust Python bindings are already installed in this docker. To do this manually in your own environment, run the command: `pip3 install deltalake==0.12.0`

 1. Run some basic commands in the shell to write to and read from Delta Lake with Pandas

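For orientation, the "basic commands" that step refers to follow the usual deltalake-with-Pandas round trip. A sketch against the deltalake 0.12 Python API; the column name and exact ranges are assumptions inferred from the outputs in the hunks below:

```python
# Sketch (deltalake==0.12.0): two writes produce table versions 0 and 1.
import pandas as pd
from deltalake import DeltaTable, write_deltalake

df = pd.DataFrame({"data": range(5)})
write_deltalake("/tmp/deltars_table", df)                  # version 0

df2 = pd.DataFrame({"data": range(6, 11)})
write_deltalake("/tmp/deltars_table", df2, mode="append")  # version 1

print(DeltaTable("/tmp/deltars_table").to_pandas())        # ten rows, as below
```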
@@ -126,13 +128,13 @@ The current version is `delta-core_2.12:2.3.0` which corresponds to Apache Spark

 ```python
 ## Output
-0
-0 0
-1 1
-2 2
-... ...
-8 9
-9 10
+data
+0 0
+1 1
+2 2
+...
+8 9
+9 10
 ```

 1. Review the files
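The listing in the next hunk comes from the table's file inventory; a sketch of that call, reusing the table path from the sketch above:

```python
# Sketch: list the Parquet files backing the current table version.
from deltalake import DeltaTable

dt = DeltaTable("/tmp/deltars_table")
print(dt.files())  # one file per write, e.g. the two names below
```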
@@ -144,7 +146,7 @@ The current version is `delta-core_2.12:2.3.0` which corresponds to Apache Spark

 ```python
 ## Output
-['0-d4920663-30e9-4a1a-afde-59bc4ebd24b5-0.parquet', '1-f27a5ea6-a15f-4ca1-91b3-72bcf64fbc09-0.parquet']
+['0-6944fddf-60e3-4eab-811d-1398e9f64073-0.parquet', '1-66c7ee6e-6aab-4c74-866d-a82790102652-0.parquet']
 ```

 1. Review history
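Likewise, the history output below is a plain `history()` call (a sketch):

```python
# Sketch: commit history, newest entry first (WRITE, then CREATE TABLE).
from deltalake import DeltaTable

print(DeltaTable("/tmp/deltars_table").history())
```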
@@ -156,7 +158,7 @@ The current version is `delta-core_2.12:2.3.0` which corresponds to Apache Spark

 ```python
 ## Output
-[{'timestamp': 1682475171964, 'delta-rs': '0.8.0'}, {'timestamp': 1682475171985, 'operation': 'WRITE', 'operationParameters': {'partitionBy': '[]', 'mode': 'Append'}, 'clientVersion': 'delta-rs.0.8.0'}]
+[{'timestamp': 1698002214493, 'operation': 'WRITE', 'operationParameters': {'mode': 'Append', 'partitionBy': '[]'}, 'clientVersion': 'delta-rs.0.17.0', 'version': 1}, {'timestamp': 1698002207527, 'operation': 'CREATE TABLE', 'operationParameters': {'mode': 'ErrorIfExists', 'protocol': '{"minReaderVersion":1,"minWriterVersion":1}', 'location': 'file:///tmp/deltars_table', 'metadata': '{"configuration":{},"created_time":1698002207525,"description":null,"format":{"options":{},"provider":"parquet"},"id":"bf749aab-22b6-484b-bd73-dc1680ee4384","name":null,"partition_columns":[],"schema":{"fields":[{"metadata":{},"name":"data","nullable":true,"type":"long"}],"type":"struct"}}'}, 'clientVersion': 'delta-rs.0.17.0', 'version': 0}]
 ```

 1. Time Travel (load older version of table)
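Time travel here means pointing the table handle at an earlier version; a sketch using the deltalake 0.12 API:

```python
# Sketch: rewind to version 0 (the first five-row write).
from deltalake import DeltaTable

dt = DeltaTable("/tmp/deltars_table")
dt.load_version(0)     # or: DeltaTable("/tmp/deltars_table", version=0)
print(dt.to_pandas())  # five rows, matching the output below
```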
@@ -171,12 +173,12 @@ The current version is `delta-core_2.12:2.3.0` which corresponds to Apache Spark

 ```python
 ## Output
-0
-0 0
-1 1
-2 2
-3 3
-4 4
+data
+0 0
+1 1
+2 2
+3 3
+4 4
 ```

 1. Follow the delta-rs Python documentation [here](https://delta-io.github.io/delta-rs/python/usage.html#)
@@ -189,9 +191,9 @@ The current version is `delta-core_2.12:2.3.0` which corresponds to Apache Spark

 ```bash
 total 12
-4 drwxr-xr-x 2 NBuser 4096 Apr 26 02:12 _delta_log
-4 -rw-r--r-- 1 NBuser 1689 Apr 26 02:12 0-d4920663-30e9-4a1a-afde-59bc4ebd24b5-0.parquet
-4 -rw-r--r-- 1 NBuser 1691 Apr 26 02:12 1-f27a5ea6-a15f-4ca1-91b3-72bcf64fbc09-0.parquet
+4 -rw-r--r-- 1 NBuser 1689 Oct 22 19:16 0-6944fddf-60e3-4eab-811d-1398e9f64073-0.parquet
+4 -rw-r--r-- 1 NBuser 1691 Oct 22 19:16 1-66c7ee6e-6aab-4c74-866d-a82790102652-0.parquet
+4 drwxr-xr-x 2 NBuser 4096 Oct 22 19:16 _delta_log
 ```

 1. [Optional] Skip ahead to try out the [Delta Rust API](#delta-rust-api) and [ROAPI](#optional-roapi)
@@ -225,11 +227,15 @@ The current version is `delta-core_2.12:2.3.0` which corresponds to Apache Spark
 3. Launch a pyspark interactive shell session

 ```bash
+
 $SPARK_HOME/bin/pyspark --packages io.delta:${DELTA_PACKAGE_VERSION} \
+--conf spark.driver.extraJavaOptions="-Divy.cache.dir=/tmp -Divy.home=/tmp" \
 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
 --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
 ```

+> Note: `DELTA_PACKAGE_VERSION` is set in `./startup.sh`
+
 4. Run some basic commands in the shell

 ```python
@@ -277,16 +283,20 @@ The current version is `delta-core_2.12:2.3.0` which corresponds to Apache Spark
 ```

 ```bash
-total 36
-4 drwxr-xr-x 2 NBuser 4096 Apr 26 02:30 _delta_log
-4 -rw-r--r-- 1 NBuser 12 Apr 26 02:30 .part-00000-bdee316b-8623-4423-b59c-6a809addaea8-c000.snappy.parquet.crc
-4 -rw-r--r-- 1 NBuser 12 Apr 26 02:30 .part-00001-6b373d50-5bdd-496a-9e21-ab4164176f11-c000.snappy.parquet.crc
-4 -rw-r--r-- 1 NBuser 12 Apr 26 02:30 .part-00002-9721ce9e-e043-4875-bcff-08f7d7c3d3f0-c000.snappy.parquet.crc
-4 -rw-r--r-- 1 NBuser 12 Apr 26 02:30 .part-00003-61aaf450-c318-452a-aea5-5a44c909fd74-c000.snappy.parquet.crc
-4 -rw-r--r-- 1 NBuser 478 Apr 26 02:30 part-00000-bdee316b-8623-4423-b59c-6a809addaea8-c000.snappy.parquet
-4 -rw-r--r-- 1 NBuser 478 Apr 26 02:30 part-00001-6b373d50-5bdd-496a-9e21-ab4164176f11-c000.snappy.parquet
-4 -rw-r--r-- 1 NBuser 478 Apr 26 02:30 part-00002-9721ce9e-e043-4875-bcff-08f7d7c3d3f0-c000.snappy.parquet
-4 -rw-r--r-- 1 NBuser 486 Apr 26 02:30 part-00003-61aaf450-c318-452a-aea5-5a44c909fd74-c000.snappy.parquet
+total 52
+4 drwxr-xr-x 2 NBuser 4096 Oct 22 19:23 _delta_log
+4 -rw-r--r-- 1 NBuser 296 Oct 22 19:23 part-00000-dc0fd6b3-9c0f-442f-a6db-708301b27bd2-c000.snappy.parquet
+4 -rw-r--r-- 1 NBuser 12 Oct 22 19:23 .part-00000-dc0fd6b3-9c0f-442f-a6db-708301b27bd2-c000.snappy.parquet.crc
+4 -rw-r--r-- 1 NBuser 478 Oct 22 19:23 part-00001-d379441e-1ee4-4e78-8616-1d9635df1c7b-c000.snappy.parquet
+4 -rw-r--r-- 1 NBuser 12 Oct 22 19:23 .part-00001-d379441e-1ee4-4e78-8616-1d9635df1c7b-c000.snappy.parquet.crc
+4 -rw-r--r-- 1 NBuser 478 Oct 22 19:23 part-00003-c08dcac4-5ea9-4329-b85d-9110493e8757-c000.snappy.parquet
+4 -rw-r--r-- 1 NBuser 12 Oct 22 19:23 .part-00003-c08dcac4-5ea9-4329-b85d-9110493e8757-c000.snappy.parquet.crc
+4 -rw-r--r-- 1 NBuser 478 Oct 22 19:23 part-00005-5db8dd16-2ab1-4d76-9b4d-457c5641b1c8-c000.snappy.parquet
+4 -rw-r--r-- 1 NBuser 12 Oct 22 19:23 .part-00005-5db8dd16-2ab1-4d76-9b4d-457c5641b1c8-c000.snappy.parquet.crc
+4 -rw-r--r-- 1 NBuser 478 Oct 22 19:23 part-00007-cad760e0-3c26-4d22-bed6-7d75a9459a0f-c000.snappy.parquet
+4 -rw-r--r-- 1 NBuser 12 Oct 22 19:23 .part-00007-cad760e0-3c26-4d22-bed6-7d75a9459a0f-c000.snappy.parquet.crc
+4 -rw-r--r-- 1 NBuser 478 Oct 22 19:23 part-00009-b58e8445-07b7-4e2a-9abf-6fea8d0c3e3f-c000.snappy.parquet
+4 -rw-r--r-- 1 NBuser 12 Oct 22 19:23 .part-00009-b58e8445-07b7-4e2a-9abf-6fea8d0c3e3f-c000.snappy.parquet.crc
 ```
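For reference, the listing above is what the quickstart's basic PySpark commands leave behind. A sketch of that round trip, assuming the quickstart's conventional `/tmp/delta-table` path and the shell-provided `spark` session:

```python
# Sketch: write a small range to a Delta table, then read it back.
data = spark.range(0, 5)
data.write.format("delta").save("/tmp/delta-table")

df = spark.read.format("delta").load("/tmp/delta-table")
df.show()
```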

 ### Scala Shell
@@ -299,17 +309,21 @@ The current version is `delta-core_2.12:2.3.0` which corresponds to Apache Spark

 ```bash
 $SPARK_HOME/bin/spark-shell --packages io.delta:${DELTA_PACKAGE_VERSION} \
+--conf spark.driver.extraJavaOptions="-Divy.cache.dir=/tmp -Divy.home=/tmp" \
 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
 --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
 ```

 4. Run some basic commands in the shell

+> note: if you've already written to the Delta table in the python shell example, use `.mode("overwrite")` to overwrite the current delta table. You can always time-travel to rewind.
+
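The rewind that note mentions is Delta's documented time-travel read option; in PySpark terms, a sketch:

```python
# Sketch: read an earlier version of the Spark-written table.
df_v0 = (spark.read.format("delta")
         .option("versionAsOf", 0)
         .load("/tmp/delta-table"))
df_v0.show()
```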
```scala
309322
// Create a Spark DataFrame
310323
val data = spark.range(0, 5)
311324

312325
// Write to a Delta Lake table
326+
313327
(data
314328
.write
315329
.format("delta")
@@ -350,22 +364,29 @@ The current version is `delta-core_2.12:2.3.0` which corresponds to Apache Spark
 ```

 ```bash
-total 36
-4 drwxr-xr-x 2 NBuser 4096 Apr 26 02:31 _delta_log
-4 -rw-r--r-- 1 NBuser 12 Apr 26 02:31 .part-00000-e0353d3e-7473-4ff7-9b58-e977d48d008a-c000.snappy.parquet.crc
-4 -rw-r--r-- 1 NBuser 12 Apr 26 02:31 .part-00001-0e2c89cf-3f9b-4698-b059-6dd41d4e3aed-c000.snappy.parquet.crc
-4 -rw-r--r-- 1 NBuser 12 Apr 26 02:31 .part-00002-06bf68f9-16d8-4c08-ba8e-7b0b00d52b8e-c000.snappy.parquet.crc
-4 -rw-r--r-- 1 NBuser 12 Apr 26 02:31 .part-00003-5963f002-d98a-421f-9c2d-22376b7f87e4-c000.snappy.parquet.crc
-4 -rw-r--r-- 1 NBuser 478 Apr 26 02:31 part-00000-e0353d3e-7473-4ff7-9b58-e977d48d008a-c000.snappy.parquet
-4 -rw-r--r-- 1 NBuser 478 Apr 26 02:31 part-00001-0e2c89cf-3f9b-4698-b059-6dd41d4e3aed-c000.snappy.parquet
-4 -rw-r--r-- 1 NBuser 478 Apr 26 02:31 part-00002-06bf68f9-16d8-4c08-ba8e-7b0b00d52b8e-c000.snappy.parquet
-4 -rw-r--r-- 1 NBuser 486 Apr 26 02:31 part-00003-5963f002-d98a-421f-9c2d-22376b7f87e4-c000.snappy.parquet
+total 52
+4 drwxr-xr-x 2 NBuser 4096 Oct 22 19:28 _delta_log
+4 -rw-r--r-- 1 NBuser 296 Oct 22 19:28 part-00000-f1f417f7-df64-4c7c-96f2-6a452ae2b49e-c000.snappy.parquet
+4 -rw-r--r-- 1 NBuser 12 Oct 22 19:28 .part-00000-f1f417f7-df64-4c7c-96f2-6a452ae2b49e-c000.snappy.parquet.crc
+4 -rw-r--r-- 1 NBuser 478 Oct 22 19:28 part-00001-b28acb6f-f08a-460f-a24e-4d9c1affee86-c000.snappy.parquet
+4 -rw-r--r-- 1 NBuser 12 Oct 22 19:28 .part-00001-b28acb6f-f08a-460f-a24e-4d9c1affee86-c000.snappy.parquet.crc
+4 -rw-r--r-- 1 NBuser 478 Oct 22 19:28 part-00003-29079c58-d1ad-4604-9c04-0f00bf09546d-c000.snappy.parquet
+4 -rw-r--r-- 1 NBuser 12 Oct 22 19:28 .part-00003-29079c58-d1ad-4604-9c04-0f00bf09546d-c000.snappy.parquet.crc
+4 -rw-r--r-- 1 NBuser 478 Oct 22 19:28 part-00005-04424aa7-48e1-4212-bd57-52552c713154-c000.snappy.parquet
+4 -rw-r--r-- 1 NBuser 12 Oct 22 19:28 .part-00005-04424aa7-48e1-4212-bd57-52552c713154-c000.snappy.parquet.crc
+4 -rw-r--r-- 1 NBuser 478 Oct 22 19:28 part-00007-e7a54a4f-bee4-4371-a35d-d284e28eb9f8-c000.snappy.parquet
+4 -rw-r--r-- 1 NBuser 12 Oct 22 19:28 .part-00007-e7a54a4f-bee4-4371-a35d-d284e28eb9f8-c000.snappy.parquet.crc
+4 -rw-r--r-- 1 NBuser 478 Oct 22 19:28 part-00009-086e6cd9-e8c6-4f16-9658-b15baf22905d-c000.snappy.parquet
+4 -rw-r--r-- 1 NBuser 12 Oct 22 19:28 .part-00009-086e6cd9-e8c6-4f16-9658-b15baf22905d-c000.snappy.parquet.crc
 ```

 </details>

 ### Delta Rust API

+> Note: Use a docker volume in case of running into limits "no room left on device"
+> `docker volume create rustbuild` > `docker run --name delta_quickstart -v rustbuild:/tmp --rm -it --entrypoint bash deltaio/delta-docker:3.0.0`
+
 1. Open a bash shell (if on windows use git bash, WSL, or any shell configured for bash commands)

 2. Run a container from the image with a bash entrypoint ([build](#build-entry-point) | [DockerHub](#image-entry-point))
@@ -377,28 +398,26 @@ The current version is `delta-core_2.12:2.3.0` which corresponds to Apache Spark
 cargo run --example read_delta_table
 ```

+> You can also use a different location to build and run the examples
+
+```bash
+cd rs
+CARGO_TARGET_DIR=/tmp cargo run --example read_delta_table
+```
+
 > If using [Delta Lake DockerHub](https://go.delta.io/dockerhub), sometimes the Rust environment hasn't been configured. To resolve this, run the command `source "$HOME/.cargo/env"`

 ```bash
 === Delta table metadata ===
-DeltaTable(../quickstart_docker/rs/data/COVID-19_NYT)
+DeltaTable(/opt/spark/work-dir/rs/data/COVID-19_NYT)
 version: 0
 metadata: GUID=7245fd1d-8a6d-4988-af72-92a95b646511, name=None, description=None, partitionColumns=[], createdTime=Some(1619121484605), configuration={}
 min_version: read=1, write=2
 files count: 8


 === Delta table files ===
-[
-Path { raw: "part-00000-a496f40c-e091-413a-85f9-b1b69d4b3b4e-c000.snappy.parquet" },
-Path { raw: "part-00001-9d9d980b-c500-4f0b-bb96-771a515fbccc-c000.snappy.parquet" },
-Path { raw: "part-00002-8826af84-73bd-49a6-a4b9-e39ffed9c15a-c000.snappy.parquet" },
-Path { raw: "part-00003-539aff30-2349-4b0d-9726-c18630c6ad90-c000.snappy.parquet" },
-Path { raw: "part-00004-1bb9c3e3-c5b0-4d60-8420-23261f58a5eb-c000.snappy.parquet" },
-Path { raw: "part-00005-4d47f8ff-94db-4d32-806c-781a1cf123d2-c000.snappy.parquet" },
-Path { raw: "part-00006-d0ec7722-b30c-4e1c-92cd-b4fe8d3bb954-c000.snappy.parquet" },
-Path { raw: "part-00007-4582392f-9fc2-41b0-ba97-a74b3afc8239-c000.snappy.parquet" }
-]
+[Path { raw: "part-00000-a496f40c-e091-413a-85f9-b1b69d4b3b4e-c000.snappy.parquet" }, Path { raw: "part-00001-9d9d980b-c500-4f0b-bb96-771a515fbccc-c000.snappy.parquet" }, Path { raw: "part-00002-8826af84-73bd-49a6-a4b9-e39ffed9c15a-c000.snappy.parquet" }, Path { raw: "part-00003-539aff30-2349-4b0d-9726-c18630c6ad90-c000.snappy.parquet" }, Path { raw: "part-00004-1bb9c3e3-c5b0-4d60-8420-23261f58a5eb-c000.snappy.parquet" }, Path { raw: "part-00005-4d47f8ff-94db-4d32-806c-781a1cf123d2-c000.snappy.parquet" }, Path { raw: "part-00006-d0ec7722-b30c-4e1c-92cd-b4fe8d3bb954-c000.snappy.parquet" }, Path { raw: "part-00007-4582392f-9fc2-41b0-ba97-a74b3afc8239-c000.snappy.parquet" }]
 ```

 4. Execute `examples/read_delta_datafusion.rs` to query the `covid19_nyt` Delta Lake table using `datafusion`
@@ -408,37 +427,29 @@ The current version is `delta-core_2.12:2.3.0` which corresponds to Apache Spark
 ```

 ```bash
+=== Datafusion query ===
+[RecordBatch { schema: Schema { fields: [Field { name: "cases", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "county", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "date", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, columns: [PrimitiveArray<Int32>
 [
-RecordBatch {
-schema: Schema {
-fields: [
-Field { name: "cases", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None },
-Field { name: "county", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None },
-Field { name: "date", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }
-], metadata: {}
-},
-columns: [PrimitiveArray<Int32>[
-1,
-1,
-1,
-1,
-1,
-], StringArray [
-"Snohomish",
-"Snohomish",
-"Snohomish",
-"Cook",
-"Snohomish",
-], StringArray [
-"2020-01-21",
-"2020-01-22",
-"2020-01-23",
-"2020-01-24",
-"2020-01-24",
-]],
-row_count: 5
-}
-]
+1,
+1,
+1,
+1,
+1,
+], StringArray
+[
+"Snohomish",
+"Snohomish",
+"Snohomish",
+"Cook",
+"Snohomish",
+], StringArray
+[
+"2020-01-21",
+"2020-01-22",
+"2020-01-23",
+"2020-01-24",
+"2020-01-24",
+]], row_count: 5 }]
 ```

 </p>
</p>

static/quickstart_docker/rs/Cargo.toml

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 [package]
 name = "quickstart"
-version = "0.1.1"
-rust-version = "1.64"
+version = "0.1.2"
+rust-version = "1.73"
 authors = ["Denny Lee <[email protected]>"]
 license = "Apache-2.0"
 keywords = ["deltalake", "delta", "datalake", "deltars"]

static/quickstart_docker/startup.sh

Lines changed: 7 additions & 2 deletions
@@ -1,8 +1,13 @@
 #!/bin/bash

+source "$HOME/.cargo/env"
+
 export PYSPARK_DRIVER_PYTHON=jupyter
 export PYSPARK_DRIVER_PYTHON_OPTS='lab --ip=0.0.0.0'
+export DELTA_SPARK_VERSION='3.0.0'
+export DELTA_PACKAGE_VERSION=delta-spark_2.12:${DELTA_SPARK_VERSION}

 $SPARK_HOME/bin/pyspark --packages io.delta:${DELTA_PACKAGE_VERSION} \
---conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
---conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
+--conf "spark.driver.extraJavaOptions=-Divy.cache.dir=/tmp -Divy.home=/tmp -Dio.netty.tryReflectionSetAccessible=true" \
+--conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
+--conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
