diff --git a/src/components/PageLayout/PageFooter.jsx b/src/components/PageLayout/PageFooter.jsx index 1c288f2..88b5c9b 100644 --- a/src/components/PageLayout/PageFooter.jsx +++ b/src/components/PageLayout/PageFooter.jsx @@ -104,7 +104,7 @@ const PageFooter = () => ( Copyright © {new Date().getFullYear()} Delta Lake, a series of LF Projects, LLC. For web site terms of use, trademark policy and other - project polcies please see{" "} + project policies please see{" "} https://lfprojects.org diff --git a/src/pages/latest/concurrency-control.mdx b/src/pages/latest/concurrency-control.mdx index 3d51d35..40cc459 100644 --- a/src/pages/latest/concurrency-control.mdx +++ b/src/pages/latest/concurrency-control.mdx @@ -51,7 +51,7 @@ operate in three stages: The following table describes which pairs of write operations can conflict. Compaction refers to [file compaction operation](/latest/best-practices#compact-files) written with the option dataChange set to false. -| | **INSERT** | **UPDATE, DELTE, MERGE INTO** | **OPTIMIZE** | +| | **INSERT** | **UPDATE, DELETE, MERGE INTO** | **OPTIMIZE** | | ------------------------------ | --------------- | ----------------------------- | ------------ | | **INSERT** | Cannot conflict | | | | **UPDATE, DELETE, MERGE INTO** | Can conflict | Can conflict | | diff --git a/src/pages/latest/delta-batch.mdx b/src/pages/latest/delta-batch.mdx index 05c43de..bea1afb 100644 --- a/src/pages/latest/delta-batch.mdx +++ b/src/pages/latest/delta-batch.mdx @@ -13,7 +13,7 @@ For many Delta Lake operations on tables, you enable integration with Apache Spa Delta Lake supports creating two types of tables --- tables defined in the metastore and tables defined by path. -To work with metastore-defined tables, you must enable integration with Apache Spark DataSourceV2 and Catalog APIs by setting configurations when you create a new `SparkSession`. See [Configure SparkSesion](#configure-sparksession). +To work with metastore-defined tables, you must enable integration with Apache Spark DataSourceV2 and Catalog APIs by setting configurations when you create a new `SparkSession`. See [Configure SparkSession](#configure-sparksession). You can create tables in the following ways. @@ -1347,7 +1347,7 @@ pyspark --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" -- ## Configure storage credentials -Delta Lake uses Hadoop FileSystem APIs to access the storage systems. The credentails for storage systems usually can be set through Hadoop configurations. Delta Lake provides multiple ways to set Hadoop configurations similar to Apache Spark. +Delta Lake uses Hadoop FileSystem APIs to access the storage systems. The credentials for storage systems usually can be set through Hadoop configurations. Delta Lake provides multiple ways to set Hadoop configurations similar to Apache Spark. ### Spark configurations @@ -1365,7 +1365,7 @@ Spark SQL will pass all of the current [SQL session configurations](http://spark Besides setting Hadoop file system configurations through the Spark (cluster) configurations or SQL session configurations, Delta supports reading Hadoop file system configurations from `DataFrameReader` and `DataFrameWriter` options (that is, option keys that start with the `fs.` prefix) when the table is read or written, by using `DataFrameReader.load(path)` or `DataFrameWriter.save(path)`. 
-For example, you can pass your storage credentails through DataFrame options: +For example, you can pass your storage credentials through DataFrame options: diff --git a/src/pages/latest/delta-storage.mdx b/src/pages/latest/delta-storage.mdx index e067d93..6426e92 100644 --- a/src/pages/latest/delta-storage.mdx +++ b/src/pages/latest/delta-storage.mdx @@ -203,7 +203,7 @@ that S3 is lacking. - All of the requirements listed in [\_](#requirements-s3-single-cluster) section -- In additon to S3 credentials, you also need DynamoDB operating permissions +- In addition to S3 credentials, you also need DynamoDB operating permissions #### Quickstart (S3 multi-cluster) diff --git a/src/pages/latest/delta-streaming.mdx b/src/pages/latest/delta-streaming.mdx index db1cd6f..801d2e3 100644 --- a/src/pages/latest/delta-streaming.mdx +++ b/src/pages/latest/delta-streaming.mdx @@ -279,7 +279,7 @@ For applications with more lenient latency requirements, you can save computing Available in Delta Lake 2.0.0 and above. -The command `foreachBatch` allows you to specify a function that is executed on the output of every micro-batch after arbitrary transformations in the streaming query. This allows implementating a `foreachBatch` function that can write the micro-batch output to one or more target Delta table destinations. However, `foreachBatch` does not make those writes idempotent as those write attempts lack the information of whether the batch is being re-executed or not. For example, rerunning a failed batch could result in duplicate data writes. +The command `foreachBatch` allows you to specify a function that is executed on the output of every micro-batch after arbitrary transformations in the streaming query. This allows implementing a `foreachBatch` function that can write the micro-batch output to one or more target Delta table destinations. However, `foreachBatch` does not make those writes idempotent as those write attempts lack the information of whether the batch is being re-executed or not. For example, rerunning a failed batch could result in duplicate data writes. To address this, Delta tables support the following `DataFrameWriter` options to make the writes idempotent: diff --git a/src/pages/latest/delta-update.mdx b/src/pages/latest/delta-update.mdx index 6ac9b04..e97477b 100644 --- a/src/pages/latest/delta-update.mdx +++ b/src/pages/latest/delta-update.mdx @@ -455,7 +455,7 @@ You can reduce the time taken by merge using the following approaches: - will make the query faster as it looks for matches only in the relevant partitions. Furthermore, it will also reduce the chances of conflicts with other concurrent operations. See [concurency control](/latest/concurrency-control) for more details. + will make the query faster as it looks for matches only in the relevant partitions. Furthermore, it will also reduce the chances of conflicts with other concurrent operations. See [concurrency control](/latest/concurrency-control) for more details. - **Compact files**: If the data is stored in many small files, reading the data to search for matches can become slow. You can compact small files into larger files to improve read throughput. See [best practices for compaction](/latest/best-practices/#compact-files) for details. 
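The delta-batch.mdx hunk above describes passing Hadoop file-system credentials as `fs.`-prefixed `DataFrameReader`/`DataFrameWriter` options, though the example that sentence introduces falls outside this excerpt. A minimal PySpark sketch of the pattern, assuming an active `SparkSession` named `spark`; the Azure storage account, container, access key, and path are placeholders:

```python
# Pass a Hadoop file-system credential as a per-read option: the `fs.`-prefixed
# key is forwarded to the underlying Hadoop configuration rather than treated as
# a Delta reader option. The account, container, key, and path are placeholders.
df = (
    spark.read.format("delta")
    .option(
        "fs.azure.account.key.<storage-account>.dfs.core.windows.net",
        "<storage-account-access-key>",
    )
    .load("abfss://<container>@<storage-account>.dfs.core.windows.net/<path-to-table>")
)
df.show()
```

Because the credential is scoped to this one read, different tables in the same session can be accessed with different credentials.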
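The delta-streaming.mdx hunk above explains why plain `foreachBatch` writes are not idempotent; the option names that address this sit outside the excerpt, but per the Delta Lake documentation (2.0.0 and above) they are `txnAppId` and `txnVersion`. A sketch under that assumption, with a hypothetical application ID and target path:

```python
# Idempotent micro-batch writes: Delta records the (txnAppId, txnVersion) pair
# with each commit and skips a pair it has already seen, so a re-executed
# micro-batch does not append duplicate rows.
app_id = "streaming-ingest-v1"  # hypothetical; must stay stable across restarts

def write_batch(micro_batch_df, batch_id):
    (micro_batch_df.write.format("delta")
        .option("txnAppId", app_id)
        .option("txnVersion", batch_id)  # the micro-batch ID serves as the version
        .mode("append")
        .save("/tmp/delta/events"))  # hypothetical target table path

# Any streaming source works; the built-in rate source keeps the sketch self-contained.
streaming_df = spark.readStream.format("rate").load()
streaming_df.writeStream.foreachBatch(write_batch).start()
```

If the streaming checkpoint is ever deleted so batch IDs restart from zero, the docs advise switching to a new `txnAppId` as well, otherwise those batches would be treated as already written.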
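For the delta-update.mdx hunk above, the surrounding guidance is to add known partition predicates to the merge search condition so only the relevant partitions are scanned. A sketch against a hypothetical `date`-partitioned events table, where `updates` stands in for a DataFrame of new or changed rows:

```python
from delta.tables import DeltaTable

# Hypothetical `date`-partitioned table; `updates` is a DataFrame of changed rows.
events = DeltaTable.forPath(spark, "/tmp/delta/events")

# Constraining the match to today's partition lets merge prune files up front
# and also narrows the window for conflicts with concurrent writers.
(events.alias("t")
    .merge(updates.alias("s"), "t.date = current_date() AND t.eventId = s.eventId")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```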
diff --git a/src/pages/latest/integrations.mdx b/src/pages/latest/integrations.mdx index 119c1ff..2d3689c 100644 --- a/src/pages/latest/integrations.mdx +++ b/src/pages/latest/integrations.mdx @@ -1,6 +1,6 @@ --- title: Access Delta tables from external data processing engines -description: Docs for accessesing Delta tables from external data processing engines +description: Docs for accessing Delta tables from external data processing engines --- You can access Delta tables from Apache Spark and [other data processing systems](https://delta.io/integrations/). Here is the list of integrations that enable you to access Delta tables from external data processing engines. diff --git a/src/pages/latest/porting.mdx b/src/pages/latest/porting.mdx index c885101..6efa052 100644 @@ -122,7 +122,7 @@ migrating from older to newer versions of Delta Lake. Delta Lake 1.2.1, 2.0.0 and 2.1.0 have a bug in their DynamoDB-based S3 multi-cluster configuration implementations where an incorrect timestamp value was written to DynamoDB. This caused [DynamoDB’s TTL](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html) feature to cleanup completed items before it was safe to do so. This has been fixed in Delta Lake versions 2.0.1 and 2.1.1, and the TTL attribute has been renamed from `commitTime` to `expireTime`. -If you already have TTL enabled on your DynamoDB table using the old attribute, you need to disable TTL for that attribute and then enable it for the new one. You may need to wait an hour between these two operations, as TTL settings changes may take some time to propagate. See the DynamoDB docs [here](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/time-to-live-ttl-before-you-start.html). If you don’t do this, DyanmoDB’s TTL feature will not remove any new and expired entries. There is no risk of data loss. +If you already have TTL enabled on your DynamoDB table using the old attribute, you need to disable TTL for that attribute and then enable it for the new one. You may need to wait an hour between these two operations, as TTL settings changes may take some time to propagate. See the DynamoDB docs [here](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/time-to-live-ttl-before-you-start.html). If you don’t do this, DynamoDB’s TTL feature will not remove any new and expired entries. There is no risk of data loss. ```bash # Disable TTL on old attribute diff --git a/src/pages/latest/quick-start.mdx b/src/pages/latest/quick-start.mdx index 757adbb..cbfcc4d 100644 --- a/src/pages/latest/quick-start.mdx +++ b/src/pages/latest/quick-start.mdx @@ -373,7 +373,7 @@ deltaTable.toDF().show(); You should see that some of the existing rows have been updated and new rows have been inserted. -For more information on these operations, see [Table delets, updates, and merges](/latestl/delta-update). +For more information on these operations, see [Table deletes, updates, and merges](/latest/delta-update). ## Read older versions of data using time travel diff --git a/static/quickstart_docker/README.md b/static/quickstart_docker/README.md index b89f0a2..7079c1e 100644 @@ -202,7 +202,7 @@ The current version is `delta-spark_2.12:3.0.0` which corresponds to Apache Spar 1. Open a bash shell (if on windows use git bash, WSL, or any shell configured for bash commands) -2. 
Run a container from the image with a JuypterLab entrypoint +2. Run a container from the image with a JupyterLab entrypoint ```bash # Build entry point