forked from apache/spark
[SPARK-29339][R] Support Arrow 0.14 in vectoried dapply and gapply (test it in AppVeyor build)

This PR proposes:

1. Use `is.data.frame` to check if it is a DataFrame.
2. Install Arrow and test Arrow optimization in the AppVeyor build. This is currently not tested in CI.

Reasons for the change:

1. To support SparkR with Arrow 0.14.
2. To check whether there is any regression and whether it works correctly.

Example:

```r
df <- createDataFrame(mtcars)
collect(dapply(df, function(rdf) { data.frame(rdf$gear + 1) }, structType("gear double")))
```

**Before:**

```
Error in readBin(con, raw(), as.integer(dataLen), endian = "big") :
  invalid 'n' argument
```

**After:**

```
  gear
1    5
2    5
3    5
4    4
5    4
6    4
7    4
8    5
9    5
...
```

Tested in AppVeyor.

Closes apache#25993 from HyukjinKwon/arrow-r-appveyor.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
1 parent 768ae42, commit 5154b7b
Showing 9 changed files with 31 additions and 39 deletions.
@@ -22,7 +22,8 @@ Suggests:
 rmarkdown,
 testthat,
 e1071,
-survival
+survival,
+arrow
 Collate:
 'schema.R'
 'generics.R'
@@ -648,13 +648,20 @@ Apache Arrow is an in-memory columnar data format that is used in Spark to effic

 ## Ensure Arrow Installed

-Currently, Arrow R library is not on CRAN yet [ARROW-3204](https://issues.apache.org/jira/browse/ARROW-3204). Therefore, it should be installed directly from Github. You can use `remotes::install_github` as below.
+Arrow R library is available on CRAN as of [ARROW-3204](https://issues.apache.org/jira/browse/ARROW-3204). It can be installed as below.

 ```bash
-Rscript -e 'remotes::install_github("apache/arrow@TAG", subdir = "r")'
+Rscript -e 'install.packages("arrow", repos="https://cloud.r-project.org/")'
 ```

-`TAG` is a version tag that can be checked in [Arrow at Github](https://github.com/apache/arrow/releases). You must ensure that Arrow R package is installed and available on all cluster nodes. The current supported version is 0.12.1.
+If you need to install old versions, it should be installed directly from Github. You can use `remotes::install_github` as below.
+
+```bash
+Rscript -e 'remotes::install_github("apache/arrow@apache-arrow-0.12.1", subdir = "r")'
+```
+
+`apache-arrow-0.12.1` is a version tag that can be checked in [Arrow at Github](https://github.com/apache/arrow/releases). You must ensure that Arrow R package is installed and available on all cluster nodes.
+The current supported minimum version is 0.12.1; however, this might change between the minor releases since Arrow optimization in SparkR is experimental.

 ## Enabling for Conversion to/from R DataFrame, `dapply` and `gapply`

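The documentation hunk above states a minimum supported Arrow version of 0.12.1 and asks that the package be available on all cluster nodes. The following is a minimal sketch of how such a check might be scripted on a node, not something this commit adds. `version_ge`, `installed_arrow_version`, and `check_arrow` are hypothetical helper names; the sketch assumes a `sort` that supports `-V` (as in GNU coreutils) and that `Rscript` is on the `PATH`.

```bash
# Minimum version taken from the documentation text; adjust if it changes.
MIN_ARROW_VERSION="0.12.1"

# Hypothetical helper (not part of Spark or Arrow):
# succeeds when version $1 is greater than or equal to version $2.
version_ge() {
  # sort -V orders version strings; the check passes when $2 sorts first.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: asks R for the installed arrow version.
# Prints nothing if Rscript or the arrow package is missing.
installed_arrow_version() {
  command -v Rscript >/dev/null 2>&1 || return 0
  Rscript -e 'cat(as.character(packageVersion("arrow")))' 2>/dev/null || true
}

# Hypothetical helper: reports whether this node meets the minimum version.
check_arrow() {
  installed="$(installed_arrow_version)"
  if [ -z "$installed" ]; then
    echo "arrow R package not found"
    return 1
  fi
  if version_ge "$installed" "$MIN_ARROW_VERSION"; then
    echo "arrow $installed meets minimum $MIN_ARROW_VERSION"
  else
    echo "arrow $installed is older than minimum $MIN_ARROW_VERSION"
    return 1
  fi
}
```

Calling `check_arrow` on each node would print one status line and return non-zero when the package is missing or too old. Version sorting is used instead of plain string comparison so that, for example, 0.9.0 is treated as older than 0.12.1.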