Commit d2d5070

Hadoop configuration serialization bug (#62)

1 parent 527ab20 commit d2d5070

File tree

4 files changed: +31 -87 lines changed

README.md

Lines changed: 19 additions & 16 deletions

@@ -22,14 +22,17 @@ This library achieves two different goals:
 ## Selecting the right Version
 It is important to choose the right version depending of your Scala version.

-| osm4scala | Scala | Scalapb | Spark |
-|:--------:|:------:|:-------:|:-----:|
-| 1.0.7-RC1 | 2.11 | 0.9.7 | 2.4 |
-| 1.0.7-RC1 | 2.12 | 0.10.2 | 2.4, 3.0 |
-| 1.0.7-RC1 | 2.13 | 0.10.2 | NA |
-| 1.0.6 | 2.12 | 0.10.2 | 3.0 |
-| 1.0.6 | 2.13 | 0.10.2 | NA |
-| 1.0.3 | 2.11, 2.12, 2.13 | 0.9.7 | NA |
+| osm4scala | Scalapb | Scala | Spark |
+|:---------:|:------:|:-------:|:-----:|
+| 1.0.7 | 0.9.7 | 2.11 | 2.4 |
+| 1.0.7 | 0.10.2 | 2.12 | 2.4, 3.0 |
+| 1.0.7 | 0.10.2 | 2.13 | NA |
+
+For example,
+- If you want to import the Spark Connector for Scala 2.11 and Spark 2.4: `com.acervera.osm4scala:osm4scala-spark2-shaded_2.11:1.0.7`
+- If you want to import the Spark Connector for Scala 2.12 and Spark 2.4: `com.acervera.osm4scala:osm4scala-spark2-shaded_2.12:1.0.7`
+- If you want to import the Spark Connector for Scala 2.12 and Spark 3.0: `com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.7`
+

 ## Core library
 With Osm4scala, you can forget about complexity of the `osm.pbf` format and think about a **scala iterators of primitives**

@@ -94,7 +97,7 @@ StructType(

 1. Start the shell:
 ```shell script
-bin/spark-shell --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.6'
+bin/spark-shell --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.7'
 ```
 2. Load the data set and execute queries:
 ```scala

@@ -227,7 +230,7 @@ StructType(
 ### Examples from spark-sql
 1. Start the shell:
 ```shell script
-bin/spark-sql --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.6'
+bin/spark-sql --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.7'
 ```
 2. Load the data set and execute queries:
 ``` sql

@@ -258,7 +261,7 @@ StructType(
 ### Examples from pyspark
 1. Start the shell:
 ```shell script
-bin/pyspark --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.6'
+bin/pyspark --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.7'
 ```
 2. Load the data set and execute queries:
 ```python

@@ -291,27 +294,27 @@ The simplest way to add the library to the job, is using the shaded flat jar.
 For example:
 - Submitting a job:
 ```shell script
-bin/spark-submit --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.6' .....
+bin/spark-submit --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.7' .....
 ```

 - Using in a Spark shell:
 ```shell script
-bin/spark-shell --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.6' .....
+bin/spark-shell --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.7' .....
 ```

 - Using in a Spark SQL shell:
 ```shell script
-bin/spark-sql --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.6' .....
+bin/spark-sql --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.7' .....
 ```

 - Using in a Spark R shell:
 ```
-bin/sparkR --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.6'
+bin/sparkR --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.7'
 ```

 - Using in a PySpark shell:
 ```
-bin/pyspark --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.6'
+bin/pyspark --packages 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.7'
 ```
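The Maven coordinates used with `--packages` above translate directly into a build dependency. A minimal sketch in sbt, assuming Scala 2.12 and Spark 3.0 (pick the artifact from the version table that matches your setup):

```scala
// build.sbt (sketch) -- the artifact name encodes the Spark major
// version ("spark3") and the Scala binary version ("_2.12").
libraryDependencies += "com.acervera.osm4scala" % "osm4scala-spark3-shaded_2.12" % "1.0.7"
```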

spark/src/main/scala/com/acervera/osm4scala/spark/OsmPbfFormat.scala

Lines changed: 2 additions & 1 deletion

@@ -33,6 +33,7 @@ import com.acervera.osm4scala.spark.OsmPbfRowIterator._
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.{FSDataInputStream, FileStatus, Path}
 import org.apache.hadoop.mapreduce.Job
+import org.apache.spark.SerializableWritable
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.catalyst.InternalRow

@@ -68,7 +69,7 @@ class OsmPbfFormat extends FileFormat with DataSourceRegister with Logging {

 // TODO: OsmSqlEntity.validateSchema(requiredSchema)

-val broadcastedHadoopConf = sparkSession.sparkContext.broadcast(new SerializableConfiguration(hadoopConf))
+val broadcastedHadoopConf = sparkSession.sparkContext.broadcast(new SerializableWritable(hadoopConf))

 (file: PartitionedFile) =>
 {
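The underlying issue: Hadoop's `Configuration` is not `java.io.Serializable`, so broadcasting it to executors needs a serializable wrapper. This commit drops the project's hand-rolled `SerializableConfiguration` in favor of Spark's own `org.apache.spark.SerializableWritable`, which works here because `Configuration` implements Hadoop's `Writable` interface. A sketch of the pattern such a wrapper relies on (class name hypothetical, not the actual Spark source):

```scala
import java.io.{ObjectInputStream, ObjectOutputStream}
import org.apache.hadoop.conf.Configuration

// Hypothetical wrapper illustrating the technique: Configuration is
// not java.io.Serializable, but it implements org.apache.hadoop.io.Writable,
// so Java serialization can be delegated to write()/readFields().
class WritableConfWrapper(@transient var conf: Configuration) extends Serializable {
  private def writeObject(out: ObjectOutputStream): Unit = {
    out.defaultWriteObject()
    conf.write(out)          // serialize the conf via Writable#write
  }
  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    conf = new Configuration(false)
    conf.readFields(in)      // rebuild the conf on the executor side
  }
}
```

Reusing Spark's `SerializableWritable` also lets the 64-line local copy (`SerializableConfiguration.scala`, deleted below) be removed outright.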

spark/src/main/scala/com/acervera/osm4scala/spark/SerializableConfiguration.scala

Lines changed: 0 additions & 64 deletions
This file was deleted.

spark/src/test/scala/com/acervera/osm4scala/spark/SparkTestUtilities.scala

Lines changed: 10 additions & 6 deletions

@@ -25,6 +25,7 @@

 package com.acervera.osm4scala.spark

+import org.apache.spark.SparkConf
 import org.apache.spark.sql.{SQLContext, SparkSession}
 import org.scalatest.{BeforeAndAfterAll, Suite}

@@ -62,13 +63,16 @@ trait SparkSessionBeforeAfterAll extends BeforeAndAfterAll { this: Suite =>

 var spark: SparkSession = _

+def sparkConf(): SparkConf =
+  new SparkConf()
+    .setAppName(appName)
+    .setMaster(s"local[$cores]")
+
 override def beforeAll(): Unit = {
-  spark =
-    SparkSession
-      .builder()
-      .appName(appName)
-      .master(s"local[$cores]")
-      .getOrCreate()
+  spark = SparkSession
+    .builder()
+    .config(sparkConf())
+    .getOrCreate()

   super.beforeAll()
 }
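The test-utility refactor extracts session configuration into an overridable `sparkConf()` hook. A hedged usage sketch, assuming the trait supplies `appName` and `cores` defaults (suite name and ScalaTest style are illustrative assumptions):

```scala
import org.apache.spark.SparkConf
import org.scalatest.funsuite.AnyFunSuite

// Hypothetical suite: override the new sparkConf() hook to adjust
// settings before the shared SparkSession is built in beforeAll().
class SmallShuffleSpec extends AnyFunSuite with SparkSessionBeforeAfterAll {
  override def sparkConf(): SparkConf =
    super.sparkConf().set("spark.sql.shuffle.partitions", "2")

  test("session uses the overridden setting") {
    assert(spark.conf.get("spark.sql.shuffle.partitions") == "2")
  }
}
```

Building the session from a `SparkConf` rather than hard-coded `appName`/`master` calls is what makes this per-suite customization possible.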

0 commit comments