a benchmark to test scalability of xgboost4j-spark and relevant projects
You have to ensure that maven (3.0+) and cmake is installed in your $PATH
- 
Edit build/build.sh and define variables like TARGET_URL, TARGET_BRANCH 
- 
run build/build.sh 
- 
You get the benchmark jar in target/ 
- Generate Data:
spark-submit --master yarn-cluster --num-executors 10 --executor-memory 6g --executor-cores 8 \
    --class me.codingcat.xgboost4j.AirlineDataGenerator --files conf/airline_datagen.conf \
     target/scala-2.11/xgboost4j-spark-scalability-assembly-0.1-SNAPSHOT.jar ./airline_datagen.conf- Run workload:
spark-submit --master yarn-cluster --num-executors 10 --executor-memory 6g --executor-cores 8 \
    --class me.codingcat.xgboost4j.AirlineClassifier --files conf/airline.conf \
     target/scala-2.11/xgboost4j-spark-scalability-assembly-0.1-SNAPSHOT.jar ./airline.conf