When I run StratifiedCrossValidator instead of CrossValidator on my pipeline, I get the following error. I suspect it is related to running a newer version of PySpark and/or NumPy, since spark_stratifier installs pyspark-2.3.2 and numpy==1.15.1 as part of its installation.
Are there any plans to upgrade the package?
---------------------------------------------------------------------------
AnalysisException Traceback (most recent call last)
<ipython-input-4-e237f44298bb> in <module>
237
238
--> 239 cvModel = crossval.fit(train)
240 predictions = cvModel.transform(test)
~/PycharmProjects/Data School/DS_Pandas_tut/venv/lib/python3.7/site-packages/pyspark/ml/base.py in fit(self, dataset, params)
127 return self.copy(params)._fit(dataset)
128 else:
--> 129 return self._fit(dataset)
130 else:
131 raise ValueError("Params must be either a param map or a list/tuple of param maps, "
~/PycharmProjects/Data School/DS_Pandas_tut/venv/lib/python3.7/site-packages/spark_stratifier/stratifier.py in _fit(self, dataset)
45 metrics = [0.0] * numModels
46
---> 47 stratified_data = self.stratify_data(dataset)
48
49 for i in range(nFolds):
~/PycharmProjects/Data School/DS_Pandas_tut/venv/lib/python3.7/site-packages/spark_stratifier/stratifier.py in stratify_data(self, dataset)
26 split_ratio = 1.0 / nFolds
27
---> 28 passes = dataset[dataset['label'] == 1]
29 fails = dataset[dataset['label'] == 0]
30
~/PycharmProjects/Data School/DS_Pandas_tut/venv/lib/python3.7/site-packages/pyspark/sql/dataframe.py in __getitem__(self, item)
1378 """
1379 if isinstance(item, basestring):
-> 1380 jc = self._jdf.apply(item)
1381 return Column(jc)
1382 elif isinstance(item, Column):
~/PycharmProjects/Data School/DS_Pandas_tut/venv/lib/python3.7/site-packages/py4j/java_gateway.py in __call__(self, *args)
1303 answer = self.gateway_client.send_command(command)
1304 return_value = get_return_value(
-> 1305 answer, self.gateway_client, self.target_id, self.name)
1306
1307 for temp_arg in temp_args:
~/PycharmProjects/Data School/DS_Pandas_tut/venv/lib/python3.7/site-packages/pyspark/sql/utils.py in deco(*a, **kw)
135 # Hide where the exception came from that shows a non-Pythonic
136 # JVM exception message.
--> 137 raise_from(converted)
138 else:
139 raise
~/PycharmProjects/Data School/DS_Pandas_tut/venv/lib/python3.7/site-packages/pyspark/sql/utils.py in raise_from(e)
AnalysisException: Cannot resolve column name "label" among (type, amount, oldbalanceOrg, newbalanceOrig, isFraud, sample_weight_per_class);
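
A note on the traceback: the frames in stratifier.py show stratify_data filtering on dataset['label'] == 1 and dataset['label'] == 0, and the dataset has no column with that name. So the error may come from the column lookup rather than from the PySpark or NumPy version. A minimal sketch of a possible workaround follows, assuming the label lives in another column (isFraud is a guess based on the column list) and already holds 1/0 values. The helper ensure_label_column is hypothetical and is not part of spark_stratifier or PySpark; it only uses df.columns and df.withColumnRenamed, which PySpark DataFrames provide.

```python
def ensure_label_column(df, source_col, target_col="label"):
    """Return `df` with `source_col` renamed to `target_col` when needed.

    Hypothetical helper, not part of spark_stratifier or PySpark.
    Works on any object exposing `columns` and `withColumnRenamed`,
    as PySpark DataFrames do.
    """
    if target_col in df.columns:
        # The column the stratifier looks up is already present.
        return df
    if source_col not in df.columns:
        raise ValueError(
            f"Column {source_col!r} not found among {list(df.columns)}"
        )
    # withColumnRenamed returns a new DataFrame; the input is not modified.
    return df.withColumnRenamed(source_col, target_col)
```

It could be applied before fitting, for example train = ensure_label_column(train, "isFraud"), and likewise for test. The estimator and evaluator in the pipeline would then also need their labelCol set to "label". Whether this is enough to make the fit succeed on a newer PySpark is untested here.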