Compatibility with PySpark 3.0.0 and NumPy #4

@browshanravan

When I run StratifiedCrossValidator instead of CrossValidator with my pipeline, I get the error below. I suspect it relates to the newer versions of PySpark and/or NumPy, since spark_stratifier installs pyspark 2.3.2 and numpy 1.15.1 as part of its installation.

Any plans for upgrading the package?

---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
<ipython-input-4-e237f44298bb> in <module>
    237 
    238 
--> 239 cvModel = crossval.fit(train)
    240 predictions = cvModel.transform(test)

~/PycharmProjects/Data School/DS_Pandas_tut/venv/lib/python3.7/site-packages/pyspark/ml/base.py in fit(self, dataset, params)
    127                 return self.copy(params)._fit(dataset)
    128             else:
--> 129                 return self._fit(dataset)
    130         else:
    131             raise ValueError("Params must be either a param map or a list/tuple of param maps, "

~/PycharmProjects/Data School/DS_Pandas_tut/venv/lib/python3.7/site-packages/spark_stratifier/stratifier.py in _fit(self, dataset)
     45     metrics = [0.0] * numModels
     46 
---> 47     stratified_data = self.stratify_data(dataset)
     48 
     49     for i in range(nFolds):

~/PycharmProjects/Data School/DS_Pandas_tut/venv/lib/python3.7/site-packages/spark_stratifier/stratifier.py in stratify_data(self, dataset)
     26     split_ratio = 1.0 / nFolds
     27 
---> 28     passes = dataset[dataset['label'] == 1]
     29     fails = dataset[dataset['label'] == 0]
     30 

~/PycharmProjects/Data School/DS_Pandas_tut/venv/lib/python3.7/site-packages/pyspark/sql/dataframe.py in __getitem__(self, item)
   1378         """
   1379         if isinstance(item, basestring):
-> 1380             jc = self._jdf.apply(item)
   1381             return Column(jc)
   1382         elif isinstance(item, Column):

~/PycharmProjects/Data School/DS_Pandas_tut/venv/lib/python3.7/site-packages/py4j/java_gateway.py in __call__(self, *args)
   1303         answer = self.gateway_client.send_command(command)
   1304         return_value = get_return_value(
-> 1305             answer, self.gateway_client, self.target_id, self.name)
   1306 
   1307         for temp_arg in temp_args:

~/PycharmProjects/Data School/DS_Pandas_tut/venv/lib/python3.7/site-packages/pyspark/sql/utils.py in deco(*a, **kw)
    135                 # Hide where the exception came from that shows a non-Pythonic
    136                 # JVM exception message.
--> 137                 raise_from(converted)
    138             else:
    139                 raise

~/PycharmProjects/Data School/DS_Pandas_tut/venv/lib/python3.7/site-packages/pyspark/sql/utils.py in raise_from(e)

AnalysisException: Cannot resolve column name "label" among (type, amount, oldbalanceOrg, newbalanceOrig, isFraud, sample_weight_per_class);
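
The last frames of the traceback suggest a cause other than the PySpark or NumPy version: stratify_data filters on dataset['label'], while the DataFrame only has the columns listed in the exception. A possible workaround, sketched below and not confirmed by the package authors, is to rename the label column to "label" before calling fit. The helper name ensure_label_column is hypothetical, and treating isFraud as the label column is an assumption based on the column names above. The columns attribute and withColumnRenamed method are standard PySpark DataFrame APIs.

```python
def ensure_label_column(df, label_col, required="label"):
    """Return `df` with `label_col` renamed to `required`.

    Hypothetical helper: works with any object that exposes `columns`
    and `withColumnRenamed`, such as a PySpark DataFrame.
    """
    if required in df.columns:
        # The expected column already exists; nothing to rename.
        return df
    if label_col not in df.columns:
        raise ValueError(
            "Column %r not found among %s" % (label_col, list(df.columns))
        )
    return df.withColumnRenamed(label_col, required)


# Assumed usage with the objects from the traceback (not run here):
#   train = ensure_label_column(train, "isFraud")
#   test = ensure_label_column(test, "isFraud")
#   cvModel = crossval.fit(train)
```

If this route is taken, the estimator and evaluator in the pipeline would presumably also need labelCol="label" so that they read the renamed column.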
