Merge pull request #101 from queryverse/remove-xlsx-support

davidanthoff · web-flow · commit 3679b66dc599 · 2023-07-15T13:14:55.000-07:00
Remove support for XLSX files
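
Migration note for downstream callers: after this change, `.xlsx` paths that used to work with `openxl`, `readxl`, or `readxlsheet` will fail somewhere in the Python layer. The sketch below shows one way a caller could reject such paths up front with a clearer message. `require_legacy_xls` is a hypothetical helper, not part of ExcelReaders.jl, and it assumes the file extension reliably indicates the format.

```julia
# Hypothetical helper (not part of ExcelReaders.jl): validate the file
# extension before handing the path to the reader.
function require_legacy_xls(path::AbstractString)
    # splitext returns (root, extension); the extension includes the dot.
    ext = lowercase(splitext(path)[2])
    if ext in (".xlsx", ".xlsm")
        throw(ArgumentError(
            "$(basename(path)): modern Excel files are no longer supported " *
            "by ExcelReaders.jl; consider XLSX.jl instead"))
    elseif ext != ".xls"
        throw(ArgumentError("$(basename(path)): expected a legacy .xls file"))
    end
    return path
end

# Accepts legacy files, regardless of extension case.
checked = require_legacy_xls("Filename.xls")
checked_upper = require_legacy_xls("DATA.XLS")

# Rejects modern workbooks with an ArgumentError.
rejected = try
    require_legacy_xls("Filename.xlsx")
    false
catch err
    err isa ArgumentError
end
```

A caller would then write `openxl(require_legacy_xls(path))`. Checking the extension is only a heuristic; a renamed file would still reach the reader and fail there.
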
diff --git a/NEWS.md b/NEWS.md
@@ -1,6 +1,7 @@
-# ExcelReaders.jl v1.0.0 Release Notes
+# ExcelReaders.jl v0.12.0 Release Notes
 * Drop julia 0.7 support
 * Migrate to Project.toml
+* Drop support for modern Excel (xlsx) files; this package now only supports legacy xls files
 
 # ExcelReaders.jl v0.11.0 Release Notes
 * Update to PyCall.jl 1.90.0
diff --git a/Project.toml b/Project.toml
@@ -6,14 +6,17 @@ version = "1.0.0-DEV"
 DataValues = "e7dc6d0d-1eca-5fa6-8ad6-5aecde8b7ea5"
 Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
 PyCall = "438e738f-606a-5dbb-bf0a-cddfbfd45ab0"
+Conda = "8f4d0f93-b110-5947-807f-2305c1781a2d"
 
 [extras]
 Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
+TestItemRunner = "f8b46487-2199-4994-9208-9a1283c18c0a"
 
 [targets]
-test = ["Test"]
+test = ["Test", "TestItemRunner"]
 
 [compat]
 DataValues = "0.4.4"
 PyCall = "1.90"
+Conda = "1 - 1.8.0"
 julia = "1.6"
diff --git a/README.md b/README.md
@@ -7,13 +7,9 @@
 
 ExcelReaders is a package that provides functionality to read Excel files.
 
-**WARNING**: Version v0.9.0 removed all support for [DataFrames.jl](https://github.com/JuliaData/DataFrames.jl)
-from this package. The [ExcelFiles.jl](https://github.com/queryverse/ExcelFiles.jl)
-package now provides functionality to read data from an Excel file into
-a ``DataFrame`` (or any other table type), and users are encouraged to use
-that package for tabular data going forward. Version v0.9.0 also no longer
-uses [DataArrays.jl](https://github.com/JuliaStats/DataArrays.jl), but instead
-is based on [DataValues.jl](https://github.com/queryverse/DataValues.jl).
+**WARNING**: Version v0.12 removed support for modern Excel files. This package now supports _only_ legacy xls files. The reason is that the underlying Python package, xlrd, made the same move a couple of years ago.
+
+The [XLSX.jl](https://github.com/felipenoris/XLSX.jl) package provides excellent support for modern Excel files.
 
 ## Installation
 
@@ -23,6 +19,8 @@ The package uses the Python xlrd library. If either Python or the xlrd package a
 
 ## Alternatives
 
+The [XLSX.jl](https://github.com/felipenoris/XLSX.jl) package provides excellent support for modern Excel files.
+
 The [Taro](https://github.com/aviks/Taro.jl) package also provides Excel file reading functionality. The main difference between the two packages (in terms of Excel functionality) is that ExcelReaders uses the Python package [xlrd](https://github.com/python-excel/xlrd) for its processing, whereas Taro uses the Java packages Apache [Tika](http://tika.apache.org/) and Apache [POI](http://poi.apache.org/).
 
 ## Basic usage
@@ -32,17 +30,17 @@ The most basic usage is this:
 ````julia
 using ExcelReaders
 
-data = readxl("Filename.xlsx", "Sheet1!A1:C4")
+data = readxl("Filename.xls", "Sheet1!A1:C4")
 ````
 
-This will return an array with all the data in the cell range A1 to C4 on Sheet1 in the Excel file Filename.xlsx.
+This will return an array with all the data in the cell range A1 to C4 on Sheet1 in the Excel file Filename.xls.
 
 If you expect to read multiple ranges from the same Excel file you can get much better performance by opening the Excel file only once:
 
 ````julia
 using ExcelReaders
 
-f = openxl("Filename.xlsx")
+f = openxl("Filename.xls")
 
 data1 = readxl(f, "Sheet1!A1:C4")
 data2 = readxl(f, "Sheet2!B4:F10")
@@ -55,10 +53,10 @@ The ``readxlsheet`` function reads complete Excel sheets, without a need to spec
 ````julia
 using ExcelReaders
 
-data = readxlsheet("Filename.xlsx", "Sheet1")
+data = readxlsheet("Filename.xls", "Sheet1")
 ````
 
-This will read all content on Sheet1 in the file Filename.xlsx. Eventual blank rows and columns at the top and left are skipped. ``readxlsheet`` takes a number of optional keyword arguments:
+This will read all content on Sheet1 in the file Filename.xls. Any blank rows and columns at the top and left are skipped. ``readxlsheet`` takes a number of optional keyword arguments:
 
 - ``skipstartrows`` accepts either ``:blanks`` (default) or a positive integer. With ``:blanks`` any empty initial rows are skipped. An integer skips as many rows as specified.
 - ``skipstartcols`` accepts either ``:blanks`` (default) or a positive integer. With ``:blanks`` any empty initial columns are skipped. An integer skips as many columns as specified.
diff --git a/src/ExcelReaders.jl b/src/ExcelReaders.jl
@@ -56,7 +56,7 @@ file only once with ``openxl``.
 
 # Example
 ````julia
-f = openxl("filename.xlsx")
+f = openxl("filename.xls")
 data = readxl(f, "Sheet1!A1:C4")
 ````
 """
diff --git a/src/package_documentation.jl b/src/package_documentation.jl
@@ -30,18 +30,18 @@ The most basic usage is this:
 
 ````julia
 using ExcelReaders
-data = readxl("Filename.xlsx", "Sheet1!A1:C4")
+data = readxl("Filename.xls", "Sheet1!A1:C4")
 ````
 
 This will return an array with all the data in the cell range A1 to
-C4 on Sheet1 in the Excel file Filename.xlsx.
+C4 on Sheet1 in the Excel file Filename.xls.
 
 If you expect to read multiple ranges from the same Excel file you can get much
 better performance by opening the Excel file only once:
 
 ````julia
 using ExcelReaders
-f = openxl("Filename.xlsx")
+f = openxl("Filename.xls")
 data1 = readxl(f, "Sheet1!A1:C4")
 data2 = readxl(f, "Sheet2!B4:F10")
 ````
@@ -53,10 +53,10 @@ specify precise range information. The most basic usage is
 
 ````julia
 using ExcelReaders
-data = readxlsheet("Filename.xlsx", "Sheet1")
+data = readxlsheet("Filename.xls", "Sheet1")
 ````
 
-This will read all content on Sheet1 in the file Filename.xlsx. Eventual blank
+This will read all content on Sheet1 in the file Filename.xls. Any blank
 rows and columns at the top and left are skipped. ``readxlsheet`` takes a number
 of optional keyword arguments:
 
diff --git a/test/TestData.xls b/test/TestData.xls
diff --git a/test/TestData.xlsx b/test/TestData.xlsx
diff --git a/test/runtests.jl b/test/runtests.jl
@@ -1,121 +1,3 @@
-using ExcelReaders
-using Dates
-using PyCall
-using DataValues
-using Test
+using TestItemRunner
 
-@testset "ExcelReaders" begin
-
-# TODO Throw julia specific exceptions for these errors
-    @test_throws PyCall.PyError openxl("FileThatDoesNotExist.xlsx")
-    @test_throws PyCall.PyError openxl("runtests.jl")
-
-    filename = normpath(@__DIR__, "TestData.xlsx")
-    file = openxl(filename)
-    @test file.filename == "TestData.xlsx"
-
-    buffer = IOBuffer()
-    show(buffer, file)
-    @test String(take!(buffer)) == "ExcelFile <TestData.xlsx>"
-
-    for (k, v) in Dict(0 => "#NULL!", 7 => "#DIV/0!", 23 => "#REF!", 42 => "#N/A", 29 => "#NAME?", 36 => "#NUM!", 15 => "#VALUE!")
-        errorcell = ExcelErrorCell(k)
-        buffer = IOBuffer()
-        show(buffer, errorcell)
-        @test String(take!(buffer)) == v
-    end
-
-# Read into DataValueArray
-    for f in [file, filename]
-        @test_throws ErrorException readxl(f, "Sheet1!C4:G3")
-        @test_throws ErrorException readxl(f, "Sheet1!G2:B5")
-        @test_throws ErrorException readxl(f, "Sheet1!G5:B2")
-
-        data = readxl(f, "Sheet1!C3:N7")
-        @test size(data) == (5, 12)
-        @test data[4,1] == 2.0
-        @test data[2,2] == "A"
-        @test data[2,3] == true
-        @test DataValues.isna(data[4,5])
-        @test data[2,9] == Date(2015, 3, 3)
-        @test data[3,9] == DateTime(2015, 2, 4, 10, 14)
-        @test data[4,9] == DateTime(1988, 4, 9, 0, 0)
-        @test data[5,9] == Time(15, 2, 0)
-        @test data[3,10] == DateTime(1950, 8, 9, 18, 40)
-        @test DataValues.isna(data[5,10])
-        @test isa(data[2,11], ExcelErrorCell)
-        @test isa(data[3,11], ExcelErrorCell)
-        @test isa(data[4,12], ExcelErrorCell)
-        @test DataValues.isna(data[5,12])
-
-    # Test readxlsheet function
-        @test_throws ErrorException readxlsheet(f, "Empty Sheet")
-        for sheetinfo = ["Second Sheet", 2]
-            @test_throws ErrorException readxlsheet(f, sheetinfo, skipstartrows = -1)
-            @test_throws ErrorException readxlsheet(f, sheetinfo, skipstartrows = :nonsense)
-
-            @test_throws ErrorException readxlsheet(f, sheetinfo, skipstartcols = -1)
-            @test_throws ErrorException readxlsheet(f, sheetinfo, skipstartcols = :nonsense)
-
-            @test_throws ErrorException readxlsheet(f, sheetinfo, nrows = -1)
-            @test_throws ErrorException readxlsheet(f, sheetinfo, nrows = :nonsense)
-
-            @test_throws ErrorException readxlsheet(f, sheetinfo, ncols = -1)
-            @test_throws ErrorException readxlsheet(f, sheetinfo, ncols = :nonsense)
-
-            data = readxlsheet(f, sheetinfo)
-            @test size(data) == (6, 6)
-            @test data[2,1] == 1.
-            @test data[5,2] == "CCC"
-            @test data[3,3] == false
-            @test data[6,6] == Time(15, 2, 00)
-            @test DataValues.isna(data[4,3])
-            @test DataValues.isna(data[4,6])
-
-            data = readxlsheet(f, sheetinfo, skipstartrows = :blanks, skipstartcols = :blanks)
-            @test size(data) == (6, 6)
-            @test data[2,1] == 1.
-            @test data[5,2] == "CCC"
-            @test data[3,3] == false
-            @test data[6,6] == Time(15, 2, 00)
-            @test DataValues.isna(data[4,3])
-            @test DataValues.isna(data[4,6])
-
-            data = readxlsheet(f, sheetinfo, skipstartrows = 0, skipstartcols = 0)
-            @test size(data) == (6 + 7, 6 + 3)
-            @test data[2 + 7,1 + 3] == 1.
-            @test data[5 + 7,2 + 3] == "CCC"
-            @test data[3 + 7,3 + 3] == false
-            @test data[6 + 7,6 + 3] == Time(15, 2, 00)
-            @test DataValues.isna(data[4 + 7,3 + 3])
-            @test DataValues.isna(data[4 + 7,6 + 3])
-
-            data = readxlsheet(f, sheetinfo, skipstartrows = 0, )
-            @test size(data) == (6 + 7, 6)
-            @test data[2 + 7,1] == 1.
-            @test data[5 + 7,2] == "CCC"
-            @test data[3 + 7,3] == false
-            @test data[6 + 7,6] == Time(15, 2, 00)
-            @test DataValues.isna(data[4 + 7,3])
-            @test DataValues.isna(data[4 + 7,6])
-
-            data = readxlsheet(f, sheetinfo, skipstartcols = 0)
-            @test size(data) == (6, 6 + 3)
-            @test data[2,1 + 3] == 1.
-            @test data[5,2 + 3] == "CCC"
-            @test data[3,3 + 3] == false
-            @test data[6,6 + 3] == Time(15, 2, 00)
-            @test DataValues.isna(data[4,3 + 3])
-            @test DataValues.isna(data[4,6 + 3])
-
-            data = readxlsheet(f, sheetinfo, skipstartrows = 1, skipstartcols = 1, nrows = 11, ncols = 7)
-            @test size(data) == (11, 7)
-            @test data[2 + 6,1 + 2] == 1.
-            @test data[5 + 6,2 + 2] == "CCC"
-            @test data[3 + 6,3 + 2] == false
-            @test_throws BoundsError data[6 + 6,6 + 2] == Time(15, 2, 00)
-            @test DataValues.isna(data[4 + 6,2 + 2])
-        end
-    end
-
-end
+@run_package_tests
diff --git a/test/test_excelreaders.jl b/test/test_excelreaders.jl
@@ -0,0 +1,114 @@
+@testitem "ExcelReaders" begin
+    using Dates, PyCall, DataValues
+
+# TODO Throw Julia-specific exceptions for these errors
+    @test_throws PyCall.PyError openxl("FileThatDoesNotExist.xls")
+    @test_throws PyCall.PyError openxl("runtests.jl")
+
+    filename = normpath(@__DIR__, "TestData.xls")
+    file = openxl(filename)
+    @test file.filename == "TestData.xls"
+
+    # `sprint(show, x)` returns the text that `show` writes, so no explicit
+    # IOBuffer is needed here.
+    @test sprint(show, file) == "ExcelFile <TestData.xls>"
+
+    for (k, v) in Dict(0 => "#NULL!", 7 => "#DIV/0!", 23 => "#REF!", 42 => "#N/A", 29 => "#NAME?", 36 => "#NUM!", 15 => "#VALUE!")
+        errorcell = ExcelErrorCell(k)
+        @test sprint(show, errorcell) == v
+    end
+
+# Read into DataValueArray
+    for f in [file, filename]
+        @test_throws ErrorException readxl(f, "Sheet1!C4:G3")
+        @test_throws ErrorException readxl(f, "Sheet1!G2:B5")
+        @test_throws ErrorException readxl(f, "Sheet1!G5:B2")
+
+        data = readxl(f, "Sheet1!C3:N7")
+        @test size(data) == (5, 12)
+        @test data[4,1] == 2.0
+        @test data[2,2] == "A"
+        @test data[2,3] == true
+        @test DataValues.isna(data[4,5])
+        @test data[2,9] == Date(2015, 3, 3)
+        @test data[3,9] == DateTime(2015, 2, 4, 10, 14)
+        @test data[4,9] == DateTime(1988, 4, 9, 0, 0)
+        @test data[5,9] == Time(15, 2, 0)
+        @test data[3,10] == DateTime(1950, 8, 9, 18, 40)
+        @test DataValues.isna(data[5,10])
+        @test isa(data[2,11], ExcelErrorCell)
+        @test isa(data[3,11], ExcelErrorCell)
+        @test isa(data[4,12], ExcelErrorCell)
+        @test DataValues.isna(data[5,12])
+
+    # Test readxlsheet function
+        @test_throws ErrorException readxlsheet(f, "Empty Sheet")
+        for sheetinfo = ["Second Sheet", 2]
+            @test_throws ErrorException readxlsheet(f, sheetinfo, skipstartrows = -1)
+            @test_throws ErrorException readxlsheet(f, sheetinfo, skipstartrows = :nonsense)
+
+            @test_throws ErrorException readxlsheet(f, sheetinfo, skipstartcols = -1)
+            @test_throws ErrorException readxlsheet(f, sheetinfo, skipstartcols = :nonsense)
+
+            @test_throws ErrorException readxlsheet(f, sheetinfo, nrows = -1)
+            @test_throws ErrorException readxlsheet(f, sheetinfo, nrows = :nonsense)
+
+            @test_throws ErrorException readxlsheet(f, sheetinfo, ncols = -1)
+            @test_throws ErrorException readxlsheet(f, sheetinfo, ncols = :nonsense)
+
+            data = readxlsheet(f, sheetinfo)
+            @test size(data) == (6, 6)
+            @test data[2,1] == 1.
+            @test data[5,2] == "CCC"
+            @test data[3,3] == false
+            @test data[6,6] == Time(15, 2, 00)
+            @test DataValues.isna(data[4,3])
+            @test DataValues.isna(data[4,6])
+
+            data = readxlsheet(f, sheetinfo, skipstartrows = :blanks, skipstartcols = :blanks)
+            @test size(data) == (6, 6)
+            @test data[2,1] == 1.
+            @test data[5,2] == "CCC"
+            @test data[3,3] == false
+            @test data[6,6] == Time(15, 2, 00)
+            @test DataValues.isna(data[4,3])
+            @test DataValues.isna(data[4,6])
+
+            data = readxlsheet(f, sheetinfo, skipstartrows = 0, skipstartcols = 0)
+            @test size(data) == (6 + 7, 6 + 3)
+            @test data[2 + 7,1 + 3] == 1.
+            @test data[5 + 7,2 + 3] == "CCC"
+            @test data[3 + 7,3 + 3] == false
+            @test data[6 + 7,6 + 3] == Time(15, 2, 00)
+            @test DataValues.isna(data[4 + 7,3 + 3])
+            @test DataValues.isna(data[4 + 7,6 + 3])
+
+            data = readxlsheet(f, sheetinfo, skipstartrows = 0)
+            @test size(data) == (6 + 7, 6)
+            @test data[2 + 7,1] == 1.
+            @test data[5 + 7,2] == "CCC"
+            @test data[3 + 7,3] == false
+            @test data[6 + 7,6] == Time(15, 2, 00)
+            @test DataValues.isna(data[4 + 7,3])
+            @test DataValues.isna(data[4 + 7,6])
+
+            data = readxlsheet(f, sheetinfo, skipstartcols = 0)
+            @test size(data) == (6, 6 + 3)
+            @test data[2,1 + 3] == 1.
+            @test data[5,2 + 3] == "CCC"
+            @test data[3,3 + 3] == false
+            @test data[6,6 + 3] == Time(15, 2, 00)
+            @test DataValues.isna(data[4,3 + 3])
+            @test DataValues.isna(data[4,6 + 3])
+
+            data = readxlsheet(f, sheetinfo, skipstartrows = 1, skipstartcols = 1, nrows = 11, ncols = 7)
+            @test size(data) == (11, 7)
+            @test data[2 + 6,1 + 2] == 1.
+            @test data[5 + 6,2 + 2] == "CCC"
+            @test data[3 + 6,3 + 2] == false
+            @test_throws BoundsError data[6 + 6,6 + 2] == Time(15, 2, 00)
+            @test DataValues.isna(data[4 + 6,2 + 2])
+        end
+    end
+
+end
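
Review note on the test rewrite: the new test file replaces the `IOBuffer` / `show` / `take!` sequence from the old `runtests.jl` with `sprint(show, ...)`. The sketch below illustrates that the two forms produce the same string. `DemoFile` is a stand-in type invented for this illustration; the real tests exercise `ExcelFile` and `ExcelErrorCell`.

```julia
# Stand-in type for illustration only (not from ExcelReaders.jl).
struct DemoFile
    filename::String
end

# A custom text representation, similar in shape to "ExcelFile <TestData.xls>".
Base.show(io::IO, f::DemoFile) = print(io, "DemoFile <", f.filename, ">")

f = DemoFile("TestData.xls")

# Old pattern: write into an explicit buffer, then drain it into a String.
buffer = IOBuffer()
show(buffer, f)
old_style = String(take!(buffer))

# New pattern: sprint creates the buffer, calls show(io, f), and returns the String.
new_style = sprint(show, f)

@assert old_style == new_style
```

Since `sprint` manages the buffer internally, the assertions in the new test file need no shared `buffer` variable.
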