BiocPy · jkanche · Apr 17, 2025 · Apr 14, 2025 · Apr 17, 2025
diff --git a/README.md b/README.md
@@ -1,11 +1,11 @@
 [![PyPI-Server](https://img.shields.io/pypi/v/compressed-lists.svg)](https://pypi.org/project/compressed-lists/)
-![Unit tests](https://github.com/BiocPy/compressed-lists/actions/workflows/pypi-test.yml/badge.svg)
+![Unit tests](https://github.com/BiocPy/compressed-lists/actions/workflows/run-tests.yml/badge.svg)
 
-# compressed-lists
+# CompressedList Implementation in Python
 
-> Add a short description here!
+A Python implementation of the `CompressedList` class from R/Bioconductor for memory-efficient list-like objects.
 
-A longer description of your project goes here...
+`CompressedList` is a memory-efficient container for list-like objects. Instead of storing each list element separately, it concatenates all elements into a single vector-like object and records where each original element begins and ends. This is significantly more memory-efficient than a standard list of lists, especially when there are many elements, because it avoids allocating a separate container for each one.
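+
+As a rough illustration of the idea (plain Python only, not the `compressed_lists` API), the sketch below flattens a nested list and records cumulative end offsets. Any element can then be recovered by slicing the flat storage. The names `flat`, `ends`, `starts`, and `get_element` are hypothetical.
+
+```python
+from itertools import accumulate
+
+elements = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
+
+# Concatenate all elements into one flat sequence.
+flat = [x for element in elements for x in element]
+
+# Record where each element ends (cumulative lengths); starts follow from the ends.
+ends = list(accumulate(len(element) for element in elements))
+starts = [0] + ends[:-1]
+
+
+def get_element(i):
+    """Recover element i by slicing the flat storage between its start and end."""
+    return flat[starts[i]:ends[i]]
+
+
+print(ends)            # [3, 5, 9]
+print(get_element(1))  # [4, 5]
+```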
 
 ## Install
 
@@ -15,6 +15,54 @@ To get started, install the package from [PyPI](https://pypi.org/project/compres
 pip install compressed-lists
 ```
 
+## Usage
+
+```python
+from compressed_lists import CompressedIntegerList, CompressedStringList
+
+# Create a CompressedIntegerList
+int_data = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
+names = ["A", "B", "C"]
+int_list = CompressedIntegerList.from_list(int_data, names)
+
+# Access elements
+print(int_list[0])      # [1, 2, 3]
+print(int_list["B"])    # [4, 5]
+print(int_list[1:3])    # Slice of elements
+
+# Apply a function to each element
+squared = int_list.lapply(lambda x: [i**2 for i in x])
+print(squared[0])       # [1, 4, 9]
+
+# Convert to a regular Python list
+regular_list = int_list.to_list()
+
+# Create a CompressedStringList
+char_data = [["apple", "banana"], ["cherry", "date", "elderberry"], ["fig"]]
+char_list = CompressedStringList.from_list(char_data)
+```
+
+### Partitioning
+
+The `Partitioning` class handles the information about where each element begins and ends in the concatenated data. It allows for efficient extraction of elements without storing each element separately.
+
+```python
+from compressed_lists import Partitioning
+
+# Create partitioning from end positions
+ends = [3, 5, 10]
+names = ["A", "B", "C"]
+part = Partitioning(ends, names)
+
+# Get partition range for an element
+start, end = part[1]  # Returns (3, 5)
+```
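+
+To make the `part[1]` result above concrete, here is a simplified, hypothetical stand-in (`PartitioningSketch`, not the library's `Partitioning` class) that derives a `(start, end)` pair from the end positions alone and also accepts an element name. The real class may store and look up this information differently.
+
+```python
+from typing import List, Optional, Tuple, Union
+
+
+class PartitioningSketch:
+    """Hypothetical stand-in for Partitioning: stores only end positions and optional names."""
+
+    def __init__(self, ends: List[int], names: Optional[List[str]] = None):
+        self.ends = list(ends)
+        self.names = list(names) if names is not None else None
+
+    def __getitem__(self, key: Union[int, str]) -> Tuple[int, int]:
+        # Resolve a name to its positional index first.
+        if isinstance(key, str):
+            if self.names is None or key not in self.names:
+                raise KeyError(key)
+            key = self.names.index(key)
+        # Support negative indices the same way Python lists do.
+        if key < 0:
+            key += len(self.ends)
+        # An element starts where the previous one ends (the first starts at 0).
+        start = self.ends[key - 1] if key > 0 else 0
+        return start, self.ends[key]
+
+
+part = PartitioningSketch([3, 5, 10], ["A", "B", "C"])
+print(part[1])    # (3, 5)
+print(part["C"])  # (5, 10)
+```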
+
+> [!NOTE]
+>
+> Check out the [documentation](https://biocpy.github.io/compressed-lists) for extending CompressedLists to custom data types.
+
 <!-- biocsetup-notes -->
 
 ## Note

diff --git a/docs/conf.py b/docs/conf.py
@@ -299,6 +299,7 @@
     "scipy": ("https://docs.scipy.org/doc/scipy/reference", None),
     "setuptools": ("https://setuptools.pypa.io/en/stable/", None),
     "pyscaffold": ("https://pyscaffold.org/en/stable", None),
+    "biocutils": ("https://biocpy.github.io/BiocUtils", None),
 }
 
 print(f"loading configurations for {project} {version} ...", file=sys.stderr)

diff --git a/docs/index.md b/docs/index.md
@@ -1,17 +1,16 @@
 # compressed-lists
 
-Add a short description here!
+A Python implementation of the `CompressedList` class from R/Bioconductor for memory-efficient list-like objects.
 
+`CompressedList` is a memory-efficient container for list-like objects. Instead of storing each list element separately, it concatenates all elements into a single vector-like object and records where each original element begins and ends. This is significantly more memory-efficient than a standard list of lists, especially when there are many elements, because it avoids allocating a separate container for each one.
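+
+The same layout can be sketched with a NumPy array as the single vector-like object. This is an illustration under that assumption, not the package's internal code: `np.cumsum` turns element lengths into end positions, and `np.split` rebuilds every element from the flat array in one call.
+
+```python
+import numpy as np
+
+elements = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
+
+# One contiguous array instead of one Python list per element.
+flat = np.concatenate([np.asarray(e, dtype=np.int64) for e in elements])
+
+# End position of each element = running total of element lengths.
+ends = np.cumsum([len(e) for e in elements])
+
+# Split points are every end except the last; this rebuilds all elements.
+recovered = np.split(flat, ends[:-1])
+print(ends.tolist())                    # [3, 5, 9]
+print([r.tolist() for r in recovered])  # [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
+```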
 
-## Note
+## Install
 
-> This is the main page of your project's [Sphinx] documentation. It is
-> formatted in [Markdown]. Add additional pages by creating md-files in
-> `docs` or rst-files (formatted in [reStructuredText]) and adding links to
-> them in the `Contents` section below.
->
-> Please check [Sphinx] and [MyST] for more information
-> about how to document your project and how to configure your preferences.
+To get started, install the package from [PyPI](https://pypi.org/project/compressed-lists/)
+
+```bash
+pip install compressed-lists
+```
 
 
 ## Contents
@@ -20,6 +19,7 @@ Add a short description here!
 :maxdepth: 2
 
 Overview <readme>
+Tutorial <tutorial>
 Contributions & Help <contributing>
 License <license>
 Authors <authors>

diff --git a/docs/tutorial.md b/docs/tutorial.md
@@ -0,0 +1,189 @@
+---
+file_format: mystnb
+kernelspec:
+  name: python
+---
+
+# Basic Usage
+
+```{code-cell}
+from compressed_lists import CompressedIntegerList, CompressedStringList
+
+# Create a CompressedIntegerList
+int_data = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
+names = ["A", "B", "C"]
+int_list = CompressedIntegerList.from_list(int_data, names)
+
+# Access elements
+print(int_list[0])      # [1, 2, 3]
+print(int_list["B"])    # [4, 5]
+print(int_list[1:3])    # Slice of elements
+
+# Apply a function to each element
+squared = int_list.lapply(lambda x: [i**2 for i in x])
+print(squared[0])       # [1, 4, 9]
+
+# Convert to a regular Python list
+regular_list = int_list.to_list()
+
+# Create a CompressedStringList
+char_data = [["apple", "banana"], ["cherry", "date", "elderberry"], ["fig"]]
+char_list = CompressedStringList.from_list(char_data)
+```
+
+## Partitioning
+
+The `Partitioning` class handles the information about where each element begins and ends in the concatenated data. It allows for efficient extraction of elements without storing each element separately.
+
+```{code-cell}
+from compressed_lists import Partitioning
+
+# Create partitioning from end positions
+ends = [3, 5, 10]
+names = ["A", "B", "C"]
+part = Partitioning(ends, names)
+
+# Get partition range for an element
+start, end = part[1]
+print(start, end)
+```
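+
+Going the other way, from a position in the concatenated data back to the element that owns it, only needs the sorted end positions. A minimal NumPy sketch (an illustration, not necessarily how `Partitioning` implements it) uses `np.searchsorted` with `side="right"`, assuming the half-open `[start, end)` ranges implied by `part[1]` covering positions 3 and 4:
+
+```python
+import numpy as np
+
+ends = np.array([3, 5, 10])
+names = ["A", "B", "C"]
+
+# Positions 0-2 belong to element 0, 3-4 to element 1, 5-9 to element 2.
+positions = np.array([0, 2, 3, 4, 5, 9])
+
+# side="right" sends a position equal to an end value to the *next* element,
+# which matches half-open [start, end) ranges.
+owners = np.searchsorted(ends, positions, side="right")
+print(owners.tolist())             # [0, 0, 1, 1, 2, 2]
+print([names[i] for i in owners])  # ['A', 'A', 'B', 'B', 'C', 'C']
+```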
+
+# Creating Custom CompressedList Subclasses
+
+`CompressedList` can easily be extended to support custom data types. Here's a step-by-step guide to creating your own `CompressedList` subclass:
+
+## 1. Subclass CompressedList
+
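+Create a new class that inherits from `CompressedList`. The imports and the placeholder type variable `T` defined here are reused by the snippets in the following steps:
+
+```python
+from typing import Any, List, TypeVar
+
+import numpy as np
+from compressed_lists import CompressedList, Partitioning
+
+# Placeholder element type used in the annotations of the later snippets
+T = TypeVar("T")
+
+
+class CustomCompressedList(CompressedList):
+    """A custom CompressedList for your data type."""
+    pass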
+```
+
+## 2. Implement the Constructor
+
+The constructor should initialize the superclass with the appropriate data:
+
+```python
+def __init__(self, 
+             unlist_data: Any,  # Replace with your data type 
+             partitioning: Partitioning,
+             element_metadata: dict = None,
+             metadata: dict = None):
+    super().__init__(unlist_data, partitioning, 
+                    element_type="custom_type",  # Set your element type
+                    element_metadata=element_metadata,
+                    metadata=metadata)
+```
+
+## 3. Implement _extract_range Method
+
+This method defines how to extract a range of elements from your unlisted data:
+
+```python
+def _extract_range(self, start: int, end: int) -> List[T]:
+    """Extract a range from unlisted data."""
+    # For example, with numpy arrays:
+    return self.unlist_data[start:end].tolist()
+
+    # Or for other data types:
+    # return self.unlist_data[start:end]
+```
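+
+To see why only `_extract_range` has to be type-specific, here is a deliberately tiny, hypothetical base class (`ToyCompressedList` and `ToyIntList` are illustrative names, not part of `compressed_lists`) in which generic indexing looks up the `(start, end)` range and delegates the actual slicing to the subclass. The real `CompressedList` may be organized differently.
+
+```python
+from typing import Any, List
+
+
+class ToyCompressedList:
+    """Hypothetical base class: generic indexing, type-specific extraction."""
+
+    def __init__(self, unlist_data: Any, ends: List[int]):
+        self.unlist_data = unlist_data
+        self.ends = ends
+
+    def _extract_range(self, start: int, end: int) -> List[Any]:
+        raise NotImplementedError("subclasses decide how to slice their storage")
+
+    def __getitem__(self, i: int) -> List[Any]:
+        # Look up the half-open [start, end) range for element i (non-negative i only in this toy).
+        start = self.ends[i - 1] if i > 0 else 0
+        return self._extract_range(start, self.ends[i])
+
+
+class ToyIntList(ToyCompressedList):
+    def _extract_range(self, start: int, end: int) -> List[int]:
+        # Storage here is a plain list, so slicing already yields a list.
+        return self.unlist_data[start:end]
+
+
+toy = ToyIntList([1, 2, 3, 4, 5], ends=[3, 5])
+print(toy[1])  # [4, 5]
+```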
+
+## 4. Implement from_list Class Method
+
+This factory method creates a new instance from a list:
+
+```python
+@classmethod
+def from_list(cls, lst: List[List[T]], names: list = None, 
+             metadata: dict = None) -> 'CustomCompressedList':
+    """Create a new CustomCompressedList from a list."""
+    # Flatten the list
+    flat_data = []
+    for sublist in lst:
+        flat_data.extend(sublist)
+
+    # Create partitioning
+    partitioning = Partitioning.from_list(lst, names)
+
+    # Create unlisted data in your preferred format
+    # For example, with numpy:
+    unlist_data = np.array(flat_data, dtype=np.float64)
+
+    return cls(unlist_data, partitioning, metadata=metadata)
+```
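+
+For numeric data, the flattening loop can also be written as a single pass with `itertools.chain.from_iterable` and `np.fromiter`, which fills the array without building an intermediate Python list. This is an optional alternative shown as a standalone snippet; `flatten_to_array` is a hypothetical helper name.
+
+```python
+from itertools import chain
+from typing import List
+
+import numpy as np
+
+
+def flatten_to_array(lst: List[List[float]]) -> np.ndarray:
+    """Flatten a list of lists into one float64 array in a single pass."""
+    total = sum(len(sublist) for sublist in lst)
+    # count= lets NumPy allocate the output array once, up front.
+    return np.fromiter(chain.from_iterable(lst), dtype=np.float64, count=total)
+
+
+flat = flatten_to_array([[1.1, 2.2, 3.3], [4.4, 5.5]])
+print(flat.shape)  # (5,)
+```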
+
+## Complete Example: CompressedFloatList
+
+Here's a complete example of a custom CompressedList for floating-point numbers:
+
+```{code-cell}
+import numpy as np
+from compressed_lists import CompressedList, Partitioning
+from typing import List
+
+class CompressedFloatList(CompressedList):
+    def __init__(self, 
+                unlist_data: np.ndarray, 
+                partitioning: Partitioning,
+                element_metadata: dict = None,
+                metadata: dict = None):
+        super().__init__(unlist_data, partitioning, 
+                        element_type="float",
+                        element_metadata=element_metadata,
+                        metadata=metadata)
+
+    def _extract_range(self, start: int, end: int) -> List[float]:
+        return self.unlist_data[start:end].tolist()
+
+    @classmethod
+    def from_list(cls, lst: List[List[float]], names: list = None, 
+                 metadata: dict = None) -> 'CompressedFloatList':
+        # Flatten the list
+        flat_data = []
+        for sublist in lst:
+            flat_data.extend(sublist)
+
+        # Create partitioning
+        partitioning = Partitioning.from_list(lst, names)
+
+        # Create unlist_data
+        unlist_data = np.array(flat_data, dtype=np.float64)
+
+        return cls(unlist_data, partitioning, metadata=metadata)
+
+# Usage
+float_data = [[1.1, 2.2, 3.3], [4.4, 5.5], [6.6, 7.7, 8.8, 9.9]]
+float_list = CompressedFloatList.from_list(float_data, names=["X", "Y", "Z"])
+print(float_list["Y"])
+```
+
+## For More Complex Data Types
+
+For more complex data types, you would follow the same pattern but customize the storage and extraction methods to suit your data.
+
+For example, with a custom object:
+
+```python
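+from typing import List, Optional
+
+from compressed_lists import CompressedList, Partitioning
+
+
+class MyObject:
+    def __init__(self, value):
+        self.value = value
+
+
+class CompressedMyObjectList(CompressedList):
+    def __init__(self, unlist_data: List[MyObject], partitioning: Partitioning,
+                 element_metadata: Optional[dict] = None, metadata: Optional[dict] = None):
+        super().__init__(unlist_data, partitioning, element_type="MyObject",
+                         element_metadata=element_metadata, metadata=metadata)
+
+    def _extract_range(self, start: int, end: int) -> List[MyObject]:
+        # Slicing a plain Python list already returns a list
+        return self.unlist_data[start:end]
+
+    @classmethod
+    def from_list(cls, lst: List[List[MyObject]], names: Optional[list] = None,
+                  metadata: Optional[dict] = None) -> "CompressedMyObjectList":
+        # Custom flattening and storage logic: here, one flat Python list of objects
+        flat_data = [obj for sublist in lst for obj in sublist]
+        partitioning = Partitioning.from_list(lst, names)
+        return cls(flat_data, partitioning, metadata=metadata)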
+```
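+
+For arbitrary Python objects, another storage option is a NumPy array with `dtype=object`, which holds references to the original objects. The standalone sketch below shows only the storage and extraction steps; `MyObject` mirrors the class above, and whether `CompressedList` accepts such an array as `unlist_data` is an assumption to verify against the documentation.
+
+```python
+import numpy as np
+
+
+class MyObject:
+    def __init__(self, value):
+        self.value = value
+
+
+nested = [[MyObject(1), MyObject(2)], [MyObject(3)]]
+
+# Preallocate a 1-D object array and fill it; this avoids NumPy treating
+# sequence-like objects as extra array dimensions.
+flat_objects = [obj for sublist in nested for obj in sublist]
+unlist_data = np.empty(len(flat_objects), dtype=object)
+unlist_data[:] = flat_objects
+
+# Extraction: slice the object array, then convert back to a plain list,
+# mirroring what `_extract_range(start, end)` would return.
+extracted = unlist_data[0:2].tolist()
+print([obj.value for obj in extracted])  # [1, 2]
+```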
diff --git a/pyproject.toml b/pyproject.toml
@@ -12,14 +12,14 @@ version_scheme = "no-guess-dev"
 line-length = 120
 src = ["src"]
 exclude = ["tests"]
-extend-ignore = ["F821"]
+lint.extend-ignore = ["F821"]
 
-[tool.ruff.pydocstyle]
+[tool.ruff.lint.pydocstyle]
 convention = "google"
 
 [tool.ruff.format]
 docstring-code-format = true
 docstring-code-line-length = 20
 
-[tool.ruff.per-file-ignores]
+[tool.ruff.lint.per-file-ignores]
 "__init__.py" = ["E402", "F401"]
diff --git a/setup.cfg b/setup.cfg
@@ -12,10 +12,10 @@ license = MIT
 license_files = LICENSE.txt
 long_description = file: README.md
 long_description_content_type = text/markdown; charset=UTF-8; variant=GFM
-url = https://github.com/pyscaffold/pyscaffold/
+url = https://github.com/biocpy/compressed-lists
 # Add here related links, for example:
 project_urls =
-    Documentation = https://pyscaffold.org/
+    Documentation = https://biocpy.github.io/compressed-lists
 #    Source = https://github.com/pyscaffold/pyscaffold/
 #    Changelog = https://pyscaffold.org/en/latest/changelog.html
 #    Tracker = https://github.com/pyscaffold/pyscaffold/issues
@@ -41,14 +41,15 @@ package_dir =
     =src
 
 # Require a min/specific Python version (comma-separated conditions)
-# python_requires = >=3.8
+python_requires = >=3.9
 
 # Add here dependencies of your project (line-separated), e.g. requests>=2.2,<3.0.
 # Version specifiers like >=2.2,<3.0 avoid problems due to API changes in
 # new major versions. This works if the required packages follow Semantic Versioning.
 # For more information, check out https://semver.org/.
 install_requires =
     importlib-metadata; python_version<"3.8"
+    biocutils
 
 
 [options.packages.find]