diff --git a/docs/source/python/flight.rst b/docs/source/python/flight.rst
index f07b9511ccf68..b63d256547de0 100644
--- a/docs/source/python/flight.rst
+++ b/docs/source/python/flight.rst
@@ -17,6 +17,7 @@
 
 .. currentmodule:: pyarrow.flight
 .. highlight:: python
+.. _flight:
 
 ================
 Arrow Flight RPC
diff --git a/docs/source/python/install.rst b/docs/source/python/install.rst
index 4b966e6d2653d..64cb92933102a 100644
--- a/docs/source/python/install.rst
+++ b/docs/source/python/install.rst
@@ -39,6 +39,13 @@ Install the latest version of PyArrow from
 
     conda install -c conda-forge pyarrow
 
+.. note::
+
+   While the ``pyarrow`` `conda-forge `_ package is
+   the right choice for most users, both a minimal and maximal variant of the
+   package exist, either of which may be better for your use case. See
+   :ref:`python-conda-differences`.
+
 Using Pip
 ---------
 
@@ -93,3 +100,85 @@ a custom path to the database from Python:
 
    >>> import pyarrow as pa
    >>> pa.set_timezone_db_path("custom_path")
+
+
+.. _python-conda-differences:
+
+Differences between conda-forge packages
+----------------------------------------
+
+On `conda-forge `_, PyArrow is published as three
+separate packages, each providing varying levels of functionality. This is in
+contrast to PyPI, where only a single PyArrow package is provided.
+
+The purpose of this split is to minimize the size of the installed package for
+most users (``pyarrow``), provide a smaller, minimal package for specialized use
+cases (``pyarrow-core``), while still providing a complete package for users who
+require it (``pyarrow-all``). What was historically ``pyarrow`` on
+`conda-forge `_ is now ``pyarrow-all``, though most
+users can continue using ``pyarrow``.
+
+The ``pyarrow-core`` package includes the following functionality:
+
+- :ref:`data`
+- :ref:`compute` (i.e., ``pyarrow.compute``)
+- :ref:`io`
+- :ref:`ipc` (i.e., ``pyarrow.ipc``)
+- :ref:`filesystem` (i.e., ``pyarrow.fs``. Note: It's planned to move cloud filesystems (i.e., :ref:`S3`, :ref:`GCS`, etc.) into ``pyarrow`` in a future release, though :ref:`filesystem-localfs` will remain in ``pyarrow-core``.)
+- File formats: :ref:`Arrow/Feather`, :ref:`JSON`, :ref:`CSV`, :ref:`ORC` (but not Parquet)
+
+The ``pyarrow`` package adds the following:
+
+- Acero (i.e., ``pyarrow.acero``)
+- :ref:`dataset` (i.e., ``pyarrow.dataset``)
+- :ref:`Parquet` (i.e., ``pyarrow.parquet``)
+- Substrait (i.e., ``pyarrow.substrait``)
+
+Finally, ``pyarrow-all`` adds:
+
+- :ref:`flight` and Flight SQL (i.e., ``pyarrow.flight``)
+- Gandiva (i.e., ``pyarrow.gandiva``)
+
+The following table lists the functionality provided by each package and may be
+useful when deciding to use one package over another or when
+:ref:`python-conda-custom-selection`.
+
++------------+---------------------+--------------+---------+-------------+
+| Component  | Package             | pyarrow-core | pyarrow | pyarrow-all |
++------------+---------------------+--------------+---------+-------------+
+| Core       | pyarrow-core        |      ✓       |    ✓    |      ✓      |
++------------+---------------------+--------------+---------+-------------+
+| Parquet    | libparquet          |              |    ✓    |      ✓      |
++------------+---------------------+--------------+---------+-------------+
+| Dataset    | libarrow-dataset    |              |    ✓    |      ✓      |
++------------+---------------------+--------------+---------+-------------+
+| Acero      | libarrow-acero      |              |    ✓    |      ✓      |
++------------+---------------------+--------------+---------+-------------+
+| Substrait  | libarrow-substrait  |              |    ✓    |      ✓      |
++------------+---------------------+--------------+---------+-------------+
+| Flight     | libarrow-flight     |              |         |      ✓      |
++------------+---------------------+--------------+---------+-------------+
+| Flight SQL | libarrow-flight-sql |              |         |      ✓      |
++------------+---------------------+--------------+---------+-------------+
+| Gandiva    | libarrow-gandiva    |              |         |      ✓      |
++------------+---------------------+--------------+---------+-------------+
+
+.. _python-conda-custom-selection:
+
+Creating A Custom Selection
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If you know which components you need and want to control what's installed, you
+can create a custom selection of packages to include only the extra features you
+need. For example, to install ``pyarrow-core`` and add support for reading and
+writing Parquet, install ``libparquet`` alongside ``pyarrow-core``:
+
+.. code-block:: shell
+
+    conda install -c conda-forge pyarrow-core libparquet
+
+Or, if you wish to use ``pyarrow`` but also need support for Flight RPC:
+
+.. code-block:: shell
+
+    conda install -c conda-forge pyarrow libarrow-flight
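
After installing a custom selection, it can help to confirm which optional
components are actually importable. The following is a minimal sketch, not
part of the patch above: it assumes each optional component maps to the Python
module named in the lists above (``pyarrow.parquet``, ``pyarrow.flight``, and
so on) and that a missing component surfaces as an ``ImportError``. The helper
name ``available_components`` and the ``OPTIONAL_MODULES`` mapping are
hypothetical and exist only for illustration.

```python
import importlib

# Hypothetical mapping from component name to the Python module that the
# package lists associate with it. Flight SQL has no separate Python module
# listed, so it is not checked here.
OPTIONAL_MODULES = {
    "Parquet": "pyarrow.parquet",
    "Dataset": "pyarrow.dataset",
    "Acero": "pyarrow.acero",
    "Substrait": "pyarrow.substrait",
    "Flight": "pyarrow.flight",
    "Gandiva": "pyarrow.gandiva",
}


def available_components(modules=OPTIONAL_MODULES):
    """Return a dict mapping each component name to True if its module imports."""
    status = {}
    for component, module_name in modules.items():
        try:
            importlib.import_module(module_name)
        except ImportError:
            # Assumption: a component that is not installed fails at import
            # time with ImportError (or its subclass ModuleNotFoundError).
            status[component] = False
        else:
            status[component] = True
    return status
```

Calling ``available_components()`` in an environment built from
``pyarrow-core`` plus ``libparquet`` would be expected to report ``Parquet``
as available and most other entries as missing, though the exact result
depends on how the installed packages were built.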