Commit 11d8022

Rajat Arya committed
XET_USER -> XET_USERNAME
- removed mount docs
- added ML example to README
- cleaned up index.rst
1 parent 8032722 commit 11d8022

4 files changed: +80 -39 lines changed

README.md (+56 -13)
@@ -35,12 +35,7 @@ pyxet is a Python library that provides a lightweight interface for the [XetHub]
 * [glob](https://docs.python.org/3/library/glob.html)
 * [pathlib.Path](https://docs.python.org/3/library/pathlib.html)(WIP)
 
-2. Mount:
-* Read-only optimize for speed; perfect for data exploration and analysis and building data-apps and model
-inference.
-* Read-write for data ingestion and preparation; optimal for database backups and training and monitoring logs. _(coming soon)_
-
-3. Integrations:
+2. Integrations:
 - [x] [GitHub](https://github.com) [submodule](https://git-scm.com/book/en/v2/Git-Tools-Submodules)
 - [x] [pandas](https://pandas.pydata.org)
 - [x] [polars](https://pola-rs.github.io/polars-book/)
@@ -49,7 +44,6 @@ pyxet is a Python library that provides a lightweight interface for the [XetHub]
 - [ ] [dask](https://dask.org/)
 - [ ] [ray](https://ray.io/)
 
-
 ## Documentation
 For API documentation and full examples, please see [here](https://pyxet.readthedocs.io/en/latest/).
 
@@ -82,35 +76,84 @@ git config --global user.email "[email protected]"
 To verify that pyxet is working, let's load a CSV file directly into a Pandas dataframe, leveraging pyxet's support for Python fsspec.
 
 ```python
+# assumes you have already done pip install pandas
 import pandas as pd
 import pyxet
 
 df = pd.read_csv('xet://xdssio/titanic/main/titanic.csv')
 df
 ```
 
+should return something like:
+
+```
+Out[3]:
+     PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
+0              1         0       3  ...   7.2500   NaN         S
+1              2         1       1  ...  71.2833   C85         C
+2              3         1       3  ...   7.9250   NaN         S
+3              4         1       1  ...  53.1000  C123         S
+4              5         0       3  ...   8.0500   NaN         S
+..           ...       ...     ...  ...      ...   ...       ...
+886          887         0       2  ...  13.0000   NaN         S
+887          888         1       1  ...  30.0000   B42         S
+888          889         0       3  ...  23.4500   NaN         S
+889          890         1       1  ...  30.0000  C148         C
+890          891         0       3  ...   7.7500   NaN         Q
+
+[891 rows x 12 columns]
+```
+
 ### Next Steps - Working with private repos (How to set pyxet credentials)
 To start working with private repositories, you need to set up credentials for pyxet. The steps to do this are as follows:
 
 1. Sign up for [XetHub](https://xethub.com/user/sign_up)
 2. Install [git-xet client](https://xethub.com/explore/install)
 3. Create a [Personal Access Token](https://xethub.com/explore/install). Click on 'CREATE TOKEN' button.
 4. Copy & Execute Login command, it should look like: `git xet login -u rajatarya -e [email protected] -p **********`
-5. To make these credentials available to pyxet, set the -u param (rajatarya above) and the -p param as XET_USER and XET_TOKEN environment variables. Also, for your python session, `pyxet.login()` will set the environment variables for you.
+5. To make these credentials available to pyxet, set the -u param (rajatarya above) and the -p param as XET_USERNAME and XET_TOKEN environment variables. Also, for your python session, `pyxet.login()` will set the environment variables for you.
 
 ```sh
 # Note: set this environment variable into your shell config (ex. .zshrc) so not lost.
-export XET_USER=<YOUR XETHUB USERNAME>
+export XET_USERNAME=<YOUR XETHUB USERNAME>
 export XET_TOKEN=<YOUR PERSONAL ACCESS TOKEN PASSWORD>
 ```
 
+### ML Demo
+
+A slightly more complete demo doing some basic ML is as simple as setting up your virtualenv with:
+
+```sh
+pip install scikit-learn ipython pandas
+```
 ```python
-import pandas as pd
 import pyxet
 
-df = pd.read_csv("xet://xdssio/titanic/main/titanic.csv") # All files on the platform are available with permissions
-# or
-pyxet.copy("xet://xdssio/titanic/main/titanic.csv", 'titanic.csv')
+import pandas as pd
+from sklearn.model_selection import train_test_split
+from sklearn.ensemble import RandomForestClassifier
+from sklearn.metrics import classification_report
+
+# make sure to set your XET_USERNAME and XET_TOKEN environment variables, or run:
+# pyxet.login('username', 'token')
+
+df = pd.read_csv("xet://xdssio/titanic.git/main/titanic.csv") # read data from XetHub
+target_names, features, target = ['die', 'survive'], ["Pclass", "SibSp", "Parch"], "Survived"
+
+test_size, random_state = 0.2, 42
+train, test = train_test_split(df, test_size=test_size, random_state=random_state)
+model = RandomForestClassifier().fit(train[features], train[target])
+predictions = model.predict(test[features])
+print(classification_report(test[target], predictions, target_names=target_names))
+
+# Any parameters we want to save
+info = classification_report(test[target], predictions,
+                             target_names=target_names,
+                             output_dict=True)
+info["test_size"] = test_size
+info["random_state"] = random_state
+info['features'] = features
+info['target'] = target
 ```
 
 ## Contributing & Getting Help
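
Note on the ML demo hunk above: it builds an `info` dictionary of parameters "we want to save" but the README snippet ends before anything is written out. A minimal sketch of one way to persist such a dictionary follows, using only the standard library. The helper names `save_run_info` and `load_run_info` are hypothetical and not part of pyxet, and the metric value in `example_info` is a placeholder, not a result from the demo.

```python
import json
import os
import tempfile


def save_run_info(info, path):
    """Write a run-metadata dictionary to `path` as pretty-printed JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        # default=float is a fallback for numeric scalar types (e.g. NumPy)
        # that the json module cannot encode on its own.
        json.dump(info, fh, indent=2, sort_keys=True, default=float)


def load_run_info(path):
    """Read a run-metadata dictionary back from a JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Stand-in for the `info` dict from the demo; the accuracy is a placeholder.
example_info = {
    "accuracy": 0.0,
    "test_size": 0.2,
    "random_state": 42,
    "features": ["Pclass", "SibSp", "Parch"],
    "target": "Survived",
}

# Round-trip the stand-in dictionary through a temporary file.
example_path = os.path.join(tempfile.mkdtemp(), "info.json")
save_run_info(example_info, example_path)
```

Writing plain JSON keeps the saved parameters diffable when the file is committed alongside the model and data.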

docs/index.rst (+9 -18)
@@ -4,38 +4,30 @@ Welcome to pyxet's documentation!
 =================================
 
 pyxet is a Python library that provides a lightweight interface for the `XetHub <https://xethub.com/>`_ platform.
-XetHub is a blob-store with a file-system like interface and git capabilities, therefore pyxet implement both a CLI for both a file-system and git needs.
 
 Installation
 ~~~~~~~~~~~~
-``pip install pyxet``
+Assuming you are on a supported OS (MacOS or Linux) and are using a supported version of Python (3.7+), set up your virtualenv with:
+
+``python -m venv .venv``
+
+``. .venv/bin/activate``
 
+Then, install pyxet with:
+
+``pip install pyxet``
 
 Features
 ~~~~~~~~
 1. A file-system like interface:
 - [x] `fsspec <https://filesystem-spec.readthedocs.io>`_
 - [x] `pathlib.Path <https://docs.python.org/3/library/pathlib.html>`_
 - [ ] `glob <https://docs.python.org/3/library/glob.html>`_
-2. Mount:
-- [x] read-only for data exploration and analysis
-- [ ] read-write for data ingestion and preparation - optimal for database backups and logs _(coming soon)_
-3. Integrations:
+2. Integrations:
 - [x] `pandas <https://pandas.pydata.org>`_
 - [x] `polars <https://pola-rs.github.io/polars-book/>`_
 - [x] `pyarrow <https://arrow.apache.org/docs/python/>`_
 - [ ] `duckdb <https://duckdb.org>`_
-4. Extra features like login, copy, move, delete, rename, etc.
-5. Git capabilities:
-* add, commit, push
-* clone, fork
-* merge, rebase
-* pull, fetch
-* checkout, reset
-* stash, diff, log
-* status, branch
-* submodules
-* etc.
 
 .. toctree::
 :maxdepth: 2
@@ -48,7 +40,6 @@ Features
 :caption: API Reference:
 
 markdowns/filesystem
-markdowns/mount
 markdowns/integrations
 
 .. toctree::
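
Note on the new installation text in docs/index.rst: it states the supported platforms (MacOS or Linux) and Python versions (3.7+) in prose only. A rough sketch of how a user could check those two conditions before installing follows. `is_supported` is a hypothetical helper, not a pyxet function, and the constants simply restate the requirements from the added line.

```python
import platform
import sys

# platform.system() reports "Darwin" on macOS and "Linux" on Linux.
SUPPORTED_SYSTEMS = {"Darwin", "Linux"}
MIN_PYTHON = (3, 7)


def is_supported(system=None, version=None):
    """Return True if the OS and Python version match the documented requirements.

    Both arguments default to the running interpreter's values, and can be
    overridden to check another environment.
    """
    system = platform.system() if system is None else system
    version = sys.version_info[:2] if version is None else tuple(version[:2])
    return system in SUPPORTED_SYSTEMS and version >= MIN_PYTHON
```

Calling `is_supported()` with no arguments checks the current interpreter; passing explicit values makes the check easy to exercise for other environments.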

docs/markdowns/integrations.md (+1 -1)
@@ -3,7 +3,7 @@
 [XetHub](https://xethub.com) is aimed to simplify every part of the data science workflow.
 Getting data in and out of XetHub is no exception.
 
-* You can setup `XET_USER` and `XET_TOKEN` as an environment variable to avoid passing it to every function.
+* You can setup `XET_USERNAME` and `XET_TOKEN` as an environment variable to avoid passing it to every function.
 
 ## Python Packages
 
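
Note on the `XET_USER` -> `XET_USERNAME` rename: shell configs written against the old docs may still export `XET_USER`. The sketch below shows one way to detect that situation before starting a session. It only inspects environment variables and does not call pyxet. `resolve_xet_credentials` is a hypothetical helper, and whether pyxet itself still honors the old name is not stated in this commit, so the sketch assumes it does not.

```python
import os


def resolve_xet_credentials(env=None):
    """Return (username, token, problems) from XET_USERNAME / XET_TOKEN.

    `env` is any mapping of environment variables and defaults to os.environ.
    `problems` is a list of human-readable messages, empty when both
    variables are set.
    """
    env = os.environ if env is None else env
    username = env.get("XET_USERNAME")
    token = env.get("XET_TOKEN")
    problems = []
    if not username:
        if env.get("XET_USER"):
            # Old variable name from before this commit is still exported.
            problems.append("XET_USER is set but XET_USERNAME is not; rename it")
        else:
            problems.append("XET_USERNAME is not set")
    if not token:
        problems.append("XET_TOKEN is not set")
    return username, token, problems
```

Passing a plain dictionary as `env` makes the check usable without touching the real environment.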

docs/markdowns/quickstart.md (+14 -7)
@@ -6,12 +6,19 @@ This library allows you to access XetHub from Python.
 
 ## Installation
 
-1. [Create an account or sign in](https://xethub.com)
-2. Get a personal access token [here](https://xethub.com/user/settings/pat) and set it `XET_USER`, `XET_TOKEN` environment
-variables.
-3. Install the library
+Assuming you are on a supported OS (MacOS or Linux) and are using a supported version of Python (3.7+), set up your virtualenv with:
 
-`pip install pyxet`
+```sh
+$ python -m venv .venv
+...
+$ . .venv/bin/activate
+```
+
+Then, install pyxet with:
+
+```sh
+$ pip install pyxet
+```
 
 ## Usage
 
@@ -25,7 +32,7 @@ from sklearn.model_selection import train_test_split
 from sklearn.ensemble import RandomForestClassifier
 from sklearn.metrics import classification_report
 
-# make sure to set your XET_USER and XET_TOKEN environment variables, or run:
+# make sure to set your XET_USERNAME and XET_TOKEN environment variables, or run:
 # pyxet.login('username', 'token')
 
 df = pd.read_csv("xet://xdssio/titanic.git/main/titanic.csv") # read data from XetHub
@@ -51,7 +58,7 @@ What do we care about? We care about the model, the data, the metrics and the co
 
 ### Setup
 
-Let's [create a new repo](https://xethub.com/xet/create) in the UI or programmatically:
+Let's [create a new repo](https://xethub.com/xet/create) in the UI.
 
 We clone the repo to our local filesystem, saving everything we want, and committing it:
 