Commit f070167

VER: Release 0.7.0
See release notes.
2 parents e88ba8f + fbfa989 commit f070167

15 files changed: +197 -87 lines changed

.github/workflows/release.yml

Lines changed: 1 addition & 2 deletions
@@ -33,8 +33,7 @@ jobs:
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip setuptools wheel
-          pip install -r requirements.txt
-          pip install -r requirements_dev.txt
+          scripts/build.sh

       # Tag the commit with the library version
       - name: Create git tag

.github/workflows/test.yml

Lines changed: 3 additions & 9 deletions
@@ -25,17 +25,11 @@ jobs:
         with:
           python-version: ${{ matrix.python-version }}

-      # Install dependencies
-      - name: Install dependencies
+      - name: Install
         run: |
           python -m pip install --upgrade pip setuptools wheel
-          pip install -r requirements.txt
-          pip install -r requirements_dev.txt
-
-      # Test pip installation
-      - name: Test pip installation
-        run: pip install .
+          scripts/build.sh

       # Run tests
       - name: Run tests
-        run: pytest tests .
+        run: scripts/test.sh

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -1,5 +1,11 @@
 # Changelog

+## 0.7.0 - 2023-01-10
+- Added support for `definition` schema
+- Updated `Flags` enum
+- Upgraded `dbz-python` to `0.2.1`
+- Upgraded `zstandard` to `0.19.0`
+
 ## 0.6.0 - 2022-12-02
 - Added `metadata.get_dataset_condition` method to `Historical` client
 - Upgraded `dbz-python` to `0.2.0`

README.md

Lines changed: 12 additions & 7 deletions
@@ -10,13 +10,13 @@ The official Python client library for [Databento](https://databento.com).

 Key features include:
 - Fast, lightweight access to both live and historical data from [multiple markets](https://docs.databento.com/knowledge-base/new-users/venues-and-publishers?historical=python&live=python).
-- [Multiple schemas](https://docs.databento.com/knowledge-base/new-users/list-of-supported-market-data-schemas?historical=python&live=python) such as MBO, MBP, top of book, OHLCV, last sale, and more.
+- [Multiple schemas](https://docs.databento.com/knowledge-base/new-users/market-data-schemas?historical=python&live=python) such as MBO, MBP, top of book, OHLCV, last sale, and more.
 - [Fully normalized](https://docs.databento.com/knowledge-base/new-users/normalization?historical=python&live=python), i.e. identical message schemas for both live and historical data, across multiple asset classes.
 - Provides mappings between different symbology systems, including [smart symbology](https://docs.databento.com/reference-historical/basics/symbology?historical=python&live=python) for futures rollovers.
 - [Point-in-time]() instrument definitions, free of look-ahead bias and retroactive adjustments.
 - Reads and stores market data in an extremely efficient file format using [Databento Binary Encoding](https://docs.databento.com/knowledge-base/new-users/dbz-format?historical=python&live=python).
 - Event-driven [market replay](https://docs.databento.com/reference-historical/helpers/bento-replay?historical=python&live=python), including at high-frequency order book granularity.
-- Support for [batch download](https://docs.databento.com/knowledge-base/new-users/historical-data-streaming-vs-batch-download?historical=python&live=python) of flat files.
+- Support for [batch download](https://docs.databento.com/knowledge-base/new-users/stream-vs-batch?historical=python&live=python) of flat files.
 - Support for [pandas](https://pandas.pydata.org/docs/), CSV, and JSON.

 ## Documentation
@@ -31,11 +31,11 @@ The library is fully compatible with the latest distribution of Anaconda 3.7 and
 The minimum dependencies as found in the `requirements.txt` are also listed below:
 - Python (>=3.7)
 - aiohttp (>=3.7.2)
-- dbz-python (>=0.2.0)
+- dbz-python (>=0.2.1)
 - numpy (>=1.17.0)
 - pandas (>=1.1.3)
 - requests (>=2.24.0)
-- zstandard (>=0.18.0)
+- zstandard (>=0.19.0)

 ## Installation
 To install the latest stable version of the package from PyPI:
@@ -56,6 +56,8 @@ import databento as db
 client = db.Historical('YOUR_API_KEY')
 data = client.timeseries.stream(
     dataset='GLBX.MDP3',
+    symbols='ES.FUT',
+    stype_in='smart',
     start='2022-06-10T14:30',
     end='2022-06-10T14:40',
 )
@@ -75,14 +77,17 @@ array = data.to_ndarray() # to ndarray
 ```

 Note that the API key was also passed as a parameter, which is
-[not recommended for production applications](https://docs.databento.com/knowledge-base/kb-new-users/kb-new-security-managing-api-keys?historical=python&live=python).
+[not recommended for production applications](https://docs.databento.com/knowledge-base/new-users/security-managing-api-keys?historical=python&live=python).
 Instead, you can leave out this parameter to pass your API key via the `DATABENTO_API_KEY` environment variable:

 ```python
 import databento as db

-client = db.Historical('YOUR_API_KEY') # pass as parameter
-client = db.Historical() # pass as `DATABENTO_API_KEY` environment variable
+# Pass as parameter
+client = db.Historical('YOUR_API_KEY')
+
+# Or, pass as `DATABENTO_API_KEY` environment variable
+client = db.Historical()
 ```

 ## License

databento/common/bento.py

Lines changed: 70 additions & 29 deletions
@@ -6,7 +6,14 @@
 import numpy as np
 import pandas as pd
 import zstandard
-from databento.common.data import COLUMNS, DERIV_SCHEMAS, STRUCT_MAP
+from databento.common.data import (
+    COLUMNS,
+    DEFINITION_CHARARRAY_COLUMNS,
+    DEFINITION_PRICE_COLUMNS,
+    DEFINITION_TYPE_MAX_MAP,
+    DERIV_SCHEMAS,
+    STRUCT_MAP,
+)
 from databento.common.enums import Compression, Encoding, Schema, SType
 from databento.common.logging import log_debug
 from databento.common.metadata import MetadataDecoder
@@ -52,6 +59,8 @@ def _get_index_column(self) -> str:
                 Schema.OHLCV_1M,
                 Schema.OHLCV_1H,
                 Schema.OHLCV_1D,
+                Schema.GATEWAY_ERROR,
+                Schema.SYMBOL_MAPPING,
             )
             else "ts_recv"
         )
@@ -435,43 +444,75 @@ def to_df(
         """
         df = pd.DataFrame(self.to_ndarray())
         df.set_index(self._get_index_column(), inplace=True)
+        df = self._cleanup_dataframe(df)

-        # Cleanup dataframe
+        if pretty_ts:
+            df = self._apply_pretty_ts(df)
+
+        if pretty_px:
+            df = self._apply_pretty_px(df)
+
+        if map_symbols and self.schema != Schema.DEFINITION:
+            df = self._map_symbols(df, pretty_ts)
+
+        return df
+
+    def _cleanup_dataframe(self, df: pd.DataFrame) -> pd.DataFrame:
         df.drop(["length", "rtype"], axis=1, inplace=True)
         if self.schema == Schema.MBO or self.schema in DERIV_SCHEMAS:
             df = df.reindex(columns=COLUMNS[self.schema])
             df["flags"] = df["flags"] & 0xFF # Apply bitmask
             df["side"] = df["side"].str.decode("utf-8")
             df["action"] = df["action"].str.decode("utf-8")
+        elif self.schema == Schema.DEFINITION:
+            for column in DEFINITION_CHARARRAY_COLUMNS:
+                df[column] = df[column].str.decode("utf-8")
+            for column, type_max in DEFINITION_TYPE_MAX_MAP.items():
+                if column in df.columns:
+                    df[column] = df[column].where(df[column] != type_max, np.nan)

-        if pretty_ts:
-            df.index = pd.to_datetime(df.index, utc=True)
-            for column in df.columns:
-                if column.startswith("ts_") and "delta" not in column:
-                    df[column] = pd.to_datetime(df[column], utc=True)
+        return df

-        if pretty_px:
-            for column in list(df.columns):
-                if (
-                    column in ("price", "open", "high", "low", "close")
-                    or column.startswith("bid_px") # MBP
-                    or column.startswith("ask_px") # MBP
-                ):
-                    df[column] = df[column] * 1e-9
-
-        if map_symbols:
-            # Build product ID index
-            if not self._product_id_index:
-                self._product_id_index = self._build_product_id_index()
-
-            # Map product IDs to native symbols
-            if self._product_id_index:
-                df_index = df.index if pretty_ts else pd.to_datetime(df.index, utc=True)
-                dates = [ts.date() for ts in df_index]
-                df["symbol"] = [
-                    self._product_id_index[dates[i]][p]
-                    for i, p in enumerate(df["product_id"])
-                ]
+    def _apply_pretty_ts(self, df: pd.DataFrame) -> pd.DataFrame:
+        df.index = pd.to_datetime(df.index, utc=True)
+        for column in df.columns:
+            if column.startswith("ts_") and "delta" not in column:
+                df[column] = pd.to_datetime(df[column], utc=True)
+
+        if self.schema == Schema.DEFINITION:
+            df["expiration"] = pd.to_datetime(df["expiration"], utc=True)
+            df["activation"] = pd.to_datetime(df["activation"], utc=True)
+
+        return df
+
+    def _apply_pretty_px(self, df: pd.DataFrame) -> pd.DataFrame:
+        for column in list(df.columns):
+            if (
+                column in ("price", "open", "high", "low", "close")
+                or column.startswith("bid_px") # MBP
+                or column.startswith("ask_px") # MBP
+            ):
+                df[column] = df[column] * 1e-9
+
+        if self.schema == Schema.DEFINITION:
+            for column in DEFINITION_PRICE_COLUMNS:
+                df[column] = df[column] * 1e-9
+
+        return df
+
+    def _map_symbols(self, df: pd.DataFrame, pretty_ts: bool) -> pd.DataFrame:
+        # Build product ID index
+        if not self._product_id_index:
+            self._product_id_index = self._build_product_id_index()
+
+        # Map product IDs to native symbols
+        if self._product_id_index:
+            df_index = df.index if pretty_ts else pd.to_datetime(df.index, utc=True)
+            dates = [ts.date() for ts in df_index]
+            df["symbol"] = [
+                self._product_id_index[dates[i]][p]
+                for i, p in enumerate(df["product_id"])
+            ]

         return df

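The two definition-specific steps in this file can be tried in isolation: `_cleanup_dataframe` replaces integer values equal to their type's maximum with NaN, and `_apply_pretty_px` scales fixed-point prices by `1e-9`. The following is a minimal sketch on a toy frame, not the library's code path; the two columns and their dtypes are assumptions chosen for illustration, and the sentinel map is derived from the frame's dtypes rather than from `STRUCT_MAP`.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame; dtypes are assumed for illustration only.
df = pd.DataFrame(
    {
        "min_price_increment": np.array(
            [250_000_000, np.iinfo(np.int64).max], dtype=np.int64
        ),
        "maturity_day": np.array([15, np.iinfo(np.uint8).max], dtype=np.uint8),
    }
)

# Map each integer column to the maximum of its dtype (treated as "unset").
type_max = {col: np.iinfo(df[col].dtype).max for col in df.columns}

# Keep values that differ from the sentinel; replace the rest with NaN.
# The columns become float, since integer dtypes cannot hold NaN.
for col, sentinel in type_max.items():
    df[col] = df[col].where(df[col] != sentinel, np.nan)

# Scale the fixed-point price column (units of 1e-9) to a float price.
df["min_price_increment"] = df["min_price_increment"] * 1e-9
```

Because the sentinel replacement runs before the scaling, unset prices stay NaN instead of turning into very large floats.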

databento/common/data.py

Lines changed: 63 additions & 18 deletions
@@ -140,21 +140,21 @@ def get_deriv_ba_types(level: int) -> List[Tuple[str, Union[type, str]]]:
         ("related_security_id", np.uint32),
         ("trading_reference_date", np.uint16),
         ("appl_id", np.int16),
-        ("maturity_month_year", np.uint16),
+        ("maturity_year", np.uint16),
         ("decay_start_date", np.uint16),
-        ("chan", np.uint16),
-        ("currency", "S1"), # 1 byte chararray
-        ("settl_currency", "S1"), # 1 byte chararray
-        ("secsubtype", "S1"), # 1 byte chararray
-        ("symbol", "S1"), # 1 byte chararray
-        ("group", "S1"), # 1 byte chararray
-        ("exchange", "S1"), # 1 byte chararray
-        ("asset", "S1"), # 1 byte chararray
-        ("cfi", "S1"), # 1 byte chararray
-        ("security_type", "S1"), # 1 byte chararray
-        ("unit_of_measure", "S1"), # 1 byte chararray
-        ("underlying", "S1"), # 1 byte chararray
-        ("related", "S1"), # 1 byte chararray
+        ("channel_id", np.uint16),
+        ("currency", "S4"), # 4 byte chararray
+        ("settl_currency", "S4"), # 4 byte chararray
+        ("secsubtype", "S6"), # 6 byte chararray
+        ("symbol", "S22"), # 22 byte chararray
+        ("group", "S21"), # 21 byte chararray
+        ("exchange", "S5"), # 5 byte chararray
+        ("asset", "S7"), # 7 byte chararray
+        ("cfi", "S7"), # 7 byte chararray
+        ("security_type", "S7"), # 7 byte chararray
+        ("unit_of_measure", "S31"), # 31 byte chararray
+        ("underlying", "S21"), # 21 byte chararray
+        ("related", "S21"), # 21 byte chararray
         ("match_algorithm", "S1"), # 1 byte chararray
         ("md_security_trading_status", np.uint8),
         ("main_fraction", np.uint8),
@@ -163,17 +163,62 @@ def get_deriv_ba_types(level: int) -> List[Tuple[str, Union[type, str]]]:
         ("sub_fraction", np.uint8),
         ("underlying_product", np.uint8),
         ("security_update_action", "S1"), # 1 byte chararray
-        ("maturity_month_month", np.uint8),
-        ("maturity_month_day", np.uint8),
-        ("maturity_month_week", np.uint8),
+        ("maturity_month", np.uint8),
+        ("maturity_day", np.uint8),
+        ("maturity_week", np.uint8),
         ("user_defined_instrument", "S1"), # 1 byte chararray
         ("contract_multiplier_unit", np.int8),
         ("flow_schedule_type", np.int8),
         ("tick_rule", np.uint8),
-        ("dummy", "S1"), # 1 byte chararray
+        ("dummy", "S3"), # 3 byte chararray (Adjustment filler for 8-bytes alignment)
+    ],
+    Schema.GATEWAY_ERROR: RECORD_HEADER
+    + [
+        ("error", "S64"),
+    ],
+    Schema.SYMBOL_MAPPING: RECORD_HEADER
+    + [
+        ("stype_in_symbol", "S22"),
+        ("stype_out_symbol", "S22"),
+        ("dummy", "S4"),
+        ("start_ts", np.uint64),
+        ("end_ts", np.uint64),
     ],
 }

+DEFINITION_CHARARRAY_COLUMNS = [
+    "currency",
+    "settl_currency",
+    "secsubtype",
+    "symbol",
+    "group",
+    "exchange",
+    "asset",
+    "cfi",
+    "security_type",
+    "unit_of_measure",
+    "underlying",
+    "related",
+    "match_algorithm",
+    "security_update_action",
+    "user_defined_instrument",
+]
+
+DEFINITION_PRICE_COLUMNS = [
+    "min_price_increment",
+    "display_factor",
+    "high_limit_price",
+    "low_limit_price",
+    "max_price_variation",
+    "trading_reference_price",
+    "min_price_increment_amount",
+]
+
+DEFINITION_TYPE_MAX_MAP = {
+    x[0]: np.iinfo(x[1]).max
+    for x in STRUCT_MAP[Schema.DEFINITION]
+    if not isinstance(x[1], str)
+}

 ################################################################################
 # DBZ fields
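The field lists in this file are `(name, type)` pairs of the kind NumPy accepts for a structured dtype, with `"S<n>"` entries standing for fixed-width byte strings. As a rough sketch of how such a list decodes a binary record and yields a per-field maximum map in the style of `DEFINITION_TYPE_MAX_MAP`, assume a hypothetical three-field record (the real definition struct is far longer, and `fields` below is not taken from the library):

```python
import numpy as np

# Hypothetical three-field record layout, for illustration only.
fields = [
    ("channel_id", np.uint16),
    ("currency", "S4"),  # 4 byte chararray
    ("maturity_month", np.uint8),
]
dtype = np.dtype(fields)

# Pack one record by hand, then decode it again from the raw bytes.
record = np.array([(310, b"USD", np.iinfo(np.uint8).max)], dtype=dtype)
decoded = np.frombuffer(record.tobytes(), dtype=dtype)

# Fixed-width byte strings are null-padded; NumPy drops trailing nulls on access.
currency = decoded["currency"][0].decode("utf-8")

# Maximum value per integer field, skipping chararray fields given as strings.
type_max = {name: np.iinfo(t).max for name, t in fields if not isinstance(t, str)}
```

The `isinstance(t, str)` filter is what keeps `np.iinfo` from being called on a chararray spec such as `"S4"`, which is not an integer type.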

databento/common/enums.py

Lines changed: 10 additions & 8 deletions
@@ -50,6 +50,8 @@ class Schema(Enum):
     DEFINITION = "definition"
     STATISTICS = "statistics"
     STATUS = "status"
+    GATEWAY_ERROR = "gateway_error"
+    SYMBOL_MAPPING = "symbol_mapping"


 @unique
@@ -134,11 +136,11 @@ class SymbologyResolution(Enum):
 class Flags(Enum):
     """Represents record flags."""

-    F_LAST = 1 << 7 # 128 Last msg in packet (flags < 0)
-    F_HALT = 1 << 6 # 64 Exchange-independent HALT signal
-    F_RESET = 1 << 5 # 32 Drop book, reset symbol for this exchange
-    F_DUPID = 1 << 4 # 16 This OrderID has valid fresh duplicate (Iceberg, etc)
-    F_MBP = 1 << 3 # 8 This is SIP/MBP ADD message, single per price level
-    F_RESERVED2 = 1 << 2 # 4 Reserved for future use
-    F_RESERVED1 = 1 << 1 # 2 Reserved for future use
-    F_RESERVED0 = 1 # Reserved for future use
+    # Last message in the packet from the venue for a given `product_id`
+    F_LAST = 1 << 7
+    # Message sourced from a replay, such as a snapshot server
+    F_SNAPSHOT = 1 << 5
+    # Aggregated price level message, not an individual order
+    F_MBP = 1 << 4
+    # The `ts_recv` value is inaccurate (clock issues or reordering)
+    F_BAD_TS_RECV = 1 << 3
0 commit comments

Comments
 (0)