Releases: xorbitsai/inference
v0.1.3
What's new in 0.1.3 (2023-08-09)
These are the changes in inference v0.1.3.
Enhancements
- ENH: accelerate 4-bit quantization for pytorch model by @pangyoki in #284
- ENH: remove chatglmcpp from deps by @UranusSeven in #329
- ENH: auto detect device in pytorch model by @pangyoki in #322
- ENH: Include model revision by @RayJi01 in #320
Bug fixes
- BUG: fix mps and cuda device detection for pytorch model by @pangyoki in #331
- BUG: Fix grammar mistake in examples by @Bojun-Feng in #336
- BUG: Fix log level on subprocess by @RayJi01 in #335
Documentation
- DOC: fix doc warnings by @UranusSeven in #314
- DOC: add ja_JP and update po files by @UranusSeven in #315
- DOC: custom models by @UranusSeven in #325
Full Changelog: v0.1.2...v0.1.3
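
For the PyTorch quantization and device-detection items in this release (#284, #322, #331), a minimal usage sketch follows. It is hedged: it assumes the Python client of this era exposes Client.launch_model with model_format and quantization arguments, and the endpoint and model name are illustrative rather than taken from this changelog.

```python
# Hedged sketch: launch a PyTorch-format model with 4-bit quantization (#284).
# Assumes a local Xinference server is already running and that the client
# API of this era matches the later-documented Client.launch_model signature.
from xinference.client import Client

client = Client("http://localhost:9997")      # assumed default endpoint
model_uid = client.launch_model(
    model_name="baichuan-chat",               # illustrative model choice
    model_format="pytorch",
    quantization="4-bit",                     # the accelerated path from #284
)
model = client.get_model(model_uid)
# The device (CUDA vs. MPS vs. CPU) is auto-detected per #322/#331,
# so no explicit device argument should be needed here.
print(model.chat("What does 4-bit quantization trade off?"))
```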
v0.1.2
What's new in 0.1.2 (2023-08-04)
These are the changes in inference v0.1.2.
New features
- FEAT: custom model by @UranusSeven in #290
Enhancements
- ENH: select q4_0 as default quantization method for ggmlv3 model in benchmark by @pangyoki in #293
- ENH: disable gradio telemetry by @UranusSeven in #299
Bug fixes
- BUG: llm_family.json encoding by @UranusSeven in #297
- BUG: handle ChatGLM ggml specific case for RESTful API by @jiayini1119 in #309
- BUG: handle Qwen update by @UranusSeven in #307
Others
- DEMO: LangChain QA System with Xinference LLMs and Milvus Vector DB by @jiayini1119 in #304
- Chore: update issue template by @UranusSeven in #300
- Chore: remove codecov by @UranusSeven in #308
Full Changelog: v0.1.1...v0.1.2
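
The custom model support introduced in #290 (and documented in #325 of the next release) can be exercised roughly as follows. This is a hedged sketch: the register_model and list_model_registrations calls and the spec field names are assumptions drawn from the project's custom-model documentation and may not match v0.1.2 exactly.

```python
# Hedged sketch of registering a user-defined model (feature from #290).
# The client methods and the JSON spec fields below are assumptions based
# on the custom-model documentation, not verified against this release.
import json
from xinference.client import Client

custom_llm = {
    "version": 1,
    "model_name": "my-custom-llm",            # hypothetical model name
    "model_lang": ["en"],
    "model_ability": ["generate"],
    "model_specs": [
        {
            "model_format": "pytorch",
            "model_size_in_billions": 7,
            "quantizations": ["4-bit", "8-bit", "none"],
            "model_id": "my-org/my-custom-llm",   # hypothetical Hugging Face repo id
        }
    ],
}

client = Client("http://localhost:9997")
client.register_model(model_type="LLM", model=json.dumps(custom_llm), persist=False)
print(client.list_model_registrations(model_type="LLM"))
```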
v0.1.1
What's new in 0.1.1 (2023-08-03)
These are the changes in inference v0.1.1.
New features
- FEAT: add opt-125m pytorch model and add ut by @pangyoki in #263
- FEAT: support falcon 40b pytorch model by @pangyoki in #278
- FEAT: pytorch model embeddings by @jiayini1119 in #282
- FEAT: support falcon-instruct 7b and 40b pytorch model by @jiayini1119 in #287
- FEAT: support chatglm/chatglm2/chatglm2-32k pytorch model by @pangyoki in #283
- FEAT: support qwen 7b by @UranusSeven in #294
Enhancements
- ENH: Support Environment Variable by @RayJi01 in #285
- REF: split supervisor and worker by @UranusSeven in #279
Bug fixes
- BUG: fix import torch error even if the user doesn't want to launch a torch model by @pangyoki in #274
- BUG: empty legacy model dir by @UranusSeven in #276
Documentation
- DOC: Update README_ja_JP.md by @eltociear in #269
- DOC: add docstring to client methods by @RayJi01 in #247
Full Changelog: v0.1.0...v0.1.1
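
PR #282 adds embeddings for PyTorch models. A minimal sketch of how that surface is typically used is shown below; the create_embedding method and the OpenAI-style response shape are assumptions based on later documentation, and the model choice is only illustrative.

```python
# Hedged sketch: request an embedding from a launched PyTorch model (#282).
# create_embedding and the response layout are assumed from later docs.
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(
    model_name="opt-125m",                    # small test model added in #263
    model_format="pytorch",
)
model = client.get_model(model_uid)

result = model.create_embedding("Xorbits Inference turns LLMs into services.")
vector = result["data"][0]["embedding"]       # assumed OpenAI-style response shape
print(len(vector))
```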
v0.1.0
What's new in 0.1.0 (2023-07-28)
These are the changes in inference v0.1.0.
New features
- FEAT: support fp4 and int8 quantization for pytorch model by @pangyoki in #238
- FEAT: support llama-2-chat-70b ggml by @UranusSeven in #257
Enhancements
- ENH: skip 4-bit quantization for non-linux or non-cuda local deployment by @UranusSeven in #264
- ENH: handle legacy cache by @UranusSeven in #266
- REF: model family by @UranusSeven in #251
Bug fixes
- BUG: fix RESTful stop parameters by @RayJi01 in #241
- BUG: download integrity hot fix by @RayJi01 in #242
- BUG: disable baichuan-chat and baichuan-base on macos by @pangyoki in #250
- BUG: delete tqdm_class in snapshot_download by @pangyoki in #258
- BUG: ChatGLM Parameter Switch by @Bojun-Feng in #262
- BUG: refresh related fields when format changes by @UranusSeven in #265
- BUG: Show downloading progress in gradio by @aresnow1 in #267
- BUG: LLM json not included by @UranusSeven in #268
Tests
- TST: Update ChatGLM Tests by @Bojun-Feng in #259
Documentation
- DOC: Update installation part in readme by @aresnow1 in #253
- DOC: update readme for pytorch model by @pangyoki in #207
Full Changelog: v0.0.6...v0.1.0
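
Several v0.1.0 changes touch generation options, for example the RESTful stop parameters fixed in #241 and the llama-2-chat ggml support from #257. A hedged sketch of passing such options through generate_config follows; the option names mirror the OpenAI/llama.cpp conventions the project follows and are assumptions rather than a verified list for this release.

```python
# Hedged sketch: pass generation options, including stop sequences (#241),
# through generate_config. Option names and the completion shape are
# assumptions for illustration, not verified against v0.1.0.
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(
    model_name="llama-2-chat",                # llama-2 ggml support landed in #257
    model_format="ggmlv3",
    quantization="q4_0",                      # default benchmark quantization per #293
)
model = client.get_model(model_uid)

completion = model.generate(
    "List three uses of quantization.",
    generate_config={"max_tokens": 128, "stop": ["\n\n"]},
)
print(completion["choices"][0]["text"])       # assumed OpenAI-style completion shape
```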
v0.0.6
What's new in 0.0.6 (2023-07-24)
These are the changes in inference v0.0.6.
Bug fixes
- BUG: baichuan-chat and baichuan-base don't support MacOS by @pangyoki in #202
- BUG: fix pytorch model generate bug when stream is True by @pangyoki in #210
- BUG: solve the problem that pytorch model still occupies memory after terminating the model by @pangyoki in #219
- BUG: fix baichuan-chat configure by @pangyoki in #217
- BUG: Update requirements of gradio by @aresnow1 in #216
- BUG: chat stopwords by @UranusSeven in #222
- BUG: disable vicuna pytorch model by @pangyoki in #225
- BUG: Set default embedding to be True by @jiayini1119 in #236
Documentation
- DOC: Add notes for metal GPU acceleration by @aresnow1 in #213
- DOC: Add Japanese README by @eltociear in #228
- DOC: Adding Examples to documentation by @RayJi01 in #196
New Contributors
- @eltociear made their first contribution in #228
Full Changelog: v0.0.5...v0.0.6
v0.0.5
What's new in 0.0.5 (2023-07-19)
These are the changes in inference v0.0.5.
New features
- FEAT: support pytorch models by @pangyoki in #157
- FEAT: support vicuna-v1.3 33B by @Bojun-Feng in #192
- FEAT: support baichuan-chat pytorch model by @pangyoki in #190
- FEAT: pytorch model support MPS backend by @pangyoki in #198
- FEAT: Embedding by @jiayini1119 in #194
- FEAT: LLaMA-2 by @UranusSeven in #203
Enhancements
- ENH: Implement RESTful API stream generate by @jiayini1119 in #171
- ENH: set default device to mps on macOS by @pangyoki in #205
- ENH: Set default mlock to true and mmap to false by @RayJi01 in #206
- ENH: add Gradio ChatInterface chatbot to example by @Bojun-Feng in #208
Bug fixes
- BUG: fix pytorch int8 by @pangyoki in #197
- BUG: RuntimeError when launching model using kwargs whose value is of type int by @jiayini1119 in #209
- BUG: Fix some gradio issues by @aresnow1 in #200
Documentation
- DOC: sphinx init by @UranusSeven in #189
- DOC: chinese readme by @UranusSeven in #191
Full Changelog: v0.0.4...v0.0.5
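
The streaming generation added to the RESTful API in #171 can be consumed roughly as follows. This sketch assumes that stream=True in generate_config yields an iterator of chunked completions, which is how later client versions behave; that behaviour is not verified against v0.0.5.

```python
# Hedged sketch: stream tokens from a generate call (RESTful streaming, #171).
# The stream=True behaviour (an iterator of chunk dicts) is assumed from
# later client versions and may not match v0.0.5 exactly.
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(
    model_name="vicuna-v1.3",                 # supported since #192
    model_format="ggmlv3",
)
model = client.get_model(model_uid)

for chunk in model.generate(
    "Write one sentence about Apple Silicon.",
    generate_config={"stream": True, "max_tokens": 64},
):
    # Each chunk is assumed to carry an incremental piece of the completion.
    print(chunk["choices"][0]["text"], end="", flush=True)
print()
```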
v0.0.4
What's new in 0.0.4 (2023-07-14)
These are the changes in inference v0.0.4.
New features
- FEAT: implement chat and generate in RESTful client by @jiayini1119 in #161
- FEAT: support wizard-v1.1 by @UranusSeven in #183
Bug fixes
- BUG: fix example chat by @UranusSeven in #165
Full Changelog: v0.0.3...v0.0.4
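
PR #161 implements chat and generate in the RESTful client. Below is a minimal sketch assuming RESTfulClient mirrors the in-process client's launch/get/chat flow; the class name, method names, and model name are assumptions based on later releases rather than this one.

```python
# Hedged sketch of the RESTful client's chat flow from #161. RESTfulClient
# and its method names are assumed from later releases, not verified here.
from xinference.client import RESTfulClient

client = RESTfulClient("http://localhost:9997")
model_uid = client.launch_model(
    model_name="wizardlm-v1.0",               # illustrative; wizard-v1.1 landed in #183
    model_format="ggmlv3",
)
model = client.get_model(model_uid)

reply = model.chat("Summarize what a RESTful client does.")
print(reply["choices"][0]["message"]["content"])   # assumed OpenAI-style chat shape
```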
v0.0.3
v0.0.2
What's new in 0.0.2 (2023-07-11)
These are the changes in inference v0.0.2.
Enhancements
- ENH: auto find available port for API by @jiayini1119 in #143
- ENH: Disable httpx logs by @aresnow1 in #144
- ENH: socket binding by @UranusSeven in #146
- ENH: log when worker started by @UranusSeven in #147
- ENH: Remove baichuan in gradio dropdown by @aresnow1 in #152
- ENH: optimize error msg for foundation models by @UranusSeven in #153
Bug fixes
- BUG: Include json files in MANIFEST.in by @aresnow1 in #139
- BUG: chat example doesn't support llama by @UranusSeven in #140
- BUG: Use utf-8 encoding when open json file by @aresnow1 in #151
Documentation
- DOC: Add gif in readme by @aresnow1 in #135
- DOC: Add the two subheadings "Local" and "Distributed." by @aresnow1 in #137
Full Changelog: v0.0.1...v0.0.2
v0.0.1
What's new in 0.0.1 (2023-07-10)
These are the changes in inference v0.0.1.
New features
- FEAT: prototype by @UranusSeven in #3
- FEAT: support wizardlm by @UranusSeven in #14
- FEAT: baichuan by @UranusSeven in #16
- FEAT: gradio prototype by @aresnow1 in #15
- FEAT: stream generation by @UranusSeven in #17
- FEAT: distributed framework by @UranusSeven in #25
- FEAT: local deployment by @UranusSeven in #38
- FEAT: custom system prompt by @UranusSeven in #35
- FEAT: support orca by @UranusSeven in #51
- FEAT: localization language support by @aresnow1 in #63
- FEAT: Generate through cmdline by @RayJi01 in #70
- FEAT: async client by @UranusSeven in #73
- FEAT: RESTful API by @jiayini1119 in #40
- FEAT: Support Command Line Operation for Chat functionality by @RayJi01 in #74
- FEAT: Support chatglm-6b by @Bojun-Feng in #75
- FEAT: add both versions of chatglm by @Bojun-Feng in #90
- FEAT: slot based model allocation by @UranusSeven in #108
Enhancements
- ENH: Streaming chat UI by @aresnow1 in #31
- ENH: Add checkbox to show stop reason & window size of chat history by @aresnow1 in #44
- ENH: disable stream by default by @UranusSeven in #68
- ENH: Report worker status to supervisor periodically by @aresnow1 in #78
- ENH: unify gradio and fastapi by @jiayini1119 in #88
- ENH: Add download progress if model is not cached by @aresnow1 in #95
- ENH: edit Llama parameters by @Bojun-Feng in #98
- ENH: Support Chinese alpaca by @RayJi01 in #105
- ENH: optimize xinference cmdline by @pangyoki in #103
- ENH: Use thread to launch server by @aresnow1 in #104
- ENH: Add meta file to check if model is downloaded by @aresnow1 in #107
- ENH: basic exception handling for RESTful api by @UranusSeven in #111
- ENH: client provides chat and gen interface by @UranusSeven in #117
- ENH: logging for subprocess by @aresnow1 in #119
- BLD: fix pre-commit by @UranusSeven in #2
- BLD: Add workflow for uploading to PyPI by @aresnow1 in #92
- REF: refactor model spec by @UranusSeven in #45
- REF: change completion type for RESTful API by @UranusSeven in #56
- REF: refactor chat history for restful api by @UranusSeven in #64
- REF: pass model uid and spec to model by @UranusSeven in #85
- REF: rename package by @UranusSeven in #89
Bug fixes
- BUG: Missing dependencies by @jiayini1119 in #21
- BUG: fix controller cmdline by @UranusSeven in #48
- BUG: fix mypy by @UranusSeven in #67
- BUG: RESTful api actor cannot exit by @UranusSeven in #83
- BUG: too many clients by @Bojun-Feng in #87
- BUG: fix chat_history type by @pangyoki in #106
- BUG: Raise KeyError when getting a model that is not launched by @aresnow1 in #109
- BUG: fix chatglm download url by @UranusSeven in #110
- BUG: load chatglm by @UranusSeven in #112
- BUG: worker timeout during downloading by @UranusSeven in #126
- BUG: fix example by @UranusSeven in #130
- BUG: remove chinese_alpaca model by @pangyoki in #128
- BUG: Use sync client in gradio by @aresnow1 in #129
- BUG: chatglm hangs by @UranusSeven in #118
- BUG: add error handling when the endpoint port is not available by @jiayini1119 in #127
- BUG: fix default host in cmdline by @pangyoki in #132
Tests
- TST: lint by @UranusSeven in #55
- TST: fix mypy by @UranusSeven in #57
- TST: asyncio mode auto by @UranusSeven in #66
- TST: CI by @UranusSeven in #71
- TST: add chatglm tests by @Bojun-Feng in #97
- TST: Add tests for RESTful API by @jiayini1119 in #134
Documentation
- DOC: issue template by @UranusSeven in #76
- DOC: readme by @UranusSeven in #121
- DOC: roadmap by @UranusSeven in #131
- DOC: license by @UranusSeven in #133
Others
- Pass chat history when calling model.generate by @aresnow1 in #24
- Rename some classes and files by @aresnow1 in #59
- Fix stop reason by @aresnow1 in #60
- add error message while worker timeout by @pangyoki in #125
New Contributors
- @UranusSeven made their first contribution in #2
- @aresnow1 made their first contribution in #15
- @jiayini1119 made their first contribution in #21
- @RayJi01 made their first contribution in #70
- @Bojun-Feng made their first contribution in #75
- @pangyoki made their first contribution in #103
Full Changelog: https://github.com/xorbitsai/inference/commits/v0.0.1
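
Finally, for the initial RESTful API from #40, a raw-HTTP sketch is shown below. The endpoint paths and payload fields follow the OpenAI-compatible layout the project later documents; treating them as valid for v0.0.1 is an assumption, and the model UID is hypothetical.

```python
# Hedged sketch: talk to the RESTful API (#40) over plain HTTP. The paths
# (/v1/models, /v1/completions) and payload fields follow the project's
# later OpenAI-compatible layout and are assumptions for this early release.
import requests

BASE = "http://localhost:9997"

# List the models currently launched on the server.
print(requests.get(f"{BASE}/v1/models").json())

# Request a completion from an already-launched model.
resp = requests.post(
    f"{BASE}/v1/completions",
    json={
        "model": "my-model-uid",              # hypothetical model UID
        "prompt": "Hello from the changelog.",
        "max_tokens": 32,
    },
)
print(resp.json()["choices"][0]["text"])
```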