Releases: xorbitsai/inference
v0.1.3
What's new in 0.1.3 (2023-08-09)
These are the changes in inference v0.1.3.
Enhancements
- ENH: accelerate 4-bit quantization for pytorch model by @pangyoki in #284
- ENH: remove chatglmcpp from deps by @UranusSeven in #329
- ENH: auto detect device in pytorch model by @pangyoki in #322
- ENH: Include model revision by @RayJi01 in #320
Bug fixes
- BUG: fix mps and cuda device detection for pytorch model by @pangyoki in #331
- BUG: Fix grammar mistake in examples by @Bojun-Feng in #336
- BUG: Fix log level on subprocess by @RayJi01 in #335
Documentation
- DOC: fix doc warnings by @UranusSeven in #314
- DOC: add ja_JP and update po files by @UranusSeven in #315
- DOC: custom models by @UranusSeven in #325
Full Changelog: v0.1.2...v0.1.3
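
For the PyTorch quantization and device-detection items in this release (#284, #322, #331), a minimal usage sketch follows. It is hedged: it assumes the Python client of this era exposes Client.launch_model with model_format and quantization arguments, and the endpoint and model name are illustrative rather than taken from this changelog.

```python
# Hedged sketch: launch a PyTorch-format model with 4-bit quantization (#284).
# Assumes a local Xinference server is already running and that the client
# API of this era matches the later-documented Client.launch_model signature.
from xinference.client import Client

client = Client("http://localhost:9997")      # assumed default endpoint
model_uid = client.launch_model(
    model_name="baichuan-chat",               # illustrative model choice
    model_format="pytorch",
    quantization="4-bit",                     # the accelerated path from #284
)
model = client.get_model(model_uid)
# The device (CUDA vs. MPS vs. CPU) is auto-detected per #322/#331,
# so no explicit device argument should be needed here.
print(model.chat("What does 4-bit quantization trade off?"))
```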
v0.1.2
What's new in 0.1.2 (2023-08-04)
These are the changes in inference v0.1.2.
New features
- FEAT: custom model by @UranusSeven in #290
Enhancements
- ENH: select q4_0 as default quantization method for ggmlv3 model in benchmark by @pangyoki in #293
- ENH: disable gradio telemetry by @UranusSeven in #299
Bug fixes
- BUG: llm_family.json encoding by @UranusSeven in #297
- BUG: handle ChatGLM ggml specific case for RESTful API by @jiayini1119 in #309
- BUG: handle Qwen update by @UranusSeven in #307
Others
- DEMO: LangChain QA System with Xinference LLMs and Milvus Vector DB by @jiayini1119 in #304
- Chore: update issue template by @UranusSeven in #300
- Chore: remove codecov by @UranusSeven in #308
Full Changelog: v0.1.1...v0.1.2
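
The custom model support introduced in #290 (and documented in #325 of the next release) can be exercised roughly as follows. This is a hedged sketch: the register_model and list_model_registrations calls and the spec field names are assumptions drawn from the project's custom-model documentation and may not match v0.1.2 exactly.

```python
# Hedged sketch of registering a user-defined model (feature from #290).
# The client methods and the JSON spec fields below are assumptions based
# on the custom-model documentation, not verified against this release.
import json
from xinference.client import Client

custom_llm = {
    "version": 1,
    "model_name": "my-custom-llm",            # hypothetical model name
    "model_lang": ["en"],
    "model_ability": ["generate"],
    "model_specs": [
        {
            "model_format": "pytorch",
            "model_size_in_billions": 7,
            "quantizations": ["4-bit", "8-bit", "none"],
            "model_id": "my-org/my-custom-llm",   # hypothetical Hugging Face repo id
        }
    ],
}

client = Client("http://localhost:9997")
client.register_model(model_type="LLM", model=json.dumps(custom_llm), persist=False)
print(client.list_model_registrations(model_type="LLM"))
```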
v0.1.1
What's new in 0.1.1 (2023-08-03)
These are the changes in inference v0.1.1.
New features
- FEAT: add opt-125m pytorch model and add ut by @pangyoki in #263
- FEAT: support falcon 40b pytorch model by @pangyoki in #278
- FEAT: pytorch model embeddings by @jiayini1119 in #282
- FEAT: support falcon-instruct 7b and 40b pytorch model by @jiayini1119 in #287
- FEAT: support chatglm/chatglm2/chatglm2-32k pytorch model by @pangyoki in #283
- FEAT: support qwen 7b by @UranusSeven in #294
Enhancements
- ENH: Support Environment Variable by @RayJi01 in #285
- REF: split supervisor and worker by @UranusSeven in #279
Bug fixes
- BUG: fix import torch error even if the user doesn't want to launch a torch model by @pangyoki in #274
- BUG: empty legacy model dir by @UranusSeven in #276
Documentation
- DOC: Update README_ja_JP.md by @eltociear in #269
- DOC: add docstring to client methods by @RayJi01 in #247
Full Changelog: v0.1.0...v0.1.1
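
PR #282 adds embeddings for PyTorch models. A minimal sketch of how that surface is typically used is shown below; the create_embedding method and the OpenAI-style response shape are assumptions based on later documentation, and the model choice is only illustrative.

```python
# Hedged sketch: request an embedding from a launched PyTorch model (#282).
# create_embedding and the response layout are assumed from later docs.
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(
    model_name="opt-125m",                    # small test model added in #263
    model_format="pytorch",
)
model = client.get_model(model_uid)

result = model.create_embedding("Xorbits Inference turns LLMs into services.")
vector = result["data"][0]["embedding"]       # assumed OpenAI-style response shape
print(len(vector))
```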
v0.1.0
What's new in 0.1.0 (2023-07-28)
These are the changes in inference v0.1.0.
New features
- FEAT: support fp4 and int8 quantization for pytorch model by @pangyoki in #238
- FEAT: support llama-2-chat-70b ggml by @UranusSeven in #257
Enhancements
- ENH: skip 4-bit quantization for non-linux or non-cuda local deployment by @UranusSeven in #264
- ENH: handle legacy cache by @UranusSeven in #266
- REF: model family by @UranusSeven in #251
Bug fixes
- BUG: fix RESTful stop parameters by @RayJi01 in #241
- BUG: download integrity hot fix by @RayJi01 in #242
- BUG: disable baichuan-chat and baichuan-base on macos by @pangyoki in #250
- BUG: delete tqdm_class in snapshot_download by @pangyoki in #258
- BUG: ChatGLM Parameter Switch by @Bojun-Feng in #262
- BUG: refresh related fields when format changes by @UranusSeven in #265
- BUG: Show downloading progress in gradio by @aresnow1 in #267
- BUG: LLM json not included by @UranusSeven in #268
Tests
- TST: Update ChatGLM Tests by @Bojun-Feng in #259
Documentation
- DOC: Update installation part in readme by @aresnow1 in #253
- DOC: update readme for pytorch model by @pangyoki in #207
Full Changelog: v0.0.6...v0.1.0
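
Several v0.1.0 changes touch generation options, for example the RESTful stop parameters fixed in #241 and the llama-2-chat ggml support from #257. A hedged sketch of passing such options through generate_config follows; the option names mirror the OpenAI/llama.cpp conventions the project follows and are assumptions rather than a verified list for this release.

```python
# Hedged sketch: pass generation options, including stop sequences (#241),
# through generate_config. Option names and the completion shape are
# assumptions for illustration, not verified against v0.1.0.
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(
    model_name="llama-2-chat",                # llama-2 ggml support landed in #257
    model_format="ggmlv3",
    quantization="q4_0",                      # default benchmark quantization per #293
)
model = client.get_model(model_uid)

completion = model.generate(
    "List three uses of quantization.",
    generate_config={"max_tokens": 128, "stop": ["\n\n"]},
)
print(completion["choices"][0]["text"])       # assumed OpenAI-style completion shape
```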
v0.0.6
What's new in 0.0.6 (2023-07-24)
These are the changes in inference v0.0.6.
Bug fixes
- BUG: baichuan-chat and baichuan-base don't support MacOS by @pangyoki in #202
- BUG: fix pytorch model generate bug when stream is True by @pangyoki in #210
- BUG: solve the problem that pytorch model still occupies memory after terminating the model by @pangyoki in #219
- BUG: fix baichuan-chat configure by @pangyoki in #217
- BUG: Update requirements of gradio by @aresnow1 in #216
- BUG: chat stopwords by @UranusSeven in #222
- BUG: disable vicuna pytorch model by @pangyoki in #225
- BUG: Set default embedding to be True by @jiayini1119 in #236
Documentation
- DOC: Add notes for metal GPU acceleration by @aresnow1 in #213
- DOC: Add Japanese README by @eltociear in #228
- DOC: Adding Examples to documentation by @RayJi01 in #196
New Contributors
- @eltociear made their first contribution in #228
Full Changelog: v0.0.5...v0.0.6
v0.0.5
What's new in 0.0.5 (2023-07-19)
These are the changes in inference v0.0.5.
New features
- FEAT: support pytorch models by @pangyoki in #157
- FEAT: support vicuna-v1.3 33B by @Bojun-Feng in #192
- FEAT: support baichuan-chat pytorch model by @pangyoki in #190
- FEAT: pytorch model support MPS backend by @pangyoki in #198
- FEAT: Embedding by @jiayini1119 in #194
- FEAT: LLaMA-2 by @UranusSeven in #203
Enhancements
- ENH: Implement RESTful API stream generate by @jiayini1119 in #171
- ENH: set default device to mps on macOS by @pangyoki in #205
- ENH: Set default mlock to true and mmap to false by @RayJi01 in #206
- ENH: add Gradio ChatInterface chatbot to example by @Bojun-Feng in #208
Bug fixes
- BUG: fix pytorch int8 by @pangyoki in #197
- BUG: RuntimeError when launching model using kwargs whose value is of type int by @jiayini1119 in #209
- BUG: Fix some gradio issues by @aresnow1 in #200
Documentation
- DOC: sphinx init by @UranusSeven in #189
- DOC: chinese readme by @UranusSeven in #191
Full Changelog: v0.0.4...v0.0.5
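
The streaming generation added to the RESTful API in #171 can be consumed roughly as follows. This sketch assumes that stream=True in generate_config yields an iterator of chunked completions, which is how later client versions behave; that behaviour is not verified against v0.0.5.

```python
# Hedged sketch: stream tokens from a generate call (RESTful streaming, #171).
# The stream=True behaviour (an iterator of chunk dicts) is assumed from
# later client versions and may not match v0.0.5 exactly.
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(
    model_name="vicuna-v1.3",                 # supported since #192
    model_format="ggmlv3",
)
model = client.get_model(model_uid)

for chunk in model.generate(
    "Write one sentence about Apple Silicon.",
    generate_config={"stream": True, "max_tokens": 64},
):
    # Each chunk is assumed to carry an incremental piece of the completion.
    print(chunk["choices"][0]["text"], end="", flush=True)
print()
```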
v0.0.4
What's new in 0.0.4 (2023-07-14)
These are the changes in inference v0.0.4.
New features
- FEAT: implement chat and generate in RESTful client by @jiayini1119 in #161
- FEAT: support wizard-v1.1 by @UranusSeven in #183
Bug fixes
- BUG: fix example chat by @UranusSeven in #165
Full Changelog: v0.0.3...v0.0.4
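
PR #161 implements chat and generate in the RESTful client. Below is a minimal sketch assuming RESTfulClient mirrors the in-process client's launch/get/chat flow; the class name, method names, and model name are assumptions based on later releases rather than this one.

```python
# Hedged sketch of the RESTful client's chat flow from #161. RESTfulClient
# and its method names are assumed from later releases, not verified here.
from xinference.client import RESTfulClient

client = RESTfulClient("http://localhost:9997")
model_uid = client.launch_model(
    model_name="wizardlm-v1.0",               # illustrative; wizard-v1.1 landed in #183
    model_format="ggmlv3",
)
model = client.get_model(model_uid)

reply = model.chat("Summarize what a RESTful client does.")
print(reply["choices"][0]["message"]["content"])   # assumed OpenAI-style chat shape
```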
v0.0.3
v0.0.2
What's new in 0.0.2 (2023-07-11)
These are the changes in inference v0.0.2.
Enhancements
- ENH: auto find available port for API by @jiayini1119 in #143
- ENH: Disable httpx logs by @aresnow1 in #144
- ENH: socket binding by @UranusSeven in #146
- ENH: log when worker started by @UranusSeven in #147
- ENH: Remove baichuan in gradio dropdown by @aresnow1 in #152
- ENH: optimize error msg for foundation models by @UranusSeven in #153
Bug fixes
- BUG: Include json files in MANIFEST.in by @aresnow1 in #139
- BUG: chat example doesn't support llama by @UranusSeven in #140
- BUG: Use utf-8 encoding when open json file by @aresnow1 in #151
Documentation
- DOC: Add gif in readme by @aresnow1 in #135
- DOC: Add the two subheadings "Local" and "Distributed." by @aresnow1 in #137
Full Changelog: v0.0.1...v0.0.2
v0.0.1
What's new in 0.0.1 (2023-07-10)
These are the changes in inference v0.0.1.
New features
- FEAT: prototype by @UranusSeven in #3
- FEAT: support wizardlm by @UranusSeven in #14
- FEAT: baichuan by @UranusSeven in #16
- FEAT: gradio prototype by @aresnow1 in #15
- FEAT: stream generation by @UranusSeven in #17
- FEAT: distributed framework by @UranusSeven in #25
- FEAT: local deployment by @UranusSeven in #38
- FEAT: custom system prompt by @UranusSeven in #35
- FEAT: support orca by @UranusSeven in #51
- FEAT: localization language support by @aresnow1 in #63
- FEAT: Generate through cmdline by @RayJi01 in #70
- FEAT: async client by @UranusSeven in #73
- FEAT: RESTful API by @jiayini1119 in #40
- FEAT: Support Command Line Operation for Chat functionality by @RayJi01 in #74
- FEAT: Support chatglm-6b by @Bojun-Feng in #75
- FEAT: add both versions of chatglm by @Bojun-Feng in #90
- FEAT: slot based model allocation by @UranusSeven in #108
Enhancements
- ENH: Streaming chat UI by @aresnow1 in #31
- ENH: Add checkbox to show stop reason & window size of chat history by @aresnow1 in #44
- ENH: disable stream by default by @UranusSeven in #68
- ENH: Report worker status to supervisor periodically by @aresnow1 in #78
- ENH: unify gradio and fastapi by @jiayini1119 in #88
- ENH: Add download progress if model is not cached by @aresnow1 in #95
- ENH: edit Llama parameters by @Bojun-Feng in #98
- ENH: Support Chinese alpaca by @RayJi01 in #105
- ENH: optimize xinference cmdline by @pangyoki in #103
- ENH: Use thread to launch server by @aresnow1 in #104
- ENH: Add meta file to check if model is downloaded by @aresnow1 in #107
- ENH: basic exception handling for RESTful api by @UranusSeven in #111
- ENH: client provides chat and gen interface by @UranusSeven in #117
- ENH: logging for subprocess by @aresnow1 in #119
- BLD: fix pre-commit by @UranusSeven in #2
- BLD: Add workflow for uploading to PyPI by @aresnow1 in #92
- REF: refactor model spec by @UranusSeven in #45
- REF: change completion type for RESTful API by @UranusSeven in #56
- REF: refactor chat history for restful api by @UranusSeven in #64
- REF: pass model uid and spec to model by @UranusSeven in #85
- REF: rename package by @UranusSeven in #89
Bug fixes
- BUG: Missing dependencies by @jiayini1119 in #21
- BUG: fix controller cmdline by @UranusSeven in #48
- BUG: fix mypy by @UranusSeven in #67
- BUG: RESTful api actor cannot exit by @UranusSeven in #83
- BUG: too many clients by @Bojun-Feng in #87
- BUG: fix chat_history type by @pangyoki in #106
- BUG: Raise KeyError when getting a model that is not launched by @aresnow1 in #109
- BUG: fix chatglm download url by @UranusSeven in #110
- BUG: load chatglm by @UranusSeven in #112
- BUG: worker timeout during downloading by @UranusSeven in #126
- BUG: fix example by @UranusSeven in #130
- BUG: remove chinese_alpaca model by @pangyoki in #128
- BUG: Use sync client in gradio by @aresnow1 in #129
- BUG: chatglm hangs by @UranusSeven in #118
- BUG: add error handling when the endpoint port is not available by @jiayini1119 in #127
- BUG: fix default host in cmdline by @pangyoki in #132
Tests
- TST: lint by @UranusSeven in #55
- TST: fix mypy by @UranusSeven in #57
- TST: asyncio mode auto by @UranusSeven in #66
- TST: CI by @UranusSeven in #71
- TST: add chatglm tests by @Bojun-Feng in #97
- TST: Add tests for RESTful API by @jiayini1119 in #134
Documentation
- DOC: issue template by @UranusSeven in #76
- DOC: readme by @UranusSeven in #121
- DOC: roadmap by @UranusSeven in #131
- DOC: license by @UranusSeven in #133
Others
- Pass chat history when calling model.generate by @aresnow1 in #24
- Rename some classes and files by @aresnow1 in #59
- Fix stop reason by @aresnow1 in #60
- add error message while worker timeout by @pangyoki in #125
New Contributors
- @UranusSeven made their first contribution in #2
- @aresnow1 made their first contribution in #15
- @jiayini1119 made their first contribution in #21
- @RayJi01 made their first contribution in #70
- @Bojun-Feng made their first contribution in #75
- @pangyoki made their first contribution in #103
Full Changelog: https://github.com/xorbitsai/inference/commits/v0.0.1
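
Finally, for the initial RESTful API from #40, a raw-HTTP sketch is shown below. The endpoint paths and payload fields follow the OpenAI-compatible layout the project later documents; treating them as valid for v0.0.1 is an assumption, and the model UID is hypothetical.

```python
# Hedged sketch: talk to the RESTful API (#40) over plain HTTP. The paths
# (/v1/models, /v1/completions) and payload fields follow the project's
# later OpenAI-compatible layout and are assumptions for this early release.
import requests

BASE = "http://localhost:9997"

# List the models currently launched on the server.
print(requests.get(f"{BASE}/v1/models").json())

# Request a completion from an already-launched model.
resp = requests.post(
    f"{BASE}/v1/completions",
    json={
        "model": "my-model-uid",              # hypothetical model UID
        "prompt": "Hello from the changelog.",
        "max_tokens": 32,
    },
)
print(resp.json()["choices"][0]["text"])
```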