[Inference Providers] Add CentML inference providers support #2959

Open: 5 commits to merge into base branch main


@V2arK V2arK commented Mar 27, 2025

What does this PR do?

Adds CentML as a third-party Inference Provider.

Key Changes:

  • Adds CentML as a remote inference provider
  • Adds the test cassettes, recorded once the tests passed
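
For reviewers who want to see what the provider talks to, here is a minimal sketch of the request shape, not the code added by this PR. The endpoint URL is the one recorded in the test log below; the helper name `build_chat_request`, the Bearer-token header, and the OpenAI-style payload fields are assumptions inferred from the `/openai/v1/chat/completions` path and the recorded responses. The request is only constructed, never sent.

```python
import json
import urllib.request

# Endpoint as it appears in the recorded requests in the test log.
CENTML_CHAT_URL = "https://api.centml.com/openai/v1/chat/completions"


def build_chat_request(api_key, model, messages, stream=False):
    """Build (but do not send) an OpenAI-style chat completion request.

    Hypothetical helper for illustration; the payload fields follow the
    OpenAI chat completion schema, which is an assumption here.
    """
    payload = {"model": model, "messages": messages, "stream": stream}
    return urllib.request.Request(
        CENTML_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Bearer auth is an assumption; the log does not show request headers.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request(
        api_key="YOUR_API_KEY",  # placeholder, not a real key
        model="meta-llama/Llama-3.3-70B-Instruct",
        messages=[{"role": "user", "content": "What is deep learning?"}],
    )
    print(req.get_method(), req.full_url)
```

Passing `stream=True` would correspond to the streaming test case, where the server answers with `text/event-stream` chunks instead of a single JSON body.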

Test Plan

pytest tests/test_inference_client.py -k centml
/Users/honglin/.pyenv/versions/3.12.0/lib/python3.12/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
============================================================================== test session starts ===============================================================================
platform darwin -- Python 3.12.0, pytest-8.3.4, pluggy-1.5.0 -- /Users/honglin/.pyenv/versions/3.12.0/bin/python3.12
cachedir: .pytest_cache
metadata: {'Python': '3.12.0', 'Platform': 'macOS-15.3.2-arm64-arm-64bit', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'vcr': '1.0.2', 'anyio': '4.8.0', 'mock': '3.14.0'}, 'AWS_REGION': 'us-east-1', 'JAVA_HOME': '/Library/Java/JavaVirtualMachines/zulu-18.jdk/Contents/Home'}
rootdir: /Users/honglin/centML/huggingface_hub
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, vcr-1.0.2, anyio-4.8.0, mock-3.14.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None
collected 118 items / 114 deselected / 4 selected                                                                                                                                

tests/test_inference_client.py::TestInferenceClient::test_chat_completion_no_stream[centml,conversational] 
--------------------------------------------------------------------------------- live log setup ---------------------------------------------------------------------------------
INFO     vcr.cassette:cassette.py:186 <function VCR._build_before_record_request.<locals>.before_record_request at 0x103ff3b00>
INFO     vcr.cassette:cassette.py:222 Appending request <Request (POST) https://api.centml.com/openai/v1/chat/completions> and response {'body': {'string': b'{"id":"chatcmpl-de1b282d4615cdcf51313490db81295a","object":"chat.completion","created":1742933815,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"**Deep Learning: An Overview**\\n=====================================\\n\\nDeep learning is a subset of machine learning that involves the use of artificial neural networks to analyze and interpret data. These neural networks are designed to mimic the structure and function of the human brain, with multiple layers of interconnected nodes (neurons) that process and transmit information.\\n\\n**Key Characteristics:**\\n\\n1. **Artificial Neural Networks**: Deep learning models are based on artificial neural networks, which are composed of multiple layers of nodes (neurons) that process and transmit information.\\n2. **Multiple Layers**: Deep learning models have multiple layers, each of which performs a specific function, such as feature extraction, feature transformation, or classification.\\n3. **Hierarchical Representation**: Deep learning models learn hierarchical representations of data, with early layers learning low-level features and later layers learning higher-level features.\\n4. **Large Amounts of Data**: Deep learning models require large amounts of data to train, as they need to learn complex patterns and relationships in the data.\\n\\n**Types of Deep Learning Models:**\\n\\n1. **Convolutional Neural Networks (CNNs)**: Used for image and video processing, CNNs are designed to extract features from spatially structured data.\\n2. **Recurrent Neural Networks (RNNs)**: Used for sequential data, such as speech or text, RNNs are designed to model temporal relationships in data.\\n3. 
**Autoencoders**: Used for dimensionality reduction and generative modeling, autoencoders are designed to learn compact representations of data.\\n\\n**Applications:**\\n\\n1. **Computer Vision**: Deep learning models are widely used in computer vision applications, such as image classification, object detection, and segmentation.\\n2. **Natural Language Processing**: Deep learning models are used in NLP applications, such as language modeling, text classification, and machine translation.\\n3. **Speech Recognition**: Deep learning models are used in speech recognition applications, such as speech-to-text and voice recognition.\\n\\n**Advantages:**\\n\\n1. **High Accuracy**: Deep learning models can achieve high accuracy in complex tasks, such as image recognition and speech recognition.\\n2. **Flexibility**: Deep learning models can be used in a wide range of applications, from computer vision to NLP.\\n3. **Scalability**: Deep learning models can be trained on large datasets and can scale to large applications.\\n\\n**Challenges:**\\n\\n1. **Computational Requirements**: Deep learning models require significant computational resources to train and deploy.\\n2. **Data Requirements**: Deep learning models require large amounts of data to train, which can be difficult to obtain.\\n3. **Interpretability**: Deep learning models can be difficult to interpret, making it challenging to understand why a particular decision was made.","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":46,"total_tokens":595,"completion_tokens":549,"prompt_tokens_details":null},"prompt_logprobs":null}'}, 'headers': {'content-type': ['application/json'], 'date': ['Tue, 25 Mar 2025 20:16:54 GMT'], 'server': ['istio-envoy'], 'transfer-encoding': ['chunked'], 'x-envoy-upstream-service-time': ['3844']}, 'status': {'code': 200, 'message': 'OK'}}
--------------------------------------------------------------------------------- live log call ----------------------------------------------------------------------------------
INFO     vcr.stubs:__init__.py:259 Playing response for <Request (POST) https://api.centml.com/openai/v1/chat/completions> from cassette
PASSED
tests/test_inference_client.py::TestInferenceClient::test_chat_completion_with_stream[centml,conversational] 
--------------------------------------------------------------------------------- live log setup ---------------------------------------------------------------------------------
INFO     vcr.cassette:cassette.py:186 <function VCR._build_before_record_request.<locals>.before_record_request at 0x104094f40>
INFO     vcr.cassette:cassette.py:222 Appending request <Request (POST) https://api.centml.com/openai/v1/chat/completions> and response {'body': {'string': b'data: {"id":"chatcmpl-7694e7d5663b4d0e22706f8260bae6df","object":"chat.completion.chunk","created":1742933819,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}\n\ndata: {"id":"chatcmpl-7694e7d5663b4d0e22706f8260bae6df","object":"chat.completion.chunk","created":1742933819,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":"**"},"logprobs":null,"finish_reason":null}]}\n\ndata: {"id":"chatcmpl-7694e7d5663b4d0e22706f8260bae6df","object":"chat.completion.chunk","created":1742933819,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":"Deep"},"logprobs":null,"finish_reason":null}]}\n\ndata: {"id":"chatcmpl-7694e7d5663b4d0e22706f8260bae6df","object":"chat.completion.chunk","created":1742933819,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":" Learning Overview"},"logprobs":null,"finish_reason":null}]}\n\ndata: {"id":"chatcmpl-7694e7d5663b4d0e22706f8260bae6df","object":"chat.completion.chunk","created":1742933819,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":"**\\n=========================\\n\\nDeep learning is a"},"logprobs":null,"finish_reason":null}]}\n\ndata: {"id":"chatcmpl-7694e7d5663b4d0e22706f8260bae6df","object":"chat.completion.chunk","created":1742933819,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":" subset"},"logprobs":null,"finish_reason":null}]}\n\ndata: {"id":"chatcmpl-7694e7d5663b4d0e22706f8260bae6df","object":"chat.completion.chunk","created":1742933819,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":" of machine learning that involves"},"logprobs":null,"finish_reason":null}]}\n\ndata: 
{"id":"chatcmpl-7694e7d5663b4d0e22706f8260bae6df","object":"chat.completion.chunk","created":1742933819,"model":"meta-llama/Llama-3.3-70B-Instruct","choices":[{"index":0,"delta":{"content":" the use"},"logprobs":null,"finish_reason":"length","stop_reason":null}]}\n\ndata: [DONE]\n\n'}, 'headers': {'content-type': ['text/event-stream; charset=utf-8; charset=utf-8'], 'date': ['Tue, 25 Mar 2025 20:16:58 GMT'], 'server': ['istio-envoy'], 'transfer-encoding': ['chunked'], 'x-envoy-upstream-service-time': ['328']}, 'status': {'code': 200, 'message': 'OK'}}
--------------------------------------------------------------------------------- live log call ----------------------------------------------------------------------------------
INFO     vcr.stubs:__init__.py:259 Playing response for <Request (POST) https://api.centml.com/openai/v1/chat/completions> from cassette
PASSED
tests/test_inference_client.py::TestInferenceClient::test_chat_completion_with_tool[centml,conversational] SKIPPED (Schema not aligned between providers)
tests/test_inference_client.py::TestInferenceClient::test_chat_completion_with_response_format[centml,conversational] SKIPPED (Schema not aligned between providers)

================================================================================ warnings summary ================================================================================
../../.pyenv/versions/3.12.0/lib/python3.12/site-packages/_pytest/config/__init__.py:1441
  /Users/honglin/.pyenv/versions/3.12.0/lib/python3.12/site-packages/_pytest/config/__init__.py:1441: PytestConfigWarning: Unknown config option: env
  
    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================================================================== slowest durations ================================================================================
0.44s setup    tests/test_inference_client.py::TestInferenceClient::test_chat_completion_no_stream[centml,conversational]

(9 durations < 0.005s hidden.  Use -vv to show these durations.)
============================================================ 2 passed, 2 skipped, 114 deselected, 1 warning in 0.56s =============================================================
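
The streaming cassette above records the response as server-sent events: one `data: {...}` line per chunk, terminated by `data: [DONE]`. As a rough sketch of how such a body can be reassembled into text (this is not the parser used by huggingface_hub, and `collect_stream_text` is a hypothetical helper name), the chunks' `delta.content` values can be concatenated in order:

```python
import json


def collect_stream_text(sse_body):
    """Concatenate delta.content from an OpenAI-style SSE body.

    Returns (text, finish_reason). Lines not starting with "data:" are
    ignored, and parsing stops at the "[DONE]" sentinel.
    """
    parts = []
    finish_reason = None
    for line in sse_body.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            # The first chunk carries the role and an empty content string.
            parts.append(delta.get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason


if __name__ == "__main__":
    # Shortened sample in the same shape as the recorded chunks.
    sample = (
        'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}\n\n'
        'data: {"choices":[{"index":0,"delta":{"content":"Deep"},"finish_reason":null}]}\n\n'
        'data: {"choices":[{"index":0,"delta":{"content":" Learning"},"finish_reason":"length"}]}\n\n'
        "data: [DONE]\n\n"
    )
    print(collect_stream_text(sample))
```

In the recorded run the last chunk has `finish_reason` set to `"length"`, so the reassembled text stops mid-sentence rather than at a natural end.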

@Wauplin
Contributor

Wauplin commented Mar 27, 2025

Hi @V2arK, thanks for your PR :) Even though we are looking forward to integrating new providers, we'll have to hold off for a bit on this one. Integrating a new provider takes time and effort on our side to get things right, so for now we've decided to focus on the existing ones. Will let you know when we are ready to move forward! 🤗

@wangshangsam

wangshangsam commented Apr 4, 2025

> Hi @V2arK, thanks for your PR :) Even though we are looking forward to integrating new providers, we'll have to hold off for a bit on this one. Integrating a new provider takes time and effort on our side to get things right, so for now we've decided to focus on the existing ones. Will let you know when we are ready to move forward! 🤗

@Wauplin Thanks a lot for the heads-up! Could you share an ETA for when your team will be ready to accept new serverless endpoint providers again?
