Serverless GPUs let you scale machine learning inference without managing servers and deploy complicated, custom models with ease.

Follow this tutorial to quickly deploy SOLAR-10.7B-Instruct-v1.0 using Inferless.

SOLAR-10.7B-Instruct-v1.0 - GPTQ

Model creator: Upstage
Original model: SOLAR-10.7B-Instruct-v1.0

Description

This repo contains GPTQ model files for Upstage's SOLAR-10.7B-Instruct-v1.0.

About GPTQ

GPTQ is a post-training quantization method that reduces model size and speeds up inference. It quantizes the weights in a single pass after training, using a calibration dataset to minimize the mean squared error that quantization introduces.
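To make the idea of weight quantization concrete, here is a minimal sketch that quantizes one group of 128 weights to 4-bit codes and measures the resulting mean squared error. It uses plain round-to-nearest, which is a simplification: GPTQ itself goes further and adjusts the remaining weights to compensate for each rounding error, guided by the calibration data. The function names and the random test weights are illustrative assumptions, not part of any library.

```python
import random

def quantize_group(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one group of weights."""
    levels = 2 ** bits - 1                   # 15 steps between the 16 4-bit codes
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0        # fall back to 1.0 for a constant group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo

def dequantize_group(codes, scale, lo):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale + lo for c in codes]

def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

random.seed(0)
group = [random.gauss(0.0, 0.02) for _ in range(128)]  # made-up weights, group size 128
codes, scale, lo = quantize_group(group)
restored = dequantize_group(codes, scale, lo)
mse = mean_squared_error(group, restored)
print(f"scale={scale:.6f}  mse={mse:.3e}")
```

Each restored weight differs from the original by at most half a quantization step, so the error is bounded by the group's scale. Smaller groups give tighter scales at the cost of storing more metadata.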

It is supported by:

Text Generation WebUI - using Loader: AutoGPTQ
vLLM - version 0.2.2 or later, which supports all model types.
Hugging Face Text Generation Inference (TGI)
Transformers version 4.35.0 and later, from any code or client that supports Transformers
AutoGPTQ - for use from Python code

Shared files and GPTQ parameters

Models are released as sharded safetensors files.

Branch	Bits	Group Size	GPTQ Dataset	Seq Len	Size
main	4	128	VMware Open Instruct	4096	5.96 GB
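As a rough sanity check on the size column, the sketch below estimates the storage needed for 4-bit weights with a group size of 128. It assumes all 10.7 billion parameters are quantized, and that each group stores one 16-bit scale and one 4-bit zero point; both are assumptions made for illustration, and the function name is hypothetical.

```python
def estimate_quantized_size_gb(n_params, bits=4, group_size=128,
                               scale_bits=16, zero_bits=4):
    """Estimate on-disk size in GB (10^9 bytes) for group-wise quantized weights."""
    weight_bits = n_params * bits                     # packed weight codes
    n_groups = n_params / group_size                  # one scale and zero point per group
    metadata_bits = n_groups * (scale_bits + zero_bits)
    return (weight_bits + metadata_bits) / 8 / 1e9

estimate = estimate_quantized_size_gb(10.7e9)
fp16_size = 10.7e9 * 2 / 1e9                          # unquantized float16 baseline
print(f"quantized: {estimate:.2f} GB, float16: {fp16_size:.1f} GB")
# prints "quantized: 5.56 GB, float16: 21.4 GB"
```

The estimate lands somewhat below the listed 5.96 GB. One plausible reason, stated here as an assumption, is that some tensors such as embeddings are stored at higher precision.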

How to use

You will need the following software packages and Python libraries:

build:
  cuda_version: "12.1.1"
  system_packages:
    - "libssl-dev"
  python_packages:
    - "torch==2.1.2"
    - "vllm==0.2.6"
    - "transformers==4.36.2"
    - "accelerate==0.25.0"

Here is the code for app.py:

from vllm import LLM, SamplingParams

class InferlessPythonModel:
    def initialize(self):
        # Sampling settings shared by every request.
        self.sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)
        # Load the GPTQ-quantized weights in half precision.
        self.llm = LLM(model="Inferless/SOLAR-10.7B-Instruct-v1.0-GPTQ", quantization="gptq", dtype="float16")

    def infer(self, inputs):
        prompts = inputs["prompt"]
        result = self.llm.generate(prompts, self.sampling_params)
        # Collect the generated text and token IDs of the first completion for each prompt.
        result_output = [[output.outputs[0].text, output.outputs[0].token_ids] for output in result]

        # Return only the result for the first prompt.
        return {'generated_result': result_output[0]}

    def finalize(self):
        pass
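The handler above exposes three methods and a simple request and response shape: `infer` takes a dictionary with a `prompt` key and returns a dictionary whose `generated_result` holds the text and token IDs. The sketch below shows one way a caller might exercise that contract locally. `StubModel` and `run_once` are hypothetical names introduced here, and the stub replaces real generation with a placeholder so the example runs without a GPU or vLLM.

```python
class StubModel:
    """Hypothetical stand-in with the same three methods as InferlessPythonModel."""

    def initialize(self):
        self.ready = True

    def infer(self, inputs):
        text = inputs["prompt"].upper()                 # placeholder for real generation
        token_ids = list(range(len(text.split())))      # placeholder token IDs
        return {"generated_result": [text, token_ids]}

    def finalize(self):
        self.ready = False


def run_once(model, prompt):
    """Drive one request through initialize -> infer -> finalize."""
    model.initialize()
    try:
        response = model.infer({"prompt": prompt})
    finally:
        model.finalize()                                # always release resources
    text, token_ids = response["generated_result"]
    return text, token_ids


text, token_ids = run_once(StubModel(), "What is quantization?")
print(text, token_ids)
```

Swapping `StubModel()` for `InferlessPythonModel()` would run the same flow against the real model, assuming a CUDA-capable GPU and the packages listed in the build configuration.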
