Closed
Description
Background Description
During #7499, it became clear that the KV store metadata needs further development.
We need an outline for the KV store metadata, and consensus on that outline. It should not be too tightly coupled to Hugging Face, and it should be independent enough to serve GGUF use cases. Ideally, we should be able to fetch all the details remotely as needed, or fall back to the model card.
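The "fetch remotely, else fall back to the model card" idea can be sketched as a simple precedence merge. This is a minimal sketch, not a proposed implementation. It assumes each source has already been converted to a flat dict keyed by the GGUF key names from the table. The function `resolve_metadata` and the sample values are hypothetical.

```python
from typing import Any, Mapping, Optional


def resolve_metadata(*sources: Optional[Mapping[str, Any]]) -> dict[str, Any]:
    """Merge metadata sources in priority order (earlier sources win).

    A key whose value is None counts as missing, so a lower-priority
    source (e.g. the model card) can still fill it in.
    """
    merged: dict[str, Any] = {}
    for source in sources:
        if not source:
            continue  # a source may be unavailable (e.g. offline, no model card)
        for key, value in source.items():
            if value is not None and key not in merged:
                merged[key] = value
    return merged


# Hypothetical inputs: remotely fetched details first, model card as fallback.
remote = {"general.name": "GPT-3", "general.version": None}
card = {"general.name": "gpt3", "general.version": "v3.0", "general.license": "MIT"}

print(resolve_metadata(remote, card))
# {'general.name': 'GPT-3', 'general.version': 'v3.0', 'general.license': 'MIT'}
```

Passing the sources in priority order keeps the fallback policy explicit at the call site, and adding another source, such as user overrides, only means prepending it.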
Below is a first attempt at listing the keys I have in mind, along with the Hugging Face model card key that could map to each one.
GGUF Key (Authorship Metadata Only) | Hugging Face Model Card Key | Example Value | Semantic Description |
---|---|---|---|
general.name | model_name | "GPT-3" | Name or title of the model |
general.author | model_creator | "TheBloke" | Name(s) of the author(s) |
general.version | model_version | "v3.0" | Version number of the model |
general.organization | model_organization | "OpenAI" | Organization or institution associated with the model |
general.finetune | model_finetune | "Instruct" | Fine-tune portion of the model filename |
general.basename | model_basename | "gpt3-base" | Basename of the model filename |
general.description | model_description | "Large-scale language model." | Brief description of the model and its use cases |
general.quantized_by | quantized_by | "OpenAI" | Entity responsible for quantizing the model to gguf |
general.parameter_class_attribute | model_parameter_class_attribute | "8x3B" | Parameter size class of the model (e.g. number of experts × parameters per expert) |
general.license | license | "MIT" | License type under which the model is released |
general.license.name | license_name | "MIT License" | Name of the license |
general.license.link | license_link | "https://opensource.org/licenses/MIT" | Link to the full text of the license |
general.url | - | "https://openai.com/gpt-3" | URL to the model website or paper |
general.doi | - | "10.1234/5678" | Digital Object Identifier (DOI) of the model |
general.uuid | - | "123e4567-e89b-12d3-a456-426614174000" | Universally Unique Identifier (UUID) of the model |
general.repo_url | - | "https://github.com/openai/gpt-3" | URL to the model source repository |
general.source.url | - | "https://arxiv.org/abs/2005.14165" | URL to the source website or paper |
general.source.doi | - | "10.1234/5678" | Digital Object Identifier (DOI) of the source |
general.source.uuid | - | "123e4567-e89b-12d3-a456-426614174000" | Universally Unique Identifier (UUID) of the source |
general.source.repo_url | - | "https://github.com/openai/gpt-3" | URL to the source repository |
general.base_model.count | base_model (derived from id) | 2 | Number of base models used to create the model |
general.base_model.{id}.name | base_model (derived from id) | "BERT" | Name or title of the base model |
general.base_model.{id}.author | - | "Google" | Name(s) of the author(s) of the base model |
general.base_model.{id}.version | base_model (derived from id) | "3.0" | Version number of the base model |
general.base_model.{id}.organization | base_model (derived from id) | "Google" | Organization or institution associated with the base model |
general.base_model.{id}.url | - | "https://arxiv.org/abs/1810.04805" | URL to the base model website or paper |
general.base_model.{id}.doi | - | "10.1234/5678" | Digital Object Identifier (DOI) of the base model |
general.base_model.{id}.uuid | - | "123e4567-e89b-12d3-a456-426614174000" | Universally Unique Identifier (UUID) of the base model |
general.base_model.{id}.repo_url | base_model (derived from id) | "https://github.com/google/bert" | URL to the base model source repository |
general.tags | tags + pipeline_tag | ["NLP", "Language Modeling"] | Tags associated with the model (e.g., categories) |
general.languages | language | ["English", "Spanish"] | Languages supported by the model |
general.datasets | datasets | ["Wikipedia", "Common Crawl"] | Datasets used to train the model |
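The mapping in the table could be applied roughly as follows. This is a minimal sketch under several assumptions. It assumes the model card front matter has already been parsed into a dict. It treats `{id}` as a zero-based index from 0 to `general.base_model.count - 1`. It assumes each `base_model` entry is a Hugging Face Hub repo id of the form `org/name`, so `repo_url` is built as a Hub URL. Deriving `version` from the id is left out because there is no reliable rule for it. `card_to_gguf_kv` and the sample card are hypothetical.

```python
from typing import Any, Mapping

# One-to-one scalar mappings from the table (model card key -> GGUF key).
SCALAR_KEYS = {
    "model_name": "general.name",
    "model_creator": "general.author",
    "model_version": "general.version",
    "model_organization": "general.organization",
    "model_finetune": "general.finetune",
    "model_basename": "general.basename",
    "model_description": "general.description",
    "quantized_by": "general.quantized_by",
    "model_parameter_class_attribute": "general.parameter_class_attribute",
    "license": "general.license",
    "license_name": "general.license.name",
    "license_link": "general.license.link",
}

# Keys whose GGUF value is an array (the card may hold a single value or a list).
LIST_KEYS = {"language": "general.languages", "datasets": "general.datasets"}


def as_list(value: Any) -> list:
    if value is None:
        return []
    return list(value) if isinstance(value, (list, tuple)) else [value]


def card_to_gguf_kv(card: Mapping[str, Any]) -> dict[str, Any]:
    kv: dict[str, Any] = {}
    for card_key, gguf_key in SCALAR_KEYS.items():
        if card.get(card_key) is not None:
            kv[gguf_key] = card[card_key]
    for card_key, gguf_key in LIST_KEYS.items():
        values = as_list(card.get(card_key))
        if values:
            kv[gguf_key] = values

    # general.tags combines `tags` and `pipeline_tag`.
    tags = as_list(card.get("tags")) + as_list(card.get("pipeline_tag"))
    if tags:
        kv["general.tags"] = tags

    # Flatten base_model repo ids into indexed general.base_model.{id}.* keys.
    base_models = as_list(card.get("base_model"))
    if base_models:
        kv["general.base_model.count"] = len(base_models)
    for i, repo_id in enumerate(base_models):
        org, _, name = repo_id.rpartition("/")
        prefix = f"general.base_model.{i}"
        kv[f"{prefix}.name"] = name
        if org:
            kv[f"{prefix}.organization"] = org
        # Assumption: ids refer to Hugging Face Hub repositories.
        kv[f"{prefix}.repo_url"] = f"https://huggingface.co/{repo_id}"
    return kv


card = {
    "model_name": "GPT-3",
    "license": "MIT",
    "tags": ["NLP"],
    "pipeline_tag": "text-generation",
    "base_model": ["google/bert", "example-org/other-model"],
}
for key, value in card_to_gguf_kv(card).items():
    print(key, "=", value)
```

Keys with a `-` in the Hugging Face column (DOI, UUID, URLs) simply never appear in the output here, which matches the idea that they would come from remote lookups or manual input instead.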
Possible Refactor Approaches
No response