---
library_name: transformers
tags:
  - finance
license: mit
datasets:
  - Recompense/amazon-appliances-lite-data
language:
  - en
base_model:
  - meta-llama/Llama-3.1-8B-Instruct
---

# Model Card for Midas-pricer

Predicts prices based on a product description.

## Model Details

### Model Description

This model predicts prices of Amazon appliances based on a product description.

- **Developed by:** [Recompense](https://huggingface.co/Recompense)
- **Model type:** Transformer (causal, autoregressive)
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** meta-llama/Llama-3.1-8B-Instruct

### Model Sources

- **Repository:** https://huggingface.co/Recompense/Midas-pricer

## Uses

- **Primary use case:** Generating estimated retail prices for household appliances from textual descriptions.
- **Example applications:**
  - Assisting e-commerce teams in setting competitive price points
  - Supporting market analysis dashboards with on-the-fly price estimates
- **Not intended for:** Financial advice or investment decisions

### Out-of-Scope Use

- Attempting to predict prices outside the appliances domain (e.g., electronics, furniture, vehicles) will likely yield unreliable results.
- Using this model for any price-sensitive or regulatory decisions without human oversight is discouraged.

## Bias, Risks, and Limitations

- **Data biases:** The training dataset is drawn exclusively from Amazon appliance listings. Price distributions are skewed toward mid-range consumer appliances; extremely low- and high-end appliances are underrepresented.
- **Input sensitivity:** Minor changes in phrasing or additional noisy tokens can shift predictions noticeably.
- **Generalization:** The model does not understand supply chain disruptions, seasonality, or promotions; it only captures patterns seen in historical listing data.

### Recommendations

- Always validate model outputs against a small set of ground-truth prices before production deployment.
- Use this model as an assistant, not an oracle: incorporate downstream business rules or domain expertise.
- Regularly retrain or fine-tune on updated listing data to capture shifting market trends.

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load tokenizer and model (device_map places the model on GPU if available)
tokenizer = AutoTokenizer.from_pretrained("Recompense/Midas-pricer")
model = AutoModelForCausalLM.from_pretrained(
    "Recompense/Midas-pricer", torch_dtype=torch.bfloat16, device_map="auto"
)

# Prepare prompt
product_desc = "How much does this cost to the nearest dollar?\n\nSamsung 7kg top-load washing machine with digital inverter motor"
prompt = f"{product_desc}\n\nPrice is $"

# Tokenize and generate; the tokenizer already returns an attention mask
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
generated = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=3,
    num_return_sequences=1,
)

# Decode only the newly generated tokens, skipping the prompt
price_text = tokenizer.decode(
    generated[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)

print(f"Estimated price: ${price_text}")
```

## Training Details

### Training Data

- **Dataset:** [Recompense/amazon-appliances-lite-data](https://huggingface.co/datasets/Recompense/amazon-appliances-lite-data)
- **Train/validation/test split:** 80/10/10 (see the sketch below)
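
A sketch of reproducing an 80/10/10 split with 🤗 Datasets, assuming the dataset ships as a single `train` split (the original split procedure and seed are not published, so this is illustrative):

```python
from datasets import load_dataset

# Assumes a single "train" split; adjust if the dataset already
# provides validation/test splits.
ds = load_dataset("Recompense/amazon-appliances-lite-data", split="train")

# Carve off 20%, then halve the holdout into validation and test.
split = ds.train_test_split(test_size=0.2, seed=42)
holdout = split["test"].train_test_split(test_size=0.5, seed=42)

train_ds, val_ds, test_ds = split["train"], holdout["train"], holdout["test"]
print(len(train_ds), len(val_ds), len(test_ds))
```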

### Training Procedure

#### Training Hyperparameters

- **Fine-tuning framework:** PyTorch + Hugging Face Accelerate
- **Precision:** bf16 mixed precision
- **Batch size:** 1 sequence
- **Learning rate:** 1e-5 with linear warmup (10% of total steps)
- **Optimizer:** AdamW
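
One way to express these hyperparameters with `transformers.TrainingArguments`; the actual training script is not published, so treat this as an illustrative sketch (`output_dir` is hypothetical):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="midas-pricer-ft",      # hypothetical output path
    per_device_train_batch_size=1,     # batch size: 1 sequence
    learning_rate=1e-5,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,                  # linear warmup over 10% of total steps
    optim="adamw_torch",               # AdamW
    bf16=True,                         # bf16 mixed precision
)
```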

## Evaluation

### Testing Data, Factors & Metrics

- **Test set:** Held-out 10% of listings (≈5,000 examples)
- **Metric:** Root Mean Squared Logarithmic Error (RMSLE)
- **Hit@$40:** Percentage of predictions within ±$40 of the true price

| Metric  | Value |
|---------|-------|
| RMSLE   | 0.61  |
| Hit@$40 | 85.2% |
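
The exact evaluation script is not published; the sketch below shows how these metrics are conventionally computed (assuming the common `log1p` form of RMSLE):

```python
import numpy as np

def rmsle(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root Mean Squared Logarithmic Error (log1p form)."""
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))

def hit_at(y_true: np.ndarray, y_pred: np.ndarray, tol: float = 40.0) -> float:
    """Fraction of predictions within ±tol dollars of the true price."""
    return float(np.mean(np.abs(y_pred - y_true) <= tol))

# Toy example with made-up prices
y_true = np.array([199.0, 549.0, 89.0])
y_pred = np.array([215.0, 499.0, 120.0])
print(rmsle(y_true, y_pred), hit_at(y_true, y_pred))
```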

### Summary

The model achieves an RMSLE of 0.61, indicating good alignment between predicted and actual prices on a log scale, and correctly estimates within $40 in over 85% of test cases. This performance is competitive for rapid prototyping in price-sensitive applications.

## Environmental Impact

Approximate compute emissions for fine-tuning, estimated with the [ML CO₂ Impact calculator](https://mlco2.github.io/impact#compute):

- **Hardware:** Tesla T4
- **Duration:** 2 hours (0.06 epoch)
- **Cloud provider:** Google Cloud, region US Central
- **Estimated CO₂ emitted:** 6 kg CO₂e

## Technical Specifications

### Model Architecture

- **Base model:** Llama-3.1-8B-Instruct (8 billion parameters)
- **Objective:** Autoregressive language modeling with instruction tuning

### Compute Infrastructure

- **Hardware:** 4× Tesla T4 GPUs
- **Software:**
  - PyTorch 2.x
  - transformers 5.x
  - accelerate 1.x
  - bitsandbytes (optional, for 8-bit quantized inference; see the sketch below)
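
A sketch of optional 8-bit loading with bitsandbytes (requires a CUDA GPU; the quantization settings here are illustrative, not a published configuration):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Optional 8-bit inference via bitsandbytes
model = AutoModelForCausalLM.from_pretrained(
    "Recompense/Midas-pricer",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
```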

## Glossary

- **RMSLE (Root Mean Squared Logarithmic Error):** The square root of the average squared difference between log-transformed predictions and targets. Less sensitive to large absolute errors than plain RMSE.
- **Hit@$40:** Fraction of predictions whose absolute error is ≤ $40.
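
In formula form, using the common `log1p` convention (consistent with the evaluation sketch above):

$$
\mathrm{RMSLE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \bigl(\log(1+\hat{y}_i) - \log(1+y_i)\bigr)^2}
$$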

## Model Card Authors

Damola Jimoh (Recompense)