---
library_name: transformers
tags:
  - finance
license: mit
datasets:
  - Recompense/amazon-appliances-lite-data
language:
  - en
base_model:
  - meta-llama/Llama-3.1-8B-Instruct
---

# Model Card for Midas-pricer

Predicts prices based on a product description.

## Model Details

### Model Description

This model predicts prices of Amazon appliances based on a product description.

- **Developed by:** [Recompense](https://huggingface.co/Recompense)
- **Model type:** Transformer (causal, autoregressive)
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** meta-llama/Llama-3.1-8B-Instruct

### Model Sources

- **Repository:** https://huggingface.co/Recompense/Midas-pricer

## Uses

- **Primary use case:** Generating estimated retail prices for household appliances from textual descriptions.
- **Example applications:**
  - Assisting e-commerce teams in setting competitive price points
  - Supporting market analysis dashboards with on-the-fly price estimates
- **Not intended for:** Financial advice or investment decisions

### Out-of-Scope Use

- Attempting to predict prices outside the appliances domain (e.g., electronics, furniture, vehicles) will likely yield unreliable results.
- Using this model for any price-sensitive or regulatory decisions without human oversight is discouraged.

## Bias, Risks, and Limitations

- **Data biases:** The training dataset is drawn exclusively from Amazon appliance listings. Price distributions are skewed toward mid-range consumer appliances; extremely low- and high-end appliances are underrepresented.
- **Input sensitivity:** Minor changes in phrasing or additional noisy tokens can shift predictions noticeably.
- **Generalization:** The model does not understand supply chain disruptions, seasonality, or promotions; it only captures patterns seen in historical listing data.

### Recommendations

- Always validate model outputs against a small set of ground-truth prices before production deployment.
- Use this model as an assistant, not an oracle: incorporate downstream business rules or domain expertise.
- Regularly retrain or fine-tune on updated listing data to capture shifting market trends.

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load tokenizer and model (device_map places the model on GPU if available)
tokenizer = AutoTokenizer.from_pretrained("Recompense/Midas-pricer")
model = AutoModelForCausalLM.from_pretrained(
    "Recompense/Midas-pricer", torch_dtype=torch.bfloat16, device_map="auto"
)

# Prepare prompt
product_desc = "How much does this cost to the nearest dollar?\n\nSamsung 7kg top-load washing machine with digital inverter motor"
prompt = f"{product_desc}\n\nPrice is $"

# Tokenize and generate; the tokenizer already returns an attention mask
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
generated = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=3,
    num_return_sequences=1,
)

# Decode only the newly generated tokens, skipping the prompt
price_text = tokenizer.decode(
    generated[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)

print(f"Estimated price: ${price_text}")
```

## Training Details

### Training Data

- **Dataset:** [Recompense/amazon-appliances-lite-data](https://huggingface.co/datasets/Recompense/amazon-appliances-lite-data)
- **Train/validation/test split:** 80/10/10 (see the sketch below)
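
A sketch of reproducing an 80/10/10 split with 🤗 Datasets, assuming the dataset ships as a single `train` split (the original split procedure and seed are not published, so this is illustrative):

```python
from datasets import load_dataset

# Assumes a single "train" split; adjust if the dataset already
# provides validation/test splits.
ds = load_dataset("Recompense/amazon-appliances-lite-data", split="train")

# Carve off 20%, then halve the holdout into validation and test.
split = ds.train_test_split(test_size=0.2, seed=42)
holdout = split["test"].train_test_split(test_size=0.5, seed=42)

train_ds, val_ds, test_ds = split["train"], holdout["train"], holdout["test"]
print(len(train_ds), len(val_ds), len(test_ds))
```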

### Training Procedure

#### Training Hyperparameters

- **Fine-tuning framework:** PyTorch + Hugging Face Accelerate
- **Precision:** bf16 mixed precision
- **Batch size:** 1 sequence
- **Learning rate:** 1e-5 with linear warmup (10% of total steps)
- **Optimizer:** AdamW
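
One way to express these hyperparameters with `transformers.TrainingArguments`; the actual training script is not published, so treat this as an illustrative sketch (`output_dir` is hypothetical):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="midas-pricer-ft",      # hypothetical output path
    per_device_train_batch_size=1,     # batch size: 1 sequence
    learning_rate=1e-5,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,                  # linear warmup over 10% of total steps
    optim="adamw_torch",               # AdamW
    bf16=True,                         # bf16 mixed precision
)
```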

## Evaluation

### Testing Data, Factors & Metrics

- **Test set:** Held-out 10% of listings (≈5,000 examples)
- **Metric:** Root Mean Squared Logarithmic Error (RMSLE)
- **Hit@$40:** Percentage of predictions within ±$40 of the true price

| Metric  | Value |
|---------|-------|
| RMSLE   | 0.61  |
| Hit@$40 | 85.2% |
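
The exact evaluation script is not published; the sketch below shows how these metrics are conventionally computed (assuming the common `log1p` form of RMSLE):

```python
import numpy as np

def rmsle(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root Mean Squared Logarithmic Error (log1p form)."""
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))

def hit_at(y_true: np.ndarray, y_pred: np.ndarray, tol: float = 40.0) -> float:
    """Fraction of predictions within ±tol dollars of the true price."""
    return float(np.mean(np.abs(y_pred - y_true) <= tol))

# Toy example with made-up prices
y_true = np.array([199.0, 549.0, 89.0])
y_pred = np.array([215.0, 499.0, 120.0])
print(rmsle(y_true, y_pred), hit_at(y_true, y_pred))
```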

### Summary

The model achieves an RMSLE of 0.61, indicating good alignment between predicted and actual prices on a log scale, and correctly estimates within $40 in over 85% of test cases. This performance is competitive for rapid prototyping in price-sensitive applications.

## Environmental Impact

Approximate compute emissions for fine-tuning, estimated with the [ML CO₂ Impact calculator](https://mlco2.github.io/impact#compute):

- **Hardware:** Tesla T4
- **Duration:** 2 hours (0.06 epoch)
- **Cloud provider:** Google Cloud, region US Central
- **Estimated CO₂ emitted:** 6 kg CO₂e

## Technical Specifications

### Model Architecture

- **Base model:** Llama-3.1-8B-Instruct (8 billion parameters)
- **Objective:** Autoregressive language modeling with instruction tuning

### Compute Infrastructure

- **Hardware:** 4× Tesla T4 GPUs
- **Software:**
  - PyTorch 2.x
  - transformers 5.x
  - accelerate 1.x
  - bitsandbytes (optional, for 8-bit quantized inference; see the sketch below)
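
A sketch of optional 8-bit loading with bitsandbytes (requires a CUDA GPU; the quantization settings here are illustrative, not a published configuration):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Optional 8-bit inference via bitsandbytes
model = AutoModelForCausalLM.from_pretrained(
    "Recompense/Midas-pricer",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
```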

## Glossary

- **RMSLE (Root Mean Squared Logarithmic Error):** The square root of the average squared difference between log-transformed predictions and targets. Less sensitive to large absolute errors than plain RMSE.
- **Hit@$40:** Fraction of predictions whose absolute error is ≤ $40.
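
In formula form, using the common `log1p` convention (consistent with the evaluation sketch above):

$$
\mathrm{RMSLE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \bigl(\log(1+\hat{y}_i) - \log(1+y_i)\bigr)^2}
$$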

## Model Card Authors

Damola Jimoh (Recompense)