Commit 1ce788a by arvindkaphley (verified) · 1 Parent(s): f62bbdd

Update README.md

  1. README.md +2 -0
README.md CHANGED
@@ -40,10 +40,12 @@ Ruby Code Generator is a versatile tool crafted to streamline the interaction be
  - Load the bigcode/the-stack-smol dataset using the Hugging Face Datasets library.
  - Filter for the specified subset (data/ruby) and split (train).
  - Load the bigcode/starcoder2-3b model from the Hugging Face Hub with '4-bit' quantization.
+
  **2. Data Preprocessing:**
  - Tokenize the code text using the appropriate tokenizer for the chosen model.
  - Apply necessary cleaning or normalization (e.g., removing comments, handling indentation).
  - Create input examples suitable for the model's architecture (e.g., with masked language modeling objectives).
+
  **3. Configure Training:**
  - Initialize a Trainer object (likely from a library like Transformers).
  - Set training arguments based on the provided args:
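
For readers of this hunk, the "cleaning or normalization" bullet under Data Preprocessing can be illustrated with a minimal sketch. It is not taken from the repository: the function name `clean_ruby_source` and the parameter `indent_width` are hypothetical, and the rules it applies are an assumption about what such a step might do. Those rules are: drop full-line `#` comments, expand tabs, trim trailing whitespace, and collapse runs of blank lines. Inline comments and `=begin`/`=end` blocks are deliberately left alone, since removing them safely would need a real Ruby parser.

```python
def clean_ruby_source(code: str, indent_width: int = 2) -> str:
    """Hypothetical cleaning pass for Ruby source text (illustrative only).

    - removes lines that consist solely of a `#` comment (shebang lines are kept)
    - expands tabs to `indent_width`-column tab stops and trims trailing whitespace
    - collapses consecutive blank lines into a single blank line
    """
    cleaned = []
    for line in code.splitlines():
        stripped = line.lstrip()
        # Skip full-line comments, but keep a shebang such as `#!/usr/bin/env ruby`.
        if stripped.startswith("#") and not stripped.startswith("#!"):
            continue
        line = line.expandtabs(indent_width).rstrip()
        # Avoid stacking several empty lines in a row.
        if line == "" and cleaned and cleaned[-1] == "":
            continue
        cleaned.append(line)
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    sample = "# greet the user\ndef hi\n\tputs 'hi'\nend\n"
    print(clean_ruby_source(sample), end="")
```

Whether comments should be stripped at all is a design choice. For a code-generation model they can carry useful signal, so a pass like this would typically sit behind a flag rather than always run.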