Tags: Safetensors, qwen2, biology, bioinformatics, single-cell
Commit 5926c03 by Fangyinfff · verified · 1 Parent(s): 78ed312

Update README.md

Files changed (1): README.md (+61 -3)
@@ -1,3 +1,61 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ ---
+
+
+ ## 🔬 How to Run Inference
+
+ The following example shows how to use `ncbi/Cell-o1` with structured input for reasoning-based cell type annotation.
+ The model expects both a system message and a user prompt containing multiple cells and candidate cell types.
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
+
+ # 1. Load the model and tokenizer from the Hugging Face Hub
+ model_name = "ncbi/Cell-o1"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name)
+ generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
+
+ # 2. A minimal batch example with 3 cells and 3 candidate types
+ example = {
+     "system_msg": (
+         "You are an expert assistant specialized in cell type annotation. "
+         "You will be given a batch of N cells from the same donor, where each cell represents a unique cell type. "
+         "For each cell, the top expressed genes are provided in descending order of expression. "
+         "Using both the gene expression data and donor information, determine the correct cell type for each cell. "
+         "You will also receive a list of N candidate cell types, and each candidate must be assigned to exactly one cell. "
+         "Ensure that you consider all cells and candidate types together, rather than annotating each cell individually. "
+         "Include your detailed reasoning within <think> and </think> tags, and provide your final answer within <answer> and </answer> tags. "
+         "The final answer should be a single string listing the assigned cell types in order, separated by ' | '."
+     ),
+
+     "user_msg": (
+         "Context: The cell is from a female at the 73-year-old stage, originating from the lung. The patient has been diagnosed with chronic obstructive pulmonary disease. The patient is a smoker. There is no cancer present. \n\n"
+         "Cell 1: MT2A, ACTB, MT1X, MTATP6P29, MYL9, MTND4LP30, CRIP1, DSTN, MTND2P13, MTCO2P22, S100A6, MTCYBP19, MALAT1, VIM, RPLP1, RGS5, TPT1, LGALS1, TPM2, MTND3P6, MTND1P22, PTMA, TMSB4X, STEAP1B, MT1M, LPP, RPL21\n"
+         "Cell 2: MALAT1, FTL, MTCO2P22, TMSB4X, B2M, MTND4LP30, IL6ST, RPS19, RBFOX2, CCSER1, RPL41, RPS27, RPL10, ACTB, MTATP6P29, MTND2P13, RPS12, STEAP1B, RPL13A, S100A4, RPL34, TMSB10, RPL28, RPL32, RPL39, RPL13\n"
+         "Cell 3: SCGB3A1, SCGB1A1, SLPI, WFDC2, TPT1, MTCO2P22, B2M, RPS18, RPS4X, RPS6, MTND4LP30, RPL34, RPS14, RPL31, STEAP1B, LCN2, RPLP1, IL6ST, S100A6, RPL21, RPL37A, ADGRL3, RPL37, RBFOX2, RPL41, RARRES1, RPL19\n\n"
+         "Match the cells above to one of the following cell types:\n"
+         "non-classical monocyte\nepithelial cell of lung\nsmooth muscle cell"
+     )
+ }
+
+ # 3. Convert to chat-style messages
+ messages = [
+     {"role": "system", "content": example["system_msg"]},
+     {"role": "user", "content": example["user_msg"]}
+ ]
+
+ # 4. Run inference
+ response = generator(
+     messages,
+     max_new_tokens=1000,  # increase if your reasoning chain is longer
+     do_sample=False       # deterministic decoding
+ )[0]["generated_text"]
+
+ # 5. Print the model’s reply (<think> + <answer>)
+ assistant_reply = response[-1]["content"] if isinstance(response, list) else response
+ print(assistant_reply)
+
+
+ ```
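
The system message in the example asks the model to put its final answer inside `<answer>` tags as a single string separated by ' | '. A minimal sketch of turning such a reply into a per-cell list could look like the following. The helper name `parse_cell_o1_reply` is hypothetical and not part of the model repository, and `sample_reply` is a hand-written placeholder rather than real model output.

```python
import re


def parse_cell_o1_reply(reply: str) -> list[str]:
    """Return the cell types listed in the last <answer>...</answer> block.

    Assumes the reply follows the format requested in the system message;
    returns an empty list when no <answer> block is found.
    """
    matches = re.findall(r"<answer>(.*?)</answer>", reply, flags=re.DOTALL)
    if not matches:
        return []
    # Labels are expected in cell order, separated by ' | '
    return [label.strip() for label in matches[-1].split("|") if label.strip()]


# Hand-written placeholder reply, used only to exercise the parser
sample_reply = (
    "<think>illustrative reasoning placeholder</think>\n"
    "<answer>smooth muscle cell | non-classical monocyte | epithelial cell of lung</answer>"
)

predicted = parse_cell_o1_reply(sample_reply)
for index, cell_type in enumerate(predicted, start=1):
    print(f"Cell {index}: {cell_type}")
```

In practice the string passed to the helper would be `assistant_reply` from the inference example. Checking that the number of parsed labels matches the number of cells in the prompt is a cheap guard against truncated generations.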