c-bone committed on
Commit f6e3cb4 · verified · 1 Parent(s): b81d54d

Upload tokenizer
README.md ADDED
@@ -0,0 +1,199 @@
+ ---
+ library_name: transformers
+ tags: []
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
spacegroups.txt ADDED
@@ -0,0 +1,227 @@
+ P6/mmm
+ Imma
+ P4_32_12
+ P4_2/mnm
+ Fd-3m
+ P3m1
+ P-3
+ P4mm
+ P4_332
+ P4/nnc
+ P2_12_12
+ Pnn2
+ Pbcn
+ P4_2/n
+ Cm
+ R3m
+ Cmce
+ Aea2
+ P-42_1m
+ P-42m
+ P2_13
+ R-3
+ Fm-3
+ Cmm2
+ Pn-3n
+ P6/mcc
+ P-6m2
+ P3_2
+ P-3m1
+ P3_212
+ I23
+ P-62m
+ P4_2nm
+ Pma2
+ Pmma
+ I-42m
+ P-31c
+ Pa-3
+ Pmmn
+ Pmmm
+ P4_2/ncm
+ I4/mcm
+ I-4m2
+ P3_1
+ Pcc2
+ Cmcm
+ I222
+ Fddd
+ P312
+ Cccm
+ P6_1
+ F-43c
+ P6_322
+ Pm-3
+ P3_121
+ P6_4
+ Ia-3d
+ Pm-3m
+ P2_1/c
+ C222_1
+ Pc
+ P4/n
+ Pba2
+ Ama2
+ Pbcm
+ P31m
+ Pcca
+ P222
+ P-43n
+ Pccm
+ P6_422
+ F23
+ P42_12
+ C222
+ Pnnn
+ P6_3cm
+ P4_12_12
+ P6/m
+ Fmm2
+ I4_1/a
+ P4/mbm
+ Pmn2_1
+ P4_2bc
+ P4_22_12
+ I-43d
+ I4/m
+ P4bm
+ Fdd2
+ P3
+ P6_122
+ Pnc2
+ P4_2/mcm
+ P4_122
+ Cmc2_1
+ P-6c2
+ R32
+ P4_1
+ P4_232
+ Pnna
+ P422
+ Pban
+ Cc
+ I4_122
+ P6_3/m
+ P6_3mc
+ I4_1/amd
+ P4_2
+ P4/nmm
+ Pmna
+ P4/m
+ Fm-3m
+ P4/mmm
+ Imm2
+ P4/ncc
+ P-62c
+ Ima2
+ P6_5
+ P2/c
+ P4/nbm
+ Ibam
+ P6_522
+ P6_3/mmc
+ I4/mmm
+ Fmmm
+ P2/m
+ P-4b2
+ I-4
+ C2/m
+ P4_2/mmc
+ P4
+ Fd-3c
+ P4_3
+ P2_1/m
+ I-43m
+ P-42c
+ F4_132
+ Pm
+ Pccn
+ P-4n2
+ P4_132
+ P23
+ I4cm
+ R3c
+ Amm2
+ Immm
+ Iba2
+ I4
+ Fd-3
+ P1
+ Pbam
+ P4_2/nbc
+ Im-3
+ P4_2/nnm
+ Pmc2_1
+ P-31m
+ R-3m
+ Ia-3
+ P622
+ F222
+ P2
+ P-1
+ Pmm2
+ P-4
+ Aem2
+ P6_222
+ P-3c1
+ P4_322
+ I422
+ Pnma
+ P6_3
+ P3c1
+ Pn-3
+ P4nc
+ P-6
+ P4/mcc
+ I2_12_12_1
+ P4_2/mbc
+ P31c
+ Ccc2
+ P4_2/nmc
+ P6_3/mcm
+ C2
+ Pbca
+ P-4c2
+ I4_1cd
+ P2_1
+ P3_112
+ P4_2mc
+ Pn-3m
+ C2/c
+ R3
+ P-43m
+ I432
+ P222_1
+ I-42d
+ I-4c2
+ P6cc
+ P6_2
+ P3_221
+ P321
+ Pca2_1
+ I4_1/acd
+ I4_132
+ F432
+ Pna2_1
+ Ccce
+ Ibca
+ P4/mnc
+ I4_1md
+ P2_12_12_1
+ R-3c
+ I2_13
+ P-4m2
+ Pm-3n
+ I4mm
+ F-43m
+ Pnnm
+ P-42_1c
+ Cmmm
+ P6mm
+ P4_2cm
+ P4_2/m
+ Im-3m
+ Fm-3c
+ I4_1
+ P4cc
+ Cmme
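The `_sg` entries in vocabulary.json list the space groups in the same order as this file. They start at id 143 (`P6/mmm_sg`) and end at id 369 (`Cmme_sg`). The sketch below rebuilds that mapping from the file's text. It assumes the offset of 143 and the `_sg` suffix are fixed conventions of this tokenizer; both are inferred from the vocabulary, not from any tokenizer source code. `spacegroup_token_ids` is a hypothetical helper.

```python
# Offset inferred from vocabulary.json: "P6/mmm_sg" (first line of spacegroups.txt) has id 143.
SG_OFFSET = 143


def spacegroup_token_ids(spacegroups_text: str, offset: int = SG_OFFSET) -> dict[str, int]:
    """Map each space group listed in spacegroups.txt to an assumed '<name>_sg' token id."""
    names = [line.strip() for line in spacegroups_text.splitlines() if line.strip()]
    return {f"{name}_sg": offset + index for index, name in enumerate(names)}


# The first three lines of spacegroups.txt.
ids = spacegroup_token_ids("P6/mmm\nImma\nP4_32_12\n")
print(ids)  # {'P6/mmm_sg': 143, 'Imma_sg': 144, 'P4_32_12_sg': 145}
```

Applied to all 227 lines of the file, this sketch produces ids 143 through 369, which matches the space-group block of vocabulary.json.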
special_tokens_map.json ADDED
@@ -0,0 +1,6 @@
+ {
+ "bos_token": "<bos>",
+ "eos_token": "<eos>",
+ "pad_token": "<pad>",
+ "unk_token": "<unk>"
+ }
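Every special token named here also appears at the end of vocabulary.json: `<unk>` is 370, `<pad>` 371, `<bos>` 372 and `<eos>` 373. The sketch below resolves each role to its id using only the standard library, not any tokenizer API. `special_token_ids` is a hypothetical helper, and the vocabulary is cut down to those four entries.

```python
import json

# Contents of special_tokens_map.json from this commit.
SPECIAL_TOKENS_MAP = json.loads(
    '{"bos_token": "<bos>", "eos_token": "<eos>", "pad_token": "<pad>", "unk_token": "<unk>"}'
)

# The last four entries of vocabulary.json.
VOCAB_TAIL = {"<unk>": 370, "<pad>": 371, "<bos>": 372, "<eos>": 373}


def special_token_ids(special_map: dict[str, str], vocab: dict[str, int]) -> dict[str, int]:
    """Resolve each role (bos_token, eos_token, ...) to its vocabulary id.

    Raises KeyError if a special token is missing from the vocabulary.
    """
    return {role: vocab[token] for role, token in special_map.items()}


print(special_token_ids(SPECIAL_TOKENS_MAP, VOCAB_TAIL))
# {'bos_token': 372, 'eos_token': 373, 'pad_token': 371, 'unk_token': 370}
```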
tokenizer_config.json CHANGED
@@ -1 +1 @@
- {"name": "CIFTokenizer", "unk_token": "<unk>"}
+ {"unk_token": "<unk>", "vocab_size": 374, "tokenizer_class": "CustomCIFTokenizer"}
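The new config declares `vocab_size: 374`, which matches vocabulary.json: its ids run without gaps from 0 (`Si`) to 373 (`<eos>`). The sketch below is a minimal consistency check. It assumes ids are meant to be dense and zero-based. `check_vocab` is a hypothetical helper, not part of the `CustomCIFTokenizer` class, whose code is not in this commit.

```python
import json

# New contents of tokenizer_config.json from this commit.
CONFIG = json.loads(
    '{"unk_token": "<unk>", "vocab_size": 374, "tokenizer_class": "CustomCIFTokenizer"}'
)


def check_vocab(config: dict, vocab: dict[str, int]) -> None:
    """Raise ValueError unless vocab ids are exactly 0..vocab_size-1 and unk_token is present."""
    size = config["vocab_size"]
    if sorted(vocab.values()) != list(range(size)):
        raise ValueError(f"expected ids 0..{size - 1}, got {len(vocab)} entries")
    if config["unk_token"] not in vocab:
        raise ValueError(f"unk_token {config['unk_token']!r} missing from vocabulary")


# Usage with the real file (path assumed relative to the repository root):
# with open("vocabulary.json", encoding="utf-8") as f:
#     check_vocab(CONFIG, json.load(f))
```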
vocabulary.json ADDED
@@ -0,0 +1 @@
+ {"Si": 0, "C": 1, "Pb": 2, "I": 3, "Br": 4, "Cl": 5, "Eu": 6, "O": 7, "Fe": 8, "Sb": 9, "In": 10, "S": 11, "N": 12, "U": 13, "Mn": 14, "Lu": 15, "Se": 16, "Tl": 17, "Hf": 18, "Ir": 19, "Ca": 20, "Ta": 21, "Cr": 22, "K": 23, "Pm": 24, "Mg": 25, "Zn": 26, "Cu": 27, "Sn": 28, "Ti": 29, "B": 30, "W": 31, "P": 32, "H": 33, "Pd": 34, "As": 35, "Co": 36, "Np": 37, "Tc": 38, "Hg": 39, "Pu": 40, "Al": 41, "Tm": 42, "Tb": 43, "Ho": 44, "Nb": 45, "Ge": 46, "Zr": 47, "Cd": 48, "V": 49, "Sr": 50, "Ni": 51, "Rh": 52, "Th": 53, "Na": 54, "Ru": 55, "La": 56, "Re": 57, "Y": 58, "Er": 59, "Ce": 60, "Pt": 61, "Ga": 62, "Li": 63, "Cs": 64, "F": 65, "Ba": 66, "Te": 67, "Mo": 68, "Gd": 69, "Pr": 70, "Bi": 71, "Sc": 72, "Ag": 73, "Rb": 74, "Dy": 75, "Yb": 76, "Nd": 77, "Au": 78, "Os": 79, "Pa": 80, "Sm": 81, "Be": 82, "Ac": 83, "Xe": 84, "Kr": 85, "He": 86, "Ne": 87, "Ar": 88, "0": 89, "1": 90, "2": 91, "3": 92, "4": 93, "5": 94, "6": 95, "7": 96, "8": 97, "9": 98, "_cell_length_b": 99, "_atom_site_occupancy": 100, "_atom_site_attached_hydrogens": 101, "_cell_length_a": 102, "_cell_angle_beta": 103, "_symmetry_equiv_pos_as_xyz": 104, "_cell_angle_gamma": 105, "_atom_site_fract_x": 106, "_symmetry_space_group_name_H-M": 107, "_symmetry_Int_Tables_number": 108, "_chemical_formula_structural": 109, "_chemical_name_systematic": 110, "_atom_site_fract_y": 111, "_atom_site_symmetry_multiplicity": 112, "_chemical_formula_sum": 113, "_atom_site_label": 114, "_atom_site_type_symbol": 115, "_cell_length_c": 116, "_atom_site_B_iso_or_equiv": 117, "_symmetry_equiv_pos_site_id": 118, "_cell_volume": 119, "_atom_site_fract_z": 120, "_cell_angle_alpha": 121, "_cell_formula_units_Z": 122, "loop_": 123, "data_": 124, "_atom_type_symbol": 125, "_atom_type_electronegativity": 126, "_atom_type_radius": 127, "_atom_type_ionic_radius": 128, "_atom_type_oxidation_number": 129, "x": 130, "y": 131, "z": 132, ".": 133, "(": 134, ")": 135, "+": 136, "-": 137, "/": 138, "'": 139, ",": 140, " ": 141, "\n": 142, 
"P6/mmm_sg": 143, "Imma_sg": 144, "P4_32_12_sg": 145, "P4_2/mnm_sg": 146, "Fd-3m_sg": 147, "P3m1_sg": 148, "P-3_sg": 149, "P4mm_sg": 150, "P4_332_sg": 151, "P4/nnc_sg": 152, "P2_12_12_sg": 153, "Pnn2_sg": 154, "Pbcn_sg": 155, "P4_2/n_sg": 156, "Cm_sg": 157, "R3m_sg": 158, "Cmce_sg": 159, "Aea2_sg": 160, "P-42_1m_sg": 161, "P-42m_sg": 162, "P2_13_sg": 163, "R-3_sg": 164, "Fm-3_sg": 165, "Cmm2_sg": 166, "Pn-3n_sg": 167, "P6/mcc_sg": 168, "P-6m2_sg": 169, "P3_2_sg": 170, "P-3m1_sg": 171, "P3_212_sg": 172, "I23_sg": 173, "P-62m_sg": 174, "P4_2nm_sg": 175, "Pma2_sg": 176, "Pmma_sg": 177, "I-42m_sg": 178, "P-31c_sg": 179, "Pa-3_sg": 180, "Pmmn_sg": 181, "Pmmm_sg": 182, "P4_2/ncm_sg": 183, "I4/mcm_sg": 184, "I-4m2_sg": 185, "P3_1_sg": 186, "Pcc2_sg": 187, "Cmcm_sg": 188, "I222_sg": 189, "Fddd_sg": 190, "P312_sg": 191, "Cccm_sg": 192, "P6_1_sg": 193, "F-43c_sg": 194, "P6_322_sg": 195, "Pm-3_sg": 196, "P3_121_sg": 197, "P6_4_sg": 198, "Ia-3d_sg": 199, "Pm-3m_sg": 200, "P2_1/c_sg": 201, "C222_1_sg": 202, "Pc_sg": 203, "P4/n_sg": 204, "Pba2_sg": 205, "Ama2_sg": 206, "Pbcm_sg": 207, "P31m_sg": 208, "Pcca_sg": 209, "P222_sg": 210, "P-43n_sg": 211, "Pccm_sg": 212, "P6_422_sg": 213, "F23_sg": 214, "P42_12_sg": 215, "C222_sg": 216, "Pnnn_sg": 217, "P6_3cm_sg": 218, "P4_12_12_sg": 219, "P6/m_sg": 220, "Fmm2_sg": 221, "I4_1/a_sg": 222, "P4/mbm_sg": 223, "Pmn2_1_sg": 224, "P4_2bc_sg": 225, "P4_22_12_sg": 226, "I-43d_sg": 227, "I4/m_sg": 228, "P4bm_sg": 229, "Fdd2_sg": 230, "P3_sg": 231, "P6_122_sg": 232, "Pnc2_sg": 233, "P4_2/mcm_sg": 234, "P4_122_sg": 235, "Cmc2_1_sg": 236, "P-6c2_sg": 237, "R32_sg": 238, "P4_1_sg": 239, "P4_232_sg": 240, "Pnna_sg": 241, "P422_sg": 242, "Pban_sg": 243, "Cc_sg": 244, "I4_122_sg": 245, "P6_3/m_sg": 246, "P6_3mc_sg": 247, "I4_1/amd_sg": 248, "P4_2_sg": 249, "P4/nmm_sg": 250, "Pmna_sg": 251, "P4/m_sg": 252, "Fm-3m_sg": 253, "P4/mmm_sg": 254, "Imm2_sg": 255, "P4/ncc_sg": 256, "P-62c_sg": 257, "Ima2_sg": 258, "P6_5_sg": 259, "P2/c_sg": 260, "P4/nbm_sg": 
261, "Ibam_sg": 262, "P6_522_sg": 263, "P6_3/mmc_sg": 264, "I4/mmm_sg": 265, "Fmmm_sg": 266, "P2/m_sg": 267, "P-4b2_sg": 268, "I-4_sg": 269, "C2/m_sg": 270, "P4_2/mmc_sg": 271, "P4_sg": 272, "Fd-3c_sg": 273, "P4_3_sg": 274, "P2_1/m_sg": 275, "I-43m_sg": 276, "P-42c_sg": 277, "F4_132_sg": 278, "Pm_sg": 279, "Pccn_sg": 280, "P-4n2_sg": 281, "P4_132_sg": 282, "P23_sg": 283, "I4cm_sg": 284, "R3c_sg": 285, "Amm2_sg": 286, "Immm_sg": 287, "Iba2_sg": 288, "I4_sg": 289, "Fd-3_sg": 290, "P1_sg": 291, "Pbam_sg": 292, "P4_2/nbc_sg": 293, "Im-3_sg": 294, "P4_2/nnm_sg": 295, "Pmc2_1_sg": 296, "P-31m_sg": 297, "R-3m_sg": 298, "Ia-3_sg": 299, "P622_sg": 300, "F222_sg": 301, "P2_sg": 302, "P-1_sg": 303, "Pmm2_sg": 304, "P-4_sg": 305, "Aem2_sg": 306, "P6_222_sg": 307, "P-3c1_sg": 308, "P4_322_sg": 309, "I422_sg": 310, "Pnma_sg": 311, "P6_3_sg": 312, "P3c1_sg": 313, "Pn-3_sg": 314, "P4nc_sg": 315, "P-6_sg": 316, "P4/mcc_sg": 317, "I2_12_12_1_sg": 318, "P4_2/mbc_sg": 319, "P31c_sg": 320, "Ccc2_sg": 321, "P4_2/nmc_sg": 322, "P6_3/mcm_sg": 323, "C2_sg": 324, "Pbca_sg": 325, "P-4c2_sg": 326, "I4_1cd_sg": 327, "P2_1_sg": 328, "P3_112_sg": 329, "P4_2mc_sg": 330, "Pn-3m_sg": 331, "C2/c_sg": 332, "R3_sg": 333, "P-43m_sg": 334, "I432_sg": 335, "P222_1_sg": 336, "I-42d_sg": 337, "I-4c2_sg": 338, "P6cc_sg": 339, "P6_2_sg": 340, "P3_221_sg": 341, "P321_sg": 342, "Pca2_1_sg": 343, "I4_1/acd_sg": 344, "I4_132_sg": 345, "F432_sg": 346, "Pna2_1_sg": 347, "Ccce_sg": 348, "Ibca_sg": 349, "P4/mnc_sg": 350, "I4_1md_sg": 351, "P2_12_12_1_sg": 352, "R-3c_sg": 353, "I2_13_sg": 354, "P-4m2_sg": 355, "Pm-3n_sg": 356, "I4mm_sg": 357, "F-43m_sg": 358, "Pnnm_sg": 359, "P-42_1c_sg": 360, "Cmmm_sg": 361, "P6mm_sg": 362, "P4_2cm_sg": 363, "P4_2/m_sg": 364, "Im-3m_sg": 365, "Fm-3c_sg": 366, "I4_1_sg": 367, "P4cc_sg": 368, "Cmme_sg": 369, "<unk>": 370, "<pad>": 371, "<bos>": 372, "<eos>": 373}
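The vocabulary is laid out in blocks:

- element symbols (ids 0-88)
- the digits `0` to `9` (89-98)
- CIF tags such as `_cell_length_a`, `loop_` and `data_` (99-129)
- single characters, including space and newline (130-142)
- space-group tokens (143-369)
- the four special tokens (370-373)

This commit does not include the tokenizer's code, so how `CustomCIFTokenizer` actually splits text is not shown here. As a purely hypothetical illustration, here is a greedy longest-match encoder over an excerpt of this vocabulary. Note that it cannot emit the `_sg` tokens from raw CIF text, since a bare `P6/mmm` never contains the `_sg` suffix; those tokens presumably need separate handling.

```python
# Excerpt of vocabulary.json (ids copied from the file above).
VOCAB_EXCERPT = {
    "Si": 0, "O": 7, "S": 11,
    "1": 90, "2": 91, "3": 92, "5": 94,
    "_cell_length_a": 102,
    ".": 133, " ": 141, "\n": 142,
    "<unk>": 370,
}


def greedy_encode(text: str, vocab: dict[str, int], unk: str = "<unk>") -> list[int]:
    """Hypothetical greedy longest-match encoder; not the CustomCIFTokenizer algorithm."""
    max_len = max(len(token) for token in vocab)
    ids = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so "Si" wins over "S".
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                ids.append(vocab[piece])
                i += length
                break
        else:
            # No vocabulary entry matches here: emit <unk> and skip one character.
            ids.append(vocab[unk])
            i += 1
    return ids


print(greedy_encode("_cell_length_a 3.25\n", VOCAB_EXCERPT))
# [102, 141, 92, 133, 91, 94, 142]
```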