levalencia committed on
Commit 32af2de · 1 Parent(s): c77b1c5

feat: add validation and cleaning methods for unique combinations and extracted values

- Introduced _validate_and_clean_combinations method in UniqueIndicesCombinator to validate and clean unique combinations against possible values, enhancing data integrity.
- Added _validate_and_clean_values method in UniqueIndicesLoopAgent for similar validation and cleaning of extracted values.
- Improved logging to provide detailed insights during the validation process, including exact matches and substring matches.
- Updated relevant sections to ensure cleaned combinations and values are returned, maintaining consistency with structured output requirements.
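The validation step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function name `validate_and_clean_values`, its parameters, and the policy of dropping ambiguous or unmatched values are all assumptions made for the example. It tries an exact match first, falls back to a substring match in either direction, and logs which path was taken.

```python
import logging
from typing import Iterable, List

logger = logging.getLogger(__name__)


def validate_and_clean_values(
    extracted: Iterable[str], possible_values: Iterable[str]
) -> List[str]:
    """Hypothetical helper: map each extracted value onto a known possible value.

    Assumed policy (not taken from the commit): exact matches win, a substring
    match is accepted only when it is unambiguous, everything else is dropped.
    """
    possible = list(possible_values)
    cleaned: List[str] = []
    for value in extracted:
        candidate = value.strip()
        if candidate in possible:
            logger.info("exact match: %r", candidate)
            match = candidate
        else:
            # Substring match in either direction against the possible values.
            hits = [
                p for p in possible
                if candidate and (candidate in p or p in candidate)
            ]
            if len(hits) != 1:
                logger.warning("dropping %r (%d candidates)", candidate, len(hits))
                continue
            match = hits[0]
            logger.info("substring match: %r -> %r", candidate, match)
        # Keep the output unique while preserving first-seen order.
        if match not in cleaned:
            cleaned.append(match)
    return cleaned
```

With possible values such as `aFH.07_T0` and `L5-H12_T4w` (sample IDs that appear in the logged tables), a padded or decorated extraction like `" L5-H12_T4w "` would be normalised to the known value, while an ambiguous fragment such as `L5-H12` would be dropped rather than guessed.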

logs/di_content/di_content_20250619_142032_tables.html ADDED
@@ -0,0 +1,1441 @@
1
+ <!DOCTYPE html>
2
+ <html>
3
+ <head>
4
+ <title>Azure DI Tables</title>
5
+ <style>
6
+ body { font-family: Arial, sans-serif; margin: 20px; }
7
+ .table-container { margin-bottom: 40px; }
8
+ h2 { color: #333; }
9
+ table { border-collapse: collapse; width: 100%; margin-bottom: 10px; }
10
+ th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
11
+ th { background-color: #f5f5f5; }
12
+ hr { border: none; border-top: 2px solid #ccc; margin: 20px 0; }
13
+ </style>
14
+ </head>
15
+ <body>
16
+ <h1>Azure Document Intelligence Tables</h1>
17
+
18
+ <div class="table-container">
19
+ <h2>Table 1</h2>
20
+ <table border="1">
21
+ <tr>
22
+ <td>l Sales quote:</td>
23
+ <td>SQ20202722</td>
24
+ </tr>
25
+ <tr>
26
+ <td>l Project code:</td>
27
+ <td>P3016</td>
28
+ </tr>
29
+ <tr>
30
+ <td>l LNB number:</td>
31
+ <td>2023.050</td>
32
+ </tr>
33
+ <tr>
34
+ <td>l Project responsible:</td>
35
+ <td>Nathan Cardon</td>
36
+ </tr>
37
+ <tr>
38
+ <td>l Report name:</td>
39
+ <td>P3016_R11_v00</td>
40
+ </tr>
41
+ </table>
42
+ <hr>
43
+ </div>
44
+
45
+ <div class="table-container">
46
+ <h2>Table 2</h2>
47
+ <table border="1">
48
+ <tr>
49
+ <td>Test sample ID client</td>
50
+ <td>Test sample ID RIC</td>
51
+ <td>Protein concentration (mg/ML)</td>
52
+ </tr>
53
+ <tr>
54
+ <td>P066_FH0.7-0-hulgG-LALAPG-FJB</td>
55
+ <td>aFH0.7_T0</td>
56
+ <td>1.0</td>
57
+ </tr>
58
+ <tr>
59
+ <td>P066_FH0.7-0-hulgG-LALAPG-FJB</td>
60
+ <td>aFH.07_T4W</td>
61
+ <td>1.0</td>
62
+ </tr>
63
+ <tr>
64
+ <td>P066_FHR-1.3B4_0-hulgG-LALAPG-FJB</td>
65
+ <td>FHR-1.3B4_T0</td>
66
+ <td>1.0</td>
67
+ </tr>
68
+ <tr>
69
+ <td>P066_FHR-1.3B4_0-hulgG-LALAPG-FJB</td>
70
+ <td>FHR-1.3B4_T4W</td>
71
+ <td>1.0</td>
72
+ </tr>
73
+ <tr>
74
+ <td>P066_L5_H12_0-hulgG-LALAPG-FJB</td>
75
+ <td>L5_H12_T0</td>
76
+ <td>1.0</td>
77
+ </tr>
78
+ <tr>
79
+ <td>P066_L5_H12_0-hulgG-LALAPG-FJB</td>
80
+ <td>L5_H12_T4W</td>
81
+ <td>1.0</td>
82
+ </tr>
83
+ <tr>
84
+ <td>P066_L5_H31-0-hulgG-LALAPG-FJB</td>
85
+ <td>L5_H31_T0</td>
86
+ <td>1.0</td>
87
+ </tr>
88
+ <tr>
89
+ <td>P066_L5_H31-0-hulgG-LALAPG-FJB</td>
90
+ <td>L5_H31_T4W</td>
91
+ <td>1.0</td>
92
+ </tr>
93
+ <tr>
94
+ <td>P066_L14_H12_0-hulgG-LALAPG-FJB</td>
95
+ <td>L14_H12_T0</td>
96
+ <td>1.0</td>
97
+ </tr>
98
+ <tr>
99
+ <td>P066_L14_H12_0-hulgG-LALAPG-FJB</td>
100
+ <td>L14_H12_T4W</td>
101
+ <td>1.0</td>
102
+ </tr>
103
+ <tr>
104
+ <td>P066_L14_H31_0-hulgG-LALAPG-FJB</td>
105
+ <td>L14_H31_T0</td>
106
+ <td>1.0</td>
107
+ </tr>
108
+ <tr>
109
+ <td>P066_L14_H31_0-hulgG-LALAPG-FJB</td>
110
+ <td>L14-H31_T4W</td>
111
+ <td>1.0</td>
112
+ </tr>
113
+ </table>
114
+ <hr>
115
+ </div>
116
+
117
+ <div class="table-container">
118
+ <h2>Table 3</h2>
119
+ <table border="1">
120
+ <tr>
121
+ <td></td>
122
+ <td>aFH.07_T0</td>
123
+ <td>aFH.07_T4W</td>
124
+ </tr>
125
+ <tr>
126
+ <td>G0-GlcNAc</td>
127
+ <td>5.0%</td>
128
+ <td>4.5%</td>
129
+ </tr>
130
+ <tr>
131
+ <td>Man5</td>
132
+ <td>56.1%</td>
133
+ <td>56.3%</td>
134
+ </tr>
135
+ <tr>
136
+ <td>Man6</td>
137
+ <td>17.6%</td>
138
+ <td>17.4%</td>
139
+ </tr>
140
+ <tr>
141
+ <td>Man7</td>
142
+ <td>20.7%</td>
143
+ <td>21.6%</td>
144
+ </tr>
145
+ <tr>
146
+ <td>Man8</td>
147
+ <td>0.6%</td>
148
+ <td>0.2%</td>
149
+ </tr>
150
+ </table>
151
+ <hr>
152
+ </div>
153
+
154
+ <div class="table-container">
155
+ <h2>Table 4</h2>
156
+ <table border="1">
157
+ <tr>
158
+ <td></td>
159
+ <td>aFH.07_T0</td>
160
+ <td>aFH.07_T4W</td>
161
+ </tr>
162
+ <tr>
163
+ <td>Unknown peak</td>
164
+ <td>0.6%</td>
165
+ <td>1.3%</td>
166
+ </tr>
167
+ <tr>
168
+ <td>HC [G0F/G0] - 2*GlcNAc</td>
169
+ <td>1.5%</td>
170
+ <td>2.0%</td>
171
+ </tr>
172
+ <tr>
173
+ <td>HC [Man5-Man5]</td>
174
+ <td>16.7%</td>
175
+ <td>16.5%</td>
176
+ </tr>
177
+ <tr>
178
+ <td>HC [G0F-Man5]</td>
179
+ <td>10.9%</td>
180
+ <td>11.9%</td>
181
+ </tr>
182
+ <tr>
183
+ <td>HC [G0F/G0] - GlcNAc</td>
184
+ <td>16.5%</td>
185
+ <td>17.2%</td>
186
+ </tr>
187
+ <tr>
188
+ <td>HC [G0F/G0]</td>
189
+ <td>6.5%</td>
190
+ <td>6.0%</td>
191
+ </tr>
192
+ <tr>
193
+ <td>HC [G0F/G0F]</td>
194
+ <td>35.5%</td>
195
+ <td>33.8%</td>
196
+ </tr>
197
+ <tr>
198
+ <td>HC [G0F/G1F]</td>
199
+ <td>6.5%</td>
200
+ <td>5.9%</td>
201
+ </tr>
202
+ <tr>
203
+ <td>HC [G1F/G1F] or HC [G0F/G2F]</td>
204
+ <td>5.0%</td>
205
+ <td>4.8%</td>
206
+ </tr>
207
+ <tr>
208
+ <td>HC [G1F/G2F]</td>
209
+ <td>0.3%</td>
210
+ <td>0.6%</td>
211
+ </tr>
212
+ </table>
213
+ <hr>
214
+ </div>
215
+
216
+ <div class="table-container">
217
+ <h2>Table 5</h2>
218
+ <table border="1">
219
+ <tr>
220
+ <td>Sequence</td>
221
+ <td>Sequence location</td>
222
+ <td>Modification</td>
223
+ <td>Relative abundance</td>
224
+ <td>Relative abundance</td>
225
+ </tr>
226
+ <tr>
227
+ <td>Sequence</td>
228
+ <td>Sequence location</td>
229
+ <td>Modification</td>
230
+ <td>aFH.07_T0</td>
231
+ <td>aFH.07_T4W</td>
232
+ </tr>
233
+ <tr>
234
+ <td>QIVLSQSPTFLSASPGEK</td>
235
+ <td>LC (001-018)</td>
236
+ <td>pyroQ</td>
237
+ <td>86.8%</td>
238
+ <td>99.7%</td>
239
+ </tr>
240
+ <tr>
241
+ <td>QIVLSQSPTFLSASPGEK</td>
242
+ <td>LC (001-018)</td>
243
+ <td></td>
244
+ <td>13.2%</td>
245
+ <td>0.3%</td>
246
+ </tr>
247
+ <tr>
248
+ <td>QVQLQQSGPGLVQPSQSLSITCTVSDFSLAR</td>
249
+ <td>HC (001-031)</td>
250
+ <td>pyroQ</td>
251
+ <td>90.0%</td>
252
+ <td>100.0%</td>
253
+ </tr>
254
+ <tr>
255
+ <td>QVQLQQSGPGLVQPSQSLSITCTVSDFSLAR</td>
256
+ <td>HC (001-031)</td>
257
+ <td></td>
258
+ <td>10.0%</td>
259
+ <td>n.d</td>
260
+ </tr>
261
+ </table>
262
+ <hr>
263
+ </div>
264
+
265
+ <div class="table-container">
266
+ <h2>Table 6</h2>
267
+ <table border="1">
268
+ <tr>
269
+ <td>Sequence</td>
270
+ <td>Sequence location</td>
271
+ <td>Modification</td>
272
+ <td>Relative abundance</td>
273
+ <td>Relative abundance</td>
274
+ </tr>
275
+ <tr>
276
+ <td>Sequence</td>
277
+ <td>Sequence location</td>
278
+ <td>Modification</td>
279
+ <td>aFH.07_T0</td>
280
+ <td>aFH.07_T4W</td>
281
+ </tr>
282
+ <tr>
283
+ <td>YMHWYQQKPGASPKPWIFATSNLASGVPAR</td>
284
+ <td>LC (31-60)</td>
285
+ <td>Oxidation [+16 Da]</td>
286
+ <td>0.9%</td>
287
+ <td>1.0%</td>
288
+ </tr>
289
+ <tr>
290
+ <td>YMHWYQQKPGASPKPWIFATSNLASGVPAR</td>
291
+ <td>LC (31-60)</td>
292
+ <td></td>
293
+ <td>99.1%</td>
294
+ <td>99.0%</td>
295
+ </tr>
296
+ </table>
297
+ <hr>
298
+ </div>
299
+
300
+ <div class="table-container">
301
+ <h2>Table 7</h2>
302
+ <table border="1">
303
+ <tr>
304
+ <td>Sequence</td>
305
+ <td>Sequence location</td>
306
+ <td>Modification</td>
307
+ <td>Relative abundance</td>
308
+ <td>Relative abundance</td>
309
+ </tr>
310
+ <tr>
311
+ <td>Sequence</td>
312
+ <td>Sequence location</td>
313
+ <td>Modification</td>
314
+ <td>aFH.07_T0</td>
315
+ <td>aFH.07_T4W</td>
316
+ </tr>
317
+ <tr>
318
+ <td>LNINKDNSK</td>
319
+ <td>HC (72-75)</td>
320
+ <td></td>
321
+ <td>99.5%</td>
322
+ <td>98.9%</td>
323
+ </tr>
324
+ <tr>
325
+ <td>LNINKDNSK</td>
326
+ <td>HC (72-75)</td>
327
+ <td>Deamidation</td>
328
+ <td>0.5%</td>
329
+ <td>1.1%</td>
330
+ </tr>
331
+ </table>
332
+ <hr>
333
+ </div>
334
+
335
+ <div class="table-container">
336
+ <h2>Table 8</h2>
337
+ <table border="1">
338
+ <tr>
339
+ <td>Sequence</td>
340
+ <td>Sequence location</td>
341
+ <td>Modification</td>
342
+ <td>Relative abundance</td>
343
+ <td>Relative abundance</td>
344
+ </tr>
345
+ <tr>
346
+ <td>Sequence</td>
347
+ <td>Sequence location</td>
348
+ <td>Modification</td>
349
+ <td>aFH.07_T0</td>
350
+ <td>aFH.07_T4W</td>
351
+ </tr>
352
+ <tr>
353
+ <td>VEAEDAATYYCQQWSIIPPTFGNGTK</td>
354
+ <td>LC (77-102)</td>
355
+ <td>GO-GICNAc</td>
356
+ <td>2.6%</td>
357
+ <td>4.0%</td>
358
+ </tr>
359
+ <tr>
360
+ <td>VEAEDAATYYCQQWSIIPPTFGNGTK</td>
361
+ <td>LC (77-102)</td>
362
+ <td>Man5</td>
363
+ <td>54.9%</td>
364
+ <td>57.3%</td>
365
+ </tr>
366
+ <tr>
367
+ <td>VEAEDAATYYCQQWSIIPPTFGNGTK</td>
368
+ <td>LC (77-102)</td>
369
+ <td>Man6</td>
370
+ <td>21.1%</td>
371
+ <td>18.8%</td>
372
+ </tr>
373
+ <tr>
374
+ <td>VEAEDAATYYCQQWSIIPPTFGNGTK</td>
375
+ <td>LC (77-102)</td>
376
+ <td>Man7</td>
377
+ <td>21.4%</td>
378
+ <td>20.0%</td>
379
+ </tr>
380
+ </table>
381
+ <hr>
382
+ </div>
383
+
384
+ <div class="table-container">
385
+ <h2>Table 9</h2>
386
+ <table border="1">
387
+ <tr>
388
+ <td>Sequence</td>
389
+ <td>Sequence location</td>
390
+ <td>Modification</td>
391
+ <td>Relative abundance</td>
392
+ <td>Relative abundance</td>
393
+ </tr>
394
+ <tr>
395
+ <td>Sequence</td>
396
+ <td>Sequence location</td>
397
+ <td>Modification</td>
398
+ <td>aFH.07_T0</td>
399
+ <td>aFH.07_T4W</td>
400
+ </tr>
401
+ <tr>
402
+ <td>MNSLQANDTAIYYCAR</td>
403
+ <td>HC (82-97)</td>
404
+ <td>Non glycosylated</td>
405
+ <td>n.d</td>
406
+ <td>n.d</td>
407
+ </tr>
408
+ <tr>
409
+ <td>MNSLQANDTAIYYCAR</td>
410
+ <td>HC (82-97)</td>
411
+ <td>G0F-GlcNAc</td>
412
+ <td>16.3%</td>
413
+ <td>20.8%</td>
414
+ </tr>
415
+ <tr>
416
+ <td>MNSLQANDTAIYYCAR</td>
417
+ <td>HC (82-97)</td>
418
+ <td>G0</td>
419
+ <td>4.2%</td>
420
+ <td>3.7%</td>
421
+ </tr>
422
+ <tr>
423
+ <td>MNSLQANDTAIYYCAR</td>
424
+ <td>HC (82-97)</td>
425
+ <td>G0F</td>
426
+ <td>36.5%</td>
427
+ <td>34.0%</td>
428
+ </tr>
429
+ <tr>
430
+ <td>MNSLQANDTAIYYCAR</td>
431
+ <td>HC (82-97)</td>
432
+ <td>G1F</td>
433
+ <td>4.9%</td>
434
+ <td>5.1%</td>
435
+ </tr>
436
+ <tr>
437
+ <td>MNSLQANDTAIYYCAR</td>
438
+ <td>HC (82-97)</td>
439
+ <td>G2F</td>
440
+ <td>5.7%</td>
441
+ <td>4.8%</td>
442
+ </tr>
443
+ <tr>
444
+ <td>MNSLQANDTAIYYCAR</td>
445
+ <td>HC (82-97)</td>
446
+ <td>Man5</td>
447
+ <td>32.4%</td>
448
+ <td>31.5%</td>
449
+ </tr>
450
+ </table>
451
+ <hr>
452
+ </div>
453
+
454
+ <div class="table-container">
455
+ <h2>Table 10</h2>
456
+ <table border="1">
457
+ <tr>
458
+ <td>Sequence</td>
459
+ <td>Sequence location</td>
460
+ <td>Modification</td>
461
+ <td>Relative abundance</td>
462
+ <td>Relative abundance</td>
463
+ </tr>
464
+ <tr>
465
+ <td>Sequence</td>
466
+ <td>Sequence location</td>
467
+ <td>Modification</td>
468
+ <td>aFH.07_T0</td>
469
+ <td>aFH.07_T4W</td>
470
+ </tr>
471
+ <tr>
472
+ <td>EEQYNSTYR</td>
473
+ <td>HC (293-301)</td>
474
+ <td>Non glycosylated</td>
475
+ <td>n.d</td>
476
+ <td>n.d</td>
477
+ </tr>
478
+ <tr>
479
+ <td>EEQYNSTYR</td>
480
+ <td>HC (293-301)</td>
481
+ <td>Man5</td>
482
+ <td>20.9%</td>
483
+ <td>22.5%</td>
484
+ </tr>
485
+ <tr>
486
+ <td>EEQYNSTYR</td>
487
+ <td>HC (293-301)</td>
488
+ <td>G0</td>
489
+ <td>n.D</td>
490
+ <td>n.d</td>
491
+ </tr>
492
+ <tr>
493
+ <td>EEQYNSTYR</td>
494
+ <td>HC (293-301)</td>
495
+ <td>G0F</td>
496
+ <td>79.1%</td>
497
+ <td>77.5%</td>
498
+ </tr>
499
+ <tr>
500
+ <td>EEQYNSTYR</td>
501
+ <td>HC (293-301)</td>
502
+ <td>G1F</td>
503
+ <td>n.d</td>
504
+ <td>n.d</td>
505
+ </tr>
506
+ <tr>
507
+ <td>EEQYNSTYR</td>
508
+ <td>HC (293-301)</td>
509
+ <td>G2F</td>
510
+ <td>n.d</td>
511
+ <td>n.d</td>
512
+ </tr>
513
+ </table>
514
+ <hr>
515
+ </div>
516
+
517
+ <div class="table-container">
518
+ <h2>Table 11</h2>
519
+ <table border="1">
520
+ <tr>
521
+ <td>Sequence</td>
522
+ <td>Sequence location</td>
523
+ <td>Modification</td>
524
+ <td>Relative abundance*</td>
525
+ <td>Relative abundance*</td>
526
+ </tr>
527
+ <tr>
528
+ <td>Sequence</td>
529
+ <td>Sequence location</td>
530
+ <td>Modification</td>
531
+ <td>aFH.07_T0</td>
532
+ <td>aFH.07_T4W</td>
533
+ </tr>
534
+ <tr>
535
+ <td>STSGGTAALGCLVK</td>
536
+ <td>HC (134-147)</td>
537
+ <td></td>
538
+ <td>99.9%</td>
539
+ <td>98.8%</td>
540
+ </tr>
541
+ <tr>
542
+ <td>GTAALGCLVK</td>
543
+ <td>HC (134-147)</td>
544
+ <td>Clipping</td>
545
+ <td>0.1%</td>
546
+ <td>1.2%</td>
547
+ </tr>
548
+ </table>
549
+ <hr>
550
+ </div>
551
+
552
+ <div class="table-container">
553
+ <h2>Table 12</h2>
554
+ <table border="1">
555
+ <tr>
556
+ <td>Blue:</td>
557
+ <td>VH and VL</td>
558
+ </tr>
559
+ <tr>
560
+ <td>Blue:</td>
561
+ <td>CDR</td>
562
+ </tr>
563
+ <tr>
564
+ <td>Green:</td>
565
+ <td>N-glycosylation site</td>
566
+ </tr>
567
+ </table>
568
+ <hr>
569
+ </div>
570
+
571
+ <div class="table-container">
572
+ <h2>Table 13</h2>
573
+ <table border="1">
574
+ <tr>
575
+ <td>Sequence</td>
576
+ <td>Sequence location</td>
577
+ <td>Modification</td>
578
+ <td>Relative abundance</td>
579
+ <td>Relative abundance</td>
580
+ </tr>
581
+ <tr>
582
+ <td>Sequence</td>
583
+ <td>Sequence location</td>
584
+ <td>Modification</td>
585
+ <td>FHR-1.3B4_T0</td>
586
+ <td>FHR-1.3B4_T4W</td>
587
+ </tr>
588
+ <tr>
589
+ <td>QIVLSQSPTILSASPGEK</td>
590
+ <td>LC (1-18)</td>
591
+ <td>pyro Q</td>
592
+ <td>96.1%</td>
593
+ <td>100.0%</td>
594
+ </tr>
595
+ <tr>
596
+ <td>QIVLSQSPTILSASPGEK</td>
597
+ <td>LC (1-18)</td>
598
+ <td></td>
599
+ <td>3.9%</td>
600
+ <td>n.d</td>
601
+ </tr>
602
+ <tr>
603
+ <td>QVQLR</td>
604
+ <td>HC (1-5)</td>
605
+ <td>pyro Q</td>
606
+ <td>96.7%</td>
607
+ <td>100.0%</td>
608
+ </tr>
609
+ <tr>
610
+ <td>QVQLR</td>
611
+ <td>HC (1-5)</td>
612
+ <td></td>
613
+ <td>3.3%</td>
614
+ <td>n.d</td>
615
+ </tr>
616
+ </table>
617
+ <hr>
618
+ </div>
619
+
620
+ <div class="table-container">
621
+ <h2>Table 14</h2>
622
+ <table border="1">
623
+ <tr>
624
+ <td>Sequence</td>
625
+ <td>Sequence location</td>
626
+ <td>Modification</td>
627
+ <td>Relative abundance</td>
628
+ <td>Relative abundance</td>
629
+ </tr>
630
+ <tr>
631
+ <td>Sequence</td>
632
+ <td>Sequence location</td>
633
+ <td>Modification</td>
634
+ <td>FHR-1.3B4_T0</td>
635
+ <td>FHR-1.3B4_T4W</td>
636
+ </tr>
637
+ <tr>
638
+ <td>MNSLQADDTAIYYCAR</td>
639
+ <td>HC (82-97)</td>
640
+ <td></td>
641
+ <td>99.3%</td>
642
+ <td>99.0%</td>
643
+ </tr>
644
+ <tr>
645
+ <td>MNSLQADDTAIYYCAR</td>
646
+ <td>HC (82-97)</td>
647
+ <td>Ox [+ 16 Da]</td>
648
+ <td>0.7%</td>
649
+ <td>1.0%</td>
650
+ </tr>
651
+ </table>
652
+ <hr>
653
+ </div>
654
+
655
+ <div class="table-container">
656
+ <h2>Table 15</h2>
657
+ <table border="1">
658
+ <tr>
659
+ <td>Sequence</td>
660
+ <td>Sequence location</td>
661
+ <td>Modification</td>
662
+ <td>Relative abundance</td>
663
+ <td>Relative abundance</td>
664
+ </tr>
665
+ <tr>
666
+ <td>Sequence</td>
667
+ <td>Sequence location</td>
668
+ <td>Modification</td>
669
+ <td>FHR-1.3B4_T0</td>
670
+ <td>FHR-1.3B4_T4W</td>
671
+ </tr>
672
+ <tr>
673
+ <td>MNSLQADDTAIYYCAR</td>
674
+ <td>HC (82-97)</td>
675
+ <td></td>
676
+ <td>97.6%</td>
677
+ <td>79.7%</td>
678
+ </tr>
679
+ <tr>
680
+ <td>MNSLQADDTAIYYCAR</td>
681
+ <td>HC (82-97)</td>
682
+ <td>Deamidation</td>
683
+ <td>2.4%</td>
684
+ <td>20.3%</td>
685
+ </tr>
686
+ </table>
687
+ <hr>
688
+ </div>
689
+
690
+ <div class="table-container">
691
+ <h2>Table 16</h2>
692
+ <table border="1">
693
+ <tr>
694
+ <td>Sequence</td>
695
+ <td>Sequence location</td>
696
+ <td>Modification</td>
697
+ <td>Relative abundance*</td>
698
+ <td>Relative abundance*</td>
699
+ </tr>
700
+ <tr>
701
+ <td>Sequence</td>
702
+ <td>Sequence location</td>
703
+ <td>Modification</td>
704
+ <td>FHR-1.3B4_T0</td>
705
+ <td>FHR-1.3B4_T4W</td>
706
+ </tr>
707
+ <tr>
708
+ <td>STSGGTAALGCLVK</td>
709
+ <td>HC (134-147)</td>
710
+ <td></td>
711
+ <td>99.9%</td>
712
+ <td>98.7%</td>
713
+ </tr>
714
+ <tr>
715
+ <td>GTAALGCLVK</td>
716
+ <td>HC (134-147)</td>
717
+ <td>Clipping</td>
718
+ <td>0.1%</td>
719
+ <td>1.3%</td>
720
+ </tr>
721
+ <tr>
722
+ <td>SSSNPLTFGAGTK</td>
723
+ <td>LC (91-103)</td>
724
+ <td></td>
725
+ <td>99.5%</td>
726
+ <td>97.3%</td>
727
+ </tr>
728
+ <tr>
729
+ <td>PLTFGAGTK</td>
730
+ <td>LC (91-103)</td>
731
+ <td>Clipping</td>
732
+ <td>0.5%</td>
733
+ <td>2.7%</td>
734
+ </tr>
735
+ </table>
736
+ <hr>
737
+ </div>
738
+
739
+ <div class="table-container">
740
+ <h2>Table 17</h2>
741
+ <table border="1">
742
+ <tr>
743
+ <td>Blue:</td>
744
+ <td>VH and VL</td>
745
+ </tr>
746
+ <tr>
747
+ <td>Blue:</td>
748
+ <td>CDR</td>
749
+ </tr>
750
+ <tr>
751
+ <td>Green:</td>
752
+ <td>N-glycosylation site</td>
753
+ </tr>
754
+ </table>
755
+ <hr>
756
+ </div>
757
+
758
+ <div class="table-container">
759
+ <h2>Table 18</h2>
760
+ <table border="1">
761
+ <tr>
762
+ <td>Sequence</td>
763
+ <td>Sequence location</td>
764
+ <td>Modification</td>
765
+ <td>Relative abundance</td>
766
+ <td>Relative abundance</td>
767
+ </tr>
768
+ <tr>
769
+ <td>Sequence</td>
770
+ <td>Sequence location</td>
771
+ <td>Modification</td>
772
+ <td>L5-H12_T0</td>
773
+ <td>L5-H12_T4w</td>
774
+ </tr>
775
+ <tr>
776
+ <td>QVQLQESGPGLVKPSQTLSLTCTVSGFSLTNYGVYWIR</td>
777
+ <td>HC (001-038)</td>
778
+ <td>pyro Q</td>
779
+ <td>85.5%</td>
780
+ <td>99.3%</td>
781
+ </tr>
782
+ <tr>
783
+ <td>QVQLQESGPGLVKPSQTLSLTCTVSGFSLTNYGVYWIR</td>
784
+ <td>HC (001-038)</td>
785
+ <td></td>
786
+ <td>14.5%</td>
787
+ <td>0.7%</td>
788
+ </tr>
789
+ </table>
790
+ <hr>
791
+ </div>
792
+
793
+ <div class="table-container">
794
+ <h2>Table 19</h2>
795
+ <table border="1">
796
+ <tr>
797
+ <td>Sequence</td>
798
+ <td>Sequence location</td>
799
+ <td>Modification</td>
800
+ <td>Relative abundance*</td>
801
+ <td>Relative abundance*</td>
802
+ </tr>
803
+ <tr>
804
+ <td>Sequence</td>
805
+ <td>Sequence location</td>
806
+ <td>Modification</td>
807
+ <td>L5-H12_T0</td>
808
+ <td>L5-H12_T4w</td>
809
+ </tr>
810
+ <tr>
811
+ <td>STSGGTAALGCLVK</td>
812
+ <td>HC (134-147)</td>
813
+ <td></td>
814
+ <td>99.9%</td>
815
+ <td>98.7%</td>
816
+ </tr>
817
+ <tr>
818
+ <td>GTAALGCLVK</td>
819
+ <td>HC (134-147)</td>
820
+ <td>Clipping</td>
821
+ <td>0.1%</td>
822
+ <td>1.3%</td>
823
+ </tr>
824
+ <tr>
825
+ <td>SSSNPLTFGAGTK</td>
826
+ <td>LC (91-103)</td>
827
+ <td></td>
828
+ <td>99.8%</td>
829
+ <td>98.9%</td>
830
+ </tr>
831
+ <tr>
832
+ <td>PLTFGAGTK</td>
833
+ <td>LC (91-103)</td>
834
+ <td>Clipping</td>
835
+ <td>0.2%</td>
836
+ <td>1.1%</td>
837
+ </tr>
838
+ </table>
839
+ <hr>
840
+ </div>
841
+
842
+ <div class="table-container">
843
+ <h2>Table 20</h2>
844
+ <table border="1">
845
+ <tr>
846
+ <td>Blue:</td>
847
+ <td>VH and VL</td>
848
+ </tr>
849
+ <tr>
850
+ <td>Blue:</td>
851
+ <td>CDR</td>
852
+ </tr>
853
+ <tr>
854
+ <td>Green:</td>
855
+ <td>N-glycosylation site</td>
856
+ </tr>
857
+ </table>
858
+ <hr>
859
+ </div>
860
+
861
+ <div class="table-container">
862
+ <h2>Table 21</h2>
863
+ <table border="1">
864
+ <tr>
865
+ <td>Sequence</td>
866
+ <td>Sequence location</td>
867
+ <td>Modification</td>
868
+ <td>Relative abundance</td>
869
+ <td>Relative abundance</td>
870
+ </tr>
871
+ <tr>
872
+ <td>Sequence</td>
873
+ <td>Sequence location</td>
874
+ <td>Modification</td>
875
+ <td>L5-H31_T0</td>
876
+ <td>L5-H31_T4w</td>
877
+ </tr>
878
+ <tr>
879
+ <td>QVQLQESGPGLVKPSQTLSLTCTVSGFSLTNYGVYWIR</td>
880
+ <td>HC (001-038)</td>
881
+ <td>pyro Q</td>
882
+ <td>83.5%</td>
883
+ <td>99.5%</td>
884
+ </tr>
885
+ <tr>
886
+ <td>QVQLQESGPGLVKPSQTLSLTCTVSGFSLTNYGVYWIR</td>
887
+ <td>HC (001-038)</td>
888
+ <td></td>
889
+ <td>16.5%</td>
890
+ <td>0.5%</td>
891
+ </tr>
892
+ </table>
893
+ <hr>
894
+ </div>
895
+
896
+ <div class="table-container">
897
+ <h2>Table 22</h2>
898
+ <table border="1">
899
+ <tr>
900
+ <td>Sequence</td>
901
+ <td>Sequence location</td>
902
+ <td>Modification</td>
903
+ <td>Relative abundance</td>
904
+ <td>Relative abundance</td>
905
+ </tr>
906
+ <tr>
907
+ <td>Sequence</td>
908
+ <td>Sequence location</td>
909
+ <td>Modification</td>
910
+ <td>L5-H31_T0</td>
911
+ <td>L5-H31_T4w</td>
912
+ </tr>
913
+ <tr>
914
+ <td>NFGNYAMDFWGQGTSVTVSSASTK</td>
915
+ <td>HC(98-121)</td>
916
+ <td>Ox. [+ 16 Da]</td>
917
+ <td>4.9%</td>
918
+ <td>1.9%</td>
919
+ </tr>
920
+ <tr>
921
+ <td>NFGNYAMDFWGQGTSVTVSSASTK</td>
922
+ <td>HC(98-121)</td>
923
+ <td></td>
924
+ <td>95.1%</td>
925
+ <td>98.1%</td>
926
+ </tr>
927
+ </table>
928
+ <hr>
929
+ </div>
930
+
931
+ <div class="table-container">
932
+ <h2>Table 23</h2>
933
+ <table border="1">
934
+ <tr>
935
+ <td>Sequence</td>
936
+ <td>Sequence location</td>
937
+ <td>Modification</td>
938
+ <td>Relative abundance</td>
939
+ <td>Relative abundance</td>
940
+ </tr>
941
+ <tr>
942
+ <td>Sequence</td>
943
+ <td>Sequence location</td>
944
+ <td>Modification</td>
945
+ <td>L5-H31_T0</td>
946
+ <td>L5-H31_T4w</td>
947
+ </tr>
948
+ <tr>
949
+ <td>SSSNPLTFGAGTK</td>
950
+ <td>LC (91-103)</td>
951
+ <td></td>
952
+ <td>99.8%</td>
953
+ <td>99.5%</td>
954
+ </tr>
955
+ <tr>
956
+ <td>SSSNPLTFGAGTK</td>
957
+ <td>LC (91-103)</td>
958
+ <td>deamidation</td>
959
+ <td>0.2%</td>
960
+ <td>0.5%</td>
961
+ </tr>
962
+ </table>
963
+ <hr>
964
+ </div>
965
+
966
+ <div class="table-container">
967
+ <h2>Table 24</h2>
968
+ <table border="1">
969
+ <tr>
970
+ <td>Sequence</td>
971
+ <td>Sequence location</td>
972
+ <td>Modification</td>
973
+ <td>Relative abundance*</td>
974
+ <td>Relative abundance*</td>
975
+ </tr>
976
+ <tr>
977
+ <td>Sequence</td>
978
+ <td>Sequence location</td>
979
+ <td>Modification</td>
980
+ <td>L5-H31_T0</td>
981
+ <td>L5-H31_T4w</td>
982
+ </tr>
983
+ <tr>
984
+ <td>STSGGTAALGCLVK</td>
985
+ <td>HC (134-147)</td>
986
+ <td></td>
987
+ <td>99.9%</td>
988
+ <td>98.8%</td>
989
+ </tr>
990
+ <tr>
991
+ <td>GTAALGCLVK</td>
992
+ <td>HC (134-147)</td>
993
+ <td>Clipping</td>
994
+ <td>0.1%</td>
995
+ <td>1.2%</td>
996
+ </tr>
997
+ <tr>
998
+ <td>SSSNPLTFGAGTK</td>
999
+ <td>LC (91-103)</td>
1000
+ <td></td>
1001
+ <td>99.9%</td>
1002
+ <td>98.8%</td>
1003
+ </tr>
1004
+ <tr>
1005
+ <td>PLTFGAGTK</td>
1006
+ <td>LC (91-103)</td>
1007
+ <td>Clipping</td>
1008
+ <td>0.1%</td>
1009
+ <td>1.2%</td>
1010
+ </tr>
1011
+ </table>
1012
+ <hr>
1013
+ </div>
1014
+
1015
+ <div class="table-container">
1016
+ <h2>Table 25</h2>
1017
+ <table border="1">
1018
+ <tr>
1019
+ <td>Blue:</td>
1020
+ <td>VH and VL</td>
1021
+ </tr>
1022
+ <tr>
1023
+ <td>Blue:</td>
1024
+ <td>CDR</td>
1025
+ </tr>
1026
+ <tr>
1027
+ <td>Green:</td>
1028
+ <td>N-glycosylation site</td>
1029
+ </tr>
1030
+ </table>
1031
+ <hr>
1032
+ </div>
1033
+
1034
+ <div class="table-container">
1035
+ <h2>Table 26</h2>
1036
+ <table border="1">
1037
+ <tr>
1038
+ <td>Sequence</td>
1039
+ <td>Sequence location</td>
1040
+ <td>Modification</td>
1041
+ <td>Relative abundance</td>
1042
+ <td>Relative abundance</td>
1043
+ </tr>
1044
+ <tr>
1045
+ <td>Sequence</td>
1046
+ <td>Sequence location</td>
1047
+ <td>Modification</td>
1048
+ <td>L14-H12_T0</td>
1049
+ <td>L14-H12_T4w</td>
1050
+ </tr>
1051
+ <tr>
1052
+ <td>QVQLQESGPGLVKPSQTLSLTCTVSGFSLTNYGVYWIR</td>
1053
+ <td>HC(001-038)</td>
1054
+ <td>pyroQ</td>
1055
+ <td>85.9%</td>
1056
+ <td>99.3%</td>
1057
+ </tr>
1058
+ <tr>
1059
+ <td>QVQLQESGPGLVKPSQTLSLTCTVSGFSLTNYGVYWIR</td>
1060
+ <td>HC(001-038)</td>
1061
+ <td></td>
1062
+ <td>14.1%</td>
1063
+ <td>0.7%</td>
1064
+ </tr>
1065
+ </table>
1066
+ <hr>
1067
+ </div>
1068
+
1069
+ <div class="table-container">
1070
+ <h2>Table 27</h2>
1071
+ <table border="1">
1072
+ <tr>
1073
+ <td>Sequence</td>
1074
+ <td>Sequence location</td>
1075
+ <td>Modification</td>
1076
+ <td>Relative abundance</td>
1077
+ <td>Relative abundance</td>
1078
+ </tr>
1079
+ <tr>
1080
+ <td>Sequence</td>
1081
+ <td>Sequence location</td>
1082
+ <td>Modification</td>
1083
+ <td>L14-H12_T0</td>
1084
+ <td>L14-H12_T4w</td>
1085
+ </tr>
1086
+ <tr>
1087
+ <td>ASTSVTYMHWYQQKPGK</td>
1088
+ <td>LC(25-41)</td>
1089
+ <td>Ox. [+16 Da]</td>
1090
+ <td>0.3%</td>
1091
+ <td>0.3%</td>
1092
+ </tr>
1093
+ <tr>
1094
+ <td>ASTSVTYMHWYQQKPGK</td>
1095
+ <td>LC(25-41)</td>
1096
+ <td></td>
1097
+ <td>99.7%</td>
1098
+ <td>99.7%</td>
1099
+ </tr>
1100
+ </table>
1101
+ <hr>
1102
+ </div>
1103
+
1104
+ <div class="table-container">
1105
+ <h2>Table 28</h2>
1106
+ <table border="1">
1107
+ <tr>
1108
+ <td>Sequence</td>
1109
+ <td>Sequence location</td>
1110
+ <td>Modification</td>
1111
+ <td>Relative abundance</td>
1112
+ <td>Relative abundance</td>
1113
+ </tr>
1114
+ <tr>
1115
+ <td>Sequence</td>
1116
+ <td>Sequence location</td>
1117
+ <td>Modification</td>
1118
+ <td>L14-H12_T0</td>
1119
+ <td>L14-H12_T4w</td>
1120
+ </tr>
1121
+ <tr>
1122
+ <td>SSSNPLTFGAGTK</td>
1123
+ <td>LC (91-103)</td>
1124
+ <td></td>
1125
+ <td>99.9%</td>
1126
+ <td>99.4%</td>
1127
+ </tr>
1128
+ <tr>
1129
+ <td>SSSNPLTFGAGTK</td>
1130
+ <td>LC (91-103)</td>
1131
+ <td>deamidation</td>
1132
+ <td>0.1%</td>
1133
+ <td>0.6%</td>
1134
+ </tr>
1135
+ </table>
1136
+ <hr>
1137
+ </div>
1138
+
1139
+ <div class="table-container">
1140
+ <h2>Table 29</h2>
1141
+ <table border="1">
1142
+ <tr>
1143
+ <td>Sequence</td>
1144
+ <td>Sequence location</td>
1145
+ <td>Modification</td>
1146
+ <td>Relative abundance*</td>
1147
+ <td>Relative abundance*</td>
1148
+ </tr>
1149
+ <tr>
1150
+ <td>Sequence</td>
1151
+ <td>Sequence location</td>
1152
+ <td>Modification</td>
1153
+ <td>L14-H12_T0</td>
1154
+ <td>L14-H12_T4w</td>
1155
+ </tr>
1156
+ <tr>
1157
+ <td>STSGGTAALGCLVK</td>
1158
+ <td>HC (134-147)</td>
1159
+ <td></td>
1160
+ <td>99.9%</td>
1161
+ <td>98.9%</td>
1162
+ </tr>
1163
+ <tr>
1164
+ <td>GTAALGCLVK</td>
1165
+ <td>HC (134-147)</td>
1166
+ <td>Clipping</td>
1167
+ <td>0.1%</td>
1168
+ <td>1.1%</td>
1169
+ </tr>
1170
+ <tr>
1171
+ <td>SSSNPLTFGAGTK</td>
1172
+ <td>LC (91-103)</td>
1173
+ <td></td>
1174
+ <td>99.7%</td>
1175
+ <td>98.6%</td>
1176
+ </tr>
1177
+ <tr>
1178
+ <td>PLTFGAGTK</td>
1179
+ <td>LC (91-103)</td>
1180
+ <td>Clipping</td>
1181
+ <td>0.3%</td>
1182
+ <td>1.4%</td>
1183
+ </tr>
1184
+ </table>
1185
+ <hr>
1186
+ </div>
1187
+
1188
+ <div class="table-container">
1189
+ <h2>Table 30</h2>
1190
+ <table border="1">
1191
+ <tr>
1192
+ <td>Blue:</td>
1193
+ <td>VH and VL</td>
1194
+ </tr>
1195
+ <tr>
1196
+ <td>Blue:</td>
1197
+ <td>CDR</td>
1198
+ </tr>
1199
+ <tr>
1200
+ <td>Green:</td>
1201
+ <td>N-glycosylation site</td>
1202
+ </tr>
1203
+ </table>
1204
+ <hr>
1205
+ </div>
1206
+
1207
+ <div class="table-container">
1208
+ <h2>Table 31</h2>
1209
+ <table border="1">
1210
+ <tr>
1211
+ <td>Sequence</td>
1212
+ <td>Sequence location</td>
1213
+ <td>Modification</td>
1214
+ <td>Relative abundance</td>
1215
+ <td>Relative abundance</td>
1216
+ </tr>
1217
+ <tr>
1218
+ <td>Sequence</td>
1219
+ <td>Sequence location</td>
1220
+ <td>Modification</td>
1221
+ <td>L14-H31_T0</td>
1222
+ <td>L14-H31_T4w</td>
1223
+ </tr>
1224
+ <tr>
1225
+ <td>QVQLQESGPGLVKPSQTLSLTCTVSGFSLTNYGVYWIR</td>
1226
+ <td>HC(001-038)</td>
1227
+ <td>pyroQ</td>
1228
+ <td>82.6%</td>
1229
+ <td>100.0%</td>
1230
+ </tr>
1231
+ <tr>
1232
+ <td>QVQLQESGPGLVKPSQTLSLTCTVSGFSLTNYGVYWIR</td>
1233
+ <td>HC(001-038)</td>
1234
+ <td></td>
1235
+ <td>17.4%</td>
1236
+ <td>n.d</td>
1237
+ </tr>
1238
+ </table>
1239
+ <hr>
1240
+ </div>
1241
+
1242
+ <div class="table-container">
1243
+ <h2>Table 32</h2>
1244
+ <table border="1">
1245
+ <tr>
1246
+ <td>Sequence</td>
1247
+ <td>Sequence location</td>
1248
+ <td>Modification</td>
1249
+ <td>Relative abundance</td>
1250
+ <td>Relative abundance</td>
1251
+ </tr>
1252
+ <tr>
1253
+ <td>Sequence</td>
1254
+ <td>Sequence location</td>
1255
+ <td>Modification</td>
1256
+ <td>L14-H31_T0</td>
1257
+ <td>L14-H31_T4w</td>
1258
+ </tr>
1259
+ <tr>
1260
+ <td>ASTSVTYMHWYQQKPGK</td>
1261
+ <td>LC(25-41)</td>
1262
+ <td>Ox. [+16 Da]</td>
1263
+ <td>0.5%</td>
1264
+ <td>0.4%</td>
1265
+ </tr>
1266
+ <tr>
1267
+ <td>ASTSVTYMHWYQQKPGK</td>
1268
+ <td>LC(25-41)</td>
1269
+ <td></td>
1270
+ <td>99.5%</td>
1271
+ <td>99.6%</td>
1272
+ </tr>
1273
+ </table>
1274
+ <hr>
1275
+ </div>
1276
+
1277
+ <div class="table-container">
1278
+ <h2>Table 33</h2>
1279
+ <table border="1">
1280
+ <tr>
1281
+ <td>Sequence</td>
1282
+ <td>Sequence location</td>
1283
+ <td>Modification</td>
1284
+ <td>Relative abundance</td>
1285
+ <td>Relative abundance</td>
1286
+ </tr>
1287
+ <tr>
1288
+ <td>Sequence</td>
1289
+ <td>Sequence location</td>
1290
+ <td>Modification</td>
1291
+ <td>L14-H31_T0</td>
1292
+ <td>L14-H31_T4w</td>
1293
+ </tr>
1294
+ <tr>
1295
+ <td>SSSNPLTFGAGTK</td>
1296
+ <td>LC (91-103)</td>
1297
+ <td></td>
1298
+ <td>99.9%</td>
1299
+ <td>99.5%</td>
1300
+ </tr>
1301
+ <tr>
1302
+ <td>SSSNPLTFGAGTK</td>
1303
+ <td>LC (91-103)</td>
1304
+ <td>deamidation</td>
1305
+ <td>0.1%</td>
1306
+ <td>0.5%</td>
1307
+ </tr>
1308
+ </table>
1309
+ <hr>
1310
+ </div>
1311
+
1312
+ <div class="table-container">
1313
+ <h2>Table 34</h2>
1314
+ <table border="1">
1315
+ <tr>
1316
+ <td>Sequence</td>
1317
+ <td>Sequence location</td>
1318
+ <td>Modification</td>
1319
+ <td>Relative abundance*</td>
1320
+ <td>Relative abundance*</td>
1321
+ </tr>
1322
+ <tr>
1323
+ <td>Sequence</td>
1324
+ <td>Sequence location</td>
1325
+ <td>Modification</td>
1326
+ <td>L14-H31_T0</td>
1327
+ <td>L14-H31_T4w</td>
1328
+ </tr>
1329
+ <tr>
1330
+ <td>STSGGTAALGCLVK</td>
1331
+ <td>HC (134-147)</td>
1332
+ <td></td>
1333
+ <td>99.9%</td>
1334
+ <td>98.9%</td>
1335
+ </tr>
1336
+ <tr>
1337
+ <td>GTAALGCLVK</td>
1338
+ <td>HC (134-147)</td>
1339
+ <td>Clipping</td>
1340
+ <td>0.1%</td>
1341
+ <td>1.1%</td>
1342
+ </tr>
1343
+ <tr>
1344
+ <td>SSSNPLTFGAGTK</td>
1345
+ <td>LC (91-103)</td>
1346
+ <td></td>
1347
+ <td>99.7%</td>
1348
+ <td>98.4%</td>
1349
+ </tr>
1350
+ <tr>
1351
+ <td>PLTFGAGTK</td>
1352
+ <td>LC (91-103)</td>
1353
+ <td>Clipping</td>
1354
+ <td>0.3%</td>
1355
+ <td>1.6%</td>
1356
+ </tr>
1357
+ </table>
1358
+ <hr>
1359
+ </div>
1360
+
1361
+ <div class="table-container">
1362
+ <h2>Table 35</h2>
1363
+ <table border="1">
1364
+ <tr>
1365
+ <td>Blue:</td>
1366
+ <td>VH and VL</td>
1367
+ </tr>
1368
+ <tr>
1369
+ <td>Blue:</td>
1370
+ <td>CDR</td>
1371
+ </tr>
1372
+ <tr>
1373
+ <td>Green:</td>
1374
+ <td>N-glycosylation site</td>
1375
+ </tr>
1376
+ </table>
1377
+ <hr>
1378
+ </div>
1379
+
1380
+ <div class="table-container">
1381
+ <h2>Table 36</h2>
1382
+ <table border="1">
1383
+ <tr>
1384
+ <td>Nathan Cardon</td>
1385
+ <td>Date:</td>
1386
+ </tr>
1387
+ <tr>
1388
+ <td>Sr Research Associate</td>
1389
+ <td>Signature:</td>
1390
+ </tr>
1391
+ <tr>
1392
+ <td>Mabelle Meersseman</td>
1393
+ <td>Date:</td>
1394
+ </tr>
1395
+ <tr>
1396
+ <td>Group Leader</td>
1397
+ <td>Signature:</td>
1398
+ </tr>
1399
+ <tr>
1400
+ <td>Approver</td>
1401
+ <td></td>
1402
+ </tr>
1403
+ <tr>
1404
+ <td>Koen Sandra Ph.D.</td>
1405
+ <td>Date:</td>
1406
+ </tr>
1407
+ <tr>
1408
+ <td>CEO</td>
1409
+ <td>Signature:</td>
1410
+ </tr>
1411
+ </table>
1412
+ <hr>
1413
+ </div>
1414
+
1415
+ <div class="table-container">
1416
+ <h2>Table 37</h2>
1417
+ <table border="1">
1418
+ <tr>
1419
+ <td>Version</td>
1420
+ <td>Date of issue</td>
1421
+ <td>Reason for version update</td>
1422
+ </tr>
1423
+ <tr>
1424
+ <td>00</td>
1425
+ <td>25NOV24</td>
1426
+ <td>Draft</td>
1427
+ </tr>
1428
+ <tr>
1429
+ <td></td>
1430
+ <td></td>
1431
+ <td></td>
1432
+ </tr>
1433
+ <tr>
1434
+ <td></td>
1435
+ <td></td>
1436
+ <td></td>
1437
+ </tr>
1438
+ </table>
1439
+ <hr>
1440
+ </div>
1441
+ </body></html>
src/agents/unique_indices_combinator.py CHANGED
@@ -106,6 +106,76 @@ class UniqueIndicesCombinator(BaseAgent):
106
 
107
  return result
108
 
109
  def _extract_unique_combinations(self, text: str, context: str, unique_indices: List[str], unique_indices_descriptions: Dict[str, str]) -> Optional[str]:
110
  """Use LLM to extract unique combinations of indices from the document."""
111
  self.logger.info("Starting _extract_unique_combinations")
@@ -168,6 +238,8 @@ class UniqueIndicesCombinator(BaseAgent):
168
  - Each object in the combinations array represents one unique combination of values
169
  - Each key must be one of the unique indices: {unique_indices}
170
  - Each value must be a string containing the value found in the document
 
 
171
  - Do not include any explanatory text, notes, or markdown formatting
172
  </formatting rules>
173
 
@@ -210,24 +282,34 @@ class UniqueIndicesCombinator(BaseAgent):
210
  # Handle both new format (object with combinations array) and old format (direct array)
211
  if isinstance(json_value, dict) and "combinations" in json_value:
212
  combinations = json_value["combinations"]
213
- self.logger.info(f"Successfully parsed JSON response with {len(combinations)} unique combinations")
 
 
 
 
 
214
 
215
  # Log the first combination as an example
216
- if combinations and len(combinations) > 0:
217
- self.logger.info(f"Example combination: {json.dumps(combinations[0], indent=2)}")
218
 
219
- self.logger.debug(f"All combinations: {json.dumps(combinations, indent=2)}")
220
- return json.dumps(combinations, indent=2)
221
  elif isinstance(json_value, list):
222
  # Fallback for old format (direct array)
223
- self.logger.info(f"Successfully parsed JSON response with {len(json_value)} unique combinations (legacy format)")
 
 
 
 
 
224
 
225
  # Log the first combination as an example
226
- if json_value and len(json_value) > 0:
227
- self.logger.info(f"Example combination: {json.dumps(json_value[0], indent=2)}")
228
 
229
- self.logger.debug(f"All combinations: {json.dumps(json_value, indent=2)}")
230
- return json.dumps(json_value, indent=2)
231
  else:
232
  self.logger.error(f"Unexpected JSON structure: {json_value}")
233
  return None
 
106
 
107
  return result
108
 
109
+ def _validate_and_clean_combinations(self, combinations: List[Dict[str, str]], unique_indices_descriptions: Dict[str, str]) -> List[Dict[str, str]]:
110
+ """Validate and clean unique combinations against possible values."""
111
+ cleaned_combinations = []
112
+
113
+ for i, combination in enumerate(combinations):
114
+ self.logger.info(f"Validating combination {i+1}: {combination}")
115
+ cleaned_combination = {}
116
+
117
+ for index, value in combination.items():
118
+ if not isinstance(value, str) or value in ("", "null", "None"): # Non-string JSON values cannot be substring-matched; keep them unchanged
119
+ cleaned_combination[index] = value
120
+ continue
121
+
122
+ # Get possible values for this index
123
+ index_desc = unique_indices_descriptions.get(index, {})
124
+ if isinstance(index_desc, dict):
125
+ possible_values = index_desc.get('possible_values', '')
126
+ else:
127
+ possible_values = ''
128
+
129
+ if possible_values:
130
+ # Parse possible values (could be comma-separated string or list)
131
+ if isinstance(possible_values, str):
132
+ possible_list = [v.strip() for v in possible_values.split(',') if v.strip()]
133
+ else:
134
+ possible_list = possible_values
135
+
136
+ self.logger.info(f"Validating index '{index}' with value '{value}' against possible values: {possible_list}")
137
+
138
+ # First check for exact match
139
+ if value in possible_list:
140
+ cleaned_combination[index] = value
141
+ self.logger.info(f"Exact match found for '{index}': {value}")
142
+ continue
143
+
144
+ # Check for substring matches (e.g., "T0wxy" contains "0w")
145
+ best_match = None
146
+ best_score = 0
147
+
148
+ for possible_value in possible_list:
149
+ # Check if possible value is contained in extracted value
150
+ if possible_value in value:
151
+ score = len(possible_value) / len(value) # Prefer longer matches
152
+ if score > best_score:
153
+ best_match = possible_value
154
+ best_score = score
155
+
156
+ # Also check if extracted value is contained in possible value
157
+ elif value in possible_value:
158
+ score = len(value) / len(possible_value)
159
+ if score > best_score:
160
+ best_match = possible_value
161
+ best_score = score
162
+
163
+ if best_match and best_score > 0.3: # Length ratio must exceed 0.3
164
+ cleaned_combination[index] = best_match
165
+ self.logger.info(f"Cleaned '{index}': '{value}' -> '{best_match}' (score: {best_score:.2f})")
166
+ else:
167
+ # No good match found, keep original but log warning
168
+ cleaned_combination[index] = value
169
+ self.logger.warning(f"No good match found for '{index}' value '{value}' in possible values {possible_list}")
170
+ else:
171
+ # No possible values specified, keep original
172
+ cleaned_combination[index] = value
173
+
174
+ cleaned_combinations.append(cleaned_combination)
175
+ self.logger.info(f"Cleaned combination {i+1}: {cleaned_combination}")
176
+
177
+ return cleaned_combinations
178
+
179
  def _extract_unique_combinations(self, text: str, context: str, unique_indices: List[str], unique_indices_descriptions: Dict[str, str]) -> Optional[str]:
180
  """Use LLM to extract unique combinations of indices from the document."""
181
  self.logger.info("Starting _extract_unique_combinations")
 
238
  - Each object in the combinations array represents one unique combination of values
239
  - Each key must be one of the unique indices: {unique_indices}
240
  - Each value must be a string containing the value found in the document
241
+ - **IMPORTANT**: When possible values are specified for an index, you MUST return exactly one of those values, not variations or partial matches
242
+ - Clean the extracted values to remove any formatting characters, spaces, or hidden characters
243
  - Do not include any explanatory text, notes, or markdown formatting
244
  </formatting rules>
245
 
 
282
  # Handle both new format (object with combinations array) and old format (direct array)
283
  if isinstance(json_value, dict) and "combinations" in json_value:
284
  combinations = json_value["combinations"]
285
+ self.logger.info(f"Raw parsed JSON response with {len(combinations)} unique combinations")
286
+
287
+ # Validate and clean the combinations against possible values
288
+ cleaned_combinations = self._validate_and_clean_combinations(combinations, unique_indices_descriptions)
289
+
290
+ self.logger.info(f"Successfully validated and cleaned {len(cleaned_combinations)} unique combinations")
291
 
292
  # Log the first combination as an example
293
+ if cleaned_combinations:
294
+ self.logger.info(f"Example combination: {json.dumps(cleaned_combinations[0], indent=2)}")
295
 
296
+ self.logger.debug(f"All combinations: {json.dumps(cleaned_combinations, indent=2)}")
297
+ return json.dumps(cleaned_combinations, indent=2)
298
  elif isinstance(json_value, list):
299
  # Fallback for old format (direct array)
300
+ self.logger.info(f"Raw parsed JSON response with {len(json_value)} unique combinations (legacy format)")
301
+
302
+ # Validate and clean the combinations against possible values
303
+ cleaned_combinations = self._validate_and_clean_combinations(json_value, unique_indices_descriptions)
304
+
305
+ self.logger.info(f"Successfully validated and cleaned {len(cleaned_combinations)} unique combinations (legacy format)")
306
 
307
  # Log the first combination as an example
308
+ if cleaned_combinations:
309
+ self.logger.info(f"Example combination: {json.dumps(cleaned_combinations[0], indent=2)}")
310
 
311
+ self.logger.debug(f"All combinations: {json.dumps(cleaned_combinations, indent=2)}")
312
+ return json.dumps(cleaned_combinations, indent=2)
313
  else:
314
  self.logger.error(f"Unexpected JSON structure: {json_value}")
315
  return None
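Review note on the matching heuristic: the substring check added in `_validate_and_clean_combinations` can be isolated as a small pure function, which makes the 0.3 threshold easier to reason about and to unit test. The sketch below is not part of the commit. `best_substring_match` and its `threshold` parameter are hypothetical names, and the sample values in the note that follows are for illustration only.

```python
from typing import List, Optional, Tuple


def best_substring_match(
    value: str,
    possible_values: List[str],
    threshold: float = 0.3,  # assumed default, mirroring the 0.3 used in the diff
) -> Tuple[Optional[str], float]:
    """Return (match, score); match is None when nothing clears the threshold."""
    # An exact match short-circuits, as in the commit.
    if value in possible_values:
        return value, 1.0

    best_match: Optional[str] = None
    best_score = 0.0
    for candidate in possible_values:
        if not candidate:
            continue  # skip empty entries so the ratio below never divides by zero
        if candidate in value:
            # Allowed value is contained in the extracted value.
            score = len(candidate) / len(value)
        elif value in candidate:
            # Extracted value is contained in the allowed value.
            score = len(value) / len(candidate)
        else:
            continue
        if score > best_score:
            best_match, best_score = candidate, score

    if best_match is not None and best_score > threshold:
        return best_match, best_score
    return None, best_score
```

Under this rule a two-character allowed value inside a five-character extraction (`"0w"` in `"T0wxy"`) scores 0.4 and is cleaned, while a three-character value inside an eleven-character label such as `L14-H31_T4w` scores 3/11 (about 0.27) and is left unchanged. Whether that second outcome is desired depends on how column labels reach the validator.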
src/agents/unique_indices_loop_agent.py CHANGED
@@ -105,6 +105,69 @@ class UniqueIndicesLoopAgent(BaseAgent):
105
  self.logger.warning("No complete results generated")
106
  return None
107
 
108
  def _extract_additional_fields(self, text: str, context: str, combination: Dict[str, str],
109
  fields_to_extract: List[str], field_descriptions: Dict) -> Optional[Dict[str, str]]:
110
  """Extract additional field values for a specific unique combination."""
@@ -153,8 +216,10 @@ class UniqueIndicesLoopAgent(BaseAgent):
153
  1. Find the section of the document that corresponds to this specific unique combination
154
  2. Extract the values for the additional fields: {', '.join(fields_to_extract)}
155
  3. Look for data that matches this specific combination (Protein Lot, Peptide, Timepoint, Modification)
156
- 4. Return ONLY the JSON object with the additional field values, no explanations
157
- 5. If a field value is not found, use null or empty string
 
 
158
 
159
  Example response format:
160
  {{
@@ -185,8 +250,13 @@ class UniqueIndicesLoopAgent(BaseAgent):
185
  if result and result.lower() not in ["none", "null", "n/a"]:
186
  try:
187
  json_value = json.loads(result)
188
- self.logger.info(f"Successfully extracted additional fields: {json.dumps(json_value, indent=2)}")
189
- return json_value
 
 
 
 
 
190
  except json.JSONDecodeError:
191
  self.logger.error(f"Failed to parse LLM response as JSON")
192
  self.logger.error(f"Invalid JSON response: {result}")
 
105
  self.logger.warning("No complete results generated")
106
  return None
107
 
108
+ def _validate_and_clean_values(self, extracted_values: Dict[str, str], field_descriptions: Dict) -> Dict[str, str]:
109
+ """Validate and clean extracted values against possible values."""
110
+ cleaned_values = {}
111
+
112
+ for field, value in extracted_values.items():
113
+ if not isinstance(value, str) or value in ("", "null", "None"): # Non-string JSON values cannot be substring-matched; keep them unchanged
114
+ cleaned_values[field] = value
115
+ continue
116
+
117
+ # Get possible values for this field
118
+ field_desc = field_descriptions.get(field, {})
119
+ if isinstance(field_desc, dict):
120
+ possible_values = field_desc.get('possible_values', '')
121
+ else:
122
+ possible_values = ''
123
+
124
+ if possible_values:
125
+ # Parse possible values (could be comma-separated string or list)
126
+ if isinstance(possible_values, str):
127
+ possible_list = [v.strip() for v in possible_values.split(',') if v.strip()]
128
+ else:
129
+ possible_list = possible_values
130
+
131
+ self.logger.info(f"Validating field '{field}' with value '{value}' against possible values: {possible_list}")
132
+
133
+ # First check for exact match
134
+ if value in possible_list:
135
+ cleaned_values[field] = value
136
+ self.logger.info(f"Exact match found for '{field}': {value}")
137
+ continue
138
+
139
+ # Check for substring matches (e.g., "T0wxy" contains "0w")
140
+ best_match = None
141
+ best_score = 0
142
+
143
+ for possible_value in possible_list:
144
+ # Check if possible value is contained in extracted value
145
+ if possible_value in value:
146
+ score = len(possible_value) / len(value) # Prefer longer matches
147
+ if score > best_score:
148
+ best_match = possible_value
149
+ best_score = score
150
+
151
+ # Also check if extracted value is contained in possible value
152
+ elif value in possible_value:
153
+ score = len(value) / len(possible_value)
154
+ if score > best_score:
155
+ best_match = possible_value
156
+ best_score = score
157
+
158
+ if best_match and best_score > 0.3: # Length ratio must exceed 0.3
159
+ cleaned_values[field] = best_match
160
+ self.logger.info(f"Cleaned '{field}': '{value}' -> '{best_match}' (score: {best_score:.2f})")
161
+ else:
162
+ # No good match found, keep original but log warning
163
+ cleaned_values[field] = value
164
+ self.logger.warning(f"No good match found for '{field}' value '{value}' in possible values {possible_list}")
165
+ else:
166
+ # No possible values specified, keep original
167
+ cleaned_values[field] = value
168
+
169
+ return cleaned_values
170
+
171
  def _extract_additional_fields(self, text: str, context: str, combination: Dict[str, str],
172
  fields_to_extract: List[str], field_descriptions: Dict) -> Optional[Dict[str, str]]:
173
  """Extract additional field values for a specific unique combination."""
 
216
  1. Find the section of the document that corresponds to this specific unique combination
217
  2. Extract the values for the additional fields: {', '.join(fields_to_extract)}
218
  3. Look for data that matches this specific combination (Protein Lot, Peptide, Timepoint, Modification)
219
+ 4. **IMPORTANT**: When possible values are specified for a field, you MUST return exactly one of those values, not variations or partial matches
220
+ 5. Clean the extracted values to remove any formatting characters, spaces, or hidden characters
221
+ 6. Return ONLY the JSON object with the additional field values, no explanations
222
+ 7. If a field value is not found, use null or empty string
223
 
224
  Example response format:
225
  {{
 
250
  if result and result.lower() not in ["none", "null", "n/a"]:
251
  try:
252
  json_value = json.loads(result)
253
+ self.logger.info(f"Raw extracted fields: {json.dumps(json_value, indent=2)}")
254
+
255
+ # Validate and clean extracted values against possible values
256
+ cleaned_values = self._validate_and_clean_values(json_value, field_descriptions)
257
+
258
+ self.logger.info(f"Successfully extracted and cleaned additional fields: {json.dumps(cleaned_values, indent=2)}")
259
+ return cleaned_values
260
  except json.JSONDecodeError:
261
  self.logger.error(f"Failed to parse LLM response as JSON")
262
  self.logger.error(f"Invalid JSON response: {result}")
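Review note on input handling: `json.loads` can return numbers or `None` for field values, and Python's `in` substring test raises `TypeError` when its right-hand operand is a float. A minimal sketch of two helpers that could sit in front of the matching step, assuming (as the commit's comments state) that `possible_values` is either a comma-separated string or a list. `parse_possible_values` and `is_cleanable` are hypothetical names, not part of the commit.

```python
from typing import Any, List


def parse_possible_values(field_desc: Any) -> List[str]:
    """Return the allowed values for a field as a list of stripped strings."""
    # Descriptions that are plain strings (or anything but a dict) carry no allowed values.
    if not isinstance(field_desc, dict):
        return []
    raw = field_desc.get("possible_values", "")
    if isinstance(raw, str):
        items = raw.split(",")
    else:
        items = list(raw or [])
    # Normalise every entry to a stripped string and drop empty ones.
    return [str(item).strip() for item in items if str(item).strip()]


def is_cleanable(value: Any) -> bool:
    """Only non-empty strings other than the null markers are candidates for cleaning."""
    return isinstance(value, str) and value not in ("", "null", "None")
```

Both methods added in this commit repeat the same parsing and matching steps, so helpers along these lines could be shared between `UniqueIndicesCombinator` and `UniqueIndicesLoopAgent` instead of being duplicated.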