2025-06-12 18:18:01,037 - WARNING - Using default email for Entrez. Set ENTREZ_EMAIL environment variable.
2025-06-12 18:18:01,037 - INFO - Starting arXiv paper collection...
2025-06-12 18:18:01,038 - INFO - Requesting page (first: True, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aphysics%2A+OR+cat%3Aastro-ph%2A+OR+cat%3Acond-mat%2A+OR+cat%3Ahep-th+OR+cat%3Aquant-ph+OR+cat%3Amath-ph&id_list=&sortBy=submittedDate&sortOrder=descending&start=0&max_results=100
2025-06-12 18:18:03,165 - INFO - Got first page: 100 of 1236760 total results
2025-06-12 18:18:03,172 - INFO - Sleeping: 2.828948 seconds
2025-06-12 18:18:06,004 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aphysics%2A+OR+cat%3Aastro-ph%2A+OR+cat%3Acond-mat%2A+OR+cat%3Ahep-th+OR+cat%3Aquant-ph+OR+cat%3Amath-ph&id_list=&sortBy=submittedDate&sortOrder=descending&start=100&max_results=100
2025-06-12 18:18:06,953 - INFO - Sleeping: 2.866122 seconds
2025-06-12 18:18:09,824 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aphysics%2A+OR+cat%3Aastro-ph%2A+OR+cat%3Acond-mat%2A+OR+cat%3Ahep-th+OR+cat%3Aquant-ph+OR+cat%3Amath-ph&id_list=&sortBy=submittedDate&sortOrder=descending&start=200&max_results=100
2025-06-12 18:18:11,783 - INFO - Sleeping: 2.823819 seconds
2025-06-12 18:18:14,608 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aphysics%2A+OR+cat%3Aastro-ph%2A+OR+cat%3Acond-mat%2A+OR+cat%3Ahep-th+OR+cat%3Aquant-ph+OR+cat%3Amath-ph&id_list=&sortBy=submittedDate&sortOrder=descending&start=300&max_results=100
2025-06-12 18:18:16,436 - INFO - Sleeping: 2.857095 seconds
2025-06-12 18:18:19,301 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aphysics%2A+OR+cat%3Aastro-ph%2A+OR+cat%3Acond-mat%2A+OR+cat%3Ahep-th+OR+cat%3Aquant-ph+OR+cat%3Amath-ph&id_list=&sortBy=submittedDate&sortOrder=descending&start=400&max_results=100
2025-06-12 18:18:22,022 - INFO - Sleeping: 2.790207 seconds
2025-06-12 18:18:24,820 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aphysics%2A+OR+cat%3Aastro-ph%2A+OR+cat%3Acond-mat%2A+OR+cat%3Ahep-th+OR+cat%3Aquant-ph+OR+cat%3Amath-ph&id_list=&sortBy=submittedDate&sortOrder=descending&start=500&max_results=100
2025-06-12 18:18:25,173 - INFO - Sleeping: 2.998001 seconds
2025-06-12 18:18:28,181 - INFO - Requesting page (first: False, try: 1): https://export.arxiv.org/api/query?search_query=cat%3Aphysics%2A+OR+cat%3Aastro-ph%2A+OR+cat%3Acond-mat%2A+OR+cat%3Ahep-th+OR+cat%3Aquant-ph+OR+cat%3Amath-ph&id_list=&sortBy=submittedDate&sortOrder=descending&start=500&max_results=100
2025-06-12 18:18:28,988 - INFO - Sleeping: 2.999010 seconds
2025-06-12 18:18:32,000 - INFO - Requesting page (first: False, try: 2): https://export.arxiv.org/api/query?search_query=cat%3Aphysics%2A+OR+cat%3Aastro-ph%2A+OR+cat%3Acond-mat%2A+OR+cat%3Ahep-th+OR+cat%3Aquant-ph+OR+cat%3Amath-ph&id_list=&sortBy=submittedDate&sortOrder=descending&start=500&max_results=100
2025-06-12 18:18:32,507 - INFO - Sleeping: 2.998957 seconds
2025-06-12 18:18:35,519 - INFO - Requesting page (first: False, try: 3): https://export.arxiv.org/api/query?search_query=cat%3Aphysics%2A+OR+cat%3Aastro-ph%2A+OR+cat%3Acond-mat%2A+OR+cat%3Ahep-th+OR+cat%3Aquant-ph+OR+cat%3Amath-ph&id_list=&sortBy=submittedDate&sortOrder=descending&start=500&max_results=100
2025-06-12 18:18:36,061 - WARNING - Empty page returned for query 'cat:physics* OR cat:astro-ph* OR cat:cond-mat* OR cat:hep-th OR cat:quant-ph OR cat:math-ph': Page of results was unexpectedly empty (https://export.arxiv.org/api/query?search_query=cat%3Aphysics%2A+OR+cat%3Aastro-ph%2A+OR+cat%3Acond-mat%2A+OR+cat%3Ahep-th+OR+cat%3Aquant-ph+OR+cat%3Amath-ph&id_list=&sortBy=submittedDate&sortOrder=descending&start=500&max_results=100)
2025-06-12 18:18:36,065 - INFO - Requesting page (first: True, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=0&max_results=100
2025-06-12 18:18:36,888 - INFO - Got first page: 100 of 50293 total results
2025-06-12 18:18:36,896 - INFO - Sleeping: 2.871087 seconds
2025-06-12 18:18:39,783 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=100&max_results=100
2025-06-12 18:18:40,466 - INFO - Sleeping: 2.870444 seconds
2025-06-12 18:18:43,339 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=200&max_results=100
2025-06-12 18:18:44,012 - INFO - Sleeping: 2.874603 seconds
2025-06-12 18:18:46,893 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=300&max_results=100
2025-06-12 18:18:47,688 - INFO - Sleeping: 2.858048 seconds
2025-06-12 18:18:50,552 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=400&max_results=100
2025-06-12 18:18:51,370 - INFO - Sleeping: 2.870823 seconds
2025-06-12 18:18:54,246 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=500&max_results=100
2025-06-12 18:18:54,960 - INFO - Sleeping: 2.886596 seconds
2025-06-12 18:18:57,856 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=600&max_results=100
2025-06-12 18:18:58,568 - INFO - Sleeping: 2.886486 seconds
2025-06-12 18:19:01,466 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=700&max_results=100
2025-06-12 18:19:02,219 - INFO - Sleeping: 2.867826 seconds
2025-06-12 18:19:05,103 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=800&max_results=100
2025-06-12 18:19:06,346 - INFO - Sleeping: 2.766637 seconds
2025-06-12 18:19:09,120 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=900&max_results=100
2025-06-12 18:19:10,043 - INFO - Sleeping: 2.877552 seconds
2025-06-12 18:19:12,929 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1000&max_results=100
2025-06-12 18:19:13,641 - INFO - Sleeping: 2.873434 seconds
2025-06-12 18:19:16,525 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1100&max_results=100
2025-06-12 18:19:17,281 - INFO - Sleeping: 2.871482 seconds
2025-06-12 18:19:20,161 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1200&max_results=100
2025-06-12 18:19:20,990 - INFO - Sleeping: 2.872492 seconds
2025-06-12 18:19:23,876 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1300&max_results=100
2025-06-12 18:19:24,633 - INFO - Sleeping: 2.873157 seconds
2025-06-12 18:19:27,510 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1400&max_results=100
2025-06-12 18:19:28,249 - INFO - Sleeping: 2.872219 seconds
2025-06-12 18:19:31,132 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1500&max_results=100
2025-06-12 18:19:31,787 - INFO - Sleeping: 2.871294 seconds
2025-06-12 18:19:34,660 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1600&max_results=100
2025-06-12 18:19:35,423 - INFO - Sleeping: 2.864608 seconds
2025-06-12 18:19:38,291 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1700&max_results=100
2025-06-12 18:19:38,496 - INFO - Sleeping: 2.998046 seconds
2025-06-12 18:19:41,498 - INFO - Requesting page (first: False, try: 1): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1700&max_results=100
2025-06-12 18:19:41,682 - INFO - Sleeping: 2.998049 seconds
2025-06-12 18:19:44,693 - INFO - Requesting page (first: False, try: 2): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1700&max_results=100
2025-06-12 18:19:45,568 - INFO - Sleeping: 2.874692 seconds
2025-06-12 18:19:48,448 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1800&max_results=100
2025-06-12 18:19:48,654 - INFO - Sleeping: 2.998000 seconds
2025-06-12 18:19:51,668 - INFO - Requesting page (first: False, try: 1): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1800&max_results=100
2025-06-12 18:19:52,436 - INFO - Sleeping: 2.877867 seconds
2025-06-12 18:19:55,323 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1900&max_results=100
2025-06-12 18:19:56,074 - INFO - Sleeping: 2.878102 seconds
2025-06-12 18:19:58,961 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2000&max_results=100
2025-06-12 18:19:59,730 - INFO - Sleeping: 2.846435 seconds
2025-06-12 18:20:02,587 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2100&max_results=100
2025-06-12 18:20:02,802 - INFO - Sleeping: 2.997978 seconds
2025-06-12 18:20:05,801 - INFO - Requesting page (first: False, try: 1): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2100&max_results=100
2025-06-12 18:20:06,645 - INFO - Sleeping: 2.882026 seconds
2025-06-12 18:20:09,537 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2200&max_results=100
2025-06-12 18:20:10,681 - INFO - Sleeping: 2.867912 seconds
2025-06-12 18:20:13,558 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2300&max_results=100
2025-06-12 18:20:15,163 - INFO - Sleeping: 2.874383 seconds
2025-06-12 18:20:18,052 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2400&max_results=100
2025-06-12 18:20:19,022 - INFO - Sleeping: 2.885731 seconds
2025-06-12 18:20:21,916 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2500&max_results=100
2025-06-12 18:20:22,743 - INFO - Sleeping: 2.880111 seconds
2025-06-12 18:20:25,633 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2600&max_results=100
2025-06-12 18:20:26,848 - INFO - Sleeping: 2.877337 seconds
2025-06-12 18:20:29,728 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2700&max_results=100
2025-06-12 18:20:29,961 - INFO - Sleeping: 2.999086 seconds
2025-06-12 18:20:32,973 - INFO - Requesting page (first: False, try: 1): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2700&max_results=100
2025-06-12 18:20:33,783 - INFO - Sleeping: 2.870358 seconds
2025-06-12 18:20:36,664 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2800&max_results=100
2025-06-12 18:20:36,929 - INFO - Sleeping: 2.997254 seconds
2025-06-12 18:20:39,936 - INFO - Requesting page (first: False, try: 1): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2800&max_results=100
2025-06-12 18:20:40,834 - INFO - Sleeping: 2.876953 seconds
2025-06-12 18:20:43,716 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Aq-bio%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2900&max_results=100
2025-06-12 18:20:44,816 - INFO - Requesting page (first: True, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=0&max_results=100
2025-06-12 18:20:46,192 - INFO - Got first page: 100 of 100310 total results
2025-06-12 18:20:46,198 - INFO - Sleeping: 2.859482 seconds
2025-06-12 18:20:49,073 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=100&max_results=100
2025-06-12 18:20:49,789 - INFO - Sleeping: 2.869352 seconds
2025-06-12 18:20:52,669 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=200&max_results=100
2025-06-12 18:20:53,467 - INFO - Sleeping: 2.862511 seconds
2025-06-12 18:20:56,338 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=300&max_results=100
2025-06-12 18:20:57,071 - INFO - Sleeping: 2.870255 seconds
2025-06-12 18:20:59,951 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=400&max_results=100
2025-06-12 18:21:00,728 - INFO - Sleeping: 2.869636 seconds
2025-06-12 18:21:03,604 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=500&max_results=100
2025-06-12 18:21:04,393 - INFO - Sleeping: 2.865000 seconds
2025-06-12 18:21:07,272 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=600&max_results=100
2025-06-12 18:21:08,029 - INFO - Sleeping: 2.858943 seconds
2025-06-12 18:21:10,895 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=700&max_results=100
2025-06-12 18:21:11,768 - INFO - Sleeping: 2.866744 seconds
2025-06-12 18:21:14,640 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=800&max_results=100
2025-06-12 18:21:15,488 - INFO - Sleeping: 2.720050 seconds
2025-06-12 18:21:18,211 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=900&max_results=100
2025-06-12 18:21:19,122 - INFO - Sleeping: 2.844511 seconds
2025-06-12 18:21:21,982 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1000&max_results=100
2025-06-12 18:21:22,772 - INFO - Sleeping: 2.871176 seconds
2025-06-12 18:21:25,647 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1100&max_results=100
2025-06-12 18:21:25,925 - INFO - Sleeping: 2.997949 seconds
2025-06-12 18:21:28,932 - INFO - Requesting page (first: False, try: 1): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1100&max_results=100
2025-06-12 18:21:29,774 - INFO - Sleeping: 2.864288 seconds
2025-06-12 18:21:32,644 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1200&max_results=100
2025-06-12 18:21:33,454 - INFO - Sleeping: 2.860076 seconds
2025-06-12 18:21:36,317 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1300&max_results=100
2025-06-12 18:21:36,605 - INFO - Sleeping: 2.997453 seconds
2025-06-12 18:21:39,607 - INFO - Requesting page (first: False, try: 1): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1300&max_results=100
2025-06-12 18:21:40,404 - INFO - Sleeping: 2.856277 seconds
2025-06-12 18:21:43,276 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1400&max_results=100
2025-06-12 18:21:44,085 - INFO - Sleeping: 2.862912 seconds
2025-06-12 18:21:46,964 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1500&max_results=100
2025-06-12 18:21:47,858 - INFO - Sleeping: 2.860433 seconds
2025-06-12 18:21:50,732 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1600&max_results=100
2025-06-12 18:21:51,504 - INFO - Sleeping: 2.874451 seconds
2025-06-12 18:21:54,387 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1700&max_results=100
2025-06-12 18:21:55,722 - INFO - Sleeping: 2.859315 seconds
2025-06-12 18:21:58,585 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1800&max_results=100
2025-06-12 18:21:59,503 - INFO - Sleeping: 2.863854 seconds
2025-06-12 18:22:02,377 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1900&max_results=100
2025-06-12 18:22:02,618 - INFO - Sleeping: 2.997967 seconds
2025-06-12 18:22:05,628 - INFO - Requesting page (first: False, try: 1): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=1900&max_results=100
2025-06-12 18:22:06,677 - INFO - Sleeping: 2.844775 seconds
2025-06-12 18:22:09,533 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2000&max_results=100
2025-06-12 18:22:09,792 - INFO - Sleeping: 2.998977 seconds
2025-06-12 18:22:12,797 - INFO - Requesting page (first: False, try: 1): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2000&max_results=100
2025-06-12 18:22:13,677 - INFO - Sleeping: 2.860952 seconds
2025-06-12 18:22:16,551 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2100&max_results=100
2025-06-12 18:22:17,381 - INFO - Sleeping: 2.862895 seconds
2025-06-12 18:22:20,259 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2200&max_results=100
2025-06-12 18:22:21,092 - INFO - Sleeping: 2.865440 seconds
2025-06-12 18:22:23,963 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2300&max_results=100
2025-06-12 18:22:24,738 - INFO - Sleeping: 2.854685 seconds
2025-06-12 18:22:27,605 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2400&max_results=100
2025-06-12 18:22:28,443 - INFO - Sleeping: 2.866245 seconds
2025-06-12 18:22:31,321 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2500&max_results=100
2025-06-12 18:22:32,401 - INFO - Sleeping: 2.857156 seconds
2025-06-12 18:22:35,269 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2600&max_results=100
2025-06-12 18:22:35,481 - INFO - Sleeping: 2.997016 seconds
2025-06-12 18:22:38,486 - INFO - Requesting page (first: False, try: 1): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2600&max_results=100
2025-06-12 18:22:39,346 - INFO - Sleeping: 2.856990 seconds
2025-06-12 18:22:42,208 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2700&max_results=100
2025-06-12 18:22:43,031 - INFO - Sleeping: 2.852790 seconds
2025-06-12 18:22:45,889 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2800&max_results=100
2025-06-12 18:22:46,748 - INFO - Sleeping: 2.858054 seconds
2025-06-12 18:22:49,610 - INFO - Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2900&max_results=100
2025-06-12 18:22:49,923 - INFO - Sleeping: 2.997999 seconds
2025-06-12 18:22:52,927 - INFO - Requesting page (first: False, try: 1): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2900&max_results=100
2025-06-12 18:22:53,180 - INFO - Sleeping: 2.998443 seconds
2025-06-12 18:22:56,182 - INFO - Requesting page (first: False, try: 2): https://export.arxiv.org/api/query?search_query=cat%3Acond-mat.mtrl-sci+OR+cat%3Amaterials%2A&id_list=&sortBy=submittedDate&sortOrder=descending&start=2900&max_results=100
2025-06-12 18:22:57,297 - INFO - Saved checkpoint to scientific_corpus_data\arxiv_papers.jsonl
2025-06-12 18:22:57,297 - INFO - Collected 5989 arXiv papers in 296.26s
2025-06-12 18:22:57,310 - INFO - Starting PubMed paper collection...
2025-06-12 18:23:14,143 - INFO - Saved checkpoint to scientific_corpus_data\pubmed_papers.jsonl
2025-06-12 18:23:14,143 - INFO - Collected 2671 PubMed papers in 16.83s
2025-06-12 18:23:14,143 - INFO - Starting FineWeb-Edu collection...
2025-06-12 18:23:34,470 - INFO - Collected 10000 FineWeb samples
2025-06-12 18:23:38,652 - INFO - Collected 20000 FineWeb samples
2025-06-12 18:23:43,218 - INFO - Collected 30000 FineWeb samples
2025-06-12 18:23:43,221 - INFO - Processing 30000 FineWeb samples
2025-06-12 18:24:03,830 - INFO - Saved checkpoint to scientific_corpus_data\fineweb_edu.jsonl
2025-06-12 18:24:03,831 - INFO - Collected 29616 FineWeb-Edu papers in 49.69s
2025-06-12 18:24:03,873 - INFO - Processing 5989 arxiv papers...
2025-06-12 18:24:05,244 - INFO - Processed 5989/5989 arxiv papers
2025-06-12 18:24:05,244 - INFO - Unknown domains: 0, Unknown sections: 3349
2025-06-12 18:24:05,244 - INFO - Processing 2671 biology papers...
2025-06-12 18:24:05,765 - INFO - Processed 2605/2671 biology papers
2025-06-12 18:24:05,765 - INFO - Unknown domains: 0, Unknown sections: 1015
2025-06-12 18:24:05,765 - INFO - Processing 29616 education papers...
2025-06-12 18:24:39,231 - INFO - Processed 159402/29616 education papers
2025-06-12 18:24:39,231 - INFO - Unknown domains: 29616, Unknown sections: 21161
2025-06-12 19:06:41,335 - INFO - Received signal 2, shutting down gracefully. Frame: <frame at 0x0000023E5AF0BBC0, file 'C:\\Users\\kunya\\AppData\\Local\\Programs\\Python\\Python310\\lib\\threading.py', line 320, code wait>
2025-06-12 19:06:43,708 - WARNING - Using default email for Entrez. Set ENTREZ_EMAIL environment variable.
2025-06-12 19:06:43,710 - INFO - Starting arXiv paper collection...
2025-06-12 19:06:43,711 - INFO - Saved checkpoint to scientific_corpus_data\arxiv_papers.jsonl
2025-06-12 19:06:43,712 - INFO - Collected 0 arXiv papers in 0.00s
2025-06-12 19:06:43,713 - INFO - Starting PubMed paper collection...
2025-06-12 19:06:43,715 - INFO - Saved checkpoint to scientific_corpus_data\pubmed_papers.jsonl
2025-06-12 19:06:43,715 - INFO - Collected 0 PubMed papers in 0.00s
2025-06-12 19:06:43,716 - INFO - Shutdown in progress, aborting retries.
2025-06-12 19:16:11,718 - INFO - Received signal 2, shutting down gracefully. Frame: <frame at 0x0000023E7696F880, file 'C:\\Users\\kunya\\AppData\\Local\\Programs\\Python\\Python310\\lib\\selectors.py', line 315, code _select>