add ressources 1 (#32)
- add ressources 1 (31d2d7654c86abf91f87a59a26e5ade4a44a1825)
Co-authored-by: Elie Bakouch <[email protected]>
- src/index.html +51 -1
src/index.html
CHANGED
|
@@ -2464,6 +2464,11 @@
|
|
| 2464 |
<a href="https://arxiv.org/abs/2312.11805"><strong>Gemini</strong></a>
|
| 2465 |
<p>Presents Google's multimodal model architecture capable of processing text, images, audio, and video inputs.</p>
|
| 2466 |
</div>
|
| 2467 |
|
| 2468 |
<div>
|
| 2469 |
<a href="https://arxiv.org/abs/2412.19437v1"><strong>DeepSeek-V3</strong></a>
|
|
@@ -2472,7 +2477,6 @@
|
|
| 2472 |
|
| 2473 |
|
| 2474 |
<h3>Training Frameworks</h3>
|
| 2475 |
-
|
| 2476 |
<div>
|
| 2477 |
<a href="https://github.com/facebookresearch/fairscale/tree/main"><strong>FairScale</strong></a>
|
| 2478 |
<p>PyTorch extension library for large-scale training, offering various parallelism and optimization techniques.</p>
|
|
@@ -2525,6 +2529,11 @@
|
|
| 2525 |
<p>Comprehensive guide to understanding and optimizing GPU memory usage in PyTorch.</p>
|
| 2526 |
</div>
|
| 2527 |
|
| 2528 |
<div>
|
| 2529 |
<a href="https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html"><strong>TensorBoard Profiler Tutorial</strong></a>
|
| 2530 |
<p>Guide to using TensorBoard's profiling tools for PyTorch models.</p>
|
|
@@ -2586,6 +2595,11 @@
|
|
| 2586 |
<a href="https://arxiv.org/abs/1710.03740"><strong>Mixed precision training</strong></a>
|
| 2587 |
<p>Introduces mixed precision training techniques for deep learning models.</p>
|
| 2588 |
</div>
|
| 2589 |
|
| 2590 |
<h3>Hardware</h3>
|
| 2591 |
|
|
@@ -2603,6 +2617,11 @@
|
|
| 2603 |
<a href="https://www.semianalysis.com/p/100000-h100-clusters-power-network"><strong>Semianalysis - 100k H100 cluster</strong></a>
|
| 2604 |
<p>Analysis of large-scale H100 GPU clusters and their implications for AI infrastructure.</p>
|
| 2605 |
</div>
|
| 2606 |
|
| 2607 |
<h3>Others</h3>
|
| 2608 |
|
|
@@ -2630,6 +2649,37 @@
|
|
| 2630 |
<a href="https://www.harmdevries.com/post/context-length/"><strong>Harm's blog for long context</strong></a>
|
| 2631 |
<p>Investigation into long context training in terms of data and training cost.</p>
|
| 2632 |
</div>
|
| 2633 |
|
| 2634 |
<h2>Appendix</h2>
|
| 2635 |
|
|
|
|
| 2464 |
<a href="https://arxiv.org/abs/2312.11805"><strong>Gemini</strong></a>
|
| 2465 |
<p>Presents Google's multimodal model architecture capable of processing text, images, audio, and video inputs.</p>
|
| 2466 |
</div>
|
| 2467 |
+
|
| 2468 |
+
<div>
|
| 2469 |
+
<a href="https://arxiv.org/abs/2407.21783"><strong>Llama 3</strong></a>
|
| 2470 |
+
<p>The Llama 3 Herd of Models</p>
|
| 2471 |
+
</div>
|
| 2472 |
|
| 2473 |
<div>
|
| 2474 |
<a href="https://arxiv.org/abs/2412.19437v1"><strong>DeepSeek-V3</strong></a>
|
|
|
|
| 2477 |
|
| 2478 |
|
| 2479 |
<h3>Training Frameworks</h3>
|
|
|
|
| 2480 |
<div>
|
| 2481 |
<a href="https://github.com/facebookresearch/fairscale/tree/main"><strong>FairScale</strong></a>
|
| 2482 |
<p>PyTorch extension library for large-scale training, offering various parallelism and optimization techniques.</p>
|
|
|
|
| 2529 |
<p>Comprehensive guide to understanding and optimizing GPU memory usage in PyTorch.</p>
|
| 2530 |
</div>
|
| 2531 |
|
| 2532 |
+
<div>
|
| 2533 |
+
<a href="https://huggingface.co/blog/train_memory"><strong>Memory profiling walkthrough on a simple example</strong></a>
|
| 2534 |
+
<p>Visualize and understand GPU memory in PyTorch.</p>
|
| 2535 |
+
</div>
|
| 2536 |
+
|
| 2537 |
<div>
|
| 2538 |
<a href="https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html"><strong>TensorBoard Profiler Tutorial</strong></a>
|
| 2539 |
<p>Guide to using TensorBoard's profiling tools for PyTorch models.</p>
|
|
|
|
| 2595 |
<a href="https://arxiv.org/abs/1710.03740"><strong>Mixed precision training</strong></a>
|
| 2596 |
<p>Introduces mixed precision training techniques for deep learning models.</p>
|
| 2597 |
</div>
|
| 2598 |
+
|
| 2599 |
+
<div>
|
| 2600 |
+
<a href="https://main-horse.github.io/posts/visualizing-6d/"><strong>@main_horse blog</strong></a>
|
| 2601 |
+
<p>Visualizing 6D Mesh Parallelism</p>
|
| 2602 |
+
</div>
|
| 2603 |
|
| 2604 |
<h3>Hardware</h3>
|
| 2605 |
|
|
|
|
| 2617 |
<a href="https://www.semianalysis.com/p/100000-h100-clusters-power-network"><strong>Semianalysis - 100k H100 cluster</strong></a>
|
| 2618 |
<p>Analysis of large-scale H100 GPU clusters and their implications for AI infrastructure.</p>
|
| 2619 |
</div>
|
| 2620 |
+
|
| 2621 |
+
<div>
|
| 2622 |
+
<a href="https://modal.com/gpu-glossary/readme"><strong>Modal GPU Glossary</strong></a>
|
| 2623 |
+
<p>CUDA docs for humans</p>
|
| 2624 |
+
</div>
|
| 2625 |
|
| 2626 |
<h3>Others</h3>
|
| 2627 |
|
|
|
|
| 2649 |
<a href="https://www.harmdevries.com/post/context-length/"><strong>Harm's blog for long context</strong></a>
|
| 2650 |
<p>Investigation into long context training in terms of data and training cost.</p>
|
| 2651 |
</div>
|
| 2652 |
+
|
| 2653 |
+
<div>
|
| 2654 |
+
<a href="https://www.youtube.com/@GPUMODE/videos"><strong>GPU Mode</strong></a>
|
| 2655 |
+
<p>A GPU reading group and community.</p>
|
| 2656 |
+
</div>
|
| 2657 |
+
|
| 2658 |
+
<div>
|
| 2659 |
+
<a href="https://youtube.com/playlist?list=PLvtrkEledFjqOLuDB_9FWL3dgivYqc6-3&si=fKWPotx8BflLAUkf"><strong>EleutherAI YouTube channel</strong></a>
|
| 2660 |
+
<p>ML Scalability &amp; Performance Reading Group</p>
|
| 2661 |
+
</div>
|
| 2662 |
+
|
| 2663 |
+
<div>
|
| 2664 |
+
<a href="https://jax-ml.github.io/scaling-book/"><strong>Google JAX Scaling Book</strong></a>
|
| 2665 |
+
<p>How to Scale Your Model</p>
|
| 2666 |
+
</div>
|
| 2667 |
+
|
| 2668 |
+
<div>
|
| 2669 |
+
<a href="https://github.com/facebookresearch/capi/blob/main/fsdp.py"><strong>@fvsmassa & @TimDarcet FSDP</strong></a>
|
| 2670 |
+
<p>Standalone ~500 LoC FSDP implementation</p>
|
| 2671 |
+
</div>
|
| 2672 |
+
|
| 2673 |
+
<div>
|
| 2674 |
+
<a href="https://www.thonking.ai/"><strong>thonking.ai</strong></a>
|
| 2675 |
+
<p>A selection of Horace He's blog posts</p>
|
| 2676 |
+
</div>
|
| 2677 |
+
|
| 2678 |
+
<div>
|
| 2679 |
+
<a href="https://gordicaleksa.medium.com/eli5-flash-attention-5c44017022ad"><strong>Aleksa's ELI5 Flash Attention</strong></a>
|
| 2680 |
+
<p>Easy explanation of Flash Attention</p>
|
| 2681 |
+
</div>
|
| 2682 |
+
|
| 2683 |
|
| 2684 |
<h2>Appendix</h2>
|
| 2685 |
|