tfrere committed on
Commit
56fa026
·
1 Parent(s): dfe72b9

ui: comment memoryusage image

Files changed (2)
  1. dist/index.html +1 -1
  2. src/index.html +1 -1
dist/index.html CHANGED
@@ -842,7 +842,7 @@
             frame.style.width = frame.contentWindow.document.documentElement.scrollWidth + 'px';
         });
     </script> -->
-    <p><img alt="dp_ourjourney_memoryusage.svg" src="/assets/images/dp_ourjourney_memoryusage.svg" /></p>
+    <!-- <p><img alt="dp_ourjourney_memoryusage.svg" src="/assets/images/dp_ourjourney_memoryusage.svg" /></p> -->
 
 
     <p>We've also seen that data parallelism starts to have some limiting communication overhead above a certain level of scaling. Do we have other options for these larger models or large batch sizes? We do have some solutions, thankfully - they involve either moving some tensors to the CPU or splitting the weights/gradients/optimizer states tensors across GPU devices.</p>
src/index.html CHANGED
@@ -842,7 +842,7 @@
             frame.style.width = frame.contentWindow.document.documentElement.scrollWidth + 'px';
         });
     </script> -->
-    <p><img alt="dp_ourjourney_memoryusage.svg" src="/assets/images/dp_ourjourney_memoryusage.svg" /></p>
+    <!-- <p><img alt="dp_ourjourney_memoryusage.svg" src="/assets/images/dp_ourjourney_memoryusage.svg" /></p> -->
 
 
     <p>We've also seen that data parallelism starts to have some limiting communication overhead above a certain level of scaling. Do we have other options for these larger models or large batch sizes? We do have some solutions, thankfully - they involve either moving some tensors to the CPU or splitting the weights/gradients/optimizer states tensors across GPU devices.</p>