ui: comment memoryusage image
- dist/index.html +1 -1
- src/index.html +1 -1
dist/index.html
@@ -842,7 +842,7 @@
     frame.style.width = frame.contentWindow.document.documentElement.scrollWidth + 'px';
     });
     </script> -->
-    <p><img alt="dp_ourjourney_memoryusage.svg" src="/assets/images/dp_ourjourney_memoryusage.svg" /></p>
+    <!-- <p><img alt="dp_ourjourney_memoryusage.svg" src="/assets/images/dp_ourjourney_memoryusage.svg" /></p> -->


     <p>We've also seen that data parallelism starts to have some limiting communication overhead above a certain level of scaling. Do we have other options for these larger models or large batch sizes? We do have some solutions, thankfully - they involve either moving some tensors to the CPU or splitting the weights/gradients/optimizer states tensors across GPU devices.</p>
src/index.html
@@ -842,7 +842,7 @@
     frame.style.width = frame.contentWindow.document.documentElement.scrollWidth + 'px';
     });
     </script> -->
-    <p><img alt="dp_ourjourney_memoryusage.svg" src="/assets/images/dp_ourjourney_memoryusage.svg" /></p>
+    <!-- <p><img alt="dp_ourjourney_memoryusage.svg" src="/assets/images/dp_ourjourney_memoryusage.svg" /></p> -->


     <p>We've also seen that data parallelism starts to have some limiting communication overhead above a certain level of scaling. Do we have other options for these larger models or large batch sizes? We do have some solutions, thankfully - they involve either moving some tensors to the CPU or splitting the weights/gradients/optimizer states tensors across GPU devices.</p>