tfrere committed on
Commit
56fa026
·
1 Parent(s): dfe72b9

ui: comment memoryusage image

Files changed (2)
  1. dist/index.html +1 -1
  2. src/index.html +1 -1
dist/index.html CHANGED
@@ -842,7 +842,7 @@
             frame.style.width = frame.contentWindow.document.documentElement.scrollWidth + 'px';
         });
     </script> -->
-    <p><img alt="dp_ourjourney_memoryusage.svg" src="/assets/images/dp_ourjourney_memoryusage.svg" /></p>
+    <!-- <p><img alt="dp_ourjourney_memoryusage.svg" src="/assets/images/dp_ourjourney_memoryusage.svg" /></p> -->
 
 
     <p>We've also seen that data parallelism starts to have some limiting communication overhead above a certain level of scaling. Do we have other options for these larger models or large batch sizes? We do have some solutions, thankfully - they involve either moving some tensors to the CPU or splitting the weights/gradients/optimizer states tensors across GPU devices.</p>
src/index.html CHANGED
@@ -842,7 +842,7 @@
             frame.style.width = frame.contentWindow.document.documentElement.scrollWidth + 'px';
         });
     </script> -->
-    <p><img alt="dp_ourjourney_memoryusage.svg" src="/assets/images/dp_ourjourney_memoryusage.svg" /></p>
+    <!-- <p><img alt="dp_ourjourney_memoryusage.svg" src="/assets/images/dp_ourjourney_memoryusage.svg" /></p> -->
 
 
     <p>We've also seen that data parallelism starts to have some limiting communication overhead above a certain level of scaling. Do we have other options for these larger models or large batch sizes? We do have some solutions, thankfully - they involve either moving some tensors to the CPU or splitting the weights/gradients/optimizer states tensors across GPU devices.</p>