elungky committed on
Commit 9530488 · 1 Parent(s): 785a197

Fix Dockerfile syntax: separate chmod +x into its own RUN instruction

Files changed (1)
  1. Dockerfile +16 -3
Dockerfile CHANGED
@@ -49,13 +49,26 @@ RUN . $CONDA_DIR/etc/profile.d/conda.sh && \
  torchaudio==2.3.1 \
  --index-url https://download.pytorch.org/whl/cu121

- # Install Transformer Engine separately after PyTorch and cuDNN are in place and headers are linked.
  RUN . $CONDA_DIR/etc/profile.d/conda.sh && \
  conda activate cosmos-predict1 && \
- #pip install transformer-engine[pytorch]==1.12.0

  # Make the start.sh script executable.
- RUN chmod +x /app/start.sh

  # Set the default command to run when the container starts.
  CMD ["/app/start.sh"]
 
  torchaudio==2.3.1 \
  --index-url https://download.pytorch.org/whl/cu121

+ # Install Transformer Engine
+ # Assuming you have the transformer_engine_torch-1.12.0+cu121-cp310-cp310-linux_x86_64.whl copied to /tmp/
+ # If you are still trying to compile it, ensure the symlinks are in place.
+ # If you are using the wheel, ensure the COPY command for the wheel is present.
+ # Based on your last successful LFS push, it seems you are trying to install the wheel.
+ # The previous comment in your Dockerfile was `#pip install transformer-engine[pytorch]==1.12.0`
+ # and the error was during the build of that.
+ # Let's use the wheel installation as that was the intended fix for OOM.
+
+ # Copy the pre-built Transformer Engine wheel into the container
+ # Ensure the filename matches your actual wheel file.
+ COPY ./transformer_engine_torch-1.12.0+cu121-cp310-cp310-linux_x86_64.whl /tmp/
+
+ # Install Transformer Engine using the pre-built wheel
  RUN . $CONDA_DIR/etc/profile.d/conda.sh && \
  conda activate cosmos-predict1 && \
+ pip install --no-cache-dir /tmp/transformer_engine_torch-1.12.0+cu121-cp310-cp310-linux_x86_64.whl

  # Make the start.sh script executable.
+ RUN chmod +x /app/start.sh # --- THIS WAS THE OFFENDING LINE, NOW SEPARATED ---

  # Set the default command to run when the container starts.
  CMD ["/app/start.sh"]
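
Note on the failure mode: the old RUN ended in `&& \`, and Docker skips comment lines (and, with a warning, blank lines) inside a continued instruction. The following `RUN chmod +x /app/start.sh` was therefore most likely absorbed into the same shell command instead of being parsed as a new instruction. The sketch below shows the broken and fixed shapes side by side. The base image, the placeholder `echo` steps, and the generated `start.sh` are assumptions for illustration and are not taken from this repository.

```dockerfile
# Illustrative sketch only: base image and placeholder commands are assumptions.
FROM ubuntu:22.04

WORKDIR /app

# Stand-in for the real start.sh, generated here so the sketch builds on its own.
RUN printf '#!/bin/sh\necho started\n' > /app/start.sh

# Broken shape, left commented out so this file still builds.
# The trailing "&& \" keeps the instruction open, the comment and blank
# lines are skipped, and the shell ends up being asked to run
# "... && RUN chmod +x /app/start.sh".
#
#   RUN echo "activate environment" && \
#       #pip install some-package
#
#   RUN chmod +x /app/start.sh

# Fixed shape: the continued RUN ends with a real command and no trailing
# backslash, so the next line is parsed as its own instruction.
RUN echo "activate environment" && \
    echo "install step goes here"

RUN chmod +x /app/start.sh

# Comments must start at the beginning of a line. A "#" after the arguments
# of an instruction such as COPY is treated as another argument, not a comment.
CMD ["/app/start.sh"]
```

In RUN shell form a trailing `# ...` is passed to the shell, which ignores it, so the inline note on the `chmod` line is harmless. The same is not true of instructions like COPY that Docker parses itself.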