metadata
license: mit
language:
  - en
metrics:
  - accuracy
pipeline_tag: graph-ml
tags:
  - gnn
  - trade_flow
  - food_security
  - pytorch
  - regression

Overview

This model predicts food trade flows between U.S. counties and Freight Analysis Framework (FAF) zones using Graph Neural Networks (GNNs). It addresses the sparsity of trade data, where most origin-destination pairs have no recorded flow, by applying a two-stage hurdle model that predicts separately whether trade occurs and, if it does, how large it is. The model supports applications in economic planning, infrastructure design, and food security policy.
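The hurdle idea can be made concrete with a small sketch. Stage one estimates the probability that a pair trades at all, stage two estimates the flow size assuming it does, and the two are combined at prediction time. The output conventions (a logit and a log1p-scaled magnitude), the 0.5 threshold, and the function names are assumptions for illustration, not taken from this repository's model.py.

```python
import math


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hurdle_predict(existence_logit: float, log_magnitude: float,
                   threshold: float = 0.5) -> dict:
    """Combine the two stages of a hurdle model for one origin-destination pair.

    Hypothetical output conventions: stage 1 emits a logit for "any flow exists";
    stage 2 emits log1p(flow) and is trained only on pairs with nonzero flow.
    """
    p_exist = _sigmoid(existence_logit)      # stage 1: P(flow > 0)
    magnitude = math.expm1(log_magnitude)    # stage 2: size if positive, back-transformed
    return {
        "p_exist": p_exist,
        # Hard gate: report zero unless the classifier clears the threshold.
        "gated": magnitude if p_exist >= threshold else 0.0,
        # Soft combination: P(flow > 0) * size_if_positive. This is only an
        # approximate mean, since back-transforming a log prediction is biased.
        "expected": p_exist * magnitude,
    }


print(hurdle_predict(2.0, math.log1p(100.0)))   # p_exist ~ 0.88, gated ~ 100
print(hurdle_predict(-3.0, math.log1p(100.0)))  # p_exist ~ 0.05, gated = 0.0
```

The hard gate reproduces the exact zeros that sparse trade data contains; the soft product is smoother but almost never returns exactly zero.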

Model Details

  • Developed by: Qianheng Zhang & ICICLE Team
  • Funded by: NSF AI Institute for Intelligent Cyberinfrastructure with Computational Learning in the Environment (ICICLE) (OAC 2112606)
  • Model type: Graph Neural Network (GAT and GCN variants)
  • Language(s): English (for documentation and metadata)
  • License: MIT License
  • Framework: PyTorch, PyTorch Geometric
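For readers new to the two variants, the textbook layer updates are shown below: the GCN layer of Kipf and Welling and the GAT layer of Veličković et al., which PyTorch Geometric provides as GCNConv and GATConv. These are the generic layer definitions only; the depth, the handling of edge features such as distance, and the hurdle heads in this repository's model.py are not specified here and may differ.

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right), \qquad \tilde{A} = A + I, \qquad \tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$$

$$\alpha_{ij} = \operatorname{softmax}_{j \in \mathcal{N}(i)}\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}\left[W h_i \,\Vert\, W h_j\right]\right)\right), \qquad h_i' = \sigma\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\, W h_j\right)$$

The difference in one line: GCN weights each neighbor by a fixed degree normalization, while GAT learns a per-edge attention weight α_ij from the features of both endpoints.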

File Structure

FoodFlow_GNN_Model/
├── README.md
├── requirements.txt
├── .gitattributes
│
├── FAF_level(Old)/                           # FAF zone-level implementation
│   ├── data/                                 # FAF-level datasets
│   │   ├── FAF5_SCTG1.csv                    # FAF trade flow data (SCTG1 commodity)
│   │   ├── faf_features.csv                  # FAF zone node features
│   │   ├── FAF_distance_matrix.csv           # Distance matrix between FAF zones
│   │   ├── dataTransformation.ipynb          # Data preprocessing notebook
│   │   └── shapefiles/                       # Geographic boundary files
│   ├── model.py                              # GNN model architecture
│   ├── utils.py                              # Utility functions
│   ├── test_model.py                         # Model inference script
│   ├── training.ipynb                        # Model training notebook
│   ├── model.yaml                            # Model configuration
│   ├── gcn_model.pt                          # Trained GCN model weights
│   └── gat_model.pt                          # Trained GAT model weights
│
└── County_level(New)/                        # County-level implementation
    ├── code/                                 # Source code
    │   ├── model.py                          # Enhanced GNN model architecture
    │   ├── utils.py                          # Utility functions
    │   ├── training.ipynb                    # County-level training notebook
    │   └── inference.ipynb                   # County-level inference notebook
    ├── data/                                 # County-level datasets
    │   ├── county_aligned_filtered.csv.zip   # County node features (compressed)
    │   ├── faf_features_aligned_filtered.csv # Aligned FAF features
    │   ├── FAF5_SCTG1.csv                    # FAF trade flow data
    │   ├── FAF5_SCTG2.csv                    # FAF trade flow data (SCTG2)
    │   ├── FAF5_SCTG3.csv                    # FAF trade flow data (SCTG3)
    │   ├── FAF5_SCTG4.csv                    # FAF trade flow data (SCTG4)
    │   ├── FAF5_SCTG5.csv                    # FAF trade flow data (SCTG5)
    │   ├── FAF5_SCTG6.csv                    # FAF trade flow data (SCTG6)
    │   ├── FAF5_SCTG7.csv                    # FAF trade flow data (SCTG7)
    │   └── FAF_distance_matrix.csv           # Distance matrix
    └── models/                               # Trained model weights
        ├── best_model1_gcn.pth               # GCN model variant 1
        ├── best_model2_gcn.pth               # GCN model variant 2
        ├── best_model3_gcn.pth               # GCN model variant 3
        ├── best_model4_gcn.pth               # GCN model variant 4
        ├── best_model5_gcn.pth               # GCN model variant 5
        ├── best_model6_gcn.pth               # GCN model variant 6
        └── best_model7_gcn.pth               # GCN model variant 7

Key Components

  • FAF_level(Old): Original implementation for Freight Analysis Framework zone-level predictions
  • County_level(New): Enhanced implementation supporting county-level inference using FAF-trained models
  • Data Files: Trade flow data across different Standard Classification of Transported Goods (SCTG) categories
  • Models: Seven trained GCN model variants (best_model1_gcn.pth to best_model7_gcn.pth) for robust predictions
  • Notebooks: Interactive training and inference workflows

Uses

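FAF-Level Inference

Run the following from inside the FAF_level(Old)/ directory, since the weight and data paths are relative to it (quote the directory name in a shell, e.g. cd "FAF_level(Old)", because of the parentheses):

python test_model.py --model_type GAT --model_path gat_model.pt --node_features data/faf_features.csv --edges data/FAF5_SCTG1.csv --distance_matrix data/FAF_distance_matrix.csv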
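To evaluate both shipped FAF-level weight files in one go, the command can be wrapped in Python. This is a convenience sketch, not part of the repository: the flag names are copied from the command above, while passing GCN as the --model_type value for gcn_model.pt is an assumption based on the file tree, so check test_model.py for the accepted values.

```python
import subprocess
import sys

# Hypothetical mapping: --model_type value -> weight file in FAF_level(Old)/.
# "GCN" as an accepted --model_type value is an assumption; check test_model.py.
WEIGHTS = {"GAT": "gat_model.pt", "GCN": "gcn_model.pt"}


def build_command(model_type: str) -> list[str]:
    """Return the argv list for one test_model.py run (flags copied from the README)."""
    return [
        sys.executable, "test_model.py",
        "--model_type", model_type,
        "--model_path", WEIGHTS[model_type],
        "--node_features", "data/faf_features.csv",
        "--edges", "data/FAF5_SCTG1.csv",
        "--distance_matrix", "data/FAF_distance_matrix.csv",
    ]


def run_all(workdir: str = ".") -> None:
    """Run every variant in turn; workdir should be the FAF_level(Old) folder."""
    for model_type in WEIGHTS:
        # check=True stops at the first variant whose script exits non-zero.
        subprocess.run(build_command(model_type), cwd=workdir, check=True)
```

Calling run_all("FAF_level(Old)") from the repository root would run both variants. Passing an argument list instead of a shell string avoids the quoting problem with the parentheses in the directory name.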

County-Level Inference

County-level food flow prediction reuses models trained on FAF-level data. This enables fine-grained analysis at the county level while drawing on the broader patterns learned during FAF zone training.
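Reusing a FAF-trained model on counties only works if the county node features arrive with the same columns, in the same order, as the FAF training features. The repository ships pre-aligned files (county_aligned_filtered.csv.zip and faf_features_aligned_filtered.csv), but the alignment procedure itself is not described here, so the following is a generic sketch: the column names are hypothetical, and filling absent columns with zero is an assumption (a training-set mean is a common alternative).

```python
def align_features(county_rows, faf_columns, fill_value=0.0):
    """Reorder per-county feature dicts to the FAF training column order.

    Columns absent from a county row are filled with fill_value; columns the
    FAF-trained model never saw are dropped. Returns (matrix, report).
    """
    known = set(faf_columns)
    missing, extra = set(), set()
    matrix = []
    for row in county_rows:
        missing.update(c for c in faf_columns if c not in row)
        extra.update(k for k in row if k not in known)
        matrix.append([float(row.get(c, fill_value)) for c in faf_columns])
    return matrix, {"missing": sorted(missing), "extra": sorted(extra)}


# Hypothetical feature names and values, for illustration only.
faf_cols = ["population", "employment", "cropland_acres"]
counties = [
    {"population": 50_000, "employment": 20_000, "median_income": 61_000},
    {"population": 1_000_000, "employment": 500_000, "cropland_acres": 1_200},
]
X, report = align_features(counties, faf_cols)
print(X)       # [[50000.0, 20000.0, 0.0], [1000000.0, 500000.0, 1200.0]]
print(report)  # {'missing': ['cropland_acres'], 'extra': ['median_income']}
```

The report makes silent mismatches visible before inference, which matters because a model with a fixed input width will otherwise either fail or read features in the wrong slots.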

Key Features of County-Level Inference:

  • Spatial Disaggregation: Models trained on FAF zones can be applied to county-level data for finer spatial resolution
  • Comprehensive Coverage: Supports all 3,143 U.S. counties and county equivalents (excluding territories)
  • Feature Alignment: County features are aligned with FAF training data structure
  • Batch Processing: Efficient inference on large county-to-county networks (~9.8M edges)
  • Two-Stage Prediction: Provides both existence probability and flow magnitude predictions
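If the county graph is dense (an assumption), scoring every ordered county pair gives 3,143 × 3,142 = 9,875,306 candidate edges, close to the ~9.8M quoted above. Materializing all of them at once is wasteful, so candidate pairs can be streamed in fixed-size batches. The generator below is a standard-library sketch of that idea; how this repository actually batches (for example, computing node embeddings once and then scoring edges chunk by chunk) is not documented here.

```python
from itertools import islice


def pair_batches(num_nodes, batch_size, include_self=False):
    """Yield (sources, targets) lists that together cover every ordered node pair once."""
    pairs = (
        (i, j)
        for i in range(num_nodes)
        for j in range(num_nodes)
        if include_self or i != j
    )
    while True:
        chunk = list(islice(pairs, batch_size))
        if not chunk:
            return
        sources, targets = zip(*chunk)
        yield list(sources), list(targets)


# Each batch would be passed to an edge scorer (a hypothetical step not shown),
# so peak memory depends on batch_size rather than on the total number of pairs.
sizes = [len(src) for src, _ in pair_batches(num_nodes=4, batch_size=5)]
print(sizes)        # [5, 5, 2]  (4 * 3 = 12 ordered pairs)
print(3143 * 3142)  # 9875306
```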

Direct Use

  • Predicting food flows between regions using node (county/FAF zone) and edge features
  • Modeling economic connectivity and transportation dependency
  • County-level supply chain analysis and planning
  • Fine-grained spatial analysis at county resolution
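Each candidate flow is predicted from the two endpoint nodes plus edge information such as the distance between them (the FAF_distance_matrix.csv input). A minimal way to assemble one directed edge's input vector is shown below. The concatenation order and the log1p distance transform are assumptions for illustration, not necessarily what model.py does; a GNN may instead pass distance as an edge attribute during message passing.

```python
import math


def edge_input(src, dst, node_features, distance):
    """Concatenate origin features, destination features, and log-scaled distance.

    node_features: node id -> list of floats; distance: nested dict with
    distance[src][dst] in whatever units the distance matrix uses.
    """
    d = distance[src][dst]
    # log1p compresses long distances and keeps d = 0 finite.
    return node_features[src] + node_features[dst] + [math.log1p(d)]


# Hypothetical zone ids and values.
feats = {"zone_a": [1.0, 2.0], "zone_b": [3.0, 4.0]}
dist = {"zone_a": {"zone_b": 250.0}, "zone_b": {"zone_a": 250.0}}
vec = edge_input("zone_a", "zone_b", feats, dist)
print(len(vec))  # 5
```

Keeping origin features before destination features preserves direction, so the A to B flow and the B to A flow get different inputs even when the distance is symmetric.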

Downstream Use

  • Spatial forecasting of trade changes under policy shifts
  • Identifying critical counties for supply chain resilience
  • Local economic impact assessment
  • Infrastructure planning at county granularity
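A simple starting point for identifying critical counties is to rank them by total predicted throughput (inflow plus outflow) once edge-level predictions are available. This is a crude proxy chosen for illustration; network measures such as betweenness, or experiments that remove a node and re-route flows, capture dependency more directly. The node labels and flow values in the example are made up.

```python
from collections import defaultdict


def rank_by_throughput(predicted_edges, top_k=5):
    """Rank nodes by total predicted flow passing through them (out + in).

    predicted_edges: iterable of (origin, destination, predicted_flow).
    """
    throughput = defaultdict(float)
    for origin, destination, flow in predicted_edges:
        throughput[origin] += flow
        throughput[destination] += flow
    return sorted(throughput.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


edges = [("A", "B", 120.0), ("B", "C", 80.0), ("A", "C", 10.0)]
print(rank_by_throughput(edges, top_k=2))  # [('B', 200.0), ('A', 130.0)]
```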

Out-of-Scope Use

  • Real-time trade forecasting
  • Non-U.S. geographic settings without retraining

Bias, Risks, and Limitations

  • Bias: Model predictions depend on historical FAF data and may not reflect unexpected future disruptions (e.g., disasters, pandemics)
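  • Limitations: Predictions are limited to the predefined SCTG commodity codes covered by the training data (SCTG1 for the FAF-level models; the county-level release ships data for SCTG1 through SCTG7)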
  • Data quality: Assumes accuracy of FAF flow data and economic indicators
  • County-level considerations: County predictions may inherit biases from FAF-level training data

Recommendations

Users should:

  • Evaluate model generalizability before applying it to non-FAF settings
  • Interpret sparse predictions carefully: zeros may result from missing data, not true absence of trade
  • Validate county-level predictions against local economic indicators when available
  • Keep in mind that county-level inference is a form of transfer learning: models trained on FAF zones are applied at a finer spatial scale than the one they were trained on