---
license: mit
language:
- en
metrics:
- accuracy
pipeline_tag: graph-ml
tags:
- gnn
- trade_flow
- food_security
- pytorch
- regression
---
Overview
This model predicts food trade flows between U.S. counties and Freight Analysis Framework (FAF) zones using Graph Neural Networks (GNNs). Trade data are sparse because most origin-destination pairs record no flow. The model addresses this with a two-stage hurdle model: the first stage predicts whether trade occurs, and the second predicts its magnitude. The model supports applications in economic planning, infrastructure design, and food security policy.
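A minimal sketch of how the two hurdle stages could be combined and trained. It assumes that stage 1 outputs a trade probability per edge and that stage 2 outputs a log1p-transformed flow conditioned on trade occurring. The function names, the log1p target transform, and the equal weighting of the two loss terms are illustrative assumptions. They are not taken from model.py.

```python
import numpy as np


def hurdle_prediction(p_exist, log_magnitude):
    """Point estimate of flow per edge from the two hurdle stages.

    p_exist: stage-1 probability that any trade occurs on the edge.
    log_magnitude: stage-2 prediction of log1p(flow), conditioned on trade
    occurring (the log1p target transform is an assumption).
    """
    magnitude = np.expm1(log_magnitude)  # undo the assumed log1p transform
    return p_exist * magnitude           # probability of trade x conditional magnitude


def hurdle_loss(p_exist, log_magnitude, observed_flow, eps=1e-7):
    """Presence loss on all edges plus magnitude loss on trading edges only."""
    present = observed_flow > 0
    p = np.clip(p_exist, eps, 1 - eps)  # avoid log(0)
    # Stage 1: binary cross-entropy over every edge, zeros included.
    bce = -np.mean(np.where(present, np.log(p), np.log(1 - p)))
    if not present.any():
        return bce
    # Stage 2: squared error on log1p(flow), restricted to edges with trade.
    mse = np.mean((log_magnitude[present] - np.log1p(observed_flow[present])) ** 2)
    return bce + mse  # equal weighting of the two terms is an assumption
```

The magnitude term is trained only on edges with observed trade, so the second stage never has to fit the mass of zeros. The first stage alone is responsible for predicting absence.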
Model Details
- Developed by: Qianheng Zhang & ICICLE Team
- Funded by: NSF AI Institute for Intelligent Cyberinfrastructure with Computational Learning in the Environment (ICICLE) (OAC 2112606)
- Model type: Graph Neural Network (Graph Attention Network (GAT) and Graph Convolutional Network (GCN) variants)
- Language(s): English (for documentation and metadata)
- License: MIT License
- Framework: PyTorch, PyTorch Geometric
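The sketch below implements one standard graph-convolution step in dense numpy, using the symmetric-normalized propagation rule of Kipf and Welling. It is a generic illustration of the GCN variant, not the repository's model.py. PyTorch Geometric provides this layer as `torch_geometric.nn.GCNConv`. Its standard GAT layer, `GATConv`, replaces the fixed normalization weights with learned attention coefficients.

```python
import numpy as np


def gcn_layer(adjacency, features, weight):
    """One graph-convolution step: relu(D^-1/2 (A + I) D^-1/2 X W)."""
    n = adjacency.shape[0]
    a_hat = adjacency + np.eye(n)          # add self-loops so each node keeps its own features
    deg = a_hat.sum(axis=1)                # degrees including the self-loop (always >= 1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization: scale entry (i, j) by 1 / sqrt(deg_i * deg_j).
    norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.maximum(norm @ features @ weight, 0.0)  # aggregate, project, ReLU
```

A dense adjacency matrix is only practical for small graphs. At FAF or county scale, a sparse edge-list implementation such as `GCNConv` is the realistic choice.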
File Structure
FoodFlow_GNN_Model/
├── README.md
├── requirements.txt
├── .gitattributes
│
├── FAF_level(Old)/ # FAF zone-level implementation
│   ├── data/ # FAF-level datasets
│   │   ├── FAF5_SCTG1.csv # FAF trade flow data (SCTG1 commodity)
│   │   ├── faf_features.csv # FAF zone node features
│   │   ├── FAF_distance_matrix.csv # Distance matrix between FAF zones
│   │   ├── dataTransformation.ipynb # Data preprocessing notebook
│   │   └── shapefiles/ # Geographic boundary files
│   ├── model.py # GNN model architecture
│   ├── utils.py # Utility functions
│   ├── test_model.py # Model inference script
│   ├── training.ipynb # Model training notebook
│   ├── model.yaml # Model configuration
│   ├── gcn_model.pt # Trained GCN model weights
│   └── gat_model.pt # Trained GAT model weights
│
└── County_level(New)/ # County-level implementation
    ├── code/ # Source code
    │   ├── model.py # Enhanced GNN model architecture
    │   ├── utils.py # Utility functions
    │   ├── training.ipynb # County-level training notebook
    │   └── inference.ipynb # County-level inference notebook
    ├── data/ # County-level datasets
    │   ├── county_aligned_filtered.csv.zip # County node features (compressed)
    │   ├── faf_features_aligned_filtered.csv # Aligned FAF features
    │   ├── FAF5_SCTG1.csv # FAF trade flow data (SCTG1)
    │   ├── FAF5_SCTG2.csv # FAF trade flow data (SCTG2)
    │   ├── FAF5_SCTG3.csv # FAF trade flow data (SCTG3)
    │   ├── FAF5_SCTG4.csv # FAF trade flow data (SCTG4)
    │   ├── FAF5_SCTG5.csv # FAF trade flow data (SCTG5)
    │   ├── FAF5_SCTG6.csv # FAF trade flow data (SCTG6)
    │   ├── FAF5_SCTG7.csv # FAF trade flow data (SCTG7)
    │   └── FAF_distance_matrix.csv # Distance matrix
    └── models/ # Trained model weights
        ├── best_model1_gcn.pth # GCN model variant 1
        ├── best_model2_gcn.pth # GCN model variant 2
        ├── best_model3_gcn.pth # GCN model variant 3
        ├── best_model4_gcn.pth # GCN model variant 4
        ├── best_model5_gcn.pth # GCN model variant 5
        ├── best_model6_gcn.pth # GCN model variant 6
        └── best_model7_gcn.pth # GCN model variant 7
Key Components
- FAF_level(Old): Original implementation for Freight Analysis Framework zone-level predictions
- County_level(New): Enhanced implementation supporting county-level inference using FAF-trained models
- Data Files: Trade flow data across different Standard Classification of Transported Goods (SCTG) categories
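- Models: Seven trained GCN model variants for county-level inference (best_model1_gcn.pth to best_model7_gcn.pth), plus GCN and GAT weights for the FAF-level implementation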
- Notebooks: Interactive training and inference workflows
Uses
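FAF-Level Inference
Run the inference script from the FAF_level(Old)/ directory, since the paths below are relative to it:
python test_model.py --model_type GAT --model_path gat_model.pt --node_features data/faf_features.csv --edges data/FAF5_SCTG1.csv --distance_matrix data/FAF_distance_matrix.csv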
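This card does not document how test_model.py handles `--distance_matrix`. As a hypothetical illustration, the sketch below turns a square zone-by-zone distance matrix into a per-edge feature. It assumes that row i and column i both refer to the zone at node index i, in the same order as the node feature file. The function name `edge_distances` is made up for this example.

```python
import numpy as np


def edge_distances(distance_matrix, src, dst):
    """Look up the origin-destination distance for each edge.

    Assumes distance_matrix[i, j] is the distance from zone i to zone j and that
    src/dst hold zero-based node indices aligned with the node features.
    """
    distance_matrix = np.asarray(distance_matrix, dtype=float)
    if distance_matrix.ndim != 2 or distance_matrix.shape[0] != distance_matrix.shape[1]:
        raise ValueError("distance matrix must be square")
    src = np.asarray(src, dtype=int)
    dst = np.asarray(dst, dtype=int)
    return distance_matrix[src, dst]  # vectorized pairwise lookup, one value per edge
```

If the CSV's first column holds zone IDs, `pd.read_csv(path, index_col=0).to_numpy()` would produce such a matrix. That file layout is an assumption.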
County-Level Inference
County-level food flow prediction applies models trained on FAF-level data to county inputs. This enables fine-grained analysis at the county level while drawing on the broader patterns learned from FAF zone training.
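A model trained on FAF node features expects county inputs with the same feature columns in the same order. The data files county_aligned_filtered.csv.zip and faf_features_aligned_filtered.csv suggest that this alignment is done during preprocessing. The sketch below illustrates the general idea with pandas. The function name and the zero fill for missing columns are assumptions, not the project's actual procedure.

```python
import pandas as pd


def align_features(county_df, faf_feature_columns, fill_value=0.0):
    """Reorder county features to match the FAF training columns.

    Columns absent from county_df are filled with fill_value (zero fill is an
    assumed imputation choice); columns not used in training are dropped.
    Returns the aligned frame and the list of columns that had to be filled.
    """
    missing = [c for c in faf_feature_columns if c not in county_df.columns]
    aligned = county_df.reindex(columns=list(faf_feature_columns), fill_value=fill_value)
    return aligned, missing
```

pandas can read the zipped CSV directly, for example `pd.read_csv("data/county_aligned_filtered.csv.zip")`, as long as the archive contains a single CSV. Reporting `missing` makes silent zero-filling visible, which matters when interpreting transferred predictions.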
Key Features of County-Level Inference:
- Spatial Disaggregation: Models trained on FAF zones can be applied to county-level data for finer spatial resolution
- Comprehensive Coverage: Supports all 3,143 U.S. counties and county equivalents (territories excluded)
- Feature Alignment: County features are aligned with FAF training data structure
- Batch Processing: Efficient inference on large county-to-county networks (~9.8M edges)
- Two-Stage Prediction: Provides both existence probability and flow magnitude predictions
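If every ordered pair of distinct counties is a candidate edge, 3,143 counties give 3,143 × 3,142 = 9,875,306 edges. That is close to the ~9.8M figure above, although the exact edge set used in inference.ipynb may differ. The sketch below shows one way to batch over such a network without materializing the full edge list. The generator name and batching scheme are illustrative assumptions.

```python
import numpy as np


def county_pair_batches(num_nodes, batch_size):
    """Yield (src, dst) index arrays covering every ordered pair of distinct nodes.

    Pairs are decoded from a flat index k in [0, n * (n - 1)), so only one
    batch of edges exists in memory at a time; self-loops are never produced.
    """
    total = num_nodes * (num_nodes - 1)
    for start in range(0, total, batch_size):
        k = np.arange(start, min(start + batch_size, total))
        src = k // (num_nodes - 1)       # each source owns n - 1 consecutive slots
        offset = k % (num_nodes - 1)     # position among that source's destinations
        dst = offset + (offset >= src)   # shift past the diagonal to skip src == dst
        yield src, dst
```

In a PyTorch pipeline, each batch would typically be converted to tensors and scored under `torch.no_grad()`, which keeps memory bounded by `batch_size` rather than by the full network.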
Direct Use
- Predicting food flows between regions using node (county/FAF zone) and edge features
- Modeling economic connectivity and transportation dependency
- County-level supply chain analysis and planning
- Fine-grained spatial analysis at county resolution
Downstream Use
- Spatial forecasting of trade changes under policy shifts
- Identifying critical counties for supply chain resilience
- Local economic impact assessment
- Infrastructure planning at county granularity
Out-of-Scope Use
- Real-time trade forecasting
- Non-U.S. geographic settings without retraining
Bias, Risks, and Limitations
- Bias: Model predictions depend on historical FAF data and may not reflect unexpected future disruptions (e.g., disasters, pandemics)
- Limitations: Predictions are limited to predefined SCTG commodity codes (SCTG1 in the FAF-level implementation; the county-level data files cover SCTG1 through SCTG7)
- Data quality: Assumes accuracy of FAF flow data and economic indicators
- County-level considerations: County predictions may inherit biases from FAF-level training data
Recommendations
Users should:
- Evaluate model generalizability before applying it to non-FAF settings
- Interpret sparse predictions carefully: zeros may result from missing data, not true absence
- Validate county-level predictions against local economic indicators when available
- Account for the transfer-learning nature of county-level inference: models trained on FAF zones are applied at a finer spatial resolution than the one they were trained on
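Besides validating against local indicators, one additional consistency check is to sum county-level predictions back to FAF zones and compare the totals with the observed FAF flows. The sketch below assumes that each county maps to exactly one FAF zone through a hypothetical index array, `county_to_faf`. Both functions are illustrative and not part of the repository.

```python
import numpy as np


def aggregate_to_faf(src, dst, county_flow, county_to_faf, num_faf):
    """Sum county-to-county flows into a FAF origin-destination matrix.

    county_to_faf[c] is the zero-based FAF zone index of county c; this
    one-county-to-one-zone mapping is an assumption for illustration.
    """
    county_to_faf = np.asarray(county_to_faf, dtype=int)
    origin = county_to_faf[np.asarray(src, dtype=int)]
    dest = county_to_faf[np.asarray(dst, dtype=int)]
    totals = np.zeros((num_faf, num_faf))
    # np.add.at accumulates correctly when the same (origin, dest) cell repeats.
    np.add.at(totals, (origin, dest), np.asarray(county_flow, dtype=float))
    return totals


def relative_gap(predicted, observed, eps=1e-9):
    """Elementwise relative difference between aggregated and observed FAF flows."""
    return np.abs(predicted - observed) / (np.abs(observed) + eps)
```

Large gaps for particular zone pairs point to where the county-level disaggregation departs most from the FAF data it was derived from.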