Industrial Policy Classification Model v1.0
This model classifies text documents to determine whether they describe industrial policy goals. It was fine-tuned from bert-base-uncased on a dataset of policy documents and measures.
Accompanies the paper:
Juhász, Réka, Lane, Nathan J., Oehlsen, Emily, and Perez, Veronica C. (2025). Measuring Industrial Policy: A Text-Based Approach. National Bureau of Economic Research. Available at: https://www.nber.org/papers/w33895
The output data is available at: industrialpolicydata.com
Model Description
This is a BERT-based text classification model trained to identify industrial policy intentions in text. The model can classify text into 3 categories:
- IP goal (0): Text describes an industrial policy objective or intervention
- No IP goal (1): Text does not describe an industrial policy objective
- Not enough information (2): Insufficient information to determine policy intent
The model was trained on expert-annotated policy documents. The input data for this project was provided in 2023 by the Global Trade Alert project. See Global Trade Alert (2025), available at: https://www.globaltradealert.org/
Intended Use
This model is designed for research purposes to analyze policy documents, government measures, and related texts to identify industrial policy intentions. It can be used by:
- Economics researchers studying industrial policy
- Policy analysts examining government interventions
- Data scientists working with policy text classification
- Government agencies analyzing policy effectiveness
Model Performance
- Accuracy: 0.941
- F1 Score: 0.941
- Precision: 0.941
- Recall: 0.941
- Test Loss: 0.2886
Metrics were evaluated on a held-out test set.
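The card does not say how precision, recall, and F1 were averaged across the three classes. The sketch below assumes support-weighted averaging, which is only one possible reading; under that scheme recall always equals accuracy, which would be consistent with the matching figures above. The function name and the toy labels are illustrative and not taken from the project.

```python
def weighted_metrics(y_true, y_pred, labels=(0, 1, 2)):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n
    precision = recall = f1 = 0.0
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        support = tp + fn
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / support if support else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support / n  # share of true examples in class c
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (0 = IP goal, 1 = No IP goal, 2 = Not enough information).
toy_true = [0, 0, 0, 1, 1, 2]
toy_pred = [0, 0, 1, 1, 1, 2]
toy_metrics = weighted_metrics(toy_true, toy_pred)
print(toy_metrics)
```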
Training Data
The model was trained on expert-annotated policy documents. The input data for this project was provided by the Global Trade Alert project.
Training Procedure
Model Architecture
- Base model: bert-base-uncased
- Architecture: BertForSequenceClassification
- Number of labels: 3
- Fine-tuning approach: Full model fine-tuning with classification head
Training Configuration
- Optimization: Hyperparameter tuning with Optuna
- Data balancing: Oversampling applied to handle class imbalance
- Validation strategy: Stratified splits with income-based validation
- Cross-validation: Income group validation to test generalization
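The card says oversampling was used for class imbalance but does not describe the procedure. As a minimal sketch, assuming plain random oversampling with replacement up to the size of the largest class, the idea looks like this. The function name `oversample` and the toy data are hypothetical, and the project's actual implementation may differ.

```python
import random
from collections import Counter, defaultdict


def oversample(texts, labels, seed=0):
    """Duplicate minority-class examples until every class is as large as the biggest."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    target = max(len(items) for items in by_label.values())
    balanced = []
    for label, items in by_label.items():
        # Draw the missing examples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((text, label) for text in items + extra)
    rng.shuffle(balanced)
    return balanced


toy_texts = ["a", "b", "c", "d", "e"]
toy_labels = [0, 0, 0, 1, 2]
balanced = oversample(toy_texts, toy_labels)
print(Counter(label for _, label in balanced))
```

Oversampling is normally applied to the training split only, so that duplicated examples do not leak into validation or test data.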
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
# Load model and tokenizer
model_name = "industrialpolicygroup/industrialpolicy-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Create classification pipeline
classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
)
# Example usage
text = "Government provides subsidies to promote renewable energy development"
result = classifier(text)
print(result)
# Expected output format:
# [{'label': 'LABEL_0', 'score': 0.95}]
#
# Label mappings:
# LABEL_0: IP goal
# LABEL_1: No IP goal
# LABEL_2: Not enough information
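To turn the pipeline output into readable category names, a small helper can translate the label strings. This sketch assumes the pipeline returns the default `LABEL_<id>` names shown in the expected output and that the ids follow the category list in the Model Description. `ID2NAME` and `readable` are illustrative names, not part of the published model.

```python
# Category names taken from the Model Description section of this card.
ID2NAME = {
    0: "IP goal",
    1: "No IP goal",
    2: "Not enough information",
}


def readable(results):
    """Replace 'LABEL_<id>' with the category name in pipeline-style results."""
    out = []
    for item in results:
        class_id = int(item["label"].rsplit("_", 1)[-1])
        out.append({"label": ID2NAME[class_id], "score": item["score"]})
    return out


example = readable([{"label": "LABEL_0", "score": 0.95}])
print(example)
```

If the model config already defines custom label names, the pipeline will return those instead and this helper is unnecessary.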
Limitations and Bias
- The model is trained primarily on English text from the Global Trade Alert project
- Performance may vary on policy domains not well-represented in training data
- The model reflects the annotation guidelines and may not capture all nuances of industrial policy
- The model may be biased toward the types of policy language that are common in the training data
- May require domain adaptation for highly specialized policy areas
Evaluation and Validation
The model underwent rigorous evaluation including:
- Standard train/validation/test splits
- Income-based validation across country groups
- Cross-domain evaluation on different policy types
- Comparison with traditional machine learning baselines
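The income-based validation is not specified further in this card. One simple way to check generalization across country groups, sketched here under that assumption, is to report accuracy separately for each group. The group names and the function `accuracy_by_group` are hypothetical.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return accuracy computed separately for each group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


# Toy predictions tagged with a hypothetical income group per document.
group_scores = accuracy_by_group(
    y_true=[0, 1, 2, 0],
    y_pred=[0, 1, 1, 0],
    groups=["high", "high", "low", "low"],
)
print(group_scores)
```

A large gap between groups would suggest the classifier transfers poorly to policy text from countries unlike those that dominate the training data.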
Ethical Considerations
This model is intended for research and analysis purposes. Users should be aware that:
- Policy classification can have implications for economic research and policy recommendations
- The model's outputs should be interpreted by domain experts
- Results should be validated against human expert judgment for critical applications
Citation
If you use this model in your research, please cite:
@techreport{NBERw33895,
title = "Measuring Industrial Policy: A Text-Based Approach",
author = "Juhász, Réka and Lane, Nathan J and Oehlsen, Emily and Perez, Veronica C",
institution = "National Bureau of Economic Research",
type = "Working Paper",
series = "Working Paper Series",
number = "33895",
year = "2025",
month = "June",
doi = {10.3386/w33895},
URL = "http://www.nber.org/papers/w33895",
abstract = {Since the 18th century, policymakers have debated the merits of industrial policy (IP). Yet, economists lack basic facts about its use due to measurement challenges. We propose a new approach to IP measurement based on information contained in policy text. We show how off-the-shelf supervised machine learning tools can be used to categorize industrial policies at scale. Using this approach, we validate longstanding concerns with earlier approaches to measurement which conflate IP with other types of policy. We apply our methodology to a global database of commercial policy descriptions, and provide a first look at IP use at the country, industry, and year levels (2010-2022). The new data on IP suggest that i) IP is on the rise; ii) modern IP tends to use subsidies and export promotion measures as opposed to tariffs; iii) rich countries heavily dominate IP use; iv) IP tends to target sectors with an established comparative advantage, particularly in high-income countries.},
}
Model Details
- Developed by: Industrial Policy Group
- Model type: Text Classification (BERT-based)
- Language: English
- License: Apache 2.0
- Fine-tuned from: bert-base-uncased
Technical Specifications
Architecture Details
- Model Type: BERT
- Architecture Class: BertForSequenceClassification
- Transformers Version: 4.52.4
Model Dimensions
- Vocabulary Size: 30,522
- Hidden Size: 768
- Number of Attention Heads: 12
- Number of Hidden Layers: 12
- Intermediate Size: 3,072
- Max Position Embeddings: 512
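These dimensions are enough to estimate the parameter count by hand. The sketch below assumes the standard BERT layout (a pooler layer, a token-type vocabulary of 2, which this card does not list, and a 3-label linear classification head). Under those assumptions it gives roughly 109M parameters, or about 418 MiB at 4 bytes per float32 weight. The number of attention heads does not change the count, because the heads split the same projection matrices.

```python
def bert_classifier_params(vocab=30522, hidden=768, layers=12,
                           intermediate=3072, max_pos=512,
                           type_vocab=2, num_labels=3):
    """Count weights and biases of a BERT encoder with a linear classification head."""
    layer_norm = 2 * hidden  # scale and bias
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm
    # Query, key, value, and output projections, plus the attention LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two feed-forward projections, plus the output LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden)
                    + layer_norm)
    encoder = layers * (attention + feed_forward)
    pooler = hidden * hidden + hidden
    head = hidden * num_labels + num_labels
    return embeddings + encoder + pooler + head


total_params = bert_classifier_params()
size_mib = total_params * 4 / 2**20  # float32 uses 4 bytes per parameter
print(total_params, round(size_mib))
```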
Training Configuration
- Hidden Dropout Probability: 0.1
- Attention Dropout Probability: 0.1
- Layer Norm Epsilon: 1e-12
- Initializer Range: 0.02
Classification Configuration
- Number of Labels: 3
- Problem Type: single_label_classification
- Padding Token ID: 0
- Position Embedding Type: absolute
- Torch Dtype: float32
- Use Cache: True
Model Size and Requirements
- Model Size: 109M parameters (418MB on disk)
- Input: Text (up to 512 tokens)
- Output: Classification probabilities for 3 classes
- Framework: PyTorch + Transformers
- Precision: float32
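The requirements above imply two practical steps: truncating inputs to 512 tokens and converting the three output logits into probabilities. The following is a minimal sketch, assuming the model id from the Usage section and a standard softmax over the logits. The function names are illustrative, and the heavy imports are kept inside `class_probabilities` so the softmax helper can be used on its own.

```python
import math

MODEL_NAME = "industrialpolicygroup/industrialpolicy-classifier"
MAX_TOKENS = 512  # input limit stated in this card


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(text):
    """Return one probability per class for a single text (downloads the model)."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
    model.eval()
    # Longer documents are cut off at the model's maximum input length.
    inputs = tokenizer(text, truncation=True, max_length=MAX_TOKENS,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return softmax(logits)


demo_probs = softmax([2.0, 1.0, 0.1])
print(demo_probs)
```

Because truncation silently drops text past the limit, long policy documents may need to be split into passages and classified piece by piece.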
Citations for source data
Global Trade Alert (2025). Global Trade Alert Database. Available at: https://www.globaltradealert.org/
Contact
For questions about this model or the research, please contact the Industrial Policy Group.
Model card auto-generated on 2025-06-19 14:07:03 from model files.
Source model: bert-base-uncased-3_classes-finetuned_hub_ready_20250617_151525