Industrial Policy Classification Model v1.0

This model classifies text documents to determine whether they describe industrial policy goals. It was fine-tuned from bert-base-uncased on a dataset of policy documents and measures.

Accompanies the paper:

Juhász, Réka, Lane, Nathan J., Oehlsen, Emily, and Perez, Veronica C. (2025). Measuring Industrial Policy: A Text-Based Approach. National Bureau of Economic Research. Available at: https://www.nber.org/papers/w33895

The output data is available at: industrialpolicydata.com

Model Description

This is a BERT-based text classification model trained to identify industrial policy intentions in text. The model can classify text into 3 categories:

IP goal (0): Text describes an industrial policy objective or intervention
No IP goal (1): Text does not describe an industrial policy objective
Not enough information (2): Insufficient information to determine policy intent

The model was trained on expert-annotated policy documents. The input data for this project was provided in 2023 by the Global Trade Alert project; see the Global Trade Alert (2025) data, available at: https://www.globaltradealert.org/

Intended Use

This model is designed for research purposes to analyze policy documents, government measures, and related texts to identify industrial policy intentions. It can be used by:

Economics researchers studying industrial policy
Policy analysts examining government interventions
Data scientists working with policy text classification
Government agencies analyzing policy effectiveness

Model Performance

Accuracy: 0.941
F1 Score: 0.941
Precision: 0.941
Recall: 0.941
Test Loss: 0.2886

Metrics were evaluated on a held-out test set.
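The card does not say how precision, recall, and F1 are averaged over the three classes. A common choice is support-weighted averaging, and under that assumption weighted recall always equals accuracy, which would be consistent with both figures reading 0.941. The sketch below, in plain Python with toy labels and a hypothetical helper `weighted_metrics`, illustrates the computation; it is not the authors' evaluation code.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred, labels=(0, 1, 2)):
    """Accuracy plus support-weighted precision, recall and F1 (hypothetical helper)."""
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for k in labels:
        tp = sum(t == k and p == k for t, p in zip(y_true, y_pred))
        predicted_k = sum(p == k for p in y_pred)
        p_k = tp / predicted_k if predicted_k else 0.0
        r_k = tp / support[k] if support[k] else 0.0
        f_k = 2 * p_k * r_k / (p_k + r_k) if p_k + r_k else 0.0
        w = support[k] / n  # weight each class by its share of the true labels
        precision += w * p_k
        recall += w * r_k
        f1 += w * f_k
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels, not the paper's data: 0 = IP goal, 1 = No IP goal, 2 = Not enough information
y_true = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 0, 2, 2, 1]
m = weighted_metrics(y_true, y_pred)
print(m)
```

Weighting each class recall TP_k / n_k by n_k / N collapses to the sum of TP_k over N, which is exactly accuracy; precision and F1 carry no such identity.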

Training Data

The model was trained on expert-annotated policy documents. The input data for this project was provided by the Global Trade Alert project.

Training Procedure

Model Architecture

Base model: bert-base-uncased
Architecture: BertForSequenceClassification
Number of labels: 3
Fine-tuning approach: Full model fine-tuning with classification head
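With a three-output classification head, the model produces one logit per category; for single-label classification these are turned into probabilities with a softmax, and the predicted category is the most probable one. A minimal plain-Python sketch of that final step, using made-up logits and the label order listed under Model Description (the names `LABELS`, `softmax`, and `predict` are hypothetical):

```python
import math

LABELS = ["IP goal", "No IP goal", "Not enough information"]  # ids 0, 1, 2

def softmax(logits):
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Return the most probable category name and all three probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs

label, probs = predict([2.0, 0.5, -1.0])  # made-up logits, not real model output
print(label, [round(p, 3) for p in probs])
```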

Training Configuration

Optimization: Hyperparameters tuned with Optuna
Data balancing: Oversampling applied to handle class imbalance
Validation strategy: Stratified splits with income-based validation
Cross-validation: Income group validation to test generalization
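The card does not specify the oversampling method. The sketch below assumes simple random oversampling: minority-class examples are duplicated, sampled with replacement, until every class matches the majority-class count. `random_oversample` is a hypothetical helper, not part of the authors' code.

```python
import random
from collections import Counter

def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class examples at random until every class
    matches the majority-class count (hypothetical sketch)."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # keep every original example, then add extras sampled with replacement
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels

texts = ["a", "b", "c", "d", "e", "f"]
labels = [1, 1, 1, 1, 0, 2]
X, y = random_oversample(texts, labels)
print(Counter(y))
```

Oversampling of this kind is normally applied to the training split only, so duplicated examples cannot leak into validation or test data and inflate the reported metrics.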

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load model and tokenizer
model_name = "industrialpolicygroup/industrialpolicy-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Create classification pipeline
classifier = pipeline("text-classification", 
                     model=model, 
                     tokenizer=tokenizer)

# Example usage
text = "Government provides subsidies to promote renewable energy development"
result = classifier(text)
print(result)

# Expected output format:
# [{'label': 'LABEL_0', 'score': 0.95}]
# 
# Label mappings:
#   LABEL_0 -> IP goal
#   LABEL_1 -> No IP goal
#   LABEL_2 -> Not enough information
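If the hosted config only carries the default `LABEL_k` names, as the expected output suggests, the raw labels can be mapped back to the three categories in post-processing. A minimal sketch, assuming the label order given under Model Description; `ID2NAME` and `readable` are hypothetical names:

```python
# Readable names for the model's raw labels, following the category list above
ID2NAME = {
    "LABEL_0": "IP goal",
    "LABEL_1": "No IP goal",
    "LABEL_2": "Not enough information",
}

def readable(predictions):
    """Replace LABEL_k strings in pipeline output with category names."""
    return [{**p, "label": ID2NAME.get(p["label"], p["label"])} for p in predictions]

print(readable([{"label": "LABEL_0", "score": 0.95}]))
```

Labels that are not in the mapping, for instance if a config already supplies readable names, pass through unchanged.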

Limitations and Bias

The model is trained primarily on English text from the Global Trade Alert project
Performance may vary on policy domains not well-represented in training data
The model reflects the annotation guidelines and may not capture all nuances of industrial policy
Possible bias toward the types of policy language present in the training data
May require domain adaptation for highly specialized policy areas

Evaluation and Validation

The model was evaluated using:

Standard train/validation/test splits
Income-based validation across country groups
Cross-domain evaluation on different policy types
Comparison with traditional machine learning baselines
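The card does not spell out the income-based validation scheme. One common way to test generalization across country groups is leave-one-group-out evaluation: train on documents from all income groups but one, then evaluate on the held-out group. The sketch below generates such splits from a list of group tags; the World Bank-style group names and the helper `leave_one_group_out` are illustrative assumptions, not the authors' setup.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each distinct group."""
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx

# Hypothetical income-group tags for six documents
groups = ["High", "High", "Upper-middle", "Lower-middle", "Low", "Upper-middle"]
for held_out, train_idx, test_idx in leave_one_group_out(groups):
    print(held_out, train_idx, test_idx)
```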

Ethical Considerations

This model is intended for research and analysis purposes. Users should be aware that:

Policy classification can have implications for economic research and policy recommendations
The model's outputs should be interpreted by domain experts
Results should be validated against human expert judgment for critical applications

Citation

If you use this model in your research, please cite:

@techreport{NBERw33895,
 title = "Measuring Industrial Policy: A Text-Based Approach",
 author = "Juhász, Réka and Lane, Nathan J and Oehlsen, Emily and Perez, Veronica C",
 institution = "National Bureau of Economic Research",
 type = "Working Paper",
 series = "Working Paper Series",
 number = "33895",
 year = "2025",
 month = "June",
 doi = {10.3386/w33895},
 URL = "http://www.nber.org/papers/w33895",
 abstract = {Since the 18th century, policymakers have debated the merits of industrial policy (IP). Yet, economists lack basic facts about its use due to measurement challenges. We propose a new approach to IP measurement based on information contained in policy text. We show how off-the-shelf supervised machine learning tools can be used to categorize industrial policies at scale. Using this approach, we validate longstanding concerns with earlier approaches to measurement which conflate IP with other types of policy. We apply our methodology to a global database of commercial policy descriptions, and provide a first look at IP use at the country, industry, and year levels (2010-2022). The new data on IP suggest that i) IP is on the rise; ii) modern IP tends to use subsidies and export promotion measures as opposed to tariffs; iii) rich countries heavily dominate IP use; iv) IP tends to target sectors with an established comparative advantage, particularly in high-income countries.},
}

Model Details

Developed by: Industrial Policy Group
Model type: Text Classification (BERT-based)
Language: English
License: Apache 2.0
Fine-tuned from: bert-base-uncased

Technical Specifications

Architecture Details

Model Type: BERT
Architecture Class: BertForSequenceClassification
Transformers Version: 4.52.4

Model Dimensions

Vocabulary Size: 30,522
Hidden Size: 768
Number of Attention Heads: 12
Number of Hidden Layers: 12
Intermediate Size: 3,072
Max Position Embeddings: 512
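The dimensions above determine the parameter count. The sketch below tallies the weights and biases of the embeddings, the twelve encoder layers, the pooler, and a 3-way classification head, giving 109,484,547 parameters, consistent with the ~109M figure under Model Size and Requirements. It assumes a token-type vocabulary of 2 (the bert-base-uncased default, not listed above) and a head that is a single linear layer on the pooled output, as in `BertForSequenceClassification`; `bert_classifier_params` is a hypothetical helper.

```python
def bert_classifier_params(vocab=30522, hidden=768, layers=12, intermediate=3072,
                           max_pos=512, type_vocab=2, num_labels=3):
    """Count trainable parameters of a BERT encoder plus a linear classification head."""
    def linear(n_in, n_out):
        return n_in * n_out + n_out  # weight matrix + bias

    layer_norm = 2 * hidden  # scale + shift
    # word, position and token-type embeddings, followed by one LayerNorm
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm
    per_layer = (
        4 * linear(hidden, hidden)      # query, key, value, attention output
        + layer_norm
        + linear(hidden, intermediate)  # feed-forward up-projection
        + linear(intermediate, hidden)  # feed-forward down-projection
        + layer_norm
    )
    pooler = linear(hidden, hidden)
    classifier = linear(hidden, num_labels)
    return embeddings + layers * per_layer + pooler + classifier

print(bert_classifier_params())
```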

Training Configuration

Hidden Dropout Probability: 0.1
Attention Dropout Probability: 0.1
Layer Norm Epsilon: 1e-12
Initializer Range: 0.02

Classification Configuration

Number of Labels: 3
Problem Type: single_label_classification
Padding Token ID: 0
Position Embedding Type: absolute
Torch Dtype: float32
Use Cache: True

Model Size and Requirements

Model Size: ~109M parameters (~418MB on disk)
Input: Text (up to 512 tokens)
Output: Classification probabilities for 3 classes
Framework: PyTorch + Transformers
Precision: float32
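At float32 precision each parameter takes 4 bytes. Taking the parameter count of bert-base-uncased with a 3-way classification head, 109,484,547, and ignoring file metadata, the checkpoint size works out as below. The ~418MB figure matches when MB is read as mebibytes (2^20 bytes); that reading is an assumption, since in decimal megabytes the same size is about 438 MB.

```latex
109{,}484{,}547 \times 4\ \text{bytes} = 437{,}938{,}188\ \text{bytes}
  \approx \frac{437{,}938{,}188}{2^{20}}\ \text{MiB} \approx 417.7\ \text{MiB}
  \approx 437.9\ \text{MB (decimal)}
```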

Citations for source data

Global Trade Alert (2025). Global Trade Alert Database. Available at: https://www.globaltradealert.org/

Contact

For questions about this model or the research, please contact the Industrial Policy Group.

Model card auto-generated on 2025-06-19 14:07:03 from model files. Source model: bert-base-uncased-3_classes-finetuned_hub_ready_20250617_151525
