
Do Vocational Training Programs Really Work?

· 10 min read
Sho SEKINE
Head of Applied Science at mercari, Principal Data Scientist at Fast Retailing, Co-founder AI Allye

"Does vocational training truly boost participants' future earnings?"

For policymakers and business leaders, measuring the real impact of such programs is a critical challenge. A simple comparison between participants and non-participants is often misleading—for instance, highly motivated individuals might be more likely to sign up, skewing the results.

To solve this, we need Causal Inference.

In this post, we revisit a classic case study based on LaLonde's National Supported Work Demonstration (NSW) data. We will move beyond the textbook theory and demonstrate how to strip away bias to uncover the true program effect.

Today, let's analyze this data using Allye Pro.

CATE Prediction

1. Data Generation

The data is available in the causaldata package. We will use it to create a mixed dataset (nsw_cps_mixed_data) that combines the experimental treatment group with the observational control group.

You can generate the data with the code below. Alternatively, you can download the CSV file from here.

from causaldata import nsw_mixtape, cps_mixtape
import pandas as pd

# NSW randomized experiment
df_nsw = nsw_mixtape.load_pandas().data.copy()
# CPS observational data
df_cps = cps_mixtape.load_pandas().data.copy()

common_cols = [
    "age", "educ", "black", "hisp", "marr",
    "nodegree", "re74", "re75", "re78",
]

# Every CPS record is treated as an untreated control
df_cps_use = df_cps[common_cols].copy()
df_cps_use["treat"] = 0
df_cps_use["source"] = "CPS"

# Select only the treated group from the experimental data
df_nsw_use = df_nsw[df_nsw["treat"] == 1][common_cols + ["treat"]].copy()
df_nsw_use["source"] = "NSW"

# Combine them to form a biased dataset
df_mixed = pd.concat(
    [df_nsw_use, df_cps_use],
    axis=0,
    ignore_index=True,
)
df_mixed["treat"] = df_mixed["treat"].astype("category")
df_mixed.head()

Here is a breakdown of the variables in the dataset:

| Variable | Definition | Role | Details |
|---|---|---|---|
| treat | Treatment Indicator | Treatment | 1 = Received Job Training, 0 = Did not receive. This is the key variable for our analysis. |
| age | Age | Covariate | Age of the participant. |
| educ | Education | Covariate | Years of education completed (e.g., 12 = High School graduate). |
| black | Black (Dummy) | Covariate | 1 = Black, 0 = Otherwise. |
| hisp | Hispanic (Dummy) | Covariate | 1 = Hispanic, 0 = Otherwise. |
| marr | Married (Dummy) | Covariate | 1 = Married, 0 = Single/Other. |
| nodegree | No Degree (Dummy) | Covariate | 1 = No High School Degree, 0 = Has Degree. Used to identify dropouts. |
| re74 | Real Earnings 1974 | Covariate | Pre-treatment Income 1. Indicates economic status before the program. Participants often have low values here. |
| re75 | Real Earnings 1975 | Covariate | Pre-treatment Income 2. Immediate pre-program income. Often zero for participants in this dataset. |
| re78 | Real Earnings 1978 | Outcome | Post-treatment Income. The target variable. We want to see if treat=1 leads to an increase here. |
| source | Data Source | Metadata | Origin of the record ('NSW' for experimental treated, 'CPS' for observational control). |

2. A/A Test and Checking Bias in Treatment Effects

The NSW dataset consists of individuals who sought and received vocational training. The cps_mixtape data, however, represents a general population sample.

There are likely many underlying factors that motivate someone to seek vocational training. First, let's perform a quick A/A Test to check if the two groups are homogeneous.

A/A Test Results

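| Variable | Group | Sample Size | Average | 95% CI | Effect Δ | Lift (%) | p-value | Significant |
|---|---|---|---|---|---|---|---|---|
| age | Control | 15992 | 33.23 | [33.05, 33.40] | - | - | - | No |
| | Treated | 185 | 25.82 | [24.78, 26.85] | -7.41 | -22.3% | 0.000 | Yes |
| educ | Control | 15992 | 12.03 | [11.98, 12.07] | - | - | - | No |
| | Treated | 185 | 10.35 | [10.05, 10.64] | -1.68 | -14.0% | 0.000 | Yes |
| black | Control | 15992 | 0.07 | [0.07, 0.08] | - | - | - | No |
| | Treated | 185 | 0.84 | [0.79, 0.90] | +0.77 | +1046.7% | 0.000 | Yes |
| marr | Control | 15992 | 0.71 | [0.70, 0.72] | - | - | - | No |
| | Treated | 185 | 0.19 | [0.13, 0.25] | -0.52 | -73.4% | 0.000 | Yes |
| nodegree | Control | 15992 | 0.30 | [0.29, 0.30] | - | - | - | No |
| | Treated | 185 | 0.71 | [0.64, 0.77] | +0.41 | +139.4% | 0.000 | Yes |
| re74 | Control | 15992 | 14016.80 | [13868.47, 14165.13] | - | - | - | No |
| | Treated | 185 | 2095.57 | [1386.75, 2804.39] | -11921.23 | -85.0% | 0.000 | Yes |
| re75 | Control | 15992 | 13650.80 | [13507.11, 13794.49] | - | - | - | No |
| | Treated | 185 | 1532.06 | [1065.09, 1999.02] | -12118.75 | -88.8% | 0.000 | Yes |
| re78 | Control | 15992 | 14846.66 | [14697.13, 14996.19] | - | - | - | No |
| | Treated | 185 | 6349.14 | [5207.95, 7490.34] | -8497.52 | -57.2% | 0.000 | Yes |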

Those who received vocational training are generally younger, have lower education levels, and significantly lower pre-training earnings (re74, re75).

Just because re78 (earnings in 1978) is higher for the non-treated group doesn't mean the training was pointless. It simply suggests that even if the training had a positive effect, it wasn't enough to close the massive initial gap between the two groups. The naive comparison reports a difference of -$8497.52, but because of the severe selection bias we cannot interpret this as the causal effect of the intervention.
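For reference, the Effect Δ and Lift columns can be reproduced from the group means alone. This is a minimal sketch that assumes Lift is defined as Δ divided by the control mean; the report does not state its formula, but this definition matches the re78 row. The function name is hypothetical.

```python
def effect_and_lift(control_mean: float, treated_mean: float) -> tuple[float, float]:
    """Return (absolute difference, lift relative to the control mean)."""
    delta = treated_mean - control_mean
    lift = delta / control_mean  # assumed definition of "Lift (%)"
    return delta, lift

# re78 group means taken from the table above (CPS control vs. NSW treated)
delta, lift = effect_and_lift(control_mean=14846.66, treated_mean=6349.14)
print(f"Effect: {delta:+.2f}, Lift: {lift:+.1%}")
```

Note that this is only a difference in raw means; nothing here adjusts for the covariate imbalance described above.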

3. Propensity Score Matching

To address this bias, we apply Propensity Score Matching (PSM), a standard technique in causal inference.

We select covariates for balancing (e.g., demographics, prior earnings) and choose the outcome variable.
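The post runs this step inside Allye, so the exact algorithm and settings are not shown. As a rough illustration of what PSM does in general, here is a hand-rolled sketch on synthetic data: a logistic propensity model fitted with Newton-Raphson, followed by greedy 1:1 nearest-neighbour matching without replacement. The function names, the caliper of 0.05, and the data-generating process are all assumptions made for the example, not Allye's implementation.

```python
import numpy as np

def fit_propensity(X, t, n_iter=25, ridge=1e-6):
    """Logistic regression via Newton-Raphson; returns P(treat=1 | X)."""
    Xb = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        w = p * (1.0 - p)
        hessian = Xb.T @ (Xb * w[:, None]) + ridge * np.eye(Xb.shape[1])
        beta += np.linalg.solve(hessian, Xb.T @ (t - p))
    return 1.0 / (1.0 + np.exp(-Xb @ beta))

def match_nearest(ps, t, caliper=0.05):
    """Greedy 1:1 matching on the propensity score, without replacement."""
    available = list(np.flatnonzero(t == 0))
    pairs = []
    for i in np.flatnonzero(t == 1):
        if not available:
            break
        dist = np.abs(ps[available] - ps[i])
        j = int(np.argmin(dist))
        if dist[j] <= caliper:  # drop treated units with no close control
            pairs.append((i, available.pop(j)))
    return pairs

# Synthetic data: x[:, 0] drives both selection into treatment and the outcome
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 2))
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-2.0 + 1.5 * x[:, 0]))))
y = 2.0 * t + 3.0 * x[:, 0] + rng.normal(size=n)  # true effect is 2.0

ps = fit_propensity(x, t)
pairs = match_nearest(ps, t)
naive = y[t == 1].mean() - y[t == 0].mean()
matched = np.mean([y[i] - y[j] for i, j in pairs])
print(f"naive: {naive:.2f}, matched: {matched:.2f}, pairs: {len(pairs)}")
```

On this toy data the naive difference is inflated by the confounder, while the matched difference lands much closer to the true effect. The caliper discards treated units without a comparable control, which is one reason a matched sample can be smaller than the original treated group.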

PSM Report

Looking at the Love Plot and the balance table, we can see that the discrepancies identified in the A/A test have been successfully mitigated. The matching process has created a control group that is statistically very similar to the treated group.
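A Love Plot typically displays the standardized mean difference (SMD) of each covariate before and after matching. A minimal sketch of that quantity, using synthetic ages rather than the real data (the means and spreads below are illustrative assumptions, and the helper name is hypothetical):

```python
import math
import random
import statistics

def standardized_mean_difference(treated, control):
    """(mean_t - mean_c) / sqrt((var_t + var_c) / 2)."""
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2.0
    )
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd

random.seed(1)
age_control = [random.gauss(33, 11) for _ in range(5000)]  # synthetic stand-in
age_treated = [random.gauss(26, 7) for _ in range(200)]    # synthetic stand-in

smd = standardized_mean_difference(age_treated, age_control)
print(f"SMD: {smd:.2f}")  # |SMD| below 0.1 is a common rule of thumb for balance
```

Because the SMD is scale-free, it lets you compare imbalance in age, education, and earnings on the same axis.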

Now, let's run an A/B Test on this matched dataset:

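| Variable | Group | Sample Size | Average | 95% CI | Effect Δ | Lift (%) | p-value | Significant |
|---|---|---|---|---|---|---|---|---|
| re78 | Control (0) | 164 | 4564.52 | [3736.96, 5392.07] | - | - | - | No |
| | Treated (1) | 164 | 6429.95 | [5227.35, 7632.55] | +1865.43 | +40.9% | 0.012 | Yes |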

Matched A/B Test

We now estimate a positive effect of $1865.43. This difference is statistically significant.
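The reported significance can be roughly sanity-checked from the table alone. The sketch below backs out each group's standard error from its 95% CI and runs a two-sided z-test with unpooled standard errors. Both the normal approximation and the unpooled form are assumptions; the test Allye actually ran is not specified in the post, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def se_from_ci(lower, upper, level=0.95):
    """Back out a standard error from a symmetric normal-approximation CI."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (upper - lower) / (2.0 * z)

def two_sided_p(mean_c, ci_c, mean_t, ci_t):
    """Approximate z-test for a difference in means (unpooled standard errors)."""
    se_diff = sqrt(se_from_ci(*ci_c) ** 2 + se_from_ci(*ci_t) ** 2)
    z = (mean_t - mean_c) / se_diff
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Matched-sample re78 figures from the table above
p = two_sided_p(4564.52, (3736.96, 5392.07), 6429.95, (5227.35, 7632.55))
print(f"approximate p-value: {p:.3f}")
```

Under these assumptions the result comes out close to the reported 0.012, which suggests the table is internally consistent.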

4. Validation: Checking the Answer Key

Since the original NSW dataset is from a Randomized Controlled Trial (RCT), we can calculate the true experimental effect by comparing the treated group with the experimental control group (not the CPS data). (While there is some slight bias in nodegree, the groups are largely balanced.)

True RCT Effect

Analysis Settings

  • Treatment Variable: treat
  • Control Group: 0
  • Test Type: Auto (based on variable type)
  • Confidence Level: 95%
  • Multiple Comparison Correction: None
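
| Outcome | Group | Sample | Average | Abs CI | Effect Δ | Lift (%) | Effect CI (Δ) | p-value | Significant |
|---|---|---|---|---|---|---|---|---|---|
| age | Control | 260 | 25.05 | [24.19, 25.92] | - | - | - | - | No |
| | Treatment | 185 | 25.82 | [24.78, 26.85] | +0.76 | +3.0% | - | 0.266 | No |
| educ | Control | 260 | 10.09 | [9.89, 10.29] | - | - | - | - | No |
| | Treatment | 185 | 10.35 | [10.05, 10.64] | +0.26 | +2.6% | - | 0.150 | No |
| black | Control | 260 | 0.83 | [0.78, 0.87] | - | - | - | - | No |
| | Treatment | 185 | 0.84 | [0.79, 0.90] | +0.02 | +2.0% | - | 0.647 | No |
| hisp | Control | 260 | 0.11 | [0.07, 0.15] | - | - | - | - | No |
| | Treatment | 185 | 0.06 | [0.03, 0.09] | -0.05 | -44.8% | - | 0.064 | No |
| marr | Control | 260 | 0.15 | [0.11, 0.20] | - | - | - | - | No |
| | Treatment | 185 | 0.19 | [0.13, 0.25] | +0.04 | +23.0% | - | 0.334 | No |
| nodegree | Control | 260 | 0.83 | [0.79, 0.88] | - | - | - | - | No |
| | Treatment | 185 | 0.71 | [0.64, 0.77] | -0.13 | -15.2% | - | 0.002 | Yes |
| re74 | Control | 260 | 2107.03 | [1412.41, 2801.65] | - | - | - | - | No |
| | Treatment | 185 | 2095.57 | [1386.75, 2804.39] | -11.45 | -0.5% | - | 0.982 | No |
| re75 | Control | 260 | 1266.91 | [887.97, 1645.85] | - | - | - | - | No |
| | Treatment | 185 | 1532.06 | [1065.09, 1999.02] | +265.15 | +20.9% | - | 0.385 | No |
| re78 | Control | 260 | 4554.80 | [3885.10, 5224.50] | - | - | - | - | No |
| | Treatment | 185 | 6349.14 | [5207.95, 7490.34] | +1794.34 | +39.4% | - | 0.008 | Yes |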

The true effect is +$1794.34. Our PSM estimate of $1865.43 differs by less than 4%, demonstrating that PSM was able to recover the causal effect with high accuracy from the observational data.

5. Advanced Topics: Heterogeneous Treatment Effects

CATE Estimation

Using machine learning, we can go a step further and estimate the Conditional Average Treatment Effect (CATE) for individuals. Given the small sample size and high variance, we'll use LinearDML, which provides robust CATE estimation.

LinearDML

By averaging the predicted CATE for the treated individuals (treat = 1), we can compare this result with our previous average treatment effects.
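To make the idea concrete, here is a hand-rolled sketch of the partially linear, double machine learning recipe that LinearDML is built around: residualize the outcome and the treatment on the covariates with cross-fitting, then regress the outcome residuals on the treatment residuals interacted with the covariates. This is not the library's implementation. Plain linear regressions stand in for the nuisance models, the data is synthetic, and all function names are hypothetical.

```python
import numpy as np

def cross_fit_residuals(X, target, n_folds=2, seed=0):
    """Out-of-fold residuals of `target` after a linear regression on X."""
    n = len(target)
    Xb = np.column_stack([np.ones(n), X])
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    resid = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        coef, *_ = np.linalg.lstsq(Xb[train], target[train], rcond=None)
        resid[test] = target[test] - Xb[test] @ coef
    return resid

def linear_dml(X, t, y):
    """Final stage: regress y-residuals on t-residuals interacted with [1, X]."""
    y_res = cross_fit_residuals(X, y)
    t_res = cross_fit_residuals(X, t.astype(float))
    phi = np.column_stack([np.ones(len(y)), X])
    theta, *_ = np.linalg.lstsq(phi * t_res[:, None], y_res, rcond=None)
    return theta  # CATE(x) = theta[0] + x @ theta[1:]

# Synthetic data with a known, covariate-dependent effect
rng = np.random.default_rng(42)
n = 4000
X = rng.uniform(-1, 1, size=(n, 2))
t = rng.binomial(1, 0.5 + 0.2 * X[:, 0])  # propensity is linear in X by construction
true_cate = 1.0 - 0.5 * X[:, 1]
y = X @ np.array([2.0, 1.0]) + t * true_cate + rng.normal(size=n)

theta = linear_dml(X, t, y)
cate_hat = theta[0] + X @ theta[1:]
print("effect-model coefficients:", np.round(theta, 2))
print("mean CATE among treated:", round(float(cate_hat[t == 1].mean()), 2))
```

The last line mirrors the averaging step described above: individual CATE predictions are averaged over the treated units only, which gives an estimate comparable to an average treatment effect on the treated.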

Mean CATE

The calculated result is $1495. While there is a ~16.7% deviation from the true $1794, it is a massive improvement over the naive observational comparison (-$8497) and provides a directional estimate good enough for decision-making.
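The deviations quoted in this post are simple relative errors against the experimental benchmark. A quick worked check using only the figures reported above (the helper name is hypothetical):

```python
def relative_deviation(estimate: float, benchmark: float) -> float:
    """Signed deviation of an estimate from the benchmark, as a fraction."""
    return (estimate - benchmark) / benchmark

benchmark = 1794.34  # experimental (RCT) effect on re78
estimates = {"naive": -8497.52, "PSM": 1865.43, "mean CATE": 1495.0}

for name, value in estimates.items():
    print(f"{name:>9}: {relative_deviation(value, benchmark):+.2%}")
```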

One more tip for accurate interpretation

In the LinearDML report, the factors contributing to CATE showed that both re74 and re75 had negative coefficients, with re74 showing a particularly strong negative correlation.

Effect Model Coefficients

It makes intuitive sense that people with higher prior earnings might benefit less from basic vocational training. However, it seemed odd that re74 (income 4 years before the outcome) had a much larger negative coefficient than re75 (income 3 years before).

Before jumping to conclusions, we should check for multicollinearity, as LinearDML (being a linear model) is sensitive to it.

Checking the scatter plot and correlation between re74 and re75, we find a high correlation coefficient (r=0.87). The plot also suggests a ceiling effect.
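Besides the pairwise correlation, a variance inflation factor (VIF) is a common way to quantify how much collinearity inflates coefficient uncertainty in a linear model. The sketch below uses synthetic stand-ins for re74, re75, and age; the shared "income level" factor and the noise scale are assumptions chosen to give a correlation in the same range as the one reported here.

```python
import numpy as np

def vif(X):
    """VIF per column: 1 / (1 - R^2) from regressing it on the other columns."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1.0 - resid.var() / X[:, j].var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
income_level = rng.normal(size=1000)                # shared latent factor
re74 = income_level + 0.4 * rng.normal(size=1000)   # synthetic stand-in
re75 = income_level + 0.4 * rng.normal(size=1000)   # synthetic stand-in
age = rng.normal(size=1000)                         # unrelated covariate

r = np.corrcoef(re74, re75)[0, 1]
v = vif(np.column_stack([re74, re75, age]))
print(f"r = {r:.2f}, VIF (re74, re75, age) = {np.round(v, 2)}")
```

The two correlated columns get a clearly elevated VIF while the unrelated one stays near 1, which is the pattern to look for before reading individual coefficients.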

re74 vs re75

This collinearity might be distorting the coefficients. To fix this, we can filter out the ceiling values as outliers and apply Principal Component Analysis (PCA) to re74 and re75 to create orthogonal components.

  • PC1: Positively correlated with both re74 and re75 (represents overall income level).
  • PC2: Represents the difference/variance between the years.
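A two-variable PCA like this can be sketched with an eigendecomposition of the covariance matrix of the standardized inputs. The example uses synthetic stand-ins for re74 and re75 and skips the ceiling-value filtering step; the function name and the sign convention are choices made for the illustration.

```python
import numpy as np

def pca_two_components(a, b):
    """Return (scores, loadings, explained variance ratio) for two variables."""
    Z = np.column_stack([a, b]).astype(float)
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)   # standardize each variable
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # largest variance first
    loadings = eigvecs[:, order]
    loadings = loadings * np.sign(loadings[0])  # first variable loads positively
    return Z @ loadings, loadings, eigvals[order] / eigvals.sum()

rng = np.random.default_rng(0)
income_level = rng.normal(size=1000)
re74 = income_level + 0.4 * rng.normal(size=1000)  # synthetic stand-in
re75 = income_level + 0.4 * rng.normal(size=1000)  # synthetic stand-in

scores, loadings, explained = pca_two_components(re74, re75)
print("loadings (columns are PC1, PC2):\n", np.round(loadings, 2))
print("explained variance ratio:", np.round(explained, 2))
print("corr(PC1, PC2):", round(float(np.corrcoef(scores.T)[0, 1]), 6))
```

PC1 loads positively on both years and PC2 contrasts them, and the two score columns are uncorrelated by construction, so they can replace the raw variables in a linear effect model without the collinearity problem.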

Re-running LinearDML with PC1 and PC2 instead of the raw variables yields the following:

PCA LinearDML

Both components still have negative coefficients in the CATE model, but PC1 (overall income level) has the largest negative one. This supports our hypothesis: vocational training is less effective for those who already have high earning potential. The driver was not re74 specifically, but the general income level.

Additionally, age shows a positive correlation, suggesting that older participants (within this demographic) benefited more from the training than younger ones.

6. Conclusion and Summary

Our analysis of the NSW vocational training program revealed several key insights:

  1. Bias Correction: Simple comparison of observational data led to a misleading negative effect (-$8500). Propensity Score Matching successfully corrected this bias, estimating a positive effect (+$1865) very close to the true experimental benchmark (+$1794).
  2. Targeting Efficiency: Vocational training budgets and manpower are limited. To maximize effectiveness, our CATE analysis suggests a clear policy direction:
    • Focus on those with lower prior earnings. The training has diminishing returns for those with higher baseline income.
    • Prioritize older applicants. Within this group, older individuals showed higher treatment effects.

Simply looking at post-training income (re78) might tempt administrators to select candidates who are likely to earn more anyway (high prior earners). However, our causal analysis suggests this would be a mistake, because those individuals benefit the least from the program. The value of the training is maximized by targeting those who need it most.

Data Science Is Fun! Getting It Right Is What Makes It Valuable.

Achieve deeper understanding and higher-quality outputs in data science—beyond your peers. If you want to explore the data yourself, grab the dataset and try reproducing these results in Allye!

You can try Allye Base for free.