Institute of Social Science, The University of Tokyo

Workshops and Seminars (Upcoming Events)

Academic Year 2024

September

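Date and time: September 9, 2024 (Tue), 15:00-16:40
Venue: Online (Zoom)
Title: Statistical Analysis with Machine Learning Predicted Variables
Speaker: Hiroto Katsumata (勝又裕斗), Institute of Social Science, The University of Tokyo
Abstract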

Scholars in the social sciences are increasingly relying on machine learning (ML) techniques to construct data from large corpora of text and images.
The ML-generated variables are subsequently utilized in statistical analysis to address substantive questions through regression and hypothesis testing.
However, this approach can introduce substantial bias and lead to incorrect inferences due to prediction errors during the machine learning stage.
In this paper, we present an approach that incorporates ML-generated variables into regression analysis while ensuring consistency and asymptotic normality.
The proposed approach leverages a small-scale human-coded sample to capture the bias in the naive estimator, without the need for strict assumptions about the structure of prediction errors. Furthermore, we have developed diagnostic tools to assess how much additional ML-generated data and/or human coding reduce uncertainties in the main analysis.
We illustrate the effectiveness of our method by revisiting a study on the sources of election fraud with ballot image data and regression analysis.

Language: Japanese
Registration: This event is open to Institute members only.
