I presented the paper "Peer Effects in the Demand for Private Health Insurance" (joint work with Alicia Rambaldi and Chris Rose) in the Brown Bag Series at the University of Queensland.
Critical CART Hyperparameters in Synthpop
Creating Synthetic Data with R
I have been working on a project to create synthetic data for a long while. I realized that the synthpop package was producing identical values, whereas it is supposed to draw values from the posterior predictive distribution. I have been using the CART algorithm for its flexibility, but the model must have been overfitting, even though I do not have many variables. So, how can one prevent synthpop from creating identical values? The answer is given in the article “synthpop: An R package for generating synthetic versions of sensitive microdata for statistical disclosure control”.
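To see why an overfitted tree reproduces the data, here is a toy Python sketch. It is not synthpop's CART: as a simplifying assumption, records are sorted by a single predictor and cut into consecutive "leaves" of at least `min_leaf` records, and each synthetic value is drawn from the observed values in its record's leaf. The helpers `toy_leaf_synthesis` and `share_identical` and the data are hypothetical. When every leaf holds a single record, the draw can only return the original value.

```python
import random


def toy_leaf_synthesis(xs, ys, min_leaf, seed=0):
    """Toy stand-in for tree-based synthesis (hypothetical helper, not synthpop).

    Records are sorted by the predictor x and cut into consecutive "leaves"
    of at least `min_leaf` records; each synthetic y is drawn at random from
    the observed y values in the leaf its record falls into.
    """
    rng = random.Random(seed)
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    leaves = [order[i:i + min_leaf] for i in range(0, len(order), min_leaf)]
    # Merge a short remainder into the previous leaf so no leaf is too small.
    if len(leaves) > 1 and len(leaves[-1]) < min_leaf:
        leaves[-2].extend(leaves.pop())
    synthetic = [None] * len(ys)
    for leaf in leaves:
        donors = [ys[i] for i in leaf]
        for i in leaf:
            synthetic[i] = rng.choice(donors)
    return synthetic


def share_identical(original, synthetic):
    """Fraction of records whose synthetic value equals the original value."""
    return sum(o == s for o, s in zip(original, synthetic)) / len(original)


# Made-up toy data: 20 records with distinct outcome values.
xs = list(range(20))
ys = [1.5 * x + (x % 3) for x in xs]

for min_leaf in (1, 5):
    synthetic = toy_leaf_synthesis(xs, ys, min_leaf)
    # With min_leaf=1 each leaf holds one record, so the share is exactly 1.0.
    print(min_leaf, share_identical(ys, synthetic))
```

Raising the minimum leaf size forces each draw to mix several donors. In rpart, on which synthpop's CART method builds, the corresponding setting is the minimum terminal-node size `minbucket`; the synthpop manual describes how it is passed through `syn()`.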
An Implementation of Double Machine Learning with XGBoost in R
A Benchmark Estimate
This is an attempt to estimate a model by Double Machine Learning (DML) using the XGBoost algorithm in R. The purpose is to obtain a benchmark estimate with DML. DML lets the user choose among various machine learning algorithms, but optimizing their hyperparameters can be time-consuming. XGBoost is very useful in this regard. This script can be used to produce reasonably accurate preliminary results. The repository is here.
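The post itself works in R with XGBoost. As a rough illustration of what DML computes, the sketch below implements the partialling-out estimator with K-fold cross-fitting for a partially linear model, Y = θD + g(X) + ε. To keep it runnable with the Python standard library only, a plain least-squares line stands in for XGBoost as the nuisance learner; that swap, the names `ols_fit` and `dml_plr`, and the simulated data are assumptions for illustration. Any learner that returns a prediction function could be passed through the `fit` argument.

```python
import random


def ols_fit(x, y):
    """Least-squares line y ~ a + b*x (a stand-in for XGBoost in this sketch)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return lambda v: a + b * v


def dml_plr(y, d, x, n_folds=5, fit=ols_fit, seed=0):
    """Partialling-out DML estimate of theta with K-fold cross-fitting."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    y_res, d_res = [0.0] * n, [0.0] * n
    for k, test in enumerate(folds):
        train = [i for j, f in enumerate(folds) if j != k for i in f]
        # Nuisance fits on the other folds: l(X) = E[Y|X] and m(X) = E[D|X].
        l_hat = fit([x[i] for i in train], [y[i] for i in train])
        m_hat = fit([x[i] for i in train], [d[i] for i in train])
        # Residualize the held-out fold with models it was not used to fit.
        for i in test:
            y_res[i] = y[i] - l_hat(x[i])
            d_res[i] = d[i] - m_hat(x[i])
    # Regress the outcome residuals on the treatment residuals.
    return sum(u * v for u, v in zip(y_res, d_res)) / sum(v * v for v in d_res)


# Simulated data with a known effect theta = 0.5.
rng = random.Random(42)
n, theta = 2000, 0.5
x = [rng.gauss(0, 1) for _ in range(n)]
d = [xi + rng.gauss(0, 1) for xi in x]
y = [theta * di + 2 * xi + rng.gauss(0, 1) for xi, di in zip(x, d)]
print(round(dml_plr(y, d, x), 3))  # should be a value near 0.5
```

Because the nuisance functions here are linear, the least-squares stand-in is correctly specified; with nonlinear g(X) or m(X), a flexible learner such as XGBoost is what makes the residualization work.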
Solution to Memory Leaks in R with the callr Package
An example calling other packages in callr
I have been working with huge samples recently. When you work with large samples, memory leaks are a common problem. I have been using the garbage collector extensively, but it does not help much. So, you need to write your code efficiently.
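In R, `callr::r()` evaluates a function in a fresh R session and returns its result, so whatever that session allocates goes back to the operating system when it exits. The same isolation idea can be sketched in Python with the standard `subprocess` module: the memory-hungry step runs in a child interpreter and only a small summary comes back to the parent. The job script `JOB` and the helper `run_isolated` are hypothetical examples, not part of callr.

```python
import json
import subprocess
import sys

# Hypothetical memory-hungry job, written as a standalone script so it can
# run in a fresh interpreter (the analogue of callr::r() starting a new R session).
JOB = """
import json, sys
n = int(sys.argv[1])
data = list(range(n))  # large temporary allocation, freed when the child exits
print(json.dumps({"n": n, "total": sum(data)}))  # send back only a small summary
"""


def run_isolated(n):
    """Run JOB in a child Python process and parse its JSON result."""
    out = subprocess.run(
        [sys.executable, "-c", JOB, str(n)],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(out.stdout)


result = run_isolated(1_000_000)
print(result)
```

The parent process never holds the large list, so its own memory footprint stays small regardless of how much the child allocates.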
A Fast Method to Create Synthetic Data with Python
Python package available on PyPI: synloc
I have been working on a project to create synthetic data, and I mostly used the R package synthpop in that project. Since then, I have been thinking about a very simple way to create synthetic data with the nearest neighbors algorithm, and I have created a Python package for it named synloc. I discuss the practical and theoretical aspects here: Generating Synthetic Data with The Nearest Neighbors Algorithm
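A nearest-neighbor synthesizer can be sketched in a few lines. The version below is a hypothetical illustration, not synloc's API and not necessarily its algorithm: for each record it takes the k nearest records (Euclidean distance, the record itself included) and draws one synthetic record from independent normal distributions centered on the neighborhood mean, using the neighborhood standard deviation of each column. The function name `knn_synthesize` and the simulated data are assumptions.

```python
import math
import random


def knn_synthesize(data, k=5, seed=0):
    """Toy nearest-neighbor synthesizer (hypothetical; not synloc's API).

    For each record, find its k nearest records (Euclidean distance, the
    record itself included) and draw one synthetic record from independent
    normals centered on the neighborhood mean, using the neighborhood
    standard deviation of each column.
    """
    rng = random.Random(seed)
    synthetic = []
    for row in data:
        neighbors = sorted(data, key=lambda other: math.dist(row, other))[:k]
        m = len(neighbors)
        new_row = []
        for j in range(len(row)):
            col = [nb[j] for nb in neighbors]
            mean = sum(col) / m
            sd = math.sqrt(sum((v - mean) ** 2 for v in col) / m)
            new_row.append(rng.gauss(mean, sd))
        synthetic.append(new_row)
    return synthetic


# Simulated "original" data: 200 records with two numeric columns.
gen = random.Random(1)
original = [[gen.gauss(0, 1), gen.gauss(5, 2)] for _ in range(200)]
fake = knn_synthesize(original, k=5)
print(fake[:3])
```

Sorting all records for every row is quadratic, which is fine for a sketch; a practical implementation would use a spatial index such as a k-d tree for the neighbor search.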