This is an attempt to implement Double Machine Learning (DML) estimation with the XGBoost algorithm in R. The purpose is to create a benchmark DML estimate. The user can choose among various machine learning algorithms, and optimizing their hyperparameters can be time-consuming. XGBoost is very useful in this regard. This script can be used to produce reasonably accurate preliminary results. The repository is here.
[Read More]
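For readers new to DML, the core of the partially linear model estimator is cross-fitting plus residual-on-residual regression. The sketch below is a minimal illustration in Python with NumPy, not the script from the repository. It assumes the model y = theta*d + g(X) + e, and it swaps XGBoost for plain OLS as the nuisance learner so the example has no extra dependencies. In practice you would plug in a flexible learner such as XGBoost through the `learner` argument. The names `dml_plr` and `ols_fit_predict` are hypothetical.

```python
import numpy as np


def ols_fit_predict(X_train, y_train, X_test):
    # Stand-in nuisance learner: OLS with an intercept.
    # (The post uses XGBoost; any fit/predict regressor could be swapped in.)
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    return A_test @ coef


def dml_plr(y, d, X, n_folds=5, seed=0, learner=ols_fit_predict):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(X) + e."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # random, balanced fold labels
    y_res = np.empty(n)
    d_res = np.empty(n)
    for k in range(n_folds):
        test = folds == k
        train = ~test
        # Nuisance predictions for the held-out fold use models fit on the other folds.
        y_res[test] = y[test] - learner(X[train], y[train], X[test])
        d_res[test] = d[test] - learner(X[train], d[train], X[test])
    # Final stage: regress the outcome residuals on the treatment residuals.
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    # Standard error from the score psi = (y_res - theta * d_res) * d_res.
    psi = (y_res - theta * d_res) * d_res
    se = np.sqrt(np.mean(psi ** 2) / np.mean(d_res ** 2) ** 2 / n)
    return theta, se


# Simulated example with a true effect of 0.5 (data are made up for illustration).
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 3))
d = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)
y = 0.5 * d + X @ np.array([0.3, 0.3, -1.0]) + rng.normal(size=n)
theta, se = dml_plr(y, d, X)
print(f"theta = {theta:.3f}, se = {se:.3f}")
```

Cross-fitting is what lets a flexible, possibly overfitting learner be used for the nuisance functions without biasing the final estimate, which is why tuning the learner (the time-consuming part mentioned above) matters mostly for efficiency.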
Solving Memory Leaks in R with the callr Package
An example calling other packages in callr
I have been working with huge samples recently. When you work with large samples, memory leaks are a common problem. I have been using the garbage collector extensively, but it does not help much. So you need to write your code efficiently.
[Read More]
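The idea behind callr is to run a job in a fresh R session, so that whatever memory the job leaks is returned to the operating system when that session exits. As a rough Python analogy (not callr's API), the sketch below runs a snippet in a fresh interpreter with `subprocess` and passes the result back as JSON on stdout. The helper name `run_isolated` and the JSON hand-off are assumptions made for illustration.

```python
import json
import subprocess
import sys


def run_isolated(code: str, timeout: float = 600):
    """Run Python `code` in a fresh interpreter and parse the JSON it prints.

    All memory the child allocates is released when the child process exits,
    which is the same isolation idea callr::r() provides in R.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the child fails
    )
    return json.loads(proc.stdout)


# A memory-hungry job: the large list exists only inside the child process.
job = """
import json
big = list(range(1_000_000))
print(json.dumps({"total": sum(big)}))
"""
result = run_isolated(job)
print(result["total"])
```

The trade-off is the start-up cost of a new process and the need to serialize inputs and outputs, so this pattern pays off for large, self-contained jobs rather than many small calls.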
A Fast Method to Create Synthetic Data with Python
Python package available on PyPI: synloc
I have been working on a project to create synthetic data, and I mostly used the R package synthpop in that project. Since then, I have been thinking about a very simple algorithm that creates synthetic data using the nearest neighbor algorithm, and I have created a Python package named synloc. I discuss the practical and theoretical details here: Generating Synthetic Data with The Nearest Neighbors Algorithm
[Read More]
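To give a feel for how a nearest-neighbor synthesizer can work, here is a minimal NumPy sketch under one assumed design: pick a random original row, take its k nearest neighbors, and draw a new row from a multivariate normal with the neighbors' mean and covariance. This is illustrative only; it is not the synloc API or necessarily its exact algorithm, and `knn_synthetic` is a hypothetical name.

```python
import numpy as np


def knn_synthetic(data, n_samples, k=10, seed=0):
    """Hypothetical nearest-neighbor synthesizer (not the synloc API)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    out = np.empty((n_samples, data.shape[1]))
    for i in range(n_samples):
        # Choose a random original observation as the local anchor.
        anchor = data[rng.integers(len(data))]
        # Its k nearest neighbors (Euclidean distance; includes the anchor itself).
        dist = np.linalg.norm(data - anchor, axis=1)
        nbrs = data[np.argsort(dist)[:k]]
        # Draw one synthetic row from a normal fitted to the local neighborhood.
        cov = np.atleast_2d(np.cov(nbrs, rowvar=False))
        out[i] = rng.multivariate_normal(nbrs.mean(axis=0), cov)
    return out


# Made-up example: correlated 2-D data.
rng = np.random.default_rng(1)
original = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=1000)
synthetic = knn_synthetic(original, n_samples=500, k=10)
print(synthetic.shape)
```

Because each draw only uses a small neighborhood, the global shape of the data (here, the positive correlation) is preserved without fitting a model to the whole joint distribution; a smaller k keeps rows closer to real observations, which matters for privacy.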
Creating Sparse Adjacency Matrix from Group Membership with igraph - R Programming
Block-diagonal matrix with data.table package
It took me days to come up with an efficient solution for creating an adjacency matrix from group membership. Consider the following data:
[Read More]
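The post works in R with igraph and data.table; as a language-neutral illustration, the Python sketch below shows the underlying construction. Two units are linked exactly when they share a group, so the nonzero entries can be generated group by group as (row, col) index pairs, which is the coordinate (COO) form most sparse-matrix constructors accept. The function name and example groups are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def group_adjacency_coo(groups):
    """Return (rows, cols) of the nonzero entries of an adjacency matrix in which
    units i and j are linked iff they share a group id (no self-loops)."""
    members = defaultdict(list)
    for idx, g in enumerate(groups):
        members[g].append(idx)
    rows, cols = [], []
    for idx_list in members.values():
        # Every ordered pair of distinct members of the same group is an edge.
        for i, j in permutations(idx_list, 2):
            rows.append(i)
            cols.append(j)
    return rows, cols


groups = ["a", "a", "b", "b", "b"]
rows, cols = group_adjacency_coo(groups)
print(list(zip(rows, cols)))
```

When units are sorted by group, these entries form a block-diagonal matrix. The index pairs could then be handed to a sparse constructor (for example `scipy.sparse.coo_matrix` in Python or `Matrix::sparseMatrix` in R) without ever allocating the dense n-by-n matrix. Note that the number of entries grows with the square of each group's size.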
Reducing Matrix Computation Time in R
Using sparseMatrix from Matrix package
To reduce computation time, I transformed loops into matrix operations in an algorithm. However, my matrices were extremely large, so the computation was slower than I expected. I was using the %*% operator in R for matrix multiplication. I found that it is not possible to achieve dramatically faster computation by switching to another backend language (e.g., using Rcpp or JuliaCall). I tried and failed.
[Read More]
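Why sparse storage helps: a dense product touches every entry, while a sparse format stores only the nonzeros and the multiplication loops over those alone. The pure-Python sketch below illustrates this with compressed sparse row (CSR) storage and a matrix-vector product, a simplification of the full matrix product in the post. R's `Matrix` package typically uses the column-oriented counterpart (the `dgCMatrix` class, compressed sparse column), so this shows the principle rather than that package's code.

```python
def to_csr(dense):
    """Convert a dense list-of-lists matrix to CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                data.append(v)      # nonzero value
                indices.append(j)   # its column index
        indptr.append(len(data))    # where the next row starts in `data`
    return data, indices, indptr


def csr_matvec(csr, x):
    """Multiply a CSR matrix by vector x, touching only the stored nonzeros."""
    data, indices, indptr = csr
    y = []
    for r in range(len(indptr) - 1):
        s = 0
        for p in range(indptr[r], indptr[r + 1]):
            s += data[p] * x[indices[p]]
        y.append(s)
    return y


A = [[2, 0, 0, 0],
     [0, 0, 3, 0],
     [0, 0, 0, 0],
     [1, 0, 0, 4]]
x = [1, 2, 3, 4]
y = csr_matvec(to_csr(A), x)
print(y)  # [2, 9, 0, 17]
```

The work is proportional to the number of nonzeros rather than to rows times columns, which is why a mostly-zero matrix can be multiplied far faster in sparse form regardless of the language doing the arithmetic.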