So, as you might or might not know, I've been working on HyperLearn: a faster, optimized ML package designed to make everything at least 50% faster (I hope).
Thanks so much for all the support, Redditors! https://github.com/danielhanchen/hyperlearn [It made the GitHub Trending list for Jupyter Notebooks!! yayy!]
Anyways, I haven't updated the code much, but that's because I was busy testing and finding out which algos were the most stable and fastest.
Key findings for N = 5,000, P = 6,000 [more features than rows; a near-square matrix]:
For the pseudoinverse (used in Linear Regression, Ridge Regression and lots of other algos): JIT, Scipy MKL, PinvH, Pinv2 and HyperLearn's Pinv all perform very similarly. PyTorch's is clearly problematic, coming in roughly 4x slower than Scipy MKL.
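For reference, here's a minimal sketch of the two SciPy routes being compared. The sizes are shrunk from the post's 5,000 x 6,000 so it runs quickly; note `pinvh` assumes a symmetric (Hermitian) input, so it applies to XᵀX rather than X itself:

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 600))  # more features than rows, like the benchmark

# General SVD-based pseudoinverse (works for any matrix).
P1 = linalg.pinv(X)

# pinvh exploits symmetry, so it operates on the Gram matrix X^T X.
XtX = X.T @ X
P2 = linalg.pinvh(XtX)

# Sanity check: the Moore-Penrose condition X @ pinv(X) @ X == X.
assert np.allclose(X @ P1 @ X, X)
```

Both satisfy the Moore-Penrose conditions; the benchmark question is purely which factorization path is fastest on a given BLAS backend.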
For Eigh (used in PCA, LDA, QDA and other algos): Sklearn's PCA utilises SVD. Clearly not a good idea, since it is much better to compute the eigenvectors / eigenvalues on XTX. JIT Eigh is the clear winner at 14.5 seconds on XTX, whilst Numpy is 2x slower. Torch is likewise slower once again…
So, for PCA, a speedup of roughly 3x is seen when using JIT-compiled Eigh instead of Sklearn's SVD-based PCA.
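The two PCA routes above can be sketched as follows (a toy-sized version, not HyperLearn's actual implementation): both recover the same principal components, up to a sign flip per component, since the eigenvalues of XᵀX are the squared singular values of X.

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
n, p, k = 400, 300, 5
X = rng.standard_normal((n, p))
Xc = X - X.mean(axis=0)  # centre the data, as PCA requires

# Route 1 (Sklearn-style): SVD of the centred data matrix.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components_svd = Vt[:k]

# Route 2 (eigendecomposition route): eigh on the p x p Gram matrix X^T X.
# eigh returns eigenvalues in ascending order, so take the last k, reversed.
evals, evecs = linalg.eigh(Xc.T @ Xc)
components_eigh = evecs[:, ::-1][:, :k].T

# Same principal subspace, up to a sign per component.
for a, b in zip(components_svd, components_eigh):
    assert np.allclose(a, b) or np.allclose(a, -b)
```

The eigh route only ever touches a p x p symmetric matrix, which is why it wins when that decomposition is cheaper than a full SVD of X.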
To solve X*theta = y: Torch GELS is super unstable. Like really. If you use Torch GELS, don't forget to call theta_hat[np.isnan(theta_hat) | np.isinf(theta_hat)] = 0, or else the results are problematic. All the other algos have very similar MSEs, and HyperLearn's Regularized Cholesky Solve takes a mere 0.635 seconds. Compared with Sklearn's next fastest Ridge solve (via Cholesky), and after including matrix multiplication time, that's HyperLearn at 2.89s vs Sklearn at 4.53s.
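For intuition, a regularized Cholesky solve along these lines (a minimal sketch, not HyperLearn's actual code) forms the normal equations (XᵀX + αI)θ = Xᵀy and factorizes the symmetric positive-definite left-hand side, which is what makes it so fast:

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
n, p = 1000, 200
X = rng.standard_normal((n, p))
theta_true = rng.standard_normal(p)
y = X @ theta_true + 0.1 * rng.standard_normal(n)

# Regularized normal equations: (X^T X + alpha*I) theta = X^T y.
# The small alpha keeps the matrix positive definite, so Cholesky applies.
alpha = 1e-6
A = X.T @ X + alpha * np.eye(p)
c, low = linalg.cho_factor(A)
theta_hat = linalg.cho_solve((c, low), X.T @ y)
```

A Cholesky factorization of the p x p matrix costs roughly half an LU and never produces NaN/Inf garbage the way an unstable least-squares driver can, which is the contrast being drawn with Torch GELS above.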
So to conclude:

- HyperLearn's Pseudoinverse: no speed improvement.
- HyperLearn's PCA: around a 3x speedup (200% improvement).
- HyperLearn's Linear Solvers: about 2x faster (100% improvement).
Help make HyperLearn better! All contributors are welcome, as this is truly a huge undertaking… https://github.com/danielhanchen/hyperlearn
[For all the timings above: lower time == better.]