Mageswaran D · How to keep Cloud cost under control? · Collect the hardware metrics (CPU/memory utilization, disk I/O) · 3 min read · Apr 16, 2022
Mageswaran D · 2022: No FileSystem for scheme: s3? · How many times have you faced this error? The S3 or S3a scheme-not-found error while working with AWS S3 + Spark outside EMR? · 3 min read · Mar 24, 2022
Mageswaran D · PySpark Practice Problems · How to transform array of arrays into columns? · 3 min read · Jan 2, 2022
Mageswaran D · Dynamic Programming Patterns to Ace Interviews · The following materials are taken from the freeCodeCamp.org YouTube video course https://www.youtube.com/watch?v=oBt53YbR9Kk strongly recommend to… · 9 min read · Dec 15, 2021
Mageswaran D · What if I wanted to submit remote PySpark jobs to AWS EMR without worrying about library dependen… · Typically a Spark cluster is used to run ETL jobs or streaming jobs, where everything is managed by developers for developers, where… · 5 min read · Dec 8, 2021
Mageswaran D · NLP: What it takes to design a full stack DeepLearning based Receipts form filling system using NER? · Online Colab Notebook for model training. · 11 min read · Nov 29, 2021
Mageswaran D · NLP: What it takes to model Google search suggestions or auto complete? · Dataset: google_wellformed_query, 25K examples · 8 min read · Nov 10, 2021
Mageswaran D · How to build custom NER HuggingFace dataset for receipts and train with HuggingFace Transformers… · Disclaimer: It is assumed that you have some working knowledge of the Hugging Face library and the datasets library to begin with. · 5 min read · Oct 28, 2021
Mageswaran D · A dive into Apache Spark Parquet Reader for small size files · While working on my current project, I was asked how Spark reads Parquet files and how it achieves parallelisation. Very… · 8 min read · Oct 26, 2021
Mageswaran D · Some interesting interview Q&A around “randomness” · 1. How to sample a random element from an infinite stream? · 2 min read · May 24, 2021