How to keep Cloud cost under control?
Collect the hardware metrics (CPU/memory utilization, disk I/O)
Apr 16, 2022
2022 : No FileSystem for scheme: s3?
How many times have you faced this error? The S3 or S3a scheme-not-found error while working with AWS S3 + Spark outside EMR?
Mar 24, 2022
Dynamic Programming Patterns to Ace Interviews
The following materials are taken from the freeCodeCamp.org YouTube video course https://www.youtube.com/watch?v=oBt53YbR9Kk strongly recommend to…
Dec 15, 2021
What if I wanted to submit remote PySpark jobs to AWS EMR without worrying about library dependen…
Typically, a Spark cluster is used to run ETL jobs or streaming jobs, where everything is managed by developers for developers, where…
Dec 8, 2021
NLP: What it takes to design a full stack DeepLearning based Receipts form filling system using NER?
Online Colab Notebook for model training.
Nov 29, 2021
NLP: What it takes to model Google search suggestions or auto complete?
Dataset: google_wellformed_query, 25K examples
Nov 10, 2021
How to build custom NER HuggingFace dataset for receipts and train with HuggingFace Transformers…
Disclaimer: It is assumed that you have some working knowledge of the Hugging Face library and the datasets library to begin with.
Oct 28, 2021
A dive into Apache Spark Parquet Reader for small size files
While working on my current project, I was asked how Spark reads Parquet files and how it achieves parallelisation. Very…
Oct 26, 2021
Some interesting interview Q&A around “randomness”
1. How to sample a random element from an infinite stream?
May 24, 2021