Data Engineer AG1
Objective
Computer engineering graduate with 5+ years of development experience, proficient in Scala, with a deep understanding of
data management and big data tools and technologies, seeking full-time senior software developer/team lead roles.
TECHNICAL SKILLS
Languages: Scala, Java, Python
Big Data: Spark, Kafka, Storm, HDFS, HBase, Hadoop MapReduce
Modules and Frameworks: Akka HTTP, Akka Actors, Jersey Framework, Spring Boot
ETL Tools: NiFi, AWS Data Lake, Glue, Snowplow Analytics
Cloud: AWS (Glue, Lambda, EMR, Elastic Beanstalk, Kinesis); Azure (HDInsight)
Data Store: SQL, Postgres, Redshift, Elasticsearch, MongoDB, TigerGraph
Utility and DS Misc: Docker, Kubernetes, Jenkins, Atlassian Tools (Jira, Confluence), Maven, Gradle, SBT, GitHub,
Weka, Deeplearning4j, spaCy, Spark-NLP
PROFESSIONAL EXPERIENCE
Algoscale Technologies, Noida
June 2015 – present
Big Data Engineer/ Tech Lead
• Manage and lead the big data team, mentoring interns so that no bottlenecks arise.
• Gather client requirements to formulate a project model, implement the solution, and communicate progress
to the client at regular intervals, following agile methodologies for efficient development.
• Own the complete project life cycle, from inception to final deliverable, to ensure that requirements are
met.
PROJECTS
Data Management Platform for Ad-Market
Dec 2016 – present
(Scala, Akka, NiFi, AWS Glue, Spark, Postgres, MongoDB, TigerGraph)
• Created an ETL pipeline from scratch for campaign performance data from platforms such as DV360, AdWords, Facebook,
and Campaign Manager, ingesting terabytes of data for numerous advertisers using NiFi and AWS Data Lake and
processing it with Spark. The pipeline gives the client insight into campaigns running across different platforms.
• Wrote several Scala web backend applications that handle large amounts of real-time data, creating REST services in Akka
HTTP and using classic Actors to consume from Kafka. These applications are the core of the DMP product, which
currently has many large advertisers as customers.
Network And Security App Data
Jan 2019 – Mar 2020
(Snowplow Analytics, AWS: Load Balancer, S3, EC2, Elastic Beanstalk, Kinesis Firehose, Redshift)
• Created an ETL pipeline from scratch for real-time processing of hundreds of millions of events daily from websites
and their security app using Snowplow Analytics. This helped the client gain insights and retarget users.
This improved their growth to 10%.
Real-Time Dashboard Population for a SaaS Car Product (Spark Streaming, Kafka, MQTT)
Mar 2020 – Dec 2020
• Populated a live car service dashboard in near real time using Spark Streaming. The dashboard previously had a latency of 10
seconds when processing some aggregated results; this pipeline reduced the latency to 1-2 seconds.
IAB Taxonomy Category Prediction (Hadoop MapReduce, HBase, Storm, Elasticsearch)
Feb 2018 – Dec 2018
• This project was designed to classify Italian data according to the Interactive Advertising Bureau (IAB) standard
taxonomy. Data mining and scraping techniques were used to collect data from online sources. The model was trained on
20+ categories and 300+ subcategories with over 80%.
• NER, NLP, age-gender, and other data science models were also applied to the incoming data.
Education
July 2011 – Apr 2015 Bachelor's in Computer Science
AWARDS
Received Performer of the Month numerous times at Algoscale Technologies.