Focused on building scalable data pipelines, robust ETL/ELT workflows, and cloud-based data solutions in large-scale environments.
The main tools and technologies I use to ensure data integrity and workflow efficiency.
- Focused on Data Engineering and Big Data Systems architecture.
- Interested in scalable pipelines, real-time streaming (Kafka), and Cloud Data Warehousing (AWS/GCP).
- Prioritizing data integrity, automated workflow orchestration, and efficient infrastructure management.
- Driven by a passion for building systems that enable data-driven decision-making.
- Email: [email protected]
- GitHub: github.com/lyraa88
- Established a Docker-based deployment (serving) environment for ML models and implemented CI/CD pipeline concepts.
- Emphasis on reproducibility and operational management.
- GitHub Repo

- Designed and implemented a real-time ingestion and streaming pipeline for sensor/audio data using Kafka.
- Built automated batch ETL jobs (using Spark/Airflow) and established an AWS S3 data lake.
- GitHub Repo

- Developed Airflow DAGs to automate the collection, transformation, and loading of user data into a PostgreSQL data mart.
- Focus on designing data validation and quality check processes.
- GitHub Repo

- Developed scripts for efficient collection, integration, and normalization of fragmented marketplace data.
- Emphasis on the data preparation stage and schema design for analysis.
- Paper Link
