Hello, welcome to my data science portfolio.

On this page, I demonstrate my ability to solve business problems using Data Science concepts and tools, through projects with public datasets.

You will also find my professional experience, skills, tools, and concepts related to Data Science.

Feel free to get in touch through the links at the bottom of the page.

About Me

My name is Erick Gomes. I hold a Master's degree in Computational Physics from Universidade Federal Fluminense (UFF) and have solid experience in Data Science and Machine Learning Engineering. I work as a Senior Data Scientist on high-impact projects, leading MLOps initiatives, data engineering, and the development of AI solutions for business. My background includes strategic roles in the financial market, development of robust pipelines, deployment of machine learning models in production, and technical leadership of multidisciplinary teams.

I have expertise in applying advanced machine learning algorithms in both materials physics and corporate environments, as well as extensive experience integrating solutions with cloud platforms, automating processes, and managing data projects. I am always looking to improve my skills and deliver high-value results for organizations.

Currently, I work as a Senior Data Scientist in the Financial Market, focusing on the development of innovative solutions based on Open Finance data.

Experience

Data Scientist | Machine Learning Engineer (MLOps) - Mobi2buy

Period: April 2024 - present

Location: São Paulo, Brazil (Remote)

Responsible for the data ingestion pipeline in the MLOps workflow using AWS (S3, Glue, Athena, EC2).
Development and refactoring of Python code for integration with AWS via SDK.
Responsible for the MLOps pipeline using MLflow, DVC, GitHub Actions, Evidently AI, EC2, Airflow, Prometheus, Grafana, and Terraform.
Responsible for building, deploying, and maintaining machine learning models:
- Propensity model for the collection squad.
- Sales forecasting model.
- Churn prediction model.
Responsible for building and maintaining data engineering pipelines using AWS (Glue, Lambda, Redshift):
- Ingestion of data from the transactional database (OLTP) into the analytical environment (OLAP).
- Ingestion of data from APIs (REST and GraphQL) into the analytical environment.
Development of chatbots using Generative AI, LangChain, and Vertex AI (Agent Builder):
- Using LangChain to integrate with various GenAI models.
- Using LangChain to build Retrieval Augmented Generation (RAG) systems.
- Using LangChain, CrewAI, AutoGen, and Langflow to build AI Agents.
- Building systems with multiple AI agents.
- Agent monitoring.
- Development of agents for Computer Vision.
Responsible for backlog management and leading Scrum ceremonies.

Data Scientist - Klavi

Period: August 2023 - April 2024

Location: São Paulo, Brazil (Hybrid)

Development of churn models for clients of financial institutions.
Dashboard development using Power BI.
Process automation and construction of ETL (Extract, Transform, Load) pipelines to create tables consumed by other areas.
Extraction of structured and unstructured data from different sources.
Use of Python for building machine learning and data analysis projects.

Data Scientist - Tecnologia Bancária (TecBan)

Period: July 2022 - July 2023

Location: São Paulo, Brazil (Remote)

Data analysis and proof-of-concept development focused on financial information.
Data structuring to generate strategic business insights.
Market trend studies, especially in the context of Open Finance.
Report generation for specific business demands.
Development of predictive models for default forecasting and online payment fraud detection.
Application of data balancing techniques to improve model quality.
Evaluation of classification models using metrics such as AUC, Precision, Recall, F1-Score, and AUPRC.
Development and deployment of APIs and machine learning models.

Main Responsibilities:

Sandbox construction using regulated Open Finance data.
Execution of ETL pipeline to load simulated data into the database.
Creation of investment copilots using LLMs (Large Language Models) and open data from Open Finance.

Graduate Teaching Assistant (Machine Learning applied to Physics) - Universidade Federal Fluminense

Period: July 2023 - December 2023

As a teaching assistant for the Machine Learning applied to Physics course, I played a central role in exploring and applying the fundamental principles of machine learning to solve specific challenges in theoretical and experimental physics.

My technical contribution involved guiding students in implementing supervised and unsupervised learning algorithms, such as regression, classification, neural networks, and clustering methods, to analyze datasets from various sources. Using languages and libraries such as Python, TensorFlow, and scikit-learn, I explored data preprocessing methods, feature selection, and model optimization to extract accurate insights and reliable predictions.

I was responsible for weekly guidance sessions in the graduate course and for assisting students with course activities.

Master's Researcher | Data Science Researcher - Universidade Federal Fluminense

Period: March 2022 - present

Location: Niterói, Rio de Janeiro, Brazil

In my research on machine learning applied to Physics, I use data science best practices to propose new approaches to the analysis of large databases.

Main Results:

Use of regression algorithms to build predictive models of the formation energy of materials and thereby identify a set of thermodynamically stable materials.
Use of classification algorithms to build predictive models to separate metallic and insulating materials.
Construction of regression models to predict various properties of insulating materials.

3+ years as Undergraduate Researcher | Data Science Researcher

During my undergraduate research, I developed skills in various computational tools such as shell script programming, parallel computing, and Linux systems. I was responsible for conducting studies related to materials physics through computational simulation, a field that involves producing large amounts of data to be analyzed.

2+ years developing university extension projects

During the extension project, I used embedded systems such as Arduino and ESP32 to produce physics experiments. I was also responsible for coordinating/guiding a group of undergraduate students to produce similar experiments.

2 years as Electronics Technician

During the electronics technician course, I learned about electronic systems and components, which later enabled me to develop projects in data analysis of electronic circuits and embedded systems.

Skills

Programming Languages, Databases, and Operating Systems

Programming Language: Python (Pandas, NumPy, Scikit-learn).
C and C++ programming.
SQL, MySQL, MongoDB, and PostgreSQL.
10+ years using Linux (various distributions).

AI, Machine Learning, and Statistics

Microsoft Azure Machine Learning
Descriptive Statistics (location, dispersion, skewness, kurtosis, density).
Inferential Statistics.
Regression, Classification, Clustering, Association Rules, and Sequential Patterns algorithms.
MLflow

Data Visualization

Microsoft Power BI and Tableau.
Matplotlib, Seaborn, and Plotly.
Streamlit.

Software Engineering

Git, GitHub, Linux.
API development with Flask and Postman.

Cloud, MLOps, and Data Engineering

AWS (S3, Glue, Athena, EC2, Lambda, Redshift)
MLflow, DVC, GitHub Actions, Evidently AI
Airflow, Prometheus, Grafana, Terraform
Python integration with AWS via SDK
Building and maintaining data and MLOps pipelines

Generative AI and Chatbots

LangChain, Vertex AI (Agent Builder), CrewAI, AutoGen, Langflow
Building RAG (Retrieval Augmented Generation) systems
Development and monitoring of multiple AI agents
Agents for Computer Vision

Data Science Projects

Rossmann - Sales Forecasting

Objective:

Build a model to forecast sales for the Rossmann pharmacy chain.

Tools Used:

Operating System: Linux.
IDE: Jupyter Notebook.
Programming Language: Python.
Frameworks and Libraries: Scikit-learn, Pandas, NumPy, SciPy, Matplotlib, Seaborn, Flask.
Algorithms Used: Linear Regression, LASSO, Random Forest, and XGBoost Regressor.
Code Versioning: Git/GitHub.
Deployment: Render Cloud.

See on GitHub

Default Prediction and Credit Scoring

Objective:

Build a model to predict credit card default.

Tools Used:

Operating System: Linux.
IDE: Jupyter Notebook.
Programming Language: Python.
Frameworks and Libraries: Scikit-learn, Pandas, NumPy, SciPy, Matplotlib, Seaborn, Imbalanced-learn.
Algorithms Used: KNN, Decision Tree Classifier, Random Forest Classifier, Logistic Regression, and XGBoost Classifier.
Balancing Techniques: Oversampling - SMOTE and ADASYN | Undersampling - Random Undersampling (RUS).
Evaluation Metrics: Precision, Recall, F1-Score, Area Under the ROC Curve (AUC), and Area Under the Precision-Recall Curve (AUPRC).
Code Versioning: Git/GitHub.
WebApp: Streamlit.
Deployment: Render Cloud.
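
To illustrate two of the ideas listed above, here is a minimal sketch of random undersampling (RUS) and of precision, recall, and F1 for the positive class. It is not the project's code: the project relies on Imbalanced-learn and Scikit-learn, while this version uses only the Python standard library so that it runs anywhere. The function names `random_undersample` and `precision_recall_f1`, the toy rows, and the predictions are all made up for illustration.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample without replacement so no row is duplicated.
        for row in rng.sample(group, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy, made-up data: 8 non-default rows (label 0) and 2 default rows (label 1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_undersample(rows, labels, seed=42)
class_counts = Counter(balanced_labels)
print("class counts after undersampling:", dict(class_counts))

# Made-up predictions for the original 10 rows.
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(labels, y_pred)
print("precision, recall, f1:", precision, recall, f1)
```

On imbalanced data such as credit default, accuracy alone can look good while most defaults are missed, which is why the metrics above focus on the positive class. In practice, resampling should be applied only to the training split so that evaluation reflects the real class distribution.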

See on GitHub

1st Place Innovative Article - IEL Talent Award

Open Finance and Artificial Intelligence: The Union Between Finance and Technology

Objective:

This article explores the synergy between Open Finance and Artificial Intelligence (AI) and its impact on the financial industry. We analyze the concept of open finance, which encompasses the opening of financial data and services through APIs (application programming interfaces), and the application of AI in this context. We discuss the benefits of open finance, such as greater financial inclusion and innovation, and highlight how AI can be used to enhance data analysis and the personalization of financial services. We conclude that the combination of open finance and AI has the potential to transform the way we relate to finance, providing a more efficient, convenient, and personalized experience.

Read Article

Contact

Feel free to get in touch.