
🚘 sermadrid

E2E project to search for public parking availability in Madrid's SER zone

sermadrid.org

Warning: sermadrid is currently under active development.

sermadrid is an E2E Machine Learning project to search for public parking availability in Madrid's SER (Regulated Parking Service) zone.

The project powers the publicly available website sermadrid.org, which lets users search for parking availability based on their chosen date and address.

It uses Time Series models, one per neighbourhood, to make predictions. These models were trained on four years of parking ticket data (2020 to 2024). A minimal sketch of this per-neighbourhood setup follows.
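
To make the one-model-per-neighbourhood idea concrete, here is a hedged sketch of how prediction requests can be routed to a per-neighbourhood model; the class and neighbourhood IDs are illustrative placeholders, not the actual sermadrid API:

```python
from datetime import datetime


class NeighbourhoodModel:
    """Stand-in for a trained per-neighbourhood Time Series model."""

    def __init__(self, neighbourhood_id: str):
        self.neighbourhood_id = neighbourhood_id

    def predict(self, when: datetime) -> float:
        # A real model would use learned seasonality (hour, weekday, month).
        # Here we return a placeholder availability score.
        return 0.5


# One model per SER neighbourhood, keyed by a (hypothetical) neighbourhood ID.
models: dict[str, NeighbourhoodModel] = {
    nid: NeighbourhoodModel(nid) for nid in ("CENTRO-01", "CHAMBERI-02")
}


def predict_availability(neighbourhood_id: str, when: datetime) -> float:
    # Route the request to the model for the neighbourhood the user selected.
    return models[neighbourhood_id].predict(when)


print(predict_availability("CENTRO-01", datetime(2024, 5, 10, 9, 0)))
```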

The project consists of several components: a frontend (Node.js), a backend (FastAPI), infrastructure (Digital Ocean & AWS, defined with Terraform), CI/CD deployment pipelines (GitHub Actions), an MLOps framework server for orchestrating training workflows (ZenML), and an experiment tracking and model registry server (MLflow).

Screenshot of sermadrid.org

📁 Project Structure

📂 sermadrid
├── 📂 .github              - GitHub Actions CI/CD pipelines
├── 📂 backend
│   ├── 📂 app              - FastAPI app
│   └── 📂 sermadrid        - sermadrid python package
├── 📂 frontend             - Node.js app
├── 📂 infrastructure       - Terraform IaC resources
│   ├── 📂 backend          - Digital Ocean backend infrastructure
│   └── 📂 zenml            - AWS ZenML server and stack infrastructure
└── 📂 zenml                - ZenML training workflow pipelines

🧩 Project Components

  • Frontend: The frontend is developed with Vue.js, with Mapbox integrated to handle mapping and geospatial functionality. It is served by an nginx web server configured with SSL certificates for secure HTTPS communication.

  • Backend: The backend consists of a FastAPI application that serves the sermadrid Python package, which implements the core logic for predicting parking availability. The FastAPI app loads the machine learning models at startup, enabling efficient and responsive handling of user requests (see the first sketch after this list).

  • Infrastructure: The project’s infrastructure is managed using Terraform and is hosted across Digital Ocean and AWS. Digital Ocean is used to host the application, including components like droplets, domain management, and firewall configurations. AWS is employed for the ZenML MLOps framework components.

  • CI/CD Pipelines: GitHub Actions manages the CI/CD pipelines, with Docker Compose orchestrating the environment. There are three pipelines: one deploys the Digital Ocean infrastructure, another deploys the sermadrid web application (frontend and backend), and the last deploys the Machine Learning infrastructure (ZenML server & stack and MLflow server).

  • MLOps Framework: ZenML orchestrates the machine learning training workflows. ZenML supports environment-agnostic execution, allowing the workflows to run both locally and on AWS. The AWS stack components are created through the Terraform code located in the infrastructure/zenml directory. The project relies on two ZenML pipelines, "feature engineering" and "training", each containing multiple steps that process the raw data and train the Time Series models that power sermadrid (see the pipeline sketch after this list). TODO: Describe the current ZenML stack components

  • Experiment Tracking & Model Registry: MLflow handles experiment tracking and the model registry; its server is provisioned through Terraform. MLflow experiment tracking and model registry are integrated into the ZenML pipelines, and the backend loads models directly from MLflow at startup (see the last sketch after this list).
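
A minimal sketch of the startup model loading described above, using FastAPI's lifespan hook; load_models() and the /predict route are hypothetical stand-ins rather than the actual sermadrid endpoints:

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI


def load_models() -> dict:
    # Placeholder: the real app deserializes one trained model per
    # neighbourhood (e.g. from object storage or MLflow) at startup.
    return {"CENTRO-01": object()}


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load models once at startup and keep them in memory for all requests.
    app.state.models = load_models()
    yield


app = FastAPI(lifespan=lifespan)


@app.get("/predict")
def predict(neighbourhood_id: str):
    model = app.state.models.get(neighbourhood_id)
    return {"neighbourhood_id": neighbourhood_id, "model_loaded": model is not None}
```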
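
A sketch of the ZenML pipeline structure, assuming the @step/@pipeline API; the step bodies are placeholders, not the project's real feature-engineering or training logic:

```python
import pandas as pd
from zenml import pipeline, step


@step
def load_raw_tickets() -> pd.DataFrame:
    # Placeholder for reading the raw SER parking ticket CSVs.
    return pd.DataFrame({"neighbourhood": ["CENTRO-01"], "tickets": [120]})


@step
def build_features(raw: pd.DataFrame) -> pd.DataFrame:
    # Placeholder for aggregating tickets into per-neighbourhood time slots.
    return raw


@pipeline
def feature_engineering_pipeline():
    raw = load_raw_tickets()
    build_features(raw)


if __name__ == "__main__":
    # Runs locally or on AWS, depending on the active ZenML stack.
    feature_engineering_pipeline()
```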
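
And a hedged sketch of the MLflow side: logging a training run and loading a registered model, as the backend does at startup. The tracking URI, experiment, and model names are placeholders:

```python
import mlflow

# Point the client at the MLflow Tracking Server (placeholder URL).
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("sermadrid-training")

with mlflow.start_run():
    mlflow.log_param("neighbourhood", "CENTRO-01")
    mlflow.log_metric("mae", 0.12)
    # A real run would also log and register the fitted model, e.g.:
    # mlflow.sklearn.log_model(model, "model", registered_model_name="sermadrid-CENTRO-01")

# At backend startup, the latest registered version can then be loaded with:
# model = mlflow.pyfunc.load_model("models:/sermadrid-CENTRO-01/latest")
```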

📥 Input Data

The project uses three primary data sources from the Madrid City Council: Regulated Parking Service (SER) Parking Tickets, SER Streets and Parking Spaces, and the SER Area Map. These datasets are processed and combined to build sermadrid: the parking ticket data is aggregated into Time Series features (see the sketch after the table below), while the geospatial data enables map overlays and address-based searches.

Dataset Details

| Dataset | Description | Source | Format | Update Frequency | Usage |
| --- | --- | --- | --- | --- | --- |
| SER Parking Tickets | Data on parking tickets issued by the Regulated Parking Service (SER) from parking meters and mobile payment applications in Madrid. Each ticket includes the parking meter ID, operation date, reservation start and end times, neighbourhood, zone type, permit type, reserved minutes, and amount paid. | Madrid Open Data Portal | CSV | Quarterly | Training one Time Series model per neighbourhood by processing & aggregating the data by time slots. |
| SER Streets and Spaces | Data on the number of parking spaces within the SER zone, broken down by neighbourhood. | Madrid Open Data Portal | CSV | Quarterly | Enhancing predictions with additional parking information. |
| SER Area Map | Geospatial data of the Regulated Parking Service area, including the delimitation of the SER area and its subdivision into neighbourhoods. | Madrid Geoportal | Shapefile (SHP) | Last updated: 03/06/2022 | Frontend visualization, masking non-SER areas of Madrid and identifying the neighbourhood selected by the user. |
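
To illustrate the aggregation described above, a hedged pandas sketch (column names are hypothetical) that rolls raw parking tickets up into hourly per-neighbourhood features:

```python
import pandas as pd

tickets = pd.DataFrame({
    "neighbourhood": ["CENTRO-01", "CENTRO-01", "CHAMBERI-02"],
    "start": pd.to_datetime(
        ["2023-05-10 09:05", "2023-05-10 09:40", "2023-05-10 10:15"]
    ),
    "minutes_reserved": [60, 30, 120],
})

# Sum reserved minutes into hourly slots per neighbourhood; each
# neighbourhood's resulting series trains its own Time Series model.
features = (
    tickets.set_index("start")
    .groupby("neighbourhood")["minutes_reserved"]
    .resample("1h")
    .sum()
)
print(features)
```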

🔧 Technical Details

sermadrid also makes use of:

  • Ruff: A fast Python linter and formatter written in Rust.
  • pre-commit: A tool that automates code quality checks before each commit.
  • Poetry: A dependency management and packaging tool that resolves a full graph of dependencies to find compatible package versions, avoiding incompatibility issues.

⚙️ Deployment

The following guidelines describe how to deploy the frontend and backend that compose the sermadrid web app, both locally and remotely.

Local deployment

  1. Create a .env file in the repository root with the following variables:

    BACKEND_CORS_ORIGINS=["http://localhost","http://0.0.0.0"]
    # Mapbox token obtained from mapbox.com
    MAPBOX_TOKEN={MAPBOX_TOKEN}
    
  2. Run the following commands to spin up the Docker containers:

    $ docker-compose build --no-cache
    $ docker-compose up -d
    

The app will then be available at http://localhost:80.
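
As an optional smoke test (a standard-library-only sketch; the root path is just an example), you can confirm the stack is serving:

```python
import urllib.request

# Expect HTTP 200 once the containers are up and healthy.
with urllib.request.urlopen("http://localhost:80/") as resp:
    print(resp.status)
```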

Afterwards, to stop all the running containers, run:

$ docker stop $(docker ps -a -q) 

TODO: Update local deployment with ZenML & MLflow services

Remote deployment

ML Infrastructure

  1. Create the following GitHub Actions variables in the GitHub repository:
    • AWS_S3_REMOTE_STATE_BUCKET_NAME: Name to give to the AWS S3 bucket used for the Terraform remote state of this infrastructure stack
    • AWS_S3_ZENML_BUCKET_NAME: Name to give to the AWS S3 bucket used by ZenML
    • AWS_S3_MLFLOW_BUCKET_NAME: Name to give to the AWS S3 bucket used by MLflow
  2. Create the following GitHub Actions secret variables in the GitHub repository:
    • ZENML_USERNAME: The username for the ZenML Server
    • ZENML_PASSWORD: The password for the ZenML Server
    • MLFLOW_USERNAME: The username for the MLflow Tracking Server
    • MLFLOW_PASSWORD: The password for the MLflow Tracking Server
    • GH_PAT: GitHub Personal Access Token (obtained in GitHub > Settings > Developer Settings > Personal Access Tokens > Fine-grained tokens; create a token with read access to the repository's "actions" and "metadata" permissions)
  3. Run the GitHub Actions deploy-ml-infrastructure.yml workflow to create the project's ML infrastructure. The workflow outputs include the URLs of both the ZenML server UI and the MLflow server UI
  4. Visit the ZenML server UI URL and follow the on-screen instructions to create an initial admin user account

TODO: Add comment about being able to create or destroy infrastructure with Terraform through the GitHub Actions pipelines
TODO: Add comment about retrieving the MLflow URL & S3 & ZenML from outputs & interacting

App Infrastructure

  1. Create a Digital Ocean account
  2. Create a Digital Ocean API key
  3. Create a Digital Ocean SSH key
  4. Create an AWS account
  5. Create an AWS access key
  6. Register a valid website domain via a domain registrar and point it to the Digital Ocean name servers
  7. Create the following GitHub Actions secret variables in the GitHub repository:
    • DO_API_KEY_TOKEN: Digital Ocean API key token
    • DO_SSH_KEY_ID: Digital Ocean SSH key ID
    • DO_SSH_PRIVATE_KEY: Digital Ocean SSH private key
    • AWS_ACCESS_KEY_ID: AWS access key ID
    • AWS_SECRET_ACCESS_KEY: AWS secret access key
  8. Create the following GitHub Actions variables in the GitHub repository:
    • AWS_S3_BUCKET_NAME: Name to give to the AWS S3 bucket
    • AWS_REGION: Region to use for the AWS S3 bucket
    • DOMAIN_NAME: Registered website domain name
  9. Run the GitHub Actions deploy-app-infrastructure.yml workflow to create the project's app infrastructure

App

  1. Retrieve the Digital Ocean droplet IP from the Digital Ocean Control Panel
  2. Create a Mapbox account and generate an access token
  3. Create the following GitHub Actions secret variables in the GitHub repository:
    • DO_DROPLET_IP: Digital Ocean droplet IP
    • DO_DROPLET_USER: Digital Ocean droplet username (default username is root)
    • MAPBOX_TOKEN: Mapbox access token
    • CERTBOT_EMAIL: Email to be used by Certbot to obtain the SSL certificate to enable HTTPS
  4. Create the following GitHub Actions variables in the GitHub repository:
  5. Run the GitHub Actions deploy-app.yml workflow to deploy the app

TODO: Add comment about the CI/CD retrieving the MLflow server from the latest ML Infrastructure GitHub Actions workflow

➡️ Next Steps

sermadrid is under active development, with the following next steps planned:

  • ZenML Pipelines Deployment: Create a GitHub Actions pipeline to register ZenML pipelines on the remote AWS server and automate their execution using AWS Lambda functions.
  • MLflow ZenML Integration: Integrate MLflow for experiment tracking and model registry within the ZenML pipelines to streamline model management.
  • MLflow Backend Integration: Update the backend to load models at startup directly from the MLflow API instead of the current S3 bucket, ensuring the use of the latest production models.
  • Model Monitoring: Introduce model monitoring with Evidently and Grafana to track performance metrics over time.
  • Testing: Develop unit and integration tests using pytest for both the FastAPI backend and the ZenML training pipelines to ensure code reliability and robustness.
