> **Warning**: sermadrid is currently under active development.
sermadrid is an end-to-end (E2E) Machine Learning project to search for public parking availability in Madrid's SER (Regulated Parking Service) zone.
The project results in the publicly available website sermadrid.org, which allows users to search for parking availability based on their chosen date and address.
It makes the predictions with Time Series models, one per neighbourhood, trained on four years of parking ticket data (2020 to 2024).
The project consists of several components: the frontend (Node.js), the backend (FastAPI), the infrastructure (Digital Ocean & AWS, defined with Terraform), the CI/CD deployment pipelines (GitHub Actions), the MLOps framework server that orchestrates the training workflows (ZenML), and the experiment tracking and model registry server (MLflow).
```
📂 sermadrid
├── 📂 .github         - GitHub Actions CI/CD pipelines
├── 📂 backend
|   ├── 📂 app         - FastAPI app
|   └── 📂 sermadrid   - sermadrid Python package
├── 📂 frontend        - Node.js app
├── 📂 infrastructure  - Terraform IaC resources
|   ├── 📂 backend     - Digital Ocean backend infrastructure
|   └── 📂 zenml       - AWS ZenML server and stack infrastructure
└── 📂 zenml           - ZenML training workflow pipelines
```
- **Frontend**: The frontend is developed using Vue.js, with Mapbox integrated to handle mapping and geospatial functionality. It is served by an nginx web server, which is configured with SSL certificates for secure HTTPS communication, ensuring a smooth and secure user experience.
- **Backend**: The backend consists of a FastAPI application that serves the sermadrid Python package, which is responsible for the core logic of predicting parking availability. The FastAPI app loads the machine learning models at startup, enabling efficient and responsive handling of user requests.
- **Infrastructure**: The project's infrastructure is managed using Terraform and is hosted across Digital Ocean and AWS. Digital Ocean hosts the application, including droplets, domain management, and firewall configuration. AWS hosts the ZenML MLOps framework components.
- **CI/CD Pipelines**: GitHub Actions manages the CI/CD pipelines, with Docker Compose orchestrating the environment. There are three pipelines: one deploys the Digital Ocean infrastructure, another deploys the sermadrid web application (frontend and backend), and the last deploys the Machine Learning infrastructure (ZenML server & stack and MLflow server).
- **MLOps Framework**: ZenML orchestrates the machine learning training workflows. ZenML supports environment-agnostic execution, allowing the workflows to run both locally and on AWS. The AWS stack components are created through the Terraform code located in the infrastructure/zenml directory. The project relies on two ZenML pipelines, "feature engineering" and "training", each containing multiple steps that process the raw data and train the Time Series models that power sermadrid (see the sketch after this list). TODO: Describe the current ZenML stack components
- **Experiment Tracking & Model Registry**: MLflow handles experiment tracking and the model registry. Its server is provisioned through Terraform. MLflow experiment tracking and model registry are integrated into the ZenML pipelines, and the direct loading of models from MLflow is integrated in the backend at startup.
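To make the training workflow concrete, here is a minimal, hypothetical sketch of the kind of steps these pipelines run. The project defines "feature engineering" and "training" as separate pipelines; they are collapsed into a single pipeline here for brevity, and the step names, column names (`start_dt`, `neighbourhood`), data path, and model family are illustrative assumptions rather than the project's actual code.

```python
# Hypothetical sketch of the ZenML training workflow; all names, paths, and
# the model family are placeholders, not the project's actual implementation.
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from zenml import pipeline, step


@step
def load_raw_tickets() -> pd.DataFrame:
    """Load the raw SER parking ticket data (placeholder path)."""
    return pd.read_csv("data/ser_parking_tickets.csv", parse_dates=["start_dt"])


@step
def feature_engineering(raw_tickets: pd.DataFrame) -> pd.DataFrame:
    """Aggregate tickets into hourly counts per neighbourhood (the 'time slots')."""
    return (
        raw_tickets.set_index("start_dt")
        .groupby("neighbourhood")
        .resample("1h")
        .size()
        .rename("active_tickets")
        .reset_index()
    )


@step
def train_models(features: pd.DataFrame) -> dict:
    """Fit one Time Series model per neighbourhood."""
    models = {}
    for hood, group in features.groupby("neighbourhood"):
        series = group.set_index("start_dt")["active_tickets"].asfreq("1h", fill_value=0)
        models[hood] = ExponentialSmoothing(
            series, trend="add", seasonal="add", seasonal_periods=24
        ).fit()
    return models


@pipeline
def training_pipeline():
    features = feature_engineering(load_raw_tickets())
    train_models(features)
```

Which environment the pipeline runs in (locally or on AWS) is then determined by the active ZenML stack, which is what makes the workflows environment-agnostic.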
The project utilizes three primary data sources from the Madrid City Council: Regulated Parking Service (SER) Parking Tickets, SER Streets and Parking Spaces, and the SER Area Map. These datasets are processed and combined to create sermadrid, with parking ticket data aggregated for Time Series features and geospatial data enabling map overlays and address-based searches.
| Dataset | Description | Source | Format | Update Frequency | Usage |
|---|---|---|---|---|---|
| SER Parking Tickets | Data on parking tickets issued by the Regulated Parking Service (SER) from parking meters and mobile payment applications in Madrid. Each ticket includes the parking meter ID, operation date, reservation start and end times, neighbourhood, zone type, permit type, reserved minutes, and amount paid. | Madrid Open Data Portal | CSV | Quarterly | Training Time Series models per neighbourhood by processing & aggregating the data by time slots. |
| SER Streets and Spaces | Data on the number of parking spaces within the SER zone, broken down by neighbourhood. | Madrid Open Data Portal | CSV | Quarterly | Enhancing predictions with additional parking information. |
| SER Area Map | Geospatial data of the Regulated Parking Service area, including the delimitation of the SER area and its subdivision into neighbourhoods. | Madrid Geoportal | Shapefile (SHP) | Last updated: 03/06/2022 | Frontend visualization, showing a mask over non-SER Madrid areas, and identifying the neighbourhood selected by the user (see the sketch below the table). |
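As an illustration of the last row above, a point-in-polygon lookup with geopandas can resolve a user-selected location to its SER neighbourhood; the shapefile path and the `neighbourhood` attribute name below are assumptions made for the sketch.

```python
# Hypothetical sketch: map a user-selected location to its SER neighbourhood
# using the SER Area Map shapefile. Path and attribute name are placeholders.
import geopandas as gpd
from shapely.geometry import Point


def find_neighbourhood(lon: float, lat: float) -> str | None:
    """Return the SER neighbourhood containing the point, or None if outside the SER zone."""
    areas = gpd.read_file("data/ser_area_map.shp").to_crs(epsg=4326)  # reproject to lon/lat
    match = areas[areas.contains(Point(lon, lat))]
    return None if match.empty else match.iloc[0]["neighbourhood"]
```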
Other technical details include sermadrid's use of:
- Ruff: A fast Python linter and formatter written in Rust.
- pre-commit: A tool that automates code quality checks before each commit.
- Poetry: A tool for dependency management and packaging that builds a graph of dependencies and finds compatible package versions, avoiding incompatibility issues.
These are the guidelines to deploy the frontend and backend that compose the sermadrid web app, both locally and remotely.
- Create a `.env` file in the repository root with the following variables:

```
BACKEND_CORS_ORIGINS=["http://localhost","http://0.0.0.0"]
# Mapbox token obtained from mapbox.com
MAPBOX_TOKEN={MAPBOX_TOKEN}
```
- Run the following commands to spin up the Docker containers:

```
$ docker-compose build --no-cache
$ docker-compose up -d
```

The app will then be available at http://localhost:80.
Afterwards, to stop all the running containers, run:

```
$ docker stop $(docker ps -a -q)
```
TODO: Update local deployment with ZenML & MLflow services
### ML Infrastructure
- Create the following GitHub Actions variables in the GitHub repository:
  - `AWS_S3_REMOTE_STATE_BUCKET_NAME`: Name to give to the AWS S3 bucket used for the Terraform remote state of this infrastructure stack
  - `AWS_S3_ZENML_BUCKET_NAME`: Name to give to the AWS S3 bucket used by ZenML
  - `AWS_S3_MLFLOW_BUCKET_NAME`: Name to give to the AWS S3 bucket used by MLflow
- Create the following GitHub Actions secret variables in the GitHub repository:
  - `ZENML_USERNAME`: The username for the ZenML Server
  - `ZENML_PASSWORD`: The password for the ZenML Server
  - `MLFLOW_USERNAME`: The username for the MLflow Tracking Server
  - `MLFLOW_PASSWORD`: The password for the MLflow Tracking Server
  - `GH_PAT`: GitHub Personal Access Token, obtained in GitHub > Settings > Developer Settings > Personal Access Tokens > Fine-grained tokens; create a token with "Read access to actions and metadata" repository permissions
- Run the GitHub Actions `deploy-ml-infrastructure.yml` workflow to create the project's ML infrastructure. Its outputs include the URLs of both the ZenML server UI and the MLflow server UI.
- Visit the ZenML server UI URL and follow the on-screen instructions to create an initial admin user account.
TODO: Add comment about being able to create or destroy infrastructure with TF through the GitHub Actions pipelines
TODO: Add comment about retrieving the MLflow URL & S3 & ZenML from outputs & interacting
### App Infrastructure
- Create a Digital Ocean account
- Create a Digital Ocean API key
- Create a Digital Ocean SSH key
- Create an AWS account
- Create an AWS access key
- Register a valid website domain via a domain registrar and point it to the Digital Ocean name servers
- Create the following GitHub Actions secret variables in the GitHub repository:
  - `DO_API_KEY_TOKEN`: Digital Ocean API key token
  - `DO_SSH_KEY_ID`: Digital Ocean SSH key ID
  - `DO_SSH_PRIVATE_KEY`: Digital Ocean SSH private key
  - `AWS_ACCESS_KEY_ID`: AWS access key ID
  - `AWS_SECRET_ACCESS_KEY`: AWS secret access key
- Create the following GitHub Actions variables in the GitHub repository:
  - `AWS_S3_BUCKET_NAME`: Name to give to the AWS S3 bucket
  - `AWS_REGION`: Region to use for the AWS S3 bucket
  - `DOMAIN_NAME`: Registered website domain name
- Run the GitHub Actions `deploy-app-infrastructure.yml` workflow to create the project's app infrastructure
### App
- Retrieve the Digital Ocean droplet IP from the Digital Ocean Control Panel
- Create a Mapbox account and generate an access token
- Create the following GitHub Actions secret variables in the GitHub repository:
  - `DO_DROPLET_IP`: Digital Ocean droplet IP
  - `DO_DROPLET_USER`: Digital Ocean droplet username (the default username is root)
  - `MAPBOX_TOKEN`: Mapbox access token
  - `CERTBOT_EMAIL`: Email to be used by Certbot to obtain the SSL certificate to enable HTTPS
- Create the following GitHub Actions variables in the GitHub repository:
  - `BACKEND_CORS_ORIGINS`: A comma-separated list of origins allowed to access the backend, used to configure the CORS policy; in this case just the frontend: `["http://{DO_DROPLET_IP}", "http://{DOMAIN_NAME}", "http://www.{DOMAIN_NAME}"]` (a sketch of how the backend can consume this variable follows this list)
- Run the GitHub Actions `deploy-app.yml` workflow to deploy the app
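For reference, here is a minimal sketch of how a FastAPI backend can parse `BACKEND_CORS_ORIGINS` and apply it as a CORS policy; the settings class and its field name are assumptions, not necessarily the project's actual code.

```python
# Hypothetical sketch: read BACKEND_CORS_ORIGINS from the environment
# (as a JSON-encoded list) and register it with FastAPI's CORS middleware.
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # pydantic-settings JSON-decodes complex fields, so an env value like
    # BACKEND_CORS_ORIGINS=["http://localhost","http://0.0.0.0"] parses to a list
    BACKEND_CORS_ORIGINS: list[str] = []


settings = Settings()
app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.BACKEND_CORS_ORIGINS,
    allow_methods=["*"],
    allow_headers=["*"],
)
```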
TODO: Add comment about the CI/CD retrieving the MLflow server from the latest ML Infrastructure GitHub Actions workflow
sermadrid is under active development, with the following next steps planned:

- **ZenML Pipelines Deployment**: Create a GitHub Actions pipeline to register the ZenML pipelines on the remote AWS server and automate their execution using AWS Lambda functions.
- **MLflow ZenML Integration**: Integrate MLflow for experiment tracking and model registry within the ZenML pipelines to streamline model management.
- **MLflow Backend Integration**: Update the backend to load models on startup directly from the MLflow API instead of the current S3 bucket, ensuring the use of the latest production models (see the sketch after this list).
- **Model Monitoring**: Introduce model monitoring with Evidently and Grafana to track performance metrics over time.
- **Testing**: Develop unit and integration tests using pytest for both the FastAPI backend and the ZenML training pipelines to ensure code reliability and robustness.
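A minimal sketch of what that MLflow backend integration could look like, loading one registered model per neighbourhood at FastAPI startup; the tracking URI, the model naming convention (`sermadrid-{neighbourhood}`), and the `champion` alias are placeholders for illustration.

```python
# Hypothetical sketch: load one registered model per neighbourhood from the
# MLflow Model Registry when the FastAPI app starts. The tracking URI, the
# model naming convention, and the "champion" alias are placeholders.
from contextlib import asynccontextmanager

import mlflow
from fastapi import FastAPI

NEIGHBOURHOODS = ["centro", "salamanca"]  # placeholder list
models: dict[str, mlflow.pyfunc.PyFuncModel] = {}


@asynccontextmanager
async def lifespan(app: FastAPI):
    mlflow.set_tracking_uri("https://mlflow.example.org")  # placeholder URL
    for hood in NEIGHBOURHOODS:
        # "models:/<name>@<alias>" resolves the aliased version in the registry
        models[hood] = mlflow.pyfunc.load_model(f"models:/sermadrid-{hood}@champion")
    yield
    models.clear()


app = FastAPI(lifespan=lifespan)
```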