
🚘 sermadrid

E2E project to search for public parking availability in Madrid's SER zone

sermadrid.org

Warning: sermadrid is currently under active development.

sermadrid is an E2E Machine Learning project to search for public parking availability in Madrid's SER (Regulated Parking Service) zone.

The project powers the publicly available website sermadrid.org, which lets users search for parking availability based on their chosen date and address.

It uses Time Series models, one per neighbourhood, to make predictions. These models were trained on four years of parking ticket data (2020 to 2024). A minimal sketch of this per-neighbourhood setup follows.
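
To make the one-model-per-neighbourhood idea concrete, here is a hedged sketch of how prediction requests can be routed to a per-neighbourhood model; the class and neighbourhood IDs are illustrative placeholders, not the actual sermadrid API:

```python
from datetime import datetime


class NeighbourhoodModel:
    """Stand-in for a trained per-neighbourhood Time Series model."""

    def __init__(self, neighbourhood_id: str):
        self.neighbourhood_id = neighbourhood_id

    def predict(self, when: datetime) -> float:
        # A real model would use learned seasonality (hour, weekday, month).
        # Here we return a placeholder availability score.
        return 0.5


# One model per SER neighbourhood, keyed by a (hypothetical) neighbourhood ID.
models: dict[str, NeighbourhoodModel] = {
    nid: NeighbourhoodModel(nid) for nid in ("CENTRO-01", "CHAMBERI-02")
}


def predict_availability(neighbourhood_id: str, when: datetime) -> float:
    # Route the request to the model for the neighbourhood the user selected.
    return models[neighbourhood_id].predict(when)


print(predict_availability("CENTRO-01", datetime(2024, 5, 10, 9, 0)))
```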

The project consists of several components: a frontend (Node.js), a backend (FastAPI), infrastructure (Digital Ocean & AWS, defined with Terraform), CI/CD deployment pipelines (GitHub Actions), an MLOps framework server for orchestrating training workflows (ZenML), and an experiment tracking and model registry server (MLflow).

Screenshot of sermadrid.org

📁 Project Structure

📂 sermadrid
├── 📂 .github              - GitHub Actions CI/CD pipelines
├── 📂 backend
│   ├── 📂 app              - FastAPI app
│   └── 📂 sermadrid        - sermadrid python package
├── 📂 frontend             - Node.js app
├── 📂 infrastructure       - Terraform IaC resources
│   ├── 📂 backend          - Digital Ocean backend infrastructure
│   └── 📂 zenml            - AWS ZenML server and stack infrastructure
└── 📂 zenml                - ZenML training workflow pipelines

🧩 Project Components

  • Frontend: The frontend is developed with Vue.js, with Mapbox integrated to handle mapping and geospatial functionality. It is served by an nginx web server configured with SSL certificates for secure HTTPS communication.

  • Backend: The backend consists of a FastAPI application that serves the sermadrid Python package, which implements the core logic for predicting parking availability. The FastAPI app loads the machine learning models at startup, enabling efficient and responsive handling of user requests (see the first sketch after this list).

  • Infrastructure: The project’s infrastructure is managed using Terraform and is hosted across Digital Ocean and AWS. Digital Ocean is used to host the application, including components like droplets, domain management, and firewall configurations. AWS is employed for the ZenML MLOps framework components.

  • CI/CD Pipelines: GitHub Actions manages the CI/CD pipelines, with Docker Compose orchestrating the environment. There are three pipelines: one deploys the Digital Ocean infrastructure, another deploys the sermadrid web application (frontend and backend), and the last deploys the Machine Learning infrastructure (ZenML server & stack and MLflow server).

  • MLOps Framework: ZenML orchestrates the machine learning training workflows. ZenML supports environment-agnostic execution, allowing the workflows to run both locally and on AWS. The AWS stack components are created through the Terraform code located in the infrastructure/zenml directory. The project relies on two ZenML pipelines, "feature engineering" and "training", each containing multiple steps that process the raw data and train the Time Series models that power sermadrid (see the pipeline sketch after this list). TODO: Describe the current ZenML stack components

  • Experiment Tracking & Model Registry: MLflow handles experiment tracking and the model registry; its server is provisioned through Terraform. MLflow experiment tracking and model registry are integrated into the ZenML pipelines, and the backend loads models directly from MLflow at startup (see the last sketch after this list).
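
A minimal sketch of the startup model loading described above, using FastAPI's lifespan hook; load_models() and the /predict route are hypothetical stand-ins rather than the actual sermadrid endpoints:

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI


def load_models() -> dict:
    # Placeholder: the real app deserializes one trained model per
    # neighbourhood (e.g. from object storage or MLflow) at startup.
    return {"CENTRO-01": object()}


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load models once at startup and keep them in memory for all requests.
    app.state.models = load_models()
    yield


app = FastAPI(lifespan=lifespan)


@app.get("/predict")
def predict(neighbourhood_id: str):
    model = app.state.models.get(neighbourhood_id)
    return {"neighbourhood_id": neighbourhood_id, "model_loaded": model is not None}
```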
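
A sketch of the ZenML pipeline structure, assuming the @step/@pipeline API; the step bodies are placeholders, not the project's real feature-engineering or training logic:

```python
import pandas as pd
from zenml import pipeline, step


@step
def load_raw_tickets() -> pd.DataFrame:
    # Placeholder for reading the raw SER parking ticket CSVs.
    return pd.DataFrame({"neighbourhood": ["CENTRO-01"], "tickets": [120]})


@step
def build_features(raw: pd.DataFrame) -> pd.DataFrame:
    # Placeholder for aggregating tickets into per-neighbourhood time slots.
    return raw


@pipeline
def feature_engineering_pipeline():
    raw = load_raw_tickets()
    build_features(raw)


if __name__ == "__main__":
    # Runs locally or on AWS, depending on the active ZenML stack.
    feature_engineering_pipeline()
```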
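
And a hedged sketch of the MLflow side: logging a training run and loading a registered model, as the backend does at startup. The tracking URI, experiment, and model names are placeholders:

```python
import mlflow

# Point the client at the MLflow Tracking Server (placeholder URL).
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("sermadrid-training")

with mlflow.start_run():
    mlflow.log_param("neighbourhood", "CENTRO-01")
    mlflow.log_metric("mae", 0.12)
    # A real run would also log and register the fitted model, e.g.:
    # mlflow.sklearn.log_model(model, "model", registered_model_name="sermadrid-CENTRO-01")

# At backend startup, the latest registered version can then be loaded with:
# model = mlflow.pyfunc.load_model("models:/sermadrid-CENTRO-01/latest")
```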

📥 Input Data

The project uses three primary data sources from the Madrid City Council: Regulated Parking Service (SER) Parking Tickets, SER Streets and Parking Spaces, and the SER Area Map. These datasets are processed and combined to build sermadrid: the parking ticket data is aggregated into Time Series features (see the sketch after the table below), while the geospatial data enables map overlays and address-based searches.

Dataset Details

| Dataset | Description | Source | Format | Update Frequency | Usage |
| --- | --- | --- | --- | --- | --- |
| SER Parking Tickets | Data on parking tickets issued by the Regulated Parking Service (SER) from parking meters and mobile payment applications in Madrid. Each ticket includes the parking meter ID, operation date, reservation start and end times, neighbourhood, zone type, permit type, reserved minutes, and amount paid. | Madrid Open Data Portal | CSV | Quarterly | Training one Time Series model per neighbourhood by processing & aggregating the data by time slots. |
| SER Streets and Spaces | Data on the number of parking spaces within the SER zone, broken down by neighbourhood. | Madrid Open Data Portal | CSV | Quarterly | Enhancing predictions with additional parking information. |
| SER Area Map | Geospatial data of the Regulated Parking Service area, including the delimitation of the SER area and its subdivision into neighbourhoods. | Madrid Geoportal | Shapefile (SHP) | Last updated: 03/06/2022 | Frontend visualization, masking non-SER areas of Madrid and identifying the neighbourhood selected by the user. |
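
To illustrate the aggregation described above, a hedged pandas sketch (column names are hypothetical) that rolls raw parking tickets up into hourly per-neighbourhood features:

```python
import pandas as pd

tickets = pd.DataFrame({
    "neighbourhood": ["CENTRO-01", "CENTRO-01", "CHAMBERI-02"],
    "start": pd.to_datetime(
        ["2023-05-10 09:05", "2023-05-10 09:40", "2023-05-10 10:15"]
    ),
    "minutes_reserved": [60, 30, 120],
})

# Sum reserved minutes into hourly slots per neighbourhood; each
# neighbourhood's resulting series trains its own Time Series model.
features = (
    tickets.set_index("start")
    .groupby("neighbourhood")["minutes_reserved"]
    .resample("1h")
    .sum()
)
print(features)
```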

🔧 Technical Details

sermadrid also makes use of:

  • Ruff: A fast Python linter and formatter written in Rust.
  • pre-commit: A tool that automates code quality checks before each commit.
  • Poetry: A dependency management and packaging tool that resolves a full graph of dependencies to find compatible package versions, avoiding incompatibility issues.

⚙️ Deployment

The following guidelines describe how to deploy the frontend and backend that compose the sermadrid web app, both locally and remotely.

Local deployment

  1. Create a .env file in the repository root with the following variables:

    BACKEND_CORS_ORIGINS=["http://localhost","http://0.0.0.0"]
    # Mapbox token obtained from mapbox.com
    MAPBOX_TOKEN={MAPBOX_TOKEN}
    
  2. Run the following commands to spin up the Docker containers:

    $ docker-compose build --no-cache
    $ docker-compose up -d
    

The app will then be available at http://localhost:80.
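
As an optional smoke test (a standard-library-only sketch; the root path is just an example), you can confirm the stack is serving:

```python
import urllib.request

# Expect HTTP 200 once the containers are up and healthy.
with urllib.request.urlopen("http://localhost:80/") as resp:
    print(resp.status)
```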

Afterwards, to stop all the running containers, run:

$ docker stop $(docker ps -a -q) 

TODO: Update local deployment with ZenML & MLflow services

Remote deployment

ML Infrastructure

  1. Create the following GitHub Actions variables in the GitHub repository:
    • AWS_S3_REMOTE_STATE_BUCKET_NAME: Name to give to the AWS S3 bucket used for the Terraform remote state of this infrastructure stack
    • AWS_S3_ZENML_BUCKET_NAME: Name to give to the AWS S3 bucket used by ZenML
    • AWS_S3_MLFLOW_BUCKET_NAME: Name to give to the AWS S3 bucket used by MLflow
  2. Create the following GitHub Actions secret variables in the GitHub repository:
    • ZENML_USERNAME: The username for the ZenML Server
    • ZENML_PASSWORD: The password for the ZenML Server
    • MLFLOW_USERNAME: The username for the MLflow Tracking Server
    • MLFLOW_PASSWORD: The password for the MLflow Tracking Server
    • GH_PAT: GitHub Personal Access Token (obtained in GitHub > Settings > Developer Settings > Personal Access Tokens > Fine-grained tokens; create a token with read access to the repository's "actions" and "metadata" permissions)
  3. Run the GitHub Actions deploy-ml-infrastructure.yml workflow to create the project's ML infrastructure. The workflow outputs include the URLs of both the ZenML server UI and the MLflow server UI
  4. Visit the ZenML server UI URL and follow the on-screen instructions to create an initial admin user account

TODO: Add comment about being able to create or destroy infrastructure with Terraform through the GitHub Actions pipelines
TODO: Add comment about retrieving the MLflow URL & S3 & ZenML from outputs & interacting

App Infrastructure

  1. Create a Digital Ocean account
  2. Create a Digital Ocean API key
  3. Create a Digital Ocean SSH key
  4. Create an AWS account
  5. Create an AWS access key
  6. Register a valid website domain via a domain registrar and point it to the Digital Ocean name servers
  7. Create the following GitHub Actions secret variables in the GitHub repository:
    • DO_API_KEY_TOKEN: Digital Ocean API key token
    • DO_SSH_KEY_ID: Digital Ocean SSH key ID
    • DO_SSH_PRIVATE_KEY: Digital Ocean SSH private key
    • AWS_ACCESS_KEY_ID: AWS access key ID
    • AWS_SECRET_ACCESS_KEY: AWS secret access key
  8. Create the following GitHub Actions variables in the GitHub repository:
    • AWS_S3_BUCKET_NAME: Name to give to the AWS S3 bucket
    • AWS_REGION: Region to use for the AWS S3 bucket
    • DOMAIN_NAME: Registered website domain name
  9. Run the GitHub Actions deploy-app-infrastructure.yml workflow to create the project's app infrastructure

App

  1. Retrieve the Digital Ocean droplet IP from the Digital Ocean Control Panel
  2. Create a Mapbox account and generate an access token
  3. Create the following GitHub Actions secret variables in the GitHub repository:
    • DO_DROPLET_IP: Digital Ocean droplet IP
    • DO_DROPLET_USER: Digital Ocean droplet username (default username is root)
    • MAPBOX_TOKEN: Mapbox access token
    • CERTBOT_EMAIL: Email to be used by Certbot to obtain the SSL certificate to enable HTTPS
  4. Create the following GitHub Actions variables in the GitHub repository:
  5. Run the GitHub Actions deploy-app.yml workflow to deploy the app

TODO: Add comment about the CI/CD retrieving the MLflow server from the latest ML Infrastructure GitHub Actions workflow

➡️ Next Steps

sermadrid is under active development, with the following next steps planned:

  • ZenML Pipelines Deployment: Create a GitHub Actions pipeline to register ZenML pipelines on the remote AWS server and automate their execution using AWS Lambda functions.
  • MLflow ZenML Integration: Integrate MLflow for experiment tracking and model registry within the ZenML pipelines to streamline model management.
  • MLflow Backend Integration: Update the backend to load models at startup directly from the MLflow API instead of the current S3 bucket, ensuring the use of the latest production models.
  • Model Monitoring: Introduce model monitoring with Evidently and Grafana to track performance metrics over time.
  • Testing: Develop unit and integration tests using pytest for both the FastAPI backend and the ZenML training pipelines to ensure code reliability and robustness.
