guptam's starred repositories

AI-data warehouse to enrich, transform and analyze unstructured data

Python 1,988 90 Updated Nov 22, 2024

PySpark test helper methods with beautiful error messages

Python 621 68 Updated Oct 24, 2024

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

Scala 2,106 915 Updated Nov 22, 2024

🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack - Reverse ETL & Data Activation

Ruby 1,548 65 Updated Nov 22, 2024

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.

Python 18,088 1,298 Updated Nov 22, 2024

The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

Python 9,778 751 Updated Oct 23, 2024

OO for LLMs

Python 486 36 Updated Nov 22, 2024

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

TypeScript 18,933 1,453 Updated Nov 21, 2024

A framework for serving and evaluating LLM routers - save LLM costs without compromising quality!

Python 3,269 250 Updated Aug 10, 2024

🚀 A curated list of awesome articles, videos, and other resources to learn and practice software architecture, patterns, and principles.

8,451 666 Updated Oct 29, 2024

PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows

Python 41 3 Updated Jul 10, 2024

Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.

Python 602 23 Updated Nov 22, 2024

Automated migrations to Unity Catalog

Python 237 83 Updated Nov 22, 2024

RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry

Python 3,337 275 Updated Nov 18, 2024

fabric is an open-source framework for augmenting humans using AI. It provides a modular framework for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere.

Go 25,506 2,707 Updated Nov 21, 2024

Open, Multi-modal Catalog for Data & AI

Java 2,438 393 Updated Nov 22, 2024

Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.

Go 98,874 7,868 Updated Nov 22, 2024

A self-organizing file system with llama 3

Jupyter Notebook 4,964 314 Updated Oct 24, 2024

UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform…

Python 2,204 191 Updated Aug 18, 2024

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

HTML 9,211 764 Updated Nov 22, 2024

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

Python 8,403 1,416 Updated Nov 21, 2024

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

Python 5,711 529 Updated Nov 21, 2024

Build ChatGPT over your data, all with natural language

Python 6,312 648 Updated Apr 5, 2024

AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents

Python 13,794 1,857 Updated Nov 22, 2024

Desbordante is a high-performance data profiler that is capable of discovering many different patterns in data using various algorithms. It also allows to run data cleaning scenarios using these al…

C++ 388 72 Updated Nov 20, 2024

Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.

Java 529 161 Updated Nov 1, 2024

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets

Python 3,995 378 Updated Nov 22, 2024

We write your reusable computer vision tools. 💜

Python 24,258 1,802 Updated Nov 21, 2024

#1 Locally hosted web application that allows you to perform various operations on PDF files

Java 46,483 3,792 Updated Nov 22, 2024

whiteboard / infinite canvas SDK

TypeScript 35,921 2,210 Updated Nov 22, 2024