guptam

Manish Gupta guptam

Engineer, Architect, Data-Geek, Bookworm, Multi-tasking, Serial Coder, Parallel Thinker, Naturally Intelligent in Artificial Intelligence.

19 followers · 27 following

Tiger Analytics
India
https://twitter.com/manishpedia

Lists (1)

AI

3 repositories

Starred repositories

iterative / datachain

AI-data warehouse to enrich, transform and analyze unstructured data

Python 1,988 90 Updated Nov 22, 2024

MrPowers / chispa

PySpark test helper methods with beautiful error messages

Python 621 68 Updated Oct 24, 2024

apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

Scala 2,106 915 Updated Nov 22, 2024

Multiwoven / multiwoven

🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack - Reverse ETL & Data Activation

Ruby 1,548 65 Updated Nov 22, 2024

opendatalab / MinerU

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.

Python 18,088 1,298 Updated Nov 22, 2024

cleanlab / cleanlab

The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

Python 9,778 751 Updated Oct 23, 2024

google / langfun

OO for LLMs

Python 486 36 Updated Nov 22, 2024

mendableai / firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

TypeScript 18,933 1,453 Updated Nov 21, 2024

lm-sys / RouteLLM

A framework for serving and evaluating LLM routers - save LLM costs without compromising quality!

Python 3,269 250 Updated Aug 10, 2024

mehdihadeli / awesome-software-architecture

🚀 A curated list of awesome articles, videos, and other resources to learn and practice software architecture, patterns, and principles.

8,451 666 Updated Oct 29, 2024

rafaelpierre / pyjaws

PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows

Python 41 3 Updated Jul 10, 2024

Nike-Inc / koheesio

Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.

Python 602 23 Updated Nov 22, 2024

databrickslabs / ucx

Automated migrations to Unity Catalog

Python 237 83 Updated Nov 22, 2024

truefoundry / cognita

RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry

Python 3,337 275 Updated Nov 18, 2024

danielmiessler / fabric

fabric is an open-source framework for augmenting humans using AI. It provides a modular framework for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere.

Go 25,506 2,707 Updated Nov 21, 2024

unitycatalog / unitycatalog

Open, Multi-modal Catalog for Data & AI

Java 2,438 393 Updated Nov 22, 2024

ollama / ollama

Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.

Go 98,874 7,868 Updated Nov 22, 2024

iyaja / llama-fs

A self-organizing file system with llama 3

Jupyter Notebook 4,964 314 Updated Oct 24, 2024

uptrain-ai / uptrain

UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform…

Python 2,204 191 Updated Aug 18, 2024

Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

HTML 9,211 764 Updated Nov 22, 2024

py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

Python 8,403 1,416 Updated Nov 21, 2024

pymupdf / PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

Python 5,711 529 Updated Nov 21, 2024

run-llama / rags

Build ChatGPT over your data, all with natural language

Python 6,312 648 Updated Apr 5, 2024

eosphoros-ai / DB-GPT

AI Native Data App Development framework with AWEL(Agentic Workflow Expression Language) and Agents

Python 13,794 1,857 Updated Nov 22, 2024

Desbordante / desbordante-core

Desbordante is a high-performance data profiler that is capable of discovering many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these al…

C++ 388 72 Updated Nov 20, 2024