- Tiger Analytics
- India
- https://twitter.com/manishpedia
Starred repositories
AI-data warehouse to enrich, transform and analyze unstructured data
PySpark test helper methods with beautiful error messages
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack - Reverse ETL & Data Activation
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
A framework for serving and evaluating LLM routers - save LLM costs without compromising quality!
🚀 A curated list of awesome articles, videos, and other resources to learn and practice software architecture, patterns, and principles.
PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry
fabric is an open-source framework for augmenting humans using AI. It provides a modular framework for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere.
Open, Multi-modal Catalog for Data & AI
Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
A self-organizing file system with llama 3
UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform…
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Build ChatGPT over your data, all with natural language
AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents
Desbordante is a high-performance data profiler that is capable of discovering many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these al…
Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
We write your reusable computer vision tools. 💜
#1 Locally hosted web application that allows you to perform various operations on PDF files