This is also a standalone llama connector; it is able to work independently.
pip install llama-core
python -m llama_core
The command above prompts the user-interface selection menu; once an option is chosen, GGUF file(s) in the current directory will be searched for and detected (if any), as below.
from llama_core import menu
from llama_core import reader
from llama_core import writer
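As an aside, the GGUF detection described above can be pictured with a plain-Python sketch; this is only an illustration of the behavior, not llama-core's internal code:
# illustration only: scan the current directory for GGUF files, roughly as the menu does
from pathlib import Path
gguf_files = sorted(Path(".").glob("*.gguf"))
if gguf_files:
    for index, gguf_file in enumerate(gguf_files, start=1):
        print(f"{index}. {gguf_file.name}")
else:
    print("No GGUF file(s) found in the current directory.")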
Other functions are the same as llama-cpp-python; for CUDA (GPU, Nvidia) and Metal (M1/M2/M3, Apple) supported settings, please specify CMAKE_ARGS following Abetlen's repo below. If you want to install from a source file (under releases), you should opt for the .tar.gz file (then build your machine-customized installable package) rather than the .whl (wheel; a pre-built binary package), together with the appropriate cmake tag(s).
repo: llama-cpp-python, llama.cpp; page: gguf.us
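Since the functions mirror llama-cpp-python, a basic inference call should look like the sketch below; this assumes the same Llama class is exposed under llama_core and uses model.gguf as a placeholder file name:
# minimal sketch, assuming llama-core mirrors llama-cpp-python's Llama API
from llama_core import Llama
llm = Llama(model_path="model.gguf")  # placeholder GGUF file name
output = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(output["choices"][0]["text"])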
According to the latest note inside VS Code, msys64 is recommended by Microsoft; alternatively, you could opt for w64devkit, etc., as the source/location of your gcc and g++ compilers.
$env:CMAKE_GENERATOR = "MinGW Makefiles"
$env:CMAKE_ARGS = "-DCMAKE_C_COMPILER=C:/msys64/mingw64/bin/gcc.exe -DCMAKE_CXX_COMPILER=C:/msys64/mingw64/bin/g++.exe"
pip install llama_core-(version).tar.gz
On Mac, the Xcode command line tools are recommended by Apple for dealing with all coding-related issue(s); alternatively, you could bypass them per your own preference.
pip3 install llama_core-(version).tar.gz
example setup for Metal (M1/M2/M3, Apple) - faster
CMAKE_ARGS="-DGGML_METAL=on" pip3 install llama_core-(version).tar.gz
example setup for CUDA (GPU, Nvidia) - faster x2; depends on your card model (how rich you are)
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama_core-(version).tar.gz
Make sure your gcc and g++ versions are >=11; you can check them with gcc --version and g++ --version. Other requirement(s) include cmake>=3.21, etc. However, if you opt to install via the pre-built wheel (.whl) file, then you don't need to worry about any of that.