Seriously useful local non-cloud AI tool

There are plenty of centralized, cloud-connected AI programs like Copilot and Grok. Are people achieving seriously good results with truly open-source AI that runs locally?

The goal is continued learning in programming.

I’m presuming that Ollama is the platform on which to host the model on Linux. Maybe using Distrobox or Toolbx.

The host machine has 384GB RAM, an AMD 5700 XT GPU, and 32 cores on an EPYC Rome server platform (PCIe 4.0, DDR4), so it can handle a reasonably beefy model.

Thanks!

I think Ollama is the best tool to run the LLM or tokenizer model. It's as simple as `ollama pull <model>` and `ollama run <model>` once you have it installed.
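If you want to script against it rather than chat in the terminal, Ollama also exposes an HTTP API on localhost:11434. A minimal sketch in Python (assuming the default port and a model you have already pulled; `qwen3:8b` here is just an example):

```python
# Minimal sketch: query a locally running Ollama server over its HTTP API.
# Assumes the default port (11434) and a model you have already pulled,
# e.g. `ollama pull qwen3:8b` -- swap in whatever model you prefer.
import requests

def ask(prompt: str, model: str = "qwen3:8b") -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(ask("Explain currying in Elm in two sentences."))
```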

AMD 5700 XT GPU - 8GB RAM?

This is going to be quite limiting, even though you have plenty of CPU and system RAM. I think you might be able to run deepseek-r1 or qwen3 models at 8B parameters. I have a total of 40GB of GPU memory and was able to run a 70B DeepSeek; it gave pretty good answers, noticeably better than the smaller versions, but was slow. In practice I would consider running a small model for speed, and then replaying the question through the bigger model whilst I make some tea, to get a more polished answer. You could also play with smaller models on your GPU, but outsource to an online service when you need better AI.
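That "quick draft with the small model, replay with the big one" workflow is easy to script against the same Ollama HTTP API. A rough sketch (the model names are only examples; use whatever you have pulled):

```python
# Rough sketch of the "fast draft, slow polish" workflow against a local
# Ollama server. Model names are examples; use whatever you have pulled.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
FAST_MODEL = "qwen3:8b"         # small enough for ~8GB of VRAM
SLOW_MODEL = "deepseek-r1:70b"  # spills into system RAM, so it is slow

def generate(model: str, prompt: str) -> str:
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=3600,
    )
    resp.raise_for_status()
    return resp.json()["response"]

question = "Why does Elm not have typeclasses?"
print(generate(FAST_MODEL, question))   # quick answer now
print(generate(SLOW_MODEL, question))   # better answer while the tea brews
```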

I have tried some open-source stuff like this: Chat With Your Codebase: Build a Local LLM CLI Powered by Ollama + ChromaDB | by Rafał Kędziorski | Apr, 2025 | Medium

This introduces ChromaDB for the indexing. It is certainly not the only choice of open-source vector database, but it seems reasonably easy to install and get started with.
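For anyone curious, the ChromaDB part of that setup boils down to roughly this (a sketch, assuming `pip install chromadb` and Chroma's default embedding function; the chunks and file names are made up for illustration):

```python
# Sketch of the ChromaDB part: index some text chunks and query them.
# Assumes `pip install chromadb`; uses Chroma's default embedding function.
# The chunks and file names below are made up for illustration.
import chromadb

client = chromadb.PersistentClient(path="./code_index")
collection = client.get_or_create_collection("codebase")

collection.add(
    ids=["chunk-1", "chunk-2"],
    documents=[
        "update msg model = case msg of Increment -> { model | count = model.count + 1 }",
        "view model = div [] [ text (String.fromInt model.count) ]",
    ],
    metadatas=[{"file": "src/Main.elm"}, {"file": "src/Main.elm"}],
)

results = collection.query(query_texts=["where is the counter rendered?"], n_results=2)
print(results["documents"])
```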

The resulting system from the above article was pretty useless though, and not just for Elm. The main issue seems to be that the indexing just chunks the code like a plain text file, with no context sensitivity.
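To make that concrete, the chunking is roughly this naive (my sketch, not the article's exact code):

```python
# Sketch of the kind of context-free chunking that makes the index weak:
# fixed-size windows over the raw text, happily cutting functions in half.
def chunk_text(source: str, size: int = 500, overlap: int = 50) -> list[str]:
    chunks = []
    start = 0
    while start < len(source):
        chunks.append(source[start:start + size])
        start += size - overlap
    return chunks

# A chunk boundary can land mid-function, so the embedding for that chunk
# describes half a function body with no signature or type annotation.
```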

Next up, I looked at: 🔍 Ask Code Anything: GitHub Repo RAG MCP Server — Your AI-Powered Dev Assistant | by Pratiksworking | May, 2025 | Medium

This seems a more promising approach, as it chunks the code into functions/classes/whatever, so the indexing is already going to be better because of that. But there is no parser for Elm built in.
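The idea is essentially structure-aware chunking. Here is a sketch of what that looks like for Python using the standard `ast` module (the repo RAG tool does the equivalent with its own per-language parsers; this is just to illustrate, and it is exactly the piece that is missing for Elm):

```python
# Sketch of structure-aware chunking: one chunk per top-level function/class.
# Shown for Python via the standard library `ast` module; the repo RAG tool
# does the equivalent with its own per-language parsers, but not for Elm.
import ast

def code_chunks(source: str) -> list[dict]:
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "name": node.name,
                "text": ast.get_source_segment(source, node),
            })
    return chunks

example = """
def add(a, b):
    return a + b

class Counter:
    def bump(self):
        self.n += 1
"""

for chunk in code_chunks(example):
    print(chunk["name"], "->", len(chunk["text"]), "chars")
```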

@jxxcarlson has recently been hacking on the Elm compiler to output the AST in the format used by this repo RAG tool, very much work in progress: elm-compiler/README.md at master · jxxcarlson/elm-compiler · GitHub

I am also interested in getting a decent open-source AI search over a codebase running locally. For Elm it would be nice to experiment with. For my work, I have huge amounts of Java, TypeScript, and Python that I need to find my way around quickly, but I also cannot put any of this code online (the repos, the AI, the index), because it is a proprietary codebase and I have signed off on company regulations around shadow IT and an NDA.

So far I have not found an open-source system that I can just install and get working easily, only pieces I can try and work with. I have not really gotten any value out of it yet either; the things I have tried so far have not worked well.


Yeah, the GPU has 8GB of RAM. I figured it would be a bottleneck. The CPU is from 2019 and maxes out at 3.35GHz. It is a machine built for running a lot of threads, virtual machines, and containers. An AI could stretch out on it, but from what you’ve indicated the GPU is the heart of the compute.

It seems as if I should just find the best online generative AI service and use that for a while. I need something I can ask a lot of questions quickly: a personal trainer rather than a code monkey in the editor. So probably not GitHub Copilot.

Too bad about the need for significant compute for local AI. Thanks for your great response; it was very helpful.

You certainly can run AI in 8GB, and it may have quite high token throughput, which will make it pleasant to use. You just miss out a bit on the quality of the answers.

Yeah, thanks, there is plenty here to consider. Great points and links :folded_hands:

For trying models locally, LM Studio is the most convenient way IMHO, thanks to its user-friendly GUI.

I run Qwen2.5 Coder locally for autocompletion, with great results on my potato laptop (4GB iGPU), using the VSCode llama.cpp extension.
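If anyone wants to poke at the same model outside the editor, the llama.cpp server that the extension talks to can be queried directly. A rough sketch, assuming you started it locally with something like `llama-server -m <your-qwen2.5-coder>.gguf --port 8080` (the file name, port, prompt, and token count are just examples):

```python
# Sketch: ask the llama.cpp server (the same one the VSCode extension talks to)
# for a completion directly. Assumes it was started locally, e.g.:
#   llama-server -m <your-qwen2.5-coder>.gguf --port 8080
# The prompt, port, and token count are just examples.
import requests

resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "-- Reverse a list in Elm\nreverse : List a -> List a\nreverse xs =",
        "n_predict": 64,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["content"])
```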

After reading about your linked VSCode llama.cpp extension, I learned that if you are using Ollama, you can link it up with VSCode via the continue.dev plugin.

I have low expectations, but it seems easy enough to try since I already have VSCode + Ollama installed. I suspect that it might generate stuff using the current file as context, while search indexing of the whole project will be missing. Let’s see…